Conversation
> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
**Walkthrough**

Adds a new evaluation script to compare prompt-prefix cache behavior across worktrees; refactors multiple prompt flows to separate system/user messages; changes session-history injection to a separate system message; introduces Gemini cache-token extraction and propagation; and updates tests for these behaviors.

**Changes**
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Orchestrator as Orchestrator Script
    participant Child as Child Process<br/>(worktree honcho_llm_call_inner)
    participant PromptBuilder as Prompt Builder<br/>(system + user)
    participant Provider as LLM Provider
    participant Results as Metrics Collector
    rect rgba(120, 180, 240, 0.5)
        Orchestrator->>Child: spawn subprocess\n(pass scenario payload & namespace)
        Child->>PromptBuilder: build messages\n(system_msg, user_msg, namespace)
        PromptBuilder-->>Child: messages
        Child->>Provider: honcho_llm_call_inner(messages)
        Provider-->>Child: response + usage_metadata
        Child->>Results: emit per-call metrics JSON\n(duration, tokens, cache_read/create, preview)
        Orchestrator-->>Results: aggregate scenario results
    end
```
**Estimated code review effort**: 🎯 4 (Complex) | ⏱️ ~65 minutes

**Possibly related PRs**
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/deriver/deriver.py (1)
127-149: ⚠️ Potential issue | 🟠 Major

`prompt=""` regresses the Gemini deriver path. Line 129 now empties `prompt`, and lines 142-148 move the entire request into `messages`. That works for the text-response providers, but the Google structured-output branch in `src/utils/clients.py` still calls `generate_content(... contents=prompt ...)` when `response_model` is set. With `settings.DERIVER.PROVIDER == "google"`, this sends an empty prompt and drops both the instructions and the conversation body. Please either keep passing the combined prompt here until that client path understands `messages`, or fix the client and add a regression test for `response_model + messages`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/deriver/deriver.py` around lines 127 - 149, The call to honcho_llm_call in deriver.py sets prompt="" and moves instructions into messages, which breaks the Google structured-output path that still calls generate_content(... contents=prompt ...) when response_model is set; either restore a combined prompt (e.g., concatenate minimal_deriver_system_prompt(observed) + minimal_deriver_user_prompt(formatted_messages)) when calling honcho_llm_call, or (preferred) update the Google client in src/utils/clients.py (the generate_content/Google structured-output branch) to accept and use the messages parameter when response_model is present instead of only using prompt; after fixing, add a regression test that sets settings.DERIVER.PROVIDER == "google" and calls honcho_llm_call with response_model=PromptRepresentation and messages to ensure the instructions+conversation are sent and parsed correctly.
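The interim fallback described above can be sketched as follows. This is an illustrative shape only, assuming hypothetical helper and parameter names rather than the repository's actual `honcho_llm_call` signature:

```python
# Sketch of the interim fix: keep a combined prompt for providers whose
# structured-output path still consumes a single `contents` string.
# All names here are illustrative, not the actual call-site API.
def build_deriver_request(provider: str, system_msg: str, user_msg: str) -> dict:
    if provider == "google":
        # The Google structured-output branch still reads `contents=prompt`,
        # so pass the concatenated prompt and skip `messages` entirely.
        return {"prompt": f"{system_msg}\n\n{user_msg}", "messages": None}
    # Text-response providers accept split system/user messages.
    return {
        "prompt": "",
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
    }
```

The point of the branch is that no provider path ever receives an empty prompt with the instructions stranded in a parameter it ignores.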
🧹 Nitpick comments (3)
src/deriver/prompts.py (1)
51-69: Compose `minimal_deriver_prompt()` from the split helpers. This is now the only place still hand-building the `<messages>` wrapper. Reusing `minimal_deriver_user_prompt()` keeps the combined helper aligned with `estimate_minimal_deriver_prompt_tokens()` and avoids prompt drift.

♻️ Proposed cleanup

```diff
 def minimal_deriver_prompt(
     peer_id: str,
     messages: str,
 ) -> str:
@@
-    return c(
-        f"""
-{minimal_deriver_system_prompt(peer_id)}
-
-Messages to analyze:
-<messages>
-{messages}
-</messages>
-"""
-    )
+    return "\n\n".join(
+        [
+            minimal_deriver_system_prompt(peer_id),
+            minimal_deriver_user_prompt(messages),
+        ]
+    )
```

Based on learnings: deriver token estimation should use empty inputs against the same prompt template structure to avoid double-counting wrapper overhead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/deriver/prompts.py` around lines 51 - 69, The combined prompt in minimal_deriver_prompt is hand-wrapping <messages> instead of reusing the split helper; change minimal_deriver_prompt to build its return by concatenating minimal_deriver_system_prompt(peer_id) with minimal_deriver_user_prompt(...) rather than manually injecting the <messages> wrapper so the template matches estimate_minimal_deriver_prompt_tokens and avoids prompt drift—locate minimal_deriver_prompt and replace the manual block with a call to minimal_deriver_user_prompt using the same arguments (and keep minimal_deriver_system_prompt(peer_id) as before).

scripts/compare_prefix_cache.py (2)

259-274: Use an answerable prompt in `change_base`. Lines 263, 268, and 273 ask for a city that never appears in either fixture block. That makes the only base-change scenario sensitive to hallucination/abstention behavior instead of just cache behavior, which adds noise to the latency and output-token comparison.
Suggested change
```diff
-    "user_query": "What city is the user considering for a move?",
+    "user_query": "What exact date did the user move the launch to?",
@@
-    "user_query": "What city is the user considering for a move?",
+    "user_query": "What exact date did the user move the launch to?",
@@
-    "user_query": "What city is the user considering for a move?",
+    "user_query": "What exact date did the user move the launch to?",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_prefix_cache.py` around lines 259 - 274, The change_base scenario currently uses a user_query asking for a city that never appears in the fixture blocks, which conflates cache behavior with model hallucination/abstention; update the "change_base" entries (prime, transition, steady) to use an answerable prompt that references a city present in the fixtures (keep BASE_PREFIX, BASE_PREFIX_VARIANT, and ROLLING_HISTORY_A as-is) so the comparison focuses on cache behavior rather than factual recall—modify the "user_query" values under the "change_base" dict to a question that can be answered from the fixture content.
119-129: Add Google-style docstrings to the new helpers and dataclasses. This file introduces public dataclasses and top-level helpers without docstrings, which makes the probe contract harder to follow when someone updates the scenarios or result shape later.
As per coding guidelines,
`**/*.py`: Use Google style docstrings.

Also applies to: 132-438
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_prefix_cache.py` around lines 119 - 129, Add Google-style docstrings to each public dataclass and top-level helper in this module (including ProviderSpec and VariantSpec and the functions introduced between lines 132-438) so the probe contract is explicit. For each dataclass (ProviderSpec, VariantSpec) add a short one-line summary, followed by Args describing each field (label, provider, model for ProviderSpec; label, worktree for VariantSpec). For each top-level helper function, add a Google-style docstring with a one-line summary, Args for parameters, Returns for return values, and Raises if exceptions are expected. Ensure docstrings are placed directly above the class/function definitions and use consistent formatting across the file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/compare_prefix_cache.py`:
- Around line 282-306: The namespace built in build_scenario_payload is
deterministic and must be salted per process run; generate a single run-level
salt (e.g., invocation_salt) once at script startup and append it to the
namespace construction inside build_scenario_payload (the namespace variable
used in the call to build_messages) so all three calls
('prime','transition','steady') share the same salted namespace for that
invocation; apply the same change to the other namespace-building function
referenced around lines 408-425 so every cache key includes the same run salt,
ensuring uniqueness across runs while preserving stability within a single run.
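The salting scheme the comment describes can be sketched as follows. Names like `INVOCATION_SALT` and `build_namespace` are illustrative stand-ins, not the script's actual identifiers:

```python
import uuid

# One salt per script invocation: generated once at startup, so cache keys
# are stable within a run but never collide with a previous run's keys.
INVOCATION_SALT = uuid.uuid4().hex[:8]

def build_namespace(scenario: str, phase: str) -> str:
    # All three phases ("prime", "transition", "steady") share the same
    # salt, so within-run cache hits still line up across phases.
    return f"prefix-cache-{scenario}-{phase}-{INVOCATION_SALT}"
```

Because the salt is module-level, both the scenario-payload builder and the other namespace-building function automatically agree on it within one invocation.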
- Around line 25-33: The child subprocess currently expects a huge payload via
argv (CHILD_CODE using json.loads(sys.argv[1])) which breaks on Windows and has
no timeout; refactor the launcher to send the payload over stdin to the child
(keep CHILD_CODE but read from sys.stdin.buffer or sys.stdin.read instead of
sys.argv), add a bounded timeout to the subprocess invocation so provider stalls
can't hang, and wrap the parent-side json.loads(process.stdout) call in
try/except to catch JSONDecodeError and include the raw stdout/stderr in the
logged/raised error to make stdout contamination debuggable; update any calls
referencing honcho_llm_call_inner to use the stdin-based protocol and timeout
behavior.
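A sketch of the stdin-based protocol with a bounded timeout and guarded JSON parsing. The child program here is a minimal stand-in for the script's real `CHILD_CODE`, and the helper name is illustrative:

```python
import json
import subprocess
import sys

# Stand-in child: reads the payload from stdin instead of argv, so large
# payloads work on Windows and no argv length limits apply.
CHILD_CODE = (
    "import json, sys;"
    "payload = json.loads(sys.stdin.read());"
    "print(json.dumps({'echo': payload['scenario']}))"
)

def run_child(payload: dict, timeout_s: float = 120.0) -> dict:
    process = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(payload),  # payload travels over stdin, not argv
        capture_output=True,
        text=True,
        timeout=timeout_s,  # bound provider stalls instead of hanging forever
    )
    try:
        return json.loads(process.stdout)
    except json.JSONDecodeError as exc:
        # Include raw output so stdout contamination is debuggable.
        raise RuntimeError(
            f"child emitted non-JSON output: stdout={process.stdout!r} "
            f"stderr={process.stderr!r}"
        ) from exc
```

`subprocess.run` raises `subprocess.TimeoutExpired` if the child exceeds `timeout_s`, which the orchestrator can catch and record as a failed probe.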
---
Outside diff comments:
In `@src/deriver/deriver.py`:
- Around line 127-149: The call to honcho_llm_call in deriver.py sets prompt=""
and moves instructions into messages, which breaks the Google structured-output
path that still calls generate_content(... contents=prompt ...) when
response_model is set; either restore a combined prompt (e.g., concatenate
minimal_deriver_system_prompt(observed) +
minimal_deriver_user_prompt(formatted_messages)) when calling honcho_llm_call,
or (preferred) update the Google client in src/utils/clients.py (the
generate_content/Google structured-output branch) to accept and use the messages
parameter when response_model is present instead of only using prompt; after
fixing, add a regression test that sets settings.DERIVER.PROVIDER == "google"
and calls honcho_llm_call with response_model=PromptRepresentation and messages
to ensure the instructions+conversation are sent and parsed correctly.
---
Nitpick comments:
In `@scripts/compare_prefix_cache.py`:
- Around line 259-274: The change_base scenario currently uses a user_query
asking for a city that never appears in the fixture blocks, which conflates
cache behavior with model hallucination/abstention; update the "change_base"
entries (prime, transition, steady) to use an answerable prompt that references
a city present in the fixtures (keep BASE_PREFIX, BASE_PREFIX_VARIANT, and
ROLLING_HISTORY_A as-is) so the comparison focuses on cache behavior rather than
factual recall—modify the "user_query" values under the "change_base" dict to a
question that can be answered from the fixture content.
- Around line 119-129: Add Google-style docstrings to each public dataclass and
top-level helper in this module (including ProviderSpec and VariantSpec and the
functions introduced between lines 132-438) so the probe contract is explicit.
For each dataclass (ProviderSpec, VariantSpec) add a short one-line summary,
followed by Args describing each field (label, provider, model for ProviderSpec;
label, worktree for VariantSpec). For each top-level helper function, add a
Google-style docstring with a one-line summary, Args for parameters, Returns for
return values, and Raises if exceptions are expected. Ensure docstrings are
placed directly above the class/function definitions and use consistent
formatting across the file.
In `@src/deriver/prompts.py`:
- Around line 51-69: The combined prompt in minimal_deriver_prompt is
hand-wrapping <messages> instead of reusing the split helper; change
minimal_deriver_prompt to build its return by concatenating
minimal_deriver_system_prompt(peer_id) with minimal_deriver_user_prompt(...)
rather than manually injecting the <messages> wrapper so the template matches
estimate_minimal_deriver_prompt_tokens and avoids prompt drift—locate
minimal_deriver_prompt and replace the manual block with a call to
minimal_deriver_user_prompt using the same arguments (and keep
minimal_deriver_system_prompt(peer_id) as before).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cb68a9ad-af1d-48ea-9d70-716dde799eb9
📒 Files selected for processing (8)
- scripts/compare_prefix_cache.py
- src/deriver/deriver.py
- src/deriver/prompts.py
- src/dialectic/core.py
- src/utils/clients.py
- src/utils/summarizer.py
- tests/utils/test_clients.py
- tests/utils/test_summarizer.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/deriver/prompts.py (1)
20-35: ⚠️ Potential issue | 🟠 Major

Resolve explicit-rule contradiction in examples.
Line 20 says extraction must be direct/explicit, but Lines 33-35 include inference-heavy outputs (e.g., “lives in NYC”, “completed high school”). That can drive non-explicit derivations and reduce output fidelity.
Suggested prompt correction
```diff
 [EXPLICIT]
 DEFINITION: Facts about {peer_id} that can be derived directly from their messages.
+- Do not infer unstated facts from general/world knowledge.
 EXAMPLES:
 - EXPLICIT: "I just had my 25th birthday last Saturday" → "{peer_id} is 25 years old", "{peer_id}'s birthday is June 21st"
-- EXPLICIT: "I took my dog for a walk in NYC" → "{peer_id} has a dog", "{peer_id} lives in NYC"
-- EXPLICIT: "{peer_id} attended college" + general knowledge → "{peer_id} completed high school or equivalent"
+- EXPLICIT: "I took my dog for a walk in NYC" → "{peer_id} has a dog", "{peer_id} was in NYC"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/deriver/prompts.py` around lines 20 - 35, The EXPLICIT rule conflicts with the EXAMPLES: update the prompt so examples match the explicit-only extraction requirement by editing the EXPLICIT/RULES/EXAMPLES block—either (A) tighten the EXAMPLES to only include directly stated conclusions (e.g., change "{peer_id} lives in NYC" to "{peer_id} said they were in NYC" and "{peer_id} completed high school or equivalent" to "{peer_id} said they attended college"), or (B) if you intend to allow minimal inference, explicitly relax the EXPLICIT definition to permit only logically trivial inferences and document that in the EXPLICIT header; make this change where the "EXPLICIT:" heading and the EXAMPLES lines appear so the examples and rule text are consistent.
🧹 Nitpick comments (3)
tests/utils/test_clients.py (1)
94-102: Consider adding edge case tests for completeness. The test covers the happy path but doesn't test edge cases like `None` metadata or non-integer values that the implementation handles. While not blocking, additional test coverage would improve confidence.

🧪 Additional test cases to consider

```python
def test_extract_gemini_cache_tokens_none_metadata(self):
    """None metadata should return (0, 0)."""
    cache_creation, cache_read = extract_gemini_cache_tokens(None)
    assert cache_creation == 0
    assert cache_read == 0

def test_extract_gemini_cache_tokens_missing_attribute(self):
    """Missing cached_content_token_count should return 0 for cache_read."""
    usage_metadata = Mock(spec=[])  # No attributes
    cache_creation, cache_read = extract_gemini_cache_tokens(usage_metadata)
    assert cache_creation == 0
    assert cache_read == 0
```
Verify each finding against the current code and only fix it if needed. In `@tests/utils/test_clients.py` around lines 94 - 102, Add unit tests to cover edge cases for extract_gemini_cache_tokens: add a test that passes None (e.g., test_extract_gemini_cache_tokens_none_metadata) and assert it returns (0, 0), and add a test that passes a Mock with no cached_content_token_count attribute (e.g., test_extract_gemini_cache_tokens_missing_attribute) and assert cache_creation and cache_read are both 0; place these alongside the existing test in tests/utils/test_clients.py and reference the extract_gemini_cache_tokens function in the assertions.

scripts/compare_prefix_cache.py (1)
201-219: Consider validating that parsed components are non-empty. The current check verifies structural format but allows empty strings (e.g., `"=:"` or `"label=:"`). Adding a check for non-empty components would provide clearer feedback.

Suggested validation

```diff
 def parse_provider_spec(raw: str) -> ProviderSpec:
     if "=" not in raw or ":" not in raw:
         raise ValueError(
             f"Invalid provider spec {raw!r}. Expected label=provider:model"
         )
     label, provider_model = raw.split("=", 1)
     provider, model = provider_model.split(":", 1)
+    if not label or not provider or not model:
+        raise ValueError(
+            f"Invalid provider spec {raw!r}. Label, provider, and model must be non-empty"
+        )
     return ProviderSpec(label=label, provider=provider, model=model)
```
Verify each finding against the current code and only fix it if needed. In `@scripts/compare_prefix_cache.py` around lines 201 - 219, The parse_provider_spec function currently only checks separators but allows empty label/provider/model parts; update parse_provider_spec to validate that label, provider, and model are non-empty after splitting and raise a ValueError with a clear message if any are empty. Specifically, after the split that produces label, provider, and model, add checks like if not label: raise ValueError(...), and similarly for provider and model, ensuring the returned ProviderSpec(label=label, provider=provider, model=model) only occurs when all three components are non-empty.src/deriver/prompts.py (1)
73-82: Use explicit exception types for token-estimate fallback. Line 81 catches all exceptions, which can mask unexpected failures. Narrow this to expected tokenizer/input errors.

Suggested exception handling update

```diff
-    except Exception:
+    except (TypeError, ValueError):
         return 300
```

As per coding guidelines: "Use explicit error handling with appropriate exception types".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/deriver/prompts.py` around lines 73 - 82, The current blanket except in the token estimate fallback hides unexpected errors; change the broad except in the block that builds prompt using minimal_deriver_system_prompt, minimal_deriver_user_prompt and calls estimate_tokens to catch only expected tokenization/input errors (e.g., ValueError and the tokenizer-specific exception your estimate_tokens wrapper raises such as TokenizationError or TokenizerError), optionally log the caught exception, and still return the 300 fallback for those cases; do not swallow other exception types so unexpected issues surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/clients.py`:
- Around line 137-140: system messages with non-string content are being
dropped: when iterating messages (check role and msg.get("content")) the code
only appends when content is a string, so system messages containing a list
(e.g., outputs from _normalize_cacheable_system_content) are skipped and lost
for downstream Gemini calls. Fix by detecting list content for the "system" role
and merging it into system_instruction_parts (e.g., extend
system_instruction_parts with the list or join its items into a single string)
so system instructions from both strings and lists are preserved; update the
branch that currently checks isinstance(msg.get("content"), str) to also handle
list types and ensure the resulting system_instruction_parts is compatible with
the Gemini call path.
---
Outside diff comments:
In `@src/deriver/prompts.py`:
- Around line 20-35: The EXPLICIT rule conflicts with the EXAMPLES: update the
prompt so examples match the explicit-only extraction requirement by editing the
EXPLICIT/RULES/EXAMPLES block—either (A) tighten the EXAMPLES to only include
directly stated conclusions (e.g., change "{peer_id} lives in NYC" to "{peer_id}
said they were in NYC" and "{peer_id} completed high school or equivalent" to
"{peer_id} said they attended college"), or (B) if you intend to allow minimal
inference, explicitly relax the EXPLICIT definition to permit only logically
trivial inferences and document that in the EXPLICIT header; make this change
where the "EXPLICIT:" heading and the EXAMPLES lines appear so the examples and
rule text are consistent.
---
Nitpick comments:
In `@scripts/compare_prefix_cache.py`:
- Around line 201-219: The parse_provider_spec function currently only checks
separators but allows empty label/provider/model parts; update
parse_provider_spec to validate that label, provider, and model are non-empty
after splitting and raise a ValueError with a clear message if any are empty.
Specifically, after the split that produces label, provider, and model, add
checks like if not label: raise ValueError(...), and similarly for provider and
model, ensuring the returned ProviderSpec(label=label, provider=provider,
model=model) only occurs when all three components are non-empty.
In `@src/deriver/prompts.py`:
- Around line 73-82: The current blanket except in the token estimate fallback
hides unexpected errors; change the broad except in the block that builds prompt
using minimal_deriver_system_prompt, minimal_deriver_user_prompt and calls
estimate_tokens to catch only expected tokenization/input errors (e.g.,
ValueError and the tokenizer-specific exception your estimate_tokens wrapper
raises such as TokenizationError or TokenizerError), optionally log the caught
exception, and still return the 300 fallback for those cases; do not swallow
other exception types so unexpected issues surface.
In `@tests/utils/test_clients.py`:
- Around line 94-102: Add unit tests to cover edge cases for
extract_gemini_cache_tokens: add a test that passes None (e.g.,
test_extract_gemini_cache_tokens_none_metadata) and assert it returns (0, 0),
and add a test that passes a Mock with no cached_content_token_count attribute
(e.g., test_extract_gemini_cache_tokens_missing_attribute) and assert
cache_creation and cache_read are both 0; place these alongside the existing
test in tests/utils/test_clients.py and reference the
extract_gemini_cache_tokens function in the assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3baef6e7-24dd-42c4-9fc0-b8558f6cbcc3
📒 Files selected for processing (4)
- scripts/compare_prefix_cache.py
- src/deriver/prompts.py
- src/utils/clients.py
- tests/utils/test_clients.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/deriver/prompts.py (1)
14-37: Move `peer_id` into the user message if you want the system prefix to stay cache-friendly. The caller already sends separate system and user messages, but this system template still bakes `peer_id` into the cacheable block. That means the expensive prefix changes for every peer, so cache reuse is mostly limited to repeated calls for the same peer instead of amortizing across peers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/deriver/prompts.py` around lines 14 - 37, The system prompt function minimal_deriver_system_prompt currently interpolates peer_id making the system message non-cacheable; remove peer_id interpolation from minimal_deriver_system_prompt so it returns a generic, cache-friendly system template (no peer-specific values), and add a separate user-level prompt function or message (e.g., minimal_deriver_user_prompt(peer_id) or caller-side user message) that injects the peer_id into the user content; update the caller to send the generic system prompt plus the peer-specific user prompt (containing peer_id) so the expensive system prefix is reusable across peers.
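The split the comment proposes could look roughly like this. It is a sketch under stated assumptions: the instruction text and the helper signatures here are invented for illustration, not the repository's actual `minimal_deriver_system_prompt`/`minimal_deriver_user_prompt`:

```python
def minimal_deriver_system_prompt() -> str:
    # Generic, peer-free instructions: identical bytes on every call,
    # so the provider's prefix cache can amortize across peers.
    return (
        "You extract explicit facts about the speaker from their messages.\n"
        "Refer to the speaker by the id given in the user message."
    )

def minimal_deriver_user_prompt(peer_id: str, messages: str) -> str:
    # Peer-specific context lives in the (uncached) user message.
    return (
        f"The speaker's id is {peer_id}.\n\n"
        f"Messages to analyze:\n<messages>\n{messages}\n</messages>"
    )
```

Whether the deriver's extraction quality holds up with the peer id demoted to the user turn is exactly the open question raised in the thread, so this shape would need an eval run before adoption.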
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/clients.py`:
- Around line 161-167: The current Gemini conversion drops any non-"text" blocks
from msg["content"], causing loss of tool context like tool_use/tool_result when
retries switch providers; update the loop that builds text_parts/gemini_contents
so that instead of only appending {"text": ...} for text blocks it preserves
other block types by including them in the parts array (e.g., append the block
payload or a minimal mapping with block["type"] and its data) so gemini_contents
records both plain text parts and non-text parts (tool_use/tool_result) for the
given role without altering conversation_messages.
---
Nitpick comments:
In `@src/deriver/prompts.py`:
- Around line 14-37: The system prompt function minimal_deriver_system_prompt
currently interpolates peer_id making the system message non-cacheable; remove
peer_id interpolation from minimal_deriver_system_prompt so it returns a
generic, cache-friendly system template (no peer-specific values), and add a
separate user-level prompt function or message (e.g.,
minimal_deriver_user_prompt(peer_id) or caller-side user message) that injects
the peer_id into the user content; update the caller to send the generic system
prompt plus the peer-specific user prompt (containing peer_id) so the expensive
system prefix is reusable across peers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6bf7c450-3654-4ca8-8b13-b6533ebe5687
📒 Files selected for processing (4)
- scripts/compare_prefix_cache.py
- src/deriver/prompts.py
- src/utils/clients.py
- tests/utils/test_clients.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/clients.py`:
- Around line 182-196: The current code constructs function_response.name using
tool_use_id which is incorrect; update the logic that builds the parts list for
block_type == "tool_result" to look up the corresponding "tool_use" block in the
message history by matching its id (the value currently in
block.get("tool_use_id")), extract the declared function/tool name from that
tool_use block (e.g., a "tool" or "name" field inside the matching tool_use
entry) and set function_response.name to that actual function name; if no
matching tool_use or name field is found, fall back to the existing tool_use_id
string to avoid breaking, and keep the existing tool_result json/string
conversion and is_error mapping.
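The lookup the comment describes might be sketched as below, assuming an Anthropic-style block shape (`type`, `id`, `name`, `tool_use_id`) for the message history; the real block fields in `clients.py` may differ:

```python
def resolve_function_name(messages: list[dict], tool_use_id: str) -> str:
    # Find the tool_use block whose id matches the tool_result's
    # tool_use_id, and use its declared function name; fall back to the
    # raw id so nothing breaks when no match (or no name) exists.
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if (
                isinstance(block, dict)
                and block.get("type") == "tool_use"
                and block.get("id") == tool_use_id
            ):
                return block.get("name", tool_use_id)
    return tool_use_id
```

The fallback keeps the current (incorrect but non-crashing) behavior for unmatched ids, which matches the comment's "avoid breaking" guidance.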
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9b63882e-3564-42c7-a471-33d88f8695a7
📒 Files selected for processing (2)
- src/utils/clients.py
- tests/utils/test_clients.py
src/deriver/prompts.py
Outdated
```diff
 EXAMPLES:
-- EXPLICIT: "I just had my 25th birthday last Saturday" → "{peer_id} is 25 years old", "{peer_id}'s birthday is June 21st"
-- EXPLICIT: "I took my dog for a walk in NYC" → "{peer_id} has a dog", "{peer_id} lives in NYC"
-- EXPLICIT: "{peer_id} attended college" + general knowledge → "{peer_id} completed high school or equivalent"
+- EXPLICIT: "I just had my 25th birthday last Saturday" → "{peer_id} is 25 years old", "{peer_id} celebrated their birthday last Saturday"
+- EXPLICIT: "I took my dog for a walk in NYC" → "{peer_id} has a dog", "{peer_id} said they took their dog for a walk in NYC"
+- EXPLICIT: "{peer_id} attended college" → "{peer_id} said they attended college"
```
i would not make these example changes because the goal of the deriver is to extract meaningful explicit facts + not to repeat back the raw message
src/deriver/prompts.py
Outdated
```python
    Returns:
        Formatted prompt string for observation extraction.
    """
def minimal_deriver_system_prompt(peer_id: str) -> str:
```
keeping the peer_id in the system prompt results in the system prompt only being cached for deriver calls to the same peer. we could move the peer context to minimal_deriver_user_prompt but not sure if the deriver would perform worse in this case. what do you think @VVoruganti
```python
def _build_gemini_contents_from_messages(
    messages: list[dict[str, Any]],
```
should we use this in the streaming gemini path too?
generally looks good: I would revert the prompt examples, look into caching of the deriver prompt to see if we should extract peer_id, and handle the gemini streaming path
src/utils/clients.py
Outdated
in the dialectic, session_history is now its own messages block. that info would be lost here as we only take messages[0].content. this was a pre-existing issue that accidentally worked because we bundled session history into messages[0], but we should fix this now
# Conflicts:
#   src/schemas/api.py
Summary
Improves prompt prefix caching by separating stable system instructions from higher-variance request content.
This keeps the cacheable prefix stable across repeated calls and adds supporting metrics/tests for cache behavior.
What Changed
Why
Previously, stable instructions and changing context were more tightly coupled. That reduces prefix-cache reuse when only part of the prompt changes.
This PR isolates stable prompt sections so provider-side prompt caching can reuse more of the unchanged prefix.
Notes
Testing
Verified locally with:
`py_compile` on changed files

Summary by CodeRabbit
New Features
Chores
Tests