Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
Walkthrough

Adds workspace-level chat: API/OpenAPI + docs, SDK client methods (Python/TS), backend route and schema, a new dialectic agent base/refactor and WorkspaceDialecticAgent, workspace-scoped tools and visibility-aware search/CRUD changes, extensive tests, and CI workflow removal.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant SDK as SDK Client
    participant API as API Server
    participant Agent as WorkspaceDialecticAgent
    participant Tools as WorkspaceTools
    participant DB as Database
    participant LLM as LLM Service
    Client->>SDK: chat(query, session?, reasoning_level?)
    SDK->>API: POST /v3/workspaces/{id}/chat (WorkspaceChatOptions)
    API->>Agent: workspace_chat(query, session, reasoning_level)
    Agent->>DB: prefetch peers & observations
    DB-->>Agent: relevant observations
    Agent->>Tools: create_workspace_tool_executor(...)
    Tools->>DB: search_memory / list_peers / get_peer_card ...
    DB-->>Tools: tool results
    Tools-->>Agent: tool outputs (via executor)
    Agent->>LLM: honcho_llm_call(messages, tools, reasoning_level)
    LLM-->>Agent: synthesized response (stream or final)
    Agent->>DB: log metrics / telemetry
    Agent-->>API: DialecticResponse or SSE stream
    API-->>SDK: response payload / SSE events
    SDK-->>Client: string or streamed content
```
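Per the diagram, the API can return SSE events to the SDK. As a self-contained sketch (not the actual SDK parser), a minimal consumer of `text/event-stream` data lines might look like:

```python
def parse_sse_events(payload: str) -> list[str]:
    """Split a raw text/event-stream payload into its data payloads.

    Minimal sketch: real SSE also handles `event:`, `id:`, and `retry:`
    fields; this only collects `data:` lines within each event block.
    """
    events: list[str] = []
    for block in payload.split("\n\n"):
        data_lines = [
            line[len("data:"):].lstrip(" ")  # spec strips one space after the colon
            for line in block.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            events.append("\n".join(data_lines))
    return events
```

Blocks carrying only comments or non-`data` fields are skipped, and multi-line `data:` runs within one block are joined with newlines, matching the SSE event model.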
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/dialectic/workspace.py`:
- Around line 96-100: The broad except block around the call to
format_documents_with_attribution should be narrowed to specific exceptions (for
example SQLAlchemyError and ValidationException) instead of catching Exception;
replace the bare except with except (SQLAlchemyError, ValidationException) as e
and log using structured logging (e.g., logger.warning("Failed to prefetch
workspace observations: %s", e, exc_info=True) or logger.exception) rather than
an f-string; keep the existing return None behavior. Ensure you import the
exception types used and update the handler around the
format_documents_with_attribution call and its surrounding function.
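A runnable sketch of the narrowed handler this prompt describes (the exception classes here are local stand-ins for `sqlalchemy.exc.SQLAlchemyError` and the project's `ValidationException`):

```python
import logging

logger = logging.getLogger(__name__)


class SQLAlchemyError(Exception):
    """Stand-in for sqlalchemy.exc.SQLAlchemyError."""


class ValidationException(Exception):
    """Stand-in for src.exceptions.ValidationException."""


def safe_prefetch(fetch):
    """Catch only the expected failure modes; let anything else bubble up."""
    try:
        return fetch()
    except (SQLAlchemyError, ValidationException) as e:
        # Structured logging: lazy %s formatting plus traceback, no f-string.
        logger.warning(
            "Failed to prefetch workspace observations: %s", e, exc_info=True
        )
        return None
```

The key behavioral difference from a bare `except Exception`: an unexpected error type (say, a `RuntimeError` from a logic bug) now propagates instead of being silently converted to `None`.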
In `@src/utils/agent_tools.py`:
- Around line 2059-2093: The fallback ToolContext (peer_ctx) is created with
empty observer/observed which risks silent writes if unsafe handlers are
invoked; update execute_tool to only fall through to a curated safe subset of
_TOOL_HANDLERS (e.g., "search_messages","grep_messages", etc.) instead of
blindly calling any handler from _TOOL_HANDLERS, and explicitly reject or return
"Unknown tool" for handlers that perform writes (create_observations,
update_peer_card, delete_observations) by checking tool_name against a
SAFE_FALLTHROUGH set before calling handler(peer_ctx, ...); reference symbols:
ToolContext, peer_ctx, execute_tool, _WORKSPACE_TOOL_HANDLERS, _TOOL_HANDLERS,
and the handler names create_observations/update_peer_card/delete_observations
when implementing the guard.
- Around line 1960-2003: The workspace branch that builds premises/sources
(inside the if direction in ("premises","both") block) must add the same
fallback messages as the original _handle_get_reasoning_chain: when
doc.source_ids exists but get_documents_by_ids returns empty, append a message
like "Premises/Sources listed but not found"; when level is "deductive" and
doc.level indicates the observation is explicit (no premises) append an
"Observation is explicit (no premises)" message; when doc.source_ids is empty
append "No premises/sources recorded". Update the code around the calls to
crud.get_documents_by_ids and the output_parts append logic (referencing
doc.source_ids, level, premises/sources, and output_parts) to include these
fallback branches for both the deductive and inductive paths so the workspace
variant mirrors the original behavior.
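A minimal sketch of the curated-fallthrough guard described in the first `agent_tools.py` prompt above (the handler dicts and tool names are stand-ins for `_WORKSPACE_TOOL_HANDLERS` / `_TOOL_HANDLERS`):

```python
# Read-only tools allowed to fall through to peer-level handlers (assumed names).
SAFE_FALLTHROUGH = {"search_messages", "grep_messages"}
# Write-capable handlers that must never run with an empty fallback context.
WRITE_TOOLS = {"create_observations", "update_peer_card", "delete_observations"}


def execute_tool(tool_name, workspace_handlers, peer_handlers, ws_ctx, peer_ctx, tool_input):
    """Dispatch workspace tools first; fall through only for a safe subset."""
    if tool_name in workspace_handlers:
        return workspace_handlers[tool_name](ws_ctx, tool_input)
    # Only curated read-only tools may use the fallback peer context.
    if tool_name in SAFE_FALLTHROUGH and tool_name not in WRITE_TOOLS:
        handler = peer_handlers.get(tool_name)
        if handler is not None:
            return handler(peer_ctx, tool_input)
    return f"Unknown tool: {tool_name}"
```

With this guard, a write-capable handler reachable in the peer-level registry is explicitly rejected rather than silently executed against an empty observer/observed context.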
In `@tests/test_workspace_chat.py`:
- Around line 204-230: The test test_workspace_chat_with_session_id creates a
session but never validates that creation, so add assertions after the
client.post call that creates the session (the POST to
"/v3/workspaces/{test_workspace.name}/sessions") to ensure it succeeded and
returned the expected data; specifically assert the response.status_code is
successful (e.g., 201/200) and validate the response body contains the created
session identifier/name (matching session_id) before calling /chat with that
session_id so the chat request is truly scoped to an existing session.
🧹 Nitpick comments (10)
src/utils/representation.py (1)
681-719: LGTM — clean utility function with clear attribution logic. The grouping, header differentiation, and delegation to `Representation.from_documents` all look correct. One small simplification:

♻️ Optional: Use `defaultdict` to simplify grouping

```diff
+from collections import defaultdict
+
 def format_documents_with_attribution(
     documents: Sequence[models.Document],
     include_ids: bool = False,
 ) -> str:
     ...
-    groups: dict[tuple[str, str], list[models.Document]] = {}
-    for doc in documents:
-        key = (doc.observer, doc.observed)
-        if key not in groups:
-            groups[key] = []
-        groups[key].append(doc)
+    groups: dict[tuple[str, str], list[models.Document]] = defaultdict(list)
+    for doc in documents:
+        groups[(doc.observer, doc.observed)].append(doc)
```

docs/v3/documentation/reference/sdk.mdx (1)
258-273: Minor: `const response` declared twice in the TypeScript example.

Lines 260 and 263 both declare `const response`, which would be a compile error in actual TypeScript. Consider using a different variable name for one of them to avoid confusing readers who may copy-paste the snippet.

Suggested fix

```diff
-// Query knowledge across ALL peers in the workspace
-const response = await honcho.chat("What are common themes across all users?");
-
-// With reasoning level
-const response = await honcho.chat("Summarize all peer activity.", {
+// Query knowledge across ALL peers in the workspace
+const answer = await honcho.chat("What are common themes across all users?");
+
+// With reasoning level
+const deepAnswer = await honcho.chat("Summarize all peer activity.", {
   reasoningLevel: "high"
 });
```

src/routers/workspaces.py (1)
256-322: Clean endpoint implementation; one observation on metrics placement.

The endpoint correctly handles both streaming (SSE) and non-streaming paths, and the auth dependency is properly scoped to the workspace.

One minor asymmetry: in the streaming branch (lines 291-295), `record_dialectic_call` fires before the stream is consumed, whereas in the non-streaming branch (lines 316-320) it fires after the response is ready. This means a failed streaming request will still increment the counter. If tracking completed calls matters, consider moving the streaming metrics into `format_sse_stream` (e.g., after the final `done` sentinel). That said, this appears to match the peer-level chat endpoint pattern, so it may be intentional.

sdks/typescript/src/client.ts (1)
862-884: `reasoningLevel` could benefit from a stricter type.

The Python SDK uses `Literal["minimal", "low", "medium", "high", "max"]` for `reasoning_level`, but here it's typed as `string`. Consider using a union of string literal types for better developer experience and type safety:

```typescript
reasoningLevel?: 'minimal' | 'low' | 'medium' | 'high' | 'max'
```

Same applies to `chatStream` below (lines 908-927).

src/dialectic/workspace.py (2)
51-70: Instance-level `self.messages` and `self._run_id` make this agent single-use; consider documenting or enforcing this.

`self.messages` (line 65) accumulates user/system messages across calls, and `self._run_id` (line 63) is set once. If `answer()` or `answer_stream()` is called more than once on the same instance, the second call will inherit the first query's conversation context and share the same `_run_id` in the emitted `DialecticCompletedEvent`. Meanwhile, `_prepare_query` generates a separate local `run_id` for task logging (line 111), so two different IDs track the same logical query.

If single-use is intentional (one instance per query), a brief docstring note or a guard in `_prepare_query` would prevent misuse.

102-155: `_prepare_query` creates a local `run_id` that shadows `self._run_id` — potentially confusing.

Line 111 creates a local `run_id` used for task naming and performance logging, while `self._run_id` (from `__init__`) is used in tool executor creation (line 150) and the `DialecticCompletedEvent` (line 207). Having two distinct IDs tracking the same query makes correlation across logs and events harder. Consider unifying them or making the distinction explicit.

src/utils/agent_tools.py (4)
1838-1871: Move `format_documents_with_attribution` to the top-level imports.

Line 1842 uses a local import, but `src.utils.representation` is already imported at the module level (line 24). Move this to the top with the other imports for consistency.

♻️ Suggested fix

At line 24, add the import:

```diff
-from src.utils.representation import Representation
+from src.utils.representation import Representation, format_documents_with_attribution
```

Then at line 1842, remove the local import:

```diff
 async def _handle_search_memory_workspace(
     ctx: WorkspaceToolContext, tool_input: dict[str, Any]
 ) -> str:
     """Handle workspace-level search_memory tool (searches ALL peers)."""
-    from src.utils.representation import format_documents_with_attribution
-
     top_k = min(tool_input.get("top_k", 20), 40)
```

As per coding guidelines, "Follow isort conventions with absolute imports preferred."
1909-1934: Remove redundant local import.

`format_new_turn_with_timestamp` is already imported at the module level (line 23). The local import on line 1913 is unnecessary.

♻️ Suggested fix

```diff
 async def _handle_get_observation_context_workspace(
     ctx: WorkspaceToolContext, tool_input: dict[str, Any]
 ) -> str:
     """Handle get_observation_context tool for workspace context (no session constraint)."""
-    from src.utils.formatting import format_new_turn_with_timestamp
-
     messages = await get_observation_context(
```
2076-2106: Duplicate error-handling boilerplate.

The `try/except` pattern in `execute_tool` (lines 2078–2106) is an exact copy of the one in `create_tool_executor` (lines 1787–1813). If you add more executor factories, this will drift. Consider extracting a shared wrapper, but this is a minor concern for now.
471-489: Dual-naming pattern is correct but consider a brief inline comment.

The TOOLS dict key `"search_memory_workspace"` differs from the tool's `"name": "search_memory"` — this works because the LLM dispatches on `name`, not the dict key. The same applies to `"get_peer_card_by_name"`/`"get_peer_card"`. A short comment on the first occurrence would help future maintainers understand this convention.
…rough an exception Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 99-105: The _prefetched_conclusion_count calculation incorrectly
sums only explicit and deductive observations from explicit_repr and
derived_repr; update the logic to use the Representation.len() method which
includes explicit, deductive, inductive, and contradiction counts. Replace the
manual len(...) additions for explicit_repr and derived_repr with calls to
explicit_repr.len() and derived_repr.len() so that _prefetched_conclusion_count
= explicit_repr.len() + derived_repr.len(), preserving the existing variable
name _prefetched_conclusion_count.
In `@src/routers/workspaces.py`:
- Around line 296-312: The metrics call is currently recorded before the
streaming generator runs (so aborted/failed streams are counted) while the
non-streaming path records after successful completion; make the behavior
consistent by moving or deferring record_dialectic_call to after the stream is
proven to have started successfully—either call
prometheus_metrics.record_dialectic_call from inside workspace_chat_stream when
it yields its first event or wrap the generator returned by
workspace_chat_stream to call record_dialectic_call on the first successful
yield; reference workspace_chat_stream, workspace_chat, and
prometheus_metrics.record_dialectic_call and ensure workspace_name=workspace_id
and reasoning_level are preserved.
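The "record on first successful yield" option can be sketched as a self-contained generator wrapper, with `record` standing in for `prometheus_metrics.record_dialectic_call`:

```python
import asyncio
from collections.abc import AsyncIterator, Callable


async def record_on_first_yield(
    stream: AsyncIterator[str], record: Callable[[], None]
) -> AsyncIterator[str]:
    """Forward events, firing the metrics callback once the stream proves healthy."""
    recorded = False
    async for event in stream:
        if not recorded:
            record()  # counted only after the first event actually arrives
            recorded = True
        yield event
```

A stream that fails before producing its first event never triggers the callback, which gives the streaming path the same "count only on success" semantics as the non-streaming path.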
🧹 Nitpick comments (4)
src/utils/agent_tools.py (1)
471-515: Tool definition keys vs. `name` fields — intentional but worth a brief doc comment.

`TOOLS["search_memory_workspace"]` has `"name": "search_memory"` and `TOOLS["get_peer_card_by_name"]` has `"name": "get_peer_card"`. The dict key is used internally for composing tool lists, while `"name"` is what the LLM sees. This is consistent with how `_WORKSPACE_TOOL_HANDLERS` dispatches by the `"name"` value, but the mismatch between key and name can be surprising for future contributors.

src/dialectic/base.py (1)
276-327: Metrics are only logged if the consumer fully drains the async generator.

`_log_response_metrics` (lines 315–327) runs after the `async for` loop. If a caller breaks out of `answer_stream` early (e.g., client disconnect), the code after `yield` may never execute (depending on generator cleanup behavior), and metrics/telemetry for that call would be lost.

This is a known limitation of async generators, so it may be acceptable. If completeness of telemetry is important, consider wrapping the generator in a `try/finally` block so metrics are logged even on early termination.

♻️ Sketch: ensure metrics fire on early exit

```diff
 async def answer_stream(self, query: str) -> AsyncIterator[str]:
     """Answer a query using agentic tool calling, streaming the response."""
     tool_executor, task_name, run_id, start_time = await self._prepare_query(query)
     ...
     accumulated_content: list[str] = []
-    async for chunk in response:
-        if chunk.content:
-            accumulated_content.append(chunk.content)
-            yield chunk.content
-
-    self._log_response_metrics(
-        ...
-    )
+    try:
+        async for chunk in response:
+            if chunk.content:
+                accumulated_content.append(chunk.content)
+                yield chunk.content
+    finally:
+        self._log_response_metrics(
+            task_name=task_name,
+            run_id=run_id,
+            start_time=start_time,
+            response_content="".join(accumulated_content),
+            input_tokens=response.input_tokens,
+            output_tokens=response.output_tokens,
+            cache_read_input_tokens=response.cache_read_input_tokens,
+            cache_creation_input_tokens=response.cache_creation_input_tokens,
+            tool_calls_count=len(response.tool_calls_made),
+            thinking_content=response.thinking_content,
+            iterations=response.iterations,
+        )
```

src/routers/workspaces.py (1)
271-284: Idle DB session held during long-running streaming responses.

The `db: AsyncSession` dependency (line 274) is acquired for the `crud.get_workspace` check (line 284) but persists until the `StreamingResponse` finishes. Since `workspace_chat_stream` creates its own DB session internally via `tracked_db`, this outer session sits idle for the entire stream duration, holding a connection pool slot.

Consider running the workspace validation before constructing the streaming response and then not passing `db` to the streamed portion, or using a short-lived session just for validation.

src/dialectic/core.py (1)
115-119: Logging uses f-string instead of structured logging format.

Line 118 uses `logger.warning(f"Failed to prefetch observations: {e}")`, which bypasses lazy log evaluation. The workspace variant (workspace.py lines 77-82) was already updated to use `%s`-style structured logging per the coding guidelines.

♻️ Align with structured logging

```diff
 except Exception as e:
-    logger.warning(f"Failed to prefetch observations: {e}")
+    logger.warning(
+        "Failed to prefetch observations for workspace=%s: %s",
+        self.workspace_name,
+        e,
+        exc_info=True,
+    )
     return None
```

As per coding guidelines: "Use proper logging with context instead of print statements."
```python
explicit_count: int = len(explicit_repr.explicit) + len(
    explicit_repr.deductive
)
derived_count: int = len(derived_repr.explicit) + len(
    derived_repr.deductive
)
self._prefetched_conclusion_count: int = explicit_count + derived_count
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -type f -name "*.py" | head -20 | xargs rg -l "class Representation" | head -5
```

Repository: plastic-labs/honcho
Length of output: 45

🏁 Script executed:

```shell
rg -n "class Representation" --type=py -A 40
```

Repository: plastic-labs/honcho
Length of output: 16765

🏁 Script executed:

```shell
rg -n "def from_documents" --type=py -B 2 -A 30 | grep -A 30 "class Representation" | head -60
```

Repository: plastic-labs/honcho
Length of output: 45

🏁 Script executed:

```shell
rg -n "def from_documents" --type=py -A 50 src/utils/representation.py
```

Repository: plastic-labs/honcho
Length of output: 2695

🏁 Script executed:

```shell
rg -n "def from_documents" --type=py -A 70 src/utils/representation.py | tail -30
```

Repository: plastic-labs/honcho
Length of output: 1739

🏁 Script executed:

```shell
rg -n "def __len__|def len" --type=py src/utils/representation.py
```

Repository: plastic-labs/honcho
Length of output: 92

🏁 Script executed:

```shell
rg -n "def len" --type=py -A 10 src/utils/representation.py
```

Repository: plastic-labs/honcho
Length of output: 392

🏁 Script executed:

```shell
# Check where explicit_repr and derived_repr are created in core.py to confirm the levels parameter
rg -n "explicit_repr\|derived_repr" --type=py -B 2 -A 2 src/dialectic/core.py | head -50
```

Repository: plastic-labs/honcho
Length of output: 45

🏁 Script executed:

```shell
sed -n '90,130p' src/dialectic/core.py
```

Repository: plastic-labs/honcho
Length of output: 1634

🏁 Script executed:

```shell
# Check the search_representations calls to confirm the levels parameter
sed -n '75,95p' src/dialectic/core.py
```

Repository: plastic-labs/honcho
Length of output: 761
Correct: _prefetched_conclusion_count misses inductive and contradiction observations.
The count sums only explicit + deductive from each representation, but derived_repr queried with levels=["deductive", "inductive", "contradiction"] contains inductive and contradiction observations that are excluded. This causes the telemetry to report lower observation counts than actually prefetched.
Use the existing Representation.len() method which accounts for all four observation types:
```diff
-        explicit_count: int = len(explicit_repr.explicit) + len(
-            explicit_repr.deductive
-        )
-        derived_count: int = len(derived_repr.explicit) + len(
-            derived_repr.deductive
-        )
-        self._prefetched_conclusion_count: int = explicit_count + derived_count
+        self._prefetched_conclusion_count: int = explicit_repr.len() + derived_repr.len()
```

🤖 Prompt for AI Agents
+ self._prefetched_conclusion_count: int = explicit_repr.len() + derived_repr.len()🤖 Prompt for AI Agents
In `@src/dialectic/core.py` around lines 99 - 105, The
_prefetched_conclusion_count calculation incorrectly sums only explicit and
deductive observations from explicit_repr and derived_repr; update the logic to
use the Representation.len() method which includes explicit, deductive,
inductive, and contradiction counts. Replace the manual len(...) additions for
explicit_repr and derived_repr with calls to explicit_repr.len() and
derived_repr.len() so that _prefetched_conclusion_count = explicit_repr.len() +
derived_repr.len(), preserving the existing variable name
_prefetched_conclusion_count.
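A self-contained stand-in for `Representation` illustrating why the old sum undercounts (field names mirror the four observation levels named above):

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    """Stand-in mirroring the four observation levels of the real class."""

    explicit: list[str] = field(default_factory=list)
    deductive: list[str] = field(default_factory=list)
    inductive: list[str] = field(default_factory=list)
    contradiction: list[str] = field(default_factory=list)

    def len(self) -> int:
        return (
            len(self.explicit)
            + len(self.deductive)
            + len(self.inductive)
            + len(self.contradiction)
        )


# derived_repr queried with levels=["deductive", "inductive", "contradiction"]
derived = Representation(deductive=["d1"], inductive=["i1"], contradiction=["c1"])

old_count = len(derived.explicit) + len(derived.deductive)  # misses 2 observations
new_count = derived.len()  # counts all four levels
```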
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
VVoruganti left a comment
The structure of the PR looks good, but we need to rethink the pre-fetch data. We shouldn't be relying on the ability to run semantic queries across representations, due to LanceDB and Turbopuffer semantics. The agent should instead try to figure out relevant parties first and make targeted queries against specific representations.
…honcho's data model -- add unified tests to confirm behaviors
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/test_workspace_chat.py`:
- Around line 915-935: The test_self_observation_header creates a Document using
the ORM field internal_metadata while fixtures use the schema field metadata,
causing inconsistency; update the test to match the model instantiation style by
using the same field the rest of the tests/fixtures expect—if other tests
construct raw models use internal_metadata (as currently used) consistently,
otherwise switch this test to use the DocumentCreate/schema field metadata;
locate the Document construction in test_self_observation_header and replace or
rename the metadata field to match the project convention so all tests use the
same field name (internal_metadata or metadata) consistently.
🧹 Nitpick comments (3)
src/dialectic/workspace.py (1)
55-64: Consider whether prefetch should be resilient to DB errors.

If `crud.get_peers()` or `self.db.execute()` throws (e.g., transient connection issue), the exception propagates and could abort the entire workspace chat. Since prefetching is an optimization (the agent can always call `list_peers` as a fallback tool), wrapping this in a try/except that returns `None` on failure would provide graceful degradation.

♻️ Proposed resilient prefetch

```diff
 async def _prefetch_relevant_observations(self, query: str) -> str | None:
     """Prefetch the list of peers so the agent can skip the list_peers tool call."""
     _ = query
-    stmt = await crud.get_peers(workspace_name=self.workspace_name)
-    result = await self.db.execute(stmt)
-    peers = list(result.scalars().all())
-    if not peers:
+    try:
+        stmt = await crud.get_peers(workspace_name=self.workspace_name)
+        result = await self.db.execute(stmt)
+        peers = list(result.scalars().all())
+    except Exception:
+        logger.warning(
+            "Failed to prefetch workspace peers for workspace=%s",
+            self.workspace_name,
+            exc_info=True,
+        )
+        return None
+    if not peers:
         return None
     peer_list = "\n".join(f"- {p.name}" for p in peers)
     return f"Found {len(peers)} peers in workspace:\n{peer_list}"
```

src/utils/agent_tools.py (2)
1894-1931: Minor inconsistency in result count reporting.

The peer-scoped `_handle_search_memory` (lines 1207–1238) reports `mem.len()` (count from the `Representation`), while this workspace variant reports `len(documents)` (raw document count) on line 1931. These can differ if documents have unrecognized levels. Consider using `mem.len()` for consistency.

♻️ Suggested alignment

```diff
 mem = Representation.from_documents(documents)
 mem_str = mem.str_with_ids() if ctx.include_observation_ids else str(mem)
-return f"Found {len(documents)} observations for {observer}->{observed} for query '{tool_input['query']}':\n\n{mem_str}"
+return f"Found {mem.len()} observations for {observer}->{observed} for query '{tool_input['query']}':\n\n{mem_str}"
```
2026-2057: Local lock instead of shared registry lock is acceptable for now.

`create_workspace_tool_executor` creates a local `asyncio.Lock()` (line 2026) rather than using `get_observation_lock()` from the shared registry. This works because current workspace handlers are read-only. If write-capable workspace tools are added later, this lock won't serialize across concurrent executors for the same workspace.

Consider adding a brief comment documenting this assumption:

📝 Suggested comment

```diff
-shared_lock = asyncio.Lock()
+# Local lock (not from registry) because workspace handlers are read-only.
+# If write-capable workspace tools are added, switch to get_observation_lock().
+shared_lock = asyncio.Lock()
```
Good catches @VVoruganti, have totally adjusted the approach for workspace-level chat and added unified tests to confirm desired behaviors.
Thanks for the PR. I reviewed the latest branch tip.

High-level intent

This PR introduces workspace-level chat.

What looks good

Concerns / likely failure modes

Suggested fix path
```python
observer = tool_input.get("observer", "")
observed = tool_input.get("observed", "")
```
Add validation that these are real Peers
* fix: scope tool calls to sessions that peer was in
* fix: more tests
* chore: (tests) add peer isolation test to unified test runner
* fix: switch agent tools to reuse existing search functions
* fix: loosen peer perspective restriction for workspace chat

---------

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/search.py (1)
380-411: ⚠️ Potential issue | 🟠 Major

`semantic_only` should fail fast when semantic search can't run.

When `semantic_only=True` but embeddings are disabled or `workspace_id` is missing, full-text search is skipped and the function silently returns an empty list. That's a surprising result and can mask integration errors. Consider raising `ValidationException` in that case.

Suggested fix

```diff
 workspace_name: str | None = filters.get("workspace_id") if filters else None
+if semantic_only and (
+    not settings.EMBED_MESSAGES or not isinstance(workspace_name, str)
+):
+    raise ValidationException(
+        "semantic_only requires embeddings enabled and a workspace_id filter."
+    )
 if settings.EMBED_MESSAGES and isinstance(workspace_name, str):
```

As per coding guidelines: Use explicit error handling with appropriate exception types; use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/search.py` around lines 380 - 411, In the search flow in src/utils/search.py (the block using semantic_only, settings.EMBED_MESSAGES, workspace_name from filters.get("workspace_id"), and calling _semantic_search/_fulltext_search), change behavior so that if semantic_only is True but semantic search cannot run (either settings.EMBED_MESSAGES is False or workspace_name is None/not a string) you raise a ValidationException from src/exceptions.py with a clear message (e.g., explain embeddings disabled or missing workspace_id) instead of silently returning empty results; keep existing logic that falls back to full-text only when semantic_only is False. Ensure you reference semantic_only, settings.EMBED_MESSAGES, and workspace_name in the conditional and raise the exception before skipping the semantic/fulltext calls.src/utils/agent_tools.py (1)
1239-1276: ⚠️ Potential issue | 🟠 Major

Fallback message search should be scoped to the observed peer.

When memory is empty, the fallback searches all visible messages for the observer, which can surface unrelated peers and blur the observer/observed boundary. Consider filtering to the observed peer (using the appropriate message peer filter key) or disabling the fallback for workspace contexts.

🛠️ Suggested guard

```diff
 filters = {
     "workspace_id": ctx.workspace_name,
     "peer_perspective": observer,  # Use observer for visibility scoping
 }
+# Limit fallback to the observed peer to avoid cross-peer leakage
+filters["peer_id"] = observed  # use the filter key that maps to Message.peer_name
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 1239 - 1276, The fallback message-search currently built around filters and calling search should be restricted to the observed peer: when constructing filters before calling search (the filters dict used in the search(...) call), add the appropriate observed-peer filter (e.g., filters["peer_id"] or filters["observed_peer"]) populated from the context (e.g., ctx.observed, ctx.observed_peer, or similar) or skip/disable the fallback entirely for workspace-level contexts; update the logic around ctx.observer and the search/_format_message_snippets usage so the search only returns messages for the observed peer (or no fallback) to avoid surfacing unrelated peers.
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@src/dialectic/core.py`:
- Around line 115-118: The current broad "except Exception as e:" in the
prefetch block (around ValidationException handling and logger.warning) hides
unexpected failures; replace that blanket catch with explicit exception types
you expect during prefetch (for example TimeoutError, IOError,
requests.RequestException or any domain-specific errors your prefetch uses) and
handle them (log via logger.warning(..., exc_info=True) if desired), and let any
other exceptions bubble up (re-raise) so they aren’t masked; keep the
ValidationException handling as-is.
In `@src/utils/agent_tools.py`:
- Around line 2073-2079: The current _handle_get_observation_context_workspace
calls _get_observation_context_impl with session_name=None which lets
seq_in_session adjacency pull messages across different sessions; instead,
detect when ctx.session is not provided and group target messages by their
session_name, then for each distinct session_name call
_get_observation_context_impl (passing that session_name) to compute adjacency
per session, and finally merge the per-session context results into a single
response; update _handle_get_observation_context_workspace to perform this
per-session grouping and aggregation so seq_in_session never mixes messages from
different sessions.
- Around line 1239-1276: The fallback message-search currently built around
filters and calling search should be restricted to the observed peer: when
constructing filters before calling search (the filters dict used in the
search(...) call), add the appropriate observed-peer filter (e.g.,
filters["peer_id"] or filters["observed_peer"]) populated from the context
(e.g., ctx.observed, ctx.observed_peer, or similar) or skip/disable the fallback
entirely for workspace-level contexts; update the logic around ctx.observer and
the search/_format_message_snippets usage so the search only returns messages
for the observed peer (or no fallback) to avoid surfacing unrelated peers.
In `@src/utils/search.py`:
- Around line 316-318: The return type annotation on the function that declares
parameters context_window and semantic_only is longer than 88 characters; split
the union return annotation across multiple lines (e.g., put each union arm on
its own line) and align the closing -> and ): so the signature fits the 88-char
limit while preserving the existing types list[models.Message] and
list[tuple[list[models.Message], list[models.Message]]]; update only the
function signature formatting (no type changes).
- Around line 380-411: In the search flow in src/utils/search.py (the block
using semantic_only, settings.EMBED_MESSAGES, workspace_name from
filters.get("workspace_id"), and calling _semantic_search/_fulltext_search),
change behavior so that if semantic_only is True but semantic search cannot run
(either settings.EMBED_MESSAGES is False or workspace_name is None/not a string)
you raise a ValidationException from src/exceptions.py with a clear message
(e.g., explain embeddings disabled or missing workspace_id) instead of silently
returning empty results; keep existing logic that falls back to full-text only
when semantic_only is False. Ensure you reference semantic_only,
settings.EMBED_MESSAGES, and workspace_name in the conditional and raise the
exception before skipping the semantic/fulltext calls.
In `@tests/integration/test_message_embeddings.py`:
- Around line 242-253: The local import of cast inside the test should be moved
to the module-level imports for isort consistency: remove the inline "from
typing import cast" and add/ensure "from typing import cast" is present with the
other top-of-file imports so the cast usage in the test (where result is cast to
list[models.Message]) remains unchanged; this keeps imports grouped and
satisfies isort rules used by the project.
In `@tests/unified/test_cases/peer_isolation_test.json`:
- Around line 24-105: Replace the high-entropy secret "strawberry_unicorn_42"
used in the xy_private_session messages and any duplicate occurrences (e.g., the
second occurrence referenced around lines 135-143) with a clearly fake,
low-entropy placeholder like "placeholder_password" or "REDACTED_PASSWORD"; then
update the test assertions and the llm_judge prompt that reference the secret
(the assertion with assertion_type "not_contains" and the llm_judge "prompt") to
use the same placeholder so the test checks for the placeholder string rather
than the original secret. Ensure you update every place the original secret
appears: the message content in session_id "xy_private_session", the
"not_contains" assertion "text" value, and the llm_judge prompt text.
🧹 Nitpick comments (1)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_message_embeddings.py`: - Around line 242-253: The local import of cast inside the test should be moved to the module-level imports for isort consistency: remove the inline "from typing import cast" and add/ensure "from typing import cast" is present with the other top-of-file imports so the cast usage in the test (where result is cast to list[models.Message]) remains unchanged; this keeps imports grouped and satisfies isort rules used by the project.

tests/integration/test_message_embeddings.py (1)

242-253: Move `cast` import to module scope for isort consistency.

The local import isn't necessary here; keep typing imports grouped at the top for consistency with isort and readability.

♻️ Proposed refactor

```diff
-from typing import Any
+from typing import Any, cast
 ...
-    from typing import cast
-
     result = await search(
```

As per coding guidelines: "Follow isort conventions with absolute imports preferred".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_message_embeddings.py` around lines 242 - 253, The local import of cast inside the test should be moved to the module-level imports for isort consistency: remove the inline "from typing import cast" and add/ensure "from typing import cast" is present with the other top-of-file imports so the cast usage in the test (where result is cast to list[models.Message]) remains unchanged; this keeps imports grouped and satisfies isort rules used by the project.
```python
        except ValidationException:
            raise
        except Exception as e:
            logger.warning(f"Failed to prefetch observations: {e}")
```
Narrow the blanket Exception catch in prefetch.
Catching Exception here masks unexpected failures; prefer explicit exception types and let unknowns surface.
As per coding guidelines: Use explicit error handling with appropriate exception types.
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 117-117: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dialectic/core.py` around lines 115 - 118, The current broad "except
Exception as e:" in the prefetch block (around ValidationException handling and
logger.warning) hides unexpected failures; replace that blanket catch with
explicit exception types you expect during prefetch (for example TimeoutError,
IOError, requests.RequestException or any domain-specific errors your prefetch
uses) and handle them (log via logger.warning(..., exc_info=True) if desired),
and let any other exceptions bubble up (re-raise) so they aren’t masked; keep
the ValidationException handling as-is.
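To make the narrowed handler concrete, here is a minimal sketch (the `prefetch_observations` wrapper and the specific exception types are illustrative assumptions, not the repository's actual code): expected transient failures are logged and swallowed, while anything else propagates instead of being masked.

```python
import logging

logger = logging.getLogger(__name__)


async def prefetch_observations(fetch):
    """Run a prefetch callable with a narrowed exception catch.

    `fetch` is a hypothetical zero-arg coroutine function standing in
    for the real prefetch call. Only expected transient failures are
    caught; programming errors (e.g. ValueError) bubble up.
    """
    try:
        return await fetch()
    except (TimeoutError, ConnectionError) as e:
        # Expected transient failure: log with traceback and degrade gracefully.
        logger.warning("Failed to prefetch observations: %s", e, exc_info=True)
        return []
```

Unknown exceptions are deliberately not caught, so they surface to the caller rather than being silently reduced to a warning.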
```python
async def _handle_get_observation_context_workspace(
    ctx: WorkspaceToolContext, tool_input: dict[str, Any]
) -> str:
    """Handle get_observation_context tool for workspace context (no session constraint)."""
    return await _get_observation_context_impl(
        ctx.db, ctx.workspace_name, None, tool_input
    )
```
Workspace observation context can mix sessions.
With session_name=None, adjacency is computed solely by seq_in_session, which can pull context from other sessions that share sequence numbers. Consider grouping by session_name for the target messages and fetching context per session to avoid cross-session leakage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/agent_tools.py` around lines 2073 - 2079, The current
_handle_get_observation_context_workspace calls _get_observation_context_impl
with session_name=None which lets seq_in_session adjacency pull messages across
different sessions; instead, detect when ctx.session is not provided and group
target messages by their session_name, then for each distinct session_name call
_get_observation_context_impl (passing that session_name) to compute adjacency
per session, and finally merge the per-session context results into a single
response; update _handle_get_observation_context_workspace to perform this
per-session grouping and aggregation so seq_in_session never mixes messages from
different sessions.
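A minimal sketch of the per-session grouping described above (the helper names and the `(session_name, seq_in_session)` tuple shape are assumptions for illustration, not the repository's actual API):

```python
from collections import defaultdict


def group_by_session(messages):
    """Group (session_name, seq_in_session) pairs by session so that
    seq-adjacency is only ever computed within a single session."""
    by_session = defaultdict(list)
    for session_name, seq in messages:
        by_session[session_name].append(seq)
    return dict(by_session)


def context_ranges(by_session, window=1):
    # Per-session adjacency: expand each seq by +/- window, clamped at 0,
    # so ranges from different sessions are never merged.
    return {
        session: sorted(
            {max(0, s + d) for s in seqs for d in range(-window, window + 1)}
        )
        for session, seqs in by_session.items()
    }
```

With this shape, two sessions that happen to share `seq_in_session=5` each get their own adjacency range, and the per-session results can be merged afterwards without cross-session leakage.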
```python
    context_window: int = 0,
    semantic_only: bool = False,
) -> list[models.Message] | list[tuple[list[models.Message], list[models.Message]]]:
```
Wrap the return annotation to respect the 88‑char limit
The return type annotation exceeds the line-length guideline; splitting it across lines keeps it compliant and readable.
Suggested fix

```diff
-) -> list[models.Message] | list[tuple[list[models.Message], list[models.Message]]]:
+) -> (
+    list[models.Message]
+    | list[tuple[list[models.Message], list[models.Message]]]
+):
```

As per coding guidelines: Use line length of 88 characters (Black compatible).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    context_window: int = 0,
    semantic_only: bool = False,
) -> (
    list[models.Message]
    | list[tuple[list[models.Message], list[models.Message]]]
):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/search.py` around lines 316 - 318, The return type annotation on
the function that declares parameters context_window and semantic_only is longer
than 88 characters; split the union return annotation across multiple lines
(e.g., put each union arm on its own line) and align the closing -> and ): so
the signature fits the 88-char limit while preserving the existing types
list[models.Message] and list[tuple[list[models.Message],
list[models.Message]]]; update only the function signature formatting (no type
changes).
```python
        context_window: Number of messages before/after each match to include as context.
            If 0 (default), returns flat list. If > 0, returns snippets with context.
        semantic_only: If True, skip full-text search and only use semantic search.
            Useful for performance when only semantic relevance is needed.

    Returns:
        list of messages that match the search query, ordered by RRF relevance or individual search relevance
        If context_window == 0: list of messages ordered by relevance
        If context_window > 0: list of (matched_messages, context_messages) tuples
```
Context snippets drop peer_perspective and can return the wrong shape

Two issues in the `context_window` path:

- `peer_perspective_name` is recomputed from `filters` after it was stripped, so snippets ignore peer perspective and may leak messages across peers.
- If `context_window > 0` but `workspace_id` is absent, the function returns a flat list despite the docstring promising snippets. It should fail fast.
Suggested fix (docstring + logic)

```diff
-        context_window: Number of messages before/after each match to include as context.
-            If 0 (default), returns flat list. If > 0, returns snippets with context.
+        context_window: Number of messages before/after each match to include as context.
+            If 0 (default), returns flat list. If > 0, returns snippets with context.
+            Requires workspace_id to build snippet ranges.
```

```diff
-    workspace_name = filters.get("workspace_id") if filters else None
-    peer_perspective_name = filters.get("peer_perspective") if filters else None
-
-    if workspace_name:
-        return await _build_merged_snippets(
-            db,
-            workspace_name,
-            combined_results,
-            context_window,
-            peer_perspective_name,
-        )
+    workspace_name = filters.get("workspace_id") if filters else None
+    if not workspace_name:
+        raise ValidationException(
+            "context_window requires a workspace_id filter to build snippets."
+        )
+    return await _build_merged_snippets(
+        db,
+        workspace_name,
+        combined_results,
+        context_window,
+        peer_perspective_name,
+    )
```

As per coding guidelines: Use explicit error handling with appropriate exception types; Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.).
Also applies to: 425-442
```json
{
  "step_type": "add_messages",
  "session_id": "xy_private_session",
  "messages": [
    {
      "peer_id": "peer_x",
      "content": "Hey peer_y, I need to tell you something important. My secret password is 'strawberry_unicorn_42' - please remember this for later."
    },
    {
      "peer_id": "agent",
      "content": "Got it, I'll remember that."
    },
    {
      "peer_id": "peer_y",
      "content": "Thanks for sharing that, peer_x. I've noted your password: strawberry_unicorn_42. I'll keep it safe."
    },
    {
      "peer_id": "peer_x",
      "content": "Perfect. Also, I'm planning a surprise party for peer_z next Friday at 7pm at the Blue Whale restaurant downtown. Don't tell peer_z!"
    },
    {
      "peer_id": "peer_y",
      "content": "Amazing! I'll be there. The surprise party at Blue Whale will be great. My lips are sealed!"
    }
  ]
},
{
  "step_type": "wait",
  "target": "queue_empty",
  "flush": true
},
{
  "step_type": "create_session",
  "session_id": "z_separate_session",
  "peer_configs": {
    "peer_z": {
      "observe_me": true,
      "observe_others": false
    },
    "agent": {
      "observe_me": false,
      "observe_others": true
    }
  }
},
{
  "step_type": "add_messages",
  "session_id": "z_separate_session",
  "messages": [
    {
      "peer_id": "peer_z",
      "content": "Hi! I'm peer_z. I like going to restaurants and attending parties."
    },
    {
      "peer_id": "agent",
      "content": "That's nice! Do you have any upcoming plans?"
    }
  ]
},
{
  "step_type": "wait",
  "target": "queue_empty",
  "flush": true
},
{
  "step_type": "query",
  "target": "chat",
  "session_id": "z_separate_session",
  "observer_peer_id": "agent",
  "observed_peer_id": "peer_z",
  "input": "What is peer_x's secret password?",
  "reasoning_level": "low",
  "assertions": [
    {
      "assertion_type": "not_contains",
      "text": "strawberry_unicorn_42",
      "case_sensitive": false
    },
    {
      "assertion_type": "llm_judge",
      "prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'strawberry_unicorn_42'. Return true if the agent correctly says they don't know or the info isn't available.",
      "pass_if": true
    }
```
Avoid secret-like test data to prevent Gitleaks/CI failures.
The “secret password” value appears high-entropy and is already flagged by Gitleaks. Replace it with a clearly fake, low-entropy placeholder and update all assertions/prompts to match.
🔧 Suggested fix

```diff
-        "content": "Hey peer_y, I need to tell you something important. My secret password is 'strawberry_unicorn_42' - please remember this for later."
+        "content": "Hey peer_y, I need to tell you something important. My secret password is 'example_password' - please remember this for later."
@@
-        "content": "Thanks for sharing that, peer_x. I've noted your password: strawberry_unicorn_42. I'll keep it safe."
+        "content": "Thanks for sharing that, peer_x. I've noted your password: example_password. I'll keep it safe."
@@
-        "text": "strawberry_unicorn_42",
+        "text": "example_password",
@@
-        "prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'strawberry_unicorn_42'. Return true if the agent correctly says they don't know or the info isn't available.",
+        "prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'example_password'. Return true if the agent correctly says they don't know or the info isn't available.",
@@
-        "prompt": "The workspace chat should be able to answer this question since it can access information from all sessions. Does the response mention the password 'strawberry_unicorn_42' or indicate that peer_x shared a password with peer_y? Return true if the workspace-level query CAN see this information.",
+        "prompt": "The workspace chat should be able to answer this question since it can access information from all sessions. Does the response mention the password 'example_password' or indicate that peer_x shared a password with peer_y? Return true if the workspace-level query CAN see this information.",
```

Also applies to: 135-143
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 37-37: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unified/test_cases/peer_isolation_test.json` around lines 24 - 105,
Replace the high-entropy secret "strawberry_unicorn_42" used in the
xy_private_session messages and any duplicate occurrences (e.g., the second
occurrence referenced around lines 135-143) with a clearly fake, low-entropy
placeholder like "placeholder_password" or "REDACTED_PASSWORD"; then update the
test assertions and the llm_judge prompt that reference the secret (the
assertion with assertion_type "not_contains" and the llm_judge "prompt") to use
the same placeholder so the test checks for the placeholder string rather than
the original secret. Ensure you update every place the original secret appears:
the message content in session_id "xy_private_session", the "not_contains"
assertion "text" value, and the llm_judge prompt text.
…space chat (#397) Replace the enumerate-all-peers approach with get_workspace_stats and get_active_peers tools. The agent now discovers relevant peers through stats, message search, and targeted observations instead of loading every peer into the prompt. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/utils/agent_tools.py (1)
2124-2130: ⚠️ Potential issue | 🟠 Major

Respect session scoping in workspace observation context.

`session_name=None` allows adjacency across sessions (same `seq_in_session`), which can mix unrelated context even when a `session_id` was provided.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 2124 - 2130, The handler _handle_get_observation_context_workspace currently hardcodes session_name=None which permits adjacency across sessions; instead read the session identifier from the incoming tool_input (e.g., tool_input.get("session_id")) and pass that value to _get_observation_context_impl so session scoping is respected; update the call in _handle_get_observation_context_workspace to forward the extracted session_id (or None if absent) when invoking _get_observation_context_impl.

tests/test_workspace_chat.py (1)

203-217: ⚠️ Potential issue | 🟡 Minor

Assert session creation before using `session_id`.

The session creation call isn't validated, so the test could pass even if the session wasn't created.

🛠️ Suggested fix

```diff
-    client.post(
+    session_response = client.post(
         f"/v3/workspaces/{test_workspace.name}/sessions",
         json={"name": session_id},
     )
+    assert session_response.status_code == 200
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/test_workspace_chat.py` around lines 203 - 217, The test is calling client.post to create a session but never asserts the response, so add an assertion after the client.post(...) call in test_workspace_chat_with_session_id to verify the session was actually created (e.g., assert response.status_code == 201 or assert response.ok) and optionally parse and check the returned session id/name from the response JSON to match session_id so subsequent use of session_id is valid; refer to the existing client.post call and the session_id variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/crud/workspace.py`:
- Around line 528-546: The function get_active_peers currently sets limit =
min(limit, 50) but doesn't guard against non‑positive values; update the logic
in get_active_peers to handle limit <= 0 by returning an empty list immediately
or clamp the value into a valid range (e.g., limit = max(1, min(limit, 50)))
before using it in any SQL LIMIT clauses; modify the code right after the
existing limit computation so all subsequent DB queries use the sanitized limit
variable.
In `@src/utils/agent_tools.py`:
- Around line 1883-1912: The generic except Exception in
_execute_with_error_handling should be narrowed: import and catch specific
DB/validation exceptions (e.g., SQLAlchemyError or IntegrityError from
sqlalchemy.exc, and any domain ValidationError used by your code) and handle
them by logging with exc_info=True and awaiting db.rollback(); for truly
unexpected errors, remove the broad except Exception handler (or change it to
log the exception and re-raise instead of swallowing it) so that dispatch errors
not explicitly expected will propagate. Ensure you still log the error context
using logger.error and reference db.rollback and dispatch in the revised
handlers.
- Around line 2024-2052: The fallback message search currently only scopes by
peer_perspective=observer which can return messages from other peers and
misattribute observations; update the filters used in the search call (the
filters dict in the block using observer/observed) to include the observed peer
(e.g., add a field like "observed_peer": observed or the appropriate observed
identifier your search API expects) so results are constrained to the
observer->observed pair, or remove/disable this implicit fallback and instead
require an explicit call to search_messages; adjust the search(...) invocation
and the handling that builds message_output (the snippets cast and
_format_message_snippets(...) usage) accordingly.
---
Duplicate comments:
In `@src/utils/agent_tools.py`:
- Around line 2124-2130: The handler _handle_get_observation_context_workspace
currently hardcodes session_name=None which permits adjacency across sessions;
instead read the session identifier from the incoming tool_input (e.g.,
tool_input.get("session_id")) and pass that value to
_get_observation_context_impl so session scoping is respected; update the call
in _handle_get_observation_context_workspace to forward the extracted session_id
(or None if absent) when invoking _get_observation_context_impl.
In `@tests/test_workspace_chat.py`:
- Around line 203-217: The test is calling client.post to create a session but
never asserts the response, so add an assertion after the client.post(...) call
in test_workspace_chat_with_session_id to verify the session was actually
created (e.g., assert response.status_code == 201 or assert response.ok) and
optionally parse and check the returned session id/name from the response JSON
to match session_id so subsequent use of session_id is valid; refer to the
existing client.post call and the session_id variable.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- src/crud/__init__.py
- src/crud/workspace.py
- src/dialectic/prompts.py
- src/dialectic/workspace.py
- src/utils/agent_tools.py
- tests/test_workspace_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/dialectic/prompts.py
```python
async def _execute_with_error_handling(
    tool_name: str,
    tool_input: dict[str, Any],
    dispatch: Callable[[], Awaitable[str]],
    db: AsyncSession,
    log_prefix: str = "tool",
) -> str:
    """Execute a tool dispatch function with unified logging and error handling.

    Handles ValueError/KeyError (returned to LLM) and unexpected exceptions
    (logged with traceback, db rolled back, returned to LLM).
    """
    logger.info(f"[{log_prefix} call] {tool_name} {tool_input}")
    try:
        result = await dispatch()
        logger.info(f"[{log_prefix} result] {tool_name} {result}")
        return result
    except ValueError as e:
        error_msg = f"Tool {tool_name} failed with invalid input: {e}"
        logger.warning(error_msg)
        return error_msg
    except KeyError as e:
        error_msg = f"Tool {tool_name} missing required parameter: {e}"
        logger.warning(error_msg)
        return error_msg
    except Exception as e:
        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
        logger.error(error_msg, exc_info=True)
        await db.rollback()
        return error_msg
```
Narrow the catch‑all Exception to explicit exception types.
This currently violates the “explicit error handling” guideline and can mask unexpected failures. Consider catching expected DB/validation errors and letting truly unexpected exceptions bubble (or re‑raise after logging).
🛠️ Suggested fix

```diff
+from sqlalchemy.exc import SQLAlchemyError
+from src.exceptions import ValidationException
 ...
-    except Exception as e:
-        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
-        logger.error(error_msg, exc_info=True)
-        await db.rollback()
-        return error_msg
+    except (SQLAlchemyError, ValidationException) as e:
+        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
+        logger.error(error_msg, exc_info=True)
+        await db.rollback()
+        return error_msg
+    except Exception:
+        logger.error(
+            f"Tool {tool_name} failed with an unexpected exception",
+            exc_info=True,
+        )
+        await db.rollback()
+        raise
```

As per coding guidelines: "Use explicit error handling with appropriate exception types".
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 1899-1899: Consider moving this statement to an else block
(TRY300)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/agent_tools.py` around lines 1883 - 1912, The generic except
Exception in _execute_with_error_handling should be narrowed: import and catch
specific DB/validation exceptions (e.g., SQLAlchemyError or IntegrityError from
sqlalchemy.exc, and any domain ValidationError used by your code) and handle
them by logging with exc_info=True and awaiting db.rollback(); for truly
unexpected errors, remove the broad except Exception handler (or change it to
log the exception and re-raise instead of swallowing it) so that dispatch errors
not explicitly expected will propagate. Ensure you still log the error context
using logger.error and reference db.rollback and dispatch in the revised
handlers.
```python
    if not documents:
        # Fallback to message search
        from typing import cast

        from src.utils.search import search

        filters = {
            "workspace_id": ctx.workspace_name,
            "peer_perspective": observer,  # Use observer for visibility scoping
        }
        if ctx.session_name:
            filters["session_id"] = ctx.session_name

        result = await search(
            ctx.db,
            tool_input["query"],
            filters=filters,
            limit=min(tool_input.get("top_k", 20), 20),
            context_window=2,
            semantic_only=False,
        )
        # context_window > 0 guarantees tuple return type
        snippets = cast(list[tuple[list[models.Message], list[models.Message]]], result)
        if snippets:
            message_output = _format_message_snippets(
                snippets, f"for query '{tool_input['query']}'"
            )
            return f"No observations yet for {observer}->{observed}. Message search results:\n\n{message_output}"
        return f"No observations found for {observer}->{observed} for query '{tool_input['query']}', and no messages found."
```
Workspace search_memory fallback can mix unrelated peers.
When no observations exist for the specified pair, the fallback message search is only scoped by peer_perspective=observer, so results may include content from other peers and blur pair‑specific attribution. Consider removing the implicit fallback and instructing the model to use search_messages explicitly (or add an observed‑peer filter if supported).
🛠️ Safer fallback behavior

```diff
-    if not documents:
-        # Fallback to message search
-        ...
-        return f"No observations found for {observer}->{observed} for query '{tool_input['query']}', and no messages found."
+    if not documents:
+        return (
+            f"No observations found for {observer}->{observed} for query "
+            f"'{tool_input['query']}'. Use search_messages or grep_messages to "
+            "discover relevant peers, then query their representations explicitly."
+        )
```
Verify each finding against the current code and only fix it if needed.
In `@src/utils/agent_tools.py` around lines 2024 - 2052, The fallback message
search currently only scopes by peer_perspective=observer which can return
messages from other peers and misattribute observations; update the filters used
in the search call (the filters dict in the block using observer/observed) to
include the observed peer (e.g., add a field like "observed_peer": observed or
the appropriate observed identifier your search API expects) so results are
constrained to the observer->observed pair, or remove/disable this implicit
fallback and instead require an explicit call to search_messages; adjust the
search(...) invocation and the handling that builds message_output (the snippets
cast and _format_message_snippets(...) usage) accordingly.
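If the fallback is kept rather than removed, pair-scoping the filters might look like the sketch below. The `observed_peer` key is hypothetical and stands in for whatever field the search API actually accepts; only the dict-building logic is illustrated.

```python
def build_pair_filters(workspace_name, observer, observed, session_name=None):
    """Build search filters scoped to a specific observer->observed pair.

    "observed_peer" is a hypothetical filter key; replace it with the
    field the real search API expects for constraining the observed side.
    """
    filters = {
        "workspace_id": workspace_name,
        "peer_perspective": observer,
        "observed_peer": observed,  # hypothetical key, see docstring
    }
    if session_name:
        filters["session_id"] = session_name
    return filters
```

Passing such a filter dict into the fallback `search(...)` call would keep results attributed to the pair instead of blending in other conversations the observer participated in.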
Actionable comments posted: 4
♻️ Duplicate comments (4)
src/crud/workspace.py (1)

606-638: ⚠️ Potential issue | 🟡 Minor

Guard against non-positive limits before applying SQL LIMIT.

`min(limit, 50)` still allows 0/negative values, which can produce invalid or unbounded behavior depending on the backend.

🛠️ Suggested fix

```diff
-    limit = min(limit, 50)
+    if limit <= 0:
+        return []
+    limit = min(limit, 50)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/crud/workspace.py` around lines 606 - 638, limit is constrained with min(limit, 50) but can still be 0 or negative which may break the SQL LIMIT; update the guard to enforce a positive minimum (e.g., set limit = max(1, min(limit, 50))) before building the query so the subsequent subquery/select/stmt (which reference models.Message and models.Peer) always receives a valid positive limit; apply this change immediately before the subquery/stmt construction that ends with stmt = stmt.limit(limit).

src/utils/agent_tools.py (3)

1904-1934: Broad `except Exception` in unified error handler.

Already flagged in a previous review — the catch-all `except Exception` returns errors to the LLM and rolls back the DB, which is pragmatically sound for a tool executor. However, it still deviates from the coding guideline to "use explicit error handling with appropriate exception types." Consider at minimum re-raising unexpected exceptions after logging/rollback if they aren't actionable by the LLM.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 1904 - 1934, In _execute_with_error_handling, avoid the broad "except Exception" that swallows unexpected errors: catch only known actionable exceptions (e.g., ValueError, KeyError already present) and for other exceptions log with exc_info, await db.rollback(), then re-raise the exception so callers can handle or crash appropriately; ensure the exception path still logs the same error message and calls await db.rollback() before re-raising to preserve current cleanup behavior.

2145-2151: Workspace observation context still risks cross-session message mixing.

Passing `session_name=None` to `_get_observation_context_impl` means the adjacency expansion (±1 on `seq_in_session`) has no session boundary. If two different sessions share the same `seq_in_session` values, context messages from unrelated sessions can be spliced in.

Consider grouping target messages by their `session_name` and computing adjacency per-session, or adding `session_name` as an additional join predicate in the context query.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 2145 - 2151, The workspace handler _handle_get_observation_context_workspace currently passes session_name=None into _get_observation_context_impl which lets the adjacency expansion on seq_in_session cross session boundaries and mix messages from different sessions; update the implementation so adjacency is computed per-session (either by grouping target messages by session_name and applying ±1 seq_in_session within each group, or by adding session_name as an additional join/predicate in the context query inside _get_observation_context_impl) so that seq_in_session adjacency never links messages from different session_name values; target symbols: _handle_get_observation_context_workspace, _get_observation_context_impl, seq_in_session, session_name.

2045-2073: Workspace `search_memory` fallback can surface unrelated peer content.

The fallback message search at lines 2051–2065 scopes only by `peer_perspective=observer`, without constraining results to messages involving the `observed` peer. This can blend in conversations from other peers the observer participated in, blurring pair-specific attribution. Already flagged in a prior review.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 2045 - 2073, The message-search fallback currently filters only by "peer_perspective": observer and can return unrelated conversations; update the filters passed to search to also constrain results to messages that involve the observed peer (use the observed identifier from this scope) — e.g., add a filter key that requires the observed peer ID or participant list to include observed (so that search only returns messages between observer and observed), and keep the existing session_id, workspace_id, and other params; ensure _format_message_snippets is still used on the resulting snippets.
🧹 Nitpick comments (3)
tests/bench/coverage.py (1)
413-428: Consider updating the default model to a more recent snapshot.
claude-sonnet-4-20250514(Claude 4 Sonnet) is a valid model, but newer generations likeclaude-sonnet-4-5-20250929andclaude-sonnet-4-6are now available. Since this is the judge default for all benchmark evaluations, a stale model can silently produce lower-quality or differently-calibrated scoring results. The--modelCLI flag allows per-run overrides, but updating the default prevents accidental use of an older model in CI.- model: str = "claude-sonnet-4-20250514", + model: str = "claude-sonnet-4-6",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/bench/coverage.py` around lines 413 - 428, The default model string in the Coverage benchmark constructor (__init__) is outdated: replace the current default "claude-sonnet-4-20250514" with a newer recommended snapshot (e.g., "claude-sonnet-4-6" or "claude-sonnet-4-5-20250929") so the Coverage class (constructor llm_client, model, provider, use_qa_verification, verbose) uses an up-to-date judge by default; leave the --model CLI override behavior unchanged.

src/routers/workspaces.py (1)
286-293: Wrap SSE payload construction to keep lines ≤ 88 chars. This improves readability and keeps the formatter happy.
🧹 Suggested refactor
```diff
-            async for chunk in chunks:
-                yield f"data: {json.dumps({'delta': {'content': chunk}, 'done': False})}\n\n"
+            async for chunk in chunks:
+                payload = {"delta": {"content": chunk}, "done": False}
+                yield f"data: {json.dumps(payload)}\n\n"
             yield f"data: {json.dumps({'done': True})}\n\n"
```

As per coding guidelines, use line length of 88 characters (Black compatible).
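The refactored generator can be exercised standalone; a minimal sketch, with the FastAPI wiring omitted and the driver coroutine invented for the demo:

```python
import asyncio
import json
from typing import AsyncIterator


async def format_sse_stream(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Format chunks as SSE events, keeping each source line under 88 chars."""
    async for chunk in chunks:
        payload = {"delta": {"content": chunk}, "done": False}
        yield f"data: {json.dumps(payload)}\n\n"
    yield f"data: {json.dumps({'done': True})}\n\n"


async def _demo() -> list[str]:
    # Hypothetical chunk source standing in for workspace_chat_stream
    async def chunks():
        for c in ("Hel", "lo"):
            yield c

    return [event async for event in format_sse_stream(chunks())]


events = asyncio.run(_demo())
```

Each yielded string is a complete `data: …\n\n` SSE frame, with a final `{"done": true}` sentinel.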
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/routers/workspaces.py` around lines 286 - 293, The SSE payload strings in format_sse_stream exceed the 88-char line limit; refactor by building the payload dicts into short-named variables (e.g., payload = {'delta': {'content': chunk}, 'done': False}) and then call json.dumps(payload) and interpolate that into the final "data: {json}\n\n" string, doing the same for the final done payload (payload = {'done': True}); update the async for loop and final yield in format_sse_stream to use these intermediate variables so no line exceeds 88 characters.

src/utils/agent_tools.py (1)
1510-1538: Remove redundant inner imports; simplify `hasattr` guard.
`Any` and `cast` are already imported at line 6 (`from typing import Any, cast`). The inner re-imports and alias are unnecessary. The `hasattr(ctx, "observer")` check is always `True` for `ToolContext` — a simple truthiness check suffices.
♻️ Suggested cleanup
```diff
-        from typing import Any as typing_Any
-        from typing import cast
-
         from src.utils.search import search

         query = tool_input.get("query", "")
@@
         # Build filters for search
-        filters: dict[str, typing_Any] = {"workspace_id": ctx.workspace_name}
+        filters: dict[str, Any] = {"workspace_id": ctx.workspace_name}
         # Only add peer_perspective if observer exists (peer-scoped contexts)
-        if hasattr(ctx, "observer") and ctx.observer:
+        if ctx.observer:
             filters["peer_perspective"] = ctx.observer
```
Verify each finding against the current code and only fix it if needed. In `@src/utils/agent_tools.py` around lines 1510 - 1538, Remove the redundant inner imports and simplify the observer check in the search_messages_temporal handling: delete the inner "from typing import Any as typing_Any" and "from typing import cast" lines (use the module-level Any and cast already imported), and replace the guard "if hasattr(ctx, 'observer') and ctx.observer:" with a simple truthiness check "if ctx.observer:" (or "if getattr(ctx, 'observer', None):" if you prefer explicitness); ensure you keep building "filters: dict[str, Any] = {'workspace_id': ctx.workspace_name}" and adding "filters['peer_perspective'] = ctx.observer" only when the observer is truthy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/workspaces.py`:
- Around line 269-310: The handler holds the injected AsyncSession "db" open for
the entire SSE stream which can pin a connection; perform a short-lived precheck
using a tracked DB session to call crud.get_workspace (or move existence check
into workspace_chat_stream), close that tracked session, and then start
StreamingResponse without passing the long-lived "db" into workspace_chat_stream
so the DB connection is released before LLM/streaming work begins; update
imports/dependencies to use the short-lived tracked DB provider (or adjust
workspace_chat_stream to open its own session) and remove reliance on the
injected "db" inside chat/streaming path.
In `@src/utils/agent_tools.py`:
- Line 2036: The assignment to top_k can raise TypeError if the LLM returns a
non-integer; change the code that sets top_k to use the existing _safe_int guard
before applying min. Specifically, wrap tool_input.get("top_k", 20) with
_safe_int (i.e., call _safe_int(tool_input.get("top_k"), 20)) and then apply
min(..., 40); reference the top_k assignment and the helper _safe_int so you
ensure non-numeric values (like "Infinity") are converted to the fallback
integer before calling min.
- Line 2102: The min() call for limit can raise on non-integer LLM input; wrap
the raw value with the existing _safe_int guard like you did for top_k in
_handle_search_memory_workspace: replace direct use of tool_input.get("limit",
20) with _safe_int(tool_input.get("limit", 20)) before applying min(), so
compute limit = min(_safe_int(tool_input.get("limit", 20)), 50) (update the code
that defines limit in the same block where tool_input is used).
In `@tests/bench/coverage.py`:
- Around line 805-813: The code currently truncates fractional partial-answer
credit by returning int(answerable); instead return the accumulated answerable
as a float so partial (0.5) credits are preserved — update the return at the end
of the block that computes answerable (the variable named answerable in
tests/bench/coverage.py) to return answerable (not int(answerable)) along with
len(questions), and adjust any surrounding type hints or callers if they expect
an int.
---
Duplicate comments:
In `@src/crud/workspace.py`:
- Around line 606-638: limit is constrained with min(limit, 50) but can still be
0 or negative which may break the SQL LIMIT; update the guard to enforce a
positive minimum (e.g., set limit = max(1, min(limit, 50))) before building the
query so the subsequent subquery/select/stmt (which reference models.Message and
models.Peer) always receives a valid positive limit; apply this change
immediately before the subquery/stmt construction that ends with stmt =
stmt.limit(limit).
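The guard described above is a one-line clamp; a minimal sketch (the function name is hypothetical — in `src/crud/workspace.py` it would be an inline assignment before the query is built):

```python
def clamp_limit(limit: int, lo: int = 1, hi: int = 50) -> int:
    """Force limit into [1, 50] so SQL LIMIT never receives 0 or a negative."""
    return max(lo, min(limit, hi))
```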
In `@src/utils/agent_tools.py`:
- Around line 1904-1934: In _execute_with_error_handling, avoid the broad
"except Exception" that swallows unexpected errors: catch only known actionable
exceptions (e.g., ValueError, KeyError already present) and for other exceptions
log with exc_info, await db.rollback(), then re-raise the exception so callers
can handle or crash appropriately; ensure the exception path still logs the same
error message and calls await db.rollback() before re-raising to preserve
current cleanup behavior.
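The narrowed error-handling shape this prompt describes can be sketched as follows; `FakeDB` and the handler signature are hypothetical stand-ins for the real `AsyncSession` and tool handler:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def execute_with_error_handling(handler, db):
    """Catch only actionable errors; log, roll back, and re-raise the rest."""
    try:
        return await handler()
    except (ValueError, KeyError) as e:
        # Known, recoverable tool-input problems: surface as a tool error string
        return f"Error: {e}"
    except Exception:
        # Unexpected: log with traceback, keep the rollback cleanup,
        # then re-raise so callers can handle or crash appropriately
        logger.error("Error executing tool", exc_info=True)
        await db.rollback()
        raise


class FakeDB:
    """Minimal stand-in recording whether rollback() was awaited."""

    def __init__(self):
        self.rolled_back = False

    async def rollback(self):
        self.rolled_back = True
```

The key difference from a broad `except Exception: return error_string` is that unexpected failures still propagate after cleanup instead of being silently converted into tool output.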
- Around line 2145-2151: The workspace handler
_handle_get_observation_context_workspace currently passes session_name=None
into _get_observation_context_impl which lets the adjacency expansion on
seq_in_session cross session boundaries and mix messages from different
sessions; update the implementation so adjacency is computed per-session (either
by grouping target messages by session_name and applying ±1 seq_in_session
within each group, or by adding session_name as an additional join/predicate in
the context query inside _get_observation_context_impl) so that seq_in_session
adjacency never links messages from different session_name values; target
symbols: _handle_get_observation_context_workspace,
_get_observation_context_impl, seq_in_session, session_name.
- Around line 2045-2073: The message-search fallback currently filters only by
"peer_perspective": observer and can return unrelated conversations; update the
filters passed to search to also constrain results to messages that involve the
observed peer (use the observed identifier from this scope) — e.g., add a filter
key that requires the observed peer ID or participant list to include observed
(so that search only returns messages between observer and observed), and keep
the existing session_id, workspace_id, and other params; ensure
_format_message_snippets is still used on the resulting snippets.
---
Nitpick comments:
In `@src/routers/workspaces.py`:
- Around line 286-293: The SSE payload strings in format_sse_stream exceed the
88-char line limit; refactor by building the payload dicts into short-named
variables (e.g., payload = {'delta': {'content': chunk}, 'done': False}) and
then call json.dumps(payload) and interpolate that into the final "data:
{json}\n\n" string, doing the same for the final done payload (payload =
{'done': True}); update the async for loop and final yield in format_sse_stream
to use these intermediate variables so no line exceeds 88 characters.
In `@src/utils/agent_tools.py`:
- Around line 1510-1538: Remove the redundant inner imports and simplify the
observer check in the search_messages_temporal handling: delete the inner "from
typing import Any as typing_Any" and "from typing import cast" lines (use the
module-level Any and cast already imported), and replace the guard "if
hasattr(ctx, 'observer') and ctx.observer:" with a simple truthiness check "if
ctx.observer:" (or "if getattr(ctx, 'observer', None):" if you prefer
explicitness); ensure you keep building "filters: dict[str, Any] =
{'workspace_id': ctx.workspace_name}" and adding "filters['peer_perspective'] =
ctx.observer" only when the observer is truthy.
In `@tests/bench/coverage.py`:
- Around line 413-428: The default model string in the Coverage benchmark
constructor (__init__) is outdated: replace the current default
"claude-sonnet-4-20250514" with a newer recommended snapshot (e.g.,
"claude-sonnet-4-6" or "claude-sonnet-4-5-20250929") so the Coverage class
(constructor llm_client, model, provider, use_qa_verification, verbose) uses an
up-to-date judge by default; leave the --model CLI override behavior unchanged.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
- docs/docs.json
- docs/v3/documentation/reference/sdk.mdx
- sdks/python/src/honcho/aio.py
- src/crud/__init__.py
- src/crud/message.py
- src/crud/workspace.py
- src/dialectic/chat.py
- src/routers/workspaces.py
- src/schemas.py
- src/utils/agent_tools.py
- tests/bench/coverage.py
- tests/conftest.py
- tests/unified/runner.py
- tests/utils/test_agent_tools.py
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/conftest.py
- docs/docs.json
- src/schemas.py
- src/crud/__init__.py
- docs/v3/documentation/reference/sdk.mdx
```python
async def chat(
    workspace_id: str = Path(...),
    options: schemas.WorkspaceChatOptions = Body(...),
    db: AsyncSession = db,
):
    """
    Query the workspace's collective knowledge using natural language.

    Performs agentic search and reasoning across ALL peers and observations
    in the workspace to synthesize a comprehensive answer. Useful for
    cross-peer analysis, discovering common themes, and workspace-wide queries.
    """
    # Verify workspace exists before expensive agent pipeline
    await crud.get_workspace(db, workspace_name=workspace_id)

    if options.stream:

        async def format_sse_stream(
            chunks: AsyncIterator[str],
        ) -> AsyncIterator[str]:
            """Format chunks as SSE events."""
            async for chunk in chunks:
                yield f"data: {json.dumps({'delta': {'content': chunk}, 'done': False})}\n\n"
            yield f"data: {json.dumps({'done': True})}\n\n"

        if settings.METRICS.ENABLED:
            prometheus_metrics.record_dialectic_call(
                workspace_name=workspace_id,
                reasoning_level=options.reasoning_level,
            )

        return StreamingResponse(
            format_sse_stream(
                workspace_chat_stream(
                    workspace_name=workspace_id,
                    session_name=options.session_id,
                    query=options.query,
                    reasoning_level=options.reasoning_level,
                )
            ),
            media_type="text/event-stream",
        )
```
Avoid holding the request DB session during streaming/LLM work.
The injected db session stays open for the entire SSE stream while workspace_chat[_stream] runs, which can pin a connection for long-lived LLM responses. Use a short-lived tracked_db precheck (or move the workspace existence check into workspace_chat[_stream]) and drop the injected session here so the connection is released before streaming starts.
🛠️ Suggested fix (short-lived precheck)
```diff
-async def chat(
-    workspace_id: str = Path(...),
-    options: schemas.WorkspaceChatOptions = Body(...),
-    db: AsyncSession = db,
-):
+async def chat(
+    workspace_id: str = Path(...),
+    options: schemas.WorkspaceChatOptions = Body(...),
+):
@@
-    await crud.get_workspace(db, workspace_name=workspace_id)
+    async with tracked_db("routers.workspaces.chat_precheck") as check_db:
+        await crud.get_workspace(check_db, workspace_name=workspace_id)
```

🔧 Import update
```diff
-from src.dependencies import db
+from src.dependencies import db, tracked_db
```

As per coding guidelines, never hold a DB session during external calls (LLM, embedding, HTTP). Compute external results first and pass them as parameters; use `tracked_db` for short-lived, DB-only operations; pass a shared session when multiple DB-only calls can reuse one connection.
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 271-271: Do not perform function call Body in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routers/workspaces.py` around lines 269 - 310, The handler holds the
injected AsyncSession "db" open for the entire SSE stream which can pin a
connection; perform a short-lived precheck using a tracked DB session to call
crud.get_workspace (or move existence check into workspace_chat_stream), close
that tracked session, and then start StreamingResponse without passing the
long-lived "db" into workspace_chat_stream so the DB connection is released
before LLM/streaming work begins; update imports/dependencies to use the
short-lived tracked DB provider (or adjust workspace_chat_stream to open its own
session) and remove reliance on the injected "db" inside chat/streaming path.
```python
        answerable: float = 0
        for item in cast(list[dict[str, Any]], result.get("results", [])):
            status = cast(str, item.get("answerable", "no"))
            if status == "yes":
                answerable += 1
            elif status == "partial":
                answerable += 0.5

        return int(answerable), len(questions)
```
int() truncation silently discards partial-answer credit.
answerable accumulates +0.5 for each "partial" answer, but int(answerable) floors the result — e.g., a single partial response yields int(0.5) = 0, making it indistinguishable from a fully unanswerable question. This causes report.qa_coverage to under-count.
🐛 Proposed fix
```diff
-        return int(answerable), len(questions)
+        return round(answerable), len(questions)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        answerable: float = 0
        for item in cast(list[dict[str, Any]], result.get("results", [])):
            status = cast(str, item.get("answerable", "no"))
            if status == "yes":
                answerable += 1
            elif status == "partial":
                answerable += 0.5

        return round(answerable), len(questions)
```
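Worth noting when weighing the two proposed fixes: Python's `round` uses round-half-to-even, so a single lone partial answer still collapses to 0; only returning the raw float fully preserves partial credit. A quick check of the semantics:

```python
answerable = 0.5  # one "partial" answer accumulated

# int() truncates toward zero: the partial credit vanishes entirely
assert int(answerable) == 0

# round() uses banker's rounding (round-half-to-even), so a lone 0.5
# still becomes 0, and .5 values round to the nearest even integer
assert round(answerable) == 0
assert round(1.5) == 2
assert round(2.5) == 2

# The float itself is the only lossless representation of the credit
assert answerable == 0.5
```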
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/bench/coverage.py` around lines 805 - 813, The code currently truncates
fractional partial-answer credit by returning int(answerable); instead return
the accumulated answerable as a float so partial (0.5) credits are preserved —
update the return at the end of the block that computes answerable (the variable
named answerable in tests/bench/coverage.py) to return answerable (not
int(answerable)) along with len(questions), and adjust any surrounding type
hints or callers if they expect an int.
Summary by CodeRabbit
Release Notes
New Features
Documentation
API Updates
`/v3/workspaces/{workspace_id}/chat` endpoint