feat: workspace-level chat #373

Open
dr-frmr wants to merge 12 commits into main from ben/add-workspace-chat
Conversation

@dr-frmr
Contributor

@dr-frmr dr-frmr commented Feb 6, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Workspace-level chat: Query knowledge across all peers in a workspace using natural language
    • Streaming support for workspace queries with Server-Sent Events (SSE)
    • Configurable reasoning levels (minimal, low, medium, high, max) for workspace queries
    • Optional session scoping for targeted workspace knowledge queries
  • Documentation

    • Added comprehensive API reference for workspace chat endpoint
    • New feature documentation with Python and TypeScript SDK examples
    • Updated SDK reference guides with workspace chat usage patterns
  • API Updates

    • New POST /v3/workspaces/{workspace_id}/chat endpoint
    • API specification updated to version 3.0.2
@coderabbitai
Contributor

coderabbitai bot commented Feb 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds workspace-level chat: API/OpenAPI + docs, SDK client methods (Python/TS), backend route and schema, new dialectic agent base/refactor and WorkspaceDialecticAgent, workspace-scoped tools and visibility-aware search/CRUD changes, extensive tests, and CI workflow removal.

Changes

Cohort / File(s) Summary
Docs & OpenAPI
docs/docs.json, docs/v3/api-reference/endpoint/workspaces/chat.mdx, docs/v3/documentation/features/chat.mdx, docs/v3/documentation/reference/sdk.mdx, docs/v3/openapi.json
Add workspace chat docs; bump OpenAPI version; add POST /v3/workspaces/{workspace_id}/chat and WorkspaceChatOptions schema; minor description edits.
Python SDK
sdks/python/src/honcho/client.py, sdks/python/src/honcho/aio.py, sdks/python/src/honcho/http/routes.py
Add chat and chat_stream methods (sync & async) and workspace_chat(...) route helper for new endpoint.
TypeScript SDK
sdks/typescript/src/client.ts, sdks/typescript/src/index.ts, sdks/typescript/src/types/api.ts
Add chat/chatStream methods, private workspace POST helpers, and types WorkspaceChatParams/WorkspaceChatResponse (exported).
API Router & Schema
src/routers/workspaces.py, src/schemas.py
New POST /workspaces/{workspace_id}/chat route supporting SSE streaming and non-stream responses; add WorkspaceChatOptions Pydantic model.
Dialectic Agents
src/dialectic/base.py, src/dialectic/core.py, src/dialectic/chat.py, src/dialectic/workspace.py, src/dialectic/prompts.py
Introduce BaseDialecticAgent (shared flow, metrics, streaming); refactor DialecticAgent to inherit; add WorkspaceDialecticAgent, workspace chat entrypoints, and workspace system prompt.
Workspace Tooling & Representation
src/utils/agent_tools.py, src/utils/representation.py
Add WorkspaceToolContext, create_workspace_tool_executor, workspace tool handlers/dispatch and WORKSPACE_DIALECTIC_TOOLS; add format_documents_with_attribution.
Search & Message CRUD
src/utils/search.py, src/crud/message.py
Extend search API with context_window and semantic_only; propagate peer_perspective through message CRUD/search flows for visibility filtering.
Workspace Router Tests & Fixtures
tests/conftest.py, tests/test_workspace_chat.py, tests/integration/test_token_metrics.py
Add workspace chat mocks and an extensive integration test suite; update token-metrics patches to new base module.
Unified Tests & Cases
tests/unified/schema.py, tests/unified/runner.py, tests/unified/test_cases/*
Add workspace_chat target and multiple unified JSON test cases exercising cross-peer/observation/message-based workspace queries.
CRUD & Tools Tests
tests/crud/test_message_perspective.py, tests/utils/test_agent_tools.py, tests/dialectic/test_core.py
Add tests validating peer-perspective visibility, workspace tool handlers, and session-history visibility for dialectic agents.
Workspace CRUD Exports & Analytics
src/crud/__init__.py, src/crud/workspace.py
Export new ActivePeer, WorkspaceStats, get_active_peers, get_workspace_stats and add workspace analytics helpers.
CI Workflow
.github/workflows/claude.yml
Removed Claude-based GitHub Actions workflow file.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant SDK as SDK Client
    participant API as API Server
    participant Agent as WorkspaceDialecticAgent
    participant Tools as WorkspaceTools
    participant DB as Database
    participant LLM as LLM Service

    Client->>SDK: chat(query, session?, reasoning_level?)
    SDK->>API: POST /v3/workspaces/{id}/chat (WorkspaceChatOptions)
    API->>Agent: workspace_chat(query, session, reasoning_level)
    Agent->>DB: prefetch peers & observations
    DB-->>Agent: relevant observations
    Agent->>Tools: create_workspace_tool_executor(...)
    Tools->>DB: search_memory / list_peers / get_peer_card ...
    DB-->>Tools: tool results
    Tools-->>Agent: tool outputs (via executor)
    Agent->>LLM: honcho_llm_call(messages, tools, reasoning_level)
    LLM-->>Agent: synthesized response (stream or final)
    Agent->>DB: log metrics / telemetry
    Agent-->>API: DialecticResponse or SSE stream
    API-->>SDK: response payload / SSE events
    SDK-->>Client: string or streamed content

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • dr-frmr

Poem

🐰 I nibble docs and wire new chat,
Across peers, the answers sat.
Tools hop in, memories gleam,
Streams of tokens, thoughts that beam—
Honcho hums; the workspace chat!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 74.59%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title 'feat: workspace-level chat' clearly and concisely summarizes the main feature addition.
✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@dr-frmr dr-frmr marked this pull request as draft February 6, 2026 20:48
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/dialectic/workspace.py`:
- Around line 96-100: The broad except block around the call to
format_documents_with_attribution should be narrowed to specific exceptions (for
example SQLAlchemyError and ValidationException) instead of catching Exception;
replace the bare except with except (SQLAlchemyError, ValidationException) as e
and log using structured logging (e.g., logger.warning("Failed to prefetch
workspace observations: %s", e, exc_info=True) or logger.exception) rather than
an f-string; keep the existing return None behavior. Ensure you import the
exception types used and update the handler around the
format_documents_with_attribution call and its surrounding function.
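
The narrowed handler described above could be sketched roughly as follows. ValidationException is defined locally as a stand-in for the project's own exception type, and the sqlalchemy import degrades to a placeholder so the sketch runs standalone.

```python
import logging

try:
    from sqlalchemy.exc import SQLAlchemyError
except ImportError:  # allow running this sketch without sqlalchemy installed
    class SQLAlchemyError(Exception):
        pass

logger = logging.getLogger(__name__)

class ValidationException(Exception):
    """Stand-in for the project's own validation error type."""

def prefetch_observations(fetch):
    """Run a prefetch callable; degrade gracefully on known failure modes only."""
    try:
        return fetch()
    except (SQLAlchemyError, ValidationException) as e:
        # %s-style structured logging instead of an f-string
        logger.warning("Failed to prefetch workspace observations: %s", e, exc_info=True)
        return None  # preserve the existing return-None behavior
```

Unexpected exception types still propagate, which is the point of narrowing the handler.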

In `@src/utils/agent_tools.py`:
- Around line 2059-2093: The fallback ToolContext (peer_ctx) is created with
empty observer/observed which risks silent writes if unsafe handlers are
invoked; update execute_tool to only fall through to a curated safe subset of
_TOOL_HANDLERS (e.g., "search_messages","grep_messages", etc.) instead of
blindly calling any handler from _TOOL_HANDLERS, and explicitly reject or return
"Unknown tool" for handlers that perform writes (create_observations,
update_peer_card, delete_observations) by checking tool_name against a
SAFE_FALLTHROUGH set before calling handler(peer_ctx, ...); reference symbols:
ToolContext, peer_ctx, execute_tool, _WORKSPACE_TOOL_HANDLERS, _TOOL_HANDLERS,
and the handler names create_observations/update_peer_card/delete_observations
when implementing the guard.
- Around line 1960-2003: The workspace branch that builds premises/sources
(inside the if direction in ("premises","both") block) must add the same
fallback messages as the original _handle_get_reasoning_chain: when
doc.source_ids exists but get_documents_by_ids returns empty, append a message
like "Premises/Sources listed but not found"; when level is "deductive" and
doc.level indicates the observation is explicit (no premises) append an
"Observation is explicit (no premises)" message; when doc.source_ids is empty
append "No premises/sources recorded". Update the code around the calls to
crud.get_documents_by_ids and the output_parts append logic (referencing
doc.source_ids, level, premises/sources, and output_parts) to include these
fallback branches for both the deductive and inductive paths so the workspace
variant mirrors the original behavior.
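
The allowlist guard from the first agent_tools item above might look like the following sketch; the registry dicts and handler names are illustrative stand-ins for _WORKSPACE_TOOL_HANDLERS and _TOOL_HANDLERS.

```python
# Read-only allowlist guard for the fallback dispatch path.
SAFE_FALLTHROUGH = {"search_messages", "grep_messages"}

def execute_tool(tool_name, workspace_handlers, peer_handlers, peer_ctx, tool_input):
    handler = workspace_handlers.get(tool_name)
    if handler is not None:
        return handler(peer_ctx, tool_input)
    # Never fall through to write-capable peer handlers (create_observations,
    # update_peer_card, delete_observations) with the empty fallback context.
    if tool_name not in SAFE_FALLTHROUGH:
        return f"Unknown tool: {tool_name}"
    peer_handler = peer_handlers.get(tool_name)
    if peer_handler is None:
        return f"Unknown tool: {tool_name}"
    return peer_handler(peer_ctx, tool_input)
```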

In `@tests/test_workspace_chat.py`:
- Around line 204-230: The test test_workspace_chat_with_session_id creates a
session but never validates that creation, so add assertions after the
client.post call that creates the session (the POST to
"/v3/workspaces/{test_workspace.name}/sessions") to ensure it succeeded and
returned the expected data; specifically assert the response.status_code is
successful (e.g., 201/200) and validate the response body contains the created
session identifier/name (matching session_id) before calling /chat with that
session_id so the chat request is truly scoped to an existing session.
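
A minimal sketch of the suggested assertions; that the create response carries an "id" or "name" field is an assumption about the v3 session-create schema, not verified here.

```python
# Hypothetical helper the test could call right after creating the session.
def validate_session_creation(status_code: int, body: dict, session_id: str) -> None:
    """Fail fast if the session-create call did not succeed as expected."""
    assert status_code in (200, 201), f"session creation failed: {status_code}"
    created = body.get("id") or body.get("name")
    assert created == session_id, f"created session {created!r} != {session_id!r}"
```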
🧹 Nitpick comments (10)
src/utils/representation.py (1)

681-719: LGTM — clean utility function with clear attribution logic.

The grouping, header differentiation, and delegation to Representation.from_documents all look correct. One small simplification:

♻️ Optional: Use defaultdict to simplify grouping
+from collections import defaultdict
+
 def format_documents_with_attribution(
     documents: Sequence[models.Document],
     include_ids: bool = False,
 ) -> str:
     ...
-    groups: dict[tuple[str, str], list[models.Document]] = {}
-    for doc in documents:
-        key = (doc.observer, doc.observed)
-        if key not in groups:
-            groups[key] = []
-        groups[key].append(doc)
+    groups: dict[tuple[str, str], list[models.Document]] = defaultdict(list)
+    for doc in documents:
+        groups[(doc.observer, doc.observed)].append(doc)
docs/v3/documentation/reference/sdk.mdx (1)

258-273: Minor: const response declared twice in the TypeScript example.

Lines 260 and 263 both declare const response, which would be a compile error in actual TypeScript. Consider using a different variable name for one of them to avoid confusing readers who may copy-paste the snippet.

Suggested fix
-// Query knowledge across ALL peers in the workspace
-const response = await honcho.chat("What are common themes across all users?");
-
-// With reasoning level
-const response = await honcho.chat("Summarize all peer activity.", {
+// Query knowledge across ALL peers in the workspace
+const answer = await honcho.chat("What are common themes across all users?");
+
+// With reasoning level
+const deepAnswer = await honcho.chat("Summarize all peer activity.", {
   reasoningLevel: "high"
 });
src/routers/workspaces.py (1)

256-322: Clean endpoint implementation, one observation on metrics placement.

The endpoint correctly handles both streaming (SSE) and non-streaming paths, and the auth dependency is properly scoped to the workspace.

One minor asymmetry: in the streaming branch (lines 291-295), record_dialectic_call fires before the stream is consumed, whereas in the non-streaming branch (lines 316-320) it fires after the response is ready. This means a failed streaming request will still increment the counter. If tracking completed calls matters, consider moving the streaming metrics into format_sse_stream (e.g., after the final done sentinel). That said, this appears to match the peer-level chat endpoint pattern, so it may be intentional.

sdks/typescript/src/client.ts (1)

862-884: reasoningLevel could benefit from a stricter type.

The Python SDK uses Literal["minimal", "low", "medium", "high", "max"] for reasoning_level, but here it's typed as string. Consider using a union type or string literal type for better developer experience and type safety:

reasoningLevel?: 'minimal' | 'low' | 'medium' | 'high' | 'max'

Same applies to chatStream below (lines 908-927).

src/dialectic/workspace.py (2)

51-70: Instance-level self.messages and self._run_id make this agent single-use; consider documenting or enforcing this.

self.messages (line 65) accumulates user/system messages across calls, and self._run_id (line 63) is set once. If answer() or answer_stream() is called more than once on the same instance, the second call will inherit the first query's conversation context and share the same _run_id in the emitted DialecticCompletedEvent. Meanwhile, _prepare_query generates a separate local run_id for task logging (line 111), so two different IDs track the same logical query.

If single-use is intentional (one instance per query), a brief docstring note or a guard in _prepare_query would prevent misuse.


102-155: _prepare_query creates a local run_id that shadows self._run_id — potentially confusing.

Line 111 creates a local run_id used for task naming and performance logging, while self._run_id (from __init__) is used in tool executor creation (line 150) and the DialecticCompletedEvent (line 207). Having two distinct IDs tracking the same query makes correlation across logs and events harder. Consider unifying them or making the distinction explicit.
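
One way to address both notes above is a single-use guard that also reuses the instance run_id. The class and attribute names mirror those quoted from the PR, but this is a sketch, not the actual implementation.

```python
import uuid

class SingleUseAgent:
    def __init__(self) -> None:
        self._run_id = str(uuid.uuid4())
        self._used = False
        self.messages: list[dict] = []

    def _prepare_query(self, query: str) -> str:
        if self._used:
            raise RuntimeError("agent instances are single-use; create one per query")
        self._used = True
        self.messages.append({"role": "user", "content": query})
        return self._run_id  # reuse the instance ID instead of a second local one
```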

src/utils/agent_tools.py (4)

1838-1871: Move format_documents_with_attribution to the top-level imports.

Line 1842 uses a local import, but src.utils.representation is already imported at the module level (line 24). Move this to the top with the other imports for consistency.

♻️ Suggested fix

At line 24, add the import:

-from src.utils.representation import Representation
+from src.utils.representation import Representation, format_documents_with_attribution

Then at line 1842, remove the local import:

 async def _handle_search_memory_workspace(
     ctx: WorkspaceToolContext, tool_input: dict[str, Any]
 ) -> str:
     """Handle workspace-level search_memory tool (searches ALL peers)."""
-    from src.utils.representation import format_documents_with_attribution
-
     top_k = min(tool_input.get("top_k", 20), 40)

As per coding guidelines, "Follow isort conventions with absolute imports preferred."


1909-1934: Remove redundant local import.

format_new_turn_with_timestamp is already imported at the module level (line 23). The local import on line 1913 is unnecessary.

♻️ Suggested fix
 async def _handle_get_observation_context_workspace(
     ctx: WorkspaceToolContext, tool_input: dict[str, Any]
 ) -> str:
     """Handle get_observation_context tool for workspace context (no session constraint)."""
-    from src.utils.formatting import format_new_turn_with_timestamp
-
     messages = await get_observation_context(

2076-2106: Duplicate error-handling boilerplate.

The try/except pattern in execute_tool (lines 2078–2106) is an exact copy of the one in create_tool_executor (lines 1787–1813). If you add more executor factories, this will drift. Consider extracting a shared wrapper, but this is a minor concern for now.


471-489: Dual-naming pattern is correct but consider a brief inline comment.

The TOOLS dict key "search_memory_workspace" differs from the tool's "name": "search_memory" — this works because the LLM dispatches on name, not the dict key. The same applies to "get_peer_card_by_name" / "get_peer_card". A short comment on the first occurrence would help future maintainers understand this convention.

dr-frmr and others added 3 commits February 6, 2026 17:44
…rough an exception

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@dr-frmr dr-frmr marked this pull request as ready for review February 6, 2026 23:06
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 99-105: The _prefetched_conclusion_count calculation incorrectly
sums only explicit and deductive observations from explicit_repr and
derived_repr; update the logic to use the Representation.len() method which
includes explicit, deductive, inductive, and contradiction counts. Replace the
manual len(...) additions for explicit_repr and derived_repr with calls to
explicit_repr.len() and derived_repr.len() so that _prefetched_conclusion_count
= explicit_repr.len() + derived_repr.len(), preserving the existing variable
name _prefetched_conclusion_count.

In `@src/routers/workspaces.py`:
- Around line 296-312: The metrics call is currently recorded before the
streaming generator runs (so aborted/failed streams are counted) while the
non-streaming path records after successful completion; make the behavior
consistent by moving or deferring record_dialectic_call to after the stream is
proven to have started successfully—either call
prometheus_metrics.record_dialectic_call from inside workspace_chat_stream when
it yields its first event or wrap the generator returned by
workspace_chat_stream to call record_dialectic_call on the first successful
yield; reference workspace_chat_stream, workspace_chat, and
prometheus_metrics.record_dialectic_call and ensure workspace_name=workspace_id
and reasoning_level are preserved.
🧹 Nitpick comments (4)
src/utils/agent_tools.py (1)

471-515: Tool definition keys vs. name fields — intentional but worth a brief doc comment.

TOOLS["search_memory_workspace"] has "name": "search_memory" and TOOLS["get_peer_card_by_name"] has "name": "get_peer_card". The dict key is used internally for composing tool lists, while "name" is what the LLM sees. This is consistent with how _WORKSPACE_TOOL_HANDLERS dispatches by the "name" value, but the mismatch between key and name can be surprising for future contributors.

src/dialectic/base.py (1)

276-327: Metrics are only logged if the consumer fully drains the async generator.

_log_response_metrics (lines 315–327) runs after the async for loop. If a caller breaks out of answer_stream early (e.g., client disconnect), the code after yield may never execute (depending on generator cleanup behavior), and metrics/telemetry for that call would be lost.

This is a known limitation of async generators, so it may be acceptable. If completeness of telemetry is important, consider wrapping the generator in a try/finally block so metrics are logged even on early termination.

♻️ Sketch: ensure metrics fire on early exit
     async def answer_stream(self, query: str) -> AsyncIterator[str]:
         """Answer a query using agentic tool calling, streaming the response."""
         tool_executor, task_name, run_id, start_time = await self._prepare_query(query)
         ...
         accumulated_content: list[str] = []
-        async for chunk in response:
-            if chunk.content:
-                accumulated_content.append(chunk.content)
-                yield chunk.content
-
-        self._log_response_metrics(
-            ...
-        )
+        try:
+            async for chunk in response:
+                if chunk.content:
+                    accumulated_content.append(chunk.content)
+                    yield chunk.content
+        finally:
+            self._log_response_metrics(
+                task_name=task_name,
+                run_id=run_id,
+                start_time=start_time,
+                response_content="".join(accumulated_content),
+                input_tokens=response.input_tokens,
+                output_tokens=response.output_tokens,
+                cache_read_input_tokens=response.cache_read_input_tokens,
+                cache_creation_input_tokens=response.cache_creation_input_tokens,
+                tool_calls_count=len(response.tool_calls_made),
+                thinking_content=response.thinking_content,
+                iterations=response.iterations,
+            )
src/routers/workspaces.py (1)

271-284: Idle DB session held during long-running streaming responses.

The db: AsyncSession dependency (line 274) is acquired for the crud.get_workspace check (line 284) but persists until the StreamingResponse finishes. Since workspace_chat_stream creates its own DB session internally via tracked_db, this outer session sits idle for the entire stream duration, holding a connection pool slot.

Consider running the workspace validation before constructing the streaming response and then not passing db to the streamed portion, or using a short-lived session just for validation.

src/dialectic/core.py (1)

115-119: Logging uses f-string instead of structured logging format.

Line 118 uses logger.warning(f"Failed to prefetch observations: {e}"), which bypasses lazy log evaluation. The workspace variant (workspace.py line 77-82) was already updated to use %s-style structured logging per the coding guidelines.

♻️ Align with structured logging
         except Exception as e:
-            logger.warning(f"Failed to prefetch observations: {e}")
+            logger.warning(
+                "Failed to prefetch observations for workspace=%s: %s",
+                self.workspace_name,
+                e,
+                exc_info=True,
+            )
             return None

As per coding guidelines: "Use proper logging with context instead of print statements."

Comment on lines +99 to +105
explicit_count: int = len(explicit_repr.explicit) + len(
explicit_repr.deductive
)
derived_count: int = len(derived_repr.explicit) + len(
derived_repr.deductive
)
self._prefetched_conclusion_count: int = explicit_count + derived_count
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.py" | head -20 | xargs rg -l "class Representation" | head -5

Repository: plastic-labs/honcho

Length of output: 45


🏁 Script executed:

rg -n "class Representation" --type=py -A 40

Repository: plastic-labs/honcho

Length of output: 16765


🏁 Script executed:

rg -n "def from_documents" --type=py -B 2 -A 30 | grep -A 30 "class Representation" | head -60

Repository: plastic-labs/honcho

Length of output: 45


🏁 Script executed:

rg -n "def from_documents" --type=py -A 50 src/utils/representation.py

Repository: plastic-labs/honcho

Length of output: 2695


🏁 Script executed:

rg -n "def from_documents" --type=py -A 70 src/utils/representation.py | tail -30

Repository: plastic-labs/honcho

Length of output: 1739


🏁 Script executed:

rg -n "def __len__|def len" --type=py src/utils/representation.py

Repository: plastic-labs/honcho

Length of output: 92


🏁 Script executed:

rg -n "def len" --type=py -A 10 src/utils/representation.py

Repository: plastic-labs/honcho

Length of output: 392


🏁 Script executed:

# Check where explicit_repr and derived_repr are created in core.py to confirm the levels parameter
rg -n "explicit_repr\|derived_repr" --type=py -B 2 -A 2 src/dialectic/core.py | head -50

Repository: plastic-labs/honcho

Length of output: 45


🏁 Script executed:

sed -n '90,130p' src/dialectic/core.py

Repository: plastic-labs/honcho

Length of output: 1634


🏁 Script executed:

# Check the search_representations calls to confirm the levels parameter
sed -n '75,95p' src/dialectic/core.py

Repository: plastic-labs/honcho

Length of output: 761


Correct: _prefetched_conclusion_count misses inductive and contradiction observations.

The count sums only explicit + deductive from each representation, but derived_repr queried with levels=["deductive", "inductive", "contradiction"] contains inductive and contradiction observations that are excluded. This causes the telemetry to report lower observation counts than actually prefetched.

Use the existing Representation.len() method which accounts for all four observation types:

-            explicit_count: int = len(explicit_repr.explicit) + len(
-                explicit_repr.deductive
-            )
-            derived_count: int = len(derived_repr.explicit) + len(
-                derived_repr.deductive
-            )
-            self._prefetched_conclusion_count: int = explicit_count + derived_count
+            self._prefetched_conclusion_count: int = explicit_repr.len() + derived_repr.len()
@claude

claude bot commented Feb 6, 2026


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.


Collaborator

@VVoruganti VVoruganti left a comment


The structure of the PR looks good, but we need to rethink the pre-fetch data. We shouldn't rely on running semantic queries across representations, given lancedb and turbopuffer semantics. The agent should instead identify the relevant parties first and make targeted queries against specific representations.

…honcho's data model -- add unified tests to confirm behaviors
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/test_workspace_chat.py`:
- Around line 915-935: The test_self_observation_header creates a Document using
the ORM field internal_metadata while fixtures use the schema field metadata,
causing inconsistency; update the test to match the model instantiation style by
using the same field the rest of the tests/fixtures expect—if other tests
construct raw models use internal_metadata (as currently used) consistently,
otherwise switch this test to use the DocumentCreate/schema field metadata;
locate the Document construction in test_self_observation_header and replace or
rename the metadata field to match the project convention so all tests use the
same field name (internal_metadata or metadata) consistently.
🧹 Nitpick comments (3)
src/dialectic/workspace.py (1)

55-64: Consider whether prefetch should be resilient to DB errors.

If crud.get_peers() or self.db.execute() throws (e.g., transient connection issue), the exception propagates and could abort the entire workspace chat. Since prefetching is an optimization (the agent can always call list_peers as a fallback tool), wrapping this in a try/except that returns None on failure would provide graceful degradation.

♻️ Proposed resilient prefetch
     async def _prefetch_relevant_observations(self, query: str) -> str | None:
         """Prefetch the list of peers so the agent can skip the list_peers tool call."""
         _ = query
-        stmt = await crud.get_peers(workspace_name=self.workspace_name)
-        result = await self.db.execute(stmt)
-        peers = list(result.scalars().all())
-        if not peers:
+        try:
+            stmt = await crud.get_peers(workspace_name=self.workspace_name)
+            result = await self.db.execute(stmt)
+            peers = list(result.scalars().all())
+        except Exception:
+            logger.warning(
+                "Failed to prefetch workspace peers for workspace=%s",
+                self.workspace_name,
+                exc_info=True,
+            )
+            return None
+        if not peers:
             return None
         peer_list = "\n".join(f"- {p.name}" for p in peers)
         return f"Found {len(peers)} peers in workspace:\n{peer_list}"
src/utils/agent_tools.py (2)

1894-1931: Minor inconsistency in result count reporting.

The peer-scoped _handle_search_memory (line 1207–1238) reports mem.len() (count from the Representation), while this workspace variant reports len(documents) (raw document count) on line 1931. These can differ if documents have unrecognized levels. Consider using mem.len() for consistency.

♻️ Suggested alignment
     mem = Representation.from_documents(documents)
     mem_str = mem.str_with_ids() if ctx.include_observation_ids else str(mem)
-    return f"Found {len(documents)} observations for {observer}->{observed} for query '{tool_input['query']}':\n\n{mem_str}"
+    return f"Found {mem.len()} observations for {observer}->{observed} for query '{tool_input['query']}':\n\n{mem_str}"

2026-2057: Local lock instead of shared registry lock is acceptable for now.

create_workspace_tool_executor creates a local asyncio.Lock() (line 2026) rather than using get_observation_lock() from the shared registry. This works because current workspace handlers are read-only. If write-capable workspace tools are added later, this lock won't serialize across concurrent executors for the same workspace.

Consider adding a brief comment documenting this assumption:

📝 Suggested comment
-    shared_lock = asyncio.Lock()
+    # Local lock (not from registry) because workspace handlers are read-only.
+    # If write-capable workspace tools are added, switch to get_observation_lock().
+    shared_lock = asyncio.Lock()
@dr-frmr
Contributor Author

dr-frmr commented Feb 10, 2026

Good catches @VVoruganti , have totally adjusted the approach for workspace-level chat and added unified tests to confirm desired behaviors

@VVoruganti
Collaborator

Thanks for the PR. I reviewed the latest branch tip (origin/ben/add-workspace-chat) with focus on representation boundaries and workspace-level CloudEvents.

High-level intent

This PR introduces workspace-level chat (POST /v3/workspaces/{workspace_id}/chat) and a WorkspaceDialecticAgent that can reason across peers while preserving the observer/observed representation model. It also refactors shared dialectic logic into BaseDialecticAgent, adds SDK/docs coverage, and adds dedicated tests for workspace chat tooling and endpoint behavior.

What looks good

  • Clear separation between peer chat and workspace chat entry points.
  • Workspace tools require explicit observer + observed for memory/card access, which is the right directional model.
  • DialecticCompletedEvent emission appears wired for workspace chat via shared base agent path (stream + non-stream).
  • Test coverage for route/tool happy paths is solid and the new workspace chat suite passes.

Concerns / likely failure modes

  1. Representation boundary leak in workspace search_memory fallback.
  • In _handle_search_memory_workspace, when no docs exist for a selected pair, it falls back to crud.search_messages scoped only by workspace/session.
  • This can pull content from unrelated peers and contaminate pair-scoped reasoning.
  2. Session scoping bypass in workspace observation context.
  • _handle_get_observation_context_workspace passes session_name=None to _get_observation_context_impl.
  • Even if the request uses session_id, the context lookup can cross sessions.
  3. Potential cross-session adjacency bug in get_observation_context.
  • Context expansion currently keys by seq_in_session with an optional session filter.
  • In workspace mode (no session filter), identical sequence numbers across sessions can cause the wrong adjacent context to be returned.
  4. Prompt-size/cost risk for large workspaces.
  • Workspace prefetch injects the full peer list into the prompt on every query.
  • Large peer counts can inflate tokens and hurt response quality.
  5. Missing explicit tests for workspace chat CloudEvent emission.
  • Emission appears implemented, but there is no direct assertion that emit(DialecticCompletedEvent) fires for the workspace flow.

Suggested fix path

  1. Tighten representation boundaries first.
  • Remove the implicit message fallback from workspace search_memory, or strictly constrain the fallback to the selected representation semantics.
  • Prefer explicit model-driven calls to search_messages over a silent fallback.
  2. Enforce session scoping consistently.
  • Pass ctx.session_name through the workspace observation-context handler.
  • Update the observation-context query to keep adjacency joins within the same session.
  3. Add guardrails for large workspaces.
  • Limit the peer prefetch (or remove it) and rely on the list_peers tool when needed.
  4. Add telemetry regression tests.
  • Patch/assert emit for the workspace non-stream and stream completion paths.
  • Optionally add cancellation/error-path telemetry so dropped streams are still observable.
  5. Add regression tests for leakage scenarios.
  • Test that workspace search_memory does not leak unrelated peer/session content when pair-scoped docs are empty.
  • Test that get_observation_context never mixes sessions with overlapping seq_in_session.
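The telemetry regression test suggested above can be sketched with unittest.mock; the agent and event classes here are stand-ins for the real WorkspaceDialecticAgent and DialecticCompletedEvent, whose constructors and wiring are assumptions:

```python
import asyncio
from unittest.mock import AsyncMock

class DialecticCompletedEvent:
    """Stand-in for the real CloudEvent class."""

class WorkspaceAgentStub:
    """Minimal stand-in: completes a query, then emits telemetry."""

    def __init__(self, emit):
        self.emit = emit

    async def chat(self, query: str) -> str:
        answer = f"answer to {query!r}"
        await self.emit(DialecticCompletedEvent())  # completion telemetry
        return answer

async def check_emit_fires() -> bool:
    emit = AsyncMock()
    agent = WorkspaceAgentStub(emit)
    await agent.chat("what does peer_x know?")
    emit.assert_awaited_once()  # the regression assertion
    return isinstance(emit.await_args.args[0], DialecticCompletedEvent)
```

In the real suite, the same shape applies with emit patched on the workspace chat path, plus a variant that cancels the stream mid-flight to cover the error path.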
Comment on lines +1898 to +1899
observer = tool_input.get("observer", "")
observed = tool_input.get("observed", "")
Collaborator
Add validation that these are real Peers

* fix: scope tool calls to sessions that peer was in

* fix: more tests

* chore: (tests) add peer isolation test to unified test runner

* fix: switch agent tools to reuse existing search functions

* fix: loosen peer perspective restriction for workspace chat

---------

Co-authored-by: Vineeth Voruganti <13438633+VVoruganti@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/utils/search.py (1)

380-411: ⚠️ Potential issue | 🟠 Major

semantic_only should fail fast when semantic search can’t run

When semantic_only=True but embeddings are disabled or workspace_id is missing, full‑text search is skipped and the function silently returns an empty list. That’s a surprising result and can mask integration errors. Consider raising ValidationException in that case.

Suggested fix
     workspace_name: str | None = filters.get("workspace_id") if filters else None
+    if semantic_only and (
+        not settings.EMBED_MESSAGES or not isinstance(workspace_name, str)
+    ):
+        raise ValidationException(
+            "semantic_only requires embeddings enabled and a workspace_id filter."
+        )
     if settings.EMBED_MESSAGES and isinstance(workspace_name, str):

As per coding guidelines: Use explicit error handling with appropriate exception types; Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/search.py` around lines 380 - 411, In the search flow in
src/utils/search.py (the block using semantic_only, settings.EMBED_MESSAGES,
workspace_name from filters.get("workspace_id"), and calling
_semantic_search/_fulltext_search), change behavior so that if semantic_only is
True but semantic search cannot run (either settings.EMBED_MESSAGES is False or
workspace_name is None/not a string) you raise a ValidationException from
src/exceptions.py with a clear message (e.g., explain embeddings disabled or
missing workspace_id) instead of silently returning empty results; keep existing
logic that falls back to full-text only when semantic_only is False. Ensure you
reference semantic_only, settings.EMBED_MESSAGES, and workspace_name in the
conditional and raise the exception before skipping the semantic/fulltext calls.
src/utils/agent_tools.py (1)

1239-1276: ⚠️ Potential issue | 🟠 Major

Fallback message search should be scoped to the observed peer.

When memory is empty, the fallback searches all visible messages for the observer, which can surface unrelated peers and blur the observer/observed boundary. Consider filtering to the observed peer (using the appropriate message peer filter key) or disabling the fallback for workspace contexts.

🛠️ Suggested guard
         filters = {
             "workspace_id": ctx.workspace_name,
             "peer_perspective": observer,  # Use observer for visibility scoping
         }
+        # Limit fallback to the observed peer to avoid cross-peer leakage
+        filters["peer_id"] = observed  # use the filter key that maps to Message.peer_name
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 1239 - 1276, The fallback
message-search currently built around filters and calling search should be
restricted to the observed peer: when constructing filters before calling search
(the filters dict used in the search(...) call), add the appropriate
observed-peer filter (e.g., filters["peer_id"] or filters["observed_peer"])
populated from the context (e.g., ctx.observed, ctx.observed_peer, or similar)
or skip/disable the fallback entirely for workspace-level contexts; update the
logic around ctx.observer and the search/_format_message_snippets usage so the
search only returns messages for the observed peer (or no fallback) to avoid
surfacing unrelated peers.
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In `@src/dialectic/core.py`:
- Around line 115-118: The current broad "except Exception as e:" in the
prefetch block (around ValidationException handling and logger.warning) hides
unexpected failures; replace that blanket catch with explicit exception types
you expect during prefetch (for example TimeoutError, IOError,
requests.RequestException or any domain-specific errors your prefetch uses) and
handle them (log via logger.warning(..., exc_info=True) if desired), and let any
other exceptions bubble up (re-raise) so they aren’t masked; keep the
ValidationException handling as-is.

In `@src/utils/agent_tools.py`:
- Around line 2073-2079: The current _handle_get_observation_context_workspace
calls _get_observation_context_impl with session_name=None which lets
seq_in_session adjacency pull messages across different sessions; instead,
detect when ctx.session is not provided and group target messages by their
session_name, then for each distinct session_name call
_get_observation_context_impl (passing that session_name) to compute adjacency
per session, and finally merge the per-session context results into a single
response; update _handle_get_observation_context_workspace to perform this
per-session grouping and aggregation so seq_in_session never mixes messages from
different sessions.
- Around line 1239-1276: The fallback message-search currently built around
filters and calling search should be restricted to the observed peer: when
constructing filters before calling search (the filters dict used in the
search(...) call), add the appropriate observed-peer filter (e.g.,
filters["peer_id"] or filters["observed_peer"]) populated from the context
(e.g., ctx.observed, ctx.observed_peer, or similar) or skip/disable the fallback
entirely for workspace-level contexts; update the logic around ctx.observer and
the search/_format_message_snippets usage so the search only returns messages
for the observed peer (or no fallback) to avoid surfacing unrelated peers.

In `@src/utils/search.py`:
- Around line 316-318: The return type annotation on the function that declares
parameters context_window and semantic_only is longer than 88 characters; split
the union return annotation across multiple lines (e.g., put each union arm on
its own line) and align the closing -> and ): so the signature fits the 88-char
limit while preserving the existing types list[models.Message] and
list[tuple[list[models.Message], list[models.Message]]]; update only the
function signature formatting (no type changes).
- Around line 380-411: In the search flow in src/utils/search.py (the block
using semantic_only, settings.EMBED_MESSAGES, workspace_name from
filters.get("workspace_id"), and calling _semantic_search/_fulltext_search),
change behavior so that if semantic_only is True but semantic search cannot run
(either settings.EMBED_MESSAGES is False or workspace_name is None/not a string)
you raise a ValidationException from src/exceptions.py with a clear message
(e.g., explain embeddings disabled or missing workspace_id) instead of silently
returning empty results; keep existing logic that falls back to full-text only
when semantic_only is False. Ensure you reference semantic_only,
settings.EMBED_MESSAGES, and workspace_name in the conditional and raise the
exception before skipping the semantic/fulltext calls.

In `@tests/integration/test_message_embeddings.py`:
- Around line 242-253: The local import of cast inside the test should be moved
to the module-level imports for isort consistency: remove the inline "from
typing import cast" and add/ensure "from typing import cast" is present with the
other top-of-file imports so the cast usage in the test (where result is cast to
list[models.Message]) remains unchanged; this keeps imports grouped and
satisfies isort rules used by the project.

In `@tests/unified/test_cases/peer_isolation_test.json`:
- Around line 24-105: Replace the high-entropy secret "strawberry_unicorn_42"
used in the xy_private_session messages and any duplicate occurrences (e.g., the
second occurrence referenced around lines 135-143) with a clearly fake,
low-entropy placeholder like "placeholder_password" or "REDACTED_PASSWORD"; then
update the test assertions and the llm_judge prompt that reference the secret
(the assertion with assertion_type "not_contains" and the llm_judge "prompt") to
use the same placeholder so the test checks for the placeholder string rather
than the original secret. Ensure you update every place the original secret
appears: the message content in session_id "xy_private_session", the
"not_contains" assertion "text" value, and the llm_judge prompt text.
🧹 Nitpick comments (1)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed.


In `@tests/integration/test_message_embeddings.py`:
- Around line 242-253: The local import of cast inside the test should be moved
to the module-level imports for isort consistency: remove the inline "from
typing import cast" and add/ensure "from typing import cast" is present with the
other top-of-file imports so the cast usage in the test (where result is cast to
list[models.Message]) remains unchanged; this keeps imports grouped and
satisfies isort rules used by the project.
tests/integration/test_message_embeddings.py (1)

242-253: Move cast import to module scope for isort consistency.

The local import isn’t necessary here; keep typing imports grouped at the top for consistency with isort and readability.

♻️ Proposed refactor
-from typing import Any
+from typing import Any, cast
...
-    from typing import cast
-
     result = await search(

As per coding guidelines: "Follow isort conventions with absolute imports preferred".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/test_message_embeddings.py` around lines 242 - 253, The
local import of cast inside the test should be moved to the module-level imports
for isort consistency: remove the inline "from typing import cast" and
add/ensure "from typing import cast" is present with the other top-of-file
imports so the cast usage in the test (where result is cast to
list[models.Message]) remains unchanged; this keeps imports grouped and
satisfies isort rules used by the project.
Comment on lines +115 to 118
except ValidationException:
raise
except Exception as e:
logger.warning(f"Failed to prefetch observations: {e}")
Contributor

⚠️ Potential issue | 🟡 Minor

Narrow the blanket Exception catch in prefetch.

Catching Exception here masks unexpected failures; prefer explicit exception types and let unknowns surface.

As per coding guidelines: Use explicit error handling with appropriate exception types.

🧰 Tools
🪛 Ruff (0.15.0)

[warning] 117-117: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dialectic/core.py` around lines 115 - 118, The current broad "except
Exception as e:" in the prefetch block (around ValidationException handling and
logger.warning) hides unexpected failures; replace that blanket catch with
explicit exception types you expect during prefetch (for example TimeoutError,
IOError, requests.RequestException or any domain-specific errors your prefetch
uses) and handle them (log via logger.warning(..., exc_info=True) if desired),
and let any other exceptions bubble up (re-raise) so they aren’t masked; keep
the ValidationException handling as-is.
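A narrowed version of that handler might look like the following sketch; the specific exception set is an assumption about what prefetch can actually raise and should be matched against the real call sites:

```python
import logging

logger = logging.getLogger(__name__)

class ValidationException(Exception):
    """Stand-in for the custom exception in src/exceptions.py."""

def prefetch_observations(fetch):
    try:
        return fetch()
    except ValidationException:
        raise  # caller-facing validation errors still propagate
    except (TimeoutError, OSError) as e:
        # Expected transient failures: log with traceback, degrade gracefully.
        logger.warning("Failed to prefetch observations: %s", e, exc_info=True)
        return None
    # Anything else (bugs, unexpected states) now surfaces instead of
    # being swallowed by a blanket `except Exception`.
```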
Comment on lines +2073 to +2079
async def _handle_get_observation_context_workspace(
ctx: WorkspaceToolContext, tool_input: dict[str, Any]
) -> str:
"""Handle get_observation_context tool for workspace context (no session constraint)."""
return await _get_observation_context_impl(
ctx.db, ctx.workspace_name, None, tool_input
)
Contributor

⚠️ Potential issue | 🟠 Major

Workspace observation context can mix sessions.

With session_name=None, adjacency is computed solely by seq_in_session, which can pull context from other sessions that share sequence numbers. Consider grouping by session_name for the target messages and fetching context per session to avoid cross-session leakage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 2073 - 2079, The current
_handle_get_observation_context_workspace calls _get_observation_context_impl
with session_name=None which lets seq_in_session adjacency pull messages across
different sessions; instead, detect when ctx.session is not provided and group
target messages by their session_name, then for each distinct session_name call
_get_observation_context_impl (passing that session_name) to compute adjacency
per session, and finally merge the per-session context results into a single
response; update _handle_get_observation_context_workspace to perform this
per-session grouping and aggregation so seq_in_session never mixes messages from
different sessions.
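The per-session grouping suggested in that prompt can be sketched as follows; `_get_observation_context_impl` is represented by an injected `impl` callable, and the message shape is an assumption:

```python
import asyncio
from collections import defaultdict

async def observation_context_by_session(db, workspace, target_messages, impl):
    """Group targets by session, compute adjacency per session, merge results.

    `impl` stands in for _get_observation_context_impl(db, workspace,
    session_name, messages); the real signature may differ.
    """
    by_session: dict[str, list[dict]] = defaultdict(list)
    for msg in target_messages:
        by_session[msg["session_name"]].append(msg)

    parts: list[str] = []
    for session_name, msgs in by_session.items():
        # Each call is scoped to one session, so seq_in_session adjacency
        # can no longer pull rows from other sessions.
        parts.append(await impl(db, workspace, session_name, msgs))
    return "\n\n".join(parts)
```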
Comment on lines +316 to +318
context_window: int = 0,
semantic_only: bool = False,
) -> list[models.Message] | list[tuple[list[models.Message], list[models.Message]]]:
Contributor

⚠️ Potential issue | 🟡 Minor

Wrap the return annotation to respect the 88‑char limit

The return type annotation exceeds the line-length guideline; splitting it across lines keeps it compliant and readable.

Suggested fix
-) -> list[models.Message] | list[tuple[list[models.Message], list[models.Message]]]:
+) -> (
+    list[models.Message]
+    | list[tuple[list[models.Message], list[models.Message]]]
+):

As per coding guidelines: Use line length of 88 characters (Black compatible).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
context_window: int = 0,
semantic_only: bool = False,
) -> list[models.Message] | list[tuple[list[models.Message], list[models.Message]]]:
context_window: int = 0,
semantic_only: bool = False,
) -> (
list[models.Message]
| list[tuple[list[models.Message], list[models.Message]]]
):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/search.py` around lines 316 - 318, The return type annotation on
the function that declares parameters context_window and semantic_only is longer
than 88 characters; split the union return annotation across multiple lines
(e.g., put each union arm on its own line) and align the closing -> and ): so
the signature fits the 88-char limit while preserving the existing types
list[models.Message] and list[tuple[list[models.Message],
list[models.Message]]]; update only the function signature formatting (no type
changes).
Comment on lines +332 to +339
context_window: Number of messages before/after each match to include as context.
If 0 (default), returns flat list. If > 0, returns snippets with context.
semantic_only: If True, skip full-text search and only use semantic search.
Useful for performance when only semantic relevance is needed.

Returns:
list of messages that match the search query, ordered by RRF relevance or individual search relevance
If context_window == 0: list of messages ordered by relevance
If context_window > 0: list of (matched_messages, context_messages) tuples
Contributor

⚠️ Potential issue | 🟠 Major

Context snippets drop peer_perspective and can return the wrong shape

Two issues in the context_window path:

  1. peer_perspective_name is recomputed from filters after it was stripped, so snippets ignore peer perspective and may leak messages across peers.
  2. If context_window > 0 but workspace_id is absent, the function returns a flat list despite the docstring promising snippets. It should fail fast.
Suggested fix (docstring + logic)
-        context_window: Number of messages before/after each match to include as context.
-            If 0 (default), returns flat list. If > 0, returns snippets with context.
+        context_window: Number of messages before/after each match to include as context.
+            If 0 (default), returns flat list. If > 0, returns snippets with context.
+            Requires workspace_id to build snippet ranges.
-        workspace_name = filters.get("workspace_id") if filters else None
-        peer_perspective_name = filters.get("peer_perspective") if filters else None
-
-        if workspace_name:
-            return await _build_merged_snippets(
-                db,
-                workspace_name,
-                combined_results,
-                context_window,
-                peer_perspective_name,
-            )
+        workspace_name = filters.get("workspace_id") if filters else None
+        if not workspace_name:
+            raise ValidationException(
+                "context_window requires a workspace_id filter to build snippets."
+            )
+        return await _build_merged_snippets(
+            db,
+            workspace_name,
+            combined_results,
+            context_window,
+            peer_perspective_name,
+        )

As per coding guidelines: Use explicit error handling with appropriate exception types; Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.).

Also applies to: 425-442

Comment on lines +24 to +105
"step_type": "add_messages",
"session_id": "xy_private_session",
"messages": [
{
"peer_id": "peer_x",
"content": "Hey peer_y, I need to tell you something important. My secret password is 'strawberry_unicorn_42' - please remember this for later."
},
{
"peer_id": "agent",
"content": "Got it, I'll remember that."
},
{
"peer_id": "peer_y",
"content": "Thanks for sharing that, peer_x. I've noted your password: strawberry_unicorn_42. I'll keep it safe."
},
{
"peer_id": "peer_x",
"content": "Perfect. Also, I'm planning a surprise party for peer_z next Friday at 7pm at the Blue Whale restaurant downtown. Don't tell peer_z!"
},
{
"peer_id": "peer_y",
"content": "Amazing! I'll be there. The surprise party at Blue Whale will be great. My lips are sealed!"
}
]
},
{
"step_type": "wait",
"target": "queue_empty",
"flush": true
},
{
"step_type": "create_session",
"session_id": "z_separate_session",
"peer_configs": {
"peer_z": {
"observe_me": true,
"observe_others": false
},
"agent": {
"observe_me": false,
"observe_others": true
}
}
},
{
"step_type": "add_messages",
"session_id": "z_separate_session",
"messages": [
{
"peer_id": "peer_z",
"content": "Hi! I'm peer_z. I like going to restaurants and attending parties."
},
{
"peer_id": "agent",
"content": "That's nice! Do you have any upcoming plans?"
}
]
},
{
"step_type": "wait",
"target": "queue_empty",
"flush": true
},
{
"step_type": "query",
"target": "chat",
"session_id": "z_separate_session",
"observer_peer_id": "agent",
"observed_peer_id": "peer_z",
"input": "What is peer_x's secret password?",
"reasoning_level": "low",
"assertions": [
{
"assertion_type": "not_contains",
"text": "strawberry_unicorn_42",
"case_sensitive": false
},
{
"assertion_type": "llm_judge",
"prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'strawberry_unicorn_42'. Return true if the agent correctly says they don't know or the info isn't available.",
"pass_if": true
}
Contributor

⚠️ Potential issue | 🟠 Major

Avoid secret-like test data to prevent Gitleaks/CI failures.

The “secret password” value appears high-entropy and is already flagged by Gitleaks. Replace it with a clearly fake, low-entropy placeholder and update all assertions/prompts to match.

🔧 Suggested fix
-          "content": "Hey peer_y, I need to tell you something important. My secret password is 'strawberry_unicorn_42' - please remember this for later."
+          "content": "Hey peer_y, I need to tell you something important. My secret password is 'example_password' - please remember this for later."
@@
-          "content": "Thanks for sharing that, peer_x. I've noted your password: strawberry_unicorn_42. I'll keep it safe."
+          "content": "Thanks for sharing that, peer_x. I've noted your password: example_password. I'll keep it safe."
@@
-          "text": "strawberry_unicorn_42",
+          "text": "example_password",
@@
-          "prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'strawberry_unicorn_42'. Return true if the agent correctly says they don't know or the info isn't available.",
+          "prompt": "Does the response indicate that the agent DOES NOT KNOW the password, or that the information is not available? The response should NOT reveal the password 'example_password'. Return true if the agent correctly says they don't know or the info isn't available.",
@@
-          "prompt": "The workspace chat should be able to answer this question since it can access information from all sessions. Does the response mention the password 'strawberry_unicorn_42' or indicate that peer_x shared a password with peer_y? Return true if the workspace-level query CAN see this information.",
+          "prompt": "The workspace chat should be able to answer this question since it can access information from all sessions. Does the response mention the password 'example_password' or indicate that peer_x shared a password with peer_y? Return true if the workspace-level query CAN see this information.",

Also applies to: 135-143

🧰 Tools
🪛 Gitleaks (8.30.0)

[high] 37-37: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unified/test_cases/peer_isolation_test.json` around lines 24 - 105,
Replace the high-entropy secret "strawberry_unicorn_42" used in the
xy_private_session messages and any duplicate occurrences (e.g., the second
occurrence referenced around lines 135-143) with a clearly fake, low-entropy
placeholder like "placeholder_password" or "REDACTED_PASSWORD"; then update the
test assertions and the llm_judge prompt that reference the secret (the
assertion with assertion_type "not_contains" and the llm_judge "prompt") to use
the same placeholder so the test checks for the placeholder string rather than
the original secret. Ensure you update every place the original secret appears:
the message content in session_id "xy_private_session", the "not_contains"
assertion "text" value, and the llm_judge prompt text.
…space chat (#397)

Replace the enumerate-all-peers approach with get_workspace_stats and
get_active_peers tools. The agent now discovers relevant peers through
stats, message search, and targeted observations instead of loading
every peer into the prompt.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
src/utils/agent_tools.py (1)

2124-2130: ⚠️ Potential issue | 🟠 Major

Respect session scoping in workspace observation context.
session_name=None allows adjacency across sessions (same seq_in_session), which can mix unrelated context even when a session_id was provided.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 2124 - 2130, The handler
_handle_get_observation_context_workspace currently hardcodes session_name=None
which permits adjacency across sessions; instead read the session identifier
from the incoming tool_input (e.g., tool_input.get("session_id")) and pass that
value to _get_observation_context_impl so session scoping is respected; update
the call in _handle_get_observation_context_workspace to forward the extracted
session_id (or None if absent) when invoking _get_observation_context_impl.
tests/test_workspace_chat.py (1)

203-217: ⚠️ Potential issue | 🟡 Minor

Assert session creation before using session_id.
The session creation call isn’t validated, so the test could pass even if the session wasn’t created.

🛠️ Suggested fix
-        client.post(
+        session_response = client.post(
             f"/v3/workspaces/{test_workspace.name}/sessions",
             json={"name": session_id},
         )
+        assert session_response.status_code == 200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_workspace_chat.py` around lines 203 - 217, The test is calling
client.post to create a session but never asserts the response, so add an
assertion after the client.post(...) call in test_workspace_chat_with_session_id
to verify the session was actually created (e.g., assert response.status_code ==
201 or assert response.ok) and optionally parse and check the returned session
id/name from the response JSON to match session_id so subsequent use of
session_id is valid; refer to the existing client.post call and the session_id
variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/crud/workspace.py`:
- Around line 528-546: The function get_active_peers currently sets limit =
min(limit, 50) but doesn't guard against non‑positive values; update the logic
in get_active_peers to handle limit <= 0 by returning an empty list immediately
or clamp the value into a valid range (e.g., limit = max(1, min(limit, 50)))
before using it in any SQL LIMIT clauses; modify the code right after the
existing limit computation so all subsequent DB queries use the sanitized limit
variable.
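The clamping variant of that suggestion is a one-liner; an early empty return for limit <= 0 is the alternative if a zero limit should mean "no results":

```python
def sanitize_limit(limit: int, cap: int = 50) -> int:
    # Clamp into [1, cap] so SQL LIMIT never sees 0 or a negative value.
    return max(1, min(limit, cap))
```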

In `@src/utils/agent_tools.py`:
- Around line 1883-1912: The generic except Exception in
_execute_with_error_handling should be narrowed: import and catch specific
DB/validation exceptions (e.g., SQLAlchemyError or IntegrityError from
sqlalchemy.exc, and any domain ValidationError used by your code) and handle
them by logging with exc_info=True and awaiting db.rollback(); for truly
unexpected errors, remove the broad except Exception handler (or change it to
log the exception and re-raise instead of swallowing it) so that dispatch errors
not explicitly expected will propagate. Ensure you still log the error context
using logger.error and reference db.rollback and dispatch in the revised
handlers.
- Around line 2024-2052: The fallback message search currently only scopes by
peer_perspective=observer which can return messages from other peers and
misattribute observations; update the filters used in the search call (the
filters dict in the block using observer/observed) to include the observed peer
(e.g., add a field like "observed_peer": observed or the appropriate observed
identifier your search API expects) so results are constrained to the
observer->observed pair, or remove/disable this implicit fallback and instead
require an explicit call to search_messages; adjust the search(...) invocation
and the handling that builds message_output (the snippets cast and
_format_message_snippets(...) usage) accordingly.

---

Duplicate comments:
In `@src/utils/agent_tools.py`:
- Around line 2124-2130: The handler _handle_get_observation_context_workspace
currently hardcodes session_name=None which permits adjacency across sessions;
instead read the session identifier from the incoming tool_input (e.g.,
tool_input.get("session_id")) and pass that value to
_get_observation_context_impl so session scoping is respected; update the call
in _handle_get_observation_context_workspace to forward the extracted session_id
(or None if absent) when invoking _get_observation_context_impl.

In `@tests/test_workspace_chat.py`:
- Around line 203-217: The test is calling client.post to create a session but
never asserts the response, so add an assertion after the client.post(...) call
in test_workspace_chat_with_session_id to verify the session was actually
created (e.g., assert response.status_code == 201 or assert response.ok) and
optionally parse and check the returned session id/name from the response JSON
to match session_id so subsequent use of session_id is valid; refer to the
existing client.post call and the session_id variable.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f463d27 and f00d5d2.

📒 Files selected for processing (6)
  • src/crud/__init__.py
  • src/crud/workspace.py
  • src/dialectic/prompts.py
  • src/dialectic/workspace.py
  • src/utils/agent_tools.py
  • tests/test_workspace_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/dialectic/prompts.py
Comment on lines +1883 to +1912
async def _execute_with_error_handling(
    tool_name: str,
    tool_input: dict[str, Any],
    dispatch: Callable[[], Awaitable[str]],
    db: AsyncSession,
    log_prefix: str = "tool",
) -> str:
    """Execute a tool dispatch function with unified logging and error handling.

    Handles ValueError/KeyError (returned to LLM) and unexpected exceptions
    (logged with traceback, db rolled back, returned to LLM).
    """
    logger.info(f"[{log_prefix} call] {tool_name} {tool_input}")
    try:
        result = await dispatch()
        logger.info(f"[{log_prefix} result] {tool_name} {result}")
        return result
    except ValueError as e:
        error_msg = f"Tool {tool_name} failed with invalid input: {e}"
        logger.warning(error_msg)
        return error_msg
    except KeyError as e:
        error_msg = f"Tool {tool_name} missing required parameter: {e}"
        logger.warning(error_msg)
        return error_msg
    except Exception as e:
        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
        logger.error(error_msg, exc_info=True)
        await db.rollback()
        return error_msg
⚠️ Potential issue | 🟡 Minor

Narrow the catch‑all Exception to explicit exception types.
This currently violates the “explicit error handling” guideline and can mask unexpected failures. Consider catching expected DB/validation errors and letting truly unexpected exceptions bubble (or re‑raise after logging).

🛠️ Suggested fix
+from sqlalchemy.exc import SQLAlchemyError
+from src.exceptions import ValidationException
 ...
-    except Exception as e:
-        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
-        logger.error(error_msg, exc_info=True)
-        await db.rollback()
-        return error_msg
+    except (SQLAlchemyError, ValidationException) as e:
+        error_msg = f"Tool {tool_name} failed unexpectedly: {type(e).__name__}: {e}"
+        logger.error(error_msg, exc_info=True)
+        await db.rollback()
+        return error_msg
+    except Exception:
+        logger.error(
+            f"Tool {tool_name} failed with an unexpected exception",
+            exc_info=True,
+        )
+        await db.rollback()
+        raise

As per coding guidelines: "Use explicit error handling with appropriate exception types".

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 1899-1899: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 1883 - 1912, The generic except
Exception in _execute_with_error_handling should be narrowed: import and catch
specific DB/validation exceptions (e.g., SQLAlchemyError or IntegrityError from
sqlalchemy.exc, and any domain ValidationError used by your code) and handle
them by logging with exc_info=True and awaiting db.rollback(); for truly
unexpected errors, remove the broad except Exception handler (or change it to
log the exception and re-raise instead of swallowing it) so that dispatch errors
not explicitly expected will propagate. Ensure you still log the error context
using logger.error and reference db.rollback and dispatch in the revised
handlers.
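
For illustration, the layered handler the prompt describes can be sketched as below. DatabaseError and ValidationException are stand-ins for sqlalchemy.exc.SQLAlchemyError and the project's own validation error, and FakeDB stands in for an AsyncSession; only the layering of the except clauses is the point.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


# Stand-ins for sqlalchemy.exc.SQLAlchemyError and the project's domain error.
class DatabaseError(Exception): ...
class ValidationException(Exception): ...


async def execute_tool(tool_name, dispatch, db):
    """Run a tool dispatch with layered, explicit error handling."""
    try:
        return await dispatch()
    except (ValueError, KeyError) as e:
        # Bad input from the LLM: return the message so the model can retry.
        msg = f"Tool {tool_name} failed with invalid input: {e}"
        logger.warning(msg)
        return msg
    except (DatabaseError, ValidationException) as e:
        # Expected infra/domain failures: log with traceback, roll back, report.
        logger.error("Tool %s failed", tool_name, exc_info=True)
        await db.rollback()
        return f"Tool {tool_name} failed: {type(e).__name__}: {e}"
    except Exception:
        # Truly unexpected: clean up, then let the error propagate.
        await db.rollback()
        raise


class FakeDB:
    """Minimal stand-in recording whether rollback was awaited."""

    def __init__(self) -> None:
        self.rolled_back = False

    async def rollback(self) -> None:
        self.rolled_back = True
```

With this shape, a ValueError comes back to the LLM as text, while a RuntimeError rolls the session back and still propagates to the caller.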
Comment on lines +2024 to +2052
    if not documents:
        # Fallback to message search
        from typing import cast

        from src.utils.search import search

        filters = {
            "workspace_id": ctx.workspace_name,
            "peer_perspective": observer,  # Use observer for visibility scoping
        }
        if ctx.session_name:
            filters["session_id"] = ctx.session_name

        result = await search(
            ctx.db,
            tool_input["query"],
            filters=filters,
            limit=min(tool_input.get("top_k", 20), 20),
            context_window=2,
            semantic_only=False,
        )
        # context_window > 0 guarantees tuple return type
        snippets = cast(list[tuple[list[models.Message], list[models.Message]]], result)
        if snippets:
            message_output = _format_message_snippets(
                snippets, f"for query '{tool_input['query']}'"
            )
            return f"No observations yet for {observer}->{observed}. Message search results:\n\n{message_output}"
        return f"No observations found for {observer}->{observed} for query '{tool_input['query']}', and no messages found."
⚠️ Potential issue | 🟠 Major

Workspace search_memory fallback can mix unrelated peers.
When no observations exist for the specified pair, the fallback message search is only scoped by peer_perspective=observer, so results may include content from other peers and blur pair‑specific attribution. Consider removing the implicit fallback and instructing the model to use search_messages explicitly (or add an observed‑peer filter if supported).

🛠️ Safer fallback behavior
-    if not documents:
-        # Fallback to message search
-        ...
-        return f"No observations found for {observer}->{observed} for query '{tool_input['query']}', and no messages found."
+    if not documents:
+        return (
+            f"No observations found for {observer}->{observed} for query "
+            f"'{tool_input['query']}'. Use search_messages or grep_messages to "
+            "discover relevant peers, then query their representations explicitly."
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 2024 - 2052, The fallback message
search currently only scopes by peer_perspective=observer which can return
messages from other peers and misattribute observations; update the filters used
in the search call (the filters dict in the block using observer/observed) to
include the observed peer (e.g., add a field like "observed_peer": observed or
the appropriate observed identifier your search API expects) so results are
constrained to the observer->observed pair, or remove/disable this implicit
fallback and instead require an explicit call to search_messages; adjust the
search(...) invocation and the handling that builds message_output (the snippets
cast and _format_message_snippets(...) usage) accordingly.
@coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (4)
src/crud/workspace.py (1)

606-638: ⚠️ Potential issue | 🟡 Minor

Guard against non‑positive limits before applying SQL LIMIT.

min(limit, 50) still allows 0/negative values, which can produce invalid or unbounded behavior depending on the backend.

🛠️ Suggested fix
-    limit = min(limit, 50)
+    if limit <= 0:
+        return []
+    limit = min(limit, 50)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crud/workspace.py` around lines 606 - 638, limit is constrained with
min(limit, 50) but can still be 0 or negative which may break the SQL LIMIT;
update the guard to enforce a positive minimum (e.g., set limit = max(1,
min(limit, 50))) before building the query so the subsequent
subquery/select/stmt (which reference models.Message and models.Peer) always
receives a valid positive limit; apply this change immediately before the
subquery/stmt construction that ends with stmt = stmt.limit(limit).
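
The two proposed guards (early return vs. clamping) differ only at the boundary; the clamping variant the prompt sketches can be factored as a small helper:

```python
def clamp_limit(limit: int, cap: int = 50) -> int:
    """Clamp a caller-supplied limit to [1, cap] before it reaches SQL LIMIT."""
    return max(1, min(limit, cap))
```

So clamp_limit(0) and clamp_limit(-5) both yield 1, clamp_limit(20) passes through, and clamp_limit(999) is capped at 50.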
src/utils/agent_tools.py (3)

1904-1934: Broad except Exception in unified error handler.

Already flagged in a previous review — the catch-all except Exception returns errors to the LLM and rolls back the DB, which is pragmatically sound for a tool executor. However, it still deviates from the coding guideline to "use explicit error handling with appropriate exception types." Consider at minimum re-raising unexpected exceptions after logging/rollback if they aren't actionable by the LLM.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 1904 - 1934, In
_execute_with_error_handling, avoid the broad "except Exception" that swallows
unexpected errors: catch only known actionable exceptions (e.g., ValueError,
KeyError already present) and for other exceptions log with exc_info, await
db.rollback(), then re-raise the exception so callers can handle or crash
appropriately; ensure the exception path still logs the same error message and
calls await db.rollback() before re-raising to preserve current cleanup
behavior.

2145-2151: Workspace observation context still risks cross-session message mixing.

Passing session_name=None to _get_observation_context_impl means the adjacency expansion (±1 on seq_in_session) has no session boundary. If two different sessions share the same seq_in_session values, context messages from unrelated sessions can be spliced in.

Consider grouping target messages by their session_name and computing adjacency per-session, or adding session_name as an additional join predicate in the context query.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 2145 - 2151, The workspace handler
_handle_get_observation_context_workspace currently passes session_name=None
into _get_observation_context_impl which lets the adjacency expansion on
seq_in_session cross session boundaries and mix messages from different
sessions; update the implementation so adjacency is computed per-session (either
by grouping target messages by session_name and applying ±1 seq_in_session
within each group, or by adding session_name as an additional join/predicate in
the context query inside _get_observation_context_impl) so that seq_in_session
adjacency never links messages from different session_name values; target
symbols: _handle_get_observation_context_workspace,
_get_observation_context_impl, seq_in_session, session_name.
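
The per-session grouping option can be illustrated in plain Python (the real fix would live in the SQL query; the (session_name, seq_in_session, content) tuples here are hypothetical):

```python
from collections import defaultdict


def expand_context(targets, all_messages, window=1):
    """Expand each target by ±window on seq_in_session without ever
    crossing a session_name boundary."""
    # Index messages per session so adjacency lookups stay session-local.
    by_session: dict[str, dict[int, str]] = defaultdict(dict)
    for session, seq, content in all_messages:
        by_session[session][seq] = content

    out = []
    for session, seq, _ in targets:
        neighbors = by_session[session]
        for s in range(seq - window, seq + window + 1):
            if s in neighbors:
                out.append((session, s, neighbors[s]))
    return out
```

Two sessions sharing the same seq_in_session values no longer bleed into each other: expanding a target in session "a" can only ever return messages from session "a".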

2045-2073: Workspace search_memory fallback can surface unrelated peer content.

The fallback message search at lines 2051–2065 scopes only by peer_perspective=observer, without constraining results to messages involving the observed peer. This can blend in conversations from other peers the observer participated in, blurring pair-specific attribution. Already flagged in a prior review.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 2045 - 2073, The message-search
fallback currently filters only by "peer_perspective": observer and can return
unrelated conversations; update the filters passed to search to also constrain
results to messages that involve the observed peer (use the observed identifier
from this scope) — e.g., add a filter key that requires the observed peer ID or
participant list to include observed (so that search only returns messages
between observer and observed), and keep the existing session_id, workspace_id,
and other params; ensure _format_message_snippets is still used on the resulting
snippets.
🧹 Nitpick comments (3)
tests/bench/coverage.py (1)

413-428: Consider updating the default model to a more recent snapshot.

claude-sonnet-4-20250514 (Claude 4 Sonnet) is a valid model, but newer generations like claude-sonnet-4-5-20250929 and claude-sonnet-4-6 are now available. Since this is the judge default for all benchmark evaluations, a stale model can silently produce lower-quality or differently-calibrated scoring results. The --model CLI flag allows per-run overrides, but updating the default prevents accidental use of an older model in CI.

-    model: str = "claude-sonnet-4-20250514",
+    model: str = "claude-sonnet-4-6",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/bench/coverage.py` around lines 413 - 428, The default model string in
the Coverage benchmark constructor (__init__) is outdated: replace the current
default "claude-sonnet-4-20250514" with a newer recommended snapshot (e.g.,
"claude-sonnet-4-6" or "claude-sonnet-4-5-20250929") so the Coverage class
(constructor llm_client, model, provider, use_qa_verification, verbose) uses an
up-to-date judge by default; leave the --model CLI override behavior unchanged.
src/routers/workspaces.py (1)

286-293: Wrap SSE payload construction to keep lines ≤ 88 chars.

This improves readability and keeps the formatter happy.

🧹 Suggested refactor
-            async for chunk in chunks:
-                yield f"data: {json.dumps({'delta': {'content': chunk}, 'done': False})}\n\n"
+            async for chunk in chunks:
+                payload = {"delta": {"content": chunk}, "done": False}
+                yield f"data: {json.dumps(payload)}\n\n"
             yield f"data: {json.dumps({'done': True})}\n\n"

As per coding guidelines, Use line length of 88 characters (Black compatible).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/workspaces.py` around lines 286 - 293, The SSE payload strings in
format_sse_stream exceed the 88-char line limit; refactor by building the
payload dicts into short-named variables (e.g., payload = {'delta': {'content':
chunk}, 'done': False}) and then call json.dumps(payload) and interpolate that
into the final "data: {json}\n\n" string, doing the same for the final done
payload (payload = {'done': True}); update the async for loop and final yield in
format_sse_stream to use these intermediate variables so no line exceeds 88
characters.
src/utils/agent_tools.py (1)

1510-1538: Remove redundant inner imports; simplify hasattr guard.

Any and cast are already imported at line 6 (from typing import Any, cast). The inner re-imports and alias are unnecessary. The hasattr(ctx, "observer") check is always True for ToolContext — a simple truthiness check suffices.

♻️ Suggested cleanup
-    from typing import Any as typing_Any
-    from typing import cast
-
     from src.utils.search import search
 
     query = tool_input.get("query", "")
@@ @@
     # Build filters for search
-    filters: dict[str, typing_Any] = {"workspace_id": ctx.workspace_name}
+    filters: dict[str, Any] = {"workspace_id": ctx.workspace_name}
 
     # Only add peer_perspective if observer exists (peer-scoped contexts)
-    if hasattr(ctx, "observer") and ctx.observer:
+    if ctx.observer:
         filters["peer_perspective"] = ctx.observer
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/agent_tools.py` around lines 1510 - 1538, Remove the redundant
inner imports and simplify the observer check in the search_messages_temporal
handling: delete the inner "from typing import Any as typing_Any" and "from
typing import cast" lines (use the module-level Any and cast already imported),
and replace the guard "if hasattr(ctx, 'observer') and ctx.observer:" with a
simple truthiness check "if ctx.observer:" (or "if getattr(ctx, 'observer',
None):" if you prefer explicitness); ensure you keep building "filters:
dict[str, Any] = {'workspace_id': ctx.workspace_name}" and adding
"filters['peer_perspective'] = ctx.observer" only when the observer is truthy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/workspaces.py`:
- Around line 269-310: The handler holds the injected AsyncSession "db" open for
the entire SSE stream which can pin a connection; perform a short-lived precheck
using a tracked DB session to call crud.get_workspace (or move existence check
into workspace_chat_stream), close that tracked session, and then start
StreamingResponse without passing the long-lived "db" into workspace_chat_stream
so the DB connection is released before LLM/streaming work begins; update
imports/dependencies to use the short-lived tracked DB provider (or adjust
workspace_chat_stream to open its own session) and remove reliance on the
injected "db" inside chat/streaming path.

In `@src/utils/agent_tools.py`:
- Line 2036: The assignment to top_k can raise TypeError if the LLM returns a
non-integer; change the code that sets top_k to use the existing _safe_int guard
before applying min. Specifically, wrap tool_input.get("top_k", 20) with
_safe_int (i.e., call _safe_int(tool_input.get("top_k"), 20)) and then apply
min(..., 40); reference the top_k assignment and the helper _safe_int so you
ensure non-numeric values (like "Infinity") are converted to the fallback
integer before calling min.
- Line 2102: The min() call for limit can raise on non-integer LLM input; wrap
the raw value with the existing _safe_int guard like you did for top_k in
_handle_search_memory_workspace: replace direct use of tool_input.get("limit",
20) with _safe_int(tool_input.get("limit", 20)) before applying min(), so
compute limit = min(_safe_int(tool_input.get("limit", 20)), 50) (update the code
that defines limit in the same block where tool_input is used).

In `@tests/bench/coverage.py`:
- Around line 805-813: The code currently truncates fractional partial-answer
credit by returning int(answerable); instead return the accumulated answerable
as a float so partial (0.5) credits are preserved — update the return at the end
of the block that computes answerable (the variable named answerable in
tests/bench/coverage.py) to return answerable (not int(answerable)) along with
len(questions), and adjust any surrounding type hints or callers if they expect
an int.

---

Duplicate comments:
In `@src/crud/workspace.py`:
- Around line 606-638: limit is constrained with min(limit, 50) but can still be
0 or negative which may break the SQL LIMIT; update the guard to enforce a
positive minimum (e.g., set limit = max(1, min(limit, 50))) before building the
query so the subsequent subquery/select/stmt (which reference models.Message and
models.Peer) always receives a valid positive limit; apply this change
immediately before the subquery/stmt construction that ends with stmt =
stmt.limit(limit).

In `@src/utils/agent_tools.py`:
- Around line 1904-1934: In _execute_with_error_handling, avoid the broad
"except Exception" that swallows unexpected errors: catch only known actionable
exceptions (e.g., ValueError, KeyError already present) and for other exceptions
log with exc_info, await db.rollback(), then re-raise the exception so callers
can handle or crash appropriately; ensure the exception path still logs the same
error message and calls await db.rollback() before re-raising to preserve
current cleanup behavior.
- Around line 2145-2151: The workspace handler
_handle_get_observation_context_workspace currently passes session_name=None
into _get_observation_context_impl which lets the adjacency expansion on
seq_in_session cross session boundaries and mix messages from different
sessions; update the implementation so adjacency is computed per-session (either
by grouping target messages by session_name and applying ±1 seq_in_session
within each group, or by adding session_name as an additional join/predicate in
the context query inside _get_observation_context_impl) so that seq_in_session
adjacency never links messages from different session_name values; target
symbols: _handle_get_observation_context_workspace,
_get_observation_context_impl, seq_in_session, session_name.
- Around line 2045-2073: The message-search fallback currently filters only by
"peer_perspective": observer and can return unrelated conversations; update the
filters passed to search to also constrain results to messages that involve the
observed peer (use the observed identifier from this scope) — e.g., add a filter
key that requires the observed peer ID or participant list to include observed
(so that search only returns messages between observer and observed), and keep
the existing session_id, workspace_id, and other params; ensure
_format_message_snippets is still used on the resulting snippets.

---

Nitpick comments:
In `@src/routers/workspaces.py`:
- Around line 286-293: The SSE payload strings in format_sse_stream exceed the
88-char line limit; refactor by building the payload dicts into short-named
variables (e.g., payload = {'delta': {'content': chunk}, 'done': False}) and
then call json.dumps(payload) and interpolate that into the final "data:
{json}\n\n" string, doing the same for the final done payload (payload =
{'done': True}); update the async for loop and final yield in format_sse_stream
to use these intermediate variables so no line exceeds 88 characters.

In `@src/utils/agent_tools.py`:
- Around line 1510-1538: Remove the redundant inner imports and simplify the
observer check in the search_messages_temporal handling: delete the inner "from
typing import Any as typing_Any" and "from typing import cast" lines (use the
module-level Any and cast already imported), and replace the guard "if
hasattr(ctx, 'observer') and ctx.observer:" with a simple truthiness check "if
ctx.observer:" (or "if getattr(ctx, 'observer', None):" if you prefer
explicitness); ensure you keep building "filters: dict[str, Any] =
{'workspace_id': ctx.workspace_name}" and adding "filters['peer_perspective'] =
ctx.observer" only when the observer is truthy.

In `@tests/bench/coverage.py`:
- Around line 413-428: The default model string in the Coverage benchmark
constructor (__init__) is outdated: replace the current default
"claude-sonnet-4-20250514" with a newer recommended snapshot (e.g.,
"claude-sonnet-4-6" or "claude-sonnet-4-5-20250929") so the Coverage class
(constructor llm_client, model, provider, use_qa_verification, verbose) uses an
up-to-date judge by default; leave the --model CLI override behavior unchanged.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between f00d5d2 and 8921889.

📒 Files selected for processing (14)
  • docs/docs.json
  • docs/v3/documentation/reference/sdk.mdx
  • sdks/python/src/honcho/aio.py
  • src/crud/__init__.py
  • src/crud/message.py
  • src/crud/workspace.py
  • src/dialectic/chat.py
  • src/routers/workspaces.py
  • src/schemas.py
  • src/utils/agent_tools.py
  • tests/bench/coverage.py
  • tests/conftest.py
  • tests/unified/runner.py
  • tests/utils/test_agent_tools.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/conftest.py
  • docs/docs.json
  • src/schemas.py
  • src/crud/__init__.py
  • docs/v3/documentation/reference/sdk.mdx
Comment on lines +269 to +310
async def chat(
    workspace_id: str = Path(...),
    options: schemas.WorkspaceChatOptions = Body(...),
    db: AsyncSession = db,
):
    """
    Query the workspace's collective knowledge using natural language.

    Performs agentic search and reasoning across ALL peers and observations
    in the workspace to synthesize a comprehensive answer. Useful for
    cross-peer analysis, discovering common themes, and workspace-wide queries.
    """
    # Verify workspace exists before expensive agent pipeline
    await crud.get_workspace(db, workspace_name=workspace_id)

    if options.stream:

        async def format_sse_stream(
            chunks: AsyncIterator[str],
        ) -> AsyncIterator[str]:
            """Format chunks as SSE events."""
            async for chunk in chunks:
                yield f"data: {json.dumps({'delta': {'content': chunk}, 'done': False})}\n\n"
            yield f"data: {json.dumps({'done': True})}\n\n"

        if settings.METRICS.ENABLED:
            prometheus_metrics.record_dialectic_call(
                workspace_name=workspace_id,
                reasoning_level=options.reasoning_level,
            )

        return StreamingResponse(
            format_sse_stream(
                workspace_chat_stream(
                    workspace_name=workspace_id,
                    session_name=options.session_id,
                    query=options.query,
                    reasoning_level=options.reasoning_level,
                )
            ),
            media_type="text/event-stream",
        )

⚠️ Potential issue | 🟠 Major

Avoid holding the request DB session during streaming/LLM work.

The injected db session stays open for the entire SSE stream while workspace_chat[_stream] runs, which can pin a connection for long-lived LLM responses. Use a short-lived tracked_db precheck (or move the workspace existence check into workspace_chat[_stream]) and drop the injected session here so the connection is released before streaming starts.

🛠️ Suggested fix (short-lived precheck)
-async def chat(
-    workspace_id: str = Path(...),
-    options: schemas.WorkspaceChatOptions = Body(...),
-    db: AsyncSession = db,
-):
+async def chat(
+    workspace_id: str = Path(...),
+    options: schemas.WorkspaceChatOptions = Body(...),
+):
@@
-    await crud.get_workspace(db, workspace_name=workspace_id)
+    async with tracked_db("routers.workspaces.chat_precheck") as check_db:
+        await crud.get_workspace(check_db, workspace_name=workspace_id)
🔧 Import update
-from src.dependencies import db
+from src.dependencies import db, tracked_db

As per coding guidelines, Never hold a DB session during external calls (LLM, embedding, HTTP). Compute external results first and pass them as parameters; Use tracked_db for short-lived, DB-only operations; pass a shared session when multiple DB-only calls can reuse one connection.

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 271-271: Do not perform function call Body in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/workspaces.py` around lines 269 - 310, The handler holds the
injected AsyncSession "db" open for the entire SSE stream which can pin a
connection; perform a short-lived precheck using a tracked DB session to call
crud.get_workspace (or move existence check into workspace_chat_stream), close
that tracked session, and then start StreamingResponse without passing the
long-lived "db" into workspace_chat_stream so the DB connection is released
before LLM/streaming work begins; update imports/dependencies to use the
short-lived tracked DB provider (or adjust workspace_chat_stream to open its own
session) and remove reliance on the injected "db" inside chat/streaming path.
Comment on lines 805 to 813
        answerable: float = 0
        for item in cast(list[dict[str, Any]], result.get("results", [])):
            status = cast(str, item.get("answerable", "no"))
            if status == "yes":
                answerable += 1
            elif status == "partial":
                answerable += 0.5

        return int(answerable), len(questions)

⚠️ Potential issue | 🟡 Minor

int() truncation silently discards partial-answer credit.

answerable accumulates +0.5 for each "partial" answer, but int(answerable) floors the result — e.g., a single partial response yields int(0.5) = 0, making it indistinguishable from a fully unanswerable question. This causes report.qa_coverage to under-count.

🐛 Proposed fix
-        return int(answerable), len(questions)
+        return round(answerable), len(questions)
📝 Committable suggestion


Suggested change
         answerable: float = 0
         for item in cast(list[dict[str, Any]], result.get("results", [])):
             status = cast(str, item.get("answerable", "no"))
             if status == "yes":
                 answerable += 1
             elif status == "partial":
                 answerable += 0.5
-        return int(answerable), len(questions)
+        return round(answerable), len(questions)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/bench/coverage.py` around lines 805 - 813, The code currently truncates
fractional partial-answer credit by returning int(answerable); instead return
the accumulated answerable as a float so partial (0.5) credits are preserved —
update the return at the end of the block that computes answerable (the variable
named answerable in tests/bench/coverage.py) to return answerable (not
int(answerable)) along with len(questions), and adjust any surrounding type
hints or callers if they expect an int.
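
Note that swapping int() for round() is not a complete fix either: Python 3's round() uses half-to-even (banker's) rounding, so a lone partial answer still scores zero. Returning the float, as the prompt recommends, is the only option that preserves partial credit exactly:

```python
assert int(0.5) == 0    # truncation: a lone partial answer vanishes
assert round(0.5) == 0  # banker's rounding does not rescue it
assert round(1.5) == 2  # halves round to the nearest even integer
assert round(2.5) == 2


def qa_score(statuses: list[str]) -> float:
    """Accumulate QA credit, keeping partials as fractions."""
    credit = {"yes": 1.0, "partial": 0.5}
    return sum(credit.get(s, 0.0) for s in statuses)
```

With a float return, one "yes", one "partial", and one "no" score 1.5 instead of being floored to 1.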