Skip to content

fix(api_server): streaming breaks when agent makes tool calls#2985

Merged
teknium1 merged 5 commits intomainfrom
hermes/hermes-7d7ac769
Mar 25, 2026
Merged

fix(api_server): streaming breaks when agent makes tool calls#2985
teknium1 merged 5 commits intomainfrom
hermes/hermes-7d7ac769

Conversation

@teknium1
Copy link
Copy Markdown
Contributor

Summary

When the agent makes tool calls during streaming, it fires stream_delta_callback(None) to signal the CLI display to close its response box. The API server's _on_delta callback was forwarding this None directly into the SSE queue, where the SSE writer treats it as end-of-stream and terminates the HTTP response prematurely.

After tool calls complete, the agent streams the final answer through the same callback, but the SSE response was already closed. Open WebUI (and similar frontends) never received the actual answer — they just saw the response "get stuck" during tool calling.

Fix

Filter out None in _on_delta so the SSE stream stays open through tool calls. The SSE loop already detects completion via agent_task.done(), which handles stream termination correctly without needing the None sentinel.

Test plan

  • Added test_stream_survives_tool_call_none_sentinel — simulates mid-stream None signals (tool calls) with text before and after, verifies all content reaches the SSE stream
  • All 115 API server tests pass
  • Non-streaming mode was unaffected (it blocks until agent completes)

Reported by Rohit Paul on X.

…ion events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
@teknium1 teknium1 merged commit b2a6b01 into main Mar 25, 2026
1 of 2 checks passed
InB4DevOps pushed a commit to InB4DevOps/hermes-agent that referenced this pull request Mar 25, 2026
…search#2985)

* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
outsourc-e pushed a commit to outsourc-e/hermes-agent that referenced this pull request Mar 26, 2026
…search#2985)

* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
…search#2985)

* fix(run_agent): ensure _fire_first_delta() is called for tool generation events

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

* fix(run_agent): improve timeout handling for chat completions

Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.

* fix(run_agent): reduce default stream read timeout for chat completions

Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.

* fix(run_agent): enhance streaming error handling and retry logic

Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.

* fix(api_server): streaming breaks when agent makes tool calls

The agent fires stream_delta_callback(None) to signal the CLI display
to close its response box before tool execution begins. The API server's
_on_delta callback was forwarding this None directly into the SSE queue,
where the SSE writer treats it as end-of-stream and terminates the HTTP
response prematurely.

After tool calls complete, the agent streams the final answer through
the same callback, but the SSE response was already closed — so Open
WebUI (and similar frontends) never received the actual answer.

Fix: filter out None in _on_delta so the SSE stream stays open. The SSE
loop already detects completion via agent_task.done(), which handles
stream termination correctly without needing the None sentinel.

Reported by Rohit Paul on X.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant