fix(api_server): streaming breaks when agent makes tool calls by teknium1 · Pull Request #2985 · NousResearch/hermes-agent

teknium1 · 2026-03-25T16:53:37Z

Summary

When the agent makes tool calls during streaming, it fires stream_delta_callback(None) to signal the CLI display to close its response box. The API server's _on_delta callback was forwarding this None directly into the SSE queue, where the SSE writer treats it as end-of-stream and terminates the HTTP response prematurely.

After tool calls complete, the agent streams the final answer through the same callback, but the SSE response was already closed. Open WebUI (and similar frontends) never received the actual answer — they just saw the response "get stuck" during tool calling.
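To make the failure mode concrete, here is a minimal asyncio sketch of the pre-fix behavior. It assumes an asyncio.Queue sits between the delta callback and the SSE writer, and that the writer loop stops on None. fake_agent and buggy_stream are hypothetical stand-ins, not the actual api_server code.

```python
import asyncio
from typing import Optional

async def fake_agent(on_delta) -> None:
    """Hypothetical agent: text, a tool-call sentinel (None), then the answer."""
    on_delta("Let me check that. ")
    on_delta(None)          # tool call begins: the CLI closes its response box
    await asyncio.sleep(0)  # tool execution would happen here
    on_delta("The answer is 42.")

async def buggy_stream() -> list[str]:
    queue: asyncio.Queue[Optional[str]] = asyncio.Queue()

    # Pre-fix behavior: every delta, including None, is forwarded to the queue.
    def on_delta(delta: Optional[str]) -> None:
        queue.put_nowait(delta)

    received: list[str] = []
    agent_task = asyncio.create_task(fake_agent(on_delta))
    while True:
        delta = await queue.get()
        if delta is None:  # the writer treats None as end-of-stream
            break
        received.append(delta)
    await agent_task  # the agent still produces the answer, but nobody reads it
    return received

print(asyncio.run(buggy_stream()))  # ['Let me check that. ']
```

The final answer is put on the queue after the reader has already stopped, which matches the "stuck" response seen in Open WebUI.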

Fix

Filter out None in _on_delta so the SSE stream stays open through tool calls. The SSE loop already detects completion via agent_task.done(), which handles stream termination correctly without needing the None sentinel.
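A sketch of the fixed flow follows, under the same assumptions as before (a queue between callback and writer, with hypothetical names). _on_delta drops None. The SSE loop polls the queue with a short timeout and ends only when the agent task is done and the queue is empty. The JSON payload is simplified, and the closing `data: [DONE]` frame follows the OpenAI-compatible convention. Both are assumptions here, not taken from the PR.

```python
import asyncio
import json
from typing import Optional

async def fake_agent(on_delta) -> None:
    """Hypothetical agent: text, a tool-call sentinel (None), then the answer."""
    on_delta("Let me check that. ")
    on_delta(None)          # tool call starts; only the CLI cares about this
    await asyncio.sleep(0)  # tool execution would happen here
    on_delta("The answer is 42.")

async def sse_stream() -> list[str]:
    queue: asyncio.Queue[str] = asyncio.Queue()

    def _on_delta(delta: Optional[str]) -> None:
        # The fix: drop None instead of forwarding it as an end-of-stream marker.
        if delta is not None:
            queue.put_nowait(delta)

    agent_task = asyncio.create_task(fake_agent(_on_delta))
    frames: list[str] = []
    while True:
        try:
            delta = await asyncio.wait_for(queue.get(), timeout=0.05)
        except asyncio.TimeoutError:
            # Nothing arrived recently: stop only once the agent is finished
            # and every queued delta has been written.
            if agent_task.done() and queue.empty():
                break
            continue
        frames.append(f"data: {json.dumps({'content': delta})}\n\n")
    await agent_task  # re-raise any exception from the agent
    frames.append("data: [DONE]\n\n")
    return frames

for frame in asyncio.run(sse_stream()):
    print(frame, end="")
```

Checking `queue.empty()` together with `agent_task.done()` avoids dropping a delta the agent enqueued just before finishing.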

Test plan

Added test_stream_survives_tool_call_none_sentinel, which simulates mid-stream None signals (tool calls) with text before and after them and verifies that all content reaches the SSE stream
All 115 API server tests pass
Non-streaming mode is unaffected (it blocks until the agent completes)
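The following is a compact, self-contained check in the spirit of that test, extended to two tool-call rounds. It is not the PR's actual test. make_on_delta and run_stream are hypothetical helpers that mirror the fixed callback and a queue-draining loop.

```python
import asyncio
from typing import Callable, Iterable, Optional

def make_on_delta(queue: "asyncio.Queue[str]") -> Callable[[Optional[str]], None]:
    """Hypothetical mirror of the fixed _on_delta: drop None, forward text."""
    def _on_delta(delta: Optional[str]) -> None:
        if delta is not None:
            queue.put_nowait(delta)
    return _on_delta

async def run_stream(deltas: Iterable[Optional[str]]) -> str:
    queue: asyncio.Queue[str] = asyncio.Queue()
    on_delta = make_on_delta(queue)

    async def agent() -> None:
        for d in deltas:
            on_delta(d)
            await asyncio.sleep(0)  # let the reader interleave, as in real streaming

    task = asyncio.create_task(agent())
    out: list[str] = []
    # Keep reading until the agent has finished and nothing is left queued.
    while not (task.done() and queue.empty()):
        try:
            out.append(await asyncio.wait_for(queue.get(), timeout=0.05))
        except asyncio.TimeoutError:
            pass
    await task
    return "".join(out)

async def test_none_sentinels_do_not_end_stream() -> None:
    # Text before, between, and after two tool-call rounds.
    deltas = ["Checking. ", None, "Found it. ", None, "Final answer."]
    assert await run_stream(deltas) == "Checking. Found it. Final answer."

asyncio.run(test_none_sentinels_do_not_end_stream())
```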

Reported by Rohit Paul on X.


…search#2985)
* fix(run_agent): ensure _fire_first_delta() is called for tool generation events
  Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.
* fix(run_agent): improve timeout handling for chat completions
  Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations.
* fix(run_agent): reduce default stream read timeout for chat completions
  Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations.
* fix(run_agent): enhance streaming error handling and retry logic
  Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions.
* fix(api_server): streaming breaks when agent makes tool calls
  The agent fires stream_delta_callback(None) to signal the CLI display to close its response box before tool execution begins. The API server's _on_delta callback was forwarding this None directly into the SSE queue, where the SSE writer treats it as end-of-stream and terminates the HTTP response prematurely. After tool calls complete, the agent streams the final answer through the same callback, but the SSE response was already closed, so Open WebUI (and similar frontends) never received the actual answer. Fix: filter out None in _on_delta so the SSE stream stays open. The SSE loop already detects completion via agent_task.done(), which handles stream termination correctly without needing the None sentinel. Reported by Rohit Paul on X.
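The timeout and retry commits are described only in prose. The sketch below shows one way such a policy could look.

Several details are assumptions and not taken from the PR:

- the environment variable names;
- the connect, write and retry defaults;
- which exception types count as "transient";
- the should_fallback predicate, which stands in for "only when appropriate".

Only the 60-second read-timeout default comes from the commit message.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class StreamConfig:
    connect_timeout: float
    read_timeout: float
    write_timeout: float
    max_retries: int

def load_stream_config(env: Mapping[str, str] = os.environ) -> StreamConfig:
    # Hypothetical variable names and connect/write/retry defaults; only the
    # 60 s read-timeout default comes from the commit message.
    return StreamConfig(
        connect_timeout=float(env.get("HERMES_STREAM_CONNECT_TIMEOUT", "10")),
        read_timeout=float(env.get("HERMES_STREAM_READ_TIMEOUT", "60")),
        write_timeout=float(env.get("HERMES_STREAM_WRITE_TIMEOUT", "10")),
        max_retries=int(env.get("HERMES_STREAM_MAX_RETRIES", "2")),
    )

# Assumed classification: these errors are retried on a fresh connection.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def call_with_stream_retries(
    open_stream: Callable[[], T],
    non_streaming: Callable[[], T],
    should_fallback: Callable[[Exception], bool],
    cfg: StreamConfig,
) -> T:
    for attempt in range(cfg.max_retries + 1):
        try:
            # Each call is expected to open a new connection.
            return open_stream()
        except TRANSIENT_ERRORS:
            if attempt == cfg.max_retries:
                raise  # retries exhausted: surface the last transient error
        except Exception as exc:
            # Non-transient: fall back to a non-streaming request only if the
            # caller's predicate says it is appropriate; otherwise re-raise.
            if should_fallback(exc):
                return non_streaming()
            raise
    raise AssertionError("max_retries must be >= 0")
```

If the underlying client is httpx-based (as the OpenAI Python SDK is), the three timeout values could be passed to httpx.Timeout. That constructor accepts connect, read, write and pool values.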

teknium1 added 5 commits March 25, 2026 08:35

fix(run_agent): ensure _fire_first_delta() is called for tool generation events

36292c9

Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage.

teknium1 merged commit b2a6b01 into main Mar 25, 2026
1 of 2 checks passed

teknium1 mentioned this pull request Mar 25, 2026

Problem: Open WebUI streaming mode returns empty responses when tools… #2958

Closed


teknium1 merged 5 commits into main from hermes/hermes-7d7ac769

teknium1 commented Mar 25, 2026
