fix(reasoning): skip duplicate callback for <think>-extracted reasoning during streaming by teknium1 · Pull Request #3116 · NousResearch/hermes-agent

teknium1 · 2026-03-26T01:53:32Z

Local models (Ollama, LM Studio) embed reasoning in <think> tags in delta.content. During streaming, _stream_delta() already displays these blocks. Then _build_assistant_message() extracts them again and fires reasoning_callback, causing duplicate reasoning display.

Track whether reasoning came from structured fields (reasoning_content) vs <think> tag extraction. Only fire the callback for <think>-extracted reasoning when stream_delta_callback is NOT active. Structured reasoning always fires regardless.
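A minimal sketch of the gating described above, not the actual hermes-agent code. The helper names `extract_reasoning` and `maybe_fire_reasoning_callback`, the regex, and the function signatures are assumptions made for illustration. Only `reasoning_content`, `reasoning_callback`, and `stream_delta_callback` come from the PR description.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical pattern for <think>...</think> blocks embedded in content.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_reasoning(
    content: Optional[str], reasoning_content: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (reasoning_text, from_think_tags).

    Structured reasoning takes priority; <think> extraction is the fallback.
    """
    if reasoning_content:
        return reasoning_content, False
    blocks = [b.strip() for b in THINK_RE.findall(content or "")]
    blocks = [b for b in blocks if b]
    if blocks:
        return "\n\n".join(blocks), True
    return None, False


def maybe_fire_reasoning_callback(
    content: Optional[str],
    reasoning_content: Optional[str],
    reasoning_callback: Optional[Callable[[str], None]],
    stream_delta_callback: Optional[Callable[[str], None]],
) -> bool:
    """Fire reasoning_callback unless streaming already displayed the text.

    Returns True if the callback was invoked.
    """
    reasoning, from_think_tags = extract_reasoning(content, reasoning_content)
    if reasoning is None or reasoning_callback is None:
        return False
    # <think> blocks arrive inside the streamed content, so an active
    # stream callback has already shown them; skip the duplicate.
    if from_think_tags and stream_delta_callback is not None:
        return False
    # Structured reasoning (or non-streaming mode) always fires.
    reasoning_callback(reasoning)
    return True
```

With a stream callback set, `<think>`-extracted reasoning is suppressed. Structured `reasoning_content` still reaches the callback because it is not part of the streamed content.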

Salvaged from PR #2076 by @dusterbloom (Fix A only — Fix B was already covered by PR #3013's _current_reasoning_callback() centralization). Closes #2069.

teknium1 merged commit 156b503 into main Mar 26, 2026
3 of 4 checks passed

teknium1 mentioned this pull request Mar 26, 2026

fix(reasoning): prevent duplicate display and [thinking] spam for local models #2076

Closed

teknium1 merged 1 commit into main from hermes/hermes-7d7ac769
