fix(reasoning): skip duplicate callback for <think>-extracted reasoning during streaming #3116

Merged
teknium1 merged 1 commit into main from hermes/hermes-7d7ac769 on Mar 26, 2026

@teknium1
Contributor

Local models (Ollama, LM Studio) embed reasoning in <think> tags in delta.content. During streaming, _stream_delta() already displays these blocks. Then _build_assistant_message() extracts them again and fires reasoning_callback, causing duplicate reasoning display.
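To make the duplication concrete, here is a minimal sketch of tag-based extraction. The helper name `split_think_blocks` and the regex are illustrative assumptions, not code from this repository. If the streaming path has already shown the `<think>` text to the user, running an extraction like this afterwards and passing the result to a reasoning callback would display the same text a second time.

```python
import re
from typing import Tuple

# Hypothetical helper for illustration only; the real extraction logic in
# _build_assistant_message() may differ.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_blocks(content: str) -> Tuple[str, str]:
    """Return (reasoning, visible_text) for content with inline <think> tags."""
    # Collect the text inside every <think>...</think> pair.
    reasoning = "\n".join(block.strip() for block in _THINK_RE.findall(content))
    # Remove the tagged blocks to leave only the user-facing answer.
    visible = _THINK_RE.sub("", content).strip()
    return reasoning, visible
```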

Track whether reasoning came from structured fields (reasoning_content) vs <think> tag extraction. Only fire the callback for <think>-extracted reasoning when stream_delta_callback is NOT active. Structured reasoning always fires regardless.
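The rule above can be sketched as a small gate. This is a hedged illustration, not the merged diff: the function name `maybe_fire_reasoning_callback` and the `from_structured_field` flag are assumptions, and "active" is modeled here simply as the stream callback not being `None`.

```python
from typing import Callable, Optional


def maybe_fire_reasoning_callback(
    reasoning: Optional[str],
    from_structured_field: bool,
    reasoning_callback: Optional[Callable[[str], None]],
    stream_delta_callback: Optional[Callable[[str], None]],
) -> bool:
    """Fire reasoning_callback unless the text was already streamed.

    Returns True if the callback was invoked.
    """
    if not reasoning or reasoning_callback is None:
        return False
    # Reasoning extracted from <think> tags was already displayed by the
    # streaming path, so skip it while a stream callback is active.
    if not from_structured_field and stream_delta_callback is not None:
        return False
    # Structured reasoning (e.g. reasoning_content) always fires.
    reasoning_callback(reasoning)
    return True
```

With this shape, a non-streaming caller still receives `<think>`-extracted reasoning, while a streaming caller only receives reasoning that arrived through a structured field.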

Salvaged from PR #2076 by @dusterbloom (Fix A only — Fix B was already covered by PR #3013's _current_reasoning_callback() centralization). Closes #2069.

@teknium1 teknium1 merged commit 156b503 into main Mar 26, 2026
3 of 4 checks passed
outsourc-e pushed a commit to outsourc-e/hermes-agent that referenced this pull request Mar 26, 2026
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026