
Posing a technical question to a reasoning LLM may elicit a series of "thinking-like" sentences. For example,

[screenshot of the model's "thinking" output omitted]

Suppose this model's weights are already frozen, fossilized into its 600 billion parameter values.

Facts like "donut-shaped coil" could simply be the highest-ranked transformer continuation of the prompt "A toroidal solenoid is a ...", right?
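To make the "highest-ranked continuation" idea concrete, here is a toy sketch. A real transformer ranks continuations via learned logits over a vocabulary, not raw corpus counts, but the intuition is similar: the continuation that most often followed the prompt in training dominates. The corpus sentences below are hypothetical stand-ins.

```python
from collections import Counter

# Toy corpus standing in for pre-training text (hypothetical sentences).
corpus = [
    "A toroidal solenoid is a donut-shaped coil of wire.",
    "A toroidal solenoid is a donut-shaped coil used in transformers.",
    "A toroidal solenoid is a closed-loop inductor.",
]

prompt = "A toroidal solenoid is a"

# Count which word most often follows the prompt in the corpus.
continuations = Counter()
for sentence in corpus:
    if sentence.startswith(prompt + " "):
        next_word = sentence[len(prompt) + 1:].split()[0]
        continuations[next_word] += 1

# The highest-ranked continuation plays the role of the top logit.
best, count = continuations.most_common(1)[0]
print(best)  # donut-shaped
```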

However, I am not clear how reasoning phrases like "Let me start by recalling ..." manage to come out of the transformer.

The training corpus will likely not contain any phrases like "let me start by recalling" on web pages related to toroidal solenoids, correct?

Is this particular sentence added during fine-tuning, so that this particular transformer prefaces its answer to any question with the phrase "Let me start by recalling ..."?

How much reasoning is actually true reasoning in reasoning models?


2 Answers


LLMs generate phrases like "Let me start by recalling..." as discourse markers because, during pre-training on explanatory texts, they observe countless examples of structured explanations (e.g., tutorials, essays, Q&A forums) in which humans use such phrases to scaffold reasoning. Furthermore, fine-tuned models like ChatGPT are trained on instruction data such as "Explain step-by-step...", which reinforces explicit reasoning-like formats; the models thus learn autoregressively that such marker phrases correlate with reasoning discourse, even if those phrases never appeared alongside terms like "toroidal solenoid."
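A toy illustration of that last point: in the hypothetical instruction-tuning pairs below, the marker phrase correlates with the response *format*, not with any topic, so nothing stops it from surfacing when a new topic such as solenoids comes along.

```python
# Hypothetical instruction-tuning pairs (none mention solenoids).
finetune_data = [
    ("Explain step-by-step how photosynthesis works.",
     "Let me start by recalling what chlorophyll does..."),
    ("Explain step-by-step how a CPU pipeline works.",
     "Let me start by recalling the fetch-decode-execute cycle..."),
    ("Explain step-by-step how TCP handshakes work.",
     "Let me start by recalling the three-way handshake..."),
]

marker = "Let me start by recalling"

# The marker opens every response regardless of subject matter,
# so the model learns "instruction prompt -> marker" as a format cue.
share = sum(r.startswith(marker) for _, r in finetune_data) / len(finetune_data)
print(share)  # 1.0
```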

Indeed, foundation LLMs lack internal world models and symbolic logical representations; their "reasoning" is purely statistical mimicry. They retrieve and recombine fragments of text that co-occurred with similar prompts or concepts, and there's no step-by-step deduction or validation, just probabilistic token prediction. Recent research on chain-of-thought prompting has shown that models can produce emergent intermediate and transitional reasoning steps as a natural consequence of the patterns they've learned from extensive multi-step explanations in their training data. You may further read Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models":

We explore how generating a chain of thought—a series of intermediate reasoning steps—significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting.
Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

  • Thanks for your thoughtful answer. Let's experiment a little. Suppose we show the model 100 exemplar answers in the style of thank-you notes (e.g. 1. "Thank you for your generosity and thoughtfulness. Your kindness has brightened my day." and 2. "I wanted to express my deepest thanks for your encouragement and support."). Then we prompt the model with the same prompt ("Does a toroidal solenoid cause a magnetic field?"). Could you give an example of a concrete answer that might be generated by the model given those thank-you-note exemplars it has seen, please? Commented Feb 19, 2025 at 13:03
  • "Thank you for your insightful question regarding toroidal solenoids. I appreciate your curiosity about electromagnetic phenomena. Yes, a toroidal solenoid does generate a magnetic field when an electric current passes through it. The magnetic field is confined within the core of the toroid, creating a closed-loop field that doesn't extend outside the coil. Your engagement with such topics enriches our collective understanding, and I'm grateful for the opportunity to discuss this with you." — reflecting the model's tendency to mimic the thank-you style while still providing factual information. Commented Feb 20, 2025 at 6:38
  • Lol, yes that is such an excellent impersonation. Commented Feb 20, 2025 at 6:55
  • Hope it now clarifies your lingering concern about this specific post. Commented Feb 21, 2025 at 4:57
  • Indeed, this is a useful analogy here. Both LLM fine-tuning and style transfer in vision aim to shift a pre-trained model toward a target style or domain in modality-specific ways. In language, fine-tuning adjusts the model's parameters using additional, often domain- or style-specific data so that the model's outputs reflect desired linguistic patterns. In contrast, style transfer in vision typically separates the content of an image from its style, such as colors, textures, and brushstrokes, and recombines them so that the same content is rendered with a different visual style. Commented Feb 21, 2025 at 6:21

Unfortunately, I don't think we have a widely accepted definition of "true reasoning," so this question is hard to answer definitively. To some extent, we risk falling prey to the AI effect with this discussion.

That said, there has been some research into the extent to which "reasoning models" are doing "true" reasoning. One such paper came to my attention in the last day or two. I don't have the link in front of me as I type this, but I'll hunt it down and edit more info into this answer once I find it.

  • Thank you. Does it matter to the model that it has already "thought aloud" for 5 paragraphs before giving its final answer? I mean, every word in those 5 paragraphs where it "thought aloud" would indeed be included as context for its final answer, right? Would those additional 5 paragraphs of context have changed the model's final answer, compared to a situation where the model is not allowed to "think" before giving a final answer? Commented Feb 19, 2025 at 13:10
  • Yeah, there's an open question whether those paragraphs of "thought" are really "thought" in the sense of how humans think. At least one paper or article I read questioned that and did some research to try to suss that out. That's the one I'm still trying to find. It's buried somewhere in my huge list of open browser tabs or bookmarks. That said, I did find a couple of other papers that deal with this as well. I'll edit my answer later to expand on some of that. Commented Feb 19, 2025 at 19:11
  • Yes, my hunch is that since the weights are all frozen, "thinking" may just be a form of context adding. E.g. asking "Should one invest in the stock market?" may elicit a strong yes, citing historically consistent data of stock price growth. But asking "Should one invest in the stock market? I am 80 years old." may give a different answer, because the added context "I am 80 years old" leads to other web sources that take into account the risk of short-term market crashes. By "thinking aloud," the model is trying to add context to the original question, making its final answer more balanced. Commented Feb 20, 2025 at 1:22
  • I thought it was well established that reasoning is just logical deduction, which is just tree search. Commented May 15, 2025 at 6:32
