
Posing a technical question to a reasoning LLM may elicit a series of "thinking-like" sentences. For example,

[screenshot of the model's "thinking" output omitted]

Suppose this model's weights are already frozen, fossilized into its 600 billion parameter values.

Facts like "donut-shaped coil" could simply be the highest-ranked transformer continuation of the prompt "A toroidal solenoid is a ...", right?
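To make the "highest-ranked continuation" idea concrete, here is a toy sketch. A real transformer ranks continuations via learned logits over a vocabulary, not raw corpus counts, but the intuition is similar: the continuation that most often followed the prompt in training dominates. The corpus sentences below are hypothetical stand-ins.

```python
from collections import Counter

# Toy corpus standing in for pre-training text (hypothetical sentences).
corpus = [
    "A toroidal solenoid is a donut-shaped coil of wire.",
    "A toroidal solenoid is a donut-shaped coil used in transformers.",
    "A toroidal solenoid is a closed-loop inductor.",
]

prompt = "A toroidal solenoid is a"

# Count which word most often follows the prompt in the corpus.
continuations = Counter()
for sentence in corpus:
    if sentence.startswith(prompt + " "):
        next_word = sentence[len(prompt) + 1:].split()[0]
        continuations[next_word] += 1

# The highest-ranked continuation plays the role of the top logit.
best, count = continuations.most_common(1)[0]
print(best)  # donut-shaped
```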

However, I am not clear how reasoning phrases like "Let me start by recalling ..." manage to come out of the transformer.

The training corpus will likely not contain any phrases like "let me start by recalling" on web pages related to toroidal solenoids, correct?

Is this particular sentence added during fine-tuning, so that this particular transformer prefaces its answer to any question with the phrase "Let me start by recalling ..."?

How much reasoning is actually true reasoning in reasoning models?


2 Answers


LLMs generate phrases like "Let me start by recalling..." as discourse markers because, during pre-training on explanatory texts, they observe countless examples of structured explanations (e.g., tutorials, essays, Q&A forums) in which humans use such phrases to scaffold reasoning. Furthermore, fine-tuned models like ChatGPT are trained on instruction data such as "Explain step-by-step...", which reinforces explicit reasoning-like formats; the models thus learn autoregressively that such marker phrases correlate with reasoning discourse, even if those phrases never appeared alongside terms like "toroidal solenoid."
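A toy illustration of that last point: in the hypothetical instruction-tuning pairs below, the marker phrase correlates with the response *format*, not with any topic, so nothing stops it from surfacing when a new topic such as solenoids comes along.

```python
# Hypothetical instruction-tuning pairs (none mention solenoids).
finetune_data = [
    ("Explain step-by-step how photosynthesis works.",
     "Let me start by recalling what chlorophyll does..."),
    ("Explain step-by-step how a CPU pipeline works.",
     "Let me start by recalling the fetch-decode-execute cycle..."),
    ("Explain step-by-step how TCP handshakes work.",
     "Let me start by recalling the three-way handshake..."),
]

marker = "Let me start by recalling"

# The marker opens every response regardless of subject matter,
# so the model learns "instruction prompt -> marker" as a format cue.
share = sum(r.startswith(marker) for _, r in finetune_data) / len(finetune_data)
print(share)  # 1.0
```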

Indeed, foundation LLMs lack internal world models and symbolic logical representations; their "reasoning" is purely statistical mimicry. They retrieve and recombine fragments of text that co-occurred with similar prompts or concepts, and there's no step-by-step deduction or validation, just probabilistic token prediction. Recent research on chain-of-thought prompting has shown that models can produce emergent intermediate and transitional reasoning steps as a natural consequence of the patterns they've learned from extensive multi-step explanations in their training data. You may further read Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models":

We explore how generating a chain of thought—a series of intermediate reasoning steps—significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain-of-thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting.
Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

  • Thanks for your thoughtful answer. Let's experiment a little. Suppose we show the model 100 exemplar answers in the style of thank-you notes (e.g. 1. "Thank you for your generosity and thoughtfulness. Your kindness has brightened my day." and 2. "I wanted to express my deepest thanks for your encouragement and support."). Then we prompt the model with the same prompt ("Does a toroidal solenoid cause a magnetic field?"). Could you give an example of a concrete answer that might be generated by the model given those thank-you-note exemplars it has seen, please? Commented Feb 19, 2025 at 13:03
  • "Thank you for your insightful question regarding toroidal solenoids. I appreciate your curiosity about electromagnetic phenomena. Yes, a toroidal solenoid does generate a magnetic field when an electric current passes through it. The magnetic field is confined within the core of the toroid, creating a closed-loop field that doesn't extend outside the coil. Your engagement with such topics enriches our collective understanding, and I'm grateful for the opportunity to discuss this with you." — reflecting the model's tendency to mimic the thank-you style while still providing factual information. Commented Feb 20, 2025 at 6:38
  • Lol, yes that is such an excellent impersonation. Commented Feb 20, 2025 at 6:55
  • Hope it now clarifies your lingering concern about this specific post. Commented Feb 21, 2025 at 4:57
  • Indeed, this is a useful analogy here. Both LLM fine-tuning and style transfer in vision aim to shift a pre-trained model toward a target style or domain in modality-specific ways. In language, fine-tuning adjusts the model's parameters using additional, often domain- or style-specific data so that the model's outputs reflect desired linguistic patterns. In contrast, style transfer in vision typically separates the content of an image from its style, such as colors, textures, and brushstrokes, and recombines them so that the same content is rendered with a different visual style. Commented Feb 21, 2025 at 6:21

Unfortunately, I don't think we have a widely accepted definition of "true reasoning," so this question is hard to answer definitively. To some extent, we risk falling prey to the AI effect with this discussion.

That said, there has been some research into the extent to which "reasoning models" are doing "true" reasoning. One such paper came to my attention in the last day or two. I don't have the link in front of me as I type this, but I'll hunt it down and edit more info into this answer once I find it.

  • Thank you. Does it matter to the model that it has already "thought aloud" for 5 paragraphs before giving its final answer? I mean, every word in those 5 paragraphs where it "thought aloud" would indeed be included as context for its final answer, right? Would those additional 5 paragraphs of context have changed the model's final answer, compared to a situation where the model is not allowed to "think" before giving a final answer? Commented Feb 19, 2025 at 13:10
  • Yeah, there's an open question whether those paragraphs of "thought" are really "thought" in the sense of how humans think. At least one paper or article I read questioned that and did some research to try to suss that out. That's the one I'm still trying to find. It's buried somewhere in my huge list of open browser tabs or bookmarks. That said, I did find a couple of other papers that deal with this as well. I'll edit my answer later to expand on some of that. Commented Feb 19, 2025 at 19:11
  • Yes, my hunch is that since the weights are all frozen, "thinking" may just be a form of context adding. E.g. asking "Should one invest in the stock market?" may elicit a strong yes, citing historically consistent data of stock price growth. But asking "Should one invest in the stock market? I am 80 years old." may give a different answer, because the added context "I am 80 years old" leads to other web sources that take into account the risk of short-term market crashes. By "thinking aloud," the model is trying to add context to the original question, making its final answer more balanced. Commented Feb 20, 2025 at 1:22
  • I thought it was well established that reasoning is just logical deduction, which is just tree search. Commented May 15, 2025 at 6:32
