Questions tagged [large-language-models]
For questions about large language models (LLMs), i.e. language models that are "large" both in parameter count and in the amount of data they are trained on.
292 questions
5 votes · 1 answer · 857 views
Why do larger language models still fail on simple compositional reasoning tasks?
Large language models often perform impressively on benchmark tasks, coding, and natural language generation, but they can still fail on reasoning problems that seem simple for humans, especially when ...
1 vote · 0 answers · 12 views
Why do transformer models sometimes produce fluent but logically inconsistent answers even when retrieval provides the correct context?
I understand that transformer-based language models can generate highly fluent responses, and that retrieval-augmented generation (RAG) is often used to improve factual grounding by supplying relevant ...
1 vote · 1 answer · 25 views
Can TF-IDF be used as a machine translation loss function?
TF-IDF cosine similarity is a powerful means of determining the similarity of two text documents. Can it be used as a loss function when training a machine translation model?
For example, when measuring ...
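Since the excerpt is cut off, here is a minimal pure-Python sketch of the quantity the question is asking about: TF-IDF weighting followed by cosine similarity between document vectors. The corpus and whitespace tokenizer are illustrative assumptions, not taken from the question.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency normalized by length, scaled by inverse document frequency.
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # related sentences: similarity > 0
print(cosine(vecs[0], vecs[2]))  # no overlapping terms: similarity 0
```

Note that, computed this way over discrete token counts, the score is not differentiable with respect to model logits, which is the main obstacle to using it directly as a training loss.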
0 votes · 1 answer · 40 views
How do I access a llama.cpp server instance with the Continue extension for VSCodium?
If I'm running GLM-4.7-Flash-GGUF:Q6_K_XL from the PowerShell terminal like this ...
0 votes · 0 answers · 19 views
Using LLM models downloaded from Hugging Face with LangGraph
I downloaded the weights of a Llama model from Hugging Face. It works for simple tasks, but I don't know how to use it with LangGraph to create agents or how to bind tools.
Here is how I downloaded it:
...
0 votes · 0 answers · 21 views
Why do Llama 2 7B ARC-Easy results differ across evaluations?
I ran an ARC-Easy evaluation for Llama 2 7B.
Using the lm-evaluation-harness, I get:
acc: 75.51
acc_norm: 73.86
In the Llama 2 technical report, ARC-Easy is reported as 75.2. But in some ...
-1 votes · 0 answers · 15 views
Searching job boards and creating an Excel file
We are looking to understand which AI tool or tools would be best for scraping public job boards for specific job titles. We will also need the contact information from the job postings. Then we ...
0 votes · 0 answers · 16 views
Docs with date - Temporal RAG
I have hundreds of chunked documents categorized as either News or Pages. Docs in the News category have a publish date, while files in the Pages category don’t.
I’m trying to apply ...
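The excerpt is cut off, but one common temporal-RAG pattern it gestures at can be sketched in pure Python: keep the publish date as optional chunk metadata and multiply the retrieval similarity by a recency decay, leaving undated Pages chunks untouched. The chunk records, similarity scores, and half-life below are all illustrative assumptions.

```python
import math
from datetime import date

def temporal_score(similarity, publish_date, today, half_life_days=180):
    """Combine a retrieval similarity score with an exponential recency decay.

    Chunks without a date (e.g. Pages) keep their raw similarity; dated
    chunks (News) lose half their weight every `half_life_days`.
    """
    if publish_date is None:
        return similarity
    age_days = (today - publish_date).days
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * decay

today = date(2025, 1, 1)
chunks = [
    {"id": "news-old", "sim": 0.9, "date": date(2023, 1, 1)},
    {"id": "news-new", "sim": 0.8, "date": date(2024, 12, 15)},
    {"id": "page",     "sim": 0.7, "date": None},
]
ranked = sorted(
    chunks,
    key=lambda c: temporal_score(c["sim"], c["date"], today),
    reverse=True,
)
print([c["id"] for c in ranked])
```

The recent News chunk overtakes the undated Page, while the stale News chunk drops to the bottom despite its higher raw similarity; tuning the half-life controls how aggressively recency is favored.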
1 vote · 1 answer · 62 views
Why do large language models struggle with consistent multi step reasoning?
Large language models often perform well on single-step tasks but sometimes fail when reasoning requires multiple logical steps.
Is this limitation mainly due to training data patterns, model ...
0 votes · 1 answer · 26 views
Are there AI architectures where one model generates reasoning and another model verifies or monitors the reasoning process?
I am a student who has recently started learning about artificial intelligence and reasoning systems, so I apologize in advance if this question is already well known in the literature.
Many modern ...
1 vote · 2 answers · 86 views
Why do large language models hallucinate facts even when trained on large datasets?
Large language models such as GPT, LLaMA, and Claude are trained on massive datasets and can generate highly coherent text. However, they still frequently produce incorrect or fabricated information, ...
0 votes · 1 answer · 70 views
Did AI redefine how we measure performance?
Did AI change the benchmark for measuring machine performance from following exact design goals to meeting design objectives, where the latter could be evaluated based on criteria such as relevance ...
3 votes · 1 answer · 86 views
Why do modern LLMs still struggle with multi-step logical reasoning despite large context windows?
Large language models today have huge context windows, sometimes exceeding 100k tokens, yet they still fail on tasks that require consistent multi-step logical reasoning. I’m referring to tasks like:
...
3 votes · 1 answer · 83 views
What happened to residual attention?
In 2020, RealFormer introduced residual attention (c):
But in 2024, the state-of-the-art DeepSeek transformer model still uses Post-LayerNorm (a):
Residual attention is a simple addition operation, ...
3 votes · 1 answer · 110 views
Why do large language models sometimes fail to learn long-term dependencies even with transformer architectures?
Transformers are designed to capture long-range dependencies better than RNNs and LSTMs, but in practice, many models still fail to maintain consistent long-term reasoning.
For example, when working ...