Questions tagged [large-language-models]
For questions about large language models (LLMs), i.e. language models that are "large" both in parameter count and in the amount of data they are trained on.
292 questions
5 votes · 1 answer · 857 views
Why do larger language models still fail on simple compositional reasoning tasks?
Large language models often perform impressively on benchmark tasks, coding, and natural language generation, but they can still fail on reasoning problems that seem simple for humans, especially when ...
1 vote · 0 answers · 12 views
Why do transformer models sometimes produce fluent but logically inconsistent answers even when retrieval provides the correct context?
I understand that transformer-based language models can generate highly fluent responses, and that retrieval-augmented generation (RAG) is often used to improve factual grounding by supplying relevant ...
1 vote · 1 answer · 25 views
Can TF-IDF be used as a machine translation loss function?
TF-IDF cosine similarity is a powerful means of determining the similarity of two text documents. Can it be used as a loss function when training a machine translation model?
For example, when measuring ...
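Since the excerpt is cut off, here is a minimal pure-Python sketch of the quantity the question is asking about: TF-IDF weighting followed by cosine similarity between document vectors. The corpus and whitespace tokenizer are illustrative assumptions, not taken from the question.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency normalized by length, scaled by inverse document frequency.
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # related sentences: similarity > 0
print(cosine(vecs[0], vecs[2]))  # no overlapping terms: similarity 0
```

Note that, computed this way over discrete token counts, the score is not differentiable with respect to model logits, which is the main obstacle to using it directly as a training loss.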
0 votes · 1 answer · 40 views
How do I access a llama.cpp server instance with the Continue extension for VSCodium?
If I'm running GLM-4.7-Flash-GGUF:Q6_K_XL from the PowerShell terminal like this ...
0 votes · 0 answers · 19 views
Using LLM models downloaded from Hugging Face with LangGraph
I downloaded the weights of a Llama model from Hugging Face. It works for simple tasks, but I don't know how to use it with LangGraph to create agents or how to bind tools.
Here is how I downloaded it:
...
0 votes · 0 answers · 21 views
Why do Llama 2 7B ARC-Easy results differ across evaluations?
I ran an ARC-Easy evaluation for Llama 2 7B.
Using the lm-evaluation-harness, I get:
acc: 75.51
acc_norm: 73.86
In the Llama 2 technical report, ARC-Easy is reported as 75.2. But in some ...
-1 votes · 0 answers · 15 views
Searching job boards and creating an Excel file
We are looking to understand which AI tool or tools would be best for scraping public job boards for specific job titles. We will also need the contact information from the job postings. Then we ...
0 votes · 0 answers · 16 views
Docs with date - Temporal RAG
I have hundreds of chunked documents categorized as either News or Pages. Docs in the News category have a publish date, while files in the Pages category don’t.
I’m trying to apply ...
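The excerpt is cut off, but one common temporal-RAG pattern it gestures at can be sketched in pure Python: keep the publish date as optional chunk metadata and multiply the retrieval similarity by a recency decay, leaving undated Pages chunks untouched. The chunk records, similarity scores, and half-life below are all illustrative assumptions.

```python
import math
from datetime import date

def temporal_score(similarity, publish_date, today, half_life_days=180):
    """Combine a retrieval similarity score with an exponential recency decay.

    Chunks without a date (e.g. Pages) keep their raw similarity; dated
    chunks (News) lose half their weight every `half_life_days`.
    """
    if publish_date is None:
        return similarity
    age_days = (today - publish_date).days
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * decay

today = date(2025, 1, 1)
chunks = [
    {"id": "news-old", "sim": 0.9, "date": date(2023, 1, 1)},
    {"id": "news-new", "sim": 0.8, "date": date(2024, 12, 15)},
    {"id": "page",     "sim": 0.7, "date": None},
]
ranked = sorted(
    chunks,
    key=lambda c: temporal_score(c["sim"], c["date"], today),
    reverse=True,
)
print([c["id"] for c in ranked])
```

The recent News chunk overtakes the undated Page, while the stale News chunk drops to the bottom despite its higher raw similarity; tuning the half-life controls how aggressively recency is favored.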
1 vote · 1 answer · 62 views
Why do large language models struggle with consistent multi step reasoning?
Large language models often perform well on single-step tasks but sometimes fail when reasoning requires multiple logical steps.
Is this limitation mainly due to training data patterns, model ...
0 votes · 1 answer · 26 views
Are there AI architectures where one model generates reasoning and another model verifies or monitors the reasoning process?
I am a student who has recently started learning about artificial intelligence and reasoning systems, so I apologize in advance if this question is already well known in the literature.
Many modern ...
1 vote · 2 answers · 86 views
Why do large language models hallucinate facts even when trained on large datasets?
Large language models such as GPT, LLaMA, and Claude are trained on massive datasets and can generate highly coherent text. However, they still frequently produce incorrect or fabricated information, ...
0 votes · 1 answer · 70 views
Did AI redefine how we measure performance?
Did AI change the benchmark for measuring machine performance from following exact design goals to meeting design objectives, where the latter could be evaluated based on criteria such as relevance ...
3 votes · 1 answer · 86 views
Why do modern LLMs still struggle with multi-step logical reasoning despite large context windows?
Large language models today have huge context windows, sometimes exceeding 100k tokens, yet they still fail on tasks that require consistent multi-step logical reasoning. I’m referring to tasks like:
...
3 votes · 1 answer · 83 views
What happened to residual attention?
In 2020, RealFormer introduced residual attention (c):
But in 2024, the state-of-the-art DeepSeek transformer model still uses Post-LayerNorm (a):
Residual attention is a simple addition operation, ...
3 votes · 1 answer · 110 views
Why do large language models sometimes fail to learn long-term dependencies even with transformer architectures?
Transformers are designed to capture long-range dependencies better than RNNs and LSTMs, but in practice, many models still fail to maintain consistent long-term reasoning.
For example, when working ...