
Fix web_fetch #922

Open
LizBing wants to merge 2 commits into sipeed:main from LizBing:fix-web-fetch

Conversation


@LizBing LizBing commented Feb 28, 2026

📝 Description

Previously, there was no actual content returned to the LLM.

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

📚 Technical Context (Skip for Docs)

  • Reference URL:
  • Reasoning:

🧪 Test Environment

  • Hardware:
  • OS:
  • Model/Provider:
  • Channels:

📸 Evidence (Optional)


☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.
@afjcjsbx
Contributor

Duplicate of: #833


@nikolasdehor left a comment


Review: Fix web_fetch

Context: I submitted PR #447 which also fixed web_fetch (proxy support). That PR was closed/superseded, so I am familiar with this code path.

The Bug

The original code returned only metadata in ForLLM (byte count, URL, extractor, truncated flag) but not the actual fetched content. This meant the LLM received a summary like "Fetched 5000 bytes from example.com" but never saw the actual page content. That is a critical bug -- the whole point of web_fetch is to give the LLM the page content to reason about.

The Fix

The fix correctly appends the text variable (the actual fetched content) to the ForLLM field. The format is clean:

Fetched N bytes from URL (extractor: X, truncated: Y)
Content:
<actual content>

Observations

  1. This is correct and necessary. Without the content in ForLLM, the tool is functionally broken for LLM consumption.

  2. Potential size concern: text can still be large even after upstream truncation. If the content is, say, 100KB of markdown, that entire string goes into the LLM context window. Since the truncation logic upstream already bounds the size, this matches existing behavior, and the fix is correct as-is.

  3. ForUser still gets the JSON result: the user-facing output is unchanged, which is good.

  4. No tests: consider adding a test that verifies ForLLM contains the actual fetched content. This would prevent regression.

LGTM. Small fix, high impact.

