Duplicate of: #833
nikolasdehor left a comment
Review: Fix web_fetch
Context: I submitted PR #447 which also fixed web_fetch (proxy support). That PR was closed/superseded, so I am familiar with this code path.
The Bug
The original code returned only metadata in `ForLLM` (byte count, URL, extractor, truncated flag) but not the actual fetched content. The LLM therefore received a summary like "Fetched 5000 bytes from example.com" but never saw the page itself. That is a critical bug -- the whole point of web_fetch is to give the LLM the page content to reason about.
The Fix
The fix correctly appends the `text` variable (the actual fetched content) to the `ForLLM` field. The format is clean:

```
Fetched N bytes from URL (extractor: X, truncated: Y)
Content:
<actual content>
```
Observations
- **This is correct and necessary.** Without the content in `ForLLM`, the tool is functionally broken for LLM consumption.
- **Potential size concern** -- `text` could be very large (the truncation happens earlier in the pipeline). If the content is, say, 100KB of markdown, this entire string goes into the LLM context window. This is the existing behavior though (the upstream truncation logic handles it), so the fix is correct as-is.
- **`ForUser` still gets the JSON result** -- good that the user-facing output is unchanged.
- **No tests** -- consider adding a test that verifies `ForLLM` contains the actual fetched content. This would prevent regression.
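A regression check along the lines suggested above could be as small as the sketch below. The `result` type and `check` helper are illustrative, not the repo's actual API; the only assumption carried over from the review is that `ForLLM` should contain the fetched content:

```go
package main

import (
	"fmt"
	"strings"
)

// result is an illustrative stand-in for the fetch tool's output;
// the ForLLM field name comes from the code under review.
type result struct{ ForLLM string }

// check verifies the regression the review warns about: the fetched
// content must actually appear in ForLLM, not just the metadata header.
func check(r result, content string) error {
	if !strings.Contains(r.ForLLM, content) {
		return fmt.Errorf("ForLLM is missing the fetched content")
	}
	return nil
}

func main() {
	good := result{ForLLM: "Fetched 9 bytes from https://example.com (extractor: markdown, truncated: false)\nContent:\n# Example"}
	bad := result{ForLLM: "Fetched 9 bytes from https://example.com (extractor: markdown, truncated: false)"}
	fmt.Println(check(good, "# Example")) // <nil>
	fmt.Println(check(bad, "# Example") != nil)
}
```

The `bad` case reproduces the pre-fix behavior (metadata only), so the test fails exactly when the bug regresses.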
LGTM. Small fix, high impact.
📝 Description
Previously, there was no actual content returned to the LLM.