fix(whatsapp): download documents, audio, and video media from messages by teknium1 · Pull Request #2978 · NousResearch/hermes-agent

teknium1 · 2026-03-25T15:37:22Z

Summary

Salvage of PR #2818 by @noestelar onto current main. Also addresses the voice note bug reported in #2856 (which PR #2865 by @ayberkesn also fixed).

What this fixes: The WhatsApp bridge only downloaded images. Documents, audio/voice notes, and video were detected (hasMedia = true) but never downloaded via downloadMediaMessage(), so mediaUrls stayed empty and the agent couldn't access these files.

Changes

bridge.js:

Add downloadMediaMessage() for video (→ document_cache/), audio/ptt (→ audio_cache/), and documents (→ document_cache/ with sanitized filenames)
Dedicated cache directories instead of reusing image_cache/ for everything
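The routing above lives in bridge.js (JavaScript). The sketch below restates it in Python, matching whatsapp.py, so the cache layout is easy to see at a glance. It is a minimal sketch, not the bridge's code: `CACHE_ROOT`, the `<message_id>_<filename>` naming scheme, and the sanitizing rules in the hypothetical `sanitize_filename()` are all assumptions. Only the media-type to directory mapping comes from this PR.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical cache root; the real bridge picks its own location.
CACHE_ROOT = Path("/tmp/hermes-cache")

# Mapping described in the PR: video and documents share document_cache/,
# audio and push-to-talk (ptt) voice notes go to audio_cache/.
CACHE_DIRS = {
    "image": "image_cache",
    "video": "document_cache",
    "audio": "audio_cache",
    "ptt": "audio_cache",
    "document": "document_cache",
}


def sanitize_filename(name: str) -> str:
    """Hypothetical sanitizer: keep the base name, replace unsafe characters."""
    base = Path(name).name  # drop any directory components such as "../"
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "document"  # fall back if nothing usable is left


def cache_path(media_type: str, message_id: str, filename: Optional[str] = None) -> Path:
    """Return where a downloaded payload of the given type would be written."""
    subdir = CACHE_ROOT / CACHE_DIRS[media_type]
    if media_type == "document" and filename:
        # Documents keep their (sanitized) original name for MIME detection later.
        return subdir / f"{message_id}_{sanitize_filename(filename)}"
    return subdir / message_id
```

Stripping directory components before replacing characters is what stops a sender-supplied name like `../../etc/passwd` from escaping `document_cache/`.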

whatsapp.py:

Handle local file paths from bridge for DOCUMENT, VOICE, and VIDEO message types
MIME detection via existing SUPPORTED_DOCUMENT_TYPES map for documents
Text content injection for readable files (.txt, .md, .csv, .json, .py, etc.)
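On the Python side, each local path from the bridge is classified by extension. The sketch below shows that lookup under stated assumptions. `SUPPORTED_DOCUMENT_TYPES` is named in this PR, but the entries shown are a hypothetical subset. The helper `classify_document()` and the `application/octet-stream` fallback are also illustrative, not the gateway's actual code.

```python
from pathlib import Path

# Hypothetical subset standing in for the existing SUPPORTED_DOCUMENT_TYPES map;
# the real map in the gateway may contain different entries.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".csv": "text/csv",
    ".json": "application/json",
    ".py": "text/x-python",
}

# Extensions whose contents are read as text and injected into the message.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".py"}


def classify_document(path: str) -> tuple[str, bool]:
    """Return (mime_type, is_readable_text) for a cached document path."""
    ext = Path(path).suffix.lower()  # case-insensitive: ".MD" -> ".md"
    mime = SUPPORTED_DOCUMENT_TYPES.get(ext, "application/octet-stream")
    return mime, ext in TEXT_EXTENSIONS
```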

Follow-up fixes (applied during salvage)

Removed unused cache_document_from_bytes import
Added 100KB size cap on text injection (matches Telegram/Discord/Slack)
Aligned injection format with other platforms ([Content of name]: instead of markdown)
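The size cap and injection format can be sketched as follows. This makes three assumptions not confirmed by the PR text: "100KB" means 100 * 1024 bytes, oversized files are skipped rather than truncated, and the `[Content of name]:` label is followed by a newline and the raw file text. `inject_text_content()` is a hypothetical name.

```python
from pathlib import Path

# Assumed reading of the 100KB cap shared with Telegram/Discord/Slack.
MAX_TEXT_INJECT_BYTES = 100 * 1024


def inject_text_content(message_text: str, path: str, display_name: str) -> str:
    """Append a readable file's contents using the [Content of name]: format.

    Files larger than the cap are left out entirely (an assumption; the
    gateway could instead truncate them).
    """
    file = Path(path)
    if file.stat().st_size > MAX_TEXT_INJECT_BYTES:
        return message_text
    content = file.read_text(encoding="utf-8", errors="replace")
    injected = f"[Content of {display_name}]:\n{content}"
    # Keep any caption the user sent, separated from the injected block.
    return f"{message_text}\n\n{injected}" if message_text else injected
```

Checking the size before reading avoids loading a large file into memory only to discard it.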

Fixes #2856 (bugs 1 & 2).
Contributor credit: @noestelar (primary implementation), @ayberkesn (independent voice note fix in #2865).


…es (NousResearch#2978)

Add downloadMediaMessage() calls for documents, audio/voice notes, and video in bridge.js; previously only images were downloaded, leaving all other file types inaccessible to the agent. Handle local file paths from the bridge for DOCUMENT, VOICE, and VIDEO types in whatsapp.py with proper MIME detection. Inject text content inline for readable files (.txt, .md, .csv, .json, etc.).

Follow-up fixes applied during salvage:
- Remove unused cache_document_from_bytes import
- Add 100KB size cap on text injection (matches Telegram/Discord/Slack)
- Align injection format with other platforms

Cherry-picked from PR NousResearch#2818. Also fixes NousResearch#2856 (bugs 1 & 2). PR NousResearch#2865 by ayberkesn fixed the same voice note issue.

Co-authored-by: noestelar <hola@noeali.com>

teknium1 merged commit c6f4515 into main Mar 25, 2026

dieutx mentioned this pull request Mar 25, 2026

fix(gateway): add media download retry to Mattermost, Slack, and base cache #2982 (Closed)


teknium1 merged 1 commit into main from hermes/hermes-84f0446e

teknium1 commented Mar 25, 2026
