
fix(whatsapp): download documents, audio, and video media from messages #2978

Merged
teknium1 merged 1 commit into main from hermes/hermes-84f0446e on Mar 25, 2026

Conversation

@teknium1
Contributor

Summary

Salvage of PR #2818 by @noestelar onto current main. Also addresses the voice note bug reported in #2856 (which PR #2865 by @ayberkesn also fixed).

What this does: The WhatsApp bridge only downloaded images — documents, audio/voice notes, and video were detected (hasMedia = true) but never actually downloaded via downloadMediaMessage(). This meant mediaUrls stayed empty and the agent couldn't access these files.

Changes

bridge.js:

  • Add downloadMediaMessage() for video (→ document_cache/), audio/ptt (→ audio_cache/), and documents (→ document_cache/ with sanitized filenames)
  • Dedicated cache directories instead of reusing image_cache/ for everything
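
The routing and filename sanitization described above can be sketched as follows. This is only an illustration of the logic, written in Python for consistency with the other examples here; the actual change lives in bridge.js and is not reproduced. The directory names come from this description, while `CACHE_DIRS`, `sanitize_filename`, `cache_path`, and the exact sanitization rules are hypothetical.

```python
import re
from pathlib import Path

# Directory names are taken from the PR description; the message-type keys
# are assumptions made for this sketch.
CACHE_DIRS = {
    "image": "image_cache",
    "video": "document_cache",
    "audio": "audio_cache",
    "ptt": "audio_cache",
    "document": "document_cache",
}


def sanitize_filename(name: str, fallback: str = "file") -> str:
    """Return a filename that is safe to join onto a cache directory."""
    # Keep only the last path component, so "../../etc/passwd" becomes "passwd".
    base = name.replace("\\", "/").split("/")[-1]
    # Replace every character outside a conservative allow-list.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Strip leading dots so the result is never "." or ".." or a hidden file.
    cleaned = cleaned.lstrip(".")
    return cleaned or fallback


def cache_path(root: Path, media_type: str, filename: str) -> Path:
    """Pick the cache directory for a media type and append the safe name."""
    return root / CACHE_DIRS[media_type] / sanitize_filename(filename)
```

Keeping only the final path component before filtering characters is what prevents a sender-supplied name from escaping the cache directory.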

whatsapp.py:

  • Handle local file paths from bridge for DOCUMENT, VOICE, and VIDEO message types
  • MIME detection via existing SUPPORTED_DOCUMENT_TYPES map for documents
  • Text content injection for readable files (.txt, .md, .csv, .json, .py, etc.)
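
A minimal sketch of extension-based MIME detection for a local document path returned by the bridge. The contents of the real `SUPPORTED_DOCUMENT_TYPES` map are not shown in this PR, so the entries below are a stand-in, and `detect_document_mime` and `DEFAULT_MIME` are hypothetical names.

```python
from pathlib import Path

# Stand-in for the adapter's SUPPORTED_DOCUMENT_TYPES map; the real entries
# are not shown in this PR, so these are illustrative assumptions.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".csv": "text/csv",
    ".json": "application/json",
}

# Assumed fallback for extensions that are not in the map.
DEFAULT_MIME = "application/octet-stream"


def detect_document_mime(local_path: str) -> str:
    """Look up the MIME type from the file extension, case-insensitively."""
    ext = Path(local_path).suffix.lower()
    return SUPPORTED_DOCUMENT_TYPES.get(ext, DEFAULT_MIME)
```

For example, `detect_document_mime("document_cache/Report.PDF")` returns `"application/pdf"` under these assumed entries, and an unknown or missing extension falls back to `DEFAULT_MIME`.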

Follow-up fixes (applied during salvage)

  • Removed unused cache_document_from_bytes import
  • Added 100KB size cap on text injection (matches Telegram/Discord/Slack)
  • Aligned injection format with other platforms ([Content of name]: instead of markdown)
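
The size cap and injection format can be sketched like this. The 100KB limit and the `[Content of name]:` prefix come from this description; the function name, the extension set (the description ends its list with "etc."), the placement of the injected text before the message body, and the silent fallback on read errors are assumptions.

```python
from pathlib import Path

MAX_TEXT_INJECT_BYTES = 100 * 1024  # 100KB cap from the PR description
# Extensions named in the PR description; the real set is likely longer.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".py"}


def inject_text_content(message_text: str, local_path: str) -> str:
    """Prepend a readable file's content to the message, if it is small enough."""
    path = Path(local_path)
    if path.suffix.lower() not in TEXT_EXTENSIONS:
        return message_text
    try:
        # Check the size on disk first so large files are never read into memory.
        if path.stat().st_size > MAX_TEXT_INJECT_BYTES:
            return message_text
        content = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # Missing or non-UTF-8 files are skipped (assumed behaviour).
        return message_text
    injection = f"[Content of {path.name}]:\n{content}"
    return f"{injection}\n\n{message_text}" if message_text else injection
```

Checking `st_size` before reading keeps an oversized attachment from inflating the prompt or memory use; the file stays available through its cached path either way.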

Fixes #2856 (bugs 1 & 2).
Contributor credit: @noestelar (primary implementation), @ayberkesn (independent voice note fix in #2865).

Add downloadMediaMessage() calls for documents, audio/voice notes, and
video in bridge.js — previously only images were downloaded, leaving all
other file types inaccessible to the agent.

Handle local file paths from the bridge for DOCUMENT, VOICE, and VIDEO
types in whatsapp.py with proper MIME detection. Inject text content
inline for readable files (.txt, .md, .csv, .json, etc.).

Follow-up fixes applied during salvage:
- Remove unused cache_document_from_bytes import
- Add 100KB size cap on text injection (matches Telegram/Discord/Slack)
- Align injection format with other platforms

Cherry-picked from PR #2818. Also fixes #2856 (bugs 1 & 2).
PR #2865 by ayberkesn fixed the same voice note issue.
teknium1 merged commit c6f4515 into main on Mar 25, 2026
InB4DevOps pushed a commit to InB4DevOps/hermes-agent that referenced this pull request Mar 25, 2026
…es (NousResearch#2978)

Add downloadMediaMessage() calls for documents, audio/voice notes, and
video in bridge.js — previously only images were downloaded, leaving all
other file types inaccessible to the agent.

Handle local file paths from the bridge for DOCUMENT, VOICE, and VIDEO
types in whatsapp.py with proper MIME detection. Inject text content
inline for readable files (.txt, .md, .csv, .json, etc.).

Follow-up fixes applied during salvage:
- Remove unused cache_document_from_bytes import
- Add 100KB size cap on text injection (matches Telegram/Discord/Slack)
- Align injection format with other platforms

Cherry-picked from PR NousResearch#2818. Also fixes NousResearch#2856 (bugs 1 & 2).
PR NousResearch#2865 by ayberkesn fixed the same voice note issue.

Co-authored-by: noestelar <hola@noeali.com>
outsourc-e pushed a commit to outsourc-e/hermes-agent that referenced this pull request Mar 26, 2026
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
