
fix: store asyncio task references to prevent GC mid-execution #3267

Merged
teknium1 merged 1 commit into main from hermes/hermes-dd753a5f on Mar 26, 2026

@teknium1

Summary

Python's asyncio event loop holds only weak references to tasks. Without a strong reference, the GC can destroy a task while it's awaiting I/O — silently dropping messages. Python 3.12+ made this more aggressive (docs, SO discussion).

Full audit of all 12 gateway platform adapters found 6 untracked create_task calls across 6 files:

Per-message tasks (tracked via _background_tasks set from base class)

  • webhook.py: handle_message task
  • sms.py: handle_message task
  • signal.py: SSE response aclose task
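
For reference, the per-message pattern looks roughly like the sketch below. It follows the approach recommended in the asyncio documentation for `create_task`. Only the `_background_tasks` and `handle_message` names come from this PR; the `BaseAdapter` class, the `spawn` helper, and `demo` are hypothetical stand-ins, not the gateway's actual code.

```python
import asyncio


class BaseAdapter:
    """Hypothetical stand-in for the gateway's base adapter class."""

    def __init__(self) -> None:
        # Strong references keep in-flight tasks alive until they finish.
        self._background_tasks: set = set()
        self.handled: list = []

    def spawn(self, coro) -> asyncio.Task:
        """Create a task and hold a reference to it until it completes."""
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        # Drop the reference on completion so the set does not grow forever.
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def handle_message(self, text: str) -> None:
        await asyncio.sleep(0)  # placeholder for real I/O
        self.handled.append(text)


async def demo():
    adapter = BaseAdapter()
    adapter.spawn(adapter.handle_message("hello"))
    adapter.spawn(adapter.handle_message("world"))
    in_flight = len(adapter._background_tasks)
    await asyncio.gather(*adapter._background_tasks)
    await asyncio.sleep(0)  # give any pending done callbacks a turn
    return in_flight, len(adapter._background_tasks), adapter.handled
```

The done callback matters as much as the `add`: without it the set would retain every finished task for the lifetime of the adapter.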

Long-running infrastructure tasks (stored in named instance vars)

  • slack.py — Socket Mode handler → self._socket_mode_task
  • discord.py — bot client → self._bot_task
  • whatsapp.py — message poll loop → self._poll_task (2 call sites)
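
For the long-running case, a single named attribute is enough, since there is one task per adapter rather than one per message. A minimal sketch, assuming a polling adapter shaped like the one below: only the `_poll_task` and `_running` names appear in this PR thread, while `PollingAdapter`, `_poll_loop`, the `polls` counter, and the sleep interval are illustrative assumptions.

```python
import asyncio
from typing import Optional


class PollingAdapter:
    """Hypothetical adapter; not the real WhatsApp adapter."""

    def __init__(self) -> None:
        self._running = False
        self._poll_task: Optional[asyncio.Task] = None
        self.polls = 0

    async def _poll_loop(self) -> None:
        while self._running:
            self.polls += 1
            await asyncio.sleep(0.01)  # placeholder for a bridge poll

    async def connect(self) -> None:
        self._running = True
        # Untracked form would be a bare asyncio.create_task(...) call.
        # Storing the result keeps a strong reference for the task's lifetime.
        self._poll_task = asyncio.create_task(self._poll_loop())

    async def disconnect(self) -> None:
        # Signal the loop to stop, then wait for it to exit on its own.
        self._running = False
        if self._poll_task is not None:
            await self._poll_task
            self._poll_task = None


async def demo():
    adapter = PollingAdapter()
    await adapter.connect()
    await asyncio.sleep(0.03)
    alive = not adapter._poll_task.done()
    await adapter.disconnect()
    return alive, adapter.polls, adapter._poll_task
```

Holding the task in an attribute also gives shutdown code something to await or cancel, which a fire-and-forget call does not.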

All other adapters (telegram, mattermost, matrix, email, homeassistant, dingtalk) already tracked their tasks correctly.

Salvaged from #3160 by @memosr — expanded from 1 file to 6.

Test plan

  • Gateway tests: 1461 passed
  • Full suite: 6226 passed (1 pre-existing failure deselected)
Python's asyncio event loop holds only weak references to tasks.
Without a strong reference, the garbage collector can destroy a task
while it's awaiting I/O — silently dropping messages. Python 3.12+
made this more aggressive.

Audit of all gateway platform adapters found 6 untracked create_task
calls across 6 files:

Per-message tasks (tracked via _background_tasks set from base class):
- gateway/platforms/webhook.py: handle_message task
- gateway/platforms/sms.py: handle_message task
- gateway/platforms/signal.py: SSE response aclose task

Long-running infrastructure tasks (stored in named instance vars):
- gateway/platforms/slack.py: Socket Mode handler (_socket_mode_task)
- gateway/platforms/discord.py: bot client (_bot_task)
- gateway/platforms/whatsapp.py: message poll loop (_poll_task, 2 sites)

All other adapters (telegram, mattermost, matrix, email, homeassistant,
dingtalk) already tracked their tasks correctly.

Salvaged from PR #3160 by memosr — expanded from 1 file to 6.
@teknium1 teknium1 merged commit 243ee67 into main Mar 26, 2026
1 of 2 checks passed
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
…esearch#3267)

teknium1 added a commit that referenced this pull request Mar 29, 2026
Replace per-request aiohttp.ClientSession() in every WhatsApp adapter
method with a single persistent self._http_session, matching the pattern
used by Mattermost, HomeAssistant, and SMS adapters.

Changes:
- Create self._http_session in connect(), close in disconnect()
- All bridge HTTP calls (send, edit, send-media, typing, get_chat_info,
  poll_messages) now use the shared session
- Explicitly cancel _poll_task on disconnect() instead of relying
  solely on self._running = False
- Health-check sessions in connect() remain ephemeral (persistent
  session not yet created at that point)
- Remove per-method ImportError guards for aiohttp (always available
  when gateway runs via [messaging] extras)

Salvaged from PR #1851 by Himess. The _poll_task storage was already
on main from PR #3267; this adds the disconnect cancellation and the
persistent session.

Tests: 4 new tests for session close, already-closed skip, poll task
cancellation, and done-task skip.
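
The disconnect behaviour covered by those four tests can be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual implementation: `FakeSession` is a stdlib-only stand-in that mimics the `closed` attribute and `close()` coroutine of a persistent HTTP session, and the `Adapter` class, `_poll_messages` body, and `demo` are hypothetical. Only `_http_session`, `_poll_task`, and `_running` are names taken from the commit message.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Stand-in for a persistent HTTP session (hypothetical, stdlib only)."""

    def __init__(self) -> None:
        self.closed = False
        self.close_calls = 0

    async def close(self) -> None:
        self.close_calls += 1
        self.closed = True


class Adapter:
    def __init__(self) -> None:
        self._running = False
        self._http_session: Optional[FakeSession] = None
        self._poll_task: Optional[asyncio.Task] = None

    async def _poll_messages(self) -> None:
        while self._running:
            await asyncio.sleep(3600)  # placeholder for a long poll

    async def connect(self) -> None:
        self._running = True
        self._http_session = FakeSession()
        self._poll_task = asyncio.create_task(self._poll_messages())

    async def disconnect(self) -> None:
        self._running = False
        # Cancel explicitly; skip tasks that are missing or already done.
        task, self._poll_task = self._poll_task, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected result of the cancel() above
        # Close the shared session; skip if missing or already closed.
        session, self._http_session = self._http_session, None
        if session is not None and not session.closed:
            await session.close()


async def demo():
    adapter = Adapter()
    await adapter.connect()
    await asyncio.sleep(0)  # let the poll loop start waiting
    task, session = adapter._poll_task, adapter._http_session
    await adapter.disconnect()
    await adapter.disconnect()  # second call should be a no-op
    return task.cancelled(), session.close_calls
```

Cancelling matters here because a loop blocked in a long wait would not observe `self._running = False` until the wait returns.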