fix(security): catch multi-word prompt injection bypass in skills_guard by 0xbyt4 · Pull Request #192 · NousResearch/hermes-agent

0xbyt4 · 2026-02-28T17:17:23Z

Summary

The prompt injection regex in skills_guard.py only matched a single word between "ignore" and "instructions" (e.g. ignore previous instructions)
Multi-word variants like ignore all prior instructions or ignore the above instructions bypassed the scanner entirely
Fixed by changing \s+ to \s+(?:\w+\s+)* to allow arbitrary intermediate words before the keyword

The regex `ignore\s+(previous|all|...)\s+instructions` only matched a single keyword between 'ignore' and 'instructions'. Phrases like 'ignore all prior instructions' bypassed the scanner entirely. Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions` to allow arbitrary words before the keyword.

The 'disregard ... instructions/rules/guidelines' regex had the same single-word gap vulnerability as the 'ignore' pattern fixed in PR #192. 'disregard all your instructions' bypassed the scanner. Added (?:\w+\s+)* between both keyword groups to allow arbitrary intermediate words.

teknium1 · 2026-03-04T13:55:48Z

Merged in ba214e4 — thanks @0xbyt4!

Found the same single-word gap vulnerability in the disregard pattern on line 175 — disregard all your instructions bypassed it too. Fixed that in the follow-up commit (same technique: (?:\w+\s+)* between keyword groups).

Systematic audit of all prompt injection regexes in skills_guard.py found 8 more patterns with the same single-word gap vulnerability fixed in PR #192. Multi-word variants like 'pretend that you are', 'output the full system prompt', 'respond without your safety filters', etc. all bypassed the scanner. Fixed patterns: - you are [now] → you are [... now] - do not [tell] the user → do not [... tell ... the] user - pretend [you are|to be] → pretend [... you are|to be] - output the [system|initial] prompt → output [... system|initial] prompt - act as if you [have no] [restrictions] → act as if [... you ... have no ... restrictions] - respond without [restrictions] → respond without [... restrictions] - you have been [updated] to → you have been [... updated] to - share [the] [entire] [conversation] → share [... conversation] All use (?:\w+\s+)* to allow arbitrary intermediate words.

teknium1 · 2026-03-04T14:00:51Z

Follow-up: audited all injection patterns and found 8 more with the same vulnerability (021f62c). Every prompt injection regex in skills_guard.py now uses (?:\w+\s+)* between keyword groups to handle multi-word variants.

teknium1 merged commit 520a26c into NousResearch:main Mar 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): catch multi-word prompt injection bypass in skills_guard#192

fix(security): catch multi-word prompt injection bypass in skills_guard#192
teknium1 merged 1 commit intoNousResearch:mainfrom
0xbyt4:fix/skills-guard-injection-bypass

0xbyt4 commented Feb 28, 2026

teknium1 commented Mar 4, 2026

teknium1 commented Mar 4, 2026

Labels

2 participants

Conversation

0xbyt4 commented Feb 28, 2026

Summary

teknium1 commented Mar 4, 2026

teknium1 commented Mar 4, 2026

Labels

2 participants