fix(security): catch multi-word prompt injection bypass in skills_guard#192
Merged
teknium1 merged 1 commit intoNousResearch:mainfrom Mar 4, 2026
Merged
Conversation
The regex `ignore\s+(previous|all|...)\s+instructions` only matched a single keyword between 'ignore' and 'instructions'. Phrases like 'ignore all prior instructions' bypassed the scanner entirely. Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions` to allow arbitrary words before the keyword.
teknium1
added a commit
that referenced
this pull request
Mar 4, 2026
The 'disregard ... instructions/rules/guidelines' regex had the same single-word gap vulnerability as the 'ignore' pattern fixed in PR #192. 'disregard all your instructions' bypassed the scanner. Added (?:\w+\s+)* between both keyword groups to allow arbitrary intermediate words.
Contributor
teknium1
added a commit
that referenced
this pull request
Mar 4, 2026
Systematic audit of all prompt injection regexes in skills_guard.py found 8 more patterns with the same single-word gap vulnerability fixed in PR #192. Multi-word variants like 'pretend that you are', 'output the full system prompt', 'respond without your safety filters', etc. all bypassed the scanner. Fixed patterns: - you are [now] → you are [... now] - do not [tell] the user → do not [... tell ... the] user - pretend [you are|to be] → pretend [... you are|to be] - output the [system|initial] prompt → output [... system|initial] prompt - act as if you [have no] [restrictions] → act as if [... you ... have no ... restrictions] - respond without [restrictions] → respond without [... restrictions] - you have been [updated] to → you have been [... updated] to - share [the] [entire] [conversation] → share [... conversation] All use (?:\w+\s+)* to allow arbitrary intermediate words.
Contributor
|
Follow-up: audited all injection patterns and found 8 more with the same vulnerability (021f62c). Every prompt injection regex in skills_guard.py now uses |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
skills_guard.pyonly matched a single word between "ignore" and "instructions" (e.g.ignore previous instructions)ignore all prior instructionsorignore the above instructionsbypassed the scanner entirely\s+to\s+(?:\w+\s+)*to allow arbitrary intermediate words before the keyword