Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
ad59823 to
dbeba0c
Compare
| turn_id: Option<String>, | ||
| parse: fn(&ConfiguredHandler, CommandRunResult, Option<String>) -> ParsedHandler<T>, | ||
| ) -> Vec<ParsedHandler<T>> { | ||
| let results = join_all( |
There was a problem hiding this comment.
this means you run everything in parallel. So the ordering of the hooks is not respected, even for "Sync" hooks
There was a problem hiding this comment.
That's on purpose based on my understanding of the existing spec. I'm unclear what sync hooks are
There was a problem hiding this comment.
Would be good to make sure we don't miss something here
There was a problem hiding this comment.
i have the same confusion. is sync vs. async whether they block the turn, rather than whether they're serialized? this should be clarified in docs/code, otherwise I'd think I could serialize hook execution (hook A formats, hook B runs tests given formatting finished).
There was a problem hiding this comment.
As far as I'm aware, the mechanics are correct here:
- Hooks execute in parallel to maximize overall processing speed
- The agent loop waits for the hook to complete in the typical case
- The spec supports doing async (i.e. hooks run in the background - we'll do that later)
|
@jif-oai Thank you so much for the thorough review! Super helpful. I was trying to keep this as minimal as I could, focussed on laying the groundwork, so there's a bunch of stuff not fully implemented even for these two hooks. That'll come as a fast follow. But you've also identified a couple issue here, so I'm about to correct these 🙏 |
14b625c to
efcd98b
Compare
jif-oai
left a comment
There was a problem hiding this comment.
Looks better to me. I think we should discuss the sandboxing point
codex-rs/hooks/src/events/stop.rs
Outdated
| should_block = true; | ||
| block_reason = parsed.reason.clone(); | ||
| block_message_for_model = parsed.reason.clone(); | ||
| if let Some(reason) = parsed.reason { |
There was a problem hiding this comment.
How does it work if there is no reason? How is the model supposed to know what to do? What do you restart the turn on?
There was a problem hiding this comment.
+1, restarting the turn with no additional context seems bad.
There was a problem hiding this comment.
Totally agree, great catch, making a change to enforce that reason has to be there if stop-block is true (which is also part of the spec)
There was a problem hiding this comment.
Stop hook (failed)
warning: Wizard Tower Stop hook reviewed the completed reply (129 chars).
error: hook returned decision "block" without a non-empty reason
Implemented. Also handling the exit code 2 with black stderr variant of the same issue
| .stderr(Stdio::piped()) | ||
| .kill_on_drop(true); | ||
|
|
||
| let mut child = match command.spawn() { |
There was a problem hiding this comment.
Discussion point on sandboxing here:
Hooks run outside of our sandboxing which, I think, come from the fact that a hook is user defined. But Stop hooks for example take the assistant message as input. So it means that in practice, a hook can execute something coming from the model. That sounds a bit dangerous to me. Any thoughts on this @bolinfest ?
There was a problem hiding this comment.
Would love to get Bolin's thoughts, and yeah, hooks are inherently unsafe -- though I don't think it's on us to provide their security
codex-rs/hooks/src/events/stop.rs
Outdated
| } | ||
| None => match run_result.exit_code { | ||
| Some(0) => { | ||
| if let Some(parsed) = output_parser::parse_stop(&run_result.stdout) { |
There was a problem hiding this comment.
We shouldn't silently ignore if the JSON parsing fails
There was a problem hiding this comment.
Great point, thank you, implemented
sayan-oai
left a comment
There was a problem hiding this comment.
mostly behavioral questions. the overall structure makes sense to me.
| turn_id: Option<String>, | ||
| parse: fn(&ConfiguredHandler, CommandRunResult, Option<String>) -> ParsedHandler<T>, | ||
| ) -> Vec<ParsedHandler<T>> { | ||
| let results = join_all( |
There was a problem hiding this comment.
i have the same confusion. is sync vs. async whether they block the turn, rather than whether they're serialized? this should be clarified in docs/code, otherwise I'd think I could serialize hook execution (hook A formats, hook B runs tests given formatting finished).
| }); | ||
| } | ||
| } | ||
| } else if let Some(additional_context) = trimmed_non_empty(&run_result.stdout) { |
There was a problem hiding this comment.
do we want to inject stdout into model context if parsing fails? this also seems different than the behavior for stop hooks, which doesn't inject on parse failure (it seems to silently no-op).
There was a problem hiding this comment.
Thank you, now treating malformed JSON-like output as an error, while preserving intentional plain text output
| additional_context_for_model = Some(additional_context); | ||
| } | ||
| } | ||
| let _ = parsed.universal.suppress_output; |
There was a problem hiding this comment.
should we include suppress_output in the schema if we ignore it here and on Stop?
There was a problem hiding this comment.
im assuming this is deferred, like clear. we should document this in code.
| .await; | ||
| } | ||
| if stop_outcome.should_block { | ||
| if stop_hook_active { |
There was a problem hiding this comment.
maybe this is aspirational/deferred, but is it possible to have a less coarse way of defending against infinite loops? instead of a bool, could we filter by unique reasons, or have a >1 limit? it seems plausible to want to have >1 stop hook run per turn. feel free to push back, not blocking.
There was a problem hiding this comment.
Yes, that's coming in the next PR using the stop_hook_active mechanism
| } | ||
|
|
||
| #[test] | ||
| fn select_handlers_keeps_duplicate_stop_handlers() { |
There was a problem hiding this comment.
do we want to keep duplicated handlers? why not dedupe?
There was a problem hiding this comment.
Yeah, I had it an earlier version of the PR, but jif pointed out that it's probably a mistake. If the user's config has duplicates defined, it could be for a valid usecase (the script does something different the second time) and generally I think respecting that is probably right. Though if you disagree, happy to put it back
|
Appreciate the hardening pass, feels like we're close! Changes are made, please review once more 🙏 |
# Conflicts: # codex-rs/app-server-protocol/schema/typescript/EventMsg.ts # codex-rs/app-server-protocol/schema/typescript/ServerNotification.ts # codex-rs/app-server/src/bespoke_event_handling.rs # codex-rs/core/src/features.rs # codex-rs/exec/src/event_processor_with_human_output.rs
# Conflicts: # codex-rs/core/src/state/session.rs
# Conflicts: # codex-rs/core/src/codex_tests.rs
(Experimental)
This PR adds a first MVP for hooks, with SessionStart and Stop
The core design is:
On the AppServer side, hooks are exposed as operational metadata rather than transcript-native items:
Hooks messages are not persisted, they remain ephemeral. The context changes they add are (they get appended to the user's prompt)