Conversation
charley-oai
approved these changes
Mar 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #13388. This uses the same general fix pattern as #12421, but in the
codex-corecompact/resume/fork path.Why
compact_resume_after_second_compaction_preserves_historystarted overflowing the stack on Windows CI after#13388.The important part is that this was not a compaction-recursion bug. The test exercises a path with several thin
async fnwrappers around much larger thread-spawn, resume, and fork futures. When oneasync fnawaits another inline, the outer future stores the callee future as part of its own state machine. In a long wrapper chain, that means a caller can accidentally inline a lot more state than the source code suggests.That is exactly what was happening here:
ThreadManagerconvenience methods such asstart_thread,resume_thread_from_rollout, andfork_threadwere inlining the larger spawn/resume futures beneath them.core_test_support::test_codexadded another wrapper layer on top of those same paths.compact_resume_forkadds a few more helpers, and this particular test drives the resume/fork path multiple times.On Windows, that was enough to push both the libtest thread and Tokio worker threads over the edge. The previous 8 MiB test-thread workaround proved the failure was stack-related, but it did not address the underlying future size.
How This Was Debugged
The useful debugging pattern here was to turn the CI-only failure into a local low-stack repro.
#[tokio::test]path.tests/allbinary directly with progressively smallerRUST_MIN_STACKvalues.Running the built binary directly matters: it keeps the reduced stack size focused on the test process instead of also applying it to
cargoandrustc.That made it possible to answer two questions quickly:
After this change, the built test binary passes with
RUST_MIN_STACK=917504and still overflows at786432, which is enough evidence to justify removing the explicit 8 MiB override while keeping a deterministic low-stack repro for future debugging.If we hit a similar issue again, the first places to inspect are thin
async fnwrappers that mostly forward into a much larger async implementation.Box::pin()Primerasync fncompiles into a state machine. If a wrapper does this:then
wrapper()stores the fullinner()future inline as part of its own state.If the wrapper instead does this:
then the child future lives on the heap, and the outer future only stores a pinned pointer to it. That usually trades one allocation for a substantially smaller outer future, which is exactly the tradeoff we want when the problem is stack pressure rather than raw CPU time.
Useful references:
Box::pinWhat Changed
core/src/thread_manager.rsaroundstart_thread,resume_thread_from_rollout,fork_thread, and the correspondingThreadManagerStatespawn helpers so callers no longer inline the full spawn/resume state machine through multiple layers.core/tests/common/test_codex.rsandcore/tests/suite/compact_resume_fork.rs, which sit directly on top of the same path.compact_resume_after_second_compaction_preserves_historyincore/tests/suite/compact_resume_fork.rsto a normal#[tokio::test]and removed the explicitTEST_STACK_SIZE_BYTESthread/runtime sizing.compact_resume_forkby makingfetch_conversation_path()synchronous, which removes one more unnecessary future layer from the test path.Verification
cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact --nocapturecargo test -p codex-core --test all suite::compact_resume_fork -- --nocapturecodex-coretests/allbinary directly with reduced stack sizes:RUST_MIN_STACK=917504passesRUST_MIN_STACK=786432still overflowscargo test -p codex-corecodex/test_stdio_serverbinaries or hit the existingsearch_toolwiremock mismatches.