Skip to content

core: box wrapper futures to reduce stack pressure#13429

Merged
bolinfest merged 1 commit intomainfrom
pr13429
Mar 4, 2026
Merged

core: box wrapper futures to reduce stack pressure#13429
bolinfest merged 1 commit intomainfrom
pr13429

Conversation

@bolinfest
Copy link
Copy Markdown
Collaborator

@bolinfest bolinfest commented Mar 4, 2026

Follow-up to #13388. This uses the same general fix pattern as #12421, but in the codex-core compact/resume/fork path.

Why

compact_resume_after_second_compaction_preserves_history started overflowing the stack on Windows CI after #13388.

The important part is that this was not a compaction-recursion bug. The test exercises a path with several thin async fn wrappers around much larger thread-spawn, resume, and fork futures. When one async fn awaits another inline, the outer future stores the callee future as part of its own state machine. In a long wrapper chain, that means a caller can accidentally inline a lot more state than the source code suggests.

That is exactly what was happening here:

  • ThreadManager convenience methods such as start_thread, resume_thread_from_rollout, and fork_thread were inlining the larger spawn/resume futures beneath them.
  • core_test_support::test_codex added another wrapper layer on top of those same paths.
  • compact_resume_fork adds a few more helpers, and this particular test drives the resume/fork path multiple times.

On Windows, that was enough to push both the libtest thread and Tokio worker threads over the edge. The previous 8 MiB test-thread workaround proved the failure was stack-related, but it did not address the underlying future size.

How This Was Debugged

The useful debugging pattern here was to turn the CI-only failure into a local low-stack repro.

  1. First, remove the explicit large-stack harness so the test runs on the normal #[tokio::test] path.
  2. Build the test binary normally.
  3. Re-run the already-built tests/all binary directly with progressively smaller RUST_MIN_STACK values.

Running the built binary directly matters: it keeps the reduced stack size focused on the test process instead of also applying it to cargo and rustc.

That made it possible to answer two questions quickly:

  • Does the failure still reproduce without the workaround? Yes.
  • Does boxing the wrapper futures actually buy back stack headroom? Also yes.

After this change, the built test binary passes with RUST_MIN_STACK=917504 and still overflows at 786432, which is enough evidence to justify removing the explicit 8 MiB override while keeping a deterministic low-stack repro for future debugging.

If we hit a similar issue again, the first places to inspect are thin async fn wrappers that mostly forward into a much larger async implementation.

Box::pin() Primer

async fn compiles into a state machine. If a wrapper does this:

async fn wrapper() {
    inner().await;
}

then wrapper() stores the full inner() future inline as part of its own state.

If the wrapper instead does this:

async fn wrapper() {
    Box::pin(inner()).await;
}

then the child future lives on the heap, and the outer future only stores a pinned pointer to it. That usually trades one allocation for a substantially smaller outer future, which is exactly the tradeoff we want when the problem is stack pressure rather than raw CPU time.

Useful references:

What Changed

  • Boxed the wrapper futures in core/src/thread_manager.rs around start_thread, resume_thread_from_rollout, fork_thread, and the corresponding ThreadManagerState spawn helpers so callers no longer inline the full spawn/resume state machine through multiple layers.
  • Boxed the matching test-only wrapper futures in core/tests/common/test_codex.rs and core/tests/suite/compact_resume_fork.rs, which sit directly on top of the same path.
  • Restored compact_resume_after_second_compaction_preserves_history in core/tests/suite/compact_resume_fork.rs to a normal #[tokio::test] and removed the explicit TEST_STACK_SIZE_BYTES thread/runtime sizing.
  • Simplified a tiny helper in compact_resume_fork by making fetch_conversation_path() synchronous, which removes one more unnecessary future layer from the test path.

Verification

  • cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact --nocapture
  • cargo test -p codex-core --test all suite::compact_resume_fork -- --nocapture
  • Re-ran the built codex-core tests/all binary directly with reduced stack sizes:
    • RUST_MIN_STACK=917504 passes
    • RUST_MIN_STACK=786432 still overflows
  • cargo test -p codex-core
    • Still fails locally in unrelated existing integration areas that expect the codex / test_stdio_server binaries or hit the existing search_tool wiremock mismatches.
@bolinfest bolinfest changed the title core: box thread spawn wrappers to reduce stack pressure Mar 4, 2026
@bolinfest bolinfest requested a review from charley-oai March 4, 2026 05:22
@bolinfest bolinfest enabled auto-merge (squash) March 4, 2026 05:32
@bolinfest bolinfest merged commit 7134220 into main Mar 4, 2026
54 of 57 checks passed
@bolinfest bolinfest deleted the pr13429 branch March 4, 2026 05:44
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

2 participants