Skip to content

Fix stdio-to-uds peer-close flake#13882

Merged
aibrahim-oai merged 4 commits intomainfrom
dev/flaky-stdio-to-uds-peer-close
Mar 12, 2026
Merged

Fix stdio-to-uds peer-close flake#13882
aibrahim-oai merged 4 commits intomainfrom
dev/flaky-stdio-to-uds-peer-close

Conversation

@aibrahim-oai
Copy link
Copy Markdown
Collaborator

@aibrahim-oai aibrahim-oai commented Mar 7, 2026

What changed

  • codex-stdio-to-uds now tolerates NotConnected when shutdown(Write) happens after the peer has already closed.
  • The socket test was rewritten to send stdin from a fixture file and to read an exact request payload length instead of waiting on EOF timing.

Why this fixes the flake

  • This one exposed a real cross-platform runtime edge case: on macOS, the peer can close first after a successful exchange, and shutdown(Write) can report NotConnected even though the interaction already succeeded.
  • Treating that specific ordering as a harmless shutdown condition removes the production-level false failure.
  • The old test compounded the problem by depending on EOF timing, which varies by platform and scheduler. Exact-length IO makes the test deterministic and focused on the actual data exchange.

Scope

  • Production logic change with matching test rewrite.
@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-stdio-to-uds-peer-close branch from 22a8636 to 84c6afe Compare March 8, 2026 05:14
@aibrahim-oai aibrahim-oai requested a review from bolinfest March 8, 2026 06:32
@aibrahim-oai
Copy link
Copy Markdown
Collaborator Author

@codex review this

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84c6afe447

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"anyhow",
"chrono",
"clap",
"codex-otel",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unintended codex-state lockfile dependency

Cargo.lock now adds codex-otel under the codex-state package, but codex-rs/state/Cargo.toml does not declare that dependency. This lockfile drift can make cargo --locked/CI fail with a lock-update-required error, even though the stdio-to-uds change is unrelated.

Useful? React with 👍 / 👎.

stream
.shutdown(Shutdown::Write)
.context("failed to shutdown socket writer")?;
// The peer can close immediately after sending its response. On macOS that
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain that this can only happen on macOS?

Ok(())
});

Command::new(codex_utils_cargo_bin::cargo_bin("codex-stdio-to-uds")?)
let stdin = std::fs::File::open(&request_path).context("failed to open child stdin fixture")?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like either here or in a docstring on pipes_stdin_and_stdout_through_socket(), some of the PR body should be memorialized as a comment to explain why this test code is complicated.

If I understand correctly, a key change is using read_exact() in server_thread instead of read_to_end() to eliminate the dependency on EOF timing.

I'm unclear whether the move from assert_cmd::Command to std::process::Command is necessary to ensure the test is no longer flaky. Is it mainly there to build up server_events so that if the test does fail, the error includes the server_events received thus far, making it easier to debug?

@aibrahim-oai aibrahim-oai merged commit 09aa71a into main Mar 12, 2026
32 checks passed
@aibrahim-oai aibrahim-oai deleted the dev/flaky-stdio-to-uds-peer-close branch March 12, 2026 16:52
@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

2 participants