Skip to content

Stabilize thread resume replay tests#13885

Merged
aibrahim-oai merged 1 commit intomainfrom
dev/flaky-thread-resume-replay-tests
Mar 9, 2026
Merged

Stabilize thread resume replay tests#13885
aibrahim-oai merged 1 commit intomainfrom
dev/flaky-thread-resume-replay-tests

Conversation

@aibrahim-oai
Copy link
Copy Markdown
Collaborator

@aibrahim-oai aibrahim-oai commented Mar 7, 2026

What changed

  • The thread-resume replay tests now use unchecked mock sequencing so the replay flow can complete before the test asserts.
  • They also poll outbound /responses request counts and fail immediately if replay emits an unexpected extra request.

Why this fixes the flake

  • The previous version asserted while the replay machinery was still mid-flight, so the test was sometimes checking an intermediate state instead of the completed behavior.
  • Strict mock sequencing made that problem worse by forcing the test to care about exact sub-step timing rather than about the end result.
  • Letting replay settle and then asserting on stabilized request counts makes the test validate the real contract: the replay path finishes and does not send extra model requests.

Scope

  • Test-only change.
aibrahim-oai added a commit that referenced this pull request Mar 7, 2026
aibrahim-oai added a commit that referenced this pull request Mar 7, 2026
aibrahim-oai added a commit that referenced this pull request Mar 7, 2026
@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-thread-resume-replay-tests branch from d4e3293 to b42545d Compare March 7, 2026 19:08
aibrahim-oai added a commit that referenced this pull request Mar 7, 2026
aibrahim-oai added a commit that referenced this pull request Mar 8, 2026
@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-thread-resume-replay-tests branch from 625d187 to 56be931 Compare March 8, 2026 05:39
aibrahim-oai added a commit that referenced this pull request Mar 8, 2026
@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-thread-resume-replay-tests branch from 56be931 to 889d399 Compare March 8, 2026 05:40
primary.read_stream_until_notification_message("turn/completed"),
)
.await??;
wait_for_responses_request_count(&server, 3).await?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we care about how many requests that we sent to responses? seems brittle. can we keep just the create_mock_responses_server_sequence_unchecked fix and remove this one?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wait is what removes the flakiness

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, i don't understand why if it's the last thing we await on in this test 🤷

@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-thread-resume-replay-tests branch 2 times, most recently from f82cf88 to 35ed827 Compare March 9, 2026 16:54
@aibrahim-oai aibrahim-oai force-pushed the dev/flaky-thread-resume-replay-tests branch from 35ed827 to 1e61d90 Compare March 9, 2026 17:21
@aibrahim-oai aibrahim-oai merged commit 10bf600 into main Mar 9, 2026
31 checks passed
@aibrahim-oai aibrahim-oai deleted the dev/flaky-thread-resume-replay-tests branch March 9, 2026 17:41
@github-actions github-actions bot locked and limited conversation to collaborators Mar 9, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

2 participants