fix(core): prevent hanging turn/start due to websocket warming issues#14838
fix(core): prevent hanging turn/start due to websocket warming issues#14838
Conversation
c96827b to
e0db860
Compare
abe0e47 to
c6c288f
Compare
|
@codex review |
ac37a80 to
cf34cf3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d880b09652
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
bc399c5 to
7fa8577
Compare
| Ok(new_conn) => new_conn, | ||
| Err(err) => { | ||
| if matches!(err, ApiError::Transport(TransportError::Timeout)) { | ||
| self.reset_websocket_session(); |
There was a problem hiding this comment.
Why is this needed? Don't we auto retry?
There was a problem hiding this comment.
so a retry does happen but at the turn level, not by creating a brand new websocket client from scratch
So when a request fails, Codex retries the same turn using the same turn-scoped session object. If we do not call reset_websocket_session(), that retry can carry leftover websocket state from the timed-out attempt and act like it is continuing a connection that never really came up.
10f193c to
0a87c31
Compare
fc3b456 to
4f88caa
Compare
| use similar::TextDiff; | ||
| use tempfile::TempDir; | ||
|
|
||
| fn trim_single_trailing_newline(contents: &str) -> &str { |
4f88caa to
be145ab
Compare
Description
This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change,
initialize -> thread/start -> turn/startcould sit behind the prewarm for up to five minutes, so the client would not seeturn/started, and eventurn/interruptwould block because the turn had not actually started yet.Now, we:
websocket_startup_timeout_msin config.tomlturn/startedis sent immediately onturn/starteven if the websocket is still connectingturn/interruptcan be used to cancel a turn that is still waiting on the websocket warmupWhy
The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes.
Configuring the websocket startup timeout
Can set it in config.toml like this: