chore: stop app-server auth refresh storms after permanent token failure#15530
chore: stop app-server auth refresh storms after permanent token failure#15530
Conversation
c0f9be3 to
91850c2
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 163231a64e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
8076ec9 to
41696c8
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41696c897b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
41696c8 to
40f642f
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| } | ||
|
|
||
| #[derive(Clone)] | ||
| struct AuthScopedRefreshFailure { |
There was a problem hiding this comment.
Should we store this as the value of auth: Option<CodexAuth>, ?
There was a problem hiding this comment.
CodexAuth is strictly identity related so hopefully we leave it as-is
| error: &RefreshTokenFailedError, | ||
| ) { | ||
| if let Ok(mut guard) = self.inner.write() { | ||
| let current_auth_matches = |
There was a problem hiding this comment.
do you need this condition? The failure encodes the auth and it's compared on read.
There was a problem hiding this comment.
this is still useful, just in case auth actually gets reloaded we don't record a false positive of permanent error
| if let Ok(mut guard) = self.inner.write() { | ||
| let previous = guard.auth.as_ref(); | ||
| let changed = !AuthManager::auths_equal(previous, new_auth.as_ref()); | ||
| let auth_changed_for_refresh = |
There was a problem hiding this comment.
Storing failure along the main auth might naturally perform the cleaning.
There was a problem hiding this comment.
Codex raises a good point: permanent refresh failure is manager-owned state, not part of the auth or identity model itself. That makes the manager the right place to track it. Pushing it down into the auth layer is not a clean fit, because that layer has a broader surface area and more code paths that would need to preserve and propagate the state correctly.
built from #14256. PR description from @etraut-openai:
This PR addresses a hole in PR 11802. The previous PR assumed that app server clients would respond to token refresh failures by presenting the user with an error ("you must log in again") and then not making further attempts to call network endpoints using the expired token. While they do present the user with this error, they don't prevent further attempts to call network endpoints and can repeatedly call
getAuthStatus(refreshToken=true)resulting in many failed calls to the token refresh endpoint.There are three solutions I considered here:
I chose option 3.
Here's Codex's explanation of this change:
When an app-server client asks
getAuthStatus(refreshToken=true), we may try to refresh a stale ChatGPT access token. If that refresh fails permanently (for examplerefresh_token_reused, expired, or revoked), the old behavior was bad in two ways:/oauth/tokenrequests and repeatedly surfacing the same failure.This is especially painful for app-server clients because they poll auth status and can keep driving the refresh path without any real chance of recovery.
This change makes permanent refresh failures terminal for the current managed auth snapshot without changing the app-server API contract.
What changed:
AuthManagernow poisons the current managed auth snapshot in memory after a permanent refresh failure, keyed to the unchangedAuthDotJson.getAuthStatus(includeToken=true)now omitsauthTokenafter a permanent refresh failure instead of handing out the stale cached bearer token.This keeps the current auth method visible to clients, avoids forcing an immediate logout flow, and stops repeated refresh attempts for credentials that cannot recover.