Skip to content

chore: stop app-server auth refresh storms after permanent token failure#15530

Merged
celia-oai merged 8 commits intomainfrom
dev/cc/token-refresh
Mar 24, 2026
Merged

chore: stop app-server auth refresh storms after permanent token failure#15530
celia-oai merged 8 commits intomainfrom
dev/cc/token-refresh

Conversation

@celia-oai
Copy link
Copy Markdown
Collaborator

@celia-oai celia-oai commented Mar 23, 2026

built from #14256. PR description from @etraut-openai:

This PR addresses a hole in PR 11802. The previous PR assumed that app server clients would respond to token refresh failures by presenting the user with an error ("you must log in again") and then not making further attempts to call network endpoints using the expired token. While they do present the user with this error, they don't prevent further attempts to call network endpoints and can repeatedly call getAuthStatus(refreshToken=true) resulting in many failed calls to the token refresh endpoint.

There are three solutions I considered here:

  1. Change the getAuthStatus app server call to return a null auth if the caller specified "refreshToken" on input and the refresh attempt fails. This will cause clients to immediately log out the user and return them to the log in screen. This is a really bad user experience. It's also a breaking change in the app server contract that could break third-party clients.
  2. Augment the getAuthStatus app server call to return an additional field that indicates the state of "token could not be refreshed". This is a non-breaking change to the app server API, but it requires non-trivial changes for all clients to properly handle this new field properly.
  3. Change the getAuthStatus implementation to handle the case where a token refresh fails by marking the AuthManager's in-memory access and refresh tokens as "poisoned" so it they are no longer used. This is the simplest fix that requires no client changes.

I chose option 3.

Here's Codex's explanation of this change:

When an app-server client asks getAuthStatus(refreshToken=true), we may try to refresh a stale ChatGPT access token. If that refresh fails permanently (for example refresh_token_reused, expired, or revoked), the old behavior was bad in two ways:

  1. We kept the in-memory auth snapshot alive as if it were still usable.
  2. Later auth checks could retry refresh again and again, creating a storm of doomed /oauth/token requests and repeatedly surfacing the same failure.

This is especially painful for app-server clients because they poll auth status and can keep driving the refresh path without any real chance of recovery.

This change makes permanent refresh failures terminal for the current managed auth snapshot without changing the app-server API contract.

What changed:

  • AuthManager now poisons the current managed auth snapshot in memory after a permanent refresh failure, keyed to the unchanged AuthDotJson.
  • Once poisoned, later refresh attempts for that same snapshot fail fast locally without calling the auth service again.
  • The poison is cleared automatically when auth materially changes, such as a new login, logout, or reload of different auth state from storage.
  • getAuthStatus(includeToken=true) now omits authToken after a permanent refresh failure instead of handing out the stale cached bearer token.

This keeps the current auth method visible to clients, avoids forcing an immediate logout flow, and stops repeated refresh attempts for credentials that cannot recover.

@celia-oai celia-oai changed the title Dev/cc/token refresh Mar 23, 2026
@celia-oai celia-oai changed the title chore: Stop app-server auth refresh storms after permanent token failure Mar 23, 2026
@celia-oai celia-oai force-pushed the dev/cc/token-refresh branch from c0f9be3 to 91850c2 Compare March 23, 2026 20:03
@celia-oai celia-oai requested a review from pakrym-oai March 23, 2026 21:18
@celia-oai celia-oai marked this pull request as ready for review March 23, 2026 21:19
@celia-oai
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 163231a64e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@celia-oai celia-oai force-pushed the dev/cc/token-refresh branch 2 times, most recently from 8076ec9 to 41696c8 Compare March 23, 2026 22:46
@celia-oai
Copy link
Copy Markdown
Collaborator Author

@codex review

2 similar comments
@celia-oai
Copy link
Copy Markdown
Collaborator Author

@codex review

@celia-oai
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41696c897b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@celia-oai celia-oai force-pushed the dev/cc/token-refresh branch from 41696c8 to 40f642f Compare March 23, 2026 23:29
@celia-oai
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown
Contributor

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@celia-oai celia-oai requested a review from pakrym-oai March 24, 2026 17:52
}

#[derive(Clone)]
struct AuthScopedRefreshFailure {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we store this as the value of auth: Option<CodexAuth>, ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodexAuth is strictly identity related so hopefully we leave it as-is

error: &RefreshTokenFailedError,
) {
if let Ok(mut guard) = self.inner.write() {
let current_auth_matches =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this condition? The failure encodes the auth and it's compared on read.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still useful, just in case auth actually gets reloaded we don't record a false positive of permanent error

if let Ok(mut guard) = self.inner.write() {
let previous = guard.auth.as_ref();
let changed = !AuthManager::auths_equal(previous, new_auth.as_ref());
let auth_changed_for_refresh =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing failure along the main auth might naturally perform the cleaning.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex raises a good point: permanent refresh failure is manager-owned state, not part of the auth or identity model itself. That makes the manager the right place to track it. Pushing it down into the auth layer is not a clean fit, because that layer has a broader surface area and more code paths that would need to preserve and propagate the state correctly.

@celia-oai celia-oai merged commit 88694e8 into main Mar 24, 2026
36 checks passed
@celia-oai celia-oai deleted the dev/cc/token-refresh branch March 24, 2026 19:40
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

3 participants