Skip to content

login: add custom CA support for login flows#14178

Merged
joshka-oai merged 5 commits intomainfrom
joshka/login-custom-ca-rework
Mar 13, 2026
Merged

login: add custom CA support for login flows#14178
joshka-oai merged 5 commits intomainfrom
joshka/login-custom-ca-rework

Conversation

@joshka-oai
Copy link
Copy Markdown
Contributor

@joshka-oai joshka-oai commented Mar 10, 2026

Stacked PRs

This work is split across three stacked PRs:

Review order: #14178, then #14239, then #14240.

Supersedes #6864.

Thanks to @3axap4eHko for the original implementation and investigation here. Although this version rearranges the code and history significantly, the majority of the credit for this work belongs to them.

Problem

Login flows need to work in enterprise environments where outbound TLS is intercepted by an internal proxy or gateway. In those setups, system root certificates alone are often insufficient to validate the OAuth and device-code endpoints used during login. The change adds a login-specific custom CA loading path, but the important contracts around env precedence, PEM compatibility, test boundaries, and probe-only workarounds need to be explicit so reviewers can understand what behavior is intentional.

For users and operators, the behavior is simple: if login needs to trust a custom root CA, set CODEX_CA_CERTIFICATE to a PEM file containing one or more certificates. If that variable is unset, login falls back to SSL_CERT_FILE. If neither is set, login uses system roots. Invalid or empty PEM files now fail with an error that points back to those environment variables and explains how to recover.

What This Delivers

Users can now make Codex login work behind enterprise TLS interception by pointing CODEX_CA_CERTIFICATE at a PEM bundle containing the relevant root certificates. If that variable is unset, login falls back to SSL_CERT_FILE, then to system roots.

This PR applies that behavior to both browser-based and device-code login flows. It also makes login tolerant of the PEM shapes operators actually have in hand: multi-certificate bundles, OpenSSL TRUSTED CERTIFICATE labels, and bundles that include well-formed CRLs.

Mental model

codex-login is the place where the login flows construct ad hoc outbound HTTP clients. That makes it the right boundary for a narrow CA policy: look for CODEX_CA_CERTIFICATE, fall back to SSL_CERT_FILE, load every parseable certificate block in that bundle into a reqwest::Client, and fail early with a clear user-facing error if the bundle is unreadable or malformed.

The implementation is intentionally pragmatic about PEM input shape. It accepts ordinary certificate bundles, multi-certificate bundles, OpenSSL TRUSTED CERTIFICATE labels, and bundles that also contain CRLs. It does not validate a certificate chain or prove a handshake; it only constructs the root store used by login.

Non-goals

This change does not introduce a general-purpose transport abstraction for the rest of the product. It does not validate whether the provided bundle forms a real chain, and it does not add handshake-level integration tests against a live TLS server. It also does not change login state management or OAuth semantics beyond ensuring the existing flows share the same CA-loading rules.

Tradeoffs

The main tradeoff is keeping this logic scoped to login-specific client construction rather than lifting it into a broader shared HTTP layer. That keeps the review surface smaller, but it also means future login-adjacent code must continue to use build_login_http_client() or it can silently bypass enterprise CA overrides.

The TRUSTED CERTIFICATE handling is also intentionally a local compatibility shim. The rustls ecosystem does not currently accept that PEM label upstream, so the code normalizes it locally and trims the OpenSSL X509_AUX trailer bytes down to the certificate DER that reqwest can consume.

Architecture

custom_ca.rs is now the single place that owns login CA behavior. It selects the CA file from the environment, reads it, normalizes PEM label shape where needed, iterates mixed PEM sections with rustls-pki-types, ignores CRLs, trims OpenSSL trust metadata when necessary, and returns either a configured reqwest::Client or a typed error.

The browser login server and the device-code flow both call build_login_http_client(), so they share the same trust-store policy. Environment-sensitive tests run through the login_ca_probe helper binary because those tests must control process-wide env vars and cannot reliably build a real reqwest client in-process on macOS seatbelt runs.

Observability

The custom CA path logs which environment variable selected the bundle, which file path was loaded, how many certificates were accepted, when TRUSTED CERTIFICATE labels were normalized, when CRLs were ignored, and where client construction failed. Returned errors remain user-facing and include the relevant path, env var, and remediation hint.

This gives enough signal for three audiences:

  • users can see why login failed and which env/file caused it
  • sysadmins can confirm which override actually won
  • developers can tell whether the failure happened during file read, PEM parsing, certificate registration, or final reqwest client construction

Tests

Pure unit tests stay limited to env precedence and empty-value handling. Real client construction lives in subprocess tests so the suite remains hermetic with respect to process env and macOS sandbox behavior.

The subprocess tests verify:

  • CODEX_CA_CERTIFICATE precedence over SSL_CERT_FILE
  • fallback to SSL_CERT_FILE
  • single-certificate and multi-certificate bundles
  • malformed and empty-bundle errors
  • OpenSSL TRUSTED CERTIFICATE handling
  • CRL tolerance for well-formed CRL sections

The named PEM fixtures under login/tests/fixtures/ are shared by the tests so their purpose stays reviewable.

@joshka-oai joshka-oai force-pushed the joshka/login-custom-ca-rework branch 2 times, most recently from 8e585bd to a1844fe Compare March 10, 2026 04:57
Copy link
Copy Markdown
Contributor Author

@joshka-oai joshka-oai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison to the original PR (#6864), mainly to make review easier for @3axap4eHko:

The main behavioral intent is still the same: login can use a custom CA bundle via CODEX_CA_CERTIFICATE, with fallback to SSL_CERT_FILE, and the login flows share that client-construction path.

The biggest structural change is that most of the CA-specific code now lives in a dedicated custom_ca.rs module instead of adding a few hundred lines to server.rs. That was mainly to keep the login server code focused on the server flow and make the CA-loading logic more additive and easier to review in isolation.

Relative to the original PR, this version also leans harder into robustness and observability:

  • the PEM fixtures were moved into named files and made shared across tests
  • custom CA parsing now has a typed error surface and clearer user-facing failure messages
  • the CA-loading path logs enough context to diagnose which env var/file won and where loading/building failed
  • TRUSTED CERTIFICATE handling is documented as a compatibility shim, with rationale and upstream context
  • CRLs are handled via rustls-pki-types mixed-section parsing rather than broad text stripping
  • env-sensitive tests were isolated into subprocess tests, and the macOS sandbox/proxy panic was contained to the probe path so the tests stay focused on CA behavior

So from my perspective, the main delta versus #6864 is not a change in product intent so much as a reorganization around reviewability, clearer contracts, and more defensive behavior around real-world CA bundles and test environments.

Most of the credit for the actual feature still belongs to @3axap4eHko; this version rearranges and hardens the work substantially, but it is still building on the original implementation and review feedback.

@joshka-oai joshka-oai force-pushed the joshka/login-custom-ca-rework branch from a1844fe to e3487e3 Compare March 10, 2026 05:02
@joshka-oai joshka-oai force-pushed the joshka/login-custom-ca-rework branch from e3487e3 to 3dcb290 Compare March 10, 2026 19:01
@joshka-oai joshka-oai changed the title login: extract custom CA handling into dedicated module Mar 10, 2026
@joshka-oai joshka-oai force-pushed the joshka/login-custom-ca-rework branch from 3dcb290 to 08a39d7 Compare March 10, 2026 19:33
@3axap4eHko
Copy link
Copy Markdown
Contributor

3axap4eHko commented Mar 11, 2026

Thanks for the rework. I re-compared this against #6864, and the main things that still feel lost from the old PR are:

  1. The subprocess probe no longer exercises the exact production builder path. login_ca_probe now goes through probe_support::build_login_http_client(), which uses .no_proxy() specifically for tests. I understand why that exists, but it does mean the subprocess coverage is no longer validating the same reqwest builder configuration that production login uses. fix: add custom CA certificate support for login flows #6864's probe was closer to "exercise the real path in a subprocess".

  2. TRUSTED CERTIFICATE support now depends on hand-rolled DER boundary parsing (first_der_item / der_item_length) to trim trailing X509_AUX bytes. That is a fair bit more bespoke than fix: add custom CA certificate support for login flows #6864. If this is going to stay, it would be good to justify it with the exact real-world inputs it was validated against, because this is the kind of parsing logic that is easy to carry forward without much confidence.

  3. CRL tolerance is weaker than it was in fix: add custom CA certificate support for login flows #6864. The current code ignores well-formed X509 CRL sections, but it still fails the whole bundle if a CRL section is malformed. The old strip_pem_block approach was more forgiving there: malformed CRLs were removed before PEM parsing, so valid certs in the same bundle could still load. For the enterprise-bundle use case, that more permissive behavior still seems desirable.

One note so this doesn’t send anyone on the wrong trail: the env-scrubbing issue from #6864 looks already fixed on the current tip. run_probe() now removes inherited CODEX_CA_CERTIFICATE / SSL_CERT_FILE before applying test-specific values.

@joshka-oai
Copy link
Copy Markdown
Contributor Author

Thank you for your review. It's exactly what I needed on this (and very appreciated). I'll take a look at the mentioned concerns tomorrow morning. Are you able to confirm that this works in your environment? It's difficult to anticipate which of the mentioned inconsistencies will lead to failure and which will mostly just work fine as runtime configuration of these proxies is fairly bespoke in the real world.

@joshka-oai
Copy link
Copy Markdown
Contributor Author

joshka-oai commented Mar 11, 2026

Thanks, this is helpful. I think the three points break down a bit differently.

On the subprocess probe using .no_proxy()

That was an intentional test-only workaround, not a change to the production login path.

The issue we hit was that the subprocess tests could fail before the CA code under test reported either success or a structured error, because on macOS seatbelt runs reqwest::Client::builder().build() can panic while probing platform proxy settings. That matters in practice for us because these tests are often run from inside Codex, where spawned processes commonly end up under seatbelt.

I tried to document that at the owner level here:

So yes, this is weaker than “exercise the exact production builder in a subprocess,” but that tradeoff was deliberate: without disabling proxy autodetection in the probe, these tests were not reliably testing CA behavior at all.

On the bespoke TRUSTED CERTIFICATE handling

Agreed this is more custom than the earlier version, but it is there to handle a real OpenSSL PEM variant rather than an invented edge case.

I added the rationale and upstream references directly in the code:

This is also covered by a real OpenSSL-generated fixture in the repo rather than a hand-written approximation:

In particular, this points at:

The intent here was to stay narrow: normalize the label, trim off trailing OpenSSL X509_AUX trust metadata, and hand only the certificate DER to reqwest.

On malformed CRLs

I think this is the most valid open point, but I’m comfortable treating it as a follow-up if we find it matters in practice.

Right now the code ignores well-formed CRLs, but a malformed CRL can still fail the bundle before we classify that section as ignorable. I documented that as a known limitation here:

My take is that this is a reasonable place to fail closed for now. This is a security-sensitive root-loading path, and in the normal case we should be able to expect CA bundles provided by system administrators to contain well-formed items. If that assumption turns out to be wrong in the field, making CRL handling more permissive is a two-way door and we can relax it later with concrete examples, but I don’t think we need to preemptively broaden acceptance of malformed mixed PEM input now.

@joshka-oai joshka-oai requested review from bolinfest and gpeal and removed request for bolinfest March 11, 2026 21:47
@joshka-oai joshka-oai enabled auto-merge (squash) March 12, 2026 23:00
3axap4eHko and others added 5 commits March 12, 2026 16:19
Enterprise TLS inspection proxies use custom roots, so OAuth
token exchanges fail when we only trust system CAs. This
change switches PEM parsing to rustls-pki-types (multi-cert
bundles included) and surfaces clearer, user-facing errors
that explain how to fix invalid or empty CA files via
CODEX_CA_CERTIFICATE/SSL_CERT_FILE.

To avoid cross-test races with process-wide env vars, CA path
selection now uses a small EnvSource abstraction in unit
tests, and environment-dependent behavior is verified via an
assert_cmd-driven login_ca_probe helper binary. This keeps
normal tests isolated while still validating env precedence
and error messaging.

Also updates login dev-deps (assert_cmd/pretty_assertions),
removes serial_test, and re-exports build_login_http_client
for the probe helper.

Co-authored-by: Codex <noreply@openai.com>
Route login's reqwest client construction through a shared custom CA
loader so browser and device-code authentication can honor
CODEX_CA_CERTIFICATE and SSL_CERT_FILE. This supports
multi-certificate bundles, OpenSSL TRUSTED CERTIFICATE
compatibility, CRL skipping, and named PEM fixtures used by
subprocess integration tests.

Co-authored-by: Codex <noreply@openai.com>
Route the spawned login_ca_probe binary through a hidden
probe_support entry point that disables reqwest proxy
autodetection. This keeps the subprocess CA tests focused on
custom bundle loading instead of the system-configuration panic
that can happen under macOS seatbelt, and documents why the
workaround stays out of the normal public API.

Co-authored-by: Codex <noreply@openai.com>
@joshka-oai joshka-oai force-pushed the joshka/login-custom-ca-rework branch from 08a39d7 to 334d227 Compare March 12, 2026 23:34
joshka-oai added a commit that referenced this pull request Mar 12, 2026
## Stacked PRs

This work is split across three stacked PRs:

- #14178: add custom CA support for browser and device-code login flows,
docs, and hermetic subprocess tests
- #14239: broaden the shared custom CA path from login to other outbound
`reqwest` clients across Codex
- #14240: extend that shared custom CA handling to secure websocket TLS
so websocket connections honor the same CA env vars

Review order: #14178, then #14239, then #14240.

Builds on top of #14239, which itself builds on #14178.

## Problem

The shared custom-CA path covered `reqwest` clients after #14239, but
secure websocket connections still used tungstenite's default TLS
connector. In enterprise MITM setups, that meant HTTPS requests could
succeed while websocket connections still failed because they were not
loading the same custom root CA bundle.

## What This Delivers

Secure websocket connections now honor the same custom CA configuration
as the HTTPS clients introduced in the earlier stacked PRs. After this
lands, setups that already work for HTTPS behind an intercepting proxy
can also work for websocket-backed features instead of failing later
during websocket TLS setup.

For users and operators, the configuration does not change:
`CODEX_CA_CERTIFICATE` wins, then `SSL_CERT_FILE`, then system roots.
The difference is that websocket TLS now participates in that same
policy.

## Mental model

There is now one shared custom-CA policy for both HTTPS and secure
websocket connections.

`reqwest` callers continue to use
`build_reqwest_client_with_custom_ca(...)`.
Websocket callers now ask `codex-client` for a rustls client config when
a custom CA bundle is configured, then pass that config into tungstenite
explicitly.

The env precedence remains the same:
- `CODEX_CA_CERTIFICATE` wins
- otherwise fall back to `SSL_CERT_FILE`
- otherwise use system roots

## Non-goals

This does not add a live end-to-end TLS interception test. It does not
change the fallback behavior when no custom CA env var is set. It also
does not try to generalize all websocket transport concerns into
`codex-client`; it only extends the shared CA-loading policy to the
websocket TLS boundary.

## Tradeoffs

The main tradeoff is that websocket callers now have an explicit shared
rustls-config path only when a custom CA bundle is configured. That
keeps the normal no-override path simple, but it means the websocket
implementation still has two TLS setup paths: default connector when no
override is set, explicit connector when it is.

This PR also renames the shared CA error type to match reality. The
error is no longer reqwest-specific, because the same CA-loading
failures now surface from websocket TLS configuration too.

## Architecture

`codex-client::custom_ca` now exposes an optional rustls client-config
builder alongside the existing reqwest client builder.

The websocket clients in `codex-api` now consume that shared config:
- responses websocket connections
- realtime websocket connections

When no custom CA env var is set, websocket callers still use their
ordinary default connector path. When a custom CA bundle is configured,
they build an explicit rustls connector with the same roots that the
shared HTTPS path uses.

## Observability

The websocket path now inherits the same CA-loading logs and user-facing
errors as the shared HTTPS path. Failures to read, parse, or register
custom CA certificates are surfaced before websocket TLS is attempted,
instead of failing later as an opaque certificate-validation problem.

## Tests

This PR relies on the existing `codex-client` CA tests plus the
websocket-focused suites in `codex-api` and `codex-core`.

Automated coverage run for this stack:
- `cargo test -p codex-client -p codex-api`
- `cargo test -p codex-core websocket_fallback -- --nocapture`

Manual validation:
- `CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem
HTTPS_PROXY=http://127.0.0.1:8080 just codex`
- with `mitmproxy` installed via `brew install mitmproxy`
- and mitmproxy also configured as the system proxy in macOS Wi-Fi
settings

That manual check was specifically useful because it exercises the
websocket path behind a real intercepting proxy instead of only the
reqwest HTTPS path.

Co-authored-by: Codex <noreply@openai.com>
@joshka-oai joshka-oai merged commit 76d8d17 into main Mar 13, 2026
52 of 54 checks passed
@joshka-oai joshka-oai deleted the joshka/login-custom-ca-rework branch March 13, 2026 00:14
@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

3 participants