login: add custom CA support for login flows#14178
Conversation
8e585bd to
a1844fe
Compare
joshka-oai
left a comment
There was a problem hiding this comment.
Comparison to the original PR (#6864), mainly to make review easier for @3axap4eHko:
The main behavioral intent is still the same: login can use a custom CA bundle via CODEX_CA_CERTIFICATE, with fallback to SSL_CERT_FILE, and the login flows share that client-construction path.
The biggest structural change is that most of the CA-specific code now lives in a dedicated custom_ca.rs module instead of adding a few hundred lines to server.rs. That was mainly to keep the login server code focused on the server flow and make the CA-loading logic more additive and easier to review in isolation.
Relative to the original PR, this version also leans harder into robustness and observability:
- the PEM fixtures were moved into named files and made shared across tests
- custom CA parsing now has a typed error surface and clearer user-facing failure messages
- the CA-loading path logs enough context to diagnose which env var/file won and where loading/building failed
TRUSTED CERTIFICATEhandling is documented as a compatibility shim, with rationale and upstream context- CRLs are handled via
rustls-pki-typesmixed-section parsing rather than broad text stripping - env-sensitive tests were isolated into subprocess tests, and the macOS sandbox/proxy panic was contained to the probe path so the tests stay focused on CA behavior
So from my perspective, the main delta versus #6864 is not a change in product intent so much as a reorganization around reviewability, clearer contracts, and more defensive behavior around real-world CA bundles and test environments.
Most of the credit for the actual feature still belongs to @3axap4eHko; this version rearranges and hardens the work substantially, but it is still building on the original implementation and review feedback.
a1844fe to
e3487e3
Compare
e3487e3 to
3dcb290
Compare
3dcb290 to
08a39d7
Compare
|
Thanks for the rework. I re-compared this against #6864, and the main things that still feel lost from the old PR are:
One note so this doesn’t send anyone on the wrong trail: the env-scrubbing issue from #6864 looks already fixed on the current tip. |
|
Thank you for your review. It's exactly what I needed on this (and very appreciated). I'll take a look at the mentioned concerns tomorrow morning. Are you able to confirm that this works in your environment? It's difficult to anticipate which of the mentioned inconsistencies will lead to failure and which will mostly just work fine as runtime configuration of these proxies is fairly bespoke in the real world. |
|
Thanks, this is helpful. I think the three points break down a bit differently.
That was an intentional test-only workaround, not a change to the production login path. The issue we hit was that the subprocess tests could fail before the CA code under test reported either success or a structured error, because on macOS seatbelt runs I tried to document that at the owner level here:
So yes, this is weaker than “exercise the exact production builder in a subprocess,” but that tradeoff was deliberate: without disabling proxy autodetection in the probe, these tests were not reliably testing CA behavior at all.
Agreed this is more custom than the earlier version, but it is there to handle a real OpenSSL PEM variant rather than an invented edge case. I added the rationale and upstream references directly in the code:
This is also covered by a real OpenSSL-generated fixture in the repo rather than a hand-written approximation:
In particular, this points at:
The intent here was to stay narrow: normalize the label, trim off trailing OpenSSL
I think this is the most valid open point, but I’m comfortable treating it as a follow-up if we find it matters in practice. Right now the code ignores well-formed CRLs, but a malformed CRL can still fail the bundle before we classify that section as ignorable. I documented that as a known limitation here: My take is that this is a reasonable place to fail closed for now. This is a security-sensitive root-loading path, and in the normal case we should be able to expect CA bundles provided by system administrators to contain well-formed items. If that assumption turns out to be wrong in the field, making CRL handling more permissive is a two-way door and we can relax it later with concrete examples, but I don’t think we need to preemptively broaden acceptance of malformed mixed PEM input now. |
Enterprise TLS inspection proxies use custom roots, so OAuth token exchanges fail when we only trust system CAs. This change switches PEM parsing to rustls-pki-types (multi-cert bundles included) and surfaces clearer, user-facing errors that explain how to fix invalid or empty CA files via CODEX_CA_CERTIFICATE/SSL_CERT_FILE. To avoid cross-test races with process-wide env vars, CA path selection now uses a small EnvSource abstraction in unit tests, and environment-dependent behavior is verified via an assert_cmd-driven login_ca_probe helper binary. This keeps normal tests isolated while still validating env precedence and error messaging. Also updates login dev-deps (assert_cmd/pretty_assertions), removes serial_test, and re-exports build_login_http_client for the probe helper. Co-authored-by: Codex <noreply@openai.com>
Route login's reqwest client construction through a shared custom CA loader so browser and device-code authentication can honor CODEX_CA_CERTIFICATE and SSL_CERT_FILE. This supports multi-certificate bundles, OpenSSL TRUSTED CERTIFICATE compatibility, CRL skipping, and named PEM fixtures used by subprocess integration tests. Co-authored-by: Codex <noreply@openai.com>
Route the spawned login_ca_probe binary through a hidden probe_support entry point that disables reqwest proxy autodetection. This keeps the subprocess CA tests focused on custom bundle loading instead of the system-configuration panic that can happen under macOS seatbelt, and documents why the workaround stays out of the normal public API. Co-authored-by: Codex <noreply@openai.com>
08a39d7 to
334d227
Compare
## Stacked PRs This work is split across three stacked PRs: - #14178: add custom CA support for browser and device-code login flows, docs, and hermetic subprocess tests - #14239: broaden the shared custom CA path from login to other outbound `reqwest` clients across Codex - #14240: extend that shared custom CA handling to secure websocket TLS so websocket connections honor the same CA env vars Review order: #14178, then #14239, then #14240. Builds on top of #14239, which itself builds on #14178. ## Problem The shared custom-CA path covered `reqwest` clients after #14239, but secure websocket connections still used tungstenite's default TLS connector. In enterprise MITM setups, that meant HTTPS requests could succeed while websocket connections still failed because they were not loading the same custom root CA bundle. ## What This Delivers Secure websocket connections now honor the same custom CA configuration as the HTTPS clients introduced in the earlier stacked PRs. After this lands, setups that already work for HTTPS behind an intercepting proxy can also work for websocket-backed features instead of failing later during websocket TLS setup. For users and operators, the configuration does not change: `CODEX_CA_CERTIFICATE` wins, then `SSL_CERT_FILE`, then system roots. The difference is that websocket TLS now participates in that same policy. ## Mental model There is now one shared custom-CA policy for both HTTPS and secure websocket connections. `reqwest` callers continue to use `build_reqwest_client_with_custom_ca(...)`. Websocket callers now ask `codex-client` for a rustls client config when a custom CA bundle is configured, then pass that config into tungstenite explicitly. The env precedence remains the same: - `CODEX_CA_CERTIFICATE` wins - otherwise fall back to `SSL_CERT_FILE` - otherwise use system roots ## Non-goals This does not add a live end-to-end TLS interception test. It does not change the fallback behavior when no custom CA env var is set. It also does not try to generalize all websocket transport concerns into `codex-client`; it only extends the shared CA-loading policy to the websocket TLS boundary. ## Tradeoffs The main tradeoff is that websocket callers now have an explicit shared rustls-config path only when a custom CA bundle is configured. That keeps the normal no-override path simple, but it means the websocket implementation still has two TLS setup paths: default connector when no override is set, explicit connector when it is. This PR also renames the shared CA error type to match reality. The error is no longer reqwest-specific, because the same CA-loading failures now surface from websocket TLS configuration too. ## Architecture `codex-client::custom_ca` now exposes an optional rustls client-config builder alongside the existing reqwest client builder. The websocket clients in `codex-api` now consume that shared config: - responses websocket connections - realtime websocket connections When no custom CA env var is set, websocket callers still use their ordinary default connector path. When a custom CA bundle is configured, they build an explicit rustls connector with the same roots that the shared HTTPS path uses. ## Observability The websocket path now inherits the same CA-loading logs and user-facing errors as the shared HTTPS path. Failures to read, parse, or register custom CA certificates are surfaced before websocket TLS is attempted, instead of failing later as an opaque certificate-validation problem. ## Tests This PR relies on the existing `codex-client` CA tests plus the websocket-focused suites in `codex-api` and `codex-core`. Automated coverage run for this stack: - `cargo test -p codex-client -p codex-api` - `cargo test -p codex-core websocket_fallback -- --nocapture` Manual validation: - `CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem HTTPS_PROXY=http://127.0.0.1:8080 just codex` - with `mitmproxy` installed via `brew install mitmproxy` - and mitmproxy also configured as the system proxy in macOS Wi-Fi settings That manual check was specifically useful because it exercises the websocket path behind a real intercepting proxy instead of only the reqwest HTTPS path. Co-authored-by: Codex <noreply@openai.com>
Stacked PRs
This work is split across three stacked PRs:
reqwestclients across CodexReview order: #14178, then #14239, then #14240.
Supersedes #6864.
Thanks to @3axap4eHko for the original implementation and investigation here. Although this version rearranges the code and history significantly, the majority of the credit for this work belongs to them.
Problem
Login flows need to work in enterprise environments where outbound TLS is intercepted by an internal proxy or gateway. In those setups, system root certificates alone are often insufficient to validate the OAuth and device-code endpoints used during login. The change adds a login-specific custom CA loading path, but the important contracts around env precedence, PEM compatibility, test boundaries, and probe-only workarounds need to be explicit so reviewers can understand what behavior is intentional.
For users and operators, the behavior is simple: if login needs to trust a custom root CA, set
CODEX_CA_CERTIFICATEto a PEM file containing one or more certificates. If that variable is unset, login falls back toSSL_CERT_FILE. If neither is set, login uses system roots. Invalid or empty PEM files now fail with an error that points back to those environment variables and explains how to recover.What This Delivers
Users can now make Codex login work behind enterprise TLS interception by pointing
CODEX_CA_CERTIFICATEat a PEM bundle containing the relevant root certificates. If that variable is unset, login falls back toSSL_CERT_FILE, then to system roots.This PR applies that behavior to both browser-based and device-code login flows. It also makes login tolerant of the PEM shapes operators actually have in hand: multi-certificate bundles, OpenSSL
TRUSTED CERTIFICATElabels, and bundles that include well-formed CRLs.Mental model
codex-loginis the place where the login flows construct ad hoc outbound HTTP clients. That makes it the right boundary for a narrow CA policy: look forCODEX_CA_CERTIFICATE, fall back toSSL_CERT_FILE, load every parseable certificate block in that bundle into areqwest::Client, and fail early with a clear user-facing error if the bundle is unreadable or malformed.The implementation is intentionally pragmatic about PEM input shape. It accepts ordinary certificate bundles, multi-certificate bundles, OpenSSL
TRUSTED CERTIFICATElabels, and bundles that also contain CRLs. It does not validate a certificate chain or prove a handshake; it only constructs the root store used by login.Non-goals
This change does not introduce a general-purpose transport abstraction for the rest of the product. It does not validate whether the provided bundle forms a real chain, and it does not add handshake-level integration tests against a live TLS server. It also does not change login state management or OAuth semantics beyond ensuring the existing flows share the same CA-loading rules.
Tradeoffs
The main tradeoff is keeping this logic scoped to login-specific client construction rather than lifting it into a broader shared HTTP layer. That keeps the review surface smaller, but it also means future login-adjacent code must continue to use
build_login_http_client()or it can silently bypass enterprise CA overrides.The
TRUSTED CERTIFICATEhandling is also intentionally a local compatibility shim. The rustls ecosystem does not currently accept that PEM label upstream, so the code normalizes it locally and trims the OpenSSLX509_AUXtrailer bytes down to the certificate DER thatreqwestcan consume.Architecture
custom_ca.rsis now the single place that owns login CA behavior. It selects the CA file from the environment, reads it, normalizes PEM label shape where needed, iterates mixed PEM sections withrustls-pki-types, ignores CRLs, trims OpenSSL trust metadata when necessary, and returns either a configuredreqwest::Clientor a typed error.The browser login server and the device-code flow both call
build_login_http_client(), so they share the same trust-store policy. Environment-sensitive tests run through thelogin_ca_probehelper binary because those tests must control process-wide env vars and cannot reliably build a real reqwest client in-process on macOS seatbelt runs.Observability
The custom CA path logs which environment variable selected the bundle, which file path was loaded, how many certificates were accepted, when
TRUSTED CERTIFICATElabels were normalized, when CRLs were ignored, and where client construction failed. Returned errors remain user-facing and include the relevant path, env var, and remediation hint.This gives enough signal for three audiences:
Tests
Pure unit tests stay limited to env precedence and empty-value handling. Real client construction lives in subprocess tests so the suite remains hermetic with respect to process env and macOS sandbox behavior.
The subprocess tests verify:
CODEX_CA_CERTIFICATEprecedence overSSL_CERT_FILESSL_CERT_FILETRUSTED CERTIFICATEhandlingThe named PEM fixtures under
login/tests/fixtures/are shared by the tests so their purpose stays reviewable.