Route deepseek-v4-flash to Fireworks when DeepSeek API is unhealthy #745

Open: jahooma wants to merge 1 commit into main from deepseek-fireworks-fallback

@jahooma (Contributor) commented May 24, 2026

Summary

  • Adds Fireworks as a transparent fallback for deepseek-v4-flash (accounts/fireworks/models/deepseek-v4-flash).
  • New passive circuit breaker (deepseek-health.ts): 3 failures within a 60s window open the circuit for 5 min; after that, the next request probes DeepSeek and resets the breaker on success. There is no background polling: every user request is the probe, so all pods converge naturally.
  • Tighter 60s headersTimeout for the Flash undici agent so dead-API requests fail fast instead of hanging on the existing 30-min default (kept for reasoning models on v4-pro).
  • _post.ts routes to Fireworks when the circuit is open, plus inline pre-stream failover so the first user to hit an outage also gets a Fireworks response instead of an error.
  • Pricing entry in FIREWORKS_PRICING_MAP: 0.14 / 0.03 / 0.28 per M tokens (input / cached / output).
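
To make the pricing bullet concrete, here is a minimal sketch of how the three rates could turn into a per-request cost. The `TokenPricing` and `Usage` shapes and the `estimateCostUsd` helper are hypothetical and are not taken from FIREWORKS_PRICING_MAP; the sketch also assumes the rates are in USD and that cached tokens are counted inside the input total.

```typescript
// Hypothetical shape; the real FIREWORKS_PRICING_MAP entry may use other field names.
interface TokenPricing {
  inputPerM: number;  // price per 1M uncached input tokens (USD assumed)
  cachedPerM: number; // price per 1M cached input tokens
  outputPerM: number; // price per 1M output tokens
}

// Rates quoted in the summary: 0.14 / 0.03 / 0.28 per M tokens.
const deepseekV4FlashOnFireworks: TokenPricing = {
  inputPerM: 0.14,
  cachedPerM: 0.03,
  outputPerM: 0.28,
};

interface Usage {
  inputTokens: number;       // total input tokens, cached ones included (assumption)
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

function estimateCostUsd(pricing: TokenPricing, usage: Usage): number {
  const uncached = usage.inputTokens - usage.cachedInputTokens;
  const total =
    uncached * pricing.inputPerM +
    usage.cachedInputTokens * pricing.cachedPerM +
    usage.outputTokens * pricing.outputPerM;
  return total / 1_000_000;
}

// 600k uncached + 400k cached input tokens and 200k output tokens: roughly 0.152.
const exampleCost = estimateCostUsd(deepseekV4FlashOnFireworks, {
  inputTokens: 1_000_000,
  cachedInputTokens: 400_000,
  outputTokens: 200_000,
});
console.log(exampleCost.toFixed(3));
```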

How it works

  1. createDeepSeekRequestTracked wraps the DeepSeek fetch. Network errors, timeouts, 5xx/408/429 → recordDeepSeekFailure(). 2xx → recordDeepSeekSuccess() (clears state).
  2. When recentFailures.length >= 3 within the 60s window, openUntil = now + 5min.
  3. Routing in _post.ts calls shouldBypassDeepSeek(model) and, if true, sets useDeepSeek = false so the existing Fireworks branch picks up the same model id (now in FIREWORKS_MODEL_MAP).
  4. After cooldown expires, the next request retries DeepSeek directly. Success resets; another failure re-opens.
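
The steps above can be sketched as a small in-memory breaker. This is an illustration under stated assumptions, not the contents of deepseek-health.ts: the `PassiveBreaker` class, its method names, and the injectable clock are hypothetical, and treating a failed probe after cooldown as an immediate re-open is one reading of step 4.

```typescript
const FAILURE_THRESHOLD = 3;      // failures needed to open the circuit
const FAILURE_WINDOW_MS = 60_000; // 60s sliding window
const COOLDOWN_MS = 5 * 60_000;   // circuit stays open for 5 min

// Outage signals named in step 1: 5xx, 408, 429.
function isOutageStatus(status: number): boolean {
  return (status >= 500 && status <= 599) || status === 408 || status === 429;
}

// Hypothetical class; the real module exposes functions such as
// recordDeepSeekFailure() and shouldBypassDeepSeek(model).
class PassiveBreaker {
  private recentFailures: number[] = [];
  private openUntil = 0;
  private now: () => number;

  // The clock is injectable so tests can control time (an assumption of this sketch).
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  recordFailure(): void {
    const t = this.now();
    // A failure after the cooldown has expired is a failed probe.
    const failedProbe = this.openUntil !== 0 && t >= this.openUntil;
    // Drop failures that fell out of the 60s window, then record this one.
    this.recentFailures = this.recentFailures.filter((ts) => t - ts < FAILURE_WINDOW_MS);
    this.recentFailures.push(t);
    if (failedProbe || this.recentFailures.length >= FAILURE_THRESHOLD) {
      this.openUntil = t + COOLDOWN_MS;
    }
  }

  recordSuccess(): void {
    // A 2xx clears all state.
    this.recentFailures = [];
    this.openUntil = 0;
  }

  shouldBypass(): boolean {
    // True only while the cooldown is running; afterwards the next request probes.
    return this.now() < this.openUntil;
  }
}
```

Because the breaker only reacts to real request outcomes, it needs no timers or health-check traffic; the trade-off is that a pod with no traffic learns nothing until its next request.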

Test plan

  • Unit tests for circuit breaker and outage classifier (11 new tests in deepseek-health.test.ts, all passing).
  • Existing fireworks-deployment and fireworks-health test suites still pass (75 llm-api tests total).
  • bun run typecheck clean for changed files (pre-existing SDK errors unrelated).
  • Manual verification once deployed: confirm deepseek/deepseek-v4-flash calls succeed via DeepSeek when healthy and via Fireworks once the breaker opens (induce by temporarily pointing DEEPSEEK_BASE_URL at a sink, or watch real outage logs).
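
The behavior that the manual check exercises (DeepSeek when healthy, Fireworks once the breaker opens, and an inline retry for the first request that hits an outage) could look roughly like the sketch below. Everything here is hypothetical: `routeWithFailover`, the `Health` interface, and the `Upstream` callbacks stand in for the real logic in _post.ts, and the upstream calls are reduced to functions that resolve with an HTTP status before any streaming starts.

```typescript
type Provider = "deepseek" | "fireworks";

interface RouteResult {
  provider: Provider;
  status: number;
}

// Hypothetical stand-in for an upstream call: resolves with the HTTP status,
// rejects on a network error or timeout.
type Upstream = () => Promise<number>;

// Hypothetical view of the breaker used by the router.
interface Health {
  shouldBypass(): boolean;
  recordFailure(): void;
  recordSuccess(): void;
}

function isOutageStatus(status: number): boolean {
  return (status >= 500 && status <= 599) || status === 408 || status === 429;
}

async function routeWithFailover(
  health: Health,
  callDeepSeek: Upstream,
  callFireworks: Upstream,
): Promise<RouteResult> {
  // Circuit open: skip DeepSeek entirely.
  if (health.shouldBypass()) {
    return { provider: "fireworks", status: await callFireworks() };
  }

  let status: number | undefined;
  try {
    status = await callDeepSeek();
  } catch {
    // Network error or timeout: handled as an outage below.
  }

  if (status !== undefined && !isOutageStatus(status)) {
    if (status >= 200 && status < 300) health.recordSuccess();
    return { provider: "deepseek", status };
  }

  // Pre-stream failover: feed the breaker, then answer this request from Fireworks.
  health.recordFailure();
  return { provider: "fireworks", status: await callFireworks() };
}
```

Failing over only before the stream starts keeps the response coherent: once tokens have been sent to the client, switching providers mid-response would mix two generations.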

🤖 Generated with Claude Code

Adds Fireworks as a transparent fallback for deepseek-v4-flash, gated by
a passive circuit breaker so we only divert when the official DeepSeek
API actually misbehaves.

- New deepseek-health.ts circuit breaker: 3 failures in 60s opens the
  circuit for 5 min; the next request after expiry probes DeepSeek again
  and resets on success. No background polling — every user request is
  itself the probe.
- Tighter 60s headersTimeout for the Flash undici agent so dead-API
  requests fail fast (the existing 30-min default is kept for reasoning
  models on v4-pro).
- handleDeepSeek{Stream,NonStream} now wrap the fetch call so network
  errors, timeouts, and 5xx/408/429 responses feed the breaker; 2xx
  resets it.
- _post.ts routes to Fireworks when the circuit is open and adds inline
  pre-stream failover so the first user to hit an outage also gets a
  Fireworks response instead of an error.
- Adds accounts/fireworks/models/deepseek-v4-flash to the Fireworks
  model + pricing maps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>