简体中文 | English
This is a redesigned fork of fengshao1227/ccg-workflow. See What Changed below.
A competitive multi-model development system where every significant task is dispatched to ALL available models (Codex + Gemini) in parallel. Claude orchestrates, competes, compares outputs using benchmark-weighted evaluation criteria, and synthesizes the best solution.
Why competitive? Our benchmark analysis shows no single model dominates all tasks. Claude leads code quality (SWE-bench 80.8%), Codex leads autonomous execution (Terminal-Bench 77.3%), and Gemini leads visual design (WebDev Arena #1). Competitive dispatch ensures you always get the best output.
The original CCG uses static routing — frontend tasks always go to Gemini, backend tasks always go to Codex. This fork changes that completely.
- Frontend task → Gemini only (assumed "frontend expert")
- Backend task → Codex only (assumed "backend expert")
- Claude → Orchestrator only (doesn't compete)
The problem: these assumptions aren't backed by benchmarks. SWE-bench scores are nearly tied across all three models (~80%), and each model has different strengths that don't align with a simple frontend/backend split.
Any task → Send to ALL models in parallel → Compare outputs with weighted scores → Use the best one
Every task goes to Codex + Gemini + Claude at the same time. Claude then scores all three outputs using benchmark-weighted criteria and picks the best one (or combines the best parts from each).
| Feature | Original | This Fork |
|---|---|---|
| Routing | Fixed: Gemini=frontend, Codex=backend | All models compete on every task |
| How to pick the best output | Trust rules: "Codex is backend authority" | Weighted scoring based on actual benchmarks |
| Claude's role | Orchestrator only (doesn't generate solutions) | Orchestrator AND competitor (generates its own solution too) |
| Code review | 2 models (Codex + Gemini) | 3 models (Codex + Gemini + Claude), with consensus scoring (3/3, 2/3, 1/3) |
| Model strengths | Assumed (no evidence cited) | Documented with benchmark data and sources |
| Model weaknesses | Not mentioned | Each model's prompts list known limitations |
| Cost control | Always 2 models | 3 modes: Competitive (all 3), Focused (best-match), Quick (Claude only) |
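The consensus scoring mentioned above (3/3, 2/3, 1/3) is simple vote counting across the three reviewers. A minimal sketch, with made-up issue strings, not CCG's actual implementation:

```python
from collections import Counter

# Hypothetical review findings from each model (illustrative only).
reviews = {
    "codex":  {"SQL injection in /login", "missing null check in parser"},
    "gemini": {"SQL injection in /login", "inconsistent button spacing"},
    "claude": {"SQL injection in /login", "missing null check in parser"},
}

# Count how many of the three reviewers flagged each issue.
votes = Counter(issue for found in reviews.values() for issue in found)

# Rank issues by agreement: 3/3 first, then 2/3, then 1/3.
for issue, n in votes.most_common():
    print(f"{n}/3  {issue}")
```

An issue flagged by all three models (3/3) is almost certainly real; a 1/3 finding may reflect one model's bias or specialty.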
We fact-checked the original claims against published benchmarks:
| Original Claim | Reality |
|---|---|
| "Gemini is the frontend expert" | Gemini leads visual appeal (WebDev Arena #1) but Claude produces better responsive and accessible code (Index.dev test) |
| "Codex is the backend expert" | Codex leads terminal workflows (77.3%) but Claude leads architecture planning and code quality (80.8% SWE-bench) |
| One model is always best for a domain | No — each model wins on different dimensions, not domains |
```
                    ┌─→ Codex CLI   ─→ Output A ─┐
                    │                            │
Task ─→ Claude ─────┼─→ Gemini CLI  ─→ Output B ─┼─→ Weighted ─→ Best
        (also       │                            │   Compare     Solution
        generates   └─→ Claude Self ─→ Output C ─┘
        Output C)
```
External models have no write access — they only return patches/analysis, which Claude evaluates and applies.
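The fan-out/fan-in shape of this architecture can be sketched as follows. `run_model` is a hypothetical stand-in for shelling out to the Codex/Gemini CLIs, not a real CCG function:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(name: str, task: str) -> str:
    """Stand-in for invoking a model; the real system shells out to the
    Codex/Gemini CLIs and runs Claude itself."""
    return f"{name}'s solution to: {task}"

def dispatch(task: str) -> dict[str, str]:
    # Fan out: every model gets the same task at the same time.
    models = ["codex", "gemini", "claude"]
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(run_model, m, task) for m in models}
        # Fan in: collect all outputs for weighted comparison.
        return {m: f.result() for m, f in futures.items()}

outputs = dispatch("add retry logic to the HTTP client")
```

Because external models only return text (patches/analysis), the fan-in step is a pure read: nothing touches the working tree until Claude applies the winner.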
A full /ccg:workflow run goes through 6 phases. At every phase that involves model output, all models compete in parallel:
```mermaid
flowchart TD
    Start["🚀 User runs /ccg:workflow"] --> P1

    subgraph P1["Phase 1 — Research"]
        R1["Analyze requirements"]
        R2["Search codebase via MCP"]
        R3["Score completeness (≥7 to proceed)"]
        R1 --> R2 --> R3
    end

    P1 --> P2

    subgraph P2["Phase 2 — Ideation (Competitive)"]
        direction LR
        C2["Codex analyzes"] & G2["Gemini analyzes"] & CL2["Claude analyzes"]
    end

    P2 --> W2["⚖️ Weighted scoring → Best analysis"]
    W2 --> P3

    subgraph P3["Phase 3 — Planning (Competitive)"]
        direction LR
        C3["Codex plans"] & G3["Gemini plans"] & CL3["Claude plans"]
    end

    P3 --> W3["⚖️ Weighted scoring → Best plan"]
    W3 --> Approve{"👤 User approves plan?"}
    Approve -- No --> P3
    Approve -- Yes --> P4

    subgraph P4["Phase 4 — Execution"]
        E1["Claude implements code"]
        E2["(sole code writer — external models have zero write access)"]
        E1 --> E2
    end

    P4 --> P5

    subgraph P5["Phase 5 — Review (Competitive)"]
        direction LR
        C5["Codex reviews"] & G5["Gemini reviews"] & CL5["Claude reviews"]
    end

    P5 --> W5["⚖️ Weighted consensus → Issues ranked by agreement (3/3, 2/3, 1/3)"]
    W5 --> Fix{"Critical issues?"}
    Fix -- Yes --> FixCode["Claude fixes"] --> P5
    Fix -- No --> P6

    subgraph P6["Phase 6 — Final Review"]
        F1["Verify against plan"]
        F2["Run tests"]
        F3["Report to user"]
        F1 --> F2 --> F3
    end

    P6 --> Done["✅ Done"]

    style P2 fill:#e8f4f8,stroke:#2196F3
    style P3 fill:#e8f4f8,stroke:#2196F3
    style P5 fill:#e8f4f8,stroke:#2196F3
    style W2 fill:#fff3e0,stroke:#FF9800
    style W3 fill:#fff3e0,stroke:#FF9800
    style W5 fill:#fff3e0,stroke:#FF9800
```
Each model's output is scored on multiple dimensions. The weights depend on the task type:
```
                 Codex   Gemini   Claude
                 ─────   ──────   ──────
Tech depth        0.30     0.20     0.35   ← Claude weighted highest
Risk detection    0.35     0.15     0.30   ← Codex weighted highest
UX / visual       0.10     0.40     0.20   ← Gemini weighted highest
Actionability     0.25     0.25     0.25   ← Equal

Score        = Σ (dimension rating × weight)
Winner       = highest total score
Final output = winner + cherry-picked improvements from others
```
Different task types use different weight tables — see /ccg:routing-guide for all of them.
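In code, the scoring rule reduces to a weighted sum per model. A sketch using the analysis weights from the table above; the 0-10 dimension ratings are invented for illustration:

```python
# Per-model weights for the four dimensions (from the table above).
WEIGHTS = {
    "codex":  {"tech": 0.30, "risk": 0.35, "ux": 0.10, "action": 0.25},
    "gemini": {"tech": 0.20, "risk": 0.15, "ux": 0.40, "action": 0.25},
    "claude": {"tech": 0.35, "risk": 0.30, "ux": 0.20, "action": 0.25},
}

# Hypothetical 0-10 ratings of each model's output on each dimension.
ratings = {
    "codex":  {"tech": 8, "risk": 9, "ux": 5, "action": 8},
    "gemini": {"tech": 6, "risk": 5, "ux": 9, "action": 7},
    "claude": {"tech": 9, "risk": 8, "ux": 7, "action": 8},
}

def score(model: str) -> float:
    """Score = sum over dimensions of (rating x weight)."""
    return sum(ratings[model][d] * WEIGHTS[model][d] for d in WEIGHTS[model])

winner = max(ratings, key=score)
```

In practice the winner is a starting point, not the end: strong individual dimensions from the losing outputs are cherry-picked into the final solution.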
```shell
npx ccg-workflow
```

Requirements: Claude Code CLI, Node.js 20+

Important: This project depends on `ora@9.x` and `string-width@8.x`, which require Node.js >= 20.
Optional: Codex CLI (competitive dispatch), Gemini CLI (competitive dispatch)
Routing decisions are based on published benchmarks, not assumptions:
| Benchmark | Claude Opus 4.6 | GPT-5.3-Codex | Gemini 3.1 Pro | Source |
|---|---|---|---|---|
| SWE-bench Verified | 80.8% | 80.0% | 80.6% | marc0.dev |
| Terminal-Bench 2.0 | 74.7% | 77.3% | 78.4% (hard) | SmartScope |
| WebDev Arena | ~1510 ELO | 1477 ELO | 1487 ELO | Google Dev Blog |
| Code Review Quality | #1 tied | Below Claude | Below Claude | Milvus Blog |
| Responsive Design | Leader | - | Below Claude | Index.dev |
| Model | Best At | Evidence |
|---|---|---|
| Claude | Code quality, architecture, responsive/a11y, code review | SWE-bench leader, Milvus review #1, Index.dev responsive test |
| Codex | Edge case detection, autonomous execution, Python, terminal workflows | Terminal-Bench leader, catches bugs others miss, cloud sandbox |
| Gemini | Visual design, rapid prototyping, large codebase analysis | WebDev Arena #1, fastest iteration, 1M token context |
| Mode | Description | Cost |
|---|---|---|
| Competitive (default) | All models compete, weighted comparison | ~3x |
| Focused | Best-match model + Claude verification | ~1-2x |
| Quick | Claude only | 1x |
See /ccg:routing-guide for full weighted evaluation criteria per task type.
| Command | Description |
|---|---|
| `/ccg:workflow` | Full 6-phase competitive workflow |
| `/ccg:plan` | Competitive multi-model planning |
| `/ccg:execute` | Competitive execution with output comparison |
| `/ccg:feat` | Smart feature development |
| `/ccg:frontend` | Frontend tasks (all models, visual weight) |
| `/ccg:backend` | Backend tasks (all models, logic weight) |
| `/ccg:analyze` | Technical analysis |
| `/ccg:debug` | Competitive problem diagnosis |
| `/ccg:optimize` | Competitive performance optimization |
| `/ccg:test` | Test generation |
| `/ccg:review` | Competitive code review (weighted consensus) |
| `/ccg:routing-guide` | View benchmark evidence and routing criteria |
| `/ccg:commit` | Git commit |
| `/ccg:rollback` | Git rollback |
| `/ccg:clean-branches` | Clean branches |
| `/ccg:worktree` | Worktree management |
| `/ccg:init` | Initialize CLAUDE.md |
| `/ccg:enhance` | Prompt enhancement |
| `/ccg:spec-init` | Initialize OPSX environment |
| `/ccg:spec-research` | Requirements → Constraints |
| `/ccg:spec-plan` | Constraints → Zero-decision plan |
| `/ccg:spec-impl` | Execute plan + archive |
| `/ccg:spec-review` | Dual-model cross-review |
| `/ccg:team-research` | Agent Teams requirements → constraints |
| `/ccg:team-plan` | Agent Teams constraints → parallel plan |
| `/ccg:team-exec` | Agent Teams parallel execution |
| `/ccg:team-review` | Agent Teams weighted consensus review |
| `/ccg:codex-exec` | Codex full execution (plan → code → review) |
Every competitive dispatch uses task-type-specific weights:
```
Analysis:        Tech depth (Claude .35)   | Risk (Codex .35)    | UX (Gemini .40)        | Action (.25 each)
Planning:        Architecture (Claude .40) | Scale (Claude .30)  | UI (Gemini .40)        | Detail (Codex .30)
Implementation:  Correctness (Claude .35)  | Visual (Gemini .45) | Edge cases (Codex .35) | Maintain (Claude .35)
Review:          Security (Codex .35)      | Visual (Gemini .40) | Logic (Claude .35)     | Perf (Codex .30)
```
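Conceptually, this is just a lookup of a weight table by task type before scoring. A sketch with dimension names derived from the lines above; the structure and the fallback behavior are assumptions for illustration, not CCG's actual config format:

```python
# Top-weighted dimensions per task type (values from the summary above).
TASK_WEIGHTS = {
    "analysis":       {"tech_depth": 0.35, "risk": 0.35, "ux": 0.40, "actionability": 0.25},
    "planning":       {"architecture": 0.40, "scale": 0.30, "ui": 0.40, "detail": 0.30},
    "implementation": {"correctness": 0.35, "visual": 0.45, "edge_cases": 0.35, "maintainability": 0.35},
    "review":         {"security": 0.35, "visual": 0.40, "logic": 0.35, "performance": 0.30},
}

def weights_for(task_type: str) -> dict[str, float]:
    # Assumed fallback: unknown task types score with the analysis table.
    return TASK_WEIGHTS.get(task_type, TASK_WEIGHTS["analysis"])
```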
Integrates OPSX architecture to turn requirements into constraints:
```shell
/ccg:spec-init
/ccg:spec-research implement user authentication
/ccg:spec-plan
/ccg:spec-impl
/ccg:spec-review
```

```shell
/ccg:team-research implement real-time collaboration kanban API
/ccg:team-plan kanban-api
/ccg:team-exec
/ccg:team-review
```

Prerequisite: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` in `settings.json`
```
~/.claude/
├── commands/ccg/            # Slash commands
├── agents/ccg/              # Sub-agents
├── skills/                  # Quality gates
├── bin/codeagent-wrapper
└── .ccg/
    ├── config.toml
    └── prompts/{codex,gemini}/
```
| Variable | Description | Default |
|---|---|---|
| `CODEAGENT_POST_MESSAGE_DELAY` | Wait time after Codex completion (seconds) | 5 |
| `CODEX_TIMEOUT` | codeagent-wrapper execution timeout (seconds) | 7200 |
| `BASH_DEFAULT_TIMEOUT_MS` | Claude Code Bash default timeout (ms) | 120000 |
| `BASH_MAX_TIMEOUT_MS` | Claude Code Bash max timeout (ms) | 600000 |
CCG automatically installs a Hook to auto-authorize `codeagent-wrapper` commands.
Requirement: `jq` must be installed.
Code retrieval MCP (choose one):
- ace-tool (recommended)
- ContextWeaver (alternative)
Optional: Context7, Playwright, DeepWiki, Exa
```shell
npx ccg-workflow@latest   # Update
npx ccg-workflow          # Select "Uninstall"
```

- cexll/myclaude - codeagent-wrapper
- UfoMiao/zcf - Git tools
- GudaStudio/skills - Original routing design
- ace-tool - MCP tool
MIT
v1.8.0 (Competitive Multi-Model Redesign) | Issues