mingrath/arena-workflow
CCG - Claude + Codex + Gemini Competitive Multi-Model System


简体中文 | English

This is a redesigned fork of fengshao1227/ccg-workflow. See What Changed below.

A competitive multi-model development system where every significant task is dispatched to ALL available models (Codex + Gemini) in parallel. Claude orchestrates, competes, compares outputs using benchmark-weighted evaluation criteria, and synthesizes the best solution.

Why competitive? Our benchmark analysis shows no single model dominates all tasks. Claude leads code quality (SWE-bench 80.8%), Codex leads autonomous execution (Terminal-Bench 77.3%), and Gemini leads visual design (WebDev Arena #1). Competitive dispatch ensures you always get the best output.


What Changed in This Fork

The original CCG uses static routing — frontend tasks always go to Gemini, backend tasks always go to Codex. This fork changes that completely.

Before (Original)

```
Frontend task → Gemini only    (assumed "frontend expert")
Backend task  → Codex only     (assumed "backend expert")
Claude        → Orchestrator   (doesn't compete)
```

The problem: these assumptions aren't backed by benchmarks. SWE-bench scores are nearly tied across all three models (~80%), and each model has different strengths that don't align with a simple frontend/backend split.

After (This Fork)

```
Any task → Send to ALL models in parallel → Compare outputs with weighted scores → Use the best one
```

Every task goes to Codex + Gemini + Claude at the same time. Claude then scores all three outputs using benchmark-weighted criteria and picks the best one (or combines the best parts from each).

What's different in practice

| Feature | Original | This Fork |
| --- | --- | --- |
| Routing | Fixed: Gemini=frontend, Codex=backend | All models compete on every task |
| How to pick the best output | Trust rules: "Codex is backend authority" | Weighted scoring based on actual benchmarks |
| Claude's role | Orchestrator only (doesn't generate solutions) | Orchestrator AND competitor (generates its own solution too) |
| Code review | 2 models (Codex + Gemini) | 3 models (Codex + Gemini + Claude), with consensus scoring (3/3, 2/3, 1/3) |
| Model strengths | Assumed (no evidence cited) | Documented with benchmark data and sources |
| Model weaknesses | Not mentioned | Each model's prompts list known limitations |
| Cost control | Always 2 models | 3 modes: Competitive (all 3), Focused (best-match), Quick (Claude only) |

Why the change

We fact-checked the original claims against published benchmarks:

| Original Claim | Reality |
| --- | --- |
| "Gemini is the frontend expert" | Gemini leads visual appeal (WebDev Arena #1) but Claude produces better responsive and accessible code (Index.dev test) |
| "Codex is the backend expert" | Codex leads terminal workflows (77.3%) but Claude leads architecture planning and code quality (80.8% SWE-bench) |
| "One model is always best for a domain" | No — each model wins on different dimensions, not domains |

Architecture

```
                    ┌─→ Codex CLI  ─→ Output A ─┐
                    │                           │
Task ─→ Claude ─────┼─→ Gemini CLI ─→ Output B ─┼─→ Weighted ─→ Best
        (also       │                           │   Compare     Solution
         generates  └─→ Claude Self ─→ Output C ┘
         Output C)
```

External models have no write access — they only return patches/analysis, which Claude evaluates and applies.
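In code, the fan-out step might look like the sketch below. This is illustrative only: `ModelOutput`, `runModel`, and `competitiveDispatch` are hypothetical names, not the project's actual API.

```typescript
// Hypothetical sketch of competitive dispatch. In the real system each
// branch would shell out to the corresponding CLI; here the calls are stubbed.
type ModelOutput = { model: string; patch: string };

// External models only return patches/analysis as text; they never write files.
async function runModel(model: string, task: string): Promise<ModelOutput> {
  // Stub: a real implementation would invoke the codex/gemini CLI here.
  return { model, patch: `[${model}] proposed patch for: ${task}` };
}

async function competitiveDispatch(task: string): Promise<ModelOutput[]> {
  // All models run in parallel; Claude later evaluates and applies the winner.
  return Promise.all(
    ["codex", "gemini", "claude-self"].map((m) => runModel(m, task)),
  );
}
```

The key property is that the dispatch itself is side-effect free: only Claude, after comparison, touches the working tree.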

How It Works

A full /ccg:workflow run goes through 6 phases. At every phase that involves model output, all models compete in parallel:

```mermaid
flowchart TD
    Start["🚀 User runs /ccg:workflow"] --> P1

    subgraph P1["Phase 1 — Research"]
        R1["Analyze requirements"]
        R2["Search codebase via MCP"]
        R3["Score completeness (≥7 to proceed)"]
        R1 --> R2 --> R3
    end

    P1 --> P2

    subgraph P2["Phase 2 — Ideation (Competitive)"]
        direction LR
        C2["Codex analyzes"] & G2["Gemini analyzes"] & CL2["Claude analyzes"]
    end

    P2 --> W2["⚖️ Weighted scoring → Best analysis"]
    W2 --> P3

    subgraph P3["Phase 3 — Planning (Competitive)"]
        direction LR
        C3["Codex plans"] & G3["Gemini plans"] & CL3["Claude plans"]
    end

    P3 --> W3["⚖️ Weighted scoring → Best plan"]
    W3 --> Approve{"👤 User approves plan?"}
    Approve -- No --> P3
    Approve -- Yes --> P4

    subgraph P4["Phase 4 — Execution"]
        E1["Claude implements code"]
        E2["(sole code writer — external models have zero write access)"]
        E1 --> E2
    end

    P4 --> P5

    subgraph P5["Phase 5 — Review (Competitive)"]
        direction LR
        C5["Codex reviews"] & G5["Gemini reviews"] & CL5["Claude reviews"]
    end

    P5 --> W5["⚖️ Weighted consensus → Issues ranked by agreement (3/3, 2/3, 1/3)"]
    W5 --> Fix{"Critical issues?"}
    Fix -- Yes --> FixCode["Claude fixes"] --> P5
    Fix -- No --> P6

    subgraph P6["Phase 6 — Final Review"]
        F1["Verify against plan"]
        F2["Run tests"]
        F3["Report to user"]
        F1 --> F2 --> F3
    end

    P6 --> Done["✅ Done"]

    style P2 fill:#e8f4f8,stroke:#2196F3
    style P3 fill:#e8f4f8,stroke:#2196F3
    style P5 fill:#e8f4f8,stroke:#2196F3
    style W2 fill:#fff3e0,stroke:#FF9800
    style W3 fill:#fff3e0,stroke:#FF9800
    style W5 fill:#fff3e0,stroke:#FF9800
```

Weighted Scoring Example (Phase 2 — Analysis)

Each model's output is scored on multiple dimensions. The weights depend on the task type:

```
                    Codex    Gemini    Claude
                    ─────    ──────    ──────
Tech depth          0.30     0.20      0.35  ← Claude weighted highest
Risk detection      0.35     0.15      0.30  ← Codex weighted highest
UX / visual         0.10     0.40      0.20  ← Gemini weighted highest
Actionability       0.25     0.25      0.25  ← Equal

Score = Σ (dimension rating × weight)
Winner = highest total score
Final output = winner + cherry-picked improvements from others
```

Different task types use different weight tables — see /ccg:routing-guide for all of them.
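As a sketch, the scoring formula above translates into code like this. The `WEIGHTS` table copies the analysis weights shown above; the function and type names (`score`, `pickWinner`, `Ratings`) are illustrative, not CCG's actual implementation.

```typescript
// dimension -> rating on a 0-10 scale (assumed scale, for illustration)
type Ratings = Record<string, number>;

// Per-model dimension weights for the "analysis" task type (table above).
const WEIGHTS: Record<string, Ratings> = {
  codex:  { techDepth: 0.30, risk: 0.35, ux: 0.10, actionability: 0.25 },
  gemini: { techDepth: 0.20, risk: 0.15, ux: 0.40, actionability: 0.25 },
  claude: { techDepth: 0.35, risk: 0.30, ux: 0.20, actionability: 0.25 },
};

// Score = Σ (dimension rating × weight)
function score(model: string, ratings: Ratings): number {
  return Object.entries(ratings).reduce(
    (sum, [dim, rating]) => sum + rating * (WEIGHTS[model][dim] ?? 0),
    0,
  );
}

// Winner = highest total score across all competing models.
function pickWinner(all: Record<string, Ratings>): string {
  return Object.entries(all)
    .map(([model, ratings]) => [model, score(model, ratings)] as const)
    .sort((a, b) => b[1] - a[1])[0][0];
}
```

Because the weights differ per model, a model can win on raw ratings in one dimension yet still lose overall to a model whose strengths are weighted more heavily for that task type.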

Installation

```
npx ccg-workflow
```

Requirements: Claude Code CLI, Node.js 20+

Important: This project depends on ora@9.x and string-width@8.x, which require Node.js >= 20.

Optional: Codex CLI (competitive dispatch), Gemini CLI (competitive dispatch)

Benchmark Evidence

Routing decisions are based on published benchmarks, not assumptions:

| Benchmark | Claude Opus 4.6 | GPT-5.3-Codex | Gemini 3.1 Pro | Source |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 80.8% | 80.0% | 80.6% | marc0.dev |
| Terminal-Bench 2.0 | 74.7% | 77.3% | 78.4% (hard) | SmartScope |
| WebDev Arena | ~1510 ELO | 1477 ELO | 1487 ELO | Google Dev Blog |
| Code Review Quality | #1 (tied) | Below Claude | Below Claude | Milvus Blog |
| Responsive Design | Leader | - | Below Claude | Index.dev |

Model Strength Summary

| Model | Best At | Evidence |
| --- | --- | --- |
| Claude | Code quality, architecture, responsive/a11y, code review | SWE-bench leader, Milvus review #1, Index.dev responsive test |
| Codex | Edge case detection, autonomous execution, Python, terminal workflows | Terminal-Bench leader, catches bugs others miss, cloud sandbox |
| Gemini | Visual design, rapid prototyping, large codebase analysis | WebDev Arena #1, fastest iteration, 1M token context |

Dispatch Modes

| Mode | Description | Cost |
| --- | --- | --- |
| Competitive (default) | All models compete, weighted comparison | ~3x |
| Focused | Best-match model + Claude verification | ~1-2x |
| Quick | Claude only | 1x |

See /ccg:routing-guide for full weighted evaluation criteria per task type.

Commands

| Command | Description |
| --- | --- |
| `/ccg:workflow` | Full 6-phase competitive workflow |
| `/ccg:plan` | Competitive multi-model planning |
| `/ccg:execute` | Competitive execution with output comparison |
| `/ccg:feat` | Smart feature development |
| `/ccg:frontend` | Frontend tasks (all models, visual weight) |
| `/ccg:backend` | Backend tasks (all models, logic weight) |
| `/ccg:analyze` | Technical analysis |
| `/ccg:debug` | Competitive problem diagnosis |
| `/ccg:optimize` | Competitive performance optimization |
| `/ccg:test` | Test generation |
| `/ccg:review` | Competitive code review (weighted consensus) |
| `/ccg:routing-guide` | View benchmark evidence and routing criteria |
| `/ccg:commit` | Git commit |
| `/ccg:rollback` | Git rollback |
| `/ccg:clean-branches` | Clean branches |
| `/ccg:worktree` | Worktree management |
| `/ccg:init` | Initialize CLAUDE.md |
| `/ccg:enhance` | Prompt enhancement |
| `/ccg:spec-init` | Initialize OPSX environment |
| `/ccg:spec-research` | Requirements → Constraints |
| `/ccg:spec-plan` | Constraints → Zero-decision plan |
| `/ccg:spec-impl` | Execute plan + archive |
| `/ccg:spec-review` | Dual-model cross-review |
| `/ccg:team-research` | Agent Teams requirements → constraints |
| `/ccg:team-plan` | Agent Teams constraints → parallel plan |
| `/ccg:team-exec` | Agent Teams parallel execution |
| `/ccg:team-review` | Agent Teams weighted consensus review |
| `/ccg:codex-exec` | Codex full execution (plan → code → review) |

Weighted Evaluation

Every competitive dispatch uses task-type-specific weights:

```
Analysis:       Tech depth (Claude .35) | Risk (Codex .35) | UX (Gemini .40) | Action (.25 each)
Planning:       Architecture (Claude .40) | Scale (Claude .30) | UI (Gemini .40) | Detail (Codex .30)
Implementation: Correctness (Claude .35) | Visual (Gemini .45) | Edge cases (Codex .35) | Maintain (Claude .35)
Review:         Security (Codex .35) | Visual (Gemini .40) | Logic (Claude .35) | Perf (Codex .30)
```
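The review phase additionally ranks issues by cross-model agreement (3/3, 2/3, 1/3). That consensus step can be sketched as a simple tally; the `Review` type and `consensusRank` function are hypothetical names for illustration:

```typescript
// One reviewer's output: the model name and the issues it flagged.
type Review = { model: string; issues: string[] };

// Count how many of the three reviewers flagged each issue, then
// sort so the strongest consensus (3/3) comes first.
function consensusRank(reviews: Review[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of reviews) {
    for (const issue of new Set(r.issues)) {
      counts.set(issue, (counts.get(issue) ?? 0) + 1);
    }
  }
  return new Map([...counts].sort((a, b) => b[1] - a[1]));
}
```

Issues flagged by all three models are treated as high-confidence findings; 1/3 findings are candidates for a single model's bias or a genuine catch the others missed, so they get human or Claude adjudication.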

OPSX Spec-Driven (v1.7.52+)

Integrates OPSX architecture to turn requirements into constraints:

```
/ccg:spec-init
/ccg:spec-research implement user authentication
/ccg:spec-plan
/ccg:spec-impl
/ccg:spec-review
```

Agent Teams Parallel Execution (v1.7.60+)

```
/ccg:team-research implement real-time collaboration kanban API
/ccg:team-plan kanban-api
/ccg:team-exec
/ccg:team-review
```

Prerequisite: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in settings.json
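A minimal sketch of the corresponding `settings.json` fragment, assuming your Claude Code version reads experimental flags from a top-level `env` map (the exact layout may differ across versions):

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```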

Configuration

Directory Structure

```
~/.claude/
├── commands/ccg/       # Slash commands
├── agents/ccg/         # Sub-agents
├── skills/             # Quality gates
├── bin/codeagent-wrapper
└── .ccg/
    ├── config.toml
    └── prompts/{codex,gemini}/
```

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `CODEAGENT_POST_MESSAGE_DELAY` | Wait time after Codex completion (seconds) | 5 |
| `CODEX_TIMEOUT` | codeagent-wrapper execution timeout (seconds) | 7200 |
| `BASH_DEFAULT_TIMEOUT_MS` | Claude Code Bash default timeout (ms) | 120000 |
| `BASH_MAX_TIMEOUT_MS` | Claude Code Bash max timeout (ms) | 600000 |

codeagent-wrapper Auto-Authorization

CCG automatically installs a Hook to auto-authorize codeagent-wrapper commands.

Requirement: jq must be installed.

MCP Configuration

Code retrieval MCP (choose one):

  • ace-tool (recommended)
  • ContextWeaver (alternative)

Optional: Context7, Playwright, DeepWiki, Exa

Update / Uninstall

```
npx ccg-workflow@latest          # Update
npx ccg-workflow                 # Select "Uninstall"
```

Credits

Star History


License

MIT


v1.8.0 (Competitive Multi-Model Redesign) | Issues
