Route every prompt to the best model — for you specifically.
PiPiMink is a Go service that routes each prompt to the LLM most likely to produce the best response — for you specifically.
Most AI routers use generic, one-size-fits-all benchmarks (MMLU, HumanEval, etc.) that measure average performance across millions of anonymous prompts. PiPiMink takes the opposite approach: it learns which models work best for your actual use cases, based on benchmarks you define and results you observe yourself.
The routing is intentionally subjective. A model that scores 90% on a generic coding benchmark may still give worse answers than a smaller local model for your specific coding style, domain, or workflow. PiPiMink lets you measure exactly that.
There is no global leaderboard that defines what "best" means. You do.
Every chat request goes through two steps: a routing decision, then the actual model call.
```mermaid
sequenceDiagram
    participant Client
    participant PiPiMink
    participant Cache
    participant MetaModel as Meta-model<br/>(routing LLM)
    participant TargetModel as Target model<br/>(selected LLM)

    Client->>PiPiMink: POST /v1/chat/completions<br/>{ "messages": [...] }
    PiPiMink->>Cache: lookup(hash(prompt + model_capabilities))
    alt cache hit
        Cache-->>PiPiMink: cached model name
    else cache miss
        PiPiMink->>MetaModel: prompt + capability tags<br/>+ benchmark scores of all enabled models
        MetaModel-->>PiPiMink: { "modelname": "gpt-4o",<br/> "reason": "...",<br/> "matching_tags": [...] }
        PiPiMink->>Cache: store(decision)
    end
    PiPiMink->>TargetModel: original messages
    TargetModel-->>PiPiMink: response
    PiPiMink-->>Client: response
```
The routing priority inside the meta-model call is:
- Capability tags — primary signal (what the model says it excels at)
- Benchmark scores — secondary signal (how the model actually performed on your tasks)
- Average response time — tiebreaker only (measured latency from benchmarks)
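The priority order above can be sketched as a comparison function. This is an illustrative sketch, not the project's actual implementation; the type and field names are hypothetical (the real types live under `internal/models/`):

```go
package main

import "fmt"

// ModelCandidate bundles the three routing signals described above.
// Field names are illustrative, not taken from the codebase.
type ModelCandidate struct {
	Name         string
	TagMatches   int     // capability tags matching the prompt (primary)
	BenchScore   float64 // average benchmark score, 0-10 (secondary)
	AvgLatencyMs float64 // measured latency (tiebreaker only)
}

// better reports whether a should be preferred over b, applying the
// priority order: tags first, then benchmark score, then latency.
func better(a, b ModelCandidate) bool {
	if a.TagMatches != b.TagMatches {
		return a.TagMatches > b.TagMatches
	}
	if a.BenchScore != b.BenchScore {
		return a.BenchScore > b.BenchScore
	}
	return a.AvgLatencyMs < b.AvgLatencyMs // faster wins only on a quality tie
}

func main() {
	a := ModelCandidate{Name: "gpt-4o", TagMatches: 2, BenchScore: 8.5, AvgLatencyMs: 900}
	b := ModelCandidate{Name: "local-7b", TagMatches: 2, BenchScore: 8.5, AvgLatencyMs: 300}
	fmt.Println(better(b, a)) // true: equal quality, lower latency wins
}
```

Note that latency sits strictly last: a slower model with even one more matching tag or a higher score still wins.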
During a model refresh, PiPiMink asks every available model to assess its own strengths and weaknesses. Each model replies with a structured JSON tag list:
```json
{
  "strengths": ["code-generation", "step-by-step-reasoning", "multilingual"],
  "weaknesses": ["real-time-information", "image-generation"]
}
```

These tags are stored in PostgreSQL alongside each model's metadata. The exact prompts used for this interview are editable in the admin config page — you can steer which capability dimensions get reported.
When a chat request arrives, PiPiMink sends the user's prompt — along with the capability tags and benchmark scores of all enabled models — to a configurable meta-model. The meta-model returns a structured routing decision:
```json
{
  "modelname": "gpt-4o",
  "reason": "The request requires deep reasoning and code review",
  "matching_tags": ["code-generation", "step-by-step-reasoning"],
  "tag_relevance": { "code-generation": 9, "step-by-step-reasoning": 8 }
}
```

PiPiMink then forwards the original prompt to the selected model and returns its response.
The built-in benchmark suite evaluates models across coding, reasoning, instruction-following, creative writing, summarization, and factual QA. Scores are measured by an LLM judge you configure — meaning the evaluation reflects your standards, not an industry average.
Because benchmark results feed directly into the routing decision, the more you benchmark with tasks relevant to your workflow, the better the routing gets — for you.
Response time is measured automatically during benchmark runs. Each benchmark task records how long the model took to respond (in milliseconds). These per-task latencies are averaged per model and stored in PostgreSQL.
When the meta-model makes a routing decision, the average response time is included as a tiebreaker only: if two models are equally suited based on capability tags and benchmark scores, the faster one is preferred. Latency never overrides a better quality match.
Models that have not been benchmarked yet simply have no latency data — routing works normally based on tags alone.
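The per-model averaging is straightforward; the only subtlety is distinguishing "no data yet" from a real zero. A sketch (the helper name is ours, not the project's):

```go
package main

import "fmt"

// avgLatency averages per-task benchmark latencies for one model.
// ok is false when the model has no benchmark runs yet, in which
// case routing simply ignores latency, as described above.
func avgLatency(taskMs []float64) (avg float64, ok bool) {
	if len(taskMs) == 0 {
		return 0, false
	}
	var sum float64
	for _, v := range taskMs {
		sum += v
	}
	return sum / float64(len(taskMs)), true
}

func main() {
	avg, ok := avgLatency([]float64{120, 180, 300})
	fmt.Println(avg, ok) // 200 true
}
```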
Routing decisions are cached in memory using a hash of the normalized prompt and the current capability snapshot. Cache entries expire by TTL and are evicted by LRU when the size limit is reached, so the router stays fast for repeated or similar prompts.
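The cache key construction can be sketched as follows. This is an assumption-laden illustration: the real normalization rules and snapshot encoding may differ, and the lowercasing/trimming here is just one plausible choice:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// cacheKey hashes the normalized prompt together with the current
// capability snapshot, so a tag or benchmark change invalidates
// old decisions. Normalization here (trim + lowercase) is illustrative.
func cacheKey(prompt, capabilitySnapshot string) string {
	norm := strings.ToLower(strings.TrimSpace(prompt))
	h := sha256.Sum256([]byte(norm + "|" + capabilitySnapshot))
	return hex.EncodeToString(h[:])
}

func main() {
	k1 := cacheKey("  Write a haiku ", "snapshot-v1")
	k2 := cacheKey("write a haiku", "snapshot-v1")
	fmt.Println(k1 == k2) // true: same normalized prompt, same snapshot
}
```

Including the capability snapshot in the key means a model refresh or new benchmark run naturally stops serving stale routing decisions, even before the TTL fires.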
| Type | Examples |
|---|---|
| `openai-compatible` | OpenAI, Gemini, OpenRouter, LM Studio, any local server (Ollama, llama.cpp, MLX) |
| `anthropic` | Anthropic Claude (uses the native Messages API) |
Azure AI Foundry is supported via per-model model_configs entries. See SETUP.md for details.
PiPiMink is a drop-in proxy. Existing clients require no changes:
| Endpoint | Description |
|---|---|
| `POST /chat` | Native PiPiMink — always auto-routes |
| `GET /models` | List all models with tags, scores, and latency |
| `POST /v1/chat/completions` | OpenAI-compatible (streaming supported) |
| `GET /v1/models` | OpenAI-compatible model list |
| `POST /api/chat`, `GET /api/tags` | Ollama-compatible |
| `GET /admin` | Model management UI |
| `GET /admin/config` | Benchmark task + tagging prompt editor |
| `GET /metrics` | Prometheus/OpenMetrics |
Clients like Open WebUI can connect directly using either the OpenAI-compatible or Ollama-compatible endpoint. PiPiMink will appear as a single model and route each request internally.
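From a client's perspective, the request looks like any OpenAI-compatible call. A minimal Go sketch, assuming the default stack on `localhost:8080` (the `model` field is largely cosmetic, since PiPiMink routes internally):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newChatRequest builds an OpenAI-compatible chat request against a
// PiPiMink instance. baseURL is wherever your stack is exposed.
func newChatRequest(baseURL, prompt string) (*http.Request, error) {
	// %q produces a quoted, escaped string, valid as a JSON value
	// for typical prompts.
	body := fmt.Sprintf(`{"model":"pipimink","messages":[{"role":"user","content":%q}]}`, prompt)
	req, err := http.NewRequest(http.MethodPost,
		baseURL+"/v1/chat/completions", bytes.NewBufferString(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:8080", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /v1/chat/completions
}
```

Send it with `http.DefaultClient.Do(req)` once the stack is running; the response body follows the OpenAI chat-completions shape.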
| Category | Scoring |
|---|---|
| `coding` | LLM judge — multi-criteria (correctness, efficiency, clarity, edge cases) |
| `reasoning` | Deterministic — exact numeric answer |
| `instruction-following` | Format validator — structural checks |
| `creative-writing` | LLM judge — multi-criteria (imagery, originality, structure, tone) |
| `summarization` | LLM judge — multi-criteria (coverage, accuracy, conciseness, format) |
| `factual-qa` | Deterministic — substring match |
LLM-judge tasks use a configurable judge model (`BENCHMARK_JUDGE_PROVIDER` / `BENCHMARK_JUDGE_MODEL`). Each criterion is scored independently on a 0–10 scale; the final score is the average across all criteria. This gives fine-grained, continuous scores rather than binary pass/fail.
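The averaging itself is simple. A sketch of the scoring step described above (the helper name is ours; the real scorer lives in `internal/benchmark/`):

```go
package main

import "fmt"

// judgeScore averages independently scored 0-10 criteria into the
// final benchmark score for one task.
func judgeScore(criteria map[string]float64) float64 {
	if len(criteria) == 0 {
		return 0
	}
	var sum float64
	for _, s := range criteria {
		sum += s
	}
	return sum / float64(len(criteria))
}

func main() {
	score := judgeScore(map[string]float64{
		"correctness": 8, "efficiency": 6, "clarity": 7, "edge-cases": 7,
	})
	fmt.Println(score) // 7
}
```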
Benchmark scores feed directly into routing decisions as the secondary signal after capability tags, pushing traffic toward models that empirically perform better on the types of tasks you actually run.
| Path | Purpose |
|---|---|
| `main.go` | Entry point |
| `cmd/server/` | HTTP server, handlers, routing logic, admin UI |
| `internal/llm/` | Provider clients, capability tagging, model selection, routing cache |
| `internal/benchmark/` | Benchmark task definitions, runner, scorer |
| `internal/database/` | PostgreSQL persistence and schema migration |
| `internal/models/` | Domain types |
| `docs/` | Generated OpenAPI / Swagger artifacts |
```sh
cp providers.example.json providers.json   # configure your providers
cp .env.example .env                       # fill in API keys
./scripts/start-stack.sh                   # start DB + app
```

Then open http://localhost:8080/admin to discover, tag, and benchmark your models.
For detailed setup instructions including Azure AI Foundry, admin UI usage, configuration reference, and local development, see SETUP.md.
This project was developed with the assistance of AI coding tools. Contributions that use AI are welcome — see CONTRIBUTING.md for guidelines.
PiPiMink is open source software licensed under the Apache License 2.0. You are free to use, modify, and redistribute the code under the terms of that license.
The project name "PiPiMink", logo, and official branding are governed by a separate trademark policy and are not covered by the Apache-2.0 license.
See also: NOTICE | CONTRIBUTING.md | TRADEMARKS.md
