Mercury lets you build agent workflows your way.
Compose agents, tools, and skills in a workflow, configure the runtime however you need, and let Mercury handle checkpoints, retries, resume, and execution semantics underneath.
Most workflow systems make you choose between ease of use and runtime control. Mercury is built to give you both:
- A simple workflow model built from agents, tools, and skills.
- A runtime that handles the hard parts underneath.
- Configurable execution when you need more control.
- A stable core you can build on without locking into one stack.
Mercury is for teams that want workflows to stay easy to author while the runtime takes responsibility for the heavy lifting.
A Mercury workflow is a graph of tasks.
Each task is one of:
- agent
- tool
- skill
Tasks can depend on earlier tasks. Mercury figures out what is ready to run, executes tasks in dependency order, tracks outputs, records events, and persists enough state to resume later.
You can start with the workflow itself and ignore deeper runtime controls until you need them.
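To make the "ready to run" idea concrete, here is a minimal sketch of how readiness could be computed from a task graph. It is an illustration only, not Mercury's implementation: the `depends_on` field name and the `ready_tasks` helper are assumptions that do not appear in this README.

```python
# Hypothetical sketch: `depends_on` is an assumed field name, not confirmed Mercury schema.
def ready_tasks(tasks, completed):
    """Return IDs of unfinished tasks whose dependencies have all completed."""
    done = set(completed)
    return [
        task["id"]
        for task in tasks
        if task["id"] not in done
        and all(dep in done for dep in task.get("depends_on", []))
    ]


tasks = [
    {"id": "research", "kind": "agent", "depends_on": []},
    {"id": "outline", "kind": "skill", "depends_on": ["research"]},
    {"id": "draft", "kind": "agent", "depends_on": ["research", "outline"]},
]

print(ready_tasks(tasks, completed=[]))            # ['research']
print(ready_tasks(tasks, completed=["research"]))  # ['outline']
```

Each call recomputes readiness from the set of completed IDs, which is also why persisting that set is enough to pick up where a run left off.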
Install:

    uv venv --python 3.12
    uv sync --extra dev

Define and run a minimal workflow from Python:

    import asyncio

    from mercury import register_tool, run_flow


    async def echo_tool(inp, ctx):
        # Echo the input text back, tagged with the ID of the task that ran.
        return {"output": {"text": inp["text"], "task_id": ctx.task_id}}


    async def main():
        # Register the handler under the ID that the task's "target" refers to.
        register_tool("echo_tool", echo_tool)
        result = await run_flow(
            {
                "workflow_id": "hello-flow",
                "tasks": [
                    {
                        "id": "task_a",
                        "kind": "tool",
                        "target": "echo_tool",
                        "input": {"text": "hello mercury"},
                    }
                ],
            },
            planner_id="rules",
            workspace=".",
        )
        print(result)


    asyncio.run(main())

Run:

    mercury run \
      --workflow workflow.json \
      --planner-id rules \
      --workspace .

Resume:

    mercury resume --checkpoint .mercury/checkpoints/<run_id>.json

Inspect:

    mercury inspect --checkpoint .mercury/checkpoints/<run_id>.json --json

Mercury includes cookbook-style examples that show how product use cases map onto the same runtime:
- research_write.py: simple research and writing workflow
- examples/cookbook/rag/flow.py: retrieval-augmented workflow backed by Convex
- examples/cookbook/nlp2sql/flow.py: text-to-SQL workflow backed by Convex
- examples/cookbook/README.md: cookbook setup and run instructions
The cookbook is the main place to see how Mercury maps to real use cases rather than abstract patterns.
Mercury is designed around a few simple ideas:
- Users should think in workflows, not runtime internals.
- Workflows should be easy to compose from agents, tools, and skills.
- Runtime control should be available without becoming mandatory.
- The system should stay configurable without forcing one planner, scheduler, sandbox, model stack, or tool stack.
In practice that means Mercury aims to feel lightweight at the surface while taking responsibility for the difficult runtime behavior underneath.
Mercury keeps the runtime burden in the engine so workflow code can stay focused on behavior.
Built-in runtime responsibilities include:
- dependency-aware task execution
- retries with exponential backoff
- fallback outputs
- failure propagation to dependent tasks
- checkpoint and resume
- cancellation
- append-only event journaling
- scheduler state restoration on resume
- contract enforcement around planners and schedulers
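Two of the responsibilities above, retries with exponential backoff and fallback outputs, can be sketched in a few lines. This is a generic illustration of the technique, not Mercury's engine code: `run_with_retries`, its parameters, and the demo handlers are all hypothetical names.

```python
import asyncio


async def run_with_retries(handler, inp, *, max_attempts=3, base_delay=0.01, fallback=None):
    """Call `handler`, retrying with exponential backoff.

    Returns `fallback` if every attempt fails and a fallback was given,
    otherwise raises RuntimeError. All names here are hypothetical.
    """
    for attempt in range(max_attempts):
        try:
            return await handler(inp)
        except Exception:
            if attempt == max_attempts - 1:
                break
            # The delay doubles on each retry: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        return fallback
    raise RuntimeError(f"task failed after {max_attempts} attempts")


calls = {"count": 0}


async def flaky_tool(inp):
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return {"output": {"text": inp["text"]}}


async def always_fails(inp):
    raise ConnectionError("still down")


async def main():
    ok = await run_with_retries(flaky_tool, {"text": "hi"})
    fb = await run_with_retries(always_fails, {}, fallback={"output": {"text": "default"}})
    return ok, fb


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping this loop in the engine rather than in each handler is what lets handlers stay plain functions that either return an output or raise.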
Mercury is kernel-first internally. The kernel is the source of truth for execution correctness, while runtime behavior remains configurable.
Kernel responsibilities:
- parse and validate workflow boundaries
- maintain run state and task lifecycle transitions
- own retries, blocking, cancellation, checkpointing, and resume
- enforce planner, scheduler, and runtime contracts
- persist checkpoints and event journals
Extension responsibilities:
- handlers implement business behavior
- planners decide what to enqueue and when to complete
- schedulers choose among ready task IDs
- runtime plugins shape execution policy around the kernel
This split is what lets Mercury stay simple to use while remaining deeply configurable.
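One way to picture the split: an extension proposes, and the kernel checks the proposal before acting on it. The sketch below assumes a scheduler is a callable that takes the ready task IDs and a concurrency limit; that signature, and every function name in it, is hypothetical rather than Mercury's actual contract.

```python
def fifo_scheduler(ready_ids, max_concurrency):
    """Extension side: pick the next tasks to run from the ready set."""
    return ready_ids[:max_concurrency]


def misbehaving_scheduler(ready_ids, max_concurrency):
    """Extension side: a broken scheduler that invents a task ID."""
    return ready_ids[:1] + ["not_ready_task"]


def select_tasks(scheduler, ready_ids, max_concurrency):
    """Kernel side: accept the scheduler's choice only if it honors the contract."""
    chosen = list(scheduler(list(ready_ids), max_concurrency))
    unknown = [task_id for task_id in chosen if task_id not in ready_ids]
    if unknown:
        raise ValueError(f"scheduler chose tasks that are not ready: {unknown}")
    if len(set(chosen)) != len(chosen):
        raise ValueError("scheduler chose the same task more than once")
    if len(chosen) > max_concurrency:
        raise ValueError("scheduler exceeded max_concurrency")
    return chosen


print(select_tasks(fifo_scheduler, ["a", "b", "c"], max_concurrency=2))  # ['a', 'b']
```

Because the check lives on the kernel side, a faulty custom scheduler fails loudly at the boundary instead of corrupting run state.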
When you need more control, Mercury lets you configure runtime behavior per run.
Current runtime controls include:
- planner_id + planner_config
- scheduler_id + scheduler_config
- sandbox_id + sandbox_config
- hitl_id + hitl_config
- inbound_adapter_id + inbound_adapter_config
- max_concurrency
- durability_mode (sync, async, exit)
Built-in adapters today:
- planners: rules, rules_pydanticai
- schedulers: superstep, ready_queue
- sandboxes: host, docker
- hitl: none, cli_gate
These are runtime controls, not the beginner mental model.
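As a rough illustration, a set of per-run options can be sanity-checked against the built-in IDs and durability modes listed above before a run starts. The `check_run_options` helper is hypothetical, and so is the assumption that `max_concurrency` must be a positive integer; the README does not say how Mercury itself validates these values.

```python
# Built-in adapter IDs and durability modes as listed in this README.
BUILT_IN_IDS = {
    "planner_id": {"rules", "rules_pydanticai"},
    "scheduler_id": {"superstep", "ready_queue"},
    "sandbox_id": {"host", "docker"},
    "hitl_id": {"none", "cli_gate"},
}
DURABILITY_MODES = {"sync", "async", "exit"}


def check_run_options(options):
    """Return a list of problems; an empty list means nothing looked off."""
    problems = []
    for key, known in BUILT_IN_IDS.items():
        value = options.get(key)
        if value is not None and value not in known:
            # Custom adapters are allowed, so this is a reminder, not a hard error.
            problems.append(f"{key}={value!r} is not built in; it must be registered first")
    mode = options.get("durability_mode")
    if mode is not None and mode not in DURABILITY_MODES:
        problems.append(f"durability_mode={mode!r} is not one of {sorted(DURABILITY_MODES)}")
    concurrency = options.get("max_concurrency")
    # Assumption: a concurrency limit below 1 is never meaningful.
    if concurrency is not None and (not isinstance(concurrency, int) or concurrency < 1):
        problems.append("max_concurrency must be a positive integer")
    return problems


options = {
    "planner_id": "rules",
    "scheduler_id": "ready_queue",
    "sandbox_id": "host",
    "max_concurrency": 4,
    "durability_mode": "sync",
}
print(check_run_options(options))  # []
```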
If you want to understand the kernel contracts, diagrams, and execution lifecycle in detail, see:
From mercury:
- run_flow(...) -> RunResult
- resume_flow(...) -> RunResult
- inspect_run(checkpoint_path) -> dict
- cancel_run(run_id) -> None
- registrations: register_agent, register_tool, register_skill, register_planner, register_scheduler, register_sandbox, register_hitl, register_inbound_adapter, register_hook
Canonical memory compartments:
- working: latest structured outputs for runtime lookups
- episodic: append-only lifecycle and event records
- artifacts: immutable task outputs keyed by artifact ID
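The three compartments differ mainly in how they may be mutated. The toy model below shows one way those rules could look in code. The `RunMemory` class, its method names, the `task_output` event type, and the use of a content hash as the artifact ID are all assumptions made for illustration.

```python
import hashlib
import json


class RunMemory:
    """Toy model of the three compartments; not Mercury's actual classes."""

    def __init__(self):
        self.working = {}      # task_id -> latest structured output (overwritten)
        self._episodic = []    # append-only records
        self._artifacts = {}   # artifact_id -> serialized output (never rewritten)

    @property
    def episodic(self):
        # Expose a read-only snapshot so callers cannot edit history.
        return tuple(self._episodic)

    def record_output(self, task_id, output):
        payload = json.dumps(output, sort_keys=True)
        # Assumption: artifact IDs are derived from content; Mercury may differ.
        artifact_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._artifacts.setdefault(artifact_id, payload)
        self.working[task_id] = output
        self._episodic.append(
            {"event_type": "task_output", "task_id": task_id, "artifact_id": artifact_id}
        )
        return artifact_id

    def artifact(self, artifact_id):
        return json.loads(self._artifacts[artifact_id])


memory = RunMemory()
first = memory.record_output("task_a", {"text": "draft 1"})
second = memory.record_output("task_a", {"text": "draft 2"})
print(memory.working["task_a"])   # {'text': 'draft 2'}
print(memory.artifact(first))     # {'text': 'draft 1'}
```

The working compartment only keeps the latest output per task, while the earlier output stays reachable through its artifact ID and the episodic record.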
Workspace layout under <workspace>/.mercury/:
- checkpoints/
- traces/
- artifacts/
- context/
- events/
- skills/
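Given this layout, the files for a single run can be located from the workspace and the run ID alone. The helper below is a small sketch: `run_paths` is a hypothetical name, and the file naming follows the checkpoint and event journal paths shown elsewhere in this README.

```python
from pathlib import Path


def run_paths(workspace, run_id):
    """Return the checkpoint and event-journal paths for one run."""
    root = Path(workspace) / ".mercury"
    return {
        "checkpoint": root / "checkpoints" / f"{run_id}.json",
        "events": root / "events" / f"{run_id}.jsonl",
    }


paths = run_paths(".", "run-123")
print(paths["checkpoint"].as_posix())  # .mercury/checkpoints/run-123.json
print(paths["events"].as_posix())      # .mercury/events/run-123.jsonl
```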
Event journal contract:
- path: .mercury/events/<run_id>.jsonl
- one JSON object per line with: run_id, workflow_id, tick, event_type, payload, timestamp
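Because the journal is plain JSON Lines, it can be written and read with the standard library. The sketch below appends events and filters them by type. The field names come from the contract above, but the helper names, the example `event_type` values, and the value types (integer tick, float timestamp) are assumptions.

```python
import json
import tempfile
import time
from pathlib import Path

FIELDS = ("run_id", "workflow_id", "tick", "event_type", "payload", "timestamp")


def append_event(journal_path, event):
    """Append one event as a single JSON line, refusing incomplete records."""
    missing = [field for field in FIELDS if field not in event]
    if missing:
        raise ValueError(f"event is missing fields: {missing}")
    journal_path.parent.mkdir(parents=True, exist_ok=True)
    with journal_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def read_events(journal_path, event_type=None):
    """Read all events, optionally keeping only one event type."""
    events = []
    with journal_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            if event_type is None or event["event_type"] == event_type:
                events.append(event)
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        journal = Path(workspace) / ".mercury" / "events" / "run-123.jsonl"
        # "task_started" and "task_completed" are made-up event types.
        for tick, kind in enumerate(["task_started", "task_completed"]):
            append_event(journal, {
                "run_id": "run-123",
                "workflow_id": "hello-flow",
                "tick": tick,
                "event_type": kind,
                "payload": {"task_id": "task_a"},
                "timestamp": time.time(),
            })
        print(len(read_events(journal)))
        print(read_events(journal, "task_completed")[0]["tick"])
```

Opening the file in append mode for every write keeps the journal append-only from the writer's side: existing lines are never rewritten.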
To add a custom adapter or handler:
- Implement the contract.
- Register it by ID.
- Reference that ID from run_flow(...) or the CLI.
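The implement, register, reference pattern can be sketched with a plain dictionary registry. This is a generic illustration rather than the internals of Mercury's `register_*` functions: the `register` and `resolve` helpers, the `reverse_alpha` ID, and the scheduler signature are hypothetical.

```python
_REGISTRY = {}


def register(kind, adapter_id, impl):
    """Store an implementation under (kind, ID); duplicate IDs are rejected."""
    if not callable(impl):
        raise TypeError("implementation must be callable")
    key = (kind, adapter_id)
    if key in _REGISTRY:
        raise ValueError(f"{kind} {adapter_id!r} is already registered")
    _REGISTRY[key] = impl


def resolve(kind, adapter_id):
    """Look up an implementation by the ID a run refers to."""
    try:
        return _REGISTRY[(kind, adapter_id)]
    except KeyError:
        raise LookupError(f"no {kind} registered under {adapter_id!r}") from None


# Step 1: implement the contract (assumed here to be "ready IDs in, ordered IDs out").
def reverse_alphabetical(ready_ids):
    return sorted(ready_ids, reverse=True)


# Step 2: register it by ID.
register("scheduler", "reverse_alpha", reverse_alphabetical)

# Step 3: a run refers to the ID only, and the runtime resolves it.
scheduler = resolve("scheduler", "reverse_alpha")
print(scheduler(["a", "c", "b"]))  # ['c', 'b', 'a']
```

Referring to adapters by ID keeps workflow definitions serializable, since they carry strings rather than Python objects.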
Mercury keeps the kernel stable while letting the surrounding runtime evolve.
Mercury's near-term direction is to make the runtime more capable without making the user model heavier.
Planned areas include:
- more cookbook coverage for product use cases
- real Docker-backed sandboxing
- more meaningful traces/ and context/ outputs
- stronger productized entrypoints beyond the current CLI/runtime surface
- retrieval-oriented memory integrations that remain adapter/config driven
- richer reasoning scratchpads with checkpoint-aware persistence
- deeper runtime surfaces for pause/resume and human review flows