Desktop Agent
One message and it researches, creates files, opens apps, and delivers the result. No staging, no multi-step prompting. Just say what you need.
Understudy is an open-source AI agent that lives on your computer. Give it a task and it researches, browses the web, clicks through desktop apps, manages files, and replies through your existing channels. Teach it once and it learns. Use it daily and it gets faster.
Four demos, each showing a different side of what Understudy can do.
The agent researches the web, controls your browser, invokes skills, and delivers a polished result, all from a single instruction. One local runtime handles everything.
Send a message from your phone via Telegram. Understudy receives it on your Mac, converts a file to PDF, opens desktop Telegram, finds the right contact, and sends it — all through GUI automation. Phone view and desktop view shown side by side.
Demonstrate a workflow once — Understudy watches, understands the intent, and publishes a reusable skill. Interactively refine the generated skill, then invoke it with natural language. On replay, the agent automatically generalizes: Google Image search becomes browser automation, downloads become shell commands, while Pixelmator Pro stays GUI-controlled.
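One way to picture the generalization step described above is as a routing decision made for each recorded step. The sketch below is illustrative only: the type names, the `chooseRoute` function, and the routing rule are assumptions made for this example and are not taken from Understudy's source code.

```typescript
// Hypothetical sketch: none of these names come from Understudy's codebase.
type Route = "browser" | "shell" | "gui";

interface RecordedStep {
  app: string;    // application observed during the demonstration
  action: string; // coarse label for what the user did, e.g. "search"
}

// Assumed rule: prefer the most direct route available for a step and
// fall back to GUI control when nothing more direct applies.
function chooseRoute(step: RecordedStep): Route {
  const webApps = new Set(["Safari", "Chrome"]);
  if (step.action === "download") return "shell";
  if (webApps.has(step.app)) return "browser";
  return "gui";
}

// A demonstration resembling the one described: image search, download, edit.
const demo: RecordedStep[] = [
  { app: "Chrome", action: "search" },
  { app: "Chrome", action: "download" },
  { app: "Pixelmator Pro", action: "edit" },
];

const plan: Route[] = demo.map((step) => chooseRoute(step));
console.log(plan);
```

Under these assumptions the search step maps to browser automation, the download to a shell command, and the Pixelmator Pro edit stays GUI-controlled, which matches the behavior the demo describes.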
A six-stage pipeline browses the real App Store, installs an app via iPhone Mirroring, explores it autonomously — discovering features it's never seen — composes a narrated review video locally, uploads it to YouTube, and cleans up the device. The middle stage is genuinely agentic: 51 quality-gate rules guide the agent, but it navigates an unfamiliar app freely and makes its own editorial decisions. About one hour, zero human intervention.
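As a rough mental model of a staged pipeline with quality gates, the sketch below runs stages in order and stops if a gate rejects a stage's output. Everything in it is hypothetical: the `Stage`, `Gate`, and `Artifact` types, the `runPipeline` function, and the single example rule are assumptions for illustration, and the stage names simply paraphrase the description above. It does not reproduce the 51 rules or any real Understudy API.

```typescript
// Hypothetical sketch: types, names, and the example rule are illustrative only.
interface Artifact {
  notes: string[]; // record of what each stage produced
}

type Gate = (artifact: Artifact) => boolean;

interface Stage {
  name: string;
  run: (artifact: Artifact) => Artifact;
  gates: Gate[];
}

// Run stages strictly in order; throw as soon as any gate rejects the output.
function runPipeline(stages: Stage[], start: Artifact): Artifact {
  let current = start;
  for (const stage of stages) {
    current = stage.run(current);
    const failed = stage.gates.findIndex((gate) => !gate(current));
    if (failed !== -1) {
      throw new Error(`gate ${failed} failed after stage "${stage.name}"`);
    }
  }
  return current;
}

const stageNames: string[] = [
  "browse App Store",
  "install via iPhone Mirroring",
  "explore app",
  "compose review video",
  "upload to YouTube",
  "clean up device",
];

// Placeholder stages that only record that they ran.
const stages: Stage[] = stageNames.map((name: string): Stage => ({
  name,
  run: (artifact: Artifact): Artifact => ({ notes: [...artifact.notes, name] }),
  gates: [(artifact: Artifact): boolean => artifact.notes.length > 0], // assumed example rule
}));

const result = runPipeline(stages, { notes: [] });
console.log(result.notes.length);
```

Keeping gates as plain predicates attached to each stage is one possible design: the agent stays free to act inside a stage, while the checks only decide whether the pipeline may move on.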
One local agent that can see your screen, open apps, browse the web, run commands, and send messages — all from a single instruction.
Send a message from your phone via Telegram, Slack, or any of 8 channels. Understudy works on your Mac and sends back the result while you're away.
Telegram, Slack, Discord, WhatsApp, Signal, LINE, iMessage, and Web. Control your agent through the messaging apps you already use.
Like a new colleague who grows into the role — Understudy starts by following instructions, then gradually learns your routines and finds better ways to get things done.
No subscriptions, no lock-in. Run it locally with full control over your data and your choice of AI provider.
MIT license. Full source code on GitHub. Inspect, modify, and contribute freely.
Anthropic, OpenAI, Google, MiniMax, and more. Use your own API key — no bundled subscription required.
Runs on your machine. Screenshots, recordings, and task data stay on your computer by default.
Full native GUI automation on macOS. Linux and Windows desktop support is planned, and contributions are welcome.
Install from npm and let the wizard walk you through setup.
Five layers of capability, built progressively. No shortcuts.