Penpal: Dispatch Tool Today, RPG Interface Tomorrow

Figure: Isometric sketch of an RPG world map as an agentic software architecture, with agent figures, castle repositories, and a dispatch board.

What if your agentic business looked like a world you could walk through?

I started building Penpal because I was tired of context-switching.

Not the context-switching between tasks, but the context-switching that comes with managing AI agents. Open a terminal, start a session, write a prompt, watch it run, catch it when it goes off the rails, restart, repeat. It felt like babysitting interns who were brilliant but couldn't be left alone for five minutes. The leverage was real, but the overhead was killing it.

So I built a dispatch layer. The kind of thing you build because you need it, not because someone asked for it.

The repo is open source. github.com/therealsiege/Penpal — Electron desktop app, TypeScript, MIT licensed. Stars and contributions welcome.

What Penpal Is Right Now

Penpal watches your GitHub repositories for issues labeled agent-ready. When one appears, it launches a three-agent pod:

  1. Solver — implements the solution in an isolated Git worktree
  2. Reviewer — validates the work independently, without seeing the solver's approach
  3. Executor — runs the tests, and if they fail, kicks off a self-correction loop

The worktree isolation is important. Each pod works in its own branch, in its own directory, without touching your main codebase until it's ready. If something blows up, nothing contaminates anything else. The pod finishes, opens a PR, and moves on.
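To make the isolation concrete, here is a minimal sketch of how a pod's worktree could be planned with the standard `git worktree` subcommands. The `planWorktree` helper, the `agent/issue-N` branch scheme, and the `.penpal/worktrees` directory are assumptions for illustration, not taken from the Penpal source.

```typescript
// Hypothetical helper; the branch scheme and directory layout are assumptions.
interface WorktreePlan {
  branch: string;
  dir: string;
  addArgs: string[];    // pass to `git` to create the isolated checkout
  removeArgs: string[]; // pass to `git` to clean up afterwards
}

function planWorktree(
  issueNumber: number,
  baseRef: string = "main",
  root: string = ".penpal/worktrees",
): WorktreePlan {
  const branch = `agent/issue-${issueNumber}`;
  const dir = `${root}/issue-${issueNumber}`;
  return {
    branch,
    dir,
    // git worktree add -b <new-branch> <path> <commit-ish>
    addArgs: ["worktree", "add", "-b", branch, dir, baseRef],
    // git worktree remove <path>
    removeArgs: ["worktree", "remove", dir],
  };
}
```

Because each pod gets its own branch and directory, removing the worktree after a failed run leaves the main checkout untouched.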

That's the basic loop. Issue in, pull request out.
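The loop can be sketched roughly as follows. This is an illustration under assumptions, not Penpal's implementation: the `PodAgents` interface, the `runPod` function, and the attempt limit are hypothetical, and the agent calls are kept synchronous for brevity where real ones would be asynchronous.

```typescript
// Hypothetical types; the real Penpal interfaces may differ.
interface TestRun { passed: boolean; log: string }
interface ReviewVerdict { approved: boolean; notes: string }

interface PodAgents {
  solve(issue: string, feedback?: string): string;      // returns a patch
  review(issue: string, patch: string): ReviewVerdict;  // sees the patch, not the solver's reasoning
  execute(patch: string): TestRun;                      // runs the test suite
}

type PodOutcome =
  | { kind: "pr-ready"; patch: string; attempts: number }
  | { kind: "gave-up"; attempts: number; lastFeedback: string };

function runPod(issue: string, agents: PodAgents, maxAttempts: number = 3): PodOutcome {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const patch = agents.solve(issue, feedback);
    const verdict = agents.review(issue, patch);
    if (!verdict.approved) {
      feedback = verdict.notes; // reviewer rejected: try again with its notes
      continue;
    }
    const run = agents.execute(patch);
    if (run.passed) return { kind: "pr-ready", patch, attempts: attempt };
    feedback = run.log; // self-correction: feed the test failure back to the solver
  }
  return { kind: "gave-up", attempts: maxAttempts, lastFeedback: feedback || "" };
}
```

The attempt cap matters: without it, a solver that keeps failing the same test would loop forever.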

But there's more infrastructure underneath it than that description suggests.

The Routing Layer

Before a pod starts, Penpal scores the complexity of the task. Simple bug fix with clear reproduction steps? Route to the economic profile — local Ollama, zero cost, handles it fine. Architectural decision with broad surface area? Route to Opus with more iterations and tighter governance. You configure the thresholds. The system handles the assignment.

This matters more than it sounds. Running Opus on every task is how you burn through budget on things that didn't need it. Running a local model on something genuinely complex is how you get PRs that look finished but aren't. The routing layer is where the economics of running agents at scale actually live.
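As a rough sketch of threshold-based routing, assuming a 0 to 100 complexity score: the signals, weights, tier cutoffs, and model strings below are all placeholders for illustration, not Penpal's actual scoring.

```typescript
// Hypothetical shapes; signals, weights, and profile names are illustrative only.
interface TaskSignals {
  filesLikelyTouched: number;
  hasReproSteps: boolean;
  labels: string[];
}

interface RoutingProfile {
  profileName: string;   // e.g. "economic" or "premium"
  model: string;         // placeholder model identifier
  maxIterations: number;
}

interface RoutingTier { maxScore: number; profile: RoutingProfile }

function complexityScore(t: TaskSignals): number {
  let score = Math.min(t.filesLikelyTouched, 10) * 5;        // 0..50 for surface area
  if (!t.hasReproSteps) score += 20;                         // unclear bugs are harder
  if (t.labels.indexOf("architecture") !== -1) score += 30;  // broad design work
  return Math.min(score, 100);
}

// Tiers must be sorted by ascending maxScore; the last tier is the catch-all.
function routeTask(t: TaskSignals, tiers: RoutingTier[]): RoutingProfile {
  const score = complexityScore(t);
  for (const tier of tiers) {
    if (score <= tier.maxScore) return tier.profile;
  }
  const last = tiers[tiers.length - 1];
  if (!last) throw new Error("routing config has no tiers");
  return last.profile;
}

const exampleTiers: RoutingTier[] = [
  { maxScore: 30, profile: { profileName: "economic", model: "local-ollama", maxIterations: 2 } },
  { maxScore: 100, profile: { profileName: "premium", model: "opus", maxIterations: 6 } },
];
```

With these example tiers, a one-file bug with reproduction steps scores 5 and goes to the economic profile, while an eight-file architectural issue without reproduction steps scores 90 and goes to the premium one.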

The Fleet Layer

Penpal runs across machines. Each machine shows up as a pin on an isometric world map inside the Electron desktop app, with a heartbeat indicator showing whether it's online and what it's running. No central broker — fleet discovery happens through Slack heartbeats. The machines find each other.

If you have three machines, you have three parallel execution environments. Issues get dispatched across the fleet based on availability and load. One pod per machine, running simultaneously, each completely isolated.
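Here is one way heartbeat-based dispatch could work, as a sketch only. The `Heartbeat` shape, the 90-second online window, and the tie-breaking rule are assumptions, and the Slack transport itself is left out.

```typescript
// Hypothetical shapes; how Penpal encodes heartbeats in Slack is not shown here.
interface Heartbeat {
  machineId: string;
  sentAtMs: number;    // when the heartbeat was posted
  activePods: number;  // 0 or 1 under the one-pod-per-machine rule
}

const ONLINE_WINDOW_MS = 90 * 1000; // assumed: silent for longer than this means offline

// Keep only the most recent heartbeat per machine.
function latestByMachine(beats: Heartbeat[]): Record<string, Heartbeat> {
  const latest: Record<string, Heartbeat> = {};
  for (const beat of beats) {
    const seen = latest[beat.machineId];
    if (!seen || beat.sentAtMs > seen.sentAtMs) latest[beat.machineId] = beat;
  }
  return latest;
}

// Pick an online, idle machine; prefer the one with the freshest heartbeat.
function pickMachine(beats: Heartbeat[], nowMs: number): string | null {
  const latest = latestByMachine(beats);
  let best: Heartbeat | null = null;
  for (const id of Object.keys(latest)) {
    const hb = latest[id];
    if (!hb) continue;
    const online = nowMs - hb.sentAtMs <= ONLINE_WINDOW_MS;
    if (!online || hb.activePods > 0) continue;
    if (!best || hb.sentAtMs > best.sentAtMs) best = hb;
  }
  return best ? best.machineId : null;
}
```

A `null` result means every machine is busy or offline, so the issue would wait for the next heartbeat.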

The ReasoningBank

As pods complete, Penpal logs what worked. Not the code, but the patterns: what context was useful, what the routing score was, which iteration the solution came together on. These records feed back into future routing decisions and context injection.

It's a crude version of institutional memory. The system gets slightly smarter about which problems need what resources every time a pod closes. It doesn't feel like much at first. Over hundreds of pods it adds up.
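A minimal sketch of how such records might feed back into routing. The `PodRecord` fields mirror what is described above, but the aggregation, the minimum-run count, and the success-rate cutoff are assumptions, not the actual ReasoningBank logic.

```typescript
// Hypothetical record shape based on the fields described in the text.
interface PodRecord {
  category: string;        // e.g. "auth" or "billing"
  profile: string;         // routing profile that handled the pod
  routingScore: number;
  iterations: number;      // iteration on which the solution came together
  usefulContext: string[]; // which injected context helped
  succeeded: boolean;
}

interface ProfileStats { runs: number; successes: number; iterationSum: number }

// Aggregate outcomes per (category, profile) pair.
function summarize(records: PodRecord[]): Record<string, ProfileStats> {
  const stats: Record<string, ProfileStats> = {};
  for (const r of records) {
    const key = `${r.category}:${r.profile}`;
    const s = stats[key] || { runs: 0, successes: 0, iterationSum: 0 };
    s.runs += 1;
    if (r.succeeded) s.successes += 1;
    s.iterationSum += r.iterations;
    stats[key] = s;
  }
  return stats;
}

// Return the first (cheapest) profile with enough history and a good success rate.
function suggestProfile(
  category: string,
  candidatesCheapestFirst: string[],
  stats: Record<string, ProfileStats>,
  minRuns: number = 5,
  minSuccessRate: number = 0.8,
): string | null {
  for (const profile of candidatesCheapestFirst) {
    const s = stats[`${category}:${profile}`];
    if (s && s.runs >= minRuns && s.successes / s.runs >= minSuccessRate) return profile;
  }
  return null; // not enough evidence; fall back to the normal routing score
}
```

The minimum-run guard is what makes "over hundreds of pods it adds up" true: a single lucky run should not change routing.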

The Governance Layer

This one I don't see people talk about enough when they write about agentic systems: you need rules.

Not soft guidelines. Hard rules. File limits per PR. Forbidden paths. Diff size ceilings. Banned imports. The solver doesn't know your codebase politics. It doesn't know that the authentication module is a minefield or that the database migration files have a specific protocol. Governance rules encode what the agent cannot touch, cannot produce, and cannot merge.

Without this layer, autonomous agents in a real codebase will eventually do something creative and expensive.
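To show what hard rules can look like in practice, here is a sketch of a pre-PR governance check. The `GovernanceRules` and `FileChange` shapes are hypothetical, and the import detection is a deliberately naive text match rather than a real parser.

```typescript
// Hypothetical rule and diff shapes; Penpal's actual config format may differ.
interface GovernanceRules {
  maxFilesPerPr: number;
  maxDiffLines: number;
  forbiddenPathPrefixes: string[];
  bannedImports: string[];
}

interface FileChange {
  path: string;
  addedLines: string[];
  removedLineCount: number;
}

// Naive textual check for `from "mod"` or `require("mod")` in either quote style.
function importsModule(line: string, mod: string): boolean {
  for (const q of [`"${mod}"`, `'${mod}'`]) {
    if (line.indexOf(`from ${q}`) !== -1 || line.indexOf(`require(${q})`) !== -1) return true;
  }
  return false;
}

// Returns human-readable violations; an empty array means the change may proceed.
function checkGovernance(changes: FileChange[], rules: GovernanceRules): string[] {
  const violations: string[] = [];
  if (changes.length > rules.maxFilesPerPr) {
    violations.push(`too many files: ${changes.length} > ${rules.maxFilesPerPr}`);
  }
  let diffLines = 0;
  for (const change of changes) {
    diffLines += change.addedLines.length + change.removedLineCount;
    for (const prefix of rules.forbiddenPathPrefixes) {
      if (change.path.indexOf(prefix) === 0) violations.push(`forbidden path: ${change.path}`);
    }
    for (const line of change.addedLines) {
      for (const mod of rules.bannedImports) {
        if (importsModule(line, mod)) violations.push(`banned import "${mod}" in ${change.path}`);
      }
    }
  }
  if (diffLines > rules.maxDiffLines) {
    violations.push(`diff too large: ${diffLines} > ${rules.maxDiffLines} lines`);
  }
  return violations;
}
```

The point of returning every violation rather than stopping at the first is that the solver can be handed the full list on its next attempt.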

The Honest Limitations

Penpal works well on scoped, well-specified issues. "Fix the null pointer exception in the user session handler" — excellent. "Improve the overall architecture of the payments module" — you're going to be disappointed.

The three-agent pipeline catches a lot, but it doesn't catch everything. The reviewer doesn't have external context about your system. The executor only validates what the tests cover. If your test coverage is weak, the executor's sign-off means less than it should.

And the worktree isolation that makes parallel execution safe also means the agents have no awareness of each other's work. Two pods can't collaborate. They can't notice that their PRs will conflict until merge time. That's a real constraint.

I'm not trying to make Penpal sound more capable than it is. It's a dispatch layer with a quality loop on top. The leverage is real. The autonomy is partial.

The Part I Actually Want to Build

Here's where I'll lose some people.

I've been thinking about what it means to manage an agentic workforce at scale. Not a single developer running a few pods. A business where dozens of agents are executing work across multiple repositories, multiple services, multiple objectives — simultaneously, continuously, autonomously.

The way we currently interface with that kind of system is wrong. Dashboards, logs, queues. Admin panels. We borrowed the interface metaphors from infrastructure monitoring because that was the closest analogy we had. But agents aren't servers. They're workers. And you don't manage workers with a log aggregator.

You manage workers with situational awareness. With spatial context. With a sense of what's happening and where, not just whether the queue depth is acceptable.

So here's the vision: an RPG as the operating interface for an agentic business.

Your company is a world. The repositories are regions. The agents are characters — each with a class, a skill profile, a history of completed quests. When you dispatch a pod, you're sending a party into a dungeon. The dungeon is the codebase. The quest is the issue. The loot is the merged PR.

This isn't a metaphor for a PowerPoint. I mean a literal game interface — isometric, explorable, with agents moving through a rendered world map of your business. You see the fortress where your auth service lives. You see the market district where your billing integration runs. You see your agents moving through those spaces, taking on quests, returning with artifacts.

The XP and progression systems aren't cosmetic. An agent that has successfully solved 50 authentication issues has a track record in that domain. You route auth work to it preferentially. The "level" is a real signal. The "class" is a real specialization. The leaderboard isn't for ego; it's for understanding which agents are performing and which need different task assignments.
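None of this exists yet, so the following is purely a thought experiment in code: a sketch of how a per-domain track record could drive preferential routing. The `AgentTrackRecord` shape, the square-root level curve, and the minimum-attempts guard are all invented for illustration.

```typescript
// Purely hypothetical: the RPG layer does not exist yet.
interface AgentTrackRecord {
  agentId: string;
  domain: string;    // e.g. "auth"
  solved: number;
  attempted: number;
}

// Assumed progression curve: level grows with the square root of solved issues.
function levelFor(solved: number): number {
  return Math.floor(Math.sqrt(solved));
}

// Prefer the agent with the best success rate in the domain, ignoring thin histories.
function preferredAgent(
  domain: string,
  records: AgentTrackRecord[],
  minAttempts: number = 10,
): string | null {
  let best: AgentTrackRecord | null = null;
  for (const r of records) {
    if (r.domain !== domain || r.attempted < minAttempts) continue;
    if (!best || r.solved / r.attempted > best.solved / best.attempted) best = r;
  }
  return best ? best.agentId : null;
}
```

Under this curve, the agent with 50 solved authentication issues would sit at level 7, and the level is just a compressed view of the same track record the router uses.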

The seasonal challenges aren't just for fun either. They're how you discover capability. Put your agents through a timed challenge across an unusual problem category and you learn something about the routing configuration that weeks of normal operation wouldn't surface.

What I'm describing is a command interface for a new kind of organization — one where the distinction between "doing work" and "managing work" has collapsed for the things agents can handle, and humans are left doing what humans are actually good at: setting direction, defining what matters, and making the calls that require judgment the system doesn't have.

The RPG interface makes that relationship legible. You're not a developer anymore. You're a guild master. You're deciding which quests to take, which parties to send, what the objective is. The agents execute. You direct.

Why This Isn't Whimsy

I want to be clear that this isn't a "make work fun" project. Gamification for its own sake is condescending — people who actually want to get things done don't need fake achievement badges.

This is about legibility at scale.

When you have three agents running, a list of pods and their status is fine. When you have thirty agents running across five machines, touching six services, with interdependencies between their outputs — a list of pods is not fine. You need spatial, persistent, narrative context. You need to walk into your office and see what your company is doing, the way a general can look at a map and understand the state of a campaign.

The game interface is the most information-dense, spatially coherent, historically continuous interface format we have. We've spent fifty years building the grammar for how games communicate state to players. I want to borrow that grammar for something that actually matters.

Penpal is version one of that idea. Dispatch works. The fleet works. The quality loop works. What comes next is the world it lives in.


It's early. The RPG layer doesn't exist yet. But the dispatch infrastructure is real, and if you're running AI agents at any kind of scale, the problems it solves are real too. Contributions welcome — especially if you want to help build the world.