#7
AI Agents

One Developer, 13 Processes, and the Agent Platform Nobody Planned

AlexClaw is a personal autonomous agent built on OTP. Its author says he didn't plan most of it — he just kept solving the next problem. The architecture tells a different story.

There’s a GitHub README that starts with a quote: “I didn’t plan most of this. I just kept solving the next problem.”

It’s from the author of AlexClaw, a personal autonomous AI agent he posted to the Elixir Forum last week. The project monitors RSS feeds, GitHub repos, and APIs; accumulates knowledge in PostgreSQL with semantic search; routes tasks to the cheapest LLM that satisfies a required reasoning tier; and communicates with its owner via Telegram. It runs on a 13-process OTP supervision tree. It idles at 125MB of RAM. It’s fully self-hosted.

He didn’t plan the architecture. But when you look at what the architecture turned out to be, it’s hard not to notice how much OTP did the planning for him.

The Problem with Building Agents in 2026

Most people building personal AI agents right now are assembling them from parts that weren’t designed for this. A Python script that calls an LLM. A Redis queue for task state. A cron job for scheduling. A Postgres table for memory. A webhook for Telegram. Each piece sensible on its own; the seams between them are where things fall apart. The agent misses the cron window. The webhook fires twice. The Redis state and the Postgres state diverge. The LLM call hangs and nothing notices.

The standard answer is to add more infrastructure: a task queue system, a circuit breaker library, a retry decorator. What you end up with is a personal agent running on roughly the same stack you’d use for a distributed e-commerce platform — which is either a feature or a sign that the primitives aren’t right.

AlexClaw’s author reached for Elixir, he says, because it fit. What he found was that OTP had already solved most of the problems he was about to solve manually.

What 13 Processes Actually Means

The supervision tree isn’t an implementation detail. It’s the design.

GenServers run the LLM router, each gateway (Telegram, Discord), the workflow executor, and the circuit breakers. Each is isolated: a failure in the Telegram gateway doesn’t take down the workflow engine. ETS tables hold the LLM usage counters and the config cache. A DynamicSupervisor manages the worker pools. A plain Supervisor at the top controls the lifecycle of the whole thing. The runtime configuration (API keys, model prompts, routing tiers) lives in Postgres, is cached in ETS, and is editable from an admin UI at runtime with no restart required.
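A tree of that shape is compact to express in OTP. Here’s a minimal sketch of what it might look like; the module names are illustrative, not AlexClaw’s actual modules:

```elixir
# A sketch of a 13-process-style supervision tree.
# Module names here are hypothetical, not taken from AlexClaw.
defmodule MyAgent.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # ETS-backed caches and counters, each owned by a process so the
      # tables survive as long as the tree does
      MyAgent.ConfigCache,
      MyAgent.UsageCounters,
      # One GenServer per concern; a crash in one is isolated from the rest
      MyAgent.LLMRouter,
      MyAgent.Gateway.Telegram,
      MyAgent.Gateway.Discord,
      MyAgent.WorkflowExecutor,
      # Worker pools started on demand
      {DynamicSupervisor, name: MyAgent.WorkerSupervisor, strategy: :one_for_one}
    ]

    # one_for_one: restart only the child that crashed
    Supervisor.start_link(children, strategy: :one_for_one, name: MyAgent.Supervisor)
  end
end
```

The `:one_for_one` strategy is what makes “the Telegram gateway dies, the workflow engine doesn’t” the default behavior rather than something you build.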

There are no threads to manage. No lock primitives. No shared memory to protect. The concurrency model is message passing, and it’s been working this way since 1986.

The most distinctive piece is the circuit breaker. Each skill — each callable tool in the agent’s repertoire — has its own GenServer-backed circuit breaker: consecutive failures open the circuit, a cooldown timer resets it, and a Telegram notification fires on each state transition. Dead-letter routing handles the case where a circuit is open mid-workflow: the step can skip, halt, or fall back to an alternative skill. The author wrote this himself in pure OTP, with zero external dependencies. The pattern is straight from Erlang textbooks that are older than most Python programmers.
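The per-skill breaker described above fits in a single GenServer. This is a minimal sketch of the pattern, not AlexClaw’s implementation; the thresholds and names are made up, and the notification step is elided:

```elixir
# Minimal per-skill circuit breaker as a GenServer (illustrative sketch).
defmodule MyAgent.SkillBreaker do
  use GenServer

  @max_failures 5
  @cooldown_ms 60_000

  def start_link(skill), do: GenServer.start_link(__MODULE__, skill, name: via(skill))
  def call(skill, fun), do: GenServer.call(via(skill), {:call, fun})
  defp via(skill), do: {:global, {:breaker, skill}}

  @impl true
  def init(skill), do: {:ok, %{skill: skill, failures: 0, state: :closed}}

  @impl true
  def handle_call({:call, _fun}, _from, %{state: :open} = s) do
    # Circuit is open: fail fast so the workflow's dead-letter routing
    # can skip, halt, or fall back
    {:reply, {:error, :circuit_open}, s}
  end

  def handle_call({:call, fun}, _from, s) do
    case safe_run(fun) do
      {:ok, result} ->
        {:reply, {:ok, result}, %{s | failures: 0}}

      {:error, reason} when s.failures + 1 >= @max_failures ->
        # Open the circuit and schedule a reset after the cooldown.
        # (A real version would notify the owner on this transition.)
        Process.send_after(self(), :reset, @cooldown_ms)
        {:reply, {:error, reason}, %{s | failures: 0, state: :open}}

      {:error, reason} ->
        {:reply, {:error, reason}, %{s | failures: s.failures + 1}}
    end
  end

  @impl true
  def handle_info(:reset, s), do: {:noreply, %{s | state: :closed}}

  defp safe_run(fun) do
    {:ok, fun.()}
  rescue
    e -> {:error, e}
  end
end
```

Because the breaker is itself a supervised process, its state is isolated the same way everything else is: if the breaker crashes, it restarts closed, which is the safe default.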

# Skills declare their possible outcomes (branches)
# The executor routes to different workflow steps based on which branch fires
# No LLM involved in routing — pure deterministic pattern matching
defmodule MyWorkflow do
  use AlexClaw.Workflow

  step :fetch_data, skill: :web_fetcher, on: [
    ok: :analyze,
    error: :notify_owner
  ]

  step :analyze, skill: :llm_analyzer, on: [
    ok: :store,
    flagged: :escalate
  ]
end

Sequential, deterministic, auditable. No fan-out — one path per run. The workflow engine spends zero LLM tokens on routing decisions because pattern matching doesn’t need inference.

The Local-First LLM Router

The piece that’s most interesting as a standalone architectural decision: AlexClaw routes every task to the cheapest model that satisfies the required reasoning tier.

Heavy tasks — multi-step analysis, writing — go to a cloud model. Light tasks — formatting, filtering, simple classification — go to a local model running via Ollama or LM Studio. Medium tasks land somewhere in between. The routing logic is pure pattern matching on the tier label: no inference, no embeddings. Daily usage is tracked per provider in ETS to enforce per-provider cost caps.
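The core of that router is a few function clauses. This sketch assumes invented tier and provider names; in AlexClaw the candidate lists come from the Postgres-backed config rather than being hard-coded:

```elixir
# Tier routing as pure pattern matching — an illustrative sketch.
defmodule MyAgent.Router do
  # Cheapest-first candidates per reasoning tier (hypothetical providers)
  defp candidates(:light),  do: [:ollama_local, :lmstudio_local]
  defp candidates(:medium), do: [:lmstudio_local, :cloud_small]
  defp candidates(:heavy),  do: [:cloud_large]

  # Pick the first candidate still under its daily cost cap.
  # `usage` and `caps` are maps of provider => spend, read from ETS.
  def route(tier, usage, caps) do
    Enum.find(candidates(tier), fn provider ->
      Map.get(usage, provider, 0) < Map.fetch!(caps, provider)
    end)
  end
end
```

The whole decision is a list scan and two map lookups: fast enough to run on every task, and trivially adjustable at runtime by swapping the candidate lists.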

The implication is subtle: for a personal agent that runs continuously, inference cost is an operational constraint, not just a number to optimize once at launch. You need the routing to be fast, reliable, and adjustable at runtime — not hard-coded at startup. What AlexClaw built is closer to a load balancer for LLM capacity than to the “pick a model in the config” approach most personal agent frameworks use.

The whole configuration — providers, tiers, routing rules — is managed in Postgres, cached in ETS, editable from the admin UI. You can add a new provider, change its tier assignment, or adjust its daily cap without stopping anything.
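The Postgres-plus-ETS arrangement is a read-through cache: reads hit ETS directly, writes go through a single owning process. A sketch of the shape, with illustrative names:

```elixir
# Read-through ETS cache in front of Postgres config — a sketch, not
# AlexClaw's code. Table and function names are hypothetical.
defmodule MyAgent.ConfigCache do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Reads go straight to ETS: no GenServer round-trip on the hot path
  def get(key) do
    case :ets.lookup(:agent_config, key) do
      [{^key, value}] -> value
      [] -> nil
    end
  end

  # The admin UI writes to Postgres, then refreshes the cache through
  # the owning process, so writes are serialized
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(_) do
    # :protected — any process may read, only the owner may write
    :ets.new(:agent_config, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    :ets.insert(:agent_config, {key, value})
    {:reply, :ok, state}
  end
end
```

This is why “no restart required” falls out for free: a config change is an ETS insert, visible to every reader on the next lookup.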

The Quote Reads Differently Now

“I didn’t plan most of this. I just kept solving the next problem.”

Taken at face value, it sounds like the typical engineering origin story: organic growth from a single script to a real system. But look at what the accumulation produced. It produced a supervision tree. It produced circuit breakers. It produced a deterministic router and a cost-aware model selection strategy. It produced a system that’s observable, fault-tolerant, and configurable at runtime without restarts.

None of that was specified in advance. All of it is standard OTP.

This is the thing that’s hard to communicate about Elixir to someone who hasn’t built with it: the “good instincts” path and the OTP path are the same path. When you reach for the natural solution to “what happens when this GenServer crashes,” you get supervision. When you reach for “how do I track state across concurrent requests,” you get ETS. When you reach for “how do I isolate failures in one part of the system,” you get the supervision tree.

You don’t need to know you’re building a fault-tolerant distributed system. You just keep solving the next problem.


AlexClaw is on GitHub, MIT-licensed, single-user by design. The Elixir Forum thread is worth reading for the Q&A about the circuit breaker design and the reasoning behind keeping it single-user.