This is the architecture I use to build agents. It's what OpenAI, Devin, Cursor, and about 30 other companies have figured out. If you put in 12 months, you can figure it out too. Or you can read this.

I've been shipping software for 22 years. I'm a UI engineer—JavaScript, TypeScript, design, UX—but I've built distributed systems too. Currently Principal Engineer at Prequel. Before that, Tech Lead at Elastic Security for 5 years. Before that, UI Architect at Endgame. I provide value every time, and I win. I know what works and what breaks.

The components

Orchestration

Temporal for durable workflow orchestration. Agent loops run as workflows; inference and tool calls run as activities, which gives you retries, timeouts, and scheduling out of the box.
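A minimal sketch of that loop with the Temporal TypeScript SDK. The activity names, return shapes, and turn cap are my illustrations, not a fixed API:

```ts
import { proxyActivities } from '@temporalio/workflow';

// Hypothetical activities, implemented on the worker side.
interface AgentActivities {
  runInference(conversationId: string): Promise<
    { kind: 'final'; text: string } | { kind: 'tool'; toolCall: string }
  >;
  executeTool(conversationId: string, toolCall: string): Promise<void>;
}

const { runInference, executeTool } = proxyActivities<AgentActivities>({
  startToCloseTimeout: '5 minutes',
  retry: { maximumAttempts: 3 },
});

export async function agentLoop(conversationId: string): Promise<string> {
  // Each iteration is durable: if the worker crashes mid-run, Temporal
  // replays the workflow deterministically and resumes where it left off.
  for (let turn = 0; turn < 50; turn++) {
    const step = await runInference(conversationId);
    if (step.kind === 'final') return step.text;
    await executeTool(conversationId, step.toolCall);
  }
  return 'turn limit reached';
}
```

The point is that the loop survives worker crashes and redeploys; Temporal replays it from history rather than starting over.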

MCP

The Model Context Protocol SDKs (TypeScript and Go) for building MCP servers and clients. MCP lets your agent call tools, and lets tools expose themselves to agents, in a standard way.
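A minimal server sketch with the TypeScript SDK; the tool name and its behavior are illustrative:

```ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Declare the server and one tool; any MCP client can discover and call it.
const server = new McpServer({ name: 'knowledge', version: '1.0.0' });

server.tool(
  'search_knowledge',    // hypothetical tool
  { query: z.string() }, // input schema, validated by the SDK
  async ({ query }) => ({
    content: [{ type: 'text', text: `results for: ${query}` }],
  })
);

// stdio transport: the agent launches this process and speaks MCP over pipes.
await server.connect(new StdioServerTransport());
```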

Knowledge base

Postgres with pgvector and full-text search for semantic and keyword searching of embedded knowledge. You need both: vector search finds similar things, full-text search finds exact things.

I use Postgres because it does vectors (pgvector), full-text (tsvector), documents (JSONB), and graphs (Apache AGE). One query language, one consistency model, one place to look when things go wrong. You may use separate databases—Pinecone for vectors, Elasticsearch for search, Neo4j for graphs—because you need performance at the edges or you already have expertise there.
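Here's roughly what one hybrid query looks like against that setup. The `knowledge` table, its columns, and the 60/40 weighting are assumptions for illustration:

```ts
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the environment

// Combine pgvector cosine similarity with tsvector keyword rank in one query.
export async function hybridSearch(queryText: string, queryEmbedding: number[]) {
  const { rows } = await pool.query(
    `SELECT id, body,
            1 - (embedding <=> $1::vector)               AS semantic_score,
            ts_rank(tsv, plainto_tsquery('english', $2)) AS keyword_score
     FROM knowledge
     WHERE tsv @@ plainto_tsquery('english', $2)
        OR (embedding <=> $1::vector) < 0.5
     ORDER BY 0.6 * (1 - (embedding <=> $1::vector))
            + 0.4 * ts_rank(tsv, plainto_tsquery('english', $2)) DESC
     LIMIT 10`,
    [JSON.stringify(queryEmbedding), queryText]
  );
  return rows;
}
```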

The search pipeline

Agents need relevant data. Summarization is one technique for letting a conversation continue despite an overly full context. So-called RAG (Retrieval-Augmented Generation) is another. I propose a third: let the agent propose its own 'knowledge', and index that knowledge through your search pipeline. When the agent's context is nearly full, ask it to index anything useful, then to provide a query into the knowledge base along with starting instructions for its next turn. The agent's history is cleared, the knowledge base is queried with the agent-provided query, and the history is rebuilt in this order: system instructions, agent instructions, query results. Summarization can optionally be layered on top. This dynamic system is powerful enough to obviate the need for delegation, or so I hope to prove. Devin AI seems to think so.
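A sketch of that handoff. `llm`, `indexKnowledge`, and `searchKnowledge` are hypothetical stand-ins for your model client and search pipeline:

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical stand-ins; wire these to your model client and pipeline.
declare const llm: {
  complete(msgs: Message[]): Promise<{
    knowledge: string[];
    query: string;
    instructions: string;
  }>;
};
declare function indexKnowledge(items: string[]): Promise<void>;
declare function searchKnowledge(query: string): Promise<string[]>;

export async function compactContext(
  history: Message[],
  systemPrompt: string,
): Promise<Message[]> {
  // 1. Ask the agent for knowledge worth keeping, plus a re-entry plan.
  const handoff = await llm.complete([
    ...history,
    {
      role: 'user',
      content:
        'Context is nearly full. Emit knowledge to index, a knowledge-base ' +
        'query to restore context, and instructions for your next turn.',
    },
  ]);

  // 2. Index the proposed knowledge through the search pipeline.
  await indexKnowledge(handoff.knowledge);

  // 3. The history is cleared and rebuilt in the prescribed order:
  //    system instructions, agent instructions, query results.
  const results = await searchKnowledge(handoff.query);
  return [
    { role: 'system', content: systemPrompt },
    { role: 'system', content: handoff.instructions },
    { role: 'user', content: results.join('\n') },
  ];
}
```

The pipeline behind `indexKnowledge` and `searchKnowledge` breaks down into these stages: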

  1. Source of truth database: This is where your actual data lives. Documents, conversations, notes, whatever. This is the canonical store.

  2. Change detection: When a record changes in the source of truth, trigger re-indexing. This can be a database trigger, a change data capture stream, or just a job that runs after writes.

  3. Embedding generation: When something changes, compute a new embedding using your embedding model (e.g., nomic-embed-text). Store this vector.

  4. Vector index: An optimized store for similarity search. This could be pgvector in Postgres, or a dedicated vector database, or a managed service.

  5. Document index: A queryable store where you can do complex logic on fields—filters, aggregations, facets, nested queries. This is what Elasticsearch gives you. Agents need this to model complex data and run sophisticated queries against the knowledge base. MongoDB could be another good option.

  6. Full-text index: An index with stemming, tokenization, and ranking. This is what lets users search for "running" and find documents containing "run" or "ran."

  7. Graph index (optional but powerful): A graph database lets agents model relationships—who knows whom, what depends on what, how concepts connect. Useful for reasoning over connected data.

Implementation options

The point is: you need a pipeline from source of truth → embedding → searchable indexes, and you probably need multiple types of search (semantic, keyword, full-text) because they find different things.
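A sketch of the re-indexing path, assuming a local Ollama instance serving nomic-embed-text and a `documents` table with `embedding` and `tsv` columns:

```ts
import { Pool } from 'pg';

const pool = new Pool();

// Assumed setup: Ollama's embeddings endpoint serving nomic-embed-text.
async function embed(text: string): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  const json = await res.json();
  return json.embedding;
}

// Called from change detection: a trigger, a CDC consumer, or a post-write job.
export async function reindex(id: string) {
  const { rows } = await pool.query(
    'SELECT body FROM documents WHERE id = $1',
    [id],
  );
  const body: string = rows[0].body;
  const vector = await embed(body);

  // One statement keeps the vector and full-text indexes in step.
  await pool.query(
    `UPDATE documents
     SET embedding = $1::vector,
         tsv = to_tsvector('english', $2)
     WHERE id = $3`,
    [JSON.stringify(vector), body, id],
  );
}
```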

Token streaming

Redis or NATS for streaming tokens from the Temporal worker back to the web server, so the browser can watch tokens come in from the LLM. The Temporal worker does the inference and publishes tokens to a channel; the web server subscribes and forwards them to the browser.

I use pub/sub because it gives real-time delivery without polling overhead. Redis is simple. NATS is lighter if you don't need Redis's other features. You may poll if your use case tolerates latency or you want fewer moving parts.
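Both halves, sketched with ioredis and Express; the channel naming and route shape are mine:

```ts
import Redis from 'ioredis';
import express from 'express';

// Worker side: publish each token as the LLM streams it out.
const pub = new Redis();
export async function publishToken(runId: string, token: string) {
  await pub.publish(`tokens:${runId}`, token);
}

// Web server side: subscribe and forward to the browser as SSE.
const app = express();
app.get('/stream/:runId', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const sub = new Redis(); // a subscribed connection can't issue other commands
  await sub.subscribe(`tokens:${req.params.runId}`);
  sub.on('message', (_channel, token) => {
    res.write(`data: ${JSON.stringify(token)}\n\n`);
  });
  req.on('close', () => void sub.quit());
});
app.listen(3000);
```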

Backend

Go or TypeScript, with the Vercel AI SDK or homebrew agent endpoints. Serves HTTP/2 with long polling, SSE, or WebSockets.

The backend provides conversation endpoints, history storage and retrieval, search against the knowledge base, and token streaming to the browser.

I recommend Postgres for the conversation store. Conversations, like most artifacts the system produces, are automatically added to the knowledge base.

Web server

nginx or Caddy for serving the site. CDNs for static assets.

Bundler

Rollup or Vite.

UI framework

React or Preact. Radix UI for accessible, unstyled primitives. Tailwind or CSS modules for styling.

Preferred libraries: nanostores for state, Monaco Editor for code editing, Three.js for 3D, D3 for data visualization.

Chat UI

useChat or homebrew nanostores-based conversation code.

The chat UI must stream tokens as they arrive, render Markdown via rehype/remark, and stay responsive as conversations grow long.

Browser virtualization

Virtua for virtualized lists in the browser. You'll need this when conversations get long.

Storage

PocketBase for storing prompts and system instructions. Simple, self-hosted, with auth and realtime subscriptions built in.

Agent capabilities

Sandboxes

agent-sandbox via Kubernetes allows agents to request sandboxes and execute code in them. Python and TypeScript environments. The agent can spin up a sandbox, run code, get results, tear it down.
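The lifecycle, sketched against a hypothetical client interface (this is not agent-sandbox's actual API):

```ts
// Hypothetical interface; adapt to whatever your sandbox layer exposes.
interface SandboxClient {
  create(runtime: 'python' | 'typescript'): Promise<{ id: string }>;
  exec(id: string, code: string): Promise<{ stdout: string; stderr: string }>;
  destroy(id: string): Promise<void>;
}

export async function runInSandbox(client: SandboxClient, code: string) {
  const { id } = await client.create('python');
  try {
    // Run the agent's code and collect results...
    return await client.exec(id, code);
  } finally {
    // ...and always tear the sandbox down, even on failure.
    await client.destroy(id);
  }
}
```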

Scheduling

Tools that let the agent schedule work using Temporal scheduling primitives. An agent can say "run this workflow tomorrow at 9am" or "run this every hour."
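With the TypeScript client that looks roughly like this; the workflow type and task queue are assumptions:

```ts
import { Client } from '@temporalio/client';

// "Run this every hour" becomes a Temporal schedule the agent creates itself.
export async function scheduleHourly(note: string) {
  const client = new Client();
  await client.schedule.create({
    scheduleId: `agent-schedule-${Date.now()}`,
    spec: { intervals: [{ every: '1h' }] },
    action: {
      type: 'startWorkflow',
      workflowType: 'agentTask', // hypothetical workflow registered on a worker
      taskQueue: 'agents',
      args: [note],
    },
  });
}
```

For "tomorrow at 9am", swap the interval spec for a calendar or cron expression.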

Collaboration (CRDT-based)

CRDT-based collaboration tools let agents and humans work on the same data in real time.

Off-the-shelf CRDT libraries
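Yjs and Automerge are the usual candidates. A minimal Yjs sketch of two replicas converging (the sync transport, e.g. y-websocket, is left out):

```ts
import * as Y from 'yjs';

// Two replicas of the same document: say, a human's and an agent's.
const human = new Y.Doc();
const agent = new Y.Doc();

// Each side edits its own replica concurrently.
human.getText('spec').insert(0, 'Build the search pipeline. ');
agent.getText('spec').insert(0, 'Draft: ');

// Exchange updates; in production a provider like y-websocket does this.
Y.applyUpdate(agent, Y.encodeStateAsUpdate(human));
Y.applyUpdate(human, Y.encodeStateAsUpdate(agent));

// Both replicas now hold the same merged text, whatever the delivery order.
console.log(human.getText('spec').toString());
console.log(agent.getText('spec').toString());
```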

Off-the-shelf chat

What you'll need to build

For example:

Computational tools (agent-as-a-tool)

The core pattern: one agent can call another agent as a tool. When doing so, the caller can pass parameters that get embedded into the callee's system instructions. This creates a recursive delegation chain.

I use agent-as-a-tool because it lets you branch, parallelize, and specialize without hard-coding the structure. You may use a single agent with many tools if your context fits and you don't need parallelism. You may use a pipeline if your workflow is linear and predictable.

The pattern

  1. A user talks to an Account Executive agent
  2. The AE agent delegates to a Scrum Master agent, passing context via system instruction parameters
  3. The Scrum Master delegates to Developer agents, UX agents, etc.
  4. Each agent can spawn multiple child agents concurrently, branching the work
  5. Child agents can delegate further, recursively

This creates a tree of agents working on a coordinated task.
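A sketch of the mechanics. `llm.complete` and the fixed plan/implement split are hypothetical stand-ins; note the budget parameter, which the next section is about:

```ts
type Message = { role: 'system' | 'user'; content: string };
declare const llm: { complete(msgs: Message[]): Promise<string> };

interface AgentSpec {
  name: string;
  // Caller-supplied parameters get rendered into the callee's system prompt.
  systemTemplate: (params: Record<string, string>) => string;
}

export async function callAgent(
  spec: AgentSpec,
  params: Record<string, string>,
  task: string,
  budget: number, // how many more delegations this branch is allowed
): Promise<string> {
  if (budget <= 0) {
    // Budget exhausted: answer directly instead of delegating further.
    return llm.complete([
      { role: 'system', content: spec.systemTemplate(params) },
      { role: 'user', content: task },
    ]);
  }

  // Spawn child agents concurrently, each with a smaller budget.
  const results = await Promise.all(
    ['plan', 'implement'].map((role) =>
      callAgent(spec, { ...params, role }, task, budget - 1),
    ),
  );

  // The parent synthesizes what its children produced.
  return llm.complete([
    { role: 'system', content: spec.systemTemplate(params) },
    { role: 'user', content: `Synthesize:\n${results.join('\n')}` },
  ]);
}
```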

Orchestration and strategy

A central orchestrator controls the delegation policy using strategic algorithms.

Delegation budgets

Each agent has a delegation budget: a limit on how many child agents it can spawn before it must yield (the budget parameter in the sketch above). This prevents runaway recursion and keeps the system responsive even under heavy branching.

Why this works

This architecture handles both exploration and exploitation.

The orchestrator decides when to explore (try new approaches) and when to exploit (commit to a known path). The delegation budget ensures the system stays responsive even under heavy branching.

How it fits together

  1. Browser — React/Preact + nanostores + virtua + rehype/remark
  2. Web server — Caddy or nginx
  3. Backend — Go/TS + HTTP/2 + SSE/WS (conversations, history, search, streaming)
  4. Data stores — Postgres (pgvector, tsvector, JSONB, Apache AGE), PocketBase, Redis or NATS
  5. Orchestration — Temporal (workflows, activities, scheduling)
  6. Capabilities — MCP tools, sandboxes, scheduling, CRDT collaboration, agent-as-a-tool delegation

What this gets you

An agent that can:

  - search its knowledge base with semantic, keyword, and graph queries
  - compact its own context instead of dying at the token limit
  - execute code in sandboxes
  - schedule future work for itself
  - collaborate with humans in real time
  - delegate to child agents within a budget
  - stream tokens to the browser as they're generated

This is the stack. The rest is implementation.

About me

22 years shipping software. UI engineer first—JavaScript, TypeScript, React, Preact, Lit Elements. I have a serious eye for design and UX.

Current: Principal Engineer at Prequel

Previous: Tech Lead at Elastic Security (5 years), UI Architect at Endgame (defense/security)

Infrastructure: Docker Compose, Helm, Terraform, Kubernetes

Observability: Grafana, Sentry

Frontend: React, Preact, Lit Elements; CSS modules, Tailwind

Build: Rollup (preferred), Vite; Make, act

Editor/tools: Cursor, Qwen, Neovim, WezTerm

I've built this architecture. I have 20+ private MCP servers, integrations with Apache AGE, Dgraph, pgvector, and Temporal. I've shipped agents that use this stack in production.

Get in touch

I'm looking for a remote Principal or Staff Engineer role focused on AI infrastructure, agent systems, or frontend architecture. I bring 22 years of shipping, strong opinions on design, and the ability to build the whole stack.

If you're building something like this and want to talk, reach out.

[TODO: add contact info]