This is the architecture I use to build agents. It's what OpenAI, Devin, Cursor, and about 30 other companies have figured out. If you put in 12 months, you can figure it out too. Or you can read this.

I've been shipping software for 22 years. I'm a UI engineer—JavaScript, TypeScript, design, UX—but I've built distributed systems too. Currently Principal Engineer at Prequel. Before that, Tech Lead at Elastic Security for 5 years. Before that, UI Architect at Endgame. I provide value every time, and I win. I know what works and what breaks.

The components

Orchestration

Temporal for durable workflow orchestration. Agent loops run as workflows; inference and tool calls run as activities, which gives you retries, timeouts, and scheduling out of the box.
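A minimal sketch of that loop with the Temporal TypeScript SDK. The activity names, return shapes, and turn cap are my illustrations, not a fixed API:

```ts
import { proxyActivities } from '@temporalio/workflow';

// Hypothetical activities, implemented on the worker side.
interface AgentActivities {
  runInference(conversationId: string): Promise<
    { kind: 'final'; text: string } | { kind: 'tool'; toolCall: string }
  >;
  executeTool(conversationId: string, toolCall: string): Promise<void>;
}

const { runInference, executeTool } = proxyActivities<AgentActivities>({
  startToCloseTimeout: '5 minutes',
  retry: { maximumAttempts: 3 },
});

export async function agentLoop(conversationId: string): Promise<string> {
  // Each iteration is durable: if the worker crashes mid-run, Temporal
  // replays the workflow deterministically and resumes where it left off.
  for (let turn = 0; turn < 50; turn++) {
    const step = await runInference(conversationId);
    if (step.kind === 'final') return step.text;
    await executeTool(conversationId, step.toolCall);
  }
  return 'turn limit reached';
}
```

The point is that the loop survives worker crashes and redeploys; Temporal replays it from history rather than starting over.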

MCP

The Model Context Protocol SDKs (TypeScript and Go) for building MCP servers and clients. MCP lets your agent call tools, and lets tools expose themselves to agents, in a standard way.
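A minimal server sketch with the TypeScript SDK; the tool name and its behavior are illustrative:

```ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Declare the server and one tool; any MCP client can discover and call it.
const server = new McpServer({ name: 'knowledge', version: '1.0.0' });

server.tool(
  'search_knowledge',    // hypothetical tool
  { query: z.string() }, // input schema, validated by the SDK
  async ({ query }) => ({
    content: [{ type: 'text', text: `results for: ${query}` }],
  })
);

// stdio transport: the agent launches this process and speaks MCP over pipes.
await server.connect(new StdioServerTransport());
```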

Knowledge base

Postgres with pgvector and full-text search for semantic and keyword searching of embedded knowledge. You need both: vector search finds similar things, full-text search finds exact things.

I use Postgres because it does vectors (pgvector), full-text (tsvector), documents (JSONB), and graphs (Apache AGE). One query language, one consistency model, one place to look when things go wrong. You may use separate databases—Pinecone for vectors, Elasticsearch for search, Neo4j for graphs—because you need performance at the edges or you already have expertise there.
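Here's roughly what one hybrid query looks like against that setup. The `knowledge` table, its columns, and the 60/40 weighting are assumptions for illustration:

```ts
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the environment

// Combine pgvector cosine similarity with tsvector keyword rank in one query.
export async function hybridSearch(queryText: string, queryEmbedding: number[]) {
  const { rows } = await pool.query(
    `SELECT id, body,
            1 - (embedding <=> $1::vector)               AS semantic_score,
            ts_rank(tsv, plainto_tsquery('english', $2)) AS keyword_score
     FROM knowledge
     WHERE tsv @@ plainto_tsquery('english', $2)
        OR (embedding <=> $1::vector) < 0.5
     ORDER BY 0.6 * (1 - (embedding <=> $1::vector))
            + 0.4 * ts_rank(tsv, plainto_tsquery('english', $2)) DESC
     LIMIT 10`,
    [JSON.stringify(queryEmbedding), queryText]
  );
  return rows;
}
```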

The search pipeline

Agents need relevant data. Summarization is one technique for letting a conversation continue despite an overly full context. So-called RAG (Retrieval-Augmented Generation) is another. I propose a third: let the agent propose its own 'knowledge', and index that knowledge through your search pipeline. When the agent's context is nearly full, ask it to index anything useful, then to provide a query into the knowledge base along with starting instructions for its next turn. The agent's history is cleared, the knowledge base is queried with the agent-provided query, and the history is rebuilt in this order: system instructions, agent instructions, query results. Summarization can optionally be layered on top. This dynamic system is powerful enough to obviate the need for delegation, or so I hope to prove. Devin AI seems to think so.
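A sketch of that handoff. `llm`, `indexKnowledge`, and `searchKnowledge` are hypothetical stand-ins for your model client and search pipeline:

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Hypothetical stand-ins; wire these to your model client and pipeline.
declare const llm: {
  complete(msgs: Message[]): Promise<{
    knowledge: string[];
    query: string;
    instructions: string;
  }>;
};
declare function indexKnowledge(items: string[]): Promise<void>;
declare function searchKnowledge(query: string): Promise<string[]>;

export async function compactContext(
  history: Message[],
  systemPrompt: string,
): Promise<Message[]> {
  // 1. Ask the agent for knowledge worth keeping, plus a re-entry plan.
  const handoff = await llm.complete([
    ...history,
    {
      role: 'user',
      content:
        'Context is nearly full. Emit knowledge to index, a knowledge-base ' +
        'query to restore context, and instructions for your next turn.',
    },
  ]);

  // 2. Index the proposed knowledge through the search pipeline.
  await indexKnowledge(handoff.knowledge);

  // 3. The history is cleared and rebuilt in the prescribed order:
  //    system instructions, agent instructions, query results.
  const results = await searchKnowledge(handoff.query);
  return [
    { role: 'system', content: systemPrompt },
    { role: 'system', content: handoff.instructions },
    { role: 'user', content: results.join('\n') },
  ];
}
```

The pipeline behind `indexKnowledge` and `searchKnowledge` breaks down into these stages: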

  1. Source of truth database: This is where your actual data lives. Documents, conversations, notes, whatever. This is the canonical store.

  2. Change detection: When a record changes in the source of truth, trigger re-indexing. This can be a database trigger, a change data capture stream, or just a job that runs after writes.

  3. Embedding generation: When something changes, compute a new embedding using your embedding model (e.g., nomic-embed-text). Store this vector.

  4. Vector index: An optimized store for similarity search. This could be pgvector in Postgres, or a dedicated vector database, or a managed service.

  5. Document index: A queryable store where you can do complex logic on fields—filters, aggregations, facets, nested queries. This is what Elasticsearch gives you. Agents need this to model complex data and run sophisticated queries against the knowledge base. MongoDB could be another good option.

  6. Full-text index: An index with stemming, tokenization, and ranking. This is what lets users search for "running" and find documents containing "run" or "ran."

  7. Graph index (optional but powerful): A graph database lets agents model relationships—who knows whom, what depends on what, how concepts connect. Useful for reasoning over connected data.

Implementation options

The point is: you need a pipeline from source of truth → embedding → searchable indexes, and you probably need multiple types of search (semantic, keyword, full-text) because they find different things.
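A sketch of the re-indexing path, assuming a local Ollama instance serving nomic-embed-text and a `documents` table with `embedding` and `tsv` columns:

```ts
import { Pool } from 'pg';

const pool = new Pool();

// Assumed setup: Ollama's embeddings endpoint serving nomic-embed-text.
async function embed(text: string): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  const json = await res.json();
  return json.embedding;
}

// Called from change detection: a trigger, a CDC consumer, or a post-write job.
export async function reindex(id: string) {
  const { rows } = await pool.query(
    'SELECT body FROM documents WHERE id = $1',
    [id],
  );
  const body: string = rows[0].body;
  const vector = await embed(body);

  // One statement keeps the vector and full-text indexes in step.
  await pool.query(
    `UPDATE documents
     SET embedding = $1::vector,
         tsv = to_tsvector('english', $2)
     WHERE id = $3`,
    [JSON.stringify(vector), body, id],
  );
}
```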

Token streaming

Redis or NATS for streaming tokens from the Temporal worker back to the web server, so the browser can watch tokens come in from the LLM. The Temporal worker does the inference and publishes tokens to a channel; the web server subscribes and forwards them to the browser.

I use pub/sub because it gives real-time delivery without polling overhead. Redis is simple. NATS is lighter if you don't need Redis's other features. You may poll if your use case tolerates latency or you want fewer moving parts.
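Both halves, sketched with ioredis and Express; the channel naming and route shape are mine:

```ts
import Redis from 'ioredis';
import express from 'express';

// Worker side: publish each token as the LLM streams it out.
const pub = new Redis();
export async function publishToken(runId: string, token: string) {
  await pub.publish(`tokens:${runId}`, token);
}

// Web server side: subscribe and forward to the browser as SSE.
const app = express();
app.get('/stream/:runId', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  const sub = new Redis(); // a subscribed connection can't issue other commands
  await sub.subscribe(`tokens:${req.params.runId}`);
  sub.on('message', (_channel, token) => {
    res.write(`data: ${JSON.stringify(token)}\n\n`);
  });
  req.on('close', () => void sub.quit());
});
app.listen(3000);
```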

Backend

Go or TypeScript, with the Vercel AI SDK or homebrew agent endpoints. Serves HTTP/2 with long polling, SSE, or WebSockets.

The backend provides conversation endpoints, history storage and retrieval, search against the knowledge base, and token streaming to the browser.

I recommend Postgres for the conversation store. Conversations, like most artifacts the system produces, are automatically added to the knowledge base.

Web server

nginx or Caddy for serving the site. CDNs for static assets.

Bundler

Rollup or Vite.

UI framework

React or Preact. Radix UI for accessible, unstyled primitives. Tailwind or CSS modules for styling.

Preferred libraries: nanostores for state, Monaco Editor for code editing, Three.js for 3D, D3 for data visualization.

Chat UI

useChat or homebrew nanostores-based conversation code.

The chat UI must stream tokens as they arrive, render Markdown via rehype/remark, and stay responsive as conversations grow long.

Browser virtualization

Virtua for virtualized lists in the browser. You'll need this when conversations get long.

Storage

PocketBase for storing prompts and system instructions. Simple, self-hosted, with auth and realtime subscriptions built in.

Agent capabilities

Sandboxes

agent-sandbox via Kubernetes allows agents to request sandboxes and execute code in them. Python and TypeScript environments. The agent can spin up a sandbox, run code, get results, tear it down.
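The lifecycle, sketched against a hypothetical client interface (this is not agent-sandbox's actual API):

```ts
// Hypothetical interface; adapt to whatever your sandbox layer exposes.
interface SandboxClient {
  create(runtime: 'python' | 'typescript'): Promise<{ id: string }>;
  exec(id: string, code: string): Promise<{ stdout: string; stderr: string }>;
  destroy(id: string): Promise<void>;
}

export async function runInSandbox(client: SandboxClient, code: string) {
  const { id } = await client.create('python');
  try {
    // Run the agent's code and collect results...
    return await client.exec(id, code);
  } finally {
    // ...and always tear the sandbox down, even on failure.
    await client.destroy(id);
  }
}
```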

Scheduling

Tools that let the agent schedule work using Temporal scheduling primitives. An agent can say "run this workflow tomorrow at 9am" or "run this every hour."
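With the TypeScript client that looks roughly like this; the workflow type and task queue are assumptions:

```ts
import { Client } from '@temporalio/client';

// "Run this every hour" becomes a Temporal schedule the agent creates itself.
export async function scheduleHourly(note: string) {
  const client = new Client();
  await client.schedule.create({
    scheduleId: `agent-schedule-${Date.now()}`,
    spec: { intervals: [{ every: '1h' }] },
    action: {
      type: 'startWorkflow',
      workflowType: 'agentTask', // hypothetical workflow registered on a worker
      taskQueue: 'agents',
      args: [note],
    },
  });
}
```

For "tomorrow at 9am", swap the interval spec for a calendar or cron expression.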

Collaboration (CRDT-based)

CRDT-based collaboration tools let agents and humans work on the same data in real time.

Off-the-shelf CRDT libraries
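Yjs and Automerge are the usual candidates. A minimal Yjs sketch of two replicas converging (the sync transport, e.g. y-websocket, is left out):

```ts
import * as Y from 'yjs';

// Two replicas of the same document: say, a human's and an agent's.
const human = new Y.Doc();
const agent = new Y.Doc();

// Each side edits its own replica concurrently.
human.getText('spec').insert(0, 'Build the search pipeline. ');
agent.getText('spec').insert(0, 'Draft: ');

// Exchange updates; in production a provider like y-websocket does this.
Y.applyUpdate(agent, Y.encodeStateAsUpdate(human));
Y.applyUpdate(human, Y.encodeStateAsUpdate(agent));

// Both replicas now hold the same merged text, whatever the delivery order.
console.log(human.getText('spec').toString());
console.log(agent.getText('spec').toString());
```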

Off-the-shelf chat

What you'll need to build

For example:

Computational tools (agent-as-a-tool)

The core pattern: one agent can call another agent as a tool. When doing so, the caller can pass parameters that get embedded into the callee's system instructions. This creates a recursive delegation chain.

I use agent-as-a-tool because it lets you branch, parallelize, and specialize without hard-coding the structure. You may use a single agent with many tools if your context fits and you don't need parallelism. You may use a pipeline if your workflow is linear and predictable.

The pattern

  1. A user talks to an Account Executive agent
  2. The AE agent delegates to a Scrum Master agent, passing context via system instruction parameters
  3. The Scrum Master delegates to Developer agents, UX agents, etc.
  4. Each agent can spawn multiple child agents concurrently, branching the work
  5. Child agents can delegate further, recursively

This creates a tree of agents working on a coordinated task.
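A sketch of the mechanics. `llm.complete` and the fixed plan/implement split are hypothetical stand-ins; note the budget parameter, which the next section is about:

```ts
type Message = { role: 'system' | 'user'; content: string };
declare const llm: { complete(msgs: Message[]): Promise<string> };

interface AgentSpec {
  name: string;
  // Caller-supplied parameters get rendered into the callee's system prompt.
  systemTemplate: (params: Record<string, string>) => string;
}

export async function callAgent(
  spec: AgentSpec,
  params: Record<string, string>,
  task: string,
  budget: number, // how many more delegations this branch is allowed
): Promise<string> {
  if (budget <= 0) {
    // Budget exhausted: answer directly instead of delegating further.
    return llm.complete([
      { role: 'system', content: spec.systemTemplate(params) },
      { role: 'user', content: task },
    ]);
  }

  // Spawn child agents concurrently, each with a smaller budget.
  const results = await Promise.all(
    ['plan', 'implement'].map((role) =>
      callAgent(spec, { ...params, role }, task, budget - 1),
    ),
  );

  // The parent synthesizes what its children produced.
  return llm.complete([
    { role: 'system', content: spec.systemTemplate(params) },
    { role: 'user', content: `Synthesize:\n${results.join('\n')}` },
  ]);
}
```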

Orchestration and strategy

A central orchestrator controls the delegation policy using strategic algorithms.

Delegation budgets

Each agent has a delegation budget: a limit on how many child agents it can spawn before it must yield (the budget parameter in the sketch above). This prevents runaway recursion and keeps the system responsive even under heavy branching.

Why this works

This architecture handles both exploration and exploitation.

The orchestrator decides when to explore (try new approaches) and when to exploit (commit to a known path). The delegation budget ensures the system stays responsive even under heavy branching.

How it fits together

  1. Browser — React/Preact + nanostores + virtua + rehype/remark
  2. Web server — Caddy or nginx
  3. Backend — Go/TS + HTTP/2 + SSE/WS (conversations, history, search, streaming)
  4. Data stores — Postgres (pgvector, tsvector, JSONB, Apache AGE), PocketBase, Redis or NATS
  5. Orchestration — Temporal (workflows, activities, scheduling)
  6. Capabilities — MCP tools, sandboxes, scheduling, CRDT collaboration, agent-as-a-tool delegation

What this gets you

An agent that can:

  - search its knowledge base with semantic, keyword, and graph queries
  - compact its own context instead of dying at the token limit
  - execute code in sandboxes
  - schedule future work for itself
  - collaborate with humans in real time
  - delegate to child agents within a budget
  - stream tokens to the browser as they're generated

This is the stack. The rest is implementation.

About me

22 years shipping software. UI engineer first—JavaScript, TypeScript, React, Preact, Lit Elements. I have a serious eye for design and UX.

Current: Principal Engineer at Prequel

Previous: Tech Lead at Elastic Security (5 years), UI Architect at Endgame (defense/security)

Infrastructure: Docker Compose, Helm, Terraform, Kubernetes

Observability: Grafana, Sentry

Frontend: React, Preact, Lit Elements; CSS modules, Tailwind

Build: Rollup (preferred), Vite; Make, act

Editor/tools: Cursor, Qwen, Neovim, WezTerm

I've built this architecture. I have 20+ private MCP servers, integrations with Apache AGE, Dgraph, pgvector, and Temporal. I've shipped agents that use this stack in production.

Get in touch

I'm looking for a remote Principal or Staff Engineer role focused on AI infrastructure, agent systems, or frontend architecture. I bring 22 years of shipping, strong opinions on design, and the ability to build the whole stack.

If you're building something like this and want to talk, reach out.

[TODO: add contact info]