Let's assume you're planning to build agents in 2026 and you have no idea what you're doing. :P Let's start by going over basic tools. I don't know everything, but I'm confident you'll learn something trying these out.

What is an agent?

The industry has several definitions.

Gartner calls it "a software entity that perceives its environment, makes decisions, takes actions, and works toward goals." They warn of "agent washing"—vendors calling assistants "agents" when they're just chatbots with extra steps.

BuiltABot distinguishes agents from chatbots: chatbots "reset context per session" and handle simple Q&A; agents "maintain context across conversations, reason through multi-step workflows, and take actions via integrations."

Google's A2A Protocol defines an agent as "an independent software entity" that:

Exposes capabilities via an Agent Card (JSON metadata)
Communicates via HTTP/JSON-RPC endpoints
Handles tasks, provides status updates, returns artifacts

Key concept from the spec: agents are "opaque"—they don't share internal state with other agents.

My definition:

An agent is a computer system that can be controlled with natural language and uses inference to interpret inputs and make policy decisions. Everything else—memory, tools, planning—is implementation detail.

Basic Agent Architecture

Tools we need

Inferencing

For local inference, I use Ollama or llama.cpp. Both provide a server mode. Ollama also includes a UI and handles model downloading. Both work well for local development.

Providers

Closed-weight provider APIs like OpenAI, Anthropic, and others work, but having your own GPU—rented or owned—ensures you have the correct models, no rate limits or throttling, and the best privacy available. When I have a choice, I prefer to rent an RTX 5090 from Runpod. You can run Ollama on these instances easily, and it will download models from Hugging Face automatically.

Ollama cloud has been reliable for basic development. It supports a curated list of cloud models and embedding models. You can configure your app to use your local Ollama server as the primary inference endpoint, with Ollama cloud as a fallback. If you don't need custom models or fine-tuned variants, this is a cost-effective and reliable option.

Runpod has been solid. I recommend finding a public Runpod template on GitHub and customizing it. Create a Docker image, publish it to a registry, and connect Runpod to your registry. This lets you quickly spin up pods with your configured environment. Runpod doesn't charge for ingress or egress, so I use Tailscale without Global Networking. Once a pod starts, it appears on my Tailscale network shortly after. Store your Tailscale auth key in a Runpod secret. I have struggled with building images for runpod.

Vultr is the next hosting provider I'll be evaluating.

Models

nomic-embed-text — embeddings for RAG

Databases

pgvector — vector search in Postgres
Apache AGE — graph database in Postgres

Image and audio

stable-diffusion-webui — SDXL image generation
ComfyUI — for agentic image pipelines (API only)
NeuTTS Air — text-to-speech

Agent infrastructure

MCP TypeScript SDK
MCP Go SDK
Temporal — agent run loops and workflows
Docker — isolated agent runners

Coding

Cursor — AI coding assistant

Backend / infra

PocketBase — backend in a binary

Notes and knowledge

Joplin — notes, integrated with a custom MCP server

Browser automation

Puppeteer — headless Chrome

On the radar

Unsloth — quantized LoRAs
SGLang — inference engine
LibreChat — multi-provider chat interface