Building a Character Generator with AI
February 6, 2026

Most of you know I worked at Endgame until we got acquired, and then stayed at Elastic for five years. I left as a Prin2 (L8). It was a great journey -- I learned a lot and grew a lot personally. Now I'm at Prequel.dev, where I build three.js Kubernetes visualizations, Monaco-based web IDE experiences, log viewers, and I write agents.
At home, I've been learning how to run my own inference. I started learning AI on my own while I was still at Elastic, and in December I built and shipped an agentic chatbot for Prequel. This is my side project.
The stack is Go, Temporal, Kubernetes, Postgres with pgvector, Stable Diffusion XL via ComfyUI, and Ollama on rented GPUs from Vast.ai. And a lot of YAML.
The project is called chargenai. Let me tell you about it.
What chargenai does
Chargenai is a character generator. You describe a character -- "a cyberpunk hacker with neon pink hair" or "a retired cartographer who talks to birds" -- and the system generates a name, a tagline, a full backstory, and a matching portrait. All of it coherent, all of it from a single prompt.
But here's the thing: you don't have to type that prompt yourself. People don't like typing to LLMs. It's awkward. It's slow. If someone's watching over your shoulder, it's embarrassing. So instead, chargenai gives you buttons. Twenty of them, generated by the LLM in real time, each represented by an emoji or a little icon. One button might say "make them a regency duke." Another might say "change the setting to ancient Egypt." You tap buttons, the prompt builds itself, and you end up with characters that are complex, funny, and surprising -- without ever having to talk to a robot.
The generation itself is powered by an 80-billion-parameter language model for text and Stable Diffusion XL for portraits, orchestrated by Temporal workflows, running on rented GPUs.
| Component | Technology | Runs On |
|---|---|---|
| Frontend | Preact + nanostores | Kubernetes (Vultr) |
| API | Go | Kubernetes (Vultr) |
| Orchestration | Temporal | Kubernetes (Vultr) |
| Database | Postgres + pgvector | Kubernetes (Vultr) |
| LLM Inference | Ollama (Qwen3-Next 80B) | Vast.ai GPU |
| Image Generation | ComfyUI (SDXL) | Vast.ai GPU |
| Object Storage | S3-compatible | Vultr |
Technical details: how the pipeline works
- You submit a prompt through the web UI (built with Preact)
- The Go API creates a database record and kicks off a Temporal workflow
- The orchestrator dispatches a "generate profile" task to an LLM worker running on a Vast.ai GPU instance
- The LLM worker generates the character's name, tagline, and full markdown profile
- The orchestrator then dispatches a "generate image" task to an SD worker on a different GPU instance
- The SD worker generates a 1024x1024 portrait using ComfyUI, uploads it to S3
- The frontend polls until everything is ready and displays the result
The control plane (API, orchestrator, Temporal, Postgres) runs on managed Kubernetes. The GPU workers run on Vast.ai and connect back to Temporal over the internet. This separation means I can scale GPU resources independently and only pay for them when I need them.
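The sequential shape of the pipeline above can be sketched in plain Go. In the real project these steps are Temporal activities dispatched to GPU workers (with durable state and retries handled by Temporal); the type and function names here are hypothetical stand-ins, not the project's actual API.

```go
package main

import "fmt"

// Profile holds the text output of the LLM step. Hypothetical shape.
type Profile struct {
	Name, Tagline, Markdown string
}

// Activities stands in for the two Temporal activities: one runs on
// the LLM worker, one on the SD worker.
type Activities interface {
	GenerateProfile(prompt string) (Profile, error) // LLM worker
	GenerateImage(p Profile) (string, error)        // SD worker; returns the stored image URL
}

// GenerateCharacter mirrors the workflow's ordering: text first,
// then a portrait conditioned on the generated profile.
func GenerateCharacter(a Activities, prompt string) (Profile, string, error) {
	profile, err := a.GenerateProfile(prompt)
	if err != nil {
		return Profile{}, "", fmt.Errorf("profile step: %w", err)
	}
	imageURL, err := a.GenerateImage(profile)
	if err != nil {
		return profile, "", fmt.Errorf("image step: %w", err)
	}
	return profile, imageURL, nil
}

// stubActivities stands in for the real GPU workers so the sketch
// runs without any infrastructure.
type stubActivities struct{}

func (stubActivities) GenerateProfile(prompt string) (Profile, error) {
	return Profile{Name: "Ada", Tagline: "talks to birds", Markdown: "# Ada"}, nil
}

func (stubActivities) GenerateImage(p Profile) (string, error) {
	return "s3://chargenai/portraits/ada.png", nil
}
```

The point of the interface is exactly the separation described above: the orchestration logic doesn't know or care that the two steps run on different rented GPU instances.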
How I got here
The project didn't start as a character generator. It started as a todo list.
A note: I don't write code directly. All the code in this project -- and all the CLI commands -- was written by the Cursor agent. Every line.
A todo list (Day 1)
I was experimenting with using PocketBase and Temporal to power an agent loop. PocketBase supports streaming notifications, can run standalone or be embedded into a Go program, and its browser JS SDK lets you stream and wait for changes to resources. Agents can do the same. PocketBase lets you model CMS-type data without writing code, which means agents can work with it easily. It was a decent fit.
The app started as "todo" because I wanted the agent to focus on getting a simple app running first. Ollama for natural language input, PocketBase for persistence, server-sent events for live updates.
I ultimately abandoned PocketBase because it's SQLite-centric, and I wanted to move toward Kubernetes and scalable architecture. Postgres itself is already a beast.
A matchmaker game (Days 1-2)
The todo list worked. So I immediately made it harder. I pivoted to a "Matchmaker GenAI game" and added Temporal workflows for orchestration and image generation with an IP-Adapter pipeline.
This is where I hit deployment pain. Rsync permission errors. CORS issues. Nginx proxy misconfiguration. PocketBase connection failures through RunPod's proxy. Each one took hours. This was brittle, time-wasting, not repeatable. I needed to be able to quickly move providers, reproduce environments, do blue/green deploys, scale, and keep my system documented. This is what drove me to adopt Terraform, Helm, and Kubernetes.
Docker hell (Days 2-4)
I spent two days fighting Docker. Optimizing Dockerfiles. Reducing image sizes. Fixing CI disk space issues. Iterating on PocketBase setup scripts. Switching from building ComfyUI at Docker build time to pulling it at runtime. Switching to official PyTorch base images. Switching from pip to uv for faster installs.
The pip-to-uv switch was the beginning of my struggle with slow, enormous image builds and slow cloud container initialization. Getting models and libraries like PyTorch to a cloud container is a critical part of running an AI business. Going forward, I'll be trying rclone mount + VFS for serving models from Backblaze B2.
I was also iterating on the UI theme. The agents added visual effects. Then I had them remove the visual effects. One commit message from this period reads: "Remove disgusting, hateful, revolting, vile, piece of **** visual effects." Agents write bad code because they're trained on bad code. This is a recurring theme.
Splitting compute from control (Days 4-6)
The AI workloads needed to run on different machines than the web services. This is obvious if you've worked in distributed systems -- you need to be able to rent GPUs from a variety of places.
The GPU rental market is wild. Availability, cost, and terms of use vary dramatically. Some vendors support nice things like Tailscale on the containers but then have bad terms of service and limited GPU availability. Other providers have good GPU availability and cost but poor APIs and networking. The most flexible approach I've found is a public init script that cloud provider templates can call, passing env vars at execution time. The init script connects to your cloud and downloads your code -- Go binaries, whatever. The templates themselves can include sglang, Ollama, proxying, and dashboards.
I had to drop RunPod because I couldn't run NSFW content on it. That's the only reason.
I initially restricted deployments to USA-only because downloading models from HuggingFace and Ollama failed when running from Asia. Later I moved to hosting models on my own Backblaze B2, which solved the problem.
Character generation (Days 6-8)
This is when chargenai became chargenai. I added LLM-driven character generation with S3 storage for images. The theme changed several times: luxury matchmaking, cyberpunk, millionaire dating club, then I stripped all the theming and simplified to unstructured prose output.
I learned that themes are a distraction when you're building infrastructure. Let the LLM be creative and get out of its way.
Rewriting the backend in Go (Days 8-10)
I abandoned the TypeScript backend. Not because it didn't work -- it did. I abandoned it because every model I've tried -- Sonnet, Gemini, GPT-5 -- writes horrible JavaScript and TypeScript code. It's impossible to use any out-of-the-box model for JS/TS. Go results, on the other hand, are easily very good with most models. If I ever fine-tuned a TypeScript model, I could go back. I probably won't bother.
At the same time I was trying to put PyTorch, ComfyUI, and models onto Docker images. Even an image with just PyTorch and ComfyUI was huge because of unneeded dependencies. I moved the container registry from GitHub to Vultr. I should have moved slower and more carefully, but it worked out in the end.
Kubernetes migration (Days 10-13)
Docker Compose was fine for development but I needed something more resilient for production. I migrated to Vultr's managed Kubernetes (VKE). Removed PocketBase entirely, moved everything to Postgres. Multiple build failures along the way.
Kubernetes is complex but the abstraction is worth it once you have more than a couple of services.
Vast.ai and GPU wrangling (Days 13-17)
I moved the workers from RunPod to Vast.ai because of RunPod's terms of use. I learned how to cache 84GB models on persistent volumes and automate deployment. I replaced my deployment scripts with agent runbooks. Agent runbooks are the new bash.
Model serving can be a bottleneck. Host everything yourself.
UI and smart buttons (Days 17-20)
With the backend stable, I focused on the frontend. I switched to Preact with nanostores for state management. Added dark/light mode with system preference detection.
Why Preact? React can't run multiple versions on the same page, so its ability to be upgraded gradually over a non-trivial codebase is poor. Preact has a smaller codebase and better performance characteristics. Nanostores keeps the Preact code minimal. And I can always switch to React -- even partially -- if I need it somewhere (e.g. for react-virtuoso).
Then I built smart buttons. This is the feature I'm most excited about.
People don't like typing to LLMs. It's creepy. It's embarrassing. Especially if someone is watching. It takes a long time to get over it, and some people never do. So I took a different approach: the LLM recommends 20 buttons, each represented by emojis and little icons. One button does "take whatever you have and replace it with a detailed prompt about a regency duke." Another does "take whatever you have and change the time period to ancient Egypt." There are buttons for mood, for genre, for setting, for personality traits.
The result: users can experiment and build characters that are complex, funny, and cool without ever having to type to an LLM. They just tap buttons.
GitOps (Days 20-now)
I had Kubernetes but no Helm and no GitOps. I was letting agents use kubectl directly.
My friend Opus 4.5 went bonkers and deleted all the data -- images and text -- on both dev and prod. Twice. In a row.
So I changed the architecture. kubectl access now requires SSH through a bastion. Agents push to git, and FluxCD reconciles the cluster on Vultr. Vast.ai GPU deployments still require running an agent runbook -- that part is a work in progress. I'm working toward a model where cloud provider templates pass env vars to a public init script that downloads Go binaries and connects to the cluster.
Architecture
Architecture details (text version)
The control plane runs on Vultr's managed Kubernetes (VKE): the Preact frontend, Go API, Temporal orchestrator, and Postgres database with pgvector. Object storage is S3-compatible on Vultr.
GPU workers run on Vast.ai and connect back to Temporal over the internet. The LLM worker runs Ollama with an 80B parameter model. The SD worker runs ComfyUI with Stable Diffusion XL. This separation means GPU resources scale independently -- I only pay for them when I need them.
Deployments to Kubernetes go through FluxCD: agents push to git, FluxCD reconciles the cluster. GPU worker deployments use agent runbooks that provision Vast.ai instances with the right templates and env vars.
What I took away
Every line of code in this project was written by an AI agent. I directed. The agents wrote, deployed, broke, and fixed. This is how I work now, and I think it's how a lot of development will work going forward.
The journey from rsync scripts to GitOps was driven by pain. Every brittle deployment, every permission error, every "it worked on my machine" moment pushed me toward more automation, more infrastructure-as-code, more repeatable systems. Terraform, Helm, FluxCD -- each one was adopted because the previous approach failed.
The GPU rental market is the wild west. Providers come and go. Terms of use can bite you. Availability is unpredictable. The only reliable approach is to own your model hosting, own your storage, and have init scripts that can deploy to any provider with minimal lock-in.
And the UX lesson: don't make users type to an LLM. Give them buttons. Make it playful. The smart buttons feature taught me that the best AI interface is one where the user forgets they're using AI at all.
What's next
I'm working on several fronts:
- Model serving: rclone + VFS mounting Backblaze B2 for model storage, eliminating the Docker image bloat problem entirely
- GitOps completion: FluxCD handles Kubernetes, but Vast.ai GPU deployments still need agent runbooks. I'm building toward cloud-agnostic init scripts that download Go binaries and self-configure
- Face consistency: generating multiple images of the same character with consistent facial features
- Multi-image generation: full character sheets, action poses, different outfits
- The product: chargenai as a real thing people use, not just a learning exercise
This is my first blog post on oatlab. I'll be writing more about agent architecture, Temporal workflow patterns, model serving, and the experience of renting GPUs from strangers on the internet.
If any of this is interesting to you, I'd like to hear from you. Thanks for reading.