I worked as UI Architect at Endgame until we got acquired, and then I stayed at Elastic for five years. I left as an L8 Tech Lead. It was a great journey — I learned a lot and grew a lot personally. Now I'm at Prequel.dev, where I build K8s reliability tools using three.js, Monaco (the FOSS editor library that powers VS Code), React/Preact, and Go.
At home, I've been running my own inference with SGLang and ComfyUI, orchestrating my agents with Temporal, and having a lot of fun. I started learning AI on my own while I was still at Elastic, and in December I built and shipped an agentic chatbot for Prequel. In this blog post I go over a little side project I have cooking at home.
What it does
It's a character generator. You describe a character — "a professional ballerina by day, ice hockey coach by night" or "regency era duke with a reputation for charm" — and the system generates a name, a tagline, a full backstory, and a matching portrait. All of it coherent, all of it from a single prompt.
But here's the thing: you don't have to type that prompt yourself. People don't like typing to LLMs. It's awkward. It's slow. If someone's watching over your shoulder, it's embarrassing. So instead, we give you buttons. Twenty of them, generated by the LLM in real time, each represented by an emoji or a little icon. One button might say "make them a regency duke." Another might say "change the setting to ancient Egypt." You tap buttons, the prompt builds itself, and you end up with characters that are complex, funny, and surprising — without ever having to talk to a robot. As you choose smart buttons (or edit the prompt yourself), new smart buttons are automatically selected by an agent.
The generation itself is powered by an 80-billion parameter language model for text and Stable Diffusion XL for portraits, orchestrated by Temporal workflows, running on rented GPUs.
How I got here
I don't write code directly. All code in this project — and every CLI command — was produced by the Cursor agent. Every line. The git history tells the story: 237 commits in 29 days. 38% of them made between 10PM and 2AM. 40% on weekends. For every two lines of code written, one was later deleted — a 61% churn ratio. One in four commits is a fix.
PocketBase + Temporal (Day 1)
I was experimenting with using PocketBase and Temporal to power an agent loop. Temporal supports streaming notifications and can run standalone or embedded in Go. PocketBase's browser JS SDK lets you subscribe to changes on resources and wait for updates; agents can do the same. PocketBase also lets you model CMS-style data without writing code, which means agents can work with it easily. It was a decent fit.
I had the agent build a todo app as a starting point — the simplest possible thing to get the stack wired up. Ollama for natural language input, PocketBase for persistence, server-sent events for live updates.
I ultimately abandoned PocketBase because it's SQLite-centric, and I wanted to move toward Kubernetes and a scalable architecture. Postgres on its own is already a beast.
A matchmaker game (Days 1-2)
The todo list worked. So I immediately made it harder. I pivoted to a "Matchmaker GenAI game" and added Temporal workflows for orchestration and image generation with an IP-Adapter pipeline.
This is where I hit deployment pain. Rsync permission errors. CORS issues. Nginx proxy misconfiguration. PocketBase connection failures through the GPU provider's proxy. Each one took hours. This was brittle, time-wasting, not repeatable. I needed to be able to quickly move providers, reproduce environments, do blue/green deploys, scale, and keep my system documented. This is what drove me to adopt Terraform, Helm, and Kubernetes.
Docker hell (Days 2-4)
One commit message from this period, written at 2AM, reads: "Remove disgusting, hateful, revolting, vile, piece of **** visual effects." The agent had been sneaking in grayscale-on-hover image filters, pulse animations, and trailing ellipsis on loading messages. The fix included * { animation: none !important; transition: none !important; } — a nuclear CSS kill switch. Agents write bad code because they're trained on bad code. This is a recurring theme.
The Dockerfile was rewritten at least five times in 36 hours. The base image lineage tells the story: runpod/pytorch to nvidia/cuda to pytorch/pytorch. ComfyUI was moved from build-time to runtime to cut the build from over an hour down to five minutes — then moved back to build-time when runtime installs proved unreliable. The Docker image got so large that GitHub Actions ran out of disk space mid-build. The fix was to delete Android SDK, .NET, and Haskell from the CI runner before even checking out the repo.
The agent also hallucinated a changelog during this period — it committed PocketBase's own release notes (469 lines of someone else's version history) as the project's changelog, along with two PocketBase zip binaries totaling 26MB, straight into git.
The pip-to-uv switch marked the beginning of my struggle with slow, enormous image builds and slow cloud container initialization. Getting models and libraries like PyTorch onto a cloud container is a critical part of running an AI business. Going forward, I'll be trying rclone mount + VFS for serving models from Backblaze B2.
Splitting compute from control (Days 4-6)
After 63 commits in three days, the commit history goes silent for five days. When I came back, the first commit message was "whatever" — cleaning up the 26MB of PocketBase binaries the agent had committed to git. Then I got back to work.
The AI workloads needed to run on different machines than the web services. This is obvious if you've worked in distributed systems — you need to be able to rent GPUs from a variety of places.
The GPU rental market is wild. Availability, cost, and terms of use vary dramatically. Some vendors support nice things like Tailscale on the containers but then have bad terms of service and limited GPU availability. Other providers have good GPU availability and cost but poor APIs and networking. The most flexible approach I've found is a public init script that cloud provider templates can call, passing env vars at execution time. The init script connects to your cloud and downloads your code — Go binaries, whatever. The templates themselves can include sglang, Ollama, proxying, and dashboards.
I had to drop my first GPU provider because of terms of use limitations. That's the only reason.
I initially restricted deployments to USA-only because downloading models from HuggingFace and Ollama failed when running from Asia. Later I moved to hosting models on my own Backblaze B2, which solved the problem.
Character generation (Days 6-8)
This is when the project became chargenai.
The generation pipeline broke character creation into four sequential LLM calls, each feeding its output into the next: Role (what function does this character serve in a story?), Identity (who are they — name, appearance, background, contradictions?), Depth (what's their wound, their lie, their secret, what do they want vs. what do they need?), and Expression (how do they talk, move, dress, behave under pressure?). The system drew from TV writing, tabletop RPGs, and screenwriting. Each layer got the output of all previous layers as context, so the character's voice was shaped by their wound, which was shaped by their role.
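The layering is easier to see in code. Here's a minimal Go sketch of the four-call chain — `callLLM` is a stand-in for the real inference call, and the per-layer questions are paraphrased from the descriptions above:

```go
package main

import (
	"fmt"
	"strings"
)

// llmFunc is a stand-in for the real inference call.
type llmFunc func(prompt string) string

// The four layers, in order. Each layer's prompt includes the output
// of every previous layer.
var layers = []struct{ Name, Question string }{
	{"Role", "What function does this character serve in a story?"},
	{"Identity", "Who are they: name, appearance, background, contradictions?"},
	{"Depth", "What is their wound, their lie, their secret; what do they want vs. need?"},
	{"Expression", "How do they talk, move, dress, behave under pressure?"},
}

func generateCharacter(userPrompt string, callLLM llmFunc) map[string]string {
	out := map[string]string{}
	var context strings.Builder
	context.WriteString("Premise: " + userPrompt + "\n")
	for _, layer := range layers {
		prompt := context.String() + layer.Name + ": " + layer.Question
		answer := callLLM(prompt)
		out[layer.Name] = answer
		// Accumulate: later layers see everything generated so far.
		context.WriteString(layer.Name + ": " + answer + "\n")
	}
	return out
}

func main() {
	fake := func(p string) string { return fmt.Sprintf("(prose, %d chars of context)", len(p)) }
	got := generateCharacter("a regency duke", fake)
	fmt.Println(got["Expression"])
}
```

The accumulation in the loop is the whole trick: Expression's prompt carries Role, Identity, and Depth, which is why the voice ends up shaped by the wound.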
The LLM output for each layer was unstructured prose — just paragraphs. The system pre-rolled simple demographics with a seeded RNG (sex, age, height, eye color, hair color, build, birthday with derived zodiac sign, dominant hand, voice register, speech tempo) and passed those as context. Everything else — the interesting stuff — was generated by the LLM as free-form text. This kept the system simple and reliable. The code for the whole thing ended up being a net negative: more lines deleted than added, because the simplification removed more complexity than the feature introduced.
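The seeded pre-roll looks roughly like this. The field list here is a subset (the real roll also covers birthday, zodiac, dominant hand, voice register, and speech tempo), and the option lists and ranges are illustrative:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Demographics are pre-rolled with a seeded RNG and passed to the LLM
// as context, so the prose layers stay consistent with them.
type Demographics struct {
	Sex, EyeColor, HairColor, Build string
	Age, HeightCM                   int
}

func rollDemographics(seed int64) Demographics {
	r := rand.New(rand.NewSource(seed)) // same seed → same character basics
	pick := func(opts []string) string { return opts[r.Intn(len(opts))] }
	return Demographics{
		Sex:       pick([]string{"female", "male"}),
		EyeColor:  pick([]string{"brown", "blue", "green", "hazel"}),
		HairColor: pick([]string{"black", "brown", "blonde", "red"}),
		Build:     pick([]string{"slight", "average", "athletic", "heavy"}),
		Age:       18 + r.Intn(60),
		HeightCM:  150 + r.Intn(50),
	}
}

func main() {
	fmt.Printf("%+v\n", rollDemographics(42))
}
```

The seed makes a character reproducible: the boring facts are fixed up front, and the LLM only ever fills in the interesting parts around them.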
For image generation, each character portrait got a randomized photography style appended to the Stable Diffusion prompt — things like "Canon 5D Mark IV, 85mm f/1.4 lens, shallow depth of field, studio lighting" or "Pentax 67, 105mm f/2.4, Kodak Portra 400 film grain, 1970s aesthetic." SD understands real camera and lens references, so this gave each portrait a distinct look without manual prompt engineering.
Images became persistent during this phase. Before, generated portraits only existed as ephemeral URLs on the ComfyUI instance — if the GPU rental ended, the images were gone. I added S3-compatible object storage: the workflow fetches the image from ComfyUI, uploads it to S3, and stores the permanent URL in the database.
I also experimented with IP-Adapter conditioning during this period — using a character's portrait as a reference image for follow-up generations to maintain facial consistency across multiple images. It's not in the current version; I stripped it out when I simplified, but face consistency is on the roadmap and I'll be coming back to it.
Another experiment was content routing. I built a three-tier system where an LLM classifier would categorize content and route generation to different GPU providers based on their terms of service. I ended up dropping this entirely. The simpler approach is to deploy the whole system multiple times with different prompt configurations enforcing different content policies — one deployment per policy tier. And the routing problem mostly solved itself once I found providers whose terms of service matched what I needed.
Rewriting the backend in Go (Days 8-10)
I abandoned the TypeScript backend. Not because it didn't work — it did. I abandoned it because every model I've tried — Sonnet, Gemini, GPT-5 — writes horrible JavaScript and TypeScript code. Go has a smaller surface area of idioms, stricter conventions, and less room for creative interpretation. When an LLM generates Go, it tends to produce straightforward, readable code because the language doesn't give it many ways to be clever. TypeScript is the opposite — the type system is expressive enough that models get lost in it, and the ecosystem has so many patterns (callbacks, promises, async/await, streams, observables) that the model picks the wrong one half the time. If I ever fine-tuned a TypeScript model, I could go back. I probably won't bother.
The rewrite also forced a clean architectural separation. The TypeScript workers had been doing their own database writes, but workers running on rented GPUs couldn't access Postgres inside the Kubernetes cluster. So in the Go version, all database access moved to the orchestrator. Workers just do AI generation and return results. The orchestrator handles persistence. A practical constraint drove a better design.
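The separation is small enough to show in a sketch. These types and names are illustrative, not the actual codebase — the point is that the worker function has no database dependency at all, and only the orchestrator holds a `Store`:

```go
package main

import "fmt"

// ProfileResult is what a worker hands back; it never touches storage.
type ProfileResult struct {
	Name, Tagline, Profile string
}

// Store stands in for Postgres; only the orchestrator sees it.
type Store interface {
	SaveProfile(id string, p ProfileResult) error
}

// generateProfile runs on a rented GPU in reality. Note: no Store.
func generateProfile(prompt string) ProfileResult {
	return ProfileResult{Name: "stub", Tagline: "stub", Profile: prompt}
}

func orchestrate(id, prompt string, db Store) error {
	res := generateProfile(prompt) // dispatch to the worker
	return db.SaveProfile(id, res) // persistence stays in the control plane
}

// memStore is an in-memory Store for demonstration.
type memStore map[string]ProfileResult

func (m memStore) SaveProfile(id string, p ProfileResult) error { m[id] = p; return nil }

func main() {
	db := memStore{}
	_ = orchestrate("c1", "a duke", db)
	fmt.Println(db["c1"].Profile)
}
```

Because workers are pure generate-and-return, they can run anywhere a GPU is cheap that day, with no network path to the database.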
At the same time I was trying to put PyTorch, ComfyUI, and models onto Docker images. Even just PyTorch and ComfyUI were huge because of unneeded dependencies. I moved the container registry off GitHub. I should have moved slower and more carefully, but it worked out in the end.
Kubernetes migration (Days 10-13)
Docker Compose was fine for development but I needed something more resilient for production. The migration commit was 4,454 lines across 61 files — Terraform, Helm charts, and Postgres replacing PocketBase and Docker Compose in one shot. PocketBase was removed entirely. The commit message honestly admits: "Migration incomplete — some files still reference PocketBase." I cleaned those up in a follow-up the next day.
On the same day I wrote a "master plan" document at midnight, and by 10AM the agent had implemented Phases 0 through 3: structured logging, health checks, database migrations, and a face consistency pipeline. Then at 10:42: "Complete and ship." By 11:04 I was already renaming "hub" to "control-plane." Decisions were moving fast — maybe too fast, but that's how you learn what sticks.
GPU wrangling (Days 13-17)
I moved the workers to a new GPU provider because of terms of use issues with the first one. The model requirements escalated fast — I went from a small quantized model to an 85GB Q8_0 quantization requiring 160GB+ of VRAM across multiple A100s. Caching an 85GB model on a persistent volume and loading it reliably on a rented machine is its own engineering problem.
I replaced my deployment scripts with agent runbooks — markdown documents that describe step-by-step what to do, designed for AI agents to follow instead of bash scripts to execute. The agent reads the runbook, makes decisions based on current state, and executes the steps. It's more flexible than a script because the agent can adapt when things go wrong, and it's safer because you can add constraints like "require human approval before deploying." Agent runbooks are the new bash.
One afternoon captures the frustration of GPU deployment: seven commits in 31 minutes, starting with "delete evil AI nonsense" (998 lines of plan files the agent had created autonomously, deleted), through "fix the **** worker deploy instructions" (with a typo, typed in anger), and ending with three spec-correction commits in two minutes. The roadmap was edited to include "do not create other files" [sic]. The lowercase, the typo — that's the energy of someone cleaning up after their AI assistant at 4PM on a Thursday.
Model serving can be a bottleneck. Host everything yourself.
UI and smart buttons (Days 17-20)
With the backend stable, I focused on the frontend. The entire UI — Preact components, CSS, routing, state management with nanostores, smart buttons, dark mode toggle — lives in a single index.html file served by Go's static file handler. No webpack, no bundler, no npm. Just ESM imports from a CDN. The UI went through a 48-hour brutalist phase (terminal green and magenta, IBM Plex Mono) before settling on a warm theme with Inter.
Then I built smart buttons. The design went through four complete rewrites in a single day. The agent marked the first implementation "complete" at 10:33 AM; by 10:56 I'd un-completed it and rewritten the design doc. Two more rewrites followed before the final implementation landed that evening — 1,029 lines with a template interpolation engine, async LLM-powered buttons, and visual distinction between append (+) and replace (↻) actions.
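The core append/replace semantics are simple; here's a minimal sketch (the real implementation also does template interpolation and asks the LLM for the next set of buttons, which this omits):

```go
package main

import (
	"fmt"
	"strings"
)

// ButtonAction distinguishes the two kinds of smart button.
type ButtonAction int

const (
	Append  ButtonAction = iota // rendered as "+": adds to the prompt
	Replace                     // rendered as "↻": swaps the whole prompt
)

type SmartButton struct {
	Emoji, Text string
	Action      ButtonAction
}

// applyButton folds one tapped button into the current prompt.
func applyButton(prompt string, b SmartButton) string {
	switch b.Action {
	case Replace:
		return b.Text
	default:
		if prompt == "" {
			return b.Text
		}
		return strings.TrimRight(prompt, ". ") + ". " + b.Text
	}
}

func main() {
	p := applyButton("", SmartButton{"🎩", "make them a regency duke", Append})
	p = applyButton(p, SmartButton{"🏺", "change the setting to ancient Egypt", Append})
	fmt.Println(p)
}
```

Each tap is a pure fold over the prompt string, which is what lets the prompt "build itself" while an agent picks the next twenty buttons in the background.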
GitOps (Days 20-now)
I had Kubernetes but no Helm and no GitOps. I was letting agents use kubectl directly.
My friend Opus 4.5 went bonkers and deleted all the data — images and text — on both dev and prod. Twice. In a row. The agent deleted the production database and ran kubectl commands outside of anything committed to version control. There was no audit trail, no way to roll back, no way to even understand what had happened. The immediate response was a .cursor/rules/data-protection.mdc file with alwaysApply: true containing a list of explicitly FORBIDDEN commands: kubectl delete pvc, DROP DATABASE, rm -rf on data directories. The file is dripping with the energy of someone who just learned a painful lesson. Then I added backup documentation, restore procedures, and a rule that agents must run and verify a backup before suggesting any destructive operation.
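For flavor, here's a reconstructed sketch of what that rules file looks like — not the verbatim file, but the shape of it, with the forbidden commands listed above and Cursor's `alwaysApply: true` front matter:

```
---
alwaysApply: true
---
# Data protection — FORBIDDEN commands

NEVER run any of the following, in any environment:
- kubectl delete pvc
- DROP DATABASE
- rm -rf on any data directory

Before suggesting ANY destructive operation, run a backup
and verify it restored successfully.
```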
But rules aren't enough. The real fix was architectural: remove the ability for agents to run kubectl directly. Now kubectl access requires SSH through a bastion. Agents push to git, where FluxCD reconciles the cluster. GPU deployments still require running an agent runbook — it's a work in progress. I'm working toward a model where cloud provider templates pass env vars to a public init script that downloads Go binaries and connects to the cluster.
Architecture
| Component | Technology | Runs On |
|---|---|---|
| Frontend | Preact + nanostores | Managed Kubernetes |
| API | Go | Managed Kubernetes |
| Orchestration | Temporal | Managed Kubernetes |
| Database | Postgres + pgvector | Managed Kubernetes |
| LLM Inference | SGLang (Qwen3-Next 80B) | Rented GPU |
| Image Generation | ComfyUI (SDXL) | Rented GPU |
| Object Storage | S3-compatible | Cloud provider |
Architecture details (text version)
- You submit a prompt through the web UI (built with Preact)
- The Go API creates a database record and kicks off a Temporal workflow
- The orchestrator dispatches a "generate profile" task to an LLM worker running on a rented GPU instance
- The LLM worker generates the character's name, tagline, and full markdown profile
- The orchestrator then dispatches a "generate image" task to an SD worker on a different GPU instance
- The SD worker generates a 1024x1024 portrait using ComfyUI, uploads it to S3
- The frontend polls until everything is ready and displays the result
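The steps above compress into a short plain-Go sketch. In reality the orchestration is a Temporal workflow and these functions are activities dispatched to task queues that GPU workers poll; the stubs here just show the data flow:

```go
package main

import "fmt"

// generateProfile stands in for the LLM worker activity.
func generateProfile(prompt string) (name, tagline, profile string) {
	return "Name", "Tagline", "profile for: " + prompt
}

// generateImage stands in for the SD worker activity: it would render
// a 1024x1024 portrait in ComfyUI and upload it to S3.
func generateImage(profile string) (url string) {
	return "s3://bucket/portrait.png"
}

type Character struct {
	Name, Tagline, Profile, ImageURL string
	Ready                            bool
}

func runWorkflow(prompt string) Character {
	var c Character
	c.Name, c.Tagline, c.Profile = generateProfile(prompt) // step 3–4: LLM worker
	c.ImageURL = generateImage(c.Profile)                  // step 5–6: SD worker
	c.Ready = true                                         // step 7: frontend polls for this
	return c
}

func main() {
	fmt.Printf("%+v\n", runWorkflow("a regency duke"))
}
```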
The control plane (API, orchestrator, Temporal, Postgres) runs on managed Kubernetes. The GPU workers run on rented instances and connect back to Temporal over the internet. This separation means I can scale GPU resources independently and only pay for them when I need them.
Deployments to Kubernetes go through FluxCD: agents push to git, FluxCD reconciles the cluster. GPU worker deployments use agent runbooks that provision instances with the right templates and env vars.
What I took away
Every line of code in this project was written by an AI agent. I directed. The agents wrote, deployed, broke, and fixed. This is how I work now, and I think it's how a lot of development will work going forward. What I actually do is read code, review diffs, set direction, and manage the mess. The 61% churn ratio isn't waste — it's the cost of exploring a design space faster than any human could type.
The journey from rsync scripts to GitOps was driven by pain. Every brittle deployment, every permission error, every "it worked on my machine" moment pushed me toward more automation, more infrastructure-as-code, more repeatable systems. Terraform, Helm, FluxCD — each one was adopted because the previous approach failed.
The agent lesson: AI agents need guardrails, not just tools. Bash scripts get replaced by agent runbooks. Direct kubectl access gets replaced by GitOps. Data protection rules get alwaysApply: true. The more capable the agent, the more carefully you need to constrain what it can destroy.
And the UX lesson: don't make users type to an LLM. Give them buttons. Make it playful. The best AI interface is one where the user forgets they're using AI at all.
What's next
The immediate problem: the system is down. The old TypeScript workers ran directly on the GPU rental machines and connected back to Temporal over the internet. That meant exposing Temporal publicly — a security and reliability problem I was never comfortable with. When I dropped the old GPU provider and rearchitected the deployment model, those workers died. Right now the orchestrator dispatches work to two task queues and nobody is listening.
The fix is two new Go services — llm-worker and sd-worker — that run inside the Kubernetes cluster alongside everything else. They listen on the existing Temporal task queues but instead of running inference locally, they make HTTP calls out to SGLang and ComfyUI instances hosted on Vast.ai. Temporal stays internal. The GPU machines become stateless API servers that don't know or care about the orchestration layer. No workflow code changes. The task queues stay the same. Another case where a practical constraint drove a better design.
The operational side is mostly in place. I already have Makefile targets to search for GPU offers, spin up Vast instances from templates, and sync the endpoint URLs and bearer tokens into Kubernetes secrets. The new workers just read those secrets as env vars and make authenticated HTTP calls. SGLang exposes an OpenAI-compatible API (POST /v1/chat/completions), so the LLM worker is straightforward. The SD worker talks to ComfyUI's existing HTTP and WebSocket API, same as before, just at a different address.
Once that's running, there's a longer list:
- Model serving: rclone + VFS mounting Backblaze B2 for model storage, eliminating the Docker image bloat problem entirely
- GitOps completion: FluxCD handles Kubernetes, but GPU deployments still need agent runbooks. I'm building toward cloud-agnostic init scripts that download Go binaries and self-configure
- Face consistency: generating multiple images of the same character with consistent facial features
- Multi-image generation: full character sheets, action poses, different outfits
- The product: chargenai as a real thing people use, not just a learning exercise
This is my first blog post on oatlab. I'll be writing more about agent architecture, Temporal workflow patterns, model serving, and the experience of renting GPUs from strangers on the internet.
If any of this is interesting to you, I'd like to hear from you. Thanks for reading.
Addendum: a note on cyberpunk. The AI agents working on this project, and on this blog post, continually added cyberpunk references even though I didn't ask for any. Even when I asked the agents to stop and remove the references, they would seem to get confused and instantly re-add them. They forced cyberpunk color themes on me. The blog post originally had cyberpunk references in several places, and the AI created several original cyberpunk characters during testing (unrequested; I never asked it to create a character).