Architecture Articles
21 posts
- Background Jobs For Indie Developers in 2026: When You Need A Queue, When You Do Not, And What I Actually Use (5/16/2026)
  Every job queue tutorial is written for companies running ten thousand jobs a second. As a solo developer you do not need Sidekiq Pro and a Kubernetes cluster to send a welcome email. Here is the actual background job setup that earns its place for indie projects in 2026, and the day a Stripe webhook taught me why setTimeout was never going to be enough.
- Rate Limiting Your SaaS API in 2026: The AI Scraper Problem, Token Buckets, and the Layered Defense That Actually Works (5/16/2026)
  A single AI agent scraped one of my endpoints twenty-three thousand times in a night and turned a $40 OpenAI budget into a $312 invoice before I woke up. Most rate limiting tutorials are written for traffic that pretends to be polite. Here is what actually defending a SaaS API looks like in 2026, with the AI bot wave already through the door.
- Feature Flags For Solo Developers in 2026: When You Need Them, When You Do Not, And What I Actually Use (5/15/2026)
  Every feature flag tool is pitched at companies with a hundred engineers. As a solo developer you do not need a $200 a month LaunchDarkly seat to ship safely. Here is the actual feature flag setup that earns its place for small teams and indie projects in 2026, and the moment you finally outgrow a config file.
- Zero-Downtime Postgres Migrations: The Mistakes That Locked My Production Database (5/15/2026)
  A single ALTER TABLE on a 40 million row table can freeze your app for forty minutes. Most migration tutorials skip the part where the database is also serving live traffic. Here is what shipping schema changes to a real production Postgres in 2026 actually looks like, including the operations I now refuse to run during business hours.
- Server-Sent Events vs WebSockets in 2026: When Each One Actually Wins (5/14/2026)
  WebSockets get reached for by reflex. Half the time the right answer is the boring one nobody talks about: Server-Sent Events. Here is the actual decision framework for real-time features in 2026, and the cost both choices hide from you.
- Stripe Webhooks in Production: Idempotency, Retries, and the Mistakes That Cost Me Real Money (5/14/2026)
  Stripe webhooks look like a five-minute integration in the docs. Then a customer is double-charged, a subscription event arrives out of order, your handler 500s for an hour, and Stripe quietly retries the same event 47 times. Here is what shipping webhooks to real billing flows actually looks like in 2026.
- Passkeys in Production: What I Wish I Knew Before Replacing Passwords (5/8/2026)
  Passkeys look simple in the WebAuthn demo. They get strange the moment you handle a user with two laptops, a stolen phone, a Bitwarden subscription, and a corporate device that blocks iCloud Keychain. Here is what shipping passkeys to real users actually looks like in 2026.
- TypeScript at Scale: Why Your tsc Takes 90 Seconds and How to Fix It (5/8/2026)
  Your build is slow. Your editor lags when you hover a type. CI spends more time type-checking than running tests. None of this is unavoidable. Most of the cost is a small number of patterns that are easy to write and expensive to compile. Here is how to find them and what to do.
- JavaScript Async Lifetimes: The Leak You Have and Probably Do Not Know About (5/7/2026)
  Promise.all does not cancel sibling tasks when one fails. Your async code is likely leaking database connections, keeping fetches alive after unmount, and holding ports open through process exits. ES2026 finally gives you the primitives to fix this without a library. Here is how.
- Embedding Models And Reranking In Production 2026: Picking The Pair That Actually Lifts Retrieval Quality (5/6/2026)
  The embedding model decides what your retriever can find. The reranker decides what makes it to the LLM. By 2026 the production patterns for picking and pairing these two have stabilized, and most teams are still leaving real recall on the table because they treat embeddings as a commodity and skip reranking entirely. Here is what actually works, and what to stop doing.
- RAG Chunking Strategies In Production 2026: What Actually Survives Real Documents And Real Queries (5/6/2026)
  Most RAG systems do not fail at the LLM. They fail at the chunker. By 2026 the patterns for splitting documents into retrievable units have matured into a small set of choices that consistently outperform the default 512-token slicer everybody starts with. Here is what those choices are, where each one breaks, and how to pick the right one without rebuilding the index every Friday.
- AI Guardrails And Output Validation In Production 2026: What Actually Catches Bad Outputs Before Users Do (5/5/2026)
  Most teams discover their guardrails are missing the moment a screenshot of their AI saying something stupid hits the timeline. By 2026 the patterns for catching bad LLM outputs before they ship to users have settled into something concrete: layered validators, fast cheap checks first, expensive ones only when needed, and a clear policy for what to do when validation fails. Here is what that looks like in real systems.
- Small Language Models In Production 2026: Where SLMs Beat Frontier Models, And Where They Quietly Fail (5/5/2026)
  The 8B-parameter model that runs on a single GPU is good enough for more of your pipeline than you think, and worse than you think for the parts you keep wanting to give it. By 2026 the production patterns for using small language models alongside frontier ones have settled into a clear shape: route by task, not by vibe, and stop paying for capabilities you are not using. Here is how that actually plays out.
- Designing Tools For AI Agents In 2026: Schemas, Descriptions, And The Pitfalls That Make LLMs Fail Silently (5/4/2026)
  The bug in most agents is not the model. It is the tools you handed it. Vague descriptions, overlapping responsibilities, and schemas that look fine on paper produce agents that confidently call the wrong function with the wrong arguments. Here is how to design tools the model can actually use, drawn from the production patterns that have stabilized by 2026.
- Multi-Modal AI Agents In Production: Vision, Audio, And The Glue That Actually Works In 2026 (5/4/2026)
  Shipping a multi-modal agent is not the same as adding an image input to your chat. The teams running real vision and audio agents in production by 2026 have discovered the same set of sharp edges: tokenization surprises, latency that explodes on the second modality, evaluation that needs new shapes, and cost curves that look nothing like text. Here is what that actually looks like once it is in front of users.
- AI Agent Reliability Engineering in 2026: SLOs, Error Budgets, And Failure Modes That Actually Matter (5/2/2026)
  Treating an AI agent like a normal service is how you get a 95 percent uptime number that hides a 60 percent task success rate. The teams running real agent products in 2026 measure reliability differently, set SLOs on outcomes instead of HTTP codes, and have rehearsed every failure mode the agent introduces. Here is what that looks like.
- The LLM Router Pattern in 2026: Model Routing, Fallbacks, and Cost Control That Actually Works (5/1/2026)
  Picking one model for your whole app is the bug. The teams shipping the best AI products in 2026 route every request to the cheapest model that can handle it, fail over when providers blink, and treat model selection as part of the app, not part of the prompt. Here is how to do it without making a mess.
- Sandboxing AI-Generated Code: E2B vs Vercel Sandbox vs Modal vs Daytona in 2026 (5/1/2026)
  Letting an LLM write code is the easy part. Letting it run that code on a machine that touches your data is the part that should keep you up at night. Here is how the production sandboxes compare in 2026, and what actually matters when you pick one.
- AI Agent Frameworks in 2026: LangGraph vs Mastra vs Vercel AI SDK vs OpenAI Agents SDK vs Pydantic AI (4/30/2026)
  There are too many agent frameworks and most of the comparisons online are useless. Here is what I have actually shipped on each, where they shine, and where they will quietly cost you a weekend you did not budget for.
- Generative UI in 2026: What Actually Works for Developers (4/30/2026)
  Chat is a terrible interface for most things AI agents do. Generative UI is finally good enough to ship, and the patterns that work are not the ones the demos show. Here is what I have learned shipping AI features that render real components instead of walls of text.
- AI Voice Agents in Production: What Actually Works in 2026 (4/29/2026)
  Voice agents went from "cute demo" to "real product surface" this year. Most of them still feel terrible. Here is what separates the voice AI experiences people actually use from the ones they hang up on, written from the trenches.