How I Work: Rigor First, Build Second
March 30, 2026
In the past month, I've shipped a personal portfolio site, a geospatial optimization tool, and a hybrid AI retrieval system — solo, from problem brief to deployed product. The ceiling on what one person can build has gotten very high very fast.
AI tools made that possible, but that isn't the whole story. I've spent my career defining problems, designing solutions, aligning people, and measuring whether things work. That's still the most crucial part.
Planning Before Doing
It is natural to want to use a powerful tool immediately. That instinct is how you end up with a working prototype that solves the wrong problem, or a well-intentioned system that fails on edge cases.
My process has five phases, and I don't touch Claude Code until the first four are done:
Phase 1 — Problem Brief A one-page document that answers: who has the problem, what it is, why it matters, what success looks like, and (critically) what I'm explicitly not solving. That last part is where most projects fail. Scope creep isn't an execution problem. It's a planning problem.
Phase 2 — PRD I define the product vision and solution, and prioritize user stories. MVP scope is whatever I need to prove the concept. I map out future milestones that add meaningful value, not just features, and write an out-of-scope list (not necessarily forever, but at least until M3). Finally, I list the open questions that need answers before the first line of code gets written.
Phase 3 — Architecture Alignment Stack recommendation with rationale. System components and how they connect. Key data models. Build vs. buy decisions. And the exercise that does a ton of heavy lifting: defining two-way vs. one-way doors. I force myself to be critical about the system up front so that I build with a plan. This saves a ton of headaches downstream.
Phase 4 — Build Plan I define a sequenced list of Claude Code prompts, ordered by dependency. These are like my project epics. Each task defines what to build, what inputs are needed, what the expected output is, and how to verify it worked (there's a sketch of one such task after the phase list). This is where planning meets execution. It's also an exercise in efficiency: I refine prompts and tasks in Claude so I'm not burning Claude Code tokens going in circles, and I figure out which tasks I can perform on my own so that I'm writing and learning some code while (again) optimizing my token usage.
Phase 5 — Build and Iterate I capture learnings and changes, and define what the next execution sequence looks like. In some cases, that means actively debugging between the tasks defined in the previous phase. In others, it means revisiting my roadmap and re-prioritizing what I'd like to get done next.
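To make Phase 4 concrete, here's roughly the shape one of those build-plan tasks takes. This is an illustrative sketch; the structure, field names, and example task are invented for this post, not lifted from an actual plan:

```python
from dataclasses import dataclass

@dataclass
class BuildTask:
    """One sequenced Claude Code task in a build plan (illustrative)."""
    name: str
    depends_on: list[str]    # tasks that must ship first
    build: str               # what to build
    inputs: list[str]        # what the task needs to start
    expected_output: str     # what "done" produces
    verify: str              # how to confirm it worked

# A hypothetical task, invented for illustration
task = BuildTask(
    name="order-history-upload",
    depends_on=["define-data-models"],
    build="CSV upload endpoint that validates and persists order history",
    inputs=["order history CSV schema", "Supabase table definitions"],
    expected_output="POST /orders/upload stores valid rows and reports rejects",
    verify="Upload a sample file; confirm stored row counts and rejected rows",
)
```

Writing tasks at this level of specificity is what keeps a Claude Code session from wandering: the prompt, the inputs, and the verification step are all decided before any tokens get spent.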
This isn't waterfall. Multiple phases often happen in the same session. The point isn't ceremony; it's that the framework is the discipline that makes everything else work. Fast without rigor isn't fast. It's unfocused output.
What I've Built
victorcaceres.com
The personal portfolio site, the first step on any builder's journey. Built on the Next.js 14 App Router, Tailwind CSS, and next-mdx-remote, deployed on Vercel. No CMS; content lives in typed TypeScript config files and Markdown.
I wanted the site to feel serious but not stiff, analytically sharp but human, and rooted in the physical world. The Playfair Display headlines, dark forest green, and cream background were deliberate choices to achieve that. I'm not a UX designer, but I know what feels right.
NodeVantage
I wanted to start by building something within a familiar domain. I built a tool that optimizes warehouse facility placement using uploaded order history and carrier rate data. The output is an interactive map with cost and transit breakdowns by carrier and service level.
It's built with React + Vite + Tailwind on the frontend and Python + FastAPI on the backend, with Supabase for persistence, deployed on Vercel and Render.
The decision that mattered most on NodeVantage was ensuring that outputs could be optimized for speed, cost, or sustainability. A 10% improvement on a $1M cost baseline is meaningfully different from a 10% improvement on a $100K baseline, so raw cost can't always be the deciding factor. Thinking through the scoring model before writing or committing any code made everything that came after much easier.
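To illustrate the idea (this is a simplified sketch, not NodeVantage's actual scoring code, and the field names and weights are invented), each dimension gets normalized against the baseline so improvements compare in relative terms, then weighted by the objective the user picked:

```python
def score_candidate(candidate, baseline, weights):
    """Score a candidate facility placement against a baseline.
    Lower is better. All field names here are illustrative."""
    # Express each dimension as a ratio to the baseline, so a 10%
    # improvement means the same thing at $100K as it does at $1M.
    cost = candidate["annual_cost"] / baseline["annual_cost"]
    speed = candidate["avg_transit_days"] / baseline["avg_transit_days"]
    co2 = candidate["co2_tonnes"] / baseline["co2_tonnes"]
    return (weights["cost"] * cost
            + weights["speed"] * speed
            + weights["sustainability"] * co2)

# Optimizing for speed: transit time dominates, cost still matters a little.
speed_first = {"cost": 0.2, "speed": 0.6, "sustainability": 0.2}
```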
Pokédex AI: Hybrid RAG + Structured Query System
Just as pasta is a vessel for ingesting sauce, this project exists as a vessel for me to practice AI system design end-to-end. I wanted to see if I could build and evaluate a RAG system from data ingestion through retrieval, generation, and measurement. RAG (Retrieval-Augmented Generation) is a fancy way of saying "AI looks something up before it answers." It doesn't just generate answers from what it was trained on; it pulls relevant information from a knowledge base and uses that to generate a response.
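As a minimal sketch of the pattern (with a toy keyword retriever standing in for a real embedding-based vector search, and a generate() parameter standing in for the model call):

```python
def retrieve(question, passages, top_k=3):
    """Toy retriever: rank passages by word overlap with the question.
    A real system would use embeddings and a vector index."""
    words = set(question.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(words & set(p.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def answer_with_rag(question, passages, generate):
    """Look it up first, then answer from what was found."""
    context = "\n\n".join(retrieve(question, passages))
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return generate(prompt)  # generate() wraps whatever model API you use
```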
Let's get serious — what should you be able to ask a Pokédex?
- Descriptive questions: "Tell me about Jigglypuff." These are answered well by pulling relevant text from a knowledge base and having Claude summarize it. That's RAG. It works fine here.
- Computational questions: "What moves does Exeggutor learn in Red and Blue?" These aren't answered by finding relevant text. They require looking up a specific record in a database and returning an exact answer. If you run these through RAG, the system will guess confidently and get it wrong because it's doing fuzzy text matching against data that requires precise lookup.
A Pokédex needs routing logic. Before answering a question, it needs to know what kind of question it is. If it's descriptive, it goes to the text search pipeline. If it's computational, it goes to the database. If it needs both, it does both.
For example, Claude reads the question, "What moves does Exeggutor learn in Red and Blue?", and returns a structured JSON object that says, "this is a question about Exeggutor, in Red and Blue, asking about moves." That's enough information for the backend to know which database query to run.

The data layer covers 1,025 Pokémon across all 9 current generations, 180 regional variants (which created a ton of edge-case headaches), 833 unique moves, 561,654 move records across every game version group, complete evolution chains, and a type-effectiveness table.
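A simplified sketch of that routing step might look like the following. This isn't the production code; classify(), db.lookup(), and rag.answer() are stand-ins for the Claude call, the database layer, and the text-search pipeline:

```python
import json

def route_question(question, classify, db, rag):
    # classify() is assumed to prompt Claude and return JSON like:
    # {"kind": "computational", "pokemon": "Exeggutor",
    #  "version_group": "red-blue", "topic": "moves"}
    intent = json.loads(classify(question))

    if intent["kind"] == "computational":
        # Exact answer: query the database, don't do fuzzy text matching.
        return db.lookup(intent["pokemon"], intent["topic"],
                         intent.get("version_group"))
    if intent["kind"] == "descriptive":
        # Fuzzy answer: retrieve relevant passages and summarize.
        return rag.answer(question)
    # Hybrid: fetch exact facts, then let the text pipeline use them.
    facts = db.lookup(intent["pokemon"], intent["topic"],
                      intent.get("version_group"))
    return rag.answer(question, extra_context=facts)
```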
Once the Pokédex was built, I needed to know whether it actually worked, not just whether it ran without crashing. I wrote test questions with the correct answers already known and ran the system against them, basically a grading rubric for the AI. The result: 90% of questions answered correctly overall, and 100% correct on the database lookup questions. The 10% that the Python evaluation script flagged as incorrect turned out to be correct on manual review; the script was just being extremely strict about keyword matching.
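The harness itself was conceptually simple. Something like this sketch (the helper names and sample case are illustrative, not the actual script):

```python
def evaluate(system, test_cases):
    """Grade answers against known-correct keywords. Deliberately strict:
    an answer fails if any expected keyword is missing, so some correct
    answers get flagged and need manual review."""
    passed, flagged = 0, []
    for case in test_cases:
        answer = system(case["question"]).lower()
        if all(kw.lower() in answer for kw in case["expected_keywords"]):
            passed += 1
        else:
            flagged.append(case["question"])  # review these by hand
    return passed / len(test_cases), flagged

# Hypothetical test case with the correct answer known in advance
cases = [{"question": "What type is Jigglypuff?",
          "expected_keywords": ["normal", "fairy"]}]
```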
I know that you can't stop at "it works in the demo." You need to measure whether it's right and test the edge cases. Adding measurement is what separates a proof of concept from something that can be put in front of real users. You prove it works, see exactly where it fails, and improve it systematically rather than guessing.
What Building This Way Has Taught Me
The hardest decisions are about data. Data architecture is rarely a two-way door. Getting it right before building is crucial to avoid costly throwaway work. Deciding to go hybrid instead of pure RAG defined the entire system architecture. None of these are decisions to make on the fly. They're product decisions that set the long-term vision.
Scope discipline is how you ship. Scope creep happens because product thinking breaks down. I've caught myself drifting away from my phased plans a few times during these projects; it's easy to drift when AI lets you move this quickly. Keeping a "cool ideas" bank has helped me avoid chasing shiny objects so I can stick with my plans and get things shipped.
Knowing the domain is how you catch bad outcomes. I caught the Pokédex system collapsing generation ranges for move availability and confidently stating that a move was available in generations where it had been cut. Automated testing didn't catch it. I caught it while manually reviewing the test results because I knew the answer was wrong. The person who knows the domain is the person who catches the subtle failures. That knowledge isn't a nice-to-have. It's load-bearing.
AI is a multiplier, not a replacement. AI handles syntax and execution. It doesn't figure out what to build, why it matters, how the pieces connect, and what success looks like. This is human work. Always will be.
The Fine Print
The relevant skill in the AI era is the ability to hold a problem clearly in your head, decompose it into well-specified components, make good decisions under uncertainty, and know what good looks like.
Those are skills I've been building for years across operational and product roles. They transfer directly to building AI systems.
Victor Caceres is a senior technology and operations professional. He writes about what he's building and what he's thinking at victorcaceres.com.