I Spent $500 in a Week on AI-Assisted Coding. Here's What I Learned About Not Doing That.
Large contexts, MAX mode, Opus 4.6 Thinking — the most powerful AI coding tools are also the most expensive. After a brutal invoice, I figured out how to get 90% of the performance at 20% of the cost.
The invoice that changed my behavior
I opened my billing dashboard on a Monday morning and stared at the number. Five hundred and twelve dollars. One week. Not a month — a week.
I hadn't done anything unusual. I was building a feature, debugging an integration, refactoring some tests. Normal engineering work. But I'd been doing it with every setting cranked to maximum: Opus 4.6 Thinking in MAX mode, full repository context, long multi-turn conversations that accumulated thousands of tokens per message.
Each individual request felt harmless. A quick "refactor this module" here, a "debug this error with full context" there. But at the token rates for frontier thinking models, "harmless" adds up fast. I was essentially running a small GPU cluster every time I asked a question.
That invoice was the beginning of a very deliberate process to understand where the money was going — and more importantly, where it didn't need to go.
Where the money actually goes
AI coding costs aren't evenly distributed. After tracking my usage for two weeks, the breakdown was clear:
Context size is the multiplier. Every request sends your conversation history, attached files, and system prompts to the model. A fresh conversation with a small question might use 2,000 tokens. A long conversation with 15 files attached and a multi-turn debugging session can easily hit 100,000+ tokens per request. That's a 50x cost difference for a single message.
Thinking models compound the problem. Models like Opus 4.6 Thinking don't just read your input — they generate an internal chain-of-thought before producing the visible response. That reasoning chain can be 3-5x the length of the final answer, and you're paying for every token of it. A response that looks like 500 tokens might have cost 3,000 tokens behind the scenes.
MAX mode is the premium tier. Running a thinking model in MAX mode removes the output cap and gives you the full reasoning depth. It's extraordinarily capable — and extraordinarily expensive. A single complex request in MAX mode can cost more than an entire day of normal usage.
Here's a rough mental model of the cost tiers:
| Configuration | Relative Cost | When It Shines |
|---|---|---|
| Fast model, small context | 1x | Quick questions, simple edits |
| Standard model, medium context | 5-10x | Feature implementation, code review |
| Thinking model, large context | 30-50x | Complex debugging, architecture decisions |
| Thinking model, MAX mode, full repo context | 100-200x | Multi-file refactors, deep analysis |
That bottom row is where my $500 went. I was using the 200x configuration for tasks that a 5x configuration would have handled just as well.
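To make the multipliers in the table concrete, here's a back-of-the-envelope estimator. The numbers in `RATES` are made-up placeholders, not any provider's actual pricing, and `thinking_multiplier` approximates the hidden chain-of-thought as a multiple of the visible output:

```python
# Back-of-the-envelope request cost estimator.
# RATES are hypothetical placeholders, not real provider pricing.
RATES = {  # dollars per 1M tokens: (input, output)
    "fast": (0.25, 1.25),
    "standard": (3.00, 15.00),
    "thinking": (15.00, 75.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 thinking_multiplier: float = 0.0) -> float:
    """Estimate one request's cost. thinking_multiplier models hidden
    chain-of-thought tokens as a multiple of the visible output."""
    in_rate, out_rate = RATES[model]
    hidden = output_tokens * thinking_multiplier  # billed like output
    return (input_tokens * in_rate + (output_tokens + hidden) * out_rate) / 1e6

# A small question vs. a max-context thinking request:
cheap = request_cost("fast", 2_000, 500)
heavy = request_cost("thinking", 100_000, 500, thinking_multiplier=4.0)
print(f"${cheap:.4f} vs ${heavy:.2f}")  # → $0.0011 vs $1.69
```

Plug in your own rates; the exact multiplier will vary, but the shape is the same: large context plus hidden reasoning tokens is what turns a fraction of a cent into real money per request.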
The cost-aware workflow
After the invoice shock, I developed a tiered approach. The core insight: match the model to the task, not the other way around. Using the most powerful model for everything is like taking a helicopter to the grocery store. It works, but you're paying for capabilities you don't need.
Tier 1: Fast model for mechanical tasks
Most of what we do with AI coding assistants is mechanical. Renaming a variable across a file. Generating a type from a JSON sample. Writing a unit test for a pure function. Wrapping a risky call in a try/catch block.
These tasks don't require reasoning. They require pattern matching and code generation — exactly what fast, cheap models excel at. I switched to using the fastest available model for anything that fits this description:
- Boilerplate generation
- Simple refactors (rename, extract function, inline variable)
- Writing tests for straightforward functions
- Generating types, interfaces, or schemas
- Formatting or restructuring code
- Documentation and comments
This alone cut my daily cost by 60%. The output quality for these tasks is virtually identical between a fast model and a frontier thinking model.
Tier 2: Standard model for feature work
When I'm implementing a feature — writing new logic, integrating an API, building a component — I use a standard-tier model without thinking mode. It's smart enough to understand intent, generate idiomatic code, and handle moderate complexity.
The key discipline here is context management. Instead of attaching my entire codebase and asking "build this feature," I attach only the files that are directly relevant:
- The file I'm editing
- The types/interfaces it depends on
- One or two examples of similar patterns in the codebase
Three to five files, not thirty. This keeps the context window small and the cost predictable. It also produces better results — models perform worse with too much irrelevant context, not better.
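The "three to five files" rule can even be encoded as a small guardrail. This is just a sketch; the file paths are hypothetical and the helper isn't part of any real tool:

```python
# Sketch of deliberate context selection: attach only the file being
# edited, its direct dependencies, and a pattern example or two.
# Paths are hypothetical, for illustration only.
def build_context(editing: str, deps: list[str], examples: list[str],
                  max_files: int = 5) -> list[str]:
    files = [editing, *deps, *examples]
    if len(files) > max_files:
        raise ValueError("context too large; trim before sending")
    return files

context = build_context(
    "src/order-service.ts",
    deps=["src/types/order.ts"],
    examples=["src/payment-service.ts"],
)
print(context)  # three files, not thirty
```

The point isn't the helper itself; it's making "how many files am I about to send?" an explicit question instead of a default.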
Tier 3: Thinking model for hard problems
I reserve the expensive thinking models for genuinely hard problems — the ones where I need the model to reason, not just generate:
- Debugging a race condition across multiple services
- Designing the architecture for a new system component
- Understanding a complex error with a deep stack trace
- Reviewing critical code for subtle bugs
- Untangling a gnarly type error in a generic TypeScript function
These are the tasks where thinking models earn their cost. The extended chain-of-thought lets them consider edge cases, weigh trade-offs, and catch issues that standard models miss. But they represent maybe 10-15% of my daily work.
Tier 4: MAX mode — the nuclear option
MAX mode with full context gets used once or twice a week, tops. It's for moments when I'm genuinely stuck and need the model to analyze a large surface area of code with deep reasoning:
- A bug that spans five files and three abstraction layers
- A major refactor where the model needs to understand the entire module to suggest a safe approach
- Reviewing an entire PR for architectural issues
Before I reach for MAX mode, I ask myself: "Have I tried solving this with a cheaper model first?" If the answer is no, I start there. Most of the time, Tier 2 or 3 gets me to the answer.
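The four tiers above amount to a simple routing table: default to the cheap tier, escalate only for known-hard categories. The task labels and tier names here are my own, not any tool's API:

```python
# Illustrative routing table for the tiered workflow described above.
# Task categories and tier names are my own labels, not a real API.
TIER_FOR_TASK = {
    "boilerplate": "fast", "rename": "fast", "simple-tests": "fast",
    "feature": "standard", "api-integration": "standard",
    "cross-service-debug": "thinking", "architecture": "thinking",
    "multi-file-refactor": "max", "full-pr-review": "max",
}

def pick_tier(task: str) -> str:
    # Unknown tasks default to the cheapest tier; escalate deliberately,
    # never reflexively.
    return TIER_FOR_TASK.get(task, "fast")
```

The useful property is the default: anything you haven't consciously classified as hard lands on the cheap tier, which is the opposite of what I was doing before the invoice.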
Practical strategies that actually save money
Beyond the tiered model approach, a few habits made a significant difference:
Start fresh conversations frequently. Long conversations accumulate context. Every new message includes the entire conversation history. By message 20, you're sending a novel-length prompt for every request. I now start a new conversation every time I switch tasks — and sometimes mid-task when the conversation gets long.
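A quick calculation shows why this habit pays off. If each message adds roughly the same number of tokens (1,000 per message is an assumption for illustration), cumulative input grows quadratically with conversation length:

```python
# Why long conversations get expensive: each message re-sends the whole
# history, so total input tokens grow quadratically with message count.
def cumulative_input_tokens(messages: int, tokens_per_message: int) -> int:
    # Message k sends the k-1 previous messages plus its own text.
    return sum(k * tokens_per_message for k in range(1, messages + 1))

print(cumulative_input_tokens(5, 1_000))   # → 15000
print(cumulative_input_tokens(20, 1_000))  # → 210000
```

Four times as many messages costs fourteen times as many input tokens. Splitting that 20-message session into four fresh 5-message conversations would send 60,000 tokens instead of 210,000.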
Be specific in your prompts. "Fix this bug" with 10 files attached is expensive and slow. "The processOrder function in order-service.ts throws a null reference on line 47 when customer.address is undefined — add a guard clause" is cheap and fast. Specificity reduces the work the model needs to do, which reduces tokens, which reduces cost.
Use the AI for planning, then execute yourself. For complex features, I'll use a thinking model once to design the approach — which files to change, what patterns to follow, what edge cases to handle. Then I execute the plan using a fast model (or just my own hands). One expensive planning call replaces ten expensive implementation calls.
Read the code yourself first. This sounds obvious, but it's the habit I lost. When every answer is a prompt away, you stop reading the code. You ask the AI "what does this function do?" instead of spending two minutes reading it. Those two-minute questions, at thinking-model token rates, cost real money. And reading the code yourself builds understanding that no model can substitute.
Leverage cached and indexed context. Many AI coding tools maintain a local index of your codebase. Queries against the index are cheap or free. Use search, symbol lookup, and go-to-definition before attaching files manually. Let the tool find the relevant context instead of dumping everything into the prompt.
The 90/10 rule
After a month of deliberate cost management, my weekly spend dropped from $500 to around $80-100 — an 80% reduction. My productivity didn't noticeably change. If anything, it improved, because I was thinking more carefully about what I was asking and why.
The uncomfortable truth is that most AI-assisted coding doesn't need frontier models. It needs fast, cheap models applied to well-scoped tasks. The frontier models are genuinely transformative for the hard 10% of problems — the ones where you're stuck, confused, or making a decision with significant consequences. Using them for everything is not just expensive; it's a crutch that atrophies your own engineering judgment.
The best AI-assisted workflow I've found is one where I do the thinking about what to build and the AI helps me build it faster. When I reverse that — when I outsource the thinking to the AI and become a prompt jockey — both the quality and the cost go in the wrong direction.
A note on the economics
AI model pricing will continue to drop. What costs $500 today might cost $50 in a year. But the principle will remain: there will always be a hierarchy of model capabilities and costs, and the expensive tier will always be tempting. The discipline of matching the tool to the task — of not reflexively reaching for the most powerful option — is a skill that pays dividends regardless of the price per token.
And if you're expensing this to your company, you have an even stronger reason to be intentional. A team of ten engineers each burning $500/week on AI tools is $260,000 per year. That's a senior engineer's salary. At some point, someone in finance will notice — and you'd rather have a story about deliberate, optimized usage than "we just had everything on MAX mode."
This article is part of a series on AI engineering and developer productivity.