The Manual · Ch 17

Hosting choices and keeping costs down

Local vs cloud, model choice, and how to keep monthly cost from creeping up.


Where the agent runs and what model it uses are the two biggest cost drivers. Get them right and you can run a serious agent for around $200 a month, all in. Get them wrong and you can spend ten times that.

Where it runs

You have two broad options for where the agent lives: a small dedicated machine in your home, or somewhere in the cloud. Each has trade-offs in cost, security, and how much it can do. Don't overthink it. Ask your agent (or whoever's setting yours up): "Based on what I want to do and what I'm willing to spend, what's the right hosting option for me?" You'll get a clear recommendation. The one thing worth knowing: whatever you pick, the cost difference between the cheapest and the most capable hosting option is small compared to the model bill. Pick the option that fits how you'll actually use the thing.

Pick the model

The model is the single biggest lever on output quality. Get the smartest, most agentic model you can. The most opinionated take I will share: just get the best model in the world. End of discussion. Don't get distracted by benchmark debates online. Most of them are marketing. The most "agentic" model, the one that self-corrects, recovers from errors, and finishes tasks instead of bouncing back to you with reasons it couldn't, is the right default for an agent. That has consistently been the top-tier model from the leading lab. Smaller models are fine for narrow tasks (heartbeats, gathering, simple cron jobs). They are not fine as the main brain.

Pay-per-token vs. flat plan

Two ways to pay the model provider:

Pay-per-token API. Pay for every token in and out. Cheap if you use the agent a little. Brutal if you use it a lot.

Flat-rate subscription. A monthly plan that gives you near-unlimited usage, with practical fair-use limits.

For someone using an agent seriously, the flat plan is dramatically cheaper. I pay for the $200 plan, and I asked my agent what the last month would have cost on the API. It said $2,500, more than twelve times as much. If you are using the agent every day for real work, pay the flat rate and stop thinking about it. If you are dabbling, the API is fine. Once you cross the threshold of "I use this every day," switch.
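The comparison above is plain arithmetic, and a rough sketch of it follows. The $200 plan price and the $2,500 API-equivalent figure come from the text. The per-token prices and token counts are placeholder assumptions for illustration, not any provider's real rates.

```python
from dataclasses import dataclass


@dataclass
class ApiPricing:
    """Hypothetical per-million-token prices in USD (assumed, not real rates)."""
    input_per_million: float
    output_per_million: float


def monthly_api_cost(input_tokens: int, output_tokens: int, pricing: ApiPricing) -> float:
    """Estimated pay-per-token bill for one month of usage."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


def cheaper_option(api_cost: float, flat_price: float) -> str:
    """Return 'flat' when the subscription costs less than the API bill, else 'api'."""
    return "flat" if flat_price < api_cost else "api"


FLAT_PLAN_USD = 200.0  # the plan price quoted in the text

# The author's month: a $2,500 API-equivalent bill versus the $200 plan.
print(cheaper_option(2500.0, FLAT_PLAN_USD))

# A light month under the assumed (made-up) prices and token counts.
assumed = ApiPricing(input_per_million=10.0, output_per_million=50.0)
light_month = monthly_api_cost(2_000_000, 200_000, assumed)
print(light_month, cheaper_option(light_month, FLAT_PLAN_USD))
```

Plug in your own provider's prices and a month of real token counts to see which side of the threshold you are on.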

Model routing for cost

Even on a flat plan, you can hit limits if you are a heavy user. The fix is model routing: the agent uses a cheaper model for cheap work and saves the smart model for real reasoning. Tell the agent: "From now on, switch to a simpler model when you're doing simple tasks. Run heartbeats and routine scheduled tasks on the cheap model. Warn me when I've been heavy on usage so I can rest the smart model."

Concrete policy:

Heartbeat: cheap model.
Most scheduled cron jobs: cheap model.
Sub-agent gatherers (read this file, fetch this record): cheap model.
Drafting, planning, decision-making: smart model.
Long-form writing in your voice: smart model.

Save it as a rule. The agent decides per task.
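In practice the agent holds this policy as a plain-language rule, but the logic can be sketched as a lookup table. Everything named here (the task keys, `route`, `should_warn`, the 80% warning threshold, and the choice to send unknown tasks to the smart model) is an illustrative assumption, not part of any agent framework.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    SMART = "smart"


# Task categories from the policy above; the string keys are made-up labels.
ROUTING_POLICY = {
    "heartbeat": Tier.CHEAP,
    "scheduled_job": Tier.CHEAP,   # "most" cron jobs; exceptions need their own key
    "gather": Tier.CHEAP,          # read this file, fetch this record
    "draft": Tier.SMART,
    "plan": Tier.SMART,
    "decide": Tier.SMART,
    "long_form_writing": Tier.SMART,
}


def route(task_kind: str) -> Tier:
    """Pick a tier for a task; unknown kinds go to the smart model (an assumption)."""
    return ROUTING_POLICY.get(task_kind, Tier.SMART)


def should_warn(used: int, limit: int, threshold: float = 0.8) -> bool:
    """True once usage reaches a fraction of the plan limit (threshold is assumed)."""
    return limit > 0 and used / limit >= threshold


print(route("heartbeat").value, route("plan").value)
print(should_warn(used=85, limit=100))
```

Defaulting unknown work to the smart model trades a little cost for safety: a misrouted hard task on the cheap model is the more expensive mistake.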

Token discipline as a habit

A few small habits that compound:

Compress your rules (Chapter 6). Short rules mean cheaper sessions.
Move procedural content to skills (Chapter 7). Skills load on demand.
Use sub-agents for long, context-heavy work (Chapter 11). The main context stays cleaner.
Reset long conversations. When a session has gotten enormous and the topic has shifted, reset and start fresh. The agent reloads its rules; the bloat is gone.
Don't re-explain things. If you find yourself explaining the same thing twice, it should be a rule or a skill.
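The reset habit can be stated as a two-condition check, sketched below. The function name and the 100,000-token budget are assumptions chosen for illustration; the text gives no specific number, so pick one that matches your own plan and model.

```python
def should_reset(session_tokens: int, topic_shifted: bool, token_budget: int = 100_000) -> bool:
    """Suggest a reset only when the session is large AND the topic has changed.

    token_budget is an assumed cutoff for "enormous", not a figure from the text.
    """
    return topic_shifted and session_tokens >= token_budget


# A huge session that is still on the same topic: keep going.
print(should_reset(session_tokens=250_000, topic_shifted=False))

# A huge session where the topic has moved on: start fresh.
print(should_reset(session_tokens=250_000, topic_shifted=True))
```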

Backup model

Sometimes your primary model provider has an outage. Have a backup. A cheap monthly subscription to a competing model is enough. It's not your daily driver; it just keeps you working when the main one is down. Ask your agent to set one up.
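One way to picture the backup is as an ordered fallback chain: try the primary, and only on failure try the next provider. The sketch below uses stand-in functions rather than any real provider client; `ask_with_fallback`, `primary`, and `backup` are hypothetical names, and the outage is simulated.

```python
from typing import Callable, Sequence

# A provider here is any function that takes a prompt and returns text.
# Real client calls would be wrapped in functions of this shape (an assumption).
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def ask_with_fallback(prompt: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first successful reply."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(exc)
    raise AllProvidersFailed(f"{len(errors)} provider(s) failed: {errors}")


def primary(prompt: str) -> str:
    """Stand-in for the main model; always fails to simulate an outage."""
    raise ConnectionError("simulated outage")


def backup(prompt: str) -> str:
    """Stand-in for the backup model."""
    return f"backup reply to: {prompt}"


print(ask_with_fallback("status?", [primary, backup]))
```

The order of the list is the whole policy: the daily driver goes first, and the backup is only consulted when it fails.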

Don't fall for benchmarks

A short rant. The internet is full of "model X beat model Y on benchmark Z." Most of these benchmarks measure things that have very little to do with running an agent in production. What matters for agent work:

Does it self-correct when a tool call fails?
Does it stay on task across a 50-step workflow?
Does it follow rules consistently in long sessions?
Does it know when to ask versus when to act?

There is no public benchmark for any of these. Trust the model that the people building agents actually run in production. Right now, that is one model, and the answer is uninteresting because it is the obvious one. Move on.
