Claude Opus vs Sonnet vs Haiku — Smart Model Routing
Route Claude requests to the right model—Opus for reasoning, Sonnet for balance, Haiku for speed—and cut your API costs by 60% while keeping response quality where it matters.
Why Model Routing Matters for AI Agents
Not every Claude request needs Opus. Simple classification, formatting, and retrieval tasks run fine on Haiku at a fraction of the cost: roughly 5% of Opus's input-token price at the rates quoted below. Complex reasoning, code generation, and multi-step planning benefit from Opus's depth. Smart routing means you pay only for the capability you actually need.
For indie developers and founders, this isn't premature optimization—it's the difference between a $500/month API bill and $50/month at scale. Agents that make dozens of API calls per user interaction need intentional model selection from the start.
Claude Opus: Deep Reasoning and Complex Planning
Opus is your heavyweight. It excels at multi-step reasoning, analyzing ambiguous requirements, debugging complex code, and solving novel problems. Use it for tasks that genuinely require step-by-step logic.
Typical routes: architectural decisions, bug analysis across large codebases, writing detailed specifications, evaluating tradeoffs. Response time: 2–5 seconds. Cost: $15 per 1M input tokens.
In an agent system, route to Opus when your task classifier detects low-confidence requests or when previous Sonnet attempts failed to meet quality thresholds.
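The escalation rule above can be sketched as a small pure function. This is a minimal sketch, not a definitive implementation: the `EscalationSignal` shape, the function name, and both thresholds are assumptions made for illustration, and you'd tune the numbers against your own evaluation data.

```typescript
// Hypothetical thresholds; neither value comes from Anthropic or from this article.
const MIN_CLASSIFIER_CONFIDENCE = 0.7;
const MIN_QUALITY_SCORE = 0.8;

interface EscalationSignal {
  // Confidence (0 to 1) reported by whatever task classifier you use.
  classifierConfidence: number;
  // Quality score (0 to 1) of a previous Sonnet attempt, if one was made.
  sonnetQualityScore?: number;
}

// Returns true when the request should be sent to Opus.
function shouldEscalateToOpus(signal: EscalationSignal): boolean {
  const lowConfidence = signal.classifierConfidence < MIN_CLASSIFIER_CONFIDENCE;
  const sonnetFailed =
    signal.sonnetQualityScore !== undefined &&
    signal.sonnetQualityScore < MIN_QUALITY_SCORE;
  return lowConfidence || sonnetFailed;
}
```

Keeping the decision separate from the API call makes it easy to unit test and to adjust thresholds without touching request code.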
Claude Sonnet: The Smart Default
Sonnet sits in the sweet spot: fast enough for real-time chat, smart enough for most technical tasks. It handles code reviews, documentation generation, structured data extraction, and multi-turn conversations without breaking speed budgets.
Typical routes: customer support responses, API schema generation, refactoring suggestions, content moderation. Response time: 500–1000ms. Cost: $3 per 1M input tokens.
Make Sonnet your default. If you're unsure which model to use, start here and move up to Opus only when task complexity actually demands it.
Claude Haiku: Fast Filtering and Triage
Haiku is your lightweight. It's fast (typically <200ms), cheap ($0.80 per 1M input tokens), and surprisingly capable at classification, entity extraction, and simple transformations.
Typical routes: determining if a support ticket needs human escalation, classifying user intent before routing to specialized agents, extracting structured data from unstructured text, content safety checks.
In a multi-step agent workflow, use Haiku for the first decision layer. Only escalate complex reasoning or final content generation to Sonnet or Opus.
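To make the price gap between the three tiers concrete, here's a rough estimate built on the input-token prices quoted in this article. It's a sketch under stated assumptions: output tokens are ignored, the traffic mix is invented for illustration, and `estimateInputCost` is a hypothetical helper, so check current pricing before relying on the numbers.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

// Input-token prices in USD per 1M tokens, as quoted in this article.
// Output-token pricing is ignored, so real bills will be higher.
const INPUT_PRICE_PER_MILLION: Record<Tier, number> = {
  opus: 15,
  sonnet: 3,
  haiku: 0.8,
};

// Cost of `inputTokens` spread across tiers, where `mix` gives the share
// of tokens handled by each tier (shares are expected to sum to 1).
function estimateInputCost(inputTokens: number, mix: Record<Tier, number>): number {
  const tiers: Tier[] = ["opus", "sonnet", "haiku"];
  return tiers.reduce(
    (total, tier) =>
      total + (inputTokens / 1_000_000) * mix[tier] * INPUT_PRICE_PER_MILLION[tier],
    0,
  );
}

// Hypothetical workload: 10M input tokens per month.
const allOpus = estimateInputCost(10_000_000, { opus: 1, sonnet: 0, haiku: 0 });
const routed = estimateInputCost(10_000_000, { opus: 0.1, sonnet: 0.4, haiku: 0.5 });
```

With that made-up mix, 10M input tokens come to about $31 instead of $150 on Opus alone. Your own split will differ, which is why measuring it matters.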
Practical Routing Strategy for Agents
Layer your routing decisions: start with Haiku for lightweight triage, escalate to Sonnet for moderate complexity, reserve Opus for genuinely difficult reasoning. The key is building a confidence score or task classifier that sits before model selection.
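One way to put a classifier in front of model selection is to have the triage model reply with a small JSON object and map that to a tier. The sketch below assumes a reply shape of `{"complexity": ..., "confidence": ...}`; that shape, the `CONFIDENCE_FLOOR` cutoff, and the function names are all illustrative assumptions. It falls back to Sonnet whenever the reply can't be trusted.

```typescript
type Tier = "haiku" | "sonnet" | "opus";
type Complexity = "low" | "medium" | "high";

interface Classification {
  complexity: Complexity;
  confidence: number; // 0 to 1, as reported by the classifier
}

// Hypothetical cutoff: below this, a "low" label is not trusted.
const CONFIDENCE_FLOOR = 0.6;

function isComplexity(value: unknown): value is Complexity {
  return value === "low" || value === "medium" || value === "high";
}

// Parse the classifier's raw reply; returns null unless it is JSON
// of the assumed shape.
function parseClassification(raw: string): Classification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { complexity, confidence } = data as Record<string, unknown>;
  if (!isComplexity(complexity)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  return { complexity, confidence };
}

function selectTier(raw: string): Tier {
  const result = parseClassification(raw);
  if (result === null) return "sonnet"; // unusable reply: use the default tier
  if (result.complexity === "high") return "opus";
  if (result.complexity === "medium") return "sonnet";
  // "low" complexity: stay on Haiku only when the classifier is confident.
  return result.confidence >= CONFIDENCE_FLOOR ? "haiku" : "sonnet";
}
```

Validating the reply before acting on it means a malformed or hedged classification costs you a Sonnet call at worst, rather than a wrong answer from the cheapest tier.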
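Example pattern: Haiku extracts intent and validates input format → Sonnet handles main logic and generation → Opus only runs on edge cases or when Sonnet confidence drops below threshold. This three-tier approach typically reduces Opus usage to 5–10% of total requests while maintaining output quality. The function that follows is a simplified single-pass version: Haiku classifies the request once, then the request goes straight to the tier that matches.
// Assumed helper, not defined in this article: sends the input to a model tier
// with a named prompt template and resolves to the reply text.
declare function claude(input: string, model: 'haiku' | 'sonnet' | 'opus', task: string): Promise<string>;

interface Intent { requiresReasoning: boolean; complexity: 'low' | 'high' }

async function routeMessage(userInput: string): Promise<string> {
  // Haiku triages first. Assumes the classify_intent prompt replies with JSON matching Intent.
  const intent: Intent = JSON.parse(await claude(userInput, 'haiku', 'classify_intent'));
  if (intent.requiresReasoning) {
    return claude(userInput, 'opus', 'detailed_analysis'); // deep reasoning
  }
  if (intent.complexity === 'high') {
    return claude(userInput, 'sonnet', 'standard_response'); // complex but routine
  }
  return claude(userInput, 'haiku', 'simple_response'); // everything else
}
Open-Source Implementation: Pantheon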
The Pantheon repo (github.com/lewisallena17/pantheon) provides a production-ready reference implementation for smart Claude model routing in Next.js. It includes task classification, cost tracking, latency monitoring, and fallback chains.
Fork it to get: TypeScript routing layer, Supabase cost logging, A/B testing framework, and example agent workflows. The starter already integrates model selection with your API routes—extend the classifier logic for your specific use case.
Open-source implementation
Everything in this article runs in pantheon — a production-ready Next.js + Supabase + Claude starter. Clone it, deploy to Vercel, run PM2. The dashboard auto-commits every agent edit and reverts itself if TypeScript breaks.
Get the full starter kit
Start with a Haiku → Sonnet → Opus routing chain, measure where your requests actually land, and you'll cut API costs while keeping quality high—clone Pantheon to get started today.
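If you want a starting point for measuring where requests land, a simple in-memory tally is enough to begin with. This is a minimal sketch with a hypothetical `RoutingStats` class; it isn't taken from Pantheon, and a real deployment would persist the counts somewhere durable.

```typescript
type Tier = "haiku" | "sonnet" | "opus";

// Counts how many requests each tier handled.
class RoutingStats {
  private counts: Record<Tier, number> = { haiku: 0, sonnet: 0, opus: 0 };

  // Call once per request, with the tier that produced the final answer.
  record(tier: Tier): void {
    this.counts[tier] += 1;
  }

  // Fraction of all recorded requests handled by `tier` (0 if none recorded).
  share(tier: Tier): number {
    const total = this.counts.haiku + this.counts.sonnet + this.counts.opus;
    return total === 0 ? 0 : this.counts[tier] / total;
  }
}

const stats = new RoutingStats();
stats.record("haiku");
stats.record("haiku");
stats.record("haiku");
stats.record("sonnet");
const haikuShare = stats.share("haiku"); // 3 of 4 requests
```

If the Opus share sits well above the range you expected, that's a signal to revisit the classifier rather than the models.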