Claude API Tier 1 vs Tier 2 Rate Limits
If you're building AI agents with Claude and hitting rate limit errors in production, you need to understand the hard RPM and TPM boundaries between Tier 1 and Tier 2—and how to architect around them before they break your system.
Tier 1 vs Tier 2: The Numbers That Matter
Limits are set per model, and Anthropic revises them over time, so treat the Limits page in the Anthropic Console as the source of truth for your account. The working figures in this article are 10,000 TPM and 500 RPM for Tier 1, the entry tier, and 50,000 TPM and 5,000 RPM for Tier 2. For indie builders, that's the difference between running one or two concurrent agent threads and running 10+ in parallel.
TPM (tokens per minute) is your real bottleneck. A claude-3-5-sonnet call that consumes about 2,000 tokens (prompt plus completion) uses 2,000 of your per-minute token budget. At Tier 1, that caps you at 5 such requests per minute. At Tier 2, the cap is 25. RPM is rarely your limiting factor unless you're making many small, fast requests.
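The arithmetic above can be sketched as a tiny helper. This is an illustration using the article's working figures, not values read from the API. The function names are hypothetical, and the 20-second average latency is an assumption. It uses Little's law (requests in flight ≈ requests per minute × latency) to show why a budget of 5 requests per minute works out to roughly one or two concurrent agent threads.

```typescript
// Sketch: how many requests per minute fit under BOTH limits, given an
// average token cost per request (prompt + completion tokens).
function maxRequestsPerMinute(
  tpmLimit: number,
  rpmLimit: number,
  avgTokensPerRequest: number,
): number {
  const byTokens = Math.floor(tpmLimit / avgTokensPerRequest);
  return Math.min(byTokens, rpmLimit);
}

// Little's law: average requests in flight = arrival rate x time in system.
// avgLatencySeconds is an ASSUMED figure, not something the API reports.
function averageInFlight(requestsPerMinute: number, avgLatencySeconds: number): number {
  return (requestsPerMinute * avgLatencySeconds) / 60;
}

// Working figures from this article; check the Console for your account's values.
const tier1 = maxRequestsPerMinute(10_000, 500, 2_000); // 5
const tier2 = maxRequestsPerMinute(50_000, 5_000, 2_000); // 25
const tinyCalls = maxRequestsPerMinute(10_000, 500, 10); // 500: RPM is the binding limit
console.log(tier1, tier2, tinyCalls, averageInFlight(tier1, 20).toFixed(2));
```

The `tinyCalls` case shows the exception mentioned above: with very small requests, the RPM cap binds before the token budget does.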
How to Reach Tier 2
Anthropic moves accounts between usage tiers automatically based on cumulative credit purchases. Once your account crosses the Tier 2 purchase threshold, you're moved up without a manual review. The Anthropic Console shows your current tier and what the next one requires. If you need more than the published tiers offer, contact Anthropic's sales team rather than general support.
In practice, prove out your workload with a small batch of production queries on Tier 1. Then buy enough credits to cross the threshold before launch, so the extra headroom is in place before your agents need it.
Rate Limit Handling in Next.js + TypeScript
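The right approach is exponential backoff combined with request queuing. Don't retry a 429 immediately. Wait at least as long as the `retry-after` header says (60 seconds is a safe starting point), then double the wait on each further failure. For agents making sequential decisions, those waits destroy latency, so don't rely on retries alone. Batch independent calls and queue them so you stay under the limit in the first place.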
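The backoff half of this advice can be sketched as follows. The sketch assumes your client surfaces 429s as errors with a numeric `status` and, optionally, a parsed `retryAfterSeconds` field. That error shape and the helper names are hypothetical, so adapt `isRateLimited()` to whatever your client actually throws. The official TypeScript SDK already retries some failures, including 429s, a limited number of times (its `maxRetries` client option). A wrapper like this is for waits longer than those built-in retries cover.

```typescript
// Sketch: retry a call on HTTP 429 with exponential backoff.
// ASSUMPTION: rate-limit errors expose a numeric `status` and, optionally,
// `retryAfterSeconds` parsed from the retry-after header. Adapt to your client.
type RateLimitError = { status: number; retryAfterSeconds?: number };

const BASE_DELAY_MS = 60_000; // first wait after a 429
const MAX_ATTEMPTS = 4; // total tries, including the first call

function isRateLimited(err: unknown): err is RateLimitError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

// Wait before retry `attempt` (0-based): double the base delay each time,
// but never wait less than the server asked for.
function backoffDelayMs(attempt: number, retryAfterSeconds = 0): number {
  return Math.max(BASE_DELAY_MS * 2 ** attempt, retryAfterSeconds * 1000);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

async function withBackoff<T>(fn: () => Promise<T>): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Rethrow anything that isn't a 429, or a 429 once attempts run out.
      if (!isRateLimited(err) || attempt + 1 >= MAX_ATTEMPTS) throw err;
      await sleep(backoffDelayMs(attempt, err.retryAfterSeconds));
    }
  }
}
```

Taking the larger of the exponential delay and `retry-after` keeps the doubling schedule while never retrying earlier than the server allows.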
Here's a minimal queue pattern that caps concurrency at a Tier 1-friendly level:
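```typescript
import Anthropic from "@anthropic-ai/sdk";

// Shared client; route every API call through enqueue() below, e.g.
// const msg = await enqueue(() => client.messages.create({ ... }));
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// A queued job: the work to run plus the callbacks that settle enqueue()'s promise.
type Job = {
  fn: () => Promise<unknown>;
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
};

const queue: Job[] = [];
let activeRequests = 0;
const MAX_CONCURRENT = 4; // Conservative for Tier 1

// Queue a job and get a promise for its result.
function enqueue<T>(fn: () => Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    queue.push({ fn, resolve: resolve as (value: unknown) => void, reject });
    processQueue();
  });
}

// Start queued jobs until MAX_CONCURRENT are in flight. Jobs are started
// without awaiting; each one frees its slot and refills the pool when done.
function processQueue(): void {
  while (queue.length > 0 && activeRequests < MAX_CONCURRENT) {
    const { fn, resolve, reject } = queue.shift()!;
    activeRequests++;
    Promise.resolve()
      .then(fn) // also turns a synchronous throw in fn into a rejection
      .then(resolve, reject)
      .finally(() => {
        activeRequests--;
        processQueue();
      });
  }
}
```
Monitoring Rate Limits in Production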
Log the `usage` object returned by every Claude API call and track `input_tokens + output_tokens` per request. In Supabase, store these in a `rate_limit_log` table with timestamp, tokens, and request_id columns. Then alert (for example, from a scheduled Supabase Edge Function or cron job) when the tokens logged in any one-minute window exceed 80% of your TPM limit.
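Here is one way to shape that logging, as a sketch. The row fields mirror the `rate_limit_log` columns named above, while `toLogRow`, `tokensInLastMinute`, and `shouldAlert` are hypothetical helpers. A sliding one-minute window is used because TPM is a per-minute limit, and the 80% threshold is the default parameter.

```typescript
// Sketch: build rate_limit_log rows from a response's `usage` block and check
// a sliding one-minute window against a share of the TPM limit.
// Helper names are hypothetical; the row fields mirror the table described above.
type Usage = { input_tokens: number; output_tokens: number };

type RateLimitLogRow = {
  timestamp: string; // ISO 8601
  tokens: number;
  request_id: string;
};

function toLogRow(requestId: string, usage: Usage, at: Date = new Date()): RateLimitLogRow {
  return {
    timestamp: at.toISOString(),
    tokens: usage.input_tokens + usage.output_tokens,
    request_id: requestId,
  };
}

// Total tokens logged in the 60 seconds ending at `now`.
function tokensInLastMinute(rows: readonly RateLimitLogRow[], now: Date): number {
  const end = now.getTime();
  const start = end - 60_000;
  return rows
    .filter((row) => {
      const t = Date.parse(row.timestamp);
      return t > start && t <= end;
    })
    .reduce((sum, row) => sum + row.tokens, 0);
}

// True when the last minute used more than `threshold` of the TPM limit.
function shouldAlert(
  rows: readonly RateLimitLogRow[],
  now: Date,
  tpmLimit: number,
  threshold = 0.8,
): boolean {
  return tokensInLastMinute(rows, now) > threshold * tpmLimit;
}
```

Each row can then be written with supabase-js, for example `supabase.from("rate_limit_log").insert(row)`, using the response's message `id` as `request_id`. Running `shouldAlert` over recent rows from a scheduled job is one way to drive the alert.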
Use the response headers too: `anthropic-ratelimit-requests-remaining` and `anthropic-ratelimit-tokens-remaining` tell you exactly where you stand, and the matching `-reset` headers tell you when each budget refills. Check them before retrying. If you're close to zero, pause your agent entirely rather than burning retries.
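A sketch of the pause check follows. It reads `anthropic-ratelimit-requests-remaining`, `anthropic-ratelimit-tokens-remaining`, and `anthropic-ratelimit-tokens-reset` (an RFC 3339 timestamp) from any object with a `get(name)` method, such as a fetch `Headers` instance. With the TypeScript SDK, calling `.withResponse()` on a request gives you the raw response and its headers. The thresholds (2 requests, 2,000 tokens) and the helper names are hypothetical defaults to tune.

```typescript
// Sketch: decide whether to pause based on Anthropic's rate-limit headers.
// Works with anything exposing get(name) -> string | null (e.g. fetch Headers).
// Thresholds and helper names are hypothetical; tune them for your workload.
interface HeaderLike {
  get(name: string): string | null;
}

type Headroom = { requestsRemaining: number | null; tokensRemaining: number | null };

function readHeadroom(headers: HeaderLike): Headroom {
  const num = (name: string): number | null => {
    const raw = headers.get(name);
    if (raw === null) return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  return {
    requestsRemaining: num("anthropic-ratelimit-requests-remaining"),
    tokensRemaining: num("anthropic-ratelimit-tokens-remaining"),
  };
}

// Pause when either budget is nearly spent; missing headers never trigger a pause.
function shouldPause(headers: HeaderLike, minRequests = 2, minTokens = 2_000): boolean {
  const { requestsRemaining, tokensRemaining } = readHeadroom(headers);
  return (
    (requestsRemaining !== null && requestsRemaining < minRequests) ||
    (tokensRemaining !== null && tokensRemaining < minTokens)
  );
}

// Milliseconds until the token budget resets (0 if the header is absent or invalid).
function msUntilTokenReset(headers: HeaderLike, now: Date = new Date()): number {
  const reset = headers.get("anthropic-ratelimit-tokens-reset");
  const t = reset === null ? NaN : Date.parse(reset);
  return Number.isFinite(t) ? Math.max(0, t - now.getTime()) : 0;
}
```

When `shouldPause` returns true, sleeping for `msUntilTokenReset(...)` before the next call pauses the whole agent instead of spending retries.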
Designing Agents That Don't Hit Limits
Single-threaded agent loops (decide → act → observe → repeat) are the easiest to keep under your limits, because only one request is in flight at a time. The risk comes with parallelization: spawning 10 agents simultaneously on Tier 1 will quickly run into 429 errors. Use a work queue with a max concurrency of 3-4 on Tier 1, or 15-20 on Tier 2.
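The bounded work queue can be sketched as a small concurrency-limited map. It runs independent agent tasks with at most `limit` in flight and returns results in input order. `mapWithConcurrency` is a hypothetical helper, not a library API. Per the guidance above, you would pass 3-4 as `limit` on Tier 1 or 15-20 on Tier 2.

```typescript
// Sketch: run independent tasks with at most `limit` in flight at once.
// Results come back in input order. Hypothetical helper, not a library API.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls items until none are left. JavaScript runs this code on a
  // single thread, so reading and incrementing `next` cannot race.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

If any task rejects, the returned promise rejects right away while tasks that already started keep running. That is acceptable for a sketch but worth handling explicitly in production.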
Reuse identical system prompts and mark them for prompt caching (available on claude-3-5-sonnet), so repeated context is read from the cache instead of being reprocessed on every call. For multi-step agents, keep a single conversation thread rather than starting fresh each step.
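A sketch of both ideas: a request builder that marks the system prompt with `cache_control: { type: "ephemeral" }`, which is the Messages API's prompt-caching marker, and a thread that grows step by step instead of being rebuilt. The local types stand in for the SDK's request types, and `buildAgentRequest` and `appendTurn` are hypothetical helpers. Prompts shorter than the model's minimum cacheable length are processed normally, without caching.

```typescript
// Sketch: Messages API request params with a cacheable system prompt and a
// single growing conversation thread. Local types stand in for the SDK's.
type TextBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type Turn = { role: "user" | "assistant"; content: string };

type AgentRequest = {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: Turn[];
};

function buildAgentRequest(
  model: string,
  systemPrompt: string,
  thread: readonly Turn[],
  maxTokens = 1024,
): AgentRequest {
  return {
    model,
    max_tokens: maxTokens,
    // cache_control marks the end of the cacheable prefix (here, the system prompt).
    system: [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }],
    messages: [...thread],
  };
}

// Each agent step appends to the same thread instead of starting a new conversation.
function appendTurn(thread: readonly Turn[], role: Turn["role"], content: string): Turn[] {
  return [...thread, { role, content }];
}
```

Pass the result to `client.messages.create(...)`. The response's `usage` then reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which show whether the cache is being hit.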
Open-Source Rate Limit Manager: Pantheon
The Pantheon repo at github.com/lewisallena17/pantheon implements a production-grade rate limit queue and monitoring dashboard. It's built for Next.js + Supabase and handles exponential backoff, Tier detection, and real-time usage dashboards. Fork it as a starter for your agent infrastructure.
Open-source implementation
Everything in this article runs in pantheon — a production-ready Next.js + Supabase + Claude starter. Clone it, deploy to Vercel, run PM2. The dashboard auto-commits every agent edit and reverts itself if TypeScript breaks.
Tier 2 gives you substantially more throughput, so plan the upgrade early, build with queueing from day one, and monitor your TPM ceiling in production to keep your agents scaling without crashes.