Claude API Tier 1 vs Tier 2 Rate Limits

If you're building AI agents with Claude and hitting rate limit errors in production, you need to understand the hard RPM and TPM boundaries between Tier 1 and Tier 2—and how to architect around them before they break your system.

Tier 1 vs Tier 2: The Numbers That Matter

Tier 1 (free and low-volume paid users) gives you 10,000 TPM and 500 RPM. Tier 2 (qualified users with Claude API history) raises that to 50,000 TPM and 5,000 RPM. Actual limits are set per model and change over time, so confirm the current values on the Limits page in the Anthropic Console. For indie builders, that's the difference between running one or two concurrent agent threads and running 10+ in parallel.

TPM (tokens per minute) is your real bottleneck. A single claude-3-5-sonnet call averaging 2,000 output tokens uses 2,000 tokens of that budget before you even count the prompt. At Tier 1, that caps you at 5 such requests per minute; at Tier 2, 25. RPM is rarely your limiting factor unless you're making many small, fast requests.
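
The arithmetic above can be sketched as a small helper. This is a rough estimate only: it assumes input and output tokens draw on one combined per-minute budget, as this article describes, and `maxRequestsPerMinute` is a hypothetical name, not part of any SDK.

```typescript
// Estimate how many requests per minute fit inside a tokens-per-minute budget.
// Assumption: prompt and completion tokens count against the same budget.
function maxRequestsPerMinute(
  tpmLimit: number,
  inputTokens: number,
  outputTokens: number
): number {
  const tokensPerRequest = inputTokens + outputTokens;
  if (tokensPerRequest <= 0) {
    throw new Error("a request must use at least one token");
  }
  return Math.floor(tpmLimit / tokensPerRequest);
}

// TPM figures quoted in this article: 10,000 (Tier 1) and 50,000 (Tier 2).
const tier1 = maxRequestsPerMinute(10_000, 0, 2_000); // 5
const tier2 = maxRequestsPerMinute(50_000, 0, 2_000); // 25

// Adding a 1,500-token prompt to each call shrinks the Tier 1 budget further.
const tier1WithPrompt = maxRequestsPerMinute(10_000, 1_500, 2_000); // 2
```

Once prompts are included the ceiling drops quickly, which is why long system prompts and conversation histories matter as much as output length.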

How to Request Tier 2 (and Actually Get It)

Anthropic reviews upgrade requests based on account age, API usage patterns, and intended use case. You need at least a few weeks of consistent API calls and a clear description of your production workload. Submit your request in the Anthropic console—don't just email support.

Build a small batch of production queries first. Show ~5-10 days of API activity. Mention your tech stack (Next.js + Supabase is a green flag). Tier 2 approval typically takes 1-3 business days. If rejected, wait 30 days and reapply with more usage data.

Rate Limit Handling in Next.js + TypeScript

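The right approach is exponential backoff combined with request queuing. Don't retry immediately: on a 429, wait 60 seconds (or the duration given in the response's `retry-after` header, if present), then double the wait on each further failure. Backoff alone destroys latency for agents making sequential decisions, so queue and batch independent calls to avoid triggering 429s in the first place.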
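
A minimal sketch of the backoff half of that advice follows. It assumes a rate-limit error is any thrown object with `status === 429`; `withBackoff`, `isRateLimitError`, and the injectable `sleep` are illustrative names, not SDK APIs.

```typescript
interface BackoffOptions {
  baseDelayMs: number; // wait before the first retry
  maxRetries: number; // retries allowed before giving up
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit errors carry a numeric `status` of 429.
function isRateLimitError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

// Retry `fn` on 429s, doubling the wait each time; rethrow anything else.
async function withBackoff<T>(
  fn: () => Promise<T>,
  opts: BackoffOptions
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimitError(err) || attempt >= opts.maxRetries) throw err;
      await sleep(opts.baseDelayMs * 2 ** attempt); // base, 2x base, 4x base, ...
    }
  }
}
```

With `baseDelayMs: 60_000` the waits are 60s, 120s, 240s, matching the schedule described above. The official TypeScript SDK also retries some failures on its own, so check its `maxRetries` setting before layering a second retry loop on top.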

Here's a minimal queue pattern that respects Tier 1 limits:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const MAX_CONCURRENT = 4; // Conservative for Tier 1
const queue: Array<() => Promise<void>> = []; // Jobs waiting for a free slot
let activeRequests = 0;

// Queue a call; the returned promise settles with that call's result or error.
function enqueue<T>(fn: () => Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    queue.push(async () => {
      try {
        resolve(await fn());
      } catch (e) {
        reject(e);
      }
    });
    processQueue();
  });
}

// Start queued jobs until MAX_CONCURRENT are in flight.
function processQueue(): void {
  while (queue.length > 0 && activeRequests < MAX_CONCURRENT) {
    const job = queue.shift()!;
    activeRequests++;
    void job().finally(() => {
      activeRequests--;
      processQueue(); // A slot opened up, so pull the next job
    });
  }
}

// Usage: const msg = await enqueue(() => client.messages.create(params));

Monitoring Rate Limits in Production

Log the `usage` object returned by Claude API calls. Track `input_tokens + output_tokens` per request. In Supabase, store a `rate_limit_log` table with timestamp, tokens, and request_id. Set an alert (for example, from a scheduled function) that fires when token usage in any one-minute window exceeds 80% of your TPM limit.
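
One way to compute that per-minute figure is a rolling window over recent `usage` objects. The sketch below keeps everything in memory and leaves out the Supabase write; `TokenWindow` and its method names are hypothetical, and the 10,000 TPM limit in the usage lines is this article's Tier 1 figure.

```typescript
interface UsageEntry {
  timestampMs: number;
  tokens: number;
}

// Tracks tokens used in the trailing window (60 seconds by default).
class TokenWindow {
  private entries: UsageEntry[] = [];
  private readonly tpmLimit: number;
  private readonly windowMs: number;

  constructor(tpmLimit: number, windowMs: number = 60_000) {
    this.tpmLimit = tpmLimit;
    this.windowMs = windowMs;
  }

  // Record the `usage` object from one API response.
  record(
    usage: { input_tokens: number; output_tokens: number },
    nowMs: number = Date.now()
  ): void {
    this.entries.push({
      timestampMs: nowMs,
      tokens: usage.input_tokens + usage.output_tokens,
    });
  }

  // Drop entries older than the window, then sum what is left.
  tokensInWindow(nowMs: number = Date.now()): number {
    this.entries = this.entries.filter(
      (e) => nowMs - e.timestampMs < this.windowMs
    );
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  // True when usage exceeds `fraction` of the limit (0.8 means 80%).
  isAboveThreshold(fraction: number, nowMs: number = Date.now()): boolean {
    return this.tokensInWindow(nowMs) > this.tpmLimit * fraction;
  }
}

const usageWindow = new TokenWindow(10_000);
usageWindow.record({ input_tokens: 3_000, output_tokens: 2_000 }, 0);
usageWindow.record({ input_tokens: 2_000, output_tokens: 1_500 }, 30_000);
const shouldAlert = usageWindow.isAboveThreshold(0.8, 30_000); // 8,500 > 8,000
```

An in-memory window only sees traffic from one process. With several PM2 agents sharing a key, the same sum would need to come from the shared log table instead.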

Use the response headers: `anthropic-ratelimit-requests-remaining` and `anthropic-ratelimit-tokens-remaining` tell you exactly where you stand. Check them before retrying; if either is close to zero, pause your agent entirely rather than burning retries.
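
A hedged sketch of that header check follows. It reads `anthropic-ratelimit-requests-remaining`, `anthropic-ratelimit-tokens-remaining`, and `retry-after`, the names used in Anthropic's rate limit documentation. `HeaderReader`, `readRateLimit`, `shouldPause`, and the 2,000-token floor are illustrative assumptions.

```typescript
// Minimal shape shared by fetch's Headers and simple test stubs.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitStatus {
  requestsRemaining: number | null;
  tokensRemaining: number | null;
  retryAfterSeconds: number | null;
}

// Parse a numeric header; missing or non-numeric values become null.
function numericHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

function readRateLimit(headers: HeaderReader): RateLimitStatus {
  return {
    requestsRemaining: numericHeader(headers, "anthropic-ratelimit-requests-remaining"),
    tokensRemaining: numericHeader(headers, "anthropic-ratelimit-tokens-remaining"),
    retryAfterSeconds: numericHeader(headers, "retry-after"),
  };
}

// Pause when no requests are left or fewer tokens remain than one call needs.
function shouldPause(status: RateLimitStatus, minTokensPerCall: number): boolean {
  const outOfRequests =
    status.requestsRemaining !== null && status.requestsRemaining < 1;
  const outOfTokens =
    status.tokensRemaining !== null && status.tokensRemaining < minTokensPerCall;
  return outOfRequests || outOfTokens;
}

// Stub headers standing in for a real response.
const stub = new Map<string, string>([
  ["anthropic-ratelimit-requests-remaining", "12"],
  ["anthropic-ratelimit-tokens-remaining", "800"],
]);
const status = readRateLimit({ get: (name) => stub.get(name) ?? null });
const pauseAgent = shouldPause(status, 2_000); // true: 800 tokens is under the floor
```

The TypeScript SDK's `.withResponse()` helper is one way to reach the raw response headers. Treat that as an assumption to verify against the SDK version you use.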

Designing Agents That Don't Hit Limits

Single-threaded agent loops (decide → act → observe → repeat) are naturally rate-limit-safe. The risk comes with parallelization: spawning 10 agents simultaneously on Tier 1 will fail. Use a work queue with a max concurrency of 3-4 on Tier 1, or 15-20 on Tier 2.

Cache Claude's system prompts and reuse them. Use prompt caching (available on claude-3-5-sonnet) so identical context is not reprocessed on every call. For multi-step agents, keep a single conversation thread rather than starting fresh each step.
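
As a rough illustration, a request that marks its system prompt as cacheable might be built like this. The `cache_control: { type: "ephemeral" }` marker comes from Anthropic's prompt caching documentation. The local interfaces, `buildCachedRequest`, the `max_tokens` value, and the model ID are assumptions chosen for the sketch.

```typescript
// Local types describing only the fields this sketch uses.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface MessageRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: ChatMessage[];
}

// Build one step of an agent loop: same cached system prompt, growing history.
function buildCachedRequest(
  systemPrompt: string,
  history: ChatMessage[],
  userInput: string
): MessageRequest {
  return {
    model: "claude-3-5-sonnet-20241022", // assumed model ID
    max_tokens: 1024,
    system: [
      // Marks the system prompt as a cacheable prefix.
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    // Append to the existing thread instead of starting fresh.
    messages: [...history, { role: "user", content: userInput }],
  };
}

const history: ChatMessage[] = [
  { role: "user", content: "List open tasks." },
  { role: "assistant", content: "There are 3 open tasks." },
];
const request = buildCachedRequest("You are a task-planning agent.", history, "Pick the next one.");
```

The resulting object is shaped to be passed to `client.messages.create`. Caching only applies to prompts above a minimum length, so a short system prompt like the one here would not be cached in practice.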

Open-Source Rate Limit Manager: Pantheon

The Pantheon repo at github.com/lewisallena17/pantheon implements a production-grade rate limit queue and monitoring dashboard. It's built for Next.js + Supabase and handles exponential backoff, Tier detection, and real-time usage dashboards. Fork it as a starter for your agent infrastructure.

Open-source implementation

Everything in this article runs in pantheon — a production-ready Next.js + Supabase + Claude starter. Clone it, deploy to Vercel, run PM2. The dashboard auto-commits every agent edit and reverts itself if TypeScript breaks.

Tier 2 unlocks 5x more throughput—request it early, build with queueing from day one, and monitor your TPM ceiling in production to keep your agents scaling without crashes.