Rate Limiting and Circuit Breakers for AI Agents
Without rate limiting and circuit breakers, a single API spike or downstream service failure can take down your entire AI agent system. Learn the patterns that build resilience into Claude integrations before production.
Why Your AI Agents Need Rate Limiting
Claude's API enforces rate limits (requests per minute and tokens per minute), but that's just one layer. Your agent might spawn parallel requests, retry failures aggressively, or loop indefinitely after hitting token limits. Without client-side rate limiting, you'll hit 429 errors, waste API credits, and degrade the user experience.
Rate limiting buys you time to queue requests intelligently, prioritize critical agent tasks, and observe actual usage patterns before hitting hard limits. It's the difference between graceful degradation and outages.
Implementing Rate Limiting in Next.js
Use a sliding window or token bucket approach. For Claude agents, track requests per user and per agent type separately. Store counters in Supabase or Redis; Supabase works well for indie scale.
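To make the token bucket idea concrete, here is a minimal in-memory sketch keyed by user and agent type. The names `TokenBucket`, `tryConsume`, and `allowRequest`, and the 10-requests-per-minute numbers, are illustrative assumptions rather than part of any library.

```typescript
// Token bucket: holds up to `capacity` tokens and refills at `refillPerSec`.
// Each request spends one token; an empty bucket means "rate limited".
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = Date.now) {
    this.capacity = capacity;         // maximum burst size
    this.refillPerSec = refillPerSec; // sustained requests per second
    this.now = now;                   // injectable clock, handy for tests
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and spends one token if the request may proceed.
  tryConsume(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();

// One bucket per (user, agent type) pair, created lazily.
function allowRequest(userId: string, agentType: string): boolean {
  const key = `${userId}:${agentType}`;
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = new TokenBucket(10, 10 / 60); // assumed: burst of 10, 10 per minute sustained
    buckets.set(key, bucket);
  }
  return bucket.tryConsume();
}
```

Because the buckets live in process memory, this only holds for a single server instance. Counters shared across instances need a store such as Supabase or Redis.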
Here's a minimal TypeScript rate limiter for Next.js API routes:
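import { createClient } from '@supabase/supabase-js';

// Assumptions: SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are set, and the
// rate_limits table has user_id plus a created_at column defaulting to now().
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

export async function checkRateLimit(userId: string, limit: number = 10, window: number = 60) {
  const since = new Date(Date.now() - window * 1000).toISOString();
  // Count this user's requests inside the window; head: true skips the row data.
  const { count, error } = await supabase.from('rate_limits')
    .select('*', { count: 'exact', head: true })
    .eq('user_id', userId)
    .gt('created_at', since);
  if (error) throw error;
  if ((count ?? 0) >= limit) throw new Error('Rate limit exceeded');
  // Record this request as one row. Check-then-insert is not atomic, so brief overshoot is possible.
  const { error: insertError } = await supabase.from('rate_limits').insert({ user_id: userId });
  if (insertError) throw insertError;
  return true;
}
Circuit Breaker Pattern for Resilience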
A circuit breaker monitors downstream service health (Claude API, external tools, your database). When failure rate exceeds a threshold, it 'opens' and stops sending requests, failing fast instead of hanging. After a cooldown, it tries again.
For AI agents, implement three states: Closed (normal operation), Open (stop sending requests and return a cached or fallback response), and Half-Open (send a trial request to test whether the service has recovered). This prevents cascading failures when the Claude API is slow or your vector database is overloaded.
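The three states can be sketched as a small class. This is a simplified illustration, not a library API: it trips on a count of consecutive failures rather than a true failure rate, and the `CircuitBreaker` name, the default threshold of 5, and the 30-second cooldown are all assumptions.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 5, cooldownMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.cooldownMs = cooldownMs;             // how long to stay open
    this.now = now;                           // injectable clock, handy for tests
  }

  getState(): BreakerState {
    return this.state;
  }

  // Runs `action` unless the breaker is open; on failure or while open,
  // returns `fallback()` (for example a cached response).
  async call<T>(action: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) return fallback(); // fail fast
      this.state = 'half-open'; // cooldown elapsed: let a trial request through
    }
    try {
      const result = await action();
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      // A failed trial request, or too many failures in a row, opens the breaker.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

A hypothetical call site might look like `breaker.call(() => callClaude(prompt), () => cachedAnswer)`, where `callClaude` and `cachedAnswer` stand in for your own code. Under concurrency, several requests can pass through while the breaker is half-open, so a stricter version would admit only one trial at a time.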
Combining Rate Limits with Exponential Backoff
Rate limiting and backoff are complementary. Rate limiting prevents you from hitting the limit; exponential backoff handles it gracefully when you do. When Claude returns 429, wait 2^n seconds before retry, with jitter to avoid thundering herd.
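As a rough sketch of the retry loop, the helper below uses the "full jitter" variant: each wait is a random value below an exponentially growing ceiling of 2^n seconds. The names `backoffDelayMs` and `withBackoff`, the 60-second cap, and the five-retry default are assumptions, and `request` stands in for whatever function performs your Claude call and exposes an HTTP status.

```typescript
// Full-jitter delay: a random value in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 60_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `request` and retries while it resolves with HTTP 429,
// up to `maxRetries` additional attempts. Other statuses are returned as-is.
async function withBackoff<T extends { status: number }>(
  request: () => Promise<T>,
  maxRetries = 5,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await wait(backoffDelayMs(attempt));
  }
}
```

The randomness spreads retries from many clients across the window instead of letting them fire in lockstep. If the 429 response carries a retry-after header, preferring that value over the computed delay is usually the safer choice.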
For agent chains with multiple Claude calls, combine per-request backoff with system-wide rate limit queues. This ensures individual agent steps don't starve the rest of your application.
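One simple way to approximate a system-wide queue is a shared concurrency limiter that every agent step goes through. The sketch below is an assumption about how such a queue could look (the `RequestQueue` name and the cap of 2 are illustrative), and it limits concurrent calls rather than requests per minute.

```typescript
// FIFO queue that lets at most `maxConcurrent` tasks run at once.
class RequestQueue {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next();      // pass the slot straight to the oldest waiter
      else this.active -= 1; // nobody waiting: free the slot
    }
  }
}

// Shared by every agent step in the process (assumed cap of 2 concurrent calls).
const claudeQueue = new RequestQueue(2);
```

Each step in a chain would then wrap its own retry logic in `claudeQueue.run(...)`, so a step that is backing off holds one slot at most while other work keeps moving.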
Monitoring and Observability
Log every rate limit hit and circuit breaker state change. Aggregate those logs in Supabase by user and agent type to detect patterns: are specific users or agents causing bottlenecks? Track latency percentiles (p50, p95, p99) to catch slow Claude responses before they trigger circuit breakers.
Set up alerts for sustained rate limiting or open circuits. For production agents, this is your first warning sign of scaling issues.
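For the latency percentiles, a small helper over recorded request durations is often enough to start with. The sketch below uses the nearest-rank method; the function name and the sample numbers are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. `p` is in the range 0..100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical latencies in milliseconds for recent Claude calls.
const latenciesMs = [100, 200, 300, 400, 500];
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
```

An alert could then compare `p95` against a threshold you choose, which flags slow responses well before enough of them fail to open a circuit.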
Open-Source Implementation
The Pantheon repository at github.com/lewisallena17/pantheon provides production-ready rate limiting and circuit breaker middleware for Next.js + Claude + Supabase stacks. It includes metrics collection, graceful degradation, and fallback handling out of the box. Clone it, adapt the schemas to your agent types, and deploy.
Pantheon handles the boilerplate so you focus on agent logic, not infrastructure reliability.
Get the full starter kit
Rate limiting and circuit breakers aren't optional—they're the difference between a prototype and a production AI agent system. Grab the Pantheon starter kit and ship resilient agents today.