Rate Limiting API Routes in Next.js Middleware
Rate limiting your Next.js API routes in middleware prevents abuse, protects your Claude API quota, and keeps your indie app predictable under load, without running up infrastructure bills or getting blocked by third-party services.
Why Middleware Rate Limiting Matters for AI Systems
When you're building AI agent systems with Claude, every API call has a cost. A single malicious user or buggy client can exhaust your rate limits in minutes, blocking legitimate requests. Middleware-based rate limiting catches abuse before it reaches your business logic.
For indie founders, this is existential: your Claude API key is a shared resource. One runaway agent or unintended loop can trigger rate limit errors across your entire system. Enforcing limits at the middleware layer is your first line of defense.
Token-Bucket Algorithm in Next.js Middleware
The token-bucket algorithm is one of the most widely used approaches to rate limiting. Each user gets a bucket with a fixed capacity that refills at a fixed rate. Each request consumes one token; when the bucket is empty, requests are rejected until it refills. The capacity sets how large a burst you tolerate, and the refill rate enforces the long-term limit.
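To make the burst and refill behavior concrete, here is a minimal sketch of the algorithm as a pure function, independent of Next.js. The names (`Bucket`, `takeToken`) and the tiny capacity of 5 are assumptions chosen for illustration, and timestamps are passed in so the timeline can be simulated.

```typescript
// Hypothetical names throughout; a sketch of the algorithm, not a library API.
interface Bucket {
  tokens: number;     // tokens currently available
  lastRefill: number; // timestamp of the last refill, in milliseconds
}

// Refill the bucket for the time elapsed since lastRefill, then try to spend one token.
function takeToken(
  bucket: Bucket,
  nowMs: number,
  capacity: number,
  refillPerSecond: number
): { allowed: boolean; bucket: Bucket } {
  const elapsedSeconds = Math.max(0, (nowMs - bucket.lastRefill) / 1000);
  const refilled = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond);
  if (refilled < 1) {
    return { allowed: false, bucket: { tokens: refilled, lastRefill: nowMs } };
  }
  return { allowed: true, bucket: { tokens: refilled - 1, lastRefill: nowMs } };
}

// Simulated timeline: capacity 5, refilling at 1 token per second.
// Six requests arrive at t = 0 ms, then three more at t = 2000 ms.
let state: Bucket = { tokens: 5, lastRefill: 0 };
const results: boolean[] = [];
for (const t of [0, 0, 0, 0, 0, 0, 2000, 2000, 2000]) {
  const outcome = takeToken(state, t, 5, 1);
  state = outcome.bucket;
  results.push(outcome.allowed);
}
console.log(results);
```

The burst of five is accepted, the sixth request is rejected, and two seconds later exactly two more tokens are available.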
Next.js middleware runs before your route handlers on every request it matches, which makes it a natural place to implement this. You can track tokens per user, IP, or API key, whichever fits your multi-tenant or agent-based system.
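```typescript
import { NextRequest, NextResponse } from 'next/server';

// Per-instance store: resets on cold start and is not shared between instances.
const rateLimitStore = new Map<string, { tokens: number; lastRefill: number }>();
const RATE_LIMIT = 100; // bucket capacity; refills fully over one minute
const REFILL_RATE = RATE_LIMIT / 60; // tokens per second

export function middleware(request: NextRequest) {
  // Only trust x-user-id if your auth layer sets it; otherwise clients can spoof it.
  // x-forwarded-for replaces request.ip, which newer Next.js versions no longer expose.
  const clientIp = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim();
  const userId = request.headers.get('x-user-id') || clientIp || 'anonymous';
  const now = Date.now();
  const bucket = rateLimitStore.get(userId) || { tokens: RATE_LIMIT, lastRefill: now };

  // Refill in proportion to elapsed time, capped at the bucket capacity.
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(RATE_LIMIT, bucket.tokens + elapsed * REFILL_RATE);
  bucket.lastRefill = now;
  rateLimitStore.set(userId, bucket);

  if (bucket.tokens < 1) {
    const retryAfter = Math.ceil((1 - bucket.tokens) / REFILL_RATE);
    return NextResponse.json(
      { error: 'Rate limit exceeded' },
      { status: 429, headers: { 'Retry-After': String(retryAfter) } }
    );
  }

  bucket.tokens -= 1;
  return NextResponse.next();
}

export const config = { matcher: '/api/:path*' }; // only run on API routes
```
Storing Rate Limit State with Supabase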
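In-memory storage only works when a single long-lived process handles all traffic. On serverless platforms each instance keeps its own Map and loses it on cold start, so distributed systems need shared, persistent state. Supabase PostgreSQL is a good fit for tracking rate limit buckets across serverless functions.
Store user_id, tokens, and last_refill_time in a simple table. On each request, refill and decrement the bucket in a single atomic step, for example one UPDATE statement or a Postgres function called over RPC, rather than reading the row and writing it back from application code. Postgres locks the row for the duration of the update, so counts stay accurate even under concurrent load from multiple serverless containers.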
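One way to keep the middleware independent of where the state lives is to hide the bucket behind a small store interface. The sketch below is an assumption about how that could look, not Supabase's API: `BucketStore`, `MemoryBucketStore`, and `isAllowed` are hypothetical names, and the in-memory class stands in for a Supabase-backed implementation that would perform the same refill-and-consume step inside the database.

```typescript
// Hypothetical interface: one atomic "refill and consume" operation per request.
interface BucketStore {
  consume(key: string, nowMs: number): Promise<{ allowed: boolean; tokens: number }>;
}

// In-memory stand-in for local development and tests.
// A database-backed version would implement consume() with a single UPDATE
// statement or Postgres function so the read-modify-write happens in one place.
class MemoryBucketStore implements BucketStore {
  private rows = new Map<string, { tokens: number; lastRefillTime: number }>();
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  async consume(key: string, nowMs: number): Promise<{ allowed: boolean; tokens: number }> {
    const row = this.rows.get(key) ?? { tokens: this.capacity, lastRefillTime: nowMs };
    const elapsedSeconds = Math.max(0, (nowMs - row.lastRefillTime) / 1000);
    const refilled = Math.min(this.capacity, row.tokens + elapsedSeconds * this.refillPerSecond);
    const allowed = refilled >= 1;
    const tokens = allowed ? refilled - 1 : refilled;
    this.rows.set(key, { tokens, lastRefillTime: nowMs });
    return { allowed, tokens };
  }
}

// The caller depends only on the interface, so the backing store can be swapped.
async function isAllowed(
  store: BucketStore,
  userId: string,
  nowMs: number = Date.now()
): Promise<boolean> {
  const result = await store.consume(userId, nowMs);
  return result.allowed;
}
```

Because the interface is asynchronous, swapping the in-memory class for a database-backed one should not require changes to the calling code.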
Handling Claude API Rate Limits Upstream
Rate limiting your own endpoints is step one. Step two is respecting Claude's rate limits and avoiding cascading failures. When Claude returns a 429, retry with exponential backoff in the code that actually calls the API (a route handler or queue worker), and honor the retry-after header if the response includes one.
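A minimal sketch of that retry loop follows. It is written against a generic `attempt` callback rather than any particular SDK, so `AttemptResult`, `BackoffOptions`, and `callWithBackoff` are hypothetical names, and the caller is assumed to parse the retry hint from the response itself.

```typescript
// Hypothetical shape for the result of one upstream call attempt.
interface AttemptResult<T> {
  status: number;             // HTTP status returned by the upstream API
  retryAfterSeconds?: number; // parsed from a retry-after header, if one was sent
  data?: T;
}

interface BackoffOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // nominal delay before the first retry
  maxDelayMs: number;  // upper bound on the exponential delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not really wait
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry only on 429, doubling the nominal delay each time and adding random jitter.
async function callWithBackoff<T>(
  attempt: () => Promise<AttemptResult<T>>,
  options: BackoffOptions
): Promise<AttemptResult<T>> {
  const sleep = options.sleep ?? defaultSleep;
  let result = await attempt();
  for (let retry = 0; retry < options.maxRetries && result.status === 429; retry++) {
    const exponential = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** retry);
    // Jitter: wait somewhere between half and all of the nominal delay.
    const jittered = exponential / 2 + Math.random() * (exponential / 2);
    // Never retry sooner than the upstream asked us to.
    const serverHintMs = (result.retryAfterSeconds ?? 0) * 1000;
    await sleep(Math.max(jittered, serverHintMs));
    result = await attempt();
  }
  return result;
}
```

The jitter keeps many agents that were throttled at the same moment from retrying in lockstep. If the last attempt still returns 429, the result is handed back to the caller rather than thrown, so the caller decides how to surface it.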
For AI agent systems, consider adding a header or body field to your 429 responses that says whether the limit is yours or Anthropic's. An agent that knows the source can wait for the right amount of time instead of retrying blindly against a limit it cannot clear.
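As an illustration, a small helper can build both kinds of 429 response in one place. The `X-RateLimit-Source` header name and the `buildRateLimitResponse` function are assumptions made for this sketch, not a standard or part of any API.

```typescript
// Which limiter rejected the request: this app, or the upstream Claude API.
type LimitSource = 'app' | 'upstream';

interface RateLimitResponse {
  status: 429;
  headers: Record<string, string>;
  body: { error: string; source: LimitSource };
}

function buildRateLimitResponse(
  source: LimitSource,
  retryAfterSeconds: number
): RateLimitResponse {
  return {
    status: 429,
    headers: {
      // Retry-After is expressed in whole seconds, so round up and never send 0.
      'Retry-After': String(Math.max(1, Math.ceil(retryAfterSeconds))),
      'X-RateLimit-Source': source, // hypothetical custom header
    },
    body: {
      error: source === 'app' ? 'Rate limit exceeded' : 'Upstream rate limit exceeded',
      source,
    },
  };
}
```

In a route handler or middleware, the result can be passed along as `NextResponse.json(r.body, { status: r.status, headers: r.headers })`.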
Testing Rate Limits in Development
Test rate limiting with a simple load generator: send 150 requests in rapid succession to your /api/chat endpoint. With a 100-token bucket, roughly the first 100 should succeed (a few more may pass as tokens refill during the burst), and the rest should return 429, ideally with a Retry-After header.
Use tools like Apache Bench or autocannon to simulate realistic traffic patterns. For AI systems, test with concurrent agent requests, since that is the chaos you'll face in production.
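If you would rather script the burst yourself, a few lines are enough. The sketch below uses a hypothetical `runBurst` helper that takes the request function as a parameter, so the same code can run against a local server or a fake limiter in a unit test; the URL in the commented example is an assumption.

```typescript
// Fire `total` requests concurrently and tally the status codes that come back.
// `send` is injected so the helper works against a real server or a test double.
async function runBurst(
  send: (index: number) => Promise<number>,
  total: number
): Promise<Map<number, number>> {
  const statuses = await Promise.all(
    Array.from({ length: total }, (_, i) => send(i))
  );
  const counts = new Map<number, number>();
  for (const status of statuses) {
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  return counts;
}

// Example against a local dev server (URL and port are assumptions):
// const counts = await runBurst(
//   async () => (await fetch('http://localhost:3000/api/chat', { method: 'POST' })).status,
//   150
// );
// console.log(counts);
```

Comparing the tally of 200s and 429s against your configured capacity is a quick sanity check before reaching for a full load-testing tool.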
Open-Source Implementation Reference
The Pantheon repository (github.com/lewisallena17/pantheon) includes a production-ready rate limiting middleware designed for Claude-integrated systems. It combines token-bucket logic with Supabase state management, handles distributed deployments, and includes monitoring hooks.
Fork Pantheon or copy its middleware patterns directly into your Next.js project. It's built specifically for the indie founder stack: Next.js, Supabase, and Claude APIs.
Get the full starter kit
Implement middleware rate limiting now using token-bucket logic and Supabase persistence—protect your Claude quota, stabilize your API, and build systems that scale predictably. Start with Pantheon's open-source patterns and deploy production-grade protection today.