Context Compression for Long-Running AI Agents

Long-running AI agents hit a hard wall: every message adds context, token costs climb, and response latency grows. Strategic context compression keeps your Claude agents fast and cheap.

Why Context Grows Into a Problem

AI agents that run for hours, days, or across multiple conversations accumulate memory. Each new request includes all previous context: system prompts, conversation history, retrieved documents, tool outputs. Because the whole history is resent on every turn, total input tokens grow roughly with the square of the number of turns.

At $3 per 1M input tokens, a single request carrying 100k tokens of context costs $0.30 in input alone. Scale to 10 concurrent agents running daily, and you're burning budget on redundant context. Worse: larger contexts take longer to process, which can add 200-400ms of latency and break the responsiveness users expect.
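
To make the arithmetic concrete, here is a rough sketch of how input cost accumulates when every request resends the full history. The $3 per 1M rate is the figure quoted above; the function names and the 2,000 tokens-per-turn value are illustrative assumptions, not measurements.

```typescript
// Price per input token, from the $3 per 1M figure used above.
const PRICE_PER_INPUT_TOKEN = 3 / 1000000;

// Input cost of a single request carrying `contextTokens` tokens.
function requestCost(contextTokens: number): number {
  return contextTokens * PRICE_PER_INPUT_TOKEN;
}

// Total input cost of a conversation where each turn adds `tokensPerTurn`
// tokens and every request resends everything that came before it.
function cumulativeCost(turns: number, tokensPerTurn: number): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += requestCost(turn * tokensPerTurn);
  }
  return total;
}

console.log(requestCost(100000).toFixed(2)); // "0.30"
console.log(cumulativeCost(50, 2000).toFixed(2)); // "7.65"
```

Fifty turns at 2,000 tokens each is only 100k tokens of new content, but resending the history each time bills for 2.55M input tokens in total. That gap is what the techniques below try to close.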


Selective History Retention

Not all history matters equally. Recent exchanges inform the agent's current task; old interactions become noise. Implement a sliding window: keep only the last N turns, or retain messages within the last M hours.

In Supabase, store conversations with timestamps and mark 'active' messages. Query only active messages when building the context window. For a customer support agent, you might keep the last 8 turns but summarize anything older than 24 hours into a single brief recap.

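// Fetch the 8 most recent messages from the last 24 hours.
const since = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
const { data, error } = await supabase
  .from('messages')
  .select('role, content')
  .eq('agent_id', agentId)
  .gt('created_at', since)
  .order('created_at', { ascending: false })
  .limit(8);
if (error) throw error;
// The query returns newest first; reverse so the model sees chronological order.
const activeMessages = (data ?? []).reverse();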
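
If you already have the history in memory, the same policy can be sketched as a pure function that splits messages into those sent verbatim and those handed off for a recap. The `StoredMessage` shape and the `partitionHistory` name are assumptions for illustration. In this sketch, messages that are recent but fall outside the last N turns also go to the recap pile.

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
  createdAt: number; // Unix epoch milliseconds
}

interface Partition {
  keep: StoredMessage[]; // sent to the model verbatim
  summarize: StoredMessage[]; // candidates for a short recap
}

// Keep at most `maxTurns` messages newer than `maxAgeMs`; everything else
// is handed back for summarization. Both outputs are ordered oldest first.
function partitionHistory(
  messages: StoredMessage[],
  now: number,
  maxTurns = 8,
  maxAgeMs = 24 * 60 * 60 * 1000,
): Partition {
  const sorted = [...messages].sort((a, b) => a.createdAt - b.createdAt);
  const cutoff = now - maxAgeMs;
  const recent = sorted.filter((m) => m.createdAt > cutoff);
  // slice(-0) would return the whole array, so handle zero explicitly.
  const keep = maxTurns > 0 ? recent.slice(-maxTurns) : [];
  const kept = new Set(keep);
  const summarize = sorted.filter((m) => !kept.has(m));
  return { keep, summarize };
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside, keeps the function deterministic and easy to test.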

Summarization at Conversation Boundaries

When an agent completes a task or workflow, summarize the entire exchange into a single 'session summary' message. This becomes the new context seed for future conversations with the same user.

Call Claude to generate a 200-300 token summary covering goals achieved, decisions made, and relevant facts, then replace the full history with that one summary message. A data analysis agent might summarize: 'User analyzed Q3 sales data, identified a 12% decline in region 4, recommended inventory reduction.'
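
A minimal sketch of the two pure steps around that call: building the summarization prompt and turning the returned summary into the seed for the next session. The call to Claude itself is left out; pass the prompt to whichever client you use and cap the output length. The function names, the prompt wording, and the choice to seed as a `user` message (rather than appending to the system prompt) are assumptions.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Build the prompt that asks the model for a compact session summary.
function buildSummaryPrompt(history: ChatMessage[]): string {
  const transcript = history
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return [
    "Summarize the session below in 200-300 tokens.",
    "Cover: goals achieved, decisions made, relevant facts.",
    "",
    transcript,
  ].join("\n");
}

// Replace the full history with a single seed message for the next session.
function seedFromSummary(summary: string): ChatMessage[] {
  return [
    { role: "user", content: `Summary of previous session: ${summary.trim()}` },
  ];
}
```

Store the summary alongside the raw messages rather than deleting them, so you can regenerate it later if the summarization prompt changes.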

Semantic Deduplication

Agents often re-state the same facts or constraints. If the system prompt already covers a rule, don't repeat it in context. Use embedding-based similarity checks to detect near-duplicate information and remove lower-confidence versions.

Before adding a retrieved document or user clarification to context, compare its embedding against the embeddings of what is already there. If cosine similarity exceeds 0.92, skip it. Treat 0.92 as a starting point and tune it for your embedding model. This is especially useful for agents that query knowledge bases repeatedly.
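
The check itself is small once you have the vectors. This sketch assumes embeddings are already computed as plain number arrays by whatever embedding provider you use; `cosineSimilarity` and `isNearDuplicate` are illustrative names.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction, so treat it as matching nothing.
  return denom === 0 ? 0 : dot / denom;
}

// True when the candidate is a near-duplicate of something already in context.
function isNearDuplicate(
  candidate: number[],
  existing: number[][],
  threshold = 0.92,
): boolean {
  return existing.some((e) => cosineSimilarity(candidate, e) > threshold);
}
```

This linear scan is fine for the few dozen items in a typical context window. For larger sets, a vector index is the usual next step.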

Tool Output Caching

Agents call tools (APIs, databases, searches) constantly. Tool outputs don't need to live forever in context. Cache results with a TTL: keep a database query result for 10 minutes, a web search for 1 hour. If the agent asks the same question within the TTL, serve the cached result instead of calling the tool again and appending a second copy of the output to context.

In Next.js, use Redis or a simple Supabase table to store hash(tool_input) → output. Check the cache before calling the actual tool. This cuts both API calls and token usage.
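
As a sketch of the lookup logic only, here is an in-memory version. A `Map` stands in for Redis or the Supabase table, and a serialized key stands in for hash(tool_input); the class and method names are assumptions. The clock is injectable so expiry can be tested without waiting.

```typescript
interface CacheEntry {
  output: string;
  expiresAt: number; // Unix epoch milliseconds
}

class ToolCache {
  private entries = new Map<string, CacheEntry>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Stand-in for hash(tool_input). JSON.stringify is sensitive to property
  // order, so callers should build inputs consistently.
  private key(tool: string, input: unknown): string {
    return `${tool}:${JSON.stringify(input)}`;
  }

  // Returns the cached output, or undefined on a miss or an expired entry.
  get(tool: string, input: unknown): string | undefined {
    const k = this.key(tool, input);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(k);
      return undefined;
    }
    return entry.output;
  }

  set(tool: string, input: unknown, output: string, ttlMs: number): void {
    const expiresAt = this.now() + ttlMs;
    this.entries.set(this.key(tool, input), { output, expiresAt });
  }
}
```

Call `get` before invoking the tool and `set` after a successful call, with a TTL chosen per tool (for example 10 minutes for a database query, 1 hour for a web search).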

Progressive Context Building

Don't load full context upfront. Build messages incrementally: system prompt → last 2 turns → relevant docs → tool results. Stop adding context once you hit 80% of your target token budget. Claude works fine with less context if it's high-signal.

This requires tuning per use case, but the payoff is real: less latency, lower cost, and often better outputs because the agent isn't distracted by noise.
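
One way to sketch the budget check: add parts in priority order and stop at the first one that would cross 80% of the target. The `ContextPart` shape and function names are assumptions, and the 4-characters-per-token estimate is a crude placeholder; swap in a real tokenizer or token-counting endpoint for production use.

```typescript
interface ContextPart {
  label: string; // e.g. "system", "recent-turns", "docs", "tool-results"
  text: string;
}

// Crude estimate: about 4 characters per token. An assumption for
// illustration only; real token counts vary by content and model.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Add parts in priority order and stop at the first part that would push
// the running total past `fillRatio` of the target budget.
function buildContext(
  parts: ContextPart[],
  targetTokens: number,
  fillRatio = 0.8,
): ContextPart[] {
  const limit = targetTokens * fillRatio;
  const selected: ContextPart[] = [];
  let used = 0;
  for (const part of parts) {
    const cost = estimateTokens(part.text);
    if (used + cost > limit) break;
    selected.push(part);
    used += cost;
  }
  return selected;
}
```

Stopping at the first part that does not fit, rather than skipping it and trying smaller ones, keeps the priority order strict. Whether that is the right trade-off depends on the use case.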

Open-source implementation

Everything in this article runs in pantheon — a production-ready Next.js + Supabase + Claude starter. Clone it, deploy to Vercel, run PM2. The dashboard auto-commits every agent edit and reverts itself if TypeScript breaks.

◈ Tools mentioned

Supabase — open-source Firebase alt
Vercel — zero-config Next.js hosting
Claude — AI assistant by Anthropic
Gumroad — sell digital products

Some links may pay us a referral if you sign up. Never affects the price you pay.

Get the full starter kit

Start with selective history retention and summarization; most agents save 30-50% of their tokens immediately. Use the open-source Pantheon implementation to integrate these patterns into your Next.js + Supabase stack today.