Running AI Agents 24/7 with PM2

PM2 keeps your AI agents alive through restarts, crashes, and deployments—so your Claude-powered automation runs reliably 24/7 without manual intervention.

Why PM2 for AI Agents

AI agents built with Claude often run background jobs: data processing, webhook handlers, scheduled tasks, agentic loops. Unlike stateless APIs, these agents maintain state, manage queues, and need to survive server restarts. PM2 handles process resurrection, clustering, and log management out of the box.

Without process management, a single crash means dead agents until you manually redeploy. With PM2, the process restarts automatically and its output is captured in log files for debugging. If the agent checkpoints its work to a database, it can also pick up where it left off.

Setting Up PM2 for Your Agent Process

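Create an `ecosystem.config.js` file at your project root. This tells PM2 which script to run, how many instances to spawn, and how to handle restarts.

Point PM2 at a Node.js entry point that can run on its own, typically a custom server or a background worker script. A Next.js API route file only exports a request handler, so PM2 cannot run it directly; move the agent loop into a standalone script instead (`./worker.js` below is a placeholder path). Set `max_memory_restart` so that PM2 restarts the process when a memory leak pushes it past the limit, rather than letting it degrade mid-task.

module.exports = {
  apps: [{
    name: 'claude-agent',
    // Standalone worker entry point (placeholder path, adjust to your project)
    script: './worker.js',
    instances: 1,
    // fork mode suits a single stateful worker; cluster mode targets load-balanced servers
    exec_mode: 'fork',
    // Restart the process once it uses more than 500 MB of memory
    max_memory_restart: '500M',
    env: { NODE_ENV: 'production' },
    error_file: './logs/agent-error.log',
    out_file: './logs/agent-out.log',
    merge_logs: true
  }]
};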

State Persistence in Supabase

AI agents aren't stateless, and PM2 does not preserve anything held in memory: a restarted process starts from scratch. To resume from the last checkpoint, store agent state (current task, queue position, conversation history) in Supabase. On startup, query the last saved state and continue from there.

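Use a `tasks` table with a status column ('pending', 'running', 'completed') and timestamps. When your agent picks up a task, it first marks the record as 'running'; when it finishes, it marks it 'completed'. If the process dies mid-task, the row is left in the 'running' state, so on the next start the agent finds the incomplete task and resumes it.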
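The resume rule can be sketched as a small selection function. This is a minimal sketch, not Pantheon's actual code: `selectNextTask` and the `created_at` column name are assumptions for illustration, and the rows are a plain array standing in for the result of a Supabase query on the `tasks` table.

```javascript
// Hypothetical helper: decide which task the agent should work on after startup.
// `tasks` is an array of rows shaped like { id, status, created_at }.
function selectNextTask(tasks) {
  // ISO 8601 timestamps in the same format sort correctly as strings.
  const oldestFirst = (a, b) => a.created_at.localeCompare(b.created_at);

  // A task still marked 'running' at startup was interrupted by a crash or
  // restart, so it takes priority over anything that has not started yet.
  const interrupted = tasks.filter((t) => t.status === 'running').sort(oldestFirst);
  if (interrupted.length > 0) return { task: interrupted[0], resumed: true };

  const pending = tasks.filter((t) => t.status === 'pending').sort(oldestFirst);
  if (pending.length > 0) return { task: pending[0], resumed: false };

  return null; // nothing to do
}

// Sample rows (made up for this sketch).
const rows = [
  { id: 1, status: 'completed', created_at: '2024-01-01T00:00:00Z' },
  { id: 2, status: 'running', created_at: '2024-01-01T00:05:00Z' },
  { id: 3, status: 'pending', created_at: '2024-01-01T00:10:00Z' },
];
console.log(selectNextTask(rows)); // picks task 2 and flags it as resumed
```

This rule assumes one agent per queue. With several agents, a 'running' row may belong to a worker that is still alive, so the selection would also need a liveness check before taking the task over.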

Monitoring and Auto-Recovery

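PM2 includes a built-in terminal dashboard. Run `pm2 monit` to watch CPU, memory, and logs for each process in real time. The `pm2-auto-pull` module (installed with `pm2 install pm2-auto-pull`) can pull new commits from your Git repository and reload the app automatically. Note that truly zero-downtime reloads need `pm2 reload` with several cluster-mode instances; a single-instance agent will pause briefly while it restarts.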

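Set `kill_timeout` (in milliseconds; the default is 1600) to control how long PM2 waits between sending the stop signal (SIGINT by default) and force-killing the process. This is the window your agent has to shut down gracefully and save state. `listen_timeout`, by contrast, only controls how long PM2 waits for an app to come up. Pair `kill_timeout` with a shutdown handler in your code that commits pending state to Supabase.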
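A shutdown handler of this kind might look like the sketch below. `createShutdownHandler` is a hypothetical helper, and the in-memory `saved` array stands in for a Supabase write; only `process.on` and `process.exit` are real Node.js APIs here.

```javascript
// Hypothetical factory: builds a signal handler that saves state once, then exits.
// `saveState` would write to Supabase in a real agent.
function createShutdownHandler({ saveState, exit = (code) => process.exit(code) }) {
  let shuttingDown = false;

  return async function onSignal(signal) {
    if (shuttingDown) return; // ignore repeated signals while saving
    shuttingDown = true;
    try {
      await saveState({ reason: signal, stoppedAt: new Date().toISOString() });
      exit(0);
    } catch (err) {
      console.error('Failed to save state during shutdown:', err);
      exit(1);
    }
  };
}

// In-memory stand-in for the database write (assumption for this sketch).
const saved = [];
const handler = createShutdownHandler({
  saveState: async (state) => { saved.push(state); },
});

// PM2 sends SIGINT by default when it stops or restarts a process.
process.on('SIGINT', handler);
process.on('SIGTERM', handler);
```

The save must finish within the configured `kill_timeout`, otherwise PM2 kills the process before the write completes.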

Handling Long-Running Tasks

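Claude API calls can take 10–30 seconds, so long-running agents need timeout buffers. In your PM2 config, `kill_timeout: 5000` gives the process five seconds to shut down gracefully. That is enough to save state but not to finish an in-flight call of that length, so either raise the value or record the interrupted step and retry it after the restart. In your agent code, use a heartbeat mechanism: every 30 seconds, touch a `last_seen` timestamp in Supabase so you know the process is alive.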
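The heartbeat can be sketched with a timer plus a staleness check. Both `startHeartbeat` and `isStale` are hypothetical helpers, the 60-second staleness threshold is an arbitrary choice, and the `touch` callback here only updates a local variable where a real agent would update `last_seen` in Supabase.

```javascript
// Hypothetical helper: call `touch` immediately, then every `intervalMs`.
// Returns a function that stops the heartbeat.
function startHeartbeat(touch, intervalMs = 30_000) {
  const beat = async () => {
    try {
      await touch();
    } catch (err) {
      // A failed heartbeat should not crash the agent; log and try again later.
      console.error('Heartbeat failed:', err);
    }
  };
  beat();
  const timer = setInterval(beat, intervalMs);
  timer.unref(); // the heartbeat alone should not keep the process alive
  return () => clearInterval(timer);
}

// A monitor treats the agent as dead once the timestamp is too old.
// The default allows two missed beats (60 s), an assumption for this sketch.
function isStale(lastSeenIso, nowMs = Date.now(), maxAgeMs = 60_000) {
  return nowMs - Date.parse(lastSeenIso) > maxAgeMs;
}

// Stand-in for the Supabase update: record the timestamp locally.
let lastSeen = null;
const stopHeartbeat = startHeartbeat(() => { lastSeen = new Date().toISOString(); });
stopHeartbeat(); // stop right away so this demo exits
```

A separate monitor, or the agent itself on startup, can call `isStale` on the stored timestamp to decide whether a task marked as running was abandoned.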

If your agent is truly slow, consider breaking work into smaller steps and saving progress between steps. This makes restarts more granular and reduces lost work.
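One way to structure this is a step runner that saves a checkpoint after every step and skips completed steps on resume. This is a sketch under assumptions: `runSteps`, the `{ nextStep, results }` checkpoint shape, and the stand-in steps are all hypothetical, and `save` would persist to Supabase in a real agent.

```javascript
// Hypothetical step runner: executes steps in order, checkpointing after each one.
// `checkpoint` is { nextStep, results }; steps before `nextStep` are not re-run.
async function runSteps(steps, checkpoint, save) {
  const state = { nextStep: checkpoint.nextStep, results: [...checkpoint.results] };
  for (let i = state.nextStep; i < steps.length; i++) {
    const result = await steps[i](state.results); // e.g. one Claude API call
    state.results.push(result);
    state.nextStep = i + 1;
    await save({ nextStep: state.nextStep, results: [...state.results] });
  }
  return state;
}

// Stand-in steps that record when they run (assumption for this sketch).
const calls = [];
const steps = [
  async () => { calls.push('outline'); return 'outline'; },
  async (prev) => { calls.push('draft'); return `draft from ${prev[0]}`; },
  async (prev) => { calls.push('review'); return `review of ${prev[1]}`; },
];

// Resume from a checkpoint saved after the first step: only the last two steps run.
const saved = [];
runSteps(steps, { nextStep: 1, results: ['outline'] }, async (cp) => { saved.push(cp); })
  .then((final) => console.log(final.results, calls));
```

If the process dies during a step, at most that one step is repeated after the restart, so each step should be safe to run twice.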

Open-Source Implementation

The Pantheon project (github.com/lewisallena17/pantheon) is a production-ready reference implementation. It combines Next.js API routes, Claude integration, Supabase for state, and a full PM2 ecosystem config. Clone it, swap in your API keys, and run `pm2 start ecosystem.config.js` to spin up a self-healing AI agent.

Pantheon also includes log aggregation, error alerting, and a dashboard for monitoring multiple agents. Study how it structures task queues and state recovery—it's designed for the patterns you'll face in production.

Get the full starter kit

Start with PM2's ecosystem config, persist agent state in Supabase, and monitor with PM2's built-in tools—then grab the Pantheon starter kit to deploy a production-grade AI agent today.