The Self-Improving AI Orchestrator Pattern
The Self-Improving AI Orchestrator Pattern lets your agent system evaluate its own outputs, log every evaluation to a database, and automatically refine its prompts and routing logic, so your product improves as runs accumulate instead of waiting for you to tune it by hand.
What the Pattern Actually Does
At its core, the pattern wraps every agent execution in an eval loop. After each task completes, a critic agent scores the output against a rubric, writes a structured feedback record to Supabase, and optionally triggers a prompt-rewrite agent that patches the system prompt for the next run.
This is different from simple retry logic. The orchestrator isn't just re-running failed tasks — it's accumulating a ground-truth dataset of what worked, what didn't, and why. Over dozens of runs you get a self-correcting system without manually reviewing logs.
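The loop described above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the `Worker`, `Critic`, and `FeedbackSink` types and the `runWithEval` name are hypothetical, and in a real system the injected functions would wrap Claude calls and a Supabase insert.

```typescript
// Hypothetical types for illustration; none of these names come from a library.
type Verdict = { passed: boolean; score: number; reason: string };
type Worker = (task: string, systemPrompt: string) => Promise<string>;
type Critic = (task: string, output: string) => Promise<Verdict>;
type FeedbackSink = (
  record: { task: string; output: string } & Verdict,
) => Promise<void>;

// Runs one task through the loop: execute, score, then log the verdict.
async function runWithEval(
  task: string,
  systemPrompt: string,
  worker: Worker,
  critic: Critic,
  logFeedback: FeedbackSink,
): Promise<{ output: string; verdict: Verdict }> {
  const output = await worker(task, systemPrompt);
  const verdict = await critic(task, output);
  // Log passes as well as failures so the dataset also records what worked.
  await logFeedback({ task, output, ...verdict });
  return { output, verdict };
}
```

Passing the worker, critic, and sink in as functions keeps the loop testable with stubs and leaves the choice of model and storage outside it.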
Core Architecture: Orchestrator, Worker, Critic
Split your agent graph into three roles. The Orchestrator decomposes the goal and routes subtasks. Workers execute discrete tasks using Claude tool-use calls. The Critic is a separate Claude call that receives the worker's output plus the original intent and returns a JSON score object with a pass/fail flag and a reason string.
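Because the Critic replies with JSON, it helps to validate that reply before trusting it. The sketch below assumes field names `passed`, `score`, and `reason`; the text only specifies a pass/fail flag and a reason string, so the exact names and the numeric `score` (chosen to mirror the `score` column of `agent_runs`) are assumptions.

```typescript
// Assumed shape of the Critic's reply; field names are illustrative.
interface CriticScore {
  passed: boolean;
  score: number;
  reason: string;
}

// Parses the Critic's raw text reply and rejects anything malformed,
// so a badly formatted reply is never mistaken for a pass.
function parseCriticScore(raw: string): CriticScore {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Critic reply was not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Critic reply was not a JSON object");
  }
  const { passed, score, reason } = data as Record<string, unknown>;
  if (typeof passed !== "boolean") throw new Error("Missing boolean 'passed'");
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new Error("Missing numeric 'score'");
  }
  if (typeof reason !== "string") throw new Error("Missing string 'reason'");
  return { passed, score, reason };
}
```

Treating a parse failure as an error rather than a fail verdict keeps Critic formatting problems out of the pass-rate statistics.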
Keep the Critic prompt stateless and deterministic. Give it a fixed rubric so scores are comparable across runs. The Orchestrator reads the score and decides whether to mark the task complete, retry with a modified prompt, or escalate to a human review queue.
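The Orchestrator's three-way decision can be expressed as a small pure function. This is one possible policy, not the one the pattern mandates: the `decideNext` name, the 1-based `attempt` counter, and the `maxAttempts` default of 3 are all assumptions.

```typescript
type Decision = "complete" | "retry" | "escalate";

// Hypothetical policy: a pass completes the task; a fail is retried until the
// attempt budget is spent, then handed to the human review queue.
// `attempt` is 1-based (the first execution is attempt 1).
function decideNext(
  passed: boolean,
  attempt: number,
  maxAttempts = 3,
): Decision {
  if (passed) return "complete";
  return attempt < maxAttempts ? "retry" : "escalate";
}
```

With the defaults, `decideNext(false, 1)` returns `"retry"` and `decideNext(false, 3)` returns `"escalate"`.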
Storing Feedback in Supabase
Every critic evaluation gets written to an agent_runs table. Querying this table later lets you find which prompt variants produced the highest pass rates, which task types fail most often, and what time-of-day patterns exist in failures.
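As a rough sketch of that kind of analysis, the helper below computes pass rates over rows that have already been fetched. The `RunRow` interface uses the `agent_runs` column names from this section; the `passRateBy` helper itself is hypothetical, and for large tables you would more likely aggregate in SQL.

```typescript
// Row shape mirrors a subset of the agent_runs columns.
interface RunRow {
  prompt_hash: string;
  task_type: string;
  passed: boolean;
}

// Groups rows by an arbitrary key and returns the pass rate (0..1) per group.
function passRateBy(
  rows: RunRow[],
  key: (row: RunRow) => string,
): Map<string, number> {
  const totals = new Map<string, { passed: number; total: number }>();
  for (const row of rows) {
    const k = key(row);
    const t = totals.get(k) ?? { passed: 0, total: 0 };
    t.total += 1;
    if (row.passed) t.passed += 1;
    totals.set(k, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, k) => rates.set(k, t.passed / t.total));
  return rates;
}
```

Calling `passRateBy(rows, (r) => r.prompt_hash)` compares prompt variants, while `passRateBy(rows, (r) => r.task_type)` shows which task types fail most often.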
Here is the minimal table schema and a TypeScript insert helper you can call from a Next.js API route:
-- Supabase migration
create table agent_runs (
  id uuid primary key default gen_random_uuid(),
  created_at timestamptz default now(),
  task_type text not null,
  prompt_hash text not null,
  passed boolean not null,
  score numeric(4,2),
  reason text,
  raw_output jsonb
);
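// lib/agent-runs.ts (hypothetical path). Import logRun from your route handler,
// e.g. app/api/agent/route.ts: App Router route files are meant to export only
// HTTP method handlers and route config, not arbitrary helpers.
import { createClient } from '@supabase/supabase-js';

// Server-side only: if SUPABASE_SERVICE_KEY is the service-role key, it bypasses
// row-level security and must never be exposed to the browser.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Writes one critic evaluation to the agent_runs table.
export async function logRun(run: {
  taskType: string;
  promptHash: string;
  passed: boolean;
  score: number;
  reason: string;
  rawOutput: object;
}) {
  // Map camelCase fields to the snake_case column names in the schema.
  const { error } = await supabase.from('agent_runs').insert({
    task_type: run.taskType,
    prompt_hash: run.promptHash,
    passed: run.passed,
    score: run.score,
    reason: run.reason,
    raw_output: run.rawOutput,
  });
  if (error) throw new Error(`Supabase insert failed: ${error.message}`);
}
Prompt Mutation: Closing the Loop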
Once you have 20+ runs for a given task type, you can run a nightly prompt-optimizer job. Feed Claude five failed runs (reason + raw_output) along with the current system prompt, and ask it to produce a revised prompt that addresses the failure pattern. Store the new prompt with an incremented version number and A/B test it against the old one.
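The optimizer job's inputs and outputs might look like the sketch below. Everything here is illustrative: the `FailedRun` and `PromptVersion` types, the function names, and the wording of the instruction sent to Claude are assumptions, and the actual API call is left out.

```typescript
interface FailedRun {
  reason: string;
  raw_output: unknown;
}

interface PromptVersion {
  version: number;
  text: string;
}

// Gate from the text: wait for enough runs before optimizing a task type.
function readyForOptimization(runCount: number, minRuns = 20): boolean {
  return runCount >= minRuns;
}

// Builds the message for the optimizer call from the current prompt and
// up to `limit` failed runs. The instruction wording is only an example.
function buildOptimizerMessage(
  current: PromptVersion,
  failures: FailedRun[],
  limit = 5,
): string {
  const examples = failures.slice(0, limit).map(
    (f, i) =>
      `Failure ${i + 1}\nReason: ${f.reason}\nOutput: ${JSON.stringify(f.raw_output)}`,
  );
  return [
    `Current system prompt (v${current.version}):`,
    current.text,
    "",
    "Failed runs:",
    ...examples,
    "",
    "Rewrite the system prompt so it addresses the failure pattern above.",
    "Return only the revised prompt.",
  ].join("\n");
}

// Wraps the optimizer's reply as the next version; the caller keeps the
// old version around so the two can be A/B tested.
function nextVersion(current: PromptVersion, revisedText: string): PromptVersion {
  return { version: current.version + 1, text: revisedText.trim() };
}
```

Keeping message construction separate from the model call makes it easy to inspect exactly what the optimizer saw when a rewrite turns out badly.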
Use the prompt_hash column to track which version produced which result. This gives you a reproducible improvement cycle: collect, analyze, mutate, measure.
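One way to produce a `prompt_hash` value is to hash the prompt text itself. The article does not say how the hash is computed, so the sketch below is an assumption: it uses SHA-256 from Node's built-in `crypto` module, and the `hashPrompt` name and the whitespace normalization are illustrative choices.

```typescript
import { createHash } from "node:crypto";

// Returns a stable SHA-256 hex digest of the prompt text.
// Assumption: line endings are normalized and outer whitespace is trimmed so
// that cosmetic differences at the edges do not register as a new version.
function hashPrompt(promptText: string): string {
  const normalized = promptText.replace(/\r\n/g, "\n").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}
```

Because the hash is derived from content, two runs that used the same prompt text always share a `prompt_hash`, even if the version number was recorded incorrectly.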
Avoiding Common Failure Modes
The two biggest pitfalls are reward hacking and runaway mutation. If your critic rubric is loose, the rewrite agent will find prompt phrasings that score well without actually improving output quality. Write rubric criteria against observable, concrete properties of the output — not vibes.
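To make "observable, concrete properties" tangible, here is a sketch of a rubric expressed as deterministic checks. Note that this swaps in programmatic checks rather than the Claude-based Critic described earlier; such checks could run alongside the Critic or be listed in its rubric. The `Criterion` type, the `applyRubric` function, and the specific criteria for a summarization task are all hypothetical.

```typescript
// Each criterion is a named, deterministic check on the output text.
interface Criterion {
  name: string;
  check: (output: string) => boolean;
}

// Hypothetical rubric for a summarization task; the criteria are examples only.
const summaryRubric: Criterion[] = [
  { name: "under 120 words", check: (o) => o.trim().split(/\s+/).length <= 120 },
  { name: "no placeholder text", check: (o) => !/TODO|lorem ipsum/i.test(o) },
  { name: "ends with a full sentence", check: (o) => /[.!?]$/.test(o.trim()) },
];

// Applies every criterion and reports which ones failed.
function applyRubric(output: string, rubric: Criterion[]) {
  if (rubric.length === 0) throw new Error("Rubric must have at least one criterion");
  const failed = rubric.filter((c) => !c.check(output)).map((c) => c.name);
  return {
    passed: failed.length === 0,
    score: (rubric.length - failed.length) / rubric.length,
    failed,
  };
}
```

Each criterion is something a reviewer could verify by looking at the output, which leaves a rewrite agent less room to score well on phrasing alone.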
Cap mutation depth. Store a parent_prompt_id foreign key and refuse to apply a rewrite if the chain depth exceeds a threshold (5 is a safe default). This prevents the system from drifting so far from the original intent that outputs become unrecognizable.
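The depth cap can be enforced by walking the `parent_prompt_id` links before applying a rewrite. In this sketch the `PromptRecord` type, the in-memory `Map` lookup, and both function names are assumptions; in practice the records would come from a prompts table.

```typescript
interface PromptRecord {
  id: string;
  parent_prompt_id: string | null; // null marks the original, hand-written prompt
}

// Counts ancestors by following parent_prompt_id links back to the root.
function chainDepth(id: string, prompts: Map<string, PromptRecord>): number {
  let depth = 0;
  let current = prompts.get(id);
  const seen = new Set<string>();
  while (current && current.parent_prompt_id !== null) {
    // Guard against corrupted lineage data that loops back on itself.
    if (seen.has(current.id)) throw new Error("Cycle in prompt lineage");
    seen.add(current.id);
    depth += 1;
    current = prompts.get(current.parent_prompt_id);
  }
  return depth;
}

// A rewrite of `parentId` would sit one level deeper than its parent,
// so refuse it if that new depth would exceed the cap.
function canApplyRewrite(
  parentId: string,
  prompts: Map<string, PromptRecord>,
  maxDepth = 5,
): boolean {
  return chainDepth(parentId, prompts) + 1 <= maxDepth;
}
```

When `canApplyRewrite` returns false, one option is to branch the next rewrite from the original prompt instead, which resets the lineage while still using the accumulated failure data.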
Open-Source Implementation
A working reference implementation of the Self-Improving AI Orchestrator Pattern is available in the Pantheon repo at github.com/lewisallena17/pantheon. It ships with the Supabase schema, a Next.js orchestrator API route, pre-built Critic and Worker prompt templates for Claude, and a prompt-version management utility.
Fork it, point it at your own Supabase project, add your Anthropic API key, and you have a running self-improving agent pipeline in under 30 minutes. The repo is MIT-licensed and accepts PRs for new task-type templates.