Atmosphere — Eval Dashboard

Baselines

Baseline | Total runs | Pass rate | Last verdict | Last run at
Loading…

Recent runs

Timestamp | Baseline | Verdict | Judge model | Agent version | Details
Loading…

How to record runs

Wire your CI pipeline to POST a JSON body to /api/admin/evals/runs after each LLM-as-judge run:

POST /api/admin/evals/runs
{
  "id": "ci-2026-05-15-1734",
  "baseline": "intent-support",
  "timestamp": "2026-05-15T17:34:00Z",
  "agentVersion": "atmosphere-4.0.46",
  "prompt": "...the judge prompt that was sent...",
  "judgeResponse": "{\"verdict\": true}",
  "verdict": true,
  "scores": {"relevance": 0.9, "groundedness": 0.95},
  "judgeModel": "gpt-4o-mini",
  "passed": true,
  "notes": "promoted to main"
}

The endpoint requires atmosphere.admin.http-write-enabled=true plus an authenticated principal whose ControlAuthorizer grants evals.write. Every accepted submission is recorded in the control audit log so an operator can reconstruct who submitted what.
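As a rough illustration of the CI step described above, the sketch below assembles a run record with the same fields as the example body and wraps it in a POST request using only the Python standard library. Several details are assumptions rather than documented behavior: the helper names (build_eval_run, build_request, submit) are hypothetical, the base URL is a placeholder, the bearer-token Authorization header is just one possible way to present an authenticated principal, and mirroring "passed" from "verdict" when it is not supplied is a guess based on the example body.

```python
import json
import urllib.request
from datetime import datetime, timezone

RUNS_PATH = "/api/admin/evals/runs"  # endpoint path from the section above


def build_eval_run(run_id, baseline, agent_version, prompt, judge_response,
                   scores, judge_model, passed=None, notes="", now=None):
    """Assemble a run record using the field names from the example body."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumption: the judge replies with JSON containing a boolean "verdict".
    verdict = bool(json.loads(judge_response).get("verdict", False))
    return {
        "id": run_id,
        "baseline": baseline,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agentVersion": agent_version,
        "prompt": prompt,
        "judgeResponse": judge_response,
        "verdict": verdict,
        "scores": scores,
        "judgeModel": judge_model,
        # Assumption: "passed" mirrors "verdict" unless the caller overrides it.
        "passed": verdict if passed is None else passed,
        "notes": notes,
    }


def build_request(base_url, payload, token):
    """Wrap the payload in a POST request (bearer auth is an assumption)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + RUNS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def submit(base_url, payload, token, timeout=10):
    """Send the run; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_request(base_url, payload, token),
                                timeout=timeout) as resp:
        return resp.status
```

A CI job would call build_eval_run with the judge's raw output and then submit(...) against its own deployment. A rejected submission (for example, when atmosphere.admin.http-write-enabled is not set or the principal lacks evals.write) should surface as an HTTPError that fails the job, though the exact status codes are not specified here.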