Proof Forge Meta-Coordination
Can a large specialist-agent proof organization recursively improve the way it creates credible, safe, public, verifiable proof artifacts?
4,194,304 agents. 131,072 roles. 256 verifier courts.
This benchmark does not claim achieved superintelligence, live revenue, or a Kardashev Type II civilization. Instead, it makes the coordination mechanism underneath that value thesis public, repeatable, and falsifiable.
ef32cdf645705a4a0eab40c326cc2104…
hypothesis → decomposition → specialist-agent proof market → adversarial red teams → verifier courts → locked holdout evaluation → public artifacts → release selection → reinvestment → better future proof generation
The point is not a louder claim. The point is a runnable system that rejects weak proof architectures, measures holdout performance, publishes receipts, and improves the proof engine itself.
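The pipeline reads as a loop, with reinvestment in one release feeding the next round of proof generation. Below is a minimal Python sketch of that stage order. It assumes that the final stage hands off to a new hypothesis; the source implies this wrap-around but does not spell it out. `Stage`, `next_stage`, and `run_cycles` are illustrative names and are not part of the benchmark.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages, in the order given by the pipeline line above."""
    HYPOTHESIS = "hypothesis"
    DECOMPOSITION = "decomposition"
    PROOF_MARKET = "specialist-agent proof market"
    RED_TEAMS = "adversarial red teams"
    VERIFIER_COURTS = "verifier courts"
    HOLDOUT_EVAL = "locked holdout evaluation"
    PUBLIC_ARTIFACTS = "public artifacts"
    RELEASE_SELECTION = "release selection"
    REINVESTMENT = "reinvestment"
    FUTURE_GENERATION = "better future proof generation"


# Enum iteration preserves definition order.
PIPELINE = list(Stage)


def next_stage(stage: Stage) -> Stage:
    """Advance one step; the last stage wraps around to a new hypothesis (assumed)."""
    i = PIPELINE.index(stage)
    return PIPELINE[(i + 1) % len(PIPELINE)]


def run_cycles(n_cycles: int) -> list[Stage]:
    """Return the stage trace for n_cycles full passes, starting at HYPOTHESIS."""
    trace = []
    stage = Stage.HYPOTHESIS
    for _ in range(n_cycles * len(PIPELINE)):
        trace.append(stage)
        stage = next_stage(stage)
    return trace
```

In this sketch, each pass through the loop corresponds to one release. That is the unit the RSI release curve plots.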
RSI release curve (chart): locked-holdout value capture across recursively improved proof-forge releases.
Capability radar (chart): credibility, evidence, coordination, RSI, executive comprehension, and risk control.
Baseline comparison
| Baseline | Holdout value capture | Captured value | SkillOS delta (selected minus baseline) | Bootstrap p05 (capture delta) |
|---|---|---|---|---|
| Single generalist proof writer | 33.83% | $2.30T | $4.44T | 64.96% |
| Uncoordinated proof swarm | 83.39% | $5.67T | $1.07T | 15.52% |
| Static benchmark harness | 47.27% | $3.22T | $3.53T | 51.32% |
| No-RSI proof factory | 81.07% | $5.52T | $1.23T | 17.76% |
| Vanity-metric generator | 23.82% | $1.62T | $5.12T | 74.93% |
| Random proof-architecture control | 44.05% | $3.00T | $3.75T | 54.06% |
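The "Bootstrap p05" column is read here as the 5th percentile of a bootstrapped value-capture delta between the selected release and each baseline. The smallest such value (15.52%, for the uncoordinated proof swarm) is the figure the `bootstrap_advantage` gate checks. A standard way to compute this kind of statistic is a paired percentile bootstrap over holdout items, sketched below. The per-item capture lists, the function name, the resample count, and the seed are all hypothetical, because the source does not describe the benchmark's resampling scheme.

```python
import random
import statistics


def bootstrap_p05_delta(selected, baseline, n_resamples=2000, seed=0):
    """5th percentile of the bootstrapped mean capture delta (selected - baseline).

    `selected` and `baseline` are hypothetical per-item capture fractions
    (0.0 to 1.0) scored on the same locked-holdout items. Items are resampled
    in pairs so both systems are always compared on identical items.
    """
    if len(selected) != len(baseline) or not selected:
        raise ValueError("need equal-length, non-empty paired samples")
    rng = random.Random(seed)
    diffs = [s - b for s, b in zip(selected, baseline)]
    n = len(diffs)
    # Mean paired delta for each resample of n items drawn with replacement.
    boot_means = [statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)]
    # quantiles(..., n=20) returns 19 cut points; the first is the 5th percentile.
    return statistics.quantiles(boot_means, n=20)[0]
```

Multiply the result by 100 to compare it with the percentages in the table. Resampling item pairs, rather than resampling each system independently, keeps both systems on the same holdout items. This tightens the interval when the two systems find the same items hard.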
Negative controls and ablations
| Ablation | Holdout value capture | Proof credibility | SkillOS delta (selected minus ablation) |
|---|---|---|---|
| No verifier courts | 64.96% | 74.88% | $2.32T |
| No red team | 79.77% | 90.41% | $1.32T |
| No RSI reinvestment | 83.55% | 99.46% | $1.06T |
| No multi-agent market | 83.29% | 93.79% | $1.08T |
| No holdout | 83.81% | 84.88% | $1.04T |
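The `negative_controls_fail` gate requires every ablation to trail the selected release by more than 5 percentage points of holdout value capture. The check below reproduces that gate from the capture column above and the selected release's 99.097% capture. The constant and function names are illustrative.

```python
# Selected-release capture, from the locked_holdout_value_capture gate (percent).
SELECTED_CAPTURE = 99.097

# Holdout value capture column of the ablation table (percent).
ABLATION_CAPTURE = {
    "no verifier courts": 64.96,
    "no red team": 79.77,
    "no RSI reinvestment": 83.55,
    "no multi-agent market": 83.29,
    "no holdout": 83.81,
}

# negative_controls_fail requires a gap of more than 5 percentage points.
MARGIN_PP = 5.0


def ablation_gaps(selected, ablations):
    """Percentage-point gap between the selected release and each ablation."""
    return {name: round(selected - cap, 3) for name, cap in ablations.items()}


def negative_controls_fail(selected, ablations, margin=MARGIN_PP):
    """True when every ablation trails the selected release by more than `margin`."""
    return all(gap > margin for gap in ablation_gaps(selected, ablations).values())
```

The tightest margin belongs to the no-holdout ablation, which sits about 15.29 points below the selected release. That is roughly three times the required 5 points.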
Verifier gates
- PASS `locked_holdout_value_capture`: 99.097% >= 90.000%
- PASS `proof_credibility`: 99.946% >= 93.000%
- PASS `evidence_quality`: 97.059% >= 90.000%
- PASS `large_agent_coordination`: 93.218% >= 90.000%
- PASS `recursive_improvement`: 95.068%; selected release v20
- PASS `frontier_correct`: 100.000% >= 98.000%
- PASS `risk_breach`: 0.000% <= 0.250%
- PASS `unauthorized_action`: 0.000% == 0.000%
- PASS `beats_best_baseline`: $6.74T > $5.67T
- PASS `bootstrap_advantage`: minimum 5th-percentile bootstrap value-capture delta across baselines, 15.519% > 3.000%
- PASS `negative_controls_fail`: all ablations trail the selected release by more than 5 percentage points
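The numeric gates reduce to threshold comparisons. The minimal sketch below re-evaluates them from the figures listed above. Two gates are left out: `recursive_improvement`, because no threshold is given, and `negative_controls_fail`, because it is a check across the ablation table. The `beats_best_baseline` bar is taken to be the largest captured value in the baseline table. The data-structure and function names are hypothetical.

```python
import operator

# Comparison operators that appear in the gate list.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt}

# Captured-value column of the baseline table, in $T.
BASELINE_VALUE_T = {
    "Single generalist proof writer": 2.30,
    "Uncoordinated proof swarm": 5.67,
    "Static benchmark harness": 3.22,
    "No-RSI proof factory": 5.52,
    "Vanity-metric generator": 1.62,
    "Random proof-architecture control": 3.00,
}

# (gate, observed, operator, threshold): percentages in percent, values in $T.
GATES = [
    ("locked_holdout_value_capture", 99.097, ">=", 90.0),
    ("proof_credibility", 99.946, ">=", 93.0),
    ("evidence_quality", 97.059, ">=", 90.0),
    ("large_agent_coordination", 93.218, ">=", 90.0),
    ("frontier_correct", 100.0, ">=", 98.0),
    ("risk_breach", 0.0, "<=", 0.25),
    ("unauthorized_action", 0.0, "==", 0.0),
    ("beats_best_baseline", 6.74, ">", max(BASELINE_VALUE_T.values())),
    ("bootstrap_advantage", 15.519, ">", 3.0),
]


def evaluate_gates(gates):
    """Map each gate name to "PASS" or "FAIL" by applying its comparison."""
    return {
        name: "PASS" if OPS[op](observed, threshold) else "FAIL"
        for name, observed, op, threshold in gates
    }


def release_passes(gates):
    """Accept a release only if every listed gate passes."""
    return all(status == "PASS" for status in evaluate_gates(gates).values())
```

Keeping the thresholds as data rather than code means a release that regresses on a single gate, for example `risk_breach` rising above 0.25%, shows up as one `FAIL` entry instead of a silent change in the overall verdict.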
Run / regenerate
Open the repository's GitHub Actions workflow and run it. The run regenerates the benchmark receipt, Markdown report, badge, proof webpage, proof registry, sitemap, and SkillOS command center without human review.
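If the workflow declares a `workflow_dispatch` trigger, it can also be started through GitHub's REST endpoint for creating a workflow dispatch event instead of the Actions tab. This is an assumption: the source only says to run the workflow. The sketch below builds that request with the standard library. The owner, repository, workflow file name, and ref are placeholders because the source does not name them, and the token must be permitted to run Actions on the repository.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build a POST to GitHub's workflow-dispatch REST endpoint.

    owner, repo, workflow_file (e.g. "proof-forge.yml") and ref are
    hypothetical placeholders. The target workflow must declare an
    `on: workflow_dispatch` trigger for GitHub to accept the request.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


def dispatch(req):
    """Send the request; by default GitHub answers 204 No Content on success."""
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Building the request separately from sending it makes it possible to inspect the URL and payload before any network call.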