Montreal.AI / SkillOS / RSI meta-proof

Proof Forge Meta-Coordination.

Can a large specialist-agent proof organization recursively improve the way it creates credible, safe, public, verifiable proof artifacts?

Proof passed

4,194,304 agents. 131,072 roles. 256 verifier courts.

This benchmark does not claim to have achieved superintelligence, live revenue, or a Kardashev Type II civilization. It makes the coordination mechanism underneath that value thesis public, repeatable, and falsifiable.

ef32cdf645705a4a0eab40c326cc2104…

99.097% locked-holdout value capture
99.946% proof credibility
93.218% coordination quality
$6.74T benchmark value captured

Mechanism under test

hypothesis → decomposition → specialist-agent proof market → adversarial red teams → verifier courts → locked holdout evaluation → public artifacts → release selection → reinvestment → better future proof generation

The point is not a louder claim. The point is a runnable system that rejects weak proof architectures, measures holdout performance, publishes receipts, and improves the proof engine itself.
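The release-selection step in that chain can be sketched in a few lines. This is a minimal illustration only: the `Candidate` fields, the `court_threshold` value, and every score below are hypothetical and do not come from the benchmark. It shows the idea of rejecting weak proof architectures at the verifier gate first, then promoting a candidate only if it beats the current release on locked-holdout capture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    court_score: float      # hypothetical verifier-court score in [0, 1]
    holdout_capture: float  # hypothetical locked-holdout capture in [0, 1]


def select_release(
    candidates: List[Candidate],
    current: Optional[Candidate],
    court_threshold: float = 0.9,  # assumed gate, not a published value
) -> Optional[Candidate]:
    """Return the release to carry forward after one selection round."""
    # Gate: architectures that fail the verifier courts never reach comparison.
    admitted = [c for c in candidates if c.court_score >= court_threshold]
    if not admitted:
        return current
    best = max(admitted, key=lambda c: c.holdout_capture)
    # Promote only on a strict holdout improvement over the current release.
    if current is None or best.holdout_capture > current.holdout_capture:
        return best
    return current


if __name__ == "__main__":
    round_one = [
        Candidate("A", court_score=0.95, holdout_capture=0.80),
        Candidate("B", court_score=0.50, holdout_capture=0.99),  # rejected at the gate
        Candidate("C", court_score=0.92, holdout_capture=0.85),
    ]
    release = select_release(round_one, current=None)
    print(release.name)
```

Candidate B has the highest holdout number but is rejected at the gate, which is the behavior the paragraph above describes: a strong-looking result does not count unless it also survives verification.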

RSI release curve

Chart: locked-holdout value capture across recursively improved proof-forge releases (x-axis ticks from v0 to v20).

Capability radar

Radar axes: credibility, evidence, coordination, RSI, UX (executive comprehension), and risk control.

Baseline comparison

Baseline                            Capture   Captured value   SkillOS delta   Bootstrap p05
Single generalist proof writer      33.83%    $2.30T           $4.44T          64.96%
Uncoordinated proof swarm           83.39%    $5.67T           $1.07T          15.52%
Static benchmark harness            47.27%    $3.22T           $3.53T          51.32%
No-RSI proof factory                81.07%    $5.52T           $1.23T          17.76%
Vanity-metric generator             23.82%    $1.62T           $5.12T          74.93%
Random proof-architecture control   44.05%    $3.00T           $3.75T          54.06%
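The SkillOS delta column can be cross-checked against the headline figure. The sketch below assumes, since the page does not define it, that the delta is the headline $6.74T captured value minus each baseline's captured value. The function names are illustrative.

```python
# Headline value captured by SkillOS, in trillions of dollars (from the page).
SKILLOS_CAPTURED_T = 6.74

# (captured value, published SkillOS delta), both in $T, copied from the table.
BASELINES = {
    "Single generalist proof writer": (2.30, 4.44),
    "Uncoordinated proof swarm": (5.67, 1.07),
    "Static benchmark harness": (3.22, 3.53),
    "No-RSI proof factory": (5.52, 1.23),
    "Vanity-metric generator": (1.62, 5.12),
    "Random proof-architecture control": (3.00, 3.75),
}


def recomputed_delta(baseline_captured_t: float) -> float:
    """Assumed definition: SkillOS captured value minus baseline captured value."""
    return round(SKILLOS_CAPTURED_T - baseline_captured_t, 2)


def max_gap() -> float:
    """Largest absolute gap between recomputed and published deltas, in $T."""
    return max(abs(recomputed_delta(c) - d) for c, d in BASELINES.values())


if __name__ == "__main__":
    for name, (captured, published) in BASELINES.items():
        print(f"{name}: recomputed {recomputed_delta(captured):.2f}, "
              f"published {published:.2f}")
```

Under that assumption every recomputed delta lands within $0.01T of the published one. The small gaps are what one would expect if the published deltas were computed from unrounded values.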

Negative controls and ablations

Ablation                Capture   Credibility   SkillOS delta
no verifier courts      64.96%    74.88%        $2.32T
no red team             79.77%    90.41%        $1.32T
no rsi reinvestment     83.55%    99.46%        $1.06T
no multi agent market   83.29%    93.79%        $1.08T
no holdout              83.81%    84.88%        $1.04T
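One way to read the ablation table is to rank components by how much capture is lost when each is removed. The sketch below assumes the full-system reference is the headline 99.097% locked-holdout capture. The page does not state which reference the ablations were measured against, so treat the ranking as illustrative.

```python
# Headline locked-holdout capture for the full system, in percent (from the page).
FULL_SYSTEM_CAPTURE = 99.097

# Capture with one component removed, in percent, copied from the table.
ABLATION_CAPTURE = {
    "no verifier courts": 64.96,
    "no red team": 79.77,
    "no rsi reinvestment": 83.55,
    "no multi agent market": 83.29,
    "no holdout": 83.81,
}


def ranked_capture_drops():
    """Return (ablation, drop in percentage points), largest drop first."""
    drops = [
        (name, round(FULL_SYSTEM_CAPTURE - capture, 3))
        for name, capture in ABLATION_CAPTURE.items()
    ]
    return sorted(drops, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, drop in ranked_capture_drops():
        print(f"{name}: -{drop:.3f} pp")
```

On these numbers, removing the verifier courts costs the most capture, followed by the red team. The remaining three ablations sit within about half a percentage point of each other.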

Verifier gates

Run / regenerate

Open the GitHub Action and run the workflow. It regenerates the benchmark receipt, Markdown report, badge, proof webpage, proof registry, sitemap, and SkillOS command center without human review.
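After regenerating, a reader may want to confirm that the new receipt matches the digest shown on the page. The sketch below is a guess at how that check could look. It assumes the displayed value is a truncated SHA-256 hex digest of the receipt bytes, which the page does not confirm, and the helper names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published_prefix(data: bytes, published_prefix: str) -> bool:
    """Check a truncated digest (assumed SHA-256) against the receipt bytes.

    A trailing ellipsis, as shown on the page, is ignored. An empty prefix
    never matches, so a missing digest cannot pass by accident.
    """
    prefix = published_prefix.strip().lower().rstrip("…")
    return len(prefix) > 0 and sha256_hex(data).startswith(prefix)


if __name__ == "__main__":
    # Stand-in bytes; a real check would read the regenerated receipt file.
    receipt_bytes = b""
    print(matches_published_prefix(receipt_bytes, "e3b0c442…"))
```

A prefix match is weaker evidence than a full-digest match, so the full digest from the proof registry is the better comparison when it is available.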