Readonly name: Benchmark name (e.g., 'swe-bench').
Readonly variant: Variant, if applicable (e.g., 'lite', 'verified').
Readonly total: Total instances attempted.
Readonly passed: Instances whose evaluation reported pass.
Readonly pass: passed / total, in [0, 1].
Readonly run: Wall-clock runtime in milliseconds.
Readonly metadata: Benchmark-specific extras (dataset hash, model IDs, etc.).
High-level summary of a benchmark run, CLI-printable and JSON-serializable. Benchmarks that need extra dimensions attach them via metadata.
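
As a rough illustration of how such a summary might be built and serialized, here is a minimal TypeScript sketch. The interface name `BenchmarkSummary`, the helper `summarize`, the field names `passRate` and `runtimeMs`, the field types, and the optionality of `variant` and `metadata` are all assumptions made for this example; the listing above does not show the actual declarations, and the sample values are made up.

```typescript
// Hypothetical shape inferred from the field descriptions; names and types
// are assumptions, not the library's actual declaration.
interface BenchmarkSummary {
  readonly name: string;                        // e.g. 'swe-bench'
  readonly variant?: string;                    // e.g. 'lite', 'verified'
  readonly total: number;                       // instances attempted
  readonly passed: number;                      // instances reported as pass
  readonly passRate: number;                    // passed / total, in [0, 1]
  readonly runtimeMs: number;                   // wall-clock runtime in ms
  readonly metadata?: Record<string, unknown>;  // benchmark-specific extras
}

// Hypothetical helper: derives the pass rate and guards against total = 0.
function summarize(
  name: string,
  total: number,
  passed: number,
  runtimeMs: number,
  variant?: string,
  metadata?: Record<string, unknown>,
): BenchmarkSummary {
  const passRate = total > 0 ? passed / total : 0;
  return { name, variant, total, passed, passRate, runtimeMs, metadata };
}

// Made-up values, purely for illustration.
const summary = summarize("swe-bench", 300, 75, 120000, "lite", {
  datasetHash: "example-hash",
});

// The summary is plain data, so it survives a JSON round trip.
const roundTripped = JSON.parse(JSON.stringify(summary)) as BenchmarkSummary;

// One possible single-line rendering for CLI output.
const percent = (summary.passRate * 100).toFixed(1);
console.log(
  `${summary.name} (${summary.variant ?? "default"}): ` +
    `${summary.passed}/${summary.total} passed (${percent}%) in ${summary.runtimeMs} ms`,
);
```

Keeping the summary as plain, read-only data with no methods is what makes it trivially JSON-serializable; anything benchmark-specific goes into `metadata` rather than into new top-level fields.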