nexus-agents - v2.80.0

    Interface BenchmarkAdapter<TInstance, TPrediction, TEvalResult>

    Contract every benchmark implementation fulfills.

    Type parameters:

    • TInstance: one task / problem in the benchmark (e.g., a SWE-bench issue)
    • TPrediction: the solver's output (e.g., a proposed patch)
    • TEvalResult: the evaluator's verdict (e.g., patch applied + tests passed)

    A correct implementation composes as: loadInstances -> runInstance(each) -> evaluate(each) -> summarize

    class SweBenchAdapter implements BenchmarkAdapter<SweIssue, SwePatch, SweEval> {
        readonly name = 'swe-bench';
        readonly variant = 'lite';
        async loadInstances(config) { ... }
        async runInstance(inst, ctx) { ... }
        async evaluate(inst, pred) { ... }
        isPass(result) { ... }
        summarize(results, runTimeMs) { ... }
    }
    interface BenchmarkAdapter<TInstance, TPrediction, TEvalResult> {
        name: string;
        variant?: string;
        loadInstances(
            config: Record<string, unknown>,
        ): Promise<readonly TInstance[]>;
        runInstance(
            instance: TInstance,
            ctx: BenchmarkRunContext,
        ): Promise<TPrediction>;
        evaluate(
            instance: TInstance,
            prediction: TPrediction,
        ): Promise<TEvalResult>;
        isPass(result: TEvalResult): boolean;
        summarize(
            results: readonly TEvalResult[],
            runTimeMs: number,
        ): BenchmarkRunSummary;
    }
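The composition order described above (loadInstances, then runInstance and evaluate per instance, then summarize) could be driven by a loop like the following. This is a minimal sketch, not the library's runner: `runBenchmark` and `toyAdapter` are hypothetical names, `BenchmarkRunContext` and `BenchmarkRunSummary` are replaced by placeholder types because their real fields are not shown on this page, and running instances sequentially is an assumption.

```typescript
// Placeholder stand-ins (assumption): the real BenchmarkRunContext and
// BenchmarkRunSummary are defined elsewhere and their fields are not shown here.
type BenchmarkRunContext = Record<string, unknown>;
type BenchmarkRunSummary = Record<string, unknown>;

// Repeated from the declaration above so the sketch is self-contained.
interface BenchmarkAdapter<TInstance, TPrediction, TEvalResult> {
  name: string;
  variant?: string;
  loadInstances(config: Record<string, unknown>): Promise<readonly TInstance[]>;
  runInstance(instance: TInstance, ctx: BenchmarkRunContext): Promise<TPrediction>;
  evaluate(instance: TInstance, prediction: TPrediction): Promise<TEvalResult>;
  isPass(result: TEvalResult): boolean;
  summarize(results: readonly TEvalResult[], runTimeMs: number): BenchmarkRunSummary;
}

// Hypothetical driver: load, then run and evaluate each instance in order,
// then count passes with isPass and hand the results to summarize.
async function runBenchmark<I, P, E>(
  adapter: BenchmarkAdapter<I, P, E>,
  config: Record<string, unknown>,
  ctx: BenchmarkRunContext,
): Promise<{ passed: number; total: number; summary: BenchmarkRunSummary }> {
  const startedAt = Date.now();
  const instances = await adapter.loadInstances(config);
  const results: E[] = [];
  for (const instance of instances) {
    const prediction = await adapter.runInstance(instance, ctx);
    results.push(await adapter.evaluate(instance, prediction));
  }
  const passed = results.filter((r) => adapter.isPass(r)).length;
  const summary = adapter.summarize(results, Date.now() - startedAt);
  return { passed, total: results.length, summary };
}

// Toy adapter for illustration: each instance is a number, the "solver"
// doubles it, and the evaluator checks the doubling.
const toyAdapter: BenchmarkAdapter<number, number, boolean> = {
  name: 'toy-double',
  async loadInstances() { return [1, 2, 3]; },
  async runInstance(n) { return n === 3 ? 0 : n * 2; }, // deliberately wrong for 3
  async evaluate(n, prediction) { return prediction === n * 2; },
  isPass: (result) => result,
  summarize: (results, runTimeMs) => ({ total: results.length, runTimeMs }),
};
```

Calling `await runBenchmark(toyAdapter, {}, {})` resolves with `passed` equal to 2 and `total` equal to 3, since the toy solver answers the third instance incorrectly. A real runner might add concurrency, timeouts, or per-instance error handling; none of that is implied by the interface itself.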

    Type Parameters

    • TInstance
    • TPrediction
    • TEvalResult

    Properties

    name: string

    Stable identifier (e.g., 'swe-bench', 'humaneval'). Used in CLI routing and reporting.

    variant?: string

    Optional variant within a benchmark family (e.g., 'lite' vs 'verified').
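Since name and variant together identify an adapter, a lookup keyed on both is one plausible way to route a CLI request to an implementation. The sketch below is an illustration only: `AdapterRegistry`, `adapterKey`, and the `name:variant` key format are hypothetical and are not taken from nexus-agents, whose actual routing is not shown on this page.

```typescript
// Only the two identifying properties matter for this sketch.
interface AdapterIdentity {
  name: string;
  variant?: string;
}

// Hypothetical key format: "name" alone, or "name:variant" when a variant is set.
function adapterKey(name: string, variant?: string): string {
  return variant === undefined ? name : `${name}:${variant}`;
}

// Hypothetical registry: stores adapters by key and rejects duplicates.
class AdapterRegistry<T extends AdapterIdentity> {
  private readonly byKey = new Map<string, T>();

  register(adapter: T): void {
    const key = adapterKey(adapter.name, adapter.variant);
    if (this.byKey.has(key)) {
      throw new Error(`Duplicate adapter: ${key}`);
    }
    this.byKey.set(key, adapter);
  }

  // Returns undefined when no adapter matches the exact name/variant pair.
  resolve(name: string, variant?: string): T | undefined {
    return this.byKey.get(adapterKey(name, variant));
  }
}
```

In this sketch a request for 'swe-bench' without a variant does not fall back to the 'lite' adapter; whether a default variant should exist is a design decision the interface leaves open.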

    Methods