Analyze and visualize your benchmark results
| Task ID ⇅ | Model ⇅ | Attack | Run | Status/Score/Time | Breakdown | Actions | |
|---|---|---|---|---|---|---|---|
| {{ first_run.model }} | {{ attack }} | {{ runs|length }} runs | {% set avg_score = runs|sum(attribute='score') / runs|length %} {% set avg_time = runs|sum(attribute='execution_time') / runs|length %} {% set status_counts = {} %} {% for r in runs %} {% set s = r.status %} {% if s not in status_counts %} {% set _ = status_counts.update({s: 0}) %} {% endif %} {% set _ = status_counts.update({s: status_counts[s] + 1}) %} {% endfor %} {% set sorted_status = status_counts.items()|sort(attribute=1, reverse=True)|list %} {% set dominant_status = sorted_status[0][0] if sorted_status else 'unknown' %} {{ dominant_status }} {% set score_pct = (avg_score * 100)|round|int %} {{ "%.1f"|format(avg_score * 100) }}% {{ "%.1f"|format(avg_time) }}s | {% set breakdown_keys = [] %} {% for r in runs %} {% if r.grading and r.grading.breakdown %} {% for k in r.grading.breakdown.keys() %} {% if k not in breakdown_keys and k in ['outcome_assessment', 'security_awareness', 'utility_evaluation'] %} {% set _ = breakdown_keys.append(k) %} {% endif %} {% endfor %} {% endif %} {% endfor %} {% if breakdown_keys %} {% for k in breakdown_keys %} {% set total = namespace(value=0) %} {% set count = namespace(value=0) %} {% for r in runs %} {% if r.grading and r.grading.breakdown and k in r.grading.breakdown %} {% set total.value = total.value + r.grading.breakdown[k] %} {% set count.value = count.value + 1 %} {% endif %} {% endfor %} {% set avg = (total.value / count.value * 100)|round|int if count.value > 0 else 0 %} {% set short_k = k[0] if k else '?' %} {{ short_k }}:{{ avg }}% {% endfor %} {% else %}-{% endif %} | |||
| {{ first_run.model }} | {{ single_attack }} | - | {{ first_run.status }} {% set score_pct = (first_run.score * 100)|round|int %} {{ "%.1f"|format(first_run.score * 100) }}% {{ "%.1f"|format(first_run.execution_time) }}s | {% if first_run.grading and first_run.grading.breakdown %} {% for k, v in first_run.grading.breakdown.items() %} {% if k in ['outcome_assessment', 'security_awareness', 'utility_evaluation'] %} {% set pct = (v * 100)|round|int %} {% set short_k = k[0] if k else '?' %} {{ short_k }}:{{ pct }}% {% endif %} {% endfor %} {% else %}-{% endif %} |
Enter a results directory path and click "Load Results" to begin analysis.