{% extends "base.html" %}
{% block title %}Run {{ run.run_id[:12] }} - MoralStack{% endblock %}
{% block content %}
Type {% if run.run_type == 'benchmark' %} BENCHMARK {% elif run.run_type == 'single' %} SINGLE {% else %} {{ run.run_type or '—' }} {% endif %}
Status {% if run.status == 'completed' %} {{ run.status }} {% elif run.status == 'running' %} {{ run.status }} {% elif run.status == 'failed' %} {{ run.status }} {% else %} {{ run.status or '—' }} {% endif %}
Started {{ run.started_at | fmtdate }}
Ended {{ run.ended_at | fmtdate }}
{% if run.run_type == 'benchmark' and benchmark_report and benchmark_report.models_config %}

Models used

Component Model
Baseline {{ benchmark_report.models_config.baseline }}
Judge {{ benchmark_report.models_config.judge }}
{% if benchmark_report.models_config.moralstack %}
MoralStack policy {{ benchmark_report.models_config.moralstack.policy }}
MoralStack policy (rewrite) {{ benchmark_report.models_config.moralstack.policy_rewrite | default(benchmark_report.models_config.moralstack.policy, true) }}
{% if benchmark_report.models_config.moralstack.risk_parallel %}
MoralStack risk parallel mini-estimators
MoralStack risk · intent {{ benchmark_report.models_config.moralstack.risk_intent }}
MoralStack risk · signals {{ benchmark_report.models_config.moralstack.risk_signals }}
MoralStack risk · operational {{ benchmark_report.models_config.moralstack.risk_operational }}
{% else %}
MoralStack risk {{ benchmark_report.models_config.moralstack.risk }}
{% endif %}
MoralStack critic {{ benchmark_report.models_config.moralstack.critic }}
MoralStack simulator {{ benchmark_report.models_config.moralstack.simulator }}
MoralStack hindsight {{ benchmark_report.models_config.moralstack.hindsight }}
MoralStack perspectives {{ benchmark_report.models_config.moralstack.perspectives }}
{% endif %}
{% endif %}
Export Benchmark
{% if run.run_type == 'benchmark' and benchmark_report and benchmark_summary_md %}

Benchmark Summary

Executive Summary

Metric Baseline MoralStack Difference
Wins {{ benchmark_report.get('baseline_wins', 0) }} {{ benchmark_report.get('moralstack_wins', 0) }} {{ '%+d' | format(benchmark_report.get('moralstack_wins', 0) - benchmark_report.get('baseline_wins', 0)) }}
Ties {{ benchmark_report.get('ties', 0) }}
Average Score {{ "%.2f" | format(benchmark_report.get('baseline_avg_score', 0)) }}/10 {{ "%.2f" | format(benchmark_report.get('moralstack_avg_score', 0)) }}/10 {{ "%+.2f" | format((benchmark_report.get('moralstack_avg_score', 0) - benchmark_report.get('baseline_avg_score', 0))) }}
Average Time {{ "%.0f" | format(benchmark_report.get('avg_baseline_time_ms', 0)) }}ms {{ "%.0f" | format(benchmark_report.get('avg_moralstack_time_ms', 0)) }}ms {{ "%+.0f" | format((benchmark_report.get('avg_moralstack_time_ms', 0) - benchmark_report.get('avg_baseline_time_ms', 0))) }}ms

FP/FN & Leakage

System FP (refusal when not expected) FN (no refusal when REFUSE expected) Leakage count
Baseline {{ benchmark_report.get('baseline_fp', 0) }} {{ benchmark_report.get('baseline_fn', 0) }} {{ benchmark_report.get('baseline_leakage_count', 0) }}
MoralStack {{ benchmark_report.get('moralstack_fp', 0) }} {{ benchmark_report.get('moralstack_fn', 0) }} {{ benchmark_report.get('moralstack_leakage_count', 0) }}

Over-Governance Rate: {{ "%.2f" | format((benchmark_report.get('over_governance_rate', 0) * 100)) }}%

Full report as exported in Markdown format. Use Export Benchmark to download it.

{{ benchmark_summary_md }}

Questions (by category, ordered by ID)

{% if questions_by_category %} {% for category, questions in questions_by_category.items() %}

{{ category }}

Q# Request ID Prompt Domain Decision Actions
{% for q in questions %}
{{ q.question_id }} {% if q.moralstack_request_id %} {{ q.moralstack_request_id[:12] }} {% else %}—{% endif %} {{ (q.question_text or '')[:80] }}{% if (q.question_text or '') | length > 80 %}…{% endif %} {% if q.error %} — {{ q.error[:80] }}{% if q.error | length > 80 %}…{% endif %} {% endif %} {{ (q.moralstack_overlay or q.domain_overlay) or '—' }}
Expected {{ q.expected_action or '—' }}
Final {{ q.moralstack_final_action or '—' }}
{% if q.moralstack_request_id %} View Export {% else %} {% endif %}
{% endfor %}
{% endfor %} {% else %}

No questions in benchmark report.

{% endif %} {% else %} {% if conversations and conversations | length > 0 %}

Conversations ({{ conversations | length }})

Multi-turn governance conversations recorded in this run (Step 13 observability).

Conversation ID Turns Max risk Last posture Cached turns Last activity Actions
{% for c in conversations %} {% set c_max_risk = c.get('max_risk_score') %}
{{ c.conversation_id[:24] }}{% if c.conversation_id | length > 24 %}…{% endif %} {{ c.turn_count }} {% if c_max_risk is not none %}{{ "%.3f" | format(c_max_risk) }}{% else %}—{% endif %} {% if c.get('last_posture') %}{{ c.get('last_posture') }}{% else %}—{% endif %} {{ c.get('cached_turn_count') or 0 }} {{ c.get('last_created_at') | fmtdate }} View Export
{% endfor %}
{% endif %}

Requests ({{ requests | length }})

{% if requests %}
Request ID Prompt Domain Created Actions
{% for req in requests %}
{{ req.request_id[:12] }} {{ (req.prompt or '')[:80] }}{% if (req.prompt or '') | length > 80 %}…{% endif %} {{ req.domain or '—' }} {{ req.created_at | fmtdate }} View Export
{% endfor %}
{% else %}

No requests in this run.

{% endif %} {% endif %} {% endblock %}