Generated by EfficientAI
{% if logo_data_uri %} {% else %}
EA
{% endif %}
EfficientAI

{{ title }}

{{ subtitle }}
Generated by: {{ generated_by }}
Date: {{ generated_at.strftime("%B %d, %Y") }}
Comparison ID: {{ simulation_id or comparison_id }}
Providers tested: {{ providers_tested | join(", ") }}
Voices tested: {{ voices_tested | join(", ") }}
{% if show_run_summary %}
Total Runs / Samples: {{ total_runs_observed }} runs observed, {{ sample_count }} samples
{% else %}
Total Samples: {{ sample_count }} samples
{% endif %} {% if report_options.include_endpoint %}
Endpoint modes: {{ endpoint_modes | join(", ") if endpoint_modes else "Unknown" }}
{% endif %} {% if report_options.include_latency or report_options.include_ttfb %}
Latency formula: Total Latency = TTFB + remaining audio delivery time
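The latency formula can be illustrated with a short Python sketch. It assumes each request records three timestamps on the same clock: request start, first audio byte, and last audio byte. It also assumes that "remaining audio delivery time" means the span from the first audio byte to the last one. The `TimedRequest` class, the helper names, and the sample timestamps are hypothetical and illustrative only. They are not part of EfficientAI.

```python
from dataclasses import dataclass


@dataclass
class TimedRequest:
    # Hypothetical per-request timestamps in milliseconds, all on one clock.
    request_start_ms: float
    first_audio_byte_ms: float
    last_audio_byte_ms: float


def ttfb_ms(req: TimedRequest) -> float:
    # Time to First Byte: request start to first audio byte.
    return req.first_audio_byte_ms - req.request_start_ms


def total_latency_ms(req: TimedRequest) -> float:
    # Assumed meaning of "remaining audio delivery time": first to last audio byte.
    remaining_delivery_ms = req.last_audio_byte_ms - req.first_audio_byte_ms
    # Total Latency = TTFB + remaining audio delivery time.
    return ttfb_ms(req) + remaining_delivery_ms


# Illustrative timestamps, not measured data.
req = TimedRequest(request_start_ms=0.0, first_audio_byte_ms=180.0, last_audio_byte_ms=950.0)
print(ttfb_ms(req))           # 180.0
print(total_latency_ms(req))  # 950.0
```

Under these assumptions, total latency reduces to the time from request start to the last audio byte. TTFB isolates how quickly playback can begin, which matters most for streaming endpoints.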
{% endif %} {% if report_options.include_ttfb %}
TTFB: Time to First Byte (request start to first audio byte)
{% endif %}
Provider / Model / Voice tested:
Provider | Models Tested | Voices Tested | Samples
{% for row in provider_overview %}
{{ row.provider }} | {{ row.models }} | {{ row.voices }} | {{ row.sample_count }}
{% endfor %}

Key evaluation dimensions:

Summary Insights

{% if summary.top_naturalness %}
{{ "Top Naturalness" if has_comparison_variants else "Naturalness Score" }}
{{ service._to_score(summary.top_naturalness.avg_mos) }} MOS
Provider: {{ summary.top_naturalness.provider_display }}
Model: {{ summary.top_naturalness.model_display }}
Voice: {{ summary.top_naturalness.voice_display }}
{% endif %} {% if summary.lowest_latency %}
{{ "Lowest Latency" if has_comparison_variants else "Latency" }}
{{ service._to_ms(summary.lowest_latency.avg_latency_ms) }}
Provider: {{ summary.lowest_latency.provider_display }}
Model: {{ summary.lowest_latency.model_display }}
Voice: {{ summary.lowest_latency.voice_display }}
{% if report_options.include_endpoint %}
Endpoint: {{ summary.lowest_latency.endpoint_type }}
{% endif %}
{% endif %} {% if summary.lowest_hallucination %}
{{ "Lowest Hallucination" if has_comparison_variants else "Hallucination Rate" }}
{{ service._to_pct(summary.lowest_hallucination.avg_wer) }} WER
Provider: {{ summary.lowest_hallucination.provider_display }}
Model: {{ summary.lowest_hallucination.model_display }}
Voice: {{ summary.lowest_hallucination.voice_display }}
{% endif %} {% if summary.best_context %}
{{ "Best Prosody" if has_comparison_variants else "Prosody Score" }}
{{ service._to_score(summary.best_context.avg_prosody, 3) }}
Provider: {{ summary.best_context.provider_display }}
Model: {{ summary.best_context.model_display }}
Voice: {{ summary.best_context.voice_display }}
{% endif %} {% if not summary.top_naturalness and not summary.lowest_latency and not summary.lowest_hallucination and not summary.best_context %}
No summary cards selected for this report configuration.
{% endif %}

1. Aggregate Benchmark Metrics

Provider | Model | Voice{% for column in aggregate_metric_columns %} | {{ column.label }}{% endfor %}{% if show_endpoint_column %} | Endpoint{% endif %}
{% for row in provider_rows %}
{{ row.provider_display }} | {{ row.model_display }} | {{ row.voice_display }}{% for column in aggregate_metric_columns %} | {% if column.format_kind == 'ms' %}{{ service._to_ms(row[column.key]) }}{% elif column.format_kind == 'pct' %}{{ service._to_pct(row[column.key]) }}{% else %}{{ service._to_score(row[column.key], 3 if column.key == 'avg_prosody' else 2) }}{% endif %}{% endfor %}{% if show_endpoint_column %} | {{ row.endpoint_type }}{% endif %}
{% endfor %}

2. Metric Rankings by Category

{% if metric_sections %} {% for metric in metric_sections %}

{{ metric.title }}

{{ metric.full_form }}
{{ metric.subtitle }}
{% if metric.definition %}
Definition: {{ metric.definition }}
{% endif %} {% for row in metric.rows %}
{{ row.variant_label }}
{{ row.display_value }}
{% endfor %}
Legend: {{ metric.range_label }}
{% if metric.zone_labels %}
{% if metric.zone_ticks %} {% for tick in metric.zone_ticks %} {% endfor %} {% endif %}
{{ metric.zone_labels.start }} {{ metric.zone_labels.middle }} {{ metric.zone_labels.end }}
{% endif %}
{% endfor %} {% else %}
No ranking metrics selected for this report configuration.
{% endif %} {% if show_hallucination_section %}

3. Speech Accuracy & Hallucination

Accuracy metrics are taken from the existing Voice Playground evaluation outputs: word error rate (WER), character error rate (CER), and the ASR transcript where one is available.
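WER and CER both measure how far an ASR transcript is from a reference prompt. The Python sketch below computes each one as the Levenshtein edit distance divided by the reference length. WER counts edits over words and CER counts edits over characters. This is a minimal sketch, not the Voice Playground implementation. The lowercase-and-split-on-whitespace normalization and the function names `edit_distance`, `wer`, and `cer` are assumptions for illustration.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Levenshtein distance (substitutions, deletions, insertions) using two DP rows.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference token missing from transcript
                curr[j - 1] + 1,         # insertion: extra (possibly hallucinated) token
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    # Assumed normalization: lowercase, then split on whitespace.
    ref_words = reference.lower().split()
    return edit_distance(ref_words, hypothesis.lower().split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    # Same edit distance, computed over characters instead of words.
    ref_chars = list(reference.lower())
    return edit_distance(ref_chars, list(hypothesis.lower())) / max(len(ref_chars), 1)


print(wer("hello world", "hello there world"))  # 0.5 (one inserted word)
```

Insertions are counted as errors, so a transcript containing many extra words can push WER above 100%.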

Provider | Model | Voice{% if report_options.include_wer %} | WER{% endif %}{% if report_options.include_cer %} | CER{% endif %}
{% for row in provider_rows %}
{{ row.provider_display }} | {{ row.model_display }} | {{ row.voice_display }}{% if report_options.include_wer %} | {{ service._to_pct(row.avg_wer) }}{% endif %}{% if report_options.include_cer %} | {{ service._to_pct(row.avg_cer) }}{% endif %}
{% endfor %}
{% if report_options.include_hallucination_examples %}

Hallucination Examples

{% if not report_options.include_wer %}
Enable WER to include hallucination example ranking in the report.
{% elif hallucination_examples %} {% for example in hallucination_examples %}
{{ example.provider_display }} • {{ example.model_display }} • {{ example.voice_display }} WER {{ service._to_pct(example.wer) }}
Prompt reference: {{ example.prompt }}
Hallucination error (ASR transcript): {{ example.prediction }}
{% endfor %} {% else %}
No ASR transcript examples are available in the current comparison outputs.
{% endif %} {% endif %} {% endif %} {% if disclaimer_sections %}

4. Disclaimer & Interpretation Guidance

{% for section in disclaimer_sections %}

{{ section.title }}

{% endfor %} {% endif %}