# Vaara

> Tamper-evident runtime evidence layer for AI agents. Covers EU AI Act compliance and any case where you need to prove what an agent actually did. Open source, no SaaS, no telemetry.

Vaara intercepts agent tool calls, scores each one with a conformal risk interval, and writes a hash-chained audit record. The score is learned online across five expert signals via Multiplicative Weight Update (MWU), and its conformal interval carries distribution-free coverage. An external auditor can verify these properties without trusting your stack.
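To illustrate what "hash-chained" means for the audit record, here is a minimal sketch in plain Python. It is not Vaara's record format or API: the field names (`prev`, `payload`, `hash`), the all-zero genesis value, and the helper names are assumptions made for the example. Each record's hash covers the previous record's hash, so editing any earlier record breaks every later link.

```python
import hashlib
import json

# Assumed placeholder used as the "previous hash" of the first record.
GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of (previous hash, payload)."""
    body = json.dumps(
        {"prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a record whose hash commits to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "prev": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(record["prev"], record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only detects tampering if the latest hash is also held or signed somewhere the writer cannot rewrite, which is the role a signed attestation would typically play.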

Position: a tamper-evident runtime evidence and enforcement layer. Signed attestations plus execution receipts pair each MCP tool call with the policy that allowed it.

## Repo and packages
- [GitHub source](https://github.com/vaaraio/vaara): code, releases, issue tracker
- [PyPI](https://pypi.org/project/vaara/): `pip install vaara`
- [npm @vaara/client](https://www.npmjs.com/package/@vaara/client): TypeScript HTTP client

## Docs
- [README](https://github.com/vaaraio/vaara/blob/main/README.md): install, quick start, evidence specimen, integrations
- [COMPLIANCE.md](https://github.com/vaaraio/vaara/blob/main/docs/COMPLIANCE.md): EU AI Act (Art. 9, 11 to 15, 61) and DORA (Art. 10, 12, 13) article-level mapping
- [Formal specification](https://github.com/vaaraio/vaara/blob/main/docs/formal_specification.md): MWU regret bound O(sqrt(T log N)), conformal coverage, security properties
- [vaara-bench-v1](https://github.com/vaaraio/vaara/blob/main/bench/vaara-bench-v1.md): 77-trace synthetic benchmark, frozen methodology
- [CHANGELOG](https://github.com/vaaraio/vaara/blob/main/CHANGELOG.md): version-by-version evolution
- [HTTP API contract](https://github.com/vaaraio/vaara/blob/main/docs/openapi.yaml): /v1/score and operator endpoints
- [Signing keys](https://github.com/vaaraio/vaara/blob/main/docs/signing-keys.md): release verification
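
For readers unfamiliar with the Multiplicative Weight Update rule named in the formal specification entry, the following is a generic textbook sketch, not Vaara's implementation. The function names, the learning rate `eta`, and the assumption that each expert reports a loss in [0, 1] are illustrative. The standard O(sqrt(T log N)) regret bound is obtained with a learning rate on the order of sqrt(ln N / T).

```python
def mwu_update(weights, losses, eta=0.1):
    """One Multiplicative Weight Update step over N experts.

    weights: current normalized weights, one per expert.
    losses:  per-expert losses for this round, each assumed to lie in [0, 1].
    eta:     learning rate, assumed to lie in (0, 0.5].
    """
    if len(weights) != len(losses):
        raise ValueError("weights and losses must have the same length")
    if not 0.0 < eta <= 0.5:
        raise ValueError("eta must be in (0, 0.5]")
    if any(not 0.0 <= loss <= 1.0 for loss in losses):
        raise ValueError("losses must be in [0, 1]")

    # Shrink each weight in proportion to that expert's loss.
    updated = [w * (1.0 - eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    # Renormalize so the weights remain a probability distribution.
    return [w / total for w in updated]


def combine(weights, scores):
    """Weighted average of the experts' scores under the current weights."""
    return sum(w * s for w, s in zip(weights, scores))
```

With five experts starting at uniform weight 0.2, an expert that keeps incurring low loss gains weight round by round, and the combined score tracks it more closely.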

## Integrations
- Framework adapters: LangChain, CrewAI, OpenAI Agents SDK, MCP server
- Cloud guardrail adapters: AWS Bedrock Guardrails, Azure AI Content Safety, GCP Model Armor (article-tagged findings feed into Vaara's audit trail and OVERT envelope)
- OVERT 1.0 emitter, verifier CLI, S3P (MEA-2) emitter with Clopper-Pearson intervals, experimental AMD SEV-SNP TEE attestation hook

## Numbers
- 12,155-entry adversarial corpus (250 hand-curated + 11,905 LLM-generated), 70/15/15 split stratified by (category, source)
- Classifier v9 (236 hand-crafted features + 384-dim MiniLM embeddings) at calibrated threshold 0.9150: held-out TEST recall 84.7% [82.4, 86.7] at FPR 4.1% [2.9, 5.7], n=1,827
- Cross-model held-out recall 66.8% [64.9, 68.7] over n=2,277 with no eval-set attacker model in TRAIN; weakest sub-cell (data_exfil, closed-weight) 38.9% [35.3, 42.5]
- BIPIA-pressure FPR on benign tool calls 1.2% [0.4, 3.6] across four agent backends
- Multi-attacker PAIR attack success rate (ASR) 0/25 per attacker across Qwen2.5-32B, Qwen2.5-72B, and Llama-3.3-70B at identical seeds
- 140 µs mean / 210 µs p99 for the hot-path rule scorer, commodity CPU; the MiniLM classifier is opt-in (`vaara[ml]`) and not in that path
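
A zero-count result such as 0/25 still leaves a non-trivial upper bound on the true rate. As a hedged illustration, the sketch below computes the exact (Clopper-Pearson) upper limit for the zero-event case, which has a closed form. The source does not say which interval method, if any, is applied to the PAIR figure, so the choice of method and the 95% level are assumptions, and the function name is hypothetical.

```python
def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Two-sided Clopper-Pearson upper limit when 0 events occur in n trials.

    For k = 0 the exact limit reduces to 1 - (alpha / 2) ** (1 / n),
    so no beta-distribution routine is needed.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# 0 successes in 25 attempts: the 95% upper limit is roughly 0.137.
upper = zero_event_upper_bound(25)
```

Under these assumptions, 0/25 per attacker is compatible with a true per-attacker success rate of up to about 13.7%, which is why the sample size is worth reading alongside the headline figure.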

## Optional
- [Article 14 runtime](https://futurium.ec.europa.eu/ga/apply-ai-alliance/community-content/article-14-runtime-why-oversight-agentic-ai-has-be-evidenced-action-not-model): position post on the EU Apply AI Alliance community on Futurium
- [OVERT 1.0 spec](https://overt.is/): open runtime-trust standard Vaara implements as Arbiter
- [Microsoft Agent Governance Toolkit](https://github.com/microsoft/agent-governance-toolkit): broader agent-governance reference (zero-trust identity, capability-based access control)
