Security Module

Module: security | Cohesion: 0.85 | Members: 1

This document describes the security module: its purpose, security model, permission tiers, Guardian risk scoring, and threat detection mechanisms.

1. Purpose

The security module safeguards the agent's operations by enforcing permission controls, detecting sensitive information, preventing server-side request forgery (SSRF), and providing an AI-powered approval system for tool calls. Its goal is to keep the agent within defined safety parameters and to reduce the risks of automated actions.

2. Security Model

The security model is "fail-closed": when an error occurs or the outcome is uncertain, the system denies potentially risky actions by default. The Guardian sub-agent runs in read-only mode during evaluations, so that its analysis does not itself introduce new vulnerabilities. Critical information, such as detected secrets, is redacted from output to prevent accidental exposure.
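The fail-closed rule can be illustrated with a minimal sketch. The names `Decision`, `isKnownDecision`, and `decideFailClosed` are hypothetical and do not come from the module. The three decision labels are borrowed from the Guardian decision values described in section 4. The sketch is synchronous for brevity; a real evaluation that calls an LLM would presumably be asynchronous.

```typescript
// Hypothetical sketch of a fail-closed wrapper; not the module's actual API.
type Decision = "approve" | "prompt_user" | "deny";

// Type guard: only the three known labels count as a valid decision.
function isKnownDecision(value: unknown): value is Decision {
  return value === "approve" || value === "prompt_user" || value === "deny";
}

// Runs an evaluation callback and returns its decision.
// Any thrown error or unrecognized result is treated as "deny".
function decideFailClosed(evaluate: () => unknown): Decision {
  try {
    const result = evaluate();
    return isKnownDecision(result) ? result : "deny";
  } catch {
    // Errors never fall through to approval.
    return "deny";
  }
}
```

The key design choice is that "deny" is the only outcome reachable from the error and unknown-value paths, so a bug in the evaluator cannot silently approve an action.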

3. Permission Tiers

The module uses a five-tier permission system to control the agent's actions and interactions. The tiers give granular control over what the agent can do and when it requires user intervention.

Tool classifications (e.g., READ_ONLY_TOOLS, EDIT_TOOLS) are used to categorize actions and apply appropriate permission checks based on the configured mode.
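As a rough sketch of how tool classifications might feed a permission check: the set names `READ_ONLY_TOOLS` and `EDIT_TOOLS` come from the text above, but their contents, the `ToolClass`, `Action`, and `ModePolicy` types, and both functions are assumptions made for illustration. Because the tiers are not enumerated here, a mode is modeled generically as a policy that maps each tool class to an action.

```typescript
// Hypothetical set contents; the real members are not listed in this document.
const READ_ONLY_TOOLS: ReadonlySet<string> = new Set(["read_file", "list_dir"]);
const EDIT_TOOLS: ReadonlySet<string> = new Set(["write_file", "apply_patch"]);

type ToolClass = "read_only" | "edit" | "unknown";
type Action = "allow" | "ask" | "deny";

// A mode is modeled as a policy: one action per known tool class.
type ModePolicy = Record<Exclude<ToolClass, "unknown">, Action>;

// Maps a tool name to its class by set membership.
function classifyTool(toolName: string): ToolClass {
  if (READ_ONLY_TOOLS.has(toolName)) return "read_only";
  if (EDIT_TOOLS.has(toolName)) return "edit";
  return "unknown";
}

// Looks up the action for a tool under the given policy.
function checkPermission(toolName: string, policy: ModePolicy): Action {
  const toolClass = classifyTool(toolName);
  // Fail closed: a tool that matches no classification is denied.
  if (toolClass === "unknown") return "deny";
  return policy[toolClass];
}

// One hypothetical mode: reads are allowed, edits require confirmation.
const examplePolicy: ModePolicy = { read_only: "allow", edit: "ask" };
```

Under `examplePolicy`, `checkPermission("read_file", examplePolicy)` yields "allow", `checkPermission("write_file", examplePolicy)` yields "ask", and an unclassified tool name yields "deny".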

4. Guardian Risk Scoring

The Guardian sub-agent is an AI-powered automatic approval reviewer for tool calls. It evaluates the safety of each proposed action using a dedicated large language model (LLM) and assigns a structured risk score between 0 and 100.

Each GuardianEvaluation includes a riskScore, a reasoning field (a human-readable explanation), a decision (approve, prompt_user, or deny), and a list of the risks identified.

The GuardianContext provides the necessary information for the evaluation, including toolName, content (arguments/command), cwd (current working directory), recentFiles, and a yoloMode flag.
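The two structures described above could be typed roughly as follows. The field names for GuardianContext, plus riskScore, reasoning, and decision, come from the text. The field types, the `risks` field name, the assumption that the LLM replies with JSON, and the `parseEvaluation` helper are all illustrative guesses. The helper shows one way to keep evaluation fail-closed: a malformed reply becomes a deny with the maximum score.

```typescript
type GuardianDecision = "approve" | "prompt_user" | "deny";

interface GuardianContext {
  toolName: string;
  content: string;       // tool arguments or command text (type assumed)
  cwd: string;           // current working directory
  recentFiles: string[]; // assumed to be a list of paths
  yoloMode: boolean;
}

interface GuardianEvaluation {
  riskScore: number;     // expected range: 0 to 100
  reasoning: string;     // human-readable explanation
  decision: GuardianDecision;
  risks: string[];       // field name assumed
}

function isDecision(v: unknown): v is GuardianDecision {
  return v === "approve" || v === "prompt_user" || v === "deny";
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((item) => typeof item === "string");
}

// Builds the evaluation returned whenever the reply cannot be trusted.
function failClosed(why: string): GuardianEvaluation {
  return { riskScore: 100, reasoning: `Evaluation rejected: ${why}`, decision: "deny", risks: [why] };
}

// Parses a raw LLM reply (assumed to be JSON) and validates every field.
function parseEvaluation(raw: string): GuardianEvaluation {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return failClosed("response was not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    return failClosed("response was not an object");
  }
  const { riskScore, reasoning, decision, risks } = data as Record<string, unknown>;
  if (typeof riskScore !== "number" || !Number.isFinite(riskScore) || riskScore < 0 || riskScore > 100) {
    return failClosed("riskScore missing or outside 0-100");
  }
  if (typeof reasoning !== "string") return failClosed("reasoning missing");
  if (!isDecision(decision)) return failClosed("decision not recognized");
  if (!isStringArray(risks)) return failClosed("risks missing or malformed");
  return { riskScore, reasoning, decision, risks };
}
```

Validating the reply before acting on it means an out-of-range score or an unrecognized decision string is never interpreted as approval.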

5. Threat Detection

The security module incorporates several mechanisms for detecting and mitigating various threats: