Harness Engineering: How a Stateless LLM Becomes a Persistent, Evolving Agent
Version: 1.0
Date: March 26, 2026
Author: Xiaogang Wang (XG) + Swarm (AI Co-Architect)
Status: For PE / Tech Leadership Review
Classification: Internal
Table of Contents
Executive Summary
Architecture Overview
Six-Layer Architecture
The Compound Loop
The Harness — Core Innovation
Context Engineering
Memory Pipeline
Self-Evolution
Safety & Self-Harness
Intelligence Layer
Autonomous Pipeline
Proactive Intelligence
Job System
Session Architecture
Multi-Tab Parallel Sessions
Swarm Brain — Multi-Channel
Interface Layer
Three-Column Command Center
Core Engine & Growth Trajectory
Key Design Decisions & Tradeoffs
Competitive Positioning
Future Roadmap
1. Executive Summary
SwarmAI is a desktop application that wraps Claude's Agent SDK inside a harness — a structured layer of context management, persistent memory, self-evolution, and safety controls that transforms a stateless large language model into a persistent, evolving personal AI agent.
The core thesis is simple: most AI tools reset when you close them. Context is lost, decisions are forgotten, and users re-explain the same things session after session. SwarmAI solves this structurally, not through fine-tuning but through engineered knowledge persistence.
Key Innovation: The "Harness" — an 11-file context priority chain, 3-layer memory distillation pipeline, self-evolution registry, and 7 post-session hooks that create a compound loop: every session makes the next one better. Every correction prevents a class of future mistakes. The system doesn't just run; it compounds.
2. Architecture Overview
2.1 Six-Layer Architecture
SwarmAI's architecture is organized into six horizontal layers. Each layer has a clear responsibility boundary and communicates with adjacent layers through well-defined interfaces. The Harness layer (Layer 3) is the core innovation: it is what differentiates SwarmAI from a simple LLM wrapper.
Figure 1: SwarmAI Agentic OS Architecture — Six-layer design with the Harness as the core innovation layer
2.2 The Compound Loop
The defining characteristic of SwarmAI's architecture is the compound loop, a feedback cycle in which every session's output becomes the next session's input:
Session executes — user interacts with the agent, decisions are made, code is written, files are created
Memory updates — DailyActivity files accumulate; when ≥3 unprocessed files exist, distillation promotes recurring themes, key decisions, and corrections to MEMORY.md
Context enriched — next session's system prompt is assembled from the updated 11-file chain, now containing the latest memory, evolution state, and project context
Agent is smarter — the next session starts with full awareness of everything that happened, mistakes to avoid, and capabilities to leverage
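The trigger in the memory-update step can be sketched as a simple threshold check. This is a minimal illustration, not SwarmAI's actual code: the function names and the idea of tracking already-distilled files in a set are assumptions. Only the threshold of 3 unprocessed DailyActivity files comes from the text.

```python
DISTILL_THRESHOLD = 3  # from the text: distill once >= 3 unprocessed files exist


def unprocessed(activity_files: list[str], processed: set[str]) -> list[str]:
    """Return DailyActivity file names not yet distilled, oldest first.

    Names follow YYYY-MM-DD.md, so lexicographic order is chronological.
    """
    return sorted(name for name in activity_files if name not in processed)


def should_distill(activity_files: list[str], processed: set[str]) -> bool:
    """True when enough unprocessed files have accumulated (hypothetical helper)."""
    return len(unprocessed(activity_files, processed)) >= DISTILL_THRESHOLD


files = ["2026-03-24.md", "2026-03-22.md", "2026-03-23.md"]
print(should_distill(files, processed=set()))              # True: three pending
print(should_distill(files, processed={"2026-03-22.md"}))  # False: two pending
```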
Design Principle: Prevention over recovery. The compound loop is designed to make errors structurally impossible over time, not to handle them after they occur. Every correction captured in EVOLUTION.md prevents an entire class of future mistakes.
3. The Harness — Core Innovation
The Harness is what makes SwarmAI more than a ChatGPT wrapper. It is a structured layer of engineering between the user interface and the raw LLM that provides four critical capabilities: context continuity, memory persistence, self-improvement, and safety. Without the Harness, Claude is a stateless function. With it, Claude becomes a persistent, evolving agent that compounds value across sessions.
3.1 Context Engineering
Most AI tools assemble a single system prompt and send it with every request. SwarmAI maintains an 11-file priority chain (P0–P10) that is assembled, cached, and budget-managed through a multi-stage pipeline. This is the most token-intensive subsystem and the one with the highest impact on agent quality.
Figure 2: Context Engineering — 11-file priority chain with token budget management and L0/L1 caching
Priority Chain
Priority | File | Owner | Truncation | Purpose
P0 | SWARMAI.md | System | Never | Core identity & principles
P1 | IDENTITY.md | System | Never | Agent name, avatar, intro
P2 | SOUL.md | System | Never | Personality & tone
P3 | AGENT.md | System | Truncatable | Behavioral directives
P4 | USER.md | User | Truncatable | User preferences & background
P5 | STEERING.md | User | Truncatable | Session-level overrides
P6 | TOOLS.md | User | Truncatable | Tool & environment config
P7 | MEMORY.md | Agent | Head-trimmed | Persistent memory (newest kept)
P8 | EVOLUTION.md | Agent | Head-trimmed | Self-evolution registry
P9 | KNOWLEDGE.md | Auto | Truncatable | Domain knowledge index
P10 | PROJECTS.md | Auto | Lowest priority | Active projects index
Assembly Pipeline
ContextDirectoryLoader.ensure_directory() — provisions/updates .context/ files from source templates
ContextDirectoryLoader.load_all() — L1 cache check (git-first freshness), assemble from sources if stale
Token Budget Engine — enforces budget (100K tokens for 1M context models), truncates P10–P3 as needed, never touches P0–P2
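The budget step can be illustrated with a small sketch. It assumes a simplified model in which each file has an estimated token count and the engine shaves tokens from the lowest-priority file upward. The class and function names are hypothetical, and the real engine may truncate differently. The only rules taken from the text are the truncation order (P10 toward P3) and the protection of P0 through P2.

```python
from dataclasses import dataclass


@dataclass
class ContextFile:
    priority: int  # 0 = P0 (highest) ... 10 = P10 (lowest)
    name: str
    tokens: int    # estimated token count (how it is estimated is not shown here)


PROTECTED_MAX_PRIORITY = 2  # P0-P2 are never truncated (from the text)


def enforce_budget(files: list[ContextFile], budget: int) -> dict[str, int]:
    """Return the tokens allotted to each file, cutting from P10 upward."""
    allotted = {f.name: f.tokens for f in files}
    overflow = sum(allotted.values()) - budget
    # Visit lowest priority first: P10, P9, ... and skip the protected files.
    for f in sorted(files, key=lambda f: f.priority, reverse=True):
        if overflow <= 0:
            break
        if f.priority <= PROTECTED_MAX_PRIORITY:
            continue
        cut = min(f.tokens, overflow)
        allotted[f.name] -= cut
        overflow -= cut
    return allotted


# Illustrative numbers only; the text states a 100K budget for 1M-context models.
chain = [
    ContextFile(0, "SWARMAI.md", 50),
    ContextFile(7, "MEMORY.md", 40),
    ContextFile(10, "PROJECTS.md", 30),
]
print(enforce_budget(chain, budget=100))
```

If the protected files alone exceed the budget, this sketch leaves them intact and returns an over-budget allocation; how the real engine handles that case is not described in the text.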
Key Design Decisions
Session-type-aware loading — Channel DMs skip EVOLUTION.md, PROJECTS.md, DailyActivity, and briefing (~9K tokens / 30% savings for quick exchanges)
L0/L1 cache — L1 uses git-first freshness (falls back to mtime); L0 is an AI-summarized compact version for constrained models
Head-trimming for agent-owned files — MEMORY.md and EVOLUTION.md keep the newest content; old entries trim from the top. This ensures recent context always wins.
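Head-trimming can be sketched as walking an append-ordered file from the bottom and keeping lines until the budget is used up. The sketch below is an assumption-laden illustration: it counts whitespace-separated words as a stand-in for tokens and trims at line granularity, neither of which is confirmed by the text.

```python
def word_count(line: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words."""
    return len(line.split())


def head_trim(lines: list[str], max_tokens: int) -> list[str]:
    """Keep the newest lines (the end of the list) that fit in max_tokens."""
    kept: list[str] = []
    used = 0
    for line in reversed(lines):      # newest entries are at the bottom
        cost = word_count(line)
        if used + cost > max_tokens:
            break                     # everything older is dropped from the top
        kept.append(line)
        used += cost
    kept.reverse()                    # restore chronological order
    return kept


memory = ["old entry one", "mid entry two", "new entry three"]
print(head_trim(memory, max_tokens=6))  # ['mid entry two', 'new entry three']
```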
3.2 Memory Pipeline
The memory pipeline is a three-layer distillation system that converts raw session activity into durable, curated knowledge. It solves the fundamental problem of AI amnesia: without this pipeline, every session starts from zero.
Figure 3: Memory Pipeline — Three-layer distillation from session capture to curated long-term memory
Three Layers
Layer | Storage | Lifecycle | Content
1. Capture | DailyActivity/YYYY-MM-DD.md | 30 days, then archived | Per-session: deliverables, git commits, decisions, lessons, next steps. Auto-extracted by DailyActivityExtractionHook
Git verification of implementation claims is a critical safety mechanism born from a real Sev-2 incident (COE C005). The distillation hook verifies all implementation claims against git log before promoting them to MEMORY.md. Without this check, a mid-session DailyActivity snapshot captured before later commits can create false memories that compound across sessions. Claims that fail verification are tagged [UNVERIFIED] rather than promoted as fact.
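One way such a check could work is sketched below. The text does not say how claims are matched against git history, so this sketch assumes each claim cites a commit hash and treats a claim as verified only when every cited hash is a prefix of a known commit. The function name and regex are hypothetical. The set of known hashes could be collected with `git log --format=%H`.

```python
import re

# Abbreviated or full commit hashes: 7 to 40 lowercase hex characters.
SHA_RE = re.compile(r"\b[0-9a-f]{7,40}\b")


def verify_claims(claims: list[str], known_shas: set[str]) -> list[str]:
    """Tag claims whose cited commits are missing from the git history."""
    checked = []
    for claim in claims:
        cited = SHA_RE.findall(claim)
        verified = bool(cited) and all(
            any(full.startswith(sha) for full in known_shas) for sha in cited
        )
        checked.append(claim if verified else f"[UNVERIFIED] {claim}")
    return checked


history = {"a1b2c3d4e5f6a7b8c9d0a1b2c3d4e5f6a7b8c9d0"}
notes = [
    "Implemented retry logic (a1b2c3d)",
    "Added cache layer (deadbee)",
    "Refactored router",
]
for line in verify_claims(notes, history):
    print(line)
```

In this sketch a claim with no cited commit is also tagged, which errs on the side of not promoting anything that cannot be checked.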
Weekly LLM Maintenance
A scheduled background job (self-tune) performs weekly maintenance: prune resolved Open Threads, archive stale Recent Context entries (>30 days), verify Key Decisions still reflect reality. This prevents memory bloat and ensures the agent's knowledge stays current.
3.3 Self-Evolution
Self-evolution is the capability that makes SwarmAI get better over time. When the agent encounters a capability gap, it doesn't just fail — it can build a new skill, test it, and register it for future sessions. When the user corrects a mistake, the correction is captured permanently so the same class of error never recurs.
Design Principle: Corrections are the highest-value entries. They represent proven failure modes with known patterns. Deleting a correction is equivalent to removing a safety guard. The registry is append-mostly; corrections are append-only.
Gap Detection (Reactive + Proactive)
Reactive: the s_self-evolution skill (always active) detects capability gaps in real time during sessions and orchestrates up to 3 build attempts
Proactive: Weekly maintenance scans DailyActivity for recurring error patterns and surfaces them in session briefings as capability gaps to address
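The reactive path's bounded retry can be sketched as a loop that builds a candidate skill, tests it, and stops after three attempts. The callables `build` and `passes_tests` are hypothetical placeholders for whatever the skill actually does. Only the limit of 3 attempts comes from the text.

```python
from typing import Callable, Optional

MAX_BUILD_ATTEMPTS = 3  # from the text: up to 3 build attempts


def build_skill(
    build: Callable[[int], str],
    passes_tests: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that passes its tests, or None if all fail."""
    for attempt in range(1, MAX_BUILD_ATTEMPTS + 1):
        candidate = build(attempt)
        if passes_tests(candidate):
            return candidate  # in the real system this would be registered
    return None               # gap stays open; surface it to the user


# Toy run: the second draft is the first one that passes.
result = build_skill(lambda n: f"draft-{n}", lambda s: s == "draft-2")
print(result)  # draft-2
```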
3.4 Safety & Self-Harness
Safety is not a feature — it is a structural property of the architecture. SwarmAI implements defense-in-depth through multiple independent safety layers:
Layer | Mechanism | Details
Tool Logger | Audit trail | Every tool invocation logged with timestamp, parameters, and result
Command Blocker | Pattern matching | 13 dangerous patterns blocked (rm -rf, DROP TABLE, force push, etc.)
Decision classification: every decision in the autonomous pipeline is tagged as mechanical (auto), taste (batch at delivery), or judgment (block for human).
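A pattern-matching command blocker of the kind listed above might look like the following sketch. Only the three examples named in the text (rm -rf, DROP TABLE, force push) are covered; the regular expressions themselves are assumptions, and the full list of 13 patterns is not reproduced here.

```python
import re

# Illustrative patterns for the three examples named in the text.
BLOCKED_PATTERNS = [
    # rm with both -r and -f in the same flag group, in either order
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    # git push with --force or -f anywhere after "push"
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
]


def is_blocked(command: str) -> bool:
    """True if the command matches any dangerous pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


for cmd in ["rm -rf /tmp/build", "git push origin main", "git push -f"]:
    print(cmd, "->", "BLOCKED" if is_blocked(cmd) else "allowed")
```

Regex blocking is easy to evade (for example, `rm -r -f` slips past the first pattern), which is consistent with the text's point that no single layer is sufficient on its own.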
The Self-Harness subsystem (context_health_hook.py) performs continuous validation: checking that all 11 context files exist and parse correctly, detecting DDD document staleness, auto-refreshing KNOWLEDGE.md and PROJECTS.md indexes, and auto-committing workspace changes.
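The existence check is the simplest part of this validation to illustrate. The sketch below is not the contents of context_health_hook.py: the function name is hypothetical, and it covers only the "files exist" check, not parsing, staleness detection, or auto-commit. The file names are taken from the priority chain table.

```python
from pathlib import Path

# The 11 files of the priority chain, P0 through P10.
CONTEXT_FILES = [
    "SWARMAI.md", "IDENTITY.md", "SOUL.md", "AGENT.md", "USER.md",
    "STEERING.md", "TOOLS.md", "MEMORY.md", "EVOLUTION.md",
    "KNOWLEDGE.md", "PROJECTS.md",
]


def missing_context_files(context_dir: Path) -> list[str]:
    """Return the names of context files absent from context_dir."""
    return [name for name in CONTEXT_FILES if not (context_dir / name).is_file()]


# A directory that does not exist reports every file as missing.
print(len(missing_context_files(Path("/no/such/context/dir"))))  # 11
```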
4. Intelligence Layer
The Intelligence layer sits above the Harness and provides proactive awareness, autonomous execution capabilities, and background automation. While the Harness ensures the agent remembers and improves, the Intelligence layer ensures it anticipates, acts, and automates.
4.1 Autonomous Pipeline
The autonomous pipeline drives the full development lifecycle from a one-sentence requirement to a PR-ready delivery. It is the implementation of AIDLC Phase 3 (AI-Management) where AI makes autonomous decisions and humans step in only when needed.
Figure 5: Autonomous Pipeline — 8-stage lifecycle with DDD+SDD+TDD methodology and safety mechanisms
Eight Stages
Stage | Output | Gate
EVALUATE | ROI score, GO/DEFER/REJECT recommendation | ROI ≥ 3.5 to proceed
THINK | 3 alternatives (Minimal/Ideal/Creative) with tradeoffs | User picks approach
PLAN | Design doc (SDD) with acceptance criteria, file list, effort |
Three methodologies form a closed loop. DDD (4 project documents) provides autonomous judgment: "should we build this?" SDD (design doc with acceptance criteria) produces specs: "here's exactly what to build." TDD (tests before code) verifies delivery: "proof we built it correctly." The key insight: when no human reviews every line, the test suite is the quality gate.
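The EVALUATE gate and the three decision tags from the safety section can be sketched together. The 3.5 threshold and the tag meanings come from the text; the enum, function names, and sample decisions are hypothetical, and the sketch does not model how a score maps to DEFER versus REJECT because the text does not say.

```python
from enum import Enum

ROI_GATE = 3.5  # from the text: ROI >= 3.5 to proceed past EVALUATE


class Decision(Enum):
    MECHANICAL = "auto"           # applied without asking
    TASTE = "batch at delivery"   # collected and shown with the final result
    JUDGMENT = "block for human"  # pipeline pauses until a human decides


def passes_evaluate(roi: float) -> bool:
    """True when the ROI score clears the EVALUATE gate."""
    return roi >= ROI_GATE


def triage(decisions: list[tuple[str, Decision]]) -> dict[Decision, list[str]]:
    """Group decision labels by how they must be handled."""
    grouped: dict[Decision, list[str]] = {kind: [] for kind in Decision}
    for label, kind in decisions:
        grouped[kind].append(label)
    return grouped


pending = [
    ("rename helper", Decision.MECHANICAL),
    ("button color", Decision.TASTE),
    ("drop legacy API", Decision.JUDGMENT),
]
grouped = triage(pending)
print(passes_evaluate(3.5), grouped[Decision.JUDGMENT])
```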
4.2 Proactive Intelligence
The proactive intelligence subsystem (1,142 lines, 106+ tests) provides session-start briefings and suggestions through four levels of analysis:
Level | Capability | How
L0 | Parsing | Extract structured data from DailyActivity, MEMORY.md, open threads
Priority × staleness × frequency × blocking × momentum scoring per item
L3 | Cross-session learning | JSON-persisted learning state: skip penalty for ignored suggestions, affinity bonus for accepted ones
L4 | Signal highlights | Reads signal_digest.json for external intelligence (HN, RSS, GitHub); effectiveness scoring with trend detection
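The multiplicative scoring and the learning adjustment can be sketched as follows. The five factor names come from the text; their value ranges, the linear form of the skip penalty and affinity bonus, and the 0.1 weights are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Item:
    title: str
    priority: float
    staleness: float
    frequency: float
    blocking: float
    momentum: float


def base_score(item: Item) -> float:
    """Product of the five factors named in the text."""
    return prod([item.priority, item.staleness, item.frequency,
                 item.blocking, item.momentum])


def adjusted_score(item: Item, skips: int, accepts: int,
                   skip_penalty: float = 0.1,
                   affinity_bonus: float = 0.1) -> float:
    """Scale the base score by learned feedback (hypothetical linear form)."""
    factor = max(0.0, 1.0 - skip_penalty * skips + affinity_bonus * accepts)
    return base_score(item) * factor


thread = Item("Fix flaky test", priority=2.0, staleness=1.5,
              frequency=1.0, blocking=2.0, momentum=1.0)
print(base_score(thread))                           # 6.0
print(adjusted_score(thread, skips=2, accepts=0))   # lower after two skips
```

Because the factors multiply, a zero in any one of them removes the item from consideration, so a real implementation would likely keep each factor strictly positive.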
4.3 Job System
The job system provides background automation that runs independently of chat sessions. Jobs are scheduled via macOS launchd and execute as headless Claude CLI processes with full MCP tool access.
System jobs (read-only): self-tune, signal-fetch, signal-digest — maintain the agent's autonomous capabilities
User jobs (configurable): morning inbox triage, weekly summaries, monitoring — defined in user-jobs.yaml
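A launchd job is described by a property list, which Python's standard plistlib can generate. The sketch below builds one for a scheduled job. The label, executable path, and prompt are placeholders rather than SwarmAI's actual values, while Label, ProgramArguments, and StartCalendarInterval are standard launchd keys.

```python
import plistlib


def launchd_plist(label: str, program_args: list[str],
                  schedule: dict[str, int]) -> bytes:
    """Serialize a launchd job definition to XML plist bytes."""
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        # e.g. {"Weekday": 0, "Hour": 3, "Minute": 0} for Sundays at 03:00
        "StartCalendarInterval": schedule,
    }
    return plistlib.dumps(job)


# Placeholder values for illustration only.
data = launchd_plist(
    "com.example.swarm.self-tune",
    ["/usr/local/bin/claude", "-p", "run weekly self-tune"],
    {"Weekday": 0, "Hour": 3, "Minute": 0},
)
print(data.decode("utf-8"))
```

The resulting file would typically be placed under ~/Library/LaunchAgents and loaded with launchctl; that step is outside this sketch.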
5. Session Architecture
The session layer manages the lifecycle of Claude subprocess instances. It replaced a monolithic AgentManager (5,428 lines) with four focused components during the v7 re-architecture in March 2026. The decomposition was driven by a real need: supporting parallel chat tabs and dedicated channel slots without resource exhaustion.
5.1 Multi-Tab Parallel Sessions
Module-level singletons. initialize() wires all components at startup. configure_hooks() for post-session hooks.
Key Invariants
Protected states (STREAMING, WAITING_INPUT) are never evicted
Subprocess spawn serialized via module-level _spawn_lock + _env_lock
Retry uses --resume flag to restore conversation context across crashes
Hooks fire via BackgroundHookExecutor — never block the request path
One dedicated slot always reserved for channels (min_tabs = 2)
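The first invariant can be illustrated with a small eviction sketch. The STREAMING and WAITING_INPUT states come from the text; the IDLE state, the Session fields, and the least-recently-active policy are assumptions, since the text does not say how an eviction victim is chosen.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()           # hypothetical name for an evictable state
    STREAMING = auto()
    WAITING_INPUT = auto()


PROTECTED = {State.STREAMING, State.WAITING_INPUT}  # never evicted (from the text)


@dataclass
class Session:
    id: str
    state: State
    last_active: float  # timestamp of last activity


def pick_eviction(sessions: list[Session]) -> Optional[Session]:
    """Choose the least recently active unprotected session, if any."""
    candidates = [s for s in sessions if s.state not in PROTECTED]
    return min(candidates, key=lambda s: s.last_active, default=None)


tabs = [
    Session("a", State.STREAMING, 1.0),
    Session("b", State.IDLE, 5.0),
    Session("c", State.IDLE, 2.0),
]
victim = pick_eviction(tabs)
print(victim.id if victim else None)  # c
```

When every session is protected the function returns None, so the caller has to queue or reject the new request instead of evicting.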
5.2 Swarm Brain — Multi-Channel
The Swarm Brain architecture ensures that regardless of which channel the user communicates through — desktop chat tab, Slack, or future platforms — it is always the same Swarm, same memory, same context.
Figure 7: Swarm Brain — One AI, every channel, shared memory with three layers of continuity
Three Layers of Continuity
Layer | Mechanism | Scope
L1: Shared Memory | 11 context files loaded at every prompt build | All sessions (tabs + channels)
L2: Cross-Channel Session | All channels share ONE Claude conversation (--resume) | Slack + future
L3: Active Session Digest | Sibling session summaries injected into prompts | Tabs ↔ Channels (bidirectional)
Adding a new channel: Write an adapter (~250 lines implementing ChannelAdapterBase), register in the gateway, map user identity. Zero architecture change required.
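The three steps (write an adapter, register it, map identity) can be sketched as below. The text names ChannelAdapterBase but does not show its interface, so the class here is a stand-in with assumed method names, and the Gateway and DemoAdapter classes are hypothetical.

```python
from abc import ABC, abstractmethod


class ChannelAdapterBase(ABC):
    """Stand-in for the base class named in the text; methods are assumptions."""

    @abstractmethod
    def parse(self, raw: dict) -> tuple[str, str]:
        """Return (external_user_id, message_text) from a platform payload."""

    @abstractmethod
    def send(self, external_user_id: str, text: str) -> None:
        """Deliver a reply to the platform."""


class DemoAdapter(ChannelAdapterBase):
    """Toy adapter that records replies in memory instead of calling an API."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def parse(self, raw: dict) -> tuple[str, str]:
        return raw["user"], raw["text"]

    def send(self, external_user_id: str, text: str) -> None:
        self.sent.append((external_user_id, text))


class Gateway:
    """Hypothetical registry mapping channels and external IDs to one user."""

    def __init__(self) -> None:
        self.adapters: dict[str, ChannelAdapterBase] = {}
        self.identities: dict[tuple[str, str], str] = {}

    def register(self, channel: str, adapter: ChannelAdapterBase) -> None:
        self.adapters[channel] = adapter

    def map_identity(self, channel: str, external_id: str, user: str) -> None:
        self.identities[(channel, external_id)] = user

    def resolve(self, channel: str, raw: dict) -> tuple[str, str]:
        """Translate a platform payload into (swarm_user, message_text)."""
        external_id, text = self.adapters[channel].parse(raw)
        return self.identities[(channel, external_id)], text


gateway = Gateway()
gateway.register("demo", DemoAdapter())
gateway.map_identity("demo", "U123", "xg")
print(gateway.resolve("demo", {"user": "U123", "text": "status?"}))
```

Keeping platform details inside the adapter is what lets the rest of the system treat every channel as the same user talking to the same session.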
6. Interface Layer
6.1 Three-Column Command Center
The interface is designed as a single integrated system where the Chat Center orchestrates everything. The three columns are not independent panels — they are views into one unified workspace connected by drag-to-chat context injection.
Figure 8: Three-Column Command Center — SwarmWS Explorer, Chat Center, and Swarm Radar with drag-to-chat
Column | Purpose | Key Interactions
SwarmWS Explorer (left) | Persistent local workspace: Knowledge/, Projects/, .context/, DailyActivity/ | Git-tracked with ETag polling (5s). Drag files to chat for instant context. Agent reads/writes/organizes/commits directly.
Chat Center (center) | Multi-session command surface with 1–4 parallel tabs | SSE streaming, per-tab state isolation, 55+ skills, MCP tools. Controls both Explorer (write files) and Radar (create todos).
Swarm Radar (right) | Attention dashboard: ToDos, active sessions, artifacts, background jobs | Drag ToDo/artifact to chat: agent gets full work packet. Background job results appear here. Session status visible.
7. Core Engine & Growth Trajectory
The Swarm Core Engine is the meta-architecture that ties all six flywheels together. Each flywheel feeds the others, creating compound growth: memory informs context, context improves sessions, sessions trigger evolution, evolution builds skills, skills improve memory capture, and the cycle continues.
Figure 9: Swarm Core Engine — Six interconnected flywheels with compound loop and growth trajectory
Context adapts per session type, proactive gap detection, DDD auto-sync, growth metrics | In Progress (3/6)
L4 | Autonomous | Full AIDLC pipeline with checkpoint/resume, self-directed learning, human-in-the-loop judgment framework | Planned
8. Key Design Decisions & Tradeoffs
Decision | Choice | Alternative Considered | Rationale
Memory architecture | 3-layer distillation (file-based) | Vector database (RAG) | Files are git-trackable, human-readable, editable. Vector DB adds latency, opacity, and a dependency. File-based memory can be inspected, corrected, and version-controlled.
Session management | 4-component decomposition | Monolithic AgentManager | 5,428-line God Object caused 15+ bugs during v7 migration (COE). Decomposition into Router/Unit/Lifecycle/Registry enabled parallel sessions and clean error boundaries.
Context assembly | 11-file priority chain with budget | Single large system prompt | Priority-based truncation ensures identity and safety survive even under extreme context pressure. Budget management prevents context overflow crashes.
Channel architecture | Shared session (serialized) | Independent sessions per channel | "One brain" principle: anything the user says on Slack is known on every other channel too. Independent sessions fragment the agent's understanding of the user.
Skill system | SKILL.md instruction files | Compiled plugins / function registry | SKILL.md files are LLM-native: the agent reads them as natural language instructions. No compilation step, no registration API. A new skill is a markdown file.
Data storage | All local (SQLite + filesystem) | Cloud database | Zero cloud dependency for user data. Privacy by default. Works offline. No account required. User owns all their data.
Safety model | Defense-in-depth (7 layers) | Single permission gate | No single layer is sufficient. Tool logger + command blocker + sandbox + permission dialog + escalation + health hook + decision classification provide redundant protection.
Background jobs | macOS launchd | In-process cron / cloud scheduler | launchd survives app restarts, runs when app is closed, managed by OS. In-process cron dies with the app. Cloud scheduler adds dependency.
9. Competitive Positioning
SwarmAI occupies a unique position in the AI tooling landscape: it is not a code editor (Cursor/Windsurf), not an IDE (Kiro), not a CLI agent (Claude Code), and not a multi-platform connector (OpenClaw). It is an agentic operating system that optimizes for depth over breadth.
Capability | SwarmAI | Claude Code | Kiro | Cursor | OpenClaw
Persistent memory | 3-layer pipeline | CLAUDE.md (manual) | Per-project specs | Per-project | Session pruning
Context system | 11-file P0-P10 + budgets | Single prompt | Spec-driven | Codebase indexing | Standard prompt
Multi-session | 1-4 parallel tabs | 1 session | 1 session | 1 session | Per-channel
Self-evolution | 55+ skills, corrections | No | No | No | No
Autonomous pipeline | 8-stage + DDD+TDD | Manual | Spec-driven | No | No
Multi-channel | Unified brain | Terminal | IDE only | IDE only | 21+ (isolated)
Scope | All knowledge work | Coding | Coding | Coding | Messaging + skills
Core Differentiator: The Harness. Every competitor either provides a raw LLM with a chat interface (Cursor, Claude Code) or a skill marketplace with session management (OpenClaw). None provides the compound loop of context engineering + memory distillation + self-evolution + safety harness that makes an AI agent genuinely improve over time.
10. Future Roadmap
Phase | Target | Key Deliverables
L3 Completion | Q2 2026 | Growth metrics dashboard, DDD auto-sync, stale correction auto-healing, full session-type optimization
L4 Autonomous | Q3 2026 | Full AIDLC pipeline with checkpoint/resume across sessions, self-directed learning loops, judgment framework with calibrated confidence
MCP Gateway | When SDK supports | Shared MCP server instances across sessions (currently 4 sessions × 5 MCPs = 20 instances). Reduces memory from ~2.9GB to ~750MB.
Multi-User | Q4 2026 | Team workspace with shared projects, role-based access, collaborative memory (separate from personal memory)
Cross-Platform | Q4 2026 | Linux support (currently macOS + Windows). launchd → systemd adaptation for background jobs.