⬡ PROBOS Era IV: Evolution v0.4.0-phase29c 2026-04-05
6338
Total Tests
6189 py · 149 ts
pytest · vitest
445
Architecture Decisions
58
Active Agents
25/25
OSS Phases Done (Era I–III)
7/7
Northstar II Chunks
1
Open Bugs
BF-101/102
Latest Completed
Architecture captured, not yet built
Cognitive Evolution & Earned Agency
AD-357
7 reinforcement gaps addressed: multi-dimensional rewards, hindsight replay, tournament evaluation, emergent capability profiles, memetic evolution, curiosity-driven exploration, semantic Hebbian generalization. Plus Earned Agency tier progression: Ensign → Lieutenant → Commander → Senior Officer. Agency is earned, not granted.
Science Phase 30+ Architecture Captured
Captain's Yeoman — Phase 36
AD-359
Dedicated scheduling and logistics assistant for the Captain. Manages the Captain's operational workload: reminders, briefings, meeting prep, status rollups, and cross-department coordination. Like a real yeoman — handles administrative burden so the Captain can focus on command decisions.
Bridge Phase 36 Architecture Captured
Model Diversity & Neural Routing
Planned
Model Registry with catalog of available providers (Ollama, Copilot proxy, OpenAI, Anthropic, local). Hebbian-learned routing: successful completions strengthen model→task type pairings. Over time, the system routes poetry to the model that writes best poetry, code to the model that codes best.
Engineering Phase 32
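A minimal sketch of the Hebbian-learned routing idea described in this card, assuming a simple weight per (task type, model) pair. The class name `HebbianRouter`, the learning rate, and the decay value are hypothetical illustrations, not the ProbOS Model Registry API.

```python
from collections import defaultdict


class HebbianRouter:
    """Illustrative router: names and constants are assumptions, not ProbOS code."""

    def __init__(self, models, learning_rate=0.1, decay=0.05):
        self.models = list(models)
        self.learning_rate = learning_rate  # assumed value
        self.decay = decay                  # assumed value
        self.weights = defaultdict(float)   # (task_type, model) -> strength in [0, 1]

    def choose(self, task_type):
        # Pick the strongest pairing; ties fall back to registration order.
        return max(self.models, key=lambda m: self.weights[(task_type, m)])

    def record(self, task_type, model, success):
        key = (task_type, model)
        if success:
            # Successful completions strengthen the pairing, saturating at 1.0.
            self.weights[key] += self.learning_rate * (1.0 - self.weights[key])
        else:
            # Failures weaken the pairing proportionally.
            self.weights[key] -= self.decay * self.weights[key]


router = HebbianRouter(["model_a", "model_b"])
router.record("code", "model_b", success=True)
```

After one successful completion, `router.choose("code")` prefers `model_b`, while an unseen task type such as `"poetry"` still falls back to the first registered model.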
Cognitive Journal
Planned
Complete token ledger recording every LLM request/response with full context, duration, cost, and outcome. Enables replay, learning analysis, cost attribution by department, and retrospective training. The ship's complete cognitive log — every thought the ship has ever had.
Engineering Phase 32
Ward Room & IntentBus Enhancements
Planned
Ward Room: direct agent-to-agent messaging for structured deliberation. IntentBus priority lanes (critical/high/normal/background). Alert Conditions (like DEFCON levels — normal/elevated/critical) that reconfigure the entire system simultaneously. EPS compute budgeting across departments.
Operations Phase 33
The Nooplex — Model of Models
Long Horizon
Full multi-ship federation. Emergent meta-intelligence that arises from cooperative, governed agent ecosystems. Each ProbOS instance is one Cognitive Mesh. The Nooplex is the network formed when many meshes federate. General intelligence as a property of the ecosystem, not any single model.
Long Horizon Federation
Next sprint — Phase 30+
Phase 30: Self-Improvement Pipeline
Planned
Northstar III. ProbOS proposes improvements to itself based on failure analysis, capability gaps, usage patterns. Architect + Builder work autonomously, Captain approves.
Engineering High Phase 30
Phase 31: Security Team
Planned
Chief of Security agent, ThreatDetectionAgent, InputValidationAgent, AuditTrailAgent. Security Council quorum integration. Anomaly detection from runtime event stream. The Security team protects the ship's systems and data.
Security Phase 31 Medium
Phase 32 Remaining: Engineering Ops
Planned
Structural Integrity Field (invariant enforcement), Cognitive Journal, Model Diversity & Neural Routing, Ship's Telemetry (performance instrumentation). The remaining engineering infrastructure items after Northstar I/II completion.
Engineering Phase 32 Medium
Phase 33: Operations Team
Planned
Ward Room (agent messaging), IntentBus priority lanes, Alert Conditions (system-wide operational modes), EPS resource budgeting. Operations keeps the ship running efficiently at scale.
Operations Phase 33 Medium
Phase 24 Remaining: Channel Adapters
Planned
Slack, Telegram, Microsoft Teams adapters built on the ChannelAdapter ABC. Discord is done (AD-274–278). Each adapter exposes per-channel conversation history and routes requests through the full ProbOS cognitive pipeline.
Comms Phase 24 Medium
Phase 25: Persistent Task Scheduling
Planned
Tasks survive runtime restarts with checkpoint and resume. The current TaskScheduler (AD-281–284) is session-scoped. Phase 25 adds KnowledgeStore persistence so scheduled reminders and recurring tasks survive restarts.
Phase 25 Medium
Phase 26: Inter-Agent Deliberation
Deferred
Structured debate protocol between agents with conflicting perspectives. More nuanced than quorum voting — agents reason against each other's positions. Novel ideas emerge from disagreement.
Phase 26 Low
Phase 28: Meta-Learning
Deferred
System learns from its own learning — detects which feedback types cause the most improvement, which episodic patterns repeat, which agents improve fastest. Learning-to-learn at the civilization level.
Phase 28 Low
Federation Hardening
Deferred
Multi-node stability, fault tolerance, automatic peer reconnection, partition healing. Current ZeroMQ federation works for demos. Production federation requires hardened transport with retries, gossip convergence guarantees, and graceful split-brain handling.
Low
MCP & A2A Protocol Adapters
Deferred
MCP Federation Adapter (join/leave MCP-compatible networks). A2A Protocol Adapter (Google Agent-to-Agent protocol compatibility). Enables ProbOS to interoperate with the broader agent ecosystem.
Low
0 open bugs — Wave 6 COMPLETE (8/8)
7 closed
BF-095 — God Object Reduction
Closed
ontology.py (1,060 lines, 53 methods) → ontology/ package (5 files). ward_room.py (1,612 lines, 39 methods) → ward_room/ package (6 files). 7 LoD violations fixed. Dead code removed. 2 import compat tests added.
SRP B→A-
BF-094 — Sync File I/O in Async
Closed
All sync open() in async paths eliminated. _read_yaml_sync() + _write_archive_sync() + load_seed_profile_async() via run_in_executor. 3 modules fixed. 2 new tests.
Async Discipline B+→A
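The pattern behind this fix can be sketched as follows, assuming a blocking helper that is only called through `run_in_executor`. The helper names here are hypothetical and simpler than the `_read_yaml_sync()` family named above.

```python
import asyncio
import tempfile
from pathlib import Path


def _read_text_sync(path: Path) -> str:
    # Plain blocking read; intended to run only on a worker thread.
    return path.read_text(encoding="utf-8")


async def read_text_async(path: Path) -> str:
    # Hand the blocking call to the default thread pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _read_text_sync, path)


def demo() -> str:
    # Write a throwaway file, then read it back through the async wrapper.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "profile.yaml"
        path.write_text("callsign: Wesley\n", encoding="utf-8")
        return asyncio.run(read_text_async(path))
```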
BF-093 — API Boundary Validation
Closed
All raw-dict endpoints eliminated. AgentLifecycleRequest + SetCooldownRequest Pydantic models. ACM errors → HTTPException(503/409). Cooldown range 60–1800 enforced. 15 new tests.
API Validation A-→A
BF-091 — Mock Discipline Phase 2
Closed
Spec compliance 22.6% → 51.9% (+222 spec'd mocks across 19 files). 3 real bugs caught by spec= (BF-078 class): phantom generate(), get_trust(), get_trust_score() methods.
Mock Discipline C-→B
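A short illustration of why `spec=` catches phantom methods, using a hypothetical `TrustNetwork` class whose only real method is assumed to be `get_score()`. An unspec'd mock accepts any attribute, while a spec'd mock rejects names the real class does not define.

```python
from unittest.mock import MagicMock


class TrustNetwork:
    """Hypothetical stand-in; the real ProbOS class may differ."""

    def get_score(self, agent_id: str) -> float:
        return 0.5


# Without spec, a call to a method that does not exist silently succeeds.
loose = MagicMock()
loose.get_trust_score("agent-1")

# With spec, only attributes present on TrustNetwork are allowed.
strict = MagicMock(spec=TrustNetwork)
strict.get_score.return_value = 0.9

try:
    strict.get_trust_score("agent-1")
    phantom_caught = False
except AttributeError:
    phantom_caught = True
```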
BF-092 — Trust Threshold Constants
Closed
19 named constants in config.py replacing ~30 magic numbers. format_trust() utility replacing 52+ round(x,4) calls. EventEmitterMixin deduplicating 4 identical _emit() methods.
DRY B→A-
BF-090 — Exception Audit Phase 2
Closed
71 silent swallows fixed (43 logger.debug, 4 narrowed to sqlite3.OperationalError, 24 justified). 42 bare catches fixed (exc_info=True). DRY helper _safe_log_event() in feedback.py.
Exception Handling C→B+
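A sketch of the two techniques named above, narrowing a catch to `sqlite3.OperationalError` and logging with `exc_info=True`. The table name, query, and logger name are assumptions made for illustration only.

```python
import logging
import sqlite3
from typing import Optional

logger = logging.getLogger("probos.example")  # hypothetical logger name


def latest_event_name(conn: sqlite3.Connection) -> Optional[str]:
    try:
        row = conn.execute(
            "SELECT name FROM events ORDER BY id DESC LIMIT 1"
        ).fetchone()
    except sqlite3.OperationalError:
        # Narrow catch: a missing table or locked database is tolerated here.
        # Any other exception type still propagates to the caller.
        logger.debug("events query failed", exc_info=True)
        return None
    return row[0] if row else None
```

Passing `exc_info=True` keeps the traceback in the log record, so the failure is no longer silent even though the caller receives `None`.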
BF-089 — Emergent Detector Trust Anomaly False Positives
Closed
Crew-reported (Forge + Reyes). Seven rapid-fire alerts during normal duty cycles. Fixed: adaptive baselines + temporal buffer + configurable sustain window.
Emergent Detection
18 closed (BF-001–023)
BF-023 — Degraded Agent Death Spiral
Closed
LLM exceptions in the proactive loop were swallowed at DEBUG level without calling update_confidence(). Agents were stuck at ~0.185 confidence (DEGRADED) with no recovery path. Fix: (a) the exception handler now tracks failures, (b) DEGRADED→ACTIVE recovery triggers when confidence rises above 0.2. 5 tests.
Proactive Loop / BaseAgent
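The recovery path can be sketched roughly as below. Only the 0.2 threshold and the `update_confidence()` name come from the bug report; the step size, the degrade condition, and the class layout are assumptions.

```python
from enum import Enum

RECOVERY_THRESHOLD = 0.2  # from BF-023


class State(Enum):
    ACTIVE = "active"
    DEGRADED = "degraded"


class Agent:
    """Hypothetical minimal agent; not the real BaseAgent."""

    def __init__(self, confidence: float = 0.5):
        self.confidence = confidence
        self.state = State.ACTIVE

    def update_confidence(self, success: bool, step: float = 0.05) -> None:
        # Assumed fixed step; the real update rule is not shown in the report.
        delta = step if success else -step
        self.confidence = min(1.0, max(0.0, self.confidence + delta))
        if self.confidence < RECOVERY_THRESHOLD:
            self.state = State.DEGRADED  # assumed degrade condition
        elif self.state is State.DEGRADED and self.confidence > RECOVERY_THRESHOLD:
            self.state = State.ACTIVE    # recovery path added by the fix


def proactive_step(agent: Agent, llm_call) -> None:
    # Failures are tracked instead of being swallowed.
    try:
        llm_call()
        success = True
    except Exception:
        success = False
    agent.update_confidence(success)
```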
BF-022 — Crew Cannot Respond to Ship's Computer Advisories
Closed
Bridge alerts posted to All Hands with same_department=False. Earned Agency blocked all Lieutenants. Fixed in AD-424: INFORM threads skip notification, DISCUSS threads pass same_department=True.
Runtime / Earned Agency / Ward Room
BF-021 — Duty Schedule Hard Gate Missing
Closed
Agents with no duty due were still called by the proactive loop, relying on LLM to return [NO_RESPONSE]. Wesley ignored the instruction and kept producing scout reports. Fixed: skip agent entirely when no duty due (no LLM call).
Proactive / Duty Schedule
BF-020 — Discord Adapter False Success Report
Closed
Discord adapter startup reported success even when discord.py was not installed. Fixed: check adapter._started before printing success; show install command on failure.
Discord / Adapter
BF-013 — Ship's Computer Callsign Awareness
Closed
Ship's computer didn't recognize crew callsigns. "Is Wesley aboard?" returned "no agents found." Fixed: callsign fallback in _agent_info(), callsign injection into decomposer prompt.
Runtime / Callsign
BF-012 — Discord Shutdown Hang Redux
Closed
SelectorEventLoop + asyncio.to_thread() hangs on Windows. Replaced with async polling loop using asyncio.sleep(0.1).
Discord / Adapter
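One way to read this fix: instead of blocking a worker thread until shutdown is signalled, the coroutine polls a flag and yields between checks. The function below is a hypothetical sketch of that polling loop, not the adapter's actual code; the timeout parameter is an added assumption.

```python
import asyncio
import threading
from typing import Optional


async def wait_for_stop(
    stop: threading.Event,
    poll_interval: float = 0.1,
    timeout: Optional[float] = None,
) -> bool:
    """Return True once `stop` is set, or False if `timeout` elapses first."""
    loop = asyncio.get_running_loop()
    deadline = None if timeout is None else loop.time() + timeout
    while not stop.is_set():
        if deadline is not None and loop.time() >= deadline:
            return False
        # Yield to the event loop between checks; no thread is ever blocked.
        await asyncio.sleep(poll_interval)
    return True
```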
BF-008 — Dream Cycle Double-Replay After Dolphin Dreaming
Closed
Micro-dream (Tier 1) already replayed episodes incrementally; full dream re-replayed same 50. Fixed: dream_cycle() now starts with micro_dream() flush, then maintenance only.
Cognitive / Dreaming
BF-007 — Verification False Positive on Per-Pool Agent Counts
Closed
Per-pool/per-department agent counts flagged against system-wide total. Fixed: context-window analysis + known pool size whitelist in _verify_response().
Runtime / Verification
BF-004 — Transporter HXI Visualization Not Rendered
Closed
Transporter Pattern WebSocket events fire correctly but IntentSurface.tsx had no rendering block for transporterProgress. Chunk status panel added.
Engineering / HXI
BF-001 — Self-Mod False Positive on Knowledge Questions
Closed · AD-348
Knowledge questions ("who is Alan Turing?") incorrectly triggered capability_gap and self-mod pipeline. Fixed by updating prompt rules to classify well-known factual questions as conversational responses, not task gaps.
Cognitive / Prompt
BF-002 — Agent Orbs Escaping Pool Group Spheres
Closed · AD-349
Newly added agents appeared at origin (0,0,0) center instead of their correct crew cluster. The agent_state WebSocket handler was calling computeLayout() without persisted pool group data. Fixed by persisting poolToGroup and poolGroups from state_snapshot in Zustand.
HXI / useStore.ts
BF-003 — Diagnostician Bypassing VitalsMonitor
Closed · AD-350
DiagnosticianAgent was answering diagnose_system intents from LLM training memory instead of fetching live metrics from VitalsMonitorAgent. Fixed by adding scan_now() to VitalsMonitor and overriding perceive() in Diagnostician to fetch live metrics proactively.
Medical Team
3 closed
BF-110 — Game Board Invisible to Agents
Closed
Agents could not see the game board in proactive context because get_recent_activity() returns only top-level threads, not replies. Fixed: active game state is now injected directly from RecreationService into the proactive context.
Recreation / Proactive
BF-109 — Qualification Probe Param Key Mismatch
Closed
_send_probe() used the "message" key but perceive() reads "text", so all prior qualification probe results were unreliable. One-line fix.
Qualification / Probe
BF-108 — LLM Unreachable — No Runtime Visibility
Closed
Fixed: MockLLMClient.get_health_status() now reports mock/offline status, a runtime.llm_is_mock property was added, the chat endpoint shows an explicit offline message, and self-mod is suppressed when the mock client is active.
Runtime / LLM
🏥 Medical CMO: DiagnosticianAgent · Sickbay · Chief Medical Officer COMPLETE
DiagnosticianAgent (CMO)
Done
Chief Medical Officer. Diagnoses system issues, routes to specialists, generates post-mortems. Override perceive() fetches live vitals from VitalsMonitor for evidence-based diagnosis.
AD-285–289
VitalsMonitorAgent
Done
Continuous health monitoring: CPU, memory, pool sizes, error rates. scan_now() for on-demand reads. Issues medical_alert intents when thresholds are crossed. The ship's biosensors.
AD-290
CounselorAgent
Done
Cognitive wellness monitoring. Event subscriptions (trust, circuit breaker, dreams, self-monitoring, zones, peer repetition, cascade confabulation). SQLite profile persistence, wellness sweeps, therapeutic DMs, directive issuance, cooldown authority. Clinical Authority standing orders.
AD-503, 505, 506, 567f
⚙️ Engineering Chief Engineer: BuilderAgent · Main Engineering In Progress
BuilderAgent (Chief Engineer)
Done
Executes BuildSpecs via CREATE/MODIFY file blocks. Git branch+commit pipeline. Test-fix loop (2 LLM retry attempts). Code Review gate (AD-341). Transporter Pattern for parallel large builds. Visiting Officer (Copilot SDK) integration.
AD-302–303, 313–314, 336, 351–355
Northstar I: Architect + Builder Pipeline
Done · 18/18
Full design-and-build pipeline. Architect perceives 7 context layers → proposes BuildSpec → Captain approves → Builder executes. API + HXI wired. Self-knowledge grounding (SystemSelfModel, pre-response verification, introspection delegation).
AD-302–320
Northstar II: Transporter Pattern
Done · 7/7
Parallel chunk execution for large builds. Blueprint → ChunkDecomposer → parallel Matter Stream → ChunkAssembler → Heisenberg Compensator (validator). Structured Information Protocol from LLM×MapReduce. HXI visualization.
AD-330–336
Quality Gates & Standing Orders
Done
Constitution system: Federation → Ship → Department → Agent tiers. Commit gate blocks failures. Test-fix loop. CodeReviewAgent soft gate. BuildFailureReport with resolution API. Builder escalation hook.
AD-337–347
Per-Tier Temperature & Top-P
Done
Independent temperature and top_p settings per LLM tier (fast/standard/deep). Configurable via system.yaml. Architect (deep) uses lower temperature for precise analysis. Fast tier can use higher temperature for creative exploration.
AD-358
Cognitive Journal
Done
Complete token ledger for all LLM requests/responses. Traceability (intent_id, response_hash). Grouped queries by model/tier/agent. Decision point anomaly detection. Selective Encoding Gate filters memory noise. The ship's complete cognitive log.
AD-431–433
Model Diversity & Neural Routing
Next
Model Registry + Hebbian-learned routing between providers. Best model for each task type emerges from performance history.
Phase 32
Structural Integrity Field
Planned
Proactive invariant enforcement via continuous runtime health assertions. Detects architectural violations before they become bugs.
Phase 32
🔬 Science CSO / First Officer: ArchitectAgent · Dual-hatted · Science Lab In Progress
ArchitectAgent (CSO + First Officer)
Done
7-layer context assembly. Fast-tier LLM file selection (8 most relevant from 20 candidates). Import graph expansion (up to 12 files). Proposal validation with 6 checks. Pattern recipes for common operations. Deep tier (Opus).
AD-306–312, 315–316a
CodebaseIndex — Ship's Technical Manual
Done
AST-based structural self-awareness. Import graph (forward + reverse). Caller analysis, test discovery, full API surface. Word-level keyword scoring. Doc sections indexed. The ship's complete technical library.
AD-297–300, 311–312, 315
Visiting Officer (Copilot SDK)
Done
CopilotBuilderAdapter wraps GitHub Copilot SDK as a visiting builder. ProbOS MCP tool server exposes codebase knowledge. Hebbian apprenticeship routing: native vs visiting builder competition. Standing Orders injected as system instructions.
AD-351–355
ScoutAgent — Intelligence Gathering
Done
Daily GitHub ecosystem scan. Classifies discoveries as absorb/visiting_officer. Credibility + reliability scoring (GPT Researcher). gh CLI for 5000 req/hr. Discord digest + Bridge notifications. /scout command.
AD-394–395
Science Analytical Pyramid
Done
Three new Science crew: Data Analyst (Rahda) — telemetry baselines + anomaly detection. Systems Analyst (Dax) — emergence analysis + cross-system synthesis. Research Specialist (Brahms) — directed investigation + formal reports. Modeled after USN Ops Specialist / ORSA / NRL Scientist.
AD-560
CredentialStore
Done
Ship's Computer credential resolution service. Priority chain: config → env → CLI. 5-min cache TTL, audit logging, dept-scoped access. Extensions register via register(). /credentials for Captain visibility.
AD-395
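A rough sketch of the config → env → CLI priority chain with a 5-minute cache. The class name, the upper-cased environment variable convention, and the injected `cli_lookup` callable are assumptions for illustration; audit logging and department scoping are omitted.

```python
import os
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 300  # 5-minute TTL from the card


class CredentialResolver:
    """Hypothetical resolver; not the real CredentialStore interface."""

    def __init__(
        self,
        config: Dict[str, str],
        cli_lookup: Callable[[str], Optional[str]],
        clock: Callable[[], float] = time.monotonic,
    ):
        self._config = config
        self._cli_lookup = cli_lookup  # e.g. a wrapper around a CLI tool
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> Optional[str]:
        cached = self._cache.get(name)
        if cached and self._clock() - cached[1] < CACHE_TTL_SECONDS:
            return cached[0]
        # Priority chain: config first, then environment, then CLI.
        value = (
            self._config.get(name)
            or os.environ.get(name.upper())  # assumed naming convention
            or self._cli_lookup(name)
        )
        if value is not None:
            self._cache[name] = (value, self._clock())
        return value
```

Injecting the clock keeps the TTL behaviour easy to exercise without waiting five minutes.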
Self-Improvement Research Pipeline
Planned
Ship's Counselor + Architect analyze failure patterns, propose targeted improvements, queue proposals for Captain review. ProbOS proposes its own evolution.
Phase 30
🛡️ Security Chief of Security: TBD · Tactical Station Phase 31
RedTeamAgent
Done
Independent verification agent for consensus operations. Validates HTTP URLs, file paths, command safety. Runs parallel to execution, not blocking. Core tier.
Existing
Chief of Security
Planned
Senior security officer. Coordinates threat response, manages Security Council quorum, escalates to Captain when breach detected. Phase 31 defines this role.
Phase 31
ThreatDetectionAgent
Planned
Real-time anomaly detection on runtime event stream. Flags unusual patterns: privilege escalation attempts, resource abuse, unexpected cross-pool communication.
Phase 31
InputValidationAgent
Planned
Intercepts all external inputs (user messages, channel adapters, API calls) and validates against security policy before dispatch. Prompt injection defense layer.
Phase 31
AuditTrailAgent
Planned
Compliance-ready audit logging. Every decision, every consensus vote, every self-mod recorded with chain-of-custody. HIPAA/SOC2 ready.
Phase 31
🎯 Mission Control HXI Operations Layer · Captain's situational awareness surface Next · Glass Bridge
Pool Group Registry
Done
Agent pools organized into named crew groups (medical, engineering, science, etc.). Team introspection, HXI sub-clusters with gravitational layout, billboarding labels.
AD-291–296
AgentTask Data Model
Done · AD-316
Persistent task tracking attached to TaskNodes. Foundation for Kanban, Activity Drawer, and long-running multi-session work.
Phase 34
Activity Drawer
Done · AD-321
Slide-out panel with real-time agent task visibility. Three sections (Needs Attention, Active, Recent), task cards with department colors, step progress, action buttons.
Phase 34
Kanban Task Board
Done · AD-322
Visual task management for ongoing autonomous work. Queue → Active → Review → Done columns.
Phase 34
Notification Queue
Done · AD-323
LCARS-style notification queue with type-colored cards, click-to-ack, mark-all-read. Bell button with unread badge.
Phase 34
Orb Hover Details
Done · AD-324
Per-agent hover preview: current task, step label, elapsed time, progress. Amber pulse on attention-needed orbs.
Phase 34
Unified Bridge Panel
Done · AD-387
Single BRIDGE button replacing 3 panels. 5 priority sections (Attention, Active, Notifications, Kanban, Recent). Adaptive main viewer (canvas/kanban). ViewSwitcher.
Phase 34
Glass Bridge (5 phases)
Done · AD-388–392
All 5 phases complete. Glass overlay, DAG visualization, ambient intelligence, cyberpunk atmosphere, adaptive bridge with trust-driven reveal and Captain's Gaze.
Glass Bridge
🔧 Operations Ops Chief: TBD · Resource management, scheduling, coordination Phase 33
Task Scheduler (Session)
Done
In-session asyncio background timer. One-shot and recurring tasks. SchedulerAgent upgraded to dispatch via scheduler. Channel delivery for scheduled results.
AD-281–284
Callsign Addressing
Done
@callsign targeted dispatch via IntentBus.send(). CallsignRegistry from crew_profiles. 1:1 crew sessions with sovereign episodic memory. /bridge command. Universal — Captain and agents use same syntax.
AD-397
Ward Room
Phase 33
Direct agent-to-agent messaging. Private channel for officers to deliberate without broadcasting to all. Foundation for structured inter-agent debate.
Phase 33
IntentBus Priority Lanes
Phase 33
Critical/high/normal/background priority queues on the IntentBus. Medical alerts preempt background searches. Emergency ops get immediate routing.
Phase 33
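Since this item is still planned, the following is only a sketch of how four priority lanes could be modeled with a heap. The `Lane` and `PriorityIntentQueue` names are hypothetical; the lane names follow the card.

```python
import heapq
import itertools
from enum import IntEnum


class Lane(IntEnum):
    CRITICAL = 0
    HIGH = 1
    NORMAL = 2
    BACKGROUND = 3


class PriorityIntentQueue:
    """Illustrative queue; not the IntentBus implementation."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a lane

    def publish(self, intent: str, lane: Lane = Lane.NORMAL) -> None:
        heapq.heappush(self._heap, (lane, next(self._counter), intent))

    def next_intent(self):
        # Lowest lane value wins, so CRITICAL preempts everything queued.
        return heapq.heappop(self._heap)[2] if self._heap else None


bus = PriorityIntentQueue()
bus.publish("web_search", Lane.BACKGROUND)
bus.publish("summarize", Lane.NORMAL)
bus.publish("medical_alert", Lane.CRITICAL)
bus.publish("translate", Lane.NORMAL)
order = [bus.next_intent() for _ in range(4)]
```

In this sketch the medical alert is delivered first even though it was published third, and the two normal-lane intents keep their publish order.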
Alert Conditions
Phase 33
System-wide operational modes (normal/elevated/critical) that reconfigure agent behavior simultaneously. Like DEFCON levels — the whole ship responds to Alert Condition Red.
Phase 33
EPS Resource Budgeting
Phase 33
LLM token allocation across departments. Engineering gets more tokens during builds. Medical gets priority during health emergencies. Prevents one greedy agent from starving others.
Phase 33
📡 Communications Comms Chief: TBD · Channel adapters, external interfaces Partial
Discord Adapter
Done
Full Discord bot with mention filtering, channel filtering, per-channel conversation history, message chunking, and ChannelAdapter ABC. First external channel integration.
AD-274–278
Slack Adapter
Planned
Slack bot using the same ChannelAdapter ABC. Slash command support, thread-aware conversation history, workspace filtering.
Phase 24
Telegram Adapter
Planned
Telegram bot adapter. Bot commands, inline query support, group and private chat routing.
Phase 24
Microsoft Teams Adapter
Potential
Teams bot via the Bot Framework SDK. Enterprise channel integration — matches where enterprise customers already work.
Potential
Open core model — Apache 2.0 everything including federation. Revenue from ProbOS Cloud, Enterprise Overlay, Agent Marketplace, and outcome-based pricing (10–15% of measured savings). Federation must be free — the Nooplex requires a population.
Business Intelligence Pack
Tier 1
Pre-configured bundles of domain agents (financial analysis, sales intelligence, operations monitoring) powered by the bundled agent suite. Domain-specific corpus injection, custom HXI dashboard panels. First revenue product using OSS bundled agents.
Agent Marketplace 6 months
ROI Analytics / HXI Value Dashboard
Tier 1
"If our agents don't save you money, you don't pay." Automated ROI measurement dashboard. Tracks time saved per task, cost per resolution, decisions accelerated. The invoice IS the proof of value. Outcome-based pricing: 10–15% of measured savings.
HXI 6 months
Agent Marketplace Hub
Tier 1
Upload, discover, and monetize domain agents. Commission model on paid agents (15–20%). Verification badge for agents passing automated quality tests. Long-tail coverage: the marketplace handles every domain ProbOS core doesn't cover.
8 months
Enterprise Overlay (probos-enterprise)
Tier 1
RBAC, SSO (SAML/OIDC), private Nooplex federation, admin dashboard, audit trail. The overlay package sits on top of OSS core — does NOT fork it. Deployment: 5–500 nodes internally. Same federation protocol, private discovery.
Enterprise 10 months
Deployment Accelerator
Tier 1
ProbOS Cloud — one-click managed hosting. Auto-scaling, 24/7 uptime, automatic updates, managed knowledge store backup. Revenue: SaaS hosting fees + usage-based compute billing. Removes all infrastructure burden from users.
Cloud 12 months
Alternative Storage Backends
Tier 2
S3, Azure Blob, Google Cloud Storage, PostgreSQL as KnowledgeStore backends. Enterprise customers demand their own storage. Connector SDK so partners can build their own backends.
Edge Deployment
Tier 2
On-device ProbOS for air-gapped or latency-sensitive environments: healthcare (HIPAA), finance (FedRAMP), manufacturing floor. Smaller model tier with local-first KnowledgeStore.
Data Migration Agent
Tier 2
Import existing workflow configurations from AutoGPT, CrewAI, LangChain, and Semantic Kernel. "Bring your agents to ProbOS." Captures market share from users frustrated with other frameworks.
Compliance Audit Trail
Tier 2
HIPAA/SOC2/ISO27001 certified audit logging. Every decision with chain-of-custody. Integrates with SIEM systems. Enterprise security teams' first requirement before deployment approval.
Self-Improvement as a Product
Tier 2
Managed self-mod pipeline: ProbOS Cloud continuously analyzes customer usage patterns, proposes improvements, and (with approval) deploys better agents. The system gets smarter for you, not just with you.
VR HXI
Tier 3
Immersive 3D cognitive mesh visualization. Captain physically walks through their agent fleet. Steam release.
Mobile App
Tier 3
ProbOS Captain's Bridge for iOS/Android. Notification-driven async command. Monitor your ship from anywhere.
Federated Enterprise Network
Tier 3
Private Nooplex for multi-site enterprises. Each office/department runs its own ship. Fleet Admiral (IT) sets policy. Ships federate internally via private discovery.
Twitch Integration
Tier 3
Live AI coding/reasoning streams. Audience watches the cognitive mesh work in real time through the HXI. Chat commands influence agent behavior.
Content Station
Tier 3
Automated Substack/YouTube/LinkedIn publishing pipeline. Research → draft → review → publish, all through ProbOS agents with Captain approval gates.
Era I: Genesis Complete v0.1.0 · ADs 1–100 · Phases 1–9
Agent substrate — BaseAgent, pools, spawner, registry, heartbeat
Consensus & trust — Beta(α,β) trust network, confidence-weighted quorum
Mesh layer — IntentBus pub/sub, Hebbian routing, capability matching
Self-modification pipeline — CodeValidator, SandboxRunner, AgentDesigner, CognitiveAgent
Episodic memory + Dreaming — ChromaDB, three-tier dream cycles, workflow cache
Federation — FederationBridge, ZeroMQ, NodeSelfModel gossip, multi-node launch
Persistent knowledge store — Git-backed KnowledgeStore, warm boot, agent identity
Feedback-to-learning loop — Correction detector, AgentPatcher, Shapley attribution
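The Beta(α,β) trust and confidence-weighted quorum named in the list above can be illustrated with a small sketch. The uniform Beta(1,1) prior, the unit update per outcome, and the 0.6 quorum threshold are assumptions, not documented ProbOS values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BetaTrust:
    alpha: float = 1.0  # prior pseudo-count of successes (assumed)
    beta: float = 1.0   # prior pseudo-count of failures (assumed)

    def record(self, success: bool) -> None:
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Expected value of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def quorum_passes(votes: List[Tuple[bool, float]], threshold: float = 0.6) -> bool:
    """votes: (approve, confidence) pairs; approval is weighted by confidence."""
    total = sum(conf for _, conf in votes)
    if total == 0:
        return False
    approving = sum(conf for ok, conf in votes if ok)
    return approving / total >= threshold


trust = BetaTrust()
for outcome in (True, True, True, False):
    trust.record(outcome)
```

With three successes and one failure on a uniform prior, the posterior is Beta(4, 2), whose mean is 4/6.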
Era II: Emergence Complete v0.2.0 · ADs 101–200 · Phases 10–16
CognitiveAgent base class — instructions-first LLM-driven reasoning agents
Domain-aware skill attachment — skills route to best-match cognitive agent
DAG proposal mode + /plan /approve /reject — human-in-the-loop
Dependency resolver — uv add for designed agents; expanded import whitelist
Semantic knowledge layer — 5 ChromaDB collections, cross-type fan-out search
Emergent behavior detection — TC_N, cooperation clusters, trust anomalies
Era III: Product Complete v0.3.0 · ADs 201–261 · Phases 17–23
Bundled agent suite — 10 agents: WebSearch, Weather, News, Translate, Summarizer, Calculator, Todo, Notes, Scheduler, PageReader
Distribution — pip install probos, probos init, probos serve, FastAPI + WebSocket API
HXI MVP — Three.js/R3F canvas, trust color coding, Hebbian bezier curves, bloom animations
Discord bot adapter — ChannelAdapter ABC, per-channel history, chunked replies
Self-assessment fixes — self-knowledge grounding (AD-262–273), per-domain rate limiter
Era IV: Evolution Active Phase 30 · The Ship Evolves
AD-274–290: Discord + Medical Team (CMO, Diagnostician, VitalsMonitor)
AD-291–301: Crew Architecture (pool groups, rank structure, CodebaseIndex Phase 29c)
AD-302–320: Northstar I — ArchitectAgent + BuilderAgent + API + HXI (18/18)
AD-321–329: Northstar I quality, HXI resilience, Vitest component tests
AD-330–336: Northstar II — Transporter Pattern parallel build (7/7)
AD-337–342: Quality Gates — Standing Orders, Code Review, /ping, /orders
AD-343–347: Builder Failure Escalation — BuildFailureReport, resolution API, HXI card
AD-348–350: Bug fixes BF-001, BF-002, BF-003 — all closed
AD-351–355: Visiting Officer — Copilot SDK, MCP tools, Hebbian apprenticeship
AD-358: Per-Tier Temperature & Top-P Tuning — independent per-tier LLM params
AD-360–386: Cognitive Evolution — 11 ADs across trust, Hebbian, consensus, memory
AD-316, 321–324: Mission Control — AgentTask, Activity Drawer, Kanban, Notifications, Orb Hover
AD-387: Unified Bridge — single BRIDGE button, 5 priority sections, adaptive main viewer
AD-388–392: Glass Bridge — all 5 phases complete (glass overlay, DAG viz, ambient intelligence, cyberpunk atmosphere, adaptive bridge)
BF-004–008: All bugs closed (verification false positive, dream double-replay, etc.)
AD-393–396: Personality Activation, ScoutAgent, CredentialStore, Quality Hardening
AD-397: Callsign Addressing — @callsign targeted dispatch, 1:1 crew sessions, sovereign memory
AD-398–402: Three-tier agent taxonomy, cross-layer cleanup, structured output, behavioral eval
AD-403–405: Memory contradiction detection, Windows test fixes, DAG checkpointing
AD-406–408: Agent Profile Panel, Ward Room fabric (4 phases), Dynamic Assignment Groups
AD-410–411: Bridge Alerts (autonomous crew discussion), EmergentDetector deduplication
AD-413: Fine-grained reset scope + Ward Room awareness in proactive loop
Phase 25a: Persistent Task Engine — SQLite-backed scheduled tasks, cron support, DAG resume
AD-419: Agent Duty Schedule — Plan of the Day, duty tracker, proactive loop integration
AD-417: Dream Scheduler Proactive Awareness — truly_idle gate, EmergentDetector noise reduction
AD-420–423: Tool Taxonomy & Registry design (8 categories, LOTO, permissions, department scoping)
AD-424–427: Ward Room thread classification, active browsing, endorsement activation, ACM core design
BF-012–021: 10 bugs closed (Discord shutdown, callsigns, adapter startup, duty hard gate, etc.)
AD-424: Ward Room Thread Classification — INFORM/DISCUSS/ACTION modes, responder cap, BF-022 closed
AD-416: Ward Room Archival & Pruning — 3-tier retention, JSONL archival, background prune loop
AD-430a: Experiential Memory Write Paths — proactive think + Ward Room episode storage
AD-430b: HXI 1:1 Conversation Memory — history passing, episode storage, cross-session recall
AD-430c: Memory Recall + Act-Store Hook — recall between perceive/decide, universal post-action episodes
AD-415: Proactive Cooldown Persistence — KnowledgeStore write-through, restore on boot
BF-027/028: Memory Recall Hardening — threshold fix, recent_for_agent fallback across all recall sites
AD-431: Cognitive Journal — LLM reasoning trace service, token accounting, latency tracking
BF-029/030: Ward Room recall quality + execute_fetchone fix
AD-433: Selective Encoding Gate — biologically-inspired memory filtering at store boundaries
AD-432: Cognitive Journal Expansion — traceability (intent_id, response_hash), grouped queries, decision points
AD-412: Crew Improvement Proposals Channel — structured proposals, Captain endorse/shelve workflow
BF-031: Cognitive Journal schema migration ordering fix
AD-418: Post-Reset Routing Degradation — agent_hint + reset warnings
AD-426: Ward Room Endorsement Activation — agent endorsements, trust bridge (0.05), top-sort browsing
AD-428: Agent Skill Framework — SkillCategory/ProficiencyLevel enums, SkillRegistry, AgentSkillService, SQLite persistence, 6 REST endpoints
BF-032: Proactive observation self-reference loop — self-post filter, Jaccard similarity gate, prompt instruction
AD-435: Restart Announcements — shutdown/startup Ward Room notifications, /quit reason threading
BF-033: Agent profile card Episodes and uptime fields wired
BF-034: Post-reset trust anomaly false positives — cold-start detection + EmergentDetector suppression
AD-427: ACM Core Framework — lifecycle state machine, consolidated profiles, auto-onboarding, 5 REST endpoints
AD-436: HXI Bridge System Panel + Orbital Notification Redesign — service status, shutdown controls, electron notifications
AD-437: Ward Room Action Space — structured endorsements, reply actions, rank-gated action space, Communication PCC integration
AD-441b: Ship Commissioning — Genesis Block with ShipBirthCertificate, W3C DID + VC identity infrastructure
AD-441c: Asset Tags for Infrastructure/Utility + Boot Sequence Fix — two-tier identity, deferred crew certificates
BF-040: Identity System Hardening — 13 findings (input validation, ledger lock, eager genesis, chain verification)
BF-091: Mock Discipline Phase 2 — spec compliance 22.6% → 51.9%, 3 real phantom-method bugs caught
BF-092: Trust Threshold Constants — 19 named constants, format_trust() utility, EventEmitterMixin dedup
BF-090: Exception Audit Phase 2 — 71 silent swallows + 42 bare catches fixed across 32 files
BF-089: Emergent Detector False Positives — adaptive baselines + temporal buffer, crew-reported fix
AD-542: Abstract Database Connection — ConnectionFactory Protocol, 12 modules refactored, Cloud-Ready Storage unblocked
Phase 30: Self-Improvement Pipeline (Northstar III)
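BF-032 in the list above mentions a Jaccard similarity gate for suppressing repeated observations. A minimal word-level sketch follows; the whitespace tokenization and the 0.8 threshold are assumptions.

```python
from typing import Iterable


def jaccard(a: str, b: str) -> float:
    # Word-set overlap: |A ∩ B| / |A ∪ B|.
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_repetitive(candidate: str, recent_posts: Iterable[str], threshold: float = 0.8) -> bool:
    # Block a post that is nearly identical to something recently said.
    return any(jaccard(candidate, post) >= threshold for post in recent_posts)
```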
Era V: Civilization Planned Phases 31–36 · The Ship Becomes a Society
Security Team — threat detection, sandboxing, secrets, egress policy (AD-455–456)
Engineering — telemetry, memory architecture, model diversity, cognitive JIT (AD-457–464)
Workforce Scheduling Engine — core data model, assignment engine (AD-496 ✓), HXI Scrumban (AD-497 ✓), Work Type Registry (AD-498 ✓)
AD-531: Episode Clustering — agglomerative clustering during dream cycles, dead strategy_extraction.py removed, EpisodeCluster dataclass, 40 tests
AD-533: Procedure Store — Hybrid store (Ship's Records + SQLite DAG + ChromaDB), quality metrics, version DAG, dream cycle integration, 49 tests
AD-532: Procedure Extraction — LLM-assisted extraction from success clusters, Procedure/ProcedureStep schema, AD-541b READ-ONLY framing, 29 tests
AD-534: Replay-First Dispatch — _check_procedural_memory() in decide(), zero-token replay, quality metrics dispatch, negative procedure guard, health diagnosis (log-only), 35 tests. MINIMUM VIABLE SLICE COMPLETE.
AD-532b: Procedure Evolution (FIX/DERIVED) — evolve_fix_procedure() + evolve_derived_procedure(), shared diagnose_procedure_health(), anti-loop guard (72h), Dream Step 7b, DRY helpers, 48 tests
AD-532c: Negative Procedure Extraction — extract_negative_procedure_from_cluster() with contradiction enrichment (AD-403), Dream Step 7c, DreamReport.negative_procedures_extracted, 31 tests
AD-532d: Multi-Agent Compound Procedures — ProcedureStep.agent_role, _COMPOUND_SYSTEM_PROMPT, extract_compound_procedure_from_cluster(), Dream Step 7 multi-agent routing, replay role annotations, 30 tests
AD-532e: Reactive & Proactive Triggers — confirm_evolution_with_llm() gate, evolve_with_retry() wrapper, TASK_EXECUTION_COMPLETE event + reactive handler, proactive_procedure_scan() (Tier 1.5), _attempt_procedure_evolution() DRY helper, 43 tests
AD-534b: Fallback Learning — metric semantics fix, near-miss capture (4 types), service recovery (_decide_via_llm + _run_llm_fallback), PROCEDURE_FALLBACK_LEARNING event + queue, evolve_fix_from_fallback(), Dream Step 7d, 68 tests
AD-534c: Multi-Agent Replay Dispatch — ProcedureStep.resolved_agent_type, compound detection, _execute_compound_replay() orchestrator, _resolve_step_agent() resolution chain, zero-token _handle_compound_step_replay(), sequential dispatch, unavailability fallback, 54 tests
AD-535: Graduated Compilation Levels — Five Dreyfus levels (Novice→Guided→Validated→Autonomous→Expert), trust-clamped, consecutive_successes promotion, demotion to Level 2, PROCEDURE_MIN_COMPILATION_LEVEL=2, postcondition validation, 62 tests. 449 total Cognitive JIT tests.
AD-536: Trust-Gated Procedure Promotion — ProcedureCriticality enum, classify_criticality(), two-tier approval routing (dept chief / Captain), 6 promotion columns + migration, Level 5 Expert gating, /procedure shell commands, API endpoints, rejection learning. 64 tests. 460 total Cognitive JIT tests.
AD-537: Observational Learning — Three learning pathways: observation (Ward Room dream Step 7e, Level 1 entry), teaching (Level 5 promoted via DM, Level 2 entry), direct (existing). COMPILATION_MAX_LEVEL=5. Procedure.learned_via/learned_from fields. extract_procedure_from_ward_room_thread(). /procedure teach + observed commands. API endpoints. 52 tests. 512 total Cognitive JIT tests.
AD-538: Procedure Lifecycle Management — Ebbinghaus decay, archival, ChromaDB dedup, merge. 53 tests. 565 total Cognitive JIT tests.
AD-539: Gap → Qualification Pipeline — Multi-source gap detection, classification, Skill Framework bridge, qualification triggering, progress tracking. 53 tests. 618 total Cognitive JIT tests. Cognitive JIT pipeline COMPLETE (9/9).
BF-099: Trust Engine Concurrency Safety — asyncio.Lock, BEGIN IMMEDIATE transactions, WAL mode, dream consolidation routed through record_outcome(), shutdown race fix. 18 tests.
AD-558: Trust Cascade Dampening — Three-layer trust protection: progressive dampening (geometric 1.0/0.75/0.5/0.25), hard trust floor (0.05), network circuit breaker (M agents × N departments). Event emission centralized. Counselor cascade integration. 45 tests.
AD-557: Emergence Metrics — Information-theoretic collaborative intelligence measurement via PID (Williams-Beer I_min). Pairwise synergy, Coordination Balance, ToM effectiveness, Hebbian-synergy correlation, groupthink/fragmentation risk detection. Dream Step 9. Pure Python math. 57 tests.
AD-560: Science Department Expansion — Analytical Pyramid. Three new crew: Data Analyst (Rahda), Systems Analyst (Dax), Research Specialist (Brahms). Organization ontology, crew profiles, standing orders, department protocols, skills templates, agent classes, registration. 57 tests.
BF-101/102: Crew Identity & Self-Awareness — _resolve_callsign() with birth cert fallback, commissioning awareness in temporal context (age < 300s), cold-start system note in ward_room_notification, Ship's Computer auto-welcome for new crew on warm boot. BF-103 DROPPED (misdiagnosis). 24 tests.
AD-540: Memory Provenance Boundary — SHIP MEMORY boundary markers + [observed]/[training]/[inferred] standing order
AD-541: Memory Integrity MVP — MemorySource enum, EventLog verification at recall, reliability hierarchy standing order
AD-541b–f: Memory Consolidation Integrity (6/6 pillars) — reconsolidation protection, spaced retrieval therapy, guided reminiscence, content hashing (SHA-256), eviction audit trail (append-only SQLite)
BF-103: Episodic Memory Agent ID Mismatch — sovereign ID normalization + startup migration. 16 tests.
AD-550–555: Notebook Quality Pipeline (6/6) — dedup/read-before-write, dream Step 7g consolidation, self-repetition detection, quantitative baseline auto-capture, cross-agent convergence detection, quality metrics & dashboarding
Cognitive Self-Regulation Wave COMPLETE (7/7) — temporal context (AD-502), Counselor activation (AD-503), auto-assessment (AD-495), self-monitoring (AD-504), therapeutic intervention (AD-505), graduated response (AD-506a), peer repetition & tier credits (AD-506b)
AD-566a–f: Qualification Program (6/6) — psychometric infrastructure, Tier 1 baselines (4 probes), Tier 2 domain tests (5 probes), drift detection pipeline, Tier 3 collective tests (5 probes), /qualify shell command
BF-104: Display Crew Agent Count — registry.crew_count(), shell/status/API shows crew not total services
AD-567a: Episode Anchor Metadata — AnchorFrame dataclass (10 fields, 5 dimensions), 15 creation sites wired, ChromaDB serialization
AD-567b: Anchor-Aware Recall — RecallScore composite scoring, FTS5 keyword sidecar, anchor context headers, SECONDHAND source, recall_weighted() API
AD-567c: Anchor Quality & Integrity — Johnson-weighted confidence, RPMS gating, per-agent AnchorProfile, SIF integrity checks, drift classification (absorbs AD-567e)
AD-567d: Anchor-Preserving Dream Consolidation — provenance composition through dream pipeline (absorbs AD-559), ACT-R activation lifecycle (absorbs AD-462b), ActivationTracker, dream Step 12 pruning, micro-dream replay. 31 tests.
AD-567f: Social Verification Protocol — cross-agent claim verification, corroboration scoring, cascade confabulation detection. Privacy-preserving (metadata, not content). Anchor independence discriminates corroboration from cascade. Ward Room integration, Bridge Alerts, Counselor DMs. 28 tests. (absorbs AD-462d)
AD-567g: Cognitive Re-Localization — structured orientation at boot (cold start/warm boot/proactive supplement). OrientationService, OrientationContext, derive_watch_section(). Anchor field gap fixes. Subsumes BF-034 cold-start note. Final AD in Memory Anchoring lineage. 28 tests.
BF-108: LLM Unreachable — No Runtime Visibility — MockLLMClient.get_health_status() reports mock/offline, runtime.llm_is_mock property, chat endpoint shows explicit offline message, self-mod suppressed when mock
BF-109: Qualification Probe Param Key Mismatch — _send_probe() used "message" key but perceive() reads "text". All prior probe results unreliable. One-line fix.
AD-526a: Social Channels + Tic-Tac-Toe — Recreation & Creative channels, GameEngine protocol, TicTacToeEngine, RecreationService, proactive CHALLENGE/MOVE actions (Lieutenant+ gated), Ship's Records game recording. 10 files, 47 tests.
AD-570: Anchor-Indexed Episodic Recall — 4 anchor fields promoted to ChromaDB metadata, recall_by_anchor() API (enumeration + semantic re-ranking), one-time migration. 2 files, 23 tests.
BF-110: Game Board Invisible to Agents — get_recent_activity() only returns top-level threads, not replies where board updates are posted. Fixed by injecting active game state (board, valid moves, turn indicator) directly from RecreationService into proactive context.
Communications — channel adapters, mobile PWA, voice interaction (AD-472–474)
Naval Organization — qualifications, 3M, damage control, SORM (AD-477)
Captain's Yeoman — personal AI assistant front door (AD-359, Phase 36)
Era VI: The Nooplex · Future · Long Horizon · Emergent meta-intelligence from federated mesh
Federation hardening — multi-node stability, partition healing, gossip convergence
The Nooplex — emergent meta-intelligence from federated mesh population
Multi-user / multi-tenant — multiple Captains on the bridge
Agent Sharing Ecosystem — decentralized agent marketplace
Most recent first
BF-110
Game Board Invisible to Agents
0 tests
get_recent_activity() only returns top-level Ward Room threads, not replies where board updates are posted. Agents couldn't see the game board during proactive cycles. Fixed by injecting active game state (board rendering, valid moves, turn indicator) directly from RecreationService into proactive context in _gather_context(), rendered in _compose_prompt(). 2 files modified.
AD-526a
Social Channels + Tic-Tac-Toe
47 tests
Agent recreation framework. Recreation & Creative default Ward Room channels with auto-subscription. GameEngine protocol (7 methods, @runtime_checkable) + TicTacToeEngine (9-cell board, win/draw detection, ASCII rendering). RecreationService (game lifecycle, thread routing, Ship's Records integration, GAME_COMPLETED event). Proactive integration: Recreation channel context gathering, [CHALLENGE @callsign game_type] and [MOVE position] action extraction with rank gating (Lieutenant+). 3 new files, 7 modified.
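The engine contract and the win/draw detection described in this entry can be sketched roughly as follows. This is an illustration only: the source says the GameEngine protocol has 7 methods but does not name them, so the four methods shown, the `TicTacToe` class name, and the `WIN_LINES` table are all assumptions.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class GameEngine(Protocol):
    """Hypothetical subset of the engine protocol (method names are assumed)."""

    def valid_moves(self) -> list[int]: ...
    def apply_move(self, position: int) -> None: ...
    def winner(self) -> Optional[str]: ...
    def render(self) -> str: ...


# The eight lines of three cells that win a 9-cell board.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


class TicTacToe:
    def __init__(self) -> None:
        self.board: list[str] = [" "] * 9
        self.turn = "X"

    def winner(self) -> Optional[str]:
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def is_draw(self) -> bool:
        return self.winner() is None and " " not in self.board

    def valid_moves(self) -> list[int]:
        # No moves are legal once somebody has won.
        if self.winner() is not None:
            return []
        return [i for i, cell in enumerate(self.board) if cell == " "]

    def apply_move(self, position: int) -> None:
        if position not in self.valid_moves():
            raise ValueError(f"illegal move: {position}")
        self.board[position] = self.turn
        self.turn = "O" if self.turn == "X" else "X"

    def render(self) -> str:
        # ASCII rendering: three rows separated by a rule.
        rows = [" | ".join(self.board[r:r + 3]) for r in (0, 3, 6)]
        return "\n---------\n".join(rows)
```

Because the protocol is `@runtime_checkable`, `isinstance(TicTacToe(), GameEngine)` checks that the methods are present, which is one way a service could validate a new engine before registering it.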
AD-570
Anchor-Indexed Episodic Recall
23 tests
Structured AnchorFrame queries alongside semantic search. Promotes 4 anchor fields (department, channel, trigger_type, trigger_agent) to top-level ChromaDB metadata for native where-clause filtering. One-time startup migration backfills existing episodes. recall_by_anchor() API with two modes: enumeration (structured filters via .get(), no embedding) and semantic re-ranking (structured + vector similarity via .query()). Post-retrieval agent_id filtering, activation tracking, hash verification. 2 files modified.
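A minimal sketch of how the structured filter for the two recall modes might be assembled. The four field names come from the entry above; the helper names `build_anchor_where` and `recall_mode` are hypothetical. The returned dict follows ChromaDB's metadata-filter shape (a single equality clause, or `$and` over two or more clauses) and would be passed as the `where` argument to a collection's `.get()` or `.query()`.

```python
from typing import Any, Optional

# Anchor fields promoted to top-level ChromaDB metadata (from the entry above).
ANCHOR_FIELDS = ("department", "channel", "trigger_type", "trigger_agent")


def build_anchor_where(**filters: Optional[str]) -> Optional[dict[str, Any]]:
    """Build a ChromaDB-style `where` filter from the promoted anchor fields."""
    unknown = set(filters) - set(ANCHOR_FIELDS)
    if unknown:
        raise ValueError(f"not a promoted anchor field: {sorted(unknown)}")
    clauses = [{name: filters[name]}
               for name in ANCHOR_FIELDS
               if filters.get(name) is not None]
    if not clauses:
        return None            # no structured filter at all
    if len(clauses) == 1:
        return clauses[0]      # a single clause needs no $and wrapper
    return {"$and": clauses}


def recall_mode(query_text: Optional[str]) -> str:
    """Enumeration when there is no query text, semantic re-ranking otherwise."""
    return "semantic" if query_text else "enumeration"
```

For example, `build_anchor_where(department="science", channel="recreation")` yields an `$and` filter over both fields, while a call with one field yields the bare clause.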
AD-567g
Cognitive Re-Localization
28 tests
Structured orientation at boot time. Three lifecycle modes: cold start (identity + cognitive grounding + first duty), warm boot (stasis summary + re-orientation), diminishing proactive supplement (600s window). OrientationService, OrientationContext, derive_watch_section(). Anchor field gap fixes: watch_section, event_log_window, Ward Room department. Subsumes BF-034 cold-start note. Final AD in Memory Anchoring lineage (567a→g). O'Keefe & Nadel cognitive map theory, Tulving encoding specificity.
BF-108
LLM Unreachable — No Runtime Visibility
0 new tests
MockLLMClient.get_health_status() returns overall:"mock" with all tiers offline. runtime.llm_is_mock property. Chat endpoint returns explicit "LLM offline" message instead of triggering self-mod. /system/services correctly reports LLM Proxy as offline when mock. 4 files modified. User-reported: "Hello" showed Build Agent buttons instead of greeting.
BF-109
Qualification Probe Param Key Mismatch
0 new tests
_send_probe() used params={"message":...} but CognitiveAgent.perceive() reads params.get("text"). Agents received "Captain says: " with no question — all Tier 1/2 probe results unreliable. One-line fix. First real qualification run: 130/131 pass (99.2%), 15 agents. Medical diagnostic reasoning shows real differentiation (0.583–0.867). Builder 0.835, Engineering 0.915 on code quality. 14/14 confabulation rejection.
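The failure mode is easy to reproduce in isolation. The stand-in `perceive()` below is not the real CognitiveAgent method; it only mirrors the one behaviour the entry describes, reading `params.get("text")`.

```python
def perceive(params: dict) -> str:
    """Stand-in for the agent side: only the "text" key is read."""
    return f"Captain says: {params.get('text', '')}"


question = "What is your role aboard this ship?"

# Before the fix: the probe is sent under the wrong key and silently dropped.
before = perceive({"message": question})

# After the fix: the key matches what perceive() reads.
after = perceive({"text": question})
```

Here `before` is the bare prefix `"Captain says: "` with no question attached, which is why every earlier probe result had to be treated as unreliable.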
AD-567f
Social Verification Protocol
28 tests
Cross-agent claim verification with privacy-preserving corroboration scoring and cascade confabulation detection. Absorbs AD-462d. Agents see metadata (WHO, WHEN, HOW MANY) not content. Anchor independence discriminates genuine corroboration from cascade propagation. Ward Room integration (fires after AD-506b peer similarity). Bridge Alerts on medium/high risk. Counselor therapeutic DMs on high risk. SocialVerificationConfig. Johnson & Raye reality monitoring + circular reporting detection.
AD-567d
Anchor-Preserving Dream Consolidation
31 tests
Provenance composition through dream pipeline (absorbs AD-559). ACT-R activation-based memory lifecycle (absorbs AD-462b). ActivationTracker with SQLite access log. Dream Step 12 activation pruning. Micro-dream replay reinforcement. Recall access recording.
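For orientation, the standard ACT-R base-level activation equation is B = ln(sum over past accesses of age^(-d)), with decay d conventionally set to 0.5. The sketch below implements that textbook form; whether ActivationTracker uses exactly this equation, and what pruning threshold Dream Step 12 applies, are assumptions not stated in the entry.

```python
import math
from typing import Iterable


def base_level_activation(access_times: Iterable[float], now: float,
                          decay: float = 0.5) -> float:
    """Textbook ACT-R base-level activation from a log of access timestamps."""
    ages = [now - t for t in access_times if now - t > 0]
    if not ages:
        return float("-inf")   # never accessed: lowest possible activation
    return math.log(sum(age ** -decay for age in ages))


def should_prune(access_times: Iterable[float], now: float,
                 threshold: float = -2.0) -> bool:
    """Hypothetical pruning rule: drop memories whose activation falls below threshold."""
    return base_level_activation(access_times, now) < threshold
```

Under this form, each recall adds a term to the sum, so frequently and recently accessed episodes keep a higher activation than ones recalled once long ago.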
AD-567c
Anchor Quality & Integrity
28 tests
Johnson-weighted confidence scoring (5 dimensions). RPMS confidence gating. Per-agent AnchorProfile for Counselor diagnostics. SIF check_anchor_integrity(). Drift classification (specialization/concerning/unclassified). Absorbs AD-567e.
AD-567b
Anchor-Aware Recall
24 tests
RecallScore composite scoring. FTS5 keyword search sidecar. Anchor context headers in memory formatting. SECONDHAND source wiring. recall_weighted() API with budget enforcement. Absorbs AD-462a.
AD-567a
Episode Anchor Metadata
22 tests
AnchorFrame dataclass (10 fields, 5 dimensions: temporal, spatial, social, causal, evidential). All 15 episode creation sites wired. ChromaDB serialization round-trip. Content hash exclusion. Backwards compatible.
BF-104
Display Crew Agent Count
4 tests
registry.crew_count() added. Shell prompt, status panel, API, and working memory all show crew count instead of total services.
AD-566a–f
Qualification Program
82 tests
6/6 complete. Psychometric infrastructure (AD-566a), Tier 1 baselines — 4 probes: Personality, EpisodicRecall, Confabulation, Temperament (AD-566b), Tier 2 domain — 5 probes: TheoryOfMind, Compartmentalization, DiagnosticReasoning, AnalyticalSynthesis, CodeQuality (AD-566d), drift detection pipeline with z-score engine (AD-566c), Tier 3 collective — 5 probes: CoordinationBreakeven, ScaffoldDecomposition, CollectiveIntelligence, ConvergenceRate, EmergenceCapacity (AD-566e), /qualify shell command (AD-566f).
AD-541b–f
Memory Consolidation Integrity
48 tests
6/6 pillars complete. Reconsolidation protection (read-only framing), spaced retrieval therapy (active recall practice), guided reminiscence (therapeutic sessions), content hashing (SHA-256 per-episode, SIF integration), eviction audit trail (append-only SQLite, all deletion paths audited, survives reset).
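Per-episode content hashing can be sketched as below. The canonical-JSON serialization, the `content_hash` field name, and both helper names are assumptions; the entry only states that SHA-256 is computed per episode and checked by SIF.

```python
import hashlib
import json


def episode_hash(episode: dict) -> str:
    """SHA-256 over a canonical JSON form, excluding the stored hash itself."""
    payload = {k: v for k, v in episode.items() if k != "content_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_episode(episode: dict) -> bool:
    """True when the stored hash still matches the episode content."""
    return episode.get("content_hash") == episode_hash(episode)
```

Sorting keys and fixing separators makes the hash independent of dict ordering, so the same episode always produces the same digest.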
BF-103
Episodic Memory Agent ID Mismatch
16 tests
Orphaned episodes from mixed slot IDs vs sovereign IDs. resolve_sovereign_id() + resolve_sovereign_id_from_slot() helpers. All 4 storage paths normalized. Startup migration via migrate_episode_agent_ids(). Crew-identified: Vega (Security).
AD-550–555
Notebook Quality Pipeline
62 tests
6/6 complete. Dedup/read-before-write (AD-550), dream Step 7g consolidation + cross-agent convergence (AD-551), self-repetition detection extending AD-506b (AD-552), quantitative baseline auto-capture (AD-553), real-time convergence detection + Bridge notification (AD-554), quality metrics & dashboarding (AD-555). Key finding: iatrogenic trust detection — 3 agents from 2 departments independently converged on same diagnosis through different professional lenses.
BF-101/102
Crew Identity & Self-Awareness
24 tests
Three fixes for crew identity and commissioning awareness. BF-101: _resolve_callsign() helper with identity registry (birth certificate) fallback — agents use chosen callsign from naming ceremony, not YAML seed. BF-102 Part A: commissioning awareness in temporal context when agent age < 300s (Westworld Principle). BF-102 Part B: cold-start system note in ward_room_notification handler prevents fabricated memory references. Enhancement: batched "New Crew Aboard" auto-welcome thread from Ship's Computer on warm boot. BF-103 DROPPED (misdiagnosis — was Captain's All Hands post, not system thread mode).
AD-560
Science Department Expansion
57 tests
Analytical Pyramid — three new Science crew. Data Analyst (Rahda): telemetry baselines, anomaly detection, quantitative reporting. Systems Analyst (Dax): emergence analysis, cross-system pattern synthesis, AD-557 consumer. Research Specialist (Brahms): directed investigation, formal research reports, evidence-based methodology. Modeled after USN Operations Specialist / ORSA / NRL Scientist. Organization ontology, crew profiles (Big Five personality), personal standing orders, department protocols, skills templates, CognitiveAgent subclasses, spawner registration.
AD-557
Emergence Metrics
57 tests
Information-theoretic collaborative intelligence measurement via Partial Information Decomposition (Riedl 2025). Williams-Beer I_min decomposition into Unique(i), Unique(j), Redundancy, Synergy. Pairwise synergy → Emergence Capacity (median). Coordination Balance (S×R interaction). ToM effectiveness via complementarity trend. Hebbian-synergy correlation. Groupthink/fragmentation risk detection with Counselor integration. Dream Step 9. API endpoints /emergence, /emergence/history. VitalsMonitor integration. Pure Python — zero external math dependencies.
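The Williams-Beer decomposition for two sources and one target fits in a few dozen lines of dependency-free Python. The sketch below is not the project's implementation; it assumes a joint distribution given as a dict from `(x_i, x_j, y)` outcomes to probabilities, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def marginal(joint, idx):
    """Marginal distribution over the variables at positions idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in idx)] += p
    return out


def mutual_information(joint, src):
    """I(Y; X_src) in bits, where Y is position 2 of each outcome."""
    p_s, p_y = marginal(joint, src), marginal(joint, (2,))
    p_sy = marginal(joint, src + (2,))
    return sum(p * math.log2(p / (p_s[key[:-1]] * p_y[key[-1:]]))
               for key, p in p_sy.items() if p > 0)


def specific_information(joint, src, y):
    """I_spec(Y=y; X_src) = sum_x p(x|y) * log2(p(y|x) / p(y))."""
    p_s, p_y = marginal(joint, src), marginal(joint, (2,))
    p_sy = marginal(joint, src + (2,))
    return sum((p / p_y[(y,)]) * math.log2((p / p_s[key[:-1]]) / p_y[(y,)])
               for key, p in p_sy.items() if key[-1] == y and p > 0)


def decompose(joint):
    """Split I(Y; X_i, X_j) into unique, redundant, and synergistic parts."""
    p_y = marginal(joint, (2,))
    # I_min: expected minimum specific information across the two sources.
    redundancy = sum(
        p * min(specific_information(joint, (0,), y),
                specific_information(joint, (1,), y))
        for (y,), p in p_y.items() if p > 0)
    unique_i = mutual_information(joint, (0,)) - redundancy
    unique_j = mutual_information(joint, (1,)) - redundancy
    synergy = mutual_information(joint, (0, 1)) - unique_i - unique_j - redundancy
    return {"unique_i": unique_i, "unique_j": unique_j,
            "redundancy": redundancy, "synergy": synergy}
```

Two sanity checks are useful: an XOR target is pure synergy (1 bit), and a target that copies two identical sources is pure redundancy (1 bit).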
AD-558
Trust Cascade Dampening
45 tests
Three-layer trust protection: (1) Progressive dampening — geometric weight reduction (1.0, 0.75, 0.5, 0.25) for consecutive same-direction updates per agent, cold-start scaling raises floor to 0.5 when observations < 20. (2) Hard trust floor (0.05) — negative updates absorbed below floor, positive always apply. (3) Network circuit breaker — M agents across N departments with anomalous delta (>0.15) trips global dampening (0.5×) + Counselor alert. Event emission centralized into record_outcome() via injectable callback. TrustDampeningConfig in SystemConfig. TRUST_CASCADE_WARNING event type. Counselor subscribes for cascade wellness sweeps.
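Layers (1) and (2) can be sketched as a single update function. The weights, floor, and cold-start cutoff are taken from the entry above; everything else is an assumption, including the names `DampeningState` and `dampened_update` and the reading of "raises floor to 0.5" as a minimum dampening weight. The network circuit breaker (layer 3) is not modelled here.

```python
from dataclasses import dataclass

DAMPENING_WEIGHTS = (1.0, 0.75, 0.5, 0.25)  # per consecutive same-direction update
TRUST_FLOOR = 0.05
COLD_START_OBSERVATIONS = 20
COLD_START_MIN_WEIGHT = 0.5


@dataclass
class DampeningState:
    last_direction: int = 0   # +1, -1, or 0 before the first update
    streak: int = 0           # consecutive same-direction updates so far
    observations: int = 0


def dampened_update(trust: float, delta: float, state: DampeningState) -> float:
    direction = (delta > 0) - (delta < 0)
    if direction != 0 and direction == state.last_direction:
        state.streak += 1
    else:
        state.streak = 0      # a direction change resets the dampening
    state.last_direction = direction

    weight = DAMPENING_WEIGHTS[min(state.streak, len(DAMPENING_WEIGHTS) - 1)]
    if state.observations < COLD_START_OBSERVATIONS:
        weight = max(weight, COLD_START_MIN_WEIGHT)
    state.observations += 1

    new_trust = trust + weight * delta
    if delta < 0:
        # Negative updates never push trust below the floor; if trust is
        # already at or below the floor they are absorbed entirely.
        new_trust = max(new_trust, min(trust, TRUST_FLOOR))
    return min(new_trust, 1.0)
```

With this shape, four consecutive penalties of 0.1 on an established agent cost 0.25 in total rather than 0.4, and a positive update always applies regardless of the floor.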
BF-099
Trust Engine Concurrency Safety
18 tests
Critical bug fix. asyncio.Lock on TrustNetwork, BEGIN IMMEDIATE transactions for _save_to_db(), WAL mode + busy_timeout on both TrustNetwork and HebbianRouter, dream consolidation routed through record_outcome() with source parameter, shutdown race fix (cancel + await flush task). Prerequisite for AD-558.
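The combination of an in-process lock, WAL mode, a busy timeout, and `BEGIN IMMEDIATE` can be sketched with the standard library. This is a simplified stand-in, assuming the synchronous `sqlite3` module in place of the project's async driver; the `TrustStore` class, the `trust` table layout, and the 0.5 starting score are hypothetical.

```python
import asyncio
import os
import sqlite3
import tempfile


def open_trust_db(path: str) -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, transactions are managed explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")     # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=5000")    # wait up to 5 s for a lock
    conn.execute("CREATE TABLE IF NOT EXISTS trust "
                 "(agent_id TEXT PRIMARY KEY, score REAL NOT NULL)")
    return conn


class TrustStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._lock = asyncio.Lock()   # serializes read-modify-write in this process

    async def record_outcome(self, agent_id: str, delta: float) -> float:
        async with self._lock:
            # BEGIN IMMEDIATE takes the write lock up front instead of
            # upgrading later, which avoids mid-transaction lock failures.
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                row = self._conn.execute(
                    "SELECT score FROM trust WHERE agent_id = ?", (agent_id,)
                ).fetchone()
                score = (row[0] if row else 0.5) + delta
                self._conn.execute(
                    "INSERT OR REPLACE INTO trust VALUES (?, ?)", (agent_id, score))
                self._conn.execute("COMMIT")
            except Exception:
                self._conn.execute("ROLLBACK")
                raise
            return score


async def demo() -> float:
    """Ten concurrent updates against a temporary database file."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_trust_db(os.path.join(tmp, "trust.db"))
        store = TrustStore(conn)
        await asyncio.gather(*(store.record_outcome("agent-1", 0.01)
                               for _ in range(10)))
        final = conn.execute("SELECT score FROM trust WHERE agent_id = ?",
                             ("agent-1",)).fetchone()[0]
        conn.close()
        return final
```

In this synchronous sketch the coroutine never yields inside the critical section, so the lock is not strictly exercised; it matters once the database calls are awaited, which is the situation the fix addresses.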
AD-539
Gap → Qualification Pipeline
618 Cognitive JIT tests
Cognitive JIT pipeline COMPLETE (9/9). Multi-source gap detection (execution failures, near-misses, fallback learning, observational gaps, cross-agent comparisons), gap classification (missing_step/wrong_parameter/context_gap/edge_case/integration_gap), Skill Framework bridge (gap→skill requirement mapping), qualification program triggering (GapToQualificationBridge), progress tracking with completion criteria. 53 tests.
AD-538
Procedure Lifecycle Management
565 Cognitive JIT tests
Cognitive JIT 14/14. Ebbinghaus-inspired decay (half-life configurable, skip Level 5), archival policy (max_age_days, max_archived), ChromaDB semantic dedup (cosine similarity threshold), procedure merge (combine steps/metadata from overlapping procedures). Dream Step 7f lifecycle maintenance. 53 tests.
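The decay rule reduces to a half-life formula. In this sketch the function name, the 30-day default, and the idea that decay applies to a single numeric score are assumptions; only the configurable half-life and the Level 5 exemption come from the entry.

```python
def decayed_score(score: float, age_days: float,
                  half_life_days: float = 30.0,
                  compilation_level: int = 1) -> float:
    """Halve the score every half_life_days; Level 5 procedures do not decay."""
    if compilation_level >= 5:
        return score
    return score * 0.5 ** (age_days / half_life_days)
```

With a 30-day half-life, a score of 1.0 falls to 0.5 after 30 days and 0.25 after 60, while a Level 5 procedure keeps its score at any age.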
AD-537
Observational Learning
512 Cognitive JIT tests
Cognitive JIT 13/13. Bandura's social learning theory implemented. Three learning pathways: (1) Observation — dream Step 7e scans Ward Room threads, LLM extracts procedures from narratives (OBSERVATION_MIN_DETAIL_SCORE=0.6), enters Level 1 Novice with learned_via=observational. (2) Teaching — Level 5 promoted procedures taught via /procedure teach or API, enters Level 2 Guided with learned_via=taught. (3) Direct (existing) with learned_via=direct. COMPILATION_MAX_LEVEL raised 4→5. Config: OBSERVATION_MIN_TRUST=0.5, OBSERVATION_MAX_THREADS_PER_DREAM=20, OBSERVATION_WARD_ROOM_LOOKBACK_HOURS=24, TEACHING_MIN_COMPILATION_LEVEL=5, TEACHING_MIN_TRUST=0.85. Procedure dataclass gains learned_via/learned_from fields. New API: GET /procedures/observed, POST /procedures/teach. Shell: /procedure teach, /procedure observed. extract_procedure_from_ward_room_thread() in procedures.py. 52 tests across 7 files.
AD-536
Trust-Gated Procedure Promotion
460 Cognitive JIT tests
Cognitive JIT 12/12. ProcedureCriticality enum (LOW/MEDIUM/HIGH/CRITICAL) with classify_criticality() using keyword + cross-department + destructive-command detection. Promotion eligibility: Level 4+ compilation, min success count/rate/trust (PROMOTION_MIN_* constants). Two-tier approval routing: _route_promotion_approval() routes LOW/MEDIUM to department chief (_DEPARTMENT_CHIEFS mapping), HIGH/CRITICAL to Captain via Bridge. ProcedureStore: 6 promotion columns via migration, 6 new methods (request/approve/reject/get_pending/get_status/get_promoted). Level 5 Expert unlock via _max_compilation_level_for_promoted(). Shell commands: /procedure list-pending, approve, reject, list-promoted. API endpoints: GET/POST /procedures/*. Rejection learning stores feedback as institutional knowledge. 64 tests across 7 files.
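The two-tier routing rule can be sketched as below. The enum members come from the entry; the contents of the chiefs mapping, the approver identifiers, and the fallback to the Captain when a department has no chief are assumptions.

```python
from enum import Enum


class ProcedureCriticality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical mapping; the real _DEPARTMENT_CHIEFS contents are not in the source.
DEPARTMENT_CHIEFS = {"engineering": "chief_engineer", "science": "science_officer"}


def route_promotion_approval(criticality: ProcedureCriticality,
                             department: str) -> str:
    """LOW/MEDIUM go to the department chief, HIGH/CRITICAL to the Captain."""
    if criticality in (ProcedureCriticality.HIGH, ProcedureCriticality.CRITICAL):
        return "captain"
    # Assumption: a department without a chief escalates to the Captain.
    return DEPARTMENT_CHIEFS.get(department, "captain")
```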
AD-535
Graduated Compilation Levels
449 Cognitive JIT tests
Cognitive JIT 11/11. Five Dreyfus-inspired compilation levels: Level 1 Novice (full LLM), Level 2 Guided (_build_guided_decision + _format_procedure_as_hints, ~40% token reduction), Level 3 Validated (_build_validated_decision + _validate_replay_postconditions, per-step compound validation, ~80% token reduction), Level 4 Autonomous (zero tokens), Level 5 Expert (deferred). Trust clamping via _max_compilation_level_for_trust(). Promotion via consecutive_successes tracking (configurable threshold). Demotion to Level 2 on any failure. PROCEDURE_MIN_COMPILATION_LEVEL changed 1→2. ProcedureStore: consecutive_successes column, 4 new methods, migration. 7 new config constants. 62 tests across 9 classes.
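The promotion, demotion, and trust-clamping rules can be sketched as a small state update. The threshold of 3 and the trust cut-offs are hypothetical (the entry says only that the threshold is configurable), and `ProcedureState` and `record_result` are illustrative names, not the ProcedureStore API.

```python
from dataclasses import dataclass

PROMOTION_THRESHOLD = 3     # hypothetical; the source says "configurable threshold"
DEMOTION_LEVEL = 2          # from the source: demotion to Level 2 on any failure
MAX_UNPROMOTED_LEVEL = 4    # Level 5 Expert is deferred in this entry


def max_level_for_trust(trust: float) -> int:
    """Hypothetical trust cut-offs; the real values are not given in the source."""
    if trust >= 0.85:
        return 4
    if trust >= 0.7:
        return 3
    return 2


@dataclass
class ProcedureState:
    level: int = 2
    consecutive_successes: int = 0


def record_result(state: ProcedureState, succeeded: bool, trust: float) -> int:
    """Update the stored level and return the effective (trust-clamped) level."""
    if not succeeded:
        # A failure never raises the level; it drops it to the demotion level.
        state.level = min(state.level, DEMOTION_LEVEL)
        state.consecutive_successes = 0
    else:
        state.consecutive_successes += 1
        if (state.consecutive_successes >= PROMOTION_THRESHOLD
                and state.level < MAX_UNPROMOTED_LEVEL):
            state.level += 1
            state.consecutive_successes = 0
    return min(state.level, max_level_for_trust(trust))
```

Keeping the stored level separate from the effective level means a temporary trust dip limits what the agent may replay without discarding the procedure's earned progress.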
AD-534c
Multi-Agent Replay Dispatch
387 Cognitive JIT tests
Cognitive JIT 10/10. ProcedureStep.resolved_agent_type populated at extraction via _resolve_agent_roles(). Compound detection in _check_procedural_memory() (2+ steps with resolved types). _execute_compound_replay() orchestrates sequential dispatch. _resolve_step_agent() resolution chain: pool → capability → fallback. Zero-token _handle_compound_step_replay() intent handler. _format_single_step() DRY extraction. COMPOUND_STEP_TIMEOUT_SECONDS=10.0. Unavailability → single-agent text fallback. handle_intent() compound branch. Step postcondition validation deferred to AD-535. 54 tests across 9 classes.
AD-534b
Fallback Learning
333 Cognitive JIT tests
Cognitive JIT 9/9 enrichments. 8 parts: metric semantics fix (record_completion/fallback moved to handle_intent), near-miss capture (_last_fallback_info at 4 rejection points: score_threshold, quality_gate, negative_veto, format_exception), service recovery (_decide_via_llm extracted, _run_llm_fallback for transparent retry), PROCEDURE_FALLBACK_LEARNING event + in-memory queue (max 50), evolve_fix_from_fallback() with _FALLBACK_FIX_SYSTEM_PROMPT, Dream Step 7d (_process_fallback_learning), DreamReport gains fallback_evolutions + fallback_events_processed. Deactivation rules: execution_failure → deactivate parent, near-miss → keep active, negative_veto → extraction candidate. 68 tests across 9 classes.
AD-532e
Reactive & Proactive Extraction Triggers
4811 py · 149 ts total
Cognitive JIT 8/9. Two new trigger paths beyond dream consolidation. Reactive: TASK_EXECUTION_COMPLETE event emitted from CognitiveAgent.handle_intent(), DreamingEngine.on_task_execution_complete() matches intent to procedures, diagnoses health, LLM-gates evolution. Rate-limited per agent (60s cooldown). Proactive: DreamScheduler Tier 1.5 runs proactive_procedure_scan() every 5 min, scans all active procedures via diagnose_procedure_health(). Shared: confirm_evolution_with_llm() gate (YES-only, conservative), evolve_with_retry() wrapper (max 3, retry_hint propagation), _attempt_procedure_evolution() DRY helper. Dream Step 7b unchanged (no confirmation gate). DreamReport gains proactive_evolutions + reactive_flags. 43 tests across 7 classes.
AD-532d
Multi-Agent Compound Procedures
4766 py · 149 ts total
Cognitive JIT 7/9. ProcedureStep gains optional agent_role field (default "", backward compatible). _COMPOUND_SYSTEM_PROMPT + extract_compound_procedure_from_cluster() — LLM generalizes agent IDs to functional roles, captures cross-agent handoff points. Dream Step 7 routes multi-agent clusters (len(participating_agents) >= 2) to compound extraction, single-agent to standard. Replay formatting includes [agent_role] annotations. DRY: reuses _format_episode_blocks(), _parse_procedure_json(), _build_steps_from_data(). 30 tests across 6 classes. Multi-agent replay dispatch deferred to AD-534c.
AD-532c
Negative Procedure Extraction
4741 py · 149 ts total
Cognitive JIT 6/9. extract_negative_procedure_from_cluster() with _NEGATIVE_SYSTEM_PROMPT — extracts anti-patterns from failure-dominant clusters (>50% negative outcomes). Contradiction enrichment from AD-403 detector feeds context into extraction prompt. Dream cycle Step 7c: iterates failure-dominant clusters, queries contradiction_detector for matching contradictions, passes both to LLM extraction. Produces Procedure(is_negative=True) stored via ProcedureStore in anti-patterns directory. DreamReport.negative_procedures_extracted field. DRY: reuses _format_episode_blocks(), _parse_procedure_json(), _build_steps_from_data(). 31 new tests across 5 test classes (extraction, contradiction enrichment, dream integration, DreamReport, end-to-end).
AD-532b
Procedure Evolution (FIX/DERIVED)
4559 py · 149 ts total
Cognitive JIT 5/9. EvolutionResult dataclass. Shared diagnose_procedure_health() (DRY — CognitiveAgent + DreamingEngine). _format_episode_blocks() DRY helper (CAPTURED/FIX/DERIVED). _FIX_SYSTEM_PROMPT + evolve_fix_procedure(): deactivate parent, generation+1, re-extract from fresh episodes via recall_by_intent(). _DERIVED_SYSTEM_PROMPT + evolve_derived_procedure(): multi-parent specialization, parents stay active, compilation_level=max(parents)-1. content_diff via difflib.unified_diff(), change_summary from LLM. Anti-loop guard (_addressed_degradations, 72h cooldown). Dream cycle Step 7b _evolve_degraded_procedures(). ProcedureStore.save() enhanced with content_diff/change_summary kwargs + get_evolution_metadata(). DreamReport procedures_evolved field. CognitiveAgent refactored to shared function. 48 new tests, checklist PASS.
AD-534
Replay-First Dispatch
4556 py · 149 ts total
Cognitive JIT 4/9 — MINIMUM VIABLE SLICE COMPLETE (AD-531→532→533→534). _check_procedural_memory() in CognitiveAgent.decide(): semantic match via ProcedureStore.find_matching(), quality metrics dispatch (effective_rate, completion_rate), 4-stage metric recording (selection/applied/completion/fallback). Negative procedure guard warns and forces LLM path. _format_procedure_replay() returns formatted steps as llm_output (zero tokens). _diagnose_procedure_health() implements 3 OpenSpace rules (log-only, no evolution). Journal procedure_id column. 8 config constants. BuilderAgent inherits via super(). Deferred: AD-534b (fallback learning). 35 new tests, cross-cutting checklist PASS.
AD-533
Procedure Store
4475 py · 149 ts total
Cognitive JIT 3/9. Hybrid ProcedureStore: Ship's Records (Git-backed YAML, authoritative) + SQLite index (procedure_records + procedure_lineage_parents tables, Version DAG, 4 quality counters + 4 derived rates) + ChromaDB (semantic search collection). Procedure dataclass extended: is_active, generation, parent_procedure_ids, is_negative, superseded_by, tags + from_dict(). CRUD: save() to 3 backends, get(), list_active() with filters, has_cluster() dedup, delete(). find_matching() semantic search with SQLite enrichment. BFS lineage traversal up/down. deactivate() for FIX semantics. Thread-safe: _write_lock + WAL + foreign keys. Dream cycle Step 7 calls store.save(), cross-session dedup via has_cluster(). ConnectionFactory protocol (AD-542). 49 new tests, 49/49 checklist PASS.
AD-532
Procedure Extraction
4426 py · 149 ts total
Cognitive JIT 2/9. LLM client wired into DreamingEngine via init_dreaming() + runtime.py. Procedure + ProcedureStep dataclasses (full schema: preconditions, postconditions, invariants, compilation_level, provenance). extract_procedure_from_cluster() with AD-541b READ-ONLY framing, standard tier LLM, JSON parsing with markdown fence handling. Dream Step 7: filters success-dominant clusters, skips _extracted_cluster_ids, stores in-memory. DreamReport gains procedures_extracted + procedures fields. Gap prediction renumbered to Step 8. 29 new tests, 38/38 checklist PASS.
AD-531
Episode Clustering & Pattern Detection
4400 py · 149 ts total
Cognitive JIT 1/9. Dead code cleanup: strategy_extraction.py + test_strategy_extraction.py deleted, strategy_store_fn removed from DreamingEngine/startup/runtime, DreamAdapter.store_strategies() removed. EpisodeCluster dataclass + cluster_episodes() pure function (agglomerative, average-linkage, cosine distance). EpisodicMemory.get_embeddings() for ChromaDB retrieval. Dream Step 6 replaced: embeddings → clustering → in-memory storage. DreamReport.strategies_extracted → clusters_found + clusters. Log-and-degrade on failure. 40 new tests, 24/24 checklist PASS.
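Agglomerative clustering with average linkage over cosine distance can be written as a pure function without external libraries. The sketch below shares the name `cluster_episodes` with the entry, but its signature, the 0.3 distance threshold, and the index-list return value are assumptions; the real function returns EpisodeCluster objects.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0   # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def cluster_episodes(embeddings: Sequence[Sequence[float]],
                     distance_threshold: float = 0.3) -> list[list[int]]:
    """Merge the closest pair of clusters until none is within the threshold."""
    clusters = [[i] for i in range(len(embeddings))]

    def average_linkage(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > distance_threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]   # b > a, so index a is unaffected
    return clusters
```

This brute-force version recomputes linkages on every pass, which is fine for the small batches a dream cycle is likely to see but would need caching for large episode sets.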
AD-506b
Peer Repetition & Tier Credits
4374 py · 149 ts total
Self-Regulation Wave 7/7 — WAVE COMPLETE. BF-098 fix (_save_profile_and_assessment async). ZONE_RECOVERY event + proactive loop emission. check_peer_similarity() in create_thread()/create_post() — cross-agent repetition detection (detection, not suppression). PEER_REPETITION_DETECTED event. 4 tier credit fields on CognitiveProfile (self_correction, peer_detection, peer_caught, total). Counselor ZONE_RECOVERY + PEER_REPETITION handlers. Peer repetition Episode storage (intent=peer_repetition). Schema migration (5 ALTER TABLE). 32 new tests, 24/24 checklist PASS.
AD-506a
Graduated System Response — Zone Model
4341 py · 149 ts total
Self-Regulation Wave 6a/6. BF-097 fix (get_posts_by_author table names). CircuitBreakerConfig Pydantic model (13 tunable fields). CognitiveZone state machine (GREEN/AMBER/RED/CRITICAL). _compute_signals() refactor + _update_zone() transitions + time-based decay. SELF_MONITORING_CONCERN EventType + proactive loop emission. Zone-aware self-monitoring context for all Earned Agency tiers. Counselor amber zone handler + _classify_trip_severity() zone override + post-dream re-assessment. Standing orders reconciliation ([Clinical Authority] + [Cognitive Zones]). 39 new tests, 24/24 checklist PASS.
AD-505
Counselor Therapeutic Intervention
4302 py · 149 ts total
Self-Regulation Wave 5/6. BF-096 fix (ward_room_router wiring race). _send_therapeutic_dm() rate-limited DMs. Trigger-specific therapeutic templates. Cooldown reason tracking (set_agent_cooldown reason= param). counselor_recommendation BridgeAlert. COUNSELOR_GUIDANCE directive issuance (24h, max 3). _apply_intervention() orchestrator. 40 new tests, 18/18 checklist PASS.
AD-504
Agent Self-Monitoring Context
4263 py · 149 ts total
Self-Regulation Wave 4/6. Jaccard DRY to similarity.py. get_posts_by_author() query. _build_self_monitoring_context() (8 capabilities). Earned Agency TIER_CONFIG scaling. [READ_NOTEBOOK] two-cycle pattern. Memory state calibration. Standing orders [Self-Monitoring] section. 45 new tests, 14/14 checklist PASS.
AD-495
Circuit Breaker → Counselor Bridge
4217 py · 149 ts total
Self-Regulation Wave 3/6. Trip reason tracking (velocity/rumination/both). _classify_trip_severity() 4-level escalation (AD-506 override point). Ward Room posting via BridgeAlert pipeline. DRY _save_profile_and_assessment() helper. Trigger values fixed. 27 new tests, 5 classes, 10/10 checklist PASS.
AD-503
Counselor Activation — Data Gathering & Profile Persistence
4196 py · 149 ts total
Self-Regulation Wave 2/6. Type-filtered event subscriptions (reusable infra). 3 new EventTypes + typed dataclasses. CounselorProfileStore (SQLite, ConnectionFactory). _gather_agent_metrics(), _run_wellness_sweep(), event-driven assessment. InitiativeEngine dead wire lit up. 6 REST endpoints. 61 new tests, 11 test classes, 24/24 checklist PASS.
BF-091
Mock Discipline Phase 2 — Spec Coverage
4094 py · 149 ts total
Wave 6: Mock spec compliance 22.6% → 51.9% (+222 spec'd mocks across 19 files). 3 real bugs caught by spec= (BF-078 class): BaseLLMClient.generate() phantom → complete(), TrustNetwork.get_trust()/get_trust_score() phantoms → get_record()/get_score(). Mock Discipline scorecard C-→B.
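The class of bug caught here is straightforward to demonstrate with `unittest.mock`. The `BaseLLMClient` below is a one-method stand-in for the real class; only the method names `complete()` and the phantom `generate()` come from the entry.

```python
from unittest.mock import Mock


class BaseLLMClient:
    """Stand-in with the one real method named in the entry."""

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


# Unspec'd mock: any attribute exists, so a call to a phantom method "works".
loose = Mock()
loose_result = loose.generate("hi")

# Spec'd mock: only attributes of the real class are allowed.
strict = Mock(spec=BaseLLMClient)
caught = None
try:
    strict.generate("hi")
except AttributeError as exc:
    caught = str(exc)

strict.complete("hi")   # the real method is still mockable
```

A test written against `loose` keeps passing after `generate()` is renamed to `complete()`, which is how a phantom-method bug can go unnoticed until the mock is given a spec.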
BF-092
Trust Threshold Constants & DRY Cleanup
4091 py · 149 ts total
Wave 6: 19 named trust constants in config.py replacing ~30 magic numbers across 9 files. format_trust() utility replacing 52+ round(x,4) calls across 13 files. EventEmitterMixin in protocols.py deduplicating 4 identical _emit() methods (assignment, persistent_tasks, workforce, ward_room). DRY scorecard B→A-.
BF-090
Exception Audit Phase 2 — Silent Swallows
4093 py · 149 ts total
Wave 6: 71 silent except Exception: pass fixed (43 logger.debug, 4 narrowed to sqlite3.OperationalError, 24 justified with comments). 42 bare catches without as e fixed (exc_info=True added). DRY helper _safe_log_event() extracted in feedback.py. Exception Handling scorecard C→B+.
AD-542
Abstract Database Connection Interface
+10 py · 4094 py · 149 ts total
Wave 6: DatabaseConnection + ConnectionFactory Protocols. SQLiteConnectionFactory singleton. 12 production modules refactored with constructor injection (acm, assignment, trust, identity, persistent_tasks, routing, journal, skill_framework ×2, event_log, ward_room, workforce). Zero aiosqlite.connect() outside sqlite_factory.py. Unblocks commercial Cloud-Ready Storage.
BF-089
Emergent Detector False Positives
4094 py · 149 ts total
Wave 6: Crew-reported trust anomaly false positives (Forge + Reyes). Adaptive baselines with rolling mean/stddev, temporal smoothing buffer, configurable sustain window. Normal Hebbian weight adjustments no longer flagged as pathological.
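A minimal sketch of the rolling-baseline idea described above, not the project's actual detector. It covers the rolling mean/stddev and the sustain window only; the smoothing buffer is left out. The class name, all default values, and the choice to keep deviating samples out of the baseline are assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveBaseline:
    """Flag a metric only after it deviates from its own rolling baseline
    for `sustain` consecutive samples. All defaults are hypothetical."""

    def __init__(self, window=50, z_threshold=3.0, sustain=3, min_samples=10):
        self.history = deque(maxlen=window)  # rolling window of normal samples
        self.z_threshold = z_threshold
        self.sustain = sustain
        self.min_samples = min_samples
        self.streak = 0  # consecutive deviating samples seen so far

    def observe(self, value: float) -> bool:
        deviates = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = max(pstdev(self.history), 1e-9)  # avoid a zero-width band
            deviates = abs(value - mu) > self.z_threshold * sigma
        if deviates:
            self.streak += 1
        else:
            self.streak = 0
            self.history.append(value)  # only normal samples move the baseline
        return self.streak >= self.sustain
```

With this shape, small weight adjustments inside the band never raise a flag, and a single spike is ignored unless it persists for the whole sustain window.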
AD-516
Extract api.py into FastAPI Routers
+1 py · 4040 py · 149 ts total
Wave 3 decomposition: extracted 122 routes from api.py into 16 FastAPI router modules (3,109 → 295 lines, −90.5%). Depends(get_runtime) pattern replaces closure state. WebSocket stays in api.py. Ward Room route prefix unified. deps.py, api_models.py shared infrastructure.
AD-515
Extract runtime.py Modules
+106 py · 4039 py · 149 ts total
Wave 3 decomposition: extracted 5 modules from runtime.py god object (5,321 → 4,102 lines, −23%). ward_room_router.py (567), agent_onboarding.py (365), self_mod_manager.py (331), dream_adapter.py (297), warm_boot.py (279), crew_utils.py (26). Constructor injection throughout. Zero behavior changes, zero regressions. AD-521 decided: SWE/Build Pipeline Separation (Model A).
Service Protocols + Public APIs + BF-071–076
+60 py · 3933 py · 149 ts total
Wave 3 start: 7 typing.Protocol definitions, public API wrappers on 17 target objects replacing 47 private-access patterns. BF-071–075: code review waves 1+2 (safety hardening, code hygiene, exception audit). BF-076: quality fixes from principles audit (runtime bug, type tightening, structured logging, duplicate resolution, boundary tests). Engineering Principles Stack codified in copilot instructions.
AD-498
Work Type Registry & Templates
+45 py +8 ts · 3942 py · 149 ts total
Formal per-type state machines (card/task/work_order/duty/incident) with validated transitions. WorkTypeRegistry enforces per-type rules, unknown types permissive for backward compatibility. 8 built-in templates (security scan, diagnostics, code review, night orders). TemplateStore with variable substitution, config-driven custom types/templates. REST API endpoints for types and templates. UI template pickers in WorkBoard and ProfileWorkTab. Night Orders bridge to AD-471.
Work Tab & Scrumban Board — HXI Surface
+14 ts +3 py · 3730 py · 141 ts total
Agent Profile Work Tab rewritten with active/blocked/completed/duty sections, create task form, reassign/cancel/retry actions. Crew Scrumban Board with 5 Kanban columns, HTML5 drag-and-drop, WIP limits, swim lanes, filters, quick create. WebSocket event broadcasts on all workforce mutations. Bookable resources in snapshot. Agent profile API includes work items and bookings.
AD-496
Workforce Scheduling Engine — Core Data Model
+66 py · 3690 py · 118 ts total
Universal Resource Scheduling for AI agents. 7 core entities (WorkItem, BookableResource, ResourceRequirement, Booking, BookingTimestamp, BookingJournal, AgentCalendar). WorkItemStore (SQLite-backed). Push/pull assignment engine with capability+trust matching. Booking lifecycle with event-sourced timestamps and journal generation. 14 REST API endpoints. Runtime auto-registration of agents as BookableResources. AD-500/501 cleanup ADs created for deferred migration work.
AD-488
Cognitive Circuit Breaker + BF-060/061/062
+18 py · 3624 py · 118 ts total
CLOSED/OPEN/HALF_OPEN state machine. Velocity detection (event burst in time window) + Jaccard similarity detection (content rumination). Escalating cooldowns (15min → 1hr cap). Attention redirect prompts. Bridge alerts on trip. BF-060: regex notebook stripping. BF-061: reply rank gate + thread ID resolution. BF-062: bigram similarity + 10-post lookback.
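The three-state breaker with escalating cooldowns can be sketched as follows. The 15-minute base and 1-hour cap come from the entry above; the doubling factor, the class and method names, and the reset-on-success behavior are assumptions, since the entry does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"        # normal operation
    OPEN = "open"            # tripped, agent is cooling down
    HALF_OPEN = "half_open"  # cooldown elapsed, probing for recovery


@dataclass
class CircuitBreaker:
    base_cooldown_s: float = 15 * 60  # 15 min, from the entry above
    max_cooldown_s: float = 60 * 60   # 1 hr cap, from the entry above
    trips: int = 0
    state: BreakerState = BreakerState.CLOSED
    opened_at: float = 0.0

    def cooldown_s(self) -> float:
        """Cooldown for the current trip count (assumed: doubles per trip)."""
        if self.trips == 0:
            return 0.0
        return min(self.base_cooldown_s * 2 ** (self.trips - 1), self.max_cooldown_s)

    def trip(self, now: float) -> None:
        """Open the breaker, e.g. on a velocity or rumination detection."""
        self.trips += 1
        self.state = BreakerState.OPEN
        self.opened_at = now

    def poll(self, now: float) -> BreakerState:
        """Move OPEN to HALF_OPEN once the cooldown has elapsed."""
        if self.state is BreakerState.OPEN and now - self.opened_at >= self.cooldown_s():
            self.state = BreakerState.HALF_OPEN
        return self.state

    def record_success(self) -> None:
        """A clean probe in HALF_OPEN closes the breaker and resets escalation."""
        if self.state is BreakerState.HALF_OPEN:
            self.state = BreakerState.CLOSED
            self.trips = 0
```

Passing `now` explicitly keeps the sketch deterministic; a real service would more likely read a monotonic clock.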
AD-499
Ship & Crew Naming Conventions
design only
Three-layer naming: ShipNameRegistry on commissioning, agent personal name + callsign coexistence (Option B), federated display format Name [ShipName]. Birth provenance as surname.
BF-040
Identity System Hardening — 13 Review Findings
+8 py · 3569 py · 118 ts total
Critical: export_chain() credential attachment fix. High: input validation (DIDs, birth certs), asyncio.Lock on ledger, PRAGMA foreign_keys. Medium: eager genesis at commissioning, verify_chain() genesis previous_hash check, removed dead _cache. Low: slot mapping overwrite warning, DB index, documentation. 9 new + 1 updated test, 1 removed.
AD-441c
Asset Tags for Infrastructure/Utility + Boot Sequence Fix
+6 py · 3572 py · 118 ts total
Two-tier identity: crew agents get sovereign birth certificates (W3C VCs), infrastructure/utility agents get lightweight AssetTags. Boot sequence fix: deferred crew identity until ship commissioned, post-commissioning sweep. GET /api/identity/assets endpoint. 6 new tests in test_agent_identity.py.
AD-441b
Ship Commissioning — Genesis Block with ShipBirthCertificate
+6 py · 3408 py · 118 ts total
Ship commissioning ceremony: genesis block creation, ShipBirthCertificate W3C VC, ship DID (did:probos:{instance_id}), Identity Ledger hash-chain initialization. ACM issues ship identity at first boot. 6 new tests in test_agent_identity.py.
AD-418
Post-Reset Routing Degradation
+9 py · 3324 py · 118 ts total
agent_hint field on PersistentTask (idempotent migration). HebbianRouter get_preferred_targets() hint: +1.0 synthetic boost. Reset CLI warns about active scheduled tasks. PATCH /api/scheduled-tasks/{id}/hint endpoint.
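As an illustration of the hint boost, the sketch below ranks candidate agents by learned weight plus a +1.0 bonus for the hinted agent. The function signature and the plain-dict weight store are hypothetical; only the +1.0 value and the `get_preferred_targets()` name come from the entry above.

```python
from typing import Dict, List, Optional

HINT_BOOST = 1.0  # synthetic boost, value taken from the entry above


def get_preferred_targets(weights: Dict[str, float], hint: Optional[str] = None) -> List[str]:
    """Rank agents by learned weight; a hinted agent gets a synthetic boost.

    `weights` stands in for learned Hebbian weights (hypothetical shape).
    After a reset the dict may be empty, so the hint alone decides routing.
    """
    scores = dict(weights)
    if hint is not None:
        scores[hint] = scores.get(hint, 0.0) + HINT_BOOST
    # Highest score first; agent name breaks ties for a stable order.
    return sorted(scores, key=lambda agent: (-scores[agent], agent))
```

After a reset wipes the learned weights, a scheduled task that carries a hint still reaches its intended agent, which is the degradation this entry addresses.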
BF-031
Cognitive Journal Schema Migration Ordering
fix · 3324 py · 118 ts total
CREATE INDEX on intent_id ran before ALTER TABLE ADD COLUMN. Split _SCHEMA into _SCHEMA_BASE + _SCHEMA_INDEXES. Startup: base → migration → dependent indexes.
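The ordering fix can be sketched with the standard-library sqlite3 module. The table name, column types, and index name below are hypothetical; only `intent_id` and the `_SCHEMA_BASE` / `_SCHEMA_INDEXES` split come from the entry above.

```python
import sqlite3

# Step 1: tables only. No index may reference a column added by migration.
_SCHEMA_BASE = """
CREATE TABLE IF NOT EXISTS journal (
    id INTEGER PRIMARY KEY,
    agent TEXT NOT NULL
);
"""

# Step 3: indexes that depend on migrated columns.
_SCHEMA_INDEXES = """
CREATE INDEX IF NOT EXISTS idx_journal_intent ON journal(intent_id);
"""


def _migrate(conn: sqlite3.Connection) -> None:
    """Step 2: idempotent migration, adds intent_id only if it is missing."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(journal)")}
    if "intent_id" not in columns:
        conn.execute("ALTER TABLE journal ADD COLUMN intent_id TEXT")


def init_db(conn: sqlite3.Connection) -> None:
    """Startup order: base schema, then migration, then dependent indexes."""
    conn.executescript(_SCHEMA_BASE)
    _migrate(conn)
    conn.executescript(_SCHEMA_INDEXES)
    conn.commit()
```

Running the index statement before `_migrate()` against a pre-existing database without `intent_id` would fail with a "no such column" error, which is the failure mode this fix removes.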
AD-412
Crew Improvement Proposals Channel
+13 py · 3315 py · 118 ts total
#Improvement Proposals ship channel (auto-seeded, idempotent). [PROPOSAL] block extraction in proactive thinks (multiline rationale). _handle_propose_improvement() runtime handler. GET /api/wardroom/proposals with status filter. Captain endorse/shelve via existing Ward Room endorsements.
AD-432
Cognitive Journal Expansion — Traceability + Query Depth
+15 py · 3302 py · 118 ts total
Schema: intent_id, dag_node_id, response_hash columns + idempotent migration. Traceability: intent_id plumbed from IntentMessage through perceive/decide. Time-range queries (since/until). Grouped token usage (by model/tier/agent/intent). Decision point anomaly detection. wipe() for reset. 2 new API endpoints + time-range params on existing.
AD-433
Selective Encoding Gate — Biologically-Inspired Memory Filtering
+11 py · 3291 py · 118 ts total
EpisodicMemory.should_store() static gate: always store Captain 1:1 + failures, block proactive no-response + QA routine passes + empty responses. Conservative default. Applied at 4 call sites (proactive, runtime, cognitive_agent). Memory Architecture Layer 2.
BF-029/030
Ward Room Recall Quality + execute_fetchone Fix
+10 py · 3276 py · 118 ts total
BF-029: Recall query enrichment with "Ward Room {callsign}" prefix, reversed input/reflection preference, body excerpts in reply reflections. BF-030: execute_fetchone() doesn't exist in aiosqlite — replaced with standard execute()+fetchone() pattern.
AD-431
Cognitive Journal — LLM Reasoning Trace Service
+13 py · 3266 py · 118 ts total
Append-only SQLite store for every LLM call: agent, tier, model, prompt/completion tokens, latency, intent, cached. Instrumented decide() with timing. REST API: /api/journal/stats, /api/agent/{id}/journal, /api/journal/tokens. LLMResponse gained prompt_tokens + completion_tokens. Ship's Computer infrastructure service.
AD-415
Proactive Cooldown Persistence
+10 py · 3253 py · 118 ts total
Per-agent cooldown overrides persist to KnowledgeStore (proactive/cooldowns.json). Write-through on set, restore on boot, persist on shutdown. Wiped on probos reset.
BF-027/028
Memory Recall Hardening
+8 py · 3242 py · 118 ts total
Lowered agent recall threshold 0.7→0.3 (sovereign shard filter prevents leakage). Added recent_for_agent() fallback across all recall sites (cognitive_agent, proactive, shell). Fixed MockEpisodicMemory missing recall_for_agent().
AD-430c
Memory Recall + Act-Store Lifecycle Hook (Pillars 4-5)
+13 py · 3236 py · 118 ts total
_recall_relevant_memories() between perceive/decide enriches observation with 3 episodes. _store_action_episode() after report stores universal post-action episodes. Dedup: skips proactive_think, ward_room, hxi_profile, captain DMs. All in CognitiveAgent base class.
AD-430b
HXI 1:1 Conversation Memory (Pillar 3)
+19 py · 3223 py · 118 ts total
HXI chat passes conversation history via session_history (capped at 10). Episode storage after each response. Cross-session recall via /api/agent/{id}/chat/history. ProfileChatTab seed memories on mount.
AD-430a
Experiential Memory Write Paths (Pillars 1-2)
+8 py · 3204 py · 118 ts total
Proactive think episodes (successful + no-response) stored to EpisodicMemory. Ward Room thread creation + reply episodes for authoring agent. Runtime wires episodic_memory into WardRoomService. All storage non-critical (try/except).
AD-416
Ward Room Archival & Pruning
+14 py · 3196 py · 118 ts total
3-tier retention (7d regular, 30d endorsed, indefinite Captain). JSONL archival before deletion. Background prune loop with monthly rotation. Stats + dry-run API. Runtime cleanup on ward_room_pruned event.
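A small sketch of the three retention tiers as a pure decision function. The 7-day, 30-day, and indefinite windows come from the entry above; the function names, the flag names, and the rule that a Captain post outranks an endorsement are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REGULAR_RETENTION = timedelta(days=7)    # regular posts
ENDORSED_RETENTION = timedelta(days=30)  # endorsed posts


def retention_for(*, by_captain: bool, endorsed: bool) -> Optional[timedelta]:
    """Return the retention window, or None for indefinite retention."""
    if by_captain:
        return None  # Captain content is kept indefinitely
    return ENDORSED_RETENTION if endorsed else REGULAR_RETENTION


def is_prunable(created_at: datetime, now: datetime, *,
                by_captain: bool = False, endorsed: bool = False) -> bool:
    """True when a post has outlived its tier's retention window."""
    limit = retention_for(by_captain=by_captain, endorsed=endorsed)
    return limit is not None and now - created_at > limit
```

A prune loop would call `is_prunable()` per post and archive each match to JSONL before deleting it; the same predicate can back a dry-run report.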
BF-023
Degraded Agent Death Spiral
+5 py · 3172 py · 118 ts total
Exception handler calls update_confidence(False). DEGRADED→ACTIVE recovery when confidence rises above 0.2. Degradation warning only on state transition.
AD-425
Ward Room Active Browsing
+14 py · 3167 py · 118 ts total
Cross-channel browse_threads(). Proactive context expanded to All Hands. Read receipts via update_last_seen(). Activity feed + mark-seen REST endpoints. Crew auto-subscribed to All Hands.
AD-424
Ward Room Thread Classification & Lifecycle
+19 py · 3153 py · 118 ts total
Three thread modes: INFORM (no notification), DISCUSS (relaxed earned agency, responder cap), ACTION (@mention). Bridge alerts → INFORM. SQLite schema migration. Thread lock/unlock. BF-022 closed.
AD-417
Dream Scheduler Proactive-Loop Awareness
+9 py · 3145 py · 118 ts total
Full dreams gate on proactive activity — truly_idle = min(idle_time, proactive_idle). EmergentDetector skips micro-dream analysis during proactive-busy periods. proactive_extends_idle config toggle.
AD-419
Agent Duty Schedule & Justification
+13 py · 3118 py · 118 ts total
Plan of the Day for structured proactive cognition. DutyScheduleTracker with cron/interval support. Duty-first proactive loop. Default schedules per agent type. BF-021: duty hard gate (no LLM call when no duty due).
Ph-25a
Persistent Task Engine
+33 py · 3105 py · 118 ts total
SQLite-backed PersistentTaskStore with once/interval/cron scheduling. Webhook triggers, DAG checkpoint resume, SchedulerAgent integration. 6 REST endpoints. HXI store wiring.
AD-413
Fine-Grained Reset Scope + Ward Room Awareness
+10 py · 3105 py · 118 ts total
Reset = day 0. Archives ward_room.db, clears checkpoints/events. --keep-wardroom flag. WardRoom.get_recent_activity(). Proactive loop now includes Ward Room department context.
AD-411
EmergentDetector Pattern Deduplication
+9 py · 3095 py · 118 ts total
Cooldown-based dedup cache for trust anomalies, cooperation clusters, routing shifts. Stale entry pruning. create_pool() duplicate guard. Crew-discovered issue.
AD-410
Bridge Alerts — Proactive Captain & Crew Notifications
+31 py · 2996 py · 118 ts total
5 signal processors (vitals, trust, emergent, behavioral, dedup). 3 severity levels. Ward Room thread creation as "Ship's Computer." First autonomous crew discussion achieved.
AD-408
Dynamic Assignment Groups — Backend
+34 py · 2965 py · 118 ts total
AssignmentService with SQLite. Bridge/away_team/working_group types. Ward Room channel auto-creation. 7 API endpoints. Sync snapshot cache.
AD-407
Ward Room — Agent Communication Fabric (4 phases)
+109 py +9 ts · 2965 py · 118 ts total
Foundation (7 SQLite tables, 11 API routes, credibility), HXI Surface (420px drawer, channels, threading, endorsements), Agent Integration (crew responds to Captain posts), Agent-to-Agent (5-layer safety, depth cap, selective targeting).
AD-406
Agent Profile Panel
2822 py · 108 ts total
Click-to-open floating profile with Chat, Work, Profile, Health tabs. 1:1 direct messaging. Personality traits, trust sparkline, Hebbian connections. Glass-morphism UI.
AD-405
Step-Level DAG Checkpointing
+19 py · 2822 py · 99 ts total
Checkpoint DAG execution state to JSON after each node. write/load/delete/scan/restore operations. DAGExecutor integration. Stale checkpoint detection at startup.
AD-404
Fix Windows-Specific Test Failures
2822 py · 99 ts total
Fixed 19 tests failing on Windows. Four groups: EscalationHook mock gaps, git skip guards, shell command agent Popen mock, worktree git fixture skip.
AD-403
Memory Contradiction Resolution
+20 py · 2822 py · 99 ts total
Deterministic contradiction detection in dream consolidation. Jaccard word-overlap similarity (0.85 threshold). Pairwise episode comparison for disagreeing outcomes. Dream cycle Step 3.5.
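The pairwise check can be sketched as below: two episodes are reported when their word sets overlap at or above the 0.85 threshold but their outcomes disagree. The `Episode` shape and function names are hypothetical; the threshold and the Jaccard word-overlap measure come from the entry above.

```python
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple


class Episode(NamedTuple):  # hypothetical minimal episode shape
    text: str
    success: bool


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity: |A & B| / |A | B|."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def find_contradictions(episodes: Sequence[Episode],
                        threshold: float = 0.85) -> List[Tuple[int, int]]:
    """Return index pairs of near-identical episodes with opposite outcomes."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(episodes), 2):
        if a.success != b.success and jaccard(a.text, b.text) >= threshold:
            pairs.append((i, j))
    return pairs
```

The comparison is quadratic in the number of episodes, which is acceptable for a batch step like dream consolidation but would need bucketing for very large stores.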
AD-402
Agent Behavioral Eval Framework — Phase 1
+30 py · 2822 py · 99 ts total
Golden-dataset driven quality tests for cognitive agents. 18 decomposer cases (single-intent, multi-step, conversational, edge cases). 12 code review cases (clean code, security vulnerabilities). Parametrized pytest runner with per-case pass/fail. Eval summary reporter.
AD-401
Structured Output Validation & Auto-Retry
+13 py · 2792 py · 99 ts total
Shared json_extract.py utility with string-aware brace matching, <think> block stripping, markdown fence extraction. complete_with_retry() wraps LLM call + parse with error feedback and temperature escalation. Decomposer, CodeReviewer, Research retrofitted.
AD-400
Cross-Layer Import Lint Test
+2 py · 2790 py · 99 ts total
AST-based pytest guard walks all src/probos/ files, extracts imports, maps to layers, fails on undocumented cross-layer edges. utils/core universally importable. Enforces AD-399 boundaries in CI.
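A condensed sketch of how such a guard could work on a single source string. The layer names `cognitive` and `mesh` and the contents of the allow-list are hypothetical; the universally importable `utils` and `core` layers and the `probos` package name come from the entries here.

```python
import ast

UNIVERSAL = {"utils", "core"}                # importable from any layer
ALLOWED_EDGES = {("cognitive", "knowledge")}  # hypothetical documented edge


def imported_layers(source: str, package: str = "probos") -> set:
    """Collect second-level package names from absolute imports of `package`."""
    layers = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are skipped
        for name in names:
            parts = name.split(".")
            if parts[0] == package and len(parts) > 1:
                layers.add(parts[1])
    return layers


def violations(layer: str, source: str) -> list:
    """Return imported layers that are neither universal nor documented."""
    return sorted(
        target
        for target in imported_layers(source)
        if target != layer
        and target not in UNIVERSAL
        and (layer, target) not in ALLOWED_EDGES
    )
```

A pytest guard would walk the source tree, derive each file's layer from its path, and assert that `violations()` returns an empty list for every file.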
AD-399
Cross-Layer Dependency Cleanup
+12 py · 2776 py · 99 ts total
AST-based import analysis (124 files, 257 imports). embeddings.py→knowledge/, response_formatter.py→utils/, QAReport→types.py. 6 allowed edges documented. Foundation-tier types.py/config.py recognized.
AD-398
Crew Identity Alignment — Three-Tier Agent Architecture
+33 py · 2764 py · 99 ts total
Three-tier agent taxonomy (Infrastructure/Utility/Crew). Reclassified IntrospectAgent, VitalsMonitor, RedTeamAgent, SystemQA as infrastructure. New cognitive crew: SecurityAgent (Worf), OperationsAgent (O'Brien), EngineeringAgent (LaForge). 1:1 conversation fixes.
AD-397
Callsign Addressing — Targeted Dispatch & 1:1 Sessions
+27 py · 2731 py · 99 ts total
@callsign targeted dispatch via IntentBus.send(). CallsignRegistry syncs from crew_profiles YAML. 1:1 crew sessions with sovereign episodic memory shards. /bridge command. Session isolation in CognitiveAgent + EpisodicMemory.
AD-387
Unified Bridge — Single Panel HXI Redesign
+4 ts · 2652 py · 46 ts total
Single BRIDGE button replacing 3 panels. 5 priority sections. Shared card components in bridge/ subdirectory. Adaptive main viewer (canvas/kanban). ViewSwitcher.
AD-388
Glass Overlay & Center Task Cards
+9 ts · 2652 py · 55 ts total
GlassLayer.tsx frosted overlay, GlassTaskCard.tsx glass morphism cards, constellation layout, dynamic frost, attention elevation.
AD-389
DAG Step Visualization
+7 ts · 2652 py · 62 ts total
GlassDAGNodes.tsx radial step nodes, SVG dependency lines, hover tooltips, single-click expand, expandedGlassTask store field.
AD-390
Ambient Intelligence & Bridge States
+11 ts · 2652 py · 73 ts total
ContextRibbon HUD strip, deriveBridgeState (idle/autonomous/attention), ambient edge glow, BriefingCard return-to-bridge, completion celebrations.
AD-391
Cyberpunk Atmosphere Layer
+10 ts · 2652 py · 83 ts total
ScanLineOverlay, DataRainOverlay (Ctrl+Shift+D), chromatic aberration SVG filter, luminance ripple, playStepComplete/playBridgeHum/playCaptainReturn sound cues, atmosphere preferences in localStorage.
AD-392
Adaptive Bridge — Trust, Breathing, Gaze, Responsive
+16 ts · 2652 py · 99 ts total
Trust-driven progressive reveal (trustBand, condensed/standard/prominent cards), Command Surface breathing (recede/swell), Captain's Gaze (throttled mouse proximity), useBreakpoint() responsive hook (5 viewport ranges).
AD-393
Personality Activation — Big Five Wiring
+10 py · 2683 py total
Wire Big Five personality traits from YAML crew profiles into LLM system prompts. _build_personality_block() as Tier 1.5 in compose_instructions(). Three-band mapping: high/low/neutral behavioral guidance.
AD-394
ScoutAgent — GitHub Intelligence Gathering
+10 py · 2683 py total
Science department officer (Wesley) — daily GitHub search, absorb/visiting_officer classification via LLM, Discord digest, Bridge notifications, /scout command, seen-repo deduplication with 90-day TTL.
AD-396
Quality Hardening — Encoding, Paths, Type Safety
+13 py · 2704 py · 99 ts total
7 subprocess encoding fixes (text=True→utf-8), Scout data directory resolution from runtime, personality trait type guard, 13 integration tests for encoding/paths/types/shell↔agent boundary.
AD-395
CredentialStore + Scout gh CLI + Source Curation
+19 py · 2702 py · 99 ts total
CredentialStore (Ship's Computer service): resolution chain (config→env→CLI), 5-min cache TTL, audit logging, dept-scoped access, /credentials command. Scout: httpx→gh api migration, multi-dimensional scoring (credibility+reliability from GPT Researcher), composite_score.
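The resolution chain with a time-bounded cache can be sketched as follows. The config, then environment, then CLI order and the 5-minute TTL come from the entry above; the constructor signature, the upper-cased environment variable naming, and the injected `cli_lookup` callable are assumptions. Audit logging and department scoping are omitted.

```python
import os
import time
from typing import Callable, Dict, Mapping, Optional, Tuple


class CredentialStore:
    """Resolve a credential via config, then environment, then a CLI lookup."""

    def __init__(self, config: Mapping[str, str],
                 cli_lookup: Callable[[str], Optional[str]],
                 env: Optional[Mapping[str, str]] = None,
                 ttl_s: float = 300.0,  # 5-minute cache TTL
                 clock: Callable[[], float] = time.monotonic):
        self._config = config
        self._env = os.environ if env is None else env
        self._cli_lookup = cli_lookup  # hypothetical hook, e.g. wrapping a CLI call
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> Optional[str]:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # still fresh, skip the chain
        sources = (
            self._config.get,
            lambda n: self._env.get(n.upper()),  # assumed env naming
            self._cli_lookup,
        )
        for source in sources:
            value = source(name)
            if value:
                self._cache[name] = (now, value)
                return value
        return None
```

Injecting the clock and the CLI lookup keeps the chain testable without touching real credentials or spawning processes.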
AD-324
Orb Hover Enhancement
+3 ts
Per-agent hover preview with current task, step label, elapsed time, progress fraction. Amber pulse on attention-needed orbs. Click-through to Bridge panel.
AD-323
Agent Notification Queue
+12 py · +4 ts
NotificationQueue service, Runtime.notify(), bell button with unread badge, type-colored notification cards with click-to-ack, mark-all-read.
AD-360–386
Cognitive Evolution — 11 ADs
+180 tests
Multi-dimensional trust (reliability, quality, speed, safety, cooperation), Hebbian decay & generalization, consensus confidence calibration, episodic importance scoring, dreaming pattern extraction, capability gap prediction, runtime directive overlays.
BF-004–008
Bug fixes — All Closed
+10 tests
BF-004: Transporter HXI visualization. BF-007: Verification false positive on pool counts. BF-008: Dream cycle double-replay after dolphin dreaming.
AD-358
Per-Tier Temperature & Top-P Tuning
+5 tests · 2332 py total
Independent temperature and top_p config per LLM tier. Deep tier uses lower temp for precise architectural analysis. Configurable via system.yaml.
AD-355
Visiting Officer Live Testing Fixes
+2 tests · 2327 py total
WORKING ENVIRONMENT section added to visiting builder instructions. Reduced diagnostic logging. PYTHONPATH fix for test imports in both test runners.
AD-354
Visiting Officer HXI Integration
+14 tests · 2325 py · 34 ts
Path normalization, temp directory for SDK sessions, force_native/force_visiting flags, builder_source in WebSocket events, HXI builder badge.
AD-351–353
Copilot SDK Visiting Officer
+30 tests · 2313 py · 32 ts
CopilotBuilderAdapter, 7 MCP tools (codebase_query, find_callers, get_imports, find_tests, read_source, self_model, standing_orders), Hebbian apprenticeship routing.
AD-350
Fix: Diagnostician Bypassing VitalsMonitor [BF-003]
+2 tests
AD-349
Fix: Agent Orbs Escaping Pool Group Spheres [BF-002]
1 Vitest
AD-348
Fix: Self-Mod False Positive on Knowledge Questions [BF-001]
+1 test
AD-343–347
Builder Failure Escalation
+38 tests
BuildFailureReport with failure classification, smart test selection, resolution API (POST /api/build/resolve), HXI diagnostic card, escalation hook for Phase 33 chain of command.
AD-337–342
Builder Quality Gates & Standing Orders
+30 tests
Constitution system (4-tier hierarchy), commit gate (blocks on test failure), CodeReviewAgent soft gate, Builder test-writing rules, /ping validation, /orders display command.
AD-330–336
Northstar II — Transporter Pattern
7/7 · +45 tests
BuildBlueprint, ChunkDecomposer (Dematerializer), parallel Matter Stream execution, ChunkAssembler (Rematerializer), Interface Validator (Heisenberg Compensator), HXI visualization, end-to-end integration + fallback.
AD-317–320
Ship's Computer Identity + Self-Knowledge
+43 tests
LCARS Ship's Computer identity, SystemSelfModel (structured runtime self-knowledge), pre-response verification (zero-LLM fact check), introspection delegation (grounded self-knowledge answers).
AD-302–316a
Northstar I — ArchitectAgent + BuilderAgent
18/18 · +160 tests
Full design-and-build pipeline: Architect (7 context layers, deep localize, import graph, pattern recipes, proposal validation), Builder (CREATE/MODIFY modes, test-fix loop), API (submit/approve/reject), HXI (teal proposal UI, Approve+Build button), Phase 32l–32p hardening.
AD-297–301
CodebaseIndex — Ship's Technical Manual (Phase 29c)
+35 tests
AST-based structural self-awareness. Import graph (forward + reverse). Word-level keyword scoring. Project docs indexed. Section-targeted doc reading. Full API surface, caller analysis, test discovery.
AD-285–296
Crew Architecture + Medical Team
+65 tests
Pool group registry, rank structure, Medical Team complete (CMO, Diagnostician, VitalsMonitor, causal attribution, introspect_design), HXI crew sub-clusters with gravitational layout + billboarding labels.
AD-274–284
Phase 24a–c: Discord + Bug Fixes + Scheduler
+40 tests
Discord bot adapter, ChannelAdapter ABC, self-assessment bug fixes (episode count key mismatch, trust reconciliation), lightweight task scheduler (session-scoped, channel delivery).
Era I–III Phases Complete
Era III v0.3.0
Bundled agents, distribution, HXI MVP, Discord — Phases 17–23
ADs 201–261
Era II v0.2.0
CognitiveAgent, skills, DAG proposal, dependency resolver, semantic knowledge, Shapley — Phases 10–16
ADs 101–200
Era I v0.1.0
Agent substrate, consensus/trust, mesh, self-mod, episodic memory, federation, knowledge store — Phases 1–9
ADs 1–100