{% extends "base.html" %} {% block title %}DM Campaigns — Maxim Docs | Bio-System Stress Tests{% endblock %} {% block meta_description %}D&D-style campaigns as bio-system stress tests. Branching encounters, seeded dice, and expectation checks against Hippocampus, NAc, and SCN.{% endblock %} {% block meta_keywords %}Maxim DM campaigns, dungeon master, bio-system stress test, campaign YAML, choice classification, cascade resolution, hippocampus testing, NAc causal learning{% endblock %} {% block meta_author %}Maxim Project{% endblock %} {% block og_site_name %}Maxim{% endblock %} {% block og_type %}article{% endblock %} {% block structured_data %} {% endblock %} {% block content %}
MAXIM
D&D-style branching narratives as structured bio-system stress tests
Tabletop RPG encounters are ideal stress tests for cognitive architectures. They require episodic memory (remembering NPCs, clues, combinations), causal reasoning (bribing a guard has consequences), temporal awareness (events happen in sequence), and pain response (combat hurts). A single campaign can exercise Hippocampus, NAc, SCN, PainBus, Cerebellum, and ATL in a controlled, reproducible way.
DM campaigns are hand-authored YAML scenarios with explicit structure: acts, encounters, NPC definitions, branching choices, dice checks, and bio-system expectations. The DM runtime drives the campaign as a state machine, delivering scenes through the simulation bridge and classifying the AUT's responses to determine which branch to follow.
When you pass a campaign YAML path to --sim, Maxim detects the campaign: block and launches the DM runtime instead of the generative narrator. Interactive mode is ON by default for DM campaigns — a Rich split-panel display shows the scene narrative while the human picks from available choices and types free-text roleplay. Here is what happens:
Entities are created from the player_character:, npcs:, and world_objects: specs. When the campaign reaches __END__, bio-system expectations are checked and a report is saved.
Reports go to ~/.maxim/sim_reports/{session_id}/ with the standard report.json, actions.jsonl, and AUT memory snapshots, plus a campaign section with choices made, dice rolls, flags, and entity snapshots.
A campaign YAML has six top-level sections:
| Section | Purpose |
|---|---|
| campaign: | Name, goal string, seed for dice RNG |
| player_character: | SEM entity spec for the AUT's avatar |
| npcs: | Named NPC entity specs (sensors, modulators, persona) |
| world_objects: | Interactable objects (swords, doors, potions) |
| acts: / encounters: | Narrative structure with scenes, choices, branches, dice |
| expectations: | Bio-system assertions checked after campaign ends |
scenarios/campaigns/heist_v1.yaml
3 encounters, 2 NPCs, 1 dice check. A paladin is recruited for a vault robbery. Tests Hippocampus (remembering the combination, NPC names), NAc (causal links from choices), and PainBus (combat damage).
scenarios/campaigns/poisoned_crown_v1.yaml
5 encounters, 3 NPCs, multiple branch points. A royal investigator solves the king's illness. Tests SCN (temporal bins), ATL (concept formation), relationships (trust), visibility (contextual reveal), and cascades.
scenarios/campaigns/arena_v1.yaml
5 encounters, linear gauntlet. A gladiator fights for freedom through escalating opponents. Tests Cerebellum (rapid prediction learning), PainBus (sustained pain), NAc (fast causal learning, RPE spikes), and cascade (weapon degradation).
scenarios/campaigns/darkened_cavern_v1.yaml
6 encounters, 3 acts. A ranger progressively loses senses in a cave. Tests sensory gating (entity-modulated perception), Cerebellum (prediction under sensory change), PainBus (acuity threshold failures), and novelty decay.
The DM runtime (simulation/dm_runtime.py) is a state machine that loops through encounters until it reaches __END__.
Each encounter can reference NPCs and objects by name. When an encounter starts, SceneState registers SEM tools for entities entering the scene and deregisters them when they leave. The AUT only sees tools for entities currently present.
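The enter/leave bookkeeping described above can be sketched as a set difference between the entities present in the previous encounter and those in the next. This is a minimal illustration, not the actual SceneState code: the class name `SceneToolRegistry`, the `entity.tool` naming scheme, and the input shape are all assumptions.

```python
class SceneToolRegistry:
    """Hypothetical stand-in for SceneState's tool bookkeeping."""

    def __init__(self, entity_tools: dict[str, list[str]]):
        # Assumed shape: entity name -> tool names that entity exposes.
        self.entity_tools = entity_tools
        self.present: set[str] = set()
        self.registered: set[str] = set()

    def _tools_for(self, entity: str) -> set[str]:
        # Namespacing by entity keeps same-named tools on different entities distinct.
        return {f"{entity}.{tool}" for tool in self.entity_tools.get(entity, [])}

    def enter_encounter(self, active_entities: set[str]) -> None:
        leaving = self.present - active_entities
        entering = active_entities - self.present
        for entity in leaving:
            self.registered -= self._tools_for(entity)   # deregister on exit
        for entity in entering:
            self.registered |= self._tools_for(entity)   # register on entry
        self.present = set(active_entities)
```

Entities that stay in the scene across two encounters are untouched, so their tools remain registered without churn.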
When an encounter offers choices (e.g., accept_job, decline, negotiate_pay), the DM needs to figure out which one was picked. There are four classification layers, tried in order:
When interactive mode is active, the DM presents the available choices to the human via a SimPromptHandler prompt. The human picks directly from numbered options or types free-text roleplay. Free-text input is classified against the encounter choices using keyword matching and LLM fallback. This path bypasses the ChooseTool and alias layers entirely.
ChooseTool is a dynamic tool (tools_dm.py) that updates its valid options per encounter. When the AUT calls choose(option="accept_job"), the choice is unambiguous. The tool supports exact match, underscore/space normalization, and partial keyword matching.
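The three matching modes could look roughly like the sketch below. The function names are hypothetical and the partial-match rule (accept only when exactly one choice shares a word with the input) is an assumption; the real ChooseTool may resolve ambiguity differently.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and treat underscores and runs of whitespace as single spaces."""
    return " ".join(text.lower().replace("_", " ").split())


def match_choice(option: str, valid: list[str]) -> Optional[str]:
    # 1. Exact match.
    if option in valid:
        return option
    # 2. Underscore/space (and case) normalization.
    wanted = normalize(option)
    for choice in valid:
        if normalize(choice) == wanted:
            return choice
    # 3. Partial keyword match: accept only an unambiguous hit (assumed rule).
    words = set(wanted.split())
    hits = [c for c in valid if words & set(normalize(c).split())]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` on ambiguity leaves the decision to the later classification layers rather than guessing.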
Before each encounter, the DM registers choice names as tool aliases in the executor. If the AUT calls a tool named accept_job, acceptjob, or accept job, the executor redirects it to choose. This catches cases where the LLM invents tool names matching the choice text.
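A sketch of the alias layer, assuming the executor keeps a simple lookup table from alias to choice name. The helper names and the `(tool, args)` return shape are illustrative, not the executor's real interface.

```python
def alias_variants(choice: str) -> set[str]:
    """Spellings an LLM might invent for a choice such as accept_job."""
    base = choice.lower()
    return {base, base.replace("_", ""), base.replace("_", " ")}


def build_alias_table(choices: list[str]) -> dict[str, str]:
    table: dict[str, str] = {}
    for choice in choices:
        for alias in alias_variants(choice):
            table[alias] = choice
    return table


def redirect(tool_name: str, table: dict[str, str]) -> tuple[str, dict]:
    """Rewrite an aliased tool call into choose(option=...); pass others through."""
    choice = table.get(tool_name.lower())
    if choice is None:
        return tool_name, {}
    return "choose", {"option": choice}
```

Rebuilding the table before each encounter keeps stale choice names from an earlier scene from being redirected.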
If the AUT does not use choose or a matching alias, the DM falls back to keyword matching on the response text and tool names. If that fails, a one-shot LLM classification prompt asks which choice the response most closely matches. As a last resort, the first choice is used as default.
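The fallback order (keywords, then LLM, then first choice) can be sketched as follows. The `llm_classify` callable is a placeholder for whatever one-shot prompt the runtime issues, and the keyword rule (every word of the choice name appears in the text) is an assumption.

```python
from typing import Callable, Optional


def classify_response(
    response_text: str,
    tool_names: list[str],
    choices: list[str],
    llm_classify: Optional[Callable[[str, list[str]], str]] = None,
) -> tuple[str, str]:
    """Return (choice, layer) where layer names the fallback that decided."""
    haystack = " ".join([response_text, *tool_names]).lower().replace("_", " ")

    # Keyword layer: all words of a choice name must occur in the text (assumed rule).
    for choice in choices:
        if all(word in haystack for word in choice.lower().split("_")):
            return choice, "keyword"

    # LLM layer: trust the answer only if it names a valid choice.
    if llm_classify is not None:
        guess = llm_classify(response_text, choices)
        if guess in choices:
            return guess, "llm"

    # Last resort: the first listed choice.
    return choices[0], "default"
```

Returning the deciding layer alongside the choice makes it easy to log how often the default is hit, which is worth watching since a silent default can mask a confused AUT.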
The expectations: block in a campaign YAML defines assertions that are checked after the campaign completes. This is the structured testing layer — each campaign targets specific subsystems.
| System | Check | What It Validates |
|---|---|---|
| hippocampus | min_episodic_captures | Memory formation is working under narrative load |
| hippocampus | recall_hit_on | Specific terms are retrievable from memory |
| nac | min_observations | Causal learning is triggering on actions |
| nac | prediction_confidence_above | At least one causal link has meaningful confidence |
| scn | temporal_bins_used | Temporal indexing is recording encounter timestamps |
| pain | min_signals | PainBus is publishing signals from combat/failures |
Results appear in the campaign report as pass/fail per check, with expected vs actual values. This makes campaigns function as regression tests for bio-system integration.
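As a rough illustration of pass/fail reporting with expected versus actual values, the sketch below compares an expectations mapping against observed metrics. The dictionary shapes and the comparison rules inferred from check names (`min_*` as a lower bound, `*_above` as a strict threshold, `recall_hit_on` as a subset test) are assumptions, not the runtime's documented semantics.

```python
def check_expectations(expectations: dict, observed: dict) -> list[dict]:
    """Compare each expected check against observed metrics (assumed shapes)."""
    results = []
    for system, checks in expectations.items():
        for check, expected in checks.items():
            actual = observed.get(system, {}).get(check)
            if actual is None:
                passed = False                          # metric never recorded
            elif check.startswith("min_"):
                passed = actual >= expected             # lower bound
            elif check.endswith("_above"):
                passed = actual > expected              # strict threshold
            elif check == "recall_hit_on":
                passed = set(expected) <= set(actual)   # every term recalled
            else:
                passed = actual == expected             # fallback: equality (assumption)
            results.append({
                "system": system, "check": check,
                "expected": expected, "actual": actual, "passed": passed,
            })
    return results
```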
Bio-system expectations are skipped when interactive mode is active. Human choices are unpredictable — the expectations are calibrated for autonomous AUT behavior and would produce false failures under human-driven branching. NAc learning is also suppressed during interactive mode to prevent human decision noise from polluting the causal link store. Run with --interactive false for expectation-checked regression testing.
Each encounter needs a scene: (narrative text delivered to the AUT), optional active_npcs: and world_objects: (which entities are present), and choices: + branches: for decision points. An encounter without choices auto-advances to the next one in act order.
Map each choice to a target encounter name or __END__. The validator checks that all branch targets exist and that every encounter can reach __END__ through some path. Cycles are allowed (e.g., returning to a hub encounter).
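The traversal rules (follow a branch when a choice is made, otherwise auto-advance in order, stop at __END__) can be sketched with plain dictionaries. The encounter shape, the `pick_choice` callback, and ending the campaign when the last encounter has no choices are assumptions made for illustration.

```python
END = "__END__"


def run_campaign(encounters: dict, pick_choice, max_steps: int = 1000) -> list[str]:
    """Walk encounters from the first one until __END__; return the visited path.

    encounters: ordered mapping of name -> {"choices": [...], "branches": {...}}
    pick_choice: callable(encounter_name, choices) -> chosen option
    """
    order = list(encounters)
    path: list[str] = []
    current = order[0]
    while current != END:
        if len(path) >= max_steps:           # guard: cycles are allowed, so cap the walk
            raise RuntimeError("campaign did not terminate")
        path.append(current)
        encounter = encounters[current]
        choices = encounter.get("choices")
        if choices:
            current = encounter["branches"][pick_choice(current, choices)]
        else:
            # No choices: auto-advance to the next encounter in declaration order.
            idx = order.index(current)
            current = order[idx + 1] if idx + 1 < len(order) else END
    return path
```

The step cap matters because hub-style cycles are legal; validation guarantees an exit exists, not that a given run takes it.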
NPCs and objects use the standard SEM spec format. Add sensors (trust, health, durability), modulators (speak, slash, offer_payment), and metadata (persona_prompt, role). Entities are created once and persist across encounters — sensor values change as the campaign progresses.
Attach a dice: block to any choice. Standard notation: 1d20, 2d6+3. The result is compared against a DC (difficulty class). On success, a flag is set. Dice rolls use the campaign's seeded RNG for reproducibility.
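A minimal sketch of seeded dice rolling using Python's standard `random.Random`. The function names are hypothetical, and treating a total that meets the DC as a success follows the usual tabletop convention rather than anything stated about the Maxim roller.

```python
import random
import re

_DICE = re.compile(r"^(\d+)d(\d+)([+-]\d+)?$")


def roll(notation: str, rng: random.Random) -> int:
    """Roll standard notation such as 1d20 or 2d6+3 using the supplied RNG."""
    match = _DICE.match(notation.replace(" ", ""))
    if not match:
        raise ValueError(f"bad dice notation: {notation!r}")
    count, sides = int(match.group(1)), int(match.group(2))
    modifier = int(match.group(3) or 0)
    return sum(rng.randint(1, sides) for _ in range(count)) + modifier


def dice_check(notation: str, dc: int, rng: random.Random) -> tuple[int, bool]:
    """Return (total, success); meeting the DC counts as success (assumption)."""
    total = roll(notation, rng)
    return total, total >= dc


# One RNG per campaign, seeded once, so a rerun produces the same sequence of rolls.
campaign_rng = random.Random(1234)
total, success = dice_check("1d20", dc=12, rng=campaign_rng)
```

Passing the RNG explicitly, instead of using the module-level functions, keeps dice independent of any other randomness in the process.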
Per-encounter dialogue_hints: map flags to NPC lines. A default: hint is used when no flags match. This lets NPC dialogue react to the player's earlier choices without LLM improvisation.
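Hint selection might work along these lines. The first-match-in-declaration-order rule and the flat flag-to-line mapping are assumptions; the flag names and NPC lines are invented for the example.

```python
from typing import Optional


def pick_hint(dialogue_hints: dict[str, str], flags: set[str]) -> Optional[str]:
    """Return the line for the first hint whose flag is set, else the default."""
    active = {flag.lower() for flag in flags}        # flags are case-insensitive
    for flag, line in dialogue_hints.items():
        if flag != "default" and flag.lower() in active:
            return line
    return dialogue_hints.get("default")


hints = {
    "bribed_guard": "The guard nods you through without a word.",
    "default": "The guard eyes you with suspicion.",
}
line = pick_hint(hints, {"Bribed_Guard"})   # selects the bribed_guard line
```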
When interactive mode is active, the human is not limited to picking from the encounter's listed choices. Free-text input is accepted and classified against the available choices using keyword matching and LLM fallback. This lets the human roleplay naturally — typing "I lean across the table and whisper that I'll take the job" classifies as accept_job. If the text does not match any choice, the DM uses an LLM classification prompt to find the closest match.
The on_choice: block lets you set flags when a choice is made. Flags persist across encounters and can influence dialogue hints, branch conditions, and reveal conditions. Flags are case-insensitive.
Before running, the campaign is validated for: reachability (all encounters reachable from start), termination (all paths can reach __END__), dangling branches, undefined NPC/object references, and unknown choice keys in on_choice.
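Reachability and termination are both graph searches: one forward from the start encounter, one backward from __END__. The sketch below assumes the campaign has already been flattened into an adjacency mapping (branch targets plus auto-advance edges); the function names and error strings are illustrative.

```python
from collections import deque

END = "__END__"


def _reachable(edges: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def validate(graph: dict[str, list[str]], start: str) -> list[str]:
    """graph: encounter name -> list of targets (assumed pre-flattened)."""
    errors = []
    # Dangling branches: targets that are neither an encounter nor __END__.
    for name, targets in graph.items():
        for target in targets:
            if target != END and target not in graph:
                errors.append(f"dangling branch: {name} -> {target}")
    # Reachability: forward search from the start encounter.
    seen = _reachable(graph, start)
    errors += [f"unreachable: {name}" for name in graph if name not in seen]
    # Termination: backward search from __END__ over reversed edges.
    reverse: dict[str, list[str]] = {}
    for name, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(name)
    can_end = _reachable(reverse, END)
    errors += [f"cannot reach {END}: {name}" for name in graph if name not in can_end]
    return errors
```

Searching backward from __END__ handles cycles naturally: a hub that loops to itself still passes as long as some edge eventually leads out.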
When an affordance fires (e.g., a sword slash), it may need to read from one entity and write to another. CascadeSpec defines these cross-entity effects in three phases:
Reads gather values from entity sensors. Each read has a ref path (e.g., wielder.strength.modifier) and an optional role name for use in expressions.
Writes apply changes to entity sensors. A write supports an absolute value:, an additive delta:, or a computed expr: (referencing read values).
The third phase uses the same mechanics as writes but is semantically separate. It is used for secondary consequences (e.g., alerting nearby NPCs, triggering environmental changes).
Roles in ref paths (self, wielder, target) are resolved at execution time by the CascadeResolver, which maps role names to actual Entity objects based on context.
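The read, write, and secondary-consequence phases can be sketched over plain nested dictionaries. Everything here is an assumption made for illustration: the spec keys, the `effects` name for the third phase, the `bindings` mapping from role to entity, and the small arithmetic evaluator standing in for however the real CascadeResolver computes `expr:`.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def eval_expr(expr: str, names: dict) -> float:
    """Evaluate +, -, *, / over numbers and read values; reject anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def resolve(ref: str, bindings: dict, world: dict):
    """Map 'role.sensor.field' to (container dict, final key)."""
    role, *path = ref.split(".")
    node = world[bindings[role]]          # role -> concrete entity
    for key in path[:-1]:
        node = node[key]
    return node, path[-1]


def run_cascade(spec: dict, bindings: dict, world: dict) -> None:
    values = {}
    for read in spec.get("reads", []):                   # phase 1: gather
        container, key = resolve(read["ref"], bindings, world)
        values[read.get("role", key)] = container[key]
    for phase in ("writes", "effects"):                  # phases 2 and 3: apply
        for write in spec.get(phase, []):
            container, key = resolve(write["ref"], bindings, world)
            if "value" in write:
                container[key] = write["value"]          # absolute
            elif "delta" in write:
                container[key] += write["delta"]         # additive
            else:
                container[key] = eval_expr(write["expr"], values)  # computed
```

Completing all reads before any write means expressions see a consistent snapshot, even when a cascade reads and writes the same sensor.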
Entity sensors and details have three visibility levels:
Contextual items stay hidden until their reveal_when condition is met.
After each choice, the DM evaluates all contextual reveal conditions across all entities. When a condition passes, the item becomes permanently visible. This lets campaigns model information the AUT must earn through social interaction or exploration. Testing whether the AUT uses newly revealed information is a strong signal for memory and reasoning quality.
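A sketch of the post-choice reveal pass, assuming each item carries a visibility level and a `reveal_when` value that names a single flag. The real RevealCondition may support richer conditions; the item shape and the `revealed` marker are hypothetical.

```python
def update_visibility(items: list[dict], flags: set[str]) -> list[str]:
    """Reveal contextual items whose condition now holds; return newly revealed names."""
    active = {flag.lower() for flag in flags}
    newly_revealed = []
    for item in items:
        if item["visibility"] != "contextual" or item.get("revealed"):
            continue                       # not gated, or already revealed for good
        if item["reveal_when"].lower() in active:
            item["revealed"] = True        # never reset, so the reveal is permanent
            newly_revealed.append(item["name"])
    return newly_revealed
```

Because the `revealed` marker is only ever set, an item stays visible even if the flag that revealed it is later cleared.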
The --dm flag with a goal string (e.g., maxim --dm "run a heist scenario") is planned but not yet built. It would use an architect persona to generate campaign YAML on the fly from a goal string, then hand off to the existing DM runtime for execution.
The generative DM would combine the structured testing benefits of hand-authored campaigns (expectations, dice, branches) with the flexibility of goal-driven generation. The architect would produce valid campaign YAML — validated by the same reachability/termination checks — and the DM runtime would execute it unchanged. This is blocked on Agent Mesh Phase 2 (the architect needs to be a mesh agent) and a DM Spike to validate the approach.
| Module | Purpose |
|---|---|
| simulation/dm_schema.py | Dataclasses, YAML loader, validator, dice roller, CascadeSpec, RevealCondition |
| simulation/dm_runtime.py | DMRuntime state machine, SceneState, CascadeResolver, choice classification |
| simulation/tools_dm.py | ChooseTool (dynamic per-encounter tool with fuzzy matching) |
| scenarios/campaigns/*.yaml | Campaign definitions (11 shipped) |