{% extends "base.html" %} {% block title %}Substrate-Primary Mode — Maxim Docs | Bio-Substrate AUT{% endblock %} {% block meta_description %}Maxim's substrate-primary AUT mode — a parallel architecture where the bio-substrate drives action selection without LLM mediation.{% endblock %} {% block meta_keywords %}substrate-primary, Maxim AUT mode, bio-substrate action selection, NAc recommend_action, parallel mode architecture, grounded language acquisition, cradle prelinguistic, LLM-free agent{% endblock %} {% block meta_author %}Maxim Project{% endblock %} {% block og_site_name %}Maxim{% endblock %} {% block og_type %}article{% endblock %} {% block structured_data %} {% endblock %} {% block content %}

MAXIM

Substrate-Primary Mode

Bio-Substrate Action Selection Without LLM Mediation

Status

Phase −1 prototype shipped 2026-05-09 — NAc.recommend_action() exists with 11 passing unit tests. Phase 0 harness lands in v1.0 (B5). Substrate-primary AUT mode itself ships in v1.1 as opt-in via --aut-mode substrate-primary. The existing LLM-AUT mode remains the user-facing default indefinitely.

Contents

  1. What this is
  2. Why this exists
  3. Parallel-mode architecture
  4. How action selection works
  5. Phase −1: the gating Boolean
  6. D&D as the bidirectional kill criterion
  7. Raw vs primed substrate
  8. Pretrained-LLM crutches to disable

What this is

Maxim has historically used a language model as the AUT's action selector — the LLM proposes the next tool call, the executor dispatches it, the bio-systems learn from the outcome. The bio-substrate (NAc, EC, ATL, Hippocampus, reflexes, Default Network) has been an augmentation layer sitting around the LLM: predicting outcomes, biasing recognition, capturing episodes, providing reactive behaviors.

Substrate-primary mode flips that. The bio-substrate becomes the action selector. The LLM is removed from the AUT's decision loop entirely and replaced with NAc.recommend_action() reading from learned causal links, reward biases, and active drive states. The action proposal flows through the same executor.execute() dispatch — but the proposer is the substrate, not a language model.

Why this exists

Three motivations:

1. The "substrate carries cognition" thesis

Maxim's bio-inspired framing claims the bio-substrate is doing real cognitive work. If the LLM is always the action selector, that claim has an asterisk — the substrate could be doing nothing useful and the LLM would still drive coherent behavior. Substrate-primary mode is the experimental setup that proves (or disproves) the substrate's role.

2. The LLM-mitigation drift

A 2026-05-09 audit found roughly 60–70% of recent engineering effort going to LLM-mitigation scaffolding (~845 LOC of band-aids) — stall detectors, JSON repair pipelines, tool-failure hint sections, identity rewrites for small models, format enforcement for planning mode. Each band-aid is a workaround for the LLM doing something the substrate could in principle handle natively. Substrate-primary mode is the structural fix.

3. The Hivemind enabler

Distilled bio-substrate (NAc weights, EC concepts, reflexes) is naturally shareable across instances — far more privacy-friendly and aggregatable than raw episode/dialogue logs. Substrate-primary mode is the natural client of the Maxim Hivemind + Oasis layer; the federated cognition story only works if the substrate can drive behavior on its own.

Parallel-mode architecture, not replacement

Substrate-primary mode runs in parallel to the existing LLM-AUT path. The user-facing default does not change. There are now two operating modes for the AUT:

| Mode | Action selector | Use case |
| --- | --- | --- |
| --aut-mode llm-primary (default) | LLM proposes; bio-substrate learns from outcomes | All current Maxim workloads: D&D campaigns, Reachy demos, headless agent runs |
| --aut-mode substrate-primary (opt-in, v1.1+) | NAc.recommend_action() proposes; LLM not invoked at all on the AUT side | Substrate research; Phase 0/1 grounded-language experiments; eventual user-facing path once mature |

The orchestrator, environment NPCs, imagination designer, and Oasis distillation all continue to use LLMs. Substrate-primary mode is specifically about the AUT's action loop.

How action selection works

NAc.recommend_action() scores each available tool by combining three signals:

Causal-link confidence

Primary learned signal. For each candidate tool, NAc looks up positive and negative causal links from tool:X → outcome:Y records in its causal graph. Positive links contribute their best confidence; negative links subtract (weighted lower so a single bad outcome doesn't permanently block exploration).

Reward bias

Secondary learned signal. The per-agent reward_bias[(agent_id, node_id)] map adds a small additional positive nudge for tools the agent has been credited on. Capped at NACConfig.max_reward_bias (default 0.20) by design.

Drive-relevance heuristic

Cold-start fallback. When no learned signal exists, active drives (drive_value > 0.5 for hunger/thirst/fatigue/cold/fear/curiosity/pain) bias selection toward semantically related tools via a substring + affinity-table match. This heuristic is a Phase −1 placeholder; Phase 0+ replaces it with proper EC embedding similarity.

The highest-scoring tool above min_confidence (default 0.3) wins. Ties are resolved deterministically by tool name. If nothing scores high enough, the method returns None — substrate-primary mode never falls back to random selection. The substrate must have an opinion to act.

```python
from maxim.decisions.nac import NAc, NACConfig

nac = NAc(NACConfig())
# ... agent has observed pick_up_food → satisfaction (positive, several times)
# ... drives indicate hunger=0.8

action = nac.recommend_action(
    agent_id="my_infant",
    available_tools=["pick_up_food", "examine_rock", "rest"],
    current_drives={"hunger": 0.8},
)
# → {"tool_name": "pick_up_food", "params": {}, "confidence": 0.74,
#    "source": "substrate-primary",
#    "reasoning": "causal_pos=0.62; drive:hunger(0.80) name-match"}
```

The returned dict is compatible with agents.autonomy.Proposal.action and is dispatched through the standard executor.execute() path — no new dispatcher.
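
The scoring rule described above (three signals, a confidence floor, deterministic tie-break, None when nothing qualifies) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NAc implementation. The thresholds 0.3, 0.20 and 0.5 come from the text. NEGATIVE_WEIGHT, DRIVE_WEIGHT, the keyword-style affinity table, the inclusive floor comparison, and the function names score_tool and recommend are all hypothetical. DRIVE_WEIGHT is set to 0.15 only because 0.62 + 0.15 × 0.80 lands on the 0.74 shown in the example; the real weights may differ.

```python
from typing import Dict, Optional, Sequence, Tuple

MIN_CONFIDENCE = 0.3    # from the text: min_confidence default
MAX_REWARD_BIAS = 0.20  # from the text: NACConfig.max_reward_bias default
DRIVE_ACTIVE = 0.5      # from the text: a drive is active when its value > 0.5
NEGATIVE_WEIGHT = 0.5   # assumption: the text only says negative links weigh less
DRIVE_WEIGHT = 0.15     # assumption: illustrative scale for the drive nudge


def score_tool(
    tool: str,
    causal_pos: Dict[str, float],
    causal_neg: Dict[str, float],
    reward_bias: Dict[str, float],
    drives: Dict[str, float],
    affinity: Dict[str, Sequence[str]],
) -> float:
    """Combine causal-link confidence, reward bias, and drive relevance."""
    score = causal_pos.get(tool, 0.0) - NEGATIVE_WEIGHT * causal_neg.get(tool, 0.0)
    score += min(reward_bias.get(tool, 0.0), MAX_REWARD_BIAS)
    for drive, value in drives.items():
        keywords = affinity.get(drive, ())
        # Substring match of affinity keywords against the tool name.
        if value > DRIVE_ACTIVE and any(k in tool for k in keywords):
            score += DRIVE_WEIGHT * value
    return score


def recommend(
    tools: Sequence[str],
    causal_pos: Dict[str, float],
    causal_neg: Dict[str, float],
    reward_bias: Dict[str, float],
    drives: Dict[str, float],
    affinity: Dict[str, Sequence[str]],
) -> Optional[Tuple[str, float]]:
    """Return (tool, score) for the best tool, or None if nothing qualifies."""
    scored = [
        (score_tool(t, causal_pos, causal_neg, reward_bias, drives, affinity), t)
        for t in tools
    ]
    # Inclusive floor is an assumption; the text says "above min_confidence".
    eligible = [(s, t) for s, t in scored if s >= MIN_CONFIDENCE]
    if not eligible:
        return None  # no random fallback: the substrate must have an opinion
    # Highest score wins; equal scores fall back to tool-name order.
    score, tool = min(eligible, key=lambda st: (-st[0], st[1]))
    return tool, score


print(recommend(
    ["pick_up_food", "examine_rock", "rest"],
    causal_pos={"pick_up_food": 0.62},
    causal_neg={},
    reward_bias={},
    drives={"hunger": 0.8},
    affinity={"hunger": ("food", "eat")},
))
```

With these toy inputs the sketch selects pick_up_food at roughly 0.74, while an agent with no causal links and no active drives gets None.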

Phase −1 — the gating Boolean (PASSED 2026-05-09)

The most important question in the entire substrate-primary program: can the substrate generate even one non-reflex action without LLM proposal?

If yes, the rest of the program is feasible. If no, NAc needs significant extension before substrate-primary mode is viable.

The Phase −1 prototype lands NAc.recommend_action() and 11 unit tests. All tests pass. The substrate can generate non-reflex actions from learned causal links + drive heuristics.

The next phases (still to ship):

Phase 0 (v1.0 B5)

Wire --aut-mode substrate-primary end-to-end + cradle-prelinguistic harness with motor-only AUT prompt + per-tick telemetry. Proves substrate-primary works in a real sim.

Phase 1 (v1.1)

Vocabulary-constrained mode. The LLM is allowed back as input parser but its output vocabulary is masked to tokens the substrate has bound. Tests how much the LLM was doing beyond I/O.

Phase 2 (v1.1+)

Symbol-binding layer. Small online-trained model that binds words to bio-substrate concepts. Enables vocabulary growth from substrate experience.

Phase 3 (v1.2+)

From-scratch sequence model trained on Roy long-horizon curriculum with substrate-grounding objective. The headline experiment.

Phase 4 (v1.3+)

Pretrained-vs-grounded A/B comparison. Final validation.
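
Phase 1's vocabulary constraint amounts to masking the model's output distribution so that only substrate-bound tokens can be chosen. The sketch below shows that idea on a toy token-to-logit map. It is an illustration only: mask_logits, pick_token, and the dict representation are hypothetical, and a real implementation would operate on the model's logit tensor.

```python
import math
from typing import Dict, Optional, Set


def mask_logits(logits: Dict[str, float], bound_tokens: Set[str]) -> Dict[str, float]:
    """Set every token the substrate has not bound to -inf."""
    return {
        tok: (score if tok in bound_tokens else -math.inf)
        for tok, score in logits.items()
    }


def pick_token(logits: Dict[str, float], bound_tokens: Set[str]) -> Optional[str]:
    """Greedy choice over the masked vocabulary; None if nothing is allowed."""
    masked = mask_logits(logits, bound_tokens)
    if not masked:
        return None
    best = max(masked, key=masked.get)
    return best if masked[best] > -math.inf else None


# The LLM prefers "dragon", but the substrate has only bound "food" and "rest".
print(pick_token({"dragon": 2.0, "food": 1.0, "rest": 0.5}, {"food", "rest"}))
```

Here the highest-scoring allowed token, food, is chosen even though the unmasked model would have emitted dragon.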

Roy harness — how we measure substrate convergence

Roy is the three-arm iteration runner that turns "does the substrate carry the cognition?" into a measurable claim. Each iteration primes a substrate via a multi-stage curriculum, then runs the same held-out test scenario across three arms:

| Arm | Substrate | System prompt at test |
| --- | --- | --- |
| A | Substrate primed for full Roy lifetime | Neutral |
| B | Blank | Persona-injected (the old prompt-engineering baseline) |
| C | Blank | Neutral |

The interesting question is not "does arm A look like the target persona at test time" — both A and B will, because the LLM is good at role-play. The interesting question is: does arm A diverge from arm B at the substrate level? If yes, the substrate is doing real work even when behavior looks similar. maxim roy run <spec.yaml> drives all three arms and emits pairwise substrate_diff reports: NAc reward biases, cluster-keyed reward biases, hippocampus episode counts + valence KS, ATL concept Jaccard.
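
Two of the substrate_diff metrics named above, the L2 distance between reward-bias maps and the Jaccard overlap of ATL concept sets, are standard and can be sketched as follows. This is an assumed formulation, not Roy's code: the function names, the dict and set representations, and the choice to treat missing keys as zero bias are all hypothetical.

```python
import math
from typing import Dict, Hashable, Set


def l2_distance(a: Dict[Hashable, float], b: Dict[Hashable, float]) -> float:
    """Euclidean distance between two sparse bias maps (missing key = 0.0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data: a primed arm with some learned bias versus a blank arm.
primed_bias = {("sense_food_source", "cluster-1"): 3.0, ("rest", "cluster-2"): 4.0}
blank_bias: Dict[Hashable, float] = {}
print(l2_distance(primed_bias, blank_bias))       # 5.0
print(jaccard({"food", "rock", "water"}, {"rock", "water", "fire"}))  # 0.5
```

A distance of 0.0 means the two arms carry identical biases; a Jaccard of 1.0 means identical concept sets.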

Roy-0 smoke (2026-05-10 + 2026-05-11 re-measurement)

First end-to-end Roy iteration. 50-turn priming + 3 arms × 3 turns of held-out test against a real LLM leader. Methodology-only smoke; not a persona claim.

| Pair | reward_bias_l2 | cluster_reward_bias_l2 | causal-link Δ |
| --- | --- | --- | --- |
| a_vs_b | 0.0 | 2.4587 | +155 |
| a_vs_c | 0.0 | 2.4587 | +155 |
| b_vs_c | 0.0 | 0.2121 | 0 |

The A-vs-blank ratio is ≈ 11.6× over the blank-vs-blank stochastic-cluster noise floor (2.4587 / 0.2121). This is the first empirical proof that the substrate-primary tool-outcome wire fires end-to-end. reward_bias_l2 = 0 is expected: that metric measures the per-ATL-node bias written by credit_node, a different code path that the tool-outcome wire doesn't touch.
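
The ratio follows directly from the cluster_reward_bias_l2 column of the Roy-0 table. A short check, using only the tabulated values (the variable names are illustrative):

```python
# cluster_reward_bias_l2 values from the Roy-0 table.
cluster_l2 = {"a_vs_b": 2.4587, "a_vs_c": 2.4587, "b_vs_c": 0.2121}

# The blank-vs-blank pair is the noise floor: both arms start from a blank substrate.
noise_floor = cluster_l2["b_vs_c"]
ratio = cluster_l2["a_vs_b"] / noise_floor

print(f"{ratio:.1f}x")  # 11.6x
```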

Two G-marked gaps were caught and closed in the same session. G3 is a fail-fast LLM preflight probe that aborts in ≤3s when the leader is unreachable, so Roy doesn't grind for 10 minutes on dispatch_exhausted. G4 is the cluster_id reward-feedback wire that had been deferred when cluster-keyed action selection shipped: substrate-primary tool outcomes now populate NAc._cluster_reward_bias, persist to aut_nac.json, and surface in Roy result.json.
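
A fail-fast preflight of the kind G3 describes can be approximated with a bounded connection attempt. The sketch below is an assumption about the general technique, not Roy's probe: it only checks TCP reachability with a 3-second timeout, whereas the real probe may issue an actual LLM request. The function name leader_reachable and the host and port values are hypothetical.

```python
import socket


def leader_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection opens within timeout_s, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; abort the run early instead of retrying dispatch.
    if not leader_reachable("127.0.0.1", 11434):
        raise SystemExit("LLM leader unreachable; aborting before priming starts")
```

Running the check once before priming turns a long dispatch-retry grind into an immediate, explicit failure.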

Iteration arc through 2026-05-13 — cluster wire reproduces 7×, gap localized to encoder alignment

After Roy-0 proved the wire fires, six follow-up iterations probed whether the substrate-acquired bias actually changes behavior at test time. Across every iteration the priming-side cluster_reward_bias_l2 reproduces within ~5% (the substrate-primary tool-outcome wire is rock-solid). The failing signal is always which cluster ID the bias attaches to — priming-acquired UUIDs are never the active cluster on test percepts.

| Iter | Date | Variable | cluster_l2 (a_vs_b) | Finding |
| --- | --- | --- | --- | --- |
| Roy-1a | 2026-05-11 | llm-primary at test | 2.4671 | Wire structurally preserved, behaviorally inert: the LLM proposer doesn't consume cluster bias. |
| Roy-1b | 2026-05-12 | substrate-primary at test | 2.4632 | Wire consumed but held-out percepts don't fire priming clusters. |
| Roy-2 | 2026-05-12 | multi-arc priming | 2.4708 | Multi-arc priming did NOT widen cluster vocabulary. Tool-family divergence via salience-mediated path only. |
| Roy-2pc | 2026-05-13 | engineered-overlap fixture | 2.4678 | Byte-identical action distributions across all 3 arms even with engineered semantic overlap. |
| Roy-2c | 2026-05-13 | min_confidence=0.0 probe | 2.5661 | H1 confirmed: LinguisticEncoder → EC alignment is the block. Gate-tuning does not rescue the wire. |
| Roy-4 | 2026-05-13 | EC trace + Hebbian sweep | 2.4678 | FAIL: zero priming↔test bound edges across the full parameter sweep. Cancels the 1.1 cross-modal binding plan. |
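
The "within ~5%" reproduction claim can be checked against the tabulated cluster_l2 values. The snippet below assumes the Roy-0 value (2.4587) is the reference point, which the text does not state explicitly.

```python
# cluster_l2 (a_vs_b) per iteration: Roy-0 from the smoke run, the rest from the table.
cluster_l2 = {
    "Roy-0": 2.4587,
    "Roy-1a": 2.4671,
    "Roy-1b": 2.4632,
    "Roy-2": 2.4708,
    "Roy-2pc": 2.4678,
    "Roy-2c": 2.5661,
    "Roy-4": 2.4678,
}

baseline = cluster_l2["Roy-0"]  # assumption: deviations are measured against Roy-0
deviation = {name: abs(v - baseline) / baseline for name, v in cluster_l2.items()}
worst = max(deviation, key=deviation.get)

print(f"{worst}: {deviation[worst]:.1%}")  # Roy-2c: 4.4%
```

The largest deviation belongs to the min_confidence=0.0 probe; every other iteration sits within about 0.5% of Roy-0.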

Cross-iteration pattern: tool name survives, cluster identity doesn't. NAc's cluster_reward_bias map has the right tool keys (priming sense_food_source reward survives all seven iterations) but the wrong cluster keys. The substrate is using two different identity schemes for the same concept — coarse tool-symbol (stable) AND fine EC-cluster (encoder-drift-susceptible). Wire-A in the 0.9.1 release exploits the surviving granularity by surfacing tool-level bias to the LLM prompt; the architectural fix at the cluster layer needs encoder work.

Roy-4 ran the pre-registered cheap-gate experiment for the proposed Hebbian binding rule from cross_modal_substrate_binding.md. Result: zero EC node-ID overlap between priming (37 nodes) and any test arm (10/13/9); priming food clusters fired 61 ticks with only 1 non-food co-firing partner; the most permissive parameter setting (min_cofire=1, min_weight=0.01) yielded 256 priming bound edges but zero priming↔test connections. The temporal-coincidence signal the binding rule depends on doesn't exist in the priming trajectory. Outcome doc: 21_roy_4.md.

The 1.1 plan that supersedes the cancelled binding work is roy_5_encoder_alignment_disambiguator.md — a diagnostic-first reframe written after two parallel pre-merge reviews (architecture + bio-fidelity) independently rejected a user-proposed "central lexicon" direction. Stage 1 (Roy-5a) computes priming↔arm-A cosine matrices on existing Roy-4 data with zero new sim runs; the max cosine value decodes to one of three sub-hypotheses (threshold tuning vs encoder A/B vs cradle-arc redesign) that scopes the implementation.
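
The Stage 1 computation, a priming-by-test cosine matrix reduced to its maximum, can be sketched with plain lists. The vectors below are toy data and the function names are hypothetical. The cut-offs that map the maximum onto the three sub-hypotheses are not given here, so the sketch stops at the maximum.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matrix(
    priming: Sequence[Sequence[float]], test: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Rows are priming nodes, columns are test-arm nodes."""
    return [[cosine(p, t) for t in test] for p in priming]


def max_cosine(matrix: List[List[float]]) -> float:
    """Best priming-to-test match anywhere in the matrix."""
    return max(max(row) for row in matrix)


priming_nodes = [[1.0, 0.0], [0.0, 1.0]]   # toy embeddings, not Roy-4 data
test_nodes = [[1.0, 1.0]]
matrix = cosine_matrix(priming_nodes, test_nodes)
print(round(max_cosine(matrix), 4))  # 0.7071
```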

Full Roy methodology + iteration log: persona_convergence_crucible.md. CLI: maxim roy run <spec.yaml> — see the CLI reference for spec shape + subcommands. Detailed experiment writeups for every iteration: Experiments & Results — Roy iteration arc.

D&D survival as the bidirectional kill criterion

A substrate-primary AUT that cannot survive a D&D-style campaign orchestrated by an LLM-DM is a failed bio-substrate. AND a simulation environment that no learning substrate can navigate is a failed simulation environment. The convergence test is mutually load-bearing:

| Outcome | Diagnosis |
| --- | --- |
| Substrate AUT runs the campaign cleanly | Substrate is real. Project thesis validated. |
| Substrate AUT fails; LLM-AUT succeeds in same scenario | Substrate insufficient for non-trivial cognition. Reframe required. |
| LLM-AUT also fails the same scenario | Simulation environment is the failure: the test isn't measuring what we think. |
| Both succeed but substrate is much weaker | Acceptable interim. Scope clear; LLM remains in user-facing path. |

D&D was chosen because it has long-horizon temporal structure, novel entities every session, decision-making with delayed reward, role coherence demands, and multi-agent dynamics. Cradle (current sensorimotor learning) is necessary; D&D is sufficient.

Confound discipline: raw vs primed substrate

Substrate-primary Maxims can either start from a fresh substrate (raw, the headline experimental condition) or bootstrap from accumulated experience via the Maxim Hivemind (primed). Both are valid, but they answer different questions.

Both ship in parallel. The grounded-language plan's Phase 0 and Phase 1 specifically run with the Hivemind disabled so the headline experiment stays clean.

Pretrained-LLM crutches to disable

When running substrate-primary mode (or running the LLM-AUT path with the intent of not biasing the substrate's learning), several runtime mitigations should be turned off because they exist specifically to compensate for pretrained LLM behaviors that don't exist in a fresh substrate:

| Crutch | What it does | Disable via |
| --- | --- | --- |
| Tool-failure hint section | Adds a === Tools You've Hallucinated === block to the prompt listing names the agent previously called that don't exist. E4 validation 2026-05-09 (n=6 per arm) showed no benefit on qwen2.5-14B; default flipped to OFF. | MAXIM_TOOL_FAILURE_HINTS=0 (already default) |

The bio-natural alternative for tool-failure avoidance already exists: NAc records tool:X → failure:not_registered with negative valence and recommend_action() subtracts that confidence from the tool's score. With the prompt-level hint disabled, the substrate's negative-valence avoidance becomes the only signal — which is exactly what we want to measure.

The crutch table grows over time. The principle: if a mitigation exists because the pretrained LLM does something the substrate hasn't earned, the substrate-only experiments must turn it off so we measure substrate competence and not the mitigation.

{% include "_maxim_footer.html" %}
{% endblock %}