PARACOSM

Run any what-if. See how it actually plays out.

Type a scenario in plain English. Paracosm compiles it into a typed world, runs LLM-driven decision-makers with measurable personalities turn by turn, and gives back a trajectory you can replay, fork, and compare. Open source, deterministic, built for AI agents that need to test decisions before making them.

OPEN SOURCE · APACHE-2.0 · TYPESCRIPT · BUILT ON AGENTOS

Same world, three actors: see how each one's traits drive an identical starting state toward different futures. Seven entry points, one engine: brief, JSON contract, granular client, fork-and-replay counterfactuals, pluggable trait models, HTTP, CLI. Every path returns the same Zod-validated RunArtifact.

What's a world model?

A world model is software that represents how some slice of reality changes over time. AI agents need them to test decisions before making them: rehearse the move, watch what unfolds, decide whether to commit.

The phrase has been overloaded since 2024. Older lineages do most of the work: Dreamer and MuZero learn dynamics in latent space; MuJoCo, Brax, and Isaac Sim simulate physics deterministically. Both have been the substrate for robotics research for years. Newer entrants (Sora, Genie 3, Marble) generate pixels — a "world simulator" framing the older lineage rejects (LeCun: pixel generation here is "not only wasteful, it's doomed to failure"). Paracosm sits in the structured / LLM lineage: typed JSON state, deterministic kernel, LLM reasoning lane on top. The output is a RunArtifact: each turn's metrics, decisions, specialist analyses, citations, forged tools. Pixels are what humans watch. State is what agents reason inside.

Visual world model
  • Output: pixels (video / 3D scene)
  • Audience: humans watching
  • Replay: rerun the model, get a new sample
  • For agents: hard. Hard to act inside a 4-second clip.
Structured world model · paracosm
  • Output: typed JSON state, decisions, citations
  • Audience: AI agents deciding and measuring
  • Replay: deterministic kernel, byte-equal output
  • For agents: direct. Each turn is a state transition the agent can reason over.

Surveyed in the ACM CSUR 2025 world-model survey and the robotics-focused World Model for Robot Learning survey, which is explicit that the resurgence comes from model-based RL plus physics simulation, not pixel generation. Critique: Xing 2025. LLM-world-model implementation anchor: Yang et al, 2026 — evaluates LLM-based world models through policy verification, action proposal, and policy planning.

Install (npm, pnpm, or bun)
# ESM + subpath exports. Node 20+, Bun 1.x, or any TS runner with import-attributes.
npm install paracosm
pnpm add paracosm
bun add paracosm

# Or globally for the CLI binaries (`paracosm` + `paracosm-dashboard`):
npm install -g paracosm
Seven ways to use Paracosm
From a 3-line brief to fork-and-replay counterfactuals to a custom trait model.
Three lines: brief to running simulation
// Free-text brief in; 3 HEXACO actors and 3 RunArtifacts out, in one call.
import { runMany } from 'paracosm';

const { runs } = await runMany(
  'Q3 board brief: lab is preparing to release Atlas-7...',
  { count: 3 },
);
runs.forEach(({ actor, artifact }) => console.log(actor.name, artifact.fingerprint));
End-to-end runner (real input/output captured)
// Save as index.ts next to my-world.json, then:
//   npm install paracosm            (or: pnpm add paracosm / bun add paracosm)
//   export OPENAI_API_KEY=sk-...    # or ANTHROPIC_API_KEY=sk-ant-...
//   npx tsx index.ts                (or: bun index.ts)
// Paracosm auto-detects whichever key is present and falls back between providers.
import { compileScenario, WorldModel } from 'paracosm';
import worldJson from './my-world.json' with { type: 'json' };

// 1. Compile JSON into a runnable ScenarioPackage (~$0.10, cached to disk).
//    Defaults to OpenAI (gpt-5.4-mini). Pass provider: 'anthropic' to switch.
const scenario = await compileScenario(worldJson);
const wm = WorldModel.fromScenario(scenario);

// 2. Define actors with HEXACO personality profiles.
//    Each actor is one simulation run. Run one, two, or many:
//    the dashboard runs two side by side for comparison, but the API has no limit.
const actors = [
  {
    name: 'Captain Reyes',
    archetype: 'The Pragmatist',
    unit: 'Station Alpha',
    hexaco: { openness: 0.4, conscientiousness: 0.9, extraversion: 0.3, agreeableness: 0.6, emotionality: 0.5, honestyHumility: 0.8 },
    instructions: 'You lead by protocol. Safety margins first.',
  },
  {
    name: 'Captain Okafor',
    archetype: 'The Innovator',
    unit: 'Station Beta',
    hexaco: { openness: 0.9, conscientiousness: 0.4, extraversion: 0.8, agreeableness: 0.5, emotionality: 0.3, honestyHumility: 0.6 },
    instructions: 'You lead by experimentation. Push boundaries.',
  },
];

// 3. Run in parallel: same seed, divergent events, divergent futures.
const results = await Promise.all(
  actors.map((actor) =>
    wm.simulate({
      actor,
      maxTurns: 6,
      seed: 42,
      // costPreset: 'economy', // uncomment for ~5-10× cheaper iteration
      // Every event carries a one-line `summary` field, so this prints cleanly
      // for all 17 event types. Narrow on `e.type` for exact payloads.
      onEvent(e) {
        console.log(actor.name, e.type, e.data.summary);
      },
    }),
  ),
);

// 4. Compare the two timelines. Each result is a Zod-validated
//    RunArtifact exported from `paracosm/schema`.
//    Label each run by actor: both runs share the same scenario name,
//    and Promise.all keeps results in the same order as `actors`.
results.forEach((r, i) => {
  console.log(
    actors[i].name, '→', r.fingerprint,
    'cost $' + r.cost?.totalUSD.toFixed(2),
    '(' + r.cost?.llmCalls + ' LLM calls)',
    'tools', r.forgedTools?.length ?? 0,
    'citations', r.citations?.length ?? 0,
  );
  if (r.providerError) console.error('provider error:', r.providerError.message);
});
Or: quick start with the dashboard
git clone https://github.com/framersai/paracosm
cd paracosm && npm install
cp .env.example .env    # add your OpenAI or Anthropic key
npm run dashboard       # opens http://localhost:3456
# configure leaders, launch from the UI

What paracosm actually is

Paracosm is a structured, LLM-driven, counterfactual world simulator. It compiles a prompt, document, or URL into a typed scenario, runs it through a deterministic seeded kernel coupled with an LLM reasoning surface, and emits one universal Zod-validated RunArtifact across turn-loop civ sims, batch-trajectory digital twins, and batch-point forecasts. The category is named in Xing 2025, catalogued in the ACM CSUR 2025 survey, and given counterfactual semantics by Kirfel et al, 2025.

From a paragraph to a forkable, replayable, multi-agent world
Structured state, deterministic spine
A JSON-contract-backed world schema with metrics, capacities, statuses, politics, and environment. The kernel is a seeded Mulberry32 state machine — given the same seed, scenario, and actor profiles, every run produces an identical agent roster, lifecycle schedule, and resource trajectory. Divergence comes from personality, not luck. wm.replay(artifact) is byte-equal-deterministic.
Zod schema · Mulberry32 PRNG · ScenarioPackage
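Mulberry32 is small enough to show in full. The sketch below is the commonly published Mulberry32 algorithm written in TypeScript, not paracosm's SeededRng source, and the `mulberry32` name is illustrative. It shows why a fixed seed pins every draw: the generator's entire state is one 32-bit integer, advanced the same way on every run.

```typescript
// Mulberry32: a 32-bit seeded PRNG. Same seed => same sequence, on every run.
// Illustrative sketch; not paracosm's SeededRng source.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0; // the whole generator state is one unsigned 32-bit integer

  return function next(): number {
    state = (state + 0x6d2b79f5) >>> 0; // advance by a fixed odd increment (a Weyl sequence)
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1); // mix bits with 32-bit multiplies
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // map to a float in [0, 1)
  };
}

// Two generators with the same seed produce identical draws.
const a = mulberry32(42);
const b = mulberry32(42);
const drawsA = Array.from({ length: 5 }, a);
const drawsB = Array.from({ length: 5 }, b);
console.log(drawsA.join(',') === drawsB.join(',')); // prints true
```

Because nothing outside `state` influences the output, replaying a run only requires recording the seed, which is what makes the kernel's rosters and schedules reproducible.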
Grounded compile
Prompts, articles, and URLs become typed ScenarioPackages through a compile pass. A grounding step then runs entity-derived web search across Serper and Tavily in parallel, dedupes citations by URL, and attaches them to the scenario before any actor is generated. Every commander decision and specialist note in the run can cite a real source.
Serper · Tavily · compile-from-seed
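Deduplicating citations by URL mostly comes down to normalizing URLs before comparing them. The sketch below is an assumption about how such a step could work, not paracosm's grounding code: the `Citation` shape and the normalization rules (drop fragments, drop `utm_*` parameters, drop a trailing slash) are illustrative choices. It relies only on the standard WHATWG `URL` API, whose parser also lowercases the host.

```typescript
// Hypothetical citation shape; paracosm's real citation type may differ.
interface Citation {
  url: string;
  title: string;
  provider: string; // e.g. 'serper' or 'tavily'
}

// Normalize a URL so trivially different spellings of one page compare equal.
function normalizeUrl(raw: string): string {
  const url = new URL(raw); // the URL parser lowercases the scheme and host
  url.hash = ''; // fragments point inside the same page
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith('utm_')) url.searchParams.delete(key); // tracking noise
  }
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

// Merge result lists from parallel providers, keeping the first hit per URL.
function dedupeByUrl(...lists: Citation[][]): Citation[] {
  const byUrl = new Map<string, Citation>();
  for (const list of lists) {
    for (const citation of list) {
      const key = normalizeUrl(citation.url);
      if (!byUrl.has(key)) byUrl.set(key, { ...citation, url: key });
    }
  }
  return [...byUrl.values()];
}
```

Keeping the first hit per normalized URL means the provider listed first wins ties; a real pipeline might instead merge titles or keep the higher-ranked hit.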
Runtime tool forging (AgentOS)
When a department needs a decision tool the codebase doesn't have, an LLM writes the function mid-run inside a hardened node:vm sandbox with wall-clock and heap caps. A judge LLM reviews safety and correctness; approved tools land in a registry that subsequent turns and other actors pull from. Real examples from prod runs: shelter_priority_score, supply_shortage_severity_score, hurricane_after_action_assessment.
node:vm sandbox · EmergentJudge · tool registry
Personality-driven counterfactuals
Pluggable trait models steer every leader decision: HEXACO for humans (Ashton & Lee, 2007), ai-agent for AI systems, and custom registered models for any other psychometric framework. Same scenario, same seed, different personality: different forged tools, different decisions, different population trajectories. wm.forkFromArtifact(trunk, atTurn).simulate(altLeader) branches a counterfactual from any captured turn.
HEXACO · ai-agent · trait-model registry
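The fork-from-snapshot pattern behind wm.forkFromArtifact(trunk, atTurn) can be illustrated with a toy world. Everything here (the `Snapshot` fields, the `step` rule, the single `riskTolerance` trait, the `forkAt` name) is hypothetical and far simpler than a real ScenarioPackage. The point is only the shape of the operation: keep the trunk's snapshots through the fork turn, then resume the deterministic transition from that snapshot under a different leader.

```typescript
// Toy world: hypothetical fields, not a paracosm scenario.
interface Snapshot {
  turn: number;
  food: number;
  morale: number;
}

interface Leader {
  name: string;
  riskTolerance: number; // 0 = cautious, 1 = bold
}

// Deterministic transition: same snapshot + same leader => same next snapshot.
function step(s: Snapshot, leader: Leader): Snapshot {
  return {
    turn: s.turn + 1,
    food: s.food - 5 * (1 + leader.riskTolerance), // bolder leaders spend reserves faster
    morale: s.morale + 0.1 * (leader.riskTolerance - 0.5), // and move morale more
  };
}

// Run from a start snapshot; the result includes the start at index 0.
function simulate(start: Snapshot, leader: Leader, turns: number): Snapshot[] {
  const out = [start];
  for (let i = 0; i < turns; i++) out.push(step(out[out.length - 1], leader));
  return out;
}

// Branch: shared history through `atTurn`, then the alternate leader takes over.
function forkAt(trunk: Snapshot[], atTurn: number, alt: Leader, turns: number): Snapshot[] {
  const shared = trunk.slice(0, atTurn + 1);
  const branch = simulate(trunk[atTurn], alt, turns).slice(1); // drop the duplicated fork snapshot
  return [...shared, ...branch];
}

const cautious: Leader = { name: 'Reyes', riskTolerance: 0.2 };
const bold: Leader = { name: 'Okafor', riskTolerance: 0.8 };
const trunk = simulate({ turn: 0, food: 100, morale: 0.5 }, cautious, 6);
const branch = forkAt(trunk, 3, bold, 3); // same history to turn 3, different future after
```

Because `step` is deterministic, re-running `forkAt` with the same arguments reproduces the same branch, which is what makes a counterfactual comparable against its trunk.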

Inspired by the long arc of world-model research: the dream-trained agents of Ha & Schmidhuber's World Models (2018), the deterministic-microworld tradition behind Sugarscape and Schelling, the agent-based modeling lineage from Conway and Wolfram, and the structured-prompting trajectory traced by Kirfel et al's counterfactual world simulation models. Where prior work used hand-coded rules, learned latents, or unstructured emergence, paracosm uses an LLM as the world's reasoning surface and a typed kernel as its mutable spine. The result is a deterministic, replayable, citation-grounded simulator that fits both research and product use.

Full taxonomy mapping with adjacent-category citations lives at docs/positioning/world-model-mapping.md.

Seven ScenarioPackages ship with the engine

A ScenarioPackage is a JSON file. It declares departments, metrics, crisis templates, progression rules, and milestone events; the engine reads the package, populates the world, and runs the simulation. No engine code changes required between scenarios — every scenario below loads dynamically from JSON. Compile your own from a brief, URL, or seed prompt with compileScenario().

Mars Genesis · 100 colonists · Flagship
100-person Mars colony across 50 simulated years. Five departments (Medical, Engineering, Agriculture, Psychology, Governance) analyze each event. Agents track bone density, radiation exposure, food reserves, morale, and political faction alignment. The Event Director uses all of this to generate events that target the colony's weakest points.
100 colonists · 5 depts · 6 turns · 50 simulated years · landing 2035
Lunar Outpost · 50 crew · Proven
50-person crew at the lunar south pole. Five departments handle medical, engineering, mining, life-support, and communications. The scenario models regolith toxicity, 1/6g muscle and bone atrophy, Earth-relay communication delays, and crew rotation logistics. Proves the engine works across scenario boundaries with zero engine code changes.
50 crew · 5 depts · 8 turns · 10 simulated years · arrival 2030
Atlas Lab · 480 researchers · Frontier AI
A frontier AI research lab racing competitor labs to ship a model that just crossed capability thresholds. Five departments (Alignment Research, Capability Research, Governance, Deployment Engineering, Communications) weigh release-readiness against deception index, alignment score, deployment risk, and competitor capability gap. The Visionary commander pushes to ship fast; the Engineer commander holds for safety review: same world, divergent decisions.
480 researchers · 5 depts · 6 turns · alignment vs capability arc
Dual Superintelligence Council · 30 council members · AI governance
Year 2035. A deliberative council of 30 members weighs civilization-scale deployment decisions for two recently-stood-up superintelligence systems. Five departments (Alignment Council, Capability Council, Governance, Safety Engineering, Public Communications) test the council's ability to coordinate under deception risk, public scrutiny, and competing incentive structures. Pressure-tests how a single commander's HEXACO profile shifts collective decision-making.
30 council members · 5 depts · 6 turns · civilization-scale deployment
Q-Scope Corp · 40 employees · Corporate strategy
A 40-person developer observability SaaS facing quarterly board pressure. Five departments (Sales, Engineering, Finance, HR, Operations) field events targeting churn risk, runway, hiring velocity, and product reliability. This is the canonical corporate-strategy scenario: use it for M&A simulation, market-entry plays, and leadership replays. Twelve quarterly turns of decisions, all starting from the same cap table.
40 employees · 5 depts · 12 turns · quarterly cadence
Deep Ocean Station · 50 crew · Closed habitat
50-person crew aboard a deep-ocean research station, 2038 onwards. Five departments (Medical, Engineering, Marine Science, Life Support, Communications) field hull-pressure events, atmospheric balance, surface-link blackouts, and crew morale under prolonged isolation. A pressure-vessel analog to Lunar Outpost — same closed-system shape, different physics.
50 crew · 5 depts · 8 turns · 2038 launch · sealed habitat
T2D + GLP-1 Care · 1 patient · Digital-twin medicine
Single-patient digital twin tracking a Type-2 diabetes patient on a GLP-1 protocol across six clinical timepoints. Five care-team departments (Endocrinology, Nutrition, Behavioral Psych, Cardiology, Lifestyle Coach) read the same chart and propose interventions. The scenario uses paracosm/digital-twin: subject + intervention configs feed the same kernel, and the artifact carries the per-timepoint chart so a different physician HEXACO profile produces a measurably different trajectory.
1 patient · 5 depts · 6 timepoints · GLP-1 protocol arc

Where decision teams use Paracosm

Any domain where testing decisions before making them reduces risk: digital-twin medicine, corporate strategy, policy simulation, wargaming, game design, alignment red-teaming. The engine is domain-agnostic; the scenario contract names the world.

Digital-twin medicine

Model a patient or clinical workflow as a SubjectConfig; replay an intervention with a different physician HEXACO profile and see the divergence on outcome metrics. The paracosm/digital-twin subpath is the canonical entry.

Corporate strategy

Run M&A scenarios, market-entry plays, or workforce-reduction contingencies against multiple leadership profiles. Different CEO personalities, same starting conditions, measurable trajectory differences.

Policy simulation

Model a policy rollout (tax change, healthcare reform, regulation) as the intervention; agencies and citizens as the population. Replay any past run to check kernel determinism after a code change.

Wargaming and scenario planning

Defense and intelligence: explore how different commander profiles produce divergent crises and divergent responses from identical starting conditions. The deterministic kernel anchors replay; the LLM Event Director reads each leader's HEXACO + accumulated state, so the events themselves diverge from turn 1.

Game design playtesting

Generate procedural NPC civilizations and emergent narratives. Forge tools at runtime so departments can build their own balance calculators inside a hardened sandbox.

Alignment red-teaming

Pressure-test agent behavior under leader profiles that emphasize specific HEXACO traits. Reproducible by construction; replay to verify the same prompt and seed yields the same trajectory.

Coming soon: enterprise SaaS dashboard

The open-source engine ships today on npm. The hosted product layer is in development at Frame.dev for organizations that need decision intelligence at scale: defense, government, frontier AI labs, corporate strategy, R&D, alignment teams. No pricing tiers yet. The list below is what's on the roadmap.

Full analytics suite

Per-leader cohort analysis, A/B benchmarks across HEXACO archetypes, scenario regression dashboards, decision-quality scoring, fleet-wide trajectory diff. Every artifact streams into the analytics warehouse with the universal RunArtifact schema.

analytics warehouse: per-run, per-leader, per-decision, per-forge

Fleet orchestration

Run 10, 50, or 500+ leaders through the same scenario in parallel across distributed worker nodes. Aggregate comparison views. Live progress streaming. Automated batch sweeps with shared seeds and shared compiled scenarios. Cancel-on-divergence policies for early stopping.

distributed worker pool, queue-backed dispatch, autoscaling

Persistent memory + replay

Multi-session agent memory with consolidation, eviction, and stance drift across runs. Reload any past run. Replay any artifact for byte-equal kernel verification. Compare runs across paracosm versions to localize behavior changes between releases.

durable agent memory, kernel replay, version-diff regression

Team workspaces

Shared scenario libraries, leader rosters, and run history with role-based access. Approval gates on production simulations. Audit trails on every run. Scenario authoring studio with a visual form-based editor on top of the canonical JSON contract.

SSO, RBAC, audit log, scenario marketplace, version control

Business intelligence

Leadership-decision modeling for organizations: model M&A scenarios, market-entry plays, workforce planning, governance experiments, alignment red-team sweeps, defense wargames. Integrate with internal data warehouses and identity providers for real-context scenario seeding.

SQL connector, S3/Snowflake/BigQuery seed ingestion, SSO

Private deployment

Dedicated cluster or VPC deployment for organizations that need data sovereignty, on-prem inference, air-gapped runs, or regulated compliance posture. Custom scenario authoring services. Integration with internal LLM gateways. SLA, dedicated support, custom rate limits.

VPC / on-prem / air-gapped, custom inference gateway, SLA

Built by Frame.dev

Paracosm is built by Frame.dev, the engineering arm of Manic Agency LLC. Sister projects: AgentOS (the agent runtime paracosm runs on) and Wilds.ai (AI-NPC game worlds).

Enterprise & partnerships

Private deployment, custom scenarios, government and defense engagements, investment inquiries.

team@frame.dev

Community & open source

Bug reports, feature requests, scenario contributions, general discussion.

GitHub · Discord
Compare paracosm vs AutoGen / CrewAI / LangGraph: feature by feature

All four are TypeScript/Python agent frameworks shipping on npm or PyPI. They overlap on "multi-agent orchestration" but diverge sharply on what they treat as the core abstraction: multi-agent chat (AutoGen), a role-based crew (CrewAI), a state graph (LangGraph), or a deterministic structured world model (paracosm).

| Capability | paracosm | AutoGen | CrewAI | LangGraph |
|---|---|---|---|---|
| Core abstraction | Structured world model with seeded kernel + per-turn state | Multi-agent chat orchestration | Role-based agent crew with sequential or hierarchical tasks | State graph of agent nodes with explicit edges |
| Determinism | Byte-equal at fixed seed via Mulberry32 + canonical JSON | None: LLM non-determinism throughout | None: LLM non-determinism throughout | Partial: graph control flow is deterministic, agent calls are not |
| Counterfactual replay | wm.replay(artifact) reproduces byte-equal artifacts | Not supported | Not supported | Not supported (rerun the graph from scratch) |
| Fork at past state | wm.forkFromArtifact(artifact, atTurn) | Not supported | Not supported | Time-travel via persistent checkpoints (langgraph-checkpoint) |
| Personality model | HEXACO (humans) + ai-agent + pluggable trait registry | Per-agent system prompt only | Per-agent role + goal + backstory text | Per-agent system prompt only |
| Runtime tool forging | Yes: sandboxed node:vm tool generation, judge-validated | Function-calling only: predefined tools | Function-calling only: predefined tools | Function-calling only: predefined tools |
| Universal output contract | Zod-validated RunArtifact across all simulation modes | Per-agent message log, no unified schema | Task outputs in agent-specific shape | Final state object (graph-defined) |
| Digital-twin pattern | First-class wm.intervene({ subject, intervention, actor }) | Not supported (no subject/intervention primitive) | Not supported | Not supported (build it yourself with state nodes) |
| Compile from prose | WorldModel.fromPrompt: brief or URL → typed scenario | Not supported (hand-author agents + graph) | Not supported (hand-author crew + tasks) | Not supported (hand-author state graph) |
| License | Apache-2.0 | MIT | MIT | MIT |
| Language | TypeScript (Node 20+, Bun, Deno) | Python · .NET | Python | Python · TypeScript |
WHITEPAPER · COMING SOON

The Paracosm Technical Whitepaper

The full architecture and methodology, written for engineers and researchers who want a citable PDF instead of scrolling docs. Deterministic kernel design, HEXACO leader differentiation, LLM-as-a-judge event grounding, runtime tool forging with sandboxed safety, reproducible counterfactual surfaces (fork / replay / intervene), and the `RunArtifact` schema with Zod validation for byte-equal reproducibility at fixed seeds.

Architecture
ScenarioPackage compilation, deterministic kernel, HEXACO leader engine, LLM event director, judge layer, snapshot capture for fork & replay.
Methodology
Counterfactual surfaces (fork at turn N, replay byte-equal, intervene as digital twin), divergence metrics, KPI rollups, judge verdict semantics.
Reproducibility
Single-CLI run, seed-pinned kernel, Zod-validated artifacts, sandboxed tool forging with explicit safety bounds, end-to-end COOKBOOK with real input + output JSON.
Early access · shipping Q3 2026

Request early access

Join the waitlist for the hosted dashboard, fleet orchestration, and team workspaces. We'll reach out as features ship.

No spam. One confirmation email from team@frame.dev. Unsubscribe anytime.

Frequently asked questions

Everything you'd want to know about the engine, the kernel, the trait models, and the research it builds on.

What is Paracosm?
An open-source TypeScript structured world-model engine for LLM agent swarms (npm install paracosm, Apache-2.0). You define or ground a world as a ScenarioPackage: departments, metrics, crisis templates, progression hooks. The engine populates the world, generates events from live state, and resolves outcomes through a seeded deterministic kernel. LLM-driven stages diverge because every prompt carries the leader's HEXACO profile. Seven scenarios ship in the box (Mars Genesis, Lunar Outpost, Atlas Lab, Dual Superintelligence Council, Q-Scope Corp, Deep Ocean Station, T2D + GLP-1 Care); you can compile new ones from a JSON draft, prompt, brief, or URL.
What are the two kinds of "world model" and which one is Paracosm?
The phrase has been overloaded since 2024. Older lineages do most of the work: Dreamer (Hafner et al, V1 2019 → V3 2023) and MuZero (Schrittwieser et al, 2020) learn dynamics in latent space. MuJoCo, Brax, and Isaac Sim simulate physics deterministically. Both have been the substrate for robotics research for years. Yann LeCun's JEPA family (I-JEPA, V-JEPA) is explicitly anti-pixel: the architectural argument is that pixel reconstruction is wasteful and prediction should happen at the level of meaning. Newer pixel-generation entrants (Sora, Genie 3, World Labs Marble) get the headlines with a "world simulator" framing the older lineage rejects (LeCun on X: pixel generation here is "not only wasteful, it's doomed to failure"). The other kind is structured / LLM-based: typed scenario contracts plus deterministic kernels emit JSON state, decisions, and citations that agents can reason inside. Paracosm sits in the structured / LLM lineage. The split is surveyed in the Tsinghua FIB Lab world-model survey (ACM CSUR 2025) and the World Model for Robot Learning survey; critique in Xing 2025; counterfactual semantics in Kirfel et al, 2025.
Tsinghua FIB Lab. Understanding World or Predicting Future? A Comprehensive Survey of World Models (ACM CSUR 2025) · World Model for Robot Learning: A Comprehensive Survey (arXiv 2605.00080) · Xing E. P. Critiques of World Models (arXiv 2507.05169, 2025) · Is Sora a World Simulator? (arXiv 2405.03520) · Kirfel L. et al. Counterfactual World Simulation Models (AI & Ethics 2025) · Ha & Schmidhuber. World Models (arXiv 1803.10122, 2018)
How does Paracosm differ from MiroFish, OASIS, and other agent-based modeling stacks?
MiroFish and OASIS (CAMEL-AI) run open-ended social simulations with thousands to one million LLM-driven agents on social-media-shaped substrates and emit aggregate prediction reports. Paracosm runs a top-down agent swarm: one HEXACO-typed leader, ~5 specialist departments, ~100 personality-typed cells, deterministic per seed. Different scale (~100 vs 1k-1M), different direction (top-down vs bottom-up emergent), different output (universal forkable RunArtifact vs aggregate forecast). Both are valid agent-swarm shapes; Paracosm's lane is replayable decision support and counterfactual analysis. Classical ABM (Mesa, NetLogo, MASON, ABIDES, AnyLogic) is rule-based and largely non-LLM; Paracosm threads LLMs into events and reasoning while keeping a deterministic kernel for state.
CAMEL-AI / OASIS · agiresearch/MiroFish · classical ABM stacks (Mesa, NetLogo) · the bridge survey is the Nature HSSC LLM-empowered ABM survey (2024) and MIT Media Lab's On the Limits of Agency in Agent-Based Models (arXiv 2409.10568, 2024).
How does a single turn run end-to-end?
A turn takes 30-90 seconds and runs four phases in sequence. (1) Event Director reads world state, prior decisions, and tool intelligence; generates events that target weak points. (2) Department Analysis — five specialist agents (medical, engineering, agriculture, psychology, governance for Mars; configurable per scenario) analyze the events in parallel, can recall research, and may forge new computational tools. (3) Commander Decision — the leader weighs department input through their HEXACO profile and emits an action category, parameter set, rationale, and confidence score. (4) Kernel Resolution — the deterministic state machine applies the decision's effects: resources move, statuses change, agents are promoted or lost, HEXACO traits drift through leader-pull, role-activation, and outcome-reinforcement. The full per-turn record is appended to the RunArtifact.
Inspired by interactive-narrative directors (Riedl & Bulitko, Interactive Narrative: An Intelligent Systems Approach, AI Magazine 34(1), 2013).
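The four phases described above can be sketched as one async function. The interfaces and the `runTurn` name are hypothetical and do not mirror paracosm's Orchestrator; the LLM-backed phases are injected as callbacks so the sketch runs with stubs. What it shows is the control flow: the Director and the commander run in sequence, department analyses run in parallel, and kernel resolution is a plain synchronous function with no LLM call.

```typescript
// Hypothetical shapes, simplified from the turn description above.
interface WorldState {
  turn: number;
  metrics: Record<string, number>;
}
interface SimEvent {
  id: string;
  target: string; // the weak point the Director is pressing on
}
interface Analysis {
  department: string;
  note: string;
}
interface Decision {
  category: string;
  params: Record<string, number>;
  rationale: string;
  confidence: number;
}
interface TurnRecord {
  turn: number;
  events: SimEvent[];
  analyses: Analysis[];
  decision: Decision;
  after: WorldState;
}

interface TurnPhases {
  direct(state: WorldState, history: TurnRecord[]): Promise<SimEvent[]>; // (1) Event Director (LLM)
  analyze(dept: string, events: SimEvent[], state: WorldState): Promise<Analysis>; // (2) one department (LLM)
  decide(events: SimEvent[], analyses: Analysis[], state: WorldState): Promise<Decision>; // (3) commander (LLM)
  resolve(state: WorldState, decision: Decision): WorldState; // (4) deterministic kernel, no LLM
}

async function runTurn(
  state: WorldState,
  departments: string[],
  phases: TurnPhases,
  history: TurnRecord[],
): Promise<TurnRecord> {
  const events = await phases.direct(state, history);
  // Departments analyze the same events in parallel.
  const analyses = await Promise.all(departments.map((d) => phases.analyze(d, events, state)));
  const decision = await phases.decide(events, analyses, state);
  const after = phases.resolve(state, decision);
  return { turn: state.turn, events, analyses, decision, after }; // caller appends this to the run record
}
```

Keeping `resolve` synchronous and free of model calls is what lets a replay re-execute state transitions without touching any provider.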
Is the simulation deterministic? How is replay verified?
The kernel is fully deterministic, seeded by Mulberry32 (32-bit PRNG). Given the same seed, scenario, and actor profile, the kernel produces identical agent rosters, lifecycle schedules, promotion sequences, and resource starting values on every run. Divergence between parallel worlds comes entirely from the LLM stages: the commander's decisions and the events the Director generates from each leader's profile and accumulated state. wm.replay(artifact) re-executes the deterministic between-turn progression hook from each recorded snapshot, captures fresh snapshots, and compares them under canonical JSON: matches=true proves byte-equal determinism for that artifact's transitions; matches=false names the first divergence path so changes can be localized. wm.forkFromArtifact(trunk, atTurn).simulate(altLeader) branches a counterfactual from any captured turn, which is the counterfactual world simulation model (CWSM) pattern.
PRNG methodology: Law & Kelton, Simulation Modeling and Analysis (5th ed., McGraw-Hill, 2015) · CWSM framing: Kirfel L. et al. (2025) · API surface: WorldModel.replay(), canonicalJson, firstDivergence in paracosm/runtime/world-model.
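The replay check compares snapshots under canonical JSON and reports the first divergent path. paracosm exports its own canonicalJson and firstDivergence; the functions below are independent illustrations with the same intent, not their source, and the `$.metrics.food` path format is an assumption. Canonical form here means object keys sorted at every depth, so two snapshots that differ only in key order serialize identically.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const isObject = (v: Json | undefined): v is { [key: string]: Json } =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Serialize with object keys sorted at every depth, so key order never matters.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return '[' + value.map(canonicalJson).join(',') + ']';
  if (isObject(value)) {
    const obj = value;
    const keys = Object.keys(obj).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalJson(obj[k])).join(',') + '}';
  }
  return JSON.stringify(value); // null, boolean, number, string
}

// Return the first path where two snapshots differ, or null if they match.
function firstDivergence(a: Json | undefined, b: Json | undefined, path = '$'): string | null {
  if (a === undefined && b === undefined) return null;
  if (a !== undefined && b !== undefined && canonicalJson(a) === canonicalJson(b)) return null;
  if (Array.isArray(a) && Array.isArray(b)) {
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
      const hit = firstDivergence(a[i], b[i], `${path}[${i}]`);
      if (hit) return hit;
    }
  } else if (isObject(a) && isObject(b)) {
    const keys = [...new Set([...Object.keys(a), ...Object.keys(b)])].sort();
    for (const k of keys) {
      const hit = firstDivergence(a[k], b[k], `${path}.${k}`);
      if (hit) return hit;
    }
  }
  return path; // differing leaf, type mismatch, or a value present on one side only
}
```

Sorting keys before serializing is what makes string equality a valid byte-equality test; without it, two semantically identical snapshots could fail replay just because properties were inserted in a different order.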
How does runtime tool forging work, and is it safe?
When a department agent needs a calculation the codebase doesn't provide, it writes one. A medical officer facing a radiation crisis writes a dose calculator. An engineer writes a structural load analyzer. The pipeline: specialist proposes a tool → Zod-validated input/output schema → TypeScript body authored by the LLM → executed in a hardened node:vm sandbox with a 5s wall-clock default and ~128 MB nominal heap budget. The sandbox always bans eval, Function, require, dynamic import, process, child_process, and destructive fs.*; fetch, fs.readFile, and crypto are opt-in via empty-by-default allowlist. An LLM judge reviews the output against the specialist's stated intent — match approves, mismatch rejects. Approved tools land in a discoverable registry that subsequent turns and other actors pull from. Reuse via call_forged_tool costs tens of tokens; fresh forges cost full LLM tokens for proposal + body + scaffolding + judge. After turn three, most decisions invoke a previously-forged tool and total run cost flattens.
Implementation: AgentOS EmergentCapabilityEngine + EmergentJudge + ForgeToolMetaTool + CodeSandbox. Preemptive limits via isolated-vm are queued for the hosted multi-tenant tier.
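A minimal sketch of the execution step, using only documented node:vm options: a fresh context with string code generation disabled, JSON-only input and output, and a wall-clock `timeout`. The `runForgedTool` name and the banned-token regexes are assumptions for illustration. Node's own documentation warns that node:vm is not a security mechanism, and this sketch enforces no heap cap, so it demonstrates the shape of the pipeline rather than the hardened sandbox described above.

```typescript
import * as vm from 'node:vm';

// Crude static screen for the always-banned capabilities listed above (an assumption;
// regexes are easy to evade and are no substitute for real isolation).
const BANNED_TOKENS = [/\beval\b/, /\bFunction\b/, /\brequire\b/, /\bimport\s*\(/, /\bprocess\b/, /child_process/];

// Run an LLM-authored function body against JSON input with a wall-clock limit.
function runForgedTool(body: string, input: unknown, timeoutMs = 5_000): unknown {
  const banned = BANNED_TOKENS.find((re) => re.test(body));
  if (banned) throw new Error(`forged tool rejected: matches ${banned}`);

  // Null-prototype sandbox holding only a JSON string: no require, no process.
  const sandbox: Record<string, unknown> = Object.create(null);
  sandbox.inputJson = JSON.stringify(input);
  const context = vm.createContext(sandbox, {
    codeGeneration: { strings: false, wasm: false }, // eval / new Function throw inside the context
  });

  // Input and output cross the boundary as strings, so the tool never receives host-realm objects as arguments.
  const source = `'use strict'; JSON.stringify((function (input) { ${body} })(JSON.parse(inputJson)));`;
  const output: unknown = new vm.Script(source).runInContext(context, { timeout: timeoutMs });
  return typeof output === 'string' ? JSON.parse(output) : undefined;
}

// Example: a forged scoring function; an infinite loop would throw after timeoutMs instead.
const score = runForgedTool('return input.dose * 2 + input.base;', { dose: 3, base: 1 });
```

The `timeout` option interrupts synchronous runaway code such as `while (true) {}`; it does not bound memory, which is why preemptive limits need a heavier mechanism such as the isolated-vm tier mentioned above.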
What is the HEXACO personality model and why use it?
HEXACO is a six-factor personality framework with extensive cross-cultural validation: Honesty-Humility, Emotionality, eXtraversion, Agreeableness, Conscientiousness, Openness to Experience. It extends the Big Five by adding Honesty-Humility as a separate sixth axis. Each agent carries six continuous values in [0,1]; values drift across turns through three mechanisms encoded in the kernel (not in prompts): leader-pull (subordinate traits trend toward the commander's profile), role-activation (a chief engineer's conscientiousness creeps up), and outcome-reinforcement (a successful bold call reinforces openness). Personality biases which specialists are consulted, which decisions are accepted, and which tools get forged. Prompt-only personality dissolves under context pressure; kernel-encoded personality survives. The microbench is open at agentos-bench/HexacoEncodingBias.
Ashton, M. C. & Lee, K. Empirical, Theoretical, and Practical Advantages of the HEXACO Model of Personality Structure. Personality and Social Psychology Review, 11(2), 150-166 (2007). doi:10.1177/1088868306294907
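The three drift mechanisms can be written as a per-turn update on each agent's profile. The sketch below is an assumption about the form of that update, not the kernel's actual drift table: leader-pull moves each trait a fixed fraction of the way toward the commander's value, role-activation and outcome-reinforcement add small fixed increments, and every trait is clamped to [0, 1]. The rates and the `driftOneTurn` name are made up; the two profiles reuse Captain Reyes and Captain Okafor from the end-to-end runner.

```typescript
// Hypothetical drift rule for leader-pull, role-activation, and outcome-reinforcement.
const AXES = ['honestyHumility', 'emotionality', 'extraversion', 'agreeableness', 'conscientiousness', 'openness'] as const;
type Axis = (typeof AXES)[number];
type Hexaco = Record<Axis, number>;

interface DriftInput {
  agent: Hexaco;
  leader: Hexaco;
  roleAxis?: Axis; // trait the agent's role exercises, e.g. conscientiousness for a chief engineer
  boldCallSucceeded?: boolean;
}

const RATES = { leaderPull: 0.1, roleActivation: 0.02, outcomeReinforcement: 0.02 }; // illustrative values

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function driftOneTurn({ agent, leader, roleAxis, boldCallSucceeded }: DriftInput): Hexaco {
  const next = { ...agent };
  for (const axis of AXES) {
    // Leader-pull: close a fixed fraction of the gap to the commander's value.
    next[axis] += RATES.leaderPull * (leader[axis] - agent[axis]);
  }
  if (roleAxis) next[roleAxis] += RATES.roleActivation; // role-activation
  if (boldCallSucceeded) next.openness += RATES.outcomeReinforcement; // outcome-reinforcement
  for (const axis of AXES) next[axis] = clamp01(next[axis]); // keep every trait in [0, 1]
  return next;
}

const reyes: Hexaco = { honestyHumility: 0.8, emotionality: 0.5, extraversion: 0.3, agreeableness: 0.6, conscientiousness: 0.9, openness: 0.4 };
const okafor: Hexaco = { honestyHumility: 0.6, emotionality: 0.3, extraversion: 0.8, agreeableness: 0.5, conscientiousness: 0.4, openness: 0.9 };

// An agent shaped like Reyes serving under Okafor, after a successful bold call.
const afterOneTurn = driftOneTurn({ agent: reyes, leader: okafor, boldCallSucceeded: true });
```

Here openness moves from 0.4 to 0.47: 0.05 from leader-pull (a tenth of the 0.5 gap) plus 0.02 from reinforcement. Because the update lives in code rather than in a prompt, it applies identically on every replay.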
Are leaders locked to human personality?
No. Paracosm ships a pluggable TraitModel registry with two built-ins: hexaco (Ashton-Lee six-axis human personality) and ai-agent (six axes for AI-system leaders: exploration, verification-rigor, deference, risk-tolerance, transparency, instruction-following). Each model declares its axes, drift table, and prompt cues. Custom models register in a single call: traitModelRegistry.register({...}). The same agent under two different leaders, on the same seed, becomes a measurably different entity by turn 12.
Ashton & Lee, PSPR 11(2), 2007 (HEXACO) · ai-agent v1 calibrated by Frame.dev, 2026 · Registry source: paracosm/engine/traits
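The registry contract (declare axes, drift, and prompt cues, then register once) can be sketched without paracosm. The `TraitModel` fields and the `TraitModelRegistry` class below are hypothetical stand-ins; paracosm's real interface and the exact argument to traitModelRegistry.register may differ. The sketch shows the two checks a registry like this needs: unique ids, and drift entries that only name declared axes.

```typescript
// Hypothetical trait-model shape and registry; not paracosm's actual types.
interface TraitModel {
  id: string;
  axes: readonly string[];
  drift: Record<string, number>; // per-axis drift rate (assumed field)
  promptCue(profile: Record<string, number>): string; // text injected into leader prompts
}

class TraitModelRegistry {
  private readonly models = new Map<string, TraitModel>();

  register(model: TraitModel): void {
    if (this.models.has(model.id)) throw new Error(`trait model "${model.id}" is already registered`);
    for (const axis of Object.keys(model.drift)) {
      if (!model.axes.includes(axis)) throw new Error(`drift names unknown axis "${axis}"`);
    }
    this.models.set(model.id, model);
  }

  get(id: string): TraitModel {
    const model = this.models.get(id);
    if (!model) throw new Error(`no trait model "${id}"`);
    return model;
  }
}

// Registering a custom psychometric: a hypothetical two-axis risk model.
const registry = new TraitModelRegistry();
registry.register({
  id: 'risk-profile',
  axes: ['lossAversion', 'timeHorizon'],
  drift: { lossAversion: 0.05 },
  promptCue: (p) => (p.lossAversion > 0.5 ? 'You weigh downside heavily.' : 'You accept downside for upside.'),
});
```

Validating drift against declared axes at registration time catches typos before a run starts, rather than silently ignoring a misspelled trait mid-simulation.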
How does grounded compile work? What sources can I feed in?
Pass a brief, paper, URL, or JSON draft into the compiler. An LLM extracts topics, facts, and search queries; AgentOS WebSearchService fans out to Firecrawl, Tavily, Serper, and Brave in parallel; results pass through semantic dedup, RRF fusion, and Cohere Rerank v3.5 for neural relevance scoring. The resulting KnowledgeBundle ingests into an AgentOS sqlite-backed memory store. Every department analysis can recall semantically and cite the same sources you fed in. The orchestrator guarantees provenance: when the LLM omits citations, the research packet is auto-attached so the report always carries the sources the agent saw. Compile cost is roughly $0.10 and caches to disk by seed signature.
CLI: paracosm compile --seed-text / --seed-url · Library: compileScenario() / WorldModel.fromPrompt() in paracosm/engine/compiler · Reranker: Cohere Rerank v3.5
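Reciprocal Rank Fusion (RRF), the fusion step named above, scores each result as the sum over providers of 1 / (k + rank), so items several providers agree on rise to the top even when raw provider scores are not comparable. The sketch below is a generic RRF implementation, not AgentOS WebSearchService code; the `Hit` shape is hypothetical, results are keyed by exact URL, and k = 60 is the constant used in the original RRF paper.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d).
// Generic sketch; the Hit shape is hypothetical.
interface Hit {
  url: string;
  title: string;
}

interface FusedHit extends Hit {
  score: number;
}

function reciprocalRankFusion(lists: Hit[][], k = 60): FusedHit[] {
  const fused = new Map<string, FusedHit>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1; // ranks are 1-based
      const entry = fused.get(hit.url) ?? { ...hit, score: 0 };
      entry.score += 1 / (k + rank);
      fused.set(hit.url, entry);
    });
  }
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

const serper: Hit[] = [{ url: 'u1', title: 'A' }, { url: 'u2', title: 'B' }];
const tavily: Hit[] = [{ url: 'u2', title: 'B' }, { url: 'u3', title: 'C' }];
const ranked = reciprocalRankFusion([serper, tavily]); // u2 first: both providers returned it
```

A neural reranker such as the one mentioned above would then rescore this fused list; RRF only decides which candidates make it that far and in what initial order.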
How is paracosm architected? Three layers, why?
Three layers, strict boundaries, no circular dependencies. The engine layer owns types and the deterministic kernel: SimulationKernel, SeededRng, ScenarioPackage types, and the Effect/Metric registries (~71 source files; TypeScript, Mulberry32, Apache-2.0). The runtime layer orchestrates simulations through the Orchestrator, Director, Departments, and Batch Runner, and stays scenario-agnostic via 6 scenario hooks, including the progression and prompt hooks (~43 files; AgentOS, generateText(), SSE). The CLI layer serves the dashboard, batch runner, and HTTP server (~30 files plus the React/Vite/Tailwind dashboard). The engine never imports from runtime; the runtime never imports from CLI. Underlying it all is @framers/agentos, which provides the agent() API, EmergentCapabilityEngine, EmergentJudge, ForgeToolMetaTool, and AgentMemory.
Can I model a single subject (digital twin)?
Yes. The paracosm/digital-twin subpath is the curated entry: supply a SubjectConfig + InterventionConfig, get the same Zod-validated RunArtifact shape with artifact.subject and artifact.intervention populated for traceability. artifact.trajectory.timepoints[t].worldSnapshot carries per-timepoint metrics (e.g. HbA1c, weight, BMI for a clinical scenario; engagement, churn, ARPU for a product scenario). The kernel is domain-agnostic; the scenario contract names the world. Worked examples: clinical (Maria Chen, T2D, 12-week semaglutide protocol), product (SaaS mid-tier pricing introduction), and policy (coastal congestion-pricing pilot).
API: DigitalTwin.fromJson(scenarioJson).intervene({ subject, intervention, actor }) · Live route: POST /api/quickstart/simulate-intervention · Counterfactual semantics: Kirfel et al., AI & Ethics (2025); AXIS interrogation framework: arXiv 2505.17801 (May 2025).
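A digital-twin run becomes useful once you compare metrics across timepoints. The sketch below assumes a simplified, hypothetical slice of the artifact, `{ trajectory: { timepoints: { worldSnapshot: Record<string, number> }[] } }`, inferred only from the field path above; the real RunArtifact schema carries much more and may nest metrics differently. It reports, for each numeric metric, the change from the first timepoint to the last.

```typescript
// Hypothetical, simplified slice of a RunArtifact; the real schema is richer.
interface TrajectorySlice {
  trajectory: {
    timepoints: { worldSnapshot: Record<string, number> }[];
  };
}

// Change in each metric from the first timepoint to the last.
// Metrics missing from the last snapshot are skipped.
function metricChangeOverRun(artifact: TrajectorySlice): Record<string, number> {
  const points = artifact.trajectory.timepoints;
  if (points.length < 2) return {};
  const first = points[0].worldSnapshot;
  const last = points[points.length - 1].worldSnapshot;
  const change: Record<string, number> = {};
  for (const [metric, start] of Object.entries(first)) {
    const end = last[metric];
    if (typeof end === "number") {
      change[metric] = end - start;
    }
  }
  return change;
}
```

Running the same function on a baseline artifact and an intervention artifact gives a quick side-by-side read on what the intervention moved.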
Where can I see real input and output JSON?
Every method in the public API has its real input JSON and real output JSON captured in docs/COOKBOOK.md. One run threads through seven public surfaces: prompt-to-world, quickstart parallel leaders (runMany), branching at turn N (wm.forkFromArtifact), kernel replay (wm.replay), the HTTP /simulate endpoint, digital-twin intervene(), and batch dispatch (runBatch). Every JSON file is committed to the repo. In addition, scripts/cookbook-e2e.ts runs the full pipeline against your own scenario, leader profiles, and seed, and captures live fingerprints and per-metric deltas.
Cookbook: docs/COOKBOOK.md · End-to-end: scripts/cookbook-e2e.ts
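How Paracosm computes its run fingerprints isn't specified here. One common approach, sketched below purely as an assumption, is to hash a canonical JSON serialization (object keys sorted recursively) with SHA-256 from Node's built-in `node:crypto`, so two runs with identical content get identical fingerprints regardless of key order. The `canonicalJson` and `fingerprint` names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted recursively so key order never changes the output.
// Assumes JSON-safe input (no undefined, functions, or BigInt).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical fingerprint: hex SHA-256 of the canonical JSON.
function fingerprint(artifact: unknown): string {
  return createHash("sha256").update(canonicalJson(artifact)).digest("hex");
}
```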
What LLM providers work with Paracosm?
OpenAI (GPT-5.4 family) and Anthropic (Claude Sonnet 4.6, Claude Haiku 4.5). Each role (commander, departments, director, judge, agent reactions) can be configured independently. Two cost presets, quality and economy, bundle sensible defaults; per-role overrides take precedence over the preset. Paracosm auto-detects which API keys are present in the environment and falls back between providers. Mini models are available for cheap iteration, but they produce lower-quality structured output and trip the judge more often.
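The precedence rule (preset defaults first, per-role overrides on top) and key-based provider detection can be pictured with the sketch below. Every name in it is hypothetical: the role keys mirror the roles listed above, but the model labels are placeholders rather than real model IDs, and Paracosm's actual config shape and fallback order may differ. `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the conventional variable names for those providers' SDKs.

```typescript
type Role = "commander" | "departments" | "director" | "judge" | "agentReactions";
type Provider = "openai" | "anthropic";
type RoleModels = Record<Role, string>;

// Placeholder labels, not real model IDs.
const PRESETS: Record<"quality" | "economy", RoleModels> = {
  quality: { commander: "large", departments: "large", director: "large", judge: "large", agentReactions: "large" },
  economy: { commander: "small", departments: "small", director: "small", judge: "small", agentReactions: "small" },
};

// Start from a copy of the preset; any defined per-role override wins.
function resolveRoleModels(preset: keyof typeof PRESETS, overrides: Partial<RoleModels> = {}): RoleModels {
  const resolved: RoleModels = { ...PRESETS[preset] };
  for (const role of Object.keys(overrides) as Role[]) {
    const model = overrides[role];
    if (model !== undefined) resolved[role] = model;
  }
  return resolved;
}

// Use the preferred provider if its key is set; otherwise fall back to any available one.
function detectProvider(env: Record<string, string | undefined>, preferred: Provider = "openai"): Provider | null {
  const available: Provider[] = [];
  if (env.OPENAI_API_KEY) available.push("openai");
  if (env.ANTHROPIC_API_KEY) available.push("anthropic");
  if (available.includes(preferred)) return preferred;
  return available[0] ?? null;
}
```

In Node you would pass `process.env` to `detectProvider`.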
What does a typical run cost?
A six-turn run on a small scenario with default settings costs roughly tens of cents under economy and a few dollars under quality. Compile-from-seed adds ~$0.10 (cached to disk by seed signature). Tool reuse after turn three is the largest cost lever, and the artifact records every token spent, with cache-hit accounting. Anthropic's prompt cache serves the shared system prefix from turn 2 onward, where cached input costs a tenth of the normal input rate; OpenAI automatically caches prompts of 1,024 tokens or more. The cost field reports cache tokens read, cache tokens created, and USD saved per run.
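To see where a "USD saved" figure can come from, here is a back-of-envelope sketch. It assumes Anthropic-style pricing in which cache reads bill at 0.1× the base input rate and 5-minute cache writes at 1.25×; check the provider's current pricing, and treat the $3 per million tokens below as a hypothetical base rate. The function and field names are illustrative, not the artifact's actual cost schema.

```typescript
interface CacheUsage {
  cacheReadTokens: number; // tokens served from the prompt cache
  cacheWriteTokens: number; // tokens written to the cache on first use of a prefix
}

// Net USD saved versus sending every cached token at the full input rate.
function estimateCacheSavingsUsd(
  usage: CacheUsage,
  baseInputUsdPerMTok: number,
  readMultiplier = 0.1, // assumed cache-read price relative to base input
  writeMultiplier = 1.25, // assumed cache-write price relative to base input
): number {
  const perToken = baseInputUsdPerMTok / 1_000_000;
  const readSavings = usage.cacheReadTokens * perToken * (1 - readMultiplier);
  const writePremium = usage.cacheWriteTokens * perToken * (writeMultiplier - 1);
  return readSavings - writePremium;
}

// Example: a 4,000-token shared prefix written once, then read on 5 later turns.
const saved = estimateCacheSavingsUsd(
  { cacheReadTokens: 5 * 4_000, cacheWriteTokens: 4_000 },
  3, // hypothetical base input price, USD per million tokens
); // ≈ 0.051 USD
```

The write premium is paid once per prefix, while read savings accrue every turn, which is why caching pays off most on longer runs.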
What verticals does this apply to?
Any domain where testing decisions before making them reduces risk. Digital-twin medicine (model a patient under interventions, vary the physician's HEXACO profile). Corporate strategy (M&A, market entry, reduction-in-force contingencies, leadership-style replays). Policy simulation (tax changes, regulatory rollouts, congestion pricing, healthcare reform). Defense and wargaming (vary commander HEXACO profiles under shared starting conditions; replay any past run after a code change). Game design playtesting (procedural NPC civilizations, emergent narratives, runtime tool forging inside the sandbox). AI alignment red-teaming (pressure-test agent behavior under specific HEXACO weights; runs are reproducible by construction). The engine is domain-agnostic; the scenario contract names the world.
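Several of these use cases come down to holding a scenario fixed and varying one HEXACO trait. The sketch below shows one way to represent a profile and generate a one-trait sweep. The 0 to 1 scale, the `HexacoProfile` type, and the `sweepTrait` helper are all hypothetical and may not match Paracosm's pluggable trait models; only the six HEXACO dimensions themselves are standard.

```typescript
// The six HEXACO personality dimensions, each on an assumed 0..1 scale.
interface HexacoProfile {
  honestyHumility: number;
  emotionality: number;
  extraversion: number;
  agreeableness: number;
  conscientiousness: number;
  openness: number;
}

type Trait = keyof HexacoProfile;

// Returns `steps` copies of `base`, varying one trait evenly from 0 to 1.
// The base profile is never mutated.
function sweepTrait(base: HexacoProfile, trait: Trait, steps: number): HexacoProfile[] {
  if (steps < 2) throw new Error("steps must be at least 2");
  return Array.from({ length: steps }, (_, i) => ({
    ...base,
    [trait]: i / (steps - 1),
  }));
}
```

Each profile in the sweep would drive one run over the same seed, so any divergence in outcomes traces back to the single trait that changed.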
Can I build my own scenario?
Three paths. (1) Write a ScenarioPackage in TypeScript with custom hooks for progression, crisis generation, milestones, and politics — full control. (2) Write a scenario JSON draft and run it through compileScenario() — zero-code authoring, $0.10 per compile, caches to disk. (3) Pass a brief, paper, or URL to WorldModel.fromPrompt() and let an LLM draft the JSON contract first — the path the live demo uses. All three produce the same canonical ScenarioPackage and route through the same runtime. An Antarctic station, a submarine crew, a corporate org, a generation ship: different world contract, same engine, different future.
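Compiles are cached to disk, so recompiling the same input should cost nothing. A minimal sketch of that pattern follows, assuming the seed signature is a SHA-256 hash of the seed text; the cache directory, the one-JSON-file-per-signature layout, and the injected `compile` callback are hypothetical stand-ins, not how `compileScenario()` actually stores its cache.

```typescript
import { createHash } from "node:crypto";
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical seed signature: hex SHA-256 of the seed text.
function seedSignature(seedText: string): string {
  return createHash("sha256").update(seedText).digest("hex");
}

// Return the cached result if present; otherwise compile once and persist it.
async function compileWithCache<T>(
  seedText: string,
  compile: (seed: string) => Promise<T>, // hypothetical stand-in for the real compiler
  cacheDir = ".paracosm-cache", // hypothetical location
): Promise<T> {
  const file = join(cacheDir, `${seedSignature(seedText)}.json`);
  try {
    return JSON.parse(await readFile(file, "utf8")) as T;
  } catch {
    // Cache miss or unreadable entry: fall through and compile.
  }
  const result = await compile(seedText);
  await mkdir(cacheDir, { recursive: true });
  await writeFile(file, JSON.stringify(result), "utf8");
  return result;
}
```

Keying on a content hash rather than a filename means an edited brief automatically misses the cache, while an unchanged one always hits it.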
Does Paracosm need an internet connection?
Yes, for research grounding (web search) and for the LLM calls themselves. Without web search, offline mode falls back to LLM-only event generation with no research citations, and the artifact records the offline flag so you can audit whether each run was grounded. For air-gapped deployments, the hosted enterprise tier supports a custom inference gateway and pre-ingested research bundles.
Is Paracosm free?
The engine is open source under Apache-2.0, and commercial use is permitted under the license terms: install from npm, fork the repo, build on it. You provide the LLM API keys; a 6-turn run on the quality preset costs roughly $1-3 on OpenAI and $3-7 on Anthropic. A hosted enterprise SaaS dashboard with full analytics, fleet orchestration, team workspaces, and private-deployment options is in development for Q3 2026; request early access at the bottom of this page.
Where can I read more or contribute?
Source: github.com/framersai/paracosm. Cookbook: docs/COOKBOOK.md. Architecture write-up: docs/ARCHITECTURE.md. Long-form launch post by Johnny Dunn (the wilds.ai vault story plus an engine reference): agentos.sh/blog/paracosm-launch. AgentOS runtime: github.com/framersai/agentos. Benchmarks: github.com/framersai/agentos-bench. Reading lists: LLM-Agents-Papers, Awesome World Models. Issues, PRs, and waitlist signups are all welcome.