Run any what-if. See how it actually plays out.
Type a scenario in plain English. Paracosm compiles it into a typed world, runs LLM-driven decision-makers with measurable personalities turn by turn, and gives back a trajectory you can replay, fork, and compare. Open source, deterministic, built for AI agents that need to test decisions before making them.
OPEN SOURCE · APACHE-2.0 · TYPESCRIPT · BUILT ON AGENTOS
Same world, three actors — see how each one's traits drive identical state to different futures. Seven entry points, one engine: brief, JSON contract, granular client, fork-and-replay counterfactuals, pluggable trait models, HTTP, CLI. Every path returns the same Zod-validated RunArtifact.
A world model is software that represents how some slice of reality changes over time. AI agents need them to test decisions before making them: rehearse the move, watch what unfolds, decide whether to commit.
The phrase has been overloaded since 2024. Older lineages do most of the work: Dreamer and MuZero learn dynamics in latent space; MuJoCo, Brax, and Isaac Sim simulate physics deterministically. Both have been the substrate for robotics research for years. Newer entrants (Sora, Genie 3, Marble) generate pixels — a "world simulator" framing the older lineage rejects (LeCun: pixel generation here is "not only wasteful, it's doomed to failure"). Paracosm sits in the structured / LLM lineage: typed JSON state, deterministic kernel, LLM reasoning lane on top. The output is a RunArtifact: each turn's metrics, decisions, specialist analyses, citations, forged tools. Pixels are what humans watch. State is what agents reason inside.
Surveyed in the ACM CSUR 2025 world-model survey and the robotics-focused World Model for Robot Learning survey, which is explicit that the resurgence comes from model-based RL plus physics simulation, not pixel generation. Critique: Xing 2025. LLM-world-model implementation anchor: Yang et al, 2026 — evaluates LLM-based world models through policy verification, action proposal, and policy planning.
Paracosm is a structured, LLM-driven, counterfactual world simulator. It compiles a prompt, document, or URL into a typed scenario, runs it through a deterministic seeded kernel coupled with an LLM reasoning surface, and emits one universal Zod-validated RunArtifact across turn-loop civ sims, batch-trajectory digital twins, and batch-point forecasts. The category is named in Xing 2025, catalogued in the ACM CSUR 2025 survey, and given counterfactual semantics by Kirfel et al, 2025.
`wm.replay(artifact)` is byte-equal-deterministic.
Prompts, documents, and URLs become `ScenarioPackage`s through a compile pass. A grounding step then runs entity-derived web search across Serper and Tavily in parallel, dedupes citations by URL, and attaches them to the scenario before any actor is generated. Every commander decision and specialist note in the run can cite a real source.
Forged tools run in a `node:vm` sandbox with wall-clock and heap caps. A judge LLM reviews safety and correctness; approved tools land in a registry that subsequent turns and other actors pull from. Real examples from prod runs: `shelter_priority_score`, `supply_shortage_severity_score`, `hurricane_after_action_assessment`.
`wm.forkFromArtifact(trunk, atTurn).simulate(altLeader)` branches a counterfactual from any captured turn.
Inspired by the long arc of world-model research: Ha & Schmidhuber's World Models (2018) dream-state agents, the deterministic-microworld tradition behind Sugarscape and Schelling, the agent-based modeling lineage from Conway and Wolfram, and the structured-prompting trajectory traced by Kirfel et al.'s counterfactual world simulation models. Where prior work used hand-coded rules, learned latents, or unstructured emergence, paracosm uses an LLM as the world's reasoning surface and a typed kernel as its mutable spine. The result is a deterministic, replayable, citation-grounded simulator that fits both research and product use.
Full taxonomy mapping with adjacent-category citations lives at docs/positioning/world-model-mapping.md.
A ScenarioPackage is a JSON file. It declares departments, metrics, crisis templates, progression rules, and milestone events; the engine reads the package, populates the world, and runs the simulation. No engine code changes required between scenarios — every scenario below loads dynamically from JSON. Compile your own from a brief, URL, or seed prompt with compileScenario().
paracosm/digital-twin: subject + intervention configs feed the same kernel, and the artifact carries the per-timepoint chart so a different physician HEXACO profile produces a measurably different trajectory.
Any domain where testing decisions before making them reduces risk: digital-twin medicine, corporate strategy, policy simulation, wargaming, game design, alignment red-teaming. The engine is domain-agnostic; the scenario contract names the world.
Model a patient or clinical workflow as a SubjectConfig; replay an intervention with a different physician HEXACO profile and see the divergence on outcome metrics. The paracosm/digital-twin subpath is the canonical entry.
Run M&A scenarios, market-entry plays, or workforce-reduction contingencies against multiple leadership profiles. Different CEO personalities, same starting conditions, measurable trajectory differences.
Model a policy rollout (tax change, healthcare reform, regulation) as the intervention; agencies and citizens as the population. Replay any past run to check kernel determinism after a code change.
Defense and intelligence: explore how different commander profiles produce divergent crises and divergent responses from identical starting conditions. The deterministic kernel anchors replay; the LLM Event Director reads each leader's HEXACO + accumulated state, so the events themselves diverge from turn 1.
Generate procedural NPC civilizations and emergent narratives. Forge tools at runtime so departments compute their own balance calculators inside a hardened sandbox.
Pressure-test agent behavior under leader profiles that emphasize specific HEXACO traits. Reproducible by construction; replay to verify the same prompt and seed yields the same trajectory.
The open-source engine ships today on npm. The hosted product layer is in development at Frame.dev for organizations that need decision intelligence at scale: defense, government, frontier AI labs, corporate strategy, R&D, alignment teams. No pricing tiers yet. The list below is what's on the roadmap.
Per-leader cohort analysis, A/B benchmarks across HEXACO archetypes, scenario regression dashboards, decision-quality scoring, fleet-wide trajectory diff. Every artifact streams into the analytics warehouse with the universal RunArtifact schema.
Run 10, 50, or 500+ leaders through the same scenario in parallel across distributed worker nodes. Aggregate comparison views. Live progress streaming. Automated batch sweeps with shared seeds and shared compiled scenarios. Cancel-on-divergence policies for early stopping.
Multi-session agent memory with consolidation, eviction, and stance drift across runs. Reload any past run. Replay any artifact for byte-equal kernel verification. Compare runs across paracosm versions to localize behavior changes between releases.
Shared scenario libraries, leader rosters, and run history with role-based access. Approval gates on production simulations. Audit trails on every run. Scenario authoring studio with a visual form-based editor on top of the canonical JSON contract.
Leadership-decision modeling for organizations: model M&A scenarios, market-entry plays, workforce planning, governance experiments, alignment red-team sweeps, defense wargames. Integrate with internal data warehouses and identity providers for real-context scenario seeding.
Dedicated cluster or VPC deployment for organizations that need data sovereignty, on-prem inference, air-gapped runs, or regulated compliance posture. Custom scenario authoring services. Integration with internal LLM gateways. SLA, dedicated support, custom rate limits.
Paracosm is built by Frame.dev, the engineering arm of Manic Agency LLC. Sister projects: AgentOS (the agent runtime paracosm runs on) and Wilds.ai (AI-NPC game worlds).
Private deployment, custom scenarios, government and defense engagements, investment inquiries.
team@frame.dev
All four are TypeScript/Python agent frameworks shipping on npm or PyPI. They overlap on "multi-agent orchestration" but diverge sharply on what they treat as the core abstraction: a chat graph (CrewAI, AutoGen), a state graph (LangGraph), or a deterministic structured world model (paracosm).
| Capability | paracosm | AutoGen | CrewAI | LangGraph |
|---|---|---|---|---|
| Core abstraction | Structured world model with seeded kernel + per-turn state | Multi-agent chat orchestration | Role-based agent crew with sequential or hierarchical tasks | State graph of agent nodes with explicit edges |
| Determinism | Byte-equal at fixed seed via Mulberry32 + canonical JSON | None — LLM non-determinism throughout | None — LLM non-determinism throughout | Partial — graph control flow is deterministic, agent calls are not |
| Counterfactual replay | `wm.replay(artifact)` reproduces byte-equal artifacts | Not supported | Not supported | Not supported (rerun the graph from scratch) |
| Fork at past state | `wm.forkFromArtifact(artifact, atTurn)` | Not supported | Not supported | Time-travel via persistent checkpoints (langgraph-checkpoint) |
| Personality model | HEXACO (humans) + ai-agent + pluggable trait registry | Per-agent system prompt only | Per-agent role + goal + backstory text | Per-agent system prompt only |
| Runtime tool forging | Yes: sandboxed `node:vm` tool generation, judge-validated | Function-calling only, predefined tools | Function-calling only, predefined tools | Function-calling only, predefined tools |
| Universal output contract | Zod-validated `RunArtifact` across all simulation modes | Per-agent message log, no unified schema | Task outputs in agent-specific shape | Final state object (graph-defined) |
| Digital-twin pattern | First-class `wm.intervene({ subject, intervention, actor })` | Not supported (no subject/intervention primitive) | Not supported | Not supported (build it yourself with state nodes) |
| Compile from prose | `WorldModel.fromPrompt`: brief or URL → typed scenario | Not supported (hand-author agents + graph) | Not supported (hand-author crew + tasks) | Not supported (hand-author state graph) |
| License | Apache-2.0 | MIT | MIT | MIT |
| Language | TypeScript (Node 20+, Bun, Deno) | Python · .NET | Python | Python · TypeScript |
The full architecture and methodology, written for engineers and researchers who want a citable PDF instead of scrolling docs. Deterministic kernel design, HEXACO leader differentiation, LLM-as-a-judge event grounding, runtime tool forging with sandboxed safety, reproducible counterfactual surfaces (fork / replay / intervene), and the `RunArtifact` schema with Zod validation for byte-equal reproducibility at fixed seeds.
Join the waitlist for the hosted dashboard, fleet orchestration, and team workspaces. We'll reach out as features ship.
Everything you'd want to know about the engine, the kernel, the trait models, and the research it builds on. Click any question to expand.
Paracosm is an open-source engine (`npm install paracosm`, Apache-2.0). You define or ground a world as a `ScenarioPackage`: departments, metrics, crisis templates, progression hooks. The engine populates the world, generates events from live state, and resolves outcomes through a seeded deterministic kernel. LLM-driven stages diverge because every prompt carries the leader's HEXACO profile. Two scenarios ship in the box (Mars Genesis, Lunar Outpost); you can compile new ones from a JSON draft, prompt, brief, or URL.
RunArtifact.
wm.replay(artifact) re-executes the deterministic between-turn progression hook from each recorded snapshot, captures fresh snapshots, and compares under canonical JSON: matches=true proves byte-equal-determinism for that artifact's transitions; matches=false names the first divergence path so changes can be localized. wm.forkFromArtifact(trunk, atTurn).simulate(altLeader) branches a counterfactual from any captured turn — the CWSM pattern.
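The comparison step in that answer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not paracosm's implementation: the `canonicalJson` and `firstDivergence` functions below are written from scratch and only borrow the names, and the `$.a.b` path format is hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so key order never changes the output.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalJson(v)).join(",") + "]";
  const obj = value;
  const parts = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + canonicalJson(obj[k]));
  return "{" + parts.join(",") + "}";
}

function isPlainObject(v: Json): v is { [key: string]: Json } {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Return the path of the first differing node, or null when both sides match.
function firstDivergence(a: Json, b: Json, path = "$"): string | null {
  if (canonicalJson(a) === canonicalJson(b)) return null;
  if (Array.isArray(a) && Array.isArray(b)) {
    const n = Math.max(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const childPath = path + "[" + i + "]";
      if (i >= a.length || i >= b.length) return childPath;
      const found = firstDivergence(a[i], b[i], childPath);
      if (found !== null) return found;
    }
  } else if (isPlainObject(a) && isPlainObject(b)) {
    const keys = Array.from(new Set(Object.keys(a).concat(Object.keys(b)))).sort();
    for (const k of keys) {
      const childPath = path + "." + k;
      if (!(k in a) || !(k in b)) return childPath;
      const found = firstDivergence(a[k], b[k], childPath);
      if (found !== null) return found;
    }
  }
  return path; // primitives or mismatched kinds differ at this node
}

// Hypothetical snapshots: same content in a different key order, then a drifted metric.
const recorded: Json = { turn: 3, metrics: { morale: 61, food: 40 } };
const replayed: Json = { metrics: { food: 40, morale: 61 }, turn: 3 };
const drifted: Json = { turn: 3, metrics: { morale: 58, food: 40 } };

const matches = canonicalJson(recorded) === canonicalJson(replayed); // true
const divergedAt = firstDivergence(recorded, drifted); // "$.metrics.morale"
```

Sorting keys before serializing is what makes a string comparison meaningful: two snapshots with the same content but different insertion order produce the same bytes.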
`WorldModel.replay()`, `canonicalJson`, `firstDivergence` in `paracosm/runtime/world-model`.
Forged tools run in a `node:vm` sandbox with a 5s wall-clock default and ~128 MB nominal heap budget. The sandbox always bans `eval`, `Function`, `require`, dynamic `import`, `process`, `child_process`, and destructive `fs.*`; `fetch`, `fs.readFile`, and `crypto` are opt-in via an empty-by-default allowlist. An LLM judge reviews the output against the specialist's stated intent: match approves, mismatch rejects. Approved tools land in a discoverable registry that subsequent turns and other actors pull from. Reuse via `call_forged_tool` costs tens of tokens; fresh forges cost full LLM tokens for proposal + body + scaffolding + judge. After turn three, most decisions invoke a previously forged tool and total run cost flattens.
`EmergentCapabilityEngine` + `EmergentJudge` + `ForgeToolMetaTool` + `CodeSandbox`. Preemptive limits via isolated-vm are queued for the hosted multi-tenant tier.
Personalities come from a pluggable `TraitModel` registry with two built-ins: `hexaco` (Ashton-Lee six-axis human personality) and `ai-agent` (six axes for AI-system leaders: exploration, verification-rigor, deference, risk-tolerance, transparency, instruction-following). Each model declares its axes, drift table, and prompt cues. Custom models register in a single call: `traitModelRegistry.register({...})`. The same agent under two different leaders, on the same seed, becomes a measurably different entity by turn 12.
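To make the registry idea concrete, here is a toy version. Everything in it is hypothetical: `TraitRegistrySketch`, the `highCues` field, the 0..1 score range, and the cue strings are illustration only and do not reflect the actual `traitModelRegistry.register` argument shape. The drift table is left out to keep the sketch short.

```typescript
// Hypothetical model shape: an id, a list of axes, and one cue per axis.
interface TraitModelSketch {
  id: string;
  axes: string[];
  // Prompt cue emitted when a leader scores high on the axis.
  highCues: Record<string, string>;
}

class TraitRegistrySketch {
  private readonly models = new Map<string, TraitModelSketch>();

  register(model: TraitModelSketch): void {
    if (this.models.has(model.id)) throw new Error("duplicate trait model: " + model.id);
    this.models.set(model.id, model);
  }

  // Validate a profile against the model's axes and collect cues for high scores.
  promptCues(id: string, profile: Record<string, number>, threshold = 0.7): string[] {
    const model = this.models.get(id);
    if (!model) throw new Error("unknown trait model: " + id);
    const cues: string[] = [];
    for (const axis of model.axes) {
      const score = profile[axis];
      if (typeof score !== "number" || score < 0 || score > 1) {
        throw new Error("profile needs a 0..1 score for axis: " + axis);
      }
      if (score >= threshold && model.highCues[axis]) cues.push(model.highCues[axis]);
    }
    return cues;
  }
}

const registry = new TraitRegistrySketch();
registry.register({
  id: "hexaco",
  axes: ["honesty-humility", "emotionality", "extraversion", "agreeableness", "conscientiousness", "openness"],
  highCues: {
    conscientiousness: "Prefers verified plans and explicit checklists.",
    openness: "Willing to try untested approaches.",
  },
});

// A made-up leader profile: high conscientiousness, low openness.
const cautiousLeader = {
  "honesty-humility": 0.6,
  emotionality: 0.4,
  extraversion: 0.3,
  agreeableness: 0.5,
  conscientiousness: 0.9,
  openness: 0.2,
};
const leaderCues = registry.promptCues("hexaco", cautiousLeader);
```

Because the cues are derived from the profile and injected into every prompt, two leaders with different scores steer the same world state differently.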
`paracosm/engine/traits`
`WebSearchService` fans out to Firecrawl, Tavily, Serper, and Brave in parallel; results pass through semantic dedup, RRF fusion, and Cohere Rerank v3.5 for neural relevance scoring. The resulting `KnowledgeBundle` ingests into an AgentOS sqlite-backed memory store. Every department analysis can recall semantically and cite the same sources you fed in. The orchestrator guarantees provenance: when the LLM omits citations, the research packet is auto-attached so the report always carries the sources the agent saw. Compile cost is roughly $0.10 and caches to disk by seed signature.
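The fusion step can be illustrated with a short reciprocal rank fusion (RRF) sketch. It makes two simplifications that are assumptions of this example, not a description of `WebSearchService`: duplicates are merged by exact URL rather than semantically, and the neural rerank pass is omitted. The `Hit` type, `fuseByRrf`, and the example URLs are hypothetical; `k = 60` is a commonly used RRF constant.

```typescript
interface Hit {
  url: string;
  title: string;
}

// Reciprocal rank fusion: score(url) = sum over providers of 1 / (k + rank),
// with rank starting at 1. Hits sharing a URL are merged into one entry.
function fuseByRrf(rankings: Hit[][], k = 60): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const ranking of rankings) {
    ranking.forEach((hit, index) => {
      const entry = scores.get(hit.url) ?? { hit, score: 0 };
      entry.score += 1 / (k + index + 1);
      scores.set(hit.url, entry);
    });
  }
  return Array.from(scores.values())
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.hit);
}

// Two made-up provider result lists that overlap on one URL.
const providerA: Hit[] = [
  { url: "https://example.org/a", title: "A" },
  { url: "https://example.org/b", title: "B" },
  { url: "https://example.org/c", title: "C" },
];
const providerB: Hit[] = [
  { url: "https://example.org/b", title: "B (other provider)" },
  { url: "https://example.org/d", title: "D" },
];

const fusedUrls = fuseByRrf([providerA, providerB]).map((hit) => hit.url);
// Order: b (listed by both providers), then a, d, c.
```

A source returned by several providers accumulates score from each list, so agreement between providers pushes it to the top without any provider-specific score calibration.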
`paracosm compile --seed-text` / `--seed-url` · Library: `compileScenario()` / `WorldModel.fromPrompt()` in `paracosm/engine/compiler` · Reranker: Cohere Rerank v3.5
The engine layer holds `SimulationKernel`, `SeededRng`, `ScenarioPackage` types, and Effect/Metric registries (~71 source files, TypeScript, Mulberry32, Apache-2.0). The runtime layer orchestrates simulations via 6 scenario hooks: Orchestrator, Director, Departments, Batch Runner, scenario-agnostic via progression and prompt hooks (~43 files, AgentOS, `generateText()`, SSE). The CLI layer serves the dashboard, batch runner, and HTTP server (~30 files plus the React/Vite/Tailwind dashboard). The engine never imports from runtime; the runtime never imports from CLI. Underlying it all is `@framers/agentos`: `agent()` API, `EmergentCapabilityEngine`, `EmergentJudge`, `ForgeToolMetaTool`, `AgentMemory`.
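For readers unfamiliar with Mulberry32, the sketch below shows the widely circulated form of the generator and why a seeded kernel is reproducible. Whether paracosm's `SeededRng` matches this code line for line is an assumption; `runToyKernel`, `ToySnapshot`, and the morale rule are hypothetical stand-ins for a real progression hook.

```typescript
// Mulberry32: a small 32-bit seeded PRNG. The same seed yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

interface ToySnapshot {
  turn: number;
  morale: number;
}

// Toy between-turn progression: all randomness comes from the seeded generator,
// so the snapshot list is a pure function of (seed, turns).
function runToyKernel(seed: number, turns: number): ToySnapshot[] {
  const rng = mulberry32(seed);
  const snapshots: ToySnapshot[] = [];
  let morale = 50;
  for (let turn = 1; turn <= turns; turn++) {
    morale += Math.round((rng() - 0.5) * 10);
    snapshots.push({ turn, morale });
  }
  return snapshots;
}

const firstRun = JSON.stringify(runToyKernel(42, 5));
const secondRun = JSON.stringify(runToyKernel(42, 5));
const sameSeedMatches = firstRun === secondRun; // true
```

Routing every random draw through one seeded generator, and never through `Math.random()` or the clock, is what lets a later replay compare serialized snapshots for equality.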
The `paracosm/digital-twin` subpath is the curated entry: supply a `SubjectConfig` + `InterventionConfig`, get the same Zod-validated `RunArtifact` shape with `artifact.subject` and `artifact.intervention` populated for traceability. `artifact.trajectory.timepoints[t].worldSnapshot` carries per-timepoint metrics (e.g. HbA1c, weight, BMI for a clinical scenario; engagement, churn, ARPU for a product scenario). The kernel is domain-agnostic; the scenario contract names the world. Worked examples: Maria Chen, T2D, 12-week semaglutide protocol (clinical); SaaS mid-tier pricing introduction (product); Coastal congestion-pricing pilot (policy).
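Comparing two such trajectories comes down to a per-metric difference at matching timepoints. The sketch below assumes `worldSnapshot` is a flat map of metric name to number, which the text does not confirm; `TrajectorySketch`, `finalMetricDeltas`, and all values are hypothetical and not output from a real run.

```typescript
// Assumed shapes, loosely mirroring trajectory.timepoints[t].worldSnapshot.
interface TimepointSketch {
  worldSnapshot: Record<string, number>;
}
interface TrajectorySketch {
  timepoints: TimepointSketch[];
}

// Difference (alternative minus baseline) for each shared metric at the final timepoint.
function finalMetricDeltas(baseline: TrajectorySketch, alternative: TrajectorySketch): Record<string, number> {
  const lastOf = (t: TrajectorySketch): Record<string, number> =>
    t.timepoints.length > 0 ? t.timepoints[t.timepoints.length - 1].worldSnapshot : {};
  const base = lastOf(baseline);
  const alt = lastOf(alternative);
  const deltas: Record<string, number> = {};
  for (const metric of Object.keys(base)) {
    if (metric in alt) deltas[metric] = alt[metric] - base[metric];
  }
  return deltas;
}

// Made-up product-scenario numbers for two actors starting from the same state.
const cautiousActor: TrajectorySketch = {
  timepoints: [
    { worldSnapshot: { engagement: 50, churn: 8 } },
    { worldSnapshot: { engagement: 54, churn: 7 } },
  ],
};
const boldActor: TrajectorySketch = {
  timepoints: [
    { worldSnapshot: { engagement: 50, churn: 8 } },
    { worldSnapshot: { engagement: 61, churn: 10 } },
  ],
};

const actorDeltas = finalMetricDeltas(cautiousActor, boldActor); // { engagement: 7, churn: 3 }
```

Both trajectories share the first snapshot, so any non-zero delta at the end is attributable to what changed between the two runs, here the actor profile.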
`DigitalTwin.fromJson(scenarioJson).intervene({ subject, intervention, actor })` · Live route: `POST /api/quickstart/simulate-intervention` · Counterfactual semantics: Kirfel et al, AI & Ethics (2025); AXIS interrogation framework: arXiv 2505.17801 (May 2025).
`docs/COOKBOOK.md`. One run threads through seven public surfaces: prompt-to-world, quickstart parallel leaders (`runMany`), branch at turn N (`wm.forkFromArtifact`), kernel replay (`wm.replay`), HTTP `/simulate` endpoint, digital-twin `intervene()`, batch dispatch (`runBatch`). Every JSON committed. Plus `scripts/cookbook-e2e.ts` in the repo runs the full pipeline against your scenario / leader profiles / seed and captures live fingerprints + per-metric deltas.
`quality` and `economy` bundle sensible defaults; per-role overrides win. Paracosm auto-detects which API key is present in env and falls back between providers. Mini models are available for cost iteration but produce lower-quality structured output and trip the judge more often.
economy and a few dollars under quality. Compile-from-seed adds ~$0.10 (cached to disk by seed signature). Tool reuse after turn three is the largest cost lever; the artifact records every token spend with cache-hit accounting. Anthropic's prompt cache TTL hits the shared system prefix from turn 2 onward (10× input-cost reduction); OpenAI auto-caches any prompt over 1024 tokens. The cost field reports tokens read, tokens created, and USD saved per run.
(1) Hand-author a `ScenarioPackage` in TypeScript with custom hooks for progression, crisis generation, milestones, and politics: full control. (2) Write a scenario JSON draft and run it through `compileScenario()`: zero-code authoring, $0.10 per compile, caches to disk. (3) Pass a brief, paper, or URL to `WorldModel.fromPrompt()` and let an LLM draft the JSON contract first; this is the path the live demo uses. All three produce the same canonical `ScenarioPackage` and route through the same runtime. An Antarctic station, a submarine crew, a corporate org, a generation ship: different world contract, same engine, different future.