Backend map / prompt audit / source index

Nipux Project Atlas

A self-contained visual map of the current backend: entrypoints, daemon loop, worker prompt assembly, durable memory, tools, SQLite schema, UI control plane, tests, and every tracked source file with parsed functions/classes and line references.

Mind map

Architecture

Lifecycle

Runtime Flow

What happens from terminal input to durable progress.
  1. Startup: The pyproject entrypoint calls nipux_cli.cli:main. With no args, the chat/TUI opens on the focused job or first-run workspace.
  2. Operator input: Plain chat is stored as visible events and, when relevant, durable operator context for future worker prompts.
  3. Daemon scheduling: The daemon claims runnable jobs, keeps a lock/heartbeat, starts runs, and calls one bounded worker step repeatedly.
  4. Prompt assembly: worker.build_messages layers system prompt, program template, operator context, roadmaps, tasks, ledgers, experiments, memory, timeline, and recent steps.
  5. Tool call: The LLM selects one OpenAI-style tool. The registry executes it with ToolContext and stores input/output in steps/events.
  6. Progress accounting: Guards require artifacts, findings, tasks, experiments, or milestone validation when evidence or measurements appear.
  7. Persistence: Artifacts go to the job output directory. SQLite stores steps, events, ledgers, runtime state, and usage/cost metadata.
  8. UI refresh: The TUI reads timeline/events and compact job metrics, splitting chat from worker activity and status.
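
As a rough illustration of step 4, the layering could look like the sketch below. This is not the code in worker.build_messages: the function name build_messages_sketch, the layer keys, and the section heading format are all assumptions made for the example. Only the layer order comes from the list above.

```python
from typing import Dict, List

# Layer order follows step 4 of the runtime flow. The key names are
# hypothetical labels, not identifiers taken from nipux_cli.
LAYER_ORDER = [
    "program_template", "operator_context", "roadmaps", "tasks", "ledgers",
    "experiments", "memory", "timeline", "recent_steps",
]


def build_messages_sketch(system_prompt: str, layers: Dict[str, str]) -> List[dict]:
    """Return OpenAI-style chat messages: one system and one user message."""
    sections = []
    for name in LAYER_ORDER:
        text = layers.get(name, "").strip()
        if text:  # skip empty layers so the prompt stays compact
            sections.append(f"## {name}\n{text}")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "\n\n".join(sections)},
    ]


messages = build_messages_sketch(
    "You are a long-running local work agent.",
    {"tasks": "1. Benchmark parser", "operator_context": "steer: prefer local files"},
)
```

Layers are emitted in a fixed order regardless of the order in which the caller supplies them, so operator context always precedes the task queue in this sketch.
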
Exact text

Prompt Surfaces

System/program prompts and instruction-like strings extracted from source.

inline string

nipux_cli/chat_context.py:105

Context: inline · 640 chars

You are Nipux, the chat model that controls a generic long-running agent workspace. You know the visible CLI state, focused job, job list, task queue, artifacts, memory, metrics, and recent activity. Answer directly from the visible job state. Do not claim hidden chain-of-thought. If the operator asks for work to be done, explain the concrete job/control action Nipux will take or how to run it from the Jobs/Status panel. If the operator asks where saved work is, explain that artifacts and history are visible from the Jobs/Status panel or direct CLI commands. Do not start replies with an introduction. Keep replies concise and useful.

prompt_label

nipux_cli/chat_tui.py:106

Context: assignment · 1 char

prompt_label

nipux_cli/chat_tui.py:109

Context: assignment · 1 char

inline string

nipux_cli/cli.py:2857

Context: inline · 819 chars

You are Nipux, the workspace chat model for a generic long-running agent CLI. Your job is to help the operator create, start, inspect, pause, resume, and steer worker jobs. You know the CLI concepts: jobs are long-running workers; artifacts are saved outputs; outcomes summarize durable progress; the updates page shows durable worker outcomes; the jobs page shows state, outputs, tasks, memory, findings, sources, experiments, and cost. Answer job-status questions from the job dossier. Mention concrete outputs, tasks, measurements, sources, blockers, and next branches when present. When the operator asks you to do new work, explain that Nipux will spin up a worker job; the harness will create the job from plain language. Keep replies concise, concrete, and operator-facing. Do not expose hidden chain-of-thought.

prompt_label

nipux_cli/first_run_tui.py:81

Context: assignment · 5 chars

Setup

inline string

nipux_cli/templates.py:17

Context: inline · 1,339 chars



## Operating Rules

- Work forever in bounded, resumable steps until the operator explicitly cancels or pauses the job.
- Treat useful results as checkpoints, not endings: save the result, create the next branch, and continue.
- Save important observations as artifacts.
- Use report_update for short progress notes or blocked-state notes.
- Use record_lesson when a source, mistake, operator preference, or strategy should affect future steps.
- Use record_source and record_findings when those tools are available so the job improves its ledgers over time.
- Use record_roadmap for broad work that needs milestones, feature groups, validation contracts, and roadmap-level checkpoints.
- Use record_milestone_validation to validate milestones from evidence and create follow-up tasks when validation fails or blocks.
- Use record_tasks to split broad objectives into durable branches with output contracts, acceptance criteria, required evidence, and stall behavior.
- Use record_experiment whenever a branch produces measured results, comparisons, benchmarks, scores, or optimization data.
- Use acknowledge_operator_context after incorporating or superseding active operator steering.
- Use browser and web tools first. Do not assume memory is exact unless it points to an artifact.
- Prefer quantity of attempts over one giant plan.

inline string

nipux_cli/tools.py:1785

Context: inline · 329 chars

Run a local shell command for CLI work. Use small read-only probes first. For long downloads, builds, training, crawls, or benchmarks, set a meaningful timeout, prefer resumable commands, and record or defer monitoring instead of repeatedly restarting short timed-out commands. Do not run destructive or high-risk cyber commands.

prompt

nipux_cli/tui_layout.py:99

Context: assignment · 1 char

 

shell_guidance

nipux_cli/worker.py:306

Context: assignment · 48 chars

Do not call shell_exec or do more research next.

shell_guidance

nipux_cli/worker.py:308

Context: assignment · 185 chars

Do not call broad shell_exec or do more research next. A single bounded shell_exec is allowed only when it validates one exact candidate path already listed in Candidate file discovery.

inline string

nipux_cli/worker.py:1459

Context: inline · 472 chars

 Do not create new task branches. Either execute an existing high-priority branch, or use record_tasks only to update existing task titles to active, done, blocked, or skipped with concise result/evidence. If you have a near-duplicate task, update the closest existing task instead of inventing a fresh title. Consolidate branch sprawl into roadmap/milestones when useful. If this repeats, record_tasks is temporarily withheld so the worker must use a non-planning action.

inline string

nipux_cli/worker.py:1829

Context: inline · 241 chars

The last artifact read used a reference that does not exist. Do not invent or retry artifact ids. Use a valid recent artifact ref, call search_artifacts with a concrete query, or continue from already observed evidence with a durable record.

inline string

nipux_cli/worker.py:5031

Context: inline · 305 chars

Do not defer merely for a future worker turn to pick up ordinary work. Use defer_job only when waiting for a real external process, scheduled monitor interval, long-running command, or other time-based condition. Otherwise execute, measure, record a task/experiment/lesson, or mark the branch blocked now.

inline string

nipux_cli/worker.py:5178

Context: inline · 351 chars

A recent privileged/package-manager shell command failed due permission or authorization. Do not retry that class of command until the failure is accounted for. Use observed executable paths, user-writable installs, existing project files, or record_tasks/record_lesson/record_experiment to mark the branch blocked or choose a non-privileged recovery.

guidance

nipux_cli/worker.py:5205

Context: assignment · 112 chars

Use a different query, extract one of the prior result URLs, open a result in the browser, or write an artifact.

guidance

nipux_cli/worker.py:5207

Context: assignment · 148 chars

This artifact was already read. Do not read it again; use its content to inspect a concrete item, record findings/tasks, or write a report artifact.

guidance

nipux_cli/worker.py:5212

Context: assignment · 161 chars

This shell command was already run. Do not rerun discovery; use the previous output to inspect a specific file/item, write an artifact, or update findings/tasks.
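
The three guidance strings above respond to repeated searches, artifact reads, and shell commands. A minimal sketch of how such repeats might be detected follows, assuming a call is "the same" when its tool name and arguments match exactly. RepeatGuard, call_signature, and the tool-to-guidance mapping are hypothetical and are not taken from worker.py.

```python
import hashlib
import json
from typing import Optional

# Guidance text copied from the entries above. Which tool each string
# belongs to is an assumption made for this sketch.
GUIDANCE = {
    "web_search": (
        "Use a different query, extract one of the prior result URLs, "
        "open a result in the browser, or write an artifact."
    ),
    "read_artifact": (
        "This artifact was already read. Do not read it again; use its content "
        "to inspect a concrete item, record findings/tasks, or write a report artifact."
    ),
    "shell_exec": (
        "This shell command was already run. Do not rerun discovery; use the previous "
        "output to inspect a specific file/item, write an artifact, or update findings/tasks."
    ),
}


def call_signature(tool: str, args: dict) -> str:
    """Stable hash of a tool call; sort_keys makes argument order irrelevant."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RepeatGuard:
    """Remembers executed calls and returns guidance when one is repeated."""

    def __init__(self) -> None:
        self._seen = set()

    def check(self, tool: str, args: dict) -> Optional[str]:
        sig = call_signature(tool, args)
        if sig in self._seen:
            return GUIDANCE.get(tool, "Do not repeat the same exact tool call.")
        self._seen.add(sig)
        return None
```

A first call returns None and is remembered; an identical second call returns the matching guidance string instead of being executed again.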

inline string

nipux_cli/worker.py:5239

Context: inline · 257 chars

Browser automation is unavailable on this host. Do not retry browser tools until the runtime is installed or configured. Use web_search, web_extract, shell_exec, source/ledger tools, or record a blocked task/source and continue through a non-browser branch.

inline string

nipux_cli/worker.py:5455

Context: inline · 334 chars

This job already has many raw lessons and the connected memory graph is behind. Do not add another raw lesson. Use record_memory_graph to consolidate reusable strategy, mistake, constraint, decision, question, skill, or episode nodes with evidence links, or update existing tasks/roadmap/milestone state if this is only branch status.

acceptance

nipux_cli/worker.py:6063

Context: assignment · 258 chars

Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, record_milestone_validation, or record_lesson to state exactly what the checkpoint proved, invalidated, changed, or failed to provide. Do not read the same checkpoint again.

inline string

nipux_cli/worker.py:6064

Context: inline · 258 chars

Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, record_milestone_validation, or record_lesson to state exactly what the checkpoint proved, invalidated, changed, or failed to provide. Do not read the same checkpoint again.

repair_prompt

nipux_cli/worker.py:7276

Context: assignment · 386 chars

Your previous worker response did not call a tool. This worker must advance by calling exactly one available tool now. Do not answer in prose. Choose one bounded action that fits the current state, such as executing existing work, recording a measurement, updating an existing task, saving an evidence-backed output, recording a lesson/finding/source, or deferring only for a real wait.
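
One way a repair prompt like this could be applied is sketched below, assuming the model adapter returns a dict with optional "content" and "tool_calls" keys. The function step_with_repair, the response shape, and the retry limit are assumptions for illustration, not the logic at worker.py:7276.

```python
from typing import Callable, List, Optional

# First two sentences of the repair prompt above, abbreviated for the sketch.
REPAIR_PROMPT = (
    "Your previous worker response did not call a tool. "
    "This worker must advance by calling exactly one available tool now."
)


def step_with_repair(
    call_model: Callable[[List[dict]], dict],
    messages: List[dict],
    max_repairs: int = 1,
) -> Optional[dict]:
    """Return the first tool call, re-asking with the repair prompt if none was made."""
    attempt = list(messages)
    for _ in range(max_repairs + 1):
        response = call_model(attempt)
        tool_calls = response.get("tool_calls") or []
        if tool_calls:
            return tool_calls[0]
        # Keep the prose reply in context, then append the repair instruction.
        attempt = attempt + [
            {"role": "assistant", "content": response.get("content", "")},
            {"role": "user", "content": REPAIR_PROMPT},
        ]
    return None


def fake_model(msgs: List[dict]) -> dict:
    """Stand-in model: answers in prose first, calls a tool once repaired."""
    if msgs and msgs[-1]["content"] == REPAIR_PROMPT:
        return {"content": "", "tool_calls": [{"name": "report_update", "arguments": {}}]}
    return {"content": "I will think about it.", "tool_calls": []}


result = step_with_repair(
    fake_model, [{"role": "user", "content": "Take exactly one bounded next action."}]
)
```

Returning None after the retry budget is spent leaves the decision about what to do next (for example, recording a failed step) to the caller.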

SYSTEM_PROMPT

nipux_cli/worker_policy.py:11

Context: assignment · 9,241 chars

You are a long-running local work agent.

Operate as a bounded worker, not a chat assistant. Choose one useful next step,
call one of the available tools, and persist important evidence as artifacts.
Do not claim the whole job is complete. A strong result is only a checkpoint:
save it, report it, add the next tasks, and continue improving or broadening.

Use a contract-first durable cycle. Read the objective, operator context,
roadmap, active task, and recent evidence; choose the next action that satisfies
the active output contract; produce or measure concrete evidence; update the
right ledger; report the checkpoint; then open or continue the next branch.
Research is only one possible contract. For action, experiment, monitor, report,
or file-deliverable work, prefer execution, measurement, validation, or writing
over more background collection. Keep moving forever until the operator pauses
or cancels the job.
The worker must not mark jobs completed or failed; use record_tasks,
record_lesson, report_update, and artifacts to describe checkpoints, blockers,
and next branches while the job stays runnable.

Avoid loops. Do not repeat the same search query or the same exact tool call.
If search results already exist, move forward by extracting source pages,
opening a useful site in the browser, or saving a finding/evidence artifact.
If a page has already been extracted and contains useful evidence, save that
evidence with write_artifact before doing more searching or browsing.
Only click or type browser refs from the most recent successful browser snapshot
or navigation result. If a click/type fails with an unknown ref, use the fresh
recovery snapshot or call browser_snapshot before retrying.
If a source shows Cloudflare, login, paywall, or anti-bot verification, keep it
visible in the trace. Do not bypass protections. Continue with normal visible
browser actions when possible, persist what you have, or use alternate public
sources if stuck.
If a tool returns a list of actionable candidates such as files, packages,
configurations, commands, sources, venues, records, branches, or options, do not
keep re-listing the same candidate set with small formatting changes. Persist the
candidate list once, choose the best candidate for the active contract, and move
to execution, measurement, validation, or an explicit blocked decision.
If a probe discovers a local/runtime candidate that might satisfy the active
contract, promote that candidate immediately: record the fact, validate it with
the smallest relevant action, and measure it before continuing external
acquisition or research retries. Do not let an available local candidate fall
out of context while pursuing lower-confidence external sources.
If repeated external acquisition attempts fail with authentication, permission,
quota, missing credentials, or unavailable resources, mark that branch blocked
or low-yield and pivot to another source, local candidate, monitor/defer branch,
or operator-visible credential requirement instead of retrying small variants.
If a browser page says blocked, CAPTCHA, bot check, login required, paywall, or
anti-bot, treat that page as a failed/low-yield source for the current job. Do
not write an artifact that claims usable evidence exists unless the evidence is
actually visible. Record the source outcome or pivot to another public source.
Use report_update for short operator-readable progress notes when you need to
say what you found or why you are blocked. Do not use report_update instead of
write_artifact when you have durable evidence, findings, or report content to save.
Use write_file when the objective requires a concrete file deliverable, source
file, document, config, dataset, or other workspace output. If a measured
experiment says the next action is to write, merge, update, compile, or insert
content, prefer write_file or an execution command that actually changes the
target over more read-only inspection.
Use defer_job when the next useful step is to wait for an external process,
scheduled check, long-running command, or monitor interval. Do not
simulate waiting with repeated searches, reports, or shell probes.
Use record_lesson when you learn something that should change future behavior:
bad source patterns, task-specific success criteria, repeated mistakes, operator
preferences, or a better strategy. Keep lessons short and reusable.
Use record_memory_graph when work produces reusable connected knowledge: an
episode worth remembering, a stable fact, a strategy, a reusable skill, an open
question, a decision, or a constraint. Link nodes to their evidence and to each
other. Treat this as the job's durable brain: recent events are fast episodic
memory, while stable graph nodes are consolidated knowledge that should guide
future branches without replaying raw history.
Durable memory is not automatically true forever. If newer evidence contradicts
an older memory-graph fact, constraint, strategy, or finding, update the older
record as deprecated/resolved/stale and link the newer evidence before acting.
Prefer fresh measured or directly observed evidence over stale summaries.
Use record_source when a source is high-yield, low-yield, blocked, repetitive,
or otherwise useful to score for future behavior.
Use record_findings after finding durable candidates, facts, opportunities,
experiments, files, bugs, sources, or other reusable outputs. Dedupe against the
finding ledger and artifacts before saving.
Use record_tasks to maintain a durable queue of objective-neutral branches:
open work, active branch, blocked branch, completed branch, and skipped branch.
Each task should include an output_contract (research, artifact, experiment,
action, monitor, decision, or report), acceptance criteria, evidence needed,
and stall behavior so progress is judged by evidence, not activity volume.
Before marking a task or milestone done, audit the claim against the objective:
list the requirement, the concrete artifact/file/finding/measurement/validation
that proves it, and any remaining gap. If the audit is incomplete, keep the
branch active or blocked and create the next smallest follow-up task.
When the job is broad or starts looping, split it into tasks and move to the
highest-priority open task rather than staying on one source or tactic forever.
Use record_roadmap for broad, multi-phase, or ambiguous objectives that need a
higher-level orchestration plan. A roadmap is generic: milestones group related
features or work units; each milestone has acceptance criteria, evidence needed,
and a validation contract. Use record_milestone_validation at milestone checkpoints
to pass, fail, block, or create follow-up tasks from validation gaps. Keep the
roadmap compact and update it from durable evidence, not from activity count.
Use record_experiment for measurable trials, benchmarks, comparisons,
optimization attempts, or hypothesis tests. A saved note, source, or artifact is
not enough progress for a measurable objective: record the exact configuration,
metric, result, whether higher or lower is better, and the next experiment. Keep
improving against the best observed result instead of declaring victory after a
single measurement.
Use shell_exec for command-line work, repository inspection, diagnostics,
benchmarks, repeatable experiments, and other command execution that the
objective requires. Prefer small read-only probes before changing anything, use
explicit timeouts, and save important command output with write_artifact before
continuing. Do not run destructive or high-risk cyber commands.
For long downloads, builds, training runs, crawls, benchmarks, or other slow
actions, treat the action as a monitored branch: choose a timeout that can make
meaningful progress, use resumable commands when available, record partial
progress as an experiment/task/checkpoint, and use defer_job when the next useful
step is to wait and check again. Do not repeatedly restart the same long action
with short timeouts without recording what changed and how the next attempt will
resume or differ.
If a probe shows a partial output, incomplete file, running process, cache entry,
checkpoint, or other unfinished artifact from an action branch, stop re-listing
the same state. Either resume/continue the action with a resumable command,
record a monitor/defer step for the still-running work, or record the branch as
blocked with the concrete missing condition and next action.
read_artifact only reads saved Nipux artifacts. Use shell_exec for repository,
workspace, project, or filesystem files that are not saved artifacts.
write_file writes workspace/local files directly; write_artifact writes Nipux's
separate saved-output store. Use the right one for the operator-facing result.
Operator messages are durable context from the human operator. Messages marked
steer are active constraints until acknowledged or superseded. Messages marked
follow_up are lower-priority queued work; keep them in the task queue and act on
them after the current active branch has a durable checkpoint. Messages marked
note are durable preferences. Use acknowledge_operator_context only after you
have incorporated or intentionally superseded a steer/follow_up message.
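
The operator-message rules that close the prompt above can be pictured with a small data-model sketch. The OperatorMessage class, its field names, and split_operator_context are hypothetical; only the three modes (steer, follow_up, note) and the acknowledgement rule come from the prompt text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OperatorMessage:
    mode: str  # "steer", "follow_up", or "note"
    text: str
    acknowledged: bool = False


def split_operator_context(messages: List[OperatorMessage]) -> Dict[str, List[OperatorMessage]]:
    """Group messages the way the prompt describes them."""
    return {
        # steer: active constraints until acknowledged or superseded
        "active_constraints": [m for m in messages if m.mode == "steer" and not m.acknowledged],
        # follow_up: lower-priority queued work
        "queued_work": [m for m in messages if m.mode == "follow_up" and not m.acknowledged],
        # note: durable preferences, which are not acknowledged away
        "preferences": [m for m in messages if m.mode == "note"],
    }


inbox = [
    OperatorMessage("steer", "Only use public sources."),
    OperatorMessage("steer", "Old correction.", acknowledged=True),
    OperatorMessage("follow_up", "Later, summarize the findings."),
    OperatorMessage("note", "Prefer concise reports."),
]
context = split_operator_context(inbox)
```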

instruction

nipux_cli/worker_prompt_context.py:87

Context: assignment · 220 chars

Take exactly one bounded next action. If recent state contains search results, do not search the same query again. If recent state contains extracted page evidence, write an artifact before doing more search or browsing.

prompt_text

scripts/generate_project_atlas.py:592

Context: assignment · 102 chars

 prompt/instruction-like strings were extracted. Inspect this section after any agent-behavior change.
Tools

Tool Registry

Static ToolSpec definitions from nipux_cli/tools.py.
| Name | Description | Line |
| --- | --- | --- |
| browser_navigate | Navigate to a URL and return a compact browser snapshot. | 1735 |
| browser_snapshot | Refresh the current page accessibility snapshot. | 1740 |
| browser_click | Click an element by snapshot ref, for example @e5. | 1745 |
| browser_type | Fill an input by snapshot ref. | 1750 |
| browser_scroll | Scroll the current page up or down. | 1755 |
| browser_back | Navigate back in browser history. | 1760 |
| browser_press | Press a keyboard key in the browser. | 1761 |
| browser_console | Read console errors or evaluate JavaScript in the current page. | 1766 |
| web_search | Search the web for candidate sources. | 1775 |
| web_extract | Extract markdown text from up to five URLs. | 1780 |
| shell_exec | Run a local shell command for CLI work. Use small read-only probes first. For long downloads, builds, training, crawls, or benchmarks, set a meaningful timeout, prefer resumable commands, and record or defer monitoring instead of repeatedly restarting short timed-out commands. Do not run destructive or high-risk cyber commands. | 1785 |
| write_file | Create, overwrite, or append a concrete workspace/local file for deliverables, code, documents, configs, or other file outputs. | 1795 |
| write_artifact | Persist important findings, evidence, reports, or checkpoints to the job artifact store. | 1805 |
| read_artifact | Read a saved artifact by artifact_id, visible number, exact saved path, or title. | 1816 |
| search_artifacts | Search stored artifacts for exact evidence from prior steps. | 1826 |
| update_job_state | Keep the current job runnable. Completion, failure, pausing, and cancellation are operator-only; workers should report checkpoints and continue. | 1831 |
| defer_job | Wait before the next worker turn for this job. Use for long external processes, monitor/check-later tasks, or scheduled follow-up without completing or pausing the job. | 1839 |
| report_update | Leave a short operator-readable progress note. Do not use this instead of write_artifact for durable evidence. | 1849 |
| record_lesson | Save durable learning for this job: bad source patterns, success criteria, strategy changes, mistakes to avoid, or operator preferences. | 1858 |
| record_memory_graph | Create or update the job | 1880 |
| search_memory_graph | Search the job | 1943 |
| acknowledge_operator_context | Acknowledge that durable operator steering has been incorporated or superseded. Use this after acting on a chat correction so it can leave the active context while remaining in history. | 1951 |
| record_source | Update the source ledger with source quality, finding yield, failures, warnings, and last outcome. | 1960 |
| record_findings | Update the finding ledger with evidence-backed useful results. Each finding needs an evidence anchor such as source_url/url, reason, evidence_artifact, or evidence metadata. | 1974 |
| record_tasks | Create or update a durable queue of objective-neutral work branches. Use this to split long jobs into next actions, mark blocked branches, and keep the agent from cycling on one path. Missing task contract fields are filled with generic defaults from the task title. When the queue is saturated, near-duplicate task titles are folded into the matching existing task instead of creating another branch. | 2002 |
| record_roadmap | Create or update a generic roadmap for broad work: milestones, features, success criteria, validation contract, scope, and current roadmap state. Use this before or during long-running work when task lists need higher-level structure. | 2033 |
| record_milestone_validation | Record validation for a roadmap milestone and optionally create follow-up tasks for gaps. Use fresh evidence, acceptance criteria, and clear pass/fail/blocker reasons. | 2086 |
| record_experiment | Track a measurable trial, benchmark, comparison, hypothesis test, or optimization attempt. Use this after any command or source produces a concrete result so future steps compare against the best observed result instead of treating notes as progress. Closed trials must include next_action so long-running work can continue from the result. | 2121 |
| send_digest_email | Send or dry-run a digest email and save the body as an artifact. | 2140 |
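
To make the registry idea concrete, here is a minimal sketch of a static tool definition and a dispatch function. ToolSpec and ToolContext are names from the atlas, but every field shown here, the execute helper, and the report_update argument name are assumptions. The actual definitions in nipux_cli/tools.py may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolContext:
    job_id: str  # hypothetical field; the real context likely carries more


@dataclass
class ToolSpec:
    name: str
    description: str
    parameters: Dict[str, Any]  # JSON Schema for the tool arguments
    handler: Callable[[ToolContext, Dict[str, Any]], Dict[str, Any]]

    def to_openai_tool(self) -> Dict[str, Any]:
        """Render the spec in the OpenAI chat-completions tool format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


def report_update(ctx: ToolContext, args: Dict[str, Any]) -> Dict[str, Any]:
    # Illustrative handler; "message" is an assumed argument name.
    return {"ok": True, "job_id": ctx.job_id, "note": args["message"]}


REGISTRY = {
    spec.name: spec
    for spec in [
        ToolSpec(
            name="report_update",
            description="Leave a short operator-readable progress note.",
            parameters={
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
            },
            handler=report_update,
        )
    ]
}


def execute(name: str, ctx: ToolContext, args: Dict[str, Any]) -> Dict[str, Any]:
    """Look up a tool by name and run it; unknown names return an error result."""
    spec = REGISTRY.get(name)
    if spec is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    return spec.handler(ctx, args)
```

Returning an error result for unknown names, rather than raising, keeps a bad tool selection visible as ordinary step output in this sketch.
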
Persistence

SQLite Tables

CREATE TABLE blocks found in nipux_cli/db.py.

schema_version

nipux_cli/db.py:24

  • version INTEGER NOT NULL

jobs

nipux_cli/db.py:28

  • id TEXT PRIMARY KEY
  • title TEXT NOT NULL
  • objective TEXT NOT NULL
  • kind TEXT NOT NULL DEFAULT 'generic'
  • status TEXT NOT NULL DEFAULT 'queued'
  • priority INTEGER NOT NULL DEFAULT 0
  • cadence TEXT
  • created_at TEXT NOT NULL
  • updated_at TEXT NOT NULL
  • metadata_json TEXT NOT NULL DEFAULT '{}'

job_runs

nipux_cli/db.py:41

  • id TEXT PRIMARY KEY
  • job_id TEXT NOT NULL REFERENCES jobs(id

steps

nipux_cli/db.py:53

  • id TEXT PRIMARY KEY
  • job_id TEXT NOT NULL REFERENCES jobs(id

artifacts

nipux_cli/db.py:69

  • id TEXT PRIMARY KEY
  • job_id TEXT NOT NULL REFERENCES jobs(id

evidence

nipux_cli/db.py:83

  • id TEXT PRIMARY KEY
  • job_id TEXT NOT NULL REFERENCES jobs(id

memory_index

nipux_cli/db.py:94

  • id TEXT PRIMARY KEY
  • job_id TEXT NOT NULL REFERENCES jobs(id

digests

nipux_cli/db.py:104

  • id TEXT PRIMARY KEY
  • day TEXT NOT NULL
  • target TEXT
  • subject TEXT
  • body_path TEXT
  • sent_at TEXT
  • status TEXT NOT NULL
  • error TEXT

events

nipux_cli/db.py:115

  • id TEXT PRIMARY KEY
  • job_id TEXT REFERENCES jobs(id
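
The jobs table is the only job-related table whose columns are listed in full above, so it can be recreated in an in-memory database to see the defaults in action. The column definitions below are copied from that list. The ISO 8601 timestamp format and the sample row are assumptions, since the atlas only says created_at and updated_at are TEXT.

```python
import sqlite3
from datetime import datetime, timezone

# Column definitions copied from the jobs entry above.
JOBS_DDL = """
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    objective TEXT NOT NULL,
    kind TEXT NOT NULL DEFAULT 'generic',
    status TEXT NOT NULL DEFAULT 'queued',
    priority INTEGER NOT NULL DEFAULT 0,
    cadence TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    metadata_json TEXT NOT NULL DEFAULT '{}'
)
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access by column name
conn.execute(JOBS_DDL)

# Assumed timestamp format: ISO 8601 in UTC.
now = datetime.now(timezone.utc).isoformat()
conn.execute(
    "INSERT INTO jobs (id, title, objective, created_at, updated_at) VALUES (?, ?, ?, ?, ?)",
    ("job-1", "Demo", "Illustrate column defaults", now, now),
)

row = conn.execute(
    "SELECT kind, status, priority, cadence, metadata_json FROM jobs WHERE id = ?",
    ("job-1",),
).fetchone()
```

A job inserted with only its required fields comes back as a generic, queued, priority-0 job with no cadence and an empty JSON metadata object.
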
Source index

Important Files

95 Python modules plus docs/config files.

AGENTS.md

32 lines

This repo is a focused long-running worker, not a broad assistant distribution.

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

README.md

378 lines

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

RELEASE_CHECKLIST.md

38 lines

Use this before sharing the repository with outside users.

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

config.example.yaml

27 lines

model:

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

docs/long-running-memory-graph-design.md

62 lines

Nipux needs long-running workers that keep improving instead of flattening into repeated search, notes, or shallow checkpoints. The backend now treats each job as having a small durable "brain": a job-local memory graph made of connected nodes and links. It...

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

docs/pi-agent-core-port-plan.md

267 lines

Research date: 2026-04-30

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

nipux_cli/__init__.py

11 lines

Minimal daemon-first Nipux runtime. This package owns the daemon, state store, model adapter, artifact store, and fixed tool surface.

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

nipux_cli/__main__.py

9 lines

Run the Nipux CLI with ``python -m nipux_cli``.

0 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, nipux_cli.cli

Symbols: none

nipux_cli/artifacts.py

138 lines

Artifact file storage for long-running jobs.

10 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, hashlib, re, dataclasses, pathlib, typing, nipux_cli.db

Symbols: safe_filename, sha256_text, StoredArtifact, ArtifactStore, __init__, job_dir, _assert_inside_home, write_text, read_text, search_text

nipux_cli/browser.py

189 lines

Small `agent-browser` wrapper for the Nipux runtime.

15 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, os, shutil, subprocess, tempfile, hashlib, pathlib, typing, nipux_cli.config, nipux_cli.source_quality

Symbols: _find_agent_browser, _session_name, _profile_dir, _socket_dir, run_browser_command, navigate, snapshot, click, fill, scroll

nipux_cli/chat_commands.py

286 lines

Slash-command dispatch for focused chat sessions.

3 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, dataclasses, typing, nipux_cli.config, nipux_cli.tui_style

Symbols: ChatCommandDeps, handle_chat_slash_command, _optional_int

nipux_cli/chat_context.py

223 lines

Prompt context builder for the Nipux chat-side controller model.

8 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.db, nipux_cli.event_render, nipux_cli.metric_format, nipux_cli.operator_context, nipux_cli.tui_event_format, nipux_cli.tui_outcomes

Symbols: build_chat_messages, _durable_outcome_events, _durable_outcome_lines, _job_list_lines, _roadmap_lines, _experiment_line, _empty_section_text, _clip_chat_context

nipux_cli/chat_controller.py

173 lines

Chat-controller behavior shared by the interactive CLI.

6 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, dataclasses, typing, nipux_cli.chat_intent

Symbols: ChatControllerDeps, handle_chat_message, chat_reply_text_and_metadata, handle_chat_control_intent, maybe_spawn_job_from_chat, queue_chat_note

nipux_cli/chat_frame_runtime.py

571 lines

Terminal chat-frame runtime helpers.

27 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, queue, select, shutil, sys, termios, threading, time, tty, dataclasses, typing, typing

Symbols: ChatFrameDeps, compact_command_output, frame_next_job_id, next_chat_right_view, frame_refresh_interval, run_chat_frame, emit_frame_if_changed, _safe_render_frame, _fallback_chat_frame, _diff_frame_update

nipux_cli/chat_intent.py

349 lines

Natural-language intent parsing for Nipux chat and shell control.

9 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re

Symbols: natural_command_for, chat_control_command, _mentions_any, _looks_like_control_phrase, message_requests_immediate_run, message_requests_queued_job, extract_job_objective_from_message, looks_like_smalltalk, looks_like_job_objective

nipux_cli/chat_tui.py

277 lines

Chat workspace terminal frame rendering.

10 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.config, nipux_cli.first_run_tui, nipux_cli.settings, nipux_cli.tui_commands, nipux_cli.tui_event_format, nipux_cli.tui_events, nipux_cli.tui_layout, nipux_cli.tui_outcomes, nipux_cli.tui_status, nipux_cli.tui_style

Symbols: build_chat_frame, _two_col_title, _two_col_line, _overlay_settings_modal, _settings_row, _rate_text, _metadata_records, _step_count, _step_line, _daemon_state_line

nipux_cli/cli.py

3188 lines

Thin CLI for the Nipux agent runtime.

159 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, json, os, shlex, shutil, subprocess, sys, threading, time, contextlib, io

Symbols: _launch_agent_plist, _systemd_service_text, _db, _record_command_deps, cmd_init, cmd_update, cmd_uninstall, cmd_create, _ensure_model_setup_verified_for_workspace, _workspace_has_model_config

nipux_cli/cli_help.py

92 lines

Help text and static branding for the Nipux command console.

2 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing

Symbols: print_shell_help, _print_group

nipux_cli/cli_render.py

280 lines

Reusable text renderers for non-frame CLI commands.

22 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, os, shutil, textwrap, pathlib, typing, nipux_cli.event_render, nipux_cli.tui_event_format, nipux_cli.tui_status, nipux_cli.tui_style

Symbols: clip_json, print_step, print_artifact, print_run, print_wrapped, section_title, print_metric_grid, short_path, print_jobs_panel, next_operator_action

nipux_cli/cli_state.py

126 lines

Persistent CLI focus state and job lookup helpers.

12 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, hashlib, json, datetime, pathlib, typing, nipux_cli.config, nipux_cli.db

Symbols: default_job_id, configured_focus_job_id, find_job, shell_state_path, read_shell_state, write_shell_state, setup_completed, mark_setup_completed, model_setup_fingerprint, model_setup_verified

nipux_cli/compression.py

246 lines

Deterministic rolling memory summaries for long-running jobs.

7 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, nipux_cli.db, nipux_cli.memory_graph, nipux_cli.operator_context

Symbols: _clip_text, refresh_memory_index, _metadata_list, _rank_tasks, _compact_count, _context_fraction, _first_positive_int

nipux_cli/config.py

271 lines

Configuration for the Nipux long-running agent runtime.

21 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, dataclasses, pathlib, typing, yaml

Symbols: get_agent_home, load_env_file, ensure_private_file_permissions, ensure_private_dir_permissions, write_private_text, ModelConfig, api_key, RuntimeConfig, state_db_path, jobs_dir

nipux_cli/context_pressure.py

254 lines

Context-pressure signals for long-running worker prompts.

10 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, datetime, typing, nipux_cli.db

Symbols: context_pressure_for_prompt, usage_pressure_for_prompt, emit_usage_pressure_update, emit_context_pressure_update, compact_token_count, _usage_pressure_band, _context_pressure_band, _durable_usage_signal_count, _as_float, _as_int

nipux_cli/daemon.py

695 lines

Daemon runner for restartable background jobs.

41 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, contextlib, fcntl, hashlib, json, os, signal, threading, time, dataclasses, datetime, email.utils

Symbols: DaemonAlreadyRunning, runtime_code_file_names, current_runtime_fingerprint, _runtime_code_fingerprint, _runtime_code_paths, runtime_stale, _parse_lock_metadata, daemon_lock_status, single_instance_lock, update_lock_metadata
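The `fcntl` import together with `single_instance_lock` and `DaemonAlreadyRunning` points to an advisory file lock that keeps a second daemon from starting. A minimal sketch of that technique is below, assuming a POSIX system and `flock`. The names `LockHeldError` and `single_instance_lock_sketch` are hypothetical, and the real daemon may store richer lock metadata than a pid.

```python
import contextlib
import fcntl
import os
from pathlib import Path
from typing import Iterator


class LockHeldError(RuntimeError):
    """Raised when another holder already owns the lock (hypothetical name)."""


@contextlib.contextmanager
def single_instance_lock_sketch(lock_path: Path) -> Iterator[int]:
    """Hold an exclusive, non-blocking flock on lock_path for the with-block."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise LockHeldError(f"lock already held: {lock_path}") from exc
        # Only the lock holder rewrites the file, so the pid stays accurate.
        os.ftruncate(fd, 0)
        os.write(fd, f"{os.getpid()}\n".encode("utf-8"))
        yield fd
    finally:
        # Closing the descriptor also releases the flock.
        os.close(fd)
```

Because the lock is tied to the open file descriptor, it is released automatically if the process dies, which is what makes this pattern suitable for a restartable daemon.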

nipux_cli/daemon_control.py

243 lines

Daemon process control helpers used by CLI commands.

10 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, os, signal, subprocess, sys, time, pathlib, typing, nipux_cli.config, nipux_cli.cli_state, nipux_cli.daemon

Symbols: remote_model_preflight_failures, _recoverable_provider_preflight, recoverable_remote_model_preflight_failures, provider_preflight_is_recoverable, ensure_remote_model_ready_for_worker, cmd_start_impl, start_daemon_if_needed_impl, cmd_restart_impl, stop_daemon_process_impl, _find_single_daemon_process

nipux_cli/dashboard.py

493 lines

Operator-facing dashboard state and rendering.

23 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, collections, datetime, pathlib, textwrap, typing, nipux_cli.config, nipux_cli.daemon, nipux_cli.db, nipux_cli.operator_context, nipux_cli.scheduling, nipux_cli.tools

Symbols: collect_dashboard_state, render_dashboard, render_overview, _select_focus_job, _job_card, _focus_state, _render_focus, _public_run, _public_step, _public_artifact

nipux_cli/db.py

2752 lines

SQLite state store for the Nipux agent.

97 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, random, re, sqlite3, threading, time, uuid, datetime, pathlib, typing, nipux_cli.metric_format

Symbols: utc_now, new_id, _slugify, _unique_job_id, _json_dumps, _json_loads, _bounded_float, _merge_string_lists, _memory_edge_key, _as_int

nipux_cli/digest.py

380 lines

Digest rendering and optional email delivery.

9 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, smtplib, datetime, email.message, pathlib, typing, nipux_cli.config, nipux_cli.db, nipux_cli.operator_context, nipux_cli.tui_layout

Symbols: _metadata_list, _active_operator_messages, _safe_int, _latest_run_model, _usage_lines, render_job_digest, send_digest_email, render_daily_digest, write_daily_digest

nipux_cli/doctor.py

260 lines

Runtime checks for the Nipux agent.

15 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, shutil, urllib.error, urllib.request, dataclasses, pathlib, typing, urllib.parse, nipux_cli.config, nipux_cli.db, nipux_cli.tools

Symbols: Check, as_dict, _check_writable_dir, _check_db, _check_tool_surface, _check_model_config, _check_browser_runtime, _check_model_endpoint, _check_model_generation, _check_openrouter_auth

nipux_cli/event_render.py

118 lines

Readable event rendering shared by CLI history and chat context.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, shlex, typing, nipux_cli.metric_format, nipux_cli.tui_event_format, nipux_cli.tui_style

Symbols: event_line, event_display_parts, event_label, compact_time

nipux_cli/first_run_controller.py

170 lines

First-run command decisions for the Nipux TUI.

7 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, shlex, contextlib, dataclasses, io, typing, nipux_cli.settings, nipux_cli.tui_commands, nipux_cli.frame_snapshot

Symbols: FirstRunFrameDeps, handle_first_run_action, handle_first_run_frame_line, first_run_chat_reply, create_first_run_job, capture_first_run_command, first_token

nipux_cli/first_run_frame_runtime.py

436 lines

Terminal runtime for the first-run Nipux workspace.

17 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, select, shutil, sys, termios, time, tty, dataclasses, typing, urllib.parse, nipux_cli.config, nipux_cli.settings

Symbols: FirstRunRuntimeDeps, run_first_run_frame, clamp_selection, _safe_render_frame, _fallback_first_run_frame, _submit_first_run_line, _handle_first_run_escape, directional_first_run_action, _apply_first_run_action, _handle_edit_input

nipux_cli/first_run_tui.py

571 lines

First-run terminal UI rendering for Nipux.

38 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.config, nipux_cli.settings, nipux_cli.tui_layout, nipux_cli.tui_style

Symbols: build_first_run_frame, first_run_columns, first_run_actions, first_run_themed_lines, _wizard_body_lines, _model_page_lines, _endpoint_page_lines, _api_page_lines, _access_page_lines, _doctor_page_lines

nipux_cli/frame_snapshot.py

183 lines

Data loading contract for the interactive Nipux terminal frame.

5 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.config, nipux_cli.daemon, nipux_cli.db, nipux_cli.tui_outcomes

Symbols: load_frame_snapshot, load_workspace_frame_snapshot, _safe_job, _workspace_token_usage, _summary_events

nipux_cli/llm.py

288 lines

LLM provider adapters for one bounded worker step.

23 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, urllib.error, urllib.parse, urllib.request, dataclasses, typing, openai, nipux_cli.config

Symbols: ToolCall, LLMResponse, LLMResponseError, __init__, StepLLM, next_action, OpenAIChatLLM, __init__, next_action, complete

nipux_cli/measurement.py

141 lines

Measurement parsing helpers for generic progress accounting.

7 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, typing

Symbols: measurement_candidates, _table_measurement_candidates, _split_markdown_table_row, _is_markdown_separator_row, _table_measurement_label, measurement_candidates_are_diagnostic_only, _candidate_is_diagnostic_only
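The symbol names `_split_markdown_table_row` and `_is_markdown_separator_row` indicate that measurements are read out of Markdown tables. The sketch below shows one plausible way to do that. All function names here are hypothetical, and the parsing rules are assumptions rather than the behavior of `measurement.py`.

```python
from __future__ import annotations

import re

# A separator cell looks like ---, :---, ---: or :---: (assumed GFM-style rule).
_SEPARATOR_CELL = re.compile(r":?-+:?")


def split_row_sketch(line: str) -> list[str]:
    """Split one pipe-delimited table row into trimmed cells."""
    stripped = line.strip()
    if "|" not in stripped:
        return []
    return [cell.strip() for cell in stripped.strip("|").split("|")]


def is_separator_row_sketch(line: str) -> bool:
    """Return True for the dashed row that follows a table header."""
    cells = split_row_sketch(line)
    return bool(cells) and all(_SEPARATOR_CELL.fullmatch(cell) for cell in cells)


def table_records_sketch(text: str) -> list[dict[str, str]]:
    """Turn each body row of a Markdown table into a header-to-cell mapping."""
    lines = text.splitlines()
    records: list[dict[str, str]] = []
    header: list[str] | None = None
    for index, line in enumerate(lines):
        cells = split_row_sketch(line)
        if not cells:
            header = None  # a non-table line ends the current table
            continue
        if header is None:
            # Accept a header only when a separator row follows it.
            if index + 1 < len(lines) and is_separator_row_sketch(lines[index + 1]):
                header = cells
            continue
        if is_separator_row_sketch(line):
            continue
        if len(cells) == len(header):
            records.append(dict(zip(header, cells)))
    return records
```

Rows whose cell count does not match the header are skipped rather than padded, which keeps malformed tables from producing misleading metric labels.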

nipux_cli/memory_graph.py

302 lines

Job-local memory graph helpers for long-running workers.

14 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, datetime, typing, nipux_cli.worker_prompt_format

Symbols: memory_graph_from_job, memory_graph_for_prompt, _node_contains_stale_token, _node_has_negative_memory_marker, _node_has_stale_id, search_memory_graph, rank_memory_nodes, _node_score, _edge_index, _tokens

nipux_cli/memory_graph_view.py

299 lines

Self-contained HTML view for a job-local memory graph.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, html, typing, nipux_cli.memory_graph

Symbols: render_memory_graph_html, _view_node, _view_edge, _string_list

nipux_cli/metric_format.py

17 lines

Small formatting helpers for measured worker results.

1 symbol · 0 TODOs
Imports and top symbols

Imports: __future__, typing

Symbols: format_metric_value

nipux_cli/operator_context.py

83 lines

Generic filtering for operator messages that enter worker context.

6 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, typing

Symbols: operator_entry_is_active, operator_entry_is_prompt_relevant, active_prompt_operator_entries, inactive_prompt_operator_ids, _conversation_only, _actionable

nipux_cli/parser_builder.py

356 lines

Argparse construction for Nipux CLI commands.

2 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, collections.abc

Symbols: _handler, build_arg_parser

nipux_cli/planning.py

384 lines

Generic initial planning primitives for long-running jobs.

9 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, typing

Symbols: objective_profiles, initial_plan_for_objective, initial_task_contract, initial_roadmap_for_objective, _initial_summary_for_profiles, _initial_tasks_for_profiles, _initial_questions_for_profiles, _primary_execution_contract, format_initial_plan

nipux_cli/progress.py

213 lines

Generic progress summaries for long-running jobs.

12 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, dataclasses, typing

Symbols: ProgressCheckpoint, build_progress_checkpoint, ledger_counts, ledger_update_counts, ledger_resolution_counts, recent_progress_bits, _metadata_list, _clip_text, _updated_existing_record, _record_after_checkpoint

nipux_cli/provider_errors.py

64 lines

Generic model-provider error classification.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, typing

Symbols: provider_error_text, provider_action_required, provider_action_required_note, provider_rate_limited

nipux_cli/record_commands.py

542 lines

Read-only CLI commands for job records, ledgers, memory, and usage.

16 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, dataclasses, pathlib, typing, nipux_cli.artifacts, nipux_cli.cli_render, nipux_cli.daemon, nipux_cli.memory_graph, nipux_cli.memory_graph_view, nipux_cli.tui_status

Symbols: RecordCommandDeps, cmd_findings_impl, cmd_tasks_impl, cmd_roadmap_impl, cmd_experiments_impl, cmd_sources_impl, cmd_memory_impl, _write_memory_graph_view, cmd_metrics_impl, cmd_usage_impl

nipux_cli/scheduling.py

76 lines

Shared scheduling helpers for deferred long-running work.

6 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, datetime, typing

Symbols: job_deferred_until, job_is_deferred, job_provider_blocked, provider_retry_metadata, operator_resume_metadata, _metadata_time

nipux_cli/service_install.py

179 lines

OS service installation helpers for the Nipux daemon.

7 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, shlex, shutil, subprocess, sys, argparse, pathlib, nipux_cli.config

Symbols: launch_agent_path, launch_agent_plist, systemd_service_path, systemd_service_text, cmd_autostart, cmd_service, xml_escape

nipux_cli/settings.py

153 lines

Inline config editing helpers for Nipux slash commands.

12 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, pathlib, typing, yaml, nipux_cli.config, nipux_cli.cli_state, nipux_cli.tui_commands

Symbols: config_field_value, save_config_field, inline_setting_notice, edit_target_label, edit_target_hint, edit_target_masks_input, _config_path, _load_config_yaml, _save_config_yaml, _save_env_secret

nipux_cli/settings_commands.py

84 lines

Slash-command handlers for inline Nipux configuration.

5 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, shlex, contextlib, io, nipux_cli.config, nipux_cli.settings, nipux_cli.tui_commands

Symbols: handle_chat_setting_command, config_summary_lines, _rate_text, _cost_limit_text, capture_setting_command

nipux_cli/shell_tools.py

348 lines

Shell and workspace file tools for Nipux workers.

22 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, contextlib, json, os, re, signal, subprocess, time, datetime, pathlib, typing

Symbols: write_file, shell_exec, cleanup_registered_shell_processes, _shell_error, _shell_success_anomaly, _missing_executable_probe, _empty_observation_probe, _shell_missing_command_anomaly, _shell_sudo_password_anomaly, _shell_build_error_anomaly

nipux_cli/source_quality.py

32 lines

Source quality checks for web and browser tools.

1 symbol · 0 TODOs
Imports and top symbols

Imports: __future__

Symbols: anti_bot_reason

nipux_cli/task_match.py

102 lines

Task title matching helpers for long-running job queues.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, typing

Symbols: task_key, find_semantic_task_match, _task_tokens, _task_similarity

nipux_cli/templates.py

67 lines

Program templates for generic long-running jobs.

3 symbols · 0 TODOs
Imports and top symbols

Imports: __future__

Symbols: program_for_job, _generic_template, _research_paper_template

nipux_cli/tools.py

2229 lines

Static tool registry for the Nipux agent.

68 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, re, time, dataclasses, datetime, typing, nipux_cli.artifacts, nipux_cli.config, nipux_cli.db, nipux_cli.metric_format, nipux_cli.digest

Symbols: ToolContext, ToolSpec, as_openai_tool, _missing_argument, _placeholder_argument, _schema_placeholder_arguments, _schema_missing_arguments, _json, _write_artifact, _read_artifact

nipux_cli/tui_commands.py

296 lines

Slash command metadata and command-palette helpers for the TUI.

7 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, nipux_cli.tui_style

Symbols: slash_suggestion_lines, autocomplete_slash, slash_completion_for_submit, _slash_argument_text, _slash_argument_footer, cycle_slash, _slash_command_matches

nipux_cli/tui_event_format.py

262 lines

Shared event formatting helpers for Nipux terminal renderers.

18 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, re, shlex, pathlib, typing, nipux_cli.metric_format, nipux_cli.tui_style

Symbols: event_tool_args, shell_write_target, event_title_body, experiment_metric_text, event_clock, event_hour, friendly_error_text, brief_reflection_text, generic_display_text, clean_step_summary

nipux_cli/tui_events.py

371 lines

Compact event rendering helpers for the Nipux terminal UI.

16 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, textwrap, typing, nipux_cli.tui_event_format, nipux_cli.tui_style

Symbols: chat_event_parts, append_chat_output, chat_pane_lines, _chat_item_lines, _flatten_chat_blocks, _chat_label, _normalized_chat_body, _is_low_value_chat_notice, _is_waiting_notice, _is_generic_chat_notice

nipux_cli/tui_input.py

80 lines

Terminal input helpers for full-screen Nipux frames.

5 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, re, select, sys, time

Symbols: read_terminal_char, read_escape_sequence, terminal_escape_complete, decode_terminal_escape, drain_pending_input

nipux_cli/tui_layout.py

234 lines

Reusable terminal layout primitives for Nipux frames.

16 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.tui_style

Symbols: _top_bar, _two_col_title, _two_col_line, _edge_line, _triple_line, _compose_bar, _metric_strip, _pill, _token_usage_topline, _model_cost_is_zero

nipux_cli/tui_outcomes.py

508 lines

Durable outcome summaries for the Nipux terminal UI.

18 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, textwrap, typing, nipux_cli.tui_event_format, nipux_cli.tui_style

Symbols: model_update_event_parts, is_summary_event_candidate, latest_durable_outcome_line, latest_hour_outcome_summary_line, visible_outcome_summary_line, job_outcome_summary, outcome_counts, recent_model_update_lines, chat_updates_pane_lines, _wrapped_label_line

nipux_cli/tui_status.py

540 lines

Status and work-pane renderers for the Nipux terminal UI.

23 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, textwrap, typing, nipux_cli.config, nipux_cli.operator_context, nipux_cli.scheduling, nipux_cli.tui_event_format, nipux_cli.tui_events, nipux_cli.tui_outcomes, nipux_cli.tui_layout, nipux_cli.tui_style

Symbols: worker_label, job_display_state, active_operator_messages, right_pane_lines, _is_workspace_placeholder, _empty_workspace_status_lines, chat_work_pane_lines, chat_settings_pane_lines, frame_jobs_lines, _job_compact_work_lines

nipux_cli/tui_style.py

153 lines

Small terminal styling helpers shared by the CLI frame renderers.

15 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, re, sys, typing

Symbols: _fancy_ui, _style, _accent, _muted, _bold, _one_line, _strip_ansi, _fit_ansi, _center_ansi, _themed_lines

nipux_cli/uninstall.py

217 lines

Uninstall helpers for local Nipux runtime state.

11 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, shutil, subprocess, dataclasses, pathlib, typing, nipux_cli.config, nipux_cli.service_install

Symbols: UninstallPlan, build_uninstall_plan, uninstall_runtime, uninstall_installed_tool, installed_tool_paths, _disable_services, _run_command, _process_lines, _dedupe_paths, _assert_safe_delete_target

nipux_cli/updater.py

179 lines

Self-update helpers for source checkouts and installed tools.

10 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, os, shutil, subprocess, collections.abc, pathlib

Symbols: find_checkout_root, update_checkout, _update_uv_tool_install, _verify_updated_command, _uv_tool_update_spec, _run_git, _run_command, _process_lines, _git_text, _short_path

nipux_cli/updates.py

132 lines

Readable durable progress reports for jobs.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, shlex, typing, nipux_cli.config, nipux_cli.daemon, nipux_cli.db, nipux_cli.tui_outcomes, nipux_cli.tui_status, nipux_cli.tui_style

Symbols: render_updates_report, render_all_updates_report, _metadata_list, _metadata_count

nipux_cli/usage.py

78 lines

Formatting helpers for model token and cost usage.

4 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.tui_layout

Symbols: format_usage_report, _safe_int, _safe_float, _safe_optional_float

nipux_cli/web.py

121 lines

Small web search/extract helpers without external web tool dependencies.

11 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, html, re, urllib.parse, urllib.request, html.parser, typing, nipux_cli.source_quality

Symbols: _TextExtractor, __init__, handle_starttag, handle_endtag, handle_data, text, _request, _strip_html, _duckduckgo_link, web_search

nipux_cli/worker.py

7538 lines

Bounded worker loop for one restartable agent step.

241 symbols · 1 TODO
Imports and top symbols

Imports: __future__, json, re, shlex, signal, threading, time, dataclasses, datetime, pathlib, typing, urllib.parse

Symbols: StepExecution, build_messages, _acknowledge_non_prompt_operator_context, _measured_progress_guard_for_prompt, _deliverable_progress_guard_for_prompt, _experiment_stagnation_guard_for_prompt, _measurement_obligation_for_prompt, _recent_measurement_evidence_for_prompt, _file_validation_obligation_for_prompt, _current_execution_focus_for_prompt

nipux_cli/worker_policy.py

489 lines

Static worker prompt and loop policy constants.

0 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re

Symbols: none

nipux_cli/worker_prompt_context.py

921 lines

Prompt-context renderers for the Nipux worker loop.

35 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, re, typing, nipux_cli.memory_graph, nipux_cli.metric_format, nipux_cli.operator_context, nipux_cli.tui_outcomes, nipux_cli.worker_policy, nipux_cli.worker_prompt_format

Symbols: _memory_entries_for_prompt, _render_worker_prompt, _redact_stale_tokens_for_prompt, _match_inside_path_like_span, _operator_messages_for_prompt, _operator_message_line, _lessons_for_prompt, _lesson_prompt_text, _positive_durable_lines_for_lesson_conflicts, _negative_lesson_conflict_tokens

nipux_cli/worker_prompt_format.py

223 lines

Prompt-facing summaries for worker history and tool observations.

8 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, re, pathlib, typing, nipux_cli.metric_format, nipux_cli.source_quality, nipux_cli.worker_policy

Symbols: compact, clip_text, format_step_for_prompt, observation_for_prompt, clean_prompt_candidate_path, browser_candidates_for_prompt, _looks_like_metric_cell, _looks_like_service_description

nipux_cli/worker_tool_summary.py

99 lines

Compact result summaries for worker tool executions.

1 symbol · 0 TODOs
Imports and top symbols

Imports: __future__, typing, nipux_cli.metric_format

Symbols: summarize_tool_result

nipux_cli/worker_usage.py

48 lines

Usage accounting for worker model turns.

3 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, json, typing, nipux_cli.llm

Symbols: turn_usage_metadata, estimate_token_count, _as_int

plans/nipux-runtime-notes.md

50 lines

Nipux is a narrow, restartable worker for long-running browser, web research,

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

pyproject.toml

54 lines

[build-system]

0 symbols · 0 TODOs
Imports and top symbols

Imports: none

Symbols: none

scripts/generate_project_atlas.py

619 lines

Generate docs/project-atlas.html from the tracked Nipux source tree.

32 symbols · 2 TODOs
Imports and top symbols

Imports: __future__, ast, html, re, subprocess, dataclasses, pathlib, typing

Symbols: SourceFile, Symbol, Prompt, main, load_source_files, tracked_paths, git, extract_symbols, call_names, dotted_name

scripts/live_memory_graph_smoke.py

226 lines

Run an opt-in real-model smoke test for memory-graph tool calling. This script is intentionally outside the normal Nipux runtime path. It creates an isolated temporary Nipux home, seeds generic durable job state, and verifies that a configured OpenAI-compat...

5 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, json, os, shutil, sys, tempfile, pathlib, typing, nipux_cli.config, nipux_cli.db, nipux_cli.memory_graph

Symbols: main, _seed_metadata, _execution_summary, _finish, _human_summary

scripts/render_nipux_ascii_video.py

523 lines

Render a Nipux ASCII-art CLI intro as an MP4. The renderer is dependency-light on purpose: it draws a small embedded bitmap font into raw RGB frames and pipes those frames directly to ffmpeg.

26 symbols · 0 TODOs
Imports and top symbols

Imports: __future__, argparse, math, random, shutil, subprocess, dataclasses, pathlib

Symbols: Cell, TextGrid, __init__, set, put, center, box, clamp, ease, mix

tests/nipux_cli/test_artifacts.py

34 lines

No module docstring.

2 symbols · 0 TODOs
Imports and top symbols

Imports: pytest, nipux_cli.artifacts, nipux_cli.db

Symbols: test_artifact_store_writes_reads_and_searches, test_artifact_store_rejects_paths_outside_home

tests/nipux_cli/test_browser_web.py

118 lines

No module docstring.

11 symbols · 0 TODOs
Imports and top symbols

Imports: json, nipux_cli.browser, nipux_cli.tools, nipux_cli.web, nipux_cli.artifacts, nipux_cli.config, nipux_cli.db

Symbols: test_session_name_is_stable_and_safe, test_long_session_name_is_short_and_hashed, test_strip_html_removes_scripts_and_keeps_text, test_browser_marks_anti_bot_interstitial_as_warning, test_browser_marks_captcha_block_as_warning, test_web_extract_marks_anti_bot_pages_as_warning, fake_request, test_browser_tool_uses_native_wrapper, fake_navigate, test_browser_click_adds_recovery_snapshot_for_stale_ref

tests/nipux_cli/test_cli.py

4970 lines

No module docstring.

231 symbols · 0 TODOs
Imports and top symbols

Imports: json, queue, subprocess, time, pathlib, nipux_cli.artifacts, nipux_cli, nipux_cli.chat_frame_runtime

Symbols: _mode, _mark_test_model_ready, test_cli_has_operator_commands, test_cli_version_flag, test_main_catches_keyboard_interrupt_without_traceback, interrupt, test_python_module_entrypoint_uses_cli_main, test_init_openrouter_writes_secret_free_config_and_env_template, test_init_defaults_to_local_endpoint, test_init_openrouter_defaults_to_generic_route

tests/nipux_cli/test_cli_model_preflight.py

86 lines

No module docstring.

12 symbols · 0 TODOs
Imports and top symbols

Imports: types, nipux_cli.cli, nipux_cli.doctor

Symbols: _config, test_remote_model_preflight_blocks_rejected_auth, fake_doctor, test_remote_model_preflight_allows_recovery_monitor_for_quota, fake_doctor, test_remote_model_preflight_skips_fake_runs, fake_doctor, test_model_preflight_checks_local_endpoints, fake_doctor, test_start_does_not_spawn_daemon_when_model_preflight_fails

tests/nipux_cli/test_compression.py

101 lines

No module docstring.

1 symbol · 0 TODOs
Imports and top symbols

Imports: nipux_cli.compression, nipux_cli.db

Symbols: test_refresh_memory_index_includes_durable_progress_ledgers

tests/nipux_cli/test_config.py

143 lines

No module docstring.

7 symbols · 0 TODOs
Imports and top symbols

Imports: pathlib, nipux_cli.config

Symbols: _mode, test_load_config_defaults_to_local_endpoint, test_load_config_from_yaml, test_load_config_reads_local_env_file, test_load_config_tightens_local_env_permissions, test_default_config_yaml_allows_provider_template_without_secret, test_config_example_matches_default_local_endpoint

tests/nipux_cli/test_daemon.py

560 lines

No module docstring.

41 symbols · 0 TODOs
Imports and top symbols

Imports: datetime, json, threading, time, pytest, nipux_cli.config, nipux_cli.daemon, nipux_cli.daemon_control, nipux_cli.db, nipux_cli.worker, nipux_cli.doctor

Symbols: test_single_instance_lock_rejects_second_holder, test_daemon_lock_status_reports_free_lock, test_lock_metadata_can_be_updated_while_held, test_lock_metadata_update_restores_missing_process_fields, test_daemon_lock_heartbeat_updates_while_worker_turn_runs, SlowDaemon, run_once, test_stop_daemon_recovers_pidless_lock_from_process_list, PsResult, test_daemon_lock_status_detects_stale_runtime

tests/nipux_cli/test_dashboard.py

86 lines

No module docstring.

3 symbols · 0 TODOs
Imports and top symbols

Imports: datetime, nipux_cli.artifacts, nipux_cli.config, nipux_cli.dashboard, nipux_cli.db

Symbols: test_dashboard_collects_jobs_steps_and_artifacts, test_overview_marks_idle_daemon_as_ready_for_work, test_overview_marks_old_heartbeat_as_busy_for_running_step

tests/nipux_cli/test_db.py

609 lines

No module docstring.

19 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.db

Symbols: test_db_job_run_step_and_artifact_roundtrip, test_create_job_uses_unique_readable_slug_ids, test_step_numbers_increment_across_runs_for_a_job, test_job_token_usage_aggregates_message_usage, test_append_operator_message_roundtrip, test_claim_operator_messages_marks_one_message_at_a_time, test_acknowledge_operator_messages_marks_delivered_context, test_rename_job_updates_title_without_changing_id, test_delete_job_removes_related_rows, test_append_lesson_roundtrip

tests/nipux_cli/test_digest.py

43 lines

No module docstring.

1 symbol · 0 TODOs
Imports and top symbols

Imports: nipux_cli.config, nipux_cli.db, nipux_cli.digest

Symbols: test_daily_digest_includes_ledgers_lessons_sources_and_strategy

tests/nipux_cli/test_doctor.py

157 lines

No module docstring.

13 symbols · 0 TODOs
Imports and top symbols

Imports: io, json, urllib.error, nipux_cli.config, nipux_cli.doctor

Symbols: FakeHTTPResponse, __init__, __enter__, __exit__, read, test_doctor_checks_local_runtime_without_model_call, test_doctor_warns_when_remote_model_key_is_missing, test_doctor_reports_openrouter_auth_failure, fake_urlopen, test_doctor_reports_generation_limit_after_model_listing

tests/nipux_cli/test_generic_runtime_audit.py

35 lines

No module docstring.

1 symbol · 0 TODOs
Imports and top symbols

Imports: pathlib

Symbols: test_runtime_code_has_no_task_specific_literals

tests/nipux_cli/test_live_memory_graph_smoke.py

37 lines

No module docstring.

3 symbols · 0 TODOs
Imports and top symbols

Imports: importlib.util, sys, pathlib, nipux_cli.memory_graph

Symbols: _load_live_smoke, test_live_memory_graph_smoke_fails_cleanly_without_key, test_live_memory_graph_smoke_seed_pushes_generic_consolidation

tests/nipux_cli/test_llm.py

151 lines

No module docstring.

23 symbols · 0 TODOs
Imports and top symbols

Imports: types, nipux_cli.config, nipux_cli.llm

Symbols: _FakeCompletions, __init__, create, test_chat_llm_requires_tool_choice_for_worker_actions, FakeOpenAI, __init__, test_chat_llm_retries_without_tool_choice_when_provider_rejects_it, RejectingCompletions, create, FakeOpenAI

tests/nipux_cli/test_measurement.py

32 lines

No module docstring.

2 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.measurement

Symbols: test_measurement_candidates_extract_markdown_table_unit_columns, test_measurement_candidates_extract_generic_table_metrics

tests/nipux_cli/test_metric_format.py

11 lines

No module docstring.

2 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.metric_format

Symbols: test_format_metric_value_spaces_named_units, test_format_metric_value_keeps_attached_symbol_units

tests/nipux_cli/test_operator_context.py

30 lines

No module docstring.

4 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.operator_context

Symbols: _entry, test_conversation_only_operator_messages_do_not_enter_worker_prompt, test_actionable_operator_messages_remain_worker_constraints, test_inactive_prompt_operator_ids_returns_only_conversation_active_messages

tests/nipux_cli/test_planning.py

90 lines

No module docstring.

8 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.planning

Symbols: test_initial_task_contracts_are_generic_and_complete, test_initial_roadmap_uses_valid_generic_contracts, test_initial_plan_adapts_to_measurable_objectives, test_initial_plan_adapts_to_deliverable_objectives, test_initial_plan_treats_generated_files_as_deliverables, test_initial_plan_adapts_to_monitoring_objectives, test_initial_plan_does_not_add_meta_progress_update_task, test_objective_profiles_stay_generic

tests/nipux_cli/test_progress.py

214 lines

No module docstring.

7 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.progress

Symbols: test_progress_checkpoint_reports_deltas_and_recent_durable_work, test_progress_checkpoint_for_saved_output_is_concise, test_progress_checkpoint_without_delta_is_activity_not_progress, test_progress_checkpoint_counts_existing_record_updates_as_progress, test_progress_checkpoint_ignores_non_substantive_record_touches, test_progress_checkpoint_counts_roadmap_updates_and_validations, test_progress_helpers_ignore_malformed_metadata

tests/nipux_cli/test_project_atlas.py

46 lines

No module docstring.

3 symbols · 0 TODOs
Imports and top symbols

Imports: importlib.util, sys, pathlib

Symbols: _load_generator, test_project_atlas_generator_maps_prompts_tools_and_source_without_self_embedding, test_project_atlas_redacts_secret_assignments_from_rendered_source

tests/nipux_cli/test_provider_errors.py

21 lines

No module docstring.

3 symbols · 0 TODOs
Imports and top symbols

Imports: nipux_cli.provider_errors

Symbols: ProviderPayloadError, test_provider_action_required_detects_payload_and_status_text, test_provider_rate_limited_detects_transient_rate_text

tests/nipux_cli/test_templates.py

15 lines

No module docstring.

1 symbol · 0 TODOs
Imports and top symbols

Imports: nipux_cli.templates

Symbols: test_generic_template_pushes_artifacts_and_updates

tests/nipux_cli/test_tools.py

2166 lines

No module docstring.

69 symbols · 0 TODOs
Imports and top symbols

Imports: json, os, signal, subprocess, time, nipux_cli.artifacts, nipux_cli.config, nipux_cli.db, nipux_cli.shell_tools, nipux_cli.tools

Symbols: test_static_tool_surface_is_focused, test_tool_registry_validates_required_arguments, test_tool_registry_blocks_truncated_reference_arguments, test_tool_access_config_filters_worker_schema_and_blocks_calls, test_artifact_tools_roundtrip, test_read_artifact_missing_ref_returns_valid_recent_refs, test_defer_job_records_resume_time_without_pausing, test_shell_exec_tool_runs_bounded_command, test_shell_exec_flags_masked_auth_failure_output, test_write_file_tool_writes_and_appends_workspace_file

tests/nipux_cli/test_uninstall.py

136 lines

No module docstring.

10 symbols · 0 TODOs
Imports and top symbols

Imports: subprocess, pathlib, nipux_cli.uninstall

Symbols: _completed, test_uninstall_plan_includes_runtime_and_legacy_state, test_uninstall_plan_includes_configured_runtime_home, test_uninstall_runtime_removes_state_and_service_files, test_uninstall_runtime_dry_run_keeps_files, test_uninstall_installed_tool_uses_uv_when_available, runner, test_uninstall_installed_tool_falls_back_to_safe_uv_paths, which, test_installed_tool_paths_ignore_non_user_tool

tests/nipux_cli/test_worker.py

11339 lines

No module docstring.

359 symbols · 0 TODOs
Imports and top symbols

Imports: json, pathlib, nipux_cli.artifacts, nipux_cli.config, nipux_cli.db, nipux_cli.llm, nipux_cli.worker

Symbols: SnapshotRegistry, openai_tools, handle, SuccessRegistry, openai_tools, handle, MeasuredShellRegistry, openai_tools, handle, DiagnosticShellRegistry

Functions and classes

Symbol Map

Parsed with Python AST.
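
As a rough illustration of how entries like the ones below could be produced, the following sketch walks a module with the standard `ast` module and records each function or class with its line and a sorted list of call targets. The helper names `call_name` and `symbol_map` and the dictionary layout are hypothetical; the atlas generator itself is not shown in this index and may work differently.

```python
import ast


def call_name(func):
    """Render a call target: dotted for plain name chains, bare attribute otherwise."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        base = call_name(func.value)
        return f"{base}.{func.attr}" if base else func.attr
    return None


def symbol_map(source, filename):
    """Return one entry per function/class found in `source` (hypothetical layout)."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Collect every call made anywhere inside this definition.
        calls = set()
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call):
                name = call_name(sub.func)
                if name:
                    calls.add(name)
        doc = ast.get_docstring(node) or "No docstring."
        entries.append({
            "name": node.name,
            "kind": "class" if isinstance(node, ast.ClassDef) else "function",
            "location": f"{filename}:{node.lineno}",
            "doc": doc.splitlines()[0],
            "calls": sorted(calls),
        })
    return sorted(entries, key=lambda entry: int(entry["location"].rsplit(":", 1)[1]))
```

A chained call such as `hashlib.sha256(text.encode()).hexdigest()` yields `hashlib.sha256`, `hexdigest`, and `text.encode` under this scheme, which is consistent with how bare names like `hexdigest` and `join` appear in the call lists here.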

safe_filename

function nipux_cli/artifacts.py:16

No docstring.

Calls: _SAFE_NAME_RE.sub, strip, value.strip

sha256_text

function nipux_cli/artifacts.py:21

No docstring.

Calls: hashlib.sha256, hexdigest, text.encode
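
The two helpers above have no docstrings, so the following is only a plausible reading of their call lists. The regular expression, the `-` replacement character, and the `"artifact"` fallback are assumptions; the real `_SAFE_NAME_RE` pattern is not shown in this index.

```python
import hashlib
import re

# Assumed pattern: collapse anything outside a conservative filename alphabet.
_SAFE_NAME_RE = re.compile(r"[^A-Za-z0-9._-]+")


def safe_filename(value: str) -> str:
    """Replace unsafe runs with '-' and trim separators (illustrative behavior)."""
    cleaned = _SAFE_NAME_RE.sub("-", value.strip()).strip("-.")
    return cleaned or "artifact"  # fallback name is an assumption


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

A content hash of this kind is a common way to detect duplicate or changed artifacts without comparing full file bodies.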

StoredArtifact

class nipux_cli/artifacts.py:26

No docstring.

Calls: dataclass

ArtifactStore

class nipux_cli/artifacts.py:34

No docstring.

Calls: Path, StoredArtifact, ValueError, artifact.get, content.lower, find, join, len, lower, max, new_id, path.exists, path.mkdir, path.resolve, path.write_text, query.lower

__init__

function nipux_cli/artifacts.py:35

No docstring.

Calls: Path, self.home.mkdir

job_dir

function nipux_cli/artifacts.py:40

No docstring.

Calls: path.mkdir

_assert_inside_home

function nipux_cli/artifacts.py:45

No docstring.

Calls: ValueError, path.resolve, resolved.relative_to, self.home.resolve
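
The call list for `_assert_inside_home` (`path.resolve`, `resolved.relative_to`, `self.home.resolve`, `ValueError`) suggests a standard path-containment guard. A minimal standalone sketch of that pattern, with a free function and an error message that are assumptions rather than the project's own code:

```python
from pathlib import Path


def assert_inside_home(home: Path, path: Path) -> Path:
    """Resolve `path` and reject it if it falls outside `home`."""
    resolved = path.resolve()
    try:
        # relative_to raises ValueError when `resolved` is not under the home root.
        resolved.relative_to(home.resolve())
    except ValueError as exc:
        raise ValueError(f"path is outside the artifact home: {path}") from exc
    return resolved
```

Resolving both sides first means `..` segments and symlinks are normalized before the comparison, so a reference such as `home/../secrets.txt` is rejected rather than silently read.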

write_text

function nipux_cli/artifacts.py:54

No docstring.

Calls: StoredArtifact, new_id, path.write_text, replace, safe_filename, self.db.add_artifact, self.job_dir, sha256_text, utc_now

read_text

function nipux_cli/artifacts.py:88

No docstring.

Calls: Path, path.exists, safe_path.read_text, self._assert_inside_home, self.db.get_artifact

search_text

function nipux_cli/artifacts.py:95

No docstring.

Calls: artifact.get, content.lower, find, join, len, lower, max, query.lower, results.append, self.db.list_artifacts, self.read_text, str, strip
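
The calls recorded for `search_text` (`content.lower`, `query.lower`, `find`, `max`, `len`) are consistent with a case-insensitive substring search that returns a snippet around the first hit. The sketch below shows that idea in isolation; the function name `snippet_around`, the `radius` parameter, and the whitespace normalization are assumptions.

```python
from typing import Optional


def snippet_around(content: str, query: str, radius: int = 40) -> Optional[str]:
    """Return text surrounding the first case-insensitive match, or None."""
    index = content.lower().find(query.lower())
    if index < 0:
        return None
    start = max(0, index - radius)
    end = index + len(query) + radius
    # Collapse runs of whitespace so the snippet fits on one line.
    return " ".join(content[start:end].split())
```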

_find_agent_browser

function nipux_cli/browser.py:18

No docstring.

Calls: FileNotFoundError, shutil.which

_session_name

function nipux_cli/browser.py:27

No docstring.

Calls: ch.isalnum, hashlib.sha1, hexdigest, join, len, task_id.encode

_profile_dir

function nipux_cli/browser.py:35

No docstring.

Calls: _session_name

_socket_dir

function nipux_cli/browser.py:39

No docstring.

Calls: Path, _session_name, os.environ.get

run_browser_command

function nipux_cli/browser.py:44

No docstring.

Calls: Path, _find_agent_browser, _profile_dir, _session_name, _socket_dir, isinstance, json.loads, proc.kill, proc.wait, profile_dir.mkdir, result.setdefault, socket_dir.mkdir, stderr_path.open, stderr_path.read_text, stdout_path.open, stdout_path.read_text

navigate

function nipux_cli/browser.py:110

No docstring.

Calls: _annotate_source_quality, get, result.get, run_browser_command, snapshot.get

snapshot

function nipux_cli/browser.py:121

No docstring.

Calls: _annotate_source_quality, run_browser_command

click

function nipux_cli/browser.py:125

No docstring.

Calls: _with_recovery_snapshot, ref.startswith, run_browser_command

fill

function nipux_cli/browser.py:130

No docstring.

Calls: _with_recovery_snapshot, ref.startswith, run_browser_command

scroll

function nipux_cli/browser.py:135

No docstring.

Calls: run_browser_command

back

function nipux_cli/browser.py:139

No docstring.

Calls: run_browser_command

press

function nipux_cli/browser.py:143

No docstring.

Calls: run_browser_command

console

function nipux_cli/browser.py:147

No docstring.

Calls: bool, console_result.get, errors_result.get, run_browser_command

_annotate_source_quality

function nipux_cli/browser.py:160

No docstring.

Calls: anti_bot_reason, data.get, isinstance, result.get, str, warnings.append

_with_recovery_snapshot

function nipux_cli/browser.py:180

No docstring.

Calls: _annotate_source_quality, error.lower, result.get, run_browser_command, str

ChatCommandDeps

class nipux_cli/chat_commands.py:14

No docstring.

Calls: dataclass

handle_chat_slash_command

function nipux_cli/chat_commands.py:53

No docstring.

Calls: _one_line, _optional_int, argparse.Namespace, bool, db.append_lesson, db.close, db.get_job, deps.activity, deps.artifact, deps.artifacts, deps.cancel, deps.create_job, deps.db_factory, deps.delete, deps.digest, deps.doctor

_optional_int

function nipux_cli/chat_commands.py:285

No docstring.

Calls: int, isdigit

build_chat_messages

function nipux_cli/chat_context.py:22

Build bounded visible-state context for conversational job control.

Calls: _clip_chat_context, _durable_outcome_events, _durable_outcome_lines, _empty_section_text, _experiment_line, _job_list_lines, _roadmap_lines, active_prompt_operator_entries, artifact.get, clean_step_summary, db.list_artifacts, db.list_jobs, db.list_steps, db.list_timeline_event...

_durable_outcome_events

function nipux_cli/chat_context.py:127

No docstring.

Calls: db.list_events, event.get, is_summary_event_candidate, len, merged.values, sorted, str

_durable_outcome_lines

function nipux_cli/chat_context.py:142

No docstring.

Calls: hourly_outcome_summary, join, label.lower, len, lines.append, model_update_event_parts, outcome_counts, reversed, seen.add, set

_job_list_lines

function nipux_cli/chat_context.py:165

No docstring.

Calls: entry.get, enumerate, join, len, lines.append, rstrip, split, str

_roadmap_lines

function nipux_cli/chat_context.py:180

No docstring.

Calls: entry.get, isinstance, join, roadmap.get, strip

_experiment_line

function nipux_cli/chat_context.py:199

No docstring.

Calls: entry.get, format_metric_value

_empty_section_text

function nipux_cli/chat_context.py:214

No docstring.

Calls: title.startswith

_clip_chat_context

function nipux_cli/chat_context.py:218

No docstring.

Calls: len, max, rstrip, str

ChatControllerDeps

class nipux_cli/chat_controller.py:17

No docstring.

Calls: dataclass

handle_chat_message

function nipux_cli/chat_controller.py:28

No docstring.

Calls: chat_reply_text_and_metadata, db.append_agent_update, db.append_event, db.close, deps.db_factory, deps.friendly_error_text, handle_chat_control_intent, maybe_spawn_job_from_chat, print, queue_chat_note, reply_callable, reply_text.strip, type

chat_reply_text_and_metadata

function nipux_cli/chat_controller.py:79

No docstring.

Calls: getattr, isinstance, str

handle_chat_control_intent

function nipux_cli/chat_controller.py:96

No docstring.

Calls: chat_control_command, command.lstrip, deps.capture_command, deps.compact_command_output, join, print

maybe_spawn_job_from_chat

function nipux_cli/chat_controller.py:114

No docstring.

Calls: db.append_agent_update, db.append_operator_message, db.close, deps.create_job, deps.db_factory, deps.start_daemon, deps.write_shell_state, extract_job_objective_from_message, message_requests_immediate_run, message_requests_queued_job, print

queue_chat_note

function nipux_cli/chat_controller.py:156

No docstring.

Calls: db.append_operator_message, db.close, deps.db_factory, entry.get, print

ChatFrameDeps

class nipux_cli/chat_frame_runtime.py:40

No docstring.

Calls: dataclass

compact_command_output

function nipux_cli/chat_frame_runtime.py:50

No docstring.

Calls: _one_line, compacted.append, join, line.split, line.startswith, line.strip, output.splitlines

frame_next_job_id

function nipux_cli/chat_frame_runtime.py:60

No docstring.

Calls: ids.index, isinstance, job.get, len, snapshot.get, str

next_chat_right_view

function nipux_cli/chat_frame_runtime.py:74

No docstring.

Calls: keys.index, len

frame_refresh_interval

function nipux_cli/chat_frame_runtime.py:83

No docstring.

Calls: none

run_chat_frame

function nipux_cli/chat_frame_runtime.py:89

No docstring.

Calls: _append_notice, _drain_async_notices, _frame_enter_sequence, _frame_exit_sequence, _handle_chat_escape, _handle_chat_submit, _handle_edit_input, _has_active_state_notice, _one_line, _safe_render_frame, autocomplete_slash, char.isprintable, deps.load_snapshot, deps.write_shell_...

emit_frame_if_changed

function nipux_cli/chat_frame_runtime.py:244

No docstring.

Calls: _diff_frame_update, print

_safe_render_frame

function nipux_cli/chat_frame_runtime.py:253

No docstring.

Calls: _append_notice, _display_notices, _fallback_chat_frame, _one_line, deps.render_frame, print, type

_fallback_chat_frame

function nipux_cli/chat_frame_runtime.py:283

No docstring.

Calls: _fit_plain, _one_line, isinstance, job.get, join, lines.extend, max, shutil.get_terminal_size, snapshot.get, str

_diff_frame_update

function nipux_cli/chat_frame_runtime.py:301

No docstring.

Calls: frame.splitlines, join, len, max, output.append, previous_frame.splitlines, range
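
`_diff_frame_update` appears to compare the previous and current frame line by line. One way such a diff could be emitted is sketched below using standard ANSI cursor-position (`CSI row;col H`) and erase-line (`CSI 2 K`) sequences; the exact escape sequences and return shape used by the project are not shown here and are assumptions.

```python
def diff_frame_update(previous_frame: str, frame: str) -> str:
    """Build terminal output that rewrites only the rows that changed."""
    old_lines = previous_frame.splitlines()
    new_lines = frame.splitlines()
    output = []
    for row in range(max(len(old_lines), len(new_lines))):
        before = old_lines[row] if row < len(old_lines) else None
        after = new_lines[row] if row < len(new_lines) else ""
        if before != after:
            # Move to column 1 of the row (1-based), clear it, then write the new text.
            output.append(f"\x1b[{row + 1};1H\x1b[2K{after}")
    return "".join(output)
```

Rewriting only changed rows keeps a periodically refreshed full-screen view from flickering, since unchanged lines are never cleared.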

_fit_plain

function nipux_cli/chat_frame_runtime.py:315

No docstring.

Calls: _one_line, _strip_ansi, len, max, str

_append_notice

function nipux_cli/chat_frame_runtime.py:322

No docstring.

Calls: notices.append

_append_thinking_notice

function nipux_cli/chat_frame_runtime.py:327

No docstring.

Calls: _append_notice, _has_thinking_notice

_append_waiting_notice

function nipux_cli/chat_frame_runtime.py:332

No docstring.

Calls: _append_notice, _has_waiting_notice

_has_thinking_notice

function nipux_cli/chat_frame_runtime.py:337

No docstring.

Calls: any, notice.startswith

_has_waiting_notice

function nipux_cli/chat_frame_runtime.py:341

No docstring.

Calls: any, notice.startswith

_has_active_state_notice

function nipux_cli/chat_frame_runtime.py:345

No docstring.

Calls: _has_thinking_notice, _has_waiting_notice

_clear_thinking_notices

function nipux_cli/chat_frame_runtime.py:349

No docstring.

Calls: notice.startswith

_display_notices

function nipux_cli/chat_frame_runtime.py:357

No docstring.

Calls: int, len, rendered.append, time.monotonic

_handle_edit_input

function nipux_cli/chat_frame_runtime.py:374

No docstring.

Calls: _append_notice, char.isprintable, decode_terminal_escape, inline_setting_notice, read_escape_sequence

_handle_chat_submit

function nipux_cli/chat_frame_runtime.py:405

No docstring.

Calls: _append_notice, _append_thinking_notice, _append_waiting_notice, _looks_like_waiting_output, _one_line, _post_submit_snapshot_job_id, _start_chat_message_worker, buffer.strip, compact_command_output, deps.capture_chat_command, deps.is_plain_chat_line, deps.load_snapshot, join,...

_post_submit_snapshot_job_id

function nipux_cli/chat_frame_runtime.py:468

Return the job id to refresh after a submitted command or message.

Calls: line.strip, lower, split, text.startswith

_start_chat_message_worker

function nipux_cli/chat_frame_runtime.py:480

No docstring.

Calls: _one_line, async_messages.put, deps.handle_chat_message, thread.start, threading.Thread, type

run

function nipux_cli/chat_frame_runtime.py:487

No docstring.

Calls: _one_line, async_messages.put, deps.handle_chat_message, type

_drain_async_notices

function nipux_cli/chat_frame_runtime.py:500

No docstring.

Calls: _append_notice, _append_waiting_notice, _clear_thinking_notices, _looks_like_waiting_output, async_messages.get_nowait
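
Taken together, `_start_chat_message_worker`, its nested `run`, and `_drain_async_notices` look like a background-thread-plus-queue pattern: the slow chat call runs off the UI loop and its result is picked up on a later frame. A minimal sketch of that pattern, assuming a `queue.Queue` and hypothetical `(kind, text)` tuples:

```python
import queue
import threading


def start_message_worker(handle, message, async_messages):
    """Run `handle(message)` on a daemon thread and queue its outcome."""
    def run():
        try:
            async_messages.put(("reply", handle(message)))
        except Exception as exc:  # report failures instead of killing the UI loop
            async_messages.put(("error", f"{type(exc).__name__}: {exc}"))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def drain_async_notices(async_messages, notices):
    """Move every queued outcome into `notices` without blocking."""
    drained = 0
    while True:
        try:
            kind, text = async_messages.get_nowait()
        except queue.Empty:
            return drained
        notices.append(f"{kind}: {text}")
        drained += 1
```

Because the drain step uses `get_nowait`, the frame loop can call it on every refresh and keep responding to keystrokes while a reply is still pending.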

_looks_like_waiting_output

function nipux_cli/chat_frame_runtime.py:520

No docstring.

Calls: join, lower, normalized.startswith, split, str

_handle_chat_escape

function nipux_cli/chat_frame_runtime.py:533

No docstring.

Calls: _append_notice, buffer.startswith, cycle_slash, decode_terminal_escape, deps.load_snapshot, deps.page_click, deps.write_shell_state, drain_pending_input, frame_next_job_id, get, isinstance, next_chat_right_view, read_escape_sequence

natural_command_for

function nipux_cli/chat_intent.py:106

No docstring.

Calls: NATURAL_COMMANDS.get, join, lower, split, text.strip

chat_control_command

function nipux_cli/chat_intent.py:110

No docstring.

Calls: NATURAL_COMMANDS.get, _looks_like_control_phrase, _mentions_any, join, line.strip, rstrip, split, text.lower

_mentions_any

function nipux_cli/chat_intent.py:242

No docstring.

Calls: re.escape, re.search
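
`_mentions_any` calls only `re.escape` and `re.search`, which is what a whole-word phrase matcher would need. The following sketch shows one such matcher; the word-boundary anchors and the signature are assumptions about intent, not the project's code.

```python
import re


def mentions_any(text: str, phrases) -> bool:
    """True if any phrase occurs in `text` as a whole word or phrase."""
    for phrase in phrases:
        # re.escape keeps characters like '.' literal; \b avoids matching inside words.
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return True
    return False
```

Escaping each phrase matters when control words contain regex metacharacters, and the boundaries keep `stop` from matching inside `unstoppable`.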

_looks_like_control_phrase

function nipux_cli/chat_intent.py:253

No docstring.

Calls: text.startswith

message_requests_immediate_run

function nipux_cli/chat_intent.py:272

No docstring.

Calls: bool, join, lower, message.strip, message_requests_queued_job, re.match, re.search, split

message_requests_queued_job

function nipux_cli/chat_intent.py:281

No docstring.

Calls: bool, join, lower, message.strip, re.search, split

extract_job_objective_from_message

function nipux_cli/chat_intent.py:291

No docstring.

Calls: join, looks_like_job_objective, looks_like_smalltalk, match.group, message.strip, re.match, split, strip, text.lower

looks_like_smalltalk

function nipux_cli/chat_intent.py:314

No docstring.

Calls: lowered.endswith

looks_like_job_objective

function nipux_cli/chat_intent.py:318

No docstring.

Calls: any, len, re.escape, re.search, text.lower, text.split

build_chat_frame

function nipux_cli/chat_tui.py:23

No docstring.

Calls: _compose_bar, _daemon_state_line, _metadata_records, _overlay_settings_modal, _step_count, _step_line, _top_bar, _two_col_line, _two_col_title, bool, chat_pane_lines, chat_updates_pane_lines, counts.get, edit_target_hint, edit_target_label, edit_target_masks_input

_two_col_title

function nipux_cli/chat_tui.py:170

No docstring.

Calls: _bold, _fit_ansi, _muted, left.upper, right.upper

_two_col_line

function nipux_cli/chat_tui.py:174

No docstring.

Calls: _fit_ansi, _muted

_overlay_settings_modal

function nipux_cli/chat_tui.py:178

No docstring.

Calls: _accent, _bold, _fit_ansi, _muted, _rate_text, _settings_row, _strip_ansi, box.append, enumerate, int, len, load_config, max, min, str

_settings_row

function nipux_cli/chat_tui.py:242

No docstring.

Calls: _bold, _muted, _one_line, label.ljust

_rate_text

function nipux_cli/chat_tui.py:247

No docstring.

Calls: none

_metadata_records

function nipux_cli/chat_tui.py:251

No docstring.

Calls: isinstance, job.get, metadata.get

_step_count

function nipux_cli/chat_tui.py:259

No docstring.

Calls: int, max, step.get

_step_line

function nipux_cli/chat_tui.py:264

No docstring.

Calls: _one_line, clean_step_summary, step.get

_daemon_state_line

function nipux_cli/chat_tui.py:271

No docstring.

Calls: isinstance, lock.get, metadata.get

_launch_agent_plist

function nipux_cli/cli.py:187

No docstring.

Calls: _service_launch_agent_plist

_systemd_service_text

function nipux_cli/cli.py:191

No docstring.

Calls: _service_systemd_service_text

_db

function nipux_cli/cli.py:256

No docstring.

Calls: AgentDB, config.ensure_dirs, load_config

_record_command_deps

function nipux_cli/cli.py:262

No docstring.

Calls: RecordCommandDeps

cmd_init

function nipux_cli/cli.py:270

No docstring.

Calls: Path, config.ensure_dirs, default_config_yaml, env_path.exists, expanduser, getattr, load_config, path.exists, path.parent.mkdir, print, write_private_text

cmd_update

function nipux_cli/cli.py:304

No docstring.

Calls: SystemExit, _one_line, argparse.Namespace, cmd_restart, config.ensure_dirs, daemon_after.get, daemon_before.get, daemon_lock_status, getattr, load_config, print, str, update_checkout

cmd_uninstall

function nipux_cli/cli.py:336

No docstring.

Calls: _stop_daemon_process, bool, build_uninstall_plan, float, getattr, input, load_config, lower, path.expanduser, print, strip, uninstall_installed_tool, uninstall_runtime

cmd_create

function nipux_cli/cli.py:376

No docstring.

Calls: SystemExit, _create_job, _ensure_model_setup_verified_for_workspace, print

_ensure_model_setup_verified_for_workspace

function nipux_cli/cli.py:388

No docstring.

Calls: _auto_verify_model_setup, _model_setup_verified, _workspace_has_model_config, load_config, print

_workspace_has_model_config

function nipux_cli/cli.py:400

No docstring.

Calls: _read_shell_state, bool, exists, get

_auto_verify_model_setup

function nipux_cli/cli.py:404

No docstring.

Calls: _clear_model_setup_verified, _mark_model_setup_verified, all, run_doctor

_create_job

function nipux_cli/cli.py:414

No docstring.

Calls: _db, _write_shell_state, db.append_agent_update, db.append_roadmap_record, db.append_task_record, db.close, db.create_job, db.update_job_status, enumerate, format_initial_plan, initial_plan_for_objective, initial_roadmap_for_objective, initial_task_contract, max, objective.str...

cmd_jobs

function nipux_cli/cli.py:458

No docstring.

Calls: _configured_focus_job_id, _db, _print_jobs_panel, bool, daemon_lock_status, db.close, db.list_jobs, load_config, print, str

cmd_focus

function nipux_cli/cli.py:472

No docstring.

Calls: _db, _default_job_id, _find_job, _job_display_state, _write_shell_state, bool, daemon_lock_status, db.close, db.get_job, join, load_config, print

cmd_rename

function nipux_cli/cli.py:495

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, _write_shell_state, content.endswith, content.splitlines, db.close, db.get_job, db.rename_job, join, print, program.exists, program.read_text, program.write_text, startswith

cmd_delete

function nipux_cli/cli.py:521

No docstring.

Calls: Path, _db, _job_ref_text, _read_shell_state, _resolve_job_id, _write_shell_state, counts.get, db.close, db.delete_job, job_dir.exists, path.exists, path.unlink, print, result.get, shutil.rmtree, state.get

cmd_chat

function nipux_cli/cli.py:561

No docstring.

Calls: _db, _ensure_model_setup_verified_for_workspace, _enter_chat, _job_ref_text, _resolve_job_id, _write_shell_state, db.close, print

cmd_home

function nipux_cli/cli.py:578

No docstring.

Calls: _auto_verify_model_setup, _enter_first_run_setup, _enter_workspace_chat, _has_saved_jobs, _install_readline_history, _model_setup_verified, _start_interactive_daemon_if_possible, _workspace_has_model_config, load_config

_enter_first_run_setup

function nipux_cli/cli.py:592

No docstring.

Calls: _enter_first_run_frame, _frame_chat_enabled, print

_enter_empty_workspace

function nipux_cli/cli.py:601

No docstring.

Calls: _db, _one_line, _read_shell_state, _rule, db.close, db.list_jobs, get, job.get, print, str

_has_saved_jobs

function nipux_cli/cli.py:624

No docstring.

Calls: _db, bool, db.close, db.list_jobs

_enter_workspace_chat

function nipux_cli/cli.py:632

No docstring.

Calls: _enter_chat_frame, _enter_empty_workspace, _frame_chat_enabled

_print_first_run_menu

function nipux_cli/cli.py:639

No docstring.

Calls: _short_path, load_config, print

_handle_first_run_menu_line

function nipux_cli/cli.py:654

No docstring.

Calls: _extract_job_objective_from_message, _first_run_chat_reply, _first_token, _print_first_run_menu, _run_shell_line, argparse.Namespace, cmd_doctor, cmd_init, cmd_jobs, line.lower, line.startswith, line.strip, lowered.startswith, print, strip

_prompt_first_run_value

function nipux_cli/cli.py:697

No docstring.

Calls: input, print, strip

_first_run_create_and_open

function nipux_cli/cli.py:705

No docstring.

Calls: _create_job, _ensure_model_setup_verified_for_workspace, _enter_workspace_chat, _start_interactive_daemon_if_possible, _write_shell_state, print

_first_token

function nipux_cli/cli.py:716

No docstring.

Calls: _controller_first_token

_enter_first_run_frame

function nipux_cli/cli.py:720

No docstring.

Calls: _enter_workspace_chat, _first_run_runtime_deps, _run_first_run_frame, _start_interactive_daemon_if_possible, _write_shell_state

_first_run_runtime_deps

function nipux_cli/cli.py:730

No docstring.

Calls: FirstRunRuntimeDeps, _first_run_click_action, _render_first_run_frame

_first_run_actions

function nipux_cli/cli.py:747

No docstring.

Calls: _first_run_tui_actions

_clamp_first_run_selection

function nipux_cli/cli.py:751

No docstring.

Calls: _clamp_first_run_runtime_selection, _first_run_actions

_handle_first_run_action

function nipux_cli/cli.py:755

No docstring.

Calls: _controller_handle_first_run_action, _first_run_frame_deps

_first_run_click_action

function nipux_cli/cli.py:759

No docstring.

Calls: _first_run_actions, len, max, min, shutil.get_terminal_size

_chat_page_click

function nipux_cli/cli.py:780

No docstring.

Calls: int, max, min, shutil.get_terminal_size

_handle_first_run_frame_line

function nipux_cli/cli.py:797

No docstring.

Calls: _controller_handle_first_run_frame_line, _first_run_frame_deps

_first_run_chat_reply

function nipux_cli/cli.py:801

No docstring.

Calls: _controller_first_run_chat_reply

_create_first_run_job

function nipux_cli/cli.py:805

No docstring.

Calls: _controller_create_first_run_job, _first_run_frame_deps

_capture_first_run_command

function nipux_cli/cli.py:809

No docstring.

Calls: _controller_capture_first_run_command

_first_run_frame_deps

function nipux_cli/cli.py:813

No docstring.

Calls: FirstRunFrameDeps, _model_setup_verified, load_config

_current_default_job_id

function nipux_cli/cli.py:826

No docstring.

Calls: _db, _default_job_id, db.close

_render_first_run_frame

function nipux_cli/cli.py:834

No docstring.

Calls: _build_first_run_frame, _emit_frame_if_changed, shutil.get_terminal_size

_build_first_run_frame

function nipux_cli/cli.py:856

No docstring.

Calls: _build_first_run_tui_frame, _daemon_state_line, _db, _first_run_columns, _short_path, daemon_lock_status, db.close, db.list_jobs, load_config, max

_enter_chat

function nipux_cli/cli.py:893

No docstring.

Calls: _chat_handle_line, _chat_prompt, _db, _default_job_id, _ensure_model_setup_verified_for_workspace, _enter_chat_frame, _fancy_ui, _frame_chat_enabled, _install_readline_history, _one_line, _print_chat_composer, _print_startup_history, _rule, _shell_summary, _start_chat_live_fee...

_frame_chat_enabled

function nipux_cli/cli.py:948

No docstring.

Calls: os.environ.get, sys.stdin.isatty, sys.stdout.isatty
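
`_frame_chat_enabled` reads an environment variable and checks both standard streams for a TTY, which is the usual gate for a full-screen terminal UI. In the sketch below the variable name `EXAMPLE_PLAIN_CHAT` is hypothetical (the real name is not shown in this index), and the injectable parameters exist only to make the example testable.

```python
import os
import sys


def frame_chat_enabled(environ=None, stdin=None, stdout=None) -> bool:
    """Enable the framed UI only for interactive terminals with no opt-out set."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    if environ.get("EXAMPLE_PLAIN_CHAT"):  # hypothetical opt-out variable
        return False
    return bool(stdin.isatty() and stdout.isatty())
```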

_enter_chat_frame

function nipux_cli/cli.py:957

No docstring.

Calls: _chat_frame_deps, _run_chat_frame

_chat_frame_deps

function nipux_cli/cli.py:961

No docstring.

Calls: ChatFrameDeps, _chat_page_click, _handle_chat_message, _load_frame_snapshot, _render_chat_frame

_capture_chat_command

function nipux_cli/cli.py:982

No docstring.

Calls: StringIO, _chat_handle_line, _run_workspace_command_line, chat_control_command, line.strip, lstrip, raw.startswith, redirect_stdout, stream.getvalue, strip
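
The combination of `StringIO`, `redirect_stdout`, and `stream.getvalue` indicates that command output is captured as text instead of being printed over the frame. A minimal sketch of that capture step, with a hypothetical `handler` callable standing in for the project's command dispatch:

```python
from contextlib import redirect_stdout
from io import StringIO


def capture_command(handler, line: str) -> str:
    """Run `handler(line)` and return whatever it printed to stdout."""
    stream = StringIO()
    with redirect_stdout(stream):
        handler(line)
    return stream.getvalue().strip()
```

Capturing output this way lets existing print-based commands be reused inside a full-screen UI, where the captured text can then be shown as a notice.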

_run_workspace_command_line

function nipux_cli/cli.py:994

No docstring.

Calls: _append_workspace_chat_event, _create_workspace_job_from_chat, _extract_job_objective_from_message, _print_workspace_chat_help, _run_shell_line, _run_workspace_run_command, _run_workspace_setting_command, _workspace_command_should_create_worker, join, len, print, shlex.split, ...

_run_workspace_setting_command

function nipux_cli/cli.py:1040

No docstring.

Calls: _handle_chat_setting_command

_workspace_command_should_create_worker

function nipux_cli/cli.py:1048

No docstring.

Calls: _db, _extract_job_objective_from_message, _find_job, db.close

_run_workspace_run_command

function nipux_cli/cli.py:1059

No docstring.

Calls: build_parser, parse_args, parsed.func, print

_print_workspace_chat_help

function nipux_cli/cli.py:1072

No docstring.

Calls: print

_start_worker_from_chat_context

function nipux_cli/cli.py:1078

Start the daemon from the TUI without dumping preflight internals into chat.

Calls: StringIO, _one_line, _start_daemon_if_needed, output.lower, print, redirect_stdout, report, str, stream.getvalue, type

report

function nipux_cli/cli.py:1087

No docstring.

Calls: print

_start_worker_from_chat_namespace

function nipux_cli/cli.py:1118

No docstring.

Calls: _start_worker_from_chat_context, bool, float, getattr

_is_plain_chat_line

function nipux_cli/cli.py:1127

No docstring.

Calls: chat_control_command, line.strip, lower, lowered.split, shlex.split, stripped.lower, stripped.startswith

_load_frame_snapshot

function nipux_cli/cli.py:1143

No docstring.

Calls: _db, _default_job_id, _workspace_chat_events, db.close, load_frame_snapshot

_render_chat_frame

function nipux_cli/cli.py:1158

No docstring.

Calls: _build_chat_frame, _emit_frame_if_changed, shutil.get_terminal_size

_build_chat_frame

function nipux_cli/cli.py:1184

No docstring.

Calls: _build_chat_tui_frame

_resolve_job_id

function nipux_cli/cli.py:1209

No docstring.

Calls: _default_job_id, _find_job, _job_ref_text, str

_activate_job_if_planning

function nipux_cli/cli.py:1217

No docstring.

Calls: db.append_agent_update, db.get_job, db.update_job_status, job.get

_ensure_job_runnable

function nipux_cli/cli.py:1226

No docstring.

Calls: _activate_job_if_planning, db.append_agent_update, db.get_job, db.update_job_status, job.get, job_provider_blocked, operator_resume_metadata, str

cmd_steer

function nipux_cli/cli.py:1247

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, db.append_operator_message, db.close, db.get_job, join, print, strip

cmd_pause

function nipux_cli/cli.py:1267

No docstring.

Calls: _db, _resolve_control_job_and_note, db.close, db.get_job, db.update_job_status, print

cmd_resume

function nipux_cli/cli.py:1282

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, db.close, db.get_job, db.update_job_status, operator_resume_metadata, print

cmd_cancel

function nipux_cli/cli.py:1297

No docstring.

Calls: _db, _resolve_control_job_and_note, db.close, db.get_job, db.update_job_status, print

cmd_status

function nipux_cli/cli.py:1312

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, _terminal_width, collect_dashboard_state, db.close, json.dumps, print, render_dashboard, render_overview

cmd_health

function nipux_cli/cli.py:1331

No docstring.

Calls: _daemon_event_line, _daemon_state_line, _db, _default_job_id, _job_display_state, _launch_agent_path, _one_line, _rule, _step_count, _step_line, _worker_label, bool, config.ensure_dirs, daemon_lock_status, db.close, db.get_job

cmd_history

function nipux_cli/cli.py:1388

No docstring.

Calls: _db, _event_line, _job_ref_text, _print_event_card, _public_event, _resolve_job_id, _rule, db.close, db.get_job, db.list_timeline_events, json.dumps, max, print

cmd_events

function nipux_cli/cli.py:1419

No docstring.

Calls: _db, _event_line, _job_ref_text, _print_event_card, _public_event, _resolve_job_id, _rule, db.close, db.get_job, db.list_timeline_events, emit, event.get, json.dumps, print, seen.add, set

emit

function nipux_cli/cli.py:1433

No docstring.

Calls: _event_line, _print_event_card, _public_event, _rule, db.list_timeline_events, event.get, json.dumps, print, seen.add, str

cmd_dashboard

function nipux_cli/cli.py:1462

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, _terminal_width, collect_dashboard_state, db.close, print, render_dashboard, time.sleep

cmd_artifacts

function nipux_cli/cli.py:1483

No docstring.

Calls: _db, _generic_display_text, _job_ref_text, _one_line, _resolve_job_id, _rule, artifact.get, db.close, db.get_job, db.list_artifacts, enumerate, print

cmd_artifact

function nipux_cli/cli.py:1513

No docstring.

Calls: ArtifactStore, _db, _job_ref_text, _resolve_artifact_ref, _resolve_job_id, _rule, content.endswith, db.close, getattr, len, print, resolved.get, store.read_text

cmd_lessons

function nipux_cli/cli.py:1535

No docstring.

Calls: _db, _job_ref_text, _print_lessons, _resolve_job_id, db.close, db.get_job, print

cmd_learn

function nipux_cli/cli.py:1549

No docstring.

Calls: _db, _job_ref_text, _one_line, _resolve_job_id, db.append_lesson, db.close, db.get_job, join, print, strip

cmd_findings

function nipux_cli/cli.py:1570

No docstring.

Calls: _record_command_deps, cmd_findings_impl

cmd_tasks

function nipux_cli/cli.py:1574

No docstring.

Calls: _record_command_deps, cmd_tasks_impl

cmd_roadmap

function nipux_cli/cli.py:1578

No docstring.

Calls: _record_command_deps, cmd_roadmap_impl

cmd_experiments

function nipux_cli/cli.py:1582

No docstring.

Calls: _record_command_deps, cmd_experiments_impl

cmd_sources

function nipux_cli/cli.py:1586

No docstring.

Calls: _record_command_deps, cmd_sources_impl

cmd_memory

function nipux_cli/cli.py:1590

No docstring.

Calls: _record_command_deps, cmd_memory_impl

cmd_metrics

function nipux_cli/cli.py:1594

No docstring.

Calls: _record_command_deps, cmd_metrics_impl

cmd_usage

function nipux_cli/cli.py:1598

No docstring.

Calls: _record_command_deps, cmd_usage_impl

_remote_model_preflight_failures

function nipux_cli/cli.py:1602

No docstring.

Calls: _daemon_remote_model_preflight_failures

_recoverable_remote_model_preflight_failures

function nipux_cli/cli.py:1606

No docstring.

Calls: _daemon_recoverable_remote_model_preflight_failures

_provider_preflight_is_recoverable

function nipux_cli/cli.py:1610

No docstring.

Calls: _daemon_provider_preflight_is_recoverable

_ensure_remote_model_ready_for_worker

function nipux_cli/cli.py:1614

No docstring.

Calls: _daemon_ensure_remote_model_ready

cmd_start

function nipux_cli/cli.py:1618

No docstring.

Calls: _cmd_start_impl, _ensure_remote_model_ready_for_worker, _stop_daemon_process

_start_daemon_if_needed

function nipux_cli/cli.py:1626

No docstring.

Calls: _start_daemon_if_needed_impl, _stop_daemon_process

_start_interactive_daemon_if_possible

function nipux_cli/cli.py:1639

Best-effort daemon start for the full-screen UI without printing over the frame.

Calls: StringIO, _start_daemon_if_needed, redirect_stdout, stream.getvalue

cmd_restart

function nipux_cli/cli.py:1651

No docstring.

Calls: _cmd_restart_impl, _stop_daemon_process

_stop_daemon_process

function nipux_cli/cli.py:1659

No docstring.

Calls: _stop_daemon_process_impl

cmd_stop

function nipux_cli/cli.py:1663

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, _stop_daemon_process, db.close, db.get_job, db.update_job_status, getattr, load_config, print

cmd_browser_dashboard

function nipux_cli/cli.py:1684

No docstring.

Calls: Path, Path.cwd, SystemExit, _find_agent_browser, config.ensure_dirs, expanduser, load_config, log_path.open, log_path.parent.mkdir, print, str, subprocess.Popen, subprocess.call, subprocess.run

_print_startup_history

function nipux_cli/cli.py:1715

No docstring.

Calls: _db, _important_startup_events, _print_event_card, _print_session_overview, _section_title, bool, daemon_lock_status, db.close, db.get_job, db.list_artifacts, db.list_jobs, db.list_memory, db.list_steps, db.list_timeline_events, enumerate, len

_print_session_overview

function nipux_cli/cli.py:1752

No docstring.

Calls: _job_display_state, _metadata_records, _next_operator_action, _print_jobs_panel, _print_metric_grid, _print_wrapped, _section_title, _short_path, _status_badge, _step_count, _terminal_width, _worker_label, isinstance, job.get, len, metadata.get

_print_chat_composer

function nipux_cli/cli.py:1816

No docstring.

Calls: _accent, _fancy_ui, _section_title, _terminal_width, max, min, print

_chat_prompt

function nipux_cli/cli.py:1827

No docstring.

Calls: _accent

_start_chat_live_feed

function nipux_cli/cli.py:1831

No docstring.

Calls: os.environ.get, sys.stdin.isatty, sys.stdout.isatty, thread.start, threading.Event, threading.Thread

_chat_live_feed_loop

function nipux_cli/cli.py:1845

No docstring.

Calls: _db, _default_job_id, _minimal_live_event_line, _print_live_line, db.close, db.list_events, event.get, initialized_jobs.add, seen.add, seen.update, seen_by_job.setdefault, set, stop.wait, str

_print_live_line

function nipux_cli/cli.py:1875

No docstring.

Calls: _chat_prompt, _fancy_ui, _live_badge, print

_resolve_control_job_and_note

function nipux_cli/cli.py:1885

No docstring.

Calls: _default_job_id, _find_job, _job_ref_text, _note_text, _resolve_job_id, getattr, hasattr, join, len, range, str, strip

_pid_is_alive

function nipux_cli/cli.py:1900

No docstring.

Calls: os.kill
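
`_pid_is_alive` calls only `os.kill`, which on POSIX systems is commonly used with signal `0` to probe a process without affecting it. The sketch below assumes that technique and is POSIX-only: on Windows, `os.kill` with an arbitrary signal number terminates the target, so this probe must not be used there. The exception handling is an assumption about how the result is interpreted.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Probe a process with signal 0 (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```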

_step_by_id

function nipux_cli/cli.py:1908

No docstring.

Calls: db.list_steps

_step_count

function nipux_cli/cli.py:1915

No docstring.

Calls: int, max, step.get

_job_lessons

function nipux_cli/cli.py:1920

No docstring.

Calls: _metadata_records

_metadata_records

function nipux_cli/cli.py:1924

No docstring.

Calls: isinstance, job.get, metadata.get

_print_lessons

function nipux_cli/cli.py:1930

No docstring.

Calls: _job_lessons, _one_line, _rule, enumerate, isinstance, len, lesson.get, max, print

_resolve_artifact_ref

function nipux_cli/cli.py:1946

No docstring.

Calls: ArtifactStore, Path, artifact.get, artifacts.extend, db.get_artifact, db.get_job, db.list_artifacts, db.list_jobs, expanduser, index_ref.isdigit, int, join, len, lower, ordered_jobs.append, ordered_jobs.extend

cmd_logs

function nipux_cli/cli.py:2008

No docstring.

Calls: _clean_step_summary, _db, _job_display_state, _job_ref_text, _one_line, _print_step, _resolve_job_id, artifact.get, bool, daemon_lock_status, db.close, db.get_job, db.list_artifacts, db.list_runs, db.list_steps, load_config

cmd_activity

function nipux_cli/cli.py:2052

No docstring.

Calls: _db, _event_line, _job_display_state, _print_event_details, _resolve_job_id, _rule, bool, daemon_lock_status, db.close, db.get_job, db.list_timeline_events, emit, event.get, isinstance, load_config, metadata.get

emit

function nipux_cli/cli.py:2066

No docstring.

Calls: _event_line, _print_event_details, _rule, db.list_timeline_events, event.get, isinstance, metadata.get, print, seen_events.add, str

cmd_updates

function nipux_cli/cli.py:2095

No docstring.

Calls: _db, _resolve_job_id, db.close, getattr, join, print, render_all_updates_report, render_updates_report

cmd_watch

function nipux_cli/cli.py:2131

No docstring.

Calls: _db, _job_display_state, _job_ref_text, _print_artifact, _print_run, _print_step, _resolve_job_id, bool, daemon_lock_status, db.close, db.get_job, db.list_artifacts, db.list_runs, db.list_steps, emit_snapshot, list

emit_snapshot

function nipux_cli/cli.py:2150

No docstring.

Calls: _print_artifact, _print_run, _print_step, db.get_job, db.list_artifacts, db.list_runs, db.list_steps, list, print, reversed, seen_artifacts.add, seen_runs.add, seen_steps.add

cmd_run_one

function nipux_cli/cli.py:2194

No docstring.

Calls: LLMResponse, ScriptedLLM, ToolCall, _activate_job_if_planning, _db, _ensure_model_setup_verified_for_workspace, _job_ref_text, _model_setup_verified, _resolve_job_id, db.close, json.dumps, print, run_one_step

cmd_work

function nipux_cli/cli.py:2234

No docstring.

Calls: LLMResponse, ScriptedLLM, ToolCall, _activate_job_if_planning, _db, _ensure_model_setup_verified_for_workspace, _model_setup_verified, _print_step, _resolve_job_id, _step_by_id, _terminal_width, collect_dashboard_state, db.close, db.get_job, json.dumps, print

_pause_job_for_recoverable_provider_preflight

function nipux_cli/cli.py:2295

No docstring.

Calls: _recoverable_remote_model_preflight_failures, db.append_agent_update, db.get_job, db.update_job_metadata, db.update_job_status, get, job.get, job_provider_blocked, join, str, utc_now

cmd_run

function nipux_cli/cli.py:2345

No docstring.

Calls: _db, _default_job_id, _ensure_job_runnable, _find_job, _job_display_state, _job_ref_text, _model_setup_verified, _pause_job_for_recoverable_provider_preflight, _provider_preflight_is_recoverable, _remote_model_preflight_failures, _start_daemon_if_needed, _write_shell_state, ar...

cmd_digest

function nipux_cli/cli.py:2421

No docstring.

Calls: _db, _job_ref_text, _resolve_job_id, db.close, print, render_job_digest

cmd_daily_digest

function nipux_cli/cli.py:2444

No docstring.

Calls: _db, db.close, json.dumps, print, write_daily_digest

cmd_daemon

function nipux_cli/cli.py:2453

No docstring.

Calls: Daemon.open, SystemExit, _ensure_remote_model_ready_for_worker, daemon.close, daemon.run_forever, daemon.run_once, json.dumps, load_config, print, str

cmd_doctor

function nipux_cli/cli.py:2470

No docstring.

Calls: SystemExit, _clear_model_setup_verified, _mark_model_setup_verified, all, load_config, print, run_doctor

_verify_model_setup_from_first_run

function nipux_cli/cli.py:2487

No docstring.

Calls: StringIO, argparse.Namespace, cmd_doctor, item.split, item.strip, join, print, redirect_stdout, splitlines, stream.getvalue

_chat_handle_line

function nipux_cli/cli.py:2501

No docstring.

Calls: _chat_command_deps, _db, _handle_chat_message, _handle_chat_slash_command, argparse.Namespace, cmd_focus, cmd_jobs, db.close, db.get_job, line.startswith, line.strip, print, shlex.split

_handle_chat_message

function nipux_cli/cli.py:2559

No docstring.

Calls: _chat_controller_deps, _controller_handle_chat_message, _handle_workspace_chat_message, _model_setup_verified, load_config, print

_chat_reply_text_and_metadata

function nipux_cli/cli.py:2578

No docstring.

Calls: _controller_reply_text_and_metadata

_workspace_chat_events

function nipux_cli/cli.py:2582

No docstring.

Calls: _read_shell_state, get, isinstance

_append_workspace_chat_event

function nipux_cli/cli.py:2589

No docstring.

Calls: _workspace_chat_events, _write_shell_state, events.append, int, len, time.time, utc_now

_handle_workspace_chat_message

function nipux_cli/cli.py:2605

No docstring.

Calls: _append_workspace_chat_event, _capture_chat_command, _chat_reply_text_and_metadata, _compact_command_output, _create_workspace_job_from_chat, _extract_job_objective_from_message, _friendly_error_text, _reply_to_workspace_chat, chat_control_command, control_command.lstrip, join...

_create_workspace_job_from_chat

function nipux_cli/cli.py:2644

No docstring.

Calls: _create_job, _db, _refine_job_objective_for_worker, _start_worker_from_chat_context, _write_shell_state, db.append_agent_update, db.append_operator_message, db.close, message_requests_immediate_run, message_requests_queued_job

_refine_job_objective_for_worker

function nipux_cli/cli.py:2670

No docstring.

Calls: OpenAIChatLLM, _db, _db_handle.close, _durable_job_objective, complete, len, strip

_durable_job_objective

function nipux_cli/cli.py:2703

No docstring.

Calls: _one_line, join, split, str, strip

_workspace_chat_job_dossier

function nipux_cli/cli.py:2718

Compact job context for the left-side workspace chat model.

Calls: _metadata_records, _one_line, _safe_job_counts, _safe_list_artifacts, _safe_list_events, _safe_list_steps, _step_line, _workspace_current_task, _workspace_recent_outcomes, artifact.get, counts.get, enumerate, isinstance, job.get, join, len

_safe_job_counts

function nipux_cli/cli.py:2770

No docstring.

Calls: db.job_record_counts

_safe_list_artifacts

function nipux_cli/cli.py:2779

No docstring.

Calls: db.list_artifacts

_safe_list_events

function nipux_cli/cli.py:2786

No docstring.

Calls: db.list_timeline_events

_safe_list_steps

function nipux_cli/cli.py:2793

No docstring.

Calls: db.list_steps

_workspace_current_task

function nipux_cli/cli.py:2800

No docstring.

Calls: _one_line, get, int, str, task.get, visible.sort

_workspace_recent_outcomes

function nipux_cli/cli.py:2821

No docstring.

Calls: _model_update_event_parts, _one_line, label.lower, len, outcomes.append, reversed, seen.add, set

_reply_to_workspace_chat

function nipux_cli/cli.py:2841

No docstring.

Calls: OpenAIChatLLM, _db, _one_line, _workspace_chat_events, _workspace_chat_job_dossier, chr, complete_response, db.close, db.list_jobs, event.get, join

_handle_chat_control_intent

function nipux_cli/cli.py:2883

No docstring.

Calls: _chat_controller_deps, _controller_handle_chat_control_intent

_maybe_spawn_job_from_chat

function nipux_cli/cli.py:2887

No docstring.

Calls: _chat_controller_deps, _controller_maybe_spawn_job_from_chat

_queue_chat_note

function nipux_cli/cli.py:2891

No docstring.

Calls: _chat_controller_deps, _controller_queue_chat_note

_chat_controller_deps

function nipux_cli/cli.py:2895

No docstring.

Calls: ChatControllerDeps

_chat_command_deps

function nipux_cli/cli.py:2908

No docstring.

Calls: ChatCommandDeps

_reply_to_chat

function nipux_cli/cli.py:2949

No docstring.

Calls: OpenAIChatLLM, _build_chat_messages, _db, complete_response, db.close, db.get_job

cmd_shell

function nipux_cli/cli.py:2961

No docstring.

Calls: _install_readline_history, _print_shell_header, _print_shell_status, _run_shell_line, _shell_prompt, input, print

_print_shell_header

function nipux_cli/cli.py:2980

No docstring.

Calls: _rule, _shell_summary, print

_shell_summary

function nipux_cli/cli.py:2989

No docstring.

Calls: _db, _default_job_id, _job_display_state, _worker_label, bool, daemon_lock_status, db.close, db.get_job

_shell_prompt

function nipux_cli/cli.py:3006

No docstring.

Calls: _db, _default_job_id, _worker_label, bool, daemon_lock_status, db.close, db.get_job, job.get, load_config, str, strip

_install_readline_history

function nipux_cli/cli.py:3023

No docstring.

Calls: atexit.register, config.ensure_dirs, load_config, readline.read_history_file

_print_shell_status

function nipux_cli/cli.py:3039

No docstring.

Calls: _db, _terminal_width, collect_dashboard_state, db.close, print, render_dashboard

_print_shell_help

function nipux_cli/cli.py:3049

No docstring.

Calls: _render_shell_help

_run_shell_line

function nipux_cli/cli.py:3053

No docstring.

Calls: _print_shell_help, _steer_default_job, build_parser, isinstance, join, len, line.strip, lower, natural_command_for, parsed.func, parser.parse_args, print, shlex.split

_steer_default_job

function nipux_cli/cli.py:3101

No docstring.

Calls: _db, _default_job_id, db.append_operator_message, db.close, db.get_job, print

build_parser

function nipux_cli/cli.py:3116

No docstring.

Calls: build_arg_parser

main

function nipux_cli/cli.py:3173

No docstring.

Calls: argparse.Namespace, args.func, build_parser, cmd_home, parser.parse_args, print

print_shell_help

function nipux_cli/cli_help.py:18

No docstring.

Calls: _print_group, print, rule

_print_group

function nipux_cli/cli_help.py:88

No docstring.

Calls: print

clip_json

function nipux_cli/cli_render.py:18

No docstring.

Calls: json.dumps, len

print_step

function nipux_cli/cli_render.py:25

No docstring.

Calls: _one_line, checkpoint.get, clean_step_summary, clip_json, isinstance, lesson.get, output_data.get, print, source.get, step.get, update.get

print_artifact

function nipux_cli/cli_render.py:63

No docstring.

Calls: artifact.get, print

print_run

function nipux_cli/cli_render.py:69

No docstring.

Calls: print, run.get

print_wrapped

function nipux_cli/cli_render.py:75

No docstring.

Calls: join, len, max, min, prefix.rstrip, print, split, str, textwrap.wrap

section_title

function nipux_cli/cli_render.py:87

No docstring.

Calls: _accent, _fancy_ui, _one_line, len, max, min, terminal_width, title.upper

print_metric_grid

function nipux_cli/cli_render.py:99

No docstring.

Calls: join, len, ljust, max, min, print, range, rstrip, terminal_width

short_path

function nipux_cli/cli_render.py:108

No docstring.

Calls: Path.home, len, max, str, text.startswith

print_jobs_panel

function nipux_cli/cli_render.py:119

No docstring.

Calls: _one_line, _status_badge, enumerate, item.get, job_display_state, len, print, section_title, str, worker_label

next_operator_action

function nipux_cli/cli_render.py:136

No docstring.

Calls: job.get, str

important_startup_events

function nipux_cli/cli_render.py:155

No docstring.

Calls: event.get, len, reversed, selected.append, selected.sort, str

print_event_card

function nipux_cli/cli_render.py:186

No docstring.

Calls: _event_badge, _muted, _one_line, artifact_indexes.get, event.get, event_display_parts, print, str

public_event

function nipux_cli/cli_render.py:197

No docstring.

Calls: dict, public.pop

print_event_details

function nipux_cli/cli_render.py:203

No docstring.

Calls: _one_line, event.get, isinstance, json.dumps, metadata.get, metadata.items, print

step_line

function nipux_cli/cli_render.py:220

No docstring.

Calls: _one_line, clean_step_summary, step.get

terminal_width

function nipux_cli/cli_render.py:227

No docstring.

Calls: shutil.get_terminal_size

rule

function nipux_cli/cli_render.py:231

No docstring.

Calls: min, terminal_width

json_default

function nipux_cli/cli_render.py:235

No docstring.

Calls: str

daemon_state_line

function nipux_cli/cli_render.py:239

No docstring.

Calls: isinstance, lock.get, metadata.get

daemon_event_line

function nipux_cli/cli_render.py:248

No docstring.

Calls: _one_line, event.get, job_titles.get, join, pieces.append, str, strip

job_ref_text

function nipux_cli/cli_render.py:264

No docstring.

Calls: isinstance, join, str, text.split

note_text

function nipux_cli/cli_render.py:275

No docstring.

Calls: isinstance, join, str, strip

default_job_id

function nipux_cli/cli_state.py:15

No docstring.

Calls: configured_focus_job_id, db.list_jobs, job.get, str

configured_focus_job_id

function nipux_cli/cli_state.py:27

No docstring.

Calls: db.get_job, get, isinstance, read_shell_state

find_job

function nipux_cli/cli_state.py:38

No docstring.

Calls: db.list_jobs, job.get, join, lower, query.split, str

shell_state_path

function nipux_cli/cli_state.py:55

No docstring.

Calls: config.ensure_dirs, load_config

read_shell_state

function nipux_cli/cli_state.py:61

No docstring.

Calls: isinstance, json.loads, path.exists, path.read_text, shell_state_path

write_shell_state

function nipux_cli/cli_state.py:72

No docstring.

Calls: json.dumps, read_shell_state, shell_state_path, state.update, write_text

setup_completed

function nipux_cli/cli_state.py:80

No docstring.

Calls: bool, get, read_shell_state

mark_setup_completed

function nipux_cli/cli_state.py:84

No docstring.

Calls: write_shell_state

model_setup_fingerprint

function nipux_cli/cli_state.py:88

No docstring.

Calls: config.model.api_key.encode, encode, hashlib.sha256, hexdigest, json.dumps, load_config

model_setup_verified

function nipux_cli/cli_state.py:100

No docstring.

Calls: isinstance, marker.get, model_setup_fingerprint, read_shell_state, state.get

mark_model_setup_verified

function nipux_cli/cli_state.py:108

No docstring.

Calls: datetime.now, isoformat, load_config, model_setup_fingerprint, write_shell_state

clear_model_setup_verified

function nipux_cli/cli_state.py:125

No docstring.

Calls: write_shell_state

_clip_text

function nipux_cli/compression.py:10

No docstring.

Calls: join, len, max, rstrip, split, str

refresh_memory_index

function nipux_cli/compression.py:17

Write a compact, artifact-referenced job memory entry. This is deliberately deterministic. A local model can later improve the prose, but the daemon should always have a cheap c...

Calls: _clip_text, _compact_count, _context_fraction, _first_positive_int, _metadata_list, _rank_tasks, active_prompt_operator_entries, artifact.get, db.get_job, db.job_token_usage, db.list_artifacts, db.list_steps, db.upsert_memory, entry.get, experiment.get, finding.get

_metadata_list

function nipux_cli/compression.py:193

No docstring.

Calls: isinstance, metadata.get

_rank_tasks

function nipux_cli/compression.py:200

No docstring.

Calls: int, sorted, status_rank.get, str, task.get

_compact_count

function nipux_cli/compression.py:212

No docstring.

Calls: float, int, str

_context_fraction

function nipux_cli/compression.py:224

No docstring.

Calls: _first_positive_int, float, usage.get

_first_positive_int

function nipux_cli/compression.py:238

No docstring.

Calls: float, int

get_agent_home

function nipux_cli/config.py:22

Return the Nipux agent home directory.

Calls: Path, Path.home, expanduser, os.environ.get, strip

load_env_file

function nipux_cli/config.py:29

Load KEY=value pairs from a local env file without overriding the shell.

Calls: Path, ensure_private_file_permissions, env_path.exists, env_path.read_text, expanduser, key.strip, line.split, line.startswith, raw_line.strip, splitlines, strip, value.strip
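
The docstring above describes a non-overriding env loader. A minimal sketch of that idea follows, assuming a plain `KEY=value` format with `#` comments; the name `load_env_file_sketch`, its `environ` parameter, and the demo keys are hypothetical and not taken from the source.

```python
import os
import tempfile
from pathlib import Path


def load_env_file_sketch(path, environ=None):
    """Copy KEY=value lines into environ, keeping values that are already set."""
    environ = os.environ if environ is None else environ
    env_path = Path(path).expanduser()
    loaded = {}
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # The shell wins: only fill keys that are not already present.
        if key and key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded


# Demo against a plain dict so the real process environment is untouched.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text(
        "# comment\nEXAMPLE_KEY=from-file\nEXAMPLE_URL = http://localhost\n",
        encoding="utf-8",
    )
    demo_env = {"EXAMPLE_KEY": "from-shell"}
    demo_loaded = load_env_file_sketch(env_file, demo_env)
```

In the demo, `EXAMPLE_KEY` keeps its pre-existing value and only `EXAMPLE_URL` is filled from the file.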

ensure_private_file_permissions

function nipux_cli/config.py:47

Best-effort POSIX privacy for local config/secret files.

Calls: Path, chmod

ensure_private_dir_permissions

function nipux_cli/config.py:58

Best-effort POSIX privacy for the local Nipux state directory.

Calls: Path, chmod

write_private_text

function nipux_cli/config.py:69

Write text with private file permissions from creation time.

Calls: Path, ensure_private_file_permissions, expanduser, handle.write, os.close, os.fdopen, os.open, target.parent.mkdir
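
"Private from creation time" usually means the file is opened with a restrictive mode instead of being created world-readable and tightened afterwards. The sketch below shows one way to do that with `os.open` and `os.fdopen`, both of which appear in the call list; the `0o600` mode and the name `write_private_text_sketch` are assumptions.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_text_sketch(path, text):
    """Create or truncate a file with owner-only permissions, then write text."""
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    # The mode argument applies at creation, so the file is never wider than 0o600.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        handle = os.fdopen(fd, "w", encoding="utf-8")
    except Exception:
        os.close(fd)  # fdopen did not take ownership of the descriptor
        raise
    with handle:
        handle.write(text)
    try:
        # Best effort: also tighten a file that already existed with a wider mode.
        os.chmod(target, 0o600)
    except OSError:
        pass
    return target


with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_private_text_sketch(Path(tmp) / "state" / "secret.txt", "token\n")
    demo_text = demo_path.read_text(encoding="utf-8")
    demo_mode = stat.S_IMODE(demo_path.stat().st_mode)
```

On non-POSIX platforms the mode bits are only partly honored, which is why the chmod step is treated as best effort.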

ModelConfig

class nipux_cli/config.py:87

No docstring.

Calls: dataclass, os.environ.get

api_key

function nipux_cli/config.py:97

No docstring.

Calls: os.environ.get

RuntimeConfig

class nipux_cli/config.py:102

No docstring.

Calls: dataclass, field

state_db_path

function nipux_cli/config.py:112

No docstring.

Calls: none

jobs_dir

function nipux_cli/config.py:116

No docstring.

Calls: none

logs_dir

function nipux_cli/config.py:120

No docstring.

Calls: none

digests_dir

function nipux_cli/config.py:124

No docstring.

Calls: none

ToolAccessConfig

class nipux_cli/config.py:129

No docstring.

Calls: dataclass

EmailConfig

class nipux_cli/config.py:137

No docstring.

Calls: dataclass, os.environ.get

password

function nipux_cli/config.py:148

No docstring.

Calls: os.environ.get

AppConfig

class nipux_cli/config.py:153

No docstring.

Calls: dataclass, directory.mkdir, ensure_private_dir_permissions, field

ensure_dirs

function nipux_cli/config.py:159

No docstring.

Calls: directory.mkdir, ensure_private_dir_permissions

_as_dict

function nipux_cli/config.py:170

No docstring.

Calls: isinstance

_optional_float

function nipux_cli/config.py:174

No docstring.

Calls: float

load_config

function nipux_cli/config.py:180

Load config.yaml, falling back to a local OpenAI-compatible endpoint.

Calls: AppConfig, EmailConfig, ModelConfig, Path, RuntimeConfig, ToolAccessConfig, _as_dict, _optional_float, bool, cfg_path.exists, cfg_path.read_text, email_raw.get, expanduser, float, get_agent_home, int

default_config_yaml

function nipux_cli/config.py:234

Return a starter config file for an OpenAI-compatible model server.

Calls: base_url.rstrip

context_pressure_for_prompt

function nipux_cli/context_pressure.py:34

No docstring.

Calls: _as_float, compact_token_count, isinstance, job.get, metadata.get, pressure.get, str

usage_pressure_for_prompt

function nipux_cli/context_pressure.py:54

No docstring.

Calls: _as_float, _as_int, _durable_usage_signal_count, _usage_pressure_band, bits.append, bool, compact_token_count, int, isinstance, join, max, usage.get

emit_usage_pressure_update

function nipux_cli/context_pressure.py:95

No docstring.

Calls: _as_float, _as_int, _usage_pressure_band, bool, compact_token_count, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, int, isinstance, isoformat, job.get, max, metadata.get, previous.get

emit_context_pressure_update

function nipux_cli/context_pressure.py:139

No docstring.

Calls: _as_float, _as_int, _context_pressure_band, compact_token_count, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, isinstance, isoformat, job.get, max, metadata.get, previous.get, round, str

compact_token_count

function nipux_cli/context_pressure.py:177

No docstring.

Calls: _as_int, str

_usage_pressure_band

function nipux_cli/context_pressure.py:186

No docstring.

Calls: _as_float, _as_int, bool, usage.get

_context_pressure_band

function nipux_cli/context_pressure.py:209

No docstring.

Calls: none

_durable_usage_signal_count

function nipux_cli/context_pressure.py:216

No docstring.

Calls: isinstance, job.get, lower, metadata.get, milestone.get, roadmap.get, str, sum, task.get

_as_float

function nipux_cli/context_pressure.py:243

No docstring.

Calls: float

_as_int

function nipux_cli/context_pressure.py:250

No docstring.

Calls: float, int

DaemonAlreadyRunning

class nipux_cli/daemon.py:29

No docstring.

Calls: none

runtime_code_file_names

function nipux_cli/daemon.py:36

No docstring.

Calls: Path, _runtime_code_paths, resolve, tuple

current_runtime_fingerprint

function nipux_cli/daemon.py:42

Return a stable fingerprint for code that affects daemon behavior.

Calls: DEFAULT_REGISTRY.names, DEFAULT_REGISTRY.openai_tools, SYSTEM_PROMPT.encode, _runtime_code_fingerprint, encode, hashlib.sha256, hexdigest, json.dumps, len, lru_cache, payload.items

_runtime_code_fingerprint

function nipux_cli/daemon.py:68

No docstring.

Calls: Path, _runtime_code_paths, digest, digest.hexdigest, digest.update, hashlib.sha256, lru_cache, max, mtimes.append, name.encode, path.read_bytes, path.stat, resolve
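
The two fingerprint entries above suggest hashing the runtime source files so a running daemon can detect that its code has changed. The sketch below covers only the file-hashing part, under the assumption that file names and bytes are fed into one SHA-256 digest in sorted order; the listed calls indicate the real function also folds in the tool registry and system prompt, which is not reproduced here. `code_fingerprint_sketch` and its `pattern` parameter are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def code_fingerprint_sketch(package_dir, pattern="*.py"):
    """Return a hex digest that changes when any matching file changes."""
    digest = hashlib.sha256()
    # Sorting keeps the result independent of directory listing order.
    for path in sorted(Path(package_dir).glob(pattern)):
        if not path.is_file():
            continue
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")  # separator so name and content cannot blur together
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "worker.py"
    module.write_text("VALUE = 1\n", encoding="utf-8")
    first = code_fingerprint_sketch(tmp)
    unchanged = code_fingerprint_sketch(tmp)
    module.write_text("VALUE = 2\n", encoding="utf-8")
    changed = code_fingerprint_sketch(tmp)
```

Comparing a recorded fingerprint with a freshly computed one is the kind of check a staleness test such as `runtime_stale` could build on.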

_runtime_code_paths

function nipux_cli/daemon.py:84

No docstring.

Calls: package_dir.glob, path.is_file, sorted

runtime_stale

function nipux_cli/daemon.py:93

No docstring.

Calls: current_runtime_fingerprint, get, isinstance, metadata.get, recorded.get

_parse_lock_metadata

function nipux_cli/daemon.py:102

No docstring.

Calls: isinstance, json.loads, raw.strip

daemon_lock_status

function nipux_cli/daemon.py:113

Return whether another process currently holds the daemon lock.

Calls: Path, _parse_lock_metadata, contextlib.suppress, current_runtime_fingerprint, fcntl.flock, handle.fileno, handle.read, handle.seek, path.open, path.parent.mkdir, runtime_stale, str

single_instance_lock

function nipux_cli/daemon.py:146

Hold an exclusive non-blocking daemon lock for this state directory.

Calls: DaemonAlreadyRunning, Path, current_runtime_fingerprint, datetime.now, fcntl.flock, handle.fileno, handle.flush, handle.seek, handle.truncate, handle.write, isoformat, json.dumps, os.getpid, path.open, path.parent.mkdir
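
An exclusive non-blocking lock of this kind can be sketched with `fcntl.flock`, which the call list names. The example assumes a POSIX platform and a context-manager shape; `LockHeldError` is a hypothetical stand-in for `DaemonAlreadyRunning`, and the metadata fields written to the lock file are illustrative only.

```python
import contextlib
import fcntl
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class LockHeldError(RuntimeError):
    """Hypothetical stand-in for the DaemonAlreadyRunning class."""


@contextlib.contextmanager
def single_instance_lock_sketch(path):
    lock_path = Path(path)
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    handle = lock_path.open("a+", encoding="utf-8")
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            raise LockHeldError(str(lock_path)) from exc
        # Only the lock holder rewrites the metadata.
        handle.seek(0)
        handle.truncate()
        handle.write(json.dumps({
            "pid": os.getpid(),
            "started_at": datetime.now(timezone.utc).isoformat(),
        }))
        handle.flush()
        yield handle
    finally:
        handle.close()  # closing the descriptor releases the lock


with tempfile.TemporaryDirectory() as tmp:
    lock_file = Path(tmp) / "daemon.lock"
    with single_instance_lock_sketch(lock_file):
        try:
            with single_instance_lock_sketch(lock_file):
                second_acquired = True
        except LockHeldError:
            second_acquired = False
    with single_instance_lock_sketch(lock_file):
        reacquired = True
```

Because `flock` locks belong to the open file description, a second open of the same file conflicts even inside one process, which is what the demo relies on.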

update_lock_metadata

function nipux_cli/daemon.py:171

No docstring.

Calls: _parse_lock_metadata, datetime.now, handle.flush, handle.read, handle.seek, handle.truncate, handle.write, isoformat, json.dumps, metadata.setdefault, metadata.update, os.getpid

_work_heartbeat

function nipux_cli/daemon.py:184

Keep daemon lock metadata fresh while a worker turn is in progress.

Calls: current_runtime_fingerprint, datetime.now, float, isoformat, max, stop.set, stop.wait, thread.join, thread.start, threading.Event, threading.Thread, update_metadata

beat

function nipux_cli/daemon.py:197

No docstring.

Calls: current_runtime_fingerprint, datetime.now, isoformat, stop.wait, update_metadata
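
The heartbeat described above can be sketched as a background thread that refreshes metadata until a stop event is set. This is an illustration, not the daemon's code: the context-manager shape, the `last_heartbeat` key, and the interval values are assumptions, while `threading.Event`, `stop.wait`, and `thread.join` match the listed calls.

```python
import contextlib
import threading
import time
from datetime import datetime, timezone


@contextlib.contextmanager
def work_heartbeat_sketch(update_metadata, interval_seconds=1.0):
    """Call update_metadata periodically while the with-block is running."""
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout, so the loop ticks until stop is set.
        while not stop.wait(interval_seconds):
            update_metadata({"last_heartbeat": datetime.now(timezone.utc).isoformat()})

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join(timeout=1.0)


beats = []
with work_heartbeat_sketch(beats.append, interval_seconds=0.01):
    time.sleep(0.2)  # stands in for a long worker turn
```

Waiting on the event instead of calling `time.sleep` lets the thread exit promptly when the worker turn finishes.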

append_daemon_event

function nipux_cli/daemon.py:221

Append a small daemon event that the CLI can tail without parsing stdout.

Calls: config.ensure_dirs, datetime.now, handle.write, isoformat, json.dumps, path.open

read_daemon_events

function nipux_cli/daemon.py:236

No docstring.

Calls: events.append, isinstance, json.loads, path.exists, path.read_text, splitlines
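
The append and read entries above point to a line-oriented event log that the CLI can tail. A minimal sketch, assuming one JSON object per line; the field names `at` and `event`, the file name, and the `limit` parameter are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event_sketch(path, event_type, **fields):
    """Append one JSON object per line so readers never parse stdout."""
    record = {"at": datetime.now(timezone.utc).isoformat(), "event": event_type, **fields}
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_events_sketch(path, limit=20):
    """Return up to `limit` most recent events, skipping malformed lines."""
    path = Path(path)
    if not path.exists() or limit <= 0:
        return []
    events = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupt lines
        if isinstance(parsed, dict):
            events.append(parsed)
    return events[-limit:]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "daemon-events.jsonl"
    append_event_sketch(log, "daemon_started", pid=123)
    with log.open("a", encoding="utf-8") as handle:
        handle.write("not json\n")
    append_event_sketch(log, "step_finished", job_id="demo")
    demo_events = read_events_sketch(log, limit=10)
```

Skipping lines that fail to parse keeps a reader usable even if a write was interrupted mid-line.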

fake_step_llm

function nipux_cli/daemon.py:253

No docstring.

Calls: LLMResponse, ScriptedLLM, ToolCall, datetime.now, isoformat

Daemon

class nipux_cli/daemon.py:273

No docstring.

Calls: AgentDB, _exception_backoff, _exception_payload, _is_digest_due, _model_generation_ready, _provider_probe_due, _sleep_or_stop, _step_failure_backoff, _work_heartbeat, append_daemon_event, astimezone, cleanup_registered_shell_processes, cls, config.ensure_dirs, current_runtime_...

open

function nipux_cli/daemon.py:278

No docstring.

Calls: AgentDB, cls, config.ensure_dirs, load_config

lock_path

function nipux_cli/daemon.py:284

No docstring.

Calls: none

close

function nipux_cli/daemon.py:287

No docstring.

Calls: self.db.close

next_runnable_job

function nipux_cli/daemon.py:290

Return the next runnable job by priority/age. UI focus is intentionally not used here. Focus is for the operator's chat view; the daemon should keep all runnable jobs advancing.

Calls: datetime.now, job_is_deferred, job_provider_blocked, self._maybe_recover_provider_blocked_jobs, self.db.append_agent_update, self.db.list_jobs, self.db.update_job_status
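
Selection "by priority/age" with UI focus ignored can be illustrated as a sort key over runnable jobs. Everything specific here is an assumption made for the example: the status names, the `deferred` flag, higher numbers meaning higher priority, and ISO timestamps in `created_at`.

```python
def next_runnable_sketch(jobs, runnable_statuses=("queued", "running")):
    """Pick the highest-priority runnable job, breaking ties by oldest creation time."""
    candidates = [
        job for job in jobs
        if job.get("status") in runnable_statuses and not job.get("deferred")
    ]
    if not candidates:
        return None
    # Negated priority puts larger values first; ISO timestamps sort oldest first.
    return min(
        candidates,
        key=lambda job: (-int(job.get("priority") or 0), str(job.get("created_at") or "")),
    )


demo_jobs = [
    {"id": "a", "status": "queued", "priority": 1, "created_at": "2024-01-02T00:00:00+00:00"},
    {"id": "b", "status": "queued", "priority": 5, "created_at": "2024-01-03T00:00:00+00:00"},
    {"id": "c", "status": "queued", "priority": 5, "created_at": "2024-01-01T00:00:00+00:00"},
    {"id": "d", "status": "paused", "priority": 9, "created_at": "2023-01-01T00:00:00+00:00"},
]
picked = next_runnable_sketch(demo_jobs)
```

No focus field appears in the key, which mirrors the docstring's point that focus belongs to the operator's chat view and not to scheduling.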

_maybe_recover_provider_blocked_jobs

function nipux_cli/daemon.py:324

No docstring.

Calls: _model_generation_ready, _provider_probe_due, append_daemon_event, job_provider_blocked, len, now.isoformat, self.db.append_agent_update, self.db.list_jobs, self.db.update_job_status

idle_sleep_seconds

function nipux_cli/daemon.py:368

Return the next idle sleep, capped by the nearest deferred job wake.

Calls: astimezone, datetime.now, due_times.append, job_deferred_until, max, min, self.db.list_jobs, total_seconds
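
Capping the idle sleep by the nearest deferred wake time comes down to a `min` over due times followed by clamping. The sketch assumes deferred times arrive as `datetime` objects and that naive values are UTC; `idle_sleep_sketch` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def idle_sleep_sketch(default_seconds, deferred_until, now=None, minimum=0.0):
    """Sleep for default_seconds unless a deferred job is due sooner."""
    now = now or datetime.now(timezone.utc)
    due_times = []
    for value in deferred_until:
        if value is None:
            continue
        # Treat naive datetimes as UTC so comparisons stay consistent.
        when = value if value.tzinfo else value.replace(tzinfo=timezone.utc)
        due_times.append(when.astimezone(timezone.utc))
    if not due_times:
        return float(default_seconds)
    seconds_until_wake = (min(due_times) - now).total_seconds()
    return max(minimum, min(float(default_seconds), seconds_until_wake))


demo_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
soon = idle_sleep_sketch(30, [demo_now + timedelta(seconds=10)], now=demo_now)
later = idle_sleep_sketch(
    30,
    [demo_now + timedelta(seconds=100), demo_now + timedelta(seconds=45)],
    now=demo_now,
)
overdue = idle_sleep_sketch(30, [demo_now - timedelta(seconds=5)], now=demo_now)
```

With this shape the daemon never sleeps past a deferred job's wake time, and an already overdue job yields the minimum sleep.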

run_once

function nipux_cli/daemon.py:383

No docstring.

Calls: fake_step_llm, print, run_one_step, self.next_runnable_job

send_due_daily_digest

function nipux_cli/daemon.py:395

No docstring.

Calls: _is_digest_due, datetime.now, isoformat, now.date, self.db.digest_exists, write_daily_digest

run_forever

function nipux_cli/daemon.py:407

No docstring.

Calls: _exception_backoff, _exception_payload, _sleep_or_stop, _step_failure_backoff, _work_heartbeat, append_daemon_event, cleanup_registered_shell_processes, current_runtime_fingerprint, datetime.now, isoformat, json.dumps, len, locked_update_metadata, max, os.getpid, print

locked_update_metadata

function nipux_cli/daemon.py:421

No docstring.

Calls: update_lock_metadata

_is_digest_due

function nipux_cli/daemon.py:562

No docstring.

Calls: configured_time.split, int

_raise_keyboard_interrupt

function nipux_cli/daemon.py:572

No docstring.

Calls: none

_exception_payload

function nipux_cli/daemon.py:576

No docstring.

Calls: str, type

_failure_backoff

function nipux_cli/daemon.py:583

No docstring.

Calls: max, min

_step_failure_backoff

function nipux_cli/daemon.py:588

Return a retry delay for failed worker steps. Worker failures are recorded as failed steps rather than escaping as daemon exceptions, so they use the same generic throttling pat...

Calls: _failure_backoff

_exception_backoff

function nipux_cli/daemon.py:599

No docstring.

Calls: _failure_backoff, _is_rate_limit_error, _retry_after_seconds, max, min

_is_rate_limit_error

function nipux_cli/daemon.py:609

No docstring.

Calls: _is_rate_limit_text, getattr, type

_is_rate_limit_text

function nipux_cli/daemon.py:616

No docstring.

Calls: provider_rate_limited

_retry_after_seconds

function nipux_cli/daemon.py:620

No docstring.

Calls: _exception_headers, _parse_retry_after, headers.items, key.lower

_exception_headers

function nipux_cli/daemon.py:631

No docstring.

Calls: dict, getattr, items, str

_parse_retry_after

function nipux_cli/daemon.py:639

No docstring.

Calls: contextlib.suppress, float, max, parsed.replace, parsed.timestamp, parsedate_to_datetime, str, strip, time.time
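
An HTTP `Retry-After` header may carry either a number of seconds or an HTTP date, and the listed calls (`float`, `parsedate_to_datetime`, `time.time`) suggest both forms are handled. The sketch below is one way to do that; the function name, the `now` parameter, and returning `None` for unparseable input are assumptions.

```python
import contextlib
import time
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_retry_after_sketch(value, now=None):
    """Return the delay in seconds from a Retry-After value, or None if unparseable."""
    text = str(value or "").strip()
    if not text:
        return None
    # Form 1: delay in seconds, e.g. "120".
    with contextlib.suppress(ValueError):
        return max(0.0, float(text))
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    with contextlib.suppress(TypeError, ValueError, IndexError, OverflowError):
        parsed = parsedate_to_datetime(text)
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        current = time.time() if now is None else now
        return max(0.0, parsed.timestamp() - current)
    return None


delay_seconds = parse_retry_after_sketch("120")
delay_from_date = parse_retry_after_sketch("Wed, 21 Oct 2015 07:28:00 GMT", now=1445412450)
delay_in_past = parse_retry_after_sketch("Wed, 21 Oct 2015 07:28:00 GMT", now=1445412500)
```

Clamping at zero keeps a date in the past from producing a negative sleep.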

_provider_probe_due

function nipux_cli/daemon.py:658

No docstring.

Calls: contextlib.suppress, datetime.fromisoformat, isinstance, job.get, metadata.get, now.astimezone, previous.astimezone, previous.replace, raw.replace, str, strip, total_seconds

_model_generation_ready

function nipux_cli/daemon.py:671

No docstring.

Calls: join, run_doctor

_sleep_or_stop

function nipux_cli/daemon.py:680

No docstring.

Calls: time.sleep

_focused_job_id

function nipux_cli/daemon.py:686

No docstring.

Calls: isinstance, json.loads, parsed.get, path.exists, path.read_text

remote_model_preflight_failures

function nipux_cli/daemon_control.py:27

No docstring.

Calls: doctor_fn

_recoverable_provider_preflight

function nipux_cli/daemon_control.py:33

No docstring.

Calls: failure.split, provider_action_required, provider_rate_limited, strip

recoverable_remote_model_preflight_failures

function nipux_cli/daemon_control.py:45

No docstring.

Calls: _recoverable_provider_preflight, remote_model_preflight_failures

provider_preflight_is_recoverable

function nipux_cli/daemon_control.py:54

No docstring.

Calls: _recoverable_provider_preflight

ensure_remote_model_ready_for_worker

function nipux_cli/daemon_control.py:58

No docstring.

Calls: _recoverable_provider_preflight, clear_model_setup_verified, mark_model_setup_verified, print, remote_model_preflight_failures

cmd_start_impl

function nipux_cli/daemon_control.py:85

No docstring.

Calls: Path, Path.cwd, SystemExit, bool, command.append, config.ensure_dirs, daemon_lock_status, expanduser, load_config, log_path.open, log_path.parent.mkdir, metadata.get, print, process.poll, ready_fn, status.get

start_daemon_if_needed_impl

function nipux_cli/daemon_control.py:140

No docstring.

Calls: argparse.Namespace, config.ensure_dirs, daemon_lock_status, load_config, metadata.get, print, start_fn, status.get, stop_fn, time.sleep

cmd_restart_impl

function nipux_cli/daemon_control.py:165

No docstring.

Calls: argparse.Namespace, config.ensure_dirs, float, load_config, start_fn, stop_fn, time.sleep

stop_daemon_process_impl

function nipux_cli/daemon_control.py:179

No docstring.

Calls: SystemExit, _find_single_daemon_process, daemon_lock_status, isinstance, metadata.get, os.kill, pid_alive, print, status.get, time.sleep, time.time

_find_single_daemon_process

function nipux_cli/daemon_control.py:213

Best-effort recovery for older locks that lost pid metadata.

Calls: Path, candidates.append, command.split, int, join, len, line.partition, normalized.split, os.getpid, raw_line.strip, result.stdout.splitlines, set, sorted, subprocess.run

collect_dashboard_state

function nipux_cli/dashboard.py:19

Build a serializable snapshot for status and dashboard commands.

Calls: DEFAULT_REGISTRY.names, _focus_state, _job_card, _select_focus_job, daemon_lock_status, datetime.now, db.list_jobs, isoformat, len, str

render_dashboard

function nipux_cli/dashboard.py:48

Render a compact terminal dashboard.

Calls: _compact_time, _daemon_text, _job_state_text, _one_line, _render_focus, bool, daemon.get, job.get, join, latest.get, len, lines.append, lines.extend, ljust, max, min

render_overview

function nipux_cli/dashboard.py:86

Render a human-sized status view for the interactive shell.

Calls: _active_operator_messages, _daemon_health_text, _job_state_text, _one_line, _worker_text, agent_update.get, artifact.get, bool, counts.get, daemon.get, focus.get, get, isinstance, job.get, join, latest_step.get

_select_focus_job

function nipux_cli/dashboard.py:157

No docstring.

Calls: db.get_job, job.get

_job_card

function nipux_cli/dashboard.py:167

No docstring.

Calls: _public_step, _step_count, db.list_artifacts, db.list_runs, db.list_steps, len, step.get, sum

_focus_state

function nipux_cli/dashboard.py:185

No docstring.

Calls: Counter, _active_operator_messages, _public_artifact, _public_run, _public_step, _step_count, db.list_artifacts, db.list_memory, db.list_runs, db.list_steps, dict, endswith, entry.get, isinstance, job.get, len

_render_focus

function nipux_cli/dashboard.py:256

No docstring.

Calls: _compact_value, _job_state_text, _one_line, _tool_mix, artifact.get, counts.get, entry.get, finding.get, focus.get, get, isinstance, job.get, join, len, lesson.get, lines.append

_public_run

function nipux_cli/dashboard.py:334

No docstring.

Calls: run.get

_public_step

function nipux_cli/dashboard.py:345

No docstring.

Calls: _clean_step_summary, input_data.get, isinstance, step.get

_public_artifact

function nipux_cli/dashboard.py:362

No docstring.

Calls: artifact.get

_step_count

function nipux_cli/dashboard.py:373

No docstring.

Calls: int, max, step.get

_active_operator_messages

function nipux_cli/dashboard.py:378

No docstring.

Calls: active_prompt_operator_entries, entry.get, isinstance, metadata.get, str

_daemon_text

function nipux_cli/dashboard.py:389

No docstring.

Calls: daemon.get, metadata.get

_daemon_health_text

function nipux_cli/dashboard.py:399

No docstring.

Calls: _age_seconds, _one_line, daemon.get, int, metadata.get, running_step.get

_worker_text

function nipux_cli/dashboard.py:433

No docstring.

Calls: job.get, job_deferred_until, str

_job_state_text

function nipux_cli/dashboard.py:442

No docstring.

Calls: job.get, job_deferred_until, str

_age_seconds

function nipux_cli/dashboard.py:451

No docstring.

Calls: datetime.fromisoformat, datetime.now, max, parsed.astimezone, total_seconds, value.replace

_compact_time

function nipux_cli/dashboard.py:459

No docstring.

Calls: datetime.fromisoformat, parsed.astimezone, strftime, value.replace

_one_line

function nipux_cli/dashboard.py:467

No docstring.

Calls: join, max, shorten, split, str

_clean_step_summary

function nipux_cli/dashboard.py:472

No docstring.

Calls: join, split, str, text.split, text.startswith

_compact_value

function nipux_cli/dashboard.py:479

No docstring.

Calls: isinstance, join, sorted, str

_tool_mix

function nipux_cli/dashboard.py:486

No docstring.

Calls: join, tool_counts.items

resolve_artifact_path

function nipux_cli/dashboard.py:492

No docstring.

Calls: Path, expanduser, str

utc_now

function nipux_cli/db.py:136

No docstring.

Calls: datetime.now, isoformat

new_id

function nipux_cli/db.py:140

No docstring.

Calls: uuid.uuid4

_slugify

function nipux_cli/db.py:144

No docstring.

Calls: new_id, re.sub, strip, value.lower

_unique_job_id

function nipux_cli/db.py:149

No docstring.

Calls: _slugify, conn.execute, fetchone
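
The entries for _slugify (re.sub, strip, value.lower, new_id) and _unique_job_id (_slugify, conn.execute, fetchone) point to a slug-then-probe scheme for job ids. The sketch below shows one way such a scheme can work. The regular expression, the length cap, the numeric suffix, and the `jobs(id)` table layout are assumptions, not details taken from db.py.

```python
import re
import sqlite3
import uuid


def slugify(value, max_len=48):
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    # Assumption: fall back to a random id when nothing usable remains.
    return slug[:max_len].strip("-") or uuid.uuid4().hex[:8]


def unique_job_id(conn, title):
    """Probe the (hypothetical) jobs table until an unused id is found."""
    base = slugify(title)
    candidate = base
    suffix = 2
    while conn.execute("SELECT 1 FROM jobs WHERE id = ?", (candidate,)).fetchone():
        candidate = f"{base}-{suffix}"
        suffix += 1
    return candidate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY)")
first_id = unique_job_id(conn, "Hello, World!")
conn.execute("INSERT INTO jobs (id) VALUES (?)", (first_id,))
second_id = unique_job_id(conn, "Hello, World!")
```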

_json_dumps

function nipux_cli/db.py:159

No docstring.

Calls: json.dumps

_json_loads

function nipux_cli/db.py:163

No docstring.

Calls: isinstance, json.loads

_bounded_float

function nipux_cli/db.py:171

No docstring.

Calls: float, max, min

_merge_string_lists

function nipux_cli/db.py:179

No docstring.

Calls: isinstance, join, source.strip, split, str, values.append

_memory_edge_key

function nipux_cli/db.py:195

No docstring.

Calls: edge.get, str

_as_int

function nipux_cli/db.py:202

No docstring.

Calls: float, int

_as_float

function nipux_cli/db.py:209

No docstring.

Calls: float

_nested_value

function nipux_cli/db.py:216

No docstring.

Calls: current.get, isinstance

_metadata_list

function nipux_cli/db.py:225

No docstring.

Calls: isinstance, metadata.get

_change_fingerprint

function nipux_cli/db.py:232

No docstring.

Calls: _json_dumps, entry.get

_norm_key

function nipux_cli/db.py:236

No docstring.

Calls: re.sub, strip, value.lower

_clean_status

function nipux_cli/db.py:240

No docstring.

Calls: lower, replace, value.strip

_experiment_metric_value

function nipux_cli/db.py:245

No docstring.

Calls: entry.get, float

_same_metric_group

function nipux_cli/db.py:255

No docstring.

Calls: _experiment_metric_value, bool, entry.get, lower, metric_name.strip, metric_unit.strip, str, strip

_best_experiment_for_metric

function nipux_cli/db.py:270

No docstring.

Calls: _experiment_metric_value, _same_metric_group, experiment.get, max, min

_metric_delta

function nipux_cli/db.py:294

No docstring.

Calls: _experiment_metric_value, float, round

_mark_best_experiments

function nipux_cli/db.py:313

No docstring.

Calls: _experiment_metric_value, append, bool, experiment.get, groups.items, groups.setdefault, item.get, lower, max, min, str, strip, winners.append

_row_to_dict

function nipux_cli/db.py:337

No docstring.

Calls: dict, json.loads, key.removesuffix

_insert_event

function nipux_cli/db.py:350

No docstring.

Calls: _json_dumps, body.strip, conn.execute, event_type.strip, lower, new_id, ref_id.strip, ref_table.strip, title.strip, utc_now

_projected_event

function nipux_cli/db.py:394

No docstring.

Calls: none

AgentDB

class nipux_cli/db.py:420

Small SQLite wrapper with WAL and jittered write retries.

Calls: KeyError, Path, RuntimeError, ValueError, _as_float, _as_int, _best_experiment_for_metric, _bounded_float, _change_fingerprint, _clean_status, _insert_event, _json_dumps, _json_loads, _mark_best_experiments, _memory_edge_key, _merge_string_lists

__init__

function nipux_cli/db.py:425

No docstring.

Calls: Path, self._conn.execute, self._init_schema, self.path.parent.mkdir, sqlite3.connect, str, threading.RLock

close

function nipux_cli/db.py:440

No docstring.

Calls: self._conn.close, self._conn.execute

_init_schema

function nipux_cli/db.py:449

No docstring.

Calls: RuntimeError, fetchone, int, self._conn.execute, self._conn.executescript

_write

function nipux_cli/db.py:458

No docstring.

Calls: fn, lower, random.uniform, range, self._conn.commit, self._conn.execute, self._conn.rollback, sqlite3.OperationalError, str, time.sleep
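
The AgentDB docstring mentions jittered write retries, and the _write call list (fn, commit, rollback, sqlite3.OperationalError, random.uniform, time.sleep) is consistent with a retry loop around a transactional callback. The following is a minimal sketch of that technique, assuming retries apply only to "locked"/"busy" errors. The function name, attempt count, and delay values are hypothetical and are not read from db.py.

```python
import random
import sqlite3
import time


def write_with_retry(conn, fn, attempts=5, base_delay=0.05):
    """Run fn(conn) and commit, retrying with random jitter when the database is locked."""
    for attempt in range(attempts):
        try:
            result = fn(conn)
            conn.commit()
            return result
        except sqlite3.OperationalError as exc:
            conn.rollback()
            message = str(exc).lower()
            retryable = "locked" in message or "busy" in message
            if not retryable or attempt == attempts - 1:
                raise
            # Jitter spreads out competing writers so they do not retry in lockstep.
            time.sleep(random.uniform(0, base_delay * (attempt + 1)))


conn = sqlite3.connect(":memory:")
# WAL lets readers proceed while a writer is active; in-memory databases ignore the request.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT)")
row_id = write_with_retry(
    conn,
    lambda c: c.execute("INSERT INTO events (title) VALUES (?)", ("started",)).lastrowid,
)
```

Taking the mutation as a callback mirrors the nested `op` functions listed under each AgentDB write method, which appear to be handed to self._write.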

append_event

function nipux_cli/db.py:479

No docstring.

Calls: _insert_event, self._write

op

function nipux_cli/db.py:491

No docstring.

Calls: _insert_event

list_events

function nipux_cli/db.py:506

No docstring.

Calls: _row_to_dict, fetchall, filters.append, int, join, lower, params.append, params.extend, self._conn.execute, str, strip

list_timeline_events

function nipux_cli/db.py:538

Return visible job history, combining durable events with old projected state.

Calls: _metadata_list, _projected_event, artifact.get, entry.get, enumerate, event.get, experiment.get, finding.get, format_metric_value, int, isinstance, job.get, lesson.get, list, max, memory.get

create_job

function nipux_cli/db.py:724

No docstring.

Calls: _insert_event, _json_dumps, _unique_job_id, conn.execute, objective.strip, self._write, splitlines, utc_now

op

function nipux_cli/db.py:737

No docstring.

Calls: _insert_event, _json_dumps, _unique_job_id, conn.execute

get_job

function nipux_cli/db.py:759

No docstring.

Calls: KeyError, _row_to_dict, fetchone, self._conn.execute

list_jobs

function nipux_cli/db.py:766

No docstring.

Calls: _row_to_dict, fetchall, join, list, self._conn.execute

update_job_status

function nipux_cli/db.py:778

No docstring.

Calls: KeyError, _insert_event, _json_dumps, conn.execute, current.update, fetchone, get, json.loads, self._write, str, utc_now

op

function nipux_cli/db.py:781

No docstring.

Calls: KeyError, _insert_event, _json_dumps, conn.execute, current.update, fetchone, get, json.loads, str

update_job_metadata

function nipux_cli/db.py:809

No docstring.

Calls: KeyError, _json_dumps, conn.execute, current.update, fetchone, json.loads, self._write, utc_now

op

function nipux_cli/db.py:812

No docstring.

Calls: KeyError, _json_dumps, conn.execute, current.update, fetchone, json.loads

claim_operator_messages

function nipux_cli/db.py:825

No docstring.

Calls: KeyError, _insert_event, _json_dumps, claimed.append, conn.execute, dict, entry.get, fetchone, isinstance, json.loads, len, lower, metadata.get, mode.strip, replace, self._write

op

function nipux_cli/db.py:835

No docstring.

Calls: KeyError, _insert_event, _json_dumps, claimed.append, conn.execute, dict, entry.get, fetchone, isinstance, json.loads, len, lower, metadata.get, replace, str, strip

acknowledge_operator_messages

function nipux_cli/db.py:883

No docstring.

Calls: KeyError, _insert_event, _json_dumps, acknowledged.append, conn.execute, dict, entry.get, fetchone, isinstance, json.loads, len, lower, metadata.get, replace, self._write, status.strip

op

function nipux_cli/db.py:897

No docstring.

Calls: KeyError, _insert_event, _json_dumps, acknowledged.append, conn.execute, dict, entry.get, fetchone, isinstance, json.loads, len, lower, metadata.get, replace, str, strip

rename_job

function nipux_cli/db.py:955

No docstring.

Calls: KeyError, ValueError, _insert_event, _row_to_dict, conn.execute, dict, fetchone, self._write, title.strip, utc_now

op

function nipux_cli/db.py:961

No docstring.

Calls: KeyError, _insert_event, _row_to_dict, conn.execute, dict, fetchone

delete_job

function nipux_cli/db.py:982

No docstring.

Calls: KeyError, _row_to_dict, conn.execute, fetchall, fetchone, self._write, str

op

function nipux_cli/db.py:983

No docstring.

Calls: KeyError, _row_to_dict, conn.execute, fetchall, fetchone, str

append_operator_message

function nipux_cli/db.py:1012

No docstring.

Calls: KeyError, ValueError, _insert_event, _json_dumps, conn.execute, fetchone, isinstance, json.loads, lower, message.strip, messages.append, metadata.get, mode.strip, replace, self._write, utc_now

op

function nipux_cli/db.py:1029

No docstring.

Calls: KeyError, _insert_event, _json_dumps, conn.execute, fetchone, isinstance, json.loads, messages.append, metadata.get

append_agent_update

function nipux_cli/db.py:1058

No docstring.

Calls: KeyError, ValueError, _insert_event, _json_dumps, category.strip, conn.execute, fetchone, isinstance, job_metadata.get, json.loads, message.strip, self._write, updates.append, utc_now

op

function nipux_cli/db.py:1077

No docstring.

Calls: KeyError, _insert_event, _json_dumps, conn.execute, fetchone, isinstance, job_metadata.get, json.loads, updates.append

append_lesson

function nipux_cli/db.py:1106

No docstring.

Calls: KeyError, ValueError, _insert_event, _json_dumps, _norm_key, category.strip, conn.execute, current.get, existing.get, fetchone, int, isinstance, item.get, job_metadata.get, json.loads, lesson.strip

op

function nipux_cli/db.py:1128

No docstring.

Calls: KeyError, _insert_event, _json_dumps, _norm_key, conn.execute, current.get, existing.get, fetchone, int, isinstance, item.get, job_metadata.get, json.loads, lessons.append, merged.update, next

append_memory_graph_records

function nipux_cli/db.py:1188

No docstring.

Calls: KeyError, ValueError, _bounded_float, _insert_event, _json_dumps, _memory_edge_key, _merge_string_lists, _metadata_list, _norm_key, conn.execute, current.get, edge.get, existing_edge_keys.add, fetchone, int, isinstance

op

function nipux_cli/db.py:1201

No docstring.

Calls: KeyError, _bounded_float, _insert_event, _json_dumps, _memory_edge_key, _merge_string_lists, _metadata_list, _norm_key, conn.execute, current.get, edge.get, existing_edge_keys.add, fetchone, int, isinstance, job_metadata.get

append_source_record

function nipux_cli/db.py:1349

No docstring.

Calls: KeyError, ValueError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, _norm_key, conn.execute, current.get, dict.fromkeys, entry.get, fetchone, float, int, isinstance, json.loads

op

function nipux_cli/db.py:1368

No docstring.

Calls: KeyError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, conn.execute, current.get, dict.fromkeys, entry.get, fetchone, float, int, isinstance, json.loads, list, merged_metadata.update

append_finding_record

function nipux_cli/db.py:1454

No docstring.

Calls: KeyError, ValueError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, _norm_key, category.strip, conn.execute, contact.strip, current.get, entry.get, evidence_artifact.strip, fetchone, findings.append, isinstance

op

function nipux_cli/db.py:1478

No docstring.

Calls: KeyError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, category.strip, conn.execute, contact.strip, current.get, entry.get, evidence_artifact.strip, fetchone, findings.append, isinstance, items, json.loads

append_roadmap_record

function nipux_cli/db.py:1569

No docstring.

Calls: KeyError, ValueError, _change_fingerprint, _clean_status, _insert_event, _json_dumps, _norm_key, bool, conn.execute, current.get, current_milestone.strip, entry.get, existing_features.append, feature.get, fetchone, int

merge_feature

function nipux_cli/db.py:1589

No docstring.

Calls: _change_fingerprint, _clean_status, _norm_key, current.get, entry.get, existing_features.append, feature.get, isinstance, items, lower, merged.update, next, replace, str, strip

op

function nipux_cli/db.py:1654

No docstring.

Calls: KeyError, _change_fingerprint, _clean_status, _insert_event, _json_dumps, _norm_key, bool, conn.execute, current.get, current_milestone.strip, entry.get, fetchone, int, isinstance, items, job_metadata.get

append_milestone_validation_record

function nipux_cli/db.py:1860

No docstring.

Calls: KeyError, ValueError, _clean_status, _insert_event, _json_dumps, _norm_key, conn.execute, current.get, entry.get, evidence.strip, fetchone, isinstance, job_metadata.get, json.loads, merged_metadata.update, milestone.strip

op

function nipux_cli/db.py:1880

No docstring.

Calls: KeyError, _insert_event, _json_dumps, _norm_key, conn.execute, current.get, entry.get, evidence.strip, fetchone, isinstance, job_metadata.get, json.loads, merged_metadata.update, milestones.append, next, next_action.strip

append_task_record

function nipux_cli/db.py:1980

No docstring.

Calls: KeyError, ValueError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, _norm_key, acceptance_criteria.strip, conn.execute, current.get, entry.get, evidence_needed.strip, fetchone, goal.strip, int, isinstance

op

function nipux_cli/db.py:2009

No docstring.

Calls: KeyError, _change_fingerprint, _insert_event, _json_dumps, _metadata_list, _norm_key, acceptance_criteria.strip, conn.execute, current.get, entry.get, evidence_needed.strip, fetchone, goal.strip, int, isinstance, items

append_experiment_record

function nipux_cli/db.py:2117

No docstring.

Calls: KeyError, ValueError, _best_experiment_for_metric, _change_fingerprint, _insert_event, _json_dumps, _mark_best_experiments, _metadata_list, _metric_delta, _norm_key, bool, conn.execute, current.get, entry.get, evidence_artifact.strip, experiments.append

op

function nipux_cli/db.py:2145

No docstring.

Calls: KeyError, _best_experiment_for_metric, _change_fingerprint, _insert_event, _json_dumps, _mark_best_experiments, _metadata_list, _metric_delta, bool, conn.execute, current.get, entry.get, evidence_artifact.strip, experiments.append, fetchone, format_metric_value

append_reflection

function nipux_cli/db.py:2276

No docstring.

Calls: KeyError, ValueError, _insert_event, _json_dumps, _metadata_list, conn.execute, fetchone, json.loads, reflections.append, self._write, strategy.strip, summary.strip, utc_now

op

function nipux_cli/db.py:2295

No docstring.

Calls: KeyError, _insert_event, _json_dumps, _metadata_list, conn.execute, fetchone, json.loads, reflections.append, strategy.strip

start_run

function nipux_cli/db.py:2322

No docstring.

Calls: _insert_event, conn.execute, new_id, self._write, utc_now

op

function nipux_cli/db.py:2326

No docstring.

Calls: _insert_event, conn.execute

finish_run

function nipux_cli/db.py:2350

No docstring.

Calls: conn.execute, self._write, utc_now

op

function nipux_cli/db.py:2353

No docstring.

Calls: conn.execute

mark_interrupted_running

function nipux_cli/db.py:2361

No docstring.

Calls: _json_dumps, conn.execute, int, self._write, utc_now

op

function nipux_cli/db.py:2365

No docstring.

Calls: _json_dumps, conn.execute, int

next_step_no

function nipux_cli/db.py:2392

No docstring.

Calls: fetchone, int, self._conn.execute

add_step

function nipux_cli/db.py:2396

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, new_id, self._write, self.next_step_no, utc_now

op

function nipux_cli/db.py:2411

No docstring.

Calls: _insert_event, _json_dumps, conn.execute

finish_step

function nipux_cli/db.py:2434

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, fetchone, self._write, utc_now

op

function nipux_cli/db.py:2445

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, fetchone

list_steps

function nipux_cli/db.py:2480

No docstring.

Calls: _row_to_dict, fetchall, int, self._conn.execute

job_record_counts

function nipux_cli/db.py:2519

No docstring.

Calls: fetchone, int, self._conn.execute

job_token_usage

function nipux_cli/db.py:2537

No docstring.

Calls: _as_float, _as_int, _json_loads, _nested_value, bool, fetchall, isinstance, metadata.get, self._conn.execute, str, usage.get

list_runs

function nipux_cli/db.py:2592

No docstring.

Calls: _row_to_dict, fetchall, self._conn.execute

add_artifact

function nipux_cli/db.py:2599

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, new_id, self._write, str, utc_now

op

function nipux_cli/db.py:2615

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, str

get_artifact

function nipux_cli/db.py:2650

No docstring.

Calls: KeyError, _row_to_dict, fetchone, self._conn.execute

list_artifacts

function nipux_cli/db.py:2657

No docstring.

Calls: _row_to_dict, fetchall, self._conn.execute

upsert_memory

function nipux_cli/db.py:2664

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, fetchone, new_id, self._write, str, utc_now

op

function nipux_cli/db.py:2675

No docstring.

Calls: _insert_event, _json_dumps, conn.execute, fetchone, str

list_memory

function nipux_cli/db.py:2704

No docstring.

Calls: _row_to_dict, fetchall, self._conn.execute

digest_exists

function nipux_cli/db.py:2711

No docstring.

Calls: fetchone, self._conn.execute

record_digest

function nipux_cli/db.py:2718

No docstring.

Calls: _insert_event, conn.execute, new_id, self._write, str, utc_now

op

function nipux_cli/db.py:2731

No docstring.

Calls: _insert_event, conn.execute, str, utc_now

_metadata_list

function nipux_cli/digest.py:17

No docstring.

Calls: isinstance, job.get, metadata.get

_active_operator_messages

function nipux_cli/digest.py:23

No docstring.

Calls: active_prompt_operator_entries, entry.get, str

_safe_int

function nipux_cli/digest.py:32

No docstring.

Calls: float, int

_latest_run_model

function nipux_cli/digest.py:39

No docstring.

Calls: db.list_runs, get, str

_usage_lines

function nipux_cli/digest.py:46

No docstring.

Calls: _format_compact_count, _format_usage_cost, _latest_run_model, _safe_int, bool, db.job_token_usage, lines.append, usage.get

render_job_digest

function nipux_cli/digest.py:86

No docstring.

Calls: _active_operator_messages, _metadata_list, _usage_lines, artifact.get, bool, db.get_job, db.list_artifacts, db.list_steps, entry.get, experiment.get, finding.get, float, int, item.get, join, len

send_digest_email

function nipux_cli/digest.py:212

No docstring.

Calls: EmailMessage, ValueError, all, message.set_content, smtp.login, smtp.send_message, smtp.starttls, smtplib.SMTP

render_daily_digest

function nipux_cli/digest.py:232

No docstring.

Calls: _active_operator_messages, _metadata_list, _usage_lines, artifact.get, db.list_artifacts, db.list_jobs, db.list_steps, entry.get, experiment.get, finding.get, float, int, item.get, join, len, lesson.get

write_daily_digest

function nipux_cli/digest.py:340

No docstring.

Calls: Path, body_path.write_text, config.runtime.digests_dir.mkdir, date.today, db.digest_exists, db.record_digest, email_result.get, isoformat, render_daily_digest, send_digest_email, str

Check

class nipux_cli/doctor.py:20

No docstring.

Calls: dataclass

as_dict

function nipux_cli/doctor.py:25

No docstring.

Calls: none

_check_writable_dir

function nipux_cli/doctor.py:29

No docstring.

Calls: Check, path.mkdir, probe.unlink, probe.write_text, str

_check_db

function nipux_cli/doctor.py:40

No docstring.

Calls: AgentDB, Check, db.close, str

_check_tool_surface

function nipux_cli/doctor.py:49

No docstring.

Calls: Check, DEFAULT_REGISTRY.names, join, len, set, sorted

_check_model_config

function nipux_cli/doctor.py:57

No docstring.

Calls: Check, host.endswith, lower, urlparse

_check_browser_runtime

function nipux_cli/doctor.py:72

No docstring.

Calls: Check, shutil.which

_check_model_endpoint

function nipux_cli/doctor.py:86

No docstring.

Calls: Check, _check_model_generation, _check_openrouter_auth, _model_available, config.model.base_url.rstrip, data.get, decode, isinstance, json.loads, len, response.read, urllib.request.Request, urllib.request.urlopen

_check_model_generation

function nipux_cli/doctor.py:116

No docstring.

Calls: Check, _extract_error_message, config.model.base_url.rstrip, data.get, decode, encode, exc.read, isinstance, json.dumps, json.loads, response.read, str, urllib.request.Request, urllib.request.urlopen

_check_openrouter_auth

function nipux_cli/doctor.py:163

No docstring.

Calls: Check, _extract_error_message, decode, exc.read, response.read, str, urllib.request.Request, urllib.request.urlopen

_extract_error_message

function nipux_cli/doctor.py:182

No docstring.

Calls: _extract_error_raw, _metadata_value, body.strip, data.get, details.append, error.get, isinstance, join, json.loads, primary.strip, str, strip

_metadata_value

function nipux_cli/doctor.py:212

No docstring.

Calls: isinstance, metadata.get, str, strip

_extract_error_raw

function nipux_cli/doctor.py:221

No docstring.

Calls: _extract_error_message, isinstance, json.dumps, json.loads, metadata.get, str, strip

_model_available

function nipux_cli/doctor.py:242

No docstring.

Calls: data.get, isinstance, item.get, str

run_doctor

function nipux_cli/doctor.py:249

No docstring.

Calls: _check_browser_runtime, _check_db, _check_model_config, _check_model_endpoint, _check_tool_surface, _check_writable_dir, checks.append, load_config

event_line

function nipux_cli/event_render.py:13

No docstring.

Calls: _one_line, event_display_parts

event_display_parts

function nipux_cli/event_render.py:19

No docstring.

Calls: _one_line, body.startswith, clean_step_summary, compact_time, event.get, event_label, format_metric_value, generic_display_text, isinstance, kind.startswith, metadata.get, min, shlex.quote, str, strip

event_label

function nipux_cli/event_render.py:77

No docstring.

Calls: kind.endswith, kind.startswith, kind.upper, labels.get, metadata.get, str

compact_time

function nipux_cli/event_render.py:114

No docstring.

Calls: _one_line, len, value.replace

FirstRunFrameDeps

class nipux_cli/first_run_controller.py:25

No docstring.

Calls: dataclass

handle_first_run_action

function nipux_cli/first_run_controller.py:36

No docstring.

Calls: TOGGLE_SETTING_COMMANDS.get, action.split, action.startswith, bool, config_field_value, deps.capture_command, deps.capture_setting_command, deps.model_setup_verified, deps.verify_model_setup

handle_first_run_frame_line

function nipux_cli/first_run_controller.py:80

No docstring.

Calls: deps.capture_command, deps.capture_setting_command, deps.current_default_job_id, deps.extract_objective, deps.model_setup_verified, deps.verify_model_setup, first_run_chat_reply, first_token, line.strip, lowered.startswith, original.lower, original.startswith, strip

first_run_chat_reply

function nipux_cli/first_run_controller.py:134

No docstring.

Calls: none

create_first_run_job

function nipux_cli/first_run_controller.py:139

No docstring.

Calls: deps.create_job, deps.model_setup_verified, objective.strip

capture_first_run_command

function nipux_cli/first_run_controller.py:153

No docstring.

Calls: StringIO, item.split, item.strip, join, print, redirect_stdout, run_shell_line, splitlines, stream.getvalue

first_token

function nipux_cli/first_run_controller.py:165

No docstring.

Calls: line.split, lower, shlex.split

FirstRunRuntimeDeps

class nipux_cli/first_run_frame_runtime.py:28

No docstring.

Calls: dataclass

run_first_run_frame

function nipux_cli/first_run_frame_runtime.py:36

No docstring.

Calls: _append_notice, _apply_first_run_action, _frame_enter_sequence, _frame_exit_sequence, _handle_edit_input, _handle_first_run_escape, _one_line, _safe_render_frame, _submit_first_run_line, autocomplete_slash, char.isprintable, clamp_selection, deps.actions, next_first_run_view_a...

clamp_selection

function nipux_cli/first_run_frame_runtime.py:178

No docstring.

Calls: len, max, min

_safe_render_frame

function nipux_cli/first_run_frame_runtime.py:184

No docstring.

Calls: _append_notice, _fallback_first_run_frame, _one_line, deps.render_frame, print, type

_fallback_first_run_frame

function nipux_cli/first_run_frame_runtime.py:203

No docstring.

Calls: _fit_plain, _one_line, join, lines.extend, max, shutil.get_terminal_size

_submit_first_run_line

function nipux_cli/first_run_frame_runtime.py:219

No docstring.

Calls: buffer.strip, clamp_selection, deps.actions, deps.handle_action, deps.handle_line, line.startswith

_handle_first_run_escape

function nipux_cli/first_run_frame_runtime.py:237

No docstring.

Calls: _apply_first_run_action, buffer.startswith, clamp_selection, cycle_slash, decode_terminal_escape, deps.actions, deps.click_action, deps.handle_action, drain_pending_input, isinstance, len, read_escape_sequence

directional_first_run_action

function nipux_cli/first_run_frame_runtime.py:295

Return the setup-screen action for left/right navigation.

Calls: key.startswith, label.lower, reversed

_apply_first_run_action

function nipux_cli/first_run_frame_runtime.py:309

No docstring.

Calls: _append_notice, isinstance, notices.clear, str, strip

_handle_edit_input

function nipux_cli/first_run_frame_runtime.py:340

No docstring.

Calls: _append_notice, _save_first_run_edit, char.isprintable, decode_terminal_escape, read_escape_sequence

required_first_run_edit_field

function nipux_cli/first_run_frame_runtime.py:372

No docstring.

Calls: get

next_first_run_view_after_edit

function nipux_cli/first_run_frame_runtime.py:380

No docstring.

Calls: get

_save_first_run_edit

function nipux_cli/first_run_frame_runtime.py:388

No docstring.

Calls: _is_local_endpoint, endswith, inline_setting_notice, load_config, parsed.path.rstrip, raw_value.strip, urlparse, value.lower

_is_local_endpoint

function nipux_cli/first_run_frame_runtime.py:415

No docstring.

Calls: host.endswith, lower, urlparse

_append_notice

function nipux_cli/first_run_frame_runtime.py:420

No docstring.

Calls: notices.append

_one_line

function nipux_cli/first_run_frame_runtime.py:425

No docstring.

Calls: join, len, max, split, str

_fit_plain

function nipux_cli/first_run_frame_runtime.py:432

No docstring.

Calls: _one_line, len, max, str

build_first_run_frame

function nipux_cli/first_run_tui.py:55

No docstring.

Calls: _clamp_first_run_selection, _compose_bar, _first_run_edit_hint, _first_run_hint, _first_run_prompt_label, _normalize_first_run_view, _wizard_body_lines, edit_target_masks_input, first_run_themed_lines, join, len, max

first_run_columns

function nipux_cli/first_run_tui.py:109

No docstring.

Calls: int, max, min

first_run_actions

function nipux_cli/first_run_tui.py:118

No docstring.

Calls: _normalize_first_run_view

first_run_themed_lines

function nipux_cli/first_run_tui.py:122

No docstring.

Calls: _themed_lines

_wizard_body_lines

function nipux_cli/first_run_tui.py:126

No docstring.

Calls: _access_page_lines, _api_page_lines, _append_notice_block, _doctor_page_lines, _endpoint_page_lines, _fit_page, _model_page_lines

_model_page_lines

function nipux_cli/first_run_tui.py:155

No docstring.

Calls: _accent, _bold, _center_ansi, _muted, _one_line, _panel, _step_count_label, _step_header, max, min

_endpoint_page_lines

function nipux_cli/first_run_tui.py:178

No docstring.

Calls: _bold, _center_ansi, _form_panel, _muted, _step_count_label, _step_header, min

_api_page_lines

function nipux_cli/first_run_tui.py:198

No docstring.

Calls: _bold, _center_ansi, _muted, _panel, _step_count_label, _step_header, _style, min

_access_page_lines

function nipux_cli/first_run_tui.py:223

No docstring.

Calls: _access_row, _action_cards, _bold, _center_ansi, _muted, _panel, _step_count_label, _step_header, first_run_actions, min

_doctor_page_lines

function nipux_cli/first_run_tui.py:243

No docstring.

Calls: _accent, _action_cards, _bold, _center_ansi, _fit_ansi, _muted, _panel, _step_count_label, _step_header, first_run_actions, min

_stepper_lines

function nipux_cli/first_run_tui.py:269

No docstring.

Calls: _accent, _fit_ansi, _muted, _step_state, lines.append

_step_header

function nipux_cli/first_run_tui.py:278

No docstring.

Calls: _accent, _bold, _center_ansi, _muted, enumerate, join, parts.append

_action_cards

function nipux_cli/first_run_tui.py:290

No docstring.

Calls: _action_tile, _center_ansi, _join_many_cards, enumerate, len, max, min, row.rstrip

_action_tile

function nipux_cli/first_run_tui.py:306

No docstring.

Calls: _accent, _action_value, _bold, _fit_ansi, _muted, _one_line, border, max

_panel

function nipux_cli/first_run_tui.py:329

No docstring.

Calls: _center_ansi, _fit_ansi, _muted, len, lines.append, max

_form_panel

function nipux_cli/first_run_tui.py:340

No docstring.

Calls: _accent, _bold, _muted, _one_line, _panel, max

_choice_card

function nipux_cli/first_run_tui.py:352

No docstring.

Calls: _accent, _bold, _fit_ansi, _muted, border, max

_join_cards

function nipux_cli/first_run_tui.py:366

No docstring.

Calls: _center_ansi, _strip_ansi, len, max, range, rows.append

_join_many_cards

function nipux_cli/first_run_tui.py:376

No docstring.

Calls: _fit_ansi, _strip_ansi, gap_text.join, len, max, range, row_parts.append, rows.append

_append_notice_block

function nipux_cli/first_run_tui.py:389

No docstring.

Calls: _accent, _bold, _fit_ansi, _one_line, len, max, min, notice_lines.append

_fit_page

function nipux_cli/first_run_tui.py:400

No docstring.

Calls: _fit_ansi, fitted.extend, len, range

_action_line

function nipux_cli/first_run_tui.py:407

No docstring.

Calls: _accent, _action_value, _bold, _fit_ansi, _muted, _one_line, max

_screen_value_lines

function nipux_cli/first_run_tui.py:425

No docstring.

Calls: _large_value, _muted

_large_value

function nipux_cli/first_run_tui.py:444

No docstring.

Calls: _accent, _bold, _fit_ansi, _muted, _one_line, len, max

_action_value

function nipux_cli/first_run_tui.py:449

No docstring.

Calls: bool, config_field_value, key.split, key.startswith, str

_step_state

function nipux_cli/first_run_tui.py:465

No docstring.

Calls: _is_local_endpoint, _one_line, bool, sum

_first_run_hint

function nipux_cli/first_run_tui.py:480

No docstring.

Calls: none

_first_run_edit_hint

function nipux_cli/first_run_tui.py:494

No docstring.

Calls: edit_target_hint

_first_run_prompt_label

function nipux_cli/first_run_tui.py:504

No docstring.

Calls: edit_target_label

_left_title

function nipux_cli/first_run_tui.py:514

No docstring.

Calls: _screen_heading

_screen_heading

function nipux_cli/first_run_tui.py:518

No docstring.

Calls: get

_screen_copy

function nipux_cli/first_run_tui.py:528

No docstring.

Calls: get

_install_summary

function nipux_cli/first_run_tui.py:538

No docstring.

Calls: _is_local_endpoint, _muted, _one_line

_normalize_first_run_view

function nipux_cli/first_run_tui.py:544

No docstring.

Calls: none

_step_count_label

function nipux_cli/first_run_tui.py:548

No docstring.

Calls: keys.index, len

_access_row

function nipux_cli/first_run_tui.py:557

No docstring.

Calls: _accent, _fit_ansi, _muted

_is_local_endpoint

function nipux_cli/first_run_tui.py:562

No docstring.

Calls: lowered.startswith, value.lower

_clamp_first_run_selection

function nipux_cli/first_run_tui.py:567

No docstring.

Calls: first_run_actions, len, max, min

load_frame_snapshot

function nipux_cli/frame_snapshot.py:16

Return the compact state bundle rendered by the chat TUI.

Calls: _summary_events, daemon_lock_status, db.get_job, db.job_record_counts, db.job_token_usage, db.list_artifacts, db.list_events, db.list_jobs, db.list_memory, db.list_steps, item.get, load_workspace_frame_snapshot, max, str

load_workspace_frame_snapshot

function nipux_cli/frame_snapshot.py:75

Return the chat/control frame before any worker job is focused.

Calls: _safe_job, _summary_events, _workspace_token_usage, daemon_lock_status, db.job_record_counts, db.job_token_usage, db.list_artifacts, db.list_events, db.list_jobs, db.list_memory, db.list_steps, focused_job.get, item.get, len, list, max

_safe_job

function nipux_cli/frame_snapshot.py:139

No docstring.

Calls: db.get_job

_workspace_token_usage

function nipux_cli/frame_snapshot.py:148

No docstring.

Calls: none

_summary_events

function nipux_cli/frame_snapshot.py:161

No docstring.

Calls: db.list_events, event.get, is_summary_event_candidate, len, max, merged.values, sorted, str

ToolCall

class nipux_cli/llm.py:18

No docstring.

Calls: dataclass, field

LLMResponse

class nipux_cli/llm.py:25

No docstring.

Calls: dataclass, field

LLMResponseError

class nipux_cli/llm.py:33

Raised when a provider returns an OpenAI-shaped response without choices.

Calls: __init__, super

__init__

function nipux_cli/llm.py:36

No docstring.

Calls: __init__, super

StepLLM

class nipux_cli/llm.py:41

No docstring.

Calls: none

next_action

function nipux_cli/llm.py:42

No docstring.

Calls: none

OpenAIChatLLM

class nipux_cli/llm.py:46

OpenAI-compatible chat-completions adapter.

Calls: LLMResponse, LLMResponseError, OpenAI, ToolCall, _enrich_openrouter_generation_usage, _is_unsupported_tool_choice_error, _response_id, _response_model, _response_payload, _response_usage, calls.append, dict, error.get, fallback_request.pop, isinstance, json.loads

__init__

function nipux_cli/llm.py:51

No docstring.

Calls: OpenAI

next_action

function nipux_cli/llm.py:67

No docstring.

Calls: LLMResponse, LLMResponseError, ToolCall, _enrich_openrouter_generation_usage, _is_unsupported_tool_choice_error, _response_id, _response_model, _response_payload, _response_usage, calls.append, dict, error.get, fallback_request.pop, isinstance, json.loads, payload.get

complete

function nipux_cli/llm.py:115

No docstring.

Calls: self.complete_response

complete_response

function nipux_cli/llm.py:118

No docstring.

Calls: LLMResponse, LLMResponseError, _enrich_openrouter_generation_usage, _response_id, _response_model, _response_payload, _response_usage, error.get, isinstance, payload.get, self._openai.chat.completions.create, str

ScriptedLLM

class nipux_cli/llm.py:146

Tiny deterministic LLM used by tests and CLI dry runs.

Calls: LLMResponse, list, self.responses.pop

__init__

function nipux_cli/llm.py:149

No docstring.

Calls: list

next_action

function nipux_cli/llm.py:152

No docstring.

Calls: LLMResponse, self.responses.pop
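
The ScriptedLLM entries above describe a deterministic stand-in that replays queued responses. The sketch below shows one way such a class could look, based only on the indexed calls (`list`, `self.responses.pop`, `LLMResponse`). `SketchResponse`, its fields, the `next_action` parameters, and the empty-response fallback are all assumptions, not the project's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class SketchResponse:
    """Hypothetical stand-in for LLMResponse; the real fields are not shown in this index."""

    content: str = ""
    tool_calls: list = field(default_factory=list)


class SketchScriptedLLM:
    """Replays a fixed queue of responses, one per call."""

    def __init__(self, responses):
        # Copy the input so the caller's list is not mutated by pop().
        self.responses = list(responses)

    def next_action(self, messages=None, tools=None):
        # Assumed behaviour: an exhausted script yields an empty response.
        if not self.responses:
            return SketchResponse()
        return self.responses.pop(0)
```

A test can then script an exact sequence of model turns:

```python
llm = SketchScriptedLLM([SketchResponse("first"), SketchResponse("second")])
replies = [llm.next_action().content for _ in range(3)]
```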

_response_payload

function nipux_cli/llm.py:159

No docstring.

Calls: hasattr, isinstance, repr, response.model_dump, response.to_dict

_response_usage

function nipux_cli/llm.py:169

No docstring.

Calls: _estimate_token_count, _response_payload, dict, getattr, hasattr, isinstance, json.dumps, payload.get, usage_obj.model_dump

_enrich_openrouter_generation_usage

function nipux_cli/llm.py:200

No docstring.

Calls: _safe_float, _safe_int, bool, data.get, decode, dict, enriched.get, int, isinstance, json.loads, payload.get, response.read, urllib.parse.quote, urllib.parse.urlparse, urllib.request.Request, urllib.request.urlopen

_safe_float

function nipux_cli/llm.py:242

No docstring.

Calls: float

_safe_int

function nipux_cli/llm.py:251

No docstring.

Calls: float, int

_is_unsupported_tool_choice_error

function nipux_cli/llm.py:260

No docstring.

Calls: any, lower, type

_estimate_token_count

function nipux_cli/llm.py:275

No docstring.

Calls: len, max

_response_model

function nipux_cli/llm.py:281

No docstring.

Calls: _response_payload, getattr, payload.get, str

_response_id

function nipux_cli/llm.py:286

No docstring.

Calls: _response_payload, getattr, payload.get, str

measurement_candidates

function nipux_cli/measurement.py:38

No docstring.

Calls: EXPLICIT_RESULT_UNIT_PATTERN.search, MEASUREMENT_INTENT_PATTERN.search, MEASUREMENT_PATTERN.finditer, _candidate_is_diagnostic_only, _table_measurement_candidates, bool, candidates.append, join, len, match.end, match.group, match.start, min, output.get, split, str

_table_measurement_candidates

function nipux_cli/measurement.py:68

No docstring.

Calls: TABLE_NUMBER_PATTERN.search, TABLE_UNIT_PATTERN.search, _is_markdown_separator_row, _split_markdown_table_row, _table_measurement_label, candidates.append, enumerate, header.strip, len, line.strip, number.group, splitlines, startswith, str, strip

_split_markdown_table_row

function nipux_cli/measurement.py:99

No docstring.

Calls: cell.strip, join, raw.endswith, raw.split, raw.startswith, split, str, strip

_is_markdown_separator_row

function nipux_cli/measurement.py:108

No docstring.

Calls: all, bool, cell.strip, re.fullmatch

_table_measurement_label

function nipux_cli/measurement.py:112

No docstring.

Calls: TABLE_NUMBER_PATTERN.fullmatch, enumerate, header.strip, len, lower, min, range, strip

measurement_candidates_are_diagnostic_only

function nipux_cli/measurement.py:126

No docstring.

Calls: MEASUREMENT_INTENT_PATTERN.search, _candidate_is_diagnostic_only, all, bool, str

_candidate_is_diagnostic_only

function nipux_cli/measurement.py:131

No docstring.

Calls: ACTION_MEASUREMENT_PATTERN.search, DIAGNOSTIC_MEASUREMENT_PATTERN.search, EXPLICIT_RESULT_UNIT_PATTERN.search, LABELED_MEASUREMENT_PATTERN.search, bool, re.search

memory_graph_from_job

function nipux_cli/memory_graph.py:53

No docstring.

Calls: graph.get, isinstance, job.get, metadata.get

memory_graph_for_prompt

function nipux_cli/memory_graph.py:65

No docstring.

Calls: _clean_string_list, _durable_signal_count, _edge_index, _node_contains_stale_token, _node_has_stale_id, clip_text, edge_index.get, isinstance, job.get, join, len, lines.append, max, memory_graph_from_job, metadata.get, node.get

_node_contains_stale_token

function nipux_cli/memory_graph.py:132

No docstring.

Calls: _clean_string_list, _node_has_negative_memory_marker, join, node.get, re.escape, re.search, str, strip, text.lower, token.startswith

_node_has_negative_memory_marker

function nipux_cli/memory_graph.py:157

No docstring.

Calls: any

_node_has_stale_id

function nipux_cli/memory_graph.py:161

No docstring.

Calls: node.get, str, strip

search_memory_graph

function nipux_cli/memory_graph.py:171

No docstring.

Calls: edge.get, graph.get, isinstance, max, node.get, rank_memory_nodes, str

rank_memory_nodes

function nipux_cli/memory_graph.py:185

No docstring.

Calls: _node_score, _tokens, max, sorted

_node_score

function nipux_cli/memory_graph.py:193

No docstring.

Calls: _clean_string_list, _float_between, _recency_score, int, join, lower, min, node.get, str

_edge_index

function nipux_cli/memory_graph.py:226

No docstring.

Calls: append, edge.get, index.setdefault, str

_tokens

function nipux_cli/memory_graph.py:239

No docstring.

Calls: re.findall, value.lower

_float_between

function nipux_cli/memory_graph.py:243

No docstring.

Calls: float, max, min

_recency_score

function nipux_cli/memory_graph.py:251

No docstring.

Calls: datetime.fromisoformat, datetime.now, max, total_seconds, value.replace

_clean_string_list

function nipux_cli/memory_graph.py:268

No docstring.

Calls: isinstance, join, split, str, strip

_durable_signal_count

function nipux_cli/memory_graph.py:274

No docstring.

Calls: isinstance, job.get, metadata.get, roadmap.get, sum

render_memory_graph_html

function nipux_cli/memory_graph_view.py:12

Return a standalone clickable graph page for a job's memory graph.

Calls: _view_edge, _view_node, escape, graph.get, job.get, json.dumps, memory_graph_from_job, replace, str

_view_node

function nipux_cli/memory_graph_view.py:271

No docstring.

Calls: _string_list, node.get, str

_view_edge

function nipux_cli/memory_graph_view.py:287

No docstring.

Calls: edge.get, str

_string_list

function nipux_cli/memory_graph_view.py:296

No docstring.

Calls: isinstance, join, split, str, strip

format_metric_value

function nipux_cli/metric_format.py:8

Return a readable metric string such as ``score=0.82`` or ``tokens=4200 tokens``.

Calls: metric_unit.startswith, str, strip
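
The docstring above gives two example outputs. The sketch below reproduces both under assumed rules: `sketch_format_metric` and its parameter names are hypothetical, and treating `%` as a unit that attaches without a space is a guess suggested only by the indexed `metric_unit.startswith` call.

```python
def sketch_format_metric(name, value, unit=""):
    """Render name=value with an optional unit suffix."""
    name = str(name or "").strip()
    unit = str(unit or "").strip()
    text = f"{name}={value}" if name else str(value)
    if not unit:
        return text
    # Assumed rule: "%"-style units attach directly, word units get a space.
    separator = "" if unit.startswith("%") else " "
    return f"{text}{separator}{unit}"
```

Usage, matching the two docstring examples:

```python
sketch_format_metric("score", 0.82)              # "score=0.82"
sketch_format_metric("tokens", 4200, "tokens")   # "tokens=4200 tokens"
```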

operator_entry_is_active

function nipux_cli/operator_context.py:27

No docstring.

Calls: entry.get, lower, replace, str, strip

operator_entry_is_prompt_relevant

function nipux_cli/operator_context.py:36

No docstring.

Calls: _actionable, _conversation_only, entry.get, lower, replace, str, strip

active_prompt_operator_entries

function nipux_cli/operator_context.py:50

No docstring.

Calls: isinstance, operator_entry_is_prompt_relevant

inactive_prompt_operator_ids

function nipux_cli/operator_context.py:59

No docstring.

Calls: entry.get, ids.append, isinstance, operator_entry_is_active, operator_entry_is_prompt_relevant, str

_conversation_only

function nipux_cli/operator_context.py:74

No docstring.

Calls: any, join, message.split, pattern.search

_actionable

function nipux_cli/operator_context.py:79

No docstring.

Calls: _conversation_only, any, join, message.split, pattern.search

_handler

function nipux_cli/parser_builder.py:13

No docstring.

Calls: none

build_arg_parser

function nipux_cli/parser_builder.py:17

No docstring.

Calls: _handler, activity.add_argument, activity.set_defaults, argparse.ArgumentParser, artifact.add_argument, artifact.set_defaults, artifacts.add_argument, artifacts.set_defaults, autostart.add_argument, autostart.set_defaults, browser_dashboard.add_argument, browser_dashboard.set_...

objective_profiles

function nipux_cli/planning.py:99

Infer generic work profiles from an objective without binding to a domain.

Calls: _PROFILE_PRIORITY.get, _PROFILE_TERMS.items, len, objective.lower, re.findall, scores.append, scores.sort, set
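
The objective_profiles docstring describes keyword-driven profile inference. A rough sketch of that idea follows, shaped by the indexed calls (`re.findall`, `set`, `scores.append`, `scores.sort`). The term table, priority table, profile names, and the ranking rule are invented for illustration. The real `_PROFILE_TERMS` and `_PROFILE_PRIORITY` contents are not shown in this index.

```python
import re

# Hypothetical tables; the real ones live in nipux_cli/planning.py.
SKETCH_PROFILE_TERMS = {
    "research": {"research", "sources", "survey", "paper"},
    "build": {"build", "implement", "code", "fix"},
    "measure": {"benchmark", "measure", "latency", "throughput"},
}
SKETCH_PROFILE_PRIORITY = {"build": 0, "measure": 1, "research": 2}


def sketch_objective_profiles(objective):
    """Rank profiles by keyword hits, breaking ties with a fixed priority."""
    words = set(re.findall(r"[a-z0-9]+", objective.lower()))
    scores = []
    for profile, terms in SKETCH_PROFILE_TERMS.items():
        hits = len(words & terms)
        if hits:
            # Negative hit count so that a plain ascending sort puts the best match first.
            scores.append((-hits, SKETCH_PROFILE_PRIORITY.get(profile, 99), profile))
    scores.sort()
    return [profile for _, _, profile in scores]
```

Because the tables hold generic work verbs rather than domain vocabulary, the same function applies to any objective text, which is the property the docstring emphasizes.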

initial_plan_for_objective

function nipux_cli/planning.py:115

No docstring.

Calls: _initial_questions_for_profiles, _initial_summary_for_profiles, _initial_tasks_for_profiles, join, objective.split, objective_profiles

initial_task_contract

function nipux_cli/planning.py:131

No docstring.

Calls: any, task_title.lower

initial_roadmap_for_objective

function nipux_cli/planning.py:222

No docstring.

Calls: _primary_execution_contract, join, objective_profiles

_initial_summary_for_profiles

function nipux_cli/planning.py:277

No docstring.

Calls: none

_initial_tasks_for_profiles

function nipux_cli/planning.py:292

No docstring.

Calls: tasks.extend

_initial_questions_for_profiles

function nipux_cli/planning.py:339

No docstring.

Calls: questions.append, questions.insert

_primary_execution_contract

function nipux_cli/planning.py:359

No docstring.

Calls: none

format_initial_plan

function nipux_cli/planning.py:373

No docstring.

Calls: isinstance, join, lines.append, lines.extend, plan.get, str

ProgressCheckpoint

class nipux_cli/progress.py:10

No docstring.

Calls: dataclass

build_progress_checkpoint

function nipux_cli/progress.py:23

Create the operator-facing checkpoint text from durable ledger deltas.

Calls: ProgressCheckpoint, _as_int, _count_phrase, bool, changed_parts.extend, deltas.items, join, ledger_counts, ledger_resolution_counts, ledger_update_counts, metadata.get, previous.get, recent_progress_bits, resolutions.items, str, updates.items
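
The build_progress_checkpoint docstring describes turning ledger deltas into operator-facing text. The sketch below illustrates the general shape of such a step: subtract the previous counts from the current ones and phrase the positive differences. Function names, the wording of the phrases, and the naive singular rule are assumptions. The real function also folds in update counts, resolution counts, and recent progress bits, which are omitted here.

```python
def sketch_count_phrase(deltas):
    """Turn {"findings": 1, "tasks": 2} into "1 new finding, 2 new tasks"."""
    bits = []
    for key, count in deltas.items():
        if count <= 0:
            continue
        # Naive singular form: drop a trailing "s" when exactly one record was added.
        label = key[:-1] if count == 1 and key.endswith("s") else key
        bits.append(f"{count} new {label}")
    return ", ".join(bits)


def sketch_checkpoint(previous, current):
    """Describe what grew between two ledger-count snapshots."""
    deltas = {key: int(value) - int(previous.get(key, 0)) for key, value in current.items()}
    return sketch_count_phrase(deltas) or "no durable ledger change"
```

For example, `sketch_checkpoint({"findings": 2, "tasks": 5}, {"findings": 3, "tasks": 7})` yields `"1 new finding, 2 new tasks"`.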

ledger_counts

function nipux_cli/progress.py:76

No docstring.

Calls: _metadata_list, isinstance, len, metadata.get, roadmap.get

ledger_update_counts

function nipux_cli/progress.py:89

Count durable ledger updates that do not increase ledger size.

Calls: _as_int, _record_after_checkpoint, _updated_existing_record, isinstance, metadata.get, record_map.items, roadmap.get

ledger_resolution_counts

function nipux_cli/progress.py:116

Count durable branch resolutions so task updates do not look like empty churn.

Calls: _record_after_checkpoint, _updated_existing_record, experiment.get, isinstance, lower, metadata.get, str, task.get

recent_progress_bits

function nipux_cli/progress.py:135

No docstring.

Calls: _as_int, _clip_text, _metadata_list, bits.append, entry.get, experiment.get, finding.get, get, isinstance, join, lower, metadata.get, milestone.get, roadmap.get, sorted, str

_metadata_list

function nipux_cli/progress.py:171

No docstring.

Calls: isinstance, metadata.get

_clip_text

function nipux_cli/progress.py:178

No docstring.

Calls: join, len, max, rstrip, value.split

_updated_existing_record

function nipux_cli/progress.py:185

No docstring.

Calls: _record_after_checkpoint, isinstance, record.get

_record_after_checkpoint

function nipux_cli/progress.py:194

No docstring.

Calls: bool, record.get, str

_count_phrase

function nipux_cli/progress.py:201

No docstring.

Calls: bits.append, join, key.endswith

_as_int

function nipux_cli/progress.py:209

No docstring.

Calls: int

provider_error_text

function nipux_cli/provider_errors.py:43

No docstring.

Calls: error.lower, getattr, isinstance, join, json.dumps, lower, parts.append, str, type

provider_action_required

function nipux_cli/provider_errors.py:53

No docstring.

Calls: any, provider_error_text

provider_action_required_note

function nipux_cli/provider_errors.py:58

No docstring.

Calls: provider_action_required

provider_rate_limited

function nipux_cli/provider_errors.py:62

No docstring.

Calls: any, provider_error_text

RecordCommandDeps

class nipux_cli/record_commands.py:22

No docstring.

Calls: dataclass

cmd_findings_impl

function nipux_cli/record_commands.py:28

No docstring.

Calls: _metadata_records, _one_line, _resolve_or_print, db.close, db.get_job, deps.db_factory, enumerate, finding.get, float, isinstance, join, json.dumps, len, print, rule, sorted

cmd_tasks_impl

function nipux_cli/record_commands.py:68

No docstring.

Calls: _metadata_records, _one_line, _resolve_or_print, db.close, db.get_job, deps.db_factory, enumerate, int, join, json.dumps, len, lower, print, rule, sorted, status.strip

cmd_roadmap_impl

function nipux_cli/record_commands.py:119

No docstring.

Calls: _one_line, _print_milestones, _resolve_or_print, db.close, db.get_job, deps.db_factory, isinstance, job.get, json.dumps, len, metadata.get, print, roadmap.get, rule

cmd_experiments_impl

function nipux_cli/record_commands.py:151

No docstring.

Calls: _metadata_records, _one_line, _resolve_or_print, bool, db.close, db.get_job, deps.db_factory, enumerate, experiment.get, join, json.dumps, len, lower, print, rule, sorted

cmd_sources_impl

function nipux_cli/record_commands.py:208

No docstring.

Calls: _metadata_records, _one_line, _resolve_or_print, db.close, db.get_job, deps.db_factory, enumerate, float, int, isinstance, join, json.dumps, len, print, rule, sorted

cmd_memory_impl

function nipux_cli/record_commands.py:252

No docstring.

Calls: _metadata_records, _print_memory_sections, _resolve_or_print, _write_memory_graph_view, active_operator_messages, bool, db.close, db.get_job, db.list_memory, deps.db_factory, getattr, isinstance, job.get, json.dumps, len, memory_graph_from_job

_write_memory_graph_view

function nipux_cli/record_commands.py:296

No docstring.

Calls: ArtifactStore, Path, db.add_artifact, expanduser, len, path.parent.mkdir, path.write_text, print, render_memory_graph_html, sha256_text, store.write_text, str

cmd_metrics_impl

function nipux_cli/record_commands.py:340

No docstring.

Calls: _metadata_records, _print_best_records, _resolve_or_print, _step_count, artifact.get, bool, daemon_lock_status, db.close, db.get_job, db.list_artifacts, db.list_steps, deps.db_factory, isinstance, job.get, len, lower

cmd_usage_impl

function nipux_cli/record_commands.py:381

No docstring.

Calls: _resolve_or_print, db.close, db.get_job, db.job_token_usage, deps.db_factory, format_usage_report, int, job.get, join, json.dumps, print, str

_resolve_or_print

function nipux_cli/record_commands.py:407

No docstring.

Calls: deps.job_ref_text, deps.resolve_job_id, print

_metadata_records

function nipux_cli/record_commands.py:416

No docstring.

Calls: isinstance, job.get, metadata.get

_print_milestones

function nipux_cli/record_commands.py:424

No docstring.

Calls: _one_line, enumerate, feature.get, int, isinstance, join, len, max, milestone.get, min, print, sorted, status_order.get, str, sum

_print_memory_sections

function nipux_cli/record_commands.py:467

No docstring.

Calls: _one_line, entry.get, graph.get, isinstance, join, lesson.get, min, node.get, pending_measurement.get, print, reflection.get, str

_print_best_records

function nipux_cli/record_commands.py:519

No docstring.

Calls: _one_line, best.get, best_experiment.get, best_finding.get, experiment.get, finding.get, float, max, print, source.get

_step_count

function nipux_cli/record_commands.py:540

No docstring.

Calls: int, max, step.get

job_deferred_until

function nipux_cli/scheduling.py:9

No docstring.

Calls: astimezone, datetime.fromisoformat, datetime.now, isinstance, job.get, metadata.get, raw_until.replace, str, strip, until.astimezone, until.replace

job_is_deferred

function nipux_cli/scheduling.py:25

No docstring.

Calls: job_deferred_until

job_provider_blocked

function nipux_cli/scheduling.py:29

Return true when provider calls need operator action before retrying.

Calls: _metadata_time, isinstance, job.get, metadata.get, str, strip

provider_retry_metadata

function nipux_cli/scheduling.py:46

Metadata patch used when the operator explicitly retries provider work.

Calls: datetime.now, isoformat

operator_resume_metadata

function nipux_cli/scheduling.py:55

Metadata patch used when the operator explicitly makes a job runnable.

Calls: patch.update, provider_retry_metadata
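
The three scheduling entries above (job_provider_blocked, provider_retry_metadata, operator_resume_metadata) suggest a timestamp-based handshake: a job stays blocked until the operator records a retry that is newer than the block. The sketch below shows one plausible form of that handshake. The metadata keys `provider_blocked_at` and `provider_retry_requested_at` and all function names are hypothetical. The index does not reveal the real keys or comparison rule.

```python
from datetime import datetime, timezone


def sketch_parse_time(value):
    """Parse an ISO timestamp into aware UTC, or return None when it is unusable."""
    try:
        # Replace a trailing "Z" so older Python versions accept the string.
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def sketch_provider_blocked(job):
    """True while a provider block has no newer operator retry recorded."""
    metadata = job.get("metadata")
    if not isinstance(metadata, dict):
        return False
    blocked_at = sketch_parse_time(metadata.get("provider_blocked_at", ""))
    if blocked_at is None:
        return False
    retry_at = sketch_parse_time(metadata.get("provider_retry_requested_at", ""))
    return retry_at is None or retry_at < blocked_at


def sketch_retry_patch(now=None):
    """Metadata patch an explicit operator retry might write."""
    now = now or datetime.now(timezone.utc)
    return {"provider_retry_requested_at": now.isoformat()}
```

Comparing timestamps instead of clearing a flag keeps the block record in the audit trail while still letting the daemon pick the job up again.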

_metadata_time

function nipux_cli/scheduling.py:69

No docstring.

Calls: datetime.fromisoformat, parsed.astimezone, parsed.replace, value.replace

launch_agent_path

function nipux_cli/service_install.py:16

No docstring.

Calls: Path.home

launch_agent_plist

function nipux_cli/service_install.py:20

No docstring.

Calls: Path.cwd, command.append, config.ensure_dirs, join, load_config, str, xml_escape

systemd_service_path

function nipux_cli/service_install.py:64

No docstring.

Calls: Path.home

systemd_service_text

function nipux_cli/service_install.py:68

No docstring.

Calls: Path.cwd, command.append, config.ensure_dirs, join, load_config, shlex.quote, str

cmd_autostart

function nipux_cli/service_install.py:102

No docstring.

Calls: SystemExit, launch_agent_path, launch_agent_plist, os.getuid, path.exists, path.parent.mkdir, path.unlink, path.write_text, print, str, subprocess.run

cmd_service

function nipux_cli/service_install.py:139

No docstring.

Calls: SystemExit, path.exists, path.parent.mkdir, path.unlink, path.write_text, print, result.stderr.strip, result.stdout.strip, shutil.which, subprocess.run, systemd_service_path, systemd_service_text

xml_escape

function nipux_cli/service_install.py:178

No docstring.

Calls: replace, value.replace

config_field_value

function nipux_cli/settings.py:16

No docstring.

Calls: load_config, str, values.get

save_config_field

function nipux_cli/settings.py:40

No docstring.

Calls: _coerce_config_value, _load_config_yaml, _save_config_yaml, data.setdefault, field.split, isinstance

inline_setting_notice

function nipux_cli/settings.py:53

No docstring.

Calls: _save_env_secret, _short_path, clear_model_setup_verified, get_agent_home, load_config, raw_value.strip, save_config_field

edit_target_label

function nipux_cli/settings.py:71

No docstring.

Calls: none

edit_target_hint

function nipux_cli/settings.py:77

No docstring.

Calls: config_field_value, load_config

edit_target_masks_input

function nipux_cli/settings.py:86

No docstring.

Calls: none

_config_path

function nipux_cli/settings.py:90

No docstring.

Calls: get_agent_home

_load_config_yaml

function nipux_cli/settings.py:94

No docstring.

Calls: _config_path, default_config_yaml, isinstance, path.exists, path.read_text, yaml.safe_load

_save_config_yaml

function nipux_cli/settings.py:103

No docstring.

Calls: _config_path, write_private_text, yaml.safe_dump

_save_env_secret

function nipux_cli/settings.py:108

No docstring.

Calls: current.strip, env_path.exists, env_path.parent.mkdir, env_path.read_text, existing.items, get_agent_home, join, key.strip, raw.split, raw.strip, splitlines, startswith, write_private_text

_coerce_config_value

function nipux_cli/settings.py:124

No docstring.

Calls: Path, SETTINGS_FIELD_TYPES.get, ValueError, expanduser, float, int, raw_value.strip, str, value.lower

_short_path

function nipux_cli/settings.py:145

No docstring.

Calls: Path.home, len, max, str, text.startswith

handle_chat_setting_command

function nipux_cli/settings_commands.py:14

No docstring.

Calls: config_field_value, config_summary_lines, inline_setting_notice, join, load_config, print

config_summary_lines

function nipux_cli/settings_commands.py:38

No docstring.

Calls: _cost_limit_text, _rate_text, load_config

_rate_text

function nipux_cli/settings_commands.py:64

No docstring.

Calls: none

_cost_limit_text

function nipux_cli/settings_commands.py:68

No docstring.

Calls: none

capture_setting_command

function nipux_cli/settings_commands.py:72

No docstring.

Calls: StringIO, handle_chat_setting_command, item.split, item.strip, join, line.startswith, print, redirect_stdout, shlex.split, splitlines, stream.getvalue

write_file

function nipux_cli/shell_tools.py:17

No docstring.

Calls: Path, Path.cwd, _json, args.get, bool, expanduser, fh.write, lower, path.exists, path.is_absolute, path.is_dir, path.open, path.parent.mkdir, path.stat, str, strip

shell_exec

function nipux_cli/shell_tools.py:47

No docstring.

Calls: Path, _json, _kill_process_group, _register_shell_process, _shell_error, _terminate_process_group, _truncate_output, _unregister_shell_process, args.get, dict, exists, expanduser, float, int, isinstance, max

cleanup_registered_shell_processes

function nipux_cli/shell_tools.py:126

No docstring.

Calls: _as_int, _pid_exists, _read_shell_process_registry, _shell_process_registry_path, _write_shell_process_registry, cleaned.append, contextlib.suppress, datetime.now, dict, isoformat, os.killpg, record.get, survivors.append, time.sleep

_shell_error

function nipux_cli/shell_tools.py:157

No docstring.

Calls: _missing_executable_probe, _shell_success_anomaly, combined.lower, combined.split, join, part.strip

_shell_success_anomaly

function nipux_cli/shell_tools.py:173

No docstring.

Calls: _empty_observation_probe, _missing_executable_probe, _shell_build_error_anomaly, _shell_http_error_anomaly, _shell_missing_command_anomaly, _shell_sudo_password_anomaly, any, combined.lower, combined.split, join, part.strip

_missing_executable_probe

function nipux_cli/shell_tools.py:212

No docstring.

Calls: combined_output.lower, combined_output.strip, match.group, re.match, str, strip

_empty_observation_probe

function nipux_cli/shell_tools.py:222

No docstring.

Calls: re.match, str, strip

_shell_missing_command_anomaly

function nipux_cli/shell_tools.py:231

No docstring.

Calls: bool, re.search

_shell_sudo_password_anomaly

function nipux_cli/shell_tools.py:240

No docstring.

Calls: none

_shell_build_error_anomaly

function nipux_cli/shell_tools.py:244

No docstring.

Calls: bool, re.search, text.lower

_shell_http_error_anomaly

function nipux_cli/shell_tools.py:253

No docstring.

Calls: any

_terminate_process_group

function nipux_cli/shell_tools.py:259

No docstring.

Calls: os.killpg

_kill_process_group

function nipux_cli/shell_tools.py:266

No docstring.

Calls: os.killpg

_register_shell_process

function nipux_cli/shell_tools.py:273

No docstring.

Calls: _shell_process_registry_path, datetime.now, getattr, handle.write, isoformat, json.dumps, path.open, path.parent.mkdir

_unregister_shell_process

function nipux_cli/shell_tools.py:291

No docstring.

Calls: _as_int, _read_shell_process_registry, _shell_process_registry_path, _write_shell_process_registry, record.get

_shell_process_registry_path

function nipux_cli/shell_tools.py:297

No docstring.

Calls: Path, expanduser

_read_shell_process_registry

function nipux_cli/shell_tools.py:301

No docstring.

Calls: isinstance, json.loads, path.exists, path.read_text, records.append, splitlines

_write_shell_process_registry

function nipux_cli/shell_tools.py:315

No docstring.

Calls: contextlib.suppress, join, json.dumps, path.parent.mkdir, path.unlink, path.write_text

_pid_exists

function nipux_cli/shell_tools.py:324

No docstring.

Calls: os.kill

_as_int

function nipux_cli/shell_tools.py:332

No docstring.

Calls: int

_truncate_output

function nipux_cli/shell_tools.py:339

No docstring.

Calls: isinstance, len, str, value.decode

_json

function nipux_cli/shell_tools.py:347

No docstring.

Calls: json.dumps

anti_bot_reason

function nipux_cli/source_quality.py:19

Return a short reason if text looks like an anti-bot interstitial.

Calls: join, lower
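
The anti_bot_reason docstring describes a text heuristic for interstitial pages. A minimal sketch of that kind of check follows, built from the two indexed calls (`join`, `lower`). The marker phrases, the reason strings, and the empty-string return for clean text are assumptions, since the real marker list is not shown in this index.

```python
# Hypothetical (needle, reason) pairs; the real list lives in nipux_cli/source_quality.py.
SKETCH_MARKERS = (
    ("captcha", "captcha challenge"),
    ("verify you are human", "human verification"),
    ("access denied", "access denied page"),
)


def sketch_anti_bot_reason(*parts):
    """Return a short reason when the combined text matches a known marker, else ""."""
    text = " ".join(str(part or "") for part in parts).lower()
    for needle, reason in SKETCH_MARKERS:
        if needle in text:
            return reason
    return ""
```

Passing the title and body as separate arguments, for example `sketch_anti_bot_reason(title, body)`, lets a caller check both without concatenating them first.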

task_key

function nipux_cli/task_match.py:30

No docstring.

Calls: lower, re.sub, strip

find_semantic_task_match

function nipux_cli/task_match.py:34

No docstring.

Calls: _task_similarity, _task_tokens, isinstance, len, lower, replace, round, str, strip, task.get, task_key

_task_tokens

function nipux_cli/task_match.py:87

No docstring.

Calls: len, lower, re.findall, str

_task_similarity

function nipux_cli/task_match.py:96

No docstring.

Calls: len, max, min

program_for_job

function nipux_cli/templates.py:6

No docstring.

Calls: _TEMPLATES.get, body, lower, strip

_generic_template

function nipux_cli/templates.py:12

No docstring.

Calls: none

_research_paper_template

function nipux_cli/templates.py:37

No docstring.

Calls: none

ToolContext

class nipux_cli/tools.py:25

No docstring.

Calls: dataclass

ToolSpec

class nipux_cli/tools.py:58

No docstring.

Calls: dataclass

as_openai_tool

function nipux_cli/tools.py:64

No docstring.

Calls: none

_missing_argument

function nipux_cli/tools.py:75

No docstring.

Calls: isinstance, re.fullmatch, stripped.lower, value.strip

_placeholder_argument

function nipux_cli/tools.py:96

No docstring.

Calls: _missing_argument, bool, isinstance, re.search, strip, value.strip

_schema_placeholder_arguments

function nipux_cli/tools.py:109

No docstring.

Calls: REFERENCE_LIKE_FIELD_PATTERN.search, _placeholder_argument, _schema_placeholder_arguments, enumerate, isinstance, placeholders.append, placeholders.extend, properties.items, schema.get, str, value.get

_schema_missing_arguments

function nipux_cli/tools.py:129

No docstring.

Calls: _missing_argument, _schema_missing_arguments, enumerate, isinstance, missing.append, missing.extend, properties.items, schema.get, str, value.get

_json

function nipux_cli/tools.py:161

No docstring.

Calls: json.dumps

_write_artifact

function nipux_cli/tools.py:165

No docstring.

Calls: _json, args.get, ctx.artifacts.write_text, isinstance, str

_read_artifact

function nipux_cli/tools.py:187

No docstring.

Calls: _json, _recent_artifact_refs, _resolve_artifact_ref, args.get, ctx.artifacts.read_text, resolved.get, str

_recent_artifact_refs

function nipux_cli/tools.py:211

No docstring.

Calls: artifact.get, ctx.db.list_artifacts, enumerate, refs.append, str

_resolve_artifact_ref

function nipux_cli/tools.py:224

No docstring.

Calls: artifact.get, artifact_ref.strip, ctx.db.list_artifacts, int, join, len, lower, ref.isdigit, ref.lower, str, strip

_search_artifacts

function nipux_cli/tools.py:248

No docstring.

Calls: _json, args.get, ctx.artifacts.search_text, int, str

_update_job_state

function nipux_cli/tools.py:254

No docstring.

Calls: _append_completion_audit_task, _json, args.get, ctx.db.append_agent_update, ctx.db.update_job_status, follow_up_task.get, lower, str, strip

_defer_job

function nipux_cli/tools.py:297

No docstring.

Calls: _defer_until, _json, args.get, ctx.db.append_agent_update, ctx.db.get_job, ctx.db.update_job_status, job.get, str, strip, until.isoformat

_defer_until

function nipux_cli/tools.py:332

No docstring.

Calls: args.get, datetime.fromisoformat, datetime.now, float, max, parsed.astimezone, parsed.replace, raw_until.replace, str, strip, timedelta

_report_update

function nipux_cli/tools.py:350

No docstring.

Calls: _append_completion_audit_task, _json, _perpetual_checkpoint_message, args.get, ctx.db.append_agent_update, follow_up_task.get, isinstance, lower, metadata.get, str, strip

_append_completion_audit_task

function nipux_cli/tools.py:371

No docstring.

Calls: ctx.db.append_task_record

_perpetual_checkpoint_message

function nipux_cli/tools.py:409

Keep worker reports checkpoint-oriented without hiding the underlying audit trail.

Calls: join, leading_claim.search, leading_claim.sub, re.compile, split, str, strip, whole_job_claim.search, whole_job_claim.sub

_record_lesson

function nipux_cli/tools.py:432

No docstring.

Calls: _json, _lesson_explains_measurement_obligation, _pending_measurement, _resolve_measurement_obligation, args.get, ctx.db.append_lesson, float, isinstance, lower, str, strip

_lesson_explains_measurement_obligation

function nipux_cli/tools.py:463

No docstring.

Calls: any, isinstance, join, lower, metadata.values, parts.append, str

_record_memory_graph

function nipux_cli/tools.py:512

No docstring.

Calls: _json, args.get, ctx.db.append_agent_update, ctx.db.append_memory_graph_records, isinstance, record.get

_search_memory_graph

function nipux_cli/tools.py:536

No docstring.

Calls: _json, args.get, ctx.db.get_job, int, memory_graph_from_job, search_memory_graph, str

_acknowledge_operator_context

function nipux_cli/tools.py:545

No docstring.

Calls: _acknowledgeable_operator_messages, _json, args.get, ctx.db.acknowledge_operator_messages, ctx.db.append_agent_update, ctx.db.get_job, entry.get, isinstance, lower, result.get, str, strip

_acknowledgeable_operator_messages

function nipux_cli/tools.py:583

No docstring.

Calls: entry.get, isinstance, job.get, lower, metadata.get, pending.append, replace, str, strip

_record_source

function nipux_cli/tools.py:605

No docstring.

Calls: _json, _source_has_assessment, args.get, ctx.db.append_source_record, float, int, isinstance, str, strip

_source_has_assessment

function nipux_cli/tools.py:649

No docstring.

Calls: any, bool, metadata.values, outcome.strip, str, strip

_record_findings

function nipux_cli/tools.py:669

No docstring.

Calls: _finding_has_evidence, _json, args.get, ctx.db.append_agent_update, ctx.db.append_finding_record, ctx.db.append_source_record, entry.get, finding.get, float, isinstance, len, min, rejected.append, round, source_records.append, source_yields.get

_finding_has_evidence

function nipux_cli/tools.py:779

No docstring.

Calls: any, evidence_artifact.strip, metadata.get, reason.strip, source_url.strip, str, strip, url.strip

_record_tasks

function nipux_cli/tools.py:802

No docstring.

Calls: _complete_task_contract, _json, _pending_measurement, _resolve_measurement_obligation, _semantic_task_match_under_pressure, _task_queue_pressure_active, _task_targets_measurement_obligation, _task_would_be_unchanged, _validated_task_status, args.get, ctx.db.append_agent_update...

_task_targets_measurement_obligation

function nipux_cli/tools.py:977

No docstring.

Calls: _metadata_scalar_text, _text_mentions_measurement, _text_mentions_measurement_accounting, join, lower, output_contract.strip

_task_would_be_unchanged

function nipux_cli/tools.py:1006

No docstring.

Calls: _task_change_fingerprint, acceptance_criteria.strip, after.get, ctx.db.get_job, dict, entry.get, evidence_needed.strip, goal.strip, int, isinstance, items, job.get, job_metadata.get, lower, merged_metadata.update, next

_task_change_fingerprint

function nipux_cli/tools.py:1081

No docstring.

Calls: entry.get, json.dumps

_task_queue_pressure_active

function nipux_cli/tools.py:1085

No docstring.

Calls: _task_is_guard_recovery, ctx.db.get_job, isinstance, job.get, len, lower, metadata.get, replace, str, strip, task.get

_semantic_task_match_under_pressure

function nipux_cli/tools.py:1101

No docstring.

Calls: _task_is_guard_recovery, ctx.db.get_job, find_semantic_task_match, isinstance, job.get, metadata.get

_task_is_guard_recovery

function nipux_cli/tools.py:1115

No docstring.

Calls: bool, isinstance, lower, metadata.get, startswith, str, strip, task.get

_text_mentions_measurement

function nipux_cli/tools.py:1120

No docstring.

Calls: any

_text_mentions_measurement_accounting

function nipux_cli/tools.py:1135

No docstring.

Calls: any

_metadata_scalar_text

function nipux_cli/tools.py:1173

No docstring.

Calls: isinstance, join, metadata.values, parts.append, str

_complete_task_contract

function nipux_cli/tools.py:1181

No docstring.

Calls: acceptance_criteria.strip, dict, evidence_needed.strip, inferred.append, initial_task_contract, isinstance, metadata.get, output_contract.strip, set, sorted, stall_behavior.strip, str

_validated_task_status

function nipux_cli/tools.py:1212

No docstring.

Calls: _recent_deliverable_evidence, _task_contract_completion_has_evidence, _task_metadata_has_completion_evidence, dict, lower, output_contract.strip, replace, result.strip, status.strip

_task_contract_completion_has_evidence

function nipux_cli/tools.py:1245

No docstring.

Calls: _shell_command_counts_as_action_evidence, _task_metadata_has_completion_evidence, args.get, ctx.db.list_steps, input_data.get, isinstance, recent_evidence_tools.get, reversed, step.get, str

_task_metadata_has_completion_evidence

function nipux_cli/tools.py:1275

No docstring.

Calls: any, evidence_keys.update, metadata.get, str, strip

_recent_deliverable_evidence

function nipux_cli/tools.py:1295

No docstring.

Calls: _artifact_args_look_like_deliverable, _shell_command_looks_like_write, args.get, ctx.db.list_steps, input_data.get, isinstance, reversed, step.get, str

_artifact_args_look_like_deliverable

function nipux_cli/tools.py:1313

No docstring.

Calls: any, args.get, join, lower, str

_shell_command_looks_like_write

function nipux_cli/tools.py:1322

No docstring.

Calls: any, command.strip, re.search

_shell_command_counts_as_action_evidence

function nipux_cli/tools.py:1338

No docstring.

Calls: _shell_command_looks_like_write, bool, command.strip, re.compile, re.match, re.search, read_only.search

_record_roadmap

function nipux_cli/tools.py:1361

No docstring.

Calls: _json, args.get, ctx.db.append_agent_update, ctx.db.append_roadmap_record, isinstance, len, roadmap.get, str, strip

_record_milestone_validation

function nipux_cli/tools.py:1395

No docstring.

Calls: _json, _validation_has_positive_evidence, args.get, ctx.db.append_agent_update, ctx.db.append_milestone_validation_record, ctx.db.append_task_record, follow_up_tasks.append, get, int, isinstance, len, lower, replace, str, strip, task.get

_validation_has_positive_evidence

function nipux_cli/tools.py:1490

No docstring.

Calls: any, evidence.strip, metadata.get, result.strip, str, strip

_record_experiment

function nipux_cli/tools.py:1506

No docstring.

Calls: _experiment_has_closed_trial_context, _json, _optional_float, _resolve_measurement_obligation, args.get, bool, ctx.db.append_agent_update, ctx.db.append_experiment_record, format_metric_value, isinstance, lower, record.get, str, strip

_experiment_has_closed_trial_context

function nipux_cli/tools.py:1595

No docstring.

Calls: any, args.get, config.values, isinstance, metadata.values, str, strip

_optional_float

function nipux_cli/tools.py:1609

No docstring.

Calls: float, isinstance, replace, value.strip

_pending_measurement

function nipux_cli/tools.py:1625

No docstring.

Calls: ctx.db.get_job, isinstance, job.get, metadata.get, obligation.get

_resolve_measurement_obligation

function nipux_cli/tools.py:1637

No docstring.

Calls: _pending_measurement, ctx.db.append_agent_update, ctx.db.update_job_metadata, dict, resolved.update, time.gmtime, time.strftime

_send_digest_email

function nipux_cli/tools.py:1672

No docstring.

Calls: _json, args.get, ctx.artifacts.write_text, send_digest_email, str

_browser_call

function nipux_cli/tools.py:1691

No docstring.

Calls: KeyError, _json, args.get, bool, browser.back, browser.click, browser.console, browser.fill, browser.navigate, browser.press, browser.scroll, browser.snapshot, str

_web_call

function nipux_cli/tools.py:1714

No docstring.

Calls: KeyError, _json, args.get, int, isinstance, str, web_extract, web_search

_browser_handler

function nipux_cli/tools.py:1726

No docstring.

Calls: _browser_call

_web_handler

function nipux_cli/tools.py:1730

No docstring.

Calls: _web_call

ToolRegistry

class nipux_cli/tools.py:2151

No docstring.

Calls: REQUIRED_ARGUMENT_ALIASES.get, REQUIRED_ARGUMENT_GROUPS.get, _json, _missing_argument, _schema_missing_arguments, _schema_placeholder_arguments, _tool_access_group, _tool_enabled, aliases.get, all, args.get, as_openai_tool, dict, handler, isinstance, join

__init__

function nipux_cli/tools.py:2152

No docstring.

Calls: none

names

function nipux_cli/tools.py:2155

No docstring.

Calls: sorted

openai_tools

function nipux_cli/tools.py:2158

No docstring.

Calls: _tool_enabled, as_openai_tool, self.names

validate_arguments

function nipux_cli/tools.py:2161

No docstring.

Calls: REQUIRED_ARGUMENT_ALIASES.get, REQUIRED_ARGUMENT_GROUPS.get, _missing_argument, _schema_missing_arguments, _schema_placeholder_arguments, _tool_enabled, aliases.get, all, args.get, dict, isinstance, join, missing.append, missing.extend, spec.parameters.get, str

handle

function nipux_cli/tools.py:2195

No docstring.

Calls: _json, _tool_access_group, _tool_enabled, handler

_tool_enabled

function nipux_cli/tools.py:2208

No docstring.

Calls: _tool_access_group, bool, getattr

_tool_access_group

function nipux_cli/tools.py:2217

No docstring.

Calls: name.startswith

slash_suggestion_lines

function nipux_cli/tui_commands.py:163

No docstring.

Calls: SLASH_ARGUMENT_HINTS.get, _accent, _bold, _fit_ansi, _muted, _slash_argument_footer, enumerate, input_buffer.rstrip, input_buffer.startswith, len, lines.append, lower, max, min, next, split

autocomplete_slash

function nipux_cli/tui_commands.py:227

No docstring.

Calls: _slash_command_matches, input_buffer.startswith

slash_completion_for_submit

function nipux_cli/tui_commands.py:236

Return the buffer to use and whether Enter should submit it now.

Calls: _slash_argument_text, _slash_command_matches, input_buffer.rstrip, input_buffer.startswith, lower, split, strip

_slash_argument_text

function nipux_cli/tui_commands.py:265

No docstring.

Calls: len, split

_slash_argument_footer

function nipux_cli/tui_commands.py:270

No docstring.

Calls: len

cycle_slash

function nipux_cli/tui_commands.py:276

No docstring.

Calls: _slash_command_matches, input_buffer.rstrip, input_buffer.startswith, len, matches.index

_slash_command_matches

function nipux_cli/tui_commands.py:291

No docstring.

Calls: input_buffer.strip, lower, startswith

event_tool_args

function nipux_cli/tui_event_format.py:15

No docstring.

Calls: input_data.get, isinstance, metadata.get

shell_write_target

function nipux_cli/tui_event_format.py:21

No docstring.

Calls: candidate.startswith, command.split, command.strip, enumerate, re.search, redirect.group, shlex.split, strip, target.startswith

event_title_body

function nipux_cli/tui_event_format.py:43

No docstring.

Calls: none

experiment_metric_text

function nipux_cli/tui_event_format.py:49

No docstring.

Calls: format_metric_value, join, metadata.get, str

event_clock

function nipux_cli/tui_event_format.py:59

No docstring.

Calls: _compact_time, _one_line, event.get, len, str

event_hour

function nipux_cli/tui_event_format.py:66

No docstring.

Calls: _compact_time, event.get, len, str

friendly_error_text

function nipux_cli/tui_event_format.py:75

No docstring.

Calls: _one_line, clean_step_summary, text.lower

brief_reflection_text

function nipux_cli/tui_event_format.py:95

No docstring.

Calls: _one_line, clean_step_summary, counts.replace, match.group, re.search

generic_display_text

function nipux_cli/tui_event_format.py:106

No docstring.

Calls: join, split, str

clean_step_summary

function nipux_cli/tui_event_format.py:110

No docstring.

Calls: join, split, str, text.split, text.startswith

chat_message_paragraphs

function nipux_cli/tui_event_format.py:117

No docstring.

Calls: join, paragraphs.append, raw.strip, re.sub, replace, split, str, text.splitlines

chat_agent_message_text

function nipux_cli/tui_event_format.py:130

No docstring.

Calls: _one_line, body.split, clean_step_summary, len, re.findall, title.lower

tool_live_summary

function nipux_cli/tui_event_format.py:145

No docstring.

Calls: _regex_group, _short_command, _short_url, args.get, clean_step_summary, event_tool_args, get, isinstance, len, metadata.get, output.get, short_path, str

short_path

function nipux_cli/tui_event_format.py:213

No docstring.

Calls: Path.home, len, max, str, text.startswith

_regex_group

function nipux_cli/tui_event_format.py:224

No docstring.

Calls: match.group, re.search

_short_url

function nipux_cli/tui_event_format.py:229

No docstring.

Calls: replace, stripped.split, url.replace

_short_command

function nipux_cli/tui_event_format.py:236

No docstring.

Calls: command.split, join, len, next, part.startswith, parts.index, remote.split, shlex.split, strip

_compact_time

function nipux_cli/tui_event_format.py:258

No docstring.

Calls: _one_line, len, value.replace

chat_event_parts

function nipux_cli/tui_events.py:74

No docstring.

Calls: _is_low_value_chat_notice, _is_waiting_notice, _normalized_chat_body, event.get, event_clock, friendly_error_text, isinstance, metadata.get, str, strip

append_chat_output

function nipux_cli/tui_events.py:91

No docstring.

Calls: _chat_label, _fit_ansi, _muted, chat_message_paragraphs, len, lines.append, max, textwrap.wrap

chat_pane_lines

function nipux_cli/tui_events.py:105

No docstring.

Calls: _chat_item_lines, _fit_ansi, _flatten_chat_blocks, _is_generic_chat_notice, _is_low_value_chat_notice, _is_waiting_notice, _muted, _normalized_chat_body, chat_empty_state_lines, chat_event_parts, items.append, len, max, min, notice.removeprefix, notice.startswith

_chat_item_lines

function nipux_cli/tui_events.py:172

No docstring.

Calls: _accent, _fit_ansi, _muted, _style, append_chat_output, str

_flatten_chat_blocks

function nipux_cli/tui_events.py:182

No docstring.

Calls: rows.append, rows.extend

_chat_label

function nipux_cli/tui_events.py:191

No docstring.

Calls: _muted, _style, label.lower

_normalized_chat_body

function nipux_cli/tui_events.py:203

No docstring.

Calls: join, split, str

_is_low_value_chat_notice

function nipux_cli/tui_events.py:207

No docstring.

Calls: any, normalized.lower

_is_waiting_notice

function nipux_cli/tui_events.py:212

No docstring.

Calls: lowered.startswith, normalized.lower

_is_generic_chat_notice

function nipux_cli/tui_events.py:221

No docstring.

Calls: any, lowered.startswith, normalized.lower

chat_empty_state_lines

function nipux_cli/tui_events.py:226

No docstring.

Calls: _accent, _bold, _center_ansi, _centered_wrapped_hint, len, max

_centered_wrapped_hint

function nipux_cli/tui_events.py:238

No docstring.

Calls: _center_ansi, _muted, max, min, textwrap.wrap

worker_activity_lines

function nipux_cli/tui_events.py:243

No docstring.

Calls: _one_line, get, int, item.get, items.append, live_badge, live_display_text, max, minimal_live_event_line, rendered.append, str

minimal_live_event_line

function nipux_cli/tui_events.py:261

No docstring.

Calls: _one_line, brief_reflection_text, event.get, event_title_body, friendly_error_text, generic_display_text, isinstance, metadata.get, str, strip, tool_live_summary

live_badge

function nipux_cli/tui_events.py:317

No docstring.

Calls: _style, badge_text.startswith, re.sub

live_display_text

function nipux_cli/tui_events.py:346

No docstring.

Calls: len, text.startswith

read_terminal_char

function nipux_cli/tui_input.py:12

No docstring.

Calls: data.decode, os.read

read_escape_sequence

function nipux_cli/tui_input.py:17

No docstring.

Calls: len, max, min, read_terminal_char, select.select, sys.stdin.fileno, terminal_escape_complete, time.monotonic

terminal_escape_complete

function nipux_cli/tui_input.py:34

No docstring.

Calls: len, re.match, sequence.startswith

decode_terminal_escape

function nipux_cli/tui_input.py:46

No docstring.

Calls: csi_arrow.group, int, len, match.group, ord, re.match, sequence.startswith

drain_pending_input

function nipux_cli/tui_input.py:74

No docstring.

Calls: os.read, select.select, sys.stdin.fileno

_top_bar

function nipux_cli/tui_layout.py:18

No docstring.

Calls: _edge_line, _muted, _one_line, _style, _token_usage_topline, max

_two_col_title

function nipux_cli/tui_layout.py:47

No docstring.

Calls: _fit_ansi, _muted, _style, left.upper, right.upper

_two_col_line

function nipux_cli/tui_layout.py:53

No docstring.

Calls: _fit_ansi, _muted

_edge_line

function nipux_cli/tui_layout.py:57

No docstring.

Calls: _fit_ansi, _strip_ansi, len, max

_triple_line

function nipux_cli/tui_layout.py:65

No docstring.

Calls: _edge_line, _fit_ansi, _strip_ansi, join, len, max

_compose_bar

function nipux_cli/tui_layout.py:83

No docstring.

Calls: _accent, _fit_ansi, _muted, len, lines.extend, max, min, title.strip

_metric_strip

function nipux_cli/tui_layout.py:112

No docstring.

Calls: _bold, _muted, _one_line, _strip_ansi, join, len

_pill

function nipux_cli/tui_layout.py:121

No docstring.

Calls: _muted, _style, any, str, value_text.lower

_token_usage_topline

function nipux_cli/tui_layout.py:134

No docstring.

Calls: _format_compact_count, _format_usage_cost, _muted, _safe_int, _style, usage.get

_model_cost_is_zero

function nipux_cli/tui_layout.py:164

No docstring.

Calls: base_url.lower, lowered_model.endswith, model.lower

_format_usage_cost

function nipux_cli/tui_layout.py:175

No docstring.

Calls: _model_cost_is_zero, _safe_float, _safe_int, _safe_optional_float, bool, usage.get

_format_compact_count

function nipux_cli/tui_layout.py:193

No docstring.

Calls: _safe_int, str

_safe_int

function nipux_cli/tui_layout.py:204

No docstring.

Calls: float, int

_safe_float

function nipux_cli/tui_layout.py:211

No docstring.

Calls: float

_safe_optional_float

function nipux_cli/tui_layout.py:218

No docstring.

Calls: float

_status_dot

function nipux_cli/tui_layout.py:227

No docstring.

Calls: _style

model_update_event_parts

function nipux_cli/tui_outcomes.py:89

No docstring.

Calls: _durable_progress_event_parts, _outcome_text, brief_reflection_text, chat_agent_message_text, event.get, event_clock, event_title_body, event_tool_args, experiment_metric_text, generic_display_text, get, isinstance, max, metadata.get, output.get, shell_write_target

is_summary_event_candidate

function nipux_cli/tui_outcomes.py:152

No docstring.

Calls: bool, event.get, event_tool_args, get, isinstance, metadata.get, shell_write_target, str

latest_durable_outcome_line

function nipux_cli/tui_outcomes.py:170

No docstring.

Calls: _event_badge, _fit_ansi, _muted, _one_line, _strip_ansi, len, max, model_update_event_parts, reversed

latest_hour_outcome_summary_line

function nipux_cli/tui_outcomes.py:191

Return a single compact count summary for the newest visible activity hour.

Calls: _bold, _fit_ansi, _muted, _one_line, _strip_ansi, event_hour, get, hourly_outcome_summary, int, len, max, model_update_event_parts, order.append

visible_outcome_summary_line

function nipux_cli/tui_outcomes.py:217

Return a stable summary of the durable outcomes available to the pane.

Calls: _bold, _fit_ansi, _muted, _one_line, _strip_ansi, hourly_outcome_summary, len, max, outcome_counts

job_outcome_summary

function nipux_cli/tui_outcomes.py:228

Return a short per-job durable outcome mix for compact job cards.

Calls: _one_line, hourly_outcome_summary, outcome_counts

outcome_counts

function nipux_cli/tui_outcomes.py:238

No docstring.

Calls: counts.get, int, model_update_event_parts

recent_model_update_lines

function nipux_cli/tui_outcomes.py:260

Render recent durable worker outcomes for the compact status pane.

Calls: _event_badge, _fit_ansi, _muted, _one_line, _strip_ansi, get, int, item.get, items.append, len, lines.append, max, model_update_event_parts, reversed, str, textwrap.wrap

chat_updates_pane_lines

function nipux_cli/tui_outcomes.py:320

No docstring.

Calls: _bold, _fit_ansi, _muted, _one_line, _page_indicator, _wrapped_label_line, hourly_outcome_summary, job.get, len, lines.extend, max, outcome_counts, recent_model_update_lines

_wrapped_label_line

function nipux_cli/tui_outcomes.py:343

No docstring.

Calls: _bold, _fit_ansi, _muted, _strip_ansi, len, lines.append, max, textwrap.wrap

hourly_update_lines

function nipux_cli/tui_outcomes.py:355

No docstring.

Calls: _bold, _event_badge, _fit_ansi, _muted, _strip_ansi, append, counts.get, event_hour, hourly_outcome_summary, int, len, max, min, model_update_event_parts, order.append, rendered.append

hourly_outcome_summary

function nipux_cli/tui_outcomes.py:406

No docstring.

Calls: OUTCOME_SUMMARY_NAMES.get, counts.get, int, join, label.lower, ordered.extend, pieces.append, set, sorted

_durable_progress_event_parts

function nipux_cli/tui_outcomes.py:419

No docstring.

Calls: _count_map, _dominant_progress_key, _outcome_text, _progress_count_phrase, _progress_label_for_key, any, bool, deltas.get, generic_display_text, int, join, metadata.get, pieces.append, resolutions.get, set, totals.values

_count_map

function nipux_cli/tui_outcomes.py:454

No docstring.

Calls: int, isinstance, str, value.items

_dominant_progress_key

function nipux_cli/tui_outcomes.py:468

No docstring.

Calls: resolutions.get, totals.get

_progress_label_for_key

function nipux_cli/tui_outcomes.py:478

No docstring.

Calls: none

_progress_count_phrase

function nipux_cli/tui_outcomes.py:494

No docstring.

Calls: OUTCOME_SUMMARY_NAMES.get, _progress_label_for_key, join, label.endswith, parts.append

_outcome_text

function nipux_cli/tui_outcomes.py:504

No docstring.

Calls: _one_line, generic_display_text

worker_label

function nipux_cli/tui_status.py:36

No docstring.

Calls: job.get, job_deferred_until, job_provider_blocked, str

job_display_state

function nipux_cli/tui_status.py:49

No docstring.

Calls: job.get, job_deferred_until, job_provider_blocked, str

active_operator_messages

function nipux_cli/tui_status.py:60

No docstring.

Calls: active_prompt_operator_entries, entry.get, isinstance, metadata.get, str

right_pane_lines

function nipux_cli/tui_status.py:71

No docstring.

Calls: _bold, _chat_workspace_lines, _defer_status_line, _empty_workspace_status_lines, _event_badge, _fit_ansi, _is_workspace_placeholder, _metric_strip, _metrics_grid_lines, _muted, _one_line, _yield_line, active_operator_messages, artifact.get, frame_jobs_lines, get

_is_workspace_placeholder

function nipux_cli/tui_status.py:175

No docstring.

Calls: job.get, str

_empty_workspace_status_lines

function nipux_cli/tui_status.py:179

No docstring.

Calls: _bold, _fit_ansi, _muted, _page_indicator

chat_work_pane_lines

function nipux_cli/tui_status.py:192

No docstring.

Calls: _bold, _event_badge, _fit_ansi, _muted, _one_line, _page_indicator, _rank_visible_tasks, _status_badge, experiment.get, experiment_metric_text, job.get, len, lines.append, lines.extend, max, min

chat_settings_pane_lines

function nipux_cli/tui_status.py:239

No docstring.

Calls: _bold, _fit_ansi, _muted, _page_indicator, _rate_text, _setting_line, str

frame_jobs_lines

function nipux_cli/tui_status.py:277

No docstring.

Calls: _accent, _fit_ansi, _job_compact_work_lines, _muted, _one_line, _status_badge, enumerate, get, item.get, job_display_state, max, min, rendered.append, rendered.extend, str, worker_label

_job_compact_work_lines

function nipux_cli/tui_status.py:315

No docstring.

Calls: _bold, _fit_ansi, _job_recent_non_output_pieces, _muted, _one_line, bool, counts.get, int, job_outcome_summary, latest.get, len, lines.append, max, second.get, str

_job_recent_non_output_pieces

function nipux_cli/tui_status.py:348

No docstring.

Calls: _compact_outcome_label, _muted, _one_line, len, max, model_update_event_parts, pieces.append, reversed, seen.add, set

_compact_outcome_label

function nipux_cli/tui_status.py:378

No docstring.

Calls: get, label.lower

_defer_status_line

function nipux_cli/tui_status.py:395

No docstring.

Calls: _fit_ansi, _muted, _one_line, isinstance, job.get, job_deferred_until, max, metadata.get, str, strftime, strip, until.astimezone

_rank_visible_tasks

function nipux_cli/tui_status.py:408

No docstring.

Calls: int, isinstance, sorted, status_order.get, str, task.get

_chat_workspace_lines

function nipux_cli/tui_status.py:420

No docstring.

Calls: _bold, _context_pressure_line, _current_task_line, _muted, _one_line, _page_indicator, _status_badge, goal_lines.append, job.get, len, lines.append, max, str, textwrap.wrap

_current_task_line

function nipux_cli/tui_status.py:451

No docstring.

Calls: _fit_ansi, _muted, _one_line, _rank_visible_tasks, _status_badge, isinstance, job.get, max, metadata.get, str, task.get

_context_pressure_line

function nipux_cli/tui_status.py:469

No docstring.

Calls: _fit_ansi, _format_compact_count, _muted, _one_line, _safe_int, max, usage.get

_metrics_grid_lines

function nipux_cli/tui_status.py:485

No docstring.

Calls: _fit_ansi, _metric_cell, _metric_strip, len, lines.append, max, range

_metric_cell

function nipux_cli/tui_status.py:500

No docstring.

Calls: _bold, _fit_ansi, _muted

_yield_line

function nipux_cli/tui_status.py:505

No docstring.

Calls: _fit_ansi, _muted, _safe_int, _status_badge, lookup.get

_setting_line

function nipux_cli/tui_status.py:525

No docstring.

Calls: _bold, _fit_ansi, _muted, _one_line, max

_rate_text

function nipux_cli/tui_status.py:532

No docstring.

Calls: none

_safe_int

function nipux_cli/tui_status.py:536

No docstring.

Calls: float, int

_fancy_ui

function nipux_cli/tui_style.py:11

No docstring.

Calls: os.environ.get, sys.stdout.isatty

_style

function nipux_cli/tui_style.py:20

No docstring.

Calls: _fancy_ui, str

_accent

function nipux_cli/tui_style.py:27

No docstring.

Calls: _style

_muted

function nipux_cli/tui_style.py:31

No docstring.

Calls: _style

_bold

function nipux_cli/tui_style.py:35

No docstring.

Calls: _style

_one_line

function nipux_cli/tui_style.py:39

No docstring.

Calls: join, len, max, split, str

_strip_ansi

function nipux_cli/tui_style.py:46

No docstring.

Calls: re.sub
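
Illustrative sketch (not the code in tui_style.py): the index shows only that _strip_ansi calls re.sub, and that _fit_ansi and _center_ansi combine it with len. One plausible reading is that styled strings are measured by their visible width. The pattern below and the names strip_ansi_sketch and visible_width are assumptions made for illustration, and the pattern covers only SGR color/style sequences.

```python
import re

# Assumption: only SGR sequences (ESC [ ... m) need removing.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi_sketch(text: str) -> str:
    """Remove SGR escape sequences so only printable characters remain."""
    return ANSI_SGR.sub("", text)


def visible_width(text: str) -> int:
    """Width of the text as a terminal would show it, ignoring styling."""
    return len(strip_ansi_sketch(text))
```

Measuring the stripped string rather than the raw one keeps padding and truncation aligned when a line mixes styled and unstyled segments.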

_fit_ansi

function nipux_cli/tui_style.py:50

No docstring.

Calls: _one_line, _strip_ansi, int, len, max, str

_center_ansi

function nipux_cli/tui_style.py:60

No docstring.

Calls: _fit_ansi, _strip_ansi, len, max

_themed_lines

function nipux_cli/tui_style.py:68

No docstring.

Calls: _fancy_ui, _fit_ansi, replace

_frame_enter_sequence

function nipux_cli/tui_style.py:76

No docstring.

Calls: _fancy_ui

_frame_exit_sequence

function nipux_cli/tui_style.py:81

No docstring.

Calls: none

_page_indicator

function nipux_cli/tui_style.py:85

No docstring.

Calls: _accent, _bold, _muted, join, parts.append

_status_badge

function nipux_cli/tui_style.py:95

No docstring.

Calls: _style, get, str

_event_badge

function nipux_cli/tui_style.py:122

No docstring.

Calls: _style, get

UninstallPlan

class nipux_cli/uninstall.py:21

No docstring.

Calls: dataclass

build_uninstall_plan

function nipux_cli/uninstall.py:26

Return all local runtime paths that a full uninstall should remove.

Calls: Path.home, UninstallPlan, _dedupe_paths, get_agent_home, homes.append, launch_agent_path, runtime_home.expanduser, systemd_service_path, tuple

uninstall_runtime

function nipux_cli/uninstall.py:37

Remove local Nipux state, logs, service files, and legacy state dirs.

Calls: _assert_safe_delete_target, _disable_services, build_uninstall_plan, lines.append, lines.extend, path.expanduser, shutil.rmtree, target.exists, target.is_dir, target.is_symlink, target.unlink

uninstall_installed_tool

function nipux_cli/uninstall.py:66

Remove the installed `nipux` command from common uv-tool locations.

Calls: _process_lines, installed_tool_paths, lines.append, lines.extend, path.exists, path.is_dir, path.is_symlink, path.unlink, run, shutil.rmtree, shutil.which

installed_tool_paths

function nipux_cli/uninstall.py:114

Return safe user-level paths for uv-tool Nipux installs.

Calls: Path, Path.home, _dedupe_paths, _is_safe_installed_tool_path, candidates.append, expanduser, path.expanduser, resolve, safe.append, shutil.which, tuple

_disable_services

function nipux_cli/uninstall.py:133

No docstring.

Calls: launch_agent_path, launch_path.exists, lines.append, os.getuid, runner, service_path.exists, shutil.which, str, systemd_service_path

_run_command

function nipux_cli/uninstall.py:168

No docstring.

Calls: list, subprocess.run

_process_lines

function nipux_cli/uninstall.py:178

No docstring.

Calls: isinstance, line.rstrip, line.strip, splitlines

_dedupe_paths

function nipux_cli/uninstall.py:184

No docstring.

Calls: path.expanduser, result.append, seen.add, set, str
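
Illustrative sketch (not the code in uninstall.py): the call list (path.expanduser, seen.add, set, str, result.append) is consistent with an order-preserving de-duplication keyed on the expanded path string. The function name dedupe_paths_sketch and the exact key are assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def dedupe_paths_sketch(paths: Iterable[Path]) -> List[Path]:
    """Drop repeated paths while keeping first-seen order."""
    seen = set()
    result: List[Path] = []
    for path in paths:
        expanded = path.expanduser()
        key = str(expanded)  # assumed key: the expanded path as text
        if key in seen:
            continue
        seen.add(key)
        result.append(expanded)
    return result
```

Keeping first-seen order means a removal plan built from several sources lists each directory once, in the order it was discovered.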

_assert_safe_delete_target

function nipux_cli/uninstall.py:196

No docstring.

Calls: Path, Path.home, ValueError, len, path.expanduser, resolve
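
Illustrative sketch (not the code in uninstall.py): the call list (Path.home, resolve, len, ValueError) suggests a guard that refuses dangerous rmtree targets. The specific rules below (reject the home directory, the filesystem root, and paths with fewer than three components) are assumptions chosen for illustration, as is the name assert_safe_delete_target_sketch.

```python
from pathlib import Path


def assert_safe_delete_target_sketch(path: Path) -> Path:
    """Raise ValueError for targets that are too broad to delete safely."""
    target = path.expanduser().resolve()
    home = Path.home().resolve()
    # Assumed rule 1: never delete the home directory or the filesystem root.
    if target == home or target == Path(target.anchor):
        raise ValueError(f"refusing to delete {target}")
    # Assumed rule 2: very shallow paths such as /usr are also rejected.
    if len(target.parts) < 3:
        raise ValueError(f"refusing to delete shallow path {target}")
    return target
```

Resolving before comparing means a symlink or a `~` prefix cannot disguise a protected directory.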

_is_safe_installed_tool_path

function nipux_cli/uninstall.py:206

No docstring.

Calls: expanded.resolve, path.expanduser

find_checkout_root

function nipux_cli/updater.py:19

Return the nearest enclosing git checkout for the Nipux install.

Calls: Path, current.is_file, exists, expanduser, resolve
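
Illustrative sketch (not the code in updater.py): the docstring says the function returns the nearest enclosing git checkout, and the call list shows Path, expanduser, resolve, is_file, and exists. A common way to do this is to walk upward until a `.git` entry is found. The name find_checkout_root_sketch, the `start` parameter, and the `.git` marker are assumptions.

```python
from pathlib import Path
from typing import Optional


def find_checkout_root_sketch(start: Path) -> Optional[Path]:
    """Return the closest ancestor of `start` that contains a .git entry."""
    current = Path(start).expanduser().resolve()
    if current.is_file():
        current = current.parent  # begin the search at the containing directory
    for candidate in (current, *current.parents):
        # .git may be a directory (normal clone) or a file (worktree, submodule).
        if (candidate / ".git").exists():
            return candidate
    return None
```

Returning None rather than raising lets a caller fall back to another update path when no checkout is found.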

update_checkout

function nipux_cli/updater.py:31

Update the current Nipux install and return output lines. Source checkouts are fast-forwarded with git. Installed tools are refreshed from the configured source repository so `n...

Calls: Path, _git_text, _process_lines, _short_path, _update_uv_tool_install, dirty.stdout.strip, exists, expanduser, find_checkout_root, lines.append, lines.extend, prefix.append, resolve, run, top_level.stdout.strip

_update_uv_tool_install

function nipux_cli/updater.py:83

No docstring.

Calls: _process_lines, _uv_tool_update_spec, _verify_updated_command, lines.append, lines.extend, run, shutil.which

_verify_updated_command

function nipux_cli/updater.py:114

No docstring.

Calls: _process_lines, join, runner, shutil.which, strip

_uv_tool_update_spec

function nipux_cli/updater.py:125

Return the direct source uv should use for installed-tool updates.

Calls: os.environ.get, repo.startswith, strip

_run_git

function nipux_cli/updater.py:139

No docstring.

Calls: list, subprocess.run

_run_command

function nipux_cli/updater.py:150

No docstring.

Calls: list, subprocess.run

_process_lines

function nipux_cli/updater.py:160

No docstring.

Calls: isinstance, line.rstrip, line.strip, output.splitlines

_git_text

function nipux_cli/updater.py:165

No docstring.

Calls: isinstance, process.stdout.strip

_short_path

function nipux_cli/updater.py:172

No docstring.

Calls: Path.home, len, max, str, text.startswith

render_updates_report

function nipux_cli/updates.py:16

No docstring.

Calls: _metadata_list, _one_line, artifact.get, bool, daemon_lock_status, db.get_job, db.list_artifacts, db.list_timeline_events, hourly_update_lines, isinstance, job.get, job_display_state, latest.get, lesson.get, lines.append, lines.extend

render_all_updates_report

function nipux_cli/updates.py:70

No docstring.

Calls: _metadata_count, _one_line, artifact.get, bool, counts.get, daemon_lock_status, db.job_record_counts, db.list_artifacts, db.list_events, db.list_jobs, job_display_state, join, len, lines.append, lines.extend, max

_metadata_list

function nipux_cli/updates.py:124

No docstring.

Calls: isinstance, metadata.get

_metadata_count

function nipux_cli/updates.py:129

No docstring.

Calls: isinstance, job.get, len, metadata.get

format_usage_report

function nipux_cli/usage.py:10

No docstring.

Calls: _format_compact_count, _format_usage_cost, _safe_float, _safe_int, _safe_optional_float, bool, lines.append, max, usage.get

_safe_int

function nipux_cli/usage.py:58

No docstring.

Calls: float, int

_safe_float

function nipux_cli/usage.py:65

No docstring.

Calls: float

_safe_optional_float

function nipux_cli/usage.py:72

No docstring.

Calls: float

_TextExtractor

class nipux_cli/web.py:15

No docstring.

Calls: __init__, data.strip, html.unescape, join, re.sub, self.parts.append, strip, super

__init__

function nipux_cli/web.py:16

No docstring.

Calls: __init__, super

handle_starttag

function nipux_cli/web.py:21

No docstring.

Calls: self.parts.append

handle_endtag

function nipux_cli/web.py:28

No docstring.

Calls: self.parts.append

handle_data

function nipux_cli/web.py:34

No docstring.

Calls: data.strip, self.parts.append

text

function nipux_cli/web.py:40

No docstring.

Calls: html.unescape, join, re.sub, strip

_request

function nipux_cli/web.py:48

No docstring.

Calls: ValueError, body.decode, len, match.group, re.search, response.headers.get, response.read, urllib.request.Request, urllib.request.urlopen

_strip_html

function nipux_cli/web.py:68

No docstring.

Calls: _TextExtractor, parser.feed, parser.text
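
Illustrative sketch (not the code in web.py): the entries above describe a parser class with handle_starttag, handle_endtag, handle_data, and text methods that append to self.parts, plus a helper that feeds HTML and returns the text. The version below is one way to build that on the standard library's html.parser.HTMLParser. The block-tag set, the script/style skipping, and every name ending in Sketch or _sketch are assumptions.

```python
import html
import re
from html.parser import HTMLParser

# Assumed set of tags that should introduce a line break in the output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class TextExtractorSketch(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and not self._skip_depth:
            self.parts.append(stripped)

    def text(self) -> str:
        joined = html.unescape(" ".join(self.parts))
        joined = re.sub(r"[ \t]*\n[ \t\n]*", "\n", joined)  # collapse blank runs
        return re.sub(r"[ \t]+", " ", joined).strip()


def strip_html_sketch(markup: str) -> str:
    parser = TextExtractorSketch()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```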

_duckduckgo_link

function nipux_cli/web.py:74

No docstring.

Calls: html.unescape, urllib.parse.parse_qs, urllib.parse.urlparse
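
Illustrative sketch (not the code in web.py): the call list (html.unescape, urllib.parse.urlparse, urllib.parse.parse_qs) is consistent with unwrapping a search-result redirect link into its destination URL. The sketch assumes the destination is carried in a `uddg` query parameter; that parameter name, the fallback behavior, and the name duckduckgo_link_sketch are assumptions rather than facts taken from the source.

```python
import html
import urllib.parse


def duckduckgo_link_sketch(href: str) -> str:
    """Return the destination URL from a redirect-style result link."""
    href = html.unescape(href)  # hrefs scraped from HTML may contain &amp;
    parsed = urllib.parse.urlparse(href)
    # parse_qs percent-decodes values, so the target comes back as a plain URL.
    targets = urllib.parse.parse_qs(parsed.query).get("uddg")
    if targets:
        return targets[0]
    return href  # assumed fallback: the link was already direct
```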

web_search

function nipux_cli/web.py:82

No docstring.

Calls: _duckduckgo_link, _request, html.unescape, len, match.group, pattern.finditer, re.compile, re.sub, results.append, strip, urllib.parse.urlencode

web_extract

function nipux_cli/web.py:98

No docstring.

Calls: _request, _strip_html, anti_bot_reason, len, pages.append, str

StepExecution

class nipux_cli/worker.py:122

No docstring.

Calls: dataclass

build_messages

function nipux_cli/worker.py:170

No docstring.

Calls: _activity_stagnation_for_prompt, _candidate_file_discovery_for_prompt, _clip_text, _current_execution_focus_for_prompt, _deliverable_progress_guard_for_prompt, _durable_yield_for_prompt, _evidence_checkpoint_accounting_for_prompt, _experiment_stagnation_guard_for_prompt, _expe...

_acknowledge_non_prompt_operator_context

function nipux_cli/worker.py:285

No docstring.

Calls: db.acknowledge_operator_messages, db.get_job, inactive_prompt_operator_ids, int, isinstance, job.get, metadata.get, result.get

_measured_progress_guard_for_prompt

function nipux_cli/worker.py:300

No docstring.

Calls: _as_int, _candidate_file_discovery_context, _measured_progress_guard_context, context.get

_deliverable_progress_guard_for_prompt

function nipux_cli/worker.py:330

No docstring.

Calls: _deliverable_progress_guard_context, context.get

_experiment_stagnation_guard_for_prompt

function nipux_cli/worker.py:345

No docstring.

Calls: _experiment_stagnation_context, context.get

_measurement_obligation_for_prompt

function nipux_cli/worker.py:360

No docstring.

Calls: _clip_text, isinstance, job.get, join, lines.append, metadata.get, obligation.get, str

_recent_measurement_evidence_for_prompt

function nipux_cli/worker.py:382

No docstring.

Calls: _as_int, _clip_text, _completed_or_failed_recent_steps, _pending_measurement_obligation, _step_command, isinstance, join, len, lines.append, max, measurement_candidates, measurement_candidates_are_diagnostic_only, output.get, reversed, step.get, str

_file_validation_obligation_for_prompt

function nipux_cli/worker.py:421

No docstring.

Calls: _pending_file_validation_obligation, join, lines.append, obligation.get, str, strip

_current_execution_focus_for_prompt

function nipux_cli/worker.py:442

No docstring.

Calls: _clip_text, _current_execution_focus_context, backlog.get, focus.get, isinstance, join, lines.append, str, strip

_current_execution_focus_context

function nipux_cli/worker.py:475

No docstring.

Calls: _auto_checkpoint_accounting_context, _candidate_file_discovery_context, _candidate_file_recently_validated, _latest_evidence_grounding_block, _latest_experiment_next_action_context, _milestone_validation_needed, _pending_file_validation_obligation, _pending_measurement_obligat...

_task_backlog_pressure_context

function nipux_cli/worker.py:574

No docstring.

Calls: _metadata_list, isinstance, len, lower, str, sum, task.get

_primary_execution_task

function nipux_cli/worker.py:586

No docstring.

Calls: _as_int, _metadata_list, isinstance, lower, sorted, status_rank.get, str, task.get

_candidate_file_discovery_for_prompt

function nipux_cli/worker.py:602

No docstring.

Calls: _candidate_file_discovery_context, _candidate_file_recently_validated, _clip_text, context.get, isinstance, join, lines.append, str

_shell_path_recovery_for_prompt

function nipux_cli/worker.py:639

No docstring.

Calls: _clip_text, _shell_path_recovery_context, candidate_executables.items, context.get, isinstance, join, lines.append, list, str

_shell_path_recovery_context

function nipux_cli/worker.py:685

No docstring.

Calls: _candidate_executable_paths_for_missing_commands, _completed_or_failed_recent_steps, _missing_commands_from_shell_output, _missing_paths_from_shell_output, _observed_executable_paths_from_recent_shell, _shell_output_has_missing_command, isinstance, join, output.get, reversed, ...

_shell_permission_recovery_for_prompt

function nipux_cli/worker.py:712

No docstring.

Calls: _clip_text, _recent_privileged_shell_failure_context, context.get, join, lines.append, str

_shell_step_failure_text

function nipux_cli/worker.py:734

No docstring.

Calls: isinstance, join, output.get, step.get, str

_shell_output_has_missing_command

function nipux_cli/worker.py:739

No docstring.

Calls: any, text.lower

_missing_paths_from_shell_output

function nipux_cli/worker.py:744

No docstring.

Calls: get, len, match.groupdict, paths.append, re.finditer, seen.add, set, str, strip

_missing_commands_from_shell_output

function nipux_cli/worker.py:764

No docstring.

Calls: commands.append, get, len, match.groupdict, re.finditer, seen.add, set, str, strip

_candidate_executable_paths_for_missing_commands

function nipux_cli/worker.py:789

No docstring.

Calls: Path, _observed_executable_paths_from_recent_shell, append, len, lower, matches.get, matches.items, matches.setdefault, name.lower, path.lower, seen.add, set, str, strip

_observed_executable_paths_from_recent_shell

function nipux_cli/worker.py:810

No docstring.

Calls: _completed_or_failed_recent_steps, _extract_candidate_executable_paths, _shell_line_reports_missing_candidate, isinstance, join, len, lower, output.get, path.lower, paths.append, seen.add, set, step.get, str, text.splitlines

_shell_line_reports_missing_candidate

function nipux_cli/worker.py:840

No docstring.

Calls: any, lower, str

_extract_candidate_executable_paths

function nipux_cli/worker.py:855

No docstring.

Calls: Path, _clean_candidate_file_path, _looks_like_candidate_executable_path, command.lower, match.group, name.lower, paths.append, raw.lower, re.finditer, seen.add, set

_looks_like_candidate_executable_path

function nipux_cli/worker.py:874

No docstring.

Calls: Path, any, len, name.startswith, raw.startswith, str, strip

_shell_command_looks_privileged_or_package_manager

function nipux_cli/worker.py:898

No docstring.

Calls: PACKAGE_MANAGER_WRITE_COMMAND_PATTERN.search, PRIVILEGED_COMMAND_PATTERN.search, bool, str, strip

_shell_output_has_permission_failure

function nipux_cli/worker.py:905

No docstring.

Calls: any, lower, str

_recent_privileged_shell_failure_context

function nipux_cli/worker.py:924

No docstring.

Calls: _as_int, _completed_or_failed_recent_steps, _shell_command_looks_privileged_or_package_manager, _shell_output_has_permission_failure, _shell_step_failure_text, _step_command, isinstance, max, output.get, reversed, step.get, str, text.strip

_observed_candidate_recovery_required_context

function nipux_cli/worker.py:955

No docstring.

Calls: _shell_command_invokes_bare_executable, _shell_command_mentions_candidate_path, _shell_path_recovery_context, args.get, candidate_executables.items, command.strip, context.get, isinstance, str, strip

_shell_command_invokes_bare_executable

function nipux_cli/worker.py:986

No docstring.

Calls: bool, re.escape, re.search, str, strip

_shell_command_mentions_candidate_path

function nipux_cli/worker.py:993

No docstring.

Calls: Path, str, strip

_candidate_file_discovery_context

function nipux_cli/worker.py:1007

No docstring.

Calls: _candidate_file_paths_from_durable_records, _candidate_file_paths_from_recent_grounding_blocks, _candidate_file_paths_from_recent_shell, _invalid_candidate_file_paths, _open_file_dependent_task_text, _rank_candidate_file_paths, path.lower, paths.append, seen.add, set

_shell_exec_targets_candidate_file

function nipux_cli/worker.py:1039

No docstring.

Calls: _candidate_file_discovery_context, any, args.get, command.replace, command.strip, context.get, str

_rank_candidate_file_paths

function nipux_cli/worker.py:1050

No docstring.

Calls: _candidate_context_tokens, _candidate_file_path_score, enumerate, list, sorted

_candidate_context_tokens

function nipux_cli/worker.py:1072

No docstring.

Calls: job.get, join, len, re.findall, re.split, set, str, text.lower, token.strip, tokens.add

_candidate_file_path_score

function nipux_cli/worker.py:1086

No docstring.

Calls: Path, _candidate_file_observation_score, any, len, min, name.lower, name.startswith, path.count, path.lower, path_tokens.add, path_tokens.update, re.findall, re.split, set, stem.lower, suffix.lower

_invalid_candidate_file_paths

function nipux_cli/worker.py:1133

No docstring.

Calls: _candidate_file_observation_score, invalid.append

_candidate_file_observation_score

function nipux_cli/worker.py:1141

No docstring.

Calls: _candidate_file_size_score_from_line, _completed_or_failed_recent_steps, _shell_line_reports_missing_candidate, any, isinstance, join, line.lower, output.get, path.lower, step.get, str, text.splitlines

_candidate_file_recently_validated

function nipux_cli/worker.py:1163

No docstring.

Calls: _candidate_file_observation_score, _completed_or_failed_recent_steps, _shell_line_reports_missing_candidate, evidence_lines.append, isinstance, join, len, line.lower, line.split, output.get, path.lower, step.get, str, text.splitlines

_candidate_file_size_score_from_line

function nipux_cli/worker.py:1184

No docstring.

Calls: any, int, lower, re.findall, re.search, str

_open_file_dependent_task_text

function nipux_cli/worker.py:1198

No docstring.

Calls: _metadata_list, any, isinstance, join, len, lower, parts.append, str, task.get, text.lower, text.split

_candidate_file_paths_from_recent_shell

function nipux_cli/worker.py:1219

No docstring.

Calls: _completed_or_failed_recent_steps, _extract_candidate_file_paths, isinstance, join, len, output.get, path.lower, paths.append, seen.add, set, step.get, str

_candidate_file_paths_from_recent_grounding_blocks

function nipux_cli/worker.py:1240

No docstring.

Calls: _clean_candidate_file_path, _looks_like_exact_candidate_file_path, grounding.get, isinstance, len, output.get, path.lower, paths.append, seen.add, set, step.get, str

_candidate_file_paths_from_durable_records

function nipux_cli/worker.py:1265

No docstring.

Calls: _extract_candidate_file_paths, _metadata_list, isinstance, job.get, json.dumps, len, metadata.get, path.lower, paths.append, record_groups.append, reversed, roadmap.get, seen.add, set, str

_extract_candidate_file_paths

function nipux_cli/worker.py:1304

No docstring.

Calls: _clean_candidate_file_path, _looks_like_exact_candidate_file_path, match.group, paths.append, re.finditer

_looks_like_exact_candidate_file_path

function nipux_cli/worker.py:1319

No docstring.

Calls: Path, any, ch.isalpha, len, name.startswith, raw.startswith, re.match, str, strip

_clean_candidate_file_path

function nipux_cli/worker.py:1336

No docstring.

Calls: raw.split, raw.strip, rstrip, str, strip

_progress_accounting_for_prompt

function nipux_cli/worker.py:1343

No docstring.

Calls: _artifact_accounting_context, context.get, join, str

_activity_stagnation_for_prompt

function nipux_cli/worker.py:1357

No docstring.

Calls: _activity_stagnation_context, context.get

_research_balance_guard_for_prompt

function nipux_cli/worker.py:1371

No docstring.

Calls: _research_balance_context, context.get

_source_yield_guard_for_prompt

function nipux_cli/worker.py:1386

No docstring.

Calls: _source_yield_context, context.get, join, str

_task_planning_guard_for_prompt

function nipux_cli/worker.py:1401

No docstring.

Calls: _task_planning_stagnation_context, context.get

_task_queue_saturation_for_prompt

function nipux_cli/worker.py:1414

No docstring.

Calls: _current_task_backlog_pressure_context, _recent_task_queue_saturation_context, context.get, counts.append, current_pressure.get, guard_recovery.get, isinstance, job.get, join, json.dumps, metadata.get, pressure.get, str, strip, task_queue.get

_current_task_backlog_pressure_context

function nipux_cli/worker.py:1468

No docstring.

Calls: _is_guard_recovery_task, _metadata_list, len, lower, replace, str, strip, task.get

_memory_consolidation_guard_for_prompt

function nipux_cli/worker.py:1491

No docstring.

Calls: _memory_graph_consolidation_context, context.get

_lesson_consolidation_guard_for_prompt

function nipux_cli/worker.py:1505

No docstring.

Calls: _lesson_sprawl_context, context.get

_memory_graph_consolidation_context

function nipux_cli/worker.py:1519

No docstring.

Calls: _durable_memory_signal_count, any, len, memory_graph_from_job, step.get

_lesson_sprawl_context

function nipux_cli/worker.py:1545

No docstring.

Calls: _memory_graph_consolidation_context, _metadata_list, len, lower, memory_context.get, step.get, str

_durable_memory_signal_count

function nipux_cli/worker.py:1570

No docstring.

Calls: _metadata_list, isinstance, job.get, len, lower, metadata.get, milestone.get, roadmap.get, str, sum, task.get

_durable_yield_for_prompt

function nipux_cli/worker.py:1596

No docstring.

Calls: _metadata_list, enumerate, isinstance, job.get, len, max, metadata.get, roadmap.get, step.get

_reflections_for_prompt

function nipux_cli/worker.py:1637

No docstring.

Calls: _clip_text, _metadata_list, join, lines.append, reflection.get

_next_action_constraint

function nipux_cli/worker.py:1648

No docstring.

Calls: _activity_stagnation_context, _artifact_accounting_context, _auto_checkpoint_accounting_context, _candidate_file_discovery_context, _candidate_file_recently_validated, _clean_candidate_file_path, _clip_text, _deliverable_progress_guard_context, _experiment_next_action_failure_...

_latest_evidence_grounding_block

function nipux_cli/worker.py:1849

No docstring.

Calls: isinstance, output.get, reversed, step.get

_milestone_validation_needed

function nipux_cli/worker.py:1870

No docstring.

Calls: all, feature.get, isinstance, job.get, metadata.get, milestone.get, roadmap.get, str

_tool_call_matches_pending_milestone_need

function nipux_cli/worker.py:1890

No docstring.

Calls: _json_value_text, _text_matches_pending_milestone_need, lower, milestone.get, str, strip

_text_matches_pending_milestone_need

function nipux_cli/worker.py:1898

No docstring.

Calls: _substantive_next_action_tokens, bool, feature.get, isinstance, join, milestone.get, parts.extend, str

_milestone_validation_call_matches_current

function nipux_cli/worker.py:1927

No docstring.

Calls: _norm_task_key, args.get, milestone.get, str

_normalize_milestone_validation_args_for_active_gate

function nipux_cli/worker.py:1943

No docstring.

Calls: _json_value_text, _milestone_validation_call_matches_current, _milestone_validation_needed, _text_matches_pending_milestone_need, args.get, dict, isinstance, milestone.get, normalized.get, str

_latest_experiment_next_action_context

function nipux_cli/worker.py:1966

No docstring.

Calls: _metadata_list, experiment.get, isinstance, lower, reversed, str, strip

_experiment_next_action_requires_delivery

function nipux_cli/worker.py:1986

No docstring.

Calls: bool, context.get, lower, re.findall, set, str

_experiment_next_action_failure_context

function nipux_cli/worker.py:1998

No docstring.

Calls: _as_int, _completed_or_failed_recent_steps, _experiment_next_action_requires_delivery, _latest_experiment_next_action_context, _missing_commands_from_shell_output, _missing_paths_from_shell_output, _shell_command_matches_next_action, _shell_output_has_missing_command, _shell_s...

_shell_command_looks_like_write

function nipux_cli/worker.py:2033

No docstring.

Calls: any, command.strip, re.match, re.search

_shell_command_looks_read_only

function nipux_cli/worker.py:2057

No docstring.

Calls: READ_ONLY_SHELL_COMMAND_PATTERN.search, _shell_command_looks_like_write, bool, command.strip, re.match, re.search

_shell_command_supports_experiment_next_action

function nipux_cli/worker.py:2071

No docstring.

Calls: EXPERIMENT_NEXT_ACTION_VERIFY_SHELL_PATTERN.search, _substantive_next_action_tokens, bool, command.strip, context.get, next_action.strip, str

_shell_command_matches_next_action

function nipux_cli/worker.py:2087

No docstring.

Calls: _substantive_next_action_tokens, bool, command.strip, next_action.strip

_substantive_next_action_tokens

function nipux_cli/worker.py:2095

No docstring.

Calls: len, re.findall, re.split, set, text.lower, token.strip, tokens.add

_roadmap_staleness_context

function nipux_cli/worker.py:2110

No docstring.

Calls: any, isinstance, job.get, len, metadata.get, milestone.get, roadmap.get, step.get, str

_roadmap_missing_for_broad_job

function nipux_cli/worker.py:2148

No docstring.

Calls: any, isinstance, job.get, len, metadata.get, objective.lower, re.findall, str

_task_queue_exhausted

function nipux_cli/worker.py:2161

No docstring.

Calls: _metadata_list, any, lower, str, strip, task.get

_task_queue_saturation_context

function nipux_cli/worker.py:2169

No docstring.

Calls: _is_guard_recovery_task, _metadata_list, _norm_task_key, args.get, bool, find_semantic_task_match, isinstance, len, lower, new_open_titles.append, new_titles.append, replace, semantic_match.get, semantic_matches.append, str, strip

_recent_task_queue_saturation_context

function nipux_cli/worker.py:2250

No docstring.

Calls: isinstance, output.get, reversed, step.get, task_queue.get

_record_task_backlog_pressure

function nipux_cli/worker.py:2268

No docstring.

Calls: datetime.now, db.append_agent_update, db.update_job_metadata, isinstance, isoformat, task_queue.get

_clear_stale_task_backlog_pressure

function nipux_cli/worker.py:2301

No docstring.

Calls: _current_task_backlog_pressure_context, datetime.now, db.append_agent_update, db.update_job_metadata, dict, isinstance, isoformat, job.get, metadata.get

_repeated_task_queue_saturation_context

function nipux_cli/worker.py:2320

No docstring.

Calls: isinstance, latest.get, len, matches.append, output.get, step.get, task_queue.get

_task_planning_stagnation_context

function nipux_cli/worker.py:2340

No docstring.

Calls: _as_int, _metadata_list, isinstance, job.get, len, lower, metadata.get, replace, str, strip, task.get

_is_guard_recovery_task

function nipux_cli/worker.py:2359

No docstring.

Calls: bool, isinstance, lower, metadata.get, startswith, str, strip, task.get

_record_tasks_adds_new_open_work

function nipux_cli/worker.py:2364

No docstring.

Calls: _metadata_list, _norm_task_key, args.get, isinstance, lower, replace, str, strip, task.get

_norm_task_key

function nipux_cli/worker.py:2386

No docstring.

Calls: task_key

_parse_tool_result

function nipux_cli/worker.py:2390

No docstring.

Calls: isinstance, json.loads

_load_program_text

function nipux_cli/worker.py:2398

No docstring.

Calls: path.exists, path.read_text

_browser_warning_context

function nipux_cli/worker.py:2405

No docstring.

Calls: anti_bot_reason, data.get, isinstance, output.get, str

_recent_anti_bot_context

function nipux_cli/worker.py:2416

No docstring.

Calls: _browser_warning_context, isinstance, reversed, step.get

_artifact_args_acknowledge_block

function nipux_cli/worker.py:2427

No docstring.

Calls: any, args.get, join, lower, str

_same_source_url

function nipux_cli/worker.py:2432

No docstring.

Calls: left.split, right.split, rstrip

_normalized_source_url

function nipux_cli/worker.py:2438

No docstring.

Calls: str, strip

_source_host

function nipux_cli/worker.py:2447

No docstring.

Calls: _normalized_source_url, parsed.netloc.lower, removeprefix, urlparse
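
The call list above (urlparse, netloc.lower, removeprefix) suggests that the source host is derived by parsing a normalized URL and trimming a prefix from its host. The following is only a sketch under assumptions: the names `normalized_source_url` and `source_host` are hypothetical stand-ins, and the `"www."` prefix is a guess, since the index does not show which prefix worker.py removes.

```python
from urllib.parse import urlparse


def normalized_source_url(url: object) -> str:
    """Hypothetical stand-in: coerce to text and trim surrounding whitespace."""
    return str(url or "").strip()


def source_host(url: object) -> str:
    """Return the lowercased host of a URL without a leading "www." (assumed prefix)."""
    parsed = urlparse(normalized_source_url(url))
    # netloc is empty when the URL has no scheme, so such inputs yield "".
    return parsed.netloc.lower().removeprefix("www.")
```

For example, `source_host(" https://WWW.Example.com/a ")` gives `"example.com"`. Note that `str.removeprefix` requires Python 3.9 or later.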

_source_matches

function nipux_cli/worker.py:2452

No docstring.

Calls: _same_source_url, _source_path_key, left_path.startswith, right_path.startswith

_source_path_key

function nipux_cli/worker.py:2464

No docstring.

Calls: _normalized_source_url, parsed.netloc.lower, removeprefix, rstrip, urlparse

_shell_source_matches

function nipux_cli/worker.py:2471

No docstring.

Calls: _same_source_url, _source_path_key, left_path.startswith, right_path.startswith

_urls_from_text

function nipux_cli/worker.py:2483

No docstring.

Calls: match.group, re.finditer, rstrip, seen.add, set, str, url.lower, urls.append

_source_url_has_path

function nipux_cli/worker.py:2496

No docstring.

Calls: _source_path_key

_shell_guard_urls

function nipux_cli/worker.py:2501

No docstring.

Calls: _source_url_has_path, _urls_from_text, len

_shell_placeholder_context

function nipux_cli/worker.py:2541

No docstring.

Calls: _urls_from_text, join, lower, match.group, re.escape, re.search, str, strip, urlparse

_shell_syntax_preflight_context

function nipux_cli/worker.py:2585

No docstring.

Calls: shlex.split, str, strip
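
Since the only library call listed above is shlex.split, the preflight plausibly tokenizes the command and reports a tokenizer failure before the command is run. A minimal sketch under that assumption follows; the function name `shell_syntax_preflight` and the shape of the returned dict are hypothetical and not taken from worker.py.

```python
import shlex


def shell_syntax_preflight(command: object) -> dict:
    """Return an empty dict when the command tokenizes, else a small error context."""
    text = str(command or "").strip()
    if not text:
        return {}
    try:
        # shlex.split raises ValueError on unbalanced quotes or a dangling escape.
        shlex.split(text)
    except ValueError as exc:
        return {"command": text, "error": str(exc)}
    return {}
```

This catches only quoting problems. shlex does not parse shell grammar, so an unclosed `if` or a stray `)` would pass this check.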

_source_failure_family_url

function nipux_cli/worker.py:2600

No docstring.

Calls: _normalized_source_url, join, len, split, urlparse

_known_bad_sources

function nipux_cli/worker.py:2615

No docstring.

Calls: _as_float, _as_int, _metadata_list, bad_sources.append, source.get

_known_bad_source_for_call

function nipux_cli/worker.py:2627

No docstring.

Calls: _known_bad_sources, _shell_guard_urls, _shell_source_matches, _source_failure_family_url, _source_matches, args.get, isinstance, source.get, str, url.strip

_tool_signature

function nipux_cli/worker.py:2661

No docstring.

Calls: json.dumps

_duplicate_recent_tool_call

function nipux_cli/worker.py:2665

No docstring.

Calls: _tool_signature, input_data.get, isinstance, reversed, step.get
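
The two entries above suggest that duplicate detection compares a JSON-serialized signature of the proposed call against recent steps, newest first. The sketch below illustrates that idea under assumptions: `tool_signature` and `is_duplicate_call` are hypothetical names, and the step layout (`tool_name`, `input`, `arguments`) and the `window` parameter are invented for illustration rather than read from worker.py.

```python
import json
from typing import Any


def tool_signature(name: str, args: dict) -> str:
    """Build a stable string for a tool call (hypothetical helper)."""
    # sort_keys makes the signature independent of argument order.
    return json.dumps({"tool": name, "args": args}, sort_keys=True, default=str)


def is_duplicate_call(name: str, args: dict, recent_steps: list, window: int = 5) -> bool:
    """Report whether an identical call appears among the last `window` steps."""
    target = tool_signature(name, args)
    for step in reversed(recent_steps[-window:]):
        input_data: Any = step.get("input")
        if not isinstance(input_data, dict):
            continue  # assumed layout: steps without a dict input cannot match
        prior_args = input_data.get("arguments") or {}
        if tool_signature(str(step.get("tool_name")), prior_args) == target:
            return True
    return False
```

Serializing with sorted keys means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` produce the same signature, which is the property a duplicate guard of this kind needs.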

_completed_recent_steps

function nipux_cli/worker.py:2685

No docstring.

Calls: step.get

_completed_or_failed_recent_steps

function nipux_cli/worker.py:2689

No docstring.

Calls: step.get

_is_browser_tool

function nipux_cli/worker.py:2716

No docstring.

Calls: bool, startswith, str

_browser_runtime_unavailable_context

function nipux_cli/worker.py:2720

No docstring.

Calls: _clip_text, _is_browser_tool, any, int, isinstance, join, lower, max, output.get, reversed, step.get, str

_self_defer_context

function nipux_cli/worker.py:2764

No docstring.

Calls: args.get, lower, next, next_action.strip, reason.strip, str

_stale_claim_tokens_from_unsupported

function nipux_cli/worker.py:3050

No docstring.

Calls: _looks_like_generated_or_file_token, _normalize_claim_text, any, ch.isalpha, ch.isdigit, cleaned.isupper, cleaned.lower, len, seen.add, set, stale_tokens.append, str, strip

_looks_like_generated_or_file_token

function nipux_cli/worker.py:3076

No docstring.

Calls: any, ch.isalpha, ch.isdigit, lowered.endswith, lowered.startswith, token.lower

_normalize_claim_text

function nipux_cli/worker.py:3108

No docstring.

Calls: lower, re.sub, str

_evidence_grounding_context

function nipux_cli/worker.py:3112

No docstring.

Calls: _active_stale_claim_token_set, _candidate_file_paths_from_recent_grounding_blocks, _cited_step_numbers, _concrete_evidence_tokens_for_grounding, _evidence_grounding_proposed_text, _evidence_steps_for_grounding, _grounding_token_in_reference_text, _high_risk_evidence_token, _js...

_concrete_evidence_tokens_for_grounding

function nipux_cli/worker.py:3260

No docstring.

Calls: _concrete_evidence_tokens, _high_risk_evidence_token

_grounding_token_in_reference_text

function nipux_cli/worker.py:3267

No docstring.

Calls: _normalize_claim_text

_missing_candidate_paths_for_grounding

function nipux_cli/worker.py:3274

No docstring.

Calls: _candidate_file_paths_from_recent_grounding_blocks, _evidence_line_is_negative, _extract_candidate_file_paths, _path_mentioned_in_text, _rank_candidate_file_paths, any, distinctive_paths.append, join, len, line.lower, lower, path.lower, seen.add, set, splitlines, str

_positive_path_claim_conflicts_for_grounding

function nipux_cli/worker.py:3330

No docstring.

Calls: _clip_text, _evidence_line_is_negative, _excerpt_around, _extract_candidate_executable_paths, _extract_candidate_file_paths, _path_near_positive_claim, conflicts.append, len, line.lower, line.strip, path.lower, proposed_combined.lower, seen.add, set, splitlines, str

_path_near_positive_claim

function nipux_cli/worker.py:3373

No docstring.

Calls: _evidence_line_is_negative, _excerpts_around_all, any, excerpt.lower

_excerpt_around

function nipux_cli/worker.py:3383

No docstring.

Calls: _excerpts_around_all

_excerpts_around_all

function nipux_cli/worker.py:3388

No docstring.

Calls: excerpts.append, len, max, min, needle_text.lower, source.lower, source_lower.find, str
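
The calls listed above (lower, find, max, min, append) are consistent with a case-insensitive scan that collects a clipped window around each occurrence of a needle. A rough sketch of that technique follows; the name `excerpts_around_all` without the underscore and the `radius` and `limit` parameters are assumptions, not the signature used in worker.py.

```python
def excerpts_around_all(source: object, needle: object, radius: int = 40, limit: int = 5) -> list:
    """Return up to `limit` windows of `source` around each match of `needle`."""
    source_text = str(source or "")
    needle_text = str(needle or "")
    if not source_text or not needle_text:
        return []
    source_lower = source_text.lower()
    needle_lower = needle_text.lower()
    excerpts = []
    start = 0
    while len(excerpts) < limit:
        index = source_lower.find(needle_lower, start)
        if index < 0:
            break
        # Clamp the window to the bounds of the original text.
        left = max(0, index - radius)
        right = min(len(source_text), index + len(needle_lower) + radius)
        excerpts.append(source_text[left:right])
        start = index + len(needle_lower)
    return excerpts
```

One caveat of this approach: lowercasing can change the length of a few Unicode characters, so offsets found in the lowered text may drift slightly from the original for such input.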

_path_mentioned_in_text

function nipux_cli/worker.py:3408

No docstring.

Calls: Path, bool, name.lower, path.lower

_refresh_contradicted_negative_claims

function nipux_cli/worker.py:3416

No docstring.

Calls: _concrete_evidence_tokens, _metadata_list, _negative_claim_conflicts_for_grounding, _negative_record_id, _negative_record_text, _negative_record_title, _recent_evidence_text, conflict.get, datetime.now, db.append_agent_update, db.update_job_metadata, fresh_evidence_text.strip,...

_negative_record_text

function nipux_cli/worker.py:3498

No docstring.

Calls: join, record.get, str

_negative_record_id

function nipux_cli/worker.py:3512

No docstring.

Calls: _negative_record_title, _normalize_claim_text, record.get, str, strip

_negative_record_title

function nipux_cli/worker.py:3520

No docstring.

Calls: _clip_text, record.get, str

_negative_claim_conflicts_for_grounding

function nipux_cli/worker.py:3526

No docstring.

Calls: _clip_text, _file_pattern_tokens_for_grounding, _high_risk_evidence_token, _positive_evidence_line_for_token, _token_near_negative_claim, any, conflicts.append, fresh_evidence_text.splitlines, line.strip, proposed_text.lower, seen.add, set, token.lower, token.startswith

_file_pattern_tokens_for_grounding

function nipux_cli/worker.py:3560

No docstring.

Calls: len, lower, lstrip, match.end, match.group, match.start, raw.startswith, re.finditer, seen.add, set, strip, tokens.append

_token_near_negative_claim

function nipux_cli/worker.py:3586

No docstring.

Calls: _nearby_negative_is_positive_validation, _nearby_negative_is_role_classification, any, len, max, text.lower, text_lower.find, token.lower

_nearby_negative_is_role_classification

function nipux_cli/worker.py:3606

No docstring.

Calls: any

_nearby_negative_is_positive_validation

function nipux_cli/worker.py:3610

No docstring.

Calls: bool, re.search

_positive_evidence_line_for_token

function nipux_cli/worker.py:3614

No docstring.

Calls: _evidence_line_is_negative, line.lower, token.lower

_evidence_line_is_negative

function nipux_cli/worker.py:3626

No docstring.

Calls: any, line_lower.startswith

_evidence_grounding_proposed_text

function nipux_cli/worker.py:3632

No docstring.

Calls: _json_text, _json_value_text, args.get, edge.get, isinstance, join, node.get, parts.append

_json_text

function nipux_cli/worker.py:3675

No docstring.

Calls: json.dumps, str

_json_value_text

function nipux_cli/worker.py:3682

No docstring.

Calls: _json_value_text, isinstance, join, str, value.values
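
Because this function appears in its own call list alongside isinstance, join, and value.values, it most likely flattens nested JSON-like values into one text string by recursion. The sketch below shows one way to do that; the name `json_value_text`, the space separator, and the handling of lists and None are assumptions rather than the behavior of worker.py.

```python
def json_value_text(value: object) -> str:
    """Flatten nested dicts and lists into a single space-separated string."""
    if value is None:
        return ""
    if isinstance(value, dict):
        # Only values are collected; keys are ignored in this sketch.
        parts = (json_value_text(item) for item in value.values())
        return " ".join(part for part in parts if part)
    if isinstance(value, (list, tuple)):
        parts = (json_value_text(item) for item in value)
        return " ".join(part for part in parts if part)
    return str(value)
```

A flattener like this makes it possible to run plain substring or token checks over structured tool arguments without walking the structure at every call site.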

_cited_step_numbers

function nipux_cli/worker.py:3690

No docstring.

Calls: int, match.group, numbers.add, re.finditer, set

_evidence_steps_for_grounding

function nipux_cli/worker.py:3710

No docstring.

Calls: _completed_recent_steps, int, step.get

_recent_evidence_text

function nipux_cli/worker.py:3735

No docstring.

Calls: _durable_records_for_grounding, _evidence_steps_for_grounding, _json_text, isinstance, item.get, job.get, join, output.get, page.get, parts.append, parts.extend, step.get, str

_active_stale_claim_token_set

function nipux_cli/worker.py:3772

No docstring.

Calls: _stale_claim_tokens_from_unsupported, isinstance, job.get, join, lower, metadata.get, str, strip

_durable_records_for_grounding

function nipux_cli/worker.py:3782

No docstring.

Calls: _json_text, _metadata_list, experiment.get, finding.get, graph.get, isinstance, job.get, join, metadata.get, node.get, parts.append, roadmap.get, source.get

_concrete_evidence_tokens

function nipux_cli/worker.py:3835

No docstring.

Calls: _looks_like_generated_evidence_token, any, ch.isalpha, ch.isdigit, islower, isupper, len, lowered.endswith, lowered.startswith, raw.strip, re.findall, re.match, re.sub, replace, seen_numeric.add, set

_high_risk_evidence_token

function nipux_cli/worker.py:3878

No docstring.

Calls: _looks_like_generated_evidence_token, any, ch.isalpha, ch.isdigit, len, lowered.endswith, lowered.startswith, token.isupper, token.lower

_looks_like_generated_evidence_token

function nipux_cli/worker.py:3895

No docstring.

Calls: bool, re.match, strip, token.lower

_step_has_evidence

function nipux_cli/worker.py:3905

No docstring.

Calls: anti_bot_reason, data.get, isinstance, join, len, output.get, page.get, snapshot.strip, step.get, str, strip, text.strip

_unpersisted_evidence_step

function nipux_cli/worker.py:3927

No docstring.

Calls: _step_has_evidence, isinstance, output.get, reversed, step.get

_evidence_checkpoint_accounting_for_prompt

function nipux_cli/worker.py:3941

No docstring.

Calls: _auto_checkpoint_accounting_context, context.get

_pending_evidence_checkpoint

function nipux_cli/worker.py:3963

No docstring.

Calls: checkpoint.get, isinstance, job.get, metadata.get

_step_created_auto_checkpoint

function nipux_cli/worker.py:3971

No docstring.

Calls: checkpoint.get, isinstance, output.get, step.get

_auto_checkpoint_accounting_context

function nipux_cli/worker.py:3985

No docstring.

Calls: _pending_evidence_checkpoint, _read_artifact_args_match_checkpoint, _step_created_auto_checkpoint, any, bool, checkpoint.get, checkpoint_step.get, int, pending.get, reversed, step.get, str

_evidence_checkpoint_blocks_tool

function nipux_cli/worker.py:4032

No docstring.

Calls: _read_artifact_call_matches_checkpoint, context.get, str

_evidence_checkpoint_block_guidance

function nipux_cli/worker.py:4050

No docstring.

Calls: context.get

_read_artifact_args_match_checkpoint

function nipux_cli/worker.py:4068

No docstring.

Calls: _read_artifact_call_matches_checkpoint, input_data.get, isinstance, step.get

_read_artifact_call_matches_checkpoint

function nipux_cli/worker.py:4074

No docstring.

Calls: any, args.get, bool, str, strip

_recent_search_streak

function nipux_cli/worker.py:4082

No docstring.

Calls: _recent_tool_streak

_pending_measurement_obligation

function nipux_cli/worker.py:4086

No docstring.

Calls: isinstance, job.get, metadata.get, obligation.get

_pending_file_validation_obligation

function nipux_cli/worker.py:4127

No docstring.

Calls: isinstance, job.get, metadata.get, obligation.get

_file_output_needs_validation

function nipux_cli/worker.py:4135

No docstring.

Calls: Path, content.lstrip, content.strip, first_line.startswith, name.lower, splitlines, suffix.lower

_suggested_file_validation

function nipux_cli/worker.py:4146

No docstring.

Calls: Path, shlex_quote, suffix.lower

shlex_quote

function nipux_cli/worker.py:4160

No docstring.

Calls: replace, str

_clear_invalid_measurement_obligation

function nipux_cli/worker.py:4164

No docstring.

Calls: _pending_measurement_obligation, db.append_agent_update, db.get_job, db.update_job_metadata, isinstance, measurement_candidates_are_diagnostic_only, obligation.get, str

_progress_churn_context

function nipux_cli/worker.py:4185

No docstring.

Calls: any, get, len, step.get, sum

_activity_stagnation_context

function nipux_cli/worker.py:4203

No docstring.

Calls: _as_int, counts.get, isinstance, job.get, metadata.get

_research_balance_context

function nipux_cli/worker.py:4216

No docstring.

Calls: _metadata_list, any, bool, isinstance, job.get, len, lower, metadata.get, step.get, str, task.get

_source_yield_context

function nipux_cli/worker.py:4276

No docstring.

Calls: _as_float, _as_int, _metadata_list, get, len, max, source.get, step.get, str, strip

_artifact_accounting_context

function nipux_cli/worker.py:4335

No docstring.

Calls: _clip_text, args.get, get, input_data.get, isinstance, len, reversed, step.get, str, tail.append, tail.reverse, titles.append

_job_requires_measured_progress

function nipux_cli/worker.py:4366

No docstring.

Calls: MEASURABLE_PROGRESS_PATTERN.search, _metadata_list, _task_text_requires_measurement, any, job.get, str, task.get, text_parts.extend

_task_text_requires_measurement

function nipux_cli/worker.py:4389

No docstring.

Calls: MEASURABLE_PROGRESS_PATTERN.search, any, str, task.get

_job_requires_deliverable_progress

function nipux_cli/worker.py:4396

No docstring.

Calls: _as_int, _metadata_list, any, bool, competing_execution_tasks.append, job.get, join, lower, max, re.findall, report_tasks.append, set, str, strip, task.get

_step_is_deliverable_checkpoint

function nipux_cli/worker.py:4428

No docstring.

Calls: args.get, bool, input_data.get, isinstance, join, lower, re.findall, set, step.get, str

_deliverable_progress_guard_context

function nipux_cli/worker.py:4451

No docstring.

Calls: _job_requires_deliverable_progress, _shell_command_looks_read_only, _step_command, _step_is_deliverable_checkpoint, any, enumerate, get, len, step.get

_step_command

function nipux_cli/worker.py:4491

No docstring.

Calls: args.get, input_data.get, isinstance, step.get, str

_read_only_shell_churn_context

function nipux_cli/worker.py:4497

No docstring.

Calls: _clip_text, _shell_command_looks_read_only, _step_command, get, len, step.get

_experiment_metric_group_key

function nipux_cli/worker.py:4536

No docstring.

Calls: bool, experiment.get, lower, str, strip

_experiment_metric_number

function nipux_cli/worker.py:4549

No docstring.

Calls: experiment.get, float

_experiment_value_improves

function nipux_cli/worker.py:4556

No docstring.

Calls: none

_experiment_stagnation_context

function nipux_cli/worker.py:4560

No docstring.

Calls: _as_int, _experiment_metric_group_key, _experiment_metric_number, _experiment_value_improves, _job_requires_measured_progress, _metadata_list, any, best.get, bool, enumerate, experiment.get, latest.get, len, lower, max, step.get

_measured_progress_guard_context

function nipux_cli/worker.py:4625

No docstring.

Calls: _job_requires_measured_progress, _metadata_list, _pending_measurement_obligation, _step_accounts_for_measured_progress_guard, any, enumerate, get, len, step.get

_step_accounts_for_measured_progress_guard

function nipux_cli/worker.py:4663

No docstring.

Calls: _task_text_requires_measurement, isinstance, lower, output.get, replace, step.get, str, strip, task.get

_maybe_create_measurement_obligation

function nipux_cli/worker.py:4685

No docstring.

Calls: args.get, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, existing.get, get, isinstance, isoformat, join, measurement_candidates, metadata.get, result.get, step.get, str

_maybe_create_file_validation_obligation

function nipux_cli/worker.py:4723

No docstring.

Calls: _file_output_needs_validation, _suggested_file_validation, args.get, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, existing.get, get, isinstance, isoformat, metadata.get, result.get, step.get, str, strip

_command_references_path

function nipux_cli/worker.py:4758

No docstring.

Calls: Path, any, needles.add, path_obj.expanduser, resolve, str

_resolve_file_validation_obligation

function nipux_cli/worker.py:4770

No docstring.

Calls: _pending_file_validation_obligation, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, dict, isoformat, resolved.update, result.get

_maybe_resolve_file_validation_obligation

function nipux_cli/worker.py:4811

No docstring.

Calls: _command_references_path, _pending_file_validation_obligation, _resolve_file_validation_obligation, args.get, db.get_job, obligation.get, result.get, str

_step_by_id

function nipux_cli/worker.py:4843

No docstring.

Calls: db.list_steps, step.get, str

_search_query

function nipux_cli/worker.py:4850

No docstring.

Calls: args.get, str, strip

_query_tokens

function nipux_cli/worker.py:4854

No docstring.

Calls: len, query.lower, re.findall

_text_tokens

function nipux_cli/worker.py:4862

No docstring.

Calls: len, lower, re.findall, str

_similar_recent_search

function nipux_cli/worker.py:4870

No docstring.

Calls: _similar_recent_query_tool

_similar_recent_query_tool

function nipux_cli/worker.py:4879

No docstring.

Calls: _completed_recent_steps, _query_tokens, _search_query, input_data.get, isinstance, len, max, reversed, step.get

_recent_tool_streak

function nipux_cli/worker.py:4907

No docstring.

Calls: _completed_recent_steps, reversed, step.get

_repeated_guard_block_context

function nipux_cli/worker.py:4919

No docstring.

Calls: _already_read_checkpoint_accounting_block, any, blocked_tools.append, int, isinstance, last_recovery.get, latest_blocked.get, max, next, output.get, recovery_context.get, recovery_output.get, reversed, step.get, step_output.get, str

_already_read_checkpoint_accounting_block

function nipux_cli/worker.py:4993

No docstring.

Calls: bool, checkpoint.get, isinstance, output.get, step.get

_step_error_text

function nipux_cli/worker.py:5002

No docstring.

Calls: isinstance, join, output.get, step.get, str

_blocked_tool_call_result

function nipux_cli/worker.py:5015

No docstring.

Calls: _activity_stagnation_context, _artifact_accounting_context, _artifact_args_acknowledge_block, _as_int, _auto_checkpoint_accounting_context, _browser_runtime_unavailable_context, _deliverable_progress_guard_context, _duplicate_recent_tool_call, _evidence_checkpoint_block_guidan...

_error_result

function nipux_cli/worker.py:5744

No docstring.

Calls: isinstance, str, type

_hard_llm_provider_failure_note

function nipux_cli/worker.py:5755

No docstring.

Calls: provider_action_required_note

_max_step_no

function nipux_cli/worker.py:5759

No docstring.

Calls: int, max, step.get

_should_reflect

function nipux_cli/worker.py:5763

No docstring.

Calls: _max_step_no, _metadata_list, get, isinstance, metadata.get

_lesson_already_recorded

function nipux_cli/worker.py:5781

No docstring.

Calls: _metadata_list, any, entry.get, join, lower, split, str, strip

_reflection_strategy

function nipux_cli/worker.py:5791

No docstring.

Calls: _as_float, _as_int, isinstance, len, lower, max, source.get, str, task.get

_claim_operator_queue

function nipux_cli/worker.py:5831

No docstring.

Calls: db.claim_operator_messages

_emit_loop_start

function nipux_cli/worker.py:5838

No docstring.

Calls: db.append_event

_emit_assistant_message_event

function nipux_cli/worker.py:5857

No docstring.

Calls: db.append_event, float, join, max, round, turn_usage_metadata

_emit_loop_end

function nipux_cli/worker.py:5892

No docstring.

Calls: db.append_event

_run_reflection_step

function nipux_cli/worker.py:5923

No docstring.

Calls: StepExecution, _as_float, _as_int, _emit_loop_end, _lesson_already_recorded, _max_step_no, _pending_measurement_obligation, _reflection_strategy, artifact.get, bool, db.add_step, db.append_agent_update, db.append_lesson, db.append_reflection, db.finish_run, db.finish_step

_run_guard_recovery_step

function nipux_cli/worker.py:6042

No docstring.

Calls: StepExecution, _emit_loop_end, _resolve_evidence_checkpoint, _step_by_id, context.get, datetime.now, db.add_step, db.append_agent_update, db.append_lesson, db.append_task_record, db.finish_run, db.finish_step, db.update_job_metadata, isinstance, isoformat, join

_usage_budget_limit_context

function nipux_cli/worker.py:6187

No docstring.

Calls: _as_float, _as_int, bool, float, usage.get

_run_usage_budget_limit_step

function nipux_cli/worker.py:6204

No docstring.

Calls: StepExecution, _compact_usage_tokens, _emit_loop_end, context.get, datetime.now, db.add_step, db.append_agent_update, db.finish_run, db.finish_step, db.update_job_status, float, isoformat, refresh_memory_index

_compact_usage_tokens

function nipux_cli/worker.py:6241

No docstring.

Calls: _as_int, str

_evidence_checkpoint_content

function nipux_cli/worker.py:6250

No docstring.

Calls: _observation_for_prompt, evidence_step.get, input_data.get, isinstance, join, json.dumps

_auto_persist_evidence

function nipux_cli/worker.py:6264

No docstring.

Calls: _evidence_checkpoint_content, artifacts.write_text, datetime.now, db.append_agent_update, db.append_lesson, db.update_job_metadata, evidence_step.get, isoformat, str

_auto_record_grounding_block_lesson

function nipux_cli/worker.py:6319

No docstring.

Calls: _stale_claim_tokens_from_unsupported, combined.append, combined_seen.add, db.append_lesson, db.get_job, db.update_job_metadata, grounding.get, isinstance, job.get, join, metadata.get, result.get, set, str, strip, token.lower

_mark_evidence_checkpoint_read

function nipux_cli/worker.py:6369

No docstring.

Calls: _pending_evidence_checkpoint, _read_artifact_call_matches_checkpoint, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, dict, isoformat, pending.get, step.get, str

_resolve_evidence_checkpoint

function nipux_cli/worker.py:6403

No docstring.

Calls: _pending_evidence_checkpoint, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, dict, isoformat, pending.get, step.get

_auto_record_blocked_source

function nipux_cli/worker.py:6431

No docstring.

Calls: context.get, db.append_agent_update, db.append_lesson, db.append_source_record, int, record.get, str

_auto_record_tool_source_quality

function nipux_cli/worker.py:6468

No docstring.

Calls: _auto_record_blocked_source, _browser_warning_context, context.get, db.append_source_record, isinstance, item.get, len, page.get, result.get, str, strip, text.strip

_auto_record_failed_shell_sources

function nipux_cli/worker.py:6541

No docstring.

Calls: _clip_text, _same_source_url, _shell_guard_urls, _source_failure_family_url, any, args.get, candidate.lower, candidates.append, db.append_source_record, error_text.lower, join, metadata.update, recorded.add, result.get, set, str

_auto_reconcile_artifact_tasks

function nipux_cli/worker.py:6601

No docstring.

Calls: _artifact_can_reconcile_task, _as_int, _clip_text, _metadata_list, _text_tokens, args.get, db.append_agent_update, db.append_task_record, db.get_job, isinstance, join, len, lower, max, min, reconciled.append

_auto_open_revision_task_for_deliverable

function nipux_cli/worker.py:6676

No docstring.

Calls: _artifact_can_reconcile_task, _as_int, _clip_text, _metadata_list, args.get, db.append_agent_update, db.append_task_record, db.get_job, isinstance, lower, metadata.get, result.get, str, strip, task.get

_artifact_can_reconcile_task

function nipux_cli/worker.py:6744

No docstring.

Calls: any, contract.strip, lower, task_text.lower

_auto_checkpoint_update

function nipux_cli/worker.py:6768

No docstring.

Calls: _as_int, any, args.get, build_progress_checkpoint, checkpoint.deltas.get, checkpoint.resolutions.get, checkpoint.updates.get, datetime.now, db.append_agent_update, db.get_job, db.update_job_metadata, isinstance, isoformat, job.get, join, lower

_execute_tool_call

function nipux_cli/worker.py:6828

No docstring.

Calls: StepExecution, ToolContext, _auto_checkpoint_update, _auto_open_revision_task_for_deliverable, _auto_persist_evidence, _auto_reconcile_artifact_tasks, _auto_record_blocked_source, _auto_record_failed_shell_sources, _auto_record_grounding_block_lesson, _auto_record_tool_source_...

_is_continuable_recoverable_input_block

function nipux_cli/worker.py:7070

No docstring.

Calls: bool, error.startswith, isinstance, result.get, str

_ordered_tool_calls_for_execution

function nipux_cli/worker.py:7084

Run guard-unblocking calls before branch work when a model batches both.

Calls: _auto_checkpoint_accounting_context, _browser_runtime_unavailable_context, _is_browser_tool, _read_artifact_call_matches_checkpoint, _task_queue_saturation_context, any, bool, checkpoint.get, enumerate, len, priority, sorted, str

priority

function nipux_cli/worker.py:7121

No docstring.

Calls: _read_artifact_call_matches_checkpoint, _task_queue_saturation_context
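
The docstring for _ordered_tool_calls_for_execution says guard-unblocking calls run before branch work when a model batches both, and its call list (enumerate, priority, sorted) is consistent with a stable priority sort. A minimal sketch of that pattern, assuming a two-level priority; the tool names and the `unblocking` set are hypothetical and stand in for the context checks the real function performs:

```python
from typing import Any


def order_tool_calls(calls: list[dict[str, Any]], unblocking: set[str]) -> list[dict[str, Any]]:
    def priority(call: dict[str, Any]) -> int:
        # 0 sorts first: calls assumed to clear a pending guard or obligation.
        return 0 if call.get("name") in unblocking else 1

    # Pair each call with its original position so ties keep the model's order.
    indexed = list(enumerate(calls))
    ordered = sorted(indexed, key=lambda pair: (priority(pair[1]), pair[0]))
    return [call for _, call in ordered]
```

For example, a batch of `web_search`, `read_artifact`, `shell` with `unblocking={"read_artifact"}` comes back as `read_artifact`, `web_search`, `shell`: the unblocking call moves to the front and the rest keep their relative order.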

_registry_tools

function nipux_cli/worker.py:7146

No docstring.

Calls: registry.openai_tools

_registry_tools_for_step

function nipux_cli/worker.py:7153

No docstring.

Calls: _active_obligation_tool_names, _browser_runtime_unavailable_context, _is_browser_tool, _openai_tool_name, _registry_tools, _suppressed_tool_names

_active_obligation_tool_names

function nipux_cli/worker.py:7174

No docstring.

Calls: _as_int, _auto_checkpoint_accounting_context, _experiment_next_action_failure_context, _measured_progress_guard_context, _pending_file_validation_obligation, _pending_measurement_obligation, allowed.add, allowed.update, checkpoint.get, measured_progress.get, set

_suppressed_tool_names

function nipux_cli/worker.py:7197

No docstring.

Calls: _as_int, _auto_checkpoint_accounting_context, _has_acknowledgeable_operator_context, _pending_file_validation_obligation, _pending_measurement_obligation, _repeated_task_queue_saturation_context, _task_backlog_pressure_context, _task_queue_exhausted, backlog.get, set, suppress...

_has_acknowledgeable_operator_context

function nipux_cli/worker.py:7217

No docstring.

Calls: entry.get, isinstance, job.get, lower, metadata.get, replace, str, strip

_openai_tool_name

function nipux_cli/worker.py:7234

No docstring.

Calls: function.get, isinstance, str, tool.get

_call_next_action_with_timeout

function nipux_cli/worker.py:7241

No docstring.

Calls: TimeoutError, float, llm.next_action, max, signal.getitimer, signal.getsignal, signal.setitimer, signal.signal, threading.current_thread, threading.main_thread, time.monotonic

_raise_timeout

function nipux_cli/worker.py:7256

No docstring.

Calls: TimeoutError
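
The calls listed for _call_next_action_with_timeout (signal.setitimer, signal.signal, threading.main_thread, TimeoutError) point to a signal-based deadline around the LLM call. The sketch below shows that general technique, not the worker's implementation: `call_with_timeout` is a hypothetical name, it ignores any previously armed timer (the real function also calls signal.getitimer), and it simply runs without a deadline when signal timers are unavailable.

```python
import signal
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout_seconds: float) -> T:
    # Signal timers need Unix and the main thread; otherwise run without a deadline.
    if not hasattr(signal, "setitimer") or threading.current_thread() is not threading.main_thread():
        return fn()

    def _raise_timeout(signum, frame):
        raise TimeoutError(f"call exceeded {timeout_seconds:.1f}s")

    previous_handler = signal.getsignal(signal.SIGALRM)
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, max(0.001, float(timeout_seconds)))
    try:
        return fn()
    finally:
        # Disarm the timer first, then put the earlier handler back.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous_handler is None:
            # The earlier handler was not installed from Python; fall back to the default.
            previous_handler = signal.SIG_DFL
        signal.signal(signal.SIGALRM, previous_handler)
```

The main-thread check matters because Python only allows signal handlers to be installed from the main thread, so a worker running in a background thread would need a different mechanism.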

_tool_repair_messages

function nipux_cli/worker.py:7272

No docstring.

Calls: len, list, repaired.append, str, strip

run_one_step

function nipux_cli/worker.py:7289

No docstring.

Calls: AgentDB, ArtifactStore, OpenAIChatLLM, StepExecution, _acknowledge_non_prompt_operator_context, _call_next_action_with_timeout, _claim_operator_queue, _clear_invalid_measurement_obligation, _clear_stale_task_backlog_pressure, _emit_assistant_message_event, _emit_loop_end, _emi...

_memory_entries_for_prompt

function nipux_cli/worker_prompt_context.py:64

No docstring.

Calls: entry.get, isinstance, len, next, selected.append

_render_worker_prompt

function nipux_cli/worker_prompt_context.py:79

No docstring.

Calls: PROMPT_SECTION_BUDGETS.get, _clip_text, _redact_stale_tokens_for_prompt, _stale_claim_tokens_for_prompt, int, isinstance, job.get, join, len, max, parts.append, str, suffix_sections.append

_redact_stale_tokens_for_prompt

function nipux_cli/worker_prompt_context.py:118

No docstring.

Calls: _match_inside_path_like_span, match.end, match.group, match.start, re.escape, re.sub, sorted, str, strip

_match_inside_path_like_span

function nipux_cli/worker_prompt_context.py:131

No docstring.

Calls: isspace, len, span.startswith

_operator_messages_for_prompt

function nipux_cli/worker_prompt_context.py:142

No docstring.

Calls: _operator_message_line, _operator_message_visible_in_prompt, active.get, active_prompt_operator_entries, entry.get, isinstance, job.get, join, lines.append, metadata.get, operator_entry_is_prompt_relevant

_operator_message_line

function nipux_cli/worker_prompt_context.py:177

No docstring.

Calls: _clip_text, entry.get, isinstance, join, split, states.append, str

_lessons_for_prompt

function nipux_cli/worker_prompt_context.py:199

No docstring.

Calls: _clip_text, _lesson_prompt_text, _negative_lesson_conflict_tokens, _positive_durable_lines_for_lesson_conflicts, _record_id_for_staleness, _stale_negative_record_ids, entry.get, isinstance, job.get, join, lines.append, metadata.get, str

_lesson_prompt_text

function nipux_cli/worker_prompt_context.py:230

No docstring.

Calls: _normalize_claim_text, _stale_token_is_distinctive, _unsupported_tokens_from_lesson, cleaned.lower, join, lesson.lower, seen.add, set, split, stale_tokens.append, str

_positive_durable_lines_for_lesson_conflicts

function nipux_cli/worker_prompt_context.py:257

No docstring.

Calls: _dict_scalar_text, graph.get, isinstance, lines.append, metadata.get

_negative_lesson_conflict_tokens

function nipux_cli/worker_prompt_context.py:276

No docstring.

Calls: _distinctive_claim_tokens, _positive_line_contains_token, _token_near_negative_marker, any, conflicts.append, join, lesson.lower, seen.add, set, split, str, token.lower

_dict_scalar_text

function nipux_cli/worker_prompt_context.py:296

No docstring.

Calls: _dict_scalar_text, isinstance, join, parts.append, record.items, str

_distinctive_claim_tokens

function nipux_cli/worker_prompt_context.py:308

No docstring.

Calls: _stale_token_is_distinctive, raw.strip, re.findall, tokens.append

_token_near_negative_marker

function nipux_cli/worker_prompt_context.py:317

No docstring.

Calls: any, len, max, text.lower, text_lower.find, token.lower

_positive_line_contains_token

function nipux_cli/worker_prompt_context.py:331

No docstring.

Calls: any, line.lower, token.lower

_memory_graph_for_prompt

function nipux_cli/worker_prompt_context.py:343

No docstring.

Calls: _stale_claim_tokens_for_prompt, isinstance, job.get, join, memory_graph_for_prompt, str

_roadmap_for_prompt

function nipux_cli/worker_prompt_context.py:352

No docstring.

Calls: _clip_text, feature.get, isinstance, job.get, join, len, lines.append, metadata.get, milestone.get, roadmap.get, sorted, status_counts.get, status_counts.items, str, sum, validation_counts.get

_tasks_for_prompt

function nipux_cli/worker_prompt_context.py:416

No docstring.

Calls: _as_int, _clip_text, _metadata_list, _task_output_contract, bits.append, counts.get, counts.items, join, len, lines.append, selected.extend, sorted, status_rank.get, str, task.get

_task_output_contract

function nipux_cli/worker_prompt_context.py:464

No docstring.

Calls: isinstance, metadata.get, str, task.get

_timeline_for_prompt

function nipux_cli/worker_prompt_context.py:469

No docstring.

Calls: _clip_text, _timeline_event_for_prompt, counts.get, counts.items, join, lines.append, selected.append, sorted

_outcomes_for_prompt

function nipux_cli/worker_prompt_context.py:491

Summarize durable outputs so the worker sees progress, not just activity.

Calls: _clip_text, hourly_outcome_summary, join, label.lower, len, lines.append, model_update_event_parts, outcome_counts, reversed, seen.add, set

_ledgers_for_prompt

function nipux_cli/worker_prompt_context.py:519

No docstring.

Calls: _as_float, _as_int, _clip_text, _metadata_list, _record_contains_stale_token, _record_id_for_staleness, _stale_claim_tokens_for_prompt, _stale_negative_record_ids, _stale_negative_records_for_prompt, finding.get, isinstance, item.get, job.get, join, len, lines.append

_stale_claim_tokens_for_prompt

function nipux_cli/worker_prompt_context.py:616

No docstring.

Calls: _normalize_claim_text, _stale_token_is_distinctive, _unsupported_tokens_from_lesson, candidates.append, candidates.extend, isinstance, join, lesson.get, metadata.get, record.get, seen.add, set, split, str, token.lower, tokens.append

_unsupported_tokens_from_lesson

function nipux_cli/worker_prompt_context.py:651

No docstring.

Calls: lesson.lower, match.group, part.strip, re.search, split

_stale_token_is_distinctive

function nipux_cli/worker_prompt_context.py:663

No docstring.

Calls: any, ch.isalpha, ch.isdigit, len, lowered.endswith, lowered.startswith, re.match, token.isupper, token.lower

_normalize_claim_text

function nipux_cli/worker_prompt_context.py:746

No docstring.

Calls: lower, re.sub, str

_record_contains_stale_token

function nipux_cli/worker_prompt_context.py:750

No docstring.

Calls: isinstance, join, metadata.values, re.escape, re.search, record.get, str

_stale_negative_records_for_prompt

function nipux_cli/worker_prompt_context.py:766

No docstring.

Calls: isinstance, metadata.get, record.get, str

_stale_negative_record_ids

function nipux_cli/worker_prompt_context.py:777

No docstring.

Calls: _stale_negative_records_for_prompt, ids.add, record.get, set, str, strip

_record_id_for_staleness

function nipux_cli/worker_prompt_context.py:786

No docstring.

Calls: _normalize_claim_text, record.get, str, strip

_experiments_for_prompt

function nipux_cli/worker_prompt_context.py:794

No docstring.

Calls: _clip_text, _metadata_list, bool, experiment.get, format_metric_value, join, len, lines.append, sorted, status_counts.get, status_counts.items, str

_operator_message_visible_in_prompt

function nipux_cli/worker_prompt_context.py:868

No docstring.

Calls: entry.get, lower, replace, str, strip

_metadata_list

function nipux_cli/worker_prompt_context.py:875

No docstring.

Calls: isinstance, job.get, metadata.get

_timeline_event_for_prompt

function nipux_cli/worker_prompt_context.py:883

No docstring.

Calls: event.get, isinstance, join, lower, metadata.get, split, str, strip, title.lower

_as_float

function nipux_cli/worker_prompt_context.py:910

No docstring.

Calls: float

_as_int

function nipux_cli/worker_prompt_context.py:917

No docstring.

Calls: int

compact

function nipux_cli/worker_prompt_format.py:15

No docstring.

Calls: isinstance, join, json.dumps, len, text.split

clip_text

function nipux_cli/worker_prompt_format.py:21

No docstring.

Calls: join, len, max, rstrip, split, str

format_step_for_prompt

function nipux_cli/worker_prompt_format.py:28

No docstring.

Calls: compact, input_data.get, isinstance, join, observation_for_prompt, pieces.append, step.get

observation_for_prompt

function nipux_cli/worker_prompt_format.py:43

No docstring.

Calls: anti_bot_reason, artifact.get, browser_candidates_for_prompt, clean_prompt_candidate_path, clip_text, compact, data.get, evidence_grounding.get, experiment.get, format_metric_value, isinstance, join, len, lesson.get, node.get, output.get

clean_prompt_candidate_path

function nipux_cli/worker_prompt_format.py:172

No docstring.

Calls: Path, any, ch.isalpha, name.startswith, raw.startswith, re.match, rstrip, str, strip

browser_candidates_for_prompt

function nipux_cli/worker_prompt_format.py:185

No docstring.

Calls: _looks_like_metric_cell, _looks_like_service_description, candidates.append, data.get, isinstance, item.get, join, len, name.lower, output.get, refs.items, seen.add, set, split, str, strip

_looks_like_metric_cell

function nipux_cli/worker_prompt_format.py:213

No docstring.

Calls: bool, name.strip, re.fullmatch

_looks_like_service_description

function nipux_cli/worker_prompt_format.py:218

No docstring.

Calls: any, len, name.lower, text.split

summarize_tool_result

function nipux_cli/worker_tool_summary.py:10

No docstring.

Calls: args.get, data.get, experiment.get, format_metric_value, isinstance, item.get, join, len, lesson.get, page.get, result.get, roadmap.get, source.get, str, update.get, validation.get

turn_usage_metadata

function nipux_cli/worker_usage.py:11

No docstring.

Calls: _as_int, bool, dict, estimate_token_count, isinstance, json.dumps, len, max, round, usage.get, usage.setdefault

estimate_token_count

function nipux_cli/worker_usage.py:38

No docstring.

Calls: len, max

_as_int

function nipux_cli/worker_usage.py:44

No docstring.

Calls: float, int

SourceFile

class scripts/generate_project_atlas.py:28

No docstring.

Calls: none

Symbol

class scripts/generate_project_atlas.py:37

No docstring.

Calls: none

Prompt

class scripts/generate_project_atlas.py:48

No docstring.

Calls: none

main

function scripts/generate_project_atlas.py:56

No docstring.

Calls: OUT.parent.mkdir, OUT.relative_to, OUT.write_text, extract_prompts, extract_symbols, extract_tables, extract_tools, git, len, load_source_files, print, render

load_source_files

function scripts/generate_project_atlas.py:69

No docstring.

Calls: SourceFile, ast.parse, files.append, full.is_file, full.read_text, str, text.splitlines, tracked_paths

tracked_paths

function scripts/generate_project_atlas.py:90

No docstring.

Calls: git, line.strip, output.splitlines, sorted

git

function scripts/generate_project_atlas.py:97

No docstring.

Calls: result.stdout.strip, subprocess.run

extract_symbols

function scripts/generate_project_atlas.py:105

No docstring.

Calls: Symbol, ast.get_docstring, ast.walk, call_names, getattr, isinstance, sorted, symbols.append

call_names

function scripts/generate_project_atlas.py:127

No docstring.

Calls: ast.walk, dotted_name, isinstance, names.add, set

dotted_name

function scripts/generate_project_atlas.py:138

No docstring.

Calls: dotted_name, isinstance
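
The "Calls:" lines throughout this index come from extract_symbols, call_names, and dotted_name, whose own call lists (ast.walk, isinstance, names.add, set, sorted) indicate an AST walk. The following is a sketch of how such lists could be produced, assuming only `ast.Name` and `ast.Attribute` targets are resolved; the bodies are illustrative and not copied from scripts/generate_project_atlas.py.

```python
import ast


def dotted_name(node: ast.AST) -> str:
    # Resolve `a.b.c` chains; anything else (calls, subscripts) yields "".
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return ""


def call_names(func_node: ast.AST) -> list[str]:
    names: set[str] = set()
    for node in ast.walk(func_node):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name:
                names.add(name)
    return sorted(names)
```

Under this approach a method chained onto a call result, such as `str(x).strip().lower()`, has no resolvable base, so it is recorded as bare `strip` and `lower`. That would explain why many entries in this index list unqualified names like `get`, `lower`, or `strip` alongside qualified ones like `db.get_job`.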

extract_prompts

function scripts/generate_project_atlas.py:147

No docstring.

Calls: Prompt, any, assignment_names, ast.walk, deduped.get, deduped.values, getattr, is_prompt_name, is_prompt_text, isinstance, join, literal_string, prompts.append, sorted

assignment_names

function scripts/generate_project_atlas.py:188

No docstring.

Calls: isinstance, names.append

literal_string

function scripts/generate_project_atlas.py:199

No docstring.

Calls: isinstance, join

is_prompt_name

function scripts/generate_project_atlas.py:207

No docstring.

Calls: any, name.upper, upper.endswith

is_prompt_text

function scripts/generate_project_atlas.py:212

No docstring.

Calls: clean.lower, join, len, text.split

extract_tools

function scripts/generate_project_atlas.py:225

No docstring.

Calls: count, join, match.group, match.start, next, pattern.finditer, re.compile, split, str, tools.append

extract_tables

function scripts/generate_project_atlas.py:235

No docstring.

Calls: count, line.strip, line.upper, match.group, match.start, next, re.finditer, rstrip, splitlines, startswith, tables.append

render

function scripts/generate_project_atlas.py:245

No docstring.

Calls: architecture_nodes, esc, join, len, render_file_card, render_prompt, render_review_points, render_source_file, render_symbol, render_table, runtime_flow, source.path.endswith, sum, test_cards

architecture_nodes

function scripts/generate_project_atlas.py:450

No docstring.

Calls: esc, join

runtime_flow

function scripts/generate_project_atlas.py:469

No docstring.

Calls: esc, join

render_file_card

function scripts/generate_project_atlas.py:483

No docstring.

Calls: esc, join, len, module_doc, module_imports, short, source.text.count

module_doc

function scripts/generate_project_atlas.py:496

No docstring.

Calls: ast.get_docstring, line.strip, stripped.startswith

module_imports

function scripts/generate_project_atlas.py:506

No docstring.

Calls: isinstance, names.append, names.extend

render_source_file

function scripts/generate_project_atlas.py:519

No docstring.

Calls: enumerate, esc, join, len, redact_source_line

redact_source_line

function scripts/generate_project_atlas.py:532

No docstring.

Calls: SENSITIVE_ASSIGNMENT_RE.match, match.groups
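
redact_source_line matches each rendered source line against SENSITIVE_ASSIGNMENT_RE and unpacks the match groups. The actual pattern is not shown in this index, so the regular expression and the replacement text below are assumptions, offered only to illustrate the prefix-plus-value split that `match.groups` implies.

```python
import re

# Hypothetical pattern: an assignment whose target name contains a secret-like word.
SENSITIVE_ASSIGNMENT_RE = re.compile(
    r"^(\s*\w*(?:KEY|TOKEN|SECRET|PASSWORD)\w*\s*=(?!=)\s*)(.+)$",
    re.IGNORECASE,
)


def redact_source_line(line: str) -> str:
    match = SENSITIVE_ASSIGNMENT_RE.match(line)
    if not match:
        return line
    prefix, _value = match.groups()
    # Keep the assignment target visible and replace only the value.
    return f"{prefix}'<redacted>'"
```

A pattern this loose also redacts harmless names that merely contain one of the words (for example `monkey = 1`), which is usually the safer failure mode for a page that embeds full source text.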

render_symbol

function scripts/generate_project_atlas.py:540

No docstring.

Calls: esc, join, short

render_prompt

function scripts/generate_project_atlas.py:551

No docstring.

Calls: esc, len

render_table

function scripts/generate_project_atlas.py:560

No docstring.

Calls: esc, join

test_cards

function scripts/generate_project_atlas.py:569

No docstring.

Calls: ast.walk, cards.append, esc, isinstance, join, len, node.name.startswith, short, source.path.startswith

render_review_points

function scripts/generate_project_atlas.py:584

No docstring.

Calls: esc, join, len, sorted

short

function scripts/generate_project_atlas.py:607

No docstring.

Calls: join, len, max, split, str

esc

function scripts/generate_project_atlas.py:614

No docstring.

Calls: html.escape, str

main

function scripts/live_memory_graph_smoke.py:32

No docstring.

Calls: AgentDB, AppConfig, ModelConfig, Path, RuntimeConfig, ToolAccessConfig, _execution_summary, _finish, _seed_metadata, argparse.ArgumentParser, args.base_url.rstrip, config.ensure_dirs, db.close, db.create_job, db.get_job, db.update_job_status

_seed_metadata

function scripts/live_memory_graph_smoke.py:121

No docstring.

Calls: none

_execution_summary

function scripts/live_memory_graph_smoke.py:187

No docstring.

Calls: isinstance, result.get

_finish

function scripts/live_memory_graph_smoke.py:200

No docstring.

Calls: _human_summary, json.dumps, payload.get, print

_human_summary

function scripts/live_memory_graph_smoke.py:208

No docstring.

Calls: bool, isinstance, item.get, join, lines.append, payload.get

Cell

class scripts/render_nipux_ascii_video.py:156

No docstring.

Calls: dataclass

TextGrid

class scripts/render_nipux_ascii_video.py:161

No docstring.

Calls: Cell, enumerate, len, range, self.put, self.set

__init__

function scripts/render_nipux_ascii_video.py:162

No docstring.

Calls: Cell, range

set

function scripts/render_nipux_ascii_video.py:165

No docstring.

Calls: Cell

put

function scripts/render_nipux_ascii_video.py:169

No docstring.

Calls: enumerate, self.set

center

function scripts/render_nipux_ascii_video.py:173

No docstring.

Calls: len, self.put

box

function scripts/render_nipux_ascii_video.py:176

No docstring.

Calls: range, self.put, self.set

clamp

function scripts/render_nipux_ascii_video.py:188

No docstring.

Calls: max, min

ease

function scripts/render_nipux_ascii_video.py:192

No docstring.

Calls: clamp

mix

function scripts/render_nipux_ascii_video.py:197

No docstring.

Calls: clamp, int, range, tuple
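
clamp, ease, and mix form the small numeric toolkit behind the ASCII video renderer: their call lists show clamp built from max and min, ease built on clamp, and mix producing a tuple through int and range. A sketch of how such helpers commonly look, assuming a smoothstep easing curve and RGB triples; neither assumption is confirmed by the index.

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    # Restrict value to the closed interval [low, high].
    return max(low, min(high, value))


def ease(t: float) -> float:
    # Smoothstep: zero slope at both ends (an assumed curve, not the script's).
    t = clamp(t)
    return t * t * (3.0 - 2.0 * t)


def mix(a: tuple[int, int, int], b: tuple[int, int, int], t: float) -> tuple[int, ...]:
    # Linear blend of two RGB colors, truncating each channel to an int.
    t = clamp(t)
    return tuple(int(a[i] + (b[i] - a[i]) * t) for i in range(3))
```

Clamping inside ease and mix means callers can pass raw animation progress values without checking that they stay within 0 and 1.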

logo_origin

function scripts/render_nipux_ascii_video.py:202

No docstring.

Calls: len, max

put_logo

function scripts/render_nipux_ascii_video.py:207

No docstring.

Calls: enumerate, grid.set, logo_origin, mix, random.Random, rng.choice, rng.random

put_collapsing_logo

function scripts/render_nipux_ascii_video.py:228

No docstring.

Calls: enumerate, grid.set, logo_origin, math.sin, mix, random.Random, rng.choice, rng.random, round

put_progress_bar

function scripts/render_nipux_ascii_video.py:251

No docstring.

Calls: clamp, grid.put, int, round

put_rain

function scripts/render_nipux_ascii_video.py:260

No docstring.

Calls: choice, grid.set, max, mix, random, random.Random, range, rng.randint

put_boot_lines

function scripts/render_nipux_ascii_video.py:276

No docstring.

Calls: clamp, enumerate, grid.put, int, len, put_progress_bar

put_cli

function scripts/render_nipux_ascii_video.py:291

No docstring.

Calls: clamp, grid.put, grid.set, int, len, put_progress_bar

build_grid

function scripts/render_nipux_ascii_video.py:313

No docstring.

Calls: TextGrid, ease, grid.box, grid.center, grid.put, int, put_boot_lines, put_cli, put_collapsing_logo, put_logo, put_rain

draw_rect

function scripts/render_nipux_ascii_video.py:358

No docstring.

Calls: bytes, len, max, min, range

build_base_frame

function scripts/render_nipux_ascii_video.py:371

No docstring.

Calls: bytearray, bytes, draw_rect, len, range

glyph_for

function scripts/render_nipux_ascii_video.py:398

No docstring.

Calls: GLYPHS.get, char.upper

draw_glyph

function scripts/render_nipux_ascii_video.py:405

No docstring.

Calls: draw_rect, enumerate, glyph_for, int, max, tuple

render_frame

function scripts/render_nipux_ascii_video.py:422

No docstring.

Calls: build_grid, bytearray, bytes, draw_glyph, draw_rect, enumerate, int, sum

render_video

function scripts/render_nipux_ascii_video.py:441

No docstring.

Calls: SystemExit, decode, int, output.parent.mkdir, poster.parent.mkdir, print, process.stderr.read, process.stdin.close, process.stdin.write, process.wait, range, render_frame, shutil.which, str, subprocess.Popen, subprocess.run

parse_args

function scripts/render_nipux_ascii_video.py:508

No docstring.

Calls: Path, argparse.ArgumentParser, parser.add_argument, parser.parse_args

main

function scripts/render_nipux_ascii_video.py:515

No docstring.

Calls: parse_args, print, render_video

test_artifact_store_writes_reads_and_searches

function tests/nipux_cli/test_artifacts.py:7

No docstring.

Calls: AgentDB, ArtifactStore, db.close, db.create_job, startswith, store.read_text, store.search_text, store.write_text

test_artifact_store_rejects_paths_outside_home

function tests/nipux_cli/test_artifacts.py:28

No docstring.

Calls: ArtifactStore, outside.write_text, pytest.raises, store.read_text, str

test_session_name_is_stable_and_safe

function tests/nipux_cli/test_browser_web.py:11

No docstring.

Calls: _session_name

test_long_session_name_is_short_and_hashed

function tests/nipux_cli/test_browser_web.py:15

No docstring.

Calls: _session_name, _socket_dir, len, name.startswith, str

test_strip_html_removes_scripts_and_keeps_text

function tests/nipux_cli/test_browser_web.py:25

No docstring.

Calls: _strip_html

test_browser_marks_anti_bot_interstitial_as_warning

function tests/nipux_cli/test_browser_web.py:32

No docstring.

Calls: _annotate_source_quality

test_browser_marks_captcha_block_as_warning

function tests/nipux_cli/test_browser_web.py:45

No docstring.

Calls: _annotate_source_quality

test_web_extract_marks_anti_bot_pages_as_warning

function tests/nipux_cli/test_browser_web.py:56

No docstring.

Calls: monkeypatch.setattr, web.web_extract

fake_request

function tests/nipux_cli/test_browser_web.py:59

No docstring.

Calls: none

test_browser_tool_uses_native_wrapper

function tests/nipux_cli/test_browser_web.py:73

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.close, db.create_job, json.loads, monkeypatch.setattr

fake_navigate

function tests/nipux_cli/test_browser_web.py:82

No docstring.

Calls: none

test_browser_click_adds_recovery_snapshot_for_stale_ref

function tests/nipux_cli/test_browser_web.py:93

No docstring.

Calls: AppConfig, RuntimeConfig, browser.click, calls.append, monkeypatch.setattr

fake_command

function tests/nipux_cli/test_browser_web.py:99

No docstring.

Calls: calls.append

_mode

function tests/nipux_cli/test_cli.py:77

No docstring.

Calls: path.stat

_mark_test_model_ready

function tests/nipux_cli/test_cli.py:81

No docstring.

Calls: _mark_model_setup_verified, ensure_dirs, load_config

test_cli_has_operator_commands

function tests/nipux_cli/test_cli.py:86

No docstring.

Calls: build_parser, parser.parse_args

test_cli_version_flag

function tests/nipux_cli/test_cli.py:146

No docstring.

Calls: capsys.readouterr, main

test_main_catches_keyboard_interrupt_without_traceback

function tests/nipux_cli/test_cli.py:155

No docstring.

Calls: capsys.readouterr, main, monkeypatch.setattr

interrupt

function tests/nipux_cli/test_cli.py:156

No docstring.

Calls: none

test_python_module_entrypoint_uses_cli_main

function tests/nipux_cli/test_cli.py:166

No docstring.

Calls: none

test_init_openrouter_writes_secret_free_config_and_env_template

function tests/nipux_cli/test_cli.py:172

No docstring.

Calls: _mode, capsys.readouterr, endswith, env_text.strip, main, monkeypatch.setenv, read_text, str

test_init_defaults_to_local_endpoint

function tests/nipux_cli/test_cli.py:191

No docstring.

Calls: endswith, env_text.strip, main, monkeypatch.setenv, read_text, str

test_init_openrouter_defaults_to_generic_route

function tests/nipux_cli/test_cli.py:204

No docstring.

Calls: main, monkeypatch.setenv, read_text, str

test_shell_freeform_text_adds_operator_message

function tests/nipux_cli/test_cli.py:215

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setenv, str

test_main_no_args_enters_chat_first_home

function tests/nipux_cli/test_cli.py:238

No docstring.

Calls: AgentDB, _mark_test_model_ready, capsys.readouterr, db.append_agent_update, db.append_operator_message, db.close, db.create_job, main, monkeypatch.setattr, monkeypatch.setenv, str

eof_input

function tests/nipux_cli/test_cli.py:249

No docstring.

Calls: none

test_main_no_args_with_no_jobs_requires_setup_frame

function tests/nipux_cli/test_cli.py:262

No docstring.

Calls: capsys.readouterr, main, monkeypatch.setattr, monkeypatch.setenv, str

eof_input

function tests/nipux_cli/test_cli.py:265

No docstring.

Calls: none

test_main_no_args_with_old_setup_marker_still_requires_model_verification

function tests/nipux_cli/test_cli.py:282

No docstring.

Calls: capsys.readouterr, ensure_dirs, load_config, main, monkeypatch.setenv, str, write_shell_state

test_main_no_args_after_setup_complete_does_not_reopen_setup

function tests/nipux_cli/test_cli.py:296

No docstring.

Calls: _mark_test_model_ready, capsys.readouterr, main, monkeypatch.setenv, str

test_main_no_args_autoverifies_existing_model_config

function tests/nipux_cli/test_cli.py:308

No docstring.

Calls: Check, _model_setup_verified, capsys.readouterr, load_config, main, monkeypatch.setattr, monkeypatch.setenv, str, tmp_path.mkdir, write_text

fake_doctor

function tests/nipux_cli/test_cli.py:320

No docstring.

Calls: Check

test_main_no_args_enters_setup_when_existing_model_config_fails

function tests/nipux_cli/test_cli.py:341

No docstring.

Calls: Check, _model_setup_verified, capsys.readouterr, load_config, main, monkeypatch.setattr, monkeypatch.setenv, str, tmp_path.mkdir, write_text

fake_doctor

function tests/nipux_cli/test_cli.py:352

No docstring.

Calls: Check

test_main_no_args_keeps_workspace_locked_after_completed_setup_if_provider_fails

function tests/nipux_cli/test_cli.py:366

No docstring.

Calls: Check, _model_setup_verified, _write_shell_state, capsys.readouterr, load_config, main, monkeypatch.setattr, monkeypatch.setenv, str, tmp_path.mkdir, write_text

fake_doctor

function tests/nipux_cli/test_cli.py:378

No docstring.

Calls: Check

test_first_run_refuses_job_before_model_is_verified

function tests/nipux_cli/test_cli.py:392

No docstring.

Calls: AgentDB, _handle_first_run_frame_line, db.close, db.list_jobs, monkeypatch.setenv, str

test_doctor_check_model_marks_model_setup_verified

function tests/nipux_cli/test_cli.py:412

No docstring.

Calls: Check, _model_setup_verified, args.func, build_parser, capsys.readouterr, load_config, monkeypatch.setattr, monkeypatch.setenv, parse_args, str

fake_doctor

function tests/nipux_cli/test_cli.py:415

No docstring.

Calls: Check

test_first_run_doctor_failure_shows_inline_fix_commands

function tests/nipux_cli/test_cli.py:434

No docstring.

Calls: Check, _verify_model_setup_from_first_run, join, monkeypatch.setattr, monkeypatch.setenv, str

fake_doctor

function tests/nipux_cli/test_cli.py:437

No docstring.

Calls: Check

test_setting_change_clears_model_setup_verification

function tests/nipux_cli/test_cli.py:453

No docstring.

Calls: _inline_setting_notice, _mark_test_model_ready, _model_setup_verified, load_config, monkeypatch.setenv, str

test_first_run_menu_blocks_job_creation_until_workspace_chat

function tests/nipux_cli/test_cli.py:463

No docstring.

Calls: AgentDB, _handle_first_run_menu_line, _mark_test_model_ready, capsys.readouterr, db.close, db.list_jobs, monkeypatch.setenv, str

test_first_run_plain_greeting_does_not_create_job

function tests/nipux_cli/test_cli.py:478

No docstring.

Calls: AgentDB, _handle_first_run_menu_line, capsys.readouterr, db.close, db.list_jobs, monkeypatch.setenv, str

test_first_run_frame_uses_full_screen_ui_not_banner

function tests/nipux_cli/test_cli.py:492

No docstring.

Calls: _build_first_run_frame, frame.splitlines, lower, monkeypatch.setenv, str

test_first_run_frame_hides_command_popup_during_setup

function tests/nipux_cli/test_cli.py:514

No docstring.

Calls: _build_first_run_frame, monkeypatch.setenv, str

test_first_run_frame_walks_setup_screens

function tests/nipux_cli/test_cli.py:528

No docstring.

Calls: _build_first_run_frame, monkeypatch.setenv, str

test_first_run_frame_does_not_use_command_palette_for_setup

function tests/nipux_cli/test_cli.py:557

No docstring.

Calls: _build_first_run_frame, monkeypatch.setenv, str

test_settings_editor_persists_model_config

function tests/nipux_cli/test_cli.py:568

No docstring.

Calls: _config_field_value, _inline_setting_notice, _save_config_field, monkeypatch.setenv, read_text, str

test_slash_autocomplete_filters_commands

function tests/nipux_cli/test_cli.py:583

No docstring.

Calls: _autocomplete_slash, _cycle_slash, _slash_completion_for_submit, _slash_suggestion_lines, join

test_terminal_escape_decodes_arrows_and_mouse_click

function tests/nipux_cli/test_cli.py:636

No docstring.

Calls: _decode_terminal_escape

test_first_run_click_maps_right_pane_actions

function tests/nipux_cli/test_cli.py:647

No docstring.

Calls: _first_run_click_action, monkeypatch.setattr

test_first_run_arrow_navigation_changes_setup_screens

function tests/nipux_cli/test_cli.py:657

No docstring.

Calls: _directional_first_run_action

test_frame_next_job_cycles_jobs

function tests/nipux_cli/test_cli.py:683

No docstring.

Calls: _frame_next_job_id

test_frame_refresh_slows_background_updates_while_typing

function tests/nipux_cli/test_cli.py:691

No docstring.

Calls: _frame_refresh_interval

test_first_run_empty_submit_without_actions_does_not_crash

function tests/nipux_cli/test_cli.py:695

No docstring.

Calls: _FirstRunRuntimeDeps, _submit_first_run_line

test_first_run_required_edit_cancel_and_clear_stay_on_same_field

function tests/nipux_cli/test_cli.py:710

No docstring.

Calls: _handle_first_run_edit_input, join

test_chat_settings_edit_supports_ctrl_u_clear

function tests/nipux_cli/test_cli.py:739

No docstring.

Calls: _handle_chat_edit_input

test_first_run_render_failure_uses_safe_mode

function tests/nipux_cli/test_cli.py:753

No docstring.

Calls: RuntimeError, _FirstRunRuntimeDeps, _safe_first_run_render_frame, capsys.readouterr, join, throw

test_chat_submit_failure_stays_in_frame

function tests/nipux_cli/test_cli.py:778

No docstring.

Calls: RuntimeError, _ChatFrameDeps, _drain_chat_async_notices, _handle_chat_submit, async_messages.get, async_messages.put, join, queue.Queue, throw

test_chat_submit_plain_message_returns_without_waiting_for_model

function tests/nipux_cli/test_cli.py:819

No docstring.

Calls: _ChatFrameDeps, _handle_chat_submit, async_messages.get, join, queue.Queue, time.monotonic, time.sleep

slow_chat

function tests/nipux_cli/test_cli.py:823

No docstring.

Calls: time.sleep
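
The test name above, together with its `queue.Queue` and `slow_chat` helper, suggests that submitting a plain chat message returns before the model reply is ready and that the reply arrives later on a queue. A minimal sketch of that pattern follows. It is not the project's `_handle_chat_submit`; `submit_chat`, `slow_reply`, and `notices` are hypothetical names chosen for illustration, and a `threading.Event` stands in for the slow model call so the example stays deterministic.

```python
import queue
import threading
from typing import Callable


def submit_chat(message: str, reply_fn: Callable[[str], str], notices: "queue.Queue") -> threading.Thread:
    """Start the (possibly slow) reply in a background thread and return at once."""

    def worker() -> None:
        try:
            notices.put(("reply", reply_fn(message)))
        except Exception as exc:  # surface failures as notices instead of crashing the UI loop
            notices.put(("error", str(exc)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


release = threading.Event()


def slow_reply(message: str) -> str:
    # Hypothetical stand-in for a model call: blocks until the caller releases it.
    release.wait(timeout=5)
    return f"echo: {message}"


notices: "queue.Queue" = queue.Queue()
thread = submit_chat("hello", slow_reply, notices)

# submit_chat has already returned while the worker is still blocked.
returned_before_reply = notices.empty()

release.set()
kind, text = notices.get(timeout=5)
thread.join(timeout=5)
```

The caller's frame loop can keep polling the queue (for example with `get_nowait`) between renders, which is one way to keep the composer responsive while a reply is pending.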

test_chat_submit_plain_message_renders_thinking_notice_without_echoing_message

function tests/nipux_cli/test_cli.py:857

No docstring.

Calls: _ChatFrameDeps, _display_chat_notices, _handle_chat_submit, chat_pane_lines, join, queue.Queue

test_chat_submit_waiting_command_output_becomes_animation

function tests/nipux_cli/test_cli.py:891

No docstring.

Calls: _ChatFrameDeps, _display_chat_notices, _handle_chat_submit, chat_pane_lines, join

test_chat_submit_new_refreshes_focused_job_from_shell_state

function tests/nipux_cli/test_cli.py:926

No docstring.

Calls: _ChatFrameDeps, _handle_chat_submit, join, loaded.append

load_snapshot

function tests/nipux_cli/test_cli.py:931

No docstring.

Calls: loaded.append

test_workspace_chat_submit_new_keeps_workspace_chat_left_pane

function tests/nipux_cli/test_cli.py:962

No docstring.

Calls: _ChatFrameDeps, _handle_chat_submit, join, loaded.append

load_snapshot

function tests/nipux_cli/test_cli.py:972

No docstring.

Calls: loaded.append

test_chat_render_failure_uses_safe_mode

function tests/nipux_cli/test_cli.py:1004

No docstring.

Calls: RuntimeError, _ChatFrameDeps, _safe_chat_render_frame, capsys.readouterr, join, throw

test_chat_help_has_config_slash_commands_without_settings_page

function tests/nipux_cli/test_cli.py:1034

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str

test_chat_slash_palette_matches_public_chat_commands

function tests/nipux_cli/test_cli.py:1063

No docstring.

Calls: len

test_workspace_help_is_minimal_and_actionable

function tests/nipux_cli/test_cli.py:1133

No docstring.

Calls: _capture_chat_command, _mark_test_model_ready, monkeypatch.setenv, str

test_first_run_slash_palette_matches_setup_commands

function tests/nipux_cli/test_cli.py:1148

No docstring.

Calls: len

test_chat_settings_slash_commands_persist_config

function tests/nipux_cli/test_cli.py:1185

No docstring.

Calls: AgentDB, _chat_handle_line, _config_field_value, _mode, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, read_text, str

test_chat_init_slash_command_does_not_crash

function tests/nipux_cli/test_cli.py:1251

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, exists, monkeypatch.setenv, str

test_chat_config_slash_command_summarizes_runtime_without_secret

function tests/nipux_cli/test_cli.py:1267

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str, write_text

test_chat_usage_slash_command_reports_tokens

function tests/nipux_cli/test_cli.py:1308

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.append_event, db.close, db.create_job, monkeypatch.setenv, str

test_chat_usage_estimates_cost_from_configured_rates

function tests/nipux_cli/test_cli.py:1337

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.append_event, db.close, db.create_job, monkeypatch.setenv, str, write_text

test_chat_usage_shows_configured_job_cost_limit

function tests/nipux_cli/test_cli.py:1374

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.append_event, db.close, db.create_job, monkeypatch.setenv, str, write_text

test_first_run_settings_slash_commands_persist_config

function tests/nipux_cli/test_cli.py:1404

No docstring.

Calls: _config_field_value, _handle_first_run_frame_line, any, isinstance, monkeypatch.setenv, str

test_first_run_local_connector_action_sets_generic_local_endpoint

function tests/nipux_cli/test_cli.py:1415

No docstring.

Calls: _config_field_value, _handle_first_run_action, any, isinstance, monkeypatch.setenv, str

test_first_run_access_action_toggles_generic_tools

function tests/nipux_cli/test_cli.py:1429

No docstring.

Calls: _config_field_value, _handle_first_run_action, any, isinstance, monkeypatch.setenv, str

test_first_run_doctor_success_opens_workspace_chat

function tests/nipux_cli/test_cli.py:1440

No docstring.

Calls: _handle_first_run_action, _mark_test_model_ready, monkeypatch.setattr, monkeypatch.setenv, str

fake_verify

function tests/nipux_cli/test_cli.py:1443

No docstring.

Calls: _mark_test_model_ready

test_first_run_open_workspace_action_requires_verified_model

function tests/nipux_cli/test_cli.py:1455

No docstring.

Calls: _handle_first_run_action, monkeypatch.setenv, str

test_first_run_open_workspace_action_opens_after_verified_model

function tests/nipux_cli/test_cli.py:1464

No docstring.

Calls: _handle_first_run_action, _mark_test_model_ready, monkeypatch.setenv, str

test_workspace_frame_snapshot_exists_without_jobs

function tests/nipux_cli/test_cli.py:1474

No docstring.

Calls: _build_chat_frame, _load_frame_snapshot, _mark_test_model_ready, monkeypatch.setenv, str

test_workspace_frame_right_pane_tracks_focused_worker

function tests/nipux_cli/test_cli.py:1490

No docstring.

Calls: AgentDB, _build_chat_frame, _load_frame_snapshot, _mark_test_model_ready, _write_shell_state, db.add_artifact, db.close, db.create_job, monkeypatch.setenv, str

test_workspace_slash_new_creates_and_focuses_job

function tests/nipux_cli/test_cli.py:1518

No docstring.

Calls: AgentDB, _capture_chat_command, _mark_test_model_ready, _read_shell_state, db.close, db.list_jobs, get, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:1523

No docstring.

Calls: started.update

test_workspace_chat_job_dossier_includes_progress_outputs_and_outcomes

function tests/nipux_cli/test_cli.py:1545

No docstring.

Calls: AgentDB, _workspace_chat_job_dossier, db.add_artifact, db.add_step, db.append_experiment_record, db.append_finding_record, db.append_source_record, db.append_task_record, db.close, db.create_job, db.finish_step, db.get_job, db.start_run

test_workspace_run_with_objective_creates_worker_when_no_job_matches

function tests/nipux_cli/test_cli.py:1585

No docstring.

Calls: AgentDB, _capture_chat_command, _mark_test_model_ready, db.close, db.list_jobs, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:1590

No docstring.

Calls: started.update

test_workspace_run_with_existing_job_does_not_create_duplicate

function tests/nipux_cli/test_cli.py:1613

No docstring.

Calls: AgentDB, _capture_chat_command, _mark_test_model_ready, _read_shell_state, db.close, db.create_job, db.list_jobs, get, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:1623

No docstring.

Calls: started.update

test_workspace_start_with_existing_job_runs_without_parser_error

function tests/nipux_cli/test_cli.py:1642

No docstring.

Calls: AgentDB, _capture_chat_command, _mark_test_model_ready, _read_shell_state, db.close, db.create_job, db.list_jobs, get, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:1652

No docstring.

Calls: started.update

test_workspace_slash_new_without_objective_is_minimal

function tests/nipux_cli/test_cli.py:1672

No docstring.

Calls: _capture_chat_command, _mark_test_model_ready, monkeypatch.setenv, output.lower, output.strip, str

test_workspace_slash_new_hides_model_preflight_noise

function tests/nipux_cli/test_cli.py:1683

No docstring.

Calls: _capture_chat_command, _mark_test_model_ready, monkeypatch.setattr, monkeypatch.setenv, print, str

fake_start

function tests/nipux_cli/test_cli.py:1687

No docstring.

Calls: print

test_workspace_settings_slash_commands_persist_config

function tests/nipux_cli/test_cli.py:1703

No docstring.

Calls: AgentDB, _capture_chat_command, _config_field_value, _mark_test_model_ready, db.close, db.list_jobs, monkeypatch.setenv, str

test_workspace_settings_slash_command_summarizes_config

function tests/nipux_cli/test_cli.py:1720

No docstring.

Calls: _capture_chat_command, _mark_test_model_ready, monkeypatch.setenv, str, write_text

test_workspace_natural_control_phrase_uses_mapped_command

function tests/nipux_cli/test_cli.py:1746

No docstring.

Calls: AgentDB, _capture_chat_command, _mark_test_model_ready, db.close, db.list_jobs, monkeypatch.setenv, str

test_workspace_natural_settings_phrase_opens_settings_summary

function tests/nipux_cli/test_cli.py:1763

No docstring.

Calls: _capture_chat_command, _mark_test_model_ready, monkeypatch.setenv, str, write_text

test_workspace_how_to_start_job_question_uses_local_help

function tests/nipux_cli/test_cli.py:1787

No docstring.

Calls: AssertionError, _capture_chat_command, _mark_test_model_ready, monkeypatch.setattr, monkeypatch.setenv, str

fail_model

function tests/nipux_cli/test_cli.py:1791

No docstring.

Calls: AssertionError

test_workspace_chat_connection_error_is_operator_friendly

function tests/nipux_cli/test_cli.py:1803

No docstring.

Calls: RuntimeError, _handle_workspace_chat_message, _load_frame_snapshot, _mark_test_model_ready, event.get, join, monkeypatch.setattr, monkeypatch.setenv, str

raise_connection

function tests/nipux_cli/test_cli.py:1807

No docstring.

Calls: RuntimeError

test_chat_start_reports_model_provider_not_ready

function tests/nipux_cli/test_cli.py:1824

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, print, str

fake_start

function tests/nipux_cli/test_cli.py:1833

No docstring.

Calls: print

test_chat_doctor_checks_configured_model

function tests/nipux_cli/test_cli.py:1847

No docstring.

Calls: AgentDB, Check, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, str

fake_doctor

function tests/nipux_cli/test_cli.py:1858

No docstring.

Calls: Check

test_workspace_chat_control_phrase_runs_job_command

function tests/nipux_cli/test_cli.py:1870

No docstring.

Calls: AgentDB, _handle_workspace_chat_message, _mark_test_model_ready, _read_shell_state, any, db.close, db.create_job, db.get_job, get, monkeypatch.setenv, str

test_shell_ls_alias_lists_jobs_instead_of_steering

function tests/nipux_cli/test_cli.py:1898

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str

test_roadmap_command_renders_roadmap

function tests/nipux_cli/test_cli.py:1913

No docstring.

Calls: AgentDB, capsys.readouterr, db.append_roadmap_record, db.close, db.create_job, main, monkeypatch.setenv, str

test_shell_focus_controls_default_steering_job

function tests/nipux_cli/test_cli.py:1942

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, get, monkeypatch.setenv, str

test_shell_rename_updates_job_title_and_program

function tests/nipux_cli/test_cli.py:1966

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setenv, program.parent.mkdir, program.read_text, program.write_text, startswith, str

test_shell_delete_removes_job_and_artifact_dir

function tests/nipux_cli/test_cli.py:1990

No docstring.

Calls: AgentDB, ArtifactStore, AssertionError, _run_shell_line, artifact_path.exists, capsys.readouterr, db.add_step, db.close, db.create_job, db.get_job, db.start_run, exists, monkeypatch.setenv, store.write_text, str

test_shell_help_has_no_examples_or_control_run_sections

function tests/nipux_cli/test_cli.py:2029

No docstring.

Calls: _print_shell_help, capsys.readouterr

test_update_checkout_falls_back_to_tool_install_for_non_git_path

function tests/nipux_cli/test_cli.py:2043

No docstring.

Calls: _update_checkout, join, monkeypatch.setattr, subprocess.CompletedProcess

runner

function tests/nipux_cli/test_cli.py:2046

No docstring.

Calls: subprocess.CompletedProcess

test_update_checkout_upgrades_uv_tool_when_installed_package

function tests/nipux_cli/test_cli.py:2068

No docstring.

Calls: _update_checkout, calls.append, join, monkeypatch.setattr, subprocess.CompletedProcess, tuple

runner

function tests/nipux_cli/test_cli.py:2073

No docstring.

Calls: calls.append, subprocess.CompletedProcess, tuple

test_update_checkout_fast_forwards_git_checkout

function tests/nipux_cli/test_cli.py:2099

No docstring.

Calls: AssertionError, _update_checkout, calls.append, join, mkdir, repo.mkdir, str, subprocess.CompletedProcess, tuple

runner

function tests/nipux_cli/test_cli.py:2106

No docstring.

Calls: AssertionError, calls.append, str, subprocess.CompletedProcess, tuple

test_update_checkout_verifies_installed_command

function tests/nipux_cli/test_cli.py:2132

No docstring.

Calls: _update_checkout, calls.append, join, monkeypatch.setattr, subprocess.CompletedProcess, tuple

which

function tests/nipux_cli/test_cli.py:2135

No docstring.

Calls: none

runner

function tests/nipux_cli/test_cli.py:2145

No docstring.

Calls: calls.append, subprocess.CompletedProcess, tuple

test_update_command_reports_no_restart_when_daemon_is_stopped

function tests/nipux_cli/test_cli.py:2158

No docstring.

Calls: args.func, build_parser, capsys.readouterr, monkeypatch.setattr, monkeypatch.setenv, parse_args, str

test_update_command_restarts_running_daemon

function tests/nipux_cli/test_cli.py:2174

No docstring.

Calls: args.func, build_parser, capsys.readouterr, monkeypatch.setattr, monkeypatch.setenv, parse_args, print, str

fake_restart

function tests/nipux_cli/test_cli.py:2183

No docstring.

Calls: print

test_update_command_no_restart_flag_skips_running_daemon

function tests/nipux_cli/test_cli.py:2199

No docstring.

Calls: AssertionError, args.func, build_parser, capsys.readouterr, monkeypatch.setattr, monkeypatch.setenv, parse_args, str, throw

test_uninstall_dry_run_removes_installed_tool_by_default

function tests/nipux_cli/test_cli.py:2214

No docstring.

Calls: args.func, build_parser, capsys.readouterr, monkeypatch.setenv, parse_args, str

test_uninstall_keep_tool_skips_tool_removal

function tests/nipux_cli/test_cli.py:2226

No docstring.

Calls: args.func, build_parser, capsys.readouterr, monkeypatch.setenv, parse_args, str

test_uninstall_runtime_skips_missing_systemd_service_without_runner_noise

function tests/nipux_cli/test_cli.py:2237

No docstring.

Calls: calls.append, join, monkeypatch.setattr, monkeypatch.setenv, runtime.mkdir, str, subprocess.CompletedProcess, uninstall_runtime

which

function tests/nipux_cli/test_cli.py:2245

No docstring.

Calls: none

runner

function tests/nipux_cli/test_cli.py:2250

No docstring.

Calls: calls.append, subprocess.CompletedProcess

test_chat_clear_does_not_queue_operator_message

function tests/nipux_cli/test_cli.py:2265

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, db.get_job, get, monkeypatch.setenv, str

test_minimal_live_event_line_summarizes_tool_steps

function tests/nipux_cli/test_cli.py:2285

No docstring.

Calls: _minimal_live_event_line

test_chat_frame_is_bounded_and_has_composer

function tests/nipux_cli/test_cli.py:2298

No docstring.

Calls: _build_chat_frame, frame.splitlines, len, startswith, wide_frame.splitlines

test_chat_frame_separates_chat_from_worker_activity

function tests/nipux_cli/test_cli.py:2412

No docstring.

Calls: _build_chat_frame, frame.split

test_chat_frame_empty_state_is_minimal_and_actionable

function tests/nipux_cli/test_cli.py:2447

No docstring.

Calls: _build_chat_frame

test_frame_emit_skips_unchanged_render

function tests/nipux_cli/test_cli.py:2478

No docstring.

Calls: _emit_frame_if_changed, capsys.readouterr, out.count
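
The entry above tests that an unchanged render is not emitted again. One common way to get that behavior is to remember the last frame written and compare before writing. The sketch below assumes this approach; `emit_frame_if_changed` here is a hypothetical illustration, not the signature or body of the project's `_emit_frame_if_changed`.

```python
import io
import sys
from typing import Optional, TextIO


def emit_frame_if_changed(frame: str, last_frame: Optional[str], stream: TextIO = sys.stdout) -> str:
    """Write `frame` only when it differs from `last_frame`; return the frame to remember."""
    if frame != last_frame:
        stream.write(frame)
        stream.flush()
    return frame


# Render the same frame twice, then a new one, into an in-memory stream.
buffer = io.StringIO()
last: Optional[str] = None
for frame in ["frame-A\n", "frame-A\n", "frame-B\n"]:
    last = emit_frame_if_changed(frame, last, buffer)

output = buffer.getvalue()
```

Counting occurrences in the captured output, as the test's `out.count` call hints, is then enough to check that the duplicate frame was skipped.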

test_chat_frame_does_not_cap_long_agent_messages

function tests/nipux_cli/test_cli.py:2493

No docstring.

Calls: _build_chat_frame

test_plain_chat_control_intents_map_to_commands

function tests/nipux_cli/test_cli.py:2533

No docstring.

Calls: _chat_control_command

test_plain_chat_classifier_keeps_natural_controls_out_of_model_path

function tests/nipux_cli/test_cli.py:2570

No docstring.

Calls: _is_plain_chat_line

test_plain_chat_control_intent_does_not_queue_operator_context

function tests/nipux_cli/test_cli.py:2577

No docstring.

Calls: AgentDB, _handle_chat_message, _mark_test_model_ready, db.close, db.create_job, db.get_job, get, monkeypatch.setattr, monkeypatch.setenv, str

fake_capture

function tests/nipux_cli/test_cli.py:2588

No docstring.

Calls: none

test_plain_chat_reply_usage_is_recorded

function tests/nipux_cli/test_cli.py:2608

No docstring.

Calls: AgentDB, LLMResponse, _handle_chat_message, _mark_test_model_ready, db.close, db.create_job, db.job_token_usage, db.list_events, monkeypatch.setenv, str

test_chat_frame_surfaces_actual_work_events

function tests/nipux_cli/test_cli.py:2645

No docstring.

Calls: _build_chat_frame

test_chat_frame_has_model_updates_page

function tests/nipux_cli/test_cli.py:2695

No docstring.

Calls: _build_chat_frame

test_workspace_status_page_does_not_render_fake_worker_when_no_jobs

function tests/nipux_cli/test_cli.py:2735

No docstring.

Calls: _build_chat_frame

test_status_job_cards_show_durable_work_mix

function tests/nipux_cli/test_cli.py:2782

No docstring.

Calls: _build_chat_frame

test_recent_outcome_lines_wrap_long_updates

function tests/nipux_cli/test_cli.py:2823

No docstring.

Calls: join, len, recent_model_update_lines

test_recent_outcome_lines_do_not_pretruncate_actual_work

function tests/nipux_cli/test_cli.py:2845

No docstring.

Calls: join, recent_model_update_lines

test_chat_updates_page_keeps_updates_to_one_line_each

function tests/nipux_cli/test_cli.py:2871

No docstring.

Calls: len, recent_model_update_lines

test_chat_pane_marks_hidden_overflow

function tests/nipux_cli/test_cli.py:2892

No docstring.

Calls: chat_pane_lines, join, len, range

test_chat_pane_groups_multiline_command_output_under_one_label

function tests/nipux_cli/test_cli.py:2911

No docstring.

Calls: chat_pane_lines, join, rendered.count

test_chat_pane_suppresses_transient_duplicates_after_events_arrive

function tests/nipux_cli/test_cli.py:2931

No docstring.

Calls: chat_pane_lines, join, rendered.count

test_chat_pane_hides_persisted_legacy_waiting_notice

function tests/nipux_cli/test_cli.py:2961

No docstring.

Calls: chat_pane_lines, join

test_chat_pane_renders_waiting_notice_as_animation_only

function tests/nipux_cli/test_cli.py:2996

No docstring.

Calls: _display_chat_notices, chat_pane_lines, join

test_chat_pane_hides_persisted_worker_waiting_text

function tests/nipux_cli/test_cli.py:3007

No docstring.

Calls: chat_pane_lines, join

test_chat_pane_renders_stored_provider_errors_as_actions

function tests/nipux_cli/test_cli.py:3044

No docstring.

Calls: chat_pane_lines, join

test_chat_updates_page_uses_deeper_summary_events

function tests/nipux_cli/test_cli.py:3066

No docstring.

Calls: _build_chat_frame

test_hourly_outcomes_prioritize_durable_work_over_research_noise

function tests/nipux_cli/test_cli.py:3098

No docstring.

Calls: hourly_update_lines, join

test_status_recent_outcomes_hide_research_noise

function tests/nipux_cli/test_cli.py:3140

No docstring.

Calls: join, recent_model_update_lines

test_status_recent_outcomes_hide_plan_update_noise

function tests/nipux_cli/test_cli.py:3164

No docstring.

Calls: join, recent_model_update_lines

test_status_recent_outcomes_show_durable_checkpoint_updates

function tests/nipux_cli/test_cli.py:3196

No docstring.

Calls: join, recent_model_update_lines

test_status_recent_outcomes_compact_repeated_updates

function tests/nipux_cli/test_cli.py:3219

No docstring.

Calls: join, range, recent_model_update_lines, rendered.count

test_hourly_outcomes_hide_plan_update_noise

function tests/nipux_cli/test_cli.py:3237

No docstring.

Calls: hourly_update_lines, join

test_hourly_outcomes_count_durable_checkpoint_updates

function tests/nipux_cli/test_cli.py:3269

No docstring.

Calls: hourly_update_lines, join

test_hourly_outcome_summary_uses_progress_order

function tests/nipux_cli/test_cli.py:3290

No docstring.

Calls: hourly_update_lines, join

test_hourly_outcomes_wrap_long_durable_updates_without_pre_truncation

function tests/nipux_cli/test_cli.py:3320

No docstring.

Calls: hourly_update_lines, join

test_hourly_outcomes_limit_visible_hours_without_losing_headers

function tests/nipux_cli/test_cli.py:3345

No docstring.

Calls: events.extend, hourly_update_lines, join, range

test_chat_updates_page_includes_agent_error_updates

function tests/nipux_cli/test_cli.py:3376

No docstring.

Calls: _build_chat_frame

test_chat_status_marks_provider_blocked_jobs_before_daemon_retry

function tests/nipux_cli/test_cli.py:3413

No docstring.

Calls: _build_chat_frame

test_chat_status_page_surfaces_context_pressure

function tests/nipux_cli/test_cli.py:3443

No docstring.

Calls: _build_chat_frame

test_chat_status_page_surfaces_low_durable_yield

function tests/nipux_cli/test_cli.py:3473

No docstring.

Calls: _build_chat_frame

test_chat_status_page_shows_job_outputs

function tests/nipux_cli/test_cli.py:3501

No docstring.

Calls: _build_chat_frame

test_frame_snapshot_keeps_summary_events_durable

function tests/nipux_cli/test_cli.py:3567

No docstring.

Calls: AgentDB, _load_frame_snapshot, db.append_event, db.close, db.create_job, event.get, join, monkeypatch.setenv, range, str

test_frame_snapshot_respects_explicit_job_over_saved_focus

function tests/nipux_cli/test_cli.py:3610

No docstring.

Calls: AgentDB, _load_frame_snapshot, db.close, db.create_job, json.dumps, monkeypatch.setenv, str, write_text

test_chat_status_page_marks_deferred_jobs_waiting

function tests/nipux_cli/test_cli.py:3626

No docstring.

Calls: _build_chat_frame

test_chat_frame_collapses_repeated_failures_and_hides_memory_noise

function tests/nipux_cli/test_cli.py:3664

No docstring.

Calls: _build_chat_frame

test_work_pane_uses_badges_without_duplicate_action_verbs

function tests/nipux_cli/test_cli.py:3708

No docstring.

Calls: _build_chat_frame

test_run_reopens_completed_focused_job

function tests/nipux_cli/test_cli.py:3743

No docstring.

Calls: AgentDB, _mark_test_model_ready, args.func, build_parser, capsys.readouterr, db.close, db.create_job, db.get_job, db.update_job_status, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:3755

No docstring.

Calls: started.update

test_run_delegates_unverified_provider_state_to_daemon_start

function tests/nipux_cli/test_cli.py:3775

No docstring.

Calls: AgentDB, args.func, build_parser, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, print, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:3785

No docstring.

Calls: print, started.update

test_run_marks_job_waiting_when_provider_recovery_is_needed

function tests/nipux_cli/test_cli.py:3801

No docstring.

Calls: AgentDB, args.func, build_parser, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, print, str

fake_start

function tests/nipux_cli/test_cli.py:3812

No docstring.

Calls: print

test_run_does_not_reopen_already_provider_blocked_job

function tests/nipux_cli/test_cli.py:3832

No docstring.

Calls: AgentDB, all, args.func, build_parser, capsys.readouterr, db.close, db.create_job, db.get_job, db.list_events, db.update_job_status, event.get, len, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, print

fake_start

function tests/nipux_cli/test_cli.py:3853

No docstring.

Calls: print

test_run_does_not_reopen_job_when_provider_preflight_is_hard_failure

function tests/nipux_cli/test_cli.py:3876

No docstring.

Calls: AgentDB, args.func, build_parser, db.close, db.create_job, db.get_job, db.update_job_status, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, str

test_create_sets_new_job_as_shell_focus

function tests/nipux_cli/test_cli.py:3901

No docstring.

Calls: AgentDB, _mark_test_model_ready, _run_shell_line, all, args.func, build_parser, capsys.readouterr, db.close, db.get_job, exists, monkeypatch.setenv, out.strip, parser.parse_args, str

test_commands_accept_unquoted_job_titles_in_shell

function tests/nipux_cli/test_cli.py:3931

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str

test_shell_stop_job_title_pauses_job_instead_of_stopping_daemon

function tests/nipux_cli/test_cli.py:3947

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, db.update_job_status, monkeypatch.setenv, str

test_resume_clears_provider_block_before_retry

function tests/nipux_cli/test_cli.py:3969

No docstring.

Calls: AgentDB, capsys.readouterr, db.close, db.create_job, db.get_job, db.update_job_status, main, monkeypatch.setenv, str

test_shell_cancel_prefers_multiword_job_title_over_note

function tests/nipux_cli/test_cli.py:4004

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, db.update_job_status, monkeypatch.setenv, str

test_shell_pause_splits_note_after_longest_matching_job_title

function tests/nipux_cli/test_cli.py:4027

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, db.get_job, db.update_job_status, monkeypatch.setenv, str
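
The two entries above describe shell commands that take an unquoted, possibly multiword job title followed by an optional note. A longest-prefix match is one way to split such input, sketched below under that assumption. `split_title_and_note` and the sample titles are hypothetical and are not taken from the project's source.

```python
from typing import Optional, Sequence, Tuple


def split_title_and_note(words: Sequence[str], titles: Sequence[str]) -> Tuple[Optional[str], str]:
    """Return (matched_title, note), preferring the longest title that prefixes `words`."""
    by_lower = {title.lower(): title for title in titles}
    # Try the longest candidate prefix first so "nightly research" beats "nightly".
    for end in range(len(words), 0, -1):
        candidate = " ".join(words[:end]).lower()
        if candidate in by_lower:
            return by_lower[candidate], " ".join(words[end:])
    return None, " ".join(words)


titles = ["nightly", "nightly research"]
matched = split_title_and_note("nightly research wait for review".split(), titles)
unmatched = split_title_and_note("unknown job".split(), titles)
```

Scanning from the longest prefix down means a short title never shadows a longer one that shares its first words, and whatever remains after the match becomes the note.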

test_chat_handle_line_adds_operator_message

function tests/nipux_cli/test_cli.py:4049

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setenv, str

test_chat_can_spawn_new_job_from_plain_message

function tests/nipux_cli/test_cli.py:4078

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, db.list_jobs, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4088

No docstring.

Calls: started.update

test_workspace_chat_can_create_refined_worker_job

function tests/nipux_cli/test_cli.py:4120

No docstring.

Calls: AgentDB, _handle_chat_message, _load_frame_snapshot, _mark_test_model_ready, db.close, db.list_jobs, event.get, join, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4125

No docstring.

Calls: started.update

test_workspace_chat_start_objective_creates_worker_without_model_reply

function tests/nipux_cli/test_cli.py:4156

No docstring.

Calls: AgentDB, _handle_workspace_chat_message, _mark_test_model_ready, _read_shell_state, db.close, db.list_jobs, get, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4161

No docstring.

Calls: started.update

test_workspace_chat_accepts_natural_worker_and_task_phrasing

function tests/nipux_cli/test_cli.py:4185

No docstring.

Calls: AgentDB, _handle_workspace_chat_message, _mark_test_model_ready, append, db.close, db.list_jobs, len, monkeypatch.setattr, monkeypatch.setenv, started.setdefault, str

fake_start

function tests/nipux_cli/test_cli.py:4190

No docstring.

Calls: append, started.setdefault

test_chat_can_queue_new_job_without_starting

function tests/nipux_cli/test_cli.py:4222

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4232

No docstring.

Calls: started.update

test_chat_can_spawn_generic_deliverable_job_from_plain_message

function tests/nipux_cli/test_cli.py:4252

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, db.close, db.create_job, db.list_jobs, len, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4262

No docstring.

Calls: started.update

test_chat_start_job_message_starts_daemon

function tests/nipux_cli/test_cli.py:4287

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4297

No docstring.

Calls: started.update

test_chat_create_job_and_run_it_starts_daemon

function tests/nipux_cli/test_cli.py:4317

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, capsys.readouterr, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4327

No docstring.

Calls: started.update

test_chat_jobs_command_lists_jobs_instead_of_steering

function tests/nipux_cli/test_cli.py:4347

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, db.get_job, get, monkeypatch.setenv, str

test_chat_command_inside_chat_is_not_queued

function tests/nipux_cli/test_cli.py:4368

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, db.get_job, get, monkeypatch.setenv, str

test_chat_run_accepts_initial_plan_before_starting

function tests/nipux_cli/test_cli.py:4392

No docstring.

Calls: AgentDB, _chat_handle_line, _mark_test_model_ready, args.func, build_parser, db.close, db.get_job, monkeypatch.setattr, monkeypatch.setenv, parser.parse_args, str

fake_run

function tests/nipux_cli/test_cli.py:4401

No docstring.

Calls: none

test_run_without_jobs_does_not_start_empty_daemon

function tests/nipux_cli/test_cli.py:4417

No docstring.

Calls: _mark_test_model_ready, _run_shell_line, capsys.readouterr, monkeypatch.setattr, monkeypatch.setenv, started.update, str

fake_start

function tests/nipux_cli/test_cli.py:4422

No docstring.

Calls: started.update

test_build_chat_messages_includes_recent_job_state

function tests/nipux_cli/test_cli.py:4434

No docstring.

Calls: AgentDB, _build_chat_messages, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run

test_build_chat_messages_includes_durable_outcome_summary

function tests/nipux_cli/test_cli.py:4457

No docstring.

Calls: AgentDB, _build_chat_messages, db.append_event, db.close, db.create_job, db.get_job

test_build_chat_messages_does_not_include_local_machine_context

function tests/nipux_cli/test_cli.py:4484

No docstring.

Calls: AgentDB, _build_chat_messages, db.close, db.create_job, db.get_job, monkeypatch.setenv, ssh_dir.mkdir, str, write_text

test_build_chat_messages_points_to_artifact_and_lessons

function tests/nipux_cli/test_cli.py:4504

No docstring.

Calls: AgentDB, ArtifactStore, _build_chat_messages, db.add_step, db.append_lesson, db.close, db.create_job, db.get_job, db.start_run, write_text

test_build_chat_messages_clip_large_visible_state

function tests/nipux_cli/test_cli.py:4530

No docstring.

Calls: AgentDB, _build_chat_messages, db.append_event, db.close, db.create_job, db.get_job, len, range

test_artifact_command_resolves_title_query

function tests/nipux_cli/test_cli.py:4554

No docstring.

Calls: AgentDB, ArtifactStore, args.func, build_parser, capsys.readouterr, db.add_step, db.close, db.create_job, db.start_run, monkeypatch.setenv, parser.parse_args, str, write_text

test_artifacts_command_prints_compact_view_command

function tests/nipux_cli/test_cli.py:4581

No docstring.

Calls: AgentDB, ArtifactStore, args.func, build_parser, capsys.readouterr, db.add_step, db.close, db.create_job, db.start_run, monkeypatch.setenv, parser.parse_args, str, write_text

test_artifact_command_opens_recent_output_by_number

function tests/nipux_cli/test_cli.py:4609

No docstring.

Calls: AgentDB, ArtifactStore, args.func, build_parser, capsys.readouterr, db.add_step, db.close, db.create_job, db.start_run, monkeypatch.setenv, parser.parse_args, str, write_text

test_chat_work_defaults_to_compact_output

function tests/nipux_cli/test_cli.py:4636

No docstring.

Calls: AgentDB, _chat_handle_line, db.close, db.create_job, monkeypatch.setattr, monkeypatch.setenv, str

fake_work

function tests/nipux_cli/test_cli.py:4645

No docstring.

Calls: none

test_chat_learn_adds_lesson

function tests/nipux_cli/test_cli.py:4656

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setenv, str

test_chat_follow_queues_follow_up_message

function tests/nipux_cli/test_cli.py:4676

No docstring.

Calls: AgentDB, _chat_handle_line, capsys.readouterr, db.close, db.create_job, db.get_job, monkeypatch.setenv, str

test_findings_sources_memory_metrics_commands

function tests/nipux_cli/test_cli.py:4698

No docstring.

Calls: AgentDB, args.func, build_parser, capsys.readouterr, db.append_experiment_record, db.append_finding_record, db.append_lesson, db.append_memory_graph_records, db.append_reflection, db.append_source_record, db.append_task_record, db.close, db.create_job, monkeypatch.setenv, pars...

test_memory_graph_html_command_writes_clickable_artifact

function tests/nipux_cli/test_cli.py:4743

No docstring.

Calls: AgentDB, Path, args.func, build_parser, capsys.readouterr, db.append_memory_graph_records, db.close, db.create_job, db.list_artifacts, monkeypatch.setenv, parse_args, read_text, str

test_shell_natural_update_phrase_shows_updates

function tests/nipux_cli/test_cli.py:4793

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str

test_updates_command_summarizes_durable_outcomes

function tests/nipux_cli/test_cli.py:4809

No docstring.

Calls: AgentDB, args.func, artifact_path.write_text, build_parser, capsys.readouterr, db.add_artifact, db.append_event, db.append_finding_record, db.close, db.create_job, monkeypatch.setenv, parse_args, str

test_updates_all_summarizes_durable_work_across_jobs

function tests/nipux_cli/test_cli.py:4846

No docstring.

Calls: AgentDB, args.func, build_parser, capsys.readouterr, db.add_artifact, db.append_experiment_record, db.append_finding_record, db.close, db.create_job, first_path.write_text, monkeypatch.setenv, parse_args, second_path.write_text, str

test_history_and_events_commands_render_visible_timeline

function tests/nipux_cli/test_cli.py:4897

No docstring.

Calls: AgentDB, build_parser, capsys.readouterr, db.append_agent_update, db.append_operator_message, db.close, db.create_job, func, monkeypatch.setenv, parser.parse_args, str

test_shell_natural_health_phrase_shows_health

function tests/nipux_cli/test_cli.py:4918

No docstring.

Calls: AgentDB, _run_shell_line, capsys.readouterr, db.close, db.create_job, monkeypatch.setenv, str

test_health_prints_recent_daemon_events

function tests/nipux_cli/test_cli.py:4933

No docstring.

Calls: append_daemon_event, args.func, build_parser, capsys.readouterr, load_config, monkeypatch.setenv, parser.parse_args, str

test_launch_agent_plist_contains_daemon_command

function tests/nipux_cli/test_cli.py:4949

No docstring.

Calls: _launch_agent_plist, monkeypatch.setenv, str

test_systemd_service_text_contains_daemon_command

function tests/nipux_cli/test_cli.py:4961

No docstring.

Calls: _systemd_service_text, monkeypatch.setenv, str

_config

function tests/nipux_cli/test_cli_model_preflight.py:7

No docstring.

Calls: SimpleNamespace

test_remote_model_preflight_blocks_rejected_auth

function tests/nipux_cli/test_cli_model_preflight.py:18

No docstring.

Calls: Check, _config, _ensure_remote_model_ready_for_worker, capsys.readouterr, monkeypatch.setattr

fake_doctor

function tests/nipux_cli/test_cli_model_preflight.py:19

No docstring.

Calls: Check

test_remote_model_preflight_allows_recovery_monitor_for_quota

function tests/nipux_cli/test_cli_model_preflight.py:33

No docstring.

Calls: Check, _config, _ensure_remote_model_ready_for_worker, capsys.readouterr, monkeypatch.setattr

fake_doctor

function tests/nipux_cli/test_cli_model_preflight.py:34

No docstring.

Calls: Check

test_remote_model_preflight_skips_fake_runs

function tests/nipux_cli/test_cli_model_preflight.py:47

No docstring.

Calls: AssertionError, _config, _ensure_remote_model_ready_for_worker, monkeypatch.setattr

fake_doctor

function tests/nipux_cli/test_cli_model_preflight.py:48

No docstring.

Calls: AssertionError

test_model_preflight_checks_local_endpoints

function tests/nipux_cli/test_cli_model_preflight.py:56

No docstring.

Calls: _config, _ensure_remote_model_ready_for_worker, monkeypatch.setattr

fake_doctor

function tests/nipux_cli/test_cli_model_preflight.py:59

No docstring.

Calls: none

test_start_does_not_spawn_daemon_when_model_preflight_fails

function tests/nipux_cli/test_cli_model_preflight.py:69

No docstring.

Calls: AssertionError, args.func, build_parser, monkeypatch.setattr, monkeypatch.setenv, parse_args, str

fake_ready

function tests/nipux_cli/test_cli_model_preflight.py:73

No docstring.

Calls: none

fake_popen

function tests/nipux_cli/test_cli_model_preflight.py:77

No docstring.

Calls: AssertionError

test_refresh_memory_index_includes_durable_progress_ledgers

function tests/nipux_cli/test_compression.py:5

No docstring.

Calls: AgentDB, db.append_event, db.close, db.create_job, db.list_memory, refresh_memory_index

_mode

function tests/nipux_cli/test_config.py:6

No docstring.

Calls: path.stat

test_load_config_defaults_to_local_endpoint

function tests/nipux_cli/test_config.py:10

No docstring.

Calls: load_config, monkeypatch.setenv, str

test_load_config_from_yaml

function tests/nipux_cli/test_config.py:30

No docstring.

Calls: Path, cfg.write_text, load_config, monkeypatch.setenv, str

test_load_config_reads_local_env_file

function tests/nipux_cli/test_config.py:79

No docstring.

Calls: load_config, monkeypatch.delenv, monkeypatch.setenv, str, write_text

test_load_config_tightens_local_env_permissions

function tests/nipux_cli/test_config.py:98

No docstring.

Calls: _mode, env_path.chmod, env_path.write_text, load_config, monkeypatch.delenv, monkeypatch.setenv, str

test_default_config_yaml_allows_provider_template_without_secret

function tests/nipux_cli/test_config.py:110

No docstring.

Calls: default_config_yaml

test_config_example_matches_default_local_endpoint

function tests/nipux_cli/test_config.py:131

No docstring.

Calls: Path, read_text, resolve

test_single_instance_lock_rejects_second_holder

function tests/nipux_cli/test_daemon.py:30

No docstring.

Calls: pytest.raises, single_instance_lock
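
The test name above suggests a lock that a second holder cannot acquire. The index does not show the body of single_instance_lock, so the following is only a minimal sketch of one common way to get that behavior on POSIX systems, using an advisory flock. The names single_instance_lock_sketch and AlreadyLockedSketch are hypothetical and are not taken from the Nipux source.

```python
import fcntl
from contextlib import contextmanager


class AlreadyLockedSketch(RuntimeError):
    """Hypothetical error raised when another holder owns the lock."""


@contextmanager
def single_instance_lock_sketch(path):
    """Hold an exclusive, non-blocking advisory lock on `path` (POSIX only).

    Assumption: this mirrors the behavior the test name describes, not the
    project's actual implementation.
    """
    handle = open(path, "a+", encoding="utf-8")
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            raise AlreadyLockedSketch(f"lock already held: {path}") from exc
        try:
            yield handle
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    finally:
        handle.close()
```

Because flock locks belong to the open file description, a second open of the same path conflicts with the first even inside one process, which is what lets a test exercise the rejection without spawning a child.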

test_daemon_lock_status_reports_free_lock

function tests/nipux_cli/test_daemon.py:38

No docstring.

Calls: daemon_lock_status

test_lock_metadata_can_be_updated_while_held

function tests/nipux_cli/test_daemon.py:45

No docstring.

Calls: daemon_lock_status, single_instance_lock, update_lock_metadata

test_lock_metadata_update_restores_missing_process_fields

function tests/nipux_cli/test_daemon.py:58

No docstring.

Calls: daemon_lock_status, handle.flush, handle.seek, handle.truncate, handle.write, json.dumps, single_instance_lock, update_lock_metadata

test_daemon_lock_heartbeat_updates_while_worker_turn_runs

function tests/nipux_cli/test_daemon.py:75

No docstring.

Calls: AgentDB, AppConfig, RuntimeConfig, SlowDaemon, daemon_lock_status, db.close, metadata.get, monkeypatch.setattr, status.get, thread.is_alive, thread.join, thread.start, threading.Thread, time.sleep, time.time

SlowDaemon

class tests/nipux_cli/test_daemon.py:80

No docstring.

Calls: time.sleep

run_once

function tests/nipux_cli/test_daemon.py:81

No docstring.

Calls: time.sleep

test_stop_daemon_recovers_pidless_lock_from_process_list

function tests/nipux_cli/test_daemon.py:115

No docstring.

Calls: AppConfig, PsResult, RuntimeConfig, handle.flush, handle.seek, handle.truncate, handle.write, json.dumps, killed.append, monkeypatch.setattr, single_instance_lock, stop_daemon_process_impl

PsResult

class tests/nipux_cli/test_daemon.py:120

No docstring.

Calls: none

test_daemon_lock_status_detects_stale_runtime

function tests/nipux_cli/test_daemon.py:139

No docstring.

Calls: current_runtime_fingerprint, daemon_lock_status, runtime_stale, single_instance_lock, update_lock_metadata

test_runtime_fingerprint_tracks_progress_code

function tests/nipux_cli/test_daemon.py:153

No docstring.

Calls: none

test_rate_limit_backoff_uses_retry_after_header

function tests/nipux_cli/test_daemon.py:158

No docstring.

Calls: RateLimit, _exception_backoff, type

RateLimit

class tests/nipux_cli/test_daemon.py:159

No docstring.

Calls: type

test_rate_limit_backoff_has_conservative_fallback

function tests/nipux_cli/test_daemon.py:166

No docstring.

Calls: RateLimit, _exception_backoff

RateLimit

class tests/nipux_cli/test_daemon.py:167

No docstring.

Calls: none

test_failed_step_provider_config_error_uses_normal_backoff

function tests/nipux_cli/test_daemon.py:173

No docstring.

Calls: StepExecution, _step_failure_backoff

test_failed_tool_auth_error_uses_normal_backoff

function tests/nipux_cli/test_daemon.py:189

No docstring.

Calls: StepExecution, _step_failure_backoff

test_failed_step_rate_limit_uses_normal_backoff

function tests/nipux_cli/test_daemon.py:204

No docstring.

Calls: StepExecution, _step_failure_backoff

test_failed_step_provider_timeout_uses_normal_backoff

function tests/nipux_cli/test_daemon.py:220

No docstring.

Calls: StepExecution, _step_failure_backoff

test_retry_after_parses_epoch_milliseconds

function tests/nipux_cli/test_daemon.py:236

No docstring.

Calls: _parse_retry_after, int, str, time.time
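
The test name above indicates that a retry hint may arrive as an epoch timestamp in milliseconds. The index does not include the body of _parse_retry_after, so this is a sketch under stated assumptions: the function name parse_retry_after_sketch and the magnitude thresholds used to tell a delay from an epoch value are illustrative choices, not the project's rules.

```python
import math
import time


def parse_retry_after_sketch(value, now=None):
    """Return a wait in seconds, or None if `value` is unusable.

    Hypothetical helper. Assumed heuristics:
      - values >= 1e12 are treated as epoch milliseconds,
      - values >= 1e9 are treated as epoch seconds,
      - anything smaller is a plain delay in seconds.
    """
    now = time.time() if now is None else now
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    if not math.isfinite(number) or number < 0:
        return None
    if number >= 1e12:
        return max(0.0, number / 1000.0 - now)
    if number >= 1e9:
        return max(0.0, number - now)
    return number
```

Passing `now` explicitly keeps the function deterministic in tests; a timestamp that is already in the past is clamped to a zero wait.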

test_daemon_run_once_claims_next_job_with_fake_step

function tests/nipux_cli/test_daemon.py:245

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.run_once, db.close, db.create_job, db.list_artifacts

test_daemon_ignores_ui_focus_for_worker_scheduling

function tests/nipux_cli/test_daemon.py:262

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.next_runnable_job, db.close, db.create_job, json.dumps, write_text

test_daemon_skips_deferred_jobs_until_due

function tests/nipux_cli/test_daemon.py:280

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.next_runnable_job, datetime.now, db.close, db.create_job, db.update_job_status, isoformat, timedelta

test_daemon_quarantines_provider_blocked_jobs

function tests/nipux_cli/test_daemon.py:301

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, any, daemon.next_runnable_job, datetime.now, db.close, db.create_job, db.get_job, db.list_events, db.update_job_status, get, isoformat, lower

test_daemon_leaves_provider_blocked_job_paused_until_model_recovers

function tests/nipux_cli/test_daemon.py:327

No docstring.

Calls: AgentDB, AppConfig, Check, Daemon, RuntimeConfig, daemon.next_runnable_job, datetime.now, db.close, db.create_job, db.get_job, db.update_job_status, isoformat, monkeypatch.setattr, read_daemon_events, startswith, timedelta

fake_doctor

function tests/nipux_cli/test_daemon.py:339

No docstring.

Calls: Check

test_daemon_resumes_provider_blocked_job_when_model_recovers

function tests/nipux_cli/test_daemon.py:355

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, any, daemon.next_runnable_job, datetime.now, db.close, db.create_job, db.get_job, db.list_events, db.update_job_status, get, isoformat, monkeypatch.setattr, timedelta

fake_doctor

function tests/nipux_cli/test_daemon.py:367

No docstring.

Calls: none

test_daemon_idle_sleep_wakes_for_deferred_job

function tests/nipux_cli/test_daemon.py:387

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.idle_sleep_seconds, datetime.now, db.close, db.create_job, db.update_job_status, isoformat, timedelta

test_daemon_idle_sleep_uses_poll_when_no_deferred_jobs

function tests/nipux_cli/test_daemon.py:407

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.idle_sleep_seconds, db.close, db.create_job

test_daemon_advances_multiple_runnable_jobs_without_focus_starvation

function tests/nipux_cli/test_daemon.py:420

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.run_forever, db.close, db.create_job, db.list_steps, json.dumps, write_text

test_daemon_writes_due_daily_digest_once

function tests/nipux_cli/test_daemon.py:437

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, daemon.send_due_daily_digest, datetime, db.close, db.create_job, exists

test_daemon_event_log_round_trips_jsonl

function tests/nipux_cli/test_daemon.py:456

No docstring.

Calls: AppConfig, RuntimeConfig, append_daemon_event, read_daemon_events
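
The test name above describes a JSONL round trip for daemon events. As a rough illustration of that pattern only (the real append_daemon_event and read_daemon_events take a config object and are not reproduced in this index), the sketch below appends one JSON object per line and reads them back. The function names, the "event" key, and the choice to skip unparseable lines are assumptions.

```python
import json
from pathlib import Path


def append_event_sketch(path, event, **fields):
    """Append one event as a single JSON line (hypothetical helper)."""
    record = {"event": event, **fields}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_events_sketch(path, limit=None):
    """Read events back in write order, optionally only the last `limit`."""
    path = Path(path)
    if not path.exists():
        return []
    events = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                # Assumed policy: skip a torn or partial line.
                continue
    return events[-limit:] if limit else events
```

One object per line keeps appends cheap and lets a reader tolerate a partially written last line without losing earlier events.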

test_daemon_recovers_stale_running_steps_on_start

function tests/nipux_cli/test_daemon.py:467

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, any, daemon.run_forever, db.add_step, db.close, db.create_job, db.list_runs, db.list_steps, db.start_run, event.get, next, read_daemon_events

test_daemon_survives_unexpected_step_exception

function tests/nipux_cli/test_daemon.py:489

No docstring.

Calls: AgentDB, AppConfig, ExplodingDaemon, RuntimeConfig, RuntimeError, any, daemon.run_forever, daemon_lock_status, db.close, event.get, read_daemon_events

ExplodingDaemon

class tests/nipux_cli/test_daemon.py:490

No docstring.

Calls: RuntimeError

run_once

function tests/nipux_cli/test_daemon.py:491

No docstring.

Calls: RuntimeError

test_daemon_treats_blocked_steps_as_recoverable

function tests/nipux_cli/test_daemon.py:510

No docstring.

Calls: AgentDB, AppConfig, BlockedDaemon, RuntimeConfig, StepExecution, any, daemon.run_forever, daemon_lock_status, db.close, event.get, read_daemon_events, sum

BlockedDaemon

class tests/nipux_cli/test_daemon.py:511

No docstring.

Calls: StepExecution

run_once

function tests/nipux_cli/test_daemon.py:512

No docstring.

Calls: StepExecution

test_fake_daemon_can_run_100_iterations_without_auto_stop

function tests/nipux_cli/test_daemon.py:538

No docstring.

Calls: AgentDB, AppConfig, Daemon, RuntimeConfig, any, daemon.run_forever, daemon_lock_status, db.close, db.create_job, db.get_job, db.list_artifacts, db.list_memory, db.list_steps, event.get, len, read_daemon_events

test_dashboard_collects_jobs_steps_and_artifacts

function tests/nipux_cli/test_dashboard.py:9

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, RuntimeConfig, collect_dashboard_state, db.add_step, db.append_lesson, db.append_task_record, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, render_dashboard, render_overview, write_text

test_overview_marks_idle_daemon_as_ready_for_work

function tests/nipux_cli/test_dashboard.py:55

No docstring.

Calls: AgentDB, AppConfig, RuntimeConfig, collect_dashboard_state, db.close, db.create_job, render_overview

test_overview_marks_old_heartbeat_as_busy_for_running_step

function tests/nipux_cli/test_dashboard.py:68

No docstring.

Calls: AgentDB, AppConfig, RuntimeConfig, collect_dashboard_state, datetime.now, db.add_step, db.close, db.create_job, db.start_run, isoformat, render_overview, timedelta

test_db_job_run_step_and_artifact_roundtrip

function tests/nipux_cli/test_db.py:4

No docstring.

Calls: AgentDB, db.add_artifact, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_artifact, db.get_job, db.list_artifacts, db.list_runs, db.list_steps, db.start_run

test_create_job_uses_unique_readable_slug_ids

function tests/nipux_cli/test_db.py:42

No docstring.

Calls: AgentDB, db.close, db.create_job

test_step_numbers_increment_across_runs_for_a_job

function tests/nipux_cli/test_db.py:54

No docstring.

Calls: AgentDB, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_steps, db.start_run

test_job_token_usage_aggregates_message_usage

function tests/nipux_cli/test_db.py:73

No docstring.

Calls: AgentDB, db.append_event, db.close, db.create_job, db.job_token_usage

test_append_operator_message_roundtrip

function tests/nipux_cli/test_db.py:125

No docstring.

Calls: AgentDB, db.append_operator_message, db.close, db.create_job, db.get_job, db.list_timeline_events

test_claim_operator_messages_marks_one_message_at_a_time

function tests/nipux_cli/test_db.py:145

No docstring.

Calls: AgentDB, all, any, db.append_operator_message, db.claim_operator_messages, db.close, db.create_job, db.get_job, db.list_timeline_events, message.get

test_acknowledge_operator_messages_marks_delivered_context

function tests/nipux_cli/test_db.py:168

No docstring.

Calls: AgentDB, any, db.acknowledge_operator_messages, db.append_operator_message, db.claim_operator_messages, db.close, db.create_job, db.get_job, db.list_timeline_events

test_rename_job_updates_title_without_changing_id

function tests/nipux_cli/test_db.py:193

No docstring.

Calls: AgentDB, db.close, db.create_job, db.get_job, db.rename_job

test_delete_job_removes_related_rows

function tests/nipux_cli/test_db.py:208

No docstring.

Calls: AgentDB, AssertionError, artifact_path.write_text, db.add_artifact, db.add_step, db.close, db.create_job, db.delete_job, db.get_job, db.list_artifacts, db.list_memory, db.list_steps, db.start_run, db.upsert_memory

test_append_lesson_roundtrip

function tests/nipux_cli/test_db.py:247

No docstring.

Calls: AgentDB, db.append_lesson, db.close, db.create_job, db.get_job

test_append_lesson_dedupes_repeated_memory

function tests/nipux_cli/test_db.py:267

No docstring.

Calls: AgentDB, db.append_lesson, db.close, db.create_job, db.get_job, db.list_events, len

test_source_and_finding_ledgers_dedupe_and_update

function tests/nipux_cli/test_db.py:288

No docstring.

Calls: AgentDB, db.append_finding_record, db.append_reflection, db.append_source_record, db.close, db.create_job, db.get_job, len

test_repeated_source_and_finding_records_mark_non_substantive_touches

function tests/nipux_cli/test_db.py:340

No docstring.

Calls: AgentDB, db.append_finding_record, db.append_source_record, db.close, db.create_job

test_task_queue_dedupes_and_updates

function tests/nipux_cli/test_db.py:386

No docstring.

Calls: AgentDB, db.append_task_record, db.close, db.create_job, db.get_job, len

test_repeated_task_and_experiment_records_mark_non_substantive_touches

function tests/nipux_cli/test_db.py:417

No docstring.

Calls: AgentDB, db.append_experiment_record, db.append_task_record, db.close, db.create_job

test_non_substantive_ledger_touches_do_not_emit_visible_events

function tests/nipux_cli/test_db.py:458

No docstring.

Calls: AgentDB, db.append_experiment_record, db.append_finding_record, db.append_roadmap_record, db.append_source_record, db.append_task_record, db.close, db.create_job, db.list_timeline_events, sum

test_roadmap_last_records_include_progress_accounting_metadata

function tests/nipux_cli/test_db.py:514

No docstring.

Calls: AgentDB, db.append_milestone_validation_record, db.append_roadmap_record, db.close, db.create_job, db.get_job

test_repeated_roadmap_records_do_not_create_fake_milestone_updates

function tests/nipux_cli/test_db.py:548

No docstring.

Calls: AgentDB, db.append_roadmap_record, db.close, db.create_job, db.get_job

test_timeline_events_cover_visible_activity

function tests/nipux_cli/test_db.py:567

No docstring.

Calls: AgentDB, any, db.add_artifact, db.add_step, db.append_agent_update, db.append_finding_record, db.append_lesson, db.append_operator_message, db.append_reflection, db.append_source_record, db.append_task_record, db.close, db.create_job, db.finish_step, db.list_timeline_events, d...

test_daily_digest_includes_ledgers_lessons_sources_and_strategy

function tests/nipux_cli/test_digest.py:6

No docstring.

Calls: AgentDB, AppConfig, RuntimeConfig, db.append_event, db.append_finding_record, db.append_lesson, db.append_reflection, db.append_source_record, db.append_task_record, db.close, db.create_job, db.start_run, exists, render_daily_digest, render_job_digest, write_daily_digest

FakeHTTPResponse

class tests/nipux_cli/test_doctor.py:9

No docstring.

Calls: encode, json.dumps

__init__

function tests/nipux_cli/test_doctor.py:10

No docstring.

Calls: none

__enter__

function tests/nipux_cli/test_doctor.py:13

No docstring.

Calls: none

__exit__

function tests/nipux_cli/test_doctor.py:16

No docstring.

Calls: none

read

function tests/nipux_cli/test_doctor.py:19

No docstring.

Calls: encode, json.dumps

test_doctor_checks_local_runtime_without_model_call

function tests/nipux_cli/test_doctor.py:23

No docstring.

Calls: AppConfig, RuntimeConfig, all, run_doctor

test_doctor_warns_when_remote_model_key_is_missing

function tests/nipux_cli/test_doctor.py:32

No docstring.

Calls: AppConfig, ModelConfig, RuntimeConfig, monkeypatch.delenv, next, run_doctor

test_doctor_reports_openrouter_auth_failure

function tests/nipux_cli/test_doctor.py:51

No docstring.

Calls: AppConfig, ModelConfig, RuntimeConfig, monkeypatch.setattr, monkeypatch.setenv, run_doctor, urllib.error.HTTPError

fake_urlopen

function tests/nipux_cli/test_doctor.py:62

No docstring.

Calls: urllib.error.HTTPError

test_doctor_reports_generation_limit_after_model_listing

function tests/nipux_cli/test_doctor.py:81

No docstring.

Calls: AppConfig, AssertionError, FakeHTTPResponse, ModelConfig, RuntimeConfig, io.BytesIO, monkeypatch.setattr, monkeypatch.setenv, run_doctor, url.endswith, urllib.error.HTTPError

fake_urlopen

function tests/nipux_cli/test_doctor.py:92

No docstring.

Calls: AssertionError, FakeHTTPResponse, io.BytesIO, url.endswith, urllib.error.HTTPError

test_doctor_reports_nested_provider_generation_error

function tests/nipux_cli/test_doctor.py:113

No docstring.

Calls: AppConfig, AssertionError, FakeHTTPResponse, ModelConfig, RuntimeConfig, encode, io.BytesIO, json.dumps, monkeypatch.setattr, monkeypatch.setenv, run_doctor, url.endswith, urllib.error.HTTPError

fake_urlopen

function tests/nipux_cli/test_doctor.py:124

No docstring.

Calls: AssertionError, FakeHTTPResponse, encode, io.BytesIO, json.dumps, url.endswith, urllib.error.HTTPError

test_runtime_code_has_no_task_specific_literals

function tests/nipux_cli/test_generic_runtime_audit.py:26

No docstring.

Calls: Path, join, lower, path.read_text, resolve, root.glob, sorted

_load_live_smoke

function tests/nipux_cli/test_live_memory_graph_smoke.py:8

No docstring.

Calls: Path, importlib.util.module_from_spec, importlib.util.spec_from_file_location, resolve, spec.loader.exec_module

test_live_memory_graph_smoke_fails_cleanly_without_key

function tests/nipux_cli/test_live_memory_graph_smoke.py:19

No docstring.

Calls: _load_live_smoke, capsys.readouterr, monkeypatch.delenv, monkeypatch.setattr, out.lower, smoke.main

test_live_memory_graph_smoke_seed_pushes_generic_consolidation

function tests/nipux_cli/test_live_memory_graph_smoke.py:32

No docstring.

Calls: _load_live_smoke, memory_graph_for_prompt, smoke._seed_metadata

_FakeCompletions

class tests/nipux_cli/test_llm.py:7

No docstring.

Calls: SimpleNamespace, self.calls.append

__init__

function tests/nipux_cli/test_llm.py:8

No docstring.

Calls: none

create

function tests/nipux_cli/test_llm.py:12

No docstring.

Calls: SimpleNamespace, self.calls.append

test_chat_llm_requires_tool_choice_for_worker_actions

function tests/nipux_cli/test_llm.py:21

No docstring.

Calls: ModelConfig, OpenAIChatLLM, SimpleNamespace, _FakeCompletions, llm.next_action, monkeypatch.setattr, monkeypatch.setenv

FakeOpenAI

class tests/nipux_cli/test_llm.py:25

No docstring.

Calls: SimpleNamespace

__init__

function tests/nipux_cli/test_llm.py:26

No docstring.

Calls: none

test_chat_llm_retries_without_tool_choice_when_provider_rejects_it

function tests/nipux_cli/test_llm.py:46

No docstring.

Calls: ModelConfig, OpenAIChatLLM, RejectingCompletions, RuntimeError, SimpleNamespace, create, kwargs.get, llm.next_action, monkeypatch.setattr, monkeypatch.setenv, self.calls.append, super

RejectingCompletions

class tests/nipux_cli/test_llm.py:49

No docstring.

Calls: RuntimeError, create, kwargs.get, self.calls.append, super

create

function tests/nipux_cli/test_llm.py:50

No docstring.

Calls: RuntimeError, create, kwargs.get, self.calls.append, super

FakeOpenAI

class tests/nipux_cli/test_llm.py:58

No docstring.

Calls: SimpleNamespace

__init__

function tests/nipux_cli/test_llm.py:59

No docstring.

Calls: none

test_chat_llm_complete_response_returns_usage

function tests/nipux_cli/test_llm.py:74

No docstring.

Calls: ModelConfig, OpenAIChatLLM, SimpleNamespace, _FakeCompletions, llm.complete_response, monkeypatch.setattr, monkeypatch.setenv

FakeOpenAI

class tests/nipux_cli/test_llm.py:78

No docstring.

Calls: SimpleNamespace

__init__

function tests/nipux_cli/test_llm.py:79

No docstring.

Calls: none

test_chat_llm_disables_provider_sdk_retries

function tests/nipux_cli/test_llm.py:98

No docstring.

Calls: ModelConfig, OpenAIChatLLM, SimpleNamespace, _FakeCompletions, captured.update, monkeypatch.setattr, monkeypatch.setenv

FakeOpenAI

class tests/nipux_cli/test_llm.py:102

No docstring.

Calls: SimpleNamespace, _FakeCompletions, captured.update

__init__

function tests/nipux_cli/test_llm.py:103

No docstring.

Calls: captured.update

test_openrouter_generation_usage_enriches_cost_and_tokens

function tests/nipux_cli/test_llm.py:116

No docstring.

Calls: FakeHTTPResponse, _enrich_openrouter_generation_usage, monkeypatch.setattr

FakeHTTPResponse

class tests/nipux_cli/test_llm.py:117

No docstring.

Calls: none

__enter__

function tests/nipux_cli/test_llm.py:118

No docstring.

Calls: none

__exit__

function tests/nipux_cli/test_llm.py:121

No docstring.

Calls: none

read

function tests/nipux_cli/test_llm.py:124

No docstring.

Calls: none

fake_urlopen

function tests/nipux_cli/test_llm.py:132

No docstring.

Calls: FakeHTTPResponse

test_measurement_candidates_extract_markdown_table_unit_columns

function tests/nipux_cli/test_measurement.py:4

No docstring.

Calls: measurement_candidates, measurement_candidates_are_diagnostic_only

test_measurement_candidates_extract_generic_table_metrics

function tests/nipux_cli/test_measurement.py:21

No docstring.

Calls: measurement_candidates

test_format_metric_value_spaces_named_units

function tests/nipux_cli/test_metric_format.py:4

No docstring.

Calls: format_metric_value

test_format_metric_value_keeps_attached_symbol_units

function tests/nipux_cli/test_metric_format.py:9

No docstring.

Calls: format_metric_value

_entry

function tests/nipux_cli/test_operator_context.py:4

No docstring.

Calls: none

test_conversation_only_operator_messages_do_not_enter_worker_prompt

function tests/nipux_cli/test_operator_context.py:8

No docstring.

Calls: _entry, operator_entry_is_prompt_relevant

test_actionable_operator_messages_remain_worker_constraints

function tests/nipux_cli/test_operator_context.py:13

No docstring.

Calls: _entry, operator_entry_is_prompt_relevant

test_inactive_prompt_operator_ids_returns_only_conversation_active_messages

function tests/nipux_cli/test_operator_context.py:23

No docstring.

Calls: _entry, inactive_prompt_operator_ids

test_initial_task_contracts_are_generic_and_complete

function tests/nipux_cli/test_planning.py:4

No docstring.

Calls: initial_task_contract

test_initial_roadmap_uses_valid_generic_contracts

function tests/nipux_cli/test_planning.py:19

No docstring.

Calls: initial_roadmap_for_objective

test_initial_plan_adapts_to_measurable_objectives

function tests/nipux_cli/test_planning.py:35

No docstring.

Calls: any, initial_plan_for_objective, initial_task_contract, question.lower, title.lower

test_initial_plan_adapts_to_deliverable_objectives

function tests/nipux_cli/test_planning.py:45

No docstring.

Calls: any, initial_plan_for_objective, initial_task_contract, title.lower

test_initial_plan_treats_generated_files_as_deliverables

function tests/nipux_cli/test_planning.py:55

No docstring.

Calls: any, initial_plan_for_objective, initial_task_contract, question.lower

test_initial_plan_adapts_to_monitoring_objectives

function tests/nipux_cli/test_planning.py:64

No docstring.

Calls: any, initial_plan_for_objective, initial_task_contract, question.lower

test_initial_plan_does_not_add_meta_progress_update_task

function tests/nipux_cli/test_planning.py:73

No docstring.

Calls: all, initial_plan_for_objective, title.lower

test_objective_profiles_stay_generic

function tests/nipux_cli/test_planning.py:86

No docstring.

Calls: all, objective_profiles

test_progress_checkpoint_reports_deltas_and_recent_durable_work

function tests/nipux_cli/test_progress.py:4

No docstring.

Calls: build_progress_checkpoint

test_progress_checkpoint_for_saved_output_is_concise

function tests/nipux_cli/test_progress.py:45

No docstring.

Calls: build_progress_checkpoint, checkpoint.message.startswith

test_progress_checkpoint_without_delta_is_activity_not_progress

function tests/nipux_cli/test_progress.py:61

No docstring.

Calls: build_progress_checkpoint

test_progress_checkpoint_counts_existing_record_updates_as_progress

function tests/nipux_cli/test_progress.py:75

No docstring.

Calls: build_progress_checkpoint

test_progress_checkpoint_ignores_non_substantive_record_touches

function tests/nipux_cli/test_progress.py:124

No docstring.

Calls: build_progress_checkpoint

test_progress_checkpoint_counts_roadmap_updates_and_validations

function tests/nipux_cli/test_progress.py:167

No docstring.

Calls: build_progress_checkpoint

test_progress_helpers_ignore_malformed_metadata

function tests/nipux_cli/test_progress.py:202

No docstring.

Calls: ledger_counts, recent_progress_bits

_load_generator

function tests/nipux_cli/test_project_atlas.py:6

No docstring.

Calls: Path, importlib.util.module_from_spec, importlib.util.spec_from_file_location, resolve, spec.loader.exec_module

test_project_atlas_generator_maps_prompts_tools_and_source_without_self_embedding

function tests/nipux_cli/test_project_atlas.py:17

No docstring.

Calls: _load_generator, any, generator.extract_prompts, generator.extract_tools, generator.load_source_files

test_project_atlas_redacts_secret_assignments_from_rendered_source

function tests/nipux_cli/test_project_atlas.py:29

No docstring.

Calls: _load_generator, generator.SourceFile, generator.render_source_file

ProviderPayloadError

class tests/nipux_cli/test_provider_errors.py:8

No docstring.

Calls: none

test_provider_action_required_detects_payload_and_status_text

function tests/nipux_cli/test_provider_errors.py:12

No docstring.

Calls: ProviderPayloadError, provider_action_required, provider_action_required_note

test_provider_rate_limited_detects_transient_rate_text

function tests/nipux_cli/test_provider_errors.py:18

No docstring.

Calls: provider_rate_limited

test_generic_template_pushes_artifacts_and_updates

function tests/nipux_cli/test_templates.py:4

No docstring.

Calls: program_for_job

test_static_tool_surface_is_focused

function tests/nipux_cli/test_tools.py:14

No docstring.

Calls: DEFAULT_REGISTRY.names, sorted, tuple

test_tool_registry_validates_required_arguments

function tests/nipux_cli/test_tools.py:37

No docstring.

Calls: AppConfig, DEFAULT_REGISTRY.validate_arguments, RuntimeConfig

test_tool_registry_blocks_truncated_reference_arguments

function tests/nipux_cli/test_tools.py:65

No docstring.

Calls: AppConfig, DEFAULT_REGISTRY.validate_arguments, RuntimeConfig

test_tool_access_config_filters_worker_schema_and_blocks_calls

function tests/nipux_cli/test_tools.py:83

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, DEFAULT_REGISTRY.openai_tools, RuntimeConfig, ToolAccessConfig, ToolContext, db.close, db.create_job, json.loads

test_artifact_tools_roundtrip

function tests/nipux_cli/test_tools.py:104

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_read_artifact_missing_ref_returns_valid_recent_refs

function tests/nipux_cli/test_tools.py:135

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, ctx.artifacts.write_text, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_defer_job_records_resume_time_without_pausing

function tests/nipux_cli/test_tools.py:158

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, any, db.add_step, db.close, db.create_job, db.get_job, db.list_events, db.start_run, json.loads

test_shell_exec_tool_runs_bounded_command

function tests/nipux_cli/test_tools.py:186

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_masked_auth_failure_output

function tests/nipux_cli/test_tools.py:205

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_write_file_tool_writes_and_appends_workspace_file

function tests/nipux_cli/test_tools.py:234

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads, monkeypatch.chdir, read_text

test_shell_exec_timeout_kills_process_group

function tests/nipux_cli/test_tools.py:260

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_cleanup_registered_shell_processes_kills_orphaned_group

function tests/nipux_cli/test_tools.py:279

No docstring.

Calls: cleanup_registered_shell_processes, json.dumps, os.kill, os.killpg, process.poll, process.wait, range, registry.exists, registry.parent.mkdir, registry.write_text, subprocess.Popen, time.sleep

test_shell_exec_does_not_attach_local_ssh_config

function tests/nipux_cli/test_tools.py:307

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_reports_nonzero_stderr_as_error

function tests/nipux_cli/test_tools.py:324

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_sudo_password_hidden_by_success_status

function tests/nipux_cli/test_tools.py:346

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_missing_command_hidden_by_success_status

function tests/nipux_cli/test_tools.py:375

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_missing_absolute_executable_hidden_by_success_status

function tests/nipux_cli/test_tools.py:397

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_reports_empty_which_probe_as_missing_executable

function tests/nipux_cli/test_tools.py:421

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_empty_successful_probe_as_no_observation

function tests/nipux_cli/test_tools.py:444

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_missing_which_probe_hidden_by_true

function tests/nipux_cli/test_tools.py:468

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_shell_exec_flags_make_failure_hidden_by_pipe_status

function tests/nipux_cli/test_tools.py:491

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_update_job_state_keeps_terminal_statuses_operator_only

function tests/nipux_cli/test_tools.py:513

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_report_update_tool_records_operator_visible_note

function tests/nipux_cli/test_tools.py:548

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_lesson_tool_records_durable_learning

function tests/nipux_cli/test_tools.py:568

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_lesson_cannot_clear_measurement_obligation_with_vague_lesson

function tests/nipux_cli/test_tools.py:592

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_lesson_can_explain_invalid_measurement_obligation

function tests/nipux_cli/test_tools.py:627

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_memory_graph_tools_roundtrip

function tests/nipux_cli/test_tools.py:668

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.list_events, db.start_run, json.loads, len

test_record_source_and_findings_tools_update_ledgers

function tests/nipux_cli/test_tools.py:728

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, any, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_source_requires_assessment

function tests/nipux_cli/test_tools.py:773

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_source_does_not_accept_type_without_assessment

function tests/nipux_cli/test_tools.py:792

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_record_findings_reports_unchanged_duplicates_without_agent_update_noise

function tests/nipux_cli/test_tools.py:814

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.list_events, db.start_run, json.loads, len

test_record_findings_requires_evidence_anchor

function tests/nipux_cli/test_tools.py:849

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_findings_reports_rejected_unevidenced_items_in_mixed_batch

function tests/nipux_cli/test_tools.py:873

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_tool_updates_task_queue

function tests/nipux_cli/test_tools.py:904

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_dedupes_semantic_task_under_backlog_pressure

function tests/nipux_cli/test_tools.py:951

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads, len, range

test_record_tasks_reports_unchanged_duplicates_without_agent_update_noise

function tests/nipux_cli/test_tools.py:1007

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.list_events, db.start_run, db.update_job_metadata, json.loads, len

test_record_tasks_cannot_defer_measurement_with_unrelated_task

function tests/nipux_cli/test_tools.py:1047

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_can_defer_measurement_with_explicit_measurement_task

function tests/nipux_cli/test_tools.py:1081

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_roadmap_tool_updates_roadmap

function tests/nipux_cli/test_tools.py:1125

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_roadmap_dedupes_milestone_titles_even_when_keys_change

function tests/nipux_cli/test_tools.py:1171

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, len

test_record_milestone_validation_creates_follow_up_tasks

function tests/nipux_cli/test_tools.py:1216

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_milestone_validation_requires_evidence_for_passed_status

function tests/nipux_cli/test_tools.py:1255

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_milestone_validation_allows_passed_status_with_metadata_evidence

function tests/nipux_cli/test_tools.py:1281

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_record_milestone_validation_requires_gap_for_failed_or_blocked_status

function tests/nipux_cli/test_tools.py:1307

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_experiment_tool_tracks_best_measured_result

function tests/nipux_cli/test_tools.py:1342

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_experiment_synthesizes_missing_title

function tests/nipux_cli/test_tools.py:1395

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_record_experiment_requires_next_action_for_closed_trials

function tests/nipux_cli/test_tools.py:1421

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_experiment_requires_context_for_closed_non_measured_trials

function tests/nipux_cli/test_tools.py:1449

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_experiment_accepts_blocked_trial_with_context

function tests/nipux_cli/test_tools.py:1477

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_record_experiment_requires_metric_for_measured_trials

function tests/nipux_cli/test_tools.py:1506

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, get, json.loads

test_record_experiment_accepts_numeric_metric_strings

function tests/nipux_cli/test_tools.py:1547

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.start_run, json.loads

test_acknowledge_operator_context_tool_marks_context

function tests/nipux_cli/test_tools.py:1577

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.append_operator_message, db.claim_operator_messages, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_acknowledge_operator_context_requires_active_context

function tests/nipux_cli/test_tools.py:1604

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_accepts_generic_output_contracts

function tests/nipux_cli/test_tools.py:1629

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_promotes_output_contract_from_metadata

function tests/nipux_cli/test_tools.py:1664

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_downgrades_done_artifact_without_delivery_evidence

function tests/nipux_cli/test_tools.py:1695

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_downgrades_done_without_result_evidence

function tests/nipux_cli/test_tools.py:1727

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_downgrades_done_research_without_durable_evidence

function tests/nipux_cli/test_tools.py:1757

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_allows_done_research_after_source_evidence

function tests/nipux_cli/test_tools.py:1789

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_allows_done_research_with_metadata_evidence

function tests/nipux_cli/test_tools.py:1822

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_downgrades_done_experiment_without_measurement_evidence

function tests/nipux_cli/test_tools.py:1852

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_allows_done_experiment_after_measurement_evidence

function tests/nipux_cli/test_tools.py:1882

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_downgrades_done_action_after_read_only_shell

function tests/nipux_cli/test_tools.py:1914

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads

test_record_tasks_allows_done_action_after_action_shell

function tests/nipux_cli/test_tools.py:1952

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_downgrades_done_monitor_without_defer_evidence

function tests/nipux_cli/test_tools.py:1990

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.get_job, db.start_run, json.loads

test_record_tasks_allows_done_monitor_after_defer_evidence

function tests/nipux_cli/test_tools.py:2020

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_allows_done_artifact_after_delivery_evidence

function tests/nipux_cli/test_tools.py:2052

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads, task.get

test_record_tasks_does_not_treat_stderr_redirect_as_delivery_write

function tests/nipux_cli/test_tools.py:2091

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads

test_record_tasks_rejects_checkpoint_as_delivery_evidence

function tests/nipux_cli/test_tools.py:2130

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, DEFAULT_REGISTRY.handle, RuntimeConfig, ToolContext, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, json.loads

_completed

function tests/nipux_cli/test_uninstall.py:7

No docstring.

Calls: subprocess.CompletedProcess

test_uninstall_plan_includes_runtime_and_legacy_state

function tests/nipux_cli/test_uninstall.py:11

No docstring.

Calls: build_uninstall_plan, monkeypatch.setenv, str

test_uninstall_plan_includes_configured_runtime_home

function tests/nipux_cli/test_uninstall.py:26

No docstring.

Calls: build_uninstall_plan, monkeypatch.setenv, str

test_uninstall_runtime_removes_state_and_service_files

function tests/nipux_cli/test_uninstall.py:39

No docstring.

Calls: any, exists, monkeypatch.setenv, path.mkdir, profile.exists, str, uninstall_runtime, write_text

test_uninstall_runtime_dry_run_keeps_files

function tests/nipux_cli/test_uninstall.py:68

No docstring.

Calls: any, monkeypatch.setenv, profile.exists, profile.mkdir, str, uninstall_runtime

test_uninstall_installed_tool_uses_uv_when_available

function tests/nipux_cli/test_uninstall.py:81

No docstring.

Calls: calls.append, join, monkeypatch.setattr, monkeypatch.setenv, str, subprocess.CompletedProcess, tuple, uninstall_installed_tool

runner

function tests/nipux_cli/test_uninstall.py:87

No docstring.

Calls: calls.append, subprocess.CompletedProcess, tuple

test_uninstall_installed_tool_falls_back_to_safe_uv_paths

function tests/nipux_cli/test_uninstall.py:98

No docstring.

Calls: join, monkeypatch.setattr, monkeypatch.setenv, shim.exists, shim.parent.mkdir, shim.symlink_to, str, target.write_text, tool_bin.mkdir, tool_dir.exists, uninstall_installed_tool

which

function tests/nipux_cli/test_uninstall.py:110

No docstring.

Calls: str

test_installed_tool_paths_ignore_non_user_tool

function tests/nipux_cli/test_uninstall.py:128

No docstring.

Calls: Path, installed_tool_paths, monkeypatch.setattr, monkeypatch.setenv, str

SnapshotRegistry

class tests/nipux_cli/test_worker.py:22

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:23

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:26

No docstring.

Calls: json.dumps

SuccessRegistry

class tests/nipux_cli/test_worker.py:31

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:32

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:35

No docstring.

Calls: json.dumps

MeasuredShellRegistry

class tests/nipux_cli/test_worker.py:40

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:41

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:44

No docstring.

Calls: json.dumps

DiagnosticShellRegistry

class tests/nipux_cli/test_worker.py:51

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:52

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:55

No docstring.

Calls: json.dumps

TableBenchmarkShellRegistry

class tests/nipux_cli/test_worker.py:68

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:69

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:72

No docstring.

Calls: json.dumps

FailedTableBenchmarkShellRegistry

class tests/nipux_cli/test_worker.py:90

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:91

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:94

No docstring.

Calls: json.dumps

FailedUrlShellRegistry

class tests/nipux_cli/test_worker.py:112

No docstring.

Calls: args.get, json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:113

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:116

No docstring.

Calls: args.get, json.dumps

HangingLLM

class tests/nipux_cli/test_worker.py:133

No docstring.

Calls: LLMResponse, ToolCall, time.sleep

next_action

function tests/nipux_cli/test_worker.py:134

No docstring.

Calls: LLMResponse, ToolCall, time.sleep

SlowLLM

class tests/nipux_cli/test_worker.py:142

No docstring.

Calls: LLMResponse, ToolCall, time.sleep

__init__

function tests/nipux_cli/test_worker.py:143

No docstring.

Calls: none

next_action

function tests/nipux_cli/test_worker.py:146

No docstring.

Calls: LLMResponse, ToolCall, time.sleep

RepairableLLM

class tests/nipux_cli/test_worker.py:154

No docstring.

Calls: LLMResponse, list, self.messages.append, self.responses.pop, self.tools.append

__init__

function tests/nipux_cli/test_worker.py:157

No docstring.

Calls: list

next_action

function tests/nipux_cli/test_worker.py:162

No docstring.

Calls: LLMResponse, self.messages.append, self.responses.pop, self.tools.append

SourceCodeShellRegistry

class tests/nipux_cli/test_worker.py:170

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:171

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:174

No docstring.

Calls: json.dumps

LargeShellEvidenceRegistry

class tests/nipux_cli/test_worker.py:187

No docstring.

Calls: join, json.dumps, range

openai_tools

function tests/nipux_cli/test_worker.py:188

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:191

No docstring.

Calls: join, json.dumps, range

ExtractRegistry

class tests/nipux_cli/test_worker.py:204

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:205

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:208

No docstring.

Calls: json.dumps

SearchRegistry

class tests/nipux_cli/test_worker.py:221

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:222

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:225

No docstring.

Calls: json.dumps

BrowserAndWebRegistry

class tests/nipux_cli/test_worker.py:239

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:240

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:247

No docstring.

Calls: json.dumps

CapturingLLM

class tests/nipux_cli/test_worker.py:252

No docstring.

Calls: none

__init__

function tests/nipux_cli/test_worker.py:253

No docstring.

Calls: none

next_action

function tests/nipux_cli/test_worker.py:258

No docstring.

Calls: none

ExplodingLLM

class tests/nipux_cli/test_worker.py:264

No docstring.

Calls: AssertionError

next_action

function tests/nipux_cli/test_worker.py:265

No docstring.

Calls: AssertionError

AntiBotBrowserRegistry

class tests/nipux_cli/test_worker.py:270

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:271

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:274

No docstring.

Calls: json.dumps

test_system_prompt_is_contract_first_not_research_first

function tests/nipux_cli/test_worker.py:287

No docstring.

Calls: none

test_run_one_step_executes_scripted_tool_call

function tests/nipux_cli/test_worker.py:295

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_artifacts, db.list_memory, db.list_steps, run_one_step

test_run_one_step_records_estimated_usage_for_scripted_model

function tests/nipux_cli/test_worker.py:329

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, db.close, db.create_job, db.job_token_usage, db.list_events, event.get, next, run_one_step

test_run_one_step_blocks_content_only_worker_turn

function tests/nipux_cli/test_worker.py:360

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, build_messages, db.close, db.create_job, db.get_job, db.list_steps, db.list_timeline_events, run_one_step

test_run_one_step_repairs_content_only_worker_turn_with_tool_retry

function tests/nipux_cli/test_worker.py:393

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RepairableLLM, RuntimeConfig, ToolCall, db.close, db.create_job, db.job_token_usage, db.list_steps, len, run_one_step

test_run_one_step_recovers_repeated_content_only_worker_turns

function tests/nipux_cli/test_worker.py:418

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, any, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_records_context_pressure_without_spam

function tests/nipux_cli/test_worker.py:443

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ModelConfig, RuntimeConfig, ScriptedLLM, db.close, db.create_job, db.get_job, db.list_events, get, len, run_one_step

test_run_one_step_executes_tool_call_batch_in_order

function tests/nipux_cli/test_worker.py:474

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.close, db.create_job, db.get_job, db.list_artifacts, db.list_runs, db.list_steps, run_one_step

test_write_artifact_reconciles_matching_report_task

function tests/nipux_cli/test_worker.py:526

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, item.get, len, run_one_step

test_evidence_artifact_does_not_complete_deliverable_task

function tests/nipux_cli/test_worker.py:579

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, item.get, run_one_step, task.get

test_new_deliverable_supersedes_old_auto_revision_task

function tests/nipux_cli/test_worker.py:629

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, next, run_one_step

test_audit_report_draft_counts_as_deliverable_output

function tests/nipux_cli/test_worker.py:679

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_checkpoint_artifact_does_not_complete_deliverable_task

function tests/nipux_cli/test_worker.py:724

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step, task.get

test_evidence_artifact_can_complete_research_task

function tests/nipux_cli/test_worker.py:768

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_artifact_churn_until_progress_accounting

function tests/nipux_cli/test_worker.py:812

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_activity_checkpoint_streak_blocks_more_churn_until_ledger_update

function tests/nipux_cli/test_worker.py:866

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_task_only_checkpoint_streak_blocks_new_task_sprawl

function tests/nipux_cli/test_worker.py:911

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_task_only_checkpoint_updates_planning_streak

function tests/nipux_cli/test_worker.py:975

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_finding_record, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, range, run_one_step

test_task_resolution_checkpoint_resets_planning_streak

function tests/nipux_cli/test_worker.py:1025

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_task_record, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, db.update_job_metadata, range

test_run_one_step_blocks_similar_artifact_search

function tests/nipux_cli/test_worker.py:1085

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_artifact_review_when_tasks_are_exhausted

function tests/nipux_cli/test_worker.py:1124

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_recovers_repeated_guard_blocks_without_llm

function tests/nipux_cli/test_worker.py:1151

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, any, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, enumerate, run_one_step

test_guard_recovery_does_not_add_task_for_queue_saturation

function tests/nipux_cli/test_worker.py:1185

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, any, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, len, range, run_one_step, startswith

test_run_one_step_recovers_repeated_evidence_grounding_blocks

function tests/nipux_cli/test_worker.py:1241

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_recovers_repeated_known_bad_source_blocks

function tests/nipux_cli/test_worker.py:1272

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, any, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, range, run_one_step

test_guard_recovery_does_not_repeat_after_recovery_step

function tests/nipux_cli/test_worker.py:1305

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_steps, db.start_run, range, run_one_step

test_guard_recovery_does_not_keep_reopening_same_guard

function tests/nipux_cli/test_worker.py:1348

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_steps, db.start_run, range, run_one_step

test_guard_recovery_reopens_same_guard_after_progress

function tests/nipux_cli/test_worker.py:1381

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_steps, db.start_run, range, run_one_step, sum

test_guard_recovery_accounts_pending_evidence_checkpoint

function tests/nipux_cli/test_worker.py:1435

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, db.update_job_metadata, range, run_one_step

test_guard_recovery_immediately_recovers_already_read_checkpoint_reread

function tests/nipux_cli/test_worker.py:1479

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_prompt_does_not_tell_worker_to_reread_checkpoint_after_it_was_read

function tests/nipux_cli/test_worker.py:1519

No docstring.

Calls: AgentDB, build_messages, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata

test_checkpoint_reread_block_requires_accounting_not_more_reads

function tests/nipux_cli/test_worker.py:1544

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_already_read_checkpoint_branch_block_recovers_immediately

function tests/nipux_cli/test_worker.py:1587

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, RuntimeConfig, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, db.update_job_metadata, run_one_step

test_evidence_grounding_ignores_format_protocol_tokens

function tests/nipux_cli/test_worker.py:1640

No docstring.

Calls: _concrete_evidence_tokens

test_evidence_grounding_ignores_lowercase_command_shorthand_tokens

function tests/nipux_cli/test_worker.py:1664

No docstring.

Calls: _concrete_evidence_tokens

test_record_experiment_allows_not_stub_validation_for_observed_token

function tests/nipux_cli/test_worker.py:1671

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_findings_ignores_generated_step_labels_as_claims

function tests/nipux_cli/test_worker.py:1720

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_write_artifact_allows_plain_prose_headings_without_evidence

function tests/nipux_cli/test_worker.py:1769

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_write_artifact_blocks_unsupported_high_risk_identifier

function tests/nipux_cli/test_worker.py:1815

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_web_search_auto_records_source_quality

function tests/nipux_cli/test_worker.py:1860

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SearchRegistry, ToolCall, all, db.close, db.create_job, db.get_job, run_one_step

test_web_extract_auto_records_source_quality

function tests/nipux_cli/test_worker.py:1887

No docstring.

Calls: AgentDB, AppConfig, ExtractRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, next, run_one_step

test_worker_cannot_mark_job_completed_by_default

function tests/nipux_cli/test_worker.py:1914

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_report_update_completion_claim_is_rewritten_as_checkpoint

function tests/nipux_cli/test_worker.py:1939

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, next, run_one_step

test_run_one_step_claims_one_message_but_keeps_all_steering_in_prompt

function tests/nipux_cli/test_worker.py:1974

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, any, db.append_operator_message, db.close, db.create_job, db.get_job, db.list_timeline_events, get, run_one_step

FailingLLM

class tests/nipux_cli/test_worker.py:2000

No docstring.

Calls: RuntimeError

next_action

function tests/nipux_cli/test_worker.py:2001

No docstring.

Calls: RuntimeError

HardProviderFailingLLM

class tests/nipux_cli/test_worker.py:2006

No docstring.

Calls: LLMResponseError

next_action

function tests/nipux_cli/test_worker.py:2007

No docstring.

Calls: LLMResponseError

test_run_one_step_records_model_failures_instead_of_raising

function tests/nipux_cli/test_worker.py:2015

No docstring.

Calls: AgentDB, AppConfig, FailingLLM, RuntimeConfig, db.close, db.create_job, db.list_runs, db.list_steps, run_one_step

test_run_one_step_blocks_missing_tool_arguments_as_recoverable

function tests/nipux_cli/test_worker.py:2036

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_continues_after_malformed_tool_arguments_when_batch_has_more_work

function tests/nipux_cli/test_worker.py:2056

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_runs, db.list_steps, run_one_step

test_run_one_step_continues_after_missing_artifact_when_batch_has_more_work

function tests/nipux_cli/test_worker.py:2082

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_runs, db.list_steps, run_one_step

test_run_one_step_continues_after_empty_operator_ack_when_batch_has_more_work

function tests/nipux_cli/test_worker.py:2109

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_runs, db.list_steps, run_one_step

test_run_one_step_blocks_placeholder_tool_arguments_as_recoverable

function tests/nipux_cli/test_worker.py:2136

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_blocks_truncated_optional_reference_arguments

function tests/nipux_cli/test_worker.py:2156

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_blocks_placeholder_shell_command_before_execution

function tests/nipux_cli/test_worker.py:2187

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_blocks_tool_markup_shell_command_before_execution

function tests/nipux_cli/test_worker.py:2209

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_blocks_unbalanced_shell_quotes_before_execution

function tests/nipux_cli/test_worker.py:2234

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step

test_run_one_step_blocks_markdown_fenced_shell_command_before_execution

function tests/nipux_cli/test_worker.py:2261

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.list_steps, run_one_step
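
The four preceding tests name a family of pre-execution shell checks: placeholder text, tool markup, unbalanced quotes, and markdown fences. The index does not show how the worker detects these cases. The sketch below is one plausible approach, not the project's implementation. The function name, the regular expression, and the returned labels are all assumptions made for illustration.

```python
import re
import shlex
from typing import Optional

# Hypothetical heuristic: <host>, <path>, <tool_call> and similar angle-bracket tokens.
ANGLE_TOKEN = re.compile(r"<[A-Za-z_]+>")


def shell_command_problem(command: str) -> Optional[str]:
    """Return a short reason to block the command, or None if it looks runnable."""
    text = command.strip()
    if not text:
        return "empty command"
    if text.startswith("```") or text.endswith("```"):
        return "markdown fence"
    if ANGLE_TOKEN.search(text):
        return "placeholder or tool markup"
    try:
        # shlex.split raises ValueError when a quote is never closed.
        shlex.split(text)
    except ValueError:
        return "unbalanced quotes"
    return None


for candidate in ["ls -la /tmp", "echo 'hello", "ssh <host> uptime"]:
    print(candidate, "->", shell_command_problem(candidate))
```

Checking before execution keeps a malformed command from reaching the shell, which matches the "before_execution" wording in the test names. A real guard would likely need more cases than this sketch covers.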

test_run_one_step_times_out_stalled_model_call

function tests/nipux_cli/test_worker.py:2295

No docstring.

Calls: AgentDB, AppConfig, HangingLLM, ModelConfig, RuntimeConfig, db.close, db.create_job, db.list_steps, run_one_step

test_repeated_model_failures_do_not_create_automatic_defer

function tests/nipux_cli/test_worker.py:2317

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, ModelConfig, RuntimeConfig, all, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, get, range

test_legacy_model_cooldown_metadata_is_ignored

function tests/nipux_cli/test_worker.py:2348

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, db.list_events, db.update_job_metadata, next, run_one_step

test_run_one_step_pauses_job_on_hard_provider_failure

function tests/nipux_cli/test_worker.py:2371

No docstring.

Calls: AgentDB, AppConfig, HardProviderFailingLLM, RuntimeConfig, any, db.close, db.create_job, db.get_job, db.list_events, run_one_step

test_prompt_includes_recent_tool_arguments_and_observations

function tests/nipux_cli/test_worker.py:2392

No docstring.

Calls: Path.cwd, build_messages, str

test_prompt_recovers_from_missing_artifact_reference

function tests/nipux_cli/test_worker.py:2415

No docstring.

Calls: build_messages

test_prompt_does_not_inject_local_ssh_alias_context

function tests/nipux_cli/test_worker.py:2440

No docstring.

Calls: build_messages, monkeypatch.setenv, ssh_dir.mkdir, str, write_text

test_prompt_includes_operator_steering_messages

function tests/nipux_cli/test_worker.py:2455

No docstring.

Calls: build_messages

test_prompt_keeps_claimed_operator_context_until_acknowledged

function tests/nipux_cli/test_worker.py:2475

No docstring.

Calls: AgentDB, build_messages, db.acknowledge_operator_messages, db.append_operator_message, db.claim_operator_messages, db.close, db.create_job, db.get_job

test_prompt_keeps_unclaimed_steering_but_not_followup_until_claimed

function tests/nipux_cli/test_worker.py:2500

No docstring.

Calls: AgentDB, build_messages, db.append_operator_message, db.close, db.create_job, db.get_job

test_prompt_includes_context_pressure_constraint

function tests/nipux_cli/test_worker.py:2516

No docstring.

Calls: build_messages

test_prompt_includes_cumulative_usage_pressure

function tests/nipux_cli/test_worker.py:2539

No docstring.

Calls: build_messages

test_prompt_renders_task_contract_from_metadata_for_existing_tasks

function tests/nipux_cli/test_worker.py:2575

No docstring.

Calls: build_messages

test_prompt_keeps_persistent_task_backlog_pressure_visible

function tests/nipux_cli/test_worker.py:2600

No docstring.

Calls: build_messages, range

test_prompt_shows_current_task_backlog_pressure_without_prior_block

function tests/nipux_cli/test_worker.py:2631

No docstring.

Calls: build_messages, range

test_prompt_ignores_stale_task_backlog_pressure_after_queue_is_cleaned_up

function tests/nipux_cli/test_worker.py:2652

No docstring.

Calls: build_messages

test_run_one_step_clears_stale_task_backlog_pressure

function tests/nipux_cli/test_worker.py:2677

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, any, db.close, db.create_job, db.get_job, db.list_events, run_one_step

test_run_one_step_records_usage_pressure_without_spam

function tests/nipux_cli/test_worker.py:2715

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ModelConfig, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, db.list_events, get, len, run_one_step

test_critical_usage_does_not_create_automatic_defer

function tests/nipux_cli/test_worker.py:2750

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ModelConfig, RuntimeConfig, ScriptedLLM, ToolCall, db.append_event, db.close, db.create_job, db.get_job, get, run_one_step

test_prompt_ignores_legacy_usage_pressure_recovery_metadata

function tests/nipux_cli/test_worker.py:2776

No docstring.

Calls: build_messages

test_run_one_step_pauses_when_configured_cost_limit_is_reached

function tests/nipux_cli/test_worker.py:2805

No docstring.

Calls: AgentDB, AppConfig, ExplodingLLM, ModelConfig, RuntimeConfig, db.append_event, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_ignores_cost_limit_without_provider_cost_metadata

function tests/nipux_cli/test_worker.py:2834

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ModelConfig, RuntimeConfig, ScriptedLLM, ToolCall, db.append_event, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_does_not_defer_critical_usage_after_progress

function tests/nipux_cli/test_worker.py:2861

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ModelConfig, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_event, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_drops_conversation_only_chat_from_worker_prompt

function tests/nipux_cli/test_worker.py:2912

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.append_operator_message, db.close, db.create_job, db.get_job, get, run_one_step

test_build_messages_keeps_generic_context_under_budget

function tests/nipux_cli/test_worker.py:2937

No docstring.

Calls: build_messages, len, range

test_prompt_timeline_filters_low_signal_tool_noise

function tests/nipux_cli/test_worker.py:3012

No docstring.

Calls: build_messages, range, timeline.extend

test_prompt_includes_durable_outcome_summary

function tests/nipux_cli/test_worker.py:3063

No docstring.

Calls: build_messages, content.split, split

test_emergency_prompt_clipping_repeats_operator_and_next_action

function tests/nipux_cli/test_worker.py:3107

No docstring.

Calls: _render_worker_prompt, content.split, len, range, sections.append, sections.insert

test_build_messages_keeps_rolling_memory_when_not_first

function tests/nipux_cli/test_worker.py:3124

No docstring.

Calls: build_messages

test_build_messages_surfaces_recent_measurement_evidence_outside_state_window

function tests/nipux_cli/test_worker.py:3139

No docstring.

Calls: build_messages, range

test_measurement_obligation_blocks_research_until_recorded

function tests/nipux_cli/test_worker.py:3180

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, run_one_step

test_measurement_obligation_preserves_table_metric_candidates

function tests/nipux_cli/test_worker.py:3235

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, TableBenchmarkShellRegistry, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_failed_shell_measurement_output_still_requires_accounting

function tests/nipux_cli/test_worker.py:3258

No docstring.

Calls: AgentDB, AppConfig, FailedTableBenchmarkShellRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_measurement_obligation_blocks_operator_acknowledgement_churn

function tests/nipux_cli/test_worker.py:3280

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_pending_measurement_narrows_available_tools

function tests/nipux_cli/test_worker.py:3318

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.close, db.create_job, db.update_job_metadata, issubset, run_one_step

test_resolution_tools_survive_task_saturation_suppression

function tests/nipux_cli/test_worker.py:3361

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_step, db.start_run, issubset, range, run_one_step

test_pending_evidence_checkpoint_narrows_available_tools

function tests/nipux_cli/test_worker.py:3409

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.close, db.create_job, db.update_job_metadata, issubset, run_one_step

test_acknowledge_operator_context_hidden_without_active_operator_context

function tests/nipux_cli/test_worker.py:3445

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.close, db.create_job, run_one_step

test_acknowledge_operator_context_visible_with_active_operator_context

function tests/nipux_cli/test_worker.py:3460

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.append_operator_message, db.close, db.create_job, run_one_step

test_diagnostic_shell_output_does_not_create_measurement_obligation

function tests/nipux_cli/test_worker.py:3476

No docstring.

Calls: AgentDB, AppConfig, DiagnosticShellRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, run_one_step

test_source_code_shell_output_does_not_create_measurement_obligation

function tests/nipux_cli/test_worker.py:3497

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SourceCodeShellRegistry, ToolCall, db.close, db.create_job, db.get_job, get, run_one_step

test_prose_from_timed_command_does_not_create_measurement_obligation

function tests/nipux_cli/test_worker.py:3520

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, ProseShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, json.dumps, run_one_step

ProseShellRegistry

class tests/nipux_cli/test_worker.py:3521

No docstring.

Calls: json.dumps

openai_tools

function tests/nipux_cli/test_worker.py:3522

No docstring.

Calls: none

handle

function tests/nipux_cli/test_worker.py:3525

No docstring.

Calls: json.dumps

test_large_shell_output_must_be_saved_before_more_shell_churn

function tests/nipux_cli/test_worker.py:3557

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, LargeShellEvidenceRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_stale_diagnostic_measurement_obligation_is_cleared

function tests/nipux_cli/test_worker.py:3585

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, get, run_one_step

test_measurable_objective_blocks_research_after_budget_but_allows_action

function tests/nipux_cli/test_worker.py:3627

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, range, run_one_step

test_measurable_objective_blocks_shell_churn_without_experiment_accounting

function tests/nipux_cli/test_worker.py:3669

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_measured_progress_guard_narrows_available_tools_after_shell_budget

function tests/nipux_cli/test_worker.py:3700

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, issubset, range, run_one_step

test_measured_progress_guard_keeps_shell_available_before_shell_budget

function tests/nipux_cli/test_worker.py:3736

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, issubset, range, run_one_step

test_measured_progress_guard_ignores_non_measurement_task_updates

function tests/nipux_cli/test_worker.py:3772

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_measured_progress_guard_accepts_measurement_task_update

function tests/nipux_cli/test_worker.py:3814

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_measurable_objective_allows_candidate_file_validation_shell_after_budget

function tests/nipux_cli/test_worker.py:3856

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, MeasuredShellRegistry, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, range, run_one_step

test_prompt_includes_durable_lessons

function tests/nipux_cli/test_worker.py:3911

No docstring.

Calls: build_messages

test_prompt_suppresses_stale_negative_lessons_when_positive_durable_evidence_exists

function tests/nipux_cli/test_worker.py:3931

No docstring.

Calls: build_messages

test_prompt_keeps_negative_lessons_when_durable_evidence_is_negative

function tests/nipux_cli/test_worker.py:3961

No docstring.

Calls: build_messages

test_prompt_includes_memory_graph_slice

function tests/nipux_cli/test_worker.py:3988

No docstring.

Calls: build_messages

test_prompt_suppresses_memory_graph_nodes_matching_stale_claim_tokens

function tests/nipux_cli/test_worker.py:4035

No docstring.

Calls: AgentDB, build_messages, db.append_lesson, db.append_memory_graph_records, db.close, db.create_job, db.get_job, db.list_steps

test_prompt_suppresses_negative_memory_graph_nodes_matching_stale_file_type

function tests/nipux_cli/test_worker.py:4074

No docstring.

Calls: AgentDB, build_messages, db.append_memory_graph_records, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata

test_prompt_pushes_memory_graph_consolidation_when_ledgers_exist_without_nodes

function tests/nipux_cli/test_worker.py:4121

No docstring.

Calls: build_messages

test_prompt_adds_memory_consolidation_guard_when_graph_lags_ledgers

function tests/nipux_cli/test_worker.py:4140

No docstring.

Calls: build_messages

test_prompt_adds_research_balance_guard_for_execution_without_sources

function tests/nipux_cli/test_worker.py:4163

No docstring.

Calls: build_messages, range

test_prompt_research_balance_guard_clears_when_sources_exist

function tests/nipux_cli/test_worker.py:4191

No docstring.

Calls: build_messages, range

_source_yield_metadata

function tests/nipux_cli/test_worker.py:4217

No docstring.

Calls: range

_source_gathering_steps

function tests/nipux_cli/test_worker.py:4252

No docstring.

Calls: range

test_prompt_adds_source_yield_guard_when_sources_are_not_synthesized

function tests/nipux_cli/test_worker.py:4265

No docstring.

Calls: _source_gathering_steps, _source_yield_metadata, build_messages

test_prompt_source_yield_guard_clears_when_findings_cover_sources

function tests/nipux_cli/test_worker.py:4282

No docstring.

Calls: _source_gathering_steps, _source_yield_metadata, build_messages

test_run_one_step_blocks_more_source_gathering_when_source_yield_is_missing

function tests/nipux_cli/test_worker.py:4296

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _source_gathering_steps, _source_yield_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_source_yield_guard_takes_priority_over_memory_consolidation

function tests/nipux_cli/test_worker.py:4334

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _source_gathering_steps, _source_yield_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_source_yield_accounting

function tests/nipux_cli/test_worker.py:4371

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _source_gathering_steps, _source_yield_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_execution_when_research_balance_is_missing

function tests/nipux_cli/test_worker.py:4418

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_blocks_lesson_churn_when_research_balance_is_missing

function tests/nipux_cli/test_worker.py:4457

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_blocks_durable_records_with_unsupported_concrete_claims

function tests/nipux_cli/test_worker.py:4504

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_record_experiment_blocks_unsupported_proper_noun_hardware_claims

function tests/nipux_cli/test_worker.py:4555

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step, set

test_record_experiment_allows_supported_proper_noun_hardware_claims

function tests/nipux_cli/test_worker.py:4606

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_record_lesson_blocks_negative_claim_that_conflicts_with_positive_evidence

function tests/nipux_cli/test_worker.py:4657

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_lesson_ignores_plain_titlecase_negative_conflict_tokens

function tests/nipux_cli/test_worker.py:4707

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_lesson_allows_negative_claim_when_evidence_is_also_negative

function tests/nipux_cli/test_worker.py:4746

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_shell_path_recovery_prompt_shows_missing_executable

function tests/nipux_cli/test_worker.py:4790

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_shell_path_recovery_prompt_prefers_observed_candidate_executable

function tests/nipux_cli/test_worker.py:4830

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run

test_shell_path_recovery_prompt_preserves_partial_success_paths

function tests/nipux_cli/test_worker.py:4871

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run

test_shell_exec_blocks_bare_retry_when_candidate_executable_observed

function tests/nipux_cli/test_worker.py:4909

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_permission_failure_prompt_blocks_package_manager_retry

function tests/nipux_cli/test_worker.py:4960

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_permission_failure_prompt_mentions_non_privileged_recovery

function tests/nipux_cli/test_worker.py:5003

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run

test_record_findings_blocks_negative_file_pattern_that_conflicts_with_positive_evidence

function tests/nipux_cli/test_worker.py:5039

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_file_pattern_grounding_ignores_hidden_path_components

function tests/nipux_cli/test_worker.py:5092

No docstring.

Calls: _file_pattern_tokens_for_grounding

test_record_experiment_allows_classifying_observed_files_as_non_primary

function tests/nipux_cli/test_worker.py:5106

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_findings_requires_exact_paths_when_file_candidates_exist

function tests/nipux_cli/test_worker.py:5164

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_missing_candidate_paths_are_ranked_before_grounding_guidance

function tests/nipux_cli/test_worker.py:5217

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, join, range, run_one_step

test_record_findings_allows_exact_candidate_path_summary

function tests/nipux_cli/test_worker.py:5267

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_evidence_grounding_blocks_positive_claim_for_missing_path

function tests/nipux_cli/test_worker.py:5312

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_evidence_grounding_checks_later_positive_path_mentions

function tests/nipux_cli/test_worker.py:5366

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_findings_allows_negative_file_pattern_when_evidence_is_negative

function tests/nipux_cli/test_worker.py:5415

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_run_one_step_marks_contradicted_negative_finding_stale

function tests/nipux_cli/test_worker.py:5462

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _ledgers_for_prompt, db.add_step, db.append_finding_record, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, get

test_run_one_step_marks_contradicted_negative_memory_node_stale

function tests/nipux_cli/test_worker.py:5523

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, build_messages, db.add_step, db.append_memory_graph_records, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps

test_record_lesson_allows_generic_strategy_without_concrete_facts

function tests/nipux_cli/test_worker.py:5592

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_lesson_allows_positive_checkpoint_summary_with_new_concrete_terms

function tests/nipux_cli/test_worker.py:5624

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_record_findings_blocks_single_unsupported_identifier

function tests/nipux_cli/test_worker.py:5663

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_evidence_grounding_ignores_job_context_labels

function tests/nipux_cli/test_worker.py:5708

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_evidence_grounding_blocks_unsupported_numeric_measurements

function tests/nipux_cli/test_worker.py:5753

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_evidence_grounding_ignores_record_schema_keys

function tests/nipux_cli/test_worker.py:5794

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_evidence_grounding_uses_durable_finding_location_and_metadata

function tests/nipux_cli/test_worker.py:5837

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_finding_record, db.close, db.create_job, run_one_step

test_evidence_grounding_ignores_json_literals_even_when_stale_tokens_exist

function tests/nipux_cli/test_worker.py:5880

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, run_one_step

test_evidence_grounding_ignores_planning_and_status_labels

function tests/nipux_cli/test_worker.py:5919

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_memory_graph_with_unsupported_claims

function tests/nipux_cli/test_worker.py:5961

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_memory_graph_identifier_labels_without_evidence

function tests/nipux_cli/test_worker.py:6008

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_still_blocks_stale_memory_graph_key_claims

function tests/nipux_cli/test_worker.py:6061

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_memory_graph_grounded_in_durable_records

function tests/nipux_cli/test_worker.py:6115

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_finding_record, db.close, db.create_job, run_one_step

test_run_one_step_blocks_memory_graph_grounded_only_in_stale_records

function tests/nipux_cli/test_worker.py:6157

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_finding_record, db.close, db.create_job, run_one_step

test_run_one_step_allows_stale_token_when_fresh_evidence_revalidates_it

function tests/nipux_cli/test_worker.py:6205

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_durable_records_grounded_in_read_artifact

function tests/nipux_cli/test_worker.py:6253

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_run_one_step_scopes_grounding_to_cited_step

function tests/nipux_cli/test_worker.py:6310

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, run_one_step

test_cited_step_numbers_ignore_ordinal_hash_labels

function tests/nipux_cli/test_worker.py:6370

No docstring.

Calls: _cited_step_numbers

test_prompt_shows_evidence_grounding_tokens_after_block

function tests/nipux_cli/test_worker.py:6380

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.list_steps, db.start_run

test_prompt_shows_missing_candidate_paths_after_grounding_block

function tests/nipux_cli/test_worker.py:6405

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.list_steps, db.start_run, range

test_prompt_adds_ranked_current_candidates_to_stale_grounding_block

function tests/nipux_cli/test_worker.py:6445

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata, next_constraint.index, ranked_text.index

test_prompt_does_not_resurface_grounding_block_after_durable_resolution

function tests/nipux_cli/test_worker.py:6503

No docstring.

Calls: AgentDB, build_messages, content.split, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, split

test_prompt_suppresses_findings_matching_stale_claim_tokens

function tests/nipux_cli/test_worker.py:6543

No docstring.

Calls: AgentDB, build_messages, db.append_finding_record, db.append_lesson, db.close, db.create_job, db.get_job, db.list_steps

test_prompt_prioritizes_validation_for_recent_candidate_file_paths

function tests/nipux_cli/test_worker.py:6566

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_deprioritizes_recent_stub_candidate_file_paths

function tests/nipux_cli/test_worker.py:6610

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_isolates_current_execution_focus_for_candidate_validation

function tests/nipux_cli/test_worker.py:6656

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata, range

test_prompt_moves_from_candidate_validation_to_candidate_use_after_positive_evidence

function tests/nipux_cli/test_worker.py:6709

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_ranks_context_matching_candidate_paths_before_auxiliary_files

function tests/nipux_cli/test_worker.py:6752

No docstring.

Calls: AgentDB, _rank_candidate_file_paths, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_next_action_prioritizes_candidate_file_validation_over_download_retry

function tests/nipux_cli/test_worker.py:6808

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_ranks_late_candidate_paths_from_large_shell_listing

function tests/nipux_cli/test_worker.py:6858

No docstring.

Calls: AgentDB, build_messages, content.index, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata, join, range

test_prompt_prioritizes_structured_candidate_file_paths

function tests/nipux_cli/test_worker.py:6899

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_filters_truncated_and_url_like_candidate_file_paths

function tests/nipux_cli/test_worker.py:6941

No docstring.

Calls: AgentDB, build_messages, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata

test_candidate_path_extraction_stops_at_escaped_newline_metadata

function tests/nipux_cli/test_worker.py:6977

No docstring.

Calls: _extract_candidate_file_paths, all

test_candidate_path_extraction_skips_globs_and_truncated_fragments

function tests/nipux_cli/test_worker.py:6990

No docstring.

Calls: _extract_candidate_file_paths

test_prompt_resurfaces_durable_candidate_file_paths

function tests/nipux_cli/test_worker.py:7008

No docstring.

Calls: AgentDB, build_messages, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata

test_prompt_resurfaces_candidate_paths_from_recent_grounding_block

function tests/nipux_cli/test_worker.py:7044

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_grounding_uses_recent_missing_candidate_paths_after_raw_evidence_ages

function tests/nipux_cli/test_worker.py:7092

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, range, run_one_step

test_prompt_filters_stale_generated_and_objective_tokens

function tests/nipux_cli/test_worker.py:7163

No docstring.

Calls: AgentDB, build_messages, db.append_finding_record, db.append_lesson, db.close, db.create_job, db.get_job, db.list_steps

test_prompt_redacts_stale_tokens_from_recent_state

function tests/nipux_cli/test_worker.py:7191

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run

test_prompt_does_not_redact_stale_tokens_inside_exact_paths

function tests/nipux_cli/test_worker.py:7224

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_prompt_redacts_older_stale_tokens_from_task_queue

function tests/nipux_cli/test_worker.py:7278

No docstring.

Calls: build_messages, range

test_run_one_step_requires_accounting_after_auto_checkpoint_read

function tests/nipux_cli/test_worker.py:7304

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step, str

test_run_one_step_reads_checkpoint_before_batched_branch_work

function tests/nipux_cli/test_worker.py:7352

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata, run_one_step, step.get

test_run_one_step_allows_checkpoint_read_when_deliverable_guard_is_active

function tests/nipux_cli/test_worker.py:7394

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, db.update_job_metadata, range

test_run_one_step_accounts_checkpoint_before_batched_branch_work

function tests/nipux_cli/test_worker.py:7441

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.get_job, db.list_steps, db.update_job_metadata, run_one_step, step.get

test_run_one_step_treats_guard_recovery_as_checkpoint_accounting

function tests/nipux_cli/test_worker.py:7483

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step, str

test_checkpoint_resolution_tool_bypasses_measured_progress_guard

function tests/nipux_cli/test_worker.py:7562

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, db.update_job_metadata, range, run_one_step

test_run_one_step_persists_checkpoint_obligation_until_accounted

function tests/nipux_cli/test_worker.py:7611

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, db.update_job_metadata, run_one_step, str

test_run_one_step_blocks_branch_work_when_memory_graph_needs_consolidation

function tests/nipux_cli/test_worker.py:7666

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_experiment_record, db.append_finding_record, db.append_lesson, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_allows_memory_graph_consolidation_when_guard_is_active

function tests/nipux_cli/test_worker.py:7688

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_experiment_record, db.append_finding_record, db.append_lesson, db.append_source_record, db.close, db.create_job, db.get_job, run_one_step

test_prompt_adds_lesson_consolidation_guard_when_raw_lessons_sprawl

function tests/nipux_cli/test_worker.py:7730

No docstring.

Calls: build_messages, range

test_run_one_step_blocks_more_lessons_when_lesson_sprawl_needs_graph

function tests/nipux_cli/test_worker.py:7761

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_lesson, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_allows_memory_graph_when_lesson_sprawl_is_active

function tests/nipux_cli/test_worker.py:7791

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_lesson, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_prompt_includes_activity_stagnation_context

function tests/nipux_cli/test_worker.py:7835

No docstring.

Calls: build_messages

test_prompt_includes_task_planning_guard_context

function tests/nipux_cli/test_worker.py:7860

No docstring.

Calls: build_messages

test_prompt_includes_durable_yield_pressure

function tests/nipux_cli/test_worker.py:7881

No docstring.

Calls: build_messages, range

test_prompt_includes_finding_source_ledgers_and_reflections

function tests/nipux_cli/test_worker.py:7906

No docstring.

Calls: build_messages

test_prompt_includes_experiment_ledger_and_best_result

function tests/nipux_cli/test_worker.py:7929

No docstring.

Calls: build_messages

_stagnant_experiments

function tests/nipux_cli/test_worker.py:7973

No docstring.

Calls: range

_stagnant_experiment_metadata

function tests/nipux_cli/test_worker.py:8001

No docstring.

Calls: _stagnant_experiments

test_prompt_includes_experiment_stagnation_guard

function tests/nipux_cli/test_worker.py:8013

No docstring.

Calls: _stagnant_experiment_metadata, build_messages

test_prompt_infers_experiment_stagnation_from_metric_direction

function tests/nipux_cli/test_worker.py:8029

No docstring.

Calls: build_messages, range

test_prompt_does_not_treat_unmarked_improvements_as_stagnation

function tests/nipux_cli/test_worker.py:8073

No docstring.

Calls: build_messages, float, range

test_run_one_step_blocks_branch_work_after_experiment_stagnation

function tests/nipux_cli/test_worker.py:8106

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _stagnant_experiment_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_allows_branch_decision_after_experiment_stagnation

function tests/nipux_cli/test_worker.py:8139

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _stagnant_experiment_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_allows_blocked_experiment_after_experiment_stagnation

function tests/nipux_cli/test_worker.py:8184

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, _stagnant_experiment_metadata, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_delivery_experiment_next_action_blocks_unrelated_research

function tests/nipux_cli/test_worker.py:8227

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_research_experiment_next_action_allows_research

function tests/nipux_cli/test_worker.py:8258

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_delivery_experiment_next_action_blocks_read_only_shell

function tests/nipux_cli/test_worker.py:8288

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_delivery_experiment_next_action_allows_bounded_verification_shell

function tests/nipux_cli/test_worker.py:8318

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_failed_next_action_requires_accounting_before_more_shell

function tests/nipux_cli/test_worker.py:8350

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, run_one_step

test_failed_next_action_prompt_prioritizes_accounting

function tests/nipux_cli/test_worker.py:8404

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, db.update_job_metadata

test_failed_next_action_narrows_available_tools_to_accounting

function tests/nipux_cli/test_worker.py:8449

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, issubset, run_one_step

test_accounted_next_action_failure_does_not_keep_blocking

function tests/nipux_cli/test_worker.py:8493

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, db.update_job_metadata, run_one_step

test_delivery_experiment_next_action_allows_write_shell

function tests/nipux_cli/test_worker.py:8551

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_write_file_can_consume_recent_shell_evidence

function tests/nipux_cli/test_worker.py:8581

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, LargeShellEvidenceRegistry, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, run_one_step

test_write_file_creates_validation_obligation_for_code_outputs

function tests/nipux_cli/test_worker.py:8609

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step, str

test_file_validation_obligation_blocks_research_until_validated

function tests/nipux_cli/test_worker.py:8637

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.get_job, get, run_one_step, str

test_delivery_experiment_next_action_allows_internal_artifact_review

function tests/nipux_cli/test_worker.py:8684

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, db.update_job_metadata, run_one_step

test_prompt_marks_recent_anti_bot_browser_source

function tests/nipux_cli/test_worker.py:8714

No docstring.

Calls: build_messages

test_prompt_marks_recent_captcha_browser_block

function tests/nipux_cli/test_worker.py:8734

No docstring.

Calls: build_messages

test_prompt_includes_browser_candidate_names

function tests/nipux_cli/test_worker.py:8756

No docstring.

Calls: build_messages

test_prompt_includes_candidate_names_from_table_cells

function tests/nipux_cli/test_worker.py:8784

No docstring.

Calls: build_messages

test_prompt_includes_recovery_candidates_after_stale_ref

function tests/nipux_cli/test_worker.py:8820

No docstring.

Calls: build_messages

test_run_one_step_blocks_exact_duplicate_tool_call

function tests/nipux_cli/test_worker.py:8851

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_duplicate_artifact_read_guidance_pushes_follow_up_work

function tests/nipux_cli/test_worker.py:8872

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, artifacts.write_text, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step, str

test_fresh_evidence_guard_takes_priority_over_duplicate_read

function tests/nipux_cli/test_worker.py:8895

No docstring.

Calls: AgentDB, AppConfig, ArtifactStore, LLMResponse, LargeShellEvidenceRegistry, RuntimeConfig, ScriptedLLM, ToolCall, artifacts.write_text, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_repeated_browser_snapshot

function tests/nipux_cli/test_worker.py:8930

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SnapshotRegistry, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_browser_tools_after_runtime_missing

function tests/nipux_cli/test_worker.py:8956

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SnapshotRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_non_browser_work_after_runtime_missing

function tests/nipux_cli/test_worker.py:8998

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_skips_batched_browser_call_when_runtime_missing_and_fallback_present

function tests/nipux_cli/test_worker.py:9035

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, all, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_steps, db.start_run, get

test_run_one_step_removes_browser_tools_from_schema_after_runtime_missing

function tests/nipux_cli/test_worker.py:9082

No docstring.

Calls: AgentDB, AppConfig, BrowserAndWebRegistry, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_removes_browser_tools_after_older_runtime_missing

function tests/nipux_cli/test_worker.py:9118

No docstring.

Calls: AgentDB, AppConfig, BrowserAndWebRegistry, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_allows_repeated_defer_for_monitor_intervals

function tests/nipux_cli/test_worker.py:9162

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_self_defer_for_next_worker_turn

function tests/nipux_cli/test_worker.py:9179

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_defer_without_wait_reason

function tests/nipux_cli/test_worker.py:9211

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_search_after_unpersisted_extract

function tests/nipux_cli/test_worker.py:9242

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_artifacts, db.start_run, run_one_step, startswith

test_prompt_tells_model_to_save_unpersisted_evidence_before_more_research

function tests/nipux_cli/test_worker.py:9287

No docstring.

Calls: AgentDB, build_messages, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.list_steps, db.start_run

test_run_one_step_blocks_research_after_unpersisted_browser_snapshot

function tests/nipux_cli/test_worker.py:9309

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_prompt_tells_model_to_open_new_branch_when_tasks_are_exhausted

function tests/nipux_cli/test_worker.py:9344

No docstring.

Calls: build_messages

test_prompt_pushes_deliverable_checkpoint_after_long_research

function tests/nipux_cli/test_worker.py:9364

No docstring.

Calls: build_messages, range

test_low_priority_report_task_does_not_block_execution_task_prompt

function tests/nipux_cli/test_worker.py:9398

No docstring.

Calls: build_messages, range

test_run_one_step_blocks_more_research_when_deliverable_needs_checkpoint

function tests/nipux_cli/test_worker.py:9437

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

test_prompt_includes_roadmap_and_validation_constraints

function tests/nipux_cli/test_worker.py:9495

No docstring.

Calls: build_messages

test_prompt_suggests_roadmap_for_broad_jobs_without_one

function tests/nipux_cli/test_worker.py:9527

No docstring.

Calls: build_messages

test_run_one_step_blocks_branch_work_when_milestone_needs_validation

function tests/nipux_cli/test_worker.py:9542

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_allows_milestone_validation_when_gate_is_active

function tests/nipux_cli/test_worker.py:9580

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_allows_matching_pending_milestone_evidence_action

function tests/nipux_cli/test_worker.py:9622

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_allows_matching_pending_milestone_validation_evidence_action

function tests/nipux_cli/test_worker.py:9662

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_non_matching_pending_milestone_action

function tests/nipux_cli/test_worker.py:9701

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_blocks_wrong_milestone_validation_when_gate_is_active

function tests/nipux_cli/test_worker.py:9741

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_normalizes_matching_validation_to_active_milestone

function tests/nipux_cli/test_worker.py:9781

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_task_churn_when_roadmap_stalls

function tests/nipux_cli/test_worker.py:9831

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_step, db.start_run, range, run_one_step

test_run_one_step_allows_roadmap_update_when_roadmap_stalls

function tests/nipux_cli/test_worker.py:9874

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_step, db.get_job, db.start_run, range, run_one_step

test_run_one_step_blocks_branch_work_when_tasks_are_exhausted

function tests/nipux_cli/test_worker.py:9927

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_allows_record_tasks_when_tasks_are_exhausted

function tests/nipux_cli/test_worker.py:9954

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_new_tasks_when_queue_is_saturated

function tests/nipux_cli/test_worker.py:9983

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, range, run_one_step

test_run_one_step_blocks_batch_that_would_saturate_task_queue

function tests/nipux_cli/test_worker.py:10022

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, len, range, run_one_step

test_run_one_step_executes_accounting_before_saturated_record_tasks

function tests/nipux_cli/test_worker.py:10067

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.close, db.create_job, db.get_job, db.list_steps, get, lesson.get, range, run_one_step, step.get

test_run_one_step_blocks_batch_that_would_saturate_open_tasks

function tests/nipux_cli/test_worker.py:10111

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, len, range, run_one_step

test_run_one_step_ignores_guard_recovery_tasks_for_queue_saturation

function tests/nipux_cli/test_worker.py:10156

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.close, db.create_job, db.get_job, range, run_one_step

test_run_one_step_ignores_guard_recovery_tasks_for_total_sprawl

function tests/nipux_cli/test_worker.py:10196

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, any, db.close, db.create_job, db.get_job, range, run_one_step

test_run_one_step_blocks_read_only_shell_churn

function tests/nipux_cli/test_worker.py:10236

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_action_after_read_only_shell_churn

function tests/nipux_cli/test_worker.py:10269

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_read_only_shell_after_durable_decision

function tests/nipux_cli/test_worker.py:10302

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_allows_explicit_download_after_read_only_shell_churn

function tests/nipux_cli/test_worker.py:10344

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_new_tasks_when_queue_sprawls

function tests/nipux_cli/test_worker.py:10377

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, range, run_one_step

test_recent_task_saturation_keeps_record_tasks_for_existing_updates

function tests/nipux_cli/test_worker.py:10412

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, range, run_one_step

test_repeated_task_saturation_temporarily_suppresses_record_tasks

function tests/nipux_cli/test_worker.py:10456

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, range, run_one_step

test_chronic_backlog_suppresses_new_task_planning_tool

function tests/nipux_cli/test_worker.py:10494

No docstring.

Calls: AgentDB, AppConfig, CapturingLLM, LLMResponse, RuntimeConfig, ToolCall, db.close, db.create_job, range, run_one_step

test_run_one_step_allows_existing_task_update_when_queue_is_saturated

function tests/nipux_cli/test_worker.py:10524

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, range, run_one_step

test_run_one_step_allows_semantic_task_update_when_queue_is_saturated

function tests/nipux_cli/test_worker.py:10559

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, range, run_one_step

test_run_one_step_auto_records_anti_bot_browser_source

function tests/nipux_cli/test_worker.py:10613

No docstring.

Calls: AgentDB, AntiBotBrowserRegistry, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_misleading_artifact_after_anti_bot_snapshot

function tests/nipux_cli/test_worker.py:10641

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_artifacts, db.start_run, run_one_step

test_run_one_step_allows_blocked_source_artifact_when_acknowledged

function tests/nipux_cli/test_worker.py:10688

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.list_artifacts, db.start_run, run_one_step

test_run_one_step_blocks_browser_loop_after_anti_bot_snapshot

function tests/nipux_cli/test_worker.py:10730

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_known_bad_browser_source_from_ledger

function tests/nipux_cli/test_worker.py:10766

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_source_record, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_known_bad_extract_source_from_ledger

function tests/nipux_cli/test_worker.py:10799

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_allows_child_url_when_bad_web_source_is_domain_root

function tests/nipux_cli/test_worker.py:10832

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_records_failed_shell_url_source

function tests/nipux_cli/test_worker.py:10865

No docstring.

Calls: AgentDB, AppConfig, FailedUrlShellRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_records_pathful_failed_shell_urls_not_root_health_checks

function tests/nipux_cli/test_worker.py:10894

No docstring.

Calls: AgentDB, AppConfig, FailedUrlShellRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.close, db.create_job, db.get_job, run_one_step

test_run_one_step_blocks_known_bad_shell_source_family

function tests/nipux_cli/test_worker.py:10921

No docstring.

Calls: AgentDB, AppConfig, FailedUrlShellRegistry, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.close, db.create_job, run_one_step

test_run_one_step_derives_bad_shell_source_family_from_exact_failure

function tests/nipux_cli/test_worker.py:10972

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.append_source_record, db.close, db.create_job, endswith, run_one_step

test_run_one_step_does_not_block_entire_host_after_auth_source_families

function tests/nipux_cli/test_worker.py:11007

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_blocks_known_bad_shell_source_path

function tests/nipux_cli/test_worker.py:11059

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_allows_mixed_shell_command_with_bad_root_health_check

function tests/nipux_cli/test_worker.py:11106

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, SuccessRegistry, ToolCall, db.append_source_record, db.close, db.create_job, run_one_step

test_run_one_step_saves_unpersisted_evidence_before_known_bad_source_block

function tests/nipux_cli/test_worker.py:11140

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_source_record, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_search_streak

function tests/nipux_cli/test_worker.py:11184

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_blocks_similar_search_query

function tests/nipux_cli/test_worker.py:11211

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, run_one_step

test_run_one_step_reflects_every_fixed_interval

function tests/nipux_cli/test_worker.py:11242

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, build_messages, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.list_steps, db.start_run, range

test_reflection_does_not_repeat_existing_strategy_lesson

function tests/nipux_cli/test_worker.py:11272

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.append_lesson, db.close, db.create_job, db.finish_run, db.finish_step, db.get_job, db.start_run, get, len

test_reflection_strategy_uses_current_operator_state

function tests/nipux_cli/test_worker.py:11303

No docstring.

Calls: AgentDB, AppConfig, LLMResponse, RuntimeConfig, ScriptedLLM, ToolCall, db.add_step, db.close, db.create_job, db.finish_run, db.finish_step, db.start_run, range, run_one_step

Line-by-line

Source Browser

Collapsed raw tracked source so the backend can be inspected directly on this page.
AGENTS.md 32 lines
   1# Development Notes
   2
   3This repo is a focused long-running worker, not a broad assistant distribution.
   4
   5## Active Surface
   6
   7- Runtime package: `nipux_cli/`
   8- Tests: `tests/nipux_cli/`
   9- Entry point: `nipux`
  10- State home: `~/.nipux` or `NIPUX_HOME`
  11- Planning notes: `plans/nipux-runtime-notes.md`
  12
  13## Constraints
  14
  15- Keep the default tool surface small and explicit.
  16- Do not reintroduce broad upstream surfaces such as gateways, skills, plugins, web UI, ACP, RL environments, voice, image generation, or arbitrary terminal execution. The chat-first terminal UI is part of Nipux's active product surface; keep it generic, minimal, and backed by persisted worker state.
  17- Preserve restartability: every worker step should persist state before and after tool execution.
  18- Store exact evidence as artifacts. Summaries should point back to artifacts instead of replacing them.
  19- Keep `memory_index` entries compact and artifact-referenced; do not use raw transcript replay as the long-term state strategy.
  20- Prefer OpenAI-compatible model serving, configured through `~/.nipux/config.yaml`.
  21- Keep runtime behavior domain-neutral. Do not add task-specific or environment-specific guards, keyword lists, examples, prompts, tools, or tests and describe them as generic framework improvements.
  22
  23## Validation
  24
  25Use the focused suite:
  26
  27```bash
  28PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
  29uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
  30```
  31
  32Use `nipux daemon --once --fake` for a no-model smoke test. Use `nipux logs JOB_ID --verbose` or `nipux watch JOB_ID --verbose` when inspecting what a background job is actually doing.
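The restartability constraint in the notes above (persist state before and after tool execution) can be illustrated with a minimal sketch. It is not the Nipux implementation: the `steps` table, its columns, and the helper names `open_db`, `run_step_durably`, and `interrupted_steps` are all hypothetical, chosen only to show the write-before, write-after pattern on top of the standard `sqlite3` module.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open a database with a hypothetical steps table (not the real schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "id INTEGER PRIMARY KEY, job_id TEXT, tool TEXT, "
        "input_json TEXT, status TEXT, output_json TEXT)"
    )
    conn.commit()
    return conn


def run_step_durably(conn, job_id, tool_name, args, tool_fn):
    """Record a 'running' row before the tool call and the outcome after it."""
    cur = conn.execute(
        "INSERT INTO steps (job_id, tool, input_json, status) "
        "VALUES (?, ?, ?, 'running')",
        (job_id, tool_name, json.dumps(args)),
    )
    conn.commit()  # the step is durable before the tool runs
    step_id = cur.lastrowid
    try:
        output = tool_fn(**args)
        status = "completed"
    except Exception as exc:  # the failure is recorded instead of lost
        output = {"error": str(exc)}
        status = "failed"
    conn.execute(
        "UPDATE steps SET status = ?, output_json = ? WHERE id = ?",
        (status, json.dumps(output), step_id),
    )
    conn.commit()  # the outcome is durable after the tool runs
    return step_id, status


def interrupted_steps(conn):
    """Steps still marked 'running' were cut off by a crash or restart."""
    rows = conn.execute(
        "SELECT id FROM steps WHERE status = 'running' ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

Because the row is committed before the tool is invoked, a process that dies mid-call leaves a `running` row behind, which a restarted worker could find with `interrupted_steps` and decide how to recover.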
README.md 378 lines
   1# Nipux CLI
   2
   3```text
   4 _   _ ___ ____  _   ___  __
   5| \ | |_ _|  _ \| | | \ \/ /
   6|  \| || || |_) | | | |>  <
   7| |\  || ||  __/| |_| /_/\_\
   8|_| \_|___|_|    \__,_|
   9```
  10
  11Nipux CLI is a small, restartable worker for long-running browser, web research,
  12and command-line jobs. It supports any OpenAI-compatible local or remote model
  13endpoint. It is maintained for Nipux and built around one practical idea: keep a
  14worker moving in bounded steps, save exact evidence, learn from each branch, and
  15recover cleanly when a process or model call fails.
  16
  17- Website: [Nipux.com](https://nipux.com)
  18- Source: [github.com/nipuxx/agent-cli](https://github.com/nipuxx/agent-cli)
  19- License: [MIT](LICENSE)
  20
  21## What It Does
  22
  23Nipux runs jobs that are too long or repetitive for a single chat turn. A job can
  24search the web, operate a persistent browser profile, write artifacts, inspect
  25local files with bounded shell commands, update source and finding ledgers, and
  26continue through a daemon loop until the operator pauses or cancels it.
  27
  28The default runtime is intentionally narrow:
  29
  30- one OpenAI-compatible model endpoint chosen during setup
  31- one SQLite state store under `~/.nipux`
  32- one restartable daemon with a single-instance lock
  33- per-job artifact files for exact evidence
  34- per-job browser profiles through `agent-browser`
  35- compact memory summaries that point back to artifacts
  36- visible event history for chat, tools, artifacts, progress, errors, and digests
  37- durable ledgers for lessons, sources, findings, tasks, roadmap, and experiments
  38
  39Nipux does not include a messaging gateway, plugin marketplace, skills manager,
  40RL environment, voice stack, image stack, or broad web application. The public
  41surface is the `nipux` CLI and the focused `nipux_cli/` Python package.
  42
  43## Install
  44
  45Requirements:
  46
  47- Python 3.11+
  48- an OpenAI-compatible chat completions endpoint, local or remote
  49- optional browser automation: `npm install -g agent-browser && agent-browser install`
  50
  51Install and open the full-screen setup wizard with one command:
  52
  53```bash
  54curl -fsSL https://raw.githubusercontent.com/nipuxx/agent-cli/main/scripts/install.sh | bash
  55```
  56
  57The first-run wizard asks for the provider/model, endpoint, API key location,
  58and tool access. After the model is verified, Nipux opens the workspace chat
  59where you can describe worker jobs in plain language. Nipux stores
  60secrets outside the git repo and writes runtime state under `~/.nipux` unless
  61`NIPUX_HOME` is set.
  62
  63Install from a local checkout while developing:
  64
  65```bash
  66git clone https://github.com/nipuxx/agent-cli.git
  67cd agent-cli
  68uv tool install --editable .
  69nipux
  70```
  71
  72Install directly from git once the repository is public:
  73
  74```bash
  75uv tool install git+https://github.com/nipuxx/agent-cli.git
  76```
  77
  78## First Run
  79
  80Run `nipux`. If this is a fresh profile, the full-screen setup wizard opens
  81immediately and locks chat/job creation until the configured model passes a real
  82chat request. The wizard writes `config.yaml` and a local `.env` template under
  83`~/.nipux` unless `NIPUX_HOME` is set. Real API keys stay in the environment or
  84`~/.nipux/.env`, not in the git repo.
  85
  86```bash
  87nipux
  88```
  89
  90After setup, `nipux` opens the workspace chat. Type a plain-English goal to spin
  91up a worker, or use `/new OBJECTIVE`. Use `/settings` to edit model, endpoint,
  92tool access, runtime, and cost fields from inside the UI.
  93
  94Manual configuration is still available for scripts or headless environments:
  95
  96```bash
  97nipux init --model local-model --base-url http://localhost:8000/v1 --api-key-env OPENAI_API_KEY
  98nipux doctor --check-model
  99```
 100
 101`nipux init` creates `~/.nipux/config.yaml` and `~/.nipux/.env` with private
 102file permissions. Later `/api-key` edits keep the secret in `~/.nipux/.env`
 103instead of writing it to config.
 104
 105Update an installed tool or source checkout from anywhere:
 106
 107```bash
 108nipux update
 109```
 110
 111When installed as a `uv tool`, `nipux update` force-refreshes the command from
 112the source repository and verifies the installed command afterward. When run
 113inside a git checkout, it fast-forwards the checkout. If a daemon is running,
 114update restarts it automatically unless `--no-restart` is used. Set
 115`NIPUX_UPDATE_SPEC` only when you need to update from a different package source.
 116
 117Inspect progress from the terminal:
 118
 119```bash
 120nipux status
 121nipux activity --follow
 122```
 123
 124On macOS, install launchd autostart:
 125
 126```bash
 127nipux autostart install --poll-seconds 0
 128nipux autostart status
 129```
 130
 131On Linux, install a user service:
 132
 133```bash
 134nipux service install
 135nipux service status
 136```
 137
 138Fully remove local runtime state when you want a fresh user install:
 139
 140```bash
 141nipux uninstall --yes
 142```
 143
 144This stops the daemon, removes launchd/systemd service files, deletes
 145`~/.nipux`, removes legacy `~/.kneepucks` state if it exists, and removes the
 146installed `nipux` command with `uv tool uninstall nipux`. Add `--keep-tool` only
 147when you intentionally want to keep the command installed.
 148
 149## Secrets
 150
 151Nipux never needs an API key in `config.yaml`. The config stores only the name
 152of the environment variable to read:
 153
 154```yaml
 155model:
 156  name: provider/model
 157  base_url: https://provider.example/v1
 158  api_key_env: PROVIDER_API_KEY
 159  # Optional fallback pricing when the provider does not return cost metadata.
 160  input_cost_per_million: null
 161  output_cost_per_million: null
 162```
 163
 164Put secrets in your shell, your process manager, or `~/.nipux/.env`:
 165
 166```bash
 167# ~/.nipux/.env
 168PROVIDER_API_KEY = <redacted>
 169```
 170
 171The repository includes `.env.example` and `config.example.yaml` as templates.
 172Do not commit real `.env`, state databases, logs, artifacts, or browser
 173profiles. The default `.gitignore` excludes those local runtime files.
 174
 175## Tool Access
 176
 177The first-run wizard and config slash commands control which generic tool groups
 178the worker can use:
 179
 180```yaml
 181tools:
 182  browser: true
 183  web: true
 184  shell: true
 185  files: true
 186```
 187
 188Use `/browser`, `/web`, `/cli-access`, and `/file-access` in the terminal UI to
 189change those switches later. Disabled tools are removed from the worker tool
 190schema and blocked if an old daemon tries to call them.
 191
 192## Local Model Examples
 193
 194Nipux talks to OpenAI-compatible `/v1/chat/completions` and `/v1/models`
 195servers. Use any serving stack that supports the model and tool-calling behavior
 196you want.
 197
 198SGLang example:
 199
 200```bash
 201python -m sglang.launch_server \
 202  --model-path "$MODEL_NAME" \
 203  --port 8000 \
 204  --context-length 262144 \
 205  --reasoning-parser auto \
 206  --tool-call-parser auto
 207```
 208
 209vLLM example:
 210
 211```bash
 212vllm serve "$MODEL_NAME" \
 213  --port 8000 \
 214  --max-model-len 262144 \
 215  --enable-auto-tool-choice \
 216  --tool-call-parser auto
 217```
 218
 219## Operator Workflow
 220
 221The no-argument CLI opens the focused job directly. Plain text becomes operator
 222steering for the next worker step. The terminal UI keeps conversation/output on
 223the left and status, jobs, saved outputs, updates, and worker activity on the
 224right. Configuration is handled through slash commands such as `/model`,
 225`/api-key`, `/base-url`, and `/context`, not a separate settings page.
 226
 227```text
 228nipux > what are you working on?
 229nipux > prioritize measured progress over notes
 230```
 231
 232For direct command use:
 233
 234```bash
 235uv run nipux status "nightly research" --full
 236uv run nipux history "nightly research"
 237uv run nipux events "nightly research" --follow
 238uv run nipux activity "nightly research" --follow
 239uv run nipux outcomes "nightly research"
 240uv run nipux outcomes --all
 241uv run nipux findings "nightly research"
 242uv run nipux tasks "nightly research"
 243uv run nipux roadmap "nightly research"
 244uv run nipux experiments "nightly research"
 245uv run nipux sources "nightly research"
 246uv run nipux memory "nightly research"
 247uv run nipux metrics "nightly research"
 248uv run nipux usage "nightly research"
 249uv run nipux artifacts "nightly research" --paths
 250```
 251
 252Use `nipux health` for daemon truth without opening the dashboard. It reports
 253the lock state, heartbeat, recent failures, log paths, autostart state, focused
 254job, and latest daemon events.
 255
 256### Seeing What It Actually Did
 257
 258Use these views when a job has been running unattended:
 259
 260- `nipux outcomes JOB` or the **Outcomes** pane: durable work grouped by time,
 261  including saved outputs, findings, measurements, decisions, lessons, and file
 262  changes.
 263- `nipux outcomes --all`: latest durable work and saved outputs for every job,
 264  useful when several agents have been running in the background.
 265- `nipux activity JOB --follow` or the **Work** pane: the raw live tool stream
 266  for debugging what the worker is doing right now.
 267- `nipux usage JOB`: model calls, context pressure, output tokens, and cost when
 268  the provider returns cost metadata. If the provider does not return cost,
 269  configure `/input-cost` and `/output-cost` to estimate it from token counts.
 270- `nipux digest JOB` and `nipux daily-digest`: durable summary reports that
 271  include progress counts, active operator context, experiments, artifacts, and
 272  token/cost usage.
 273
 274## Tool Surface
 275
 276The worker exposes a deliberately small tool registry:
 277
 278- `browser_navigate`
 279- `browser_snapshot`
 280- `browser_click`
 281- `browser_type`
 282- `browser_scroll`
 283- `browser_back`
 284- `browser_press`
 285- `browser_console`
 286- `web_search`
 287- `web_extract`
 288- `shell_exec`
 289- `write_file`
 290- `write_artifact`
 291- `read_artifact`
 292- `search_artifacts`
 293- `update_job_state`
 294- `defer_job`
 295- `report_update`
 296- `record_lesson`
 297- `acknowledge_operator_context`
 298- `record_source`
 299- `record_findings`
 300- `record_tasks`
 301- `record_roadmap`
 302- `record_milestone_validation`
 303- `record_experiment`
 304- `send_digest_email`
 305
 306`shell_exec` is bounded with timeouts and output capture. Browser sessions use
 307per-job profiles under `~/.nipux/browser-profiles/`. Anti-bot, CAPTCHA, login,
 308and paywall pages are recorded as visible source-quality warnings; Nipux does
 309not bypass protections.
 310
 311Workers can use `defer_job` for scheduled follow-up, monitor intervals, or long
 312external processes that are actually waiting on time to pass. Deferred jobs stay
 313runnable but show as waiting until their next check time, so the daemon can keep
 314other work moving without burning model calls on repeated polling.
 315
 316## Command Reference
 317
 318```bash
 319nipux init [--force] [--openrouter] [--model MODEL] [--base-url URL] [--api-key-env ENV]
 320nipux update [--path PATH] [--allow-dirty] [--no-restart]
 321nipux uninstall [--yes] [--dry-run] [--keep-legacy] [--keep-tool]
 322nipux doctor [--check-model]
 323nipux shell [--status]
 324nipux create "objective" [--title TITLE] [--kind KIND] [--cadence CADENCE]
 325nipux jobs
 326nipux ls
 327nipux focus [JOB_TITLE]
 328nipux rename JOB_TITLE --title NEW_TITLE
 329nipux delete JOB_TITLE [--keep-files]
 330nipux chat [JOB_TITLE] [--no-history]
 331nipux steer [--job JOB_TITLE] MESSAGE
 332nipux pause [JOB_TITLE] [note...]
 333nipux resume [JOB_TITLE]
 334nipux cancel [JOB_TITLE] [note...]
 335nipux start [--poll-seconds N]
 336nipux stop
 337nipux autostart install|status|uninstall [--poll-seconds N]
 338nipux service install|status|uninstall [--poll-seconds N]
 339nipux browser-dashboard [--port N] [--foreground] [--stop]
 340nipux health
 341nipux status [JOB_TITLE] [--full] [--json]
 342nipux history [JOB_TITLE] [--full] [--json]
 343nipux events [JOB_TITLE] [--follow] [--json]
 344nipux activity [JOB_TITLE] [--follow] [--verbose]
 345nipux updates [JOB_TITLE]
 346nipux outcomes [JOB_TITLE] [--all]
 347nipux dashboard [JOB_TITLE]
 348nipux findings [JOB_TITLE] [--limit N] [--json]
 349nipux tasks [JOB_TITLE] [--limit N] [--status STATUS] [--json]
 350nipux roadmap [JOB_TITLE] [--limit N] [--json]
 351nipux experiments [JOB_TITLE] [--limit N] [--status STATUS] [--json]
 352nipux sources [JOB_TITLE] [--limit N] [--json]
 353nipux memory [JOB_TITLE]
 354nipux metrics [JOB_TITLE]
 355nipux usage [JOB_TITLE] [--json]
 356nipux artifacts [JOB_TITLE] [--paths]
 357nipux artifact QUERY_OR_TITLE [--job JOB_TITLE]
 358nipux lessons [JOB_TITLE]
 359nipux learn [--job JOB_TITLE] [--category CATEGORY] LESSON
 360nipux logs [JOB_TITLE] [--limit N] [--verbose]
 361nipux outputs [JOB_TITLE] [--limit N] [--verbose]
 362nipux watch JOB_TITLE [--verbose]
 363nipux run-one JOB_TITLE [--fake]
 364nipux work [JOB_TITLE] [--steps N] [--verbose] [--dashboard]
 365nipux run [JOB_TITLE] [--poll-seconds N] [--no-follow]
 366nipux daemon [--once] [--fake] [--verbose] [--poll-seconds N]
 367nipux digest JOB_TITLE
 368nipux daily-digest [--day YYYY-MM-DD]
 369```
 370
 371## Development
 372
 373```bash
 374PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
 375uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
 376```
 377
 378The active implementation notes live in `plans/nipux-runtime-notes.md`.
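The Secrets section of the README above says that `config.yaml` holds only the name of an environment variable, with the value supplied by the shell, a process manager, or `~/.nipux/.env`. A minimal sketch of that lookup follows. The helper names `parse_env_file` and `resolve_api_key` are hypothetical, and the precedence shown here (process environment first, then the `.env` text) is an assumption rather than confirmed Nipux behavior.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(api_key_env, environ, env_file_text=""):
    """Look up the secret by variable name; the config never holds the value.

    Assumed precedence: the process environment wins over the .env text.
    """
    value = environ.get(api_key_env) or parse_env_file(env_file_text).get(api_key_env)
    return value or None
```

A caller would pass `os.environ` as `environ` and the contents of the `.env` file as `env_file_text`; keeping both as plain arguments makes the lookup easy to test without touching real secrets.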
RELEASE_CHECKLIST.md 38 lines
   1# Release Checklist
   2
   3Use this before sharing the repository with outside users.
   4
   5## Secrets
   6
   7- No real API keys in git.
   8- `config.yaml` examples use `model.api_key_env`, not literal keys.
   9- `.env`, `.env.*`, state databases, logs, artifacts, and browser profiles are ignored.
  10- `nipux doctor` reports missing remote API-key environment variables without printing key values.
  11
  12## Install
  13
  14- `uv tool install --editable .` works from a checkout.
  15- `uv run nipux --help` works without installing.
  16- `NIPUX_HOME=$(mktemp -d) uv run nipux` opens the first-run terminal UI, not argparse help or an ASCII-only prompt.
  17- `nipux init` writes the default Qwen/OpenRouter `~/.nipux/config.yaml` and a blank `~/.nipux/.env` template.
  18- `nipux doctor` passes for local runtime checks after initialization.
  19- `nipux daemon --once --fake` runs without a model key.
  20
  21## Runtime
  22
  23- `nipux start`, `nipux stop`, and `nipux restart` recover stale daemon state.
  24- `nipux status`, `nipux activity`, `nipux history`, and `nipux artifacts` expose enough state to debug jobs.
  25- Worker prompts stay bounded and do not replay raw transcript history.
  26- Operator chat that is only conversational stays in history but does not remain active worker context.
  27- Measurable jobs record experiments instead of treating notes as progress.
  28- Status, outcomes, and work panes show different layers clearly: jobs and latest outputs, durable progress by hour, and raw tool/console events.
  29
  30## Validation
  31
  32```bash
  33python -m compileall nipux_cli tests/nipux_cli
  34uv run --extra dev python -m pytest tests/nipux_cli -q
  35uv run --extra dev ruff check nipux_cli tests/nipux_cli
  36rg -n --hidden -S "(sk-[A-Za-z0-9_-]{20,}|OPENROUTER_API_KEY[=].+|OPENAI_API_KEY[=].+|Bearer\\s+[A-Za-z0-9._-]{20,})" . \
  37  -g '!uv.lock' -g '!**/__pycache__/**' -g '!*.db' -g '!*.log' -g '!*.pyc'
  38```
config.example.yaml 27 lines
   1model:
   2  name: local-model
   3  base_url: http://localhost:8000/v1
   4  api_key_env: OPENAI_API_KEY
   5  context_length: 262144
   6  input_cost_per_million: null
   7  output_cost_per_million: null
   8runtime:
   9  max_step_seconds: 600
  10  max_steps_per_run: 1
  11  artifact_inline_char_limit: 12000
  12  daily_digest_enabled: true
  13  daily_digest_time: "08:00"
  14  max_job_cost_usd: null
  15tools:
  16  browser: true
  17  web: true
  18  shell: true
  19  files: true
  20email:
  21  enabled: false
  22  smtp_host: ""
  23  smtp_port: 587
  24  username: ""
  25  password_env: NIPUX_EMAIL_PASSWORD
  26  from_addr: ""
  27  to_addr: ""
docs/long-running-memory-graph-design.md 62 lines
   1# Long-Running Memory Graph Design
   2
   3Nipux needs long-running workers that keep improving instead of flattening into repeated search, notes, or shallow checkpoints. The backend now treats each job as having a small durable "brain": a job-local memory graph made of connected nodes and links. It is not task-specific and does not require embeddings or a new service to be useful.
   4
   5## Research Takeaways
   6
   7- **Complementary learning systems:** human memory separates fast episodic capture from slower semantic consolidation. The hippocampus rapidly stores separated episodes while cortex gradually extracts structure. Nipux mirrors this with recent events/steps as fast episodic traces and `memory_graph` nodes as consolidated reusable knowledge. Source: [O'Reilly and Norman, 2002](https://collaborate.princeton.edu/en/publications/hippocampal-and-neocortical-contributions-to-memory-advances-in-t) and [McClelland et al., 1995](https://colab.ws/articles/10.1037/0033-295x.102.3.419).
   8- **Sleep/consolidation:** memory consolidation strengthens relevant traces and reorganizes them into associations that support later inference. Nipux should periodically turn raw work into compact graph nodes and edges instead of replaying full history. Source: [Born and Wilhelm, 2012](https://link.springer.com/article/10.1007/s00426-011-0335-6) and [Diekelmann and Born, 2010](https://www.nature.com/articles/nrn2762).
   9- **Reflexion:** agents improve without weight updates by writing verbal reflections into episodic memory after feedback. Nipux already has lessons and reflection; the graph adds structure so reflections can connect to facts, decisions, tasks, and evidence. Source: [Reflexion](https://huggingface.co/papers/2303.11366).
  10- **Generative Agents:** believable long-lived agents combine memory stream, retrieval, reflection, and planning. Nipux should keep the event stream, but retrieve distilled context through durable ledgers and graph nodes. Source: [Generative Agents](https://huggingface.co/papers/2304.03442).
  11- **MemGPT:** OS-style memory tiers let fixed-context models use long histories by paging between prompt context and archival memory. Nipux's prompt now gets only a ranked slice of graph memory, with `search_memory_graph` for deeper recall. Source: [MemGPT](https://huggingface.co/papers/2310.08560).
  12- **Voyager:** long-horizon improvement comes from an automatic curriculum, a growing reusable skill library, and iterative self-verification. Nipux's graph supports this by representing skills, strategies, open questions, decisions, and evidence links as reusable nodes. Source: [Voyager](https://voyager.minedojo.org/).
  13- **Agent memory surveys and graph memory work:** recent surveys and systems emphasize memory operations: write, retrieve, update, consolidate, forget/deprecate, and evaluate. Graph memory helps preserve relationships and temporal change better than a flat note list. Sources: [LLM Agent Memory Survey](https://huggingface.co/papers/2404.13501), [AriGraph](https://huggingface.co/papers/2407.04363), [Zep](https://huggingface.co/papers/2501.13956).
  14
  15## Backend Shape
  16
  17Each job can now maintain metadata under `memory_graph`:
  18
  19- `nodes`: connected notes with `kind`, `status`, `summary`, `salience`, `confidence`, `tags`, `parent_key`, `links`, and `evidence_refs`.
  20- `edges`: typed links between nodes such as `supports`, `replaces`, `raises`, `blocks`, or `depends_on`.
  21- Node `kind` values are generic: `episode`, `fact`, `strategy`, `skill`, `question`, `decision`, `constraint`, `artifact`, `source`, `task`, `experiment`, and `milestone`.
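
A minimal sketch of how the node and edge fields listed above could map onto Python types. The class names `GraphNode` and `GraphEdge`, the `key`, `source`, `target`, and `relation` field names, the default values, and the `neighbors` helper are assumptions made for illustration; they are not taken from the Nipux source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical types: field names follow the design note, defaults are assumed.
@dataclass
class GraphNode:
    key: str
    kind: str  # e.g. "fact", "strategy", "question"
    summary: str
    status: str = "active"
    salience: float = 0.5
    confidence: float = 0.5
    tags: list[str] = field(default_factory=list)
    parent_key: str | None = None
    links: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)


@dataclass
class GraphEdge:
    source: str
    target: str
    relation: str  # e.g. "supports", "replaces", "depends_on"


def neighbors(edges: list[GraphEdge], key: str, relation: str | None = None) -> list[str]:
    """Return target keys reachable from `key`, optionally filtered by edge type."""
    return [
        edge.target
        for edge in edges
        if edge.source == key and (relation is None or edge.relation == relation)
    ]
```

Typed edges let a caller ask narrow questions, such as which open questions a fact raises, without scanning free-text notes.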
  22
  23The worker gets a compact `Memory graph` prompt section that ranks active, salient, recent, and procedural nodes. It can call `search_memory_graph` when it needs deeper recall. It can call `record_memory_graph` whenever new work should become reusable knowledge.
  24
  25Operators can inspect the same graph with `nipux memory --graph`, which writes a self-contained clickable HTML artifact. The view uses a local canvas renderer, needs no external network assets, and lets the operator rotate, zoom, search, and click nodes to inspect summaries, evidence refs, tags, and links.
  26
  27The worker also has a generic consolidation guard: once findings, sources, experiments, lessons, resolved tasks, or roadmap milestones accumulate faster than graph nodes and links, more branch churn is blocked until the worker calls `record_memory_graph` or records why the current branch has no reusable memory value.
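
The guard described above can be sketched as a simple counter comparison. This is only an illustration: the function name `consolidation_due`, the counter keys, and the `slack` threshold are hypothetical, and the real guard may weigh these signals differently.

```python
# Hypothetical counter names for the durable records named in the design note.
DURABLE_KEYS = (
    "findings",
    "sources",
    "experiments",
    "lessons",
    "resolved_tasks",
    "milestones",
)


def consolidation_due(counts: dict[str, int], graph_nodes: int, graph_edges: int, slack: int = 4) -> bool:
    """Return True when durable records have outpaced graph nodes and links.

    `slack` is an assumed tolerance so that a few unconsolidated records do not
    block the worker immediately.
    """
    durable = sum(counts.get(key, 0) for key in DURABLE_KEYS)
    return durable > graph_nodes + graph_edges + slack
```

A worker loop could consult such a check before allowing another branch action and, when it returns true, require `record_memory_graph` or an explicit note that the branch has no reusable memory value.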
  28
  29## Live Model Smoke
  30
  31Use `scripts/live_memory_graph_smoke.py` to verify a real OpenAI-compatible model can follow the graph-consolidation contract. The script creates a temporary Nipux home, disables side-effect tools, seeds generic durable job state, and runs a few worker turns. It succeeds only after the model calls `record_memory_graph` and creates at least one node.
  32
  33Example:
  34
  35```bash
  36OPENROUTER_API_KEY = <redacted>
  37```
  38
  39The key is read from the configured environment variable and is never printed. If no key is present, the script exits before making a network request.
  40
  41Latest smoke result:
  42
  43- Model: `qwen/qwen3.6-27b`
  44- Provider path: OpenAI-compatible chat completions through OpenRouter
  45- Isolation: temporary Nipux home with browser, web, shell, and file tools disabled
  46- Result: first worker step called `record_memory_graph`
  47- Graph written: 7 nodes and 8 edges
  48
  49## Why This Should Improve Long Runs
  50
  51- Raw history stays available in events/artifacts, but the model sees a compact graph slice.
  52- Weak or small models get explicit, typed memory instead of relying on implicit recap.
  53- Repeated branches can be deprecated instead of merely summarized.
  54- Useful strategies and skills can compound across hundreds or thousands of actions.
  55- Open questions remain visible as first-class nodes, making it harder for the worker to drift away from unresolved blockers.
  56
  57## Next Backend Slices
  58
  59- Add periodic deterministic consolidation that proposes graph nodes from recent events when the model fails to do it.
  60- Tune graph-aware stagnation checks from real runs: if a branch has no new node, edge, validation, experiment, or deliverable after a budget, force consolidation or branch rejection.
  61- Add better retrieval scoring using local embeddings when available, while keeping lexical fallback mandatory.
  62- Add live UI/status counters for memory graph growth: new nodes, active questions, deprecated paths, and current strategy.
docs/pi-agent-core-port-plan.md 267 lines
   1# Pi Agent Core Port Plan
   2
   3Research date: 2026-04-30
   4
   5Sources:
   6- https://github.com/badlogic/pi-mono
   7- https://github.com/badlogic/pi-mono/tree/main/packages/agent
   8- https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent
   9- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/session.md
  10- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/sdk.md
  11- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/rpc.md
  12
  13Pi is MIT licensed, so direct adaptation is allowed if we preserve attribution
  14where substantial code is ported. The right move is not to copy the full
  15TypeScript app into Nipux. The right move is to port the small generic runtime
  16ideas from `packages/agent` and keep Nipux's SQLite daemon, tools, and
  17multi-job persistence.
  18
  19## What Pi Does Better
  20
  21Pi's core is a stateful agent loop, not a "one next action" prompt wrapper.
  22The important files are:
  23
  24- `packages/agent/src/agent-loop.ts`
  25- `packages/agent/src/agent.ts`
  26- `packages/agent/src/types.ts`
  27- `packages/coding-agent/src/core/session-manager.ts`
  28- `packages/coding-agent/src/core/agent-session.ts`
  29- `packages/coding-agent/src/core/compaction/*`
  30
  31Key behaviors to port:
  32
  331. Evented loop as the runtime contract.
  34   Pi emits `agent_start`, `turn_start`, `message_start`, `message_update`,
  35   `message_end`, `tool_execution_start`, `tool_execution_update`,
  36   `tool_execution_end`, `turn_end`, and `agent_end`. Nipux has events, but
  37   currently treats a daemon step as the main unit. We should make these events
  38   first-class and derive UI/status from them.
  39
  402. Real transcript state.
  41   Pi keeps `AgentMessage[]` as state and converts it to LLM messages only at
  42   the model boundary. Nipux currently rebuilds each prompt from job metadata,
  43   memory, recent steps, and ledgers. That works, but it loses the clean
  44   distinction between visible transcript, UI-only records, and model context.
  45
  463. Context transform boundary.
  47   Pi uses `transformContext(messages)` before `convertToLlm(messages)`. This is
  48   the exact place Nipux should inject durable operator context, compact memory,
  49   task contracts, ledgers, and active constraints without polluting raw history.
  50
  514. Steering and follow-up queues.
  52   Pi splits queued user input into:
  53   - `steer`: delivered after current tool execution and before the next model turn.
  54   - `followUp`: delivered only when the agent would otherwise stop.
  55   Nipux already has `steer` and `follow_up` metadata, but delivery is bolted
  56   onto single steps. This should move into the agent core.
  57
  585. Hookable tool preflight and postprocessing.
  59   Pi has `beforeToolCall` and `afterToolCall`. Nipux has good generic guards,
  60   but they live inside `worker.py`. They should become hooks:
  61   - duplicate/repetition guard
  62   - artifact obligation guard
  63   - measurement obligation guard
  64   - source quality guard
  65   - experiment accounting after measurable shell output
  66
  676. Tool batch semantics.
  68   Pi can prepare tool calls sequentially, execute safe tools in parallel, and
  69   still persist tool-result messages in assistant order. Nipux currently
  70   executes only the first tool call. That leaves useful model intent unused.
  71
  727. Compaction as session structure, not just memory refresh.
  73   Pi stores compaction entries in the session tree. Full history remains, but
  74   future model context sees a summary plus kept recent messages. Nipux has
  75   `memory_index`, but it should add explicit compaction entries tied to the
  76   transcript path.
  77
  788. Continue semantics.
  79   Pi has `continue()` for retries after errors or compaction. Nipux currently
  80   creates a new run every step. Continue semantics would make recovery cleaner
  81   after model errors, context overflow, daemon restart, and queued messages.
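
As a rough illustration of item 5 above, the sketch below shows one way generic guards could be expressed as `before_tool_call` and `after_tool_call` hooks. The hook signatures, the result dictionary shape, and the helper names are assumptions for this example; they are neither Pi's API nor the current `worker.py` code.

```python
import json


def make_duplicate_guard(max_repeats=2):
    """Build a before-hook that blocks identical tool calls after `max_repeats`."""
    seen = {}

    def before_tool_call(name, args):
        key = (name, json.dumps(args, sort_keys=True))
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            return {"blocked": True, "reason": f"{name} repeated with identical arguments"}
        return None  # None means "allow the call"

    return before_tool_call


def flag_measurement(name, args, result):
    """After-hook: attach an obligation when output looks like a measurement."""
    if "ms" in str(result.get("output", "")):
        return {**result, "obligation": "record_experiment"}
    return None  # None means "leave the result unchanged"


def execute_with_hooks(name, args, tool, before_hooks=(), after_hooks=()):
    """Run before-hooks, then the tool, then after-hooks, in that order."""
    for hook in before_hooks:
        verdict = hook(name, args)
        if verdict is not None:
            return verdict
    result = {"blocked": False, "output": tool(**args)}
    for hook in after_hooks:
        result = hook(name, args, result) or result
    return result
```

Keeping each guard as a small callable means the duplicate, artifact, measurement, and source-quality checks can be tested in isolation and composed per job.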
  82
  83## Nipux Mapping
  84
  85Current Nipux files:
  86
  87- `nipux_cli/worker.py`: prompt building, step execution, guards, reflection.
  88- `nipux_cli/daemon.py`: forever loop, lock, heartbeat, multi-job scheduling.
  89- `nipux_cli/db.py`: SQLite state, events, job metadata, ledgers.
  90- `nipux_cli/operator_context.py`: durable operator message filtering.
  91- `nipux_cli/tools.py`: tool registry and tool execution.
  92- `nipux_cli/compression.py`: compact memory refresh.
  93- `nipux_cli/cli.py`: chat/TUI/status/output rendering.
  94
  95Target files:
  96
  97- Add `nipux_cli/agent_core.py`
  98  - Python port of Pi's small `Agent`, `PendingMessageQueue`, event types,
  99    tool result types, and loop control.
 100  - Keep attribution header because it is directly inspired by Pi's MIT code.
 101  - Support non-streaming model responses first, then streaming later.
 102
 103- Add `nipux_cli/session.py`
 104  - Load/save transcript entries for a job.
 105  - Build current session context from entries plus compaction records.
 106  - Keep SQLite as source of truth instead of JSONL files, but use Pi's entry
 107    shape: `message`, `compaction`, `branch_summary`, `custom`,
 108    `custom_message`, `model_change`, and `label`.
 109
 110- Refactor `nipux_cli/worker.py`
 111  - Move prompt assembly into `transform_context`.
 112  - Move `_blocked_tool_call_result` into `before_tool_call`.
 113  - Move measurement/source/artifact side effects into `after_tool_call`.
 114  - Replace "only first tool call" execution with core loop tool execution.
 115  - Preserve one bounded heartbeat by limiting wall-clock/tool budget per daemon
 116    tick, not by discarding the agent loop structure.
 117
 118- Extend `nipux_cli/db.py`
 119  - Add a `session_entries` table:
 120    - `id`
 121    - `job_id`
 122    - `parent_id`
 123    - `entry_type`
 124    - `created_at`
 125    - `payload_json`
 126  - Add `job_session_state` metadata for current leaf and compaction stats.
 127  - Backfill existing `events`/`steps` into session view lazily.
 128
 129- Keep `nipux_cli/daemon.py`
 130  - Do not replace the daemon. Pi is mostly single-session interactive; Nipux
 131    needs multi-job background scheduling.
 132  - The daemon should call `AgentSession.continue_or_prompt()` for whichever job
 133    is runnable, then keep heartbeating while the agent loop emits events.
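
A possible shape for the `session_entries` table, sketched with the standard `sqlite3` module. The column names come from the list above; the column types, the index, and the helper functions `append_entry` and `path_to_leaf` are assumptions for illustration only.

```python
import json
import sqlite3

# Assumed types: TEXT ids and ISO-8601 timestamps stored as TEXT.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session_entries (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL,
    parent_id TEXT REFERENCES session_entries(id),
    entry_type TEXT NOT NULL,
    created_at TEXT NOT NULL,
    payload_json TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_session_entries_job
    ON session_entries(job_id, created_at);
"""


def append_entry(conn, entry_id, job_id, parent_id, entry_type, payload, created_at):
    """Insert one transcript entry; `parent_id` is None for the root entry."""
    conn.execute(
        "INSERT INTO session_entries VALUES (?, ?, ?, ?, ?, ?)",
        (entry_id, job_id, parent_id, entry_type, created_at, json.dumps(payload)),
    )


def path_to_leaf(conn, leaf_id):
    """Walk parent links from a leaf back to the root, returned in root-first order."""
    path = []
    current = leaf_id
    while current is not None:
        row = conn.execute(
            "SELECT id, parent_id, entry_type FROM session_entries WHERE id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        path.append((row[0], row[2]))
        current = row[1]
    path.reverse()
    return path
```

Because every entry points at its parent, a sibling branch stays queryable in the table even when the current leaf follows a different path.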
 134
 135## Implementation Sequence
 136
 137### Commit 1: Agent Core Skeleton
 138
 139Create `agent_core.py` with:
 140
 141- `AgentMessage`
 142- `AgentToolCall`
 143- `AgentToolResult`
 144- `AgentEvent`
 145- `PendingMessageQueue`
 146- `AgentState`
 147- `Agent`
 148
 149Support:
 150
 151- `prompt(messages)`
 152- `continue_()`
 153- `steer(message)`
 154- `follow_up(message)`
 155- `abort()`
 156- `wait_for_idle()`
 157- event subscription
 158- sequential tool execution only
 159- `before_tool_call`
 160- `after_tool_call`
 161- `transform_context`
 162- `convert_to_llm`
 163
 164Tests:
 165
 166- event order matches Pi's documented event order
 167- steering is delivered after tool execution
 168- follow-up waits until no tool calls remain
 169- prompt/continue reject concurrent runs
 170- tool errors become tool result messages instead of crashing the loop
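
The loop contract in this commit can be sketched in a few dozen lines. This is a simplified reading of the behavior described above, not a port of Pi's code: `run_loop`, the reply dictionary shape, and the message dictionaries are hypothetical, and only a subset of the event names is emitted.

```python
from collections import deque


class PendingMessageQueue:
    """Holds operator input until the loop reaches a delivery boundary."""

    def __init__(self):
        self._steer = deque()
        self._follow_up = deque()

    def steer(self, message):
        self._steer.append(message)

    def follow_up(self, message):
        self._follow_up.append(message)

    def drain_steer(self):
        items = list(self._steer)
        self._steer.clear()
        return items

    def pop_follow_up(self):
        return self._follow_up.popleft() if self._follow_up else None


def run_loop(model, tools, queue, messages, max_turns=10):
    """Assumed model contract: model(messages) -> {"text": str, "tool_calls": [(name, args)]}."""
    events = ["agent_start"]
    for _ in range(max_turns):
        events.append("turn_start")
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply["text"]})
        for name, args in reply["tool_calls"]:
            events.append("tool_execution_start")
            try:
                output = tools[name](**args)
            except Exception as exc:  # tool errors become tool results
                output = f"error: {exc}"
            messages.append({"role": "tool", "name": name, "content": str(output)})
            events.append("tool_execution_end")
        events.append("turn_end")
        # Steering lands after tool execution and before the next model turn.
        steering = queue.drain_steer()
        messages.extend({"role": "user", "content": text} for text in steering)
        if reply["tool_calls"] or steering:
            continue
        # Follow-ups are delivered only when the agent would otherwise stop.
        follow = queue.pop_follow_up()
        if follow is None:
            break
        messages.append({"role": "user", "content": follow})
    events.append("agent_end")
    return events
```

A scripted model function is enough to exercise the ordering tests listed above without any network access.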
 171
 172### Commit 2: Session Entries
 173
 174Add SQLite session entries and a `SessionManager` equivalent.
 175
 176Tests:
 177
 178- append messages with parent IDs
 179- build context from current leaf
 180- compaction summary appears before kept messages
 181- full raw history remains queryable
 182- branch summaries can be represented even if UI does not expose branching yet
 183
 184### Commit 3: Worker Integration
 185
 186Make `run_one_step` use the Pi-style agent loop.
 187
 188Important constraint:
 189
 190Nipux should still be generic and background-safe. Do not encode any objective,
 191host, model, source, or task domain. The old guards stay generic and move into
 192hooks.
 193
 194Tests:
 195
 196- model can call multiple tools and all are persisted in order
 197- duplicate/measurement/artifact guards block through `before_tool_call`
 198- measurable output creates obligations through `after_tool_call`
 199- operator steer persists until acknowledged and is injected through the queue
 200- follow-up waits behind active branch work
 201
 202### Commit 4: Compaction
 203
 204Replace fixed memory refresh as the main context strategy with session
 205compaction:
 206
 207- estimate context from last usage when available
 208- truncate tool results for summarization
 209- store compaction entries
 210- rebuild model context from compaction plus recent path
 211- continue after overflow or threshold compaction
 212
 213Tests:
 214
 215- long transcript compacts without losing recent messages
 216- compaction failure does not crash daemon
 217- queued messages survive compaction
 218- context overflow retry uses `continue_()`
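
A small sketch of rebuilding model context from a compaction entry plus the recent path. The entry dictionaries, the `first_kept` index, and the function name `build_model_context` are assumptions for illustration; a real implementation would more likely reference entry IDs than list positions.

```python
def build_model_context(entries):
    """Return the texts the model should see for one transcript path.

    `entries` is the full raw path in order. Each entry is a dict with a
    "type" of "message" or "compaction". A compaction entry carries a
    "summary" and a "first_kept" index marking the oldest message that
    still appears verbatim after the summary.
    """
    last_compaction = None
    for index, entry in enumerate(entries):
        if entry["type"] == "compaction":
            last_compaction = index
    if last_compaction is None:
        return [entry["text"] for entry in entries if entry["type"] == "message"]

    compaction = entries[last_compaction]
    kept = [
        entry["text"]
        for entry in entries[compaction["first_kept"]:]
        if entry["type"] == "message"
    ]
    # The summary comes first, then kept messages; raw entries are untouched.
    return [f"[summary] {compaction['summary']}"] + kept
```

The raw list is never mutated, so the full history stays queryable while the model sees only the summary and the kept tail.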
 219
 220### Commit 5: UI/Event Stream Cleanup
 221
 222Make CLI/TUI read the event stream and session entries, not ad hoc step text.
 223
 224Tests:
 225
 226- chat shows user/assistant transcript
 227- right pane shows job/daemon/session stats
 228- activity shows tool start/update/end events
 229- history can show full transcript, compacted transcript, and raw events
 230
 231## Why This Should Fix The Current Failure Mode
 232
 233Nipux currently has many good guards, but the model is still treated like a
 234stateless planner that gets one tool call per daemon step. That encourages
 235research churn because the loop boundary is outside the model's natural
 236tool-result feedback cycle.
 237
 238Pi's design keeps the model inside a coherent turn loop:
 239
 2401. User/operator/context enters as messages.
 2412. Assistant proposes tool calls.
 2423. Tools execute and return tool-result messages.
 2434. The assistant immediately sees those results.
 2445. Steering and follow-up are delivered at well-defined boundaries.
 2456. Compaction preserves the useful path instead of stuffing every summary into
 246   every future prompt.
 247
 248Porting that structure should make Nipux feel less like a step counter and more
 249like an actual long-running agent runtime.
 250
 251## What Not To Copy
 252
 253Do not copy Pi's task-specific extension examples into Nipux core.
 254
 255Do not make the harness depend on Node, Bun, or the Pi TUI.
 256
 257Do not encode any SSH, model, inference, lead-finding, browser-source, or local
 258machine assumptions. Everything here must stay generic:
 259
 260- transcript
 261- events
 262- queues
 263- tool hooks
 264- compaction
 265- session state
 266- UI rendering over events
 267
nipux_cli/__init__.py 11 lines
   1"""Minimal daemon-first Nipux runtime.
   2
   3This package owns the daemon, state store, model adapter, artifact store, and
   4fixed tool surface.
   5"""
   6
   7__all__ = [
   8    "__version__",
   9]
  10
  11__version__ = "0.1.0"
nipux_cli/__main__.py 9 lines
   1"""Run the Nipux CLI with ``python -m nipux_cli``."""
   2
   3from __future__ import annotations
   4
   5from nipux_cli.cli import main
   6
   7
   8if __name__ == "__main__":
   9    main()
nipux_cli/artifacts.py 138 lines
   1"""Artifact file storage for long-running jobs."""
   2
   3from __future__ import annotations
   4
   5import hashlib
   6import re
   7from dataclasses import dataclass
   8from pathlib import Path
   9from typing import Any
  10
  11from nipux_cli.db import AgentDB, new_id, utc_now
  12
  13_SAFE_NAME_RE = re.compile(r"[^A-Za-z0-9._-]+")
  14
  15
  16def safe_filename(value: str, *, default: str = "artifact") -> str:
  17    cleaned = _SAFE_NAME_RE.sub("-", value.strip()).strip(".-")
  18    return (cleaned or default)[:96]
  19
  20
  21def sha256_text(text: str) -> str:
  22    return hashlib.sha256(text.encode("utf-8")).hexdigest()
  23
  24
  25@dataclass(frozen=True)
  26class StoredArtifact:
  27    id: str
  28    path: Path
  29    sha256: str
  30    title: str | None = None
  31    summary: str | None = None
  32
  33
  34class ArtifactStore:
  35    def __init__(self, home: str | Path, db: AgentDB | None = None):
  36        self.home = Path(home)
  37        self.db = db
  38        self.home.mkdir(parents=True, exist_ok=True)
  39
  40    def job_dir(self, job_id: str) -> Path:
  41        path = self.home / "jobs" / job_id / "artifacts"
  42        path.mkdir(parents=True, exist_ok=True)
  43        return path
  44
  45    def _assert_inside_home(self, path: Path) -> Path:
  46        resolved = path.resolve()
  47        root = self.home.resolve()
  48        try:
  49            resolved.relative_to(root)
  50        except ValueError as exc:
  51            raise ValueError(f"Refusing to read outside agent home: {path}") from exc
  52        return resolved
  53
  54    def write_text(
  55        self,
  56        *,
  57        job_id: str,
  58        content: str,
  59        title: str | None = None,
  60        summary: str | None = None,
  61        artifact_type: str = "text",
  62        run_id: str | None = None,
  63        step_id: str | None = None,
  64        metadata: dict[str, Any] | None = None,
  65    ) -> StoredArtifact:
  66        suffix = "html" if artifact_type == "html" else "md" if artifact_type in {"digest", "markdown", "text"} else "txt"
  67        stem = safe_filename(title or artifact_type)
  68        timestamp = utc_now().replace("+00:00", "Z").replace(":", "")
  69        filename = f"{timestamp}-{stem}-{new_id('file')}.{suffix}"
  70        path = self.job_dir(job_id) / filename
  71        path.write_text(content, encoding="utf-8")
  72        digest = sha256_text(content)
  73        artifact_id = new_id("art")
  74        if self.db is not None:
  75            artifact_id = self.db.add_artifact(
  76                job_id=job_id,
  77                run_id=run_id,
  78                step_id=step_id,
  79                path=path,
  80                sha256=digest,
  81                artifact_type=artifact_type,
  82                title=title,
  83                summary=summary,
  84                metadata=metadata,
  85            )
  86        return StoredArtifact(id=artifact_id, path=path, sha256=digest, title=title, summary=summary)
  87
  88    def read_text(self, artifact_id_or_path: str) -> str:
  89        path = Path(artifact_id_or_path)
  90        if self.db is not None and not path.exists():
  91            path = Path(self.db.get_artifact(artifact_id_or_path)["path"])
  92        safe_path = self._assert_inside_home(path)
  93        return safe_path.read_text(encoding="utf-8")
  94
  95    def search_text(self, *, job_id: str, query: str, limit: int = 10) -> list[dict[str, Any]]:
  96        if self.db is None:
  97            return []
  98        query_lower = query.lower().strip()
  99        results: list[dict[str, Any]] = []
 100        for artifact in self.db.list_artifacts(job_id, limit=250):
 101            haystack = " ".join(
 102                str(artifact.get(key) or "") for key in ("title", "summary", "type")
 103            ).lower()
 104            content = ""
 105            if query_lower and query_lower not in haystack:
 106                try:
 107                    content = self.read_text(artifact["id"])
 108                except OSError:
 109                    content = ""
 110                if query_lower not in content.lower():
 111                    continue
 112            elif not query_lower:
 113                try:
 114                    content = self.read_text(artifact["id"])
 115                except OSError:
 116                    content = ""
 117            if not content:
 118                try:
 119                    content = self.read_text(artifact["id"])
 120                except OSError:
 121                    content = ""
 122            excerpt = content[:500]
 123            if query_lower:
 124                idx = content.lower().find(query_lower)
 125                if idx >= 0:
 126                    start = max(0, idx - 160)
 127                    excerpt = content[start:start + 500]
 128            results.append({
 129                "id": artifact["id"],
 130                "title": artifact.get("title"),
 131                "type": artifact.get("type"),
 132                "path": artifact.get("path"),
 133                "summary": artifact.get("summary"),
 134                "excerpt": excerpt,
 135            })
 136            if len(results) >= limit:
 137                break
 138        return results
nipux_cli/browser.py 189 lines
   1"""Small `agent-browser` wrapper for the Nipux runtime."""
   2
   3from __future__ import annotations
   4
   5import hashlib
   6import json
   7import os
   8import shutil
   9import subprocess
  10import tempfile
  11from pathlib import Path
  12from typing import Any
  13
  14from nipux_cli.config import AppConfig
  15from nipux_cli.source_quality import anti_bot_reason
  16
  17
  18def _find_agent_browser() -> list[str]:
  19    direct = shutil.which("agent-browser")
  20    if direct:
  21        return [direct]
  22    if shutil.which("npx"):
  23        return ["npx", "--yes", "agent-browser"]
  24    raise FileNotFoundError("agent-browser CLI not found. Install with: npm install -g agent-browser && agent-browser install")
  25
  26
  27def _session_name(task_id: str) -> str:
  28    safe = "".join(ch if ch.isalnum() or ch in "-_" else "_" for ch in task_id)
  29    if len(safe) <= 32:
  30        return f"nipux_{safe}"
  31    digest = hashlib.sha1(task_id.encode("utf-8")).hexdigest()[:10]
  32    return f"nipux_{safe[:20]}_{digest}"
  33
  34
  35def _profile_dir(config: AppConfig, task_id: str) -> Path:
  36    return config.runtime.home / "browser-profiles" / _session_name(task_id)
  37
  38
  39def _socket_dir(task_id: str) -> Path:
  40    root = Path(os.environ.get("NIPUX_BROWSER_SOCKET_ROOT") or "/tmp")
  41    return root / "nipux-ab" / _session_name(task_id)
  42
  43
  44def run_browser_command(
  45    config: AppConfig,
  46    *,
  47    task_id: str,
  48    command: str,
  49    args: list[str] | None = None,
  50    timeout: int = 60,
  51) -> dict[str, Any]:
  52    args = args or []
  53    profile_dir = _profile_dir(config, task_id)
  54    profile_dir.mkdir(parents=True, exist_ok=True)
  55    cmd = [
  56        *_find_agent_browser(),
  57        "--session",
  58        _session_name(task_id),
  59        "--session-name",
  60        _session_name(task_id),
  61        "--profile",
  62        str(profile_dir),
  63        "--json",
  64        command,
  65        *args,
  66    ]
  67    socket_dir = _socket_dir(task_id)
  68    socket_dir.mkdir(parents=True, exist_ok=True)
  69    env = {
  70        **os.environ,
  71        "AGENT_BROWSER_SOCKET_DIR": str(socket_dir),
  72        "AGENT_BROWSER_SESSION_NAME": _session_name(task_id),
  73        "AGENT_BROWSER_PROFILE": str(profile_dir),
  74    }
  75
  76    with tempfile.TemporaryDirectory(dir=str(socket_dir)) as tmp:
  77        stdout_path = Path(tmp) / "stdout"
  78        stderr_path = Path(tmp) / "stderr"
  79        with stdout_path.open("w", encoding="utf-8") as stdout, stderr_path.open("w", encoding="utf-8") as stderr:
  80            proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=stdout, stderr=stderr, env=env)
  81            try:
  82                proc.wait(timeout=timeout)
  83            except subprocess.TimeoutExpired:
  84                proc.kill()
  85                proc.wait()
  86                return {"success": False, "error": f"browser command timed out after {timeout}s"}
  87        stdout_text = stdout_path.read_text(encoding="utf-8").strip()
  88        stderr_text = stderr_path.read_text(encoding="utf-8").strip()
  89
  90    if stdout_text:
  91        try:
  92            result = json.loads(stdout_text)
  93        except json.JSONDecodeError:
  94            return {"success": False, "error": f"agent-browser returned non-JSON output: {stdout_text[:1000]}"}
  95        if isinstance(result, dict):
  96            result.setdefault("browser_session", _session_name(task_id))
  97            result.setdefault("browser_profile", str(profile_dir))
  98            return result
  99        return {"success": True, "data": result, "browser_session": _session_name(task_id), "browser_profile": str(profile_dir)}
 100    if proc.returncode != 0:
 101        return {
 102            "success": False,
 103            "error": stderr_text or f"agent-browser exited {proc.returncode}",
 104            "browser_session": _session_name(task_id),
 105            "browser_profile": str(profile_dir),
 106        }
 107    return {"success": True, "data": {}, "browser_session": _session_name(task_id), "browser_profile": str(profile_dir)}
 108
 109
 110def navigate(config: AppConfig, *, task_id: str, url: str) -> dict[str, Any]:
 111    result = run_browser_command(config, task_id=task_id, command="open", args=[url], timeout=90)
 112    if not result.get("success"):
 113        return result
 114    snapshot = run_browser_command(config, task_id=task_id, command="snapshot", args=["-c"], timeout=30)
 115    if snapshot.get("success"):
 116        result["snapshot"] = snapshot.get("data", {}).get("snapshot", "")
 117        result["refs"] = snapshot.get("data", {}).get("refs", {})
 118    return _annotate_source_quality(result)
 119
 120
 121def snapshot(config: AppConfig, *, task_id: str, full: bool = False) -> dict[str, Any]:
 122    return _annotate_source_quality(run_browser_command(config, task_id=task_id, command="snapshot", args=[] if full else ["-c"]))
 123
 124
 125def click(config: AppConfig, *, task_id: str, ref: str) -> dict[str, Any]:
 126    result = run_browser_command(config, task_id=task_id, command="click", args=[ref if ref.startswith("@") else f"@{ref}"])
 127    return _with_recovery_snapshot(config, task_id=task_id, result=result)
 128
 129
 130def fill(config: AppConfig, *, task_id: str, ref: str, text: str) -> dict[str, Any]:
 131    result = run_browser_command(config, task_id=task_id, command="fill", args=[ref if ref.startswith("@") else f"@{ref}", text])
 132    return _with_recovery_snapshot(config, task_id=task_id, result=result)
 133
 134
 135def scroll(config: AppConfig, *, task_id: str, direction: str) -> dict[str, Any]:
 136    return run_browser_command(config, task_id=task_id, command="scroll", args=[direction, "500"])
 137
 138
 139def back(config: AppConfig, *, task_id: str) -> dict[str, Any]:
 140    return run_browser_command(config, task_id=task_id, command="back")
 141
 142
 143def press(config: AppConfig, *, task_id: str, key: str) -> dict[str, Any]:
 144    return run_browser_command(config, task_id=task_id, command="press", args=[key])
 145
 146
 147def console(config: AppConfig, *, task_id: str, clear: bool = False, expression: str | None = None) -> dict[str, Any]:
 148    if expression is not None:
 149        return run_browser_command(config, task_id=task_id, command="eval", args=[expression])
 150    args = ["--clear"] if clear else []
 151    console_result = run_browser_command(config, task_id=task_id, command="console", args=args)
 152    errors_result = run_browser_command(config, task_id=task_id, command="errors", args=args)
 153    return {
 154        "success": bool(console_result.get("success") or errors_result.get("success")),
 155        "console": console_result,
 156        "errors": errors_result,
 157    }
 158
 159
 160def _annotate_source_quality(result: dict[str, Any]) -> dict[str, Any]:
 161    data = result.get("data") if isinstance(result.get("data"), dict) else {}
 162    reason = anti_bot_reason(
 163        str(data.get("title") or ""),
 164        str(data.get("url") or data.get("origin") or ""),
 165        str(result.get("snapshot") or data.get("snapshot") or ""),
 166    )
 167    if not reason:
 168        return result
 169    result["source_warning"] = reason
 170    warnings = result.get("warnings") if isinstance(result.get("warnings"), list) else []
 171    warnings.append({
 172        "type": "anti_bot",
 173        "message": reason,
 174        "guidance": "This page may require normal human browser verification. Do not bypass protections; continue only with visible browser actions or choose another source if stuck.",
 175    })
 176    result["warnings"] = warnings
 177    return result
 178
 179
 180def _with_recovery_snapshot(config: AppConfig, *, task_id: str, result: dict[str, Any]) -> dict[str, Any]:
 181    if result.get("success", True):
 182        return result
 183    error = str(result.get("error") or "")
 184    if "unknown ref" not in error.lower():
 185        return result
 186    recovery = run_browser_command(config, task_id=task_id, command="snapshot", args=["-c"], timeout=30)
 187    result["recovery_guidance"] = "The ref was stale or missing. Use refs from recovery_snapshot before clicking or typing again."
 188    result["recovery_snapshot"] = _annotate_source_quality(recovery)
 189    return result
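The ref handling in click, fill, and _with_recovery_snapshot above can be sketched in isolation. This is a minimal illustration, not the project's implementation: fake_runner and the Runner alias are hypothetical stand-ins for run_browser_command, and the "@e1" page contents are made up for the example.

```python
from typing import Any, Callable

# Hypothetical stand-in signature for run_browser_command(command, args).
Runner = Callable[[str, list[str]], dict[str, Any]]


def normalize_ref(ref: str) -> str:
    """Prefix a bare element ref with '@', the same rule click() and fill() apply."""
    return ref if ref.startswith("@") else f"@{ref}"


def with_recovery_snapshot(run: Runner, result: dict[str, Any]) -> dict[str, Any]:
    """Attach a fresh compact snapshot only when the failure is an unknown ref."""
    if result.get("success", True):
        return result
    if "unknown ref" not in str(result.get("error") or "").lower():
        return result
    result["recovery_snapshot"] = run("snapshot", ["-c"])
    return result


def fake_runner(command: str, args: list[str]) -> dict[str, Any]:
    # Hypothetical browser backend: only "@e1" exists on the page.
    if command == "snapshot":
        return {"success": True, "snapshot": "@e1 button Submit"}
    if args and args[0] != "@e1":
        return {"success": False, "error": f"Unknown ref {args[0]}"}
    return {"success": True}


ok = with_recovery_snapshot(fake_runner, fake_runner("click", [normalize_ref("e1")]))
stale = with_recovery_snapshot(fake_runner, fake_runner("click", [normalize_ref("@e9")]))
```

Under these assumptions, ok carries no recovery data, while stale gains a recovery_snapshot key so the caller can pick a valid ref before retrying.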
nipux_cli/chat_commands.py 286 lines
   1"""Slash-command dispatch for focused chat sessions."""
   2
   3from __future__ import annotations
   4
   5import argparse
   6from dataclasses import dataclass
   7from typing import Any, Callable
   8
   9from nipux_cli.config import DEFAULT_CONTEXT_LENGTH
  10from nipux_cli.tui_style import _one_line
  11
  12
  13@dataclass(frozen=True)
  14class ChatCommandDeps:
  15    db_factory: Callable[[], tuple[Any, Any]]
  16    jobs: Callable[[argparse.Namespace], None]
  17    history: Callable[[argparse.Namespace], None]
  18    events: Callable[[argparse.Namespace], None]
  19    logs: Callable[[argparse.Namespace], None]
  20    updates: Callable[[argparse.Namespace], None]
  21    artifacts: Callable[[argparse.Namespace], None]
  22    artifact: Callable[[argparse.Namespace], None]
  23    lessons: Callable[[argparse.Namespace], None]
  24    findings: Callable[[argparse.Namespace], None]
  25    tasks: Callable[[argparse.Namespace], None]
  26    roadmap: Callable[[argparse.Namespace], None]
  27    experiments: Callable[[argparse.Namespace], None]
  28    sources: Callable[[argparse.Namespace], None]
  29    memory: Callable[[argparse.Namespace], None]
  30    metrics: Callable[[argparse.Namespace], None]
  31    activity: Callable[[argparse.Namespace], None]
  32    digest: Callable[[argparse.Namespace], None]
  33    status: Callable[[argparse.Namespace], None]
  34    usage: Callable[[argparse.Namespace], None]
  35    handle_setting: Callable[[str, list[str]], bool]
  36    doctor: Callable[[argparse.Namespace], None]
  37    init: Callable[[argparse.Namespace], None]
  38    health: Callable[[argparse.Namespace], None]
  39    start: Callable[[argparse.Namespace], None]
  40    ensure_job_runnable: Callable[[Any, str], None]
  41    run: Callable[[argparse.Namespace], None]
  42    restart: Callable[[argparse.Namespace], None]
  43    work: Callable[[argparse.Namespace], None]
  44    pause: Callable[[argparse.Namespace], None]
  45    resume: Callable[[argparse.Namespace], None]
  46    cancel: Callable[[argparse.Namespace], None]
  47    queue_note: Callable[..., None]
  48    create_job: Callable[..., tuple[str, str]]
  49    focus: Callable[[argparse.Namespace], None]
  50    delete: Callable[[argparse.Namespace], None]
  51
  52
  53def handle_chat_slash_command(job_id: str, command: str, rest: list[str], *, deps: ChatCommandDeps) -> bool:
  54    if command in {"jobs", "ls"}:
  55        deps.jobs(argparse.Namespace())
  56        return True
  57    if command == "history":
  58        deps.history(
  59            argparse.Namespace(
  60                job_id=job_id,
  61                limit=_optional_int(rest, default=40),
  62                chars=220,
  63                full=False,
  64                json=False,
  65            )
  66        )
  67        return True
  68    if command == "events":
  69        deps.events(
  70            argparse.Namespace(
  71                job_id=job_id,
  72                limit=_optional_int(rest, default=40),
  73                chars=220,
  74                full=False,
  75                json=False,
  76                follow=False,
  77                interval=2.0,
  78            )
  79        )
  80        return True
  81    if command == "outputs":
  82        deps.logs(
  83            argparse.Namespace(
  84                job_id=[job_id],
  85                limit=_optional_int(rest, default=25),
  86                verbose=False,
  87                chars=260,
  88            )
  89        )
  90        return True
  91    if command in {"updates", "outcomes", "outcome"}:
  92        all_jobs = bool(rest and rest[0].lower() == "all")
  93        deps.updates(argparse.Namespace(job_id=job_id, all=all_jobs, limit=5, chars=180, paths=False))
  94        return True
  95    if command == "artifacts":
  96        deps.artifacts(argparse.Namespace(job_id=job_id, limit=10, chars=220, paths=False))
  97        return True
  98    if command == "artifact":
  99        query = " ".join(rest).strip()
 100        if not query:
 101            print("usage: /artifact QUERY_OR_ID")
 102            return True
 103        deps.artifact(argparse.Namespace(artifact_id_or_path=[query], job_id=job_id, chars=12000))
 104        return True
 105    if command == "lessons":
 106        deps.lessons(argparse.Namespace(job_id=job_id, limit=10, chars=220))
 107        return True
 108    if command == "findings":
 109        deps.findings(argparse.Namespace(job_id=job_id, limit=20, chars=220, json=False))
 110        return True
 111    if command == "tasks":
 112        deps.tasks(argparse.Namespace(job_id=job_id, limit=20, chars=220, status=None, json=False))
 113        return True
 114    if command == "roadmap":
 115        deps.roadmap(argparse.Namespace(job_id=job_id, limit=20, features=3, chars=220, json=False))
 116        return True
 117    if command == "experiments":
 118        deps.experiments(argparse.Namespace(job_id=job_id, limit=20, chars=220, status=None, json=False))
 119        return True
 120    if command == "sources":
 121        deps.sources(argparse.Namespace(job_id=job_id, limit=20, chars=220, json=False))
 122        return True
 123    if command == "memory":
 124        deps.memory(
 125            argparse.Namespace(
 126                job_id=job_id,
 127                limit=10,
 128                chars=220,
 129                json=False,
 130                graph=bool(rest and rest[0].lower() in {"graph", "view", "html"}),
 131                output=None,
 132            )
 133        )
 134        return True
 135    if command == "metrics":
 136        deps.metrics(argparse.Namespace(job_id=job_id, chars=220))
 137        return True
 138    if command == "learn":
 139        lesson = " ".join(rest).strip()
 140        if not lesson:
 141            print("usage: /learn LESSON")
 142            return True
 143        db, _config = deps.db_factory()
 144        try:
 145            entry = db.append_lesson(job_id, lesson, category="operator_preference", metadata={"source": "chat"})
 146            job = db.get_job(job_id)
 147            print(f"learned for {job['title']}: {_one_line(entry['lesson'], 220)}")
 148        finally:
 149            db.close()
 150        return True
 151    if command == "activity":
 152        deps.activity(
 153            argparse.Namespace(job_id=job_id, limit=20, chars=180, follow=False, interval=2.0, verbose=False, paths=False)
 154        )
 155        return True
 156    if command == "digest":
 157        deps.digest(argparse.Namespace(job_id=[job_id]))
 158        return True
 159    if command == "status":
 160        deps.status(argparse.Namespace(job_id=job_id, limit=8, chars=180, full=False, json=False))
 161        return True
 162    if command == "usage":
 163        deps.usage(argparse.Namespace(job_id=job_id, json=False))
 164        return True
 165    if command == "settings":
 166        deps.handle_setting("config", [])
 167        return True
 168    if deps.handle_setting(command, rest):
 169        return True
 170    if command == "doctor":
 171        try:
 172            deps.doctor(argparse.Namespace(check_model=True))
 173        except SystemExit:
 174            pass
 175        return True
 176    if command == "init":
 177        deps.init(
 178            argparse.Namespace(
 179                path=None,
 180                force=False,
 181                model=None,
 182                base_url=None,
 183                api_key_env=None,
 184                openrouter=False,
 185                context_length=DEFAULT_CONTEXT_LENGTH,
 186            )
 187        )
 188        return True
 189    if command == "health":
 190        deps.health(argparse.Namespace(limit=8, chars=180))
 191        return True
 192    if command == "start":
 193        deps.start(argparse.Namespace(poll_seconds=0.0, fake=False, quiet=False, log_file=None))
 194        return True
 195    if command == "run":
 196        db, _config = deps.db_factory()
 197        try:
 198            deps.ensure_job_runnable(db, job_id)
 199        finally:
 200            db.close()
 201        deps.run(
 202            argparse.Namespace(
 203                job_id=job_id,
 204                poll_seconds=0.0,
 205                interval=2.0,
 206                limit=20,
 207                chars=180,
 208                verbose=False,
 209                paths=False,
 210                fake=False,
 211                quiet=False,
 212                log_file=None,
 213                no_follow=True,
 214            )
 215        )
 216        return True
 217    if command == "restart":
 218        deps.restart(argparse.Namespace(poll_seconds=0.0, wait=5.0, fake=False, quiet=False, log_file=None))
 219        return True
 220    if command in {"work", "work-verbose"}:
 221        deps.work(
 222            argparse.Namespace(
 223                job_id=job_id,
 224                steps=_optional_int(rest, default=1),
 225                poll_seconds=0.5,
 226                fake=False,
 227                verbose=command == "work-verbose",
 228                dashboard=False,
 229                limit=12,
 230                chars=260 if command == "work" else 4000,
 231                continue_on_error=False,
 232            )
 233        )
 234        return True
 235    if command in {"pause", "stop"}:
 236        deps.pause(argparse.Namespace(job_id=job_id, note=rest))
 237        return True
 238    if command == "resume":
 239        deps.resume(argparse.Namespace(job_id=job_id))
 240        return True
 241    if command == "cancel":
 242        deps.cancel(argparse.Namespace(job_id=job_id, note=rest))
 243        return True
 244    if command == "note":
 245        message = " ".join(rest).strip()
 246        if not message:
 247            print("usage: /note MESSAGE")
 248            return True
 249        deps.queue_note(job_id, message, mode="note")
 250        return True
 251    if command == "follow":
 252        message = " ".join(rest).strip()
 253        if not message:
 254            print("usage: /follow MESSAGE")
 255            return True
 256        deps.queue_note(job_id, message, mode="follow_up")
 257        return True
 258    if command == "new":
 259        objective = " ".join(rest).strip()
 260        if not objective:
 261            print("usage: /new OBJECTIVE")
 262            return True
 263        _created_id, title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
 264        print(f"created {title}")
 265        started = deps.start(argparse.Namespace(poll_seconds=0.0, fake=False, quiet=True, log_file=None))
 266        if started is False:
 267            print(f"focus set to {title}; worker is waiting for a working model.")
 268        else:
 269            print(f"focus set to {title}; initial plan accepted and worker started.")
 270        return True
 271    if command in {"focus", "switch"}:
 272        if not " ".join(rest).strip():
 273            deps.focus(argparse.Namespace(query=[]))
 274            return True
 275        deps.focus(argparse.Namespace(query=rest))
 276        return True
 277    if command == "delete":
 278        target = rest if rest else [job_id]
 279        deps.delete(argparse.Namespace(job_id=target, keep_files=False))
 280        return bool(rest)
 281    print(f"unknown chat command: /{command}")
 282    return True
 283
 284
 285def _optional_int(values: list[str], *, default: int) -> int:
  286    return int(values[0]) if values and values[0].isdecimal() else default
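The dispatch pattern in handle_chat_slash_command can be exercised without the real CLI by passing a recording stub. The sketch below is a reduced illustration covering only two commands. RecordingDeps and dispatch are hypothetical names introduced here; they are not part of nipux_cli.

```python
import argparse
from dataclasses import dataclass, field


def optional_int(values: list[str], *, default: int) -> int:
    # A numeric first token overrides the default; anything else falls back.
    # str.isdecimal() accepts exactly the digit strings int() can parse.
    return int(values[0]) if values and values[0].isdecimal() else default


@dataclass
class RecordingDeps:
    """Hypothetical stand-in for ChatCommandDeps that records each call."""

    calls: list[tuple[str, argparse.Namespace]] = field(default_factory=list)

    def history(self, args: argparse.Namespace) -> None:
        self.calls.append(("history", args))

    def pause(self, args: argparse.Namespace) -> None:
        self.calls.append(("pause", args))


def dispatch(job_id: str, command: str, rest: list[str], *, deps: RecordingDeps) -> bool:
    """Reduced dispatcher: build a Namespace per command and return keep-running."""
    if command == "history":
        deps.history(argparse.Namespace(job_id=job_id, limit=optional_int(rest, default=40)))
        return True
    if command in {"pause", "stop"}:  # aliases share one handler
        deps.pause(argparse.Namespace(job_id=job_id, note=rest))
        return True
    print(f"unknown chat command: /{command}")
    return True


deps = RecordingDeps()
dispatch("job-1", "history", ["15"], deps=deps)
dispatch("job-1", "history", ["soon"], deps=deps)
dispatch("job-1", "stop", ["back", "later"], deps=deps)
```

Because every handler receives a plain argparse.Namespace, a test can assert on the recorded arguments without starting a daemon or opening a database.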
nipux_cli/chat_context.py 223 lines
   1"""Prompt context builder for the Nipux chat-side controller model."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.db import AgentDB
   8from nipux_cli.event_render import event_line
   9from nipux_cli.metric_format import format_metric_value
  10from nipux_cli.operator_context import active_prompt_operator_entries
  11from nipux_cli.tui_event_format import clean_step_summary
  12from nipux_cli.tui_outcomes import (
  13    SUMMARY_EVENT_TYPES,
  14    SUMMARY_TOOL_EVENT_TYPES,
  15    hourly_outcome_summary,
  16    is_summary_event_candidate,
  17    model_update_event_parts,
  18    outcome_counts,
  19)
  20
  21
  22def build_chat_messages(db: AgentDB, job: dict[str, Any], message: str) -> list[dict[str, str]]:
  23    """Build bounded visible-state context for conversational job control."""
  24
  25    steps = db.list_steps(job_id=job["id"])[-10:]
  26    jobs = db.list_jobs()[:12]
  27    artifacts = db.list_artifacts(job["id"], limit=5)
  28    timeline_events = db.list_timeline_events(job["id"], limit=18)
  29    outcome_events = _durable_outcome_events(db, job["id"])
  30    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  31    operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
  32    agent_updates = metadata.get("agent_updates") if isinstance(metadata.get("agent_updates"), list) else []
  33    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
  34    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
  35    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
  36    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
  37    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
  38    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
  39
  40    step_lines = "\n".join(
  41        f"- #{step['step_no']} {step['status']} {step.get('tool_name') or step['kind']}: "
  42        f"{clean_step_summary(step.get('summary') or step.get('error') or '')}"
  43        for step in steps
  44    )
  45    artifact_lines = "\n".join(
  46        f"- #{index} {artifact.get('title') or artifact['id']}: {artifact.get('summary') or ''} "
  47        f"(view with /artifact {index})"
  48        for index, artifact in enumerate(artifacts, start=1)
  49    )
  50    steering_lines = "\n".join(
  51        f"- {entry.get('source', 'operator')} {entry.get('mode', 'steer')}: {entry.get('message', '')}"
  52        for entry in active_prompt_operator_entries(operator_messages)[-6:]
  53        if isinstance(entry, dict)
  54    )
  55    update_lines = "\n".join(
  56        f"- {entry.get('category', 'progress')}: {entry.get('message', '')}"
  57        for entry in agent_updates[-5:]
  58        if isinstance(entry, dict)
  59    )
  60    lesson_lines = "\n".join(
  61        f"- {entry.get('category', 'memory')}: {entry.get('lesson', '')}"
  62        for entry in lessons[-8:]
  63        if isinstance(entry, dict)
  64    )
  65    finding_lines = "\n".join(
  66        f"- {entry.get('name')}: {entry.get('category') or ''} {entry.get('location') or ''} score={entry.get('score')}"
  67        for entry in findings[-8:]
  68        if isinstance(entry, dict)
  69    )
  70    task_lines = "\n".join(
  71        f"- {entry.get('status') or 'open'} p={entry.get('priority') or 0}: {entry.get('title')}"
  72        for entry in tasks[-10:]
  73        if isinstance(entry, dict)
  74    )
  75    milestone_lines = _roadmap_lines(roadmap)
  76    experiment_lines = "\n".join(_experiment_line(entry) for entry in experiments[-10:] if isinstance(entry, dict))
  77    source_lines = "\n".join(
  78        f"- {entry.get('source')}: score={entry.get('usefulness_score')} "
  79        f"findings={entry.get('yield_count') or 0} outcome={entry.get('last_outcome') or ''}"
  80        for entry in sources[-8:]
  81        if isinstance(entry, dict)
  82    )
  83    timeline_lines = "\n".join(event_line(event, chars=700) for event in timeline_events[-12:])
  84
  85    sections = {
  86        "Jobs": _clip_chat_context(_job_list_lines(jobs, focused_job_id=job["id"]), 1_300),
  87        "Durable outcomes": _clip_chat_context(_durable_outcome_lines(outcome_events), 1_600),
  88        "Recent tool calls": _clip_chat_context(step_lines, 1_800),
  89        "Latest artifacts": _clip_chat_context(artifact_lines, 1_200),
  90        "Finding ledger": _clip_chat_context(finding_lines, 1_200),
  91        "Task queue": _clip_chat_context(task_lines, 1_300),
  92        "Roadmap": _clip_chat_context(milestone_lines, 1_200),
  93        "Experiment ledger": _clip_chat_context(experiment_lines, 1_300),
  94        "Source ledger": _clip_chat_context(source_lines, 1_100),
  95        "Lessons learned": _clip_chat_context(lesson_lines, 1_000),
  96        "Recent operator steering": _clip_chat_context(steering_lines, 1_200),
  97        "Recent agent notes": _clip_chat_context(update_lines, 1_200),
  98        "Recent visible timeline": _clip_chat_context(timeline_lines, 1_800),
  99    }
 100    section_text = "\n\n".join(f"{title}:\n{body or _empty_section_text(title)}" for title, body in sections.items())
 101    return [
 102        {
 103            "role": "system",
 104            "content": (
 105                "You are Nipux, the chat model that controls a generic long-running agent workspace. "
 106                "You know the visible CLI state, focused job, job list, task queue, artifacts, memory, metrics, and recent activity. "
 107                "Answer directly from the visible job state. Do not claim hidden chain-of-thought. "
 108                "If the operator asks for work to be done, explain the concrete job/control action Nipux will take or how to run it from the Jobs/Status panel. "
 109                "If the operator asks where saved work is, explain that artifacts and history are visible from the Jobs/Status panel or direct CLI commands. "
 110                "Do not start replies with an introduction. Keep replies concise and useful."
 111            ),
 112        },
 113        {
 114            "role": "user",
 115            "content": (
 116                f"Job title: {job['title']}\n"
 117                f"Job status: {job['status']}\n"
 118                f"Kind: {job['kind']}\n"
 119                f"Objective: {job['objective']}\n\n"
 120                f"{section_text}\n\n"
 121                f"Operator message:\n{message}"
 122            ),
 123        },
 124    ]
 125
 126
 127def _durable_outcome_events(db: AgentDB, job_id: str) -> list[dict[str, Any]]:
 128    durable_events = db.list_events(job_id=job_id, limit=160, event_types=SUMMARY_EVENT_TYPES)
 129    tool_events = [
 130        event
 131        for event in db.list_events(job_id=job_id, limit=80, event_types=SUMMARY_TOOL_EVENT_TYPES)
 132        if is_summary_event_candidate(event)
 133    ]
 134    merged: dict[str, dict[str, Any]] = {}
 135    for event in [*durable_events, *tool_events]:
 136        event_id = str(event.get("id") or "")
 137        key = event_id or f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"
 138        merged[key] = event
 139    return sorted(merged.values(), key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
 140
 141
 142def _durable_outcome_lines(events: list[dict[str, Any]]) -> str:
 143    if not events:
 144        return ""
 145    counts = outcome_counts(events, include_research=True, include_failures=True)
 146    lines = [f"- summary: {hourly_outcome_summary(counts)}"]
 147    seen: set[str] = set()
 148    for event in reversed(events):
 149        parsed = model_update_event_parts(event, width=240, compact=False)
 150        if not parsed:
 151            continue
 152        label, text, _clock = parsed
 153        if label in {"DONE", "PLAN", "UPDATE"}:
 154            continue
 155        key = f"{label}:{text}"
 156        if key in seen:
 157            continue
 158        seen.add(key)
 159        lines.append(f"- {label.lower()}: {text}")
 160        if len(lines) >= 9:
 161            break
 162    return "\n".join(lines)
 163
 164
 165def _job_list_lines(jobs: list[dict[str, Any]], *, focused_job_id: str) -> str:
 166    lines: list[str] = []
 167    for index, entry in enumerate(jobs, start=1):
 168        marker = "*" if str(entry.get("id") or "") == focused_job_id else "-"
 169        title = entry.get("title") or entry.get("id") or "untitled"
 170        objective = " ".join(str(entry.get("objective") or "").split())
 171        if len(objective) > 120:
 172            objective = objective[:119].rstrip() + "..."
 173        lines.append(
 174            f"{marker} {index}. {title} status={entry.get('status') or 'unknown'} "
 175            f"kind={entry.get('kind') or 'generic'} objective={objective}"
 176        )
 177    return "\n".join(lines)
 178
 179
 180def _roadmap_lines(roadmap: dict[str, Any]) -> str:
 181    if not roadmap:
 182        return ""
 183    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 184    body = "\n".join(
 185        (
 186            f"- {entry.get('status') or 'planned'} validation={entry.get('validation_status') or 'not_started'} "
 187            f"p={entry.get('priority') or 0}: {entry.get('title')}"
 188        )
 189        for entry in milestones[-8:]
 190        if isinstance(entry, dict)
 191    )
 192    header = (
 193        f"{roadmap.get('status') or 'planned'}: {roadmap.get('title') or 'Roadmap'}"
 194        + (f" current={roadmap.get('current_milestone')}" if roadmap.get("current_milestone") else "")
 195    )
 196    return f"{header}\n{body}".strip()
 197
 198
 199def _experiment_line(entry: dict[str, Any]) -> str:
 200    if entry.get("metric_value") is None:
 201        return f"- {entry.get('status') or 'planned'}: {entry.get('title')}"
 202    metric = format_metric_value(
 203        entry.get("metric_name") or "metric",
 204        entry.get("metric_value"),
 205        entry.get("metric_unit") or "",
 206    )
 207    return (
 208        f"- {entry.get('status') or 'planned'}: {entry.get('title')}"
 209        f" {metric}"
 210        f"{' best' if entry.get('best_observed') else ''}"
 211    )
 212
 213
 214def _empty_section_text(title: str) -> str:
 215    return "None." if title.startswith("Recent operator") else "None yet."
 216
 217
 218def _clip_chat_context(value: str, limit: int) -> str:
 219    text = str(value or "")
 220    if len(text) <= limit:
 221        return text
 222    marker = f"\n... clipped {len(text) - limit} chars from this visible state section ..."
 223    return text[: max(0, limit - len(marker))].rstrip() + marker
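The per-section budgeting used by build_chat_messages can be tried on its own. The sketch below restates _clip_chat_context and _empty_section_text under shorter names (clip and empty_section_text are this example's names) and joins two sample sections; the section contents are invented for illustration.

```python
def clip(value: str, limit: int) -> str:
    """Trim text to `limit` chars, reserving room for a trailing clip marker."""
    text = str(value or "")
    if len(text) <= limit:
        return text
    marker = f"\n... clipped {len(text) - limit} chars from this visible state section ..."
    return text[: max(0, limit - len(marker))].rstrip() + marker


def empty_section_text(title: str) -> str:
    """Placeholder body for a section that has nothing to show."""
    return "None." if title.startswith("Recent operator") else "None yet."


sections = {
    "Task queue": clip("- open p=1: Draft outline", 1_300),
    "Recent operator steering": clip("", 1_200),
}
section_text = "\n\n".join(
    f"{title}:\n{body or empty_section_text(title)}" for title, body in sections.items()
)

clipped = clip("x" * 500, 200)
```

Two properties are worth noting when reading the numbers in the marker. The count it reports is the overflow beyond the limit, not the number of characters actually dropped, because the marker itself consumes part of the budget. And if a limit were ever smaller than the marker, the result would exceed that limit; the limits in the listing above are all 1,000 or more, so that case does not arise there.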
nipux_cli/chat_controller.py 173 lines
   1"""Chat-controller behavior shared by the interactive CLI."""
   2
   3from __future__ import annotations
   4
   5from dataclasses import dataclass
   6from typing import Any, Callable
   7
   8from nipux_cli.chat_intent import (
   9    chat_control_command,
  10    extract_job_objective_from_message,
  11    message_requests_immediate_run,
  12    message_requests_queued_job,
  13)
  14
  15
  16@dataclass(frozen=True)
  17class ChatControllerDeps:
  18    db_factory: Callable[[], tuple[Any, Any]]
  19    reply_fn: Callable[[str, str], Any]
  20    create_job: Callable[..., tuple[str, str]]
  21    write_shell_state: Callable[[dict[str, Any]], None]
  22    start_daemon: Callable[..., Any]
  23    capture_command: Callable[[str, str], tuple[bool, str]]
  24    compact_command_output: Callable[[str], list[str]]
  25    friendly_error_text: Callable[[str], str]
  26
  27
  28def handle_chat_message(
  29    job_id: str,
  30    line: str,
  31    *,
  32    deps: ChatControllerDeps,
  33    reply_fn: Callable[[str, str], Any] | None = None,
  34    quiet: bool = False,
  35) -> tuple[bool, str]:
  36    reply_callable = reply_fn or deps.reply_fn
  37    spawned = maybe_spawn_job_from_chat(job_id, line, deps=deps, quiet=quiet)
  38    if spawned:
  39        return True, spawned
  40    controlled = handle_chat_control_intent(job_id, line, deps=deps, quiet=quiet)
  41    if controlled is not None:
  42        return controlled
  43    queue_chat_note(job_id, line, deps=deps, mode="steer", quiet=quiet)
  44    try:
  45        reply = reply_callable(job_id, line)
  46    except Exception as exc:
  47        detail = deps.friendly_error_text(f"{type(exc).__name__}: {exc}")
  48        message = f"{detail}; message saved for the worker"
  49        if not quiet:
  50            print(detail)
  51            print("Your message is still saved for the next worker step.")
  52        return True, message
  53    reply_text, reply_metadata = chat_reply_text_and_metadata(reply)
  54    if reply_text.strip():
  55        db, _config = deps.db_factory()
  56        try:
  57            if reply_metadata:
  58                db.append_event(
  59                    job_id,
  60                    event_type="loop",
  61                    title="message_end",
  62                    body=reply_text[:1000],
  63                    metadata={"source": "chat", "tool_calls": [], **reply_metadata},
  64                )
  65            db.append_agent_update(job_id, reply_text.strip(), category="chat")
  66        finally:
  67            db.close()
  68        if not quiet:
  69            print()
  70            print(reply_text.strip())
  71            print()
  72        return True, ""
  73    message = "model returned an empty reply; message is queued"
  74    if not quiet:
  75        print("model returned an empty reply; your message is still queued.")
  76    return True, message
  77
  78
  79def chat_reply_text_and_metadata(reply: Any) -> tuple[str, dict[str, Any]]:
  80    content = getattr(reply, "content", None)
  81    if content is None:
  82        return str(reply), {}
  83    metadata: dict[str, Any] = {}
  84    usage = getattr(reply, "usage", None)
  85    if isinstance(usage, dict) and usage:
  86        metadata["usage"] = usage
  87    model = getattr(reply, "model", "")
  88    if model:
  89        metadata["model"] = model
  90    response_id = getattr(reply, "response_id", "")
  91    if response_id:
  92        metadata["response_id"] = response_id
  93    return str(content), metadata
  94
  95
  96def handle_chat_control_intent(
  97    job_id: str,
  98    line: str,
  99    *,
 100    deps: ChatControllerDeps,
 101    quiet: bool = False,
 102) -> tuple[bool, str] | None:
 103    command = chat_control_command(line)
 104    if not command:
 105        return None
 106    keep_running, output = deps.capture_command(job_id, command)
 107    compact = deps.compact_command_output(output)
 108    message = " | ".join(compact[-4:]) if compact else f"{command.lstrip('/')} done"
 109    if not quiet:
 110        print(message)
 111    return keep_running, message
 112
 113
 114def maybe_spawn_job_from_chat(
 115    job_id: str,
 116    message: str,
 117    *,
 118    deps: ChatControllerDeps,
 119    quiet: bool = False,
 120) -> str:
 121    objective = extract_job_objective_from_message(message)
 122    if not objective:
 123        return ""
 124    created_id, title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
 125    deps.write_shell_state({"focus_job_id": created_id})
 126    db, _config = deps.db_factory()
 127    try:
 128        db.append_operator_message(created_id, message, source="chat", mode="steer")
 129        run_now = not message_requests_queued_job(message) or message_requests_immediate_run(message)
 130        update = "Created this job from chat and drafted its initial plan."
 131        if run_now:
 132            update += " Starting the daemon so it can begin work."
 133        else:
 134            update += " Use the right pane to run it."
 135        db.append_agent_update(created_id, update, category="chat")
 136        db.append_agent_update(
 137            job_id,
 138            f"Created job '{title}' from your chat request and switched focus to it.",
 139            category="chat",
 140        )
 141    finally:
 142        db.close()
 143    run_now = not message_requests_queued_job(message) or message_requests_immediate_run(message)
 144    text = f"Created job: {title}. Focus switched to it."
 145    if run_now:
 146        started = deps.start_daemon(poll_seconds=0.0, quiet=True)
 147        if started is False:
 148            text += " Worker is waiting for a working model."
 149        else:
 150            text += " Started worker."
 151    if not quiet:
 152        print(text)
 153    return text
 154
 155
 156def queue_chat_note(
 157    job_id: str,
 158    message: str,
 159    *,
 160    deps: ChatControllerDeps,
 161    mode: str = "steer",
 162    quiet: bool = False,
 163) -> None:
 164    db, _config = deps.db_factory()
 165    try:
 166        entry = db.append_operator_message(job_id, message, source="chat", mode=mode)
 167        if not quiet:
 168            if entry.get("mode") == "follow_up":
 169                print(f"waiting after current branch: {entry['message']}")
 170            else:
 171                print(f"waiting: {entry['message']}")
 172    finally:
 173        db.close()
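chat_reply_text_and_metadata accepts either a plain string or an object with a content attribute. The sketch below shows both paths with a hypothetical FakeReply dataclass. The real reply type returned by reply_fn is not shown in this listing, so the field names here simply follow the getattr calls above, and the loop over "model" and "response_id" is a compaction made for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeReply:
    """Hypothetical reply object; only the attributes read via getattr are modeled."""

    content: str
    usage: dict[str, Any] = field(default_factory=dict)
    model: str = ""
    response_id: str = ""


def reply_text_and_metadata(reply: Any) -> tuple[str, dict[str, Any]]:
    """Return (text, metadata); plain values without .content yield empty metadata."""
    content = getattr(reply, "content", None)
    if content is None:
        return str(reply), {}
    metadata: dict[str, Any] = {}
    usage = getattr(reply, "usage", None)
    if isinstance(usage, dict) and usage:
        metadata["usage"] = usage
    for name in ("model", "response_id"):
        value = getattr(reply, name, "")
        if value:  # empty strings are left out of the stored metadata
            metadata[name] = value
    return str(content), metadata


plain = reply_text_and_metadata("All tasks are queued.")
rich = reply_text_and_metadata(
    FakeReply(content="Done.", usage={"total_tokens": 42}, model="example-model")
)
```

In handle_chat_message the "message_end" event is appended only when this metadata is non-empty, so a plain-string reply is recorded as an agent update without a loop event.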
nipux_cli/chat_frame_runtime.py 571 lines
   1"""Terminal chat-frame runtime helpers."""
   2
   3from __future__ import annotations
   4
   5import queue
   6import select
   7import shutil
   8import sys
   9import termios
  10import threading
  11import time
  12import tty
  13from dataclasses import dataclass
  14from typing import Callable
  15from typing import Any
  16
  17from nipux_cli.settings import inline_setting_notice
  18from nipux_cli.tui_commands import CHAT_SLASH_COMMANDS, autocomplete_slash, cycle_slash, slash_completion_for_submit
  19from nipux_cli.tui_input import (
  20    decode_terminal_escape,
  21    drain_pending_input,
  22    read_escape_sequence,
  23    read_terminal_char,
  24)
  25from nipux_cli.tui_outcomes import CHAT_RIGHT_PAGES
  26from nipux_cli.tui_style import _frame_enter_sequence, _frame_exit_sequence, _one_line, _strip_ansi
  27
  28
  29IDLE_REFRESH_SECONDS = 0.75
  30ACTIVE_INPUT_REFRESH_SECONDS = 2.0
  31THINKING_REFRESH_SECONDS = 0.18
  32WORKSPACE_CHAT_ID = "__workspace__"
  33THINKING_NOTICE = "__nipux_thinking__"
  34THINKING_FRAMES = ("◐ thinking", "◓ thinking", "◑ thinking", "◒ thinking")
  35WAITING_NOTICE = "__nipux_waiting__"
  36WAITING_FRAMES = ("∙ waiting", "· waiting", "• waiting", "· waiting")
  37
  38
  39@dataclass(frozen=True)
  40class ChatFrameDeps:
  41    load_snapshot: Callable[[str, int], dict[str, Any]]
  42    render_frame: Callable[[dict[str, Any], str, list[str], str, int, str | None, str | None, str], str]
  43    handle_chat_message: Callable[[str, str], tuple[bool, str]]
  44    capture_chat_command: Callable[[str, str], tuple[bool, str]]
  45    write_shell_state: Callable[[dict[str, str]], None]
  46    is_plain_chat_line: Callable[[str], bool]
  47    page_click: Callable[[int, int, str], str | None]
  48
  49
  50def compact_command_output(output: str) -> list[str]:
  51    lines = [" ".join(line.split()) for line in output.splitlines() if line.strip()]
  52    compacted: list[str] = []
  53    for line in lines:
  54        if line.startswith("\033[2J"):
  55            continue
  56        compacted.append(_one_line(line, 120))
  57    return compacted[-8:]
  58
  59
  60def frame_next_job_id(snapshot: dict[str, Any], current_job_id: str, *, direction: int) -> str | None:
  61    jobs = snapshot.get("jobs")
  62    if not isinstance(jobs, list) or not jobs:
  63        return None
  64    ids = [str(job.get("id")) for job in jobs if job.get("id")]
  65    if not ids:
  66        return None
  67    try:
  68        index = ids.index(str(current_job_id))
  69    except ValueError:
  70        index = 0
  71    return ids[(index + direction) % len(ids)]
  72
  73
  74def next_chat_right_view(current: str, direction: int) -> str:
  75    keys = [key for key, _label in CHAT_RIGHT_PAGES]
  76    try:
  77        index = keys.index(current)
  78    except ValueError:
  79        index = 0
  80    return keys[(index + direction) % len(keys)]
  81
  82
  83def frame_refresh_interval(input_buffer: str, *, thinking: bool = False) -> float:
  84    if thinking:
  85        return THINKING_REFRESH_SECONDS
  86    return ACTIVE_INPUT_REFRESH_SECONDS if input_buffer else IDLE_REFRESH_SECONDS
  87
  88
  89def run_chat_frame(job_id: str, *, history_limit: int, deps: ChatFrameDeps) -> None:
  90    if job_id != WORKSPACE_CHAT_ID:
  91        deps.write_shell_state({"focus_job_id": job_id})
  92    buffer = ""
  93    notices: list[str] = []
  94    right_view = "updates"
  95    modal_view: str | None = None
  96    selected_control = 0
  97    editing_field: str | None = None
  98    async_messages: queue.Queue[str] = queue.Queue()
  99    snapshot = deps.load_snapshot(job_id, history_limit)
 100    job_id = str(snapshot["job_id"])
 101    old_attrs = termios.tcgetattr(sys.stdin)
 102    print(_frame_enter_sequence(), end="", flush=True)
 103    try:
 104        stdin_fd = sys.stdin.fileno()
 105        tty.setcbreak(stdin_fd)
 106        last_snapshot = 0.0
 107        needs_render = True
 108        last_frame = ""
 109        while True:
 110            now = time.monotonic()
 111            if _drain_async_notices(async_messages, notices):
 112                last_snapshot = 0.0
 113                needs_render = True
 114            if now - last_snapshot >= frame_refresh_interval(buffer, thinking=_has_active_state_notice(notices)):
 115                try:
 116                    snapshot = deps.load_snapshot(job_id, history_limit)
 117                    job_id = str(snapshot["job_id"])
 118                    last_snapshot = now
 119                    needs_render = True
 120                except Exception as exc:
 121                    _append_notice(notices, f"frame refresh failed: {type(exc).__name__}")
 122            if needs_render:
 123                selected_control = 0
 124                last_frame = _safe_render_frame(
 125                    deps,
 126                    snapshot=snapshot,
 127                    buffer=buffer,
 128                    notices=notices,
 129                    right_view=right_view,
 130                    selected_control=selected_control,
 131                    editing_field=editing_field,
 132                    modal_view=modal_view,
 133                    previous_frame=last_frame,
 134                )
 135                needs_render = False
 136            try:
 137                readable, _, _ = select.select([stdin_fd], [], [], 0.05)
 138            except OSError as exc:
 139                _append_notice(notices, f"terminal read failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 140                needs_render = True
 141                continue
 142            if not readable:
 143                continue
 144            try:
 145                char = read_terminal_char(stdin_fd)
 146            except OSError as exc:
 147                _append_notice(notices, f"terminal input failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 148                needs_render = True
 149                continue
 150            if editing_field is not None:
 151                try:
 152                    buffer, editing_field, should_exit = _handle_edit_input(
 153                        char,
 154                        buffer=buffer,
 155                        editing_field=editing_field,
 156                        notices=notices,
 157                        stdin_fd=stdin_fd,
 158                    )
 159                except Exception as exc:
 160                    buffer = ""
 161                    editing_field = None
 162                    _append_notice(notices, f"edit failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 163                    needs_render = True
 164                    continue
 165                if should_exit:
 166                    return
 167                needs_render = True
 168                continue
 169            if char in {"\r", "\n"}:
 170                buffer, should_submit = slash_completion_for_submit(buffer, CHAT_SLASH_COMMANDS)
 171                if not should_submit:
 172                    needs_render = True
 173                    continue
 174                try:
 175                    keep_running, snapshot, job_id, notices, right_view, modal_view = _handle_chat_submit(
 176                        buffer,
 177                        job_id=job_id,
 178                        history_limit=history_limit,
 179                        snapshot=snapshot,
 180                        notices=notices,
 181                        right_view=right_view,
 182                        modal_view=modal_view,
 183                        deps=deps,
 184                        async_messages=async_messages,
 185                    )
 186                except Exception as exc:
 187                    keep_running = True
 188                    _append_notice(notices, f"submit failed: {type(exc).__name__}: {_one_line(exc, 100)}")
 189                buffer = ""
 190                needs_render = True
 191                if not keep_running:
 192                    return
 193                continue
 194            if char in {"\x04"}:
 195                return
 196            if char == "\x03":
 197                buffer = ""
 198                _append_notice(notices, "cancelled input")
 199                needs_render = True
 200                continue
 201            if char == "\x15":
 202                buffer = ""
 203                needs_render = True
 204                continue
 205            if char in {"\x7f", "\b"}:
 206                buffer = buffer[:-1]
 207                needs_render = True
 208                continue
 209            if char == "\t":
 210                try:
 211                    buffer = autocomplete_slash(buffer, CHAT_SLASH_COMMANDS)
 212                except Exception as exc:
 213                    _append_notice(notices, f"autocomplete failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 214                needs_render = True
 215                continue
 216            if char == "\x1b":
 217                try:
 218                    snapshot, job_id, right_view, modal_view, buffer = _handle_chat_escape(
 219                        stdin_fd,
 220                        snapshot=snapshot,
 221                        job_id=job_id,
 222                        history_limit=history_limit,
 223                        right_view=right_view,
 224                        modal_view=modal_view,
 225                        buffer=buffer,
 226                        notices=notices,
 227                        deps=deps,
 228                    )
 229                except Exception as exc:
 230                    modal_view = None
 231                    _append_notice(notices, f"navigation failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 232                needs_render = True
 233                continue
 234            if char.isprintable():
 235                buffer += char
 236                needs_render = True
 237    except KeyboardInterrupt:
 238        return
 239    finally:
 240        termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_attrs)
 241        print(_frame_exit_sequence(), flush=True)
 242
 243
 244def emit_frame_if_changed(frame: str, previous_frame: str = "") -> str:
 245    if frame != previous_frame:
 246        if not previous_frame:
 247            print("\033[H" + frame, end="", flush=True)
 248        else:
 249            print(_diff_frame_update(frame, previous_frame), end="", flush=True)
 250    return frame
 251
 252
 253def _safe_render_frame(
 254    deps: ChatFrameDeps,
 255    *,
 256    snapshot: dict[str, Any],
 257    buffer: str,
 258    notices: list[str],
 259    right_view: str,
 260    selected_control: int,
 261    editing_field: str | None,
 262    modal_view: str | None,
 263    previous_frame: str,
 264) -> str:
 265    try:
 266        return deps.render_frame(
 267            snapshot,
 268            buffer,
 269            _display_notices(notices),
 270            right_view,
 271            selected_control,
 272            editing_field,
 273            modal_view,
 274            previous_frame,
 275        )
 276    except Exception as exc:
 277        _append_notice(notices, f"render failed: {type(exc).__name__}: {_one_line(exc, 100)}")
 278        frame = _fallback_chat_frame(snapshot=snapshot, buffer=buffer, notices=notices)
 279        print("\033[H" + frame, end="", flush=True)
 280        return frame
 281
 282
 283def _fallback_chat_frame(*, snapshot: dict[str, Any], buffer: str, notices: list[str]) -> str:
 284    width, height = shutil.get_terminal_size((100, 30))
 285    width = max(60, width)
 286    job = snapshot.get("job") if isinstance(snapshot.get("job"), dict) else {}
 287    title = str(job.get("title") or snapshot.get("job_id") or "Nipux")
 288    lines = [
 289        _fit_plain("NIPUX - safe mode", width),
 290        _fit_plain("=" * width, width),
 291        _fit_plain(f"Job: {title}", width),
 292        _fit_plain("A UI render error was caught. You can keep typing; /exit leaves.", width),
 293        "",
 294        "Recent notices:",
 295    ]
 296    lines.extend(f"- {_one_line(notice, width - 3)}" for notice in notices[-8:])
 297    lines.extend(["", f"> {_one_line(buffer, width - 3)}"])
 298    return "\n".join(_fit_plain(line, width) for line in lines[:height])
 299
 300
 301def _diff_frame_update(frame: str, previous_frame: str) -> str:
 302    current_lines = frame.splitlines()
 303    previous_lines = previous_frame.splitlines()
 304    output: list[str] = []
 305    max_lines = max(len(current_lines), len(previous_lines))
 306    for index in range(max_lines):
 307        current = current_lines[index] if index < len(current_lines) else ""
 308        previous = previous_lines[index] if index < len(previous_lines) else ""
 309        if current == previous:
 310            continue
 311        output.append(f"\033[{index + 1};1H\033[2K{current}")
 312    return "".join(output)
 313
 314
 315def _fit_plain(text: Any, width: int) -> str:
 316    content = _strip_ansi(str(text))
 317    if len(content) > width:
 318        content = _one_line(content, width)
 319    return content + " " * max(0, width - len(content))
 320
 321
 322def _append_notice(notices: list[str], message: str, *, limit: int = 12) -> None:
 323    notices.append(message)
 324    notices[:] = notices[-limit:]
 325
 326
 327def _append_thinking_notice(notices: list[str]) -> None:
 328    if not _has_thinking_notice(notices):
 329        _append_notice(notices, THINKING_NOTICE)
 330
 331
 332def _append_waiting_notice(notices: list[str]) -> None:
 333    if not _has_waiting_notice(notices):
 334        _append_notice(notices, WAITING_NOTICE)
 335
 336
 337def _has_thinking_notice(notices: list[str]) -> bool:
 338    return any(notice == THINKING_NOTICE or notice.startswith(f"{THINKING_NOTICE}:") for notice in notices)
 339
 340
 341def _has_waiting_notice(notices: list[str]) -> bool:
 342    return any(notice == WAITING_NOTICE or notice.startswith(f"{WAITING_NOTICE}:") for notice in notices)
 343
 344
 345def _has_active_state_notice(notices: list[str]) -> bool:
 346    return _has_thinking_notice(notices) or _has_waiting_notice(notices)
 347
 348
 349def _clear_thinking_notices(notices: list[str]) -> None:
 350    notices[:] = [
 351        notice
 352        for notice in notices
 353        if notice != THINKING_NOTICE and not notice.startswith(f"{THINKING_NOTICE}:")
 354    ]
 355
 356
 357def _display_notices(notices: list[str]) -> list[str]:
 358    if not notices:
 359        return []
 360    index = int(time.monotonic() / THINKING_REFRESH_SECONDS)
 361    thinking_frame = THINKING_FRAMES[index % len(THINKING_FRAMES)]
 362    waiting_frame = WAITING_FRAMES[index % len(WAITING_FRAMES)]
 363    rendered = []
 364    for notice in notices:
 365        if notice == THINKING_NOTICE:
 366            rendered.append(f"{THINKING_NOTICE}:{thinking_frame}")
 367        elif notice == WAITING_NOTICE:
 368            rendered.append(f"{WAITING_NOTICE}:{waiting_frame}")
 369        else:
 370            rendered.append(notice)
 371    return rendered
 372
 373
 374def _handle_edit_input(
 375    char: str,
 376    *,
 377    buffer: str,
 378    editing_field: str,
 379    notices: list[str],
 380    stdin_fd: int,
 381) -> tuple[str, str | None, bool]:
 382    if char in {"\r", "\n"}:
 383        _append_notice(notices, inline_setting_notice(editing_field, buffer))
 384        return "", None, False
 385    if char in {"\x04"}:
 386        return buffer, editing_field, True
 387    if char == "\x03":
 388        _append_notice(notices, "cancelled edit")
 389        return "", None, False
 390    if char == "\x15":
 391        return "", editing_field, False
 392    if char in {"\x7f", "\b"}:
 393        return buffer[:-1], editing_field, False
 394    if char == "\x1b":
 395        key, _payload = decode_terminal_escape(read_escape_sequence(char, fd=stdin_fd))
 396        if key == "unknown":
 397            _append_notice(notices, "cancelled edit")
 398            return "", None, False
 399        return buffer, editing_field, False
 400    if char.isprintable():
 401        return buffer + char, editing_field, False
 402    return buffer, editing_field, False
 403
 404
 405def _handle_chat_submit(
 406    buffer: str,
 407    *,
 408    job_id: str,
 409    history_limit: int,
 410    snapshot: dict[str, Any],
 411    notices: list[str],
 412    right_view: str,
 413    modal_view: str | None,
 414    deps: ChatFrameDeps,
 415    async_messages: queue.Queue[str] | None = None,
 416) -> tuple[bool, dict[str, Any], str, list[str], str, str | None]:
 417    line = buffer.strip()
 418    if not line:
 419        return True, snapshot, job_id, notices, right_view, modal_view
 420    if line in {"clear", "/clear"}:
 421        notices.clear()
 422        return True, snapshot, job_id, notices, right_view, None
 423    if line in {"settings", "/settings"}:
 424        _append_notice(notices, "opened settings")
 425        return True, snapshot, job_id, notices, right_view, "settings"
 426    if line in {"jobs", "/jobs", "status", "/status"}:
 427        _append_notice(notices, "opened jobs")
 428        return True, snapshot, job_id, notices, "status", None
 429    if line in {"outcomes", "/outcomes", "updates", "/updates"}:
 430        _append_notice(notices, "opened outcomes")
 431        return True, snapshot, job_id, notices, "updates", None
 432    keep_running = True
 433    try:
 434        if deps.is_plain_chat_line(line):
 435            _append_thinking_notice(notices)
 436            _start_chat_message_worker(
 437                job_id,
 438                line,
 439                deps=deps,
 440                async_messages=async_messages,
 441            )
 442            modal_view = None
 443        else:
 444            _append_notice(notices, f"> {line}")
 445            keep_running, output = deps.capture_chat_command(job_id, line)
 446            output_lines = compact_command_output(output)
 447            if output_lines:
 448                output_text = "\n".join(output_lines)
 449                if _looks_like_waiting_output(output_text):
 450                    _append_waiting_notice(notices)
 451                else:
 452                    _append_notice(notices, output_text)
 453            if line.startswith(("/model", "/base-url", "/api-key", "/api-key-env", "/context", "/input-cost", "/output-cost", "/timeout", "/home", "/step-limit", "/output-chars", "/daily-digest", "/digest-time", "/config")):
 454                modal_view = "settings"
 455            else:
 456                modal_view = None
 457    except Exception as exc:
 458        _append_notice(notices, f"message failed: {type(exc).__name__}: {_one_line(exc, 120)}")
 459    try:
 460        refresh_job_id = _post_submit_snapshot_job_id(line, job_id)
 461        snapshot = deps.load_snapshot(refresh_job_id, history_limit)
 462        job_id = str(snapshot["job_id"])
 463    except Exception as exc:
 464        _append_notice(notices, f"refresh failed after message: {type(exc).__name__}: {_one_line(exc, 100)}")
 465    return keep_running, snapshot, job_id, notices, right_view, modal_view
 466
 467
 468def _post_submit_snapshot_job_id(line: str, current_job_id: str) -> str:
 469    """Return the job id to refresh after a submitted command or message."""
 470
 471    text = line.strip()
 472    if not text.startswith("/"):
 473        return current_job_id
 474    command = text[1:].split(maxsplit=1)[0].lower()
 475    if command in {"new", "focus", "switch"}:
 476        return WORKSPACE_CHAT_ID if current_job_id == WORKSPACE_CHAT_ID else ""
 477    return current_job_id
 478
 479
 480def _start_chat_message_worker(
 481    job_id: str,
 482    line: str,
 483    *,
 484    deps: ChatFrameDeps,
 485    async_messages: queue.Queue[str] | None,
 486) -> None:
 487    def run() -> None:
 488        try:
 489            deps.handle_chat_message(job_id, line)
 490            if async_messages is not None:
 491                async_messages.put("__refresh__")
 492        except Exception as exc:
 493            if async_messages is not None:
 494                async_messages.put(f"message failed: {type(exc).__name__}: {_one_line(exc, 120)}")
 495
 496    thread = threading.Thread(target=run, name="nipux-chat-submit", daemon=True)
 497    thread.start()
 498
 499
 500def _drain_async_notices(async_messages: queue.Queue[str], notices: list[str]) -> bool:
 501    changed = False
 502    while True:
 503        try:
 504            message = async_messages.get_nowait()
 505        except queue.Empty:
 506            return changed
 507        if message:
 508            if message == "__refresh__":
 509                _clear_thinking_notices(notices)
 510                changed = True
 511                continue
 512            _clear_thinking_notices(notices)
 513            if _looks_like_waiting_output(message):
 514                _append_waiting_notice(notices)
 515            else:
 516                _append_notice(notices, message)
 517            changed = True
 518
 519
 520def _looks_like_waiting_output(message: str) -> bool:
 521    normalized = " ".join(str(message or "").lower().split())
 522    if not normalized:
 523        return False
 524    return (
 525        normalized.startswith("waiting:")
 526        or normalized.startswith("waiting for ")
 527        or "waiting for model" in normalized
 528        or "waiting for the next worker step" in normalized
 529        or "message saved for the worker" in normalized
 530    )
 531
 532
 533def _handle_chat_escape(
 534    stdin_fd: int,
 535    *,
 536    snapshot: dict[str, Any],
 537    job_id: str,
 538    history_limit: int,
 539    right_view: str,
 540    modal_view: str | None,
 541    buffer: str,
 542    notices: list[str],
 543    deps: ChatFrameDeps,
 544) -> tuple[dict[str, Any], str, str, str | None, str]:
 545    key, payload = decode_terminal_escape(read_escape_sequence("\x1b", fd=stdin_fd))
 546    if modal_view:
 547        _append_notice(notices, "closed settings")
 548        drain_pending_input(stdin_fd)
 549        return snapshot, job_id, right_view, None, buffer
 550    if key in {"up", "down"} and buffer.startswith("/"):
 551        buffer = cycle_slash(buffer, CHAT_SLASH_COMMANDS, direction=-1 if key == "up" else 1)
 552        return snapshot, job_id, right_view, modal_view, buffer
 553    if key == "right" and not buffer:
 554        return snapshot, job_id, next_chat_right_view(right_view, 1), modal_view, buffer
 555    if key == "left" and not buffer:
 556        return snapshot, job_id, next_chat_right_view(right_view, -1), modal_view, buffer
 557    if key in {"up", "down"} and not buffer:
 558        next_focus = frame_next_job_id(snapshot, job_id, direction=-1 if key == "up" else 1)
 559        if next_focus and next_focus != job_id:
 560            job_id = next_focus
 561            deps.write_shell_state({"focus_job_id": job_id})
 562            snapshot = deps.load_snapshot(job_id, history_limit)
 563            title = snapshot["job"].get("title") or job_id
 564            _append_notice(notices, f"focus {title}")
 565        return snapshot, job_id, right_view, modal_view, buffer
 566    if key == "click" and isinstance(payload, tuple):
 567        clicked_view = deps.page_click(payload[0], payload[1], right_view)
 568        if clicked_view:
 569            return snapshot, job_id, clicked_view, modal_view, buffer
 570    drain_pending_input(stdin_fd)
 571    return snapshot, job_id, right_view, modal_view, buffer
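
Note on frame redraws: emit_frame_if_changed paints the first frame in full and afterwards sends only the rows that differ. The sketch below restates the row-diff logic of _diff_frame_update under a public name so it can be run on its own. The sample frames are invented for illustration, and how the real renderer composes frames is not shown here.

```python
def diff_frame_update(frame: str, previous_frame: str) -> str:
    """Return ANSI sequences that repaint only the rows that changed."""
    current_lines = frame.splitlines()
    previous_lines = previous_frame.splitlines()
    output: list[str] = []
    for index in range(max(len(current_lines), len(previous_lines))):
        current = current_lines[index] if index < len(current_lines) else ""
        previous = previous_lines[index] if index < len(previous_lines) else ""
        if current == previous:
            continue
        # ESC[<row>;1H moves the cursor to column 1 of a 1-based row;
        # ESC[2K erases that whole row before the new text is written.
        output.append(f"\033[{index + 1};1H\033[2K{current}")
    return "".join(output)


# Invented sample frames: only the second row differs.
old = "title\nstatus: idle\n> "
new = "title\nstatus: running\n> "
update = diff_frame_update(new, old)
print(repr(update))
```

Rows that exist only in the previous frame are compared against an empty string, so a shrinking frame erases its leftover rows rather than leaving stale text on screen.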
nipux_cli/chat_intent.py 349 lines
   1"""Natural-language intent parsing for Nipux chat and shell control."""
   2
   3from __future__ import annotations
   4
   5import re
   6
   7
   8NATURAL_COMMANDS = {
   9    "tell me updates": "updates",
  10    "show updates": "updates",
  11    "show outcomes": "outcomes",
  12    "show all outcomes": "outcomes all",
  13    "show all accomplishments": "outcomes all",
  14    "show accomplishments": "outcomes",
  15    "what have all jobs done": "outcomes all",
  16    "what has everything done": "outcomes all",
  17    "what did all jobs do": "outcomes all",
  18    "what did it accomplish": "outcomes",
  19    "what has it done": "outcomes",
  20    "what has it done so far": "outcomes",
  21    "what have you done": "outcomes",
  22    "what have you done so far": "outcomes",
  23    "what did it actually do": "outcomes",
  24    "what did the model do": "outcomes",
  25    "show me what it did": "outcomes",
  26    "show history": "history",
  27    "what happened": "history",
  28    "show events": "events",
  29    "what did it find": "updates",
  30    "what did you find": "updates",
  31    "what has it found": "updates",
  32    "findings": "findings",
  33    "tasks": "tasks",
  34    "roadmap": "roadmap",
  35    "show roadmap": "roadmap",
  36    "show artifacts": "artifacts",
  37    "where are artifacts": "artifacts",
  38    "show lessons": "lessons",
  39    "what did it learn": "lessons",
  40    "show findings": "findings",
  41    "show tasks": "tasks",
  42    "show experiments": "experiments",
  43    "show sources": "sources",
  44    "show memory": "memory",
  45    "show metrics": "metrics",
  46    "show usage": "usage",
  47    "show cost": "usage",
  48    "show tokens": "usage",
  49    "show token usage": "usage",
  50    "context usage": "usage",
  51    "token usage": "usage",
  52    "how much did it cost": "usage",
  53    "how many tokens did it use": "usage",
  54    "status": "status",
  55    "check status": "status",
  56    "job status": "status",
  57    "what is going on": "status",
  58    "whats going on": "status",
  59    "what's going on": "status",
  60    "what is happening": "status",
  61    "whats happening": "status",
  62    "what's happening": "status",
  63    "what are you doing": "status",
  64    "what is it doing": "status",
  65    "how is it going": "status",
  66    "how are things going": "status",
  67    "check up on things": "status",
  68    "what is blocking it": "status",
  69    "what's blocking it": "status",
  70    "why is it stuck": "status",
  71    "is it stuck": "status",
  72    "is it running": "health",
  73    "is the daemon running": "health",
  74    "daemon health": "health",
  75    "show health": "health",
  76    "how do i start a job": "help",
  77    "how do i create a job": "help",
  78    "how do i make a job": "help",
  79    "how do i run a job": "help",
  80    "how do i start work": "help",
  81    "how do i use this": "help",
  82    "what can i do": "help",
  83    "show activity": "activity",
  84    "show tool calls": "activity",
  85    "show worker activity": "activity",
  86    "show worker output": "activity",
  87    "show raw work": "outputs",
  88    "show console output": "outputs",
  89    "show logs": "outputs",
  90    "show saved files": "artifacts",
  91    "what did it save": "artifacts",
  92    "what files did it create": "artifacts",
  93    "what outputs did it save": "artifacts",
  94    "what tasks are open": "tasks",
  95    "what is the current task": "tasks",
  96    "show measurements": "experiments",
  97    "show benchmarks": "experiments",
  98    "show milestones": "roadmap",
  99    "show plan": "roadmap",
 100    "show daemon": "health",
 101    "start daemon": "start",
 102    "restart daemon": "restart",
 103}
 104
 105
 106def natural_command_for(text: str) -> str:
 107    return NATURAL_COMMANDS.get(" ".join(text.strip().lower().split()), "")
 108
 109
 110def chat_control_command(line: str) -> str:
 111    text = " ".join(line.strip().split())
 112    if not text:
 113        return ""
 114    lowered = text.lower().rstrip("?.!")
 115    natural = NATURAL_COMMANDS.get(lowered)
 116    if natural:
 117        return f"/{natural}"
 118    control_phrase = _looks_like_control_phrase(lowered)
 119    if control_phrase and _mentions_any(lowered, ("token", "cost", "usage", "context window", "context budget")):
 120        return "/usage"
 121    if control_phrase and _mentions_any(lowered, ("tool call", "tool calls", "worker activity", "worker output", "right pane")):
 122        return "/activity"
 123    if control_phrase and _mentions_any(lowered, ("console output", "raw output", "raw run", "raw runs", "log", "logs")):
 124        return "/outputs"
 125    if control_phrase and _mentions_any(lowered, ("saved file", "saved files", "artifact", "artifacts")):
 126        return "/artifacts"
 127    if (
 128        _mentions_any(lowered, ("what did", "what has", "what have", "show me"))
 129        and _mentions_any(lowered, ("made", "created", "saved", "produced", "done", "accomplished"))
 130    ):
 131        return "/outcomes"
 132    if control_phrase and _mentions_any(lowered, ("measurement", "measurements", "experiment", "experiments", "benchmark", "benchmarks")):
 133        return "/experiments"
 134    if control_phrase and _mentions_any(lowered, ("roadmap", "milestone", "milestones", "plan")):
 135        return "/roadmap"
 136    if control_phrase and _mentions_any(lowered, ("task", "tasks", "todo", "to do", "queue")):
 137        return "/tasks"
 138    if control_phrase and _mentions_any(lowered, ("finding", "findings")):
 139        return "/findings"
 140    if control_phrase and _mentions_any(lowered, ("source", "sources")):
 141        return "/sources"
 142    if control_phrase and _mentions_any(lowered, ("lesson", "lessons", "learned")):
 143        return "/lessons"
 144    if control_phrase and _mentions_any(lowered, ("memory", "remembered", "learning state")):
 145        return "/memory"
 146    if lowered in {"start daemon", "launch daemon"}:
 147        return "/start"
 148    if lowered in {"restart daemon", "reload daemon"}:
 149        return "/restart"
 150    if lowered in {"jobs", "show jobs", "list jobs", "switch jobs", "change jobs"}:
 151        return "/jobs"
 152    if lowered in {"settings", "show settings"}:
 153        return "/settings"
 154    if lowered in {"model settings", "change model", "edit settings"}:
 155        return "/model"
 156    if lowered in {
 157        "run",
 158        "start",
 159        "run it",
 160        "start it",
 161        "run job",
 162        "run worker",
 163        "start job",
 164        "start worker",
 165        "start working",
 166        "start work",
 167        "run this",
 168        "run the job",
 169        "run this job",
 170        "start the job",
 171        "start this job",
 172        "continue",
 173        "continue it",
 174        "keep going",
 175        "keep working",
 176        "resume work",
 177    }:
 178        return "/run"
 179    if lowered in {
 180        "pause",
 181        "pause it",
 182        "pause job",
 183        "pause worker",
 184        "pause the job",
 185        "pause work",
 186        "pause this job",
 187        "stop",
 188        "stop it",
 189        "stop job",
 190        "stop worker",
 191        "stop the job",
 192        "stop work",
 193        "stop working",
 194        "stop this job",
 195        "halt",
 196        "halt job",
 197        "halt the job",
 198    }:
 199        return "/pause"
 200    if lowered in {
 201        "resume",
 202        "resume it",
 203        "resume job",
 204        "resume worker",
 205        "resume the job",
 206        "resume this job",
 207        "reopen this job",
 208    }:
 209        return "/resume"
 210    if lowered in {"history", "show history", "timeline", "show timeline"}:
 211        return "/history"
 212    if lowered in {
 213        "all outcomes",
 214        "show all outcomes",
 215        "show all accomplishments",
 216        "what have all jobs done",
 217        "what has everything done",
 218        "what did all jobs do",
 219    }:
 220        return "/outcomes all"
 221    if lowered in {
 222        "outcomes",
 223        "show outcomes",
 224        "accomplishments",
 225        "show accomplishments",
 226        "what has it done",
 227        "what has it done so far",
 228        "what have you done",
 229        "what have you done so far",
 230        "what did it actually do",
 231        "what did the model do",
 232        "show me what it did",
 233    }:
 234        return "/outcomes"
 235    if lowered in {"artifacts", "outputs", "saved outputs", "show artifacts", "show outputs"}:
 236        return "/artifacts"
 237    if lowered in {"memory", "show memory", "learning", "show learning"}:
 238        return "/memory"
 239    return ""
 240
 241
 242def _mentions_any(text: str, needles: tuple[str, ...]) -> bool:
 243    for needle in needles:
 244        if " " in needle:
 245            if needle in text:
 246                return True
 247            continue
 248        if re.search(rf"\b{re.escape(needle)}\b", text):
 249            return True
 250    return False
 251
 252
 253def _looks_like_control_phrase(text: str) -> bool:
 254    return text.startswith(
 255        (
 256            "show ",
 257            "view ",
 258            "open ",
 259            "list ",
 260            "display ",
 261            "give me ",
 262            "where ",
 263            "what ",
 264            "how ",
 265            "is ",
 266            "are ",
 267            "check ",
 268        )
 269    )
 270
 271
 272def message_requests_immediate_run(message: str) -> bool:
 273    lowered = " ".join(message.strip().lower().split())
 274    if message_requests_queued_job(message):
 275        return False
 276    if re.match(r"^(?:please\s+)?(?:start|launch|run|spin\s+off|spin\s+up)\b", lowered):
 277        return True
 278    return bool(re.search(r"\b(?:and|then)\s+(?:start|launch|run|resume)\s+(?:it|the\s+job|work)?\b", lowered))
 279
 280
 281def message_requests_queued_job(message: str) -> bool:
 282    lowered = " ".join(message.strip().lower().split())
 283    return bool(
 284        re.search(
 285            r"\b(?:queue only|plan only|create only|do not start|don't start|do not run|don't run|without starting)\b",
 286            lowered,
 287        )
 288    )
 289
 290
 291def extract_job_objective_from_message(message: str) -> str:
 292    text = " ".join(message.strip().split())
 293    if not text:
 294        return ""
 295    lowered = text.lower()
 296    patterns = [
 297        r"^(?:please\s+)?(?:create|start|spin\s+off|make|launch)\s+(?:a\s+)?(?:new\s+)?job\s+(?:to|for|that|which)?\s*(.+)$",
 298        r"^(?:please\s+)?(?:create|start|spin\s+off|spin\s+up|make|launch|run)\s+(?:a\s+|an\s+)?(?:new\s+)?(?:worker|agent|task)\s+(?:to|for|that|which)?\s*(.+)$",
 299        r"^(?:please\s+)?(?:send|queue)\s+(?:off\s+)?(?:a\s+)?(?:new\s+)?job\s+(?:to|for|that|which)?\s*(.+)$",
 300        r"^(?:please\s+)?(?:new|job)\s+(.+)$",
 301        r"^(?:please\s+)?(?:start|run|launch)\s+(?!daemon\b|it\b|this\b|that\b|the\s+job\b|the\s+worker\b|job\b|worker\b|work\b)(.+)$",
 302        r"^(?:please\s+)?(?:can\s+you|could\s+you|i\s+need\s+you\s+to|i\s+want\s+you\s+to)\s+(.+)$",
 303    ]
 304    for pattern in patterns:
 305        match = re.match(pattern, text, flags=re.IGNORECASE)
 306        if match:
 307            objective = match.group(1).strip(" .")
 308            return objective if looks_like_job_objective(objective) else ""
 309    if looks_like_job_objective(text) and not looks_like_smalltalk(lowered):
 310        return text
 311    return ""
 312
 313
 314def looks_like_smalltalk(lowered: str) -> bool:
 315    return lowered in {"hi", "hello", "hey", "yo", "sup", "thanks", "thank you"} or lowered.endswith("?")
 316
 317
 318def looks_like_job_objective(text: str) -> bool:
 319    lowered = text.lower()
 320    if len(text.split()) < 3:
 321        return False
 322    action_words = {
 323        "research",
 324        "monitor",
 325        "optimize",
 326        "build",
 327        "find",
 328        "test",
 329        "deploy",
 330        "fix",
 331        "write",
 332        "analyze",
 333        "audit",
 334        "track",
 335        "benchmark",
 336        "create",
 337        "document",
 338        "draft",
 339        "generate",
 340        "scrape",
 341        "produce",
 342        "watch",
 343        "automate",
 344        "summarize",
 345        "compare",
 346        "investigate",
 347        "improve",
 348    }
 349    return any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in action_words)
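Example: queue-only versus immediate-run routing

A minimal sketch of how the two request checks in the listing above interact. The three regular expressions are copied from `message_requests_queued_job` and `message_requests_immediate_run`. The constant names, helper names, and sample messages are illustrative assumptions and are not part of nipux_cli.

```python
import re

# Patterns copied from the chat_intent.py listing; the names are hypothetical.
QUEUE_ONLY = re.compile(
    r"\b(?:queue only|plan only|create only|do not start|don't start|do not run|don't run|without starting)\b"
)
RUN_PREFIX = re.compile(r"^(?:please\s+)?(?:start|launch|run|spin\s+off|spin\s+up)\b")
RUN_SUFFIX = re.compile(r"\b(?:and|then)\s+(?:start|launch|run|resume)\s+(?:it|the\s+job|work)?\b")


def normalize(message: str) -> str:
    # Lowercase and collapse whitespace, as both listed functions do.
    return " ".join(message.strip().lower().split())


def requests_queued_job(message: str) -> bool:
    return bool(QUEUE_ONLY.search(normalize(message)))


def requests_immediate_run(message: str) -> bool:
    # A queue-only phrase always wins over a run verb.
    if requests_queued_job(message):
        return False
    lowered = normalize(message)
    if RUN_PREFIX.match(lowered):
        return True
    return bool(RUN_SUFFIX.search(lowered))


if __name__ == "__main__":
    samples = (
        "Please start a job to research GPU prices",
        "Start a job to audit logs, plan only",
        "create a job to audit logs and then run it",
        "summarize the release notes",
    )
    for sample in samples:
        print(f"{sample!r}: queued={requests_queued_job(sample)} run={requests_immediate_run(sample)}")
```

The ordering matters: because the queue-only check runs first, a message that opens with a run verb but also says "plan only" or "don't start" is treated as a queued job rather than an immediate run.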
nipux_cli/chat_tui.py 277 lines
   1"""Chat workspace terminal frame rendering."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.config import load_config
   8from nipux_cli.first_run_tui import first_run_themed_lines
   9from nipux_cli.settings import edit_target_hint, edit_target_label, edit_target_masks_input
  10from nipux_cli.tui_commands import CHAT_SLASH_COMMANDS, slash_suggestion_lines
  11from nipux_cli.tui_event_format import clean_step_summary
  12from nipux_cli.tui_events import chat_pane_lines
  13from nipux_cli.tui_layout import _compose_bar, _top_bar
  14from nipux_cli.tui_outcomes import chat_updates_pane_lines
  15from nipux_cli.tui_status import (
  16    job_display_state,
  17    right_pane_lines,
  18    worker_label,
  19)
  20from nipux_cli.tui_style import _accent, _bold, _fit_ansi, _muted, _one_line, _strip_ansi
  21
  22
  23def build_chat_frame(
  24    snapshot: dict[str, Any],
  25    input_buffer: str,
  26    notices: list[str],
  27    *,
  28    width: int,
  29    height: int,
  30    right_view: str = "updates",
  31    selected_control: int = 0,
  32    editing_field: str | None = None,
  33    modal_view: str | None = None,
  34) -> str:
  35    del selected_control
  36    if right_view == "work":
  37        right_view = "updates"
  38    width = max(92, width)
  39    height = max(22, height)
  40    job = snapshot["job"]
  41    right_job = snapshot.get("right_job") if isinstance(snapshot.get("right_job"), dict) else job
  42    jobs = snapshot["jobs"]
  43    steps = snapshot["steps"]
  44    artifacts = snapshot["artifacts"]
  45    job_id = str(snapshot["job_id"])
  46    right_job_id = str(snapshot.get("right_job_id") or job_id)
  47    job_artifacts = snapshot.get("job_artifacts") if isinstance(snapshot.get("job_artifacts"), dict) else {}
  48    if artifacts:
  49        job_artifacts.setdefault(job_id, artifacts)
  50    job_summary_events = snapshot.get("job_summary_events") if isinstance(snapshot.get("job_summary_events"), dict) else {}
  51    job_counts = snapshot.get("job_counts") if isinstance(snapshot.get("job_counts"), dict) else {}
  52    memory_entries = snapshot["memory_entries"]
  53    events = snapshot["events"]
  54    summary_events = snapshot.get("summary_events") if isinstance(snapshot.get("summary_events"), list) else events
  55    daemon = snapshot["daemon"]
  56    model = str(snapshot["model"])
  57    base_url = str(snapshot.get("base_url") or "")
  58    token_usage = snapshot.get("token_usage") if isinstance(snapshot.get("token_usage"), dict) else {}
  59    context_length = int(snapshot.get("context_length") or 0)
  60    counts = snapshot.get("counts") if isinstance(snapshot.get("counts"), dict) else {}
  61    findings = _metadata_records(right_job, "finding_ledger")
  62    sources = _metadata_records(right_job, "source_ledger")
  63    tasks = _metadata_records(right_job, "task_queue")
  64    experiments = _metadata_records(right_job, "experiment_ledger")
  65    lessons = _metadata_records(right_job, "lessons")
  66    roadmap = right_job.get("metadata", {}).get("roadmap") if isinstance(right_job.get("metadata"), dict) else {}
  67    milestones = roadmap.get("milestones") if isinstance(roadmap, dict) and isinstance(roadmap.get("milestones"), list) else []
  68    open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active"})
  69    state = job_display_state(right_job, bool(daemon["running"]))
  70    worker = worker_label(right_job, bool(daemon["running"]))
  71    latest_step = steps[-1] if steps else None
  72    right_width = min(max(52, int(width * 0.36)), 72)
  73    left_width = width - right_width - 3
  74    if left_width < 48:
  75        left_width = 48
  76        right_width = max(34, width - left_width - 3)
  77    latest_text = _step_line(latest_step, chars=right_width - 6) if latest_step else "no worker steps yet"
  78    daemon_text = _daemon_state_line(daemon)
  79    goal_text = " ".join(str(right_job.get("objective") or "").split())
  80    metrics = [
  81        ("actions", counts.get("steps", _step_count(steps))),
  82        ("outputs", counts.get("artifacts", len(artifacts))),
  83        ("findings", len(findings)),
  84        ("sources", len(sources)),
  85        ("tasks", f"{len(tasks)}/{open_tasks} open"),
  86        ("roadmap", len(milestones)),
  87        ("experiments", len(experiments)),
  88        ("lessons", len(lessons)),
  89        ("memory", counts.get("memory", len(memory_entries))),
  90    ]
  91
  92    header = _top_bar(
  93        width,
  94        state=state,
  95        daemon=daemon_text,
  96        model=model,
  97        token_usage=token_usage,
  98        context_length=context_length,
  99        base_url=base_url,
 100    )
 101    if editing_field:
 102        hint = edit_target_hint(editing_field)
 103        prompt_label = edit_target_label(editing_field)
 104    elif not jobs:
 105        hint = "Type a goal to create the first worker  ·  / opens commands  ·  /settings configures"
 106        prompt_label = "❯"
 107    else:
 108        hint = "Enter sends  ·  / opens commands  ·  /settings configures  ·  ←→ updates/jobs"
 109        prompt_label = "❯"
 110    suggestions = [] if editing_field else slash_suggestion_lines(input_buffer, CHAT_SLASH_COMMANDS, width=width)
 111    compose_lines = _compose_bar(
 112        input_buffer,
 113        width=width,
 114        hint=hint,
 115        suggestions=suggestions,
 116        prompt_label=prompt_label,
 117        mask_input=edit_target_masks_input(editing_field),
 118    )
 119    footer_rows = len(compose_lines)
 120    body_rows = max(10, height - len(header) - 1 - footer_rows)
 121    chat_lines = chat_pane_lines(events, notices, width=left_width, rows=body_rows)
 122    if right_view == "updates":
 123        right_lines = chat_updates_pane_lines(
 124            job=right_job,
 125            events=summary_events,
 126            width=right_width,
 127            rows=body_rows,
 128        )
 129        right_title = "Model updates"
 130    else:
 131        right_lines = right_pane_lines(
 132            job=right_job,
 133            jobs=jobs,
 134            job_artifacts=job_artifacts,
 135            job_summary_events=job_summary_events,
 136            job_counts=job_counts,
 137            job_id=right_job_id,
 138            daemon_running=bool(daemon["running"]),
 139            state=state,
 140            worker=worker,
 141            daemon_text=daemon_text,
 142            model=model,
 143            goal_text=goal_text,
 144            latest_text=latest_text,
 145            metrics=metrics,
 146            events=summary_events,
 147            token_usage=token_usage,
 148            context_length=context_length,
 149            width=right_width,
 150            rows=body_rows,
 151            right_view=right_view,
 152        )
 153        right_title = "Jobs"
 154    lines = [*header, _two_col_title(left_width, right_width, "Chat", right_title)]
 155    for index in range(body_rows):
 156        left = chat_lines[index] if index < len(chat_lines) else ""
 157        right = right_lines[index] if index < len(right_lines) else ""
 158        lines.append(_two_col_line(left, right, left_width=left_width, right_width=right_width))
 159    lines.extend(compose_lines)
 160    if len(lines) > height:
 161        keep_top = min(4, len(header) + 1)
 162        keep_bottom = footer_rows
 163        middle_budget = max(0, height - keep_top - keep_bottom)
 164        lines = lines[:keep_top] + lines[-(middle_budget + keep_bottom) : -keep_bottom] + lines[-keep_bottom:]
 165    if modal_view == "settings":
 166        lines = _overlay_settings_modal(lines[:height], width=width, height=height)
 167    return "\n".join(first_run_themed_lines(lines[:height], width=width))
 168
 169
 170def _two_col_title(left_width: int, right_width: int, left: str, right: str) -> str:
 171    return _fit_ansi(_bold(left.upper()), left_width) + _muted(" │ ") + _fit_ansi(_bold(right.upper()), right_width)
 172
 173
 174def _two_col_line(left: str, right: str, *, left_width: int, right_width: int) -> str:
 175    return _fit_ansi(left, left_width) + _muted(" │ ") + _fit_ansi(right, right_width)
 176
 177
 178def _overlay_settings_modal(lines: list[str], *, width: int, height: int) -> list[str]:
 179    config = load_config()
 180    key_state = "set" if config.model.api_key else "missing"
 181    input_cost = _rate_text(config.model.input_cost_per_million)
 182    output_cost = _rate_text(config.model.output_cost_per_million)
 183    cost_limit = "none" if config.runtime.max_job_cost_usd is None else f"${config.runtime.max_job_cost_usd:g}"
 184    content = [
 185        _bold("Model"),
 186        _settings_row("id", config.model.model, "/model MODEL"),
 187        _settings_row("endpoint", config.model.base_url, "/base-url URL"),
 188        _settings_row("key", f"{key_state} in {config.model.api_key_env}", "/api-key KEY"),
 189        _settings_row(
 190            "limits",
 191            f"context {config.model.context_length}, timeout {config.model.request_timeout_seconds:g}s",
 192            "/context TOKENS /timeout SECONDS",
 193        ),
 194        "",
 195        _bold("Runtime"),
 196        _settings_row("home", str(config.runtime.home), "/home PATH"),
 197        _settings_row(
 198            "steps",
 199            f"tool {config.runtime.max_step_seconds}s, preview {config.runtime.artifact_inline_char_limit} chars",
 200            "/step-limit SECONDS /output-chars CHARS",
 201        ),
 202        _settings_row(
 203            "digest",
 204            f"{config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
 205            "/daily-digest BOOL /digest-time HH:MM",
 206        ),
 207        "",
 208        _bold("Cost"),
 209        _settings_row("rates", f"input {input_cost}, output {output_cost}", "/input-cost DOLLARS /output-cost DOLLARS"),
 210        _settings_row("limit", cost_limit, "/max-cost DOLLARS"),
 211        "",
 212        _muted("Edit with slash commands in the composer. Esc closes."),
 213    ]
 214    box_width = min(max(64, int(width * 0.58)), width - 8)
 215    box_height = min(len(content) + 4, height - 6)
 216    inner = max(20, box_width - 4)
 217    title = f" Settings {_accent('●')} "
 218    rule_width = max(2, box_width - len(_strip_ansi(title)) - 2)
 219    left_rule = max(1, rule_width // 2)
 220    right_rule = max(1, rule_width - left_rule)
 221    top = "╭" + "─" * left_rule + title + "─" * right_rule + "╮"
 222    box = [top]
 223    for item in content[: box_height - 3]:
 224        if item:
 225            box.append("│ " + _fit_ansi(item, inner) + " │")
 226        else:
 227            box.append("│ " + " " * inner + " │")
 228    while len(box) < box_height - 1:
 229        box.append("│ " + " " * inner + " │")
 230    box.append("╰" + "─" * (box_width - 2) + "╯")
 231    output = [_fit_ansi(line, width) for line in lines]
 232    start_y = max(2, (height - len(box)) // 2)
 233    start_x = max(0, (width - box_width) // 2)
 234    for offset, modal_line in enumerate(box):
 235        target = start_y + offset
 236        if target >= len(output):
 237            break
 238        output[target] = _fit_ansi(" " * start_x + modal_line, width)
 239    return output
 240
 241
 242def _settings_row(label: str, value: Any, command: str) -> str:
 243    value_text = _one_line(value, 42)
 244    return f"{_muted(label.ljust(9))} {_bold(value_text)}  {_muted(command)}"
 245
 246
 247def _rate_text(value: float | None) -> str:
 248    return "provider-reported" if value is None else f"${value:g}/1M"
 249
 250
 251def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
 252    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 253    values = metadata.get(key)
 254    if not isinstance(values, list):
 255        return []
 256    return [value for value in values if isinstance(value, dict)]
 257
 258
 259def _step_count(steps: list[dict[str, Any]]) -> int:
 260    numbers = [int(step.get("step_no") or 0) for step in steps]
 261    return max(numbers, default=0)
 262
 263
 264def _step_line(step: dict[str, Any], *, chars: int = 180) -> str:
 265    tool = step.get("tool_name") or step.get("kind") or "-"
 266    summary = clean_step_summary(step.get("summary") or step.get("error") or "-")
 267    error = " ERROR" if step.get("error") else ""
 268    return f"#{step['step_no']:<4} {step['status']:<9} {tool:<18} {_one_line(summary, chars)}{error}"
 269
 270
 271def _daemon_state_line(lock: dict[str, Any]) -> str:
 272    metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
 273    if lock.get("running"):
 274        pid = metadata.get("pid") or "unknown"
 275        stale = " stale-runtime" if lock.get("stale") else ""
 276        return f"running pid={pid}{stale}"
 277    return "ready when work starts"
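Example: defensive ledger reads and the tasks metric

A small sketch of how the `tasks` metric string in `build_chat_frame` is derived from job metadata. The helper body mirrors `_metadata_records` from the listing above. The function names `metadata_records` and `task_metric` and the sample job record are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from typing import Any


def metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
    # Same logic as _metadata_records: tolerate missing or malformed metadata.
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    values = metadata.get(key)
    if not isinstance(values, list):
        return []
    return [value for value in values if isinstance(value, dict)]


def task_metric(job: dict[str, Any]) -> str:
    # Mirrors the open_tasks count and the "tasks" metric text in build_chat_frame.
    tasks = metadata_records(job, "task_queue")
    open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active"})
    return f"{len(tasks)}/{open_tasks} open"


# Hypothetical job record, invented for this example.
sample_job = {
    "metadata": {
        "task_queue": [
            {"title": "collect sources", "status": "done"},
            {"title": "draft summary", "status": "active"},
            {"title": "review"},  # a missing status is treated as open
            "not-a-dict",  # non-dict entries are dropped
        ]
    }
}

if __name__ == "__main__":
    print(task_metric(sample_job))
    print(task_metric({"metadata": "broken"}))
```

The metric reads as total tasks first, then the number still open or active, so a malformed or empty job record degrades to `0/0 open` instead of raising.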
nipux_cli/cli.py 3188 lines
   1"""Thin CLI for the Nipux agent runtime."""
   2
   3from __future__ import annotations
   4
   5import argparse
   6import json
   7import os
   8import shlex
   9import shutil
  10import subprocess
  11import sys
  12import threading
  13import time
  14from contextlib import redirect_stdout
  15from io import StringIO
  16from pathlib import Path
  17from typing import Any
  18
  19from nipux_cli import __version__
  20from nipux_cli.artifacts import ArtifactStore
  21from nipux_cli.chat_intent import (
  22    chat_control_command,
  23    extract_job_objective_from_message as _extract_job_objective_from_message,
  24    message_requests_immediate_run,
  25    message_requests_queued_job,
  26    natural_command_for,
  27)
  28from nipux_cli.cli_state import (
  29    configured_focus_job_id as _configured_focus_job_id,
  30    default_job_id as _default_job_id,
  31    find_job as _find_job,
  32    clear_model_setup_verified as _clear_model_setup_verified,
  33    mark_model_setup_verified as _mark_model_setup_verified,
  34    model_setup_verified as _model_setup_verified,
  35    read_shell_state as _read_shell_state,
  36    write_shell_state as _write_shell_state,
  37)
  38from nipux_cli.cli_render import (
  39    daemon_event_line as _daemon_event_line,
  40    daemon_state_line as _daemon_state_line,
  41    important_startup_events as _important_startup_events,
  42    job_ref_text as _job_ref_text,
  43    json_default as _json_default,
  44    next_operator_action as _next_operator_action,
  45    note_text as _note_text,
  46    print_artifact as _print_artifact,
  47    print_event_card as _print_event_card,
  48    print_event_details as _print_event_details,
  49    print_jobs_panel as _print_jobs_panel,
  50    print_metric_grid as _print_metric_grid,
  51    print_run as _print_run,
  52    print_step as _print_step,
  53    print_wrapped as _print_wrapped,
  54    public_event as _public_event,
  55    rule as _rule,
  56    section_title as _section_title,
  57    short_path as _short_path,
  58    step_line as _step_line,
  59    terminal_width as _terminal_width,
  60)
  61from nipux_cli.chat_context import build_chat_messages as _build_chat_messages
  62from nipux_cli.chat_commands import ChatCommandDeps, handle_chat_slash_command as _handle_chat_slash_command
  63from nipux_cli.chat_controller import (
  64    ChatControllerDeps,
  65    chat_reply_text_and_metadata as _controller_reply_text_and_metadata,
  66    handle_chat_control_intent as _controller_handle_chat_control_intent,
  67    handle_chat_message as _controller_handle_chat_message,
  68    maybe_spawn_job_from_chat as _controller_maybe_spawn_job_from_chat,
  69    queue_chat_note as _controller_queue_chat_note,
  70)
  71from nipux_cli.chat_frame_runtime import (
  72    ChatFrameDeps,
  73    compact_command_output as _compact_command_output,
  74    emit_frame_if_changed as _emit_frame_if_changed,
  75    run_chat_frame as _run_chat_frame,
  76)
  77from nipux_cli.chat_tui import build_chat_frame as _build_chat_tui_frame
  78from nipux_cli.cli_help import NIPUX_BANNER, print_shell_help as _render_shell_help
  79from nipux_cli.config import (
  80    DEFAULT_BASE_URL,
  81    DEFAULT_API_KEY_ENV,
  82    DEFAULT_CONTEXT_LENGTH,
  83    DEFAULT_MODEL,
  84    DEFAULT_OPENROUTER_API_KEY_ENV,
  85    DEFAULT_OPENROUTER_MODEL,
  86    default_config_yaml,
  87    load_config,
  88    write_private_text,
  89)
  90from nipux_cli.daemon_control import cmd_restart_impl as _cmd_restart_impl
  91from nipux_cli.daemon_control import cmd_start_impl as _cmd_start_impl
  92from nipux_cli.daemon_control import ensure_remote_model_ready_for_worker as _daemon_ensure_remote_model_ready
  93from nipux_cli.daemon_control import provider_preflight_is_recoverable as _daemon_provider_preflight_is_recoverable
  94from nipux_cli.daemon_control import recoverable_remote_model_preflight_failures as _daemon_recoverable_remote_model_preflight_failures
  95from nipux_cli.daemon_control import remote_model_preflight_failures as _daemon_remote_model_preflight_failures
  96from nipux_cli.daemon_control import start_daemon_if_needed_impl as _start_daemon_if_needed_impl
  97from nipux_cli.daemon_control import stop_daemon_process_impl as _stop_daemon_process_impl
  98from nipux_cli.daemon import Daemon, DaemonAlreadyRunning, daemon_lock_status, read_daemon_events
  99from nipux_cli.dashboard import collect_dashboard_state, render_dashboard, render_overview
 100from nipux_cli.db import AgentDB, utc_now
 101from nipux_cli.digest import render_job_digest, write_daily_digest
 102from nipux_cli.doctor import run_doctor
 103from nipux_cli.first_run_tui import (
 104    build_first_run_frame as _build_first_run_tui_frame,
 105    first_run_actions as _first_run_tui_actions,
 106    first_run_columns as _first_run_columns,
 107)
 108from nipux_cli.first_run_controller import (
 109    FirstRunFrameDeps,
 110    capture_first_run_command as _controller_capture_first_run_command,
 111    create_first_run_job as _controller_create_first_run_job,
 112    first_run_chat_reply as _controller_first_run_chat_reply,
 113    first_token as _controller_first_token,
 114    handle_first_run_action as _controller_handle_first_run_action,
 115    handle_first_run_frame_line as _controller_handle_first_run_frame_line,
 116)
 117from nipux_cli.first_run_frame_runtime import (
 118    FirstRunRuntimeDeps,
 119    clamp_selection as _clamp_first_run_runtime_selection,
 120    run_first_run_frame as _run_first_run_frame,
 121)
 122from nipux_cli.event_render import event_line as _event_line
 123from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID, load_frame_snapshot
 124from nipux_cli.parser_builder import build_arg_parser
 125from nipux_cli.planning import (
 126    format_initial_plan,
 127    initial_plan_for_objective,
 128    initial_roadmap_for_objective,
 129    initial_task_contract,
 130)
 131from nipux_cli.scheduling import job_provider_blocked, operator_resume_metadata
 132from nipux_cli.record_commands import (
 133    RecordCommandDeps,
 134    cmd_experiments_impl,
 135    cmd_findings_impl,
 136    cmd_memory_impl,
 137    cmd_metrics_impl,
 138    cmd_roadmap_impl,
 139    cmd_sources_impl,
 140    cmd_tasks_impl,
 141    cmd_usage_impl,
 142)
 143from nipux_cli.service_install import cmd_autostart, cmd_service
 144from nipux_cli.service_install import launch_agent_path as _launch_agent_path
 145from nipux_cli.service_install import launch_agent_plist as _service_launch_agent_plist
 146from nipux_cli.service_install import systemd_service_text as _service_systemd_service_text
 147from nipux_cli.templates import program_for_job
 148from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS, slash_suggestion_lines
 149from nipux_cli.settings import (
 150    config_field_value,
 151    save_config_field,
 152)
 153from nipux_cli.settings_commands import (
 154    capture_setting_command as _capture_setting_command,
 155    handle_chat_setting_command as _handle_chat_setting_command,
 156)
 157from nipux_cli.tui_event_format import (
 158    clean_step_summary as _clean_step_summary,
 159    friendly_error_text as _friendly_error_text,
 160    generic_display_text as _generic_display_text,
 161)
 162from nipux_cli.tui_events import (
 163    live_badge as _live_badge,
 164    minimal_live_event_line as _minimal_live_event_line,
 165)
 166from nipux_cli.tui_outcomes import model_update_event_parts as _model_update_event_parts
 167from nipux_cli.tui_status import (
 168    job_display_state as _job_display_state,
 169    worker_label as _worker_label,
 170)
 171from nipux_cli.tui_style import (
 172    _accent,
 173    _fancy_ui,
 174    _one_line,
 175    _status_badge,
 176)
 177from nipux_cli.uninstall import build_uninstall_plan, uninstall_installed_tool, uninstall_runtime
 178from nipux_cli.updater import update_checkout
 179from nipux_cli.updates import render_all_updates_report, render_updates_report
 180
 181_save_config_field = save_config_field
 182_config_field_value = config_field_value
 183_slash_suggestion_lines = slash_suggestion_lines
 184_chat_control_command = chat_control_command
 185
 186
 187def _launch_agent_plist(*, poll_seconds: float, quiet: bool) -> str:
 188    return _service_launch_agent_plist(poll_seconds=poll_seconds, quiet=quiet)
 189
 190
 191def _systemd_service_text(*, poll_seconds: float, quiet: bool) -> str:
 192    return _service_systemd_service_text(poll_seconds=poll_seconds, quiet=quiet)
 193
 194
 195SHELL_BUILTINS = {"help", "?", "commands", "exit", "quit", ":q", "clear"}
 196SHELL_COMMAND_NAMES = {
 197    "init",
 198    "uninstall",
 199    "create",
 200    "jobs",
 201    "ls",
 202    "focus",
 203    "rename",
 204    "delete",
 205    "rm",
 206    "chat",
 207    "shell",
 208    "status",
 209    "health",
 210    "history",
 211    "events",
 212    "activity",
 213    "feed",
 214    "tail",
 215    "updates",
 216    "findings",
 217    "tasks",
 218    "roadmap",
 219    "experiments",
 220    "update",
 221    "dashboard",
 222    "dash",
 223    "start",
 224    "stop",
 225    "restart",
 226    "browser-dashboard",
 227    "artifacts",
 228    "artifact",
 229    "lessons",
 230    "learn",
 231    "findings",
 232    "sources",
 233    "memory",
 234    "metrics",
 235    "usage",
 236    "logs",
 237    "outputs",
 238    "output",
 239    "watch",
 240    "run-one",
 241    "run",
 242    "work",
 243    "steer",
 244    "say",
 245    "pause",
 246    "resume",
 247    "cancel",
 248    "digest",
 249    "daily-digest",
 250    "daemon",
 251    "doctor",
 252    "autostart",
 253    "service",
 254}
 255
 256def _db() -> tuple[AgentDB, object]:
 257    config = load_config()
 258    config.ensure_dirs()
 259    return AgentDB(config.runtime.state_db_path), config
 260
 261
 262def _record_command_deps() -> RecordCommandDeps:
 263    return RecordCommandDeps(
 264        db_factory=_db,
 265        resolve_job_id=_resolve_job_id,
 266        job_ref_text=_job_ref_text,
 267    )
 268
 269
 270def cmd_init(args: argparse.Namespace) -> None:
 271    config = load_config()
 272    config.ensure_dirs()
 273    path = Path(args.path).expanduser() if args.path else config.runtime.home / "config.yaml"
 274    if path.exists() and not args.force:
 275        print(f"Config already exists: {path}")
 276        return
 277    path.parent.mkdir(parents=True, exist_ok=True)
 278    model = args.model or DEFAULT_MODEL
 279    base_url = args.base_url or DEFAULT_BASE_URL
 280    api_key_env = args.api_key_env or DEFAULT_API_KEY_ENV
 281    if args.openrouter:
 282        base_url = args.base_url or "https://openrouter.ai/api/v1"
 283        api_key_env = args.api_key_env or DEFAULT_OPENROUTER_API_KEY_ENV
 284        model = args.model or DEFAULT_OPENROUTER_MODEL
 285    write_private_text(
 286        path,
 287        default_config_yaml(
 288            model=model,
 289            base_url=base_url,
 290            api_key_env=api_key_env,
 291            context_length=getattr(args, "context_length", DEFAULT_CONTEXT_LENGTH),
 292        ),
 293    )
 294    print(f"Wrote {path}")
 295    env_path = config.runtime.home / ".env"
 296    if not env_path.exists():
 297        write_private_text(
 298            env_path,
 299            f"# Optional local secrets for Nipux. This file stays outside the git repo.\n{api_key_env}=\n",
 300        )
 301        print(f"Wrote {env_path} (fill {api_key_env}; do not commit secrets)")
 302
 303
 304def cmd_update(args: argparse.Namespace) -> None:
 305    config = load_config()
 306    config.ensure_dirs()
 307    daemon_before = daemon_lock_status(config.runtime.home / "agentd.lock")
 308    code, lines = update_checkout(path=args.path, allow_dirty=args.allow_dirty)
 309    for line in lines:
 310        print(line)
 311    if code:
 312        raise SystemExit(code)
 313    if getattr(args, "no_restart", False):
 314        print("Daemon restart skipped by --no-restart.")
 315        return
 316    daemon_after = daemon_lock_status(config.runtime.home / "agentd.lock")
 317    if not daemon_before.get("running") and not daemon_after.get("running"):
 318        print("No daemon is running; no restart needed.")
 319        return
 320    print("Restarting running daemon so it uses the updated code.")
 321    try:
 322        cmd_restart(
 323            argparse.Namespace(
 324                poll_seconds=0.0,
 325                wait=5.0,
 326                fake=False,
 327                quiet=True,
 328                log_file=None,
 329            )
 330        )
 331    except SystemExit as exc:
 332        detail = str(exc) if str(exc) else "restart failed"
 333        print(f"Update succeeded, but daemon restart failed: {_one_line(detail, 160)}")
 334
 335
 336def cmd_uninstall(args: argparse.Namespace) -> None:
 337    config = load_config()
 338    plan = build_uninstall_plan(runtime_home=config.runtime.home, include_legacy=not args.keep_legacy)
 339    remove_tool = bool(getattr(args, "remove_tool", False)) or not bool(getattr(args, "keep_tool", False))
 340    if not args.yes and not args.dry_run:
 341        print("This will stop Nipux and remove local runtime state:")
 342        for path in (*plan.service_paths, *plan.paths):
 343            print(f"  {path.expanduser()}")
 344        if remove_tool:
 345            print("It will also remove the installed `nipux` command with: uv tool uninstall nipux")
 346        try:
 347            answer = input("Type 'uninstall' to continue: ").strip().lower()
 348        except (EOFError, KeyboardInterrupt):
 349            print()
 350            print("uninstall aborted")
 351            return
 352        if answer != "uninstall":
 353            print("uninstall aborted")
 354            return
 355    if not args.dry_run:
 356        try:
 357            _stop_daemon_process(config, wait=float(args.wait), quiet=True)
 358        except (OSError, SystemExit) as exc:
 359            print(f"daemon stop skipped: {exc}")
 360    for line in uninstall_runtime(
 361        runtime_home=config.runtime.home,
 362        dry_run=bool(args.dry_run),
 363        include_legacy=not args.keep_legacy,
 364    ):
 365        print(line)
 366    if remove_tool:
 367        code, lines = uninstall_installed_tool(dry_run=bool(args.dry_run))
 368        for line in lines:
 369            print(line)
 370        if code:
 371            print("installed command removal failed; runtime state was still removed")
 372    elif not args.dry_run:
 373        print("runtime removed. Installed `nipux` command kept by --keep-tool.")
 374
 375
 376def cmd_create(args: argparse.Namespace) -> None:
 377    if not _ensure_model_setup_verified_for_workspace():
 378        raise SystemExit(1)
 379    job_id, title = _create_job(
 380        objective=args.objective,
 381        title=args.title,
 382        kind=args.kind,
 383        cadence=args.cadence,
 384    )
 385    print(f"created {title}")
 386
 387
 388def _ensure_model_setup_verified_for_workspace() -> bool:
 389    config = load_config()
 390    if _model_setup_verified(config):
 391        return True
 392    if _workspace_has_model_config(config) and _auto_verify_model_setup(config):
 393        return True
 394    print("Model setup is not verified.")
 395    print("Run `nipux` and finish setup, or run `nipux doctor --check-model` after configuring a provider.")
 396    print("Jobs and chat stay locked until the configured model accepts a chat request.")
 397    return False
 398
 399
 400def _workspace_has_model_config(config: Any) -> bool:
 401    return bool(_read_shell_state().get("setup_completed")) or (config.runtime.home / "config.yaml").exists()
 402
 403
 404def _auto_verify_model_setup(config: Any) -> bool:
 405    checks = run_doctor(config=config, check_model=True)
 406    ok = all(check.ok for check in checks)
 407    if ok:
 408        _mark_model_setup_verified(config)
 409        return True
 410    _clear_model_setup_verified()
 411    return False
 412
 413
 414def _create_job(
 415    *, objective: str, title: str | None = None, kind: str = "generic", cadence: str | None = None
 416) -> tuple[str, str]:
 417    db, config = _db()
 418    try:
 419        title = title or objective.strip().splitlines()[0][:80] or "Untitled job"
 420        plan = initial_plan_for_objective(objective)
 421        job_id = db.create_job(
 422            objective,
 423            title=title,
 424            kind=kind,
 425            cadence=cadence,
 426            metadata={"planning": plan},
 427        )
 428        db.update_job_status(job_id, "queued", metadata_patch={"planning": plan, "planning_status": "auto_accepted"})
 429        db.append_agent_update(job_id, format_initial_plan(plan), category="plan", metadata={"planning": plan})
 430        db.append_agent_update(job_id, "Plan accepted automatically. I will start working from the planned tasks.", category="plan")
 431        db.append_roadmap_record(job_id, **initial_roadmap_for_objective(title=title, objective=objective))
 432        for index, task in enumerate(plan["tasks"], start=1):
 433            task_contract = initial_task_contract(str(task))
 434            db.append_task_record(
 435                job_id,
 436                title=str(task),
 437                status="open",
 438                priority=max(0, 10 - index),
 439                goal=objective,
 440                output_contract=task_contract["output_contract"],
 441                acceptance_criteria=task_contract["acceptance_criteria"],
 442                evidence_needed=task_contract["evidence_needed"],
 443                stall_behavior=task_contract["stall_behavior"],
 444                metadata={"phase": "initial_plan"},
 445            )
 446        program = config.runtime.jobs_dir / job_id / "program.md"
 447        program.parent.mkdir(parents=True, exist_ok=True)
 448        program.write_text(
 449            program_for_job(kind=kind, title=title, objective=objective),
 450            encoding="utf-8",
 451        )
 452        _write_shell_state({"focus_job_id": job_id})
 453        return job_id, title
 454    finally:
 455        db.close()
 456
 457
 458def cmd_jobs(args: argparse.Namespace) -> None:
 459    db, _ = _db()
 460    try:
 461        jobs = db.list_jobs()
 462        if not jobs:
 463            print('No jobs yet. Create one with: nipux create "objective"')
 464            return
 465        focused = _configured_focus_job_id(db)
 466        daemon_running = daemon_lock_status(load_config().runtime.home / "agentd.lock")["running"]
 467        _print_jobs_panel(jobs, focused_job_id=str(focused or ""), daemon_running=bool(daemon_running))
 468    finally:
 469        db.close()
 470
 471
 472def cmd_focus(args: argparse.Namespace) -> None:
 473    db, _ = _db()
 474    try:
 475        if not args.query:
 476            job_id = _default_job_id(db)
 477            if not job_id:
 478                print("No focused job. Create one first.")
 479                return
 480            job = db.get_job(job_id)
 481            daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
 482            print(f"focus: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
 483            return
 484        job = _find_job(db, " ".join(args.query))
 485        if not job:
 486            print(f"No job matched: {' '.join(args.query)}")
 487            return
 488        _write_shell_state({"focus_job_id": job["id"]})
 489        daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
 490        print(f"focus set: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
 491    finally:
 492        db.close()
 493
 494
 495def cmd_rename(args: argparse.Namespace) -> None:
 496    db, config = _db()
 497    try:
 498        job_id = _resolve_job_id(db, args.job_id)
 499        if not job_id:
 500            ref = _job_ref_text(args.job_id)
 501            print(f"No job matched: {ref}" if ref else "No jobs found.")
 502            return
 503        old = db.get_job(job_id)
 504        renamed = db.rename_job(job_id, _job_ref_text(args.title))
 505        program = config.runtime.jobs_dir / job_id / "program.md"
 506        if program.exists():
 507            try:
 508                content = program.read_text(encoding="utf-8")
 509                lines = content.splitlines()
 510                if lines and lines[0].startswith("# "):
 511                    lines[0] = f"# {renamed['title']}"
 512                    program.write_text("\n".join(lines) + ("\n" if content.endswith("\n") else ""), encoding="utf-8")
 513            except OSError:
 514                pass  # best-effort: the DB rename above has already succeeded
 515        _write_shell_state({"focus_job_id": job_id})
 516        print(f"renamed {old['title']} -> {renamed['title']}")
 517    finally:
 518        db.close()
 519
 520
 521def cmd_delete(args: argparse.Namespace) -> None:
 522    db, config = _db()
 523    try:
 524        job_id = _resolve_job_id(db, args.job_id)
 525        if not job_id:
 526            ref = _job_ref_text(args.job_id)
 527            print(f"No job matched: {ref}" if ref else "usage: delete JOB_TITLE")
 528            return
 529        result = db.delete_job(job_id)
 530        job = result["job"]
 531    finally:
 532        db.close()
 533
 534    removed_files = 0
 535    if not args.keep_files:
 536        job_dir = config.runtime.jobs_dir / job_id
 537        for path_text in result.get("artifact_paths") or []:
 538            path = Path(path_text)
 539            try:
 540                if path.exists() and job_dir in path.parents:
 541                    path.unlink()
 542                    removed_files += 1
 543            except OSError:
 544                pass  # best-effort cleanup: skip files that cannot be removed
 545        try:
 546            if job_dir.exists():
 547                shutil.rmtree(job_dir)
 548        except OSError:
 549            pass  # best-effort cleanup: the DB rows are already deleted
 550    state = _read_shell_state()
 551    if state.get("focus_job_id") == job_id:
 552        _write_shell_state({"focus_job_id": ""})
 553    counts = result.get("counts") or {}
 554    file_text = "kept files" if args.keep_files else f"removed files={removed_files}"
 555    print(
 556        f"deleted {job['title']} | steps={counts.get('steps', 0)} "
 557        f"artifacts={counts.get('artifacts', 0)} runs={counts.get('runs', 0)} | {file_text}"
 558    )
 559
 560
 561def cmd_chat(args: argparse.Namespace) -> None:
 562    if not _ensure_model_setup_verified_for_workspace():
 563        return
 564    db, _ = _db()
 565    try:
 566        job_id = _resolve_job_id(db, args.job_id)
 567        if not job_id:
 568            ref = _job_ref_text(args.job_id)
 569            print(f"No job matched: {ref}" if ref else "No jobs found. Create one first.")
 570            return
 571        _write_shell_state({"focus_job_id": job_id})
 572    finally:
 573        db.close()
 574
 575    _enter_chat(job_id, show_history=not args.no_history, history_limit=args.history_limit)
 576
 577
 578def cmd_home(args: argparse.Namespace) -> None:
 579    _install_readline_history()
 580    config = load_config()
 581    if not _model_setup_verified(config) and _workspace_has_model_config(config):
 582        _auto_verify_model_setup(config)
 583    if not _model_setup_verified(load_config()):
 584        _enter_first_run_setup(history_limit=args.history_limit)
 585        return
 586
 587    if _has_saved_jobs():
 588        _start_interactive_daemon_if_possible()
 589    _enter_workspace_chat(history_limit=args.history_limit)
 590
 591
 592def _enter_first_run_setup(*, history_limit: int = 12) -> None:
 593    if _frame_chat_enabled():
 594        _enter_first_run_frame(history_limit=history_limit)
 595        return
 596
 597    print("Nipux setup requires an interactive terminal.")
 598    print("Run `nipux` in a terminal window to choose model, endpoint, and tool access.")
 599
 600
 601def _enter_empty_workspace(*, history_limit: int = 12) -> None:
 602    del history_limit  # unused in this plain-text fallback; accepted so callers can pass it uniformly
 603    db, _ = _db()
 604    try:
 605        jobs = db.list_jobs()[:12]
 606    finally:
 607        db.close()
 608    print("NIPUX WORKSPACE")
 609    print(_rule("="))
 610    if jobs:
 611        print("Jobs")
 612        for job in jobs:
 613            marker = "*" if str(job.get("id") or "") == _read_shell_state().get("focus_job_id") else " "
 614            title = _one_line(job.get("title") or job.get("id") or "untitled", 44)
 615            print(f"{marker} {title:44} {job.get('status') or 'unknown'}")
 616        print()
 617        print("Run in a terminal to use the full-screen workspace, or use: nipux chat JOB_TITLE")
 618    else:
 619        print("No jobs are saved in this profile.")
 620        print('Create a job with: nipux create "objective"')
 621    print("Settings: nipux init --force | Check setup: nipux doctor --check-model")
 622
 623
 624def _has_saved_jobs() -> bool:
 625    db, _ = _db()
 626    try:
 627        return bool(db.list_jobs()[:1])
 628    finally:
 629        db.close()
 630
 631
 632def _enter_workspace_chat(*, history_limit: int = 12) -> None:
 633    if _frame_chat_enabled():
 634        _enter_chat_frame(WORKSPACE_CHAT_ID, history_limit=history_limit)
 635        return
 636    _enter_empty_workspace(history_limit=history_limit)
 637
 638
 639def _print_first_run_menu() -> None:
 640    config = load_config()
 641    print("Start")
 642    print(f"  model   {config.model.model}")
 643    print("  status  ready when work starts")
 644    print(f"  home    {_short_path(config.runtime.home)}")
 645    print()
 646    print("Commands")
 647    print("  1  doctor    verify provider/model")
 648    print("  2  init      write config/env template")
 649    print("  3  exit      leave")
 650    print()
 651    print("Finish setup before chat or job creation is available.")
 652
 653
 654def _handle_first_run_menu_line(line: str, *, history_limit: int = 12) -> bool:
 655    line = line.strip()
 656    if not line:
 657        _print_first_run_menu()
 658        return True
 659    if line.startswith("/"):
 660        line = line[1:].strip()
 661    lowered = line.lower()
 662    if lowered in {"exit", "quit", ":q", "3", "5"}:
 663        return False
 664    if lowered in {"help", "?", "commands"}:
 665        _print_first_run_menu()
 666        return True
 667    if lowered in {"new"} or lowered.startswith("new "):
 668        print("Finish setup first. Then describe worker jobs in the chat workspace.")
 669        return True
 670    if lowered in {"jobs", "ls"}:
 671        cmd_jobs(argparse.Namespace())
 672        return True
 673    if lowered in {"1", "doctor"}:
 674        try:
 675            cmd_doctor(argparse.Namespace(check_model=False))
 676        except SystemExit:
 677            pass
 678        return True
 679    if lowered in {"2", "init"}:
 680        cmd_init(argparse.Namespace(path=None, force=False))
 681        return True
 682    first = _first_token(line)
 683    if first in {"create", "new"}:
 684        print("Finish setup first. Then describe worker jobs in the chat workspace.")
 685        return True
 686    if first in SHELL_COMMAND_NAMES:
 687        _run_shell_line(line)
 688        return True
 689    objective = _extract_job_objective_from_message(line)
 690    if objective:
 691        print("Finish setup first. Then describe worker jobs in the chat workspace.")
 692        return True
 693    print(_first_run_chat_reply(line))
 694    return True
 695
 696
 697def _prompt_first_run_value(label: str) -> str:
 698    try:
 699        return input(f"{label} > ").strip()
 700    except (EOFError, KeyboardInterrupt):
 701        print()
 702        return ""
 703
 704
 705def _first_run_create_and_open(objective: str, *, history_limit: int = 12) -> None:
 706    if not _ensure_model_setup_verified_for_workspace():
 707        return
 708    job_id, title = _create_job(objective=objective, title=None, kind="generic", cadence=None)
 709    _write_shell_state({"focus_job_id": job_id})
 710    print(f"created {title}")
 711    _start_interactive_daemon_if_possible()
 712    print("Opening workspace.")
 713    _enter_workspace_chat(history_limit=history_limit)
 714
 715
 716def _first_token(line: str) -> str:
 717    return _controller_first_token(line)
 718
 719
 720def _enter_first_run_frame(*, history_limit: int = 12) -> None:
 721    next_job_id = _run_first_run_frame(deps=_first_run_runtime_deps())
 722    if next_job_id == WORKSPACE_CHAT_ID:
 723        _enter_workspace_chat(history_limit=history_limit)
 724    elif next_job_id:
 725        _start_interactive_daemon_if_possible()
 726        _write_shell_state({"focus_job_id": next_job_id})
 727        _enter_workspace_chat(history_limit=history_limit)
 728
 729
 730def _first_run_runtime_deps() -> FirstRunRuntimeDeps:
 731    return FirstRunRuntimeDeps(
 732        render_frame=lambda buffer, notices, selected, view, editing_field, previous: _render_first_run_frame(
 733            buffer,
 734            notices,
 735            selected=selected,
 736            view=view,
 737            editing_field=editing_field,
 738            previous_frame=previous,
 739        ),
 740        actions=_first_run_actions,
 741        handle_action=_handle_first_run_action,
 742        handle_line=_handle_first_run_frame_line,
 743        click_action=lambda x, y, view: _first_run_click_action(x, y, view=view),
 744    )
 745
 746
 747def _first_run_actions(view: str) -> list[tuple[str, str, str]]:
 748    return _first_run_tui_actions(view)
 749
 750
 751def _clamp_first_run_selection(selected: int, view: str) -> int:
 752    return _clamp_first_run_runtime_selection(selected, _first_run_actions(view))
 753
 754
 755def _handle_first_run_action(action: str) -> tuple[str, str | list[str] | None]:
 756    return _controller_handle_first_run_action(action, deps=_first_run_frame_deps())
 757
 758
 759def _first_run_click_action(x: int, y: int, *, view: str) -> int | str | None:
 760    width, height = shutil.get_terminal_size((100, 30))
 761    width = max(92, width)
 762    actions = _first_run_actions(view)
 763    if not actions or y < 10 or y > max(10, height - 4):
 764        return None
 765    gap = 2
 766    card_width = max(18, min(34, (width - (len(actions) - 1) * gap - 4) // len(actions)))
 767    total_width = len(actions) * card_width + (len(actions) - 1) * gap
 768    start_x = max(1, (width - total_width) // 2 + 1)
 769    relative = x - start_x
 770    if relative < 0 or relative >= total_width:
 771        return None
 772    span = card_width + gap
 773    index = relative // span
 774    within_card = relative % span < card_width
 775    if not within_card:
 776        return None
 777    return index if 0 <= index < len(actions) else None
 778
 779
 780def _chat_page_click(x: int, y: int, *, right_view: str) -> str | None:
 781    del right_view  # unused: the tab hit-test below depends only on x, y, and terminal width
 782    width, _height = shutil.get_terminal_size((100, 30))
 783    width = max(92, width)
 784    right_width = min(max(52, int(width * 0.36)), 72)
 785    left_width = max(48, width - right_width - 3)
 786    if left_width < 48:  # NOTE: unreachable, left_width is already clamped to >= 48 by max() above
 787        left_width = 48
 788        right_width = max(34, width - left_width - 3)
 789    right_start = left_width + 4
 790    if x < right_start or y > 8:
 791        return None
 792    relative = max(0, x - right_start)
 793    third = max(1, right_width // 3)
 794    return ["updates", "status", "work"][min(2, relative // third)]
 795
 796
 797def _handle_first_run_frame_line(line: str) -> tuple[str, str | list[str] | None]:
 798    return _controller_handle_first_run_frame_line(line, deps=_first_run_frame_deps())
 799
 800
 801def _first_run_chat_reply(message: str) -> str:
 802    return _controller_first_run_chat_reply(message)
 803
 804
 805def _create_first_run_job(objective: str) -> str | list[str]:
 806    return _controller_create_first_run_job(objective, deps=_first_run_frame_deps())
 807
 808
 809def _capture_first_run_command(line: str) -> list[str]:
 810    return _controller_capture_first_run_command(line, _run_shell_line)
 811
 812
 813def _first_run_frame_deps() -> FirstRunFrameDeps:
 814    return FirstRunFrameDeps(
 815        capture_command=_capture_first_run_command,
 816        capture_setting_command=_capture_setting_command,
 817        create_job=_create_job,
 818        current_default_job_id=_current_default_job_id,
 819        extract_objective=_extract_job_objective_from_message,
 820        model_setup_verified=lambda: _model_setup_verified(load_config()),
 821        verify_model_setup=_verify_model_setup_from_first_run,
 822        shell_command_names=SHELL_COMMAND_NAMES,
 823    )
 824
 825
 826def _current_default_job_id() -> str | None:
 827    db, _ = _db()
 828    try:
 829        return _default_job_id(db)
 830    finally:
 831        db.close()
 832
 833
 834def _render_first_run_frame(
 835    input_buffer: str,
 836    notices: list[str],
 837    *,
 838    selected: int = 0,
 839    view: str = "start",
 840    editing_field: str | None = None,
 841    previous_frame: str = "",
 842) -> str:
 843    width, height = shutil.get_terminal_size((100, 30))
 844    frame = _build_first_run_frame(
 845        input_buffer,
 846        notices,
 847        width=width,
 848        height=height,
 849        selected=selected,
 850        view=view,
 851        editing_field=editing_field,
 852    )
 853    return _emit_frame_if_changed(frame, previous_frame)
 854
 855
 856def _build_first_run_frame(
 857    input_buffer: str,
 858    notices: list[str],
 859    *,
 860    width: int,
 861    height: int,
 862    selected: int = 0,
 863    view: str = "start",
 864    editing_field: str | None = None,
 865) -> str:
 866    width = max(92, width)
 867    height = max(22, height)
 868    config = load_config()
 869    daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
 870    jobs: list[dict[str, Any]] = []
 871    db, _ = _db()
 872    try:
 873        jobs = db.list_jobs()
 874    finally:
 875        db.close()
 876    right_width = _first_run_columns(width)[1]
 877    return _build_first_run_tui_frame(
 878        input_buffer,
 879        notices,
 880        width=width,
 881        height=height,
 882        selected=selected,
 883        view=view,
 884        editing_field=editing_field,
 885        config=config,
 886        jobs=jobs,
 887        daemon_text=_daemon_state_line(daemon),
 888        home=_short_path(config.runtime.home, max_width=max(20, right_width - 8)),
 889        config_path=_short_path(config.runtime.home / "config.yaml", max_width=max(20, right_width - 8)),
 890    )
 891
 892
 893def _enter_chat(job_id: str, *, show_history: bool, history_limit: int = 12) -> None:
 894    if not _ensure_model_setup_verified_for_workspace():
 895        return
 896    _install_readline_history()
 897    startup_note = _start_interactive_daemon_if_possible()
 898    if _frame_chat_enabled():
 899        _enter_chat_frame(job_id, history_limit=history_limit)
 900        return
 901    db, _ = _db()
 902    try:
 903        job = db.get_job(job_id)
 904        _write_shell_state({"focus_job_id": job_id})
 905    finally:
 906        db.close()
 907
 908    if _fancy_ui():
 909        print("\033[2J\033[H", end="")
 910    print(NIPUX_BANNER)
 911    print(_rule("="))
 912    print(_shell_summary())
 913    print(_rule("="))
 914    if show_history:
 915        _print_startup_history(job_id, limit=history_limit, chars=180)
 916        print()
 917    if startup_note:
 918        print(_one_line(startup_note, 180))
 919    _print_chat_composer(job)
 920    live_stop, live_thread = _start_chat_live_feed(job_id)
 921    try:
 922        while True:
 923            db, _ = _db()
 924            try:
 925                refreshed = _default_job_id(db)
 926                if refreshed:
 927                    job_id = refreshed
 928                    job = db.get_job(job_id)
 929            finally:
 930                db.close()
 931            try:
 932                line = input(_chat_prompt(job))
 933            except EOFError:
 934                print()
 935                return
 936            except KeyboardInterrupt:  # Ctrl-C discards the current line and re-prompts; only EOF exits
 937                print()
 938                continue
 939            if not _chat_handle_line(job_id, line):
 940                return
 941    finally:
 942        if live_stop is not None:
 943            live_stop.set()
 944        if live_thread is not None:
 945            live_thread.join(timeout=1.0)
 946
 947
 948def _frame_chat_enabled() -> bool:
 949    return (
 950        sys.stdin.isatty()
 951        and sys.stdout.isatty()
 952        and not os.environ.get("NIPUX_APPEND_LIVE")
 953        and not os.environ.get("NIPUX_NO_FRAME")
 954    )
 955
 956
 957def _enter_chat_frame(job_id: str, *, history_limit: int = 12) -> None:
 958    _run_chat_frame(job_id, history_limit=history_limit, deps=_chat_frame_deps())
 959
 960
 961def _chat_frame_deps() -> ChatFrameDeps:
 962    return ChatFrameDeps(
 963        load_snapshot=lambda job_id, history_limit: _load_frame_snapshot(job_id, history_limit=history_limit),
 964        render_frame=lambda snapshot, buffer, notices, right_view, selected, editing_field, modal_view, previous: _render_chat_frame(
 965            snapshot,
 966            buffer,
 967            notices,
 968            right_view=right_view,
 969            selected_control=selected,
 970            editing_field=editing_field,
 971            modal_view=modal_view,
 972            previous_frame=previous,
 973        ),
 974        handle_chat_message=lambda job_id, line: _handle_chat_message(job_id, line, quiet=True),
 975        capture_chat_command=_capture_chat_command,
 976        write_shell_state=_write_shell_state,
 977        is_plain_chat_line=_is_plain_chat_line,
 978        page_click=lambda x, y, right_view: _chat_page_click(x, y, right_view=right_view),
 979    )
 980
 981
 982def _capture_chat_command(job_id: str, line: str) -> tuple[bool, str]:
 983    stream = StringIO()
 984    with redirect_stdout(stream):
 985        if job_id == WORKSPACE_CHAT_ID:
 986            raw = line.strip()
 987            command = raw[1:].strip() if raw.startswith("/") else (chat_control_command(raw).lstrip("/") or raw)
 988            keep_running = _run_workspace_command_line(command) if command else True
 989        else:
 990            keep_running = _chat_handle_line(job_id, line)
 991    return keep_running, stream.getvalue()
 992
 993
 994def _run_workspace_command_line(command: str) -> bool:
 995    try:
 996        tokens = shlex.split(command)
 997    except ValueError as exc:
 998        print(f"parse error: {exc}")
 999        return True
1000    if tokens and tokens[0] == "help":
1001        _print_workspace_chat_help()
1002        return True
1003    if tokens and _run_workspace_setting_command(tokens[0], tokens[1:]):
1004        return True
1005    if tokens and tokens[0] == "new":
1006        objective = command[len("new") :].strip()
1007        if not objective:
1008            print("usage: /new OBJECTIVE")
1009            return True
1010        operator_line = f"/new {objective}"
1011        _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1012        message = _create_workspace_job_from_chat(operator_line, objective)
1013        _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1014        print(message)
1015        return True
1016    if tokens and tokens[0] == "run":
1017        if len(tokens) > 1 and not tokens[1].startswith("-") and _workspace_command_should_create_worker(command, " ".join(tokens[1:])):
1018            objective = _extract_job_objective_from_message(command)
1019            operator_line = f"/{tokens[0]} {objective}"
1020            _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1021            message = _create_workspace_job_from_chat(operator_line, objective)
1022            _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1023            print(message)
1024            return True
1025        return _run_workspace_run_command(tokens)
1026    if tokens and tokens[0] in {"start", "launch"} and len(tokens) > 1 and not tokens[1].startswith("-"):
1027        target_text = " ".join(tokens[1:])
1028        if _workspace_command_should_create_worker(command, target_text):
1029            objective = _extract_job_objective_from_message(command)
1030            operator_line = f"/{tokens[0]} {objective}"
1031            _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1032            message = _create_workspace_job_from_chat(operator_line, objective)
1033            _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1034            print(message)
1035            return True
1036        return _run_workspace_run_command(["run", target_text])
1037    return _run_shell_line(command)
1038
1039
1040def _run_workspace_setting_command(command: str, rest: list[str]) -> bool:
1041    if command == "settings":
1042        command = "config"
1043    if command in {"config", "key", "api-key"} or command in CHAT_SETTING_COMMANDS:
1044        return _handle_chat_setting_command(command, rest)
1045    return False
1046
1047
1048def _workspace_command_should_create_worker(command: str, target_text: str) -> bool:
1049    objective = _extract_job_objective_from_message(command)
1050    if not objective:
1051        return False
1052    db, _config = _db()
1053    try:
1054        return _find_job(db, target_text) is None
1055    finally:
1056        db.close()
1057
1058
1059def _run_workspace_run_command(tokens: list[str]) -> bool:
1060    try:
1061        parsed = build_parser().parse_args(tokens)
1062    except SystemExit as exc:
1063        if exc.code:
1064            print(f"command exited with status {exc.code}")
1065        return True
1066    parsed.no_follow = True
1067    parsed.quiet = True
1068    parsed.func(parsed)
1069    return True
1070
1071
1072def _print_workspace_chat_help() -> None:
1073    print("Create: type a goal, or /new OBJECTIVE.")
1074    print("Run: /run, /pause, /resume. Inspect: /jobs, /outcomes, /artifacts, /activity.")
1075    print("Config: /settings, /model, /base-url, /api-key. Navigate: ←→ pages, ↑↓ jobs.")
1076
1077
1078def _start_worker_from_chat_context(
1079    *,
1080    poll_seconds: float = 0.0,
1081    fake: bool = False,
1082    quiet: bool = True,
1083    log_file: str | None = None,
1084) -> bool:
1085    """Start the daemon from the TUI without dumping preflight internals into chat."""
1086
1087    def report(message: str) -> None:
1088        if not quiet:
1089            print(message)
1090
1091    stream = StringIO()
1092    try:
1093        with redirect_stdout(stream):
1094            _start_daemon_if_needed(poll_seconds=poll_seconds, fake=fake, quiet=True, log_file=log_file)
1095    except SystemExit as exc:
1096        detail = _one_line(str(exc) or "daemon start failed", 120)
1097        report(f"worker not started: {detail}")
1098        return False
1099    except Exception as exc:
1100        detail = _one_line(f"{type(exc).__name__}: {exc}", 120)
1101        report(f"worker not started: {detail}")
1102        return False
1103    output = stream.getvalue()
1104    lowered = output.lower()
1105    if (
1106        "model is not ready" in lowered
1107        or "model setup is not verified" in lowered
1108        or "model_generation:" in lowered
1109        or "model_endpoint:" in lowered
1110        or "model_auth:" in lowered
1111        or "model_config:" in lowered
1112    ):
1113        report("worker not started: model provider is not ready. Use /settings, then /doctor.")
1114        return False
1115    return True
1116
1117
1118def _start_worker_from_chat_namespace(args: argparse.Namespace) -> bool:
1119    return _start_worker_from_chat_context(
1120        poll_seconds=float(getattr(args, "poll_seconds", 0.0) or 0.0),
1121        fake=bool(getattr(args, "fake", False)),
1122        quiet=bool(getattr(args, "quiet", True)),
1123        log_file=getattr(args, "log_file", None),
1124    )
1125
1126
1127def _is_plain_chat_line(line: str) -> bool:
1128    stripped = line.strip()
1129    if not stripped or stripped.startswith("/"):
1130        return False
1131    lowered = stripped.lower()
1132    if lowered in {"help", "jobs", "ls", "clear", "exit", "quit"}:
1133        return False
1134    if chat_control_command(stripped):
1135        return False
1136    try:
1137        first = shlex.split(stripped)[0].lower()
1138    except (IndexError, ValueError):
1139        first = lowered.split(maxsplit=1)[0]
1140    return first not in {"chat", "focus", "switch", "jobs", "ls", "help", "clear", "exit", "quit"}
1141
1142
1143def _load_frame_snapshot(job_id: str, *, history_limit: int = 12) -> dict[str, Any]:
1144    db, config = _db()
1145    try:
1146        return load_frame_snapshot(
1147            db,
1148            config,
1149            job_id,
1150            default_job_id=_default_job_id(db),
1151            history_limit=history_limit,
1152            workspace_events=_workspace_chat_events() if job_id == WORKSPACE_CHAT_ID else None,
1153        )
1154    finally:
1155        db.close()
1156
1157
1158def _render_chat_frame(
1159    snapshot: dict[str, Any],
1160    input_buffer: str,
1161    notices: list[str],
1162    *,
1163    right_view: str = "updates",
1164    selected_control: int = 0,
1165    editing_field: str | None = None,
1166    modal_view: str | None = None,
1167    previous_frame: str = "",
1168) -> str:
1169    width, height = shutil.get_terminal_size((100, 30))
1170    frame = _build_chat_frame(
1171        snapshot,
1172        input_buffer,
1173        notices,
1174        width=width,
1175        height=height,
1176        right_view=right_view,
1177        selected_control=selected_control,
1178        editing_field=editing_field,
1179        modal_view=modal_view,
1180    )
1181    return _emit_frame_if_changed(frame, previous_frame)
1182
1183
1184def _build_chat_frame(
1185    snapshot: dict[str, Any],
1186    input_buffer: str,
1187    notices: list[str],
1188    *,
1189    width: int,
1190    height: int,
1191    right_view: str = "updates",
1192    selected_control: int = 0,
1193    editing_field: str | None = None,
1194    modal_view: str | None = None,
1195) -> str:
1196    return _build_chat_tui_frame(
1197        snapshot,
1198        input_buffer,
1199        notices,
1200        width=width,
1201        height=height,
1202        right_view=right_view,
1203        selected_control=selected_control,
1204        editing_field=editing_field,
1205        modal_view=modal_view,
1206    )
1207
1208
1209def _resolve_job_id(db: AgentDB, requested: Any = None) -> str | None:
1210    requested = _job_ref_text(requested)
1211    if requested:
1212        job = _find_job(db, requested)
1213        return str(job["id"]) if job else None
1214    return _default_job_id(db)
1215
1216
1217def _activate_job_if_planning(db: AgentDB, job_id: str) -> bool:
1218    job = db.get_job(job_id)
1219    if job.get("status") != "planning":
1220        return False
1221    db.update_job_status(job_id, "queued", metadata_patch={"planning_status": "accepted"})
1222    db.append_agent_update(job_id, "Plan accepted. I will start working from the planned tasks.", category="plan")
1223    return True
1224
1225
1226def _ensure_job_runnable(db: AgentDB, job_id: str) -> None:
1227    if _activate_job_if_planning(db, job_id):
1228        return
1229    job = db.get_job(job_id)
1230    status = str(job.get("status") or "")
1231    if status in {"completed", "paused", "cancelled", "failed"} or job_provider_blocked(job):
1232        patch = operator_resume_metadata()
1233        patch["last_note"] = f"reopened from {status} by operator run command"
1234        db.update_job_status(
1235            job_id,
1236            "queued",
1237            metadata_patch=patch,
1238        )
1239        db.append_agent_update(
1240            job_id,
1241            f"Reopened from {status}; continuing as a long-running job.",
1242            category="progress",
1243            metadata={"previous_status": status},
1244        )
1245
1246
1247def cmd_steer(args: argparse.Namespace) -> None:
1248    message = " ".join(args.message).strip()
1249    if not message:
1250        print("No steering message provided.")
1251        return
1252    db, _ = _db()
1253    try:
1254        job_id = _resolve_job_id(db, args.job_id)
1255        if not job_id:
1256            ref = _job_ref_text(args.job_id)
1257            print(f"No job matched: {ref}" if ref else "No jobs found. Create one first, then send steering.")
1258            return
1259        entry = db.append_operator_message(job_id, message, source="operator")
1260        job = db.get_job(job_id)
1261        print(f"waiting for {job['title']}: {entry['message']}")
1262        print("The next worker step will include this in model-visible context.")
1263    finally:
1264        db.close()
1265
1266
1267def cmd_pause(args: argparse.Namespace) -> None:
1268    db, _ = _db()
1269    try:
1270        job_id, note, ref = _resolve_control_job_and_note(db, args)
1271        if not job_id:
1272            print(f"No job matched: {ref}" if ref else "No jobs found.")
1273            return
1274        patch = {"last_note": note} if note else None
1275        db.update_job_status(job_id, "paused", metadata_patch=patch)
1276        job = db.get_job(job_id)
1277        print(f"paused {job['title']}" + (f": {note}" if note else ""))
1278    finally:
1279        db.close()
1280
1281
1282def cmd_resume(args: argparse.Namespace) -> None:
1283    db, _ = _db()
1284    try:
1285        job_id = _resolve_job_id(db, args.job_id)
1286        if not job_id:
1287            ref = _job_ref_text(args.job_id)
1288            print(f"No job matched: {ref}" if ref else "No jobs found.")
1289            return
1290        db.update_job_status(job_id, "queued", metadata_patch=operator_resume_metadata())
1291        job = db.get_job(job_id)
1292        print(f"resumed {job['title']}")
1293    finally:
1294        db.close()
1295
1296
1297def cmd_cancel(args: argparse.Namespace) -> None:
1298    db, _ = _db()
1299    try:
1300        job_id, note, ref = _resolve_control_job_and_note(db, args)
1301        if not job_id:
1302            print(f"No job matched: {ref}" if ref else "No jobs found.")
1303            return
1304        patch = {"last_note": note} if note else None
1305        db.update_job_status(job_id, "cancelled", metadata_patch=patch)
1306        job = db.get_job(job_id)
1307        print(f"cancelled {job['title']}" + (f": {note}" if note else ""))
1308    finally:
1309        db.close()
1310
1311
1312def cmd_status(args: argparse.Namespace) -> None:
1313    db, config = _db()
1314    try:
1315        job_id = _resolve_job_id(db, args.job_id)
1316        if _job_ref_text(args.job_id) and not job_id:
1317            print(f"No job matched: {_job_ref_text(args.job_id)}")
1318            return
1319        state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
1320        if args.json:
1321            print(json.dumps(state, ensure_ascii=False, indent=2, default=_json_default))
1322            return
1323        if args.full:
1324            print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="")
1325        else:
1326            print(render_overview(state, width=_terminal_width()), end="")
1327    finally:
1328        db.close()
1329
1330
1331def cmd_health(args: argparse.Namespace) -> None:
1332    db, config = _db()
1333    try:
1334        config.ensure_dirs()
1335        lock = daemon_lock_status(config.runtime.home / "agentd.lock")
1336        metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
1337        events = read_daemon_events(config, limit=args.limit)
1338        job_id = _default_job_id(db)
1339        print("Nipux Health")
1340        print(_rule("="))
1341        print(f"daemon: {_daemon_state_line(lock)}")
1342        if metadata.get("last_heartbeat"):
1343            print(f"heartbeat: {metadata['last_heartbeat']}")
1344        if metadata.get("last_state"):
1345            print(f"state: {metadata['last_state']}")
1346        if metadata.get("last_status") or metadata.get("last_tool"):
1347            print(f"last step: {metadata.get('last_status') or '?'} {metadata.get('last_tool') or '-'}")
1348        if metadata.get("consecutive_failures"):
1349            print(f"consecutive failures: {metadata['consecutive_failures']}")
1350        if metadata.get("last_error"):
1351            print(
1352                f"last error: {metadata.get('last_error_type') or 'error'}: {_one_line(metadata['last_error'], args.chars)}"
1353            )
1354        print(f"model: {config.model.model}")
1355        print(f"state db: {config.runtime.state_db_path}")
1356        print(f"daemon log: {config.runtime.logs_dir / 'daemon.log'}")
1357        print(f"event log: {config.runtime.logs_dir / 'daemon-events.jsonl'}")
1358        print(f"autostart: {'installed' if _launch_agent_path().exists() else 'not installed'}")
1359        if job_id:
1360            job = db.get_job(job_id)
1361            steps = db.list_steps(job_id=job_id)
1362            artifacts = db.list_artifacts(job_id, limit=1)
1363            print()
1364            print(f"focus: {job['title']}")
1365            state = _job_display_state(job, bool(lock["running"]))
1366            print(
1367                f"state: {state} | worker: {_worker_label(job, bool(lock['running']))} | "
1368                f"steps: {_step_count(steps)} | latest artifacts: {len(artifacts)}"
1369            )
1370            if steps:
1371                print(f"latest: {_step_line(steps[-1], chars=args.chars)}")
1372        else:
1373            print()
1374            print("focus: no jobs")
1375        if events:
1376            print()
1377            print("recent daemon events:")
1378            job_titles = {job["id"]: job["title"] for job in db.list_jobs()}
1379            for event in events[-args.limit :]:
1380                print(f"  {_daemon_event_line(event, chars=args.chars, job_titles=job_titles)}")
1381        else:
1382            print()
1383            print("recent daemon events: none")
1384    finally:
1385        db.close()
1386
1387
1388def cmd_history(args: argparse.Namespace) -> None:
1389    db, _ = _db()
1390    try:
1391        job_id = _resolve_job_id(db, args.job_id)
1392        if not job_id:
1393            ref = _job_ref_text(args.job_id)
1394            print(f"No job matched: {ref}" if ref else "No jobs found.")
1395            return
1396        job = db.get_job(job_id)
1397        events = db.list_timeline_events(job_id, limit=args.limit)
1398        if args.json:
1399            print(
1400                json.dumps(
1401                    [_public_event(event) for event in events], ensure_ascii=False, indent=2, default=_json_default
1402                )
1403            )
1404            return
1405        print(f"history {job['title']}")
1406        print(_rule("="))
1407        if not events:
1408            print("No visible history yet.")
1409            return
1410        for event in events:
1411            if args.full:
1412                print(_event_line(event, chars=max(args.chars, 1200), full=True))
1413            else:
1414                _print_event_card(event, chars=args.chars)
1415    finally:
1416        db.close()
1417
1418
1419def cmd_events(args: argparse.Namespace) -> None:
1420    db, _ = _db()
1421    seen: set[str] = set()
1422    try:
1423        job_id = _resolve_job_id(db, args.job_id)
1424        if not job_id:
1425            ref = _job_ref_text(args.job_id)
1426            print(f"No job matched: {ref}" if ref else "No jobs found.")
1427            return
1428        job = db.get_job(job_id)
1429        if not args.json:
1430            print(f"events {job['title']}")
1431            print(_rule("="))
1432
1433        def emit() -> None:
1434            events = db.list_timeline_events(job_id, limit=args.limit)
1435            printed = False
1436            for event in events:
1437                event_id = str(event.get("id") or "")
1438                if event_id in seen:
1439                    continue
1440                seen.add(event_id)
1441                if args.json:
1442                    print(json.dumps(_public_event(event), ensure_ascii=False, default=_json_default), flush=True)
1443                else:
1444                    if args.full:
1445                        print(_event_line(event, chars=args.chars, full=True), flush=True)
1446                    else:
1447                        _print_event_card(event, chars=args.chars)
1448                printed = True
1449            if printed and not args.json:
1450                print(_rule("-"), flush=True)
1451
1452        emit()
1453        while args.follow:
1454            time.sleep(args.interval)
1455            emit()
1456    except KeyboardInterrupt:
1457        print("\nevents stopped")
1458    finally:
1459        db.close()
1460
1461
1462def cmd_dashboard(args: argparse.Namespace) -> None:
1463    db, config = _db()
1464    try:
1465        while True:
1466            job_id = _resolve_job_id(db, args.job_id)
1467            if _job_ref_text(args.job_id) and not job_id:
1468                print(f"No job matched: {_job_ref_text(args.job_id)}")
1469                return
1470            state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
1471            if args.clear:
1472                print("\033[2J\033[H", end="")
1473            print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="", flush=True)
1474            if not args.follow:
1475                return
1476            time.sleep(args.interval)
1477    except KeyboardInterrupt:
1478        print("\ndashboard stopped")
1479    finally:
1480        db.close()
1481
1482
1483def cmd_artifacts(args: argparse.Namespace) -> None:
1484    db, _ = _db()
1485    try:
1486        job_id = _resolve_job_id(db, args.job_id)
1487        if not job_id:
1488            ref = _job_ref_text(args.job_id)
1489            print(f"No job matched: {ref}" if ref else "No jobs found.")
1490            return
1491        job = db.get_job(job_id)
1492        artifacts = db.list_artifacts(job_id, limit=args.limit)
1493        if not artifacts:
1494            print(f"No saved outputs recorded for {job['title']}.")
1495            return
1496        print(f"saved outputs {job['title']} (newest first)")
1497        print(_rule("-"))
1498        print("Open one with: artifact NUMBER, artifact latest, or artifact TITLE")
1499        for index, artifact in enumerate(artifacts, start=1):
1500            title = artifact.get("title") or artifact["id"]
1501            print(f"{index:>2}. {_one_line(title, 72)}")
1502            meta = f"{artifact['created_at']} | {artifact['type']} | id {artifact['id']}"
1503            print(f"    {meta}")
1504            if artifact.get("summary"):
1505                print(f"    {_one_line(_generic_display_text(artifact['summary']), args.chars)}")
1506            print(f"    view: artifact {index}")
1507            if args.paths:
1508                print(f"    path: {artifact['path']}")
1509    finally:
1510        db.close()
1511
1512
1513def cmd_artifact(args: argparse.Namespace) -> None:
1514    db, config = _db()
1515    try:
1516        store = ArtifactStore(config.runtime.home, db=db)
1517        ref = _job_ref_text(args.artifact_id_or_path)
1518        resolved = _resolve_artifact_ref(db, config, ref, job_id=_resolve_job_id(db, getattr(args, "job_id", None)))
1519        if not resolved:
1520            print(f"No artifact matched: {ref}")
1521            return
1522        content = store.read_text(resolved["id"] if resolved.get("id") else resolved["path"])
1523        if resolved.get("title"):
1524            print(f"artifact: {resolved['title']}")
1525            if resolved.get("summary"):
1526                print(f"summary: {resolved['summary']}")
1527            print(_rule("-"))
1528        if args.chars and len(content) > args.chars:
1529            content = content[: args.chars] + f"\n... truncated {len(content) - args.chars} chars\n"
1530        print(content, end="" if content.endswith("\n") else "\n")
1531    finally:
1532        db.close()
1533
1534
1535def cmd_lessons(args: argparse.Namespace) -> None:
1536    db, _ = _db()
1537    try:
1538        job_id = _resolve_job_id(db, args.job_id)
1539        if not job_id:
1540            ref = _job_ref_text(args.job_id)
1541            print(f"No job matched: {ref}" if ref else "No jobs found.")
1542            return
1543        job = db.get_job(job_id)
1544        _print_lessons(job, limit=args.limit, chars=args.chars)
1545    finally:
1546        db.close()
1547
1548
1549def cmd_learn(args: argparse.Namespace) -> None:
1550    lesson = " ".join(args.lesson).strip()
1551    if not lesson:
1552        print("usage: learn [--job JOB_TITLE] [--category CATEGORY] LESSON")
1553        return
1554    db, _ = _db()
1555    try:
1556        job_id = _resolve_job_id(db, args.job_id)
1557        if not job_id:
1558            ref = _job_ref_text(args.job_id)
1559            print(f"No job matched: {ref}" if ref else "No jobs found.")
1560            return
1561        entry = db.append_lesson(
1562            job_id, lesson, category=args.category or "operator_preference", metadata={"source": "operator"}
1563        )
1564        job = db.get_job(job_id)
1565        print(f"learned for {job['title']}: {_one_line(entry['lesson'], args.chars)}")
1566    finally:
1567        db.close()
1568
1569
1570def cmd_findings(args: argparse.Namespace) -> None:
1571    return cmd_findings_impl(args, _record_command_deps())
1572
1573
1574def cmd_tasks(args: argparse.Namespace) -> None:
1575    return cmd_tasks_impl(args, _record_command_deps())
1576
1577
1578def cmd_roadmap(args: argparse.Namespace) -> None:
1579    return cmd_roadmap_impl(args, _record_command_deps())
1580
1581
1582def cmd_experiments(args: argparse.Namespace) -> None:
1583    return cmd_experiments_impl(args, _record_command_deps())
1584
1585
1586def cmd_sources(args: argparse.Namespace) -> None:
1587    return cmd_sources_impl(args, _record_command_deps())
1588
1589
1590def cmd_memory(args: argparse.Namespace) -> None:
1591    return cmd_memory_impl(args, _record_command_deps())
1592
1593
1594def cmd_metrics(args: argparse.Namespace) -> None:
1595    return cmd_metrics_impl(args, _record_command_deps())
1596
1597
1598def cmd_usage(args: argparse.Namespace) -> None:
1599    return cmd_usage_impl(args, _record_command_deps())
1600
1601
1602def _remote_model_preflight_failures(config) -> list[str]:
1603    return _daemon_remote_model_preflight_failures(config, doctor_fn=run_doctor)
1604
1605
1606def _recoverable_remote_model_preflight_failures(config) -> list[str]:
1607    return _daemon_recoverable_remote_model_preflight_failures(config, doctor_fn=run_doctor)
1608
1609
1610def _provider_preflight_is_recoverable(failures: list[str]) -> bool:
1611    return _daemon_provider_preflight_is_recoverable(failures)
1612
1613
1614def _ensure_remote_model_ready_for_worker(config, *, fake: bool) -> bool:
1615    return _daemon_ensure_remote_model_ready(config, fake=fake, doctor_fn=run_doctor)
1616
1617
1618def cmd_start(args: argparse.Namespace) -> None:
1619    return _cmd_start_impl(
1620        args,
1621        ready_fn=lambda config, fake: _ensure_remote_model_ready_for_worker(config, fake=fake),
1622        stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1623    )
1624
1625
1626def _start_daemon_if_needed(
1627    *, poll_seconds: float, fake: bool = False, quiet: bool = False, log_file: str | None = None
1628) -> None:
1629    return _start_daemon_if_needed_impl(
1630        poll_seconds=poll_seconds,
1631        fake=fake,
1632        quiet=quiet,
1633        log_file=log_file,
1634        start_fn=cmd_start,
1635        stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1636    )
1637
1638
1639def _start_interactive_daemon_if_possible() -> str:
1640    """Best-effort daemon start for the full-screen UI without printing over the frame."""
1641
1642    stream = StringIO()
1643    with redirect_stdout(stream):
1644        try:
1645            _start_daemon_if_needed(poll_seconds=0.0, quiet=True)
1646        except SystemExit:
1647            pass
1648    return stream.getvalue()
1649
1650
1651def cmd_restart(args: argparse.Namespace) -> None:
1652    return _cmd_restart_impl(
1653        args,
1654        start_fn=cmd_start,
1655        stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1656    )
1657
1658
1659def _stop_daemon_process(config, *, wait: float, quiet: bool) -> bool:
1660    return _stop_daemon_process_impl(config, wait=wait, quiet=quiet, pid_alive=_pid_is_alive)
1661
1662
1663def cmd_stop(args: argparse.Namespace) -> None:
1664    requested_job = _job_ref_text(getattr(args, "job_id", None))
1665    if requested_job:
1666        db, _ = _db()
1667        try:
1668            job_id = _resolve_job_id(db, requested_job)
1669            if not job_id:
1670                print(f"No job matched: {requested_job}")
1671                return
1672            db.update_job_status(job_id, "paused", metadata_patch={"last_note": "stopped by operator"})
1673            job = db.get_job(job_id)
1674            print(f"stopped {job['title']} (paused job)")
1675            print("Use resume/run to start it again. Plain 'stop' still stops the daemon.")
1676            return
1677        finally:
1678            db.close()
1679
1680    config = load_config()
1681    _stop_daemon_process(config, wait=args.wait, quiet=False)
1682
1683
1684def cmd_browser_dashboard(args: argparse.Namespace) -> None:
1685    from nipux_cli.browser import _find_agent_browser
1686
1687    config = load_config()
1688    config.ensure_dirs()
1689    if args.stop:
1690        result = subprocess.run([*_find_agent_browser(), "dashboard", "stop"], check=False)
1691        if result.returncode:
1692            raise SystemExit(result.returncode)
1693        print("agent-browser dashboard stopped")
1694        return
1695
1696    command = [*_find_agent_browser(), "dashboard", "start", "--port", str(args.port)]
1697    if args.foreground:
1698        raise SystemExit(subprocess.call(command))
1699
1700    log_path = Path(args.log_file).expanduser() if args.log_file else config.runtime.logs_dir / "browser-dashboard.log"
1701    log_path.parent.mkdir(parents=True, exist_ok=True)
1702    with log_path.open("a", encoding="utf-8") as log_file:
1703        process = subprocess.Popen(
1704            command,
1705            cwd=str(Path.cwd()),
1706            stdout=log_file,
1707            stderr=subprocess.STDOUT,
1708            start_new_session=True,
1709        )
1710    print(f"agent-browser dashboard started pid={process.pid}")
1711    print(f"url: http://127.0.0.1:{args.port}")
1712    print(f"log: {log_path}")
1713
1714
1715def _print_startup_history(job_id: str, *, limit: int, chars: int) -> None:
1716    db, config = _db()
1717    try:
1718        job = db.get_job(job_id)
1719        jobs = db.list_jobs()
1720        steps = db.list_steps(job_id=job_id)
1721        artifacts = db.list_artifacts(job_id, limit=1000)
1722        memory_entries = db.list_memory(job_id)
1723        events = db.list_timeline_events(job_id, limit=limit)
1724        daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
1725    finally:
1726        db.close()
1727    print()
1728    _print_session_overview(
1729        job,
1730        steps=steps,
1731        artifacts=artifacts,
1732        memory_entries=memory_entries,
1733        daemon_running=bool(daemon["running"]),
1734        model=config.model.model,
1735        artifacts_dir=config.runtime.jobs_dir / job_id / "artifacts",
1736        jobs=jobs,
1737        chars=chars,
1738    )
1739    print()
1740    print(_section_title("Recent activity", str(job["title"])))
1741    if not events:
1742        print("  No visible history yet.")
1743        return
1744    display_events = _important_startup_events(events, limit=min(limit, 8))
1745    artifact_indexes = {str(artifact["id"]): index for index, artifact in enumerate(artifacts, start=1)}
1746    for event in display_events:
1747        _print_event_card(event, chars=min(chars, 140), artifact_indexes=artifact_indexes)
1748    if len(events) > len(display_events):
1749        print(f"  ... {len(events) - len(display_events)} older events hidden. Use /history for the full timeline.")
1750
1751
1752def _print_session_overview(
1753    job: dict[str, Any],
1754    *,
1755    steps: list[dict[str, Any]],
1756    artifacts: list[dict[str, Any]],
1757    memory_entries: list[dict[str, Any]],
1758    daemon_running: bool,
1759    model: str,
1760    artifacts_dir: Path,
1761    jobs: list[dict[str, Any]],
1762    chars: int,
1763) -> None:
1764    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1765    findings = _metadata_records(job, "finding_ledger")
1766    sources = _metadata_records(job, "source_ledger")
1767    tasks = _metadata_records(job, "task_queue")
1768    experiments = _metadata_records(job, "experiment_ledger")
1769    lessons = _metadata_records(job, "lessons")
1770    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1771    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1772    open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active"})
1773    state = _job_display_state(job, daemon_running)
1774    worker = _worker_label(job, daemon_running)
1775    print(_section_title("Workspace"))
1776    print(f"  model      {model}")
1777    print(f"  focus      {job['title']}")
1778    print(f"  state      {_status_badge(state)}   worker {_status_badge(worker)}   kind {job['kind']}")
1779    next_action = _next_operator_action(job, daemon_running)
1780    if next_action:
1781        print(f"  next       {next_action}")
1782
1783    print()
1784    _print_jobs_panel(jobs, focused_job_id=str(job["id"]), daemon_running=daemon_running)
1785
1786    print()
1787    print(_section_title("Focus"))
1788    _print_wrapped(
1789        "  goal       ", job.get("objective") or "", width=_terminal_width(), subsequent_indent="             "
1790    )
1791    planning = metadata.get("planning") if isinstance(metadata.get("planning"), dict) else {}
1792    if job.get("status") == "planning" and planning:
1793        print("  plan       waiting for your answers or /run")
1794        questions = planning.get("questions") if isinstance(planning.get("questions"), list) else []
1795        for question in questions[:3]:
1796            _print_wrapped("  question   ", question, width=_terminal_width(), subsequent_indent="             ")
1797
1798    print()
1799    print(_section_title("Progress"))
1800    _print_metric_grid(
1801        [
1802            ("actions", _step_count(steps)),
1803            ("outputs", len(artifacts)),
1804            ("findings", len(findings)),
1805            ("sources", len(sources)),
1806            ("tasks", f"{len(tasks)} ({open_tasks} open)"),
1807            ("roadmap", len(milestones)),
1808            ("experiments", len(experiments)),
1809            ("lessons", len(lessons)),
1810            ("memory", len(memory_entries)),
1811        ]
1812    )
1813    print(f"  output dir {_short_path(artifacts_dir, max_width=min(_terminal_width() - 13, 84))}")
1814
1815
1816def _print_chat_composer(job: dict[str, Any]) -> None:
1817    width = min(_terminal_width(), 96)
1818    if _fancy_ui():
1819        print(_accent("╭─ Message " + "─" * max(0, width - 11)))
1820        print("│ Type normally to chat. Live steps stream above. /jobs switches workspaces. /help shows commands.")
1821        print("╰─" + "─" * max(0, width - 2))
1822        return
1823    print(_section_title("Message"))
1824    print("  Type normally to chat. Live steps stream above. /jobs switches workspaces. /help shows commands.")
1825
1826
1827def _chat_prompt(job: dict[str, Any]) -> str:
1828    return f"{_accent('nipux')} > "
1829
1830
1831def _start_chat_live_feed(job_id: str) -> tuple[threading.Event | None, threading.Thread | None]:
1832    if (
1833        not sys.stdin.isatty()
1834        or not sys.stdout.isatty()
1835        or os.environ.get("NIPUX_NO_LIVE")
1836        or os.environ.get("NIPUX_PLAIN")
1837    ):
1838        return None, None
1839    stop = threading.Event()
1840    thread = threading.Thread(target=_chat_live_feed_loop, args=(job_id, stop), daemon=True)
1841    thread.start()
1842    return stop, thread
1843
1844
1845def _chat_live_feed_loop(initial_job_id: str, stop: threading.Event) -> None:
1846    seen_by_job: dict[str, set[str]] = {}
1847    initialized_jobs: set[str] = set()
1848    active_job_id = initial_job_id
1849    while not stop.wait(1.0):
1850        try:
1851            db, _ = _db()
1852            try:
1853                focused = _default_job_id(db) or active_job_id
1854                active_job_id = focused
1855                seen = seen_by_job.setdefault(focused, set())
1856                events = db.list_events(job_id=focused, limit=40)
1857                if focused not in initialized_jobs:
1858                    initialized_jobs.add(focused)
1859                    seen.update(str(event.get("id") or "") for event in events)
1860                    continue
1861                for event in events:
1862                    event_id = str(event.get("id") or "")
1863                    if not event_id or event_id in seen:
1864                        continue
1865                    seen.add(event_id)
1866                    line = _minimal_live_event_line(event)
1867                    if line:
1868                        _print_live_line(line)
1869            finally:
1870                db.close()
1871        except Exception:
1872            continue
1873
1874
1875def _print_live_line(line: str) -> None:
1876    try:
1877        if _fancy_ui():
1878            print(f"\r\033[K{_live_badge(line)} {line}\n{_chat_prompt({})}", end="", flush=True)
1879        else:
1880            print(f"\n· {line}", flush=True)
1881    except Exception:
1882        return
1883
1884
1885def _resolve_control_job_and_note(db: AgentDB, args: argparse.Namespace) -> tuple[str | None, str, str | None]:
1886    if hasattr(args, "parts"):
1887        parts = [str(part) for part in args.parts or []]
1888        if not parts:
1889            return _default_job_id(db), "", None
1890        for end in range(len(parts), 0, -1):
1891            ref = " ".join(parts[:end])
1892            job = _find_job(db, ref)
1893            if job:
1894                return str(job["id"]), " ".join(parts[end:]).strip(), ref
1895        return None, "", " ".join(parts)
1896    job_ref = _job_ref_text(getattr(args, "job_id", None))
1897    return _resolve_job_id(db, job_ref), _note_text(getattr(args, "note", None)), job_ref
1898
1899
1900def _pid_is_alive(pid: int) -> bool:
1901    try:
1902        os.kill(pid, 0)
1903    except OSError:
1904        return False
1905    return True
1906
1907
1908def _step_by_id(db: AgentDB, job_id: str, step_id: str) -> dict[str, Any] | None:
1909    for step in db.list_steps(job_id=job_id):
1910        if step["id"] == step_id:
1911            return step
1912    return None
1913
1914
1915def _step_count(steps: list[dict[str, Any]]) -> int:
1916    numbers = [int(step.get("step_no") or 0) for step in steps]
1917    return max(numbers, default=0)
1918
1919
1920def _job_lessons(job: dict[str, Any]) -> list[dict[str, Any]]:
1921    return _metadata_records(job, "lessons")
1922
1923
1924def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
1925    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1926    values = metadata.get(key) if isinstance(metadata.get(key), list) else []
1927    return [entry for entry in values if isinstance(entry, dict)]
1928
1929
1930def _print_lessons(job: dict[str, Any], *, limit: int, chars: int) -> None:
1931    lessons = _job_lessons(job)
1932    print(f"lessons {job['title']}")
1933    print(_rule("="))
1934    if not lessons:
1935        print("none yet")
1936        print("add one with: learn this source is not useful for the current objective")
1937        return
1938    for index, lesson in enumerate(lessons[-limit:], start=max(1, len(lessons) - limit + 1)):
1939        category = lesson.get("category") or "memory"
1940        confidence = lesson.get("confidence")
1941        suffix = f" | confidence {confidence:g}" if isinstance(confidence, (int, float)) else ""
1942        print(f"{index:>2}. {category}{suffix}")
1943        print(f"    {_one_line(lesson.get('lesson') or '', chars)}")
1944
1945
1946def _resolve_artifact_ref(
1947    db: AgentDB,
1948    config: Any,
1949    query: str | None,
1950    *,
1951    job_id: str | None = None,
1952) -> dict[str, Any] | None:
1953    if not query:
1954        return None
1955    ref = query.strip()
1956    path = Path(ref).expanduser()
1957    if path.exists():
1958        return {"path": str(path), "title": path.name, "summary": ""}
1959
1960    ref_lower = ref.lower()
1961    focused_artifacts = db.list_artifacts(job_id, limit=250) if job_id else []
1962    if focused_artifacts and ref_lower in {"latest", "last", "newest"}:
1963        return focused_artifacts[0]
1964    index_ref = ref_lower[1:] if ref_lower.startswith("#") else ref_lower
1965    if focused_artifacts and index_ref.isdigit():
1966        index = int(index_ref)
1967        if 1 <= index <= len(focused_artifacts):
1968            return focused_artifacts[index - 1]
1969
1970    jobs = db.list_jobs()
1971    ordered_jobs = []
1972    if job_id:
1973        try:
1974            selected = db.get_job(job_id)
1975            ordered_jobs.append(selected)
1976        except KeyError:
1977            pass
1978    ordered_jobs.extend(job for job in jobs if not job_id or job["id"] != job_id)
1979    artifacts: list[dict[str, Any]] = []
1980    for job in ordered_jobs:
1981        artifacts.extend(db.list_artifacts(job["id"], limit=250))
1982
1983    for artifact in artifacts:
1984        if str(artifact["id"]).lower() == ref_lower:
1985            return artifact
1986    for artifact in artifacts:
1987        title = str(artifact.get("title") or "")
1988        if title.lower() == ref_lower:
1989            return artifact
1990    for artifact in artifacts:
1991        haystack = " ".join(str(artifact.get(key) or "") for key in ("title", "summary", "type")).lower()
1992        if ref_lower in haystack:
1993            return artifact
1994
1995    store = ArtifactStore(config.runtime.home, db=db)
1996    search_job_ids = [job_id] if job_id else [str(job["id"]) for job in ordered_jobs]
1997    for candidate_job_id in search_job_ids:
1998        if not candidate_job_id:
1999            continue
2000        for result in store.search_text(job_id=candidate_job_id, query=ref, limit=1):
2001            try:
2002                return db.get_artifact(str(result["id"]))
2003            except KeyError:
2004                continue
2005    return None
2006
2007
2008def cmd_logs(args: argparse.Namespace) -> None:
2009    db, _ = _db()
2010    try:
2011        job_id = _resolve_job_id(db, args.job_id)
2012        if not job_id:
2013            ref = _job_ref_text(args.job_id)
2014            print(f"No job matched: {ref}" if ref else "No jobs found.")
2015            return
2016        job = db.get_job(job_id)
2017        daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2018        print(f"{job['title']}\tstate {_job_display_state(job, bool(daemon['running']))}\t{job['kind']}")
2019        print()
2020        print("Runs")
2021        for run in db.list_runs(job_id, limit=args.limit):
2022            error = f"\tERROR {run['error']}" if run.get("error") else ""
2023            print(f"{run['started_at']}\t{run['status']}\t{run['id']}\t{run.get('model') or ''}{error}")
2024        print()
2025        print("Steps")
2026        steps = db.list_steps(job_id=job_id)[-args.limit :]
2027        if not steps:
2028            print("No steps recorded.")
2029        for step in steps:
2030            if args.verbose:
2031                _print_step(step, verbose=True, chars=args.chars)
2032            else:
2033                tool = step.get("tool_name") or "-"
2034                summary = _one_line(_clean_step_summary(step.get("summary") or ""), args.chars)
2035                error = f"\tERROR {step['error']}" if step.get("error") else ""
2036                print(
2037                    f"#{step['step_no']}\t{step['started_at']}\t{step['status']}\t{step['kind']}\t{tool}\t{summary}{error}"
2038                )
2039        print()
2040        print("Artifacts")
2041        artifacts = db.list_artifacts(job_id, limit=args.limit)
2042        if not artifacts:
2043            print("No artifacts recorded.")
2044        for artifact in artifacts:
2045            print(
2046                f"{artifact['created_at']}\t{artifact['type']}\t{artifact.get('title') or artifact['id']}\t{artifact['path']}"
2047            )
2048    finally:
2049        db.close()
2050
2051
2052def cmd_activity(args: argparse.Namespace) -> None:
2053    db, _ = _db()
2054    seen_events: set[str] = set()
2055    try:
2056        job_id = _resolve_job_id(db, args.job_id)
2057        if not job_id:
2058            print("No jobs found.")
2059            return
2060        job = db.get_job(job_id)
2061        daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2062        print(f"activity {job['title']} | state {_job_display_state(job, bool(daemon['running']))}")
2063        print("tool calls, artifacts, learning, and messages, oldest to newest")
2064        print(_rule("-"))
2065
2066        def emit() -> None:
2067            events = db.list_timeline_events(job_id, limit=args.limit)
2068            printed = False
2069            for event in events:
2070                event_id = str(event.get("id") or "")
2071                if event_id in seen_events:
2072                    continue
2073                print(_event_line(event, chars=args.chars, full=args.verbose))
2074                if args.verbose:
2075                    _print_event_details(event, chars=args.chars)
2076                if args.paths and event.get("event_type") == "artifact":
2077                    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
2078                    if metadata.get("path"):
2079                        print(f"     path: {metadata['path']}")
2080                seen_events.add(event_id)
2081                printed = True
2082            if printed:
2083                print(_rule("-"))
2084
2085        emit()
2086        while args.follow:
2087            time.sleep(args.interval)
2088            emit()
2089    except KeyboardInterrupt:
2090        print("\nactivity stopped")
2091    finally:
2092        db.close()
2093
2094
2095def cmd_updates(args: argparse.Namespace) -> None:
2096    db, config = _db()
2097    try:
2098        if getattr(args, "all", False):
2099            print(
2100                "\n".join(
2101                    render_all_updates_report(
2102                        db,
2103                        config,
2104                        limit=args.limit,
2105                        chars=args.chars,
2106                        paths=args.paths,
2107                    )
2108                )
2109            )
2110            return
2111        job_id = _resolve_job_id(db, args.job_id)
2112        if not job_id:
2113            print("No jobs found.")
2114            return
2115        print(
2116            "\n".join(
2117                render_updates_report(
2118                    db,
2119                    config,
2120                    job_id,
2121                    limit=args.limit,
2122                    chars=args.chars,
2123                    paths=args.paths,
2124                )
2125            )
2126        )
2127    finally:
2128        db.close()
2129
2130
2131def cmd_watch(args: argparse.Namespace) -> None:
2132    db, _ = _db()
2133    seen_runs: set[str] = set()
2134    seen_steps: set[str] = set()
2135    seen_artifacts: set[str] = set()
2136    try:
2137        job_id = _resolve_job_id(db, args.job_id)
2138        if not job_id:
2139            print(f"No job matched: {_job_ref_text(args.job_id)}")
2140            return
2141        job = db.get_job(job_id)
2142        daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2143        print(f"watching {job['title']} | state {_job_display_state(job, bool(daemon['running']))} | {job['kind']}")
2144        print(f"objective: {job['objective']}")
2145        print(
2146            "Note: this shows model-visible state, tool calls, outputs, and errors. It does not expose hidden chain-of-thought."
2147        )
2148        print()
2149
2150        def emit_snapshot(*, initial: bool = False) -> None:
2151            nonlocal job
2152            job = db.get_job(job_id)
2153            runs = list(reversed(db.list_runs(job_id, limit=args.limit)))
2154            steps = db.list_steps(job_id=job_id)[-args.limit :]
2155            artifacts = list(reversed(db.list_artifacts(job_id, limit=args.limit)))
2156            printed = False
2157            for run in runs:
2158                if run["id"] in seen_runs:
2159                    continue
2160                if not initial:
2161                    print()
2162                _print_run(run)
2163                seen_runs.add(run["id"])
2164                printed = True
2165            for step in steps:
2166                if step["id"] in seen_steps:
2167                    continue
2168                if not initial and not printed:
2169                    print()
2170                _print_step(step, verbose=args.verbose, chars=args.chars)
2171                seen_steps.add(step["id"])
2172                printed = True
2173            for artifact in artifacts:
2174                if artifact["id"] in seen_artifacts:
2175                    continue
2176                if not initial and not printed:
2177                    print()
2178                _print_artifact(artifact)
2179                seen_artifacts.add(artifact["id"])
2180                printed = True
2181            if printed:
2182                print(f"status: {job['status']}")
2183
2184        emit_snapshot(initial=True)
2185        while args.follow:
2186            time.sleep(args.interval)
2187            emit_snapshot()
2188    except KeyboardInterrupt:
2189        print("\nwatch stopped")
2190    finally:
2191        db.close()
2192
2193
2194def cmd_run_one(args: argparse.Namespace) -> None:
2195    from nipux_cli.worker import run_one_step
2196
2197    db, config = _db()
2198    try:
2199        job_id = _resolve_job_id(db, args.job_id)
2200        if not job_id:
2201            print(f"No job matched: {_job_ref_text(args.job_id)}")
2202            return
2203        if not args.fake and not _model_setup_verified(config):
2204            _ensure_model_setup_verified_for_workspace()
2205            return
2206        _activate_job_if_planning(db, job_id)
2207        llm = None
2208        if args.fake:
2209            from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall
2210
2211            llm = ScriptedLLM(
2212                [
2213                    LLMResponse(
2214                        tool_calls=[
2215                            ToolCall(
2216                                name="write_artifact",
2217                                arguments={
2218                                    "title": "fake-step",
2219                                    "type": "text",
2220                                    "summary": "Fake one-step smoke artifact",
2221                                    "content": "This is a fake bounded worker step.",
2222                                },
2223                            )
2224                        ]
2225                    )
2226                ]
2227            )
2228        result = run_one_step(job_id, config=config, db=db, llm=llm)
2229        print(json.dumps(result.__dict__, ensure_ascii=False, indent=2, default=_json_default))
2230    finally:
2231        db.close()
2232
2233
2234def cmd_work(args: argparse.Namespace) -> None:
2235    from nipux_cli.worker import run_one_step
2236
2237    db, config = _db()
2238    try:
2239        job_id = _resolve_job_id(db, args.job_id)
2240        if not job_id:
2241            print('No jobs found. Create one with: nipux create "objective"')
2242            return
2243        if not args.fake and not _model_setup_verified(config):
2244            _ensure_model_setup_verified_for_workspace()
2245            return
2246        _activate_job_if_planning(db, job_id)
2247        job = db.get_job(job_id)
2248        print(f"working {job['title']} | state foreground | {job['kind']}")
2249        print(
2250            "Note: this shows model-visible state, tool calls, outputs, and errors. It does not expose hidden chain-of-thought."
2251        )
2252        print()
2253        for index in range(1, args.steps + 1):
2254            llm = None
2255            if args.fake:
2256                from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall
2257
2258                llm = ScriptedLLM(
2259                    [
2260                        LLMResponse(
2261                            tool_calls=[
2262                                ToolCall(
2263                                    name="write_artifact",
2264                                    arguments={
2265                                        "title": f"fake-work-step-{index}",
2266                                        "type": "text",
2267                                        "summary": "Fake foreground work step",
2268                                        "content": f"This is fake foreground work step {index}.",
2269                                    },
2270                                )
2271                            ]
2272                        )
2273                    ]
2274                )
2275            print(f"work step {index}/{args.steps}", flush=True)
2276            result = run_one_step(job_id, config=config, db=db, llm=llm)
2277            step = _step_by_id(db, job_id, result.step_id)
2278            if step:
2279                _print_step(step, verbose=args.verbose, chars=args.chars)
2280            else:
2281                print(json.dumps(result.__dict__, ensure_ascii=False, indent=2, default=_json_default))
2282            if args.dashboard:
2283                state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
2284                print()
2285                print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="")
2286            if result.status == "failed" and not args.continue_on_error:
2287                print("stopped after failed step; pass --continue-on-error to keep going")
2288                return
2289            if index < args.steps and args.poll_seconds > 0:
2290                time.sleep(args.poll_seconds)
2291    finally:
2292        db.close()
2293
2294
2295def _pause_job_for_recoverable_provider_preflight(
2296    db: AgentDB,
2297    config: Any,
2298    job_id: str,
2299    *,
2300    fake: bool,
2301    failures: list[str] | None = None,
2302) -> bool:
2303    if fake:
2304        return False
2305    failures = failures if failures is not None else _recoverable_remote_model_preflight_failures(config)
2306    if not failures:
2307        return False
2308    now = utc_now()
2309    detail = "; ".join(failures)
2310    job = db.get_job(job_id)
2311    already_provider_blocked = job_provider_blocked(job)
2312    if already_provider_blocked and str(job.get("status") or "") == "paused":
2313        db.update_job_metadata(
2314            job_id,
2315            {
2316                "provider_last_probe_at": now,
2317                "provider_last_probe_detail": detail[:1000],
2318                "last_note": "Model provider still unavailable; daemon will check again later.",
2319            },
2320        )
2321        return True
2322    note = "Model provider is unavailable; daemon will monitor and resume this job when calls succeed."
2323    db.update_job_status(
2324        job_id,
2325        "paused",
2326        metadata_patch={
2327            "last_note": note,
2328            "provider_blocked_at": str(job.get("metadata", {}).get("provider_blocked_at") or now)
2329            if already_provider_blocked
2330            else now,
2331            "provider_last_probe_at": now,
2332            "provider_last_probe_detail": detail[:1000],
2333        },
2334    )
2335    if not already_provider_blocked:
2336        db.append_agent_update(
2337            job_id,
2338            note,
2339            category="error",
2340            metadata={"reason": "llm_provider_blocked", "detail": detail[:1000]},
2341        )
2342    return True
2343
2344
2345def cmd_run(args: argparse.Namespace) -> None:
2346    config = load_config()
2347    preflight_failures = [] if args.fake or _model_setup_verified(config) else _remote_model_preflight_failures(config)
2348    preflight_recoverable = _provider_preflight_is_recoverable(preflight_failures)
2349    can_prepare_job = not preflight_failures or preflight_recoverable
2350    requested = _job_ref_text(args.job_id)
2351    if requested:
2352        db, _ = _db()
2353        try:
2354            job = _find_job(db, requested)
2355            if not job:
2356                print(f"No job matched: {requested}")
2357                return
2358            args.job_id = job["id"]
2359            _write_shell_state({"focus_job_id": job["id"]})
2360            if can_prepare_job:
2361                already_provider_blocked = preflight_recoverable and job_provider_blocked(job)
2362                if not already_provider_blocked:
2363                    _ensure_job_runnable(db, job["id"])
2364                if preflight_recoverable:
2365                    _pause_job_for_recoverable_provider_preflight(
2366                        db,
2367                        config,
2368                        job["id"],
2369                        fake=bool(args.fake),
2370                        failures=preflight_failures,
2371                    )
2372            job = db.get_job(job["id"])
2373            daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
2374            print(f"focus set: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
2375        finally:
2376            db.close()
2377    else:
2378        db, _ = _db()
2379        try:
2380            job_id = _default_job_id(db)
2381            if job_id:
2382                if can_prepare_job:
2383                    job = db.get_job(job_id)
2384                    already_provider_blocked = preflight_recoverable and job_provider_blocked(job)
2385                    if not already_provider_blocked:
2386                        _ensure_job_runnable(db, job_id)
2387                    if preflight_recoverable:
2388                        _pause_job_for_recoverable_provider_preflight(
2389                            db,
2390                            config,
2391                            job_id,
2392                            fake=bool(args.fake),
2393                            failures=preflight_failures,
2394                        )
2395            else:
2396                print("No jobs found. Create one with /new OBJECTIVE.")
2397                return
2398        finally:
2399            db.close()
2400    _start_daemon_if_needed(
2401        poll_seconds=args.poll_seconds,
2402        fake=args.fake,
2403        quiet=args.quiet,
2404        log_file=args.log_file,
2405    )
2406    if args.no_follow:
2407        return
2408    cmd_activity(
2409        argparse.Namespace(
2410            job_id=args.job_id,
2411            limit=args.limit,
2412            chars=args.chars,
2413            follow=True,
2414            interval=args.interval,
2415            verbose=args.verbose,
2416            paths=args.paths,
2417        )
2418    )
2419
2420
2421def cmd_digest(args: argparse.Namespace) -> None:
2422    db, config = _db()
2423    try:
2424        job_id = _resolve_job_id(db, args.job_id)
2425        if not job_id:
2426            print(f"No job matched: {_job_ref_text(args.job_id)}")
2427            return
2428        print(
2429            render_job_digest(
2430                db,
2431                job_id,
2432                model=config.model.model,
2433                base_url=config.model.base_url,
2434                context_length=config.model.context_length,
2435                input_cost_per_million=config.model.input_cost_per_million,
2436                output_cost_per_million=config.model.output_cost_per_million,
2437            ),
2438            end="",
2439        )
2440    finally:
2441        db.close()
2442
2443
2444def cmd_daily_digest(args: argparse.Namespace) -> None:
2445    db, config = _db()
2446    try:
2447        result = write_daily_digest(config, db, day=args.day)
2448        print(json.dumps(result, ensure_ascii=False, indent=2))
2449    finally:
2450        db.close()
2451
2452
2453def cmd_daemon(args: argparse.Namespace) -> None:
2454    config = load_config()
2455    if not _ensure_remote_model_ready_for_worker(config, fake=args.fake):
2456        raise SystemExit(2)
2457    daemon = Daemon.open(config=config)
2458    try:
2459        if args.once:
2460            result = daemon.run_once(fake=args.fake, verbose=args.verbose)
2461            print(json.dumps(result.__dict__ if result else None, ensure_ascii=False, indent=2, default=_json_default))
2462            return
2463        daemon.run_forever(fake=args.fake, poll_seconds=args.poll_seconds, quiet=args.quiet, verbose=args.verbose)
2464    except DaemonAlreadyRunning as exc:
2465        raise SystemExit(str(exc)) from exc
2466    finally:
2467        daemon.close()
2468
2469
2470def cmd_doctor(args: argparse.Namespace) -> None:
2471    config = load_config()
2472    checks = run_doctor(config=config, check_model=args.check_model)
2473    for check in checks:
2474        status = "ok" if check.ok else "fail"
2475        print(f"{status}\t{check.name}\t{check.detail}")
2476    ok = all(check.ok for check in checks)
2477    if args.check_model:
2478        if ok:
2479            _mark_model_setup_verified(config)
2480            print("ok\tmodel_setup\tverified for workspace and chat")
2481        else:
2482            _clear_model_setup_verified()
2483    if not ok:
2484        raise SystemExit(1)
2485
2486
2487def _verify_model_setup_from_first_run() -> list[str]:
2488    stream = StringIO()
2489    with redirect_stdout(stream):
2490        try:
2491            cmd_doctor(argparse.Namespace(check_model=True))
2492        except SystemExit as exc:
2493            if exc.code not in (None, 0):
2494                print("Model setup is not ready. Fix the failed check above before creating a job.")
2495                print("Use /base-url URL, /api-key KEY, or /model MODEL here, then run Doctor again.")
2496                print("For a local endpoint, start the local server or change the endpoint.")
2497    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
2498    return lines[-12:] or ["done"]
2499
2500
2501def _chat_handle_line(job_id: str, line: str, *, reply_fn=None) -> bool:
2502    line = line.strip()
2503    if not line:
2504        return True
2505    if line.startswith("chat "):
2506        db, _ = _db()
2507        try:
2508            job = db.get_job(job_id)
2509            print(f"already chatting with {job['title']}; type your message, /run, or /exit")
2510            return True
2511        finally:
2512            db.close()
2513    if line in {"/exit", "/quit", "exit", "quit"}:
2514        return False
2515    if line in {"/help", "help"}:
2516        print("Core workflow:")
2517        print("  /new OBJECTIVE       create a job and start work")
2518        print("  /run                 resume/start the focused job")
2519        print("  /jobs                switch or inspect jobs")
2520        print("  /status              current job state")
2521        print("  /outcomes            durable progress")
2522        print("  /artifacts           saved files")
2523        print("  /activity            tool calls")
2524        print("  /pause /resume       control the focused job")
2525        print()
2526        print("All commands:")
2527        print("  /jobs /focus JOB_TITLE /switch JOB_TITLE /new OBJECTIVE /delete [JOB_TITLE]")
2528        print("  /history /events /activity /outputs /updates /outcomes [all] /status /usage /config /settings /health")
2529        print("  /artifacts /artifact QUERY /findings /tasks /roadmap /experiments /sources /memory /metrics /lessons")
2530        print("  /model MODEL /base-url URL /api-key KEY /api-key-env ENV /context TOKENS")
2531        print("  /input-cost DOLLARS_PER_1M_INPUT_TOKENS /output-cost DOLLARS_PER_1M_OUTPUT_TOKENS")
2532        print("  /browser true|false /web true|false /cli-access true|false /file-access true|false")
2533        print("  /timeout SECONDS /home PATH /step-limit SECONDS /output-chars CHARS /daily-digest BOOL /digest-time HH:MM /doctor")
2534        print("  /run /start /restart /work N /work-verbose N /stop /pause [note] /resume /cancel [note]")
2535        print("  /learn LESSON /note MESSAGE /follow MESSAGE /digest /clear /exit")
2536        print("Plain text gets a model reply and is saved as model-visible steering.")
2537        return True
2538    if line in {"clear", "/clear"}:
2539        print("\033[2J\033[H", end="")
2540        return True
2541    if line == "jobs" or line == "ls" or line.startswith("jobs "):
2542        cmd_jobs(argparse.Namespace())
2543        return True
2544    if line.startswith(("focus ", "switch ")):
2545        parts = shlex.split(line)
2546        cmd_focus(argparse.Namespace(query=parts[1:]))
2547        return True
2548    if line.startswith("/"):
2549        parts = shlex.split(line[1:])
2550        if not parts:
2551            return True
2552        return _handle_chat_slash_command(job_id, parts[0], parts[1:], deps=_chat_command_deps())
2553    if reply_fn is None:
2554        reply_fn = _reply_to_chat
2555    _handle_chat_message(job_id, line, reply_fn=reply_fn)
2556    return True
2557
2558
2559def _handle_chat_message(job_id: str, line: str, *, reply_fn=None, quiet: bool = False) -> tuple[bool, str]:
2560    if not _model_setup_verified(load_config()):
2561        message = (
2562            "Model setup is not verified. Complete setup or run /doctor after configuring a working provider."
2563        )
2564        if not quiet:
2565            print(message)
2566        return True, message
2567    if job_id == WORKSPACE_CHAT_ID:
2568        return _handle_workspace_chat_message(line, quiet=quiet)
2569    return _controller_handle_chat_message(
2570        job_id,
2571        line,
2572        deps=_chat_controller_deps(),
2573        reply_fn=reply_fn,
2574        quiet=quiet,
2575    )
2576
2577
2578def _chat_reply_text_and_metadata(reply: Any) -> tuple[str, dict[str, Any]]:
2579    return _controller_reply_text_and_metadata(reply)
2580
2581
2582def _workspace_chat_events() -> list[dict[str, Any]]:
2583    events = _read_shell_state().get("workspace_chat_events")
2584    if not isinstance(events, list):
2585        return []
2586    return [event for event in events if isinstance(event, dict)][-120:]
2587
2588
2589def _append_workspace_chat_event(event_type: str, title: str, body: str, metadata: dict[str, Any] | None = None) -> None:
2590    events = _workspace_chat_events()
2591    events.append(
2592        {
2593            "id": f"workspace_{len(events) + 1}_{int(time.time() * 1000)}",
2594            "job_id": WORKSPACE_CHAT_ID,
2595            "event_type": event_type,
2596            "created_at": utc_now(),
2597            "title": title,
2598            "body": body,
2599            "metadata": metadata or {},
2600        }
2601    )
2602    _write_shell_state({"workspace_chat_events": events[-120:]})
2603
2604
2605def _handle_workspace_chat_message(line: str, *, quiet: bool = False) -> tuple[bool, str]:
2606    _append_workspace_chat_event("operator_message", "chat", line, {"source": "workspace"})
2607    objective = _extract_job_objective_from_message(line)
2608    if objective:
2609        message = _create_workspace_job_from_chat(line, objective)
2610        _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
2611        if not quiet:
2612            print(message)
2613        return True, message
2614    control_command = chat_control_command(line)
2615    if control_command:
2616        keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, control_command)
2617        compact = _compact_command_output(output)
2618        message = " | ".join(compact[-4:]) if compact else f"{control_command.lstrip('/')} done"
2619        _append_workspace_chat_event(
2620            "agent_message",
2621            "chat",
2622            message,
2623            {"source": "workspace", "command": control_command},
2624        )
2625        if not quiet:
2626            print(message)
2627        return keep_running, message
2628    try:
2629        reply = _reply_to_workspace_chat(line)
2630    except Exception as exc:
2631        message = _friendly_error_text(f"{type(exc).__name__}: {exc}")
2632        _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace", "error": True})
2633        if not quiet:
2634            print(message)
2635        return True, message
2636    reply_text, reply_metadata = _chat_reply_text_and_metadata(reply)
2637    text = reply_text.strip() or "I did not get a usable model reply."
2638    _append_workspace_chat_event("agent_message", "chat", text, {"source": "workspace", **reply_metadata})
2639    if not quiet:
2640        print(text)
2641    return True, text
2642
2643
2644def _create_workspace_job_from_chat(message: str, objective: str) -> str:
2645    refined = _refine_job_objective_for_worker(message=message, objective=objective)
2646    job_id, title = _create_job(objective=refined, title=None, kind="generic", cadence=None)
2647    _write_shell_state({"focus_job_id": job_id})
2648    db, _config = _db()
2649    try:
2650        db.append_operator_message(job_id, message, source="workspace_chat", mode="steer")
2651        db.append_agent_update(
2652            job_id,
2653            "Created from Nipux workspace chat with an expanded long-running objective.",
2654            category="chat",
2655        )
2656    finally:
2657        db.close()
2658    run_now = not message_requests_queued_job(message) or message_requests_immediate_run(message)
2659    text = f"Created worker job: {title}."
2660    if run_now:
2661        if _start_worker_from_chat_context():
2662            text += " Started worker."
2663        else:
2664            text += " Worker is waiting for a working model."
2665    else:
2666        text += " It is queued; tell me to run it when ready."
2667    return text
2668
2669
2670def _refine_job_objective_for_worker(*, message: str, objective: str) -> str:
2671    fallback = _durable_job_objective(objective)
2672    try:
2673        from nipux_cli.llm import OpenAIChatLLM
2674
2675        _db_handle, config = _db()
2676        _db_handle.close()
2677        prompt = [
2678            {
2679                "role": "system",
2680                "content": (
2681                    "You rewrite operator requests into strong, generic Nipux worker objectives. "
2682                    "Nipux workers are long-running autonomous jobs with browser, web, CLI, file, artifact, "
2683                    "memory, roadmap, task, source, finding, and experiment tools. "
2684                    "Return only the objective text for the worker. Start with one concise title line, then add "
2685                    "clear success criteria, output expectations, constraints, evidence requirements, progress "
2686                    "reporting expectations, and instructions to keep improving until no useful progress remains. "
2687                    "Do not invent hosts, credentials, accounts, domains, models, or private details that the operator did not provide."
2688                ),
2689            },
2690            {
2691                "role": "user",
2692                "content": f"Operator message:\n{message}\n\nExtracted objective:\n{objective}",
2693            },
2694        ]
2695        refined = OpenAIChatLLM(config.model).complete(messages=prompt).strip()
2696    except Exception:
2697        return fallback
2698    if len(refined) < 20:
2699        return fallback
2700    return refined[:8000]
2701
2702
2703def _durable_job_objective(objective: str) -> str:
2704    cleaned = " ".join(str(objective or "").split()).strip() or "Long-running Nipux job"
2705    title = _one_line(cleaned, 96)
2706    return (
2707        f"{title}\n\n"
2708        "Run this as a durable long-running Nipux worker job.\n"
2709        "- Clarify and preserve the operator's actual goal, constraints, and success criteria.\n"
2710        "- Build a roadmap before deep work, then keep the task queue current as evidence changes.\n"
2711        "- Produce concrete outputs as artifacts or files when the work creates something useful.\n"
2712        "- Record findings, sources, lessons, experiments, and measurable results when they apply.\n"
2713        "- Separate activity from progress: report what changed, what was learned, what failed, and what branch is next.\n"
2714        "- Keep improving autonomously until no useful progress remains, the operator pauses the job, or a real blocker needs operator input."
2715    )
2716
2717
2718def _workspace_chat_job_dossier(db: AgentDB, jobs: list[dict[str, Any]], *, limit: int = 8) -> str:
2719    """Compact job context for the left-side workspace chat model."""
2720
2721    if not jobs:
2722        return "No worker jobs yet."
2723    sections: list[str] = []
2724    for index, job in enumerate(jobs[:limit], start=1):
2725        job_id = str(job.get("id") or "")
2726        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2727        counts = _safe_job_counts(db, job_id)
2728        findings = _metadata_records(job, "finding_ledger")
2729        sources = _metadata_records(job, "source_ledger")
2730        tasks = _metadata_records(job, "task_queue")
2731        experiments = _metadata_records(job, "experiment_ledger")
2732        lessons = _metadata_records(job, "lessons")
2733        roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
2734        milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
2735        open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active", "blocked"})
2736        artifacts = _safe_list_artifacts(db, job_id, limit=3)
2737        events = _safe_list_events(db, job_id, limit=40)
2738        steps = _safe_list_steps(db, job_id, limit=1)
2739        current_task = _workspace_current_task(tasks)
2740        recent_outcomes = _workspace_recent_outcomes(events, limit=5)
2741        latest_outputs = [
2742            _one_line(str(artifact.get("title") or artifact.get("id") or "saved output"), 160)
2743            for artifact in artifacts[:3]
2744        ]
2745        progress = (
2746            f"actions={counts.get('steps', 0)} outputs={counts.get('artifacts', 0)} "
2747            f"findings={len(findings)} sources={len(sources)} tasks={len(tasks)}/{open_tasks} open "
2748            f"experiments={len(experiments)} lessons={len(lessons)} memory={counts.get('memory', 0)} "
2749            f"roadmap={len(milestones)}"
2750        )
2751        latest_step = _one_line(_step_line(steps[-1]), 180) if steps else "no worker steps yet"
2752        lines = [
2753            f"{index}. {job.get('title') or job_id} | state={job.get('status') or 'unknown'} kind={job.get('kind') or 'generic'}",
2754            f"   objective: {_one_line(job.get('objective') or '', 220)}",
2755            f"   progress: {progress}",
2756            f"   latest step: {latest_step}",
2757        ]
2758        if current_task:
2759            lines.append(f"   active task: {current_task}")
2760        if latest_outputs:
2761            lines.append(f"   latest outputs: {'; '.join(latest_outputs)}")
2762        if recent_outcomes:
2763            lines.append(f"   recent outcomes: {'; '.join(recent_outcomes)}")
2764        sections.append("\n".join(lines))
2765    if len(jobs) > limit:
2766        sections.append(f"... {len(jobs) - limit} more job(s) available.")
2767    return "\n\n".join(sections)
2768
2769
2770def _safe_job_counts(db: AgentDB, job_id: str) -> dict[str, int]:
2771    if not job_id:
2772        return {"steps": 0, "artifacts": 0, "memory": 0, "events": 0}
2773    try:
2774        return db.job_record_counts(job_id)
2775    except Exception:
2776        return {"steps": 0, "artifacts": 0, "memory": 0, "events": 0}
2777
2778
2779def _safe_list_artifacts(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2780    try:
2781        return db.list_artifacts(job_id, limit=limit)
2782    except Exception:
2783        return []
2784
2785
2786def _safe_list_events(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2787    try:
2788        return db.list_timeline_events(job_id, limit=limit)
2789    except Exception:
2790        return []
2791
2792
2793def _safe_list_steps(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2794    try:
2795        return db.list_steps(job_id=job_id, limit=limit)
2796    except Exception:
2797        return []
2798
2799
2800def _workspace_current_task(tasks: list[dict[str, Any]]) -> str:
2801    visible = [
2802        task
2803        for task in tasks
2804        if str(task.get("status") or "open") in {"active", "open", "blocked"}
2805    ]
2806    if not visible:
2807        return ""
2808    visible.sort(
2809        key=lambda task: (
2810            {"active": 0, "open": 1, "blocked": 2}.get(str(task.get("status") or "open"), 9),
2811            -int(task.get("priority") or 0),
2812        )
2813    )
2814    task = visible[0]
2815    status = str(task.get("status") or "open")
2816    contract = str(task.get("output_contract") or "")
2817    suffix = f" [{contract}]" if contract else ""
2818    return _one_line(f"{status} {task.get('title') or 'task'}{suffix}", 180)
2819
2820
2821def _workspace_recent_outcomes(events: list[dict[str, Any]], *, limit: int) -> list[str]:
2822    outcomes: list[str] = []
2823    seen: set[str] = set()
2824    for event in reversed(events):
2825        parsed = _model_update_event_parts(event, width=240, compact=True)
2826        if not parsed:
2827            continue
2828        label, text, _clock = parsed
2829        if label == "DONE":
2830            continue
2831        piece = _one_line(f"{label.lower()} {text}", 180)
2832        if piece in seen:
2833            continue
2834        seen.add(piece)
2835        outcomes.append(piece)
2836        if len(outcomes) >= limit:
2837            break
2838    return outcomes
2839
2840
2841def _reply_to_workspace_chat(message: str) -> Any:
2842    from nipux_cli.llm import OpenAIChatLLM
2843
2844    db, config = _db()
2845    try:
2846        jobs = db.list_jobs()[:12]
2847        job_dossier = _workspace_chat_job_dossier(db, jobs)
2848        workspace_events = _workspace_chat_events()[-12:]
2849        history_lines = [
2850            f"- {event.get('event_type')} {event.get('title')}: {_one_line(event.get('body') or '', 220)}"
2851            for event in workspace_events
2852        ]
2853        messages = [
2854            {
2855                "role": "system",
2856                "content": (
2857                    "You are Nipux, the workspace chat model for a generic long-running agent CLI. "
2858                    "Your job is to help the operator create, start, inspect, pause, resume, and steer worker jobs. "
2859                    "You know the CLI concepts: jobs are long-running workers; artifacts are saved outputs; outcomes summarize durable progress; "
2860                    "the updates page shows durable worker outcomes; the jobs page shows state, outputs, tasks, memory, findings, sources, experiments, and cost. "
2861                    "Answer job-status questions from the job dossier. Mention concrete outputs, tasks, measurements, sources, blockers, and next branches when present. "
2862                    "When the operator asks you to do new work, explain that Nipux will spin up a worker job; the harness will create the job from plain language. "
2863                    "Keep replies concise, concrete, and operator-facing. Do not expose hidden chain-of-thought."
2864                ),
2865            },
2866            {
2867                "role": "user",
2868                "content": (
2869                    f"Model: {config.model.model}\n"
2870                    f"Endpoint: {config.model.base_url}\n"
2871                    f"Tools: browser={config.tools.browser}, web={config.tools.web}, CLI={config.tools.shell}, files={config.tools.files}\n\n"
2872                    f"Job dossier:\n{job_dossier}\n\n"
2873                    f"Recent workspace chat:\n{chr(10).join(history_lines) or 'None yet.'}\n\n"
2874                    f"Operator message:\n{message}"
2875                ),
2876            },
2877        ]
2878    finally:
2879        db.close()
2880    return OpenAIChatLLM(config.model).complete_response(messages=messages)
2881
2882
2883def _handle_chat_control_intent(job_id: str, line: str, *, quiet: bool = False) -> tuple[bool, str] | None:
2884    return _controller_handle_chat_control_intent(job_id, line, deps=_chat_controller_deps(), quiet=quiet)
2885
2886
2887def _maybe_spawn_job_from_chat(job_id: str, message: str, *, quiet: bool = False) -> str:
2888    return _controller_maybe_spawn_job_from_chat(job_id, message, deps=_chat_controller_deps(), quiet=quiet)
2889
2890
2891def _queue_chat_note(job_id: str, message: str, *, mode: str = "steer", quiet: bool = False) -> None:
2892    _controller_queue_chat_note(job_id, message, deps=_chat_controller_deps(), mode=mode, quiet=quiet)
2893
2894
2895def _chat_controller_deps() -> ChatControllerDeps:
2896    return ChatControllerDeps(
2897        db_factory=_db,
2898        reply_fn=_reply_to_chat,
2899        create_job=_create_job,
2900        write_shell_state=_write_shell_state,
2901        start_daemon=_start_worker_from_chat_context,
2902        capture_command=_capture_chat_command,
2903        compact_command_output=_compact_command_output,
2904        friendly_error_text=_friendly_error_text,
2905    )
2906
2907
2908def _chat_command_deps() -> ChatCommandDeps:
2909    return ChatCommandDeps(
2910        db_factory=_db,
2911        jobs=cmd_jobs,
2912        history=cmd_history,
2913        events=cmd_events,
2914        logs=cmd_logs,
2915        updates=cmd_updates,
2916        artifacts=cmd_artifacts,
2917        artifact=cmd_artifact,
2918        lessons=cmd_lessons,
2919        findings=cmd_findings,
2920        tasks=cmd_tasks,
2921        roadmap=cmd_roadmap,
2922        experiments=cmd_experiments,
2923        sources=cmd_sources,
2924        memory=cmd_memory,
2925        metrics=cmd_metrics,
2926        activity=cmd_activity,
2927        digest=cmd_digest,
2928        status=cmd_status,
2929        usage=cmd_usage,
2930        handle_setting=_handle_chat_setting_command,
2931        doctor=cmd_doctor,
2932        init=cmd_init,
2933        health=cmd_health,
2934        start=_start_worker_from_chat_namespace,
2935        ensure_job_runnable=_ensure_job_runnable,
2936        run=cmd_run,
2937        restart=cmd_restart,
2938        work=cmd_work,
2939        pause=cmd_pause,
2940        resume=cmd_resume,
2941        cancel=cmd_cancel,
2942        queue_note=_queue_chat_note,
2943        create_job=_create_job,
2944        focus=cmd_focus,
2945        delete=cmd_delete,
2946    )
2947
2948
2949def _reply_to_chat(job_id: str, message: str) -> Any:
2950    from nipux_cli.llm import OpenAIChatLLM
2951
2952    db, config = _db()
2953    try:
2954        job = db.get_job(job_id)
2955        messages = _build_chat_messages(db, job, message)
2956    finally:
2957        db.close()
2958    return OpenAIChatLLM(config.model).complete_response(messages=messages)
2959
2960
2961def cmd_shell(args: argparse.Namespace) -> None:
2962    _install_readline_history()
2963    _print_shell_header()
2964    print()
2965    if args.status:
2966        _print_shell_status(limit=args.limit, chars=args.chars)
2967    while True:
2968        try:
2969            line = input(_shell_prompt())
2970        except EOFError:
2971            print()
2972            return
2973        except KeyboardInterrupt:
2974            print()
2975            continue
2976        if not _run_shell_line(line):
2977            return
2978
2979
2980def _print_shell_header() -> None:
2981    print(NIPUX_BANNER)
2982    print(_rule("="))
2983    print(_shell_summary())
2984    print("Type 'chat' to talk, 'history' or 'artifacts' to inspect output, or plain text to steer.")
2985    print("Trace output is observable state and tool I/O, not hidden chain-of-thought.")
2986    print(_rule("="))
2987
2988
2989def _shell_summary() -> str:
2990    db, config = _db()
2991    try:
2992        daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
2993        job_id = _default_job_id(db)
2994        if not job_id:
2995            focus = "no jobs"
2996        else:
2997            job = db.get_job(job_id)
2998            state = _job_display_state(job, bool(daemon["running"]))
2999            focus = f"{job['title']} [job {state} | worker {_worker_label(job, bool(daemon['running']))}]"
3000        daemon_text = "running" if daemon["running"] else "stopped"
3001        return f"daemon: {daemon_text} | model: {config.model.model} | focus: {focus}"
3002    finally:
3003        db.close()
3004
3005
3006def _shell_prompt() -> str:
3007    db, _ = _db()
3008    try:
3009        job_id = _default_job_id(db)
3010        if not job_id:
3011            return "nipux> "
3012        job = db.get_job(job_id)
3013        title = str(job.get("title") or job_id).strip()[:22]
3014        daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
3015        worker = _worker_label(job, bool(daemon["running"]))
3016        return f"nipux[{title}:{worker}]> "
3017    except Exception:
3018        return "nipux> "
3019    finally:
3020        db.close()
3021
3022
3023def _install_readline_history() -> None:
3024    try:
3025        import atexit
3026        import readline
3027    except ImportError:
3028        return
3029    config = load_config()
3030    config.ensure_dirs()
3031    history_path = config.runtime.home / "shell_history"
3032    try:
3033        readline.read_history_file(history_path)
3034    except OSError:
3035        pass
3036    atexit.register(readline.write_history_file, history_path)
3037
3038
3039def _print_shell_status(*, limit: int, chars: int) -> None:
3040    db, config = _db()
3041    try:
3042        state = collect_dashboard_state(db, config, limit=limit)
3043        print(render_dashboard(state, width=_terminal_width(), chars=chars), end="")
3044        print()
3045    finally:
3046        db.close()
3047
3048
3049def _print_shell_help() -> None:
3050    _render_shell_help(rule=_rule)
3051
3052
3053def _run_shell_line(line: str) -> bool:
3054    line = line.strip()
3055    if not line:
3056        return True
3057    if line in {"exit", "quit", ":q"}:
3058        return False
3059    if line in {"help", "?", "commands"}:
3060        _print_shell_help()
3061        return True
3062    if line == "clear":
3063        print("\033[2J\033[H", end="")
3064        return True
3065    try:
3066        tokens = shlex.split(line)
3067    except ValueError as exc:
3068        print(f"parse error: {exc}")
3069        return True
3070    if tokens and tokens[0] == "nipux":
3071        tokens = tokens[1:]
3072    if not tokens:
3073        return True
3074    natural = natural_command_for(" ".join(tokens))
3075    if natural:
3076        tokens = [natural]
3077    if tokens[0] == "ls":
3078        tokens[0] = "jobs"
3079    if tokens[0] == "new":
3080        tokens[0] = "create"
3081    if tokens[0] == "focus" and len(tokens) > 1 and tokens[1].lower() in {"on", "more", "only"}:
3082        _steer_default_job(line)
3083        return True
3084    if tokens[0] not in SHELL_COMMAND_NAMES and tokens[0] not in SHELL_BUILTINS:
3085        _steer_default_job(line)
3086        return True
3087    try:
3088        parser = build_parser()
3089        parsed = parser.parse_args(tokens)
3090        if parsed.func is cmd_shell:
3091            print("already in nipux shell")
3092            return True
3093        parsed.func(parsed)
3094    except SystemExit as exc:
3095        code = exc.code if isinstance(exc.code, int) else 1
3096        if code:
3097            print(f"command exited with status {code}")
3098    return True
3099
3100
3101def _steer_default_job(message: str) -> None:
3102    db, _ = _db()
3103    try:
3104        job_id = _default_job_id(db)
3105        if not job_id:
3106            print('No focused job. Create one first, or run: create "objective"')
3107            return
3108        job = db.get_job(job_id)
3109        entry = db.append_operator_message(job_id, message, source="shell")
3110        print(f"waiting for {job['title']}: {entry['message']}")
3111        print("Waiting for the next worker step.")
3112    finally:
3113        db.close()
3114
3115
3116def build_parser() -> argparse.ArgumentParser:
3117    return build_arg_parser(
3118        handlers={
3119            "init": cmd_init,
3120            "update": cmd_update,
3121            "uninstall": cmd_uninstall,
3122            "create": cmd_create,
3123            "jobs": cmd_jobs,
3124            "focus": cmd_focus,
3125            "rename": cmd_rename,
3126            "delete": cmd_delete,
3127            "chat": cmd_chat,
3128            "shell": cmd_shell,
3129            "steer": cmd_steer,
3130            "pause": cmd_pause,
3131            "resume": cmd_resume,
3132            "cancel": cmd_cancel,
3133            "status": cmd_status,
3134            "health": cmd_health,
3135            "history": cmd_history,
3136            "events": cmd_events,
3137            "dashboard": cmd_dashboard,
3138            "start": cmd_start,
3139            "stop": cmd_stop,
3140            "restart": cmd_restart,
3141            "browser_dashboard": cmd_browser_dashboard,
3142            "autostart": cmd_autostart,
3143            "service": cmd_service,
3144            "artifacts": cmd_artifacts,
3145            "artifact": cmd_artifact,
3146            "lessons": cmd_lessons,
3147            "learn": cmd_learn,
3148            "findings": cmd_findings,
3149            "tasks": cmd_tasks,
3150            "roadmap": cmd_roadmap,
3151            "experiments": cmd_experiments,
3152            "sources": cmd_sources,
3153            "memory": cmd_memory,
3154            "metrics": cmd_metrics,
3155            "usage": cmd_usage,
3156            "logs": cmd_logs,
3157            "activity": cmd_activity,
3158            "updates": cmd_updates,
3159            "watch": cmd_watch,
3160            "run_one": cmd_run_one,
3161            "work": cmd_work,
3162            "run": cmd_run,
3163            "digest": cmd_digest,
3164            "daily_digest": cmd_daily_digest,
3165            "daemon": cmd_daemon,
3166            "doctor": cmd_doctor,
3167        },
3168        version=__version__,
3169        default_context_length=DEFAULT_CONTEXT_LENGTH,
3170    )
3171
3172
3173def main(argv: list[str] | None = None) -> None:
3174    argv = sys.argv[1:] if argv is None else argv
3175    try:
3176        if not argv:
3177            cmd_home(argparse.Namespace(history_limit=12))
3178            return
3179        parser = build_parser()
3180        args = parser.parse_args(argv)
3181        args.func(args)
3182    except KeyboardInterrupt:
3183        print()
3184        return
3185
3186
3187if __name__ == "__main__":
3188    main()
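The dispatch order in `_run_shell_line` above can be sketched in isolation. This is a minimal standalone approximation, not the module's code: `KNOWN_COMMANDS` is a hypothetical stand-in for `SHELL_COMMAND_NAMES` and `SHELL_BUILTINS` (whose contents are not shown in this file), the `natural_command_for` rewrite is left out, and the function returns a label instead of calling a handler.

```python
import shlex

# Hypothetical stand-in for SHELL_COMMAND_NAMES / SHELL_BUILTINS.
KNOWN_COMMANDS = {"jobs", "create", "focus", "status", "history", "artifacts"}
# The two aliases that _run_shell_line rewrites before dispatch.
ALIASES = {"ls": "jobs", "new": "create"}


def classify_shell_line(line: str) -> tuple[str, list[str]]:
    """Return (action, tokens) in the same order _run_shell_line checks them."""
    line = line.strip()
    if not line:
        return "noop", []
    # Literal builtins are matched before any tokenizing happens.
    if line in {"exit", "quit", ":q"}:
        return "exit", []
    if line in {"help", "?", "commands"}:
        return "help", []
    if line == "clear":
        return "clear", []
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes are reported, and the shell keeps running.
        return "parse_error", []
    # A leading "nipux" is dropped so pasted CLI invocations still work.
    if tokens and tokens[0] == "nipux":
        tokens = tokens[1:]
    if not tokens:
        return "noop", []
    tokens[0] = ALIASES.get(tokens[0], tokens[0])
    # "focus on X" reads as guidance for the worker, not as the focus command.
    if tokens[0] == "focus" and len(tokens) > 1 and tokens[1].lower() in {"on", "more", "only"}:
        return "steer", tokens
    # Anything that is not a known command becomes a steering note.
    if tokens[0] not in KNOWN_COMMANDS:
        return "steer", tokens
    return "command", tokens
```

Under these assumptions, plain text such as `look at the failing test` is classified as a steering note, while `ls` and `nipux new "..."` resolve to the `jobs` and `create` commands.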
nipux_cli/cli_help.py 92 lines
   1"""Help text and static branding for the Nipux command console."""
   2
   3from __future__ import annotations
   4
   5from typing import Callable
   6
   7
   8NIPUX_BANNER = r"""
   9 _   _ _                  ____ _     ___
  10| \ | (_)_ __  _   ___  _/ ___| |   |_ _|
  11|  \| | | '_ \| | | \ \/ / |   | |    | |
  12| |\  | | |_) | |_| |>  <| |___| |___ | |
  13|_| \_|_| .__/ \__,_/_/\_\\____|_____|___|
  14        |_|
  15""".strip("\n")
  16
  17
  18def print_shell_help(*, rule: Callable[[str], str]) -> None:
  19    print(NIPUX_BANNER)
  20    print(rule("="))
  21    _print_group(
  22        "Jobs",
  23        (
  24            'create "objective" --title TITLE',
  25            "ls",
  26            "focus [JOB_TITLE]",
  27            "rename JOB_TITLE --title NEW_TITLE",
  28            "delete JOB_TITLE",
  29            "chat [JOB_TITLE]",
  30            "steer [--job JOB_TITLE] MESSAGE",
  31            "pause [JOB_TITLE] [note...]",
  32            "resume [JOB_TITLE]",
  33            "cancel [JOB_TITLE] [note...]",
  34        ),
  35    )
  36    _print_group(
  37        "Inspect",
  38        (
  39            "status [JOB_TITLE]",
  40            "health",
  41            "history [JOB_TITLE]",
  42            "events [JOB_TITLE] [--follow] [--json]",
  43            "activity [JOB_TITLE] [--follow]",
  44            "updates [JOB_TITLE]",
  45            "outputs [JOB_TITLE] --verbose",
  46            "findings [JOB_TITLE]",
  47            "tasks [JOB_TITLE]",
  48            "roadmap [JOB_TITLE]",
  49            "experiments [JOB_TITLE]",
  50            "sources [JOB_TITLE]",
  51            "memory [JOB_TITLE]",
  52            "metrics [JOB_TITLE]",
  53            "usage [JOB_TITLE]",
  54            "artifacts [JOB_TITLE]",
  55            "artifact QUERY_OR_TITLE",
  56            "lessons [JOB_TITLE]",
  57        ),
  58    )
  59    _print_group(
  60        "Worker",
  61        (
  62            "work [JOB_TITLE] --steps N [--verbose]",
  63            "run [JOB_TITLE] --poll-seconds N",
  64            "start --poll-seconds N",
  65            "restart --poll-seconds N",
  66            "stop  # daemon",
  67            "stop [JOB_TITLE]  # pause job",
  68        ),
  69    )
  70    _print_group(
  71        "System",
  72        (
  73            "learn [--job JOB_TITLE] LESSON",
  74            "digest JOB_TITLE",
  75            "daily-digest",
  76            "update",
  77            "service install|status|uninstall",
  78            "autostart install|status|uninstall",
  79            "dashboard [JOB_TITLE] --no-follow",
  80            "doctor --check-model",
  81            "browser-dashboard --port 4848",
  82            "help",
  83            "exit",
  84        ),
  85    )
  86
  87
  88def _print_group(title: str, commands: tuple[str, ...]) -> None:
  89    print(title)
  90    for command in commands:
  91        print(f"  {command}")
  92    print()
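`print_shell_help` receives its separator through the `rule` callable instead of importing a renderer, which makes the help screen easy to exercise without a terminal. The sketch below shows one way a caller might do that. `plain_rule` and `render_help` are hypothetical names introduced for illustration, and `print_group` is a copy of `_print_group` from the listing above.

```python
import io
from contextlib import redirect_stdout
from typing import Callable


def plain_rule(char: str = "-") -> str:
    """Hypothetical fixed-width rule; the real renderer clamps to terminal width."""
    return char * 40


def print_group(title: str, commands: tuple[str, ...]) -> None:
    # Same layout as _print_group: title, indented commands, then a blank line.
    print(title)
    for command in commands:
        print(f"  {command}")
    print()


def render_help(rule: Callable[[str], str]) -> str:
    """Capture a one-group help screen as a string, e.g. for tests or paging."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print(rule("="))
        print_group("Worker", ("work [JOB_TITLE] --steps N [--verbose]", "stop  # daemon"))
    return buffer.getvalue()
```

Passing a different `rule` (for example one that returns an empty string) changes the framing without touching the command groups.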
nipux_cli/cli_render.py 280 lines
   1"""Reusable text renderers for non-frame CLI commands."""
   2
   3from __future__ import annotations
   4
   5import json
   6import os
   7import shutil
   8import textwrap
   9from pathlib import Path
  10from typing import Any
  11
  12from nipux_cli.event_render import event_display_parts
  13from nipux_cli.tui_event_format import clean_step_summary
  14from nipux_cli.tui_status import job_display_state, worker_label
  15from nipux_cli.tui_style import _accent, _event_badge, _fancy_ui, _muted, _one_line, _status_badge
  16
  17
  18def clip_json(value: Any, limit: int) -> str:
  19    text = json.dumps(value, ensure_ascii=False, indent=2, sort_keys=True)
  20    if len(text) <= limit:
  21        return text
  22    return text[:limit] + f"\n... truncated {len(text) - limit} chars"
  23
  24
  25def print_step(step: dict[str, Any], *, verbose: bool = False, chars: int = 4000) -> None:
  26    tool = step.get("tool_name") or "-"
  27    summary = _one_line(clean_step_summary(step.get("summary") or ""), chars)
  28    error = _one_line(step["error"], chars) if step.get("error") else ""
  29    print(f"step #{step['step_no']} {step['started_at']} {step['status']} {step['kind']} {tool}")
  30    if summary:
  31        print(f"  summary: {summary}")
  32    if error:
  33        print(f"  error: {error}")
  34    output_data = step.get("output") or {}
  35    if not verbose and isinstance(output_data, dict):
  36        artifact_id = output_data.get("artifact_id")
  37        if artifact_id:
  38            print(f"  artifact: {artifact_id} (view with: artifact {artifact_id})")
  39        lesson = output_data.get("lesson") if isinstance(output_data.get("lesson"), dict) else None
  40        if lesson:
  41            print(f"  lesson: {_one_line(lesson.get('lesson') or '', chars)}")
  42        update = output_data.get("update") if isinstance(output_data.get("update"), dict) else None
  43        if update:
  44            print(f"  update: {_one_line(update.get('message') or '', chars)}")
  45        source = output_data.get("source") if isinstance(output_data.get("source"), dict) else None
  46        if source:
  47            print(f"  source: {_one_line(source.get('source') or '', chars)} score={source.get('usefulness_score')}")
  48        if isinstance(output_data.get("findings"), list):
  49            print(f"  findings: {output_data.get('added', 0)} new, {output_data.get('updated', 0)} updated")
  50        checkpoint = output_data.get("auto_checkpoint") if isinstance(output_data.get("auto_checkpoint"), dict) else None
  51        if checkpoint:
  52            print(f"  auto checkpoint: {checkpoint.get('artifact_id')}")
  53    if verbose:
  54        input_data = step.get("input") or {}
  55        if input_data:
  56            print("  input:")
  57            print(clip_json(input_data, chars))
  58        if output_data:
  59            print("  output:")
  60            print(clip_json(output_data, chars))
  61
  62
  63def print_artifact(artifact: dict[str, Any]) -> None:
  64    title = artifact.get("title") or artifact["id"]
  65    print(f"artifact {artifact['created_at']} {artifact['type']} {title}")
  66    print(f"  {artifact['path']}")
  67
  68
  69def print_run(run: dict[str, Any]) -> None:
  70    print(f"run {run['started_at']} {run['status']} {run['id']} {run.get('model') or ''}")
  71    if run.get("error"):
  72        print(f"  error: {run['error']}")
  73
  74
  75def print_wrapped(prefix: str, text: Any, *, width: int, subsequent_indent: str = "") -> None:
  76    content = " ".join(str(text).split())
  77    if not content:
  78        print(prefix.rstrip())
  79        return
  80    available = max(20, min(width, 96) - len(prefix))
  81    wrapped = textwrap.wrap(content, width=available) or [content]
  82    print(prefix + wrapped[0])
  83    for line in wrapped[1:]:
  84        print(subsequent_indent + line)
  85
  86
  87def section_title(title: str, subtitle: str = "") -> str:
  88    text = title.upper()
  89    if subtitle:
  90        text = f"{text} - {_one_line(subtitle, 52)}"
  91    width = min(terminal_width(), 96)
  92    if len(text) >= width - 2:
  93        return text[:width]
  94    if _fancy_ui():
  95        return _accent(f"╭─ {text} " + "─" * max(0, width - len(text) - 4))
  96    return f"{text} " + "-" * max(0, width - len(text) - 1)
  97
  98
  99def print_metric_grid(items: list[tuple[str, Any]]) -> None:
 100    width = min(terminal_width(), 96)
 101    cell_width = 24 if width >= 80 else 18
 102    cells = [f"{label:<12} {value}"[:cell_width].ljust(cell_width) for label, value in items]
 103    columns = max(1, width // (cell_width + 2))  # count the two-space indent and gutters so rows stay within width
 104    for start in range(0, len(cells), columns):
 105        print("  " + "  ".join(cells[start : start + columns]).rstrip())
 106
 107
 108def short_path(path: Path | str, *, max_width: int = 80) -> str:
 109    text = str(path)
 110    home = str(Path.home())
 111    if text.startswith(home + os.sep):
 112        text = "~" + text[len(home) :]
 113    if len(text) <= max_width:
 114        return text
 115    keep = max(12, max_width - 4)
 116    return "..." + text[-keep:]
 117
 118
 119def print_jobs_panel(jobs: list[dict[str, Any]], *, focused_job_id: str, daemon_running: bool) -> None:
 120    print(section_title("Jobs"))
 121    if not jobs:
 122        print("  No jobs yet. Type an objective or use /new OBJECTIVE.")
 123        return
 124    print("  #  job                         state       worker      kind")
 125    for index, item in enumerate(jobs[:8], start=1):
 126        marker = "*" if str(item.get("id")) == focused_job_id else " "
 127        state = job_display_state(item, daemon_running)
 128        worker = worker_label(item, daemon_running)
 129        title = _one_line(item.get("title") or item.get("id") or "job", 27)
 130        print(f"  {marker}{index:<2} {title:<27} {_status_badge(state):<11} {_status_badge(worker):<11} {item.get('kind') or ''}")
 131    if len(jobs) > 8:
 132        print(f"  ... {len(jobs) - 8} more. Use /jobs for the full list.")
 133    print("  switch: /focus JOB_TITLE")
 134
 135
 136def next_operator_action(job: dict[str, Any], daemon_running: bool) -> str:
 137    status = str(job.get("status") or "")
 138    if status == "planning":
 139        return "review the plan, or run when ready"
 140    if status == "cancelled":
 141        return "resume to reopen this job, or delete it"
 142    if status == "paused":
 143        return "resume, then run to continue"
 144    if status in {"queued", "running"} and not daemon_running:
 145        return "run to start background work"
 146    if status in {"queued", "running"} and daemon_running:
 147        return "daemon is active; live steps will stream here"
 148    if status == "completed":
 149        return "inspect history or artifacts"
 150    if status == "failed":
 151        return "resume, then run one worker step to test recovery"
 152    return ""
 153
 154
 155def important_startup_events(events: list[dict[str, Any]], *, limit: int) -> list[dict[str, Any]]:
 156    if len(events) <= limit:
 157        return events
 158    important_types = {
 159        "operator_message",
 160        "agent_message",
 161        "artifact",
 162        "finding",
 163        "task",
 164        "experiment",
 165        "lesson",
 166        "reflection",
 167        "error",
 168        "compaction",
 169    }
 170    selected: list[dict[str, Any]] = []
 171    for event in reversed(events):
 172        if event.get("event_type") in important_types:
 173            selected.append(event)
 174        if len(selected) >= limit:
 175            break
 176    if len(selected) < limit:
 177        for event in reversed(events):
 178            if event not in selected:
 179                selected.append(event)
 180            if len(selected) >= limit:
 181                break
 182    selected.sort(key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
 183    return selected
 184
 185
 186def print_event_card(event: dict[str, Any], *, chars: int, artifact_indexes: dict[str, int] | None = None) -> None:
 187    when, label, detail, access = event_display_parts(event, chars=chars, full=False)
 188    artifact_indexes = artifact_indexes or {}
 189    artifact_index = artifact_indexes.get(str(event.get("ref_id") or ""))
 190    if artifact_index and event.get("event_type") == "artifact":
 191        access = f"open: /artifact {artifact_index}"
 192    print(f"  {_event_badge(label):<8} {_muted(when):<16} {_one_line(detail, chars)}")
 193    if access:
 194        print(f"  {'':<8} {'':<16} {access}")
 195
 196
 197def public_event(event: dict[str, Any]) -> dict[str, Any]:
 198    public = dict(event)
 199    public.pop("metadata_json", None)
 200    return public
 201
 202
 203def print_event_details(event: dict[str, Any], *, chars: int) -> None:
 204    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
 205    if not metadata:
 206        return
 207    compact = {
 208        key: value
 209        for key, value in metadata.items()
 210        if key not in {"input", "output"} and value not in (None, "", [], {})
 211    }
 212    if compact:
 213        print(f"     meta: {_one_line(json.dumps(compact, ensure_ascii=False, sort_keys=True, default=str), chars)}")
 214    if isinstance(metadata.get("input"), dict):
 215        print(f"     input: {_one_line(json.dumps(metadata['input'], ensure_ascii=False, sort_keys=True, default=str), chars)}")
 216    if isinstance(metadata.get("output"), dict):
 217        print(f"     output: {_one_line(json.dumps(metadata['output'], ensure_ascii=False, sort_keys=True, default=str), chars)}")
 218
 219
 220def step_line(step: dict[str, Any], *, chars: int = 180) -> str:
 221    tool = step.get("tool_name") or step.get("kind") or "-"
 222    summary = clean_step_summary(step.get("summary") or step.get("error") or "-")
 223    error = " ERROR" if step.get("error") else ""
 224    return f"#{step['step_no']:<4} {step['status']:<9} {tool:<18} {_one_line(summary, chars)}{error}"
 225
 226
 227def terminal_width() -> int:
 228    return shutil.get_terminal_size((120, 40)).columns
 229
 230
 231def rule(char: str = "-", width: int | None = None) -> str:
 232    return char * min(width or terminal_width(), 96)
 233
 234
 235def json_default(value: Any) -> str:
 236    return str(value)
 237
 238
 239def daemon_state_line(lock: dict[str, Any]) -> str:
 240    metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
 241    if lock.get("running"):
 242        pid = metadata.get("pid") or "unknown"
 243        stale = " stale-runtime" if lock.get("stale") else ""
 244        return f"running pid={pid}{stale}"
 245    return "ready when work starts"
 246
 247
 248def daemon_event_line(event: dict[str, Any], *, chars: int, job_titles: dict[str, str] | None = None) -> str:
 249    at = str(event.get("at") or "?")
 250    name = str(event.get("event") or "?")
 251    pieces = []
 252    job_titles = job_titles or {}
 253    for key in ("status", "tool", "job_id", "step_id", "error_type", "detail", "error"):
 254        value = event.get(key)
 255        if value not in (None, ""):
 256            label = key
 257            if key == "job_id":
 258                value = job_titles.get(str(value), value)
 259            pieces.append(f"{label}={value}")
 260    suffix = " ".join(pieces)
 261    return _one_line(f"{at} {name} {suffix}".strip(), chars)
 262
 263
 264def job_ref_text(value: Any) -> str | None:
 265    if value is None:
 266        return None
 267    if isinstance(value, list):
 268        text = " ".join(str(item) for item in value)
 269    else:
 270        text = str(value)
 271    text = " ".join(text.split())
 272    return text or None
 273
 274
 275def note_text(value: Any) -> str:
 276    if value is None:
 277        return ""
 278    if isinstance(value, list):
 279        return " ".join(str(item) for item in value).strip()
 280    return str(value).strip()
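The two text helpers that close the listing above differ in one detail that is easy to miss: `job_ref_text` collapses every run of whitespace and returns `None` for empty input, while `note_text` only trims the ends and always returns a string. The following is a minimal standalone sketch of that difference, copied in spirit from the listing; the sample strings are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def job_ref_text(value: Any) -> str | None:
    """Collapse all whitespace runs; return None when nothing is left."""
    if value is None:
        return None
    text = " ".join(str(item) for item in value) if isinstance(value, list) else str(value)
    text = " ".join(text.split())
    return text or None


def note_text(value: Any) -> str:
    """Trim leading and trailing whitespace only; inner spacing is kept."""
    if value is None:
        return ""
    if isinstance(value, list):
        return " ".join(str(item) for item in value).strip()
    return str(value).strip()


# Hypothetical argv-style input with irregular spacing.
words = ["  nightly", "benchmark   run "]
print(repr(job_ref_text(words)))
print(repr(note_text(words)))
print(repr(job_ref_text("   ")))
print(repr(note_text(None)))
```

A job reference is presumably normalized more aggressively because it is later compared against job titles, whereas a note is stored as the operator typed it.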
nipux_cli/cli_state.py 126 lines
   1"""Persistent CLI focus state and job lookup helpers."""
   2
   3from __future__ import annotations
   4
   5import hashlib
   6import json
   7from datetime import datetime, timezone
   8from pathlib import Path
   9from typing import Any
  10
  11from nipux_cli.config import AppConfig, load_config
  12from nipux_cli.db import AgentDB
  13
  14
  15def default_job_id(db: AgentDB) -> str | None:
  16    configured = configured_focus_job_id(db)
  17    if configured:
  18        return configured
  19    jobs = db.list_jobs()
  20    for status in ("running", "queued", "planning", "paused", "failed", "completed"):
  21        for job in jobs:
  22            if job.get("status") == status:
  23                return str(job["id"])
  24    return str(jobs[0]["id"]) if jobs else None
  25
  26
  27def configured_focus_job_id(db: AgentDB) -> str | None:
  28    job_id = read_shell_state().get("focus_job_id")
  29    if not isinstance(job_id, str) or not job_id:
  30        return None
  31    try:
  32        db.get_job(job_id)
  33    except KeyError:
  34        return None
  35    return job_id
  36
  37
  38def find_job(db: AgentDB, query: str) -> dict[str, Any] | None:
  39    needle = " ".join(query.split()).lower()
  40    if not needle:
  41        return None
  42    jobs = db.list_jobs()
  43    for job in jobs:
  44        if str(job["id"]).lower() == needle:
  45            return job
  46    for job in jobs:
  47        if str(job.get("title") or "").lower() == needle:
  48            return job
  49    for job in jobs:
  50        if needle in str(job.get("title") or "").lower():
  51            return job
  52    return None
  53
  54
  55def shell_state_path() -> Path:
  56    config = load_config()
  57    config.ensure_dirs()
  58    return config.runtime.home / "shell_state.json"
  59
  60
  61def read_shell_state() -> dict[str, Any]:
  62    path = shell_state_path()
  63    if not path.exists():
  64        return {}
  65    try:
  66        parsed = json.loads(path.read_text(encoding="utf-8"))
  67    except (OSError, json.JSONDecodeError):
  68        return {}
  69    return parsed if isinstance(parsed, dict) else {}
  70
  71
  72def write_shell_state(patch: dict[str, Any]) -> None:
  73    state = read_shell_state()
  74    state.update(patch)
  75    shell_state_path().write_text(
  76        json.dumps(state, ensure_ascii=False, indent=2, sort_keys=True) + "\n", encoding="utf-8"
  77    )
  78
  79
  80def setup_completed() -> bool:
  81    return bool(read_shell_state().get("setup_completed"))
  82
  83
  84def mark_setup_completed() -> None:
  85    write_shell_state({"setup_completed": True})
  86
  87
  88def model_setup_fingerprint(config: AppConfig | None = None) -> str:
  89    config = config or load_config()
  90    key_hash = hashlib.sha256(config.model.api_key.encode("utf-8")).hexdigest() if config.model.api_key else ""
  91    payload = {
  92        "model": config.model.model,
  93        "base_url": config.model.base_url,
  94        "api_key_env": config.model.api_key_env,
  95        "api_key_hash": key_hash,
  96    }
  97    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
  98
  99
 100def model_setup_verified(config: AppConfig | None = None) -> bool:
 101    state = read_shell_state()
 102    marker = state.get("model_setup_verified")
 103    if not isinstance(marker, dict) or not marker.get("ok"):
 104        return False
 105    return marker.get("fingerprint") == model_setup_fingerprint(config)
 106
 107
 108def mark_model_setup_verified(config: AppConfig | None = None) -> None:
 109    config = config or load_config()
 110    write_shell_state(
 111        {
 112            "setup_completed": True,
 113            "model_setup_verified": {
 114                "ok": True,
 115                "fingerprint": model_setup_fingerprint(config),
 116                "checked_at": datetime.now(timezone.utc).isoformat(),
 117                "model": config.model.model,
 118                "base_url": config.model.base_url,
 119                "api_key_env": config.model.api_key_env,
 120            },
 121        }
 122    )
 123
 124
 125def clear_model_setup_verified() -> None:
 126    write_shell_state({"model_setup_verified": {}})
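The verification marker in `cli_state.py` is only trusted while the model settings that produced it are unchanged. The sketch below reproduces that idea without `AppConfig` or the shell state file: `setup_fingerprint` and `is_verified` are hypothetical stand-ins for `model_setup_fingerprint` and `model_setup_verified`, and the model name, URL, environment variable name, and keys are placeholder values.

```python
from __future__ import annotations

import hashlib
import json


def setup_fingerprint(model: str, base_url: str, api_key_env: str, api_key: str) -> str:
    """Hash the model settings; the raw key never appears in the payload."""
    key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest() if api_key else ""
    payload = {
        "model": model,
        "base_url": base_url,
        "api_key_env": api_key_env,
        "api_key_hash": key_hash,
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def is_verified(marker: object, current_fingerprint: str) -> bool:
    """A marker counts only if it is ok and matches the current settings."""
    if not isinstance(marker, dict) or not marker.get("ok"):
        return False
    return marker.get("fingerprint") == current_fingerprint


# Placeholder settings, not taken from any real configuration.
before = setup_fingerprint("local-model", "http://localhost:8000/v1", "EXAMPLE_API_KEY", "demo-key-1")
marker = {"ok": True, "fingerprint": before}

# Rotating the key changes the fingerprint, so the stored marker goes stale.
after = setup_fingerprint("local-model", "http://localhost:8000/v1", "EXAMPLE_API_KEY", "demo-key-2")
print(is_verified(marker, before))
print(is_verified(marker, after))
```

Hashing the key before it enters the payload means the state file can record that a given key was verified without storing the key itself.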
nipux_cli/compression.py 246 lines
   1"""Deterministic rolling memory summaries for long-running jobs."""
   2
   3from __future__ import annotations
   4
   5from nipux_cli.db import AgentDB
   6from nipux_cli.memory_graph import rank_memory_nodes
   7from nipux_cli.operator_context import active_prompt_operator_entries
   8
   9
  10def _clip_text(value: object, limit: int) -> str:
  11    text = " ".join(str(value or "").split())
  12    if len(text) <= limit:
  13        return text
  14    return text[: max(0, limit - 3)].rstrip() + "..."
  15
  16
  17def refresh_memory_index(db: AgentDB, job_id: str, *, max_steps: int = 8, max_artifacts: int = 8) -> str:
  18    """Write a compact, artifact-referenced job memory entry.
  19
  20    This is deliberately deterministic. A local model can later improve the
  21    prose, but the daemon should always have a cheap compaction path that runs
  22    after every step and survives model failures.
  23    """
  24
  25    job = db.get_job(job_id)
  26    steps = db.list_steps(job_id=job_id)[-max_steps:]
  27    artifacts = db.list_artifacts(job_id, limit=max_artifacts)
  28    artifact_refs = [artifact["id"] for artifact in artifacts]
  29    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  30    operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
  31    active_operator = [
  32        entry
  33        for entry in active_prompt_operator_entries(operator_messages)
  34        if str(entry.get("mode") or "steer") in {"steer", "follow_up"}
  35    ][-5:]
  36    operator_notes = [
  37        entry for entry in operator_messages
  38        if isinstance(entry, dict)
  39        and str(entry.get("mode") or "steer") == "note"
  40    ][-3:]
  41
  42    lines = [
  43        f"Job lifecycle status: {job['status']}",
  44        f"Objective: {job['objective']}",
  45        "",
  46        "Active operator context:",
  47    ]
  48    if not active_operator and not operator_notes:
  49        lines.append("- none")
  50    for entry in active_operator:
  51        lines.append(
  52            f"- {entry.get('mode') or 'steer'} {entry.get('event_id') or ''}: "
  53            f"{_clip_text(entry.get('message') or '', 300)}"
  54        )
  55    for entry in operator_notes:
  56        lines.append(f"- note {entry.get('event_id') or ''}: {_clip_text(entry.get('message') or '', 300)}")
  57
  58    lines.extend([
  59        "",
  60        "Recent steps:",
  61    ])
  62    if not steps:
  63        lines.append("- none")
  64    for step in steps:
  65        tool = f" tool={step['tool_name']}" if step.get("tool_name") else ""
  66        summary = step.get("summary") or step.get("error") or ""
  67        lines.append(f"- #{step['step_no']} {step['kind']} {step['status']}{tool}: {_clip_text(summary, 280)}")
  68
  69    lines.extend(["", "Recent artifacts:"])
  70    if not artifacts:
  71        lines.append("- none")
  72    for artifact in artifacts:
  73        title = artifact.get("title") or artifact["id"]
  74        summary = artifact.get("summary") or ""
  75        lines.append(f"- {artifact['id']} {_clip_text(title, 120)} ({artifact['type']}): {_clip_text(summary, 240)}")
  76
  77    tasks = _metadata_list(metadata, "task_queue")
  78    findings = _metadata_list(metadata, "finding_ledger")
  79    sources = _metadata_list(metadata, "source_ledger")
  80    experiments = _metadata_list(metadata, "experiment_ledger")
  81    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
  82    memory_graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
  83    memory_nodes = _metadata_list(memory_graph, "nodes")
  84    memory_edges = _metadata_list(memory_graph, "edges")
  85    pending_measurement = (
  86        metadata.get("pending_measurement_obligation")
  87        if isinstance(metadata.get("pending_measurement_obligation"), dict)
  88        and metadata.get("pending_measurement_obligation")
  89        and not metadata.get("pending_measurement_obligation", {}).get("resolved_at")
  90        else {}
  91    )
  92
  93    lines.extend(["", "Durable progress ledgers:"])
  94    lines.append(
  95        "- "
  96        + ", ".join(
  97            [
  98                f"tasks={len(tasks)}",
  99                f"findings={len(findings)}",
 100                f"sources={len(sources)}",
 101                f"experiments={len(experiments)}",
 102                f"memory_nodes={len(memory_nodes)}",
 103                f"roadmap={'yes' if roadmap else 'no'}",
 104            ]
 105        )
 106    )
 107    for node in rank_memory_nodes(memory_nodes, limit=4):
 108        lines.append(
 109            "- memory "
 110            f"{node.get('status') or 'active'} "
 111            f"{node.get('kind') or 'fact'} "
 112            f"{_clip_text(node.get('title') or node.get('key') or '', 120)}"
 113        )
 114    if memory_edges:
 115        lines.append(f"- memory_links={len(memory_edges)}")
 116    for task in _rank_tasks(tasks)[:4]:
 117        lines.append(
 118            "- task "
 119            f"{task.get('status') or 'open'} "
 120            f"{_clip_text(task.get('title') or '', 120)} "
 121            f"contract={task.get('output_contract') or '?'}"
 122        )
 123    for experiment in experiments[-3:]:
 124        metric = ""
 125        if experiment.get("metric_value") not in (None, ""):
 126            metric = (
 127                f" metric={experiment.get('metric_name') or 'value'}="
 128                f"{experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
 129            )
 130        lines.append(
 131            "- experiment "
 132            f"{experiment.get('status') or 'planned'} "
 133            f"{_clip_text(experiment.get('title') or '', 120)}{metric}"
 134        )
 135    if pending_measurement:
 136        candidates = pending_measurement.get("metric_candidates")
 137        candidate_text = "; ".join(str(item) for item in candidates[:3]) if isinstance(candidates, list) else ""
 138        lines.append(
 139            "- pending_measurement "
 140            f"step=#{pending_measurement.get('source_step_no') or '?'} "
 141            f"tool={pending_measurement.get('tool') or '?'} "
 142            f"{_clip_text(candidate_text or pending_measurement.get('summary') or '', 220)}"
 143        )
 144    for finding in findings[-3:]:
 145        lines.append(f"- finding {_clip_text(finding.get('name') or finding.get('title') or '', 140)}")
 146    for source in sources[-3:]:
 147        score = source.get("usefulness_score")
 148        lines.append(f"- source {_clip_text(source.get('source') or '', 140)} score={score if score is not None else '?'}")
 149    if roadmap:
 150        lines.append(
 151            "- roadmap "
 152            f"{roadmap.get('status') or 'planned'} "
 153            f"{_clip_text(roadmap.get('title') or 'Roadmap', 140)} "
 154            f"current={_clip_text(roadmap.get('current_milestone') or '', 120)}"
 155        )
 156
 157    usage = db.job_token_usage(job_id)
 158    if int(usage.get("calls") or 0) > 0:
 159        lines.extend(["", "Model usage:"])
 160        latest_prompt = _compact_count(usage.get("latest_prompt_tokens"))
 161        latest_total = _compact_count(usage.get("latest_total_tokens"))
 162        context_length = _first_positive_int(usage.get("latest_context_length"), usage.get("context_length"))
 163        context_fraction = _context_fraction(usage, context_length=context_length)
 164        lines.append(
 165            "- "
 166            + ", ".join(
 167                [
 168                    f"calls={usage.get('calls') or 0}",
 169                    f"total_tokens={_compact_count(usage.get('total_tokens'))}",
 170                    f"output_tokens={_compact_count(usage.get('completion_tokens'))}",
 171                    f"latest_context={latest_prompt}",
 172                    f"latest_total={latest_total}",
 173                    f"estimated_calls={usage.get('estimated_calls') or 0}",
 174                ]
 175            )
 176        )
 177        if context_fraction >= 0.65:
 178            lines.append(
 179                "- context_pressure "
 180                f"latest_context={latest_prompt}"
 181                + (f"/{_compact_count(context_length)}" if context_length else "")
 182                + f" ({context_fraction:.0%}); prefer compact ledgers, artifacts, and decisions over raw history."
 183            )
 184
 185    return db.upsert_memory(
 186        job_id=job_id,
 187        key="rolling_state",
 188        summary="\n".join(lines).strip(),
 189        artifact_refs=artifact_refs,
 190    )
 191
 192
 193def _metadata_list(metadata: dict, key: str) -> list[dict]:
 194    values = metadata.get(key)
 195    if not isinstance(values, list):
 196        return []
 197    return [value for value in values if isinstance(value, dict)]
 198
 199
 200def _rank_tasks(tasks: list[dict]) -> list[dict]:
 201    status_rank = {"active": 0, "open": 1, "blocked": 2, "validating": 3, "done": 4, "skipped": 5}
 202    return sorted(
 203        tasks,
 204        key=lambda task: (
 205            status_rank.get(str(task.get("status") or "open"), 9),
 206            -int(task.get("priority") or 0),
 207            str(task.get("title") or ""),
 208        ),
 209    )
 210
 211
 212def _compact_count(value: object) -> str:
 213    try:
 214        number = int(float(value or 0))
 215    except (TypeError, ValueError):
 216        number = 0
 217    if number >= 1_000_000:
 218        return f"{number / 1_000_000:.1f}M"
 219    if number >= 1_000:
 220        return f"{number / 1_000:.1f}K"
 221    return str(number)
 222
 223
 224def _context_fraction(usage: dict, *, context_length: int) -> float:
 225    raw_fraction = usage.get("latest_context_fraction") or usage.get("context_fraction")
 226    try:
 227        fraction = float(raw_fraction)
 228    except (TypeError, ValueError):
 229        fraction = 0.0
 230    if fraction > 0:
 231        return fraction
 232    latest_prompt = _first_positive_int(usage.get("latest_prompt_tokens"), usage.get("prompt_tokens"))
 233    if context_length <= 0 or latest_prompt <= 0:
 234        return 0.0
 235    return latest_prompt / context_length
 236
 237
 238def _first_positive_int(*values: object) -> int:
 239    for value in values:
 240        try:
 241            number = int(float(value or 0))
 242        except (TypeError, ValueError):
 243            continue
 244        if number > 0:
 245            return number
 246    return 0
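The `context_pressure` line in the rolling memory summary appears once the latest prompt fills at least 65% of the context window. The sketch below shows how that fraction and the compact token counts could be derived. It is a simplified standalone version of `_compact_count` and `_context_fraction`: the fallback for prompt tokens is collapsed into a single expression instead of calling `_first_positive_int`, and the usage numbers are invented for the example.

```python
from __future__ import annotations


def compact_count(value: object) -> str:
    """Render a token count as 999, 1.5K, or 2.5M; bad input becomes 0."""
    try:
        number = int(float(value or 0))
    except (TypeError, ValueError):
        number = 0
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


def context_fraction(usage: dict, context_length: int) -> float:
    """Prefer a recorded fraction; otherwise divide prompt tokens by the window."""
    try:
        fraction = float(usage.get("latest_context_fraction") or usage.get("context_fraction"))
    except (TypeError, ValueError):
        fraction = 0.0
    if fraction > 0:
        return fraction
    # Simplification: the original walks candidates with _first_positive_int.
    latest_prompt = int(usage.get("latest_prompt_tokens") or usage.get("prompt_tokens") or 0)
    if context_length <= 0 or latest_prompt <= 0:
        return 0.0
    return latest_prompt / context_length


# Invented usage record: no stored fraction, so it is computed from tokens.
usage = {"latest_prompt_tokens": 180_000}
window = 262_144
fraction = context_fraction(usage, window)
if fraction >= 0.65:
    print(f"context_pressure latest_context={compact_count(180_000)}/{compact_count(window)} ({fraction:.0%})")
```

With these numbers the fraction is about 0.69, so the pressure line is emitted; a 100,000-token prompt against the same window would stay below the threshold and print nothing.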
nipux_cli/config.py 271 lines
   1"""Configuration for the Nipux long-running agent runtime."""
   2
   3from __future__ import annotations
   4
   5import os
   6from dataclasses import dataclass, field
   7from pathlib import Path
   8from typing import Any
   9
  10import yaml
  11
  12
  13DEFAULT_OPENROUTER_MODEL = "openrouter/auto"
  14DEFAULT_OPENROUTER_API_KEY_ENV = <redacted>
  15DEFAULT_MODEL = "local-model"
  16DEFAULT_BASE_URL = "http://localhost:8000/v1"
  17DEFAULT_API_KEY_ENV = <redacted>
  18DEFAULT_CONTEXT_LENGTH = 262_144
  19DEFAULT_REQUEST_TIMEOUT_SECONDS = 300.0
  20
  21
  22def get_agent_home() -> Path:
  23    """Return the Nipux agent home directory."""
  24
  25    value = os.environ.get("NIPUX_HOME", "").strip()
  26    return Path(value).expanduser() if value else Path.home() / ".nipux"
  27
  28
  29def load_env_file(path: str | Path) -> None:
  30    """Load KEY=value pairs from a local env file without overriding the shell."""
  31
  32    env_path = Path(path).expanduser()
  33    if not env_path.exists():
  34        return
  35    ensure_private_file_permissions(env_path)
  36    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
  37        line = raw_line.strip()
  38        if not line or line.startswith("#") or "=" not in line:
  39            continue
  40        key, value = line.split("=", 1)
  41        key = key.strip()
  42        value = value.strip().strip("\"'")
  43        if key and key not in os.environ:
  44            os.environ[key] = value
  45
  46
  47def ensure_private_file_permissions(path: str | Path) -> None:
  48    """Best-effort POSIX privacy for local config/secret files."""
  49
  50    if os.name == "nt":
  51        return
  52    try:
  53        Path(path).chmod(0o600)
  54    except OSError:
  55        pass
  56
  57
  58def ensure_private_dir_permissions(path: str | Path) -> None:
  59    """Best-effort POSIX privacy for the local Nipux state directory."""
  60
  61    if os.name == "nt":
  62        return
  63    try:
  64        Path(path).chmod(0o700)
  65    except OSError:
  66        pass
  67
  68
  69def write_private_text(path: str | Path, text: str) -> None:
  70    """Write text with private file permissions from creation time."""
  71
  72    target = Path(path).expanduser()
  73    target.parent.mkdir(parents=True, exist_ok=True)
  74    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
  75    fd = os.open(target, flags, 0o600)
  76    try:
  77        with os.fdopen(fd, "w", encoding="utf-8") as handle:
  78            fd = -1
  79            handle.write(text)
  80    finally:
  81        if fd >= 0:
  82            os.close(fd)
  83    ensure_private_file_permissions(target)
  84
  85
  86@dataclass(frozen=True)
  87class ModelConfig:
  88    model: str = DEFAULT_MODEL
  89    base_url: str = DEFAULT_BASE_URL
  90    api_key_env: str = DEFAULT_API_KEY_ENV
  91    context_length: int = DEFAULT_CONTEXT_LENGTH
  92    request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS
  93    input_cost_per_million: float | None = None
  94    output_cost_per_million: float | None = None
  95
  96    @property
  97    def api_key(self) -> str:
  98        return os.environ.get(self.api_key_env, "")
  99
 100
 101@dataclass(frozen=True)
 102class RuntimeConfig:
 103    home: Path = field(default_factory=get_agent_home)
 104    max_step_seconds: int = 600
 105    max_steps_per_run: int = 1
 106    artifact_inline_char_limit: int = 12_000
 107    daily_digest_enabled: bool = True
 108    daily_digest_time: str = "08:00"
 109    max_job_cost_usd: float | None = None
 110
 111    @property
 112    def state_db_path(self) -> Path:
 113        return self.home / "state.db"
 114
 115    @property
 116    def jobs_dir(self) -> Path:
 117        return self.home / "jobs"
 118
 119    @property
 120    def logs_dir(self) -> Path:
 121        return self.home / "logs"
 122
 123    @property
 124    def digests_dir(self) -> Path:
 125        return self.home / "digests"
 126
 127
 128@dataclass(frozen=True)
 129class ToolAccessConfig:
 130    browser: bool = True
 131    web: bool = True
 132    shell: bool = True
 133    files: bool = True
 134
 135
 136@dataclass(frozen=True)
 137class EmailConfig:
 138    enabled: bool = False
 139    smtp_host: str = ""
 140    smtp_port: int = 587
 141    username: str = ""
 142    password_env: str = "NIPUX_EMAIL_PASSWORD"
 143    from_addr: str = ""
 144    to_addr: str = ""
 145    use_tls: bool = True
 146
 147    @property
 148    def password(self) -> str:
 149        return os.environ.get(self.password_env, "")
 150
 151
 152@dataclass(frozen=True)
 153class AppConfig:
 154    runtime: RuntimeConfig = field(default_factory=RuntimeConfig)
 155    model: ModelConfig = field(default_factory=ModelConfig)
 156    tools: ToolAccessConfig = field(default_factory=ToolAccessConfig)
 157    email: EmailConfig = field(default_factory=EmailConfig)
 158
 159    def ensure_dirs(self) -> None:
 160        for directory in (
 161            self.runtime.home,
 162            self.runtime.jobs_dir,
 163            self.runtime.logs_dir,
 164            self.runtime.digests_dir,
 165        ):
 166            directory.mkdir(parents=True, exist_ok=True)
 167            ensure_private_dir_permissions(directory)
 168
 169
 170def _as_dict(value: Any) -> dict[str, Any]:
 171    return value if isinstance(value, dict) else {}
 172
 173
 174def _optional_float(value: Any) -> float | None:
 175    if value in (None, ""):
 176        return None
 177    return float(value)
 178
 179
 180def load_config(path: str | Path | None = None) -> AppConfig:
 181    """Load config.yaml, falling back to a local OpenAI-compatible endpoint."""
 182
 183    home = get_agent_home()
 184    load_env_file(home / ".env")
 185    cfg_path = Path(path).expanduser() if path else home / "config.yaml"
 186    raw: dict[str, Any] = {}
 187    if cfg_path.exists():
 188        loaded = yaml.safe_load(cfg_path.read_text(encoding="utf-8")) or {}
 189        raw = _as_dict(loaded)
 190
 191    runtime_raw = _as_dict(raw.get("runtime"))
 192    model_raw = _as_dict(raw.get("model"))
 193    tools_raw = _as_dict(raw.get("tools"))
 194    email_raw = _as_dict(raw.get("email"))
 195
 196    runtime_home = Path(runtime_raw.get("home") or home).expanduser()
 197    runtime = RuntimeConfig(
 198        home=runtime_home,
 199        max_step_seconds=int(runtime_raw.get("max_step_seconds", 600)),
 200        max_steps_per_run=int(runtime_raw.get("max_steps_per_run", 1)),
 201        artifact_inline_char_limit=int(runtime_raw.get("artifact_inline_char_limit", 12_000)),
 202        daily_digest_enabled=bool(runtime_raw.get("daily_digest_enabled", True)),
 203        daily_digest_time=str(runtime_raw.get("daily_digest_time") or "08:00"),
 204        max_job_cost_usd=_optional_float(runtime_raw.get("max_job_cost_usd")),
 205    )
 206    model = ModelConfig(
 207        model=str(model_raw.get("name") or model_raw.get("model") or DEFAULT_MODEL),
 208        base_url=str(model_raw.get("base_url") or DEFAULT_BASE_URL).rstrip("/"),
 209        api_key_env=str(model_raw.get("api_key_env") or DEFAULT_API_KEY_ENV),
 210        context_length=int(model_raw.get("context_length", DEFAULT_CONTEXT_LENGTH)),
 211        request_timeout_seconds=float(model_raw.get("request_timeout_seconds", DEFAULT_REQUEST_TIMEOUT_SECONDS)),
 212        input_cost_per_million=_optional_float(model_raw.get("input_cost_per_million")),
 213        output_cost_per_million=_optional_float(model_raw.get("output_cost_per_million")),
 214    )
 215    tools = ToolAccessConfig(
 216        browser=bool(tools_raw.get("browser", True)),
 217        web=bool(tools_raw.get("web", True)),
 218        shell=bool(tools_raw.get("shell", True)),
 219        files=bool(tools_raw.get("files", True)),
 220    )
 221    email = EmailConfig(
 222        enabled=bool(email_raw.get("enabled", False)),
 223        smtp_host=str(email_raw.get("smtp_host") or ""),
 224        smtp_port=int(email_raw.get("smtp_port", 587)),
 225        username=str(email_raw.get("username") or ""),
 226        password_env=str(email_raw.get("password_env") or "NIPUX_EMAIL_PASSWORD"),
 227        from_addr=str(email_raw.get("from_addr") or ""),
 228        to_addr=str(email_raw.get("to_addr") or ""),
 229        use_tls=bool(email_raw.get("use_tls", True)),
 230    )
 231    return AppConfig(runtime=runtime, model=model, tools=tools, email=email)
 232
 233
 234def default_config_yaml(
 235    *,
 236    model: str = DEFAULT_MODEL,
 237    base_url: str = DEFAULT_BASE_URL,
 238    api_key_env: str = DEFAULT_API_KEY_ENV,
 239    context_length: int = DEFAULT_CONTEXT_LENGTH,
 240) -> str:
 241    """Return a starter config file for an OpenAI-compatible model server."""
 242
 243    return (
 244        "model:\n"
 245        f"  name: {model}\n"
 246        f"  base_url: {base_url.rstrip('/')}\n"
 247        f"  api_key_env: {api_key_env}\n"
 248        f"  context_length: {context_length}\n"
 249        "  input_cost_per_million: null\n"
 250        "  output_cost_per_million: null\n"
 251        "runtime:\n"
 252        "  max_step_seconds: 600\n"
 253        "  max_steps_per_run: 1\n"
 254        "  artifact_inline_char_limit: 12000\n"
 255        "  daily_digest_enabled: true\n"
 256        "  daily_digest_time: \"08:00\"\n"
 257        "  max_job_cost_usd: null\n"
 258        "tools:\n"
 259        "  browser: true\n"
 260        "  web: true\n"
 261        "  shell: true\n"
 262        "  files: true\n"
 263        "email:\n"
 264        "  enabled: false\n"
 265        "  smtp_host: \"\"\n"
 266        "  smtp_port: 587\n"
 267        "  username: \"\"\n"
 268        "  password_env: NIPUX_EMAIL_PASSWORD\n"
 269        "  from_addr: \"\"\n"
 270        "  to_addr: \"\"\n"
 271    )
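`load_env_file` applies three rules that are worth seeing in isolation: comment and malformed lines are skipped, surrounding quotes are stripped from values, and a key already present in the environment is never overwritten. The following sketch applies the same rules to an in-memory string and a plain dictionary so that it can run without touching `os.environ` or the filesystem; `apply_env_text` and the variable names in the sample are hypothetical.

```python
from __future__ import annotations


def apply_env_text(text: str, environ: dict[str, str]) -> dict[str, str]:
    """Merge KEY=value lines into environ without overriding existing keys."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("\"'")
        if key and key not in environ:
            environ[key] = value
    return environ


# Stand-in for the shell environment: one key is already exported.
env = {"EXAMPLE_KEY": "from-shell"}
sample = (
    "# local overrides\n"
    "EXAMPLE_KEY=from-file\n"
    'EXAMPLE_URL = "http://localhost:8000/v1"\n'
    "not a pair\n"
)
apply_env_text(sample, env)
print(env)
```

Because the shell value wins, an operator can override a stored key for one session by exporting it before launching the CLI.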
nipux_cli/context_pressure.py 254 lines
   1"""Context-pressure signals for long-running worker prompts."""
   2
   3from __future__ import annotations
   4
   5from datetime import datetime, timezone
   6from typing import Any
   7
   8from nipux_cli.db import AgentDB
   9
  10
  11CONTEXT_PRESSURE_BANDS = (
  12    (0.95, "critical"),
  13    (0.85, "high"),
  14    (0.65, "watch"),
  15)
  16USAGE_TOKEN_BANDS = (
  17    (20_000_000, "critical"),
  18    (5_000_000, "high"),
  19    (1_000_000, "watch"),
  20)
  21USAGE_CALL_BANDS = (
  22    (2_000, "critical"),
  23    (1_000, "high"),
  24    (200, "watch"),
  25)
  26USAGE_COST_BANDS = (
  27    (10.0, "critical"),
  28    (5.0, "high"),
  29    (1.0, "watch"),
  30)
  31USAGE_BAND_RANK = {"": 0, "watch": 1, "high": 2, "critical": 3}
  32
  33
  34def context_pressure_for_prompt(job: dict[str, Any]) -> str:
  35    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  36    pressure = metadata.get("context_pressure") if isinstance(metadata.get("context_pressure"), dict) else {}
  37    band = str(pressure.get("band") or "")
  38    if band not in {"watch", "high", "critical"}:
  39        return "None."
  40    prompt_tokens = compact_token_count(pressure.get("prompt_tokens"))
  41    context_length = compact_token_count(pressure.get("context_length"))
  42    context_text = prompt_tokens
  43    if context_length != "0":
  44        context_text = f"{context_text}/{context_length}"
  45    fraction = _as_float(pressure.get("fraction"))
  46    fraction_text = f" ({fraction:.0%})" if fraction else ""
  47    return (
  48        f"Context pressure is {band}: latest prompt used {context_text}{fraction_text}. "
  49        "Keep the next turn compact; prefer durable memory, ledgers, artifact references, and explicit decisions "
  50        "over copying raw history."
  51    )
  52
  53
  54def usage_pressure_for_prompt(job: dict[str, Any], usage: dict[str, Any] | None) -> str:
  55    usage = usage if isinstance(usage, dict) else {}
  56    band = _usage_pressure_band(usage)
  57    if not band:
  58        return "None."
  59    calls = _as_int(usage.get("calls"))
  60    prompt_tokens = _as_int(usage.get("prompt_tokens"))
  61    completion_tokens = _as_int(usage.get("completion_tokens"))
  62    total_tokens = _as_int(usage.get("total_tokens")) or prompt_tokens + completion_tokens
  63    latest_prompt_tokens = _as_int(usage.get("latest_prompt_tokens"))
  64    latest_context_length = _as_int(usage.get("latest_context_length"))
  65    durable_records = _durable_usage_signal_count(job)
  66    tokens_per_record = total_tokens / max(1, durable_records)
  67    latest_context = compact_token_count(latest_prompt_tokens)
  68    if latest_context_length:
  69        latest_context = f"{latest_context}/{compact_token_count(latest_context_length)}"
  70    bits = [
  71        f"calls={calls}",
  72        f"tokens={compact_token_count(total_tokens)}",
  73        f"prompt={compact_token_count(prompt_tokens)}",
  74        f"output={compact_token_count(completion_tokens)}",
  75    ]
  76    if bool(usage.get("has_cost")):
  77        bits.append(f"cost=${_as_float(usage.get('cost')):.4f}")
  78    if latest_prompt_tokens:
  79        bits.append(f"latest_context={latest_context}")
  80    lines = [
  81        f"Cumulative model usage pressure is {band}: " + " ".join(bits) + ".",
  82        (
  83            f"Durable progress records={durable_records}; "
  84            f"approximately {compact_token_count(int(tokens_per_record))} tokens per durable record."
  85        ),
  86        (
  87            "Next action should be high leverage: execute, measure, validate, consolidate, defer, or mark a branch "
  88            "blocked/skipped from concrete evidence. Avoid low-yield retries, broad rereads, or new research unless it "
  89            "directly resolves an active contract or unlocks the next experiment."
  90        ),
  91    ]
  92    return "\n".join(lines)
  93
  94
  95def emit_usage_pressure_update(db: AgentDB, job_id: str, usage: dict[str, Any]) -> None:
  96    band = _usage_pressure_band(usage)
  97    if not band:
  98        return
  99    job = db.get_job(job_id)
 100    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 101    previous = metadata.get("usage_pressure") if isinstance(metadata.get("usage_pressure"), dict) else {}
 102    previous_band = str(previous.get("band") or "")
 103    total_tokens = _as_int(usage.get("total_tokens"))
 104    previous_high_tokens = _as_int(previous.get("high_water_tokens"))
 105    should_emit = (
 106        previous_band != band
 107        or (previous_high_tokens > 0 and total_tokens >= int(previous_high_tokens * 1.5))
 108        or (previous_high_tokens <= 0)
 109    )
 110    pressure = {
 111        "band": band,
 112        "calls": _as_int(usage.get("calls")),
 113        "total_tokens": total_tokens,
 114        "prompt_tokens": _as_int(usage.get("prompt_tokens")),
 115        "completion_tokens": _as_int(usage.get("completion_tokens")),
 116        "cost": _as_float(usage.get("cost")) if bool(usage.get("has_cost")) else None,
 117        "has_cost": bool(usage.get("has_cost")),
 118        "high_water_tokens": max(total_tokens, previous_high_tokens),
 119        "updated_at": datetime.now(timezone.utc).isoformat(),
 120    }
 121    db.update_job_metadata(job_id, {"usage_pressure": pressure})
 122    if not should_emit:
 123        return
 124    cost_text = ""
 125    if pressure["has_cost"]:
 126        cost_text = f" cost=${pressure['cost']:.4f}"
 127    db.append_agent_update(
 128        job_id,
 129        (
 130            f"Usage pressure {band}: {compact_token_count(total_tokens)} tokens across "
 131            f"{pressure['calls']} model calls.{cost_text} Prefer high-leverage actions, measurement, "
 132            "consolidation, or explicit blocked/deferred branches over low-yield churn."
 133        ),
 134        category="update",
 135        metadata={"kind": "usage_pressure", "usage_pressure": pressure},
 136    )
 137
 138
 139def emit_context_pressure_update(db: AgentDB, job_id: str, usage: dict[str, Any]) -> None:
 140    fraction = _as_float(usage.get("context_fraction"))
 141    band = _context_pressure_band(fraction)
 142    if not band:
 143        return
 144    job = db.get_job(job_id)
 145    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 146    previous = metadata.get("context_pressure") if isinstance(metadata.get("context_pressure"), dict) else {}
 147    previous_band = str(previous.get("band") or "")
 148    previous_high = _as_float(previous.get("high_water_fraction"))
 149    should_emit = previous_band != band or fraction >= previous_high + 0.10
 150    prompt_tokens = _as_int(usage.get("prompt_tokens"))
 151    context_length = _as_int(usage.get("context_length"))
 152    pressure = {
 153        "band": band,
 154        "fraction": round(fraction, 6),
 155        "high_water_fraction": round(max(fraction, previous_high), 6),
 156        "prompt_tokens": prompt_tokens,
 157        "context_length": context_length,
 158        "updated_at": datetime.now(timezone.utc).isoformat(),
 159    }
 160    db.update_job_metadata(job_id, {"context_pressure": pressure})
 161    if not should_emit:
 162        return
 163    denominator = f"/{compact_token_count(context_length)}" if context_length else ""
 164    estimated = ", estimated" if usage.get("estimated") else ""
 165    db.append_agent_update(
 166        job_id,
 167        (
 168            f"Context pressure {band}: latest prompt "
 169            f"{compact_token_count(prompt_tokens)}{denominator} ({fraction:.0%}{estimated}). "
 170            "Prefer compact memory, ledgers, artifact references, and explicit decisions over raw history."
 171        ),
 172        category="update",
 173        metadata={"kind": "context_pressure", "context_pressure": pressure},
 174    )
 175
 176
 177def compact_token_count(value: object) -> str:
 178    number = _as_int(value)
 179    if number >= 1_000_000:
 180        return f"{number / 1_000_000:.1f}M"
 181    if number >= 1_000:
 182        return f"{number / 1_000:.1f}K"
 183    return str(number)
 184
 185
 186def _usage_pressure_band(usage: dict[str, Any]) -> str:
 187    total_tokens = _as_int(usage.get("total_tokens"))
 188    if total_tokens <= 0:
 189        total_tokens = _as_int(usage.get("prompt_tokens")) + _as_int(usage.get("completion_tokens"))
 190    calls = _as_int(usage.get("calls"))
 191    cost = _as_float(usage.get("cost")) if bool(usage.get("has_cost")) else 0.0
 192    band = ""
 193    for threshold, candidate in USAGE_TOKEN_BANDS:
 194        if total_tokens >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
 195            band = candidate
 196            break
 197    for threshold, candidate in USAGE_CALL_BANDS:
 198        if calls >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
 199            band = candidate
 200            break
 201    if bool(usage.get("has_cost")):
 202        for threshold, candidate in USAGE_COST_BANDS:
 203            if cost >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
 204                band = candidate
 205                break
 206    return band
 207
 208
 209def _context_pressure_band(fraction: float) -> str:
 210    for threshold, band in CONTEXT_PRESSURE_BANDS:
 211        if fraction >= threshold:
 212            return band
 213    return ""
 214
 215
 216def _durable_usage_signal_count(job: dict[str, Any]) -> int:
 217    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 218    count = 0
 219    for key in ("finding_ledger", "source_ledger", "experiment_ledger", "lessons"):
 220        records = metadata.get(key)
 221        if isinstance(records, list):
 222            count += sum(1 for record in records if isinstance(record, dict))
 223    tasks = metadata.get("task_queue")
 224    if isinstance(tasks, list):
 225        count += sum(
 226            1
 227            for task in tasks
 228            if isinstance(task, dict)
 229            and str(task.get("status") or "open").lower() in {"done", "blocked", "skipped"}
 230            and (task.get("result") or task.get("evidence_needed") or task.get("acceptance_criteria"))
 231        )
 232    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 233    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 234    count += sum(
 235        1
 236        for milestone in milestones
 237        if isinstance(milestone, dict)
 238        and str(milestone.get("status") or "planned").lower() in {"active", "validating", "done", "blocked", "skipped"}
 239    )
 240    return count
 241
 242
 243def _as_float(value: Any, default: float = 0.0) -> float:
 244    try:
 245        return float(value)
 246    except (TypeError, ValueError):
 247        return default
 248
 249
 250def _as_int(value: Any, default: int = 0) -> int:
 251    try:
 252        return int(float(value))
 253    except (TypeError, ValueError):
 254        return default
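
Note on band selection: `_usage_pressure_band` above walks several threshold tables and keeps the most severe band that any one signal (tokens, calls, cost) has reached. The sketch below illustrates that pattern in isolation. It assumes each table is sorted from the highest threshold to the lowest, which the `break` in the listing suggests but this part of the listing does not confirm. The `DEMO_*` names and all threshold values are hypothetical, chosen only for illustration; they are not the project's real `USAGE_TOKEN_BANDS`, `USAGE_CALL_BANDS`, or `USAGE_BAND_RANK` values.

```python
# Hypothetical tables for illustration only; the real constants live elsewhere
# in the module and may use different thresholds and band names.
DEMO_TOKEN_BANDS = ((1_000_000, "high"), (250_000, "medium"), (50_000, "low"))
DEMO_CALL_BANDS = ((500, "high"), (100, "medium"), (25, "low"))
DEMO_BAND_RANK = {"": 0, "low": 1, "medium": 2, "high": 3}


def demo_usage_band(total_tokens: int, calls: int) -> str:
    """Return the most severe band reached by either signal, or "" if none."""
    band = ""
    for table, value in ((DEMO_TOKEN_BANDS, total_tokens), (DEMO_CALL_BANDS, calls)):
        # Tables are assumed to be sorted high to low, so the first threshold
        # that is met and outranks the current band is the one to keep.
        for threshold, candidate in table:
            if value >= threshold and DEMO_BAND_RANK[candidate] > DEMO_BAND_RANK[band]:
                band = candidate
                break
    return band


if __name__ == "__main__":
    # 60K tokens alone would be "low", but 120 calls raises the band.
    print(demo_usage_band(60_000, 120))
```

Because a later signal can only replace the band when its rank is strictly higher, the order in which the tables are checked does not change the result under the sorted-table assumption.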
nipux_cli/daemon.py 695 lines
   1"""Daemon runner for restartable background jobs."""
   2
   3from __future__ import annotations
   4
   5import contextlib
   6import fcntl
   7import hashlib
   8import json
   9import os
  10import signal
  11import threading
  12import time
  13from dataclasses import dataclass
  14from datetime import datetime, timezone
  15from email.utils import parsedate_to_datetime
  16from functools import lru_cache
  17from pathlib import Path
  18from typing import Any, Callable
  19
  20from nipux_cli.config import AppConfig, load_config
  21from nipux_cli.db import AgentDB
  22from nipux_cli.digest import write_daily_digest
  23from nipux_cli.doctor import run_doctor
  24from nipux_cli.provider_errors import provider_rate_limited
  25from nipux_cli.scheduling import job_deferred_until, job_is_deferred, job_provider_blocked
  26from nipux_cli.shell_tools import cleanup_registered_shell_processes
  27
  28
  29class DaemonAlreadyRunning(RuntimeError):
  30    pass
  31
  32
  33RUNTIME_CODE_GLOB = "*.py"
  34
  35
  36def runtime_code_file_names() -> tuple[str, ...]:
  37    package_dir = Path(__file__).resolve().parent
  38    return tuple(path.name for path in _runtime_code_paths(package_dir))
  39
  40
  41@lru_cache(maxsize=1)
  42def current_runtime_fingerprint() -> dict[str, Any]:
  43    """Return a stable fingerprint for code that affects daemon behavior."""
  44
  45    from nipux_cli import __version__
  46    from nipux_cli.tools import DEFAULT_REGISTRY
  47    from nipux_cli.worker_policy import SYSTEM_PROMPT, WORKER_PROTOCOL_VERSION
  48
  49    tool_schema = DEFAULT_REGISTRY.openai_tools()
  50    tool_schema_hash = hashlib.sha256(json.dumps(tool_schema, sort_keys=True, default=str).encode("utf-8")).hexdigest()
  51    prompt_hash = hashlib.sha256(SYSTEM_PROMPT.encode("utf-8")).hexdigest()
  52    code_fingerprint = _runtime_code_fingerprint()
  53    payload = {
  54        "nipux_version": __version__,
  55        "worker_protocol": WORKER_PROTOCOL_VERSION,
  56        "tool_schema_hash": tool_schema_hash[:16],
  57        "prompt_hash": prompt_hash[:16],
  58        "code_hash": code_fingerprint["code_hash"],
  59        "code_mtime": code_fingerprint["code_mtime"],
  60        "tool_count": len(DEFAULT_REGISTRY.names()),
  61    }
  62    hash_payload = {key: value for key, value in payload.items() if key != "code_mtime"}
  63    payload["runtime_hash"] = hashlib.sha256(json.dumps(hash_payload, sort_keys=True).encode("utf-8")).hexdigest()[:16]
  64    return payload
  65
  66
  67@lru_cache(maxsize=1)
  68def _runtime_code_fingerprint() -> dict[str, Any]:
  69    package_dir = Path(__file__).resolve().parent
  70    digest = hashlib.sha256()
  71    mtimes: list[float] = []
  72    for path in _runtime_code_paths(package_dir):
  73        name = path.name
  74        digest.update(name.encode("utf-8"))
  75        data = path.read_bytes()
  76        digest.update(hashlib.sha256(data).digest())
  77        mtimes.append(path.stat().st_mtime)
  78    return {
  79        "code_hash": digest.hexdigest()[:16],
  80        "code_mtime": max(mtimes) if mtimes else 0,
  81    }
  82
  83
  84def _runtime_code_paths(package_dir: Path) -> list[Path]:
  85    return sorted(path for path in package_dir.glob(RUNTIME_CODE_GLOB) if path.is_file())
  86
  87
  88RUNTIME_CODE_FILES = runtime_code_file_names()
  89PROVIDER_RECOVERY_PROBE_SECONDS = 300.0
  90WORK_HEARTBEAT_INTERVAL_SECONDS = 15.0
  91
  92
  93def runtime_stale(metadata: dict[str, Any] | None) -> bool:
  94    if not isinstance(metadata, dict):
  95        return False
  96    recorded = metadata.get("runtime")
  97    if not isinstance(recorded, dict):
  98        return True
  99    return recorded.get("runtime_hash") != current_runtime_fingerprint().get("runtime_hash")
 100
 101
 102def _parse_lock_metadata(raw: str) -> dict[str, Any]:
 103    raw = raw.strip()
 104    if not raw:
 105        return {}
 106    try:
 107        parsed = json.loads(raw)
 108        return parsed if isinstance(parsed, dict) else {"raw": raw}
 109    except json.JSONDecodeError:
 110        return {"raw": raw}
 111
 112
 113def daemon_lock_status(path: str | Path) -> dict[str, Any]:
 114    """Return whether another process currently holds the daemon lock."""
 115
 116    path = Path(path)
 117    path.parent.mkdir(parents=True, exist_ok=True)
 118    with path.open("a+", encoding="utf-8") as handle:
 119        handle.seek(0)
 120        metadata = _parse_lock_metadata(handle.read())
 121        try:
 122            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
 123        except BlockingIOError:
 124            stale = runtime_stale(metadata)
 125            return {
 126                "running": True,
 127                "lock_path": str(path),
 128                "metadata": metadata,
 129                "stale": stale,
 130                "current_runtime": current_runtime_fingerprint(),
 131                "detail": "daemon lock is held",
 132            }
 133        with contextlib.suppress(OSError):
 134            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
 135    return {
 136        "running": False,
 137        "lock_path": str(path),
 138        "metadata": metadata,
 139        "stale": False,
 140        "current_runtime": current_runtime_fingerprint(),
 141        "detail": "daemon lock is free",
 142    }
 143
 144
 145@contextlib.contextmanager
 146def single_instance_lock(path: str | Path):
 147    """Hold an exclusive non-blocking daemon lock for this state directory."""
 148
 149    path = Path(path)
 150    path.parent.mkdir(parents=True, exist_ok=True)
 151    with path.open("a+", encoding="utf-8") as handle:
 152        try:
 153            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
 154        except BlockingIOError as exc:
 155            raise DaemonAlreadyRunning(f"Another nipux daemon holds {path}") from exc
 156        payload = {
 157            "pid": os.getpid(),
 158            "started_at": datetime.now(timezone.utc).isoformat(),
 159            "runtime": current_runtime_fingerprint(),
 160        }
 161        handle.seek(0)
 162        handle.truncate()
 163        handle.write(json.dumps(payload, sort_keys=True))
 164        handle.flush()
 165        try:
 166            yield handle
 167        finally:
 168            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
 169
 170
 171def update_lock_metadata(handle, **patch: Any) -> None:
 172    handle.seek(0)
 173    metadata = _parse_lock_metadata(handle.read())
 174    metadata.setdefault("pid", os.getpid())
 175    metadata.setdefault("started_at", datetime.now(timezone.utc).isoformat())
 176    metadata.update(patch)
 177    handle.seek(0)
 178    handle.truncate()
 179    handle.write(json.dumps(metadata, sort_keys=True))
 180    handle.flush()
 181
 182
 183@contextlib.contextmanager
 184def _work_heartbeat(
 185    update_metadata: Callable[..., None],
 186    *,
 187    interval_seconds: float | None = None,
 188    state: str = "working",
 189    **metadata: Any,
 190):
 191    """Keep daemon lock metadata fresh while a worker turn is in progress."""
 192
 193    raw_interval = WORK_HEARTBEAT_INTERVAL_SECONDS if interval_seconds is None else interval_seconds
 194    interval = max(0.01, float(raw_interval))
 195    stop = threading.Event()
 196
 197    def beat() -> None:
 198        while not stop.wait(interval):
 199            update_metadata(
 200                last_heartbeat=datetime.now(timezone.utc).isoformat(),
 201                last_state=state,
 202                runtime=current_runtime_fingerprint(),
 203                **metadata,
 204            )
 205
 206    update_metadata(
 207        last_heartbeat=datetime.now(timezone.utc).isoformat(),
 208        last_state=state,
 209        runtime=current_runtime_fingerprint(),
 210        **metadata,
 211    )
 212    thread = threading.Thread(target=beat, name="nipux-daemon-heartbeat", daemon=True)
 213    thread.start()
 214    try:
 215        yield
 216    finally:
 217        stop.set()
 218        thread.join(timeout=1.0)
 219
 220
 221def append_daemon_event(config: AppConfig, event: str, **fields: Any) -> Path:
 222    """Append a small daemon event that the CLI can tail without parsing stdout."""
 223
 224    config.ensure_dirs()
 225    path = config.runtime.logs_dir / "daemon-events.jsonl"
 226    payload = {
 227        "at": datetime.now(timezone.utc).isoformat(),
 228        "event": event,
 229        **fields,
 230    }
 231    with path.open("a", encoding="utf-8") as handle:
 232        handle.write(json.dumps(payload, ensure_ascii=False, sort_keys=True, default=str) + "\n")
 233    return path
 234
 235
 236def read_daemon_events(config: AppConfig, *, limit: int = 20) -> list[dict[str, Any]]:
 237    path = config.runtime.logs_dir / "daemon-events.jsonl"
 238    if not path.exists():
 239        return []
 240    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()[-limit:]
 241    events: list[dict[str, Any]] = []
 242    for line in lines:
 243        try:
 244            parsed = json.loads(line)
 245        except json.JSONDecodeError:
 246            events.append({"event": "unparseable", "raw": line})
 247            continue
 248        if isinstance(parsed, dict):
 249            events.append(parsed)
 250    return events
 251
 252
 253def fake_step_llm():
 254    from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall
 255
 256    nonce = datetime.now(timezone.utc).isoformat()
 257    return ScriptedLLM([
 258        LLMResponse(tool_calls=[
 259            ToolCall(
 260                name="write_artifact",
 261                arguments={
 262                    "title": "daemon-fake-step",
 263                    "type": "text",
 264                    "summary": "Fake daemon step",
 265                    "content": f"This is a fake daemon worker step.\n\nnonce: {nonce}",
 266                },
 267            )
 268        ])
 269    ])
 270
 271
 272@dataclass
 273class Daemon:
 274    config: AppConfig
 275    db: AgentDB
 276
 277    @classmethod
 278    def open(cls, config: AppConfig | None = None) -> "Daemon":
 279        config = config or load_config()
 280        config.ensure_dirs()
 281        return cls(config=config, db=AgentDB(config.runtime.state_db_path))
 282
 283    @property
 284    def lock_path(self) -> Path:
 285        return self.config.runtime.home / "agentd.lock"
 286
 287    def close(self) -> None:
 288        self.db.close()
 289
 290    def next_runnable_job(self) -> dict | None:
 291        """Return the next runnable job by priority/age.
 292
 293        UI focus is intentionally not used here. Focus is for the operator's
 294        chat view; the daemon should keep all runnable jobs advancing.
 295        """
 296
 297        now = datetime.now(timezone.utc)
 298        self._maybe_recover_provider_blocked_jobs(now=now)
 299        runnable_jobs = self.db.list_jobs(statuses=["queued", "running"])
 300        for job in runnable_jobs:
 301            if job_provider_blocked(job):
 302                self.db.update_job_status(
 303                    job["id"],
 304                    "paused",
 305                    metadata_patch={
 306                        "last_note": "Model provider still requires operator action; paused before retrying failed calls.",
 307                    },
 308                )
 309                self.db.append_agent_update(
 310                    job["id"],
 311                    "Model provider still requires operator action; paused before retrying failed calls.",
 312                    category="error",
 313                    metadata={"reason": "llm_provider_blocked"},
 314                )
 315                continue
 316        for job in runnable_jobs:
 317            if job_provider_blocked(job):
 318                continue
 319            if job_is_deferred(job, now=now):
 320                continue
 321            return job
 322        return None
 323
 324    def _maybe_recover_provider_blocked_jobs(self, *, now: datetime) -> None:
 325        paused = [job for job in self.db.list_jobs(statuses=["paused"]) if job_provider_blocked(job)]
 326        due = [job for job in paused if _provider_probe_due(job, now=now)]
 327        if not due:
 328            return
 329        ok, detail = _model_generation_ready(self.config)
 330        timestamp = now.isoformat()
 331        if not ok:
 332            for job in due:
 333                self.db.update_job_status(
 334                    job["id"],
 335                    "paused",
 336                    metadata_patch={
 337                        "provider_last_probe_at": timestamp,
 338                        "provider_last_probe_detail": detail[:1000],
 339                        "last_note": "Model provider still unavailable; daemon will check again later.",
 340                    },
 341                )
 342            append_daemon_event(
 343                self.config,
 344                "provider_recovery_wait",
 345                checked_jobs=len(due),
 346                detail=detail[:500],
 347                next_probe_seconds=PROVIDER_RECOVERY_PROBE_SECONDS,
 348            )
 349            return
 350        for job in paused:
 351            self.db.update_job_status(
 352                job["id"],
 353                "queued",
 354                metadata_patch={
 355                    "provider_last_probe_at": timestamp,
 356                    "provider_unblocked_at": timestamp,
 357                    "last_note": "Model provider recovered; daemon resumed this job.",
 358                },
 359            )
 360            self.db.append_agent_update(
 361                job["id"],
 362                "Model provider recovered; continuing queued work.",
 363                category="progress",
 364                metadata={"reason": "llm_provider_recovered"},
 365            )
 366        append_daemon_event(self.config, "provider_recovered", resumed_jobs=len(paused), detail=detail[:500])
 367
 368    def idle_sleep_seconds(self, *, poll_seconds: float, now: datetime | None = None) -> float:
 369        """Return the next idle sleep, capped by the nearest deferred job wake."""
 370
 371        fallback = max(5.0, poll_seconds)
 372        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
 373        due_times: list[datetime] = []
 374        for job in self.db.list_jobs(statuses=["queued", "running"]):
 375            due = job_deferred_until(job, now=now)
 376            if due is not None:
 377                due_times.append(due)
 378        if not due_times:
 379            return fallback
 380        wait_seconds = min((due - now).total_seconds() for due in due_times)
 381        return max(0.5, min(fallback, wait_seconds))
 382
 383    def run_once(self, *, fake: bool = False, verbose: bool = False):
 384        from nipux_cli.worker import run_one_step
 385
 386        job = self.next_runnable_job()
 387        if job is None:
 388            return None
 389        if verbose:
 390            print(f"thinking job={job['id']} title={job['title']} kind={job['kind']}", flush=True)
 391            print(f"objective: {job['objective']}", flush=True)
 392        llm = fake_step_llm() if fake else None
 393        return run_one_step(job["id"], config=self.config, db=self.db, llm=llm)
 394
 395    def send_due_daily_digest(self, *, now: datetime | None = None) -> dict | None:
 396        if not self.config.runtime.daily_digest_enabled:
 397            return None
 398        now = now or datetime.now()
 399        if not _is_digest_due(now, self.config.runtime.daily_digest_time):
 400            return None
 401        day = now.date().isoformat()
 402        target = self.config.email.to_addr or "dry-run"
 403        if self.db.digest_exists(day=day, target=target):
 404            return None
 405        return write_daily_digest(self.config, self.db, day=day)
 406
 407    def run_forever(
 408        self,
 409        *,
 410        fake: bool = False,
 411        poll_seconds: float = 30.0,
 412        quiet: bool = False,
 413        verbose: bool = False,
 414        max_iterations: int | None = None,
 415    ) -> None:
 416        consecutive_failures = 0
 417        iterations = 0
 418        with single_instance_lock(self.lock_path) as lock_handle:
 419            metadata_lock = threading.Lock()
 420
 421            def locked_update_metadata(**patch: Any) -> None:
 422                with metadata_lock:
 423                    update_lock_metadata(lock_handle, **patch)
 424
 425            previous_sigterm = signal.getsignal(signal.SIGTERM)
 426            signal.signal(signal.SIGTERM, _raise_keyboard_interrupt)
 427            cleaned_shell_processes = cleanup_registered_shell_processes(self.config.runtime.home)
 428            recovered = self.db.mark_interrupted_running(reason="daemon recovered abandoned running work from a previous process")
 429            append_daemon_event(
 430                self.config,
 431                "daemon_started",
 432                pid=os.getpid(),
 433                fake=fake,
 434                poll_seconds=poll_seconds,
 435                recovered_steps=recovered["steps"],
 436                recovered_runs=recovered["runs"],
 437                cleaned_shell_processes=len(cleaned_shell_processes),
 438                runtime=current_runtime_fingerprint(),
 439            )
 440            if cleaned_shell_processes:
 441                append_daemon_event(
 442                    self.config,
 443                    "shell_processes_cleaned",
 444                    count=len(cleaned_shell_processes),
 445                    processes=cleaned_shell_processes[:12],
 446                )
 447            if recovered["steps"] or recovered["runs"]:
 448                append_daemon_event(self.config, "stale_work_recovered", **recovered)
 449            if not quiet:
 450                print(f"nipux daemon started; db={self.config.runtime.state_db_path}", flush=True)
 451            try:
 452                while True:
 453                    iterations += 1
 454                    locked_update_metadata(
 455                        last_heartbeat=datetime.now(timezone.utc).isoformat(),
 456                        last_state="checking",
 457                        consecutive_failures=consecutive_failures,
 458                        runtime=current_runtime_fingerprint(),
 459                    )
 460
 461                    try:
 462                        digest = self.send_due_daily_digest()
 463                        if digest:
 464                            append_daemon_event(self.config, "daily_digest", **digest)
 465                            if not quiet:
 466                                print(f"daily_digest {json.dumps(digest, ensure_ascii=False)}", flush=True)
 467                        with _work_heartbeat(
 468                            locked_update_metadata,
 469                            consecutive_failures=consecutive_failures,
 470                        ):
 471                            result = self.run_once(fake=fake, verbose=verbose and not quiet)
 472                    except Exception as exc:
 473                        consecutive_failures += 1
 474                        payload = _exception_payload(exc)
 475                        locked_update_metadata(
 476                            last_heartbeat=datetime.now(timezone.utc).isoformat(),
 477                            last_state="error",
 478                            last_error=payload["error"],
 479                            last_error_type=payload["error_type"],
 480                            consecutive_failures=consecutive_failures,
 481                            runtime=current_runtime_fingerprint(),
 482                        )
 483                        append_daemon_event(self.config, "daemon_error", **payload, consecutive_failures=consecutive_failures)
 484                        if not quiet:
 485                            print(
 486                                f"daemon_error type={payload['error_type']} error={payload['error'][:240]}",
 487                                flush=True,
 488                            )
 489                        _sleep_or_stop(_exception_backoff(exc, poll_seconds, consecutive_failures), max_iterations, iterations)
 490                        if max_iterations is not None and iterations >= max_iterations:
 491                            return
 492                        continue
 493
 494                    if result is None:
 495                        locked_update_metadata(
 496                            last_heartbeat=datetime.now(timezone.utc).isoformat(),
 497                            last_state="idle",
 498                            runtime=current_runtime_fingerprint(),
 499                        )
 500                        idle_sleep = self.idle_sleep_seconds(poll_seconds=poll_seconds)
 501                        if not quiet:
 502                            print(f"idle; sleeping {idle_sleep:g}s", flush=True)
 503                        _sleep_or_stop(idle_sleep, max_iterations, iterations)
 504                    else:
 505                        consecutive_failures = consecutive_failures + 1 if result.status == "failed" else 0
 506                        locked_update_metadata(
 507                            last_heartbeat=datetime.now(timezone.utc).isoformat(),
 508                            last_state="step",
 509                            last_job_id=result.job_id,
 510                            last_run_id=result.run_id,
 511                            last_step_id=result.step_id,
 512                            last_status=result.status,
 513                            last_tool=result.tool_name,
 514                            last_error="" if result.status != "failed" else str(result.result.get("error") or ""),
 515                            last_error_type="" if result.status != "failed" else str(result.result.get("error_type") or ""),
 516                            consecutive_failures=consecutive_failures,
 517                            runtime=current_runtime_fingerprint(),
 518                        )
 519                        detail = result.result.get("error") or result.result.get("artifact_id") or result.result.get("content", "")
 520                        append_daemon_event(
 521                            self.config,
 522                            "step",
 523                            job_id=result.job_id,
 524                            run_id=result.run_id,
 525                            step_id=result.step_id,
 526                            status=result.status,
 527                            tool=result.tool_name,
 528                            detail=str(detail)[:500],
 529                            consecutive_failures=consecutive_failures,
 530                        )
 531                        if not quiet:
 532                            print(
 533                                f"step job={result.job_id} run={result.run_id} step={result.step_id} "
 534                                f"status={result.status} tool={result.tool_name or '-'} detail={str(detail)[:240]}",
 535                                flush=True,
 536                            )
 537                            if verbose:
 538                                print(json.dumps(result.result, ensure_ascii=False, indent=2)[:8000], flush=True)
 539                        sleep_seconds = (
 540                            _step_failure_backoff(result, poll_seconds, consecutive_failures)
 541                            if result.status == "failed"
 542                            else max(0.0, poll_seconds)
 543                        )
 544                        _sleep_or_stop(sleep_seconds, max_iterations, iterations)
 545                    if max_iterations is not None and iterations >= max_iterations:
 546                        return
 547            except KeyboardInterrupt:
 548                interrupted = self.db.mark_interrupted_running(reason="daemon stopped during active work")
 549                locked_update_metadata(
 550                    last_heartbeat=datetime.now(timezone.utc).isoformat(),
 551                    last_state="stopped",
 552                    consecutive_failures=consecutive_failures,
 553                    runtime=current_runtime_fingerprint(),
 554                )
 555                append_daemon_event(self.config, "daemon_stopped", pid=os.getpid(), interrupted_steps=interrupted["steps"], interrupted_runs=interrupted["runs"])
 556                if not quiet:
 557                    print("nipux daemon stopped", flush=True)
 558            finally:
 559                signal.signal(signal.SIGTERM, previous_sigterm)
 560
 561
 562def _is_digest_due(now: datetime, configured_time: str) -> bool:
 563    try:
 564        hour_text, minute_text = configured_time.split(":", 1)
 565        hour = int(hour_text)
 566        minute = int(minute_text)
 567    except ValueError:
 568        hour, minute = 8, 0
 569    return (now.hour, now.minute) >= (hour, minute)
 570
 571
 572def _raise_keyboard_interrupt(signum, frame) -> None:
 573    raise KeyboardInterrupt
 574
 575
 576def _exception_payload(exc: Exception) -> dict[str, str]:
 577    return {
 578        "error": str(exc),
 579        "error_type": type(exc).__name__,
 580    }
 581
 582
 583def _failure_backoff(poll_seconds: float, consecutive_failures: int) -> float:
 584    base = max(1.0, poll_seconds)
 585    return min(60.0, base * min(8, max(1, consecutive_failures)))
 586
 587
 588def _step_failure_backoff(result: Any, poll_seconds: float, consecutive_failures: int) -> float:
 589    """Return a retry delay for failed worker steps.
 590
 591    Worker failures are recorded as failed steps rather than escaping as daemon
 592    exceptions, so they use the same generic throttling path here.
 593    """
 594
 595    fallback = _failure_backoff(poll_seconds, consecutive_failures)
 596    return fallback
 597
 598
 599def _exception_backoff(exc: Exception, poll_seconds: float, consecutive_failures: int) -> float:
 600    fallback = _failure_backoff(poll_seconds, consecutive_failures)
 601    if not _is_rate_limit_error(exc):
 602        return fallback
 603    retry_after = _retry_after_seconds(exc)
 604    if retry_after is None:
 605        return max(fallback, 10.0)
 606    return max(fallback, min(300.0, retry_after))
 607
 608
 609def _is_rate_limit_error(exc: Exception) -> bool:
 610    status_code = getattr(exc, "status_code", None)
 611    if status_code == 429:
 612        return True
 613    return _is_rate_limit_text(f"{type(exc).__name__} {exc}")
 614
 615
 616def _is_rate_limit_text(text: str) -> bool:
 617    return provider_rate_limited(text)
 618
 619
 620def _retry_after_seconds(exc: Exception) -> float | None:
 621    headers = _exception_headers(exc)
 622    for key, value in headers.items():
 623        normalized = key.lower()
 624        if normalized in {"retry-after", "x-ratelimit-reset", "x-rate-limit-reset"}:
 625            parsed = _parse_retry_after(value)
 626            if parsed is not None:
 627                return parsed
 628    return None
 629
 630
 631def _exception_headers(exc: Exception) -> dict[str, str]:
 632    response = getattr(exc, "response", None)
 633    headers = getattr(response, "headers", None)
 634    if headers:
 635        return {str(key): str(value) for key, value in dict(headers).items()}
 636    return {}
 637
 638
 639def _parse_retry_after(value: str) -> float | None:
 640    text = str(value).strip()
 641    if not text:
 642        return None
 643    with contextlib.suppress(ValueError):
 644        number = float(text)
 645        if number > 10_000_000_000:
 646            number = number / 1000
 647        if number > 1_000_000_000:
 648            return max(0.0, number - time.time())
 649        return max(0.0, number)
 650    with contextlib.suppress(ValueError, TypeError, OSError):
 651        parsed = parsedate_to_datetime(text)
 652        if parsed.tzinfo is None:
 653            parsed = parsed.replace(tzinfo=timezone.utc)
 654        return max(0.0, parsed.timestamp() - time.time())
 655    return None
 656
 657
 658def _provider_probe_due(job: dict[str, Any], *, now: datetime) -> bool:
 659    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 660    raw = str(metadata.get("provider_last_probe_at") or "").strip()
 661    if not raw:
 662        return True
 663    with contextlib.suppress(ValueError):
 664        previous = datetime.fromisoformat(raw.replace("Z", "+00:00"))
 665        if previous.tzinfo is None:
 666            previous = previous.replace(tzinfo=timezone.utc)
 667        return (now.astimezone(timezone.utc) - previous.astimezone(timezone.utc)).total_seconds() >= PROVIDER_RECOVERY_PROBE_SECONDS
 668    return True
 669
 670
 671def _model_generation_ready(config: AppConfig) -> tuple[bool, str]:
 672    checks = run_doctor(config=config, check_model=True)
 673    failures = [check for check in checks if not check.ok and check.name in {"model_config", "model_auth", "model_endpoint", "model_generation"}]
 674    if not failures:
 675        return True, "model_generation accepted"
 676    detail = "; ".join(f"{check.name}: {check.detail}" for check in failures)
 677    return False, detail
 678
 679
 680def _sleep_or_stop(seconds: float, max_iterations: int | None, iterations: int) -> None:
 681    if max_iterations is not None and iterations >= max_iterations:
 682        return
 683    time.sleep(seconds)
 684
 685
 686def _focused_job_id(config: AppConfig) -> str | None:
 687    path = config.runtime.home / "shell_state.json"
 688    if not path.exists():
 689        return None
 690    try:
 691        parsed = json.loads(path.read_text(encoding="utf-8"))
 692    except (OSError, json.JSONDecodeError):
 693        return None
 694    job_id = parsed.get("focus_job_id") if isinstance(parsed, dict) else None
 695    return job_id if isinstance(job_id, str) and job_id else None
nipux_cli/daemon_control.py 243 lines
   1"""Daemon process control helpers used by CLI commands."""
   2
   3from __future__ import annotations
   4
   5import argparse
   6import os
   7import signal
   8import subprocess
   9import sys
  10import time
  11from pathlib import Path
  12from typing import Any, Callable
  13
  14from nipux_cli.config import AppConfig, load_config
  15from nipux_cli.cli_state import clear_model_setup_verified, mark_model_setup_verified
  16from nipux_cli.daemon import daemon_lock_status
  17from nipux_cli.doctor import run_doctor
  18from nipux_cli.provider_errors import provider_action_required, provider_rate_limited
  19
  20
  21ReadyFn = Callable[[Any], bool]
  22StartFn = Callable[[argparse.Namespace], None]
  23StopFn = Callable[[Any], bool]
  24PidAliveFn = Callable[[int], bool]
  25
  26
  27def remote_model_preflight_failures(config: Any, *, doctor_fn: Callable[..., list[Any]] = run_doctor) -> list[str]:
  28    blocking = {"model_config", "model_auth", "model_endpoint", "model_generation"}
  29    checks = doctor_fn(config=config, check_model=True)
  30    return [f"{check.name}: {check.detail}" for check in checks if not check.ok and check.name in blocking]
  31
  32
  33def _recoverable_provider_preflight(failures: list[str]) -> bool:
  34    if not failures:
  35        return False
  36    for failure in failures:
  37        name = failure.split(":", 1)[0].strip()
  38        if name != "model_generation":
  39            return False
  40        if not (provider_action_required(failure) or provider_rate_limited(failure)):
  41            return False
  42    return True
  43
  44
  45def recoverable_remote_model_preflight_failures(
  46    config: Any,
  47    *,
  48    doctor_fn: Callable[..., list[Any]] = run_doctor,
  49) -> list[str]:
  50    failures = remote_model_preflight_failures(config, doctor_fn=doctor_fn)
  51    return failures if _recoverable_provider_preflight(failures) else []
  52
  53
  54def provider_preflight_is_recoverable(failures: list[str]) -> bool:
  55    return _recoverable_provider_preflight(failures)
  56
  57
  58def ensure_remote_model_ready_for_worker(
  59    config: Any,
  60    *,
  61    fake: bool,
  62    doctor_fn: Callable[..., list[Any]] = run_doctor,
  63) -> bool:
  64    if fake:
  65        return True
  66    failures = remote_model_preflight_failures(config, doctor_fn=doctor_fn)
  67    if not failures:
  68        mark_model_setup_verified(config)
  69        return True
  70    if _recoverable_provider_preflight(failures):
  71        clear_model_setup_verified()
  72        print("model provider is not ready; starting daemon in recovery monitor mode")
  73        for failure in failures:
  74            print(f"  wait {failure}")
  75        print("The daemon will periodically re-check the configured model and resume provider-blocked jobs when it works.")
  76        return True
  77    clear_model_setup_verified()
  78    print("model is not ready; daemon not started")
  79    for failure in failures:
  80        print(f"  fail {failure}")
  81    print("Run `nipux doctor --check-model` after fixing the model configuration.")
  82    return False
  83
  84
  85def cmd_start_impl(
  86    args: argparse.Namespace,
  87    *,
  88    ready_fn: Callable[[Any, bool], bool],
  89    stop_fn: Callable[[AppConfig, float, bool], bool],
  90) -> None:
  91    config = load_config()
  92    config.ensure_dirs()
  93    status = daemon_lock_status(config.runtime.home / "agentd.lock")
  94    if status["running"]:
  95        metadata = status.get("metadata") or {}
  96        if status.get("stale"):
  97            print(f"nipux daemon stale pid={metadata.get('pid', 'unknown')}; restarting")
  98            stop_fn(config, 5.0, True)
  99            time.sleep(0.5)
 100        else:
 101            print(f"nipux daemon already running pid={metadata.get('pid', 'unknown')}")
 102            return
 103    if not ready_fn(config, bool(args.fake)):
 104        return
 105    log_path = Path(args.log_file).expanduser() if args.log_file else config.runtime.logs_dir / "daemon.log"
 106    log_path.parent.mkdir(parents=True, exist_ok=True)
 107    command = [
 108        sys.executable,
 109        "-m",
 110        "nipux_cli.cli",
 111        "daemon",
 112        "--poll-seconds",
 113        str(args.poll_seconds),
 114    ]
 115    if args.fake:
 116        command.append("--fake")
 117    command.append("--quiet" if args.quiet else "--verbose")
 118    with log_path.open("a", encoding="utf-8") as log_file:
 119        process = subprocess.Popen(
 120            command,
 121            cwd=str(Path.cwd()),
 122            stdout=log_file,
 123            stderr=subprocess.STDOUT,
 124            start_new_session=True,
 125        )
 126    time.sleep(0.5)
 127    status = daemon_lock_status(config.runtime.home / "agentd.lock")
 128    if status["running"]:
 129        metadata = status.get("metadata") or {}
 130        print(f"nipux daemon started pid={metadata.get('pid') or process.pid}")
 131        print(f"log: {log_path}")
 132        return
 133    if process.poll() is None:
 134        print(f"nipux daemon process started pid={process.pid}, waiting for lock")
 135        print(f"log: {log_path}")
 136        return
 137    raise SystemExit(f"nipux daemon exited immediately with code {process.returncode}; see {log_path}")
 138
 139
 140def start_daemon_if_needed_impl(
 141    *,
 142    poll_seconds: float,
 143    fake: bool,
 144    quiet: bool,
 145    log_file: str | None,
 146    start_fn: StartFn,
 147    stop_fn: Callable[[AppConfig, float, bool], bool],
 148) -> None:
 149    config = load_config()
 150    config.ensure_dirs()
 151    status = daemon_lock_status(config.runtime.home / "agentd.lock")
 152    if status["running"]:
 153        metadata = status.get("metadata") or {}
 154        if status.get("stale"):
 155            print(f"daemon stale pid={metadata.get('pid', 'unknown')}; restarting")
 156            stop_fn(config, 5.0, True)
 157            time.sleep(0.5)
 158            start_fn(argparse.Namespace(poll_seconds=poll_seconds, fake=fake, quiet=quiet, log_file=log_file))
 159            return
 160        print(f"daemon already running pid={metadata.get('pid', 'unknown')}")
 161        return
 162    start_fn(argparse.Namespace(poll_seconds=poll_seconds, fake=fake, quiet=quiet, log_file=log_file))
 163
 164
 165def cmd_restart_impl(
 166    args: argparse.Namespace,
 167    *,
 168    start_fn: StartFn,
 169    stop_fn: Callable[[AppConfig, float, bool], bool],
 170) -> None:
 171    config = load_config()
 172    config.ensure_dirs()
 173    stopped = stop_fn(config, float(args.wait), False)
 174    if stopped:
 175        time.sleep(0.5)
 176    start_fn(argparse.Namespace(poll_seconds=args.poll_seconds, fake=args.fake, quiet=args.quiet, log_file=args.log_file))
 177
 178
 179def stop_daemon_process_impl(
 180    config: AppConfig,
 181    *,
 182    wait: float,
 183    quiet: bool,
 184    pid_alive: PidAliveFn,
 185) -> bool:
 186    status = daemon_lock_status(config.runtime.home / "agentd.lock")
 187    if not status["running"]:
 188        if not quiet:
 189            print("nipux daemon is not running")
 190        return False
 191    metadata = status.get("metadata") or {}
 192    pid = metadata.get("pid")
 193    if not isinstance(pid, int):
 194        recovered = _find_single_daemon_process()
 195        if recovered is None:
 196            raise SystemExit("daemon is running but lock file has no pid; stop it from the terminal that owns it")
 197        pid = recovered
 198        if not quiet:
 199            print(f"daemon lock had no pid; recovered daemon pid={pid}")
 200    os.kill(pid, signal.SIGTERM)
 201    deadline = time.time() + wait
 202    while time.time() < deadline:
 203        if not pid_alive(pid):
 204            if not quiet:
 205                print(f"nipux daemon stopped pid={pid}")
 206            return True
 207        time.sleep(0.2)
 208    if not quiet:
 209        print(f"sent SIGTERM to nipux daemon pid={pid}; it may still be shutting down")
 210    return False
 211
 212
 213def _find_single_daemon_process() -> int | None:
 214    """Best-effort recovery for older locks that lost pid metadata."""
 215
 216    try:
 217        result = subprocess.run(["ps", "-eo", "pid=,args="], capture_output=True, text=True, timeout=5)
 218    except (OSError, subprocess.SubprocessError):
 219        return None
 220    if result.returncode != 0:
 221        return None
 222    candidates: list[int] = []
 223    current_pid = os.getpid()
 224    for raw_line in result.stdout.splitlines():
 225        line = raw_line.strip()
 226        if not line:
 227            continue
 228        pid_text, _, command = line.partition(" ")
 229        try:
 230            pid = int(pid_text)
 231        except ValueError:
 232            continue
 233        if pid == current_pid:
 234            continue
 235        normalized = " ".join(command.split())
 236        if "-m nipux_cli.cli daemon" in normalized or " nipux_cli.cli daemon" in normalized:
 237            candidates.append(pid)
 238            continue
 239        parts = normalized.split()
 240        if parts and Path(parts[0]).name == "nipux" and "daemon" in parts[1:]:
 241            candidates.append(pid)
 242    unique = sorted(set(candidates))
 243    return unique[0] if len(unique) == 1 else None
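Note on the preflight classification listed above (_recoverable_provider_preflight): the daemon starts in recovery monitor mode only when every blocking failure is a model_generation failure that the provider helpers classify as recoverable. Any other failing check, or an empty failure list, is not recoverable. The sketch below illustrates that all-or-nothing rule. `looks_recoverable` is a hypothetical stand-in for `provider_action_required` and `provider_rate_limited`, whose real matching rules are not shown in this section.

```python
def looks_recoverable(failure: str) -> bool:
    # Hypothetical stand-in for provider_action_required / provider_rate_limited.
    # The real helpers live in nipux_cli.provider_errors and may match differently.
    lowered = failure.lower()
    return "rate limit" in lowered or "429" in lowered


def recoverable_preflight(failures: list[str]) -> bool:
    """True only if every failure is a recoverable model_generation failure."""
    if not failures:
        return False
    for failure in failures:
        # Failures are formatted as "check_name: detail".
        name = failure.split(":", 1)[0].strip()
        if name != "model_generation":
            return False
        if not looks_recoverable(failure):
            return False
    return True


if __name__ == "__main__":
    print(recoverable_preflight(["model_generation: 429 rate limit exceeded"]))
    print(recoverable_preflight(["model_auth: missing API key"]))
```

A single non-generation failure (for example model_auth) makes the whole preflight non-recoverable, which matches the listed behaviour of refusing to start the daemon until the configuration is fixed.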
nipux_cli/dashboard.py 493 lines
   1"""Operator-facing dashboard state and rendering."""
   2
   3from __future__ import annotations
   4
   5from collections import Counter
   6from datetime import datetime, timezone
   7from pathlib import Path
   8from textwrap import shorten
   9from typing import Any
  10
  11from nipux_cli.config import AppConfig
  12from nipux_cli.daemon import daemon_lock_status
  13from nipux_cli.db import AgentDB
  14from nipux_cli.operator_context import active_prompt_operator_entries
  15from nipux_cli.scheduling import job_deferred_until
  16from nipux_cli.tools import DEFAULT_REGISTRY
  17
  18
  19def collect_dashboard_state(
  20    db: AgentDB,
  21    config: AppConfig,
  22    *,
  23    job_id: str | None = None,
  24    limit: int = 12,
  25) -> dict[str, Any]:
  26    """Build a serializable snapshot for status and dashboard commands."""
  27
  28    jobs = db.list_jobs()
  29    selected = _select_focus_job(db, jobs, job_id)
  30    job_cards = [_job_card(db, job) for job in jobs]
  31    focus = _focus_state(db, selected, limit=limit) if selected else None
  32    return {
  33        "generated_at": datetime.now(timezone.utc).isoformat(),
  34        "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
  35        "runtime": {
  36            "home": str(config.runtime.home),
  37            "state_db": str(config.runtime.state_db_path),
  38            "logs_dir": str(config.runtime.logs_dir),
  39            "model": config.model.model,
  40            "base_url": config.model.base_url,
  41            "tool_count": len(DEFAULT_REGISTRY.names()),
  42        },
  43        "jobs": job_cards,
  44        "focus": focus,
  45    }
  46
  47
  48def render_dashboard(state: dict[str, Any], *, width: int = 120, chars: int = 260) -> str:
  49    """Render a compact terminal dashboard."""
  50
  51    width = max(72, min(width, 160))
  52    line = "-" * width
  53    runtime = state["runtime"]
  54    daemon = state["daemon"]
  55    focus = state.get("focus")
  56    generated_at = _compact_time(state["generated_at"])
  57    daemon_text = _daemon_text(daemon)
  58    lines = [
  59        "Nipux CLI Dashboard".ljust(width - len(generated_at)) + generated_at,
  60        line,
  61        f"daemon: {daemon_text}",
  62        f"model: {runtime['model']} | endpoint: {runtime['base_url']} | tools: {runtime['tool_count']}",
  63        f"home: {runtime['home']}",
  64        "trace: model-visible state, tool calls, outputs, artifacts, and errors. Hidden chain-of-thought is not exposed.",
  65        line,
  66        "Jobs",
  67    ]
  68    jobs = state.get("jobs") or []
  69    if not jobs:
  70        lines.append("  no jobs yet")
  71    else:
  72        lines.append("  title                         state      kind            steps  artifacts  last action")
  73        for job in jobs[:12]:
  74            latest = job.get("latest_step") or {}
  75            last_action = _one_line(latest.get("summary") or latest.get("error") or "-", 42)
  76            display_state = _job_state_text(job, bool(daemon.get("running")))
  77            lines.append(
  78                f"  {_one_line(job['title'], 29):<29} {display_state:<10} {job['kind']:<15} "
  79                f"{job['step_count']:>5} {job['artifact_count']:>10}  {last_action}"
  80            )
  81    if focus:
  82        lines.extend(_render_focus(focus, width=width, chars=chars, daemon_running=bool(daemon.get("running"))))
  83    return "\n".join(lines).rstrip() + "\n"
  84
  85
  86def render_overview(state: dict[str, Any], *, width: int = 100) -> str:
  87    """Render a human-sized status view for the interactive shell."""
  88
  89    width = max(72, min(width, 120))
  90    runtime = state["runtime"]
  91    daemon = state["daemon"]
  92    focus = state.get("focus")
  93    jobs = state.get("jobs") or []
  94    latest_step = ((focus or {}).get("recent_steps") or [{}])[-1] if focus else {}
  95    lines = [
  96        "Nipux Status",
  97        "=" * min(width, 96),
  98        f"daemon: {_daemon_health_text(daemon, latest_step=latest_step)}",
  99        f"model: {runtime['model']}",
 100        f"jobs: {len(jobs)} total | tools: {runtime['tool_count']} | home: {runtime['home']}",
 101    ]
 102    if not focus:
 103        lines.append("focus: no job yet")
 104        lines.append("")
 105        lines.append("next: create \"your objective\" --title \"name\"")
 106        return "\n".join(lines).rstrip() + "\n"
 107
 108    job = focus["job"]
 109    counts = focus["counts"]
 110    artifacts = focus.get("artifacts") or []
 111    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 112    operator = (metadata.get("last_operator_message") if isinstance(metadata, dict) else None) or {}
 113    agent_update = (metadata.get("last_agent_update") if isinstance(metadata, dict) else None) or {}
 114    lesson = (metadata.get("last_lesson") if isinstance(metadata, dict) else None) or {}
 115    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
 116    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
 117    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
 118    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
 119    active_operator = _active_operator_messages(metadata)
 120    pending_measurement = metadata.get("pending_measurement_obligation") if isinstance(metadata.get("pending_measurement_obligation"), dict) else {}
 121    lines.extend([
 122        "",
 123        f"focus: {job['title']}",
 124        (
 125            f"state: {_job_state_text(job, bool(daemon.get('running')))} | "
 126            f"worker: {_worker_text(job, bool(daemon.get('running')))} | kind: {job['kind']} | "
 127            f"steps: {counts['steps']} | artifacts: {counts['artifacts']} | failures: {counts['failed_steps']}"
 128        ),
 129        f"learning: findings={len(findings)} | sources={len(sources)} | tasks={len(tasks)} | experiments={len(experiments)} | lessons={counts.get('lessons', 0)} | reflections={counts.get('reflections', 0)}",
 130        f"objective: {_one_line(job['objective'], width - 11)}",
 131    ])
 132    if active_operator:
 133        lines.append(f"operator context: {len(active_operator)} active | {_one_line(active_operator[-1].get('message') or '', width - 28)}")
 134    if pending_measurement:
 135        lines.append(f"measurement: pending from step #{pending_measurement.get('source_step_no') or '?'}")
 136    if latest_step:
 137        tool = latest_step.get("tool_name") or latest_step.get("kind") or "-"
 138        status = latest_step.get("status") or "-"
 139        summary = latest_step.get("summary") or latest_step.get("error") or "-"
 140        lines.append(f"latest: #{latest_step.get('step_no')} {status} {tool}: {_one_line(summary, width - 22)}")
 141    if artifacts:
 142        artifact = artifacts[0]
 143        lines.append(f"latest artifact: {artifact.get('title') or artifact['id']}")
 144    if operator:
 145        lines.append(f"last steering: {_one_line(operator.get('message') or '', width - 15)}")
 146    if agent_update:
 147        lines.append(f"agent note: {_one_line(agent_update.get('message') or '', width - 12)}")
 148    if lesson:
 149        lines.append(f"latest lesson: {_one_line(lesson.get('lesson') or '', width - 16)}")
 150    lines.extend([
 151        "",
 152        "commands: activity | updates | findings | tasks | sources | memory | metrics | work --steps 3 | start | stop",
 153    ])
 154    return "\n".join(lines).rstrip() + "\n"
 155
 156
 157def _select_focus_job(db: AgentDB, jobs: list[dict[str, Any]], job_id: str | None) -> dict[str, Any] | None:
 158    if job_id:
 159        return db.get_job(job_id)
 160    for status in ("running", "queued", "paused", "failed", "completed"):
 161        for job in jobs:
 162            if job.get("status") == status:
 163                return job
 164    return jobs[0] if jobs else None
 165
 166
 167def _job_card(db: AgentDB, job: dict[str, Any]) -> dict[str, Any]:
 168    steps = db.list_steps(job_id=job["id"])
 169    artifacts = db.list_artifacts(job["id"], limit=500)
 170    runs = db.list_runs(job["id"], limit=500)
 171    return {
 172        "id": job["id"],
 173        "status": job["status"],
 174        "kind": job["kind"],
 175        "title": job["title"],
 176        "updated_at": job["updated_at"],
 177        "step_count": _step_count(steps),
 178        "run_count": len(runs),
 179        "failed_steps": sum(1 for step in steps if step.get("status") == "failed"),
 180        "artifact_count": len(artifacts),
 181        "latest_step": _public_step(steps[-1]) if steps else None,
 182    }
 183
 184
 185def _focus_state(db: AgentDB, job: dict[str, Any], *, limit: int) -> dict[str, Any]:
 186    steps = db.list_steps(job_id=job["id"])
 187    runs = db.list_runs(job["id"], limit=limit)
 188    artifacts = db.list_artifacts(job["id"], limit=limit)
 189    memory = db.list_memory(job["id"])
 190    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 191    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
 192    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
 193    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
 194    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
 195    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
 196    reflections = metadata.get("reflections") if isinstance(metadata.get("reflections"), list) else []
 197    active_operator = _active_operator_messages(metadata)
 198    tool_counts = Counter(step.get("tool_name") or step.get("kind") or "unknown" for step in steps)
 199    blocked = [step for step in steps if str(step.get("error") or "").endswith("blocked") or "blocked" in str(step.get("summary") or "")]
 200    return {
 201        "job": {
 202            "id": job["id"],
 203            "title": job["title"],
 204            "kind": job["kind"],
 205            "status": job["status"],
 206            "objective": job["objective"],
 207            "updated_at": job["updated_at"],
 208            "metadata": job.get("metadata") or {},
 209        },
 210        "counts": {
 211            "steps": _step_count(steps),
 212            "runs": len(db.list_runs(job["id"], limit=1000)),
 213            "artifacts": len(db.list_artifacts(job["id"], limit=1000)),
 214            "failed_steps": sum(1 for step in steps if step.get("status") == "failed"),
 215            "blocked_steps": len(blocked),
 216            "findings": len(findings),
 217            "sources": len(sources),
 218            "tasks": len(tasks),
 219            "experiments": len(experiments),
 220            "active_operator_messages": len(active_operator),
 221            "lessons": len(lessons),
 222            "reflections": len(reflections),
 223        },
 224        "tool_counts": dict(tool_counts.most_common(8)),
 225        "recent_runs": [_public_run(run) for run in runs],
 226        "recent_steps": [_public_step(step) for step in steps[-limit:]],
 227        "artifacts": [_public_artifact(artifact) for artifact in artifacts],
 228        "memory": [
 229            {
 230                "key": entry.get("key"),
 231                "summary": entry.get("summary"),
 232                "artifact_refs": entry.get("artifact_refs") or [],
 233                "updated_at": entry.get("updated_at"),
 234            }
 235            for entry in memory[:4]
 236        ],
 237        "lessons": [
 238            {
 239                "at": entry.get("at"),
 240                "category": entry.get("category") or "memory",
 241                "lesson": entry.get("lesson") or "",
 242                "confidence": entry.get("confidence"),
 243            }
 244            for entry in lessons[-5:]
 245            if isinstance(entry, dict)
 246        ],
 247        "findings": findings[-8:],
 248        "tasks": tasks[-12:],
 249        "experiments": experiments[-12:],
 250        "active_operator_messages": active_operator[-12:],
 251        "sources": sources[-8:],
 252        "reflections": reflections[-4:],
 253    }
 254
 255
 256def _render_focus(focus: dict[str, Any], *, width: int, chars: int, daemon_running: bool) -> list[str]:
 257    job = focus["job"]
 258    counts = focus["counts"]
 259    lines = [
 260        "-" * width,
 261        f"Focus Job: {job['title']} | state {_job_state_text(job, daemon_running)} | {job['kind']}",
 262        f"objective: {_one_line(job['objective'], width - 11)}",
 263        (
 264            f"counts: steps={counts['steps']} runs={counts['runs']} artifacts={counts['artifacts']} "
 265            f"failed_steps={counts['failed_steps']} blocked_steps={counts['blocked_steps']}"
 266        ),
 267        f"learning: findings={counts.get('findings', 0)} sources={counts.get('sources', 0)} tasks={counts.get('tasks', 0)} experiments={counts.get('experiments', 0)} lessons={counts.get('lessons', 0)} reflections={counts.get('reflections', 0)}",
 268        f"tool mix: {_tool_mix(focus.get('tool_counts') or {})}",
 269    ]
 270    active_operator = focus.get("active_operator_messages") or []
 271    if active_operator:
 272        lines.append(f"operator context: {len(active_operator)} active | {_one_line(active_operator[-1].get('message') or '', chars)}")
 273    pending_measurement = (job.get("metadata") or {}).get("pending_measurement_obligation") if isinstance(job.get("metadata"), dict) else {}
 274    if isinstance(pending_measurement, dict) and pending_measurement:
 275        lines.append(f"measurement obligation: pending from step #{pending_measurement.get('source_step_no') or '?'}")
 276    lines.extend(["", "Recent Steps"])
 277    recent_steps = focus.get("recent_steps") or []
 278    if not recent_steps:
 279        lines.append("  no steps recorded")
 280    for step in recent_steps:
 281        error = f" | error={_one_line(step['error'], 70)}" if step.get("error") else ""
 282        lines.append(
 283            f"  #{step['step_no']:<4} {step['status']:<9} {step.get('tool_name') or step['kind']:<18} "
 284            f"{_one_line(step.get('summary') or '-', chars)}{error}"
 285        )
 286        args = step.get("arguments") or {}
 287        if args:
 288            lines.append(f"       args: {_one_line(_compact_value(args), chars)}")
 289    lines.append("")
 290    lines.append("Artifacts")
 291    artifacts = focus.get("artifacts") or []
 292    if not artifacts:
 293        lines.append("  no artifacts yet")
 294    for artifact in artifacts[:8]:
 295        title = artifact.get("title") or artifact["id"]
 296        lines.append(f"  {artifact['created_at']} {artifact['type']} {title}")
 297        if artifact.get("summary"):
 298            lines.append(f"       {_one_line(artifact['summary'], chars)}")
 299    lessons = focus.get("lessons") or []
 300    if lessons:
 301        lines.append("")
 302        lines.append("Lessons")
 303        for lesson in lessons:
 304            lines.append(f"  {lesson.get('category') or 'memory'}: {_one_line(lesson.get('lesson') or '', chars)}")
 305    findings = focus.get("findings") or []
 306    if findings:
 307        lines.append("")
 308        lines.append("Recent Findings")
 309        for finding in findings[-5:]:
 310            lines.append(f"  {_one_line(finding.get('name') or 'unknown', 48)} score={finding.get('score')} {finding.get('category') or ''}")
 311    tasks = focus.get("tasks") or []
 312    if tasks:
 313        lines.append("")
 314        lines.append("Task Queue")
 315        for task in tasks[-6:]:
 316            lines.append(f"  {task.get('status') or 'open':<7} p={task.get('priority') or 0:<3} {_one_line(task.get('title') or 'untitled', 56)}")
 317    sources = focus.get("sources") or []
 318    if sources:
 319        lines.append("")
 320        lines.append("Recent Sources")
 321        for source in sources[-5:]:
 322            lines.append(f"  {_one_line(source.get('source') or 'unknown', 48)} score={source.get('usefulness_score')} findings={source.get('yield_count') or 0}")
 323    memory = focus.get("memory") or []
 324    if memory:
 325        lines.append("")
 326        lines.append("Compact Memory")
 327        for entry in memory:
 328            refs = ", ".join(entry.get("artifact_refs") or [])
 329            suffix = f" refs={refs}" if refs else ""
 330            lines.append(f"  {entry['key']}: {_one_line(entry.get('summary') or '', chars)}{suffix}")
 331    return lines
 332
 333
 334def _public_run(run: dict[str, Any]) -> dict[str, Any]:
 335    return {
 336        "id": run["id"],
 337        "status": run["status"],
 338        "started_at": run["started_at"],
 339        "ended_at": run.get("ended_at"),
 340        "model": run.get("model"),
 341        "error": run.get("error"),
 342    }
 343
 344
 345def _public_step(step: dict[str, Any]) -> dict[str, Any]:
 346    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
 347    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
 348    return {
 349        "id": step["id"],
 350        "step_no": step["step_no"],
 351        "kind": step["kind"],
 352        "status": step["status"],
 353        "tool_name": step.get("tool_name"),
 354        "started_at": step["started_at"],
 355        "ended_at": step.get("ended_at"),
 356        "summary": _clean_step_summary(step.get("summary")),
 357        "error": step.get("error"),
 358        "arguments": args,
 359    }
 360
 361
 362def _public_artifact(artifact: dict[str, Any]) -> dict[str, Any]:
 363    return {
 364        "id": artifact["id"],
 365        "created_at": artifact["created_at"],
 366        "type": artifact["type"],
 367        "title": artifact.get("title"),
 368        "summary": artifact.get("summary"),
 369        "path": artifact["path"],
 370    }
 371
 372
 373def _step_count(steps: list[dict[str, Any]]) -> int:
 374    numbers = [int(step.get("step_no") or 0) for step in steps]
 375    return max(numbers, default=0)
 376
 377
 378def _active_operator_messages(metadata: dict[str, Any]) -> list[dict[str, Any]]:
 379    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
 380    prompt_entries = active_prompt_operator_entries(messages)
 381    return [
 382        entry for entry in messages
 383        if isinstance(entry, dict)
 384        and entry in prompt_entries
 385        and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
 386    ]
 387
 388
 389def _daemon_text(daemon: dict[str, Any]) -> str:
 390    metadata = daemon.get("metadata") or {}
 391    if daemon.get("running"):
 392        pid = metadata.get("pid") or "unknown"
 393        started = metadata.get("started_at") or "unknown start"
 394        stale = " stale-runtime" if daemon.get("stale") else ""
 395        return f"running pid={pid}{stale} started={started}"
 396    return "ready when work starts"
 397
 398
 399def _daemon_health_text(daemon: dict[str, Any], *, latest_step: dict[str, Any] | None = None) -> str:
 400    if not daemon.get("running"):
 401        return "ready when work starts"
 402    metadata = daemon.get("metadata") or {}
 403    heartbeat = metadata.get("last_heartbeat")
 404    status = "running"
 405    if daemon.get("stale"):
 406        status = "running stale-runtime"
 407    if heartbeat:
 408        age = _age_seconds(heartbeat)
 409        if age is not None:
 410            status += f" | heartbeat {int(age)}s ago"
 411            running_step = latest_step or {}
 412            if age > 120 and running_step.get("status") == "running":
 413                tool = running_step.get("tool_name") or running_step.get("kind") or "step"
 414                step_age = _age_seconds(running_step.get("started_at") or "")
 415                if step_age is not None:
 416                    status += f" | busy #{running_step.get('step_no')} {tool} for {int(step_age)}s"
 417                else:
 418                    status += f" | busy #{running_step.get('step_no')} {tool}"
 419            elif age > 120:
 420                status += " (stale)"
 421    failures = metadata.get("consecutive_failures")
 422    if failures:
 423        status += f" | consecutive failures: {failures}"
 424    tool = metadata.get("last_tool")
 425    step_status = metadata.get("last_status")
 426    if tool or step_status:
 427        status += f" | last: {step_status or '?'} {tool or '-'}"
 428    if metadata.get("last_error"):
 429        status += f" | error: {_one_line(metadata.get('last_error'), 48)}"
 430    return status
 431
 432
 433def _worker_text(job: dict[str, Any], daemon_running: bool) -> str:
 434    status = str(job.get("status") or "")
 435    if status in {"paused", "completed", "cancelled", "failed"}:
 436        return status
 437    if job_deferred_until(job):
 438        return "waiting"
 439    return "active" if daemon_running and status in {"running", "queued"} else "idle"
 440
 441
 442def _job_state_text(job: dict[str, Any], daemon_running: bool) -> str:
 443    status = str(job.get("status") or "")
 444    if status in {"running", "queued"}:
 445        if job_deferred_until(job):
 446            return "waiting"
 447        return "advancing" if daemon_running else "open"
 448    return status or "unknown"
 449
 450
 451def _age_seconds(value: str) -> float | None:
 452    try:
 453        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
 454    except ValueError:
 455        return None
 456    return max(0.0, (datetime.now(timezone.utc) - parsed.astimezone(timezone.utc)).total_seconds())
 457
 458
 459def _compact_time(value: str) -> str:
 460    try:
 461        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
 462    except ValueError:
 463        return value
 464    return parsed.astimezone().strftime("%Y-%m-%d %H:%M:%S %Z")
 465
 466
 467def _one_line(value: Any, width: int) -> str:
 468    text = " ".join(str(value).split())
 469    return shorten(text, width=max(8, width), placeholder="...")
 470
 471
 472def _clean_step_summary(summary: Any) -> str:
 473    text = " ".join(str(summary or "").split())
 474    if text.startswith("write_artifact saved ") and " at /" in text:
 475        return text.split(" at /", 1)[0]
 476    return text
 477
 478
 479def _compact_value(value: Any) -> str:
 480    if isinstance(value, dict):
 481        parts = [f"{key}={value[key]!r}" for key in sorted(value)]
 482        return ", ".join(parts)
 483    return str(value)
 484
 485
 486def _tool_mix(tool_counts: dict[str, int]) -> str:
 487    if not tool_counts:
 488        return "none"
 489    return ", ".join(f"{name}:{count}" for name, count in tool_counts.items())
 490
 491
 492def resolve_artifact_path(path: str | Path) -> str:
 493    return str(Path(path).expanduser())
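
The heartbeat logic in the listing above (`_age_seconds` feeding `_daemon_health_text`) can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the project's code: `age_seconds` and `heartbeat_label` are hypothetical names, the injectable `now` parameter is added here only to make the behaviour deterministic, and the 120-second staleness threshold is copied from the listing. The sketch also treats naive timestamps as UTC, whereas the listed helper leaves that to `astimezone`, which interprets naive values as local time.

```python
from datetime import datetime, timezone
from typing import Optional


def age_seconds(value: str, *, now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds elapsed since an ISO-8601 timestamp, or None if it cannot be parsed."""
    try:
        # Python versions before 3.11 reject a trailing "Z", so rewrite it as an offset.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, ValueError):
        # AttributeError covers non-string input (an extra guard assumed for this sketch).
        return None
    if parsed.tzinfo is None:
        # Assumption for this sketch: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # Clamp at zero so a timestamp slightly in the future never yields a negative age.
    return max(0.0, (current - parsed.astimezone(timezone.utc)).total_seconds())


def heartbeat_label(heartbeat: str, *, now: Optional[datetime] = None, stale_after: float = 120.0) -> str:
    """Build a short status string and flag heartbeats older than `stale_after` seconds."""
    age = age_seconds(heartbeat, now=now)
    if age is None:
        return "running"
    label = f"running | heartbeat {int(age)}s ago"
    return label + " (stale)" if age > stale_after else label


if __name__ == "__main__":
    fixed_now = datetime(2024, 1, 1, 0, 5, 0, tzinfo=timezone.utc)
    print(heartbeat_label("2024-01-01T00:04:30+00:00", now=fixed_now))
    print(heartbeat_label("2024-01-01T00:00:00Z", now=fixed_now))
```

Passing a fixed `now` keeps the example repeatable. Without it, the functions fall back to the current UTC time, as the listed helper does.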
nipux_cli/db.py 2752 lines
   1"""SQLite state store for the Nipux agent."""
   2
   3from __future__ import annotations
   4
   5import json
   6import random
   7import re
   8import sqlite3
   9import threading
  10import time
  11import uuid
  12from datetime import datetime, timezone
  13from pathlib import Path
  14from typing import Any, Callable, Iterable, TypeVar
  15
  16from nipux_cli.metric_format import format_metric_value
  17from nipux_cli.memory_graph import DEFAULT_NODE_KIND, DEFAULT_NODE_STATUS, NODE_KINDS, NODE_STATUSES
  18
  19T = TypeVar("T")
  20
  21SCHEMA_VERSION = 1
  22
  23SCHEMA_SQL = """
  24CREATE TABLE IF NOT EXISTS schema_version (
  25    version INTEGER NOT NULL
  26);
  27
  28CREATE TABLE IF NOT EXISTS jobs (
  29    id TEXT PRIMARY KEY,
  30    title TEXT NOT NULL,
  31    objective TEXT NOT NULL,
  32    kind TEXT NOT NULL DEFAULT 'generic',
  33    status TEXT NOT NULL DEFAULT 'queued',
  34    priority INTEGER NOT NULL DEFAULT 0,
  35    cadence TEXT,
  36    created_at TEXT NOT NULL,
  37    updated_at TEXT NOT NULL,
  38    metadata_json TEXT NOT NULL DEFAULT '{}'
  39);
  40
  41CREATE TABLE IF NOT EXISTS job_runs (
  42    id TEXT PRIMARY KEY,
  43    job_id TEXT NOT NULL REFERENCES jobs(id),
  44    status TEXT NOT NULL,
  45    started_at TEXT NOT NULL,
  46    ended_at TEXT,
  47    model TEXT,
  48    config_hash TEXT,
  49    score REAL,
  50    error TEXT
  51);
  52
  53CREATE TABLE IF NOT EXISTS steps (
  54    id TEXT PRIMARY KEY,
  55    job_id TEXT NOT NULL REFERENCES jobs(id),
  56    run_id TEXT NOT NULL REFERENCES job_runs(id),
  57    step_no INTEGER NOT NULL,
  58    kind TEXT NOT NULL,
  59    status TEXT NOT NULL,
  60    tool_name TEXT,
  61    started_at TEXT NOT NULL,
  62    ended_at TEXT,
  63    summary TEXT,
  64    input_json TEXT NOT NULL DEFAULT '{}',
  65    output_json TEXT NOT NULL DEFAULT '{}',
  66    error TEXT
  67);
  68
  69CREATE TABLE IF NOT EXISTS artifacts (
  70    id TEXT PRIMARY KEY,
  71    job_id TEXT NOT NULL REFERENCES jobs(id),
  72    run_id TEXT,
  73    step_id TEXT,
  74    type TEXT NOT NULL,
  75    path TEXT NOT NULL,
  76    sha256 TEXT NOT NULL,
  77    title TEXT,
  78    summary TEXT,
  79    metadata_json TEXT NOT NULL DEFAULT '{}',
  80    created_at TEXT NOT NULL
  81);
  82
  83CREATE TABLE IF NOT EXISTS evidence (
  84    id TEXT PRIMARY KEY,
  85    job_id TEXT NOT NULL REFERENCES jobs(id),
  86    url_or_source TEXT NOT NULL,
  87    artifact_id TEXT REFERENCES artifacts(id),
  88    extracted_text_path TEXT,
  89    summary TEXT,
  90    score_json TEXT NOT NULL DEFAULT '{}',
  91    created_at TEXT NOT NULL
  92);
  93
  94CREATE TABLE IF NOT EXISTS memory_index (
  95    id TEXT PRIMARY KEY,
  96    job_id TEXT NOT NULL REFERENCES jobs(id),
  97    key TEXT NOT NULL,
  98    summary TEXT NOT NULL,
  99    artifact_refs_json TEXT NOT NULL DEFAULT '[]',
 100    updated_at TEXT NOT NULL,
 101    UNIQUE(job_id, key)
 102);
 103
 104CREATE TABLE IF NOT EXISTS digests (
 105    id TEXT PRIMARY KEY,
 106    day TEXT NOT NULL,
 107    target TEXT,
 108    subject TEXT,
 109    body_path TEXT,
 110    sent_at TEXT,
 111    status TEXT NOT NULL,
 112    error TEXT
 113);
 114
 115CREATE TABLE IF NOT EXISTS events (
 116    id TEXT PRIMARY KEY,
 117    job_id TEXT REFERENCES jobs(id),
 118    event_type TEXT NOT NULL,
 119    created_at TEXT NOT NULL,
 120    title TEXT,
 121    body TEXT,
 122    ref_table TEXT,
 123    ref_id TEXT,
 124    metadata_json TEXT NOT NULL DEFAULT '{}'
 125);
 126
 127CREATE INDEX IF NOT EXISTS idx_jobs_status_priority ON jobs(status, priority DESC, updated_at);
 128CREATE INDEX IF NOT EXISTS idx_runs_job ON job_runs(job_id, started_at DESC);
 129CREATE INDEX IF NOT EXISTS idx_steps_run ON steps(run_id, step_no);
 130CREATE INDEX IF NOT EXISTS idx_artifacts_job ON artifacts(job_id, created_at DESC);
 131CREATE INDEX IF NOT EXISTS idx_events_job_time ON events(job_id, created_at DESC);
 132CREATE INDEX IF NOT EXISTS idx_events_ref ON events(ref_table, ref_id);
 133"""
 134
 135
 136def utc_now() -> str:
 137    return datetime.now(timezone.utc).isoformat()
 138
 139
 140def new_id(prefix: str) -> str:
 141    return f"{prefix}_{uuid.uuid4().hex[:16]}"
 142
 143
 144def _slugify(value: str) -> str:
 145    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
 146    return slug[:72].strip("-") or new_id("job")
 147
 148
 149def _unique_job_id(conn: sqlite3.Connection, seed: str) -> str:
 150    base = _slugify(seed)
 151    candidate = base
 152    suffix = 2
 153    while conn.execute("SELECT 1 FROM jobs WHERE id = ? LIMIT 1", (candidate,)).fetchone():
 154        candidate = f"{base[:68]}-{suffix}"
 155        suffix += 1
 156    return candidate
 157
 158
 159def _json_dumps(value: Any) -> str:
 160    return json.dumps(value if value is not None else {}, ensure_ascii=False, sort_keys=True)
 161
 162
 163def _json_loads(value: str | None) -> dict[str, Any]:
 164    try:
 165        loaded = json.loads(value or "{}")
 166    except json.JSONDecodeError:
 167        return {}
 168    return loaded if isinstance(loaded, dict) else {}
 169
 170
 171def _bounded_float(value: Any, low: float, high: float) -> float:
 172    try:
 173        number = float(value)
 174    except (TypeError, ValueError):
 175        return low
 176    return min(high, max(low, number))
 177
 178
 179def _merge_string_lists(existing: Any, incoming: Any, *, limit: int) -> list[str]:
 180    values: list[str] = []
 181    for source in (existing, incoming):
 182        if isinstance(source, list):
 183            items = source
 184        elif isinstance(source, str) and source.strip():
 185            items = [source]
 186        else:
 187            items = []
 188        for item in items:
 189            text = " ".join(str(item).split())
 190            if text and text not in values:
 191                values.append(text)
 192    return values[-limit:]
 193
 194
 195def _memory_edge_key(edge: dict[str, Any]) -> str:
 196    from_key = str(edge.get("from_key") or "")
 197    relation = str(edge.get("relation") or "")
 198    to_key = str(edge.get("to_key") or "")
 199    return f"{from_key}|{relation}|{to_key}" if from_key and relation and to_key else ""
 200
 201
 202def _as_int(value: Any) -> int:
 203    try:
 204        return int(float(value))
 205    except (TypeError, ValueError):
 206        return 0
 207
 208
 209def _as_float(value: Any) -> float | None:
 210    try:
 211        return float(value)
 212    except (TypeError, ValueError):
 213        return None
 214
 215
 216def _nested_value(value: dict[str, Any], *keys: str) -> Any:
 217    current: Any = value
 218    for key in keys:
 219        if not isinstance(current, dict):
 220            return None
 221        current = current.get(key)
 222    return current
 223
 224
 225def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
 226    values = metadata.get(key)
 227    if not isinstance(values, list):
 228        return []
 229    return [value for value in values if isinstance(value, dict)]
 230
 231
 232def _change_fingerprint(entry: dict[str, Any], fields: Iterable[str]) -> str:
 233    return _json_dumps({field: entry.get(field) for field in fields})
 234
 235
 236def _norm_key(value: str) -> str:
 237    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")[:120]
 238
 239
 240def _clean_status(value: str, allowed: set[str], default: str) -> str:
 241    status = (value.strip().lower() or default).replace(" ", "_")
 242    return status if status in allowed else default
 243
 244
 245def _experiment_metric_value(entry: dict[str, Any]) -> float | None:
 246    try:
 247        value = entry.get("metric_value")
 248        if value is None:
 249            return None
 250        return float(value)
 251    except (TypeError, ValueError):
 252        return None
 253
 254
 255def _same_metric_group(
 256    entry: dict[str, Any],
 257    *,
 258    metric_name: str,
 259    metric_unit: str,
 260    higher_is_better: bool,
 261) -> bool:
 262    return (
 263        str(entry.get("metric_name") or "").strip().lower() == metric_name.strip().lower()
 264        and str(entry.get("metric_unit") or "").strip().lower() == metric_unit.strip().lower()
 265        and bool(entry.get("higher_is_better", True)) == bool(higher_is_better)
 266        and _experiment_metric_value(entry) is not None
 267    )
 268
 269
 270def _best_experiment_for_metric(
 271    experiments: list[dict[str, Any]],
 272    *,
 273    metric_name: str,
 274    metric_unit: str,
 275    higher_is_better: bool,
 276    exclude_key: str = "",
 277) -> dict[str, Any] | None:
 278    candidates = [
 279        experiment
 280        for experiment in experiments
 281        if experiment.get("key") != exclude_key
 282        and _same_metric_group(
 283            experiment,
 284            metric_name=metric_name,
 285            metric_unit=metric_unit,
 286            higher_is_better=higher_is_better,
 287        )
 288    ]
 289    if not candidates:
 290        return None
 291    return max(candidates, key=lambda item: _experiment_metric_value(item) or 0.0) if higher_is_better else min(candidates, key=lambda item: _experiment_metric_value(item) or 0.0)
 292
 293
 294def _metric_delta(
 295    *,
 296    metric_value: Any,
 297    previous_best: dict[str, Any] | None,
 298    higher_is_better: bool,
 299) -> float | None:
 300    try:
 301        current = float(metric_value)
 302    except (TypeError, ValueError):
 303        return None
 304    if previous_best is None:
 305        return None
 306    previous = _experiment_metric_value(previous_best)
 307    if previous is None:
 308        return None
 309    delta = current - previous if higher_is_better else previous - current
 310    return round(delta, 6)
 311
 312
 313def _mark_best_experiments(experiments: list[dict[str, Any]]) -> dict[str, Any] | None:
 314    groups: dict[tuple[str, str, bool], list[dict[str, Any]]] = {}
 315    for experiment in experiments:
 316        metric_name = str(experiment.get("metric_name") or "").strip().lower()
 317        if _experiment_metric_value(experiment) is None or not metric_name:
 318            experiment["best_observed"] = False
 319            continue
 320        key = (
 321            metric_name,
 322            str(experiment.get("metric_unit") or "").strip().lower(),
 323            bool(experiment.get("higher_is_better", True)),
 324        )
 325        groups.setdefault(key, []).append(experiment)
 326    winners: list[dict[str, Any]] = []
 327    for (_metric_name, _metric_unit, higher_is_better), entries in groups.items():
 328        winner = max(entries, key=lambda item: _experiment_metric_value(item) or 0.0) if higher_is_better else min(entries, key=lambda item: _experiment_metric_value(item) or 0.0)
 329        for entry in entries:
 330            entry["best_observed"] = entry is winner
 331        winners.append(winner)
 332    if not winners:
 333        return None
 334    return max(winners, key=lambda item: str(item.get("updated_at") or item.get("created_at") or ""))
 335
 336
 337def _row_to_dict(row: sqlite3.Row | None) -> dict[str, Any] | None:
 338    if row is None:
 339        return None
 340    result = dict(row)
 341    for key in ("metadata_json", "input_json", "output_json", "score_json", "artifact_refs_json"):
 342        if key in result:
 343            try:
 344                result[key.removesuffix("_json")] = json.loads(result[key] or "{}")
 345            except json.JSONDecodeError:
 346                result[key.removesuffix("_json")] = {}
 347    return result
 348
 349
 350def _insert_event(
 351    conn: sqlite3.Connection,
 352    *,
 353    job_id: str | None,
 354    event_type: str,
 355    title: str = "",
 356    body: str = "",
 357    ref_table: str = "",
 358    ref_id: str = "",
 359    metadata: dict[str, Any] | None = None,
 360    created_at: str | None = None,
 361) -> dict[str, Any]:
 362    event_id = new_id("evt")
 363    when = created_at or utc_now()
 364    conn.execute(
 365        """
 366        INSERT INTO events(id, job_id, event_type, created_at, title, body, ref_table, ref_id, metadata_json)
 367        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
 368        """,
 369        (
 370            event_id,
 371            job_id,
 372            event_type.strip().lower() or "event",
 373            when,
 374            title.strip(),
 375            body.strip(),
 376            ref_table.strip(),
 377            ref_id.strip(),
 378            _json_dumps(metadata or {}),
 379        ),
 380    )
 381    return {
 382        "id": event_id,
 383        "job_id": job_id,
 384        "event_type": event_type.strip().lower() or "event",
 385        "created_at": when,
 386        "title": title.strip(),
 387        "body": body.strip(),
 388        "ref_table": ref_table.strip(),
 389        "ref_id": ref_id.strip(),
 390        "metadata": metadata or {},
 391    }
 392
 393
 394def _projected_event(
 395    *,
 396    event_id: str,
 397    job_id: str,
 398    event_type: str,
 399    created_at: str,
 400    title: str = "",
 401    body: str = "",
 402    ref_table: str = "",
 403    ref_id: str = "",
 404    metadata: dict[str, Any] | None = None,
 405) -> dict[str, Any]:
 406    return {
 407        "id": event_id,
 408        "job_id": job_id,
 409        "event_type": event_type,
 410        "created_at": created_at,
 411        "title": title,
 412        "body": body,
 413        "ref_table": ref_table,
 414        "ref_id": ref_id,
 415        "metadata": metadata or {},
 416        "projected": True,
 417    }
 418
 419
 420class AgentDB:
 421    """Small SQLite wrapper with WAL and jittered write retries."""
 422
 423    _WRITE_RETRIES = 12
 424
 425    def __init__(self, path: str | Path):
 426        self.path = Path(path)
 427        self.path.parent.mkdir(parents=True, exist_ok=True)
 428        self._lock = threading.RLock()
 429        self._conn = sqlite3.connect(
 430            str(self.path),
 431            check_same_thread=False,
 432            timeout=1.0,
 433            isolation_level=None,
 434        )
 435        self._conn.row_factory = sqlite3.Row
 436        self._conn.execute("PRAGMA journal_mode=WAL")
 437        self._conn.execute("PRAGMA foreign_keys=ON")
 438        self._init_schema()
 439
 440    def close(self) -> None:
 441        with self._lock:
 442            if self._conn is not None:
 443                try:
 444                    self._conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
 445                finally:
 446                    self._conn.close()
 447                    self._conn = None
 448
 449    def _init_schema(self) -> None:
 450        with self._lock:
 451            self._conn.executescript(SCHEMA_SQL)
 452            row = self._conn.execute("SELECT version FROM schema_version LIMIT 1").fetchone()
 453            if row is None:
 454                self._conn.execute("INSERT INTO schema_version(version) VALUES (?)", (SCHEMA_VERSION,))
 455            elif int(row["version"]) != SCHEMA_VERSION:
 456                raise RuntimeError(f"Unsupported nipux schema version: {row['version']}")
 457
 458    def _write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
 459        last_error: Exception | None = None
 460        for attempt in range(self._WRITE_RETRIES):
 461            try:
 462                with self._lock:
 463                    self._conn.execute("BEGIN IMMEDIATE")
 464                    try:
 465                        result = fn(self._conn)
 466                        self._conn.commit()
 467                        return result
 468                    except BaseException:
 469                        self._conn.rollback()
 470                        raise
 471            except sqlite3.OperationalError as exc:
 472                if "locked" not in str(exc).lower() and "busy" not in str(exc).lower():
 473                    raise
 474                last_error = exc
 475                if attempt < self._WRITE_RETRIES - 1:
 476                    time.sleep(random.uniform(0.02, 0.15))
 477        raise last_error or sqlite3.OperationalError("database is locked")
 478
 479    def append_event(
 480        self,
 481        job_id: str | None = None,
 482        *,
 483        event_type: str,
 484        title: str = "",
 485        body: str = "",
 486        ref_table: str = "",
 487        ref_id: str = "",
 488        metadata: dict[str, Any] | None = None,
 489        created_at: str | None = None,
 490    ) -> dict[str, Any]:
 491        def op(conn: sqlite3.Connection) -> dict[str, Any]:
 492            return _insert_event(
 493                conn,
 494                job_id=job_id,
 495                event_type=event_type,
 496                title=title,
 497                body=body,
 498                ref_table=ref_table,
 499                ref_id=ref_id,
 500                metadata=metadata,
 501                created_at=created_at,
 502            )
 503
 504        return self._write(op)
 505
 506    def list_events(
 507        self,
 508        *,
 509        job_id: str | None = None,
 510        limit: int = 100,
 511        event_types: Iterable[str] | None = None,
 512    ) -> list[dict[str, Any]]:
 513        filters = []
 514        params: list[Any] = []
 515        if job_id is not None:
 516            filters.append("job_id = ?")
 517            params.append(job_id)
 518        if event_types:
 519            values = [str(value).strip().lower() for value in event_types if str(value).strip()]
 520            if values:
 521                filters.append(f"event_type IN ({','.join('?' for _ in values)})")
 522                params.extend(values)
 523        where = f"WHERE {' AND '.join(filters)}" if filters else ""
 524        rows = self._conn.execute(
 525            f"""
 526            SELECT * FROM (
 527                SELECT * FROM events
 528                {where}
 529                ORDER BY created_at DESC, id DESC
 530                LIMIT ?
 531            )
 532            ORDER BY created_at ASC, id ASC
 533            """,
 534            [*params, int(limit)],
 535        ).fetchall()
 536        return [_row_to_dict(row) for row in rows]
 537
 538    def list_timeline_events(self, job_id: str, *, limit: int = 100) -> list[dict[str, Any]]:
 539        """Return visible job history, combining durable events with old projected state."""
 540
 541        actual = self.list_events(job_id=job_id, limit=max(limit * 4, 250))
 542        actual_ids = {str(event.get("id")) for event in actual}
 543        actual_refs = {
 544            (str(event.get("ref_table") or ""), str(event.get("ref_id") or ""))
 545            for event in actual
 546            if event.get("ref_table") and event.get("ref_id")
 547        }
 548        timeline: list[dict[str, Any]] = list(actual)
 549        job = self.get_job(job_id)
 550        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 551
 552        for index, entry in enumerate(_metadata_list(metadata, "operator_messages")):
 553            if entry.get("event_id") in actual_ids:
 554                continue
 555            timeline.append(_projected_event(
 556                event_id=f"projected_operator_{index}",
 557                job_id=job_id,
 558                event_type="operator_message",
 559                created_at=str(entry.get("at") or job.get("updated_at") or job.get("created_at")),
 560                title=str(entry.get("source") or "operator"),
 561                body=str(entry.get("message") or ""),
 562                metadata={
 563                    "source": entry.get("source") or "operator",
 564                    "mode": entry.get("mode") or "steer",
 565                    "claimed_at": entry.get("claimed_at"),
 566                    "acknowledged_at": entry.get("acknowledged_at"),
 567                    "superseded_at": entry.get("superseded_at"),
 568                },
 569            ))
 570
 571        for index, entry in enumerate(_metadata_list(metadata, "agent_updates")):
 572            if entry.get("event_id") in actual_ids:
 573                continue
 574            timeline.append(_projected_event(
 575                event_id=f"projected_agent_{index}",
 576                job_id=job_id,
 577                event_type="agent_message",
 578                created_at=str(entry.get("at") or job.get("updated_at") or job.get("created_at")),
 579                title=str(entry.get("category") or "progress"),
 580                body=str(entry.get("message") or ""),
 581                metadata=entry.get("metadata") if isinstance(entry.get("metadata"), dict) else {},
 582            ))
 583
 584        for index, lesson in enumerate(_metadata_list(metadata, "lessons")):
 585            if lesson.get("event_id") in actual_ids:
 586                continue
 587            timeline.append(_projected_event(
 588                event_id=f"projected_lesson_{index}",
 589                job_id=job_id,
 590                event_type="lesson",
 591                created_at=str(lesson.get("at") or lesson.get("last_seen") or job.get("updated_at") or job.get("created_at")),
 592                title=str(lesson.get("category") or "memory"),
 593                body=str(lesson.get("lesson") or ""),
 594                metadata={"confidence": lesson.get("confidence"), **(lesson.get("metadata") if isinstance(lesson.get("metadata"), dict) else {})},
 595            ))
 596
 597        for index, source in enumerate(_metadata_list(metadata, "source_ledger")):
 598            if source.get("event_id") in actual_ids:
 599                continue
 600            timeline.append(_projected_event(
 601                event_id=f"projected_source_{index}",
 602                job_id=job_id,
 603                event_type="source",
 604                created_at=str(source.get("last_seen") or source.get("first_seen") or job.get("updated_at") or job.get("created_at")),
 605                title=str(source.get("source") or "source"),
 606                body=str(source.get("last_outcome") or ""),
 607                metadata=source,
 608            ))
 609
 610        for index, finding in enumerate(_metadata_list(metadata, "finding_ledger")):
 611            if finding.get("event_id") in actual_ids:
 612                continue
 613            timeline.append(_projected_event(
 614                event_id=f"projected_finding_{index}",
 615                job_id=job_id,
 616                event_type="finding",
 617                created_at=str(finding.get("updated_at") or finding.get("created_at") or job.get("updated_at") or job.get("created_at")),
 618                title=str(finding.get("name") or "finding"),
 619                body=str(finding.get("reason") or finding.get("category") or ""),
 620                metadata=finding,
 621            ))
 622
 623        for index, task in enumerate(_metadata_list(metadata, "task_queue")):
 624            if task.get("event_id") in actual_ids:
 625                continue
 626            timeline.append(_projected_event(
 627                event_id=f"projected_task_{index}",
 628                job_id=job_id,
 629                event_type="task",
 630                created_at=str(task.get("updated_at") or task.get("created_at") or job.get("updated_at") or job.get("created_at")),
 631                title=str(task.get("title") or "task"),
 632                body=str(task.get("result") or task.get("goal") or ""),
 633                metadata=task,
 634            ))
 635
 636        for index, experiment in enumerate(_metadata_list(metadata, "experiment_ledger")):
 637            if experiment.get("event_id") in actual_ids:
 638                continue
 639            metric = ""
 640            if experiment.get("metric_value") is not None:
 641                metric = format_metric_value(
 642                    experiment.get("metric_name") or "metric",
 643                    experiment.get("metric_value"),
 644                    experiment.get("metric_unit") or "",
 645                )
 646            timeline.append(_projected_event(
 647                event_id=f"projected_experiment_{index}",
 648                job_id=job_id,
 649                event_type="experiment",
 650                created_at=str(experiment.get("updated_at") or experiment.get("created_at") or job.get("updated_at") or job.get("created_at")),
 651                title=str(experiment.get("title") or "experiment"),
 652                body=str(experiment.get("result") or metric or experiment.get("hypothesis") or ""),
 653                metadata=experiment,
 654            ))
 655
 656        for index, reflection in enumerate(_metadata_list(metadata, "reflections")):
 657            if reflection.get("event_id") in actual_ids:
 658                continue
 659            timeline.append(_projected_event(
 660                event_id=f"projected_reflection_{index}",
 661                job_id=job_id,
 662                event_type="reflection",
 663                created_at=str(reflection.get("at") or job.get("updated_at") or job.get("created_at")),
 664                title="reflection",
 665                body=str(reflection.get("summary") or reflection.get("strategy") or ""),
 666                metadata=reflection.get("metadata") if isinstance(reflection.get("metadata"), dict) else {},
 667            ))
 668
 669        for step in self.list_steps(job_id=job_id):
 670            ref = ("steps", str(step["id"]))
 671            if ref in actual_refs:
 672                continue
 673            event_type = "error" if step.get("status") == "failed" or step.get("error") else "tool_result"
 674            title = str(step.get("tool_name") or step.get("kind") or "step")
 675            body = str(step.get("summary") or step.get("error") or "")
 676            timeline.append(_projected_event(
 677                event_id=f"projected_step_{step['id']}",
 678                job_id=job_id,
 679                event_type=event_type,
 680                created_at=str(step.get("ended_at") or step.get("started_at")),
 681                title=title,
 682                body=body,
 683                ref_table="steps",
 684                ref_id=str(step["id"]),
 685                metadata={"step_no": step.get("step_no"), "status": step.get("status"), "kind": step.get("kind")},
 686            ))
 687
 688        for artifact in self.list_artifacts(job_id, limit=10000):
 689            ref = ("artifacts", str(artifact["id"]))
 690            if ref in actual_refs:
 691                continue
 692            timeline.append(_projected_event(
 693                event_id=f"projected_artifact_{artifact['id']}",
 694                job_id=job_id,
 695                event_type="artifact",
 696                created_at=str(artifact.get("created_at")),
 697                title=str(artifact.get("title") or artifact["id"]),
 698                body=str(artifact.get("summary") or artifact.get("path") or ""),
 699                ref_table="artifacts",
 700                ref_id=str(artifact["id"]),
 701                metadata={"type": artifact.get("type"), "path": artifact.get("path")},
 702            ))
 703
 704        for memory in self.list_memory(job_id):
 705            ref = ("memory_index", str(memory["id"]))
 706            if ref in actual_refs:
 707                continue
 708            timeline.append(_projected_event(
 709                event_id=f"projected_memory_{memory['id']}",
 710                job_id=job_id,
 711                event_type="compaction",
 712                created_at=str(memory.get("updated_at") or ""),
 713                title=str(memory.get("key") or "compact memory"),
 714                body=str(memory.get("summary") or ""),
 715                ref_table="memory_index",
 716                ref_id=str(memory["id"]),
 717                metadata={"artifact_refs": memory.get("artifact_refs") or []},
 718            ))
 719
 720        timeline = [event for event in timeline if event.get("created_at")]
 721        timeline.sort(key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
 722        return timeline[-int(limit):]
 723
 724    def create_job(
 725        self,
 726        objective: str,
 727        *,
 728        title: str | None = None,
 729        kind: str = "generic",
 730        priority: int = 0,
 731        cadence: str | None = None,
 732        metadata: dict[str, Any] | None = None,
 733    ) -> str:
 734        now = utc_now()
 735        title = title or (objective.strip().splitlines() or [""])[0][:80] or "Untitled job"
 736
 737        def op(conn: sqlite3.Connection) -> str:
 738            job_id = _unique_job_id(conn, title)
 739            conn.execute(
 740                """
 741                INSERT INTO jobs(id, title, objective, kind, status, priority, cadence, created_at, updated_at, metadata_json)
 742                VALUES (?, ?, ?, ?, 'queued', ?, ?, ?, ?, ?)
 743                """,
 744                (job_id, title, objective, kind, priority, cadence, now, now, _json_dumps(metadata)),
 745            )
 746            _insert_event(
 747                conn,
 748                job_id=job_id,
 749                event_type="daemon",
 750                title="job created",
 751                body=objective,
 752                metadata={"title": title, "kind": kind, "cadence": cadence},
 753                created_at=now,
 754            )
 755            return job_id
 756
 757        return self._write(op)
 758
 759    def get_job(self, job_id: str) -> dict[str, Any]:
 760        row = self._conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
 761        job = _row_to_dict(row)
 762        if job is None:
 763            raise KeyError(f"Job not found: {job_id}")
 764        return job
 765
 766    def list_jobs(self, *, statuses: Iterable[str] | None = None) -> list[dict[str, Any]]:
 767        if statuses:
 768            values = list(statuses)
 769            placeholders = ",".join("?" for _ in values)
 770            rows = self._conn.execute(
 771                f"SELECT * FROM jobs WHERE status IN ({placeholders}) ORDER BY priority DESC, updated_at",
 772                values,
 773            ).fetchall()
 774        else:
 775            rows = self._conn.execute("SELECT * FROM jobs ORDER BY updated_at DESC").fetchall()
 776        return [_row_to_dict(row) for row in rows]
 777
 778    def update_job_status(self, job_id: str, status: str, *, metadata_patch: dict[str, Any] | None = None) -> None:
 779        now = utc_now()
 780
 781        def op(conn: sqlite3.Connection) -> None:
 782            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
 783            if row is None:
 784                raise KeyError(f"Job not found: {job_id}")
 785            metadata_json = None
 786            if metadata_patch:
 787                current = json.loads(row["metadata_json"] or "{}")
 788                current.update(metadata_patch)
 789                metadata_json = _json_dumps(current)
 790            if metadata_json is None:
 791                conn.execute("UPDATE jobs SET status = ?, updated_at = ? WHERE id = ?", (status, now, job_id))
 792            else:
 793                conn.execute(
 794                    "UPDATE jobs SET status = ?, updated_at = ?, metadata_json = ? WHERE id = ?",
 795                    (status, now, metadata_json, job_id),
 796                )
 797            _insert_event(
 798                conn,
 799                job_id=job_id,
 800                event_type="daemon",
 801                title=f"job {status}",
 802                body=str((metadata_patch or {}).get("last_note") or ""),
 803                metadata={"status": status, "metadata_patch": metadata_patch or {}},
 804                created_at=now,
 805            )
 806
 807        self._write(op)
 808
 809    def update_job_metadata(self, job_id: str, metadata_patch: dict[str, Any]) -> None:
 810        now = utc_now()
 811
 812        def op(conn: sqlite3.Connection) -> None:
 813            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
 814            if row is None:
 815                raise KeyError(f"Job not found: {job_id}")
 816            current = json.loads(row["metadata_json"] or "{}")
 817            current.update(metadata_patch)
 818            conn.execute(
 819                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
 820                (now, _json_dumps(current), job_id),
 821            )
 822
 823        self._write(op)
 824
 825    def claim_operator_messages(
 826        self,
 827        job_id: str,
 828        *,
 829        modes: Iterable[str] = ("steer",),
 830        limit: int = 1,
 831    ) -> list[dict[str, Any]]:
 832        now = utc_now()
 833        allowed = {mode.strip().lower().replace("-", "_") for mode in modes}
 834
 835        def op(conn: sqlite3.Connection) -> list[dict[str, Any]]:
 836            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
 837            if row is None:
 838                raise KeyError(f"Job not found: {job_id}")
 839            metadata = json.loads(row["metadata_json"] or "{}")
 840            messages = metadata.get("operator_messages")
 841            if not isinstance(messages, list):
 842                return []
 843            claimed: list[dict[str, Any]] = []
 844            for entry in messages:
 845                if len(claimed) >= limit:
 846                    break
 847                if not isinstance(entry, dict):
 848                    continue
 849                mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
 850                if mode not in allowed or entry.get("claimed_at"):
 851                    continue
 852                if entry.get("acknowledged_at") or entry.get("superseded_at"):
 853                    continue
 854                entry["claimed_at"] = now
 855                entry["delivered_at"] = now
 856                claimed.append(dict(entry))
 857            if not claimed:
 858                return []
 859            metadata["operator_messages"] = messages[-200:]
 860            metadata["last_claimed_operator_messages"] = claimed
 861            conn.execute(
 862                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
 863                (now, _json_dumps(metadata), job_id),
 864            )
 865            for entry in claimed:
 866                _insert_event(
 867                    conn,
 868                    job_id=job_id,
 869                    event_type="loop",
 870                    title="steering claimed",
 871                    body=str(entry.get("message") or ""),
 872                    metadata={
 873                        "source": entry.get("source"),
 874                        "mode": entry.get("mode"),
 875                        "operator_event_id": entry.get("event_id"),
 876                    },
 877                    created_at=now,
 878                )
 879            return claimed
 880
 881        return self._write(op)
 882
 883    def acknowledge_operator_messages(
 884        self,
 885        job_id: str,
 886        *,
 887        message_ids: Iterable[str] | None = None,
 888        summary: str = "",
 889        status: str = "acknowledged",
 890    ) -> dict[str, Any]:
 891        now = utc_now()
 892        wanted = {str(message_id).strip() for message_id in (message_ids or []) if str(message_id).strip()}
 893        status = status.strip().lower().replace("-", "_") or "acknowledged"
 894        if status not in {"acknowledged", "superseded"}:
 895            status = "acknowledged"
 896
 897        def op(conn: sqlite3.Connection) -> dict[str, Any]:
 898            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
 899            if row is None:
 900                raise KeyError(f"Job not found: {job_id}")
 901            metadata = json.loads(row["metadata_json"] or "{}")
 902            messages = metadata.get("operator_messages")
 903            if not isinstance(messages, list):
 904                messages = []
 905            acknowledged: list[dict[str, Any]] = []
 906            for entry in messages:
 907                if not isinstance(entry, dict):
 908                    continue
 909                mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
 910                if mode not in {"steer", "follow_up"}:
 911                    continue
 912                event_id = str(entry.get("event_id") or "")
 913                if wanted and event_id not in wanted:
 914                    continue
 915                if not wanted and not entry.get("claimed_at"):
 916                    continue
 917                if entry.get("acknowledged_at") or entry.get("superseded_at"):
 918                    continue
 919                if status == "superseded":
 920                    entry["superseded_at"] = now
 921                else:
 922                    entry["acknowledged_at"] = now
 923                if summary:
 924                    entry["acknowledgement_summary"] = summary.strip()
 925                acknowledged.append(dict(entry))
 926            metadata["operator_messages"] = messages[-200:]
 927            metadata["last_operator_context_ack"] = {
 928                "at": now,
 929                "status": status,
 930                "summary": summary.strip(),
 931                "message_ids": [entry.get("event_id") for entry in acknowledged if entry.get("event_id")],
 932                "count": len(acknowledged),
 933            }
 934            conn.execute(
 935                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
 936                (now, _json_dumps(metadata), job_id),
 937            )
 938            event = _insert_event(
 939                conn,
 940                job_id=job_id,
 941                event_type="operator_context",
 942                title=f"operator {status}",
 943                body=summary.strip() or f"{len(acknowledged)} operator message(s) {status}",
 944                metadata={
 945                    "status": status,
 946                    "message_ids": [entry.get("event_id") for entry in acknowledged if entry.get("event_id")],
 947                    "count": len(acknowledged),
 948                },
 949                created_at=now,
 950            )
 951            return {"event": event, "messages": acknowledged, "count": len(acknowledged), "status": status}
 952
 953        return self._write(op)
 954
 955    def rename_job(self, job_id: str, title: str) -> dict[str, Any]:
 956        now = utc_now()
 957        new_title = title.strip()
 958        if not new_title:
 959            raise ValueError("title is required")
 960
 961        def op(conn: sqlite3.Connection) -> dict[str, Any]:
 962            row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
 963            if row is None:
 964                raise KeyError(f"Job not found: {job_id}")
 965            conn.execute("UPDATE jobs SET title = ?, updated_at = ? WHERE id = ?", (new_title, now, job_id))
 966            _insert_event(
 967                conn,
 968                job_id=job_id,
 969                event_type="daemon",
 970                title="job renamed",
 971                body=f"{row['title']} -> {new_title}",
 972                metadata={"old_title": row["title"], "new_title": new_title},
 973                created_at=now,
 974            )
 975            updated = dict(row)
 976            updated["title"] = new_title
 977            updated["updated_at"] = now
 978            return _row_to_dict(updated)
 979
 980        return self._write(op)
 981
 982    def delete_job(self, job_id: str) -> dict[str, Any]:
 983        def op(conn: sqlite3.Connection) -> dict[str, Any]:
 984            row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
 985            if row is None:
 986                raise KeyError(f"Job not found: {job_id}")
 987            artifact_rows = conn.execute("SELECT path FROM artifacts WHERE job_id = ?", (job_id,)).fetchall()
 988            artifact_paths = [str(artifact["path"]) for artifact in artifact_rows if artifact["path"]]
 989            counts = {
 990                "evidence": conn.execute("SELECT COUNT(*) AS n FROM evidence WHERE job_id = ?", (job_id,)).fetchone()["n"],
 991                "artifacts": conn.execute("SELECT COUNT(*) AS n FROM artifacts WHERE job_id = ?", (job_id,)).fetchone()["n"],
 992                "memory": conn.execute("SELECT COUNT(*) AS n FROM memory_index WHERE job_id = ?", (job_id,)).fetchone()["n"],
 993                "steps": conn.execute("SELECT COUNT(*) AS n FROM steps WHERE job_id = ?", (job_id,)).fetchone()["n"],
 994                "runs": conn.execute("SELECT COUNT(*) AS n FROM job_runs WHERE job_id = ?", (job_id,)).fetchone()["n"],
 995                "events": conn.execute("SELECT COUNT(*) AS n FROM events WHERE job_id = ?", (job_id,)).fetchone()["n"],
 996            }
 997            conn.execute("DELETE FROM evidence WHERE job_id = ?", (job_id,))
 998            conn.execute("DELETE FROM artifacts WHERE job_id = ?", (job_id,))
 999            conn.execute("DELETE FROM memory_index WHERE job_id = ?", (job_id,))
1000            conn.execute("DELETE FROM steps WHERE job_id = ?", (job_id,))
1001            conn.execute("DELETE FROM job_runs WHERE job_id = ?", (job_id,))
1002            conn.execute("DELETE FROM events WHERE job_id = ?", (job_id,))
1003            conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
1004            return {
1005                "job": _row_to_dict(row),
1006                "artifact_paths": artifact_paths,
1007                "counts": counts,
1008            }
1009
1010        return self._write(op)
1011
1012    def append_operator_message(
1013        self,
1014        job_id: str,
1015        message: str,
1016        *,
1017        source: str = "operator",
1018        mode: str = "steer",
1019    ) -> dict[str, Any]:
1020        now = utc_now()
1021        text = message.strip()
1022        if not text:
1023            raise ValueError("message is required")
1024        mode = mode.strip().lower().replace("-", "_") or "steer"
1025        if mode not in {"steer", "follow_up", "note"}:
1026            mode = "steer"
1027        entry = {"at": now, "source": source, "mode": mode, "message": text}
1028
1029        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1030            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1031            if row is None:
1032                raise KeyError(f"Job not found: {job_id}")
1033            event = _insert_event(
1034                conn,
1035                job_id=job_id,
1036                event_type="operator_message",
1037                title=source,
1038                body=text,
1039                metadata={"source": source, "mode": mode},
1040                created_at=now,
1041            )
1042            entry["event_id"] = event["id"]
1043            metadata = json.loads(row["metadata_json"] or "{}")
1044            messages = metadata.get("operator_messages")
1045            if not isinstance(messages, list):
1046                messages = []
1047            messages.append(entry)
1048            metadata["operator_messages"] = messages[-200:]
1049            metadata["last_operator_message"] = entry
1050            conn.execute(
1051                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1052                (now, _json_dumps(metadata), job_id),
1053            )
1054            return entry
1055
1056        return self._write(op)
1057
1058    def append_agent_update(
1059        self,
1060        job_id: str,
1061        message: str,
1062        *,
1063        category: str = "progress",
1064        metadata: dict[str, Any] | None = None,
1065    ) -> dict[str, Any]:
1066        now = utc_now()
1067        text = message.strip()
1068        if not text:
1069            raise ValueError("message is required")
1070        entry = {
1071            "at": now,
1072            "category": category.strip() or "progress",
1073            "message": text,
1074            "metadata": metadata or {},
1075        }
1076
1077        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1078            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1079            if row is None:
1080                raise KeyError(f"Job not found: {job_id}")
1081            event = _insert_event(
1082                conn,
1083                job_id=job_id,
1084                event_type="agent_message",
1085                title=entry["category"],
1086                body=text,
1087                metadata=entry["metadata"],
1088                created_at=now,
1089            )
1090            entry["event_id"] = event["id"]
1091            job_metadata = json.loads(row["metadata_json"] or "{}")
1092            updates = job_metadata.get("agent_updates")
1093            if not isinstance(updates, list):
1094                updates = []
1095            updates.append(entry)
1096            job_metadata["agent_updates"] = updates[-100:]
1097            job_metadata["last_agent_update"] = entry
1098            conn.execute(
1099                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1100                (now, _json_dumps(job_metadata), job_id),
1101            )
1102            return entry
1103
1104        return self._write(op)
1105
1106    def append_lesson(
1107        self,
1108        job_id: str,
1109        lesson: str,
1110        *,
1111        category: str = "memory",
1112        confidence: float | None = None,
1113        metadata: dict[str, Any] | None = None,
1114    ) -> dict[str, Any]:
1115        now = utc_now()
1116        text = lesson.strip()
1117        if not text:
1118            raise ValueError("lesson is required")
1119        entry = {
1120            "at": now,
1121            "category": category.strip().lower() or "memory",
1122            "key": _norm_key(f"{category}:{text}"),
1123            "lesson": text,
1124            "confidence": confidence,
1125            "metadata": metadata or {},
1126        }
1127
1128        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1129            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1130            if row is None:
1131                raise KeyError(f"Job not found: {job_id}")
1132            job_metadata = json.loads(row["metadata_json"] or "{}")
1133            lessons = job_metadata.get("lessons")
1134            if not isinstance(lessons, list):
1135                lessons = []
1136            existing = next(
1137                (
1138                    item
1139                    for item in lessons
1140                    if isinstance(item, dict)
1141                    and (item.get("key") or _norm_key(f"{item.get('category', 'memory')}:{item.get('lesson', '')}"))
1142                    == entry["key"]
1143                ),
1144                None,
1145            )
1146            if existing is None:
1147                lessons.append(entry)
1148                current = entry
1149                current["created"] = True
1150                current["substantive_update"] = True
1151                event = _insert_event(
1152                    conn,
1153                    job_id=job_id,
1154                    event_type="lesson",
1155                    title=current.get("category") or "memory",
1156                    body=current.get("lesson") or text,
1157                    metadata={
1158                        "confidence": current.get("confidence"),
1159                        "seen_count": current.get("seen_count"),
1160                        **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1161                    },
1162                    created_at=now,
1163                )
1164                current["event_id"] = event["id"]
1165            else:
1166                existing["last_seen"] = now
1167                existing["seen_count"] = int(existing.get("seen_count") or 1) + 1
1168                if confidence is not None:
1169                    existing["confidence"] = confidence
1170                if metadata:
1171                    merged = existing.get("metadata") if isinstance(existing.get("metadata"), dict) else {}
1172                    merged.update(metadata)
1173                    existing["metadata"] = merged
1174                existing["key"] = entry["key"]
1175                current = existing
1176                current["created"] = False
1177                current["substantive_update"] = False
1178            job_metadata["lessons"] = lessons[-200:]
1179            job_metadata["last_lesson"] = current
1180            conn.execute(
1181                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1182                (now, _json_dumps(job_metadata), job_id),
1183            )
1184            return current
1185
1186        return self._write(op)
1187
1188    def append_memory_graph_records(
1189        self,
1190        job_id: str,
1191        *,
1192        nodes: list[dict[str, Any]] | None = None,
1193        edges: list[dict[str, Any]] | None = None,
1194    ) -> dict[str, Any]:
1195        now = utc_now()
1196        node_items = [node for node in (nodes or []) if isinstance(node, dict)]
1197        edge_items = [edge for edge in (edges or []) if isinstance(edge, dict)]
1198        if not node_items and not edge_items:
1199            raise ValueError("nodes or edges are required")
1200
1201        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1202            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1203            if row is None:
1204                raise KeyError(f"Job not found: {job_id}")
1205            job_metadata = json.loads(row["metadata_json"] or "{}")
1206            graph = job_metadata.get("memory_graph") if isinstance(job_metadata.get("memory_graph"), dict) else {}
1207            stored_nodes = _metadata_list(graph, "nodes")
1208            stored_edges = _metadata_list(graph, "edges")
1209            node_by_key = {str(node.get("key") or ""): node for node in stored_nodes if node.get("key")}
1210            added_nodes = 0
1211            updated_nodes = 0
1212            touched_nodes: list[dict[str, Any]] = []
1213
1214            for node in node_items[:50]:
1215                title = str(node.get("title") or node.get("name") or "").strip()
1216                summary = str(node.get("summary") or node.get("body") or "").strip()
1217                if not title and not summary:
1218                    continue
1219                key = _norm_key(str(node.get("key") or title or summary[:80]))
1220                current = node_by_key.get(key)
1221                created = current is None
1222                if current is None:
1223                    current = {
1224                        "key": key,
1225                        "title": title or key,
1226                        "kind": DEFAULT_NODE_KIND,
1227                        "status": DEFAULT_NODE_STATUS,
1228                        "summary": "",
1229                        "tags": [],
1230                        "evidence_refs": [],
1231                        "links": [],
1232                        "metadata": {},
1233                        "created_at": now,
1234                    }
1235                    stored_nodes.append(current)
1236                    node_by_key[key] = current
1237                    added_nodes += 1
1238                else:
1239                    updated_nodes += 1
1240                if title:
1241                    current["title"] = title
1242                if summary:
1243                    current["summary"] = summary
1244                kind = str(node.get("kind") or current.get("kind") or DEFAULT_NODE_KIND).strip().lower()
1245                current["kind"] = kind if kind in NODE_KINDS else DEFAULT_NODE_KIND
1246                status = str(node.get("status") or current.get("status") or DEFAULT_NODE_STATUS).strip().lower()
1247                current["status"] = status if status in NODE_STATUSES else DEFAULT_NODE_STATUS
1248                if "salience" in node:
1249                    current["salience"] = _bounded_float(node.get("salience"), 0.0, 1.0)
1250                elif "salience" not in current:
1251                    current["salience"] = 0.5
1252                if "confidence" in node:
1253                    current["confidence"] = _bounded_float(node.get("confidence"), 0.0, 1.0)
1254                elif "confidence" not in current:
1255                    current["confidence"] = 0.5
1256                parent_key = str(node.get("parent_key") or node.get("parent") or "").strip()
1257                if parent_key:
1258                    current["parent_key"] = _norm_key(parent_key)
1259                current["tags"] = _merge_string_lists(current.get("tags"), node.get("tags"), limit=24)
1260                current["evidence_refs"] = _merge_string_lists(current.get("evidence_refs"), node.get("evidence_refs") or node.get("evidence"), limit=24)
1261                current["links"] = _merge_string_lists(current.get("links"), node.get("links"), limit=50)
1262                if isinstance(node.get("metadata"), dict):
1263                    merged = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1264                    merged.update(node["metadata"])
1265                    current["metadata"] = merged
1266                current["created"] = created
1267                current["updated_at"] = now
1268                current["use_count"] = int(current.get("use_count") or 0)
1269                touched_nodes.append(current)
1270
1271            existing_edge_keys = {
1272                _memory_edge_key(edge)
1273                for edge in stored_edges
1274                if _memory_edge_key(edge)
1275            }
1276            added_edges = 0
1277            touched_edges: list[dict[str, Any]] = []
1278            for edge in edge_items[:100]:
1279                from_key = _norm_key(str(edge.get("from_key") or edge.get("from") or "").strip())
1280                to_key = _norm_key(str(edge.get("to_key") or edge.get("to") or "").strip())
1281                if not from_key or not to_key:
1282                    continue
1283                relation = str(edge.get("relation") or "related_to").strip().lower().replace(" ", "_")
1284                relation = re.sub(r"[^a-z0-9_-]+", "_", relation).strip("_") or "related_to"
1285                edge_key = f"{from_key}|{relation}|{to_key}"
1286                if edge_key in existing_edge_keys:
1287                    continue
1288                stored = {
1289                    "key": edge_key,
1290                    "from_key": from_key,
1291                    "to_key": to_key,
1292                    "relation": relation,
1293                    "evidence_refs": _merge_string_lists([], edge.get("evidence_refs") or edge.get("evidence"), limit=24),
1294                    "metadata": edge.get("metadata") if isinstance(edge.get("metadata"), dict) else {},
1295                    "created_at": now,
1296                    "updated_at": now,
1297                }
1298                stored_edges.append(stored)
1299                existing_edge_keys.add(edge_key)
1300                touched_edges.append(stored)
1301                added_edges += 1
1302
1303            graph = {
1304                "nodes": stored_nodes[-1000:],
1305                "edges": stored_edges[-2000:],
1306                "updated_at": now,
1307            }
1308            event = _insert_event(
1309                conn,
1310                job_id=job_id,
1311                event_type="memory_node",
1312                title="memory graph",
1313                body=f"nodes +{added_nodes}/~{updated_nodes}; edges +{added_edges}",
1314                metadata={
1315                    "added_nodes": added_nodes,
1316                    "updated_nodes": updated_nodes,
1317                    "added_edges": added_edges,
1318                    "node_keys": [node.get("key") for node in touched_nodes[-20:]],
1319                    "edge_keys": [edge.get("key") for edge in touched_edges[-20:]],
1320                },
1321                created_at=now,
1322            )
1323            graph["event_id"] = event["id"]
1324            job_metadata["memory_graph"] = graph
1325            job_metadata["last_memory_graph_record"] = {
1326                "at": now,
1327                "event_id": event["id"],
1328                "added_nodes": added_nodes,
1329                "updated_nodes": updated_nodes,
1330                "added_edges": added_edges,
1331                "nodes": touched_nodes[-20:],
1332                "edges": touched_edges[-20:],
1333            }
1334            conn.execute(
1335                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1336                (now, _json_dumps(job_metadata), job_id),
1337            )
1338            return {
1339                "added_nodes": added_nodes,
1340                "updated_nodes": updated_nodes,
1341                "added_edges": added_edges,
1342                "nodes": touched_nodes,
1343                "edges": touched_edges,
1344                "event_id": event["id"],
1345            }
1346
1347        return self._write(op)
1348
1349    def append_source_record(
1350        self,
1351        job_id: str,
1352        source: str,
1353        *,
1354        source_type: str = "",
1355        usefulness_score: float | None = None,
1356        yield_count: int = 0,
1357        fail_count_delta: int = 0,
1358        warnings: list[str] | None = None,
1359        outcome: str = "",
1360        metadata: dict[str, Any] | None = None,
1361    ) -> dict[str, Any]:
1362        now = utc_now()
1363        text = source.strip()
1364        if not text:
1365            raise ValueError("source is required")
1366        key = _norm_key(text)
1367
1368        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1369            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1370            if row is None:
1371                raise KeyError(f"Job not found: {job_id}")
1372            job_metadata = json.loads(row["metadata_json"] or "{}")
1373            sources = _metadata_list(job_metadata, "source_ledger")
1374            current = next((entry for entry in sources if entry.get("key") == key), None)
1375            created = current is None
1376            change_fields = (
1377                "source",
1378                "source_type",
1379                "usefulness_score",
1380                "fail_count",
1381                "yield_count",
1382                "warnings",
1383                "last_outcome",
1384                "metadata",
1385            )
1386            before = "" if created else _change_fingerprint(current, change_fields)
1387            if current is None:
1388                current = {
1389                    "key": key,
1390                    "source": text,
1391                    "source_type": source_type.strip() or "unknown",
1392                    "usefulness_score": 0.0,
1393                    "fail_count": 0,
1394                    "yield_count": 0,
1395                    "warnings": [],
1396                    "last_outcome": "",
1397                    "metadata": {},
1398                    "first_seen": now,
1399                }
1400                sources.append(current)
1401            if source_type.strip():
1402                current["source_type"] = source_type.strip()
1403            if usefulness_score is not None:
1404                current["usefulness_score"] = float(usefulness_score)
1405            if yield_count:
1406                current["yield_count"] = int(current.get("yield_count") or 0) + int(yield_count)
1407            if fail_count_delta:
1408                current["fail_count"] = int(current.get("fail_count") or 0) + int(fail_count_delta)
1409            if warnings:
1410                merged = list(dict.fromkeys([*current.get("warnings", []), *[str(warning) for warning in warnings]]))
1411                current["warnings"] = merged[-20:]  # keep the last 20 after order-preserving dedup
1412            if outcome:
1413                current["last_outcome"] = outcome.strip()
1414            if metadata:
1415                merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1416                merged_metadata.update(metadata)
1417                current["metadata"] = merged_metadata
1418            current["created"] = created
1419            substantive_update = created or before != _change_fingerprint(current, change_fields)
1420            current["substantive_update"] = substantive_update
1421            if substantive_update:
1422                current["updated_at"] = now
1423            current["last_seen"] = now  # refreshed on every call, even without a substantive change
1424            if substantive_update:
1425                event = _insert_event(
1426                    conn,
1427                    job_id=job_id,
1428                    event_type="source",
1429                    title=current.get("source") or text,
1430                    body=current.get("last_outcome") or outcome,
1431                    metadata={
1432                        "created": created,
1433                        "substantive_update": substantive_update,
1434                        "source_type": current.get("source_type"),
1435                        "usefulness_score": current.get("usefulness_score"),
1436                        "yield_count": current.get("yield_count"),
1437                        "fail_count": current.get("fail_count"),
1438                        "warnings": current.get("warnings") or [],
1439                        **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1440                    },
1441                    created_at=now,
1442                )
1443                current["event_id"] = event["id"]
1444            job_metadata["source_ledger"] = sources[-250:]  # cap the ledger at the 250 most recently added sources
1445            job_metadata["last_source_record"] = current
1446            conn.execute(
1447                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1448                (now, _json_dumps(job_metadata), job_id),
1449            )
1450            return current
1451
1452        return self._write(op)
1453
1454    def append_finding_record(
1455        self,
1456        job_id: str,
1457        *,
1458        name: str,
1459        url: str = "",
1460        source_url: str = "",
1461        category: str = "",
1462        location: str = "",
1463        contact: str = "",
1464        reason: str = "",
1465        status: str = "new",
1466        score: float | None = None,
1467        evidence_artifact: str = "",
1468        metadata: dict[str, Any] | None = None,
1469    ) -> dict[str, Any]:
1470        now = utc_now()
1471        name = name.strip()
1472        if not name:
1473            raise ValueError("name is required")
1474        url = url.strip()
1475        source_url = source_url.strip()
1476        key = _norm_key(f"{name}|{url or source_url}")  # dedup key: name plus url, falling back to source_url
1477
1478        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1479            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1480            if row is None:
1481                raise KeyError(f"Job not found: {job_id}")
1482            job_metadata = json.loads(row["metadata_json"] or "{}")
1483            findings = _metadata_list(job_metadata, "finding_ledger")
1484            current = next((entry for entry in findings if entry.get("key") == key), None)
1485            created = current is None
1486            change_fields = (
1487                "url",
1488                "source_url",
1489                "category",
1490                "location",
1491                "contact",
1492                "reason",
1493                "status",
1494                "score",
1495                "evidence_artifact",
1496                "metadata",
1497            )
1498            before = "" if created else _change_fingerprint(current, change_fields)
1499            if current is None:
1500                current = {
1501                    "key": key,
1502                    "name": name,
1503                    "url": url,
1504                    "source_url": source_url,
1505                    "category": category.strip(),
1506                    "location": location.strip(),
1507                    "contact": contact.strip(),
1508                    "reason": reason.strip(),
1509                    "status": status.strip() or "new",
1510                    "score": score,
1511                    "evidence_artifact": evidence_artifact.strip(),
1512                    "metadata": metadata or {},
1513                    "created_at": now,
1514                }
1515                findings.append(current)
1516            else:
1517                for field, value in {
1518                    "url": url,
1519                    "source_url": source_url,
1520                    "category": category.strip(),
1521                    "location": location.strip(),
1522                    "contact": contact.strip(),
1523                    "reason": reason.strip(),
1524                    "status": status.strip(),  # defaults to "new", so an update that omits status resets it
1525                    "evidence_artifact": evidence_artifact.strip(),
1526                }.items():
1527                    if value:
1528                        current[field] = value
1529                if score is not None:
1530                    current["score"] = score
1531                if metadata:
1532                    merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1533                    merged_metadata.update(metadata)
1534                    current["metadata"] = merged_metadata
1535            current["created"] = created
1536            substantive_update = created or before != _change_fingerprint(current, change_fields)
1537            current["substantive_update"] = substantive_update
1538            if substantive_update:
1539                current["updated_at"] = now
1540            if substantive_update:
1541                event = _insert_event(
1542                    conn,
1543                    job_id=job_id,
1544                    event_type="finding",
1545                    title=current.get("name") or name,
1546                    body=current.get("reason") or current.get("category") or "",
1547                    metadata={
1548                        "created": created,
1549                        "substantive_update": substantive_update,
1550                        "score": current.get("score"),
1551                        "status": current.get("status"),
1552                        "source_url": current.get("source_url"),
1553                        "evidence_artifact": current.get("evidence_artifact"),
1554                        **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1555                    },
1556                    created_at=now,
1557                )
1558                current["event_id"] = event["id"]
1559            job_metadata["finding_ledger"] = findings[-1000:]  # cap the ledger at the 1000 most recently added findings
1560            job_metadata["last_finding_record"] = current
1561            conn.execute(
1562                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1563                (now, _json_dumps(job_metadata), job_id),
1564            )
1565            return current
1566
1567        return self._write(op)
1568
1569    def append_roadmap_record(
1570        self,
1571        job_id: str,
1572        *,
1573        title: str,
1574        status: str = "planned",
1575        objective: str = "",
1576        scope: str = "",
1577        current_milestone: str = "",
1578        validation_contract: str = "",
1579        milestones: list[dict[str, Any]] | None = None,
1580        metadata: dict[str, Any] | None = None,
1581    ) -> dict[str, Any]:
1582        now = utc_now()
1583        title = title.strip()
1584        if not title:
1585            raise ValueError("title is required")
1586        status = _clean_status(status, {"planned", "active", "validating", "done", "blocked", "paused"}, "planned")
1587        milestone_items = milestones if isinstance(milestones, list) else []
1588
1589        def merge_feature(existing_features: list[dict[str, Any]], feature: dict[str, Any]) -> tuple[dict[str, Any] | None, bool, bool]:
1590            feature_title = str(feature.get("title") or feature.get("name") or "").strip()
1591            if not feature_title:
1592                return None, False, False
1593            feature_key = _norm_key(str(feature.get("key") or feature_title))
1594            feature_title_key = _norm_key(feature_title)
1595            current = next(
1596                (
1597                    entry for entry in existing_features
1598                    if entry.get("key") == feature_key
1599                    or _norm_key(str(entry.get("title") or "")) == feature_title_key
1600                ),
1601                None,
1602            )
1603            created = current is None
1604            change_fields = (
1605                "title",
1606                "status",
1607                "goal",
1608                "output_contract",
1609                "acceptance_criteria",
1610                "evidence_needed",
1611                "result",
1612                "metadata",
1613            )
1614            before = "" if created else _change_fingerprint(current, change_fields)
1615            if current is None:
1616                current = {
1617                    "key": feature_key,
1618                    "title": feature_title,
1619                    "status": _clean_status(str(feature.get("status") or "planned"), {"planned", "active", "done", "blocked", "skipped"}, "planned"),
1620                    "goal": str(feature.get("goal") or feature.get("description") or "").strip(),
1621                    "output_contract": str(feature.get("output_contract") or feature.get("contract") or "").strip().lower().replace(" ", "_"),
1622                    "acceptance_criteria": str(feature.get("acceptance_criteria") or "").strip(),
1623                    "evidence_needed": str(feature.get("evidence_needed") or "").strip(),
1624                    "result": str(feature.get("result") or feature.get("outcome") or "").strip(),
1625                    "metadata": feature.get("metadata") if isinstance(feature.get("metadata"), dict) else {},
1626                    "created_at": now,
1627                }
1628                existing_features.append(current)
1629            else:
1630                current["status"] = _clean_status(str(feature.get("status") or current.get("status") or "planned"), {"planned", "active", "done", "blocked", "skipped"}, "planned")
1631                for field, value in {
1632                    "title": feature_title,
1633                    "goal": str(feature.get("goal") or feature.get("description") or "").strip(),
1634                    "output_contract": str(feature.get("output_contract") or feature.get("contract") or "").strip().lower().replace(" ", "_"),
1635                    "acceptance_criteria": str(feature.get("acceptance_criteria") or "").strip(),
1636                    "evidence_needed": str(feature.get("evidence_needed") or "").strip(),
1637                    "result": str(feature.get("result") or feature.get("outcome") or "").strip(),
1638                }.items():
1639                    if value:
1640                        current[field] = value
1641                if isinstance(feature.get("metadata"), dict):
1642                    merged = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1643                    merged.update(feature["metadata"])
1644                    current["metadata"] = merged
1645            if current.get("output_contract") not in {"research", "artifact", "experiment", "action", "monitor", "decision", "report", "validation"}:
1646                current["output_contract"] = ""
1647            current["created"] = created
1648            changed = created or before != _change_fingerprint(current, change_fields)
1649            current["substantive_update"] = changed
1650            if changed:
1651                current["updated_at"] = now
1652            return current, created, changed
1653
1654        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1655            row = conn.execute("SELECT objective, metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1656            if row is None:
1657                raise KeyError(f"Job not found: {job_id}")
1658            job_metadata = json.loads(row["metadata_json"] or "{}")
1659            roadmap = job_metadata.get("roadmap")
1660            created = not isinstance(roadmap, dict)
1661            roadmap_change_fields = (
1662                "title",
1663                "status",
1664                "objective",
1665                "scope",
1666                "validation_contract",
1667                "current_milestone",
1668                "metadata",
1669            )
1670            roadmap_before = "" if created else _change_fingerprint(roadmap, roadmap_change_fields)
1671            if created:
1672                roadmap = {
1673                    "key": _norm_key(title),
1674                    "title": title,
1675                    "status": status,
1676                    "objective": objective.strip() or str(row["objective"] or "").strip(),
1677                    "scope": scope.strip(),
1678                    "validation_contract": validation_contract.strip(),
1679                    "current_milestone": current_milestone.strip(),
1680                    "milestones": [],
1681                    "metadata": metadata or {},
1682                    "created_at": now,
1683                }
1684            else:
1685                roadmap["title"] = title or roadmap.get("title") or "Roadmap"
1686                roadmap["status"] = status  # always overwritten; an omitted status resets it to "planned"
1687                for field, value in {
1688                    "objective": objective.strip(),
1689                    "scope": scope.strip(),
1690                    "validation_contract": validation_contract.strip(),
1691                    "current_milestone": current_milestone.strip(),
1692                }.items():
1693                    if value:
1694                        roadmap[field] = value
1695                if metadata:
1696                    merged_metadata = roadmap.get("metadata") if isinstance(roadmap.get("metadata"), dict) else {}
1697                    merged_metadata.update(metadata)
1698                    roadmap["metadata"] = merged_metadata
1699
1700            stored_milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1701            added_milestones = 0
1702            updated_milestones = 0
1703            added_features = 0
1704            updated_features = 0
1705            touched: list[dict[str, Any]] = []
1706            for milestone in milestone_items[:100]:
1707                if not isinstance(milestone, dict):
1708                    continue
1709                milestone_title = str(milestone.get("title") or milestone.get("name") or "").strip()
1710                if not milestone_title:
1711                    continue
1712                milestone_key = _norm_key(str(milestone.get("key") or milestone_title))
1713                milestone_title_key = _norm_key(milestone_title)
1714                current = next(
1715                    (
1716                        entry for entry in stored_milestones
1717                        if entry.get("key") == milestone_key
1718                        or _norm_key(str(entry.get("title") or "")) == milestone_title_key
1719                    ),
1720                    None,
1721                )
1722                milestone_created = current is None
1723                milestone_change_fields = (
1724                    "title",
1725                    "status",
1726                    "priority",
1727                    "goal",
1728                    "acceptance_criteria",
1729                    "evidence_needed",
1730                    "validation_status",
1731                    "validation_result",
1732                    "next_action",
1733                    "metadata",
1734                )
1735                milestone_before = "" if milestone_created else _change_fingerprint(current, milestone_change_fields)
1736                if current is None:
1737                    current = {
1738                        "key": milestone_key,
1739                        "title": milestone_title,
1740                        "status": _clean_status(str(milestone.get("status") or "planned"), {"planned", "active", "validating", "done", "blocked", "skipped"}, "planned"),
1741                        "priority": int(milestone.get("priority") or 0),
1742                        "goal": str(milestone.get("goal") or milestone.get("description") or "").strip(),
1743                        "acceptance_criteria": str(milestone.get("acceptance_criteria") or "").strip(),
1744                        "evidence_needed": str(milestone.get("evidence_needed") or "").strip(),
1745                        "validation_status": _clean_status(str(milestone.get("validation_status") or "not_started"), {"not_started", "pending", "passed", "failed", "blocked"}, "not_started"),
1746                        "validation_result": str(milestone.get("validation_result") or "").strip(),
1747                        "next_action": str(milestone.get("next_action") or "").strip(),
1748                        "features": [],
1749                        "metadata": milestone.get("metadata") if isinstance(milestone.get("metadata"), dict) else {},
1750                        "created_at": now,
1751                    }
1752                    stored_milestones.append(current)
1753                    added_milestones += 1
1754                else:
1755                    current["status"] = _clean_status(str(milestone.get("status") or current.get("status") or "planned"), {"planned", "active", "validating", "done", "blocked", "skipped"}, "planned")
1756                    if "priority" in milestone:
1757                        current["priority"] = int(milestone.get("priority") or 0)
1758                    for field, value in {
1759                        "title": milestone_title,
1760                        "goal": str(milestone.get("goal") or milestone.get("description") or "").strip(),
1761                        "acceptance_criteria": str(milestone.get("acceptance_criteria") or "").strip(),
1762                        "evidence_needed": str(milestone.get("evidence_needed") or "").strip(),
1763                        "validation_status": _clean_status(str(milestone.get("validation_status") or ""), {"not_started", "pending", "passed", "failed", "blocked"}, ""),
1764                        "validation_result": str(milestone.get("validation_result") or "").strip(),
1765                        "next_action": str(milestone.get("next_action") or "").strip(),
1766                    }.items():
1767                        if value:
1768                            current[field] = value
1769                    if isinstance(milestone.get("metadata"), dict):
1770                        merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1771                        merged_metadata.update(milestone["metadata"])
1772                        current["metadata"] = merged_metadata
1773                feature_items = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1774                features = current.get("features") if isinstance(current.get("features"), list) else []
1775                feature_changed = False
1776                for feature in feature_items[:100]:
1777                    if not isinstance(feature, dict):
1778                        continue
1779                    stored_feature, feature_created, feature_updated = merge_feature(features, feature)
1780                    if stored_feature is None:
1781                        continue
1782                    if feature_created:
1783                        added_features += 1
1784                    elif feature_updated:
1785                        updated_features += 1
1786                    feature_changed = feature_changed or feature_updated
1787                current["features"] = features[-500:]
1788                current["created"] = milestone_created
1789                milestone_changed = milestone_created or milestone_before != _change_fingerprint(current, milestone_change_fields)
1790                current["substantive_update"] = milestone_changed or feature_changed
1791                if current["substantive_update"]:
1792                    current["updated_at"] = now
1793                if not milestone_created and milestone_changed:
1794                    updated_milestones += 1
1795                touched.append(current)
1796
1797            roadmap["milestones"] = stored_milestones[-500:]
1798            roadmap["created"] = created
1799            roadmap_substantive_update = created or roadmap_before != _change_fingerprint(roadmap, roadmap_change_fields)
1800            roadmap["substantive_update"] = roadmap_substantive_update
1801            if roadmap_substantive_update or added_milestones or updated_milestones or added_features or updated_features:
1802                roadmap["updated_at"] = now
1803            roadmap["added_milestones"] = added_milestones
1804            roadmap["updated_milestones"] = updated_milestones
1805            roadmap["added_features"] = added_features
1806            roadmap["updated_features"] = updated_features
1807            roadmap_has_change = bool(
1808                roadmap_substantive_update
1809                or added_milestones
1810                or updated_milestones
1811                or added_features
1812                or updated_features
1813            )
1814            if roadmap_has_change:
1815                event = _insert_event(
1816                    conn,
1817                    job_id=job_id,
1818                    event_type="roadmap",
1819                    title=roadmap.get("title") or title,
1820                    body=f"{roadmap.get('status')} | milestones +{added_milestones}/~{updated_milestones} | features +{added_features}/~{updated_features}",
1821                    metadata={
1822                        "created": created,
1823                        "substantive_update": roadmap_substantive_update,
1824                        "status": roadmap.get("status"),
1825                        "current_milestone": roadmap.get("current_milestone"),
1826                        "milestone_count": len(roadmap.get("milestones") or []),
1827                        "added_milestones": added_milestones,
1828                        "updated_milestones": updated_milestones,
1829                        "added_features": added_features,
1830                        "updated_features": updated_features,
1831                        "roadmap_updated": roadmap_substantive_update and not created,
1832                    },
1833                    created_at=now,
1834                )
1835                roadmap["event_id"] = event["id"]
1836            job_metadata["roadmap"] = roadmap
1837            job_metadata["last_roadmap_record"] = {
1838                "at": now,
1839                "updated_at": roadmap.get("updated_at") or now,
1840                "event_id": roadmap.get("event_id"),
1841                "created": created,
1842                "substantive_update": roadmap_substantive_update,
1843                "title": roadmap.get("title"),
1844                "status": roadmap.get("status"),
1845                "added_milestones": added_milestones,
1846                "updated_milestones": updated_milestones,
1847                "added_features": added_features,
1848                "updated_features": updated_features,
1849                "roadmap_updated": roadmap_substantive_update and not created,
1850                "milestones": touched[-10:],
1851            }
1852            conn.execute(
1853                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1854                (now, _json_dumps(job_metadata), job_id),
1855            )
1856            return roadmap
1857
1858        return self._write(op)
1859
1860    def append_milestone_validation_record(
1861        self,
1862        job_id: str,
1863        *,
1864        milestone: str,
1865        validation_status: str = "pending",
1866        result: str = "",
1867        evidence: str = "",
1868        issues: list[str] | None = None,
1869        next_action: str = "",
1870        metadata: dict[str, Any] | None = None,
1871    ) -> dict[str, Any]:
1872        now = utc_now()
1873        milestone = milestone.strip()
1874        if not milestone:
1875            raise ValueError("milestone is required")
1876        validation_status = _clean_status(validation_status, {"pending", "passed", "failed", "blocked"}, "pending")
1877        issue_values = [str(issue).strip() for issue in (issues or []) if str(issue).strip()]
1878        milestone_key = _norm_key(milestone)
1879
1880        def op(conn: sqlite3.Connection) -> dict[str, Any]:
1881            row = conn.execute("SELECT objective, metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1882            if row is None:
1883                raise KeyError(f"Job not found: {job_id}")
1884            job_metadata = json.loads(row["metadata_json"] or "{}")
1885            roadmap = job_metadata.get("roadmap")
1886            if not isinstance(roadmap, dict):
1887                roadmap = {
1888                    "key": _norm_key(str(row["objective"] or "roadmap")),
1889                    "title": "Roadmap",
1890                    "status": "active",
1891                    "objective": str(row["objective"] or ""),
1892                    "scope": "",
1893                    "validation_contract": "",
1894                    "current_milestone": milestone,
1895                    "milestones": [],
1896                    "metadata": {},
1897                    "created_at": now,
1898                }
1899            milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1900            current = next(
1901                (
1902                    entry for entry in milestones
1903                    if entry.get("key") == milestone_key
1904                    or _norm_key(str(entry.get("title") or "")) == milestone_key
1905                ),
1906                None,
1907            )
1908            created = current is None
1909            if current is None:
1910                current = {
1911                    "key": milestone_key,
1912                    "title": milestone,
1913                    "status": "validating" if validation_status == "pending" else ("done" if validation_status == "passed" else "blocked"),
1914                    "priority": 0,
1915                    "goal": "",
1916                    "acceptance_criteria": "",
1917                    "evidence_needed": "",
1918                    "features": [],
1919                    "metadata": {},
1920                    "created_at": now,
1921                }
1922                milestones.append(current)
1923            current["validation_status"] = validation_status
1924            current["validation_result"] = result.strip()
1925            current["validation_evidence"] = evidence.strip()
1926            current["validation_issues"] = issue_values
1927            current["next_action"] = next_action.strip()
1928            if validation_status == "passed":
1929                current["status"] = "done"
1930            elif validation_status == "pending":
1931                current["status"] = "validating"
1932            elif validation_status in {"failed", "blocked"}:
1933                current["status"] = "blocked"
1934            if metadata:
1935                merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1936                merged_metadata.update(metadata)
1937                current["metadata"] = merged_metadata
1938            current["updated_at"] = now
1939            current["created"] = created
1940            roadmap["milestones"] = milestones[-500:]
1941            roadmap["status"] = "active" if validation_status in {"failed", "blocked"} else ("validating" if validation_status == "pending" else roadmap.get("status") or "active")
1942            roadmap["current_milestone"] = current.get("title") or milestone
1943            roadmap["updated_at"] = now
1944            event = _insert_event(
1945                conn,
1946                job_id=job_id,
1947                event_type="milestone_validation",
1948                title=current.get("title") or milestone,
1949                body=result.strip() or validation_status,
1950                metadata={
1951                    "created": created,
1952                    "validation_status": validation_status,
1953                    "evidence": evidence.strip(),
1954                    "issues": issue_values,
1955                    "next_action": next_action.strip(),
1956                    **(metadata or {}),
1957                },
1958                created_at=now,
1959            )
1960            current["validation_event_id"] = event["id"]
1961            job_metadata["roadmap"] = roadmap
1962            job_metadata["last_milestone_validation"] = {
1963                "at": now,
1964                "validated_at": now,
1965                "event_id": event["id"],
1966                "milestone": current.get("title"),
1967                "validation_status": validation_status,
1968                "result": result.strip(),
1969                "issues": issue_values,
1970                "next_action": next_action.strip(),
1971            }
1972            conn.execute(
1973                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1974                (now, _json_dumps(job_metadata), job_id),
1975            )
1976            return current
1977
1978        return self._write(op)
1979
1980    def append_task_record(
1981        self,
1982        job_id: str,
1983        *,
1984        title: str,
1985        status: str = "open",
1986        priority: int = 0,
1987        goal: str = "",
1988        source_hint: str = "",
1989        result: str = "",
1990        parent: str = "",
1991        output_contract: str = "",
1992        acceptance_criteria: str = "",
1993        evidence_needed: str = "",
1994        stall_behavior: str = "",
1995        metadata: dict[str, Any] | None = None,
1996    ) -> dict[str, Any]:
1997        now = utc_now()
1998        title = title.strip()
1999        if not title:
2000            raise ValueError("title is required")
2001        status = (status.strip().lower() or "open").replace(" ", "_")
2002        if status not in {"open", "active", "done", "blocked", "skipped"}:
2003            status = "open"
2004        output_contract = output_contract.strip().lower().replace(" ", "_")
2005        if output_contract not in {"research", "artifact", "experiment", "action", "monitor", "decision", "report"}:
2006            output_contract = ""
2007        key = _norm_key(f"{parent}|{title}")
2008
2009        def op(conn: sqlite3.Connection) -> dict[str, Any]:
2010            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2011            if row is None:
2012                raise KeyError(f"Job not found: {job_id}")
2013            job_metadata = json.loads(row["metadata_json"] or "{}")
2014            tasks = _metadata_list(job_metadata, "task_queue")
2015            current = next(
2016                (
2017                    entry
2018                    for entry in tasks
2019                    if entry.get("key") == key
2020                    or (
2021                        not entry.get("key")
2022                        and _norm_key(f"{entry.get('parent') or ''}|{entry.get('title') or ''}") == key
2023                    )
2024                ),
2025                None,
2026            )
2027            created = current is None
2028            change_fields = (
2029                "status",
2030                "priority",
2031                "goal",
2032                "source_hint",
2033                "result",
2034                "parent",
2035                "output_contract",
2036                "acceptance_criteria",
2037                "evidence_needed",
2038                "stall_behavior",
2039                "metadata",
2040            )
2041            before = "" if created else _change_fingerprint(current, change_fields)
2042            if current is None:
2043                current = {
2044                    "key": key,
2045                    "title": title,
2046                    "status": status,
2047                    "priority": int(priority),
2048                    "goal": goal.strip(),
2049                    "source_hint": source_hint.strip(),
2050                    "result": result.strip(),
2051                    "parent": parent.strip(),
2052                    "output_contract": output_contract,
2053                    "acceptance_criteria": acceptance_criteria.strip(),
2054                    "evidence_needed": evidence_needed.strip(),
2055                    "stall_behavior": stall_behavior.strip(),
2056                    "metadata": metadata or {},
2057                    "created_at": now,
2058                }
2059                tasks.append(current)
2060            else:
2061                current["status"] = status
2062                current["priority"] = int(priority)
2063                for field, value in {
2064                    "goal": goal.strip(),
2065                    "source_hint": source_hint.strip(),
2066                    "result": result.strip(),
2067                    "parent": parent.strip(),
2068                    "output_contract": output_contract,
2069                    "acceptance_criteria": acceptance_criteria.strip(),
2070                    "evidence_needed": evidence_needed.strip(),
2071                    "stall_behavior": stall_behavior.strip(),
2072                }.items():
2073                    if value:
2074                        current[field] = value
2075                if metadata:
2076                    merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
2077                    merged_metadata.update(metadata)
2078                    current["metadata"] = merged_metadata
2079            current["created"] = created
2080            substantive_update = created or before != _change_fingerprint(current, change_fields)
2081            current["substantive_update"] = substantive_update
2082            if substantive_update:
2083                current["updated_at"] = now
2084            if substantive_update:
2085                event = _insert_event(
2086                    conn,
2087                    job_id=job_id,
2088                    event_type="task",
2089                    title=current.get("title") or title,
2090                    body=current.get("result") or current.get("goal") or "",
2091                    metadata={
2092                        "created": created,
2093                        "substantive_update": substantive_update,
2094                        "status": current.get("status"),
2095                        "priority": current.get("priority"),
2096                        "parent": current.get("parent"),
2097                        "source_hint": current.get("source_hint"),
2098                        "output_contract": current.get("output_contract"),
2099                        "acceptance_criteria": current.get("acceptance_criteria"),
2100                        "evidence_needed": current.get("evidence_needed"),
2101                        "stall_behavior": current.get("stall_behavior"),
2102                        **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
2103                    },
2104                    created_at=now,
2105                )
2106                current["event_id"] = event["id"]
2107            job_metadata["task_queue"] = tasks[-500:]  # keep only the 500 most recently appended tasks
2108            job_metadata["last_task_record"] = current
2109            conn.execute(
2110                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2111                (now, _json_dumps(job_metadata), job_id),
2112            )
2113            return current
2114
2115        return self._write(op)
2116
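The update branch of `append_task_record` above only overwrites a text field when the new value is non-empty, and it emits an event only when a fingerprint of the tracked fields changes. A minimal sketch of that idea follows. It is not part of the nipux_cli source: `fingerprint` is a hypothetical stand-in for `_change_fingerprint` (whose real implementation is not shown here), and `merge_non_empty` and the sample task are illustrative names.

```python
import json
from typing import Any


def fingerprint(record: dict[str, Any], fields: tuple[str, ...]) -> str:
    """Hypothetical stand-in for _change_fingerprint: stable JSON of the selected fields."""
    return json.dumps({name: record.get(name) for name in fields}, sort_keys=True, default=str)


def merge_non_empty(record: dict[str, Any], updates: dict[str, Any]) -> bool:
    """Copy only non-empty update values into record; return True if anything changed."""
    fields = tuple(updates)
    before = fingerprint(record, fields)
    for name, value in updates.items():
        if value:  # empty strings never erase an existing value
            record[name] = value
    return before != fingerprint(record, fields)


task = {"title": "Collect sources", "goal": "Find three primary sources", "result": ""}

# Re-sending empty values is not a substantive update.
unchanged = merge_non_empty(task, {"goal": "", "result": ""})

# A new result is a substantive update; the existing goal is preserved.
changed = merge_non_empty(task, {"goal": "", "result": "Two sources found"})

print(unchanged, changed, task["goal"])
```

Under these assumptions, a caller that repeats an identical record would not produce a second timeline event, which appears to be the intent of the `substantive_update` flag in the listing.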
2117    def append_experiment_record(
2118        self,
2119        job_id: str,
2120        *,
2121        title: str,
2122        hypothesis: str = "",
2123        status: str = "planned",
2124        metric_name: str = "",
2125        metric_value: float | None = None,
2126        metric_unit: str = "",
2127        higher_is_better: bool = True,
2128        baseline_value: float | None = None,
2129        config: dict[str, Any] | None = None,
2130        result: str = "",
2131        evidence_artifact: str = "",
2132        next_action: str = "",
2133        metadata: dict[str, Any] | None = None,
2134    ) -> dict[str, Any]:
2135        now = utc_now()
2136        title = title.strip()
2137        if not title:
2138            raise ValueError("title is required")
2139        status = (status.strip().lower() or "planned").replace(" ", "_")
2140        if status not in {"planned", "running", "measured", "failed", "blocked", "skipped"}:
2141            status = "planned"
2142        config_value = config if isinstance(config, dict) else {}
2143        key = _norm_key(f"{title}|{_json_dumps(config_value)}")
2144
2145        def op(conn: sqlite3.Connection) -> dict[str, Any]:
2146            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2147            if row is None:
2148                raise KeyError(f"Job not found: {job_id}")
2149            job_metadata = json.loads(row["metadata_json"] or "{}")
2150            experiments = _metadata_list(job_metadata, "experiment_ledger")
2151            current = next((entry for entry in experiments if entry.get("key") == key), None)
2152            created = current is None
2153            change_fields = (
2154                "hypothesis",
2155                "status",
2156                "metric_name",
2157                "metric_value",
2158                "metric_unit",
2159                "higher_is_better",
2160                "baseline_value",
2161                "config",
2162                "result",
2163                "evidence_artifact",
2164                "next_action",
2165                "metadata",
2166                "delta_from_previous_best",
2167                "best_observed",
2168            )
2169            before = "" if created else _change_fingerprint(current, change_fields)
2170            previous_best = _best_experiment_for_metric(
2171                experiments,
2172                metric_name=metric_name,
2173                metric_unit=metric_unit,
2174                higher_is_better=higher_is_better,
2175                exclude_key=key,
2176            )
2177            if current is None:
2178                current = {
2179                    "key": key,
2180                    "title": title,
2181                    "hypothesis": hypothesis.strip(),
2182                    "status": status,
2183                    "metric_name": metric_name.strip(),
2184                    "metric_value": metric_value,
2185                    "metric_unit": metric_unit.strip(),
2186                    "higher_is_better": bool(higher_is_better),
2187                    "baseline_value": baseline_value,
2188                    "config": config_value,
2189                    "result": result.strip(),
2190                    "evidence_artifact": evidence_artifact.strip(),
2191                    "next_action": next_action.strip(),
2192                    "metadata": metadata or {},
2193                    "created_at": now,
2194                }
2195                experiments.append(current)
2196            else:
2197                current["status"] = status
2198                for field, value in {
2199                    "hypothesis": hypothesis.strip(),
2200                    "metric_name": metric_name.strip(),
2201                    "metric_unit": metric_unit.strip(),
2202                    "result": result.strip(),
2203                    "evidence_artifact": evidence_artifact.strip(),
2204                    "next_action": next_action.strip(),
2205                }.items():
2206                    if value:
2207                        current[field] = value
2208                current["higher_is_better"] = bool(higher_is_better)
2209                if metric_value is not None:
2210                    current["metric_value"] = metric_value
2211                if baseline_value is not None:
2212                    current["baseline_value"] = baseline_value
2213                if config_value:
2214                    current["config"] = config_value
2215                if metadata:
2216                    merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
2217                    merged_metadata.update(metadata)
2218                    current["metadata"] = merged_metadata
2219            current["created"] = created
2220            current["delta_from_previous_best"] = _metric_delta(
2221                metric_value=current.get("metric_value"),
2222                previous_best=previous_best,
2223                higher_is_better=bool(current.get("higher_is_better", True)),
2224            )
2225            best = _mark_best_experiments(experiments)
2226            substantive_update = created or before != _change_fingerprint(current, change_fields)
2227            current["substantive_update"] = substantive_update
2228            if substantive_update:
2229                current["updated_at"] = now
2230            event_body = current.get("result") or ""
2231            if current.get("metric_value") is not None:
2232                event_body = format_metric_value(
2233                    current.get("metric_name") or "metric",
2234                    current.get("metric_value"),
2235                    current.get("metric_unit") or "",
2236                )
2237                if current.get("delta_from_previous_best") is not None:
2238                    event_body += f" delta={current.get('delta_from_previous_best')}"
2239                if current.get("result"):
2240                    event_body += f" | {current.get('result')}"
2241            if substantive_update:
2242                event = _insert_event(
2243                    conn,
2244                    job_id=job_id,
2245                    event_type="experiment",
2246                    title=current.get("title") or title,
2247                    body=event_body,
2248                    metadata={
2249                        "created": created,
2250                        "substantive_update": substantive_update,
2251                        "status": current.get("status"),
2252                        "metric_name": current.get("metric_name"),
2253                        "metric_value": current.get("metric_value"),
2254                        "metric_unit": current.get("metric_unit"),
2255                        "higher_is_better": current.get("higher_is_better"),
2256                        "best_observed": current.get("best_observed"),
2257                        "delta_from_previous_best": current.get("delta_from_previous_best"),
2258                        "evidence_artifact": current.get("evidence_artifact"),
2259                        **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
2260                    },
2261                    created_at=now,
2262                )
2263                current["event_id"] = event["id"]
2264            job_metadata["experiment_ledger"] = experiments[-1000:]  # keep only the 1000 most recently appended experiments
2265            job_metadata["last_experiment_record"] = current
2266            if best:
2267                job_metadata["best_experiment_record"] = best
2268            conn.execute(
2269                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2270                (now, _json_dumps(job_metadata), job_id),
2271            )
2272            return current
2273
2274        return self._write(op)
2275
2276    def append_reflection(
2277        self,
2278        job_id: str,
2279        summary: str,
2280        *,
2281        strategy: str = "",
2282        metadata: dict[str, Any] | None = None,
2283    ) -> dict[str, Any]:
2284        now = utc_now()
2285        text = summary.strip()
2286        if not text:
2287            raise ValueError("summary is required")
2288        entry = {
2289            "at": now,
2290            "summary": text,
2291            "strategy": strategy.strip(),
2292            "metadata": metadata or {},
2293        }
2294
2295        def op(conn: sqlite3.Connection) -> dict[str, Any]:
2296            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2297            if row is None:
2298                raise KeyError(f"Job not found: {job_id}")
2299            event = _insert_event(
2300                conn,
2301                job_id=job_id,
2302                event_type="reflection",
2303                title="reflection",
2304                body=text,
2305                metadata={"strategy": strategy.strip(), **(metadata or {})},
2306                created_at=now,
2307            )
2308            entry["event_id"] = event["id"]
2309            job_metadata = json.loads(row["metadata_json"] or "{}")
2310            reflections = _metadata_list(job_metadata, "reflections")
2311            reflections.append(entry)
2312            job_metadata["reflections"] = reflections[-100:]  # keep only the 100 most recent reflections
2313            job_metadata["last_reflection"] = entry
2314            conn.execute(
2315                "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2316                (now, _json_dumps(job_metadata), job_id),
2317            )
2318            return entry
2319
2320        return self._write(op)
2321
2322    def start_run(self, job_id: str, *, model: str = "", config_hash: str = "") -> str:
2323        run_id = new_id("run")
2324        now = utc_now()
2325
2326        def op(conn: sqlite3.Connection) -> str:
2327            conn.execute(
2328                """
2329                INSERT INTO job_runs(id, job_id, status, started_at, model, config_hash)
2330                VALUES (?, ?, 'running', ?, ?, ?)
2331                """,
2332                (run_id, job_id, now, model, config_hash),
2333            )
2334            conn.execute("UPDATE jobs SET status = 'running', updated_at = ? WHERE id = ?", (now, job_id))
2335            _insert_event(
2336                conn,
2337                job_id=job_id,
2338                event_type="daemon",
2339                title="run started",
2340                body=f"model={model}" if model else "",
2341                ref_table="job_runs",
2342                ref_id=run_id,
2343                metadata={"model": model, "config_hash": config_hash},
2344                created_at=now,
2345            )
2346            return run_id
2347
2348        return self._write(op)
2349
2350    def finish_run(self, run_id: str, status: str, *, score: float | None = None, error: str | None = None) -> None:
2351        now = utc_now()
2352
2353        def op(conn: sqlite3.Connection) -> None:
2354            conn.execute(
2355                "UPDATE job_runs SET status = ?, ended_at = ?, score = ?, error = ? WHERE id = ?",
2356                (status, now, score, error, run_id),
2357            )
2358
2359        self._write(op)
2360
2361    def mark_interrupted_running(self, *, reason: str = "daemon interrupted active work") -> dict[str, int]:
2362        now = utc_now()
2363        output = {"success": False, "error": reason, "error_type": "Interrupted"}
2364
2365        def op(conn: sqlite3.Connection) -> dict[str, int]:
2366            step_result = conn.execute(
2367                """
2368                UPDATE steps
2369                SET status = 'failed',
2370                    ended_at = ?,
2371                    summary = COALESCE(summary, ?),
2372                    output_json = ?,
2373                    error = ?
2374                WHERE status = 'running'
2375                """,
2376                (now, reason, _json_dumps(output), reason),
2377            )
2378            run_result = conn.execute(
2379                """
2380                UPDATE job_runs
2381                SET status = 'failed',
2382                    ended_at = ?,
2383                    error = ?
2384                WHERE status = 'running'
2385                """,
2386                (now, reason),
2387            )
2388            return {"steps": int(step_result.rowcount or 0), "runs": int(run_result.rowcount or 0)}
2389
2390        return self._write(op)
2391
2392    def next_step_no(self, job_id: str) -> int:
2393        row = self._conn.execute("SELECT COALESCE(MAX(step_no), 0) + 1 AS next_step FROM steps WHERE job_id = ?", (job_id,)).fetchone()
2394        return int(row["next_step"])
2395
2396    def add_step(
2397        self,
2398        *,
2399        job_id: str,
2400        run_id: str,
2401        kind: str,
2402        status: str = "running",
2403        tool_name: str | None = None,
2404        summary: str | None = None,
2405        input_data: dict[str, Any] | None = None,
2406    ) -> str:
2407        step_id = new_id("step")
2408        step_no = self.next_step_no(job_id)
2409        now = utc_now()
2410
2411        def op(conn: sqlite3.Connection) -> str:
2412            conn.execute(
2413                """
2414                INSERT INTO steps(id, job_id, run_id, step_no, kind, status, tool_name, started_at, summary, input_json)
2415                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2416                """,
2417                (step_id, job_id, run_id, step_no, kind, status, tool_name, now, summary, _json_dumps(input_data)),
2418            )
2419            _insert_event(
2420                conn,
2421                job_id=job_id,
2422                event_type="tool_call" if tool_name else kind,
2423                title=tool_name or kind,
2424                body=summary or "",
2425                ref_table="steps",
2426                ref_id=step_id,
2427                metadata={"run_id": run_id, "step_no": step_no, "kind": kind, "status": status, "input": input_data or {}},
2428                created_at=now,
2429            )
2430            return step_id
2431
2432        return self._write(op)
2433
2434    def finish_step(
2435        self,
2436        step_id: str,
2437        *,
2438        status: str,
2439        summary: str | None = None,
2440        output_data: dict[str, Any] | None = None,
2441        error: str | None = None,
2442    ) -> None:
2443        now = utc_now()
2444
2445        def op(conn: sqlite3.Connection) -> None:
2446            row = conn.execute("SELECT job_id, run_id, step_no, kind, tool_name FROM steps WHERE id = ?", (step_id,)).fetchone()
2447            conn.execute(
2448                """
2449                UPDATE steps
2450                SET status = ?, ended_at = ?, summary = COALESCE(?, summary), output_json = ?, error = ?
2451                WHERE id = ?
2452                """,
2453                (status, now, summary, _json_dumps(output_data), error, step_id),
2454            )
2455            if row is not None:
2456                event_type = "error" if status == "failed" or error else "tool_result"
2457                if row["kind"] == "reflection" and not error:
2458                    event_type = "reflection"
2459                _insert_event(
2460                    conn,
2461                    job_id=row["job_id"],
2462                    event_type=event_type,
2463                    title=row["tool_name"] or row["kind"],
2464                    body=summary or error or "",
2465                    ref_table="steps",
2466                    ref_id=step_id,
2467                    metadata={
2468                        "run_id": row["run_id"],
2469                        "step_no": row["step_no"],
2470                        "kind": row["kind"],
2471                        "status": status,
2472                        "output": output_data or {},
2473                        "error": error,
2474                    },
2475                    created_at=now,
2476                )
2477
2478        self._write(op)
2479
2480    def list_steps(self, *, job_id: str | None = None, run_id: str | None = None, limit: int | None = None) -> list[dict[str, Any]]:
2481        if run_id:
2482            if limit:
2483                rows = self._conn.execute(
2484                    """
2485                    SELECT * FROM (
2486                        SELECT * FROM steps WHERE run_id = ? ORDER BY step_no DESC LIMIT ?
2487                    ) ORDER BY step_no
2488                    """,
2489                    (run_id, int(limit)),
2490                ).fetchall()
2491            else:
2492                rows = self._conn.execute("SELECT * FROM steps WHERE run_id = ? ORDER BY step_no", (run_id,)).fetchall()
2493        elif job_id:
2494            if limit:
2495                rows = self._conn.execute(
2496                    """
2497                    SELECT * FROM (
2498                        SELECT * FROM steps WHERE job_id = ? ORDER BY step_no DESC LIMIT ?
2499                    ) ORDER BY step_no
2500                    """,
2501                    (job_id, int(limit)),
2502                ).fetchall()
2503            else:
2504                rows = self._conn.execute("SELECT * FROM steps WHERE job_id = ? ORDER BY step_no", (job_id,)).fetchall()
2505        else:
2506            if limit:
2507                rows = self._conn.execute(
2508                    """
2509                    SELECT * FROM (
2510                        SELECT * FROM steps ORDER BY started_at DESC LIMIT ?
2511                    ) ORDER BY started_at
2512                    """,
2513                    (int(limit),),
2514                ).fetchall()
2515            else:
2516                rows = self._conn.execute("SELECT * FROM steps ORDER BY started_at").fetchall()
2517        return [_row_to_dict(row) for row in rows]
2518
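The limited branches of `list_steps` above wrap a descending `LIMIT` query in an outer ascending `ORDER BY`, so callers receive the newest N steps in chronological order. The sketch below shows that pattern in isolation against an in-memory database. The `demo_steps` table, the `latest_steps` helper, and the sample rows are assumptions made for illustration and are not part of the nipux_cli source.

```python
import sqlite3


def latest_steps(conn: sqlite3.Connection, job_id: str, limit: int) -> list[int]:
    """Return the newest `limit` step numbers for a job, oldest first."""
    rows = conn.execute(
        """
        SELECT step_no FROM (
            SELECT step_no FROM demo_steps WHERE job_id = ? ORDER BY step_no DESC LIMIT ?
        ) ORDER BY step_no
        """,
        (job_id, int(limit)),
    ).fetchall()
    return [int(row["step_no"]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE demo_steps (job_id TEXT, step_no INTEGER)")
conn.executemany(
    "INSERT INTO demo_steps(job_id, step_no) VALUES (?, ?)",
    [("job_a", n) for n in range(1, 8)] + [("job_b", 1)],
)

# The inner query picks steps 7, 6, 5; the outer query re-sorts them ascending.
newest_three = latest_steps(conn, "job_a", 3)
print(newest_three)
```

The inner query decides which rows survive the limit and the outer query decides the order they are returned in, so the two `ORDER BY` clauses should normally use the same column.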
2519    def job_record_counts(self, job_id: str) -> dict[str, int]:
2520        row = self._conn.execute(
2521            """
2522            SELECT
2523                (SELECT COUNT(*) FROM steps WHERE job_id = ?) AS steps,
2524                (SELECT COUNT(*) FROM artifacts WHERE job_id = ?) AS artifacts,
2525                (SELECT COUNT(*) FROM memory_index WHERE job_id = ?) AS memory,
2526                (SELECT COUNT(*) FROM events WHERE job_id = ?) AS events
2527            """,
2528            (job_id, job_id, job_id, job_id),
2529        ).fetchone()
2530        return {
2531            "steps": int(row["steps"] or 0),
2532            "artifacts": int(row["artifacts"] or 0),
2533            "memory": int(row["memory"] or 0),
2534            "events": int(row["events"] or 0),
2535        }
2536
2537    def job_token_usage(self, job_id: str) -> dict[str, Any]:
2538        rows = self._conn.execute(
2539            """
2540            SELECT created_at, metadata_json
2541            FROM events
2542            WHERE job_id = ? AND event_type = 'loop' AND title = 'message_end'
2543            ORDER BY created_at ASC, id ASC
2544            """,
2545            (job_id,),
2546        ).fetchall()
2547        totals: dict[str, Any] = {
2548            "prompt_tokens": 0,
2549            "completion_tokens": 0,
2550            "total_tokens": 0,
2551            "reasoning_tokens": 0,
2552            "cached_tokens": 0,
2553            "cost": 0.0,
2554            "calls": 0,
2555            "estimated_calls": 0,
2556            "latest_prompt_tokens": 0,
2557            "latest_completion_tokens": 0,
2558            "latest_total_tokens": 0,
2559            "latest_context_length": 0,
2560            "latest_context_fraction": 0.0,
2561            "latest_at": "",
2562            "has_cost": False,
2563        }
2564        for row in rows:
2565            metadata = _json_loads(row["metadata_json"])
2566            usage = metadata.get("usage")
2567            if not isinstance(usage, dict):
2568                continue
2569            prompt = _as_int(usage.get("prompt_tokens"))
2570            completion = _as_int(usage.get("completion_tokens"))
2571            total = _as_int(usage.get("total_tokens")) or prompt + completion
2572            totals["prompt_tokens"] += prompt
2573            totals["completion_tokens"] += completion
2574            totals["total_tokens"] += total
2575            totals["reasoning_tokens"] += _as_int(_nested_value(usage, "completion_tokens_details", "reasoning_tokens"))
2576            totals["cached_tokens"] += _as_int(_nested_value(usage, "prompt_tokens_details", "cached_tokens"))
2577            cost = _as_float(usage.get("cost"))
2578            if cost is not None:
2579                totals["cost"] += cost
2580                totals["has_cost"] = True
2581            totals["calls"] += 1
2582            if bool(usage.get("estimated")):
2583                totals["estimated_calls"] += 1
2584            totals["latest_prompt_tokens"] = prompt
2585            totals["latest_completion_tokens"] = completion
2586            totals["latest_total_tokens"] = total
2587            totals["latest_context_length"] = _as_int(usage.get("context_length"))
2588            totals["latest_context_fraction"] = _as_float(usage.get("context_fraction")) or 0.0
2589            totals["latest_at"] = str(row["created_at"] or "")
2590        return totals
2591
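`job_token_usage` above folds per-call usage dictionaries into running totals, falls back to `prompt + completion` when `total_tokens` is missing, and tracks whether any call reported a cost. A reduced sketch of that fold is below. It is not the nipux_cli implementation: `fold_usage` and the sample payloads are illustrative, and `int(value or 0)` is a simplified stand-in for the `_as_int` helper, which is not shown here.

```python
from typing import Any


def fold_usage(usages: list[Any]) -> dict[str, Any]:
    """Sum token counts and cost over usage payloads, skipping malformed entries."""
    totals: dict[str, Any] = {
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "cost": 0.0,
        "has_cost": False,
        "calls": 0,
    }
    for usage in usages:
        if not isinstance(usage, dict):
            continue  # ignore events without a usable usage payload
        prompt = int(usage.get("prompt_tokens") or 0)
        completion = int(usage.get("completion_tokens") or 0)
        # Fall back to the sum when the provider omits total_tokens.
        total = int(usage.get("total_tokens") or 0) or prompt + completion
        totals["prompt_tokens"] += prompt
        totals["completion_tokens"] += completion
        totals["total_tokens"] += total
        if usage.get("cost") is not None:
            totals["cost"] += float(usage["cost"])
            totals["has_cost"] = True
        totals["calls"] += 1
    return totals


sample = [
    {"prompt_tokens": 100, "completion_tokens": 20, "total_tokens": 120, "cost": 0.25},
    {"prompt_tokens": 50, "completion_tokens": 10},  # no total, no cost
    "not a usage dict",
    {"prompt_tokens": 5, "completion_tokens": 1, "cost": 0.5},
]
totals = fold_usage(sample)
print(totals)
```

Keeping `has_cost` separate from `cost` lets a caller distinguish "no cost data reported" from "reported cost of zero".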
2592    def list_runs(self, job_id: str, *, limit: int = 50) -> list[dict[str, Any]]:
2593        rows = self._conn.execute(
2594            "SELECT * FROM job_runs WHERE job_id = ? ORDER BY started_at DESC LIMIT ?",
2595            (job_id, limit),
2596        ).fetchall()
2597        return [_row_to_dict(row) for row in rows]
2598
2599    def add_artifact(
2600        self,
2601        *,
2602        job_id: str,
2603        path: str | Path,
2604        sha256: str,
2605        artifact_type: str,
2606        run_id: str | None = None,
2607        step_id: str | None = None,
2608        title: str | None = None,
2609        summary: str | None = None,
2610        metadata: dict[str, Any] | None = None,
2611    ) -> str:
2612        artifact_id = new_id("art")
2613        now = utc_now()
2614
2615        def op(conn: sqlite3.Connection) -> str:
2616            conn.execute(
2617                """
2618                INSERT INTO artifacts(id, job_id, run_id, step_id, type, path, sha256, title, summary, metadata_json, created_at)
2619                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2620                """,
2621                (
2622                    artifact_id,
2623                    job_id,
2624                    run_id,
2625                    step_id,
2626                    artifact_type,
2627                    str(path),
2628                    sha256,
2629                    title,
2630                    summary,
2631                    _json_dumps(metadata),
2632                    now,
2633                ),
2634            )
2635            _insert_event(
2636                conn,
2637                job_id=job_id,
2638                event_type="artifact",
2639                title=title or artifact_id,
2640                body=summary or str(path),
2641                ref_table="artifacts",
2642                ref_id=artifact_id,
2643                metadata={"type": artifact_type, "path": str(path), "sha256": sha256, **(metadata or {})},
2644                created_at=now,
2645            )
2646            return artifact_id
2647
2648        return self._write(op)
2649
2650    def get_artifact(self, artifact_id: str) -> dict[str, Any]:
2651        row = self._conn.execute("SELECT * FROM artifacts WHERE id = ?", (artifact_id,)).fetchone()
2652        artifact = _row_to_dict(row)
2653        if artifact is None:
2654            raise KeyError(f"Artifact not found: {artifact_id}")
2655        return artifact
2656
2657    def list_artifacts(self, job_id: str, *, limit: int = 100) -> list[dict[str, Any]]:
2658        rows = self._conn.execute(
2659            "SELECT * FROM artifacts WHERE job_id = ? ORDER BY created_at DESC LIMIT ?",
2660            (job_id, limit),
2661        ).fetchall()
2662        return [_row_to_dict(row) for row in rows]
2663
2664    def upsert_memory(
2665        self,
2666        *,
2667        job_id: str,
2668        key: str,
2669        summary: str,
2670        artifact_refs: list[str] | None = None,
2671    ) -> str:
2672        memory_id = new_id("mem")
2673        now = utc_now()
2674
2675        def op(conn: sqlite3.Connection) -> str:
2676            conn.execute(
2677                """
2678                INSERT INTO memory_index(id, job_id, key, summary, artifact_refs_json, updated_at)
2679                VALUES (?, ?, ?, ?, ?, ?)
2680                ON CONFLICT(job_id, key) DO UPDATE SET
2681                    summary = excluded.summary,
2682                    artifact_refs_json = excluded.artifact_refs_json,
2683                    updated_at = excluded.updated_at
2684                """,
2685                (memory_id, job_id, key, summary, _json_dumps(artifact_refs or []), now),
2686            )
2687            row = conn.execute("SELECT id FROM memory_index WHERE job_id = ? AND key = ?", (job_id, key)).fetchone()
2688            current_id = str(row["id"])
2689            _insert_event(
2690                conn,
2691                job_id=job_id,
2692                event_type="compaction",
2693                title=key,
2694                body=summary,
2695                ref_table="memory_index",
2696                ref_id=current_id,
2697                metadata={"artifact_refs": artifact_refs or []},
2698                created_at=now,
2699            )
2700            return current_id
2701
2702        return self._write(op)
2703
2704    def list_memory(self, job_id: str) -> list[dict[str, Any]]:
2705        rows = self._conn.execute(
2706            "SELECT * FROM memory_index WHERE job_id = ? ORDER BY updated_at DESC",
2707            (job_id,),
2708        ).fetchall()
2709        return [_row_to_dict(row) for row in rows]
2710
2711    def digest_exists(self, *, day: str, target: str) -> bool:
2712        row = self._conn.execute(
2713            "SELECT 1 FROM digests WHERE day = ? AND target = ? AND status IN ('sent', 'dry_run') LIMIT 1",
2714            (day, target),
2715        ).fetchone()
2716        return row is not None
2717
2718    def record_digest(
2719        self,
2720        *,
2721        day: str,
2722        target: str,
2723        subject: str,
2724        body_path: str | Path,
2725        status: str,
2726        error: str | None = None,
2727    ) -> str:
2728        digest_id = new_id("dig")
2729        sent_at = utc_now() if status in {"sent", "dry_run"} else None
2730
2731        def op(conn: sqlite3.Connection) -> str:
2732            conn.execute(
2733                """
2734                INSERT INTO digests(id, day, target, subject, body_path, sent_at, status, error)
2735                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
2736                """,
2737                (digest_id, day, target, subject, str(body_path), sent_at, status, error),
2738            )
2739            _insert_event(
2740                conn,
2741                job_id=None,
2742                event_type="digest",
2743                title=subject,
2744                body=str(body_path),
2745                ref_table="digests",
2746                ref_id=digest_id,
2747                metadata={"day": day, "target": target, "status": status, "error": error},
2748                created_at=sent_at or utc_now(),
2749            )
2750            return digest_id
2751
2752        return self._write(op)
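`upsert_memory` above mints a fresh id on every call, yet it returns the id it reads back from the table rather than the one it just generated. The following minimal sketch suggests why. It assumes a cut-down, hypothetical `memory_index` table (the `artifact_refs_json` and `updated_at` columns, and the `upsert` helper name, are not from the listing) and an SQLite build that supports `ON CONFLICT ... DO UPDATE` (3.24 or later).

```python
import sqlite3


def upsert(conn: sqlite3.Connection, row_id: str, job_id: str, key: str, summary: str) -> str:
    """Insert or update one memory row and return the id that is actually stored."""
    conn.execute(
        """
        INSERT INTO memory_index(id, job_id, key, summary)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(job_id, key) DO UPDATE SET summary = excluded.summary
        """,
        (row_id, job_id, key, summary),
    )
    # On conflict the existing row keeps its original id, so read it back.
    row = conn.execute(
        "SELECT id FROM memory_index WHERE job_id = ? AND key = ?",
        (job_id, key),
    ).fetchone()
    return str(row[0])


conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema; the real table has more columns.
conn.execute(
    "CREATE TABLE memory_index("
    "id TEXT PRIMARY KEY, job_id TEXT, key TEXT, summary TEXT, "
    "UNIQUE(job_id, key))"
)
first_id = upsert(conn, "mem_1", "job_a", "rolling", "v1")
second_id = upsert(conn, "mem_2", "job_a", "rolling", "v2")
stored_summary = conn.execute("SELECT summary FROM memory_index").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM memory_index").fetchone()[0]
```

In this sketch the second call updates the summary but the row keeps the first id, which is presumably why the listing re-selects the id before using it as the event's `ref_id`.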
nipux_cli/digest.py 380 lines
   1"""Digest rendering and optional email delivery."""
   2
   3from __future__ import annotations
   4
   5import smtplib
   6from datetime import date
   7from email.message import EmailMessage
   8from pathlib import Path
   9from typing import Any
  10
  11from nipux_cli.config import AppConfig, EmailConfig
  12from nipux_cli.db import AgentDB
  13from nipux_cli.operator_context import active_prompt_operator_entries
  14from nipux_cli.tui_layout import _format_compact_count, _format_usage_cost
  15
  16
  17def _metadata_list(job: dict, key: str) -> list[dict]:
  18    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  19    values = metadata.get(key)
  20    return [value for value in values if isinstance(value, dict)] if isinstance(values, list) else []
  21
  22
  23def _active_operator_messages(messages: list[dict]) -> list[dict]:
  24    prompt_entries = active_prompt_operator_entries(messages)
  25    return [
  26        entry for entry in messages
  27        if str(entry.get("mode") or "steer") in {"steer", "follow_up"}
  28        and entry in prompt_entries
  29    ]
  30
  31
  32def _safe_int(value: Any) -> int:
  33    try:
  34        return int(float(value))
   35    except (TypeError, ValueError, OverflowError):
  36        return 0
  37
  38
  39def _latest_run_model(db: AgentDB, job_id: str) -> str:
  40    runs = db.list_runs(job_id, limit=1)
  41    if runs:
  42        return str(runs[0].get("model") or "unknown")
  43    return "unknown"
  44
  45
  46def _usage_lines(
  47    db: AgentDB,
  48    job_id: str,
  49    *,
  50    model: str | None = None,
  51    base_url: str = "",
  52    context_length: int = 0,
  53    input_cost_per_million: float | None = None,
  54    output_cost_per_million: float | None = None,
  55) -> list[str]:
  56    usage = db.job_token_usage(job_id)
  57    usage["input_cost_per_million"] = input_cost_per_million
  58    usage["output_cost_per_million"] = output_cost_per_million
  59    calls = _safe_int(usage.get("calls"))
  60    if calls <= 0:
  61        return ["- No model usage recorded yet."]
  62    model_name = model or _latest_run_model(db, job_id)
  63    prompt = _safe_int(usage.get("prompt_tokens"))
  64    completion = _safe_int(usage.get("completion_tokens"))
  65    total = _safe_int(usage.get("total_tokens")) or prompt + completion
  66    latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
  67    latest_completion = _safe_int(usage.get("latest_completion_tokens"))
  68    context_text = _format_compact_count(latest_prompt)
  69    if context_length > 0:
  70        context_text = f"{context_text}/{_format_compact_count(context_length)}"
  71    cost_text = _format_usage_cost(usage, model=model_name, base_url=base_url)
  72    lines = [
  73        (
  74            f"- {model_name}: {calls} calls, {_format_compact_count(total)} tokens "
  75            f"({_format_compact_count(prompt)} prompt, {_format_compact_count(completion)} output), "
  76            f"latest ctx={context_text}, latest output={_format_compact_count(latest_completion)}, cost={cost_text}"
  77        )
  78    ]
  79    if _safe_int(usage.get("estimated_calls")):
  80        lines.append("- Some token/cost values are estimated because the provider did not return complete usage metadata.")
  81    elif not bool(usage.get("has_cost")) and cost_text == "pending":
  82        lines.append("- Cost is pending until the provider returns cost metadata or the model is configured as local/free.")
  83    return lines
  84
  85
  86def render_job_digest(
  87    db: AgentDB,
  88    job_id: str,
  89    *,
  90    model: str | None = None,
  91    base_url: str = "",
  92    context_length: int = 0,
  93    input_cost_per_million: float | None = None,
  94    output_cost_per_million: float | None = None,
  95) -> str:
  96    job = db.get_job(job_id)
  97    artifacts = db.list_artifacts(job_id, limit=50)
  98    steps = db.list_steps(job_id=job_id)
  99    findings = _metadata_list(job, "finding_ledger")
 100    sources = _metadata_list(job, "source_ledger")
 101    tasks = _metadata_list(job, "task_queue")
 102    experiments = _metadata_list(job, "experiment_ledger")
 103    lessons = _metadata_list(job, "lessons")
 104    reflections = _metadata_list(job, "reflections")
 105    operator_messages = _metadata_list(job, "operator_messages")
 106    active_operator = _active_operator_messages(operator_messages)
 107    lines = [
 108        f"# {job['title']}",
 109        "",
 110        f"Status: {job['status']}",
 111        f"Findings: {len(findings)}",
 112        f"Sources: {len(sources)}",
 113        f"Tasks: {len(tasks)}",
 114        f"Experiments: {len(experiments)}",
 115        f"Lessons: {len(lessons)}",
 116        "",
 117        "## Model Usage",
 118        "",
 119        *_usage_lines(
 120            db,
 121            job_id,
 122            model=model,
 123            base_url=base_url,
 124            context_length=context_length,
 125            input_cost_per_million=input_cost_per_million,
 126            output_cost_per_million=output_cost_per_million,
 127        ),
 128        "",
 129        "## Objective",
 130        "",
 131        job["objective"],
 132        "",
 133        "## Active Operator Context",
 134        "",
 135    ]
 136    if not active_operator:
 137        lines.append("- none")
 138    for entry in active_operator[-8:]:
 139        lines.append(f"- {entry.get('mode') or 'steer'}: {entry.get('message') or ''}")
 140    lines.extend([
 141        "",
 142        "## Recent Steps",
 143        "",
 144    ])
 145    if not steps:
 146        lines.append("- No steps have run yet.")
 147    for step in steps[-20:]:
 148        tool = f" `{step['tool_name']}`" if step.get("tool_name") else ""
 149        lines.append(f"- #{step['step_no']} {step['kind']}{tool}: {step['status']} - {step.get('summary') or ''}")
 150    lines.extend(["", "## Best Findings", ""])
 151    if not findings:
 152        lines.append("- No findings recorded yet.")
 153    for finding in sorted(findings, key=lambda item: float(item.get("score") or 0), reverse=True)[:15]:
 154        details = " | ".join(str(finding.get(key) or "") for key in ("category", "location", "contact") if finding.get(key))
 155        suffix = f" - {details}" if details else ""
 156        lines.append(f"- {finding.get('name') or 'unknown'} (score={finding.get('score')}){suffix}")
 157        if finding.get("reason"):
 158            lines.append(f"  - {finding['reason']}")
 159    lines.extend(["", "## Source Learning", ""])
 160    if not sources:
 161        lines.append("- No sources scored yet.")
 162    for source in sorted(sources, key=lambda item: float(item.get("usefulness_score") or 0), reverse=True)[:12]:
 163        lines.append(
 164            f"- {source.get('source')} score={source.get('usefulness_score')} "
 165            f"findings={source.get('yield_count') or 0} fails={source.get('fail_count') or 0}: {source.get('last_outcome') or ''}"
 166        )
 167    lines.extend(["", "## Task Queue", ""])
 168    if not tasks:
 169        lines.append("- No tasks recorded yet.")
 170    status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
  171    for task in sorted(tasks, key=lambda item: (status_order.get(str(item.get("status") or "open"), 9), -_safe_int(item.get("priority"))))[:15]:
 172        contract = f" [{task.get('output_contract')}]" if task.get("output_contract") else ""
 173        lines.append(f"- {task.get('status') or 'open'} p={task.get('priority') or 0}{contract}: {task.get('title') or 'untitled'}")
 174        for key, label in (("acceptance_criteria", "accept"), ("evidence_needed", "evidence"), ("stall_behavior", "stall")):
 175            if task.get(key):
 176                lines.append(f"  - {label}: {task[key]}")
 177        if task.get("result"):
 178            lines.append(f"  - {task['result']}")
 179    lines.extend(["", "## Experiments", ""])
 180    if not experiments:
 181        lines.append("- No experiments recorded yet.")
 182    measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
 183    for experiment in sorted(measured or experiments, key=lambda item: (not bool(item.get("best_observed")), str(item.get("updated_at") or item.get("created_at") or "")))[:15]:
 184        metric = ""
 185        if experiment.get("metric_value") is not None:
 186            metric = f" {experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
 187        best = " best" if experiment.get("best_observed") else ""
 188        lines.append(f"- {experiment.get('status') or 'planned'}: {experiment.get('title') or 'experiment'}{metric}{best}")
 189        if experiment.get("result"):
 190            lines.append(f"  - {experiment['result']}")
 191    lines.extend(["", "## Lessons", ""])
 192    if not lessons:
 193        lines.append("- No lessons recorded yet.")
 194    for lesson in lessons[-12:]:
 195        lines.append(f"- {lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}")
 196    if reflections:
 197        lines.extend(["", "## Current Strategy", ""])
 198        reflection = reflections[-1]
 199        lines.append(reflection.get("summary") or "")
 200        if reflection.get("strategy"):
 201            lines.append("")
 202            lines.append(reflection["strategy"])
 203    lines.extend(["", "## Artifacts", ""])
 204    if not artifacts:
 205        lines.append("- No artifacts yet.")
 206    for artifact in artifacts[:20]:
 207        title = artifact.get("title") or artifact["id"]
 208        lines.append(f"- {title} ({artifact['type']}): {artifact['path']}")
 209    return "\n".join(lines).rstrip() + "\n"
 210
 211
 212def send_digest_email(config: EmailConfig, *, subject: str, body: str, to_addr: str | None = None) -> dict:
 213    if not config.enabled:
 214        return {"sent": False, "dry_run": True, "reason": "email.disabled", "subject": subject, "body": body}
 215    target = to_addr or config.to_addr
 216    if not all([config.smtp_host, config.from_addr, target]):
 217        raise ValueError("Email is enabled but smtp_host/from_addr/to_addr is incomplete")
 218    message = EmailMessage()
 219    message["Subject"] = subject
 220    message["From"] = config.from_addr
 221    message["To"] = target
 222    message.set_content(body)
 223    with smtplib.SMTP(config.smtp_host, config.smtp_port, timeout=30) as smtp:
 224        if config.use_tls:
 225            smtp.starttls()
 226        if config.username:
 227            smtp.login(config.username, config.password)
 228        smtp.send_message(message)
 229    return {"sent": True, "target": target, "subject": subject}
 230
 231
 232def render_daily_digest(
 233    db: AgentDB,
 234    *,
 235    model: str | None = None,
 236    base_url: str = "",
 237    context_length: int = 0,
 238    input_cost_per_million: float | None = None,
 239    output_cost_per_million: float | None = None,
 240) -> str:
 241    jobs = [job for job in db.list_jobs() if job["status"] not in {"cancelled"}]
 242    lines = ["# Nipux CLI Daily Digest", ""]
 243    if not jobs:
 244        lines.append("No jobs are currently tracked.")
 245        return "\n".join(lines).rstrip() + "\n"
 246
 247    for job in jobs:
 248        artifacts = db.list_artifacts(job["id"], limit=10)
 249        steps = db.list_steps(job_id=job["id"])[-5:]
 250        findings = _metadata_list(job, "finding_ledger")
 251        sources = _metadata_list(job, "source_ledger")
 252        tasks = _metadata_list(job, "task_queue")
 253        experiments = _metadata_list(job, "experiment_ledger")
 254        lessons = _metadata_list(job, "lessons")
 255        reflections = _metadata_list(job, "reflections")
 256        operator_messages = _metadata_list(job, "operator_messages")
 257        active_operator = _active_operator_messages(operator_messages)
 258        finding_batches = [artifact for artifact in artifacts if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()]
 259        lines.extend([
 260            f"## {job['title']}",
 261            "",
 262            f"Status: {job['status']}",
 263            f"Kind: {job['kind']}",
 264            f"Counts: {len(findings)} findings, {len(sources)} sources, {len(tasks)} tasks, {len(experiments)} experiments, {len(lessons)} lessons, {len(finding_batches)} recent finding artifacts",
 265            "",
 266            "Model usage:",
 267        ])
 268        lines.extend(
 269            _usage_lines(
 270                db,
 271                job["id"],
 272                model=model,
 273                base_url=base_url,
 274                context_length=context_length,
 275                input_cost_per_million=input_cost_per_million,
 276                output_cost_per_million=output_cost_per_million,
 277            )
 278        )
 279        lines.extend([
 280            "",
 281            "Recent steps:",
 282        ])
 283        if not steps:
 284            lines.append("- none")
 285        for step in steps:
 286            tool = f" `{step['tool_name']}`" if step.get("tool_name") else ""
 287            lines.append(f"- #{step['step_no']} {step['kind']}{tool}: {step['status']} - {step.get('summary') or ''}")
 288        lines.extend(["", "Active operator context:"])
 289        if not active_operator:
 290            lines.append("- none")
 291        for entry in active_operator[-5:]:
 292            lines.append(f"- {entry.get('mode') or 'steer'}: {entry.get('message') or ''}")
 293        lines.extend(["", "Best findings:"])
 294        if not findings:
 295            lines.append("- none")
 296        for finding in sorted(findings, key=lambda item: float(item.get("score") or 0), reverse=True)[:8]:
 297            lines.append(f"- {finding.get('name') or 'unknown'} (score={finding.get('score')}) - {finding.get('reason') or finding.get('category') or ''}")
 298        lines.extend(["", "Task queue:"])
 299        if not tasks:
 300            lines.append("- none")
 301        status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
  302        for task in sorted(tasks, key=lambda item: (status_order.get(str(item.get("status") or "open"), 9), -_safe_int(item.get("priority"))))[:8]:
 303            contract = f" [{task.get('output_contract')}]" if task.get("output_contract") else ""
 304            lines.append(f"- {task.get('status') or 'open'} p={task.get('priority') or 0}{contract}: {task.get('title') or 'untitled'}")
 305        lines.extend(["", "Experiments:"])
 306        if not experiments:
 307            lines.append("- none")
 308        measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
 309        for experiment in (measured or experiments)[-8:]:
 310            metric = ""
 311            if experiment.get("metric_value") is not None:
 312                metric = f" {experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
 313            best = " best" if experiment.get("best_observed") else ""
 314            lines.append(f"- {experiment.get('status') or 'planned'}: {experiment.get('title') or 'experiment'}{metric}{best}")
 315        lines.extend(["", "Lessons learned:"])
 316        if not lessons:
 317            lines.append("- none")
 318        for lesson in lessons[-8:]:
 319            lines.append(f"- {lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}")
 320        lines.extend(["", "Source quality:"])
 321        if not sources:
 322            lines.append("- none")
 323        for source in sorted(sources, key=lambda item: float(item.get("usefulness_score") or 0), reverse=True)[:8]:
 324            lines.append(f"- {source.get('source')} score={source.get('usefulness_score')} findings={source.get('yield_count') or 0}: {source.get('last_outcome') or ''}")
 325        if reflections:
 326            reflection = reflections[-1]
 327            lines.extend(["", "Current strategy:", f"- {reflection.get('strategy') or reflection.get('summary') or ''}"])
 328        lines.extend(["", "Next branches:"])
 329        lines.append("- Continue with high-yield source types, avoid low-yield paths, and save durable findings as artifacts.")
 330        lines.extend(["", "Recent artifacts:"])
 331        if not artifacts:
 332            lines.append("- none")
 333        for artifact in artifacts:
 334            title = artifact.get("title") or artifact["id"]
 335            lines.append(f"- {title} ({artifact['type']}): {artifact['path']}")
 336        lines.append("")
 337    return "\n".join(lines).rstrip() + "\n"
 338
 339
 340def write_daily_digest(config: AppConfig, db: AgentDB, *, day: str | None = None) -> dict:
 341    day = day or date.today().isoformat()
 342    target = config.email.to_addr or "dry-run"
 343    subject = f"Nipux CLI daily digest - {day}"
 344    if db.digest_exists(day=day, target=target):
 345        return {"sent": False, "skipped": True, "reason": "already_recorded", "day": day, "target": target}
 346
 347    body = render_daily_digest(
 348        db,
 349        model=config.model.model,
 350        base_url=config.model.base_url,
 351        context_length=config.model.context_length,
 352        input_cost_per_million=config.model.input_cost_per_million,
 353        output_cost_per_million=config.model.output_cost_per_million,
 354    )
 355    config.runtime.digests_dir.mkdir(parents=True, exist_ok=True)
 356    body_path = Path(config.runtime.digests_dir) / f"{day}-daily.md"
 357    body_path.write_text(body, encoding="utf-8")
 358
 359    try:
 360        email_result = send_digest_email(config.email, subject=subject, body=body)
 361        status = "sent" if email_result.get("sent") else "dry_run"
 362        digest_id = db.record_digest(day=day, target=target, subject=subject, body_path=body_path, status=status)
 363        return {
 364            "digest_id": digest_id,
 365            "status": status,
 366            "day": day,
 367            "target": target,
 368            "path": str(body_path),
 369            "email": email_result,
 370        }
 371    except Exception as exc:
 372        digest_id = db.record_digest(
 373            day=day,
 374            target=target,
 375            subject=subject,
 376            body_path=body_path,
 377            status="failed",
 378            error=str(exc),
 379        )
 380        return {"digest_id": digest_id, "status": "failed", "day": day, "target": target, "path": str(body_path), "error": str(exc)}
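Both digest renderers order the task queue with the same two-part sort key: status rank first, then descending priority. The sketch below isolates that ordering so it can be checked on its own. The `order_tasks` name and the sample tasks are illustrative assumptions; only the `status_order` mapping and the fallback rank of 9 come from the listing.

```python
from typing import Any

# Same ranking as the status_order dict in the digest renderers.
STATUS_ORDER = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}


def order_tasks(tasks: list[dict[str, Any]], limit: int = 15) -> list[dict[str, Any]]:
    """Sort by status rank, then by priority (highest first), and truncate."""

    def sort_key(task: dict[str, Any]) -> tuple[int, int]:
        status = str(task.get("status") or "open")  # a missing status counts as open
        priority = int(task.get("priority") or 0)   # a missing priority counts as 0
        return (STATUS_ORDER.get(status, 9), -priority)  # unknown statuses sort last

    return sorted(tasks, key=sort_key)[:limit]


# Hypothetical sample data for illustration only.
sample = [
    {"title": "write report", "status": "done", "priority": 9},
    {"title": "collect sources", "status": "open", "priority": 2},
    {"title": "run benchmark", "status": "active", "priority": 1},
    {"title": "triage", "priority": 5},
    {"title": "archive", "status": "paused", "priority": 7},
]
ordered_titles = [task["title"] for task in order_tasks(sample)]
```

Here a low-priority active task still sorts ahead of a high-priority done task, because status rank is compared before priority.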
nipux_cli/doctor.py 260 lines
   1"""Runtime checks for the Nipux agent."""
   2
   3from __future__ import annotations
   4
   5import json
   6import shutil
   7import urllib.error
   8import urllib.request
   9from dataclasses import dataclass
  10from pathlib import Path
  11from typing import Any
  12from urllib.parse import urlparse
  13
  14from nipux_cli.config import AppConfig, load_config
  15from nipux_cli.db import AgentDB
  16from nipux_cli.tools import DEFAULT_REGISTRY
  17
  18
  19@dataclass(frozen=True)
  20class Check:
  21    name: str
  22    ok: bool
  23    detail: str
  24
  25    def as_dict(self) -> dict[str, Any]:
  26        return {"name": self.name, "ok": self.ok, "detail": self.detail}
  27
  28
  29def _check_writable_dir(path: Path) -> Check:
  30    try:
  31        path.mkdir(parents=True, exist_ok=True)
  32        probe = path / ".write-test"
  33        probe.write_text("ok", encoding="utf-8")
  34        probe.unlink()
  35        return Check("state_dir_writable", True, str(path))
  36    except OSError as exc:
  37        return Check("state_dir_writable", False, f"{path}: {exc}")
  38
  39
  40def _check_db(config: AppConfig) -> Check:
  41    try:
  42        db = AgentDB(config.runtime.state_db_path)
  43        db.close()
  44        return Check("sqlite", True, str(config.runtime.state_db_path))
  45    except Exception as exc:
  46        return Check("sqlite", False, str(exc))
  47
  48
  49def _check_tool_surface() -> Check:
  50    names = DEFAULT_REGISTRY.names()
  51    forbidden = sorted({"terminal", "delegate_task", "skill_manage", "image_generate"} & set(names))
  52    if forbidden:
  53        return Check("tool_surface", False, f"forbidden tools exposed: {', '.join(forbidden)}")
  54    return Check("tool_surface", True, f"{len(names)} tools: {', '.join(names)}")
  55
  56
  57def _check_model_config(config: AppConfig) -> Check:
  58    base_url = config.model.base_url
  59    host = (urlparse(base_url).hostname or "").lower()
  60    local_hosts = {"", "localhost", "127.0.0.1", "::1", "0.0.0.0"}
  61    if host in local_hosts or host.endswith(".local"):
  62        return Check("model_config", True, f"{config.model.model} at {base_url}")
  63    if config.model.api_key:
  64        return Check("model_config", True, f"{config.model.model} at {base_url}; key read from {config.model.api_key_env}")
  65    return Check(
  66        "model_config",
  67        False,
  68        f"{config.model.api_key_env} is not set for remote endpoint {base_url}; put it in the shell or ~/.nipux/.env",
  69    )
  70
  71
  72def _check_browser_runtime() -> Check:
  73    direct = shutil.which("agent-browser")
  74    if direct:
  75        return Check("browser_runtime", True, f"agent-browser: {direct}")
  76    npx = shutil.which("npx")
  77    if npx:
  78        return Check("browser_runtime", True, f"agent-browser available through npx fallback: {npx}")
  79    return Check(
  80        "browser_runtime",
  81        False,
  82        "agent-browser not found and npx is unavailable; install with: npm install -g agent-browser && agent-browser install",
  83    )
  84
  85
  86def _check_model_endpoint(config: AppConfig) -> Check:
  87    if "openrouter.ai" in config.model.base_url and not config.model.api_key:
  88        return Check("model_endpoint", False, "API key is not set")
  89    auth = _check_openrouter_auth(config)
  90    if auth is not None and not auth.ok:
  91        return auth
  92    url = config.model.base_url.rstrip("/") + "/models"
  93    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {config.model.api_key or 'local-no-key'}"})
  94    try:
  95        with urllib.request.urlopen(request, timeout=5) as response:
  96            payload = response.read(512_000).decode("utf-8", errors="replace")
  97        try:
  98            data = json.loads(payload)
  99            count = len(data.get("data", [])) if isinstance(data, dict) else "unknown"
 100            available = _model_available(data, config.model.model)
 101            if available is False:
 102                return Check("model_endpoint", False, f"{config.model.model} not found at {url}; models={count}")
 103            generation = _check_model_generation(config)
 104            if not generation.ok:
 105                return generation
 106            return Check("model_endpoint", True, f"{url} returned models={count}; {config.model.model} available; generation accepted")
 107        except json.JSONDecodeError:
 108            generation = _check_model_generation(config)
 109            if not generation.ok:
 110                return generation
 111            return Check("model_endpoint", True, f"{url} responded; generation accepted")
 112    except (urllib.error.URLError, TimeoutError, OSError) as exc:
 113        return Check("model_endpoint", False, f"{url}: {exc}")
 114
 115
 116def _check_model_generation(config: AppConfig) -> Check:
 117    url = config.model.base_url.rstrip("/") + "/chat/completions"
 118    payload = {
 119        "model": config.model.model,
 120        "messages": [{"role": "user", "content": "Reply with exactly: ok"}],
 121        "max_tokens": 8,
 122        "temperature": 0,
 123        "tools": [
 124            {
 125                "type": "function",
 126                "function": {
 127                    "name": "noop",
 128                    "description": "No-op model readiness probe.",
 129                    "parameters": {
 130                        "type": "object",
 131                        "properties": {"reason": {"type": "string"}},
 132                        "required": ["reason"],
 133                    },
 134                },
 135            }
 136        ],
 137    }
 138    request = urllib.request.Request(
 139        url,
 140        data=json.dumps(payload).encode("utf-8"),
 141        headers={
 142            "Authorization": f"Bearer {config.model.api_key or 'local-no-key'}",
 143            "Content-Type": "application/json",
 144        },
 145        method="POST",
 146    )
 147    try:
 148        with urllib.request.urlopen(request, timeout=15) as response:
 149            body = response.read(64_000).decode("utf-8", errors="replace")
 150        data = json.loads(body)
 151        choices = data.get("choices") if isinstance(data, dict) else None
 152        if isinstance(choices, list) and choices:
 153            return Check("model_generation", True, f"{url} accepted chat/tool request")
 154        return Check("model_generation", False, f"{url} returned no choices")
 155    except urllib.error.HTTPError as exc:
 156        body = exc.read(2048).decode("utf-8", errors="replace")
 157        detail = _extract_error_message(body) or str(exc)
 158        return Check("model_generation", False, f"{url}: {detail}")
 159    except (json.JSONDecodeError, urllib.error.URLError, TimeoutError, OSError) as exc:
 160        return Check("model_generation", False, f"{url}: {exc}")
 161
 162
 163def _check_openrouter_auth(config: AppConfig) -> Check | None:
 164    if "openrouter.ai" not in config.model.base_url:
 165        return None
 166    if not config.model.api_key:
 167        return Check("model_auth", False, "OpenRouter API key is not set")
 168    url = "https://openrouter.ai/api/v1/key"
 169    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {config.model.api_key}"})
 170    try:
 171        with urllib.request.urlopen(request, timeout=5) as response:
 172            response.read(2048)
 173        return Check("model_auth", True, "OpenRouter API key accepted")
 174    except urllib.error.HTTPError as exc:
 175        body = exc.read(512).decode("utf-8", errors="replace")
 176        detail = _extract_error_message(body) or str(exc)
 177        return Check("model_auth", False, f"OpenRouter rejected API key: {detail}")
 178    except (urllib.error.URLError, TimeoutError, OSError) as exc:
 179        return Check("model_auth", False, f"{url}: {exc}")
 180
 181
 182def _extract_error_message(body: str) -> str:
 183    try:
 184        data = json.loads(body)
 185    except json.JSONDecodeError:
 186        return body.strip()
 187    error = data.get("error") if isinstance(data, dict) else None
 188    if isinstance(error, dict):
 189        message = str(error.get("message") or "").strip()
 190        code = str(error.get("code") or "").strip()
 191        metadata = error.get("metadata")
 192        raw = _extract_error_raw(metadata)
 193        provider = _metadata_value(metadata, "provider_name")
 194        byok = _metadata_value(metadata, "is_byok")
 195
 196        primary = message or code
 197        if raw and raw != primary:
 198            primary = f"{primary}: {raw}" if primary else raw
 199        details = []
 200        if code and code not in primary:
 201            details.append(f"code={code}")
 202        if provider:
 203            details.append(f"provider={provider}")
 204        if byok not in {"", None}:
 205            details.append(f"byok={byok}")
 206        if details:
 207            primary = f"{primary} ({'; '.join(details)})" if primary else "; ".join(details)
 208        return primary.strip()
 209    return ""
 210
 211
 212def _metadata_value(metadata: Any, key: str) -> str:
 213    if not isinstance(metadata, dict):
 214        return ""
 215    value = metadata.get(key)
 216    if value is None:
 217        return ""
 218    return str(value).strip()
 219
 220
 221def _extract_error_raw(metadata: Any) -> str:
 222    if not isinstance(metadata, dict):
 223        return ""
 224    raw = metadata.get("raw")
 225    if raw is None:
 226        return ""
 227    if isinstance(raw, dict):
 228        return _extract_error_message(json.dumps(raw))
 229    raw_text = str(raw).strip()
 230    if not raw_text:
 231        return ""
 232    try:
 233        raw_json = json.loads(raw_text)
 234    except json.JSONDecodeError:
 235        return raw_text
 236    if isinstance(raw_json, dict):
 237        nested = _extract_error_message(json.dumps(raw_json))
 238        return nested or raw_text
 239    return raw_text
 240
 241
 242def _model_available(data: Any, model: str) -> bool | None:
 243    if not isinstance(data, dict) or not isinstance(data.get("data"), list):
 244        return None
 245    ids = {str(item.get("id") or "") for item in data["data"] if isinstance(item, dict)}
 246    return model in ids
 247
 248
 249def run_doctor(*, config: AppConfig | None = None, check_model: bool = False) -> list[Check]:
 250    config = config or load_config()
 251    checks = [
 252        _check_writable_dir(config.runtime.home),
 253        _check_db(config),
 254        _check_model_config(config),
 255        _check_tool_surface(),
 256        _check_browser_runtime(),
 257    ]
 258    if check_model:
 259        checks.append(_check_model_endpoint(config))
 260    return checks
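`run_doctor` returns a flat list of `Check` records and leaves presentation to the caller. As a hedged sketch of how a caller might turn that list into report lines and a process exit code: the `Check` dataclass is re-declared locally so the block runs on its own, and the `summarize_checks` helper and the sample checks are hypothetical, not part of the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    # Local copy of the three fields used by the doctor module's Check.
    name: str
    ok: bool
    detail: str


def summarize_checks(checks: list[Check]) -> tuple[list[str], int]:
    """Render one line per check and return exit code 0 only if every check passed."""
    lines = [f"{'ok' if check.ok else 'FAIL':<4} {check.name}: {check.detail}" for check in checks]
    exit_code = 0 if all(check.ok for check in checks) else 1
    return lines, exit_code


# Hypothetical results for illustration only.
sample_checks = [
    Check("sqlite", True, "/tmp/state.db"),
    Check("browser_runtime", False, "agent-browser not found"),
]
report_lines, exit_code = summarize_checks(sample_checks)
```

Keeping the checks as plain data, as the listing does, lets the same results feed a terminal report, a JSON payload via `as_dict`, or a test assertion.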
nipux_cli/event_render.py 118 lines
   1"""Readable event rendering shared by CLI history and chat context."""
   2
   3from __future__ import annotations
   4
   5import shlex
   6from typing import Any
   7
   8from nipux_cli.metric_format import format_metric_value
   9from nipux_cli.tui_event_format import clean_step_summary, generic_display_text
  10from nipux_cli.tui_style import _one_line
  11
  12
  13def event_line(event: dict[str, Any], *, chars: int, full: bool = False) -> str:
  14    when, label, detail, access = event_display_parts(event, chars=chars, full=full)
  15    suffix = f" | {access}" if access and full else ""
  16    return f"{when:<16} {label:<8} {_one_line(detail + suffix, chars)}"
  17
  18
  19def event_display_parts(event: dict[str, Any], *, chars: int, full: bool = False) -> tuple[str, str, str, str]:
  20    when = compact_time(str(event.get("created_at") or "?"))
  21    kind = str(event.get("event_type") or "event")
  22    title = str(event.get("title") or "").strip()
  23    body = generic_display_text(event.get("body") or "")
  24    ref_table = str(event.get("ref_table") or "")
  25    ref_id = str(event.get("ref_id") or "")
  26    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
  27    label = event_label(kind, metadata)
  28    access = ""
  29    if kind == "tool_result" and metadata.get("status"):
  30        label = event_label(f"{kind}:{metadata.get('status')}", metadata)
  31    if kind == "error":
  32        label = "ERROR"
  33    if kind.startswith("tool_result") or kind == "error":
  34        body = clean_step_summary(body)
  35    if kind == "artifact":
  36        title = title or ref_id
  37        if body.startswith("/") or "/.nipux/jobs/" in body or "/jobs/job_" in body:
  38            body = generic_display_text(metadata.get("summary") or "saved output")
  39        if title:
  40            access = f"open: /artifact {shlex.quote(title)}"
  41    if kind == "operator_message" and metadata.get("mode"):
  42        title = f"{title or 'operator'} {metadata.get('mode')}"
  43    if kind == "operator_context":
  44        body = body or f"{metadata.get('count') or 0} message(s)"
  45    if kind in {"tool_call", "tool_result", "error"} and metadata.get("step_no"):
  46        title = f"#{metadata.get('step_no')} {title}".strip()
  47    if not body and kind == "artifact" and metadata.get("path"):
  48        body = str(metadata.get("type") or "saved artifact")
  49    if not body and kind == "finding" and metadata.get("category"):
  50        body = str(metadata.get("category") or "")
  51    if not body and kind == "task" and metadata.get("status"):
  52        body = str(metadata.get("status") or "")
  53    if not body and kind == "roadmap" and metadata.get("status"):
  54        body = str(metadata.get("status") or "")
  55    if not body and kind == "milestone_validation" and metadata.get("validation_status"):
  56        body = str(metadata.get("validation_status") or "")
  57    if not body and kind == "experiment":
  58        metric_value = metadata.get("metric_value")
  59        if metric_value is not None:
  60            body = format_metric_value(
  61                metadata.get("metric_name") or "metric",
  62                metric_value,
  63                metadata.get("metric_unit") or "",
  64            )
  65    if kind == "compaction":
  66        body = _one_line(body, min(chars, 140))
  67    if kind == "daemon" and title == "run started":
  68        body = body or str(metadata.get("model") or "")
  69    detail = title if title else kind
  70    if body:
  71        detail = f"{detail} - {body}"
  72    if ref_table and ref_id and full:
  73        detail = f"{detail} [{ref_table}:{ref_id}]"
  74    return when, label, detail, access
  75
  76
  77def event_label(kind: str, metadata: dict[str, Any]) -> str:
  78    if kind == "operator_message":
  79        mode = str(metadata.get("mode") or "")
  80        return "FOLLOW" if mode == "follow_up" else "USER"
  81    if kind == "operator_context":
  82        return "ACK"
  83    if kind == "agent_message":
  84        return "AGENT"
  85    if kind == "roadmap":
  86        return "ROAD"
  87    if kind == "milestone_validation":
  88        return "VALID"
  89    if kind == "tool_call":
  90        return "TOOL"
  91    if kind.startswith("tool_result"):
  92        status = str(metadata.get("status") or "")
  93        if status == "blocked":
  94            return "BLOCK"
  95        if status == "failed" or kind.endswith(":failed"):
  96            return "ERROR"
  97        return "DONE"
  98    labels = {
  99        "artifact": "OUTPUT",
 100        "compaction": "MEMORY",
 101        "daemon": "SYSTEM",
 102        "digest": "DIGEST",
 103        "error": "ERROR",
 104        "experiment": "TEST",
 105        "finding": "FIND",
 106        "lesson": "LEARN",
 107        "reflection": "PLAN",
 108        "source": "SOURCE",
 109        "task": "TASK",
 110    }
 111    return labels.get(kind, kind.upper()[:8])
 112
 113
 114def compact_time(value: str) -> str:
 115    text = value.replace("T", " ")
 116    if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
 117        return text[:16]
 118    return _one_line(text, 16)
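Usage sketch: `compact_time` keeps the first 16 characters of an ISO-like timestamp and otherwise falls back to one-line truncation. `compact_time` is copied from the listing above. The `_one_line` stand-in is an assumption that mirrors the local helper shown in first_run_frame_runtime.py; the real `nipux_cli.tui_style._one_line` is not shown in this section and may behave differently.

```python
def _one_line(value: object, width: int) -> str:
    # Assumed stand-in: collapse whitespace, then truncate with an ellipsis.
    text = " ".join(str(value).split())
    if len(text) <= width:
        return text
    return text[: max(0, width - 3)] + "..."


def compact_time(value: str) -> str:
    # Copied from the listing: "YYYY-MM-DDTHH:MM..." becomes "YYYY-MM-DD HH:MM".
    text = value.replace("T", " ")
    if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
        return text[:16]
    return _one_line(text, 16)


iso = compact_time("2024-05-01T12:34:56+00:00")
placeholder = compact_time("?")
odd = compact_time("some unexpected timestamp text")
print(repr(iso), repr(placeholder), repr(odd))
```

The 16-character result matches the `{when:<16}` column that `event_line` pads to, so timestamps of either shape stay aligned.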
nipux_cli/first_run_controller.py 170 lines
   1"""First-run command decisions for the Nipux TUI."""
   2
   3from __future__ import annotations
   4
   5import shlex
   6from contextlib import redirect_stdout
   7from dataclasses import dataclass
   8from io import StringIO
   9from typing import Callable
  10
  11from nipux_cli.settings import config_field_value
  12from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS
  13from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID
  14
  15
  16TOGGLE_SETTING_COMMANDS = {
  17    "tools.browser": "browser",
  18    "tools.web": "web",
  19    "tools.shell": "cli-access",
  20    "tools.files": "file-access",
  21}
  22
  23
  24@dataclass(frozen=True)
  25class FirstRunFrameDeps:
  26    capture_command: Callable[[str], list[str]]
  27    capture_setting_command: Callable[[str], list[str]]
  28    create_job: Callable[..., tuple[str, str]]
  29    current_default_job_id: Callable[[], str | None]
  30    extract_objective: Callable[[str], str]
  31    model_setup_verified: Callable[[], bool]
  32    verify_model_setup: Callable[[], list[str]]
  33    shell_command_names: set[str]
  34
  35
  36def handle_first_run_action(action: str, *, deps: FirstRunFrameDeps) -> tuple[str, str | list[str] | None]:
  37    if action == "open_workspace" and not deps.model_setup_verified():
  38        return "notice", "Run Doctor first. The workspace opens only after the configured model accepts a chat request."
  39    if action == "open_workspace":
  40        return "open", WORKSPACE_CHAT_ID
  41    if action.startswith("view:"):
  42        return "view", action.split(":", 1)[1]
  43    if action == "preset:local":
  44        notices = [
  45            *deps.capture_setting_command("model local-model"),
  46            *deps.capture_setting_command("base-url http://localhost:8000/v1"),
  47            *deps.capture_setting_command("api-key-env OPENAI_API_KEY"),
  48            "Local connector selected. Start your OpenAI-compatible server, then run Doctor.",
  49        ]
  50        return "notice", notices
  51    if action.startswith("toggle:"):
  52        field = action.split(":", 1)[1]
  53        command = TOGGLE_SETTING_COMMANDS.get(field)
  54        if not command:
  55            return "notice", f"Unknown setup toggle: {field}"
  56        next_value = "false" if bool(config_field_value(field)) else "true"
  57        return "notice", deps.capture_setting_command(f"{command} {next_value}")
  58    if action.startswith("edit:"):
  59        return "edit", action.split(":", 1)[1]
  60    if action.startswith("secret:"):
  61        return "edit", action
  62    if action == "new":
  63        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
  64    if action == "back":
  65        return "view", "endpoint"
  66    if action == "jobs":
  67        return "notice", "Finish setup first. Jobs are available after Doctor verifies the configured model."
  68    if action == "doctor":
  69        notices = deps.verify_model_setup()
  70        if deps.model_setup_verified():
  71            return "open", WORKSPACE_CHAT_ID
  72        return "notice", notices
  73    if action == "init":
  74        return "notice", deps.capture_command("init")
  75    if action == "exit":
  76        return "exit", None
  77    return "notice", f"Unknown action: {action}"
  78
  79
  80def handle_first_run_frame_line(line: str, *, deps: FirstRunFrameDeps) -> tuple[str, str | list[str] | None]:
  81    original = line.strip()
  82    if original.startswith("/"):
  83        original = original[1:].strip()
  84    lowered = original.lower()
  85    if lowered in {"exit", "quit", ":q", "5"}:
  86        return "exit", None
  87    if lowered in {"clear"}:
  88        return "clear", None
  89    if lowered in {"help", "?", "commands"}:
  90        return "notice", [
  91            "Finish setup before chat or jobs are available.",
  92            "Enter endpoint, API key, and model id when prompted.",
  93            "Doctor must verify the configured model before the workspace opens.",
  94        ]
  95    if lowered in {"1", "new"}:
  96        return "notice", "Finish setup first. Then tell Nipux what job to create from the chat workspace."
  97    if lowered.startswith("new "):
  98        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
  99    if lowered in {"2", "jobs", "ls"}:
 100        return "notice", "Finish setup first. Jobs are available after Doctor verifies the configured model."
 101    if lowered == "settings":
 102        return "notice", "Config is changed with slash commands: /model, /api-key, /base-url, /context."
 103    if lowered in {"back"}:
 104        return "notice", "Setup is linear during first run. Continue forward, then edit settings later if needed."
 105    if lowered in {"3", "doctor"}:
 106        notices = deps.verify_model_setup()
 107        if deps.model_setup_verified():
 108            return "open", WORKSPACE_CHAT_ID
 109        return "notice", notices
 110    if lowered in {"4", "init"}:
 111        return "notice", deps.capture_command("init")
 112    if lowered == "shell":
 113        return "notice", "The old console is only available as `nipux shell` from your terminal."
 114    first = first_token(original)
 115    if first == "shell":
 116        return "notice", "The old console is only available as `nipux shell` from your terminal."
 117    if first in {"create", "new"}:
 118        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
 119    if first in CHAT_SETTING_COMMANDS or first in {"api-key", "key"}:
 120        return "notice", deps.capture_setting_command(original)
 121    if first in deps.shell_command_names:
 122        before_job_id = deps.current_default_job_id()
 123        output = deps.capture_command(original)
 124        after_job_id = deps.current_default_job_id()
 125        if first == "create" and after_job_id and after_job_id != before_job_id:
 126            return "open", after_job_id
 127        return "notice", output
 128    objective = deps.extract_objective(original)
 129    if objective:
 130        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
 131    return "notice", first_run_chat_reply(original)
 132
 133
 134def first_run_chat_reply(message: str) -> str:
 135    del message
 136    return "Setup must be completed before chat is available."
 137
 138
 139def create_first_run_job(objective: str, *, deps: FirstRunFrameDeps) -> str | list[str]:
 140    objective = objective.strip()
 141    if not objective:
 142        return ["No job created. Type an objective first."]
 143    if not deps.model_setup_verified():
 144        return [
 145            "No job created.",
 146            "Finish model setup first: choose a connector, set the endpoint/key if needed, then run Doctor.",
 147            "Doctor must confirm that the configured model accepts a chat request.",
 148        ]
 149    job_id, _title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
 150    return job_id
 151
 152
 153def capture_first_run_command(line: str, run_shell_line: Callable[[str], bool]) -> list[str]:
 154    stream = StringIO()
 155    with redirect_stdout(stream):
 156        try:
 157            run_shell_line(line)
 158        except SystemExit as exc:
 159            if exc.code not in (None, 0):
 160                print(f"command exited with status {exc.code}")
 161    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
 162    return lines[-8:] or ["done"]
 163
 164
 165def first_token(line: str) -> str:
 166    try:
 167        parts = shlex.split(line)
 168    except ValueError:
 169        parts = line.split()
 170    return parts[0].lower() if parts else ""
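Usage sketch: `capture_first_run_command` turns printed output and a non-zero `SystemExit` into notice lines, and `first_token` falls back to plain splitting when `shlex` rejects the input. Both functions are copied from the listing above; `fake_shell` is a hypothetical stand-in for the real shell dispatcher, which is not shown here.

```python
from __future__ import annotations

import shlex
from contextlib import redirect_stdout
from io import StringIO
from typing import Callable


def capture_first_run_command(line: str, run_shell_line: Callable[[str], bool]) -> list[str]:
    # Copied from the listing: stdout is captured and a non-zero exit becomes a notice.
    stream = StringIO()
    with redirect_stdout(stream):
        try:
            run_shell_line(line)
        except SystemExit as exc:
            if exc.code not in (None, 0):
                print(f"command exited with status {exc.code}")
    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
    return lines[-8:] or ["done"]


def first_token(line: str) -> str:
    # Copied from the listing: unbalanced quotes fall back to whitespace splitting.
    try:
        parts = shlex.split(line)
    except ValueError:
        parts = line.split()
    return parts[0].lower() if parts else ""


def fake_shell(line: str) -> bool:
    # Hypothetical command runner: prints something, then exits like argparse would.
    print(f"ran   {line}")
    raise SystemExit(2)


captured = capture_first_run_command("init", fake_shell)
silent = capture_first_run_command("init", lambda line: True)
token = first_token('Model "unterminated quote')
print(captured, silent, token)
```

Because the `try` sits inside the `redirect_stdout` block, the exit-status message lands in the same captured stream as the command output, and a command that prints nothing yields the single notice "done".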
nipux_cli/first_run_frame_runtime.py 436 lines
   1"""Terminal runtime for the first-run Nipux workspace."""
   2
   3from __future__ import annotations
   4
   5import select
   6import shutil
   7import sys
   8import termios
   9import time
  10import tty
  11from dataclasses import dataclass
  12from typing import Callable
  13from urllib.parse import urlparse
  14
  15from nipux_cli.config import load_config
  16from nipux_cli.settings import inline_setting_notice
  17from nipux_cli.tui_commands import FIRST_RUN_SLASH_COMMANDS, autocomplete_slash, cycle_slash, slash_completion_for_submit
  18from nipux_cli.tui_input import (
  19    decode_terminal_escape,
  20    drain_pending_input,
  21    read_escape_sequence,
  22    read_terminal_char,
  23)
  24from nipux_cli.tui_style import _frame_enter_sequence, _frame_exit_sequence
  25
  26
  27@dataclass(frozen=True)
  28class FirstRunRuntimeDeps:
  29    render_frame: Callable[[str, list[str], int, str, str | None, str], str]
  30    actions: Callable[[str], list[tuple[str, str, str]]]
  31    handle_action: Callable[[str], tuple[str, str | list[str] | None]]
  32    handle_line: Callable[[str], tuple[str, str | list[str] | None]]
  33    click_action: Callable[[int, int, str], int | str | None]
  34
  35
  36def run_first_run_frame(*, deps: FirstRunRuntimeDeps) -> str | None:
  37    buffer = ""
  38    notices: list[str] = []
  39    next_job_id: str | None = None
  40    view = "endpoint"
  41    selected = 0
  42    editing_field: str | None = required_first_run_edit_field(view)
  43    old_attrs = termios.tcgetattr(sys.stdin)
  44    print(_frame_enter_sequence(), end="", flush=True)
  45    try:
  46        stdin_fd = sys.stdin.fileno()
  47        tty.setcbreak(stdin_fd)
  48        needs_render = True
  49        last_render = 0.0
  50        last_frame = ""
  51        while next_job_id is None:
  52            now = time.monotonic()
  53            if needs_render or now - last_render >= 1.0:
  54                selected = clamp_selection(selected, deps.actions(view))
  55                last_frame = _safe_render_frame(
  56                    deps,
  57                    buffer=buffer,
  58                    notices=notices,
  59                    selected=selected,
  60                    view=view,
  61                    editing_field=editing_field,
  62                    previous_frame=last_frame,
  63                )
  64                needs_render = False
  65                last_render = now
  66            try:
  67                readable, _, _ = select.select([stdin_fd], [], [], 0.05)
  68            except OSError as exc:
  69                _append_notice(notices, f"terminal read failed: {type(exc).__name__}: {_one_line(exc, 90)}")
  70                needs_render = True
  71                continue
  72            if not readable:
  73                continue
  74            try:
  75                char = read_terminal_char(stdin_fd)
  76            except OSError as exc:
  77                _append_notice(notices, f"terminal input failed: {type(exc).__name__}: {_one_line(exc, 90)}")
  78                needs_render = True
  79                continue
  80            if editing_field is not None:
  81                previous_edit = editing_field
  82                try:
  83                    buffer, editing_field, should_exit = _handle_edit_input(
  84                        char,
  85                        buffer=buffer,
  86                        editing_field=editing_field,
  87                        notices=notices,
  88                        stdin_fd=stdin_fd,
  89                    )
  90                except Exception as exc:
  91                    buffer = ""
  92                    editing_field = None
  93                    _append_notice(notices, f"edit failed: {type(exc).__name__}: {_one_line(exc, 90)}")
  94                    needs_render = True
  95                    continue
  96                if should_exit:
  97                    return None
  98                if previous_edit and editing_field is None:
  99                    next_view = next_first_run_view_after_edit(view)
 100                    if next_view:
 101                        view = next_view
 102                        selected = 0
 103                        editing_field = required_first_run_edit_field(view)
 104                needs_render = True
 105                continue
 106            if char in {"\r", "\n"}:
 107                buffer, should_submit = slash_completion_for_submit(buffer, FIRST_RUN_SLASH_COMMANDS)
 108                if not should_submit:
 109                    needs_render = True
 110                    continue
 111                try:
 112                    action, payload = _submit_first_run_line(buffer, selected=selected, view=view, deps=deps)
 113                except Exception as exc:
 114                    action, payload = "notice", f"input failed: {type(exc).__name__}: {_one_line(exc, 100)}"
 115                buffer = ""
 116                try:
 117                    state = _apply_first_run_action(action, payload, view=view, selected=selected, notices=notices)
 118                except Exception as exc:
 119                    _append_notice(notices, f"action failed: {type(exc).__name__}: {_one_line(exc, 100)}")
 120                    state = (view, selected, None, None, False)
 121                view, selected, editing_field, next_job_id, should_exit = state
 122                editing_field = editing_field or required_first_run_edit_field(view)
 123                if should_exit:
 124                    return None
 125                needs_render = True
 126                continue
 127            if char in {"\x04"}:
 128                return None
 129            if char == "\x03":
 130                buffer = ""
 131                _append_notice(notices, "cancelled input")
 132                needs_render = True
 133                continue
 134            if char == "\x15":
 135                buffer = ""
 136                needs_render = True
 137                continue
 138            if char in {"\x7f", "\b"}:
 139                buffer = buffer[:-1]
 140                needs_render = True
 141                continue
 142            if char == "\t":
 143                try:
 144                    buffer = autocomplete_slash(buffer, FIRST_RUN_SLASH_COMMANDS)
 145                except Exception as exc:
 146                    _append_notice(notices, f"autocomplete failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 147                needs_render = True
 148                continue
 149            if char == "\x1b":
 150                try:
 151                    view, selected, editing_field, next_job_id, should_exit, buffer = _handle_first_run_escape(
 152                        stdin_fd,
 153                        view=view,
 154                        selected=selected,
 155                        buffer=buffer,
 156                        notices=notices,
 157                        deps=deps,
 158                    )
 159                except Exception as exc:
 160                    _append_notice(notices, f"navigation failed: {type(exc).__name__}: {_one_line(exc, 90)}")
 161                    should_exit = False
 162                if should_exit:
 163                    return None
 164                editing_field = editing_field or required_first_run_edit_field(view)
 165                needs_render = True
 166                continue
 167            if char.isprintable():
 168                buffer += char
 169                needs_render = True
 170    except KeyboardInterrupt:
 171        return None
 172    finally:
 173        termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_attrs)
 174        print(_frame_exit_sequence(), flush=True)
 175    return next_job_id
 176
 177
 178def clamp_selection(selected: int, actions: list[tuple[str, str, str]]) -> int:
 179    if not actions:
 180        return 0
 181    return max(0, min(selected, len(actions) - 1))
 182
 183
 184def _safe_render_frame(
 185    deps: FirstRunRuntimeDeps,
 186    *,
 187    buffer: str,
 188    notices: list[str],
 189    selected: int,
 190    view: str,
 191    editing_field: str | None,
 192    previous_frame: str,
 193) -> str:
 194    try:
 195        return deps.render_frame(buffer, notices, selected, view, editing_field, previous_frame)
 196    except Exception as exc:
 197        _append_notice(notices, f"render failed: {type(exc).__name__}: {_one_line(exc, 100)}")
 198        frame = _fallback_first_run_frame(buffer=buffer, notices=notices, view=view)
 199        print("\033[H" + frame, end="", flush=True)
 200        return frame
 201
 202
 203def _fallback_first_run_frame(*, buffer: str, notices: list[str], view: str) -> str:
 204    width, height = shutil.get_terminal_size((100, 30))
 205    width = max(60, width)
 206    lines = [
 207        _fit_plain("NIPUX - setup safe mode", width),
 208        _fit_plain("=" * width, width),
 209        _fit_plain(f"Screen: {view}", width),
 210        _fit_plain("A UI render error was caught. You can keep typing; /exit leaves.", width),
 211        "",
 212        "Recent notices:",
 213    ]
 214    lines.extend(f"- {_one_line(notice, width - 3)}" for notice in notices[-8:])
 215    lines.extend(["", f"> {_one_line(buffer, width - 3)}"])
 216    return "\n".join(_fit_plain(line, width) for line in lines[:height])
 217
 218
 219def _submit_first_run_line(
 220    buffer: str,
 221    *,
 222    selected: int,
 223    view: str,
 224    deps: FirstRunRuntimeDeps,
 225) -> tuple[str, str | list[str] | None]:
 226    line = buffer.strip()
 227    if not line:
 228        actions = deps.actions(view)
 229        if not actions:
 230            return "notice", "This setup step requires an explicit value."
 231        return deps.handle_action(actions[clamp_selection(selected, actions)][0])
 232    if not line.startswith("/"):
 233        return "notice", "Complete the active setup field before continuing."
 234    return deps.handle_line(line)
 235
 236
 237def _handle_first_run_escape(
 238    stdin_fd: int,
 239    *,
 240    view: str,
 241    selected: int,
 242    buffer: str,
 243    notices: list[str],
 244    deps: FirstRunRuntimeDeps,
 245) -> tuple[str, int, str | None, str | None, bool, str]:
 246    key, payload = decode_terminal_escape(read_escape_sequence("\x1b", fd=stdin_fd))
 247    if key in {"up", "down"} and buffer.startswith("/"):
 248        buffer = cycle_slash(buffer, FIRST_RUN_SLASH_COMMANDS, direction=-1 if key == "up" else 1)
 249        return view, selected, None, None, False, buffer
 250    if key == "up":
 251        actions = deps.actions(view)
 252        if not actions:
 253            return view, selected, None, None, False, buffer
 254        return view, (selected - 1) % len(actions), None, None, False, buffer
 255    if key == "down":
 256        actions = deps.actions(view)
 257        if not actions:
 258            return view, selected, None, None, False, buffer
 259        return view, (selected + 1) % len(actions), None, None, False, buffer
 260    if key in {"left", "right"}:
 261        actions = deps.actions(view)
 262        if not actions:
 263            return view, selected, None, None, False, buffer
 264        delta = 1 if key == "right" else -1
 265        return view, (selected + delta) % len(actions), None, None, False, buffer
 266    if key == "click" and isinstance(payload, tuple):
 267        clicked = deps.click_action(payload[0], payload[1], view)
 268        if clicked is not None:
 269            if isinstance(clicked, str):
 270                action, payload = deps.handle_action(clicked)
 271                next_view, next_selected, editing_field, next_job_id, should_exit = _apply_first_run_action(
 272                    action,
 273                    payload,
 274                    view=view,
 275                    selected=selected,
 276                    notices=notices,
 277                )
 278                return next_view, next_selected, editing_field, next_job_id, should_exit, buffer
 279            actions = deps.actions(view)
 280            if not actions:
 281                return view, selected, None, None, False, buffer
 282            action, payload = deps.handle_action(actions[clamp_selection(clicked, actions)][0])
 283            next_view, next_selected, editing_field, next_job_id, should_exit = _apply_first_run_action(
 284                action,
 285                payload,
 286                view=view,
 287                selected=clicked,
 288                notices=notices,
 289            )
 290            return next_view, next_selected, editing_field, next_job_id, should_exit, buffer
 291    drain_pending_input(stdin_fd)
 292    return view, selected, None, None, False, buffer
 293
 294
 295def directional_first_run_action(actions: list[tuple[str, str, str]], *, direction: int) -> str | None:
 296    """Return the setup-screen action for left/right navigation."""
 297
 298    if direction >= 0:
 299        for key, label, _detail in actions:
 300            if key.startswith("view:") and label.lower() in {"begin setup", "continue"}:
 301                return key
 302        return None
 303    for key, label, _detail in reversed(actions):
 304        if key.startswith("view:") and label.lower() == "back":
 305            return key
 306    return None
 307
 308
 309def _apply_first_run_action(
 310    action: str,
 311    payload: str | list[str] | None,
 312    *,
 313    view: str,
 314    selected: int,
 315    notices: list[str],
 316) -> tuple[str, int, str | None, str | None, bool]:
 317    if action == "view":
 318        notices.clear()
 319        return str(payload or "start"), 0, None, None, False
 320    if action == "exit":
 321        return view, selected, None, None, True
 322    if action == "clear":
 323        notices.clear()
 324        return view, selected, None, None, False
 325    if action == "open":
 326        return view, selected, None, str(payload), False
 327    if action == "edit":
 328        editing_field = str(payload)
 329        _append_notice(notices, f"editing {editing_field}; enter saves, escape cancels", limit=10)
 330        return view, selected, editing_field, None, False
 331    if isinstance(payload, list):
 332        for item in payload:
 333            if str(item).strip():
 334                _append_notice(notices, str(item), limit=10)
 335    elif payload:
 336        _append_notice(notices, str(payload), limit=10)
 337    return view, selected, None, None, False
 338
 339
 340def _handle_edit_input(
 341    char: str,
 342    *,
 343    buffer: str,
 344    editing_field: str,
 345    notices: list[str],
 346    stdin_fd: int,
 347) -> tuple[str, str | None, bool]:
 348    if char in {"\r", "\n"}:
 349        saved, notice = _save_first_run_edit(editing_field, buffer)
 350        _append_notice(notices, notice, limit=10)
 351        return ("", None, False) if saved else (buffer, editing_field, False)
 352    if char in {"\x04"}:
 353        return buffer, editing_field, True
 354    if char == "\x03":
 355        _append_notice(notices, "cancelled edit", limit=10)
 356        return "", editing_field, False
 357    if char == "\x15":
 358        return "", editing_field, False
 359    if char in {"\x7f", "\b"}:
 360        return buffer[:-1], editing_field, False
 361    if char == "\x1b":
 362        key, _payload = decode_terminal_escape(read_escape_sequence(char, fd=stdin_fd))
 363        if key == "unknown":
 364            _append_notice(notices, "cancelled edit", limit=10)
 365            return "", editing_field, False
 366        return buffer, editing_field, False
 367    if char.isprintable():
 368        return buffer + char, editing_field, False
 369    return buffer, editing_field, False
 370
 371
 372def required_first_run_edit_field(view: str) -> str | None:
 373    return {
 374        "endpoint": "model.base_url",
 375        "api": "secret:model.api_key",
 376        "model": "model.name",
 377    }.get(view)
 378
 379
 380def next_first_run_view_after_edit(view: str) -> str | None:
 381    return {
 382        "endpoint": "api",
 383        "api": "model",
 384        "model": "access",
 385    }.get(view)
 386
 387
 388def _save_first_run_edit(field: str, raw_value: str) -> tuple[bool, str]:
 389    value = raw_value.strip()
 390    if field == "model.base_url":
 391        if not value:
 392            return False, "Endpoint URL is required."
 393        parsed = urlparse(value)
 394        if parsed.scheme not in {"http", "https"} or not parsed.netloc:
 395            return False, "Endpoint must be a full http:// or https:// URL."
 396        if not parsed.path.rstrip("/").endswith("/v1"):
 397            return False, "Endpoint must point at an OpenAI-compatible /v1 path."
 398        return True, inline_setting_notice(field, value)
 399    if field == "model.name":
 400        if not value:
 401            return False, "Model id is required."
 402        return True, inline_setting_notice(field, value)
 403    if field == "secret:model.api_key":
 404        if not value:
 405            return False, "API key is required, or type skip for a local endpoint."
 406        if value.lower() in {"skip", "none", "local"}:
 407            config = load_config()
 408            if not _is_local_endpoint(config.model.base_url):
 409                return False, "Only local endpoints can skip the API key."
 410            return True, "skipped API key for local endpoint"
 411        return True, inline_setting_notice(field, value)
 412    return True, inline_setting_notice(field, value)
 413
 414
 415def _is_local_endpoint(value: str) -> bool:
 416    host = (urlparse(value).hostname or "").lower()
 417    return host in {"localhost", "127.0.0.1", "::1", "0.0.0.0"} or host.endswith(".local")
 418
 419
 420def _append_notice(notices: list[str], message: str, *, limit: int = 10) -> None:
 421    notices.append(message)
 422    notices[:] = notices[-limit:]
 423
 424
 425def _one_line(value: object, width: int) -> str:
 426    text = " ".join(str(value).split())
 427    if len(text) <= width:
 428        return text
 429    return text[: max(0, width - 3)] + "..."
 430
 431
 432def _fit_plain(text: object, width: int) -> str:
 433    content = str(text)
 434    if len(content) > width:
 435        content = _one_line(content, width)
 436    return content + " " * max(0, width - len(content))
nipux_cli/first_run_tui.py 571 lines
   1"""First-run terminal UI rendering for Nipux."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.config import AppConfig
   8from nipux_cli.settings import (
   9    config_field_value,
  10    edit_target_hint,
  11    edit_target_label,
  12    edit_target_masks_input,
  13)
  14from nipux_cli.tui_layout import _compose_bar
  15from nipux_cli.tui_style import (
  16    _accent,
  17    _bold,
  18    _center_ansi,
  19    _fit_ansi,
  20    _muted,
  21    _one_line,
  22    _style,
  23    _strip_ansi,
  24    _themed_lines,
  25)
  26
  27
  28INSTALL_FLOW = [
  29    ("endpoint", "Endpoint", "OpenAI-compatible /v1"),
  30    ("api", "API key", "secret stored in .env"),
  31    ("model", "Model", "choose the model id"),
  32    ("access", "Tools", "browser, web, CLI, files"),
  33    ("doctor", "Doctor", "check setup"),
  34]
  35
  36
  37FIRST_RUN_ACTIONS_BY_VIEW: dict[str, list[tuple[str, str, str]]] = {
  38    "endpoint": [],
  39    "api": [],
  40    "model": [],
  41    "access": [
  42        ("toggle:tools.browser", "Browser", "automation"),
  43        ("toggle:tools.web", "Web", "search/extract"),
  44        ("toggle:tools.shell", "CLI", "terminal commands"),
  45        ("toggle:tools.files", "Files", "write files"),
  46        ("view:doctor", "Continue", "run checks"),
  47    ],
  48    "doctor": [
  49        ("doctor", "Run doctor", "verify setup"),
  50        ("open_workspace", "Open chat", "talk to Nipux"),
  51    ],
  52}
  53
  54
  55def build_first_run_frame(
  56    input_buffer: str,
  57    notices: list[str],
  58    *,
  59    width: int,
  60    height: int,
  61    selected: int = 0,
  62    view: str = "start",
  63    editing_field: str | None = None,
  64    config: AppConfig,
  65    daemon_text: str,
  66    jobs: list[dict[str, Any]],
  67    home: str,
  68    config_path: str,
  69) -> str:
  70    del daemon_text
  71    width = max(92, width)
  72    height = max(22, height)
  73    view = _normalize_first_run_view(view)
  74    selected = _clamp_first_run_selection(selected, view)
  75    header: list[str] = []
  76    if editing_field:
  77        hint = _first_run_edit_hint(editing_field, config)
  78        prompt_label = _first_run_prompt_label(editing_field)
  79    else:
  80        hint = _first_run_hint(view)
  81        prompt_label = "Setup"
  82    suggestions: list[str] = []
  83    compose_lines = _compose_bar(
  84        input_buffer,
  85        width=width,
  86        hint=hint,
  87        suggestions=suggestions,
  88        prompt_label=prompt_label,
  89        title="setup",
  90        mask_input=edit_target_masks_input(editing_field),
  91    )
  92    footer_rows = len(compose_lines)
  93    body_rows = max(10, height - len(header) - 1 - footer_rows)
  94    body_lines = _wizard_body_lines(
  95        notices=notices,
  96        jobs=jobs,
  97        config=config,
  98        home=home,
  99        config_path=config_path,
 100        selected=selected,
 101        view=view,
 102        width=width,
 103        rows=body_rows,
 104    )
 105    lines = [*header, *body_lines, *compose_lines]
 106    return "\n".join(first_run_themed_lines(lines[:height], width=width))
 107
 108
 109def first_run_columns(width: int) -> tuple[int, int]:
 110    right_width = min(max(40, int(width * 0.34)), 54)
 111    left_width = max(48, width - right_width - 3)
 112    if left_width < 48:
 113        left_width = 48
 114        right_width = max(36, width - left_width - 3)
 115    return left_width, right_width
 116
 117
 118def first_run_actions(view: str) -> list[tuple[str, str, str]]:
 119    return FIRST_RUN_ACTIONS_BY_VIEW[_normalize_first_run_view(view)]
 120
 121
 122def first_run_themed_lines(lines: list[str], *, width: int) -> list[str]:
 123    return _themed_lines(lines, width=width)
 124
 125
 126def _wizard_body_lines(
 127    *,
 128    notices: list[str],
 129    jobs: list[dict[str, Any]],
 130    config: AppConfig,
 131    home: str,
 132    config_path: str,
 133    selected: int,
 134    view: str,
 135    width: int,
 136    rows: int,
 137) -> list[str]:
 138    if view == "model":
 139        lines = _model_page_lines(config=config, selected=selected, width=width)
 140    elif view == "endpoint":
 141        lines = _endpoint_page_lines(config=config, selected=selected, width=width)
 142    elif view == "api":
 143        lines = _api_page_lines(config=config, selected=selected, width=width)
 144    elif view == "access":
 145        lines = _access_page_lines(config=config, selected=selected, width=width)
 146    elif view == "doctor":
 147        lines = _doctor_page_lines(config=config, selected=selected, width=width)
 148    else:
 149        lines = _endpoint_page_lines(config=config, selected=selected, width=width)
 150    if notices:
 151        lines = _append_notice_block(lines, notices, width=width, rows=rows)
 152    return _fit_page(lines, width=width, rows=rows)
 153
 154
 155def _model_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
 156    return [
 157        *_step_header("model", width=width),
 158        "",
 159        _center_ansi(_muted(_step_count_label("model")), width),
 160        _center_ansi(_bold("Enter the model id"), width),
 161        _center_ansi(_muted("This exact model powers chat replies and background workers."), width),
 162        "",
 163        *_panel(
 164            "MODEL ID",
 165            [
 166                _bold(_accent("waiting for input")),
 167                _muted(f"current config: {_one_line(config.model.model, max(16, width - 30))}"),
 168                _muted("Type the model id below. Blank input is not accepted."),
 169            ],
 170            width=min(84, width - 8),
 171            page_width=width,
 172        ),
 173        "",
 174        _center_ansi(_muted("Press Enter after typing the model. Setup moves forward automatically."), width),
 175    ]
 176
 177
 178def _endpoint_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
 179    return [
 180        *_step_header("endpoint", width=width),
 181        "",
 182        _center_ansi(_muted(_step_count_label("endpoint")), width),
 183        _center_ansi(_bold("Enter the endpoint first"), width),
 184        _center_ansi(_muted("Use an OpenAI-compatible /v1 endpoint. Local or hosted both work."), width),
 185        "",
 186        *_form_panel(
 187            "BASE URL",
 188            f"waiting for input · current config: {config.model.base_url}",
 189            "required",
 190            width=min(90, width - 8),
 191            page_width=width,
 192        ),
 193        "",
 194        _center_ansi(_muted("Example formats: http://localhost:8000/v1 or https://provider.example/v1"), width),
 195    ]
 196
 197
 198def _api_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
 199    key_state = "set" if config.model.api_key else "missing"
 200    key_color = _style(key_state, "32" if key_state == "set" else "33")
 201    return [
 202        *_step_header("api", width=width),
 203        "",
 204        _center_ansi(_muted(_step_count_label("api")), width),
 205        _center_ansi(_bold("Enter the API key"), width),
 206        _center_ansi(_muted("Hosted endpoints need a key. For a local endpoint, type skip."), width),
 207        "",
 208        *_panel(
 209            "API KEY",
 210            [
 211                f"{_muted('state')} {key_color}",
 212                f"{_muted('env')}   {_bold(config.model.api_key_env)}",
 213                _muted("Stored in the local Nipux env file, never in repository config."),
 214            ],
 215            width=min(84, width - 8),
 216            page_width=width,
 217        ),
 218        "",
 219        _center_ansi(_muted("Blank input is not accepted; type skip only when the endpoint is local."), width),
 220    ]
 221
 222
 223def _access_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
 224    rows = [
 225        _access_row("browser", config.tools.browser, "persistent browser automation"),
 226        _access_row("web", config.tools.web, "web search and page extraction"),
 227        _access_row("CLI", config.tools.shell, "bounded terminal commands"),
 228        _access_row("files", config.tools.files, "write deliverables into the workspace"),
 229    ]
 230    return [
 231        *_step_header("access", width=width),
 232        "",
 233        _center_ansi(_muted(_step_count_label("access")), width),
 234        _center_ansi(_bold("Choose tool access"), width),
 235        _center_ansi(_muted("These switches control the generic tools workers can call for any job."), width),
 236        "",
 237        *_panel("TOOL ACCESS", rows, width=min(90, width - 8), page_width=width),
 238        "",
 239        *_action_cards(first_run_actions("access"), selected=selected, config=config, width=width),
 240    ]
 241
 242
 243def _doctor_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
 244    checks = [
 245        ("state directory", "writable under ~/.nipux or NIPUX_HOME"),
 246        ("database", "SQLite state store can open"),
 247        ("model config", f"{config.model.model} at {config.model.base_url}"),
 248        (
 249            "tools",
 250            f"browser={config.tools.browser} web={config.tools.web} CLI={config.tools.shell} files={config.tools.files}",
 251        ),
 252    ]
 253    rows = [f"{_accent('✓')} {_fit_ansi(name, 18)} {_muted(detail)}" for name, detail in checks]
 254    return [
 255        *_step_header("doctor", width=width),
 256        "",
 257        _center_ansi(_muted(_step_count_label("doctor")), width),
 258        _center_ansi(_bold("Run checks"), width),
 259        _center_ansi(_muted("Doctor calls the configured model before the workspace opens."), width),
 260        "",
 261        *_panel("DOCTOR", rows, width=min(90, width - 8), page_width=width),
 262        "",
 263        _center_ansi(_muted("If a check fails, edit with /base-url, /api-key, or /model, then run Doctor again."), width),
 264        "",
 265        *_action_cards(first_run_actions("doctor"), selected=selected, config=config, width=width),
 266    ]
 267
 268
 269def _stepper_lines(view: str, *, config: AppConfig, width: int) -> list[str]:
 270    lines: list[str] = []
 271    for key, label, _detail in INSTALL_FLOW:
 272        marker = _accent("●") if key == view else _muted("○")
 273        state = _step_state(key, config=config)
 274        lines.append(_fit_ansi(f"{marker} {_fit_ansi(label, 10)} {_muted(state)}", width))
 275    return lines
 276
 277
 278def _step_header(view: str, *, width: int) -> list[str]:
 279    parts = []
 280    for index, (key, label, _detail) in enumerate(INSTALL_FLOW, start=1):
 281        marker = _accent("●") if key == view else _muted("○")
 282        text = _bold(label) if key == view else _muted(label)
 283        parts.append(f"{marker} {index} {text}")
 284    return [
 285        _center_ansi("   ".join(parts), width),
 286        _muted("─" * width),
 287    ]
 288
 289
 290def _action_cards(
 291    actions: list[tuple[str, str, str]],
 292    *,
 293    selected: int,
 294    config: AppConfig,
 295    width: int,
 296) -> list[str]:
 297    if not actions:
 298        return []
 299    gap = 2
 300    card_width = max(18, min(34, (width - (len(actions) - 1) * gap - 4) // len(actions)))
 301    cards = [_action_tile(index, action, selected=selected, config=config, width=card_width) for index, action in enumerate(actions)]
 302    rows = _join_many_cards(cards, gap=gap, width=width)
 303    return [_center_ansi(row.rstrip(), width) for row in rows]
 304
 305
 306def _action_tile(
 307    index: int,
 308    action: tuple[str, str, str],
 309    *,
 310    selected: int,
 311    config: AppConfig,
 312    width: int,
 313) -> list[str]:
 314    key, label, detail = action
 315    active = index == selected
 316    border = _accent if active else _muted
 317    marker = _accent("›") if active else _muted(" ")
 318    label_text = _bold(label) if active else label
 319    value = _action_value(key, detail, config=config)
 320    inner = max(8, width - 4)
 321    return [
 322        border("╭" + "─" * (width - 2) + "╮"),
 323        border("│ ") + _fit_ansi(f"{marker} {index + 1}. {label_text}", inner) + border(" │"),
 324        border("│ ") + _fit_ansi(_muted(_one_line(value, inner)), inner) + border(" │"),
 325        border("╰" + "─" * (width - 2) + "╯"),
 326    ]
 327
 328
 329def _panel(title: str, body: list[str], *, width: int, page_width: int | None = None) -> list[str]:
 330    width = max(32, width)
 331    inner = max(8, width - 4)
 332    title_text = f" {title} "
 333    lines = [_muted("╭─" + title_text + "─" * max(0, width - len(title_text) - 3) + "╮")]
 334    for item in body:
 335        lines.append(_muted("│ ") + _fit_ansi(item, inner) + _muted(" │"))
 336    lines.append(_muted("╰" + "─" * (width - 2) + "╯"))
 337    return [_center_ansi(line, page_width or width) for line in lines]
 338
 339
 340def _form_panel(title: str, value: str, command: str, *, width: int, page_width: int | None = None) -> list[str]:
 341    return _panel(
 342        title,
 343        [
 344            _bold(_accent(_one_line(value, max(16, width - 10)))),
 345            _muted(f"{command}; type the value in the setup input below"),
 346        ],
 347        width=width,
 348        page_width=page_width,
 349    )
 350
 351
 352def _choice_card(title: str, copy: str, value: str, *, active: bool, width: int) -> list[str]:
 353    border = _accent if active else _muted
 354    marker = _accent("● selected") if active else _muted("○ available")
 355    inner = max(8, width - 4)
 356    return [
 357        border("╭" + "─" * (width - 2) + "╮"),
 358        border("│ ") + _fit_ansi(_bold(title), inner) + border(" │"),
 359        border("│ ") + _fit_ansi(marker, inner) + border(" │"),
 360        border("│ ") + _fit_ansi(_muted(copy), inner) + border(" │"),
 361        border("│ ") + _fit_ansi(_accent(value), inner) + border(" │"),
 362        border("╰" + "─" * (width - 2) + "╯"),
 363    ]
 364
 365
 366def _join_cards(left: list[str], right: list[str], *, width: int) -> list[str]:
 367    gap = "  "
 368    rows = []
 369    for index in range(max(len(left), len(right))):
 370        left_line = left[index] if index < len(left) else " " * len(_strip_ansi(left[0]))
 371        right_line = right[index] if index < len(right) else " " * len(_strip_ansi(right[0]))
 372        rows.append(_center_ansi(left_line + gap + right_line, width))
 373    return rows
 374
 375
 376def _join_many_cards(cards: list[list[str]], *, gap: int, width: int) -> list[str]:
 377    rows: list[str] = []
 378    max_rows = max(len(card) for card in cards)
 379    gap_text = " " * gap
 380    for row_index in range(max_rows):
 381        row_parts = []
 382        for card in cards:
 383            fallback_width = len(_strip_ansi(card[0]))
 384            row_parts.append(card[row_index] if row_index < len(card) else " " * fallback_width)
 385        rows.append(gap_text.join(row_parts))
 386    return [_fit_ansi(row, width) for row in rows]
 387
 388
 389def _append_notice_block(lines: list[str], notices: list[str], *, width: int, rows: int) -> list[str]:
 390    budget = max(3, min(6, rows // 4))
 391    notice_lines = [_bold("Transcript")]
 392    for notice in notices[-budget:]:
 393        notice_lines.append(_fit_ansi(_accent("› ") + _one_line(notice, width - 4), width))
 394    if len(lines) + len(notice_lines) + 1 <= rows:
 395        return [*lines, "", *notice_lines]
 396    keep = max(0, rows - len(notice_lines) - 1)
 397    return [*lines[:keep], "", *notice_lines]
 398
 399
 400def _fit_page(lines: list[str], *, width: int, rows: int) -> list[str]:
 401    fitted = [_fit_ansi(line, width) for line in lines]
 402    if len(fitted) < rows:
 403        fitted.extend([" " * width for _ in range(rows - len(fitted))])
 404    return fitted[:rows]
 405
 406
 407def _action_line(
 408    index: int,
 409    action: tuple[str, str, str],
 410    *,
 411    selected: int,
 412    config: AppConfig,
 413    width: int,
 414) -> str:
 415    key, label, detail = action
 416    marker = _accent("›") if index == selected else _muted(" ")
 417    label_text = _bold(label) if index == selected else label
 418    value = _action_value(key, detail, config=config)
 419    return _fit_ansi(
 420        f"{marker} {index + 1}. {_fit_ansi(label_text, 15)} {_muted(_one_line(value, max(8, width - 21)))}",
 421        width,
 422    )
 423
 424
 425def _screen_value_lines(view: str, *, config: AppConfig, width: int) -> list[str]:
 426    if view == "model":
 427        return [_large_value("model", config.model.model, width=width)]
 428    if view == "endpoint":
 429        return [_large_value("endpoint", config.model.base_url, width=width)]
 430    if view == "api":
 431        key_state = "set" if config.model.api_key else "missing"
 432        return [
 433            _large_value("key", key_state, width=width),
 434            _muted(f"Stored under {config.model.api_key_env} in ~/.nipux/.env."),
 435        ]
 436    if view == "doctor":
 437        return [
 438            _large_value("check", "ready to run", width=width),
 439            _muted("Doctor verifies runtime checks, then sends a small chat request to the configured model."),
 440        ]
 441    return []
 442
 443
 444def _large_value(label: str, value: str, *, width: int) -> str:
 445    label_text = _muted(f"{label} ")
 446    return _fit_ansi(label_text + _bold(_accent(_one_line(value, max(12, width - len(label) - 2)))), width)
 447
 448
 449def _action_value(key: str, detail: str, *, config: AppConfig) -> str:
 450    if key.startswith("view:"):
 451        return detail
 452    if key.startswith("edit:"):
 453        field = key.split(":", 1)[1]
 454        return str(config_field_value(field, config))
 455    if key.startswith("toggle:"):
 456        field = key.split(":", 1)[1]
 457        return "enabled" if bool(config_field_value(field, config)) else "disabled"
 458    if key == "secret:model.api_key":
 459        return "stored in .env" if config.model.api_key else f"uses {config.model.api_key_env}"
 460    if key == "preset:local":
 461        return "http://localhost:8000/v1"
 462    return detail
 463
 464
 465def _step_state(key: str, *, config: AppConfig) -> str:
 466    if key == "model":
 467        return _one_line(config.model.model, 20)
 468    if key == "endpoint":
 469        return _one_line(config.model.base_url, 20)
 470    if key == "api":
 471        return "ready" if config.model.api_key or _is_local_endpoint(config.model.base_url) else "missing"
 472    if key == "access":
 473        enabled = sum(bool(value) for value in (config.tools.browser, config.tools.web, config.tools.shell, config.tools.files))
 474        return f"{enabled}/4 enabled"
 475    if key == "doctor":
 476        return "pending"
 477    return ""
 478
 479
 480def _first_run_hint(view: str) -> str:
 481    if view == "endpoint":
 482        return "Required: type an OpenAI-compatible endpoint URL, then Enter."
 483    if view == "api":
 484        return "Required: type an API key, or type skip for a local endpoint."
 485    if view == "model":
 486        return "Required: type the model id accepted by this endpoint."
 487    if view == "access":
 488        return "Use arrows/clicks to toggle tools, then choose Continue."
 489    if view == "doctor":
 490        return "Run Doctor, then open the chat workspace."
 491    return "Complete setup before the workspace opens."
 492
 493
 494def _first_run_edit_hint(field: str, config: AppConfig) -> str:
 495    if field == "model.base_url":
 496        return "Endpoint URL required. Enter saves and advances. Blank input is blocked."
 497    if field == "secret:model.api_key":
 498        return "API key required for hosted endpoints. For local endpoints, type skip."
 499    if field == "model.name":
 500        return "Model id required. Enter saves and advances. Blank input is blocked."
 501    return edit_target_hint(field, config)
 502
 503
 504def _first_run_prompt_label(field: str) -> str:
 505    if field == "model.base_url":
 506        return "Endpoint"
 507    if field == "secret:model.api_key":
 508        return "API key"
 509    if field == "model.name":
 510        return "Model"
 511    return edit_target_label(field)
 512
 513
 514def _left_title(view: str) -> str:
 515    return _screen_heading(view)
 516
 517
 518def _screen_heading(view: str) -> str:
 519    return {
 520        "model": "Choose model",
 521        "endpoint": "Connect endpoint",
 522        "api": "Add API key",
 523        "access": "Choose tools",
 524        "doctor": "Run checks",
 525    }.get(view, "Connect endpoint")
 526
 527
 528def _screen_copy(view: str) -> str:
 529    return {
 530        "model": "The chat controller and workers use this model unless you change it later.",
 531        "endpoint": "Use any OpenAI-compatible /v1 endpoint. This stays generic and provider-neutral.",
 532        "api": "Hosted providers need a secret. Local endpoints can continue without one.",
 533        "access": "Enable the generic tools this worker can use for any job.",
 534        "doctor": "Verify the configured model, then open the main chat workspace.",
 535    }.get(view, "Nipux installs through this full-screen setup.")
 536
 537
 538def _install_summary(config: AppConfig, *, width: int) -> str:
 539    connector = "local connector" if _is_local_endpoint(config.model.base_url) else "hosted connector"
 540    text = f"{connector} · {config.model.model} · {config.model.base_url}"
 541    return _muted(_one_line(text, width))
 542
 543
 544def _normalize_first_run_view(view: str) -> str:
 545    return view if view in FIRST_RUN_ACTIONS_BY_VIEW else "endpoint"
 546
 547
 548def _step_count_label(view: str) -> str:
 549    keys = [key for key, _label, _detail in INSTALL_FLOW]
 550    try:
 551        index = keys.index(view) + 1
 552    except ValueError:
 553        index = 1
 554    return f"STEP {index} / {len(INSTALL_FLOW)}"
 555
 556
 557def _access_row(name: str, enabled: bool, detail: str) -> str:
 558    marker = _accent("on ") if enabled else _muted("off")
 559    return f"{_fit_ansi(name, 10)} {marker} {_muted(detail)}"
 560
 561
 562def _is_local_endpoint(value: str) -> bool:
 563    lowered = value.lower()
 564    return "localhost" in lowered or "127.0.0.1" in lowered or lowered.startswith("http://0.0.0.0")
 565
 566
 567def _clamp_first_run_selection(selected: int, view: str) -> int:
 568    actions = first_run_actions(view)
 569    if not actions:
 570        return 0
 571    return max(0, min(selected, len(actions) - 1))
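Note on `_is_local_endpoint`: this file's version (lines 562-564) matches substrings of the whole URL, while the version in the preceding file's listing (lines 415-417) parses the hostname with `urlparse`. The sketch below places both bodies side by side under the hypothetical names `is_local_by_substring` and `is_local_by_hostname` and runs them on a few made-up URLs, to show where the two checks can disagree. Whether that difference matters in practice depends on how each caller uses the result, which this sketch does not cover.

```python
from urllib.parse import urlparse


def is_local_by_substring(value: str) -> bool:
    # Body of the first_run_tui.py version (hypothetical name for illustration).
    lowered = value.lower()
    return "localhost" in lowered or "127.0.0.1" in lowered or lowered.startswith("http://0.0.0.0")


def is_local_by_hostname(value: str) -> bool:
    # Body of the urlparse-based version from the preceding listing (hypothetical name).
    host = (urlparse(value).hostname or "").lower()
    return host in {"localhost", "127.0.0.1", "::1", "0.0.0.0"} or host.endswith(".local")


# Made-up sample URLs; none of them come from the project.
samples = [
    "http://localhost:8000/v1",
    "https://localhost.example.com/v1",
    "https://api.example.com/v1?ref=127.0.0.1",
    "http://[::1]:8000/v1",
    "http://mybox.local:8000/v1",
]

for url in samples:
    print(f"{url:45} substring={is_local_by_substring(url)!s:5} hostname={is_local_by_hostname(url)}")
```

In this sketch the two checks agree only on the first URL. The substring check accepts the second and third (a hosted name that merely contains `localhost`, and a query string that contains `127.0.0.1`), while the hostname check accepts the IPv6 loopback and `.local` hosts that the substring check rejects.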
nipux_cli/frame_snapshot.py 183 lines
   1"""Data loading contract for the interactive Nipux terminal frame."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.config import AppConfig
   8from nipux_cli.daemon import daemon_lock_status
   9from nipux_cli.db import AgentDB
  10from nipux_cli.tui_outcomes import SUMMARY_EVENT_TYPES, SUMMARY_TOOL_EVENT_TYPES, is_summary_event_candidate
  11
  12
  13WORKSPACE_CHAT_ID = "__workspace__"
  14
  15
  16def load_frame_snapshot(
  17    db: AgentDB,
  18    config: AppConfig,
  19    job_id: str,
  20    *,
  21    default_job_id: str | None = None,
  22    history_limit: int = 12,
  23    workspace_events: list[dict[str, Any]] | None = None,
  24) -> dict[str, Any]:
  25    """Return the compact state bundle rendered by the chat TUI."""
  26
  27    resolved_job_id = job_id or default_job_id
  28    if resolved_job_id == WORKSPACE_CHAT_ID:
  29        return load_workspace_frame_snapshot(
  30            db,
  31            config,
  32            focused_job_id=default_job_id,
  33            history_limit=history_limit,
  34            workspace_events=workspace_events or [],
  35        )
  36    job = db.get_job(resolved_job_id)
  37    jobs = db.list_jobs()
  38    token_usage = db.job_token_usage(resolved_job_id)
  39    token_usage["input_cost_per_million"] = config.model.input_cost_per_million
  40    token_usage["output_cost_per_million"] = config.model.output_cost_per_million
  41    summary_events = _summary_events(db, resolved_job_id, history_limit=history_limit)
  42    return {
  43        "job_id": resolved_job_id,
  44        "job": job,
  45        "jobs": jobs,
  46        "steps": db.list_steps(job_id=resolved_job_id, limit=80),
  47        "artifacts": db.list_artifacts(resolved_job_id, limit=8),
  48        "job_artifacts": {
  49            str(item["id"]): db.list_artifacts(str(item["id"]), limit=3)
  50            for item in jobs[:6]
  51            if item.get("id")
  52        },
  53        "job_summary_events": {
  54            str(item["id"]): _summary_events(db, str(item["id"]), history_limit=3)
  55            for item in jobs[:6]
  56            if item.get("id")
  57        },
  58        "job_counts": {
  59            str(item["id"]): db.job_record_counts(str(item["id"]))
  60            for item in jobs[:6]
  61            if item.get("id")
  62        },
  63        "memory_entries": db.list_memory(resolved_job_id)[:8],
  64        "events": db.list_events(job_id=resolved_job_id, limit=max(history_limit * 16, 240)),
  65        "summary_events": summary_events,
  66        "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
  67        "model": config.model.model,
  68        "base_url": config.model.base_url,
  69        "context_length": config.model.context_length,
  70        "token_usage": token_usage,
  71        "counts": db.job_record_counts(resolved_job_id),
  72    }
  73
  74
  75def load_workspace_frame_snapshot(
  76    db: AgentDB,
  77    config: AppConfig,
  78    *,
  79    focused_job_id: str | None = None,
  80    history_limit: int = 12,
  81    workspace_events: list[dict[str, Any]] | None = None,
  82) -> dict[str, Any]:
  83    """Return the chat/control frame before any worker job is focused."""
  84
  85    jobs = db.list_jobs()
  86    events = list(workspace_events or [])[-max(history_limit * 8, 80) :]
  87    focused_job = _safe_job(db, focused_job_id)
  88    focused_id = str(focused_job.get("id") or "") if focused_job else ""
  89    token_usage = _workspace_token_usage(config)
  90    if focused_id:
  91        token_usage = db.job_token_usage(focused_id)
  92        token_usage["input_cost_per_million"] = config.model.input_cost_per_million
  93        token_usage["output_cost_per_million"] = config.model.output_cost_per_million
  94    workspace_job = {
  95        "id": WORKSPACE_CHAT_ID,
  96        "title": "Nipux",
  97        "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
  98        "kind": "workspace",
  99        "status": "ready",
 100        "metadata": {},
 101    }
 102    right_job = focused_job or workspace_job
 103    return {
 104        "job_id": WORKSPACE_CHAT_ID,
 105        "job": workspace_job,
 106        "right_job": right_job,
 107        "right_job_id": focused_id or WORKSPACE_CHAT_ID,
 108        "jobs": jobs,
 109        "steps": db.list_steps(job_id=focused_id, limit=80) if focused_id else [],
 110        "artifacts": db.list_artifacts(focused_id, limit=8) if focused_id else [],
 111        "job_artifacts": {
 112            str(item["id"]): db.list_artifacts(str(item["id"]), limit=3)
 113            for item in jobs[:6]
 114            if item.get("id")
 115        },
 116        "job_summary_events": {
 117            str(item["id"]): _summary_events(db, str(item["id"]), history_limit=3)
 118            for item in jobs[:6]
 119            if item.get("id")
 120        },
 121        "job_counts": {
 122            str(item["id"]): db.job_record_counts(str(item["id"]))
 123            for item in jobs[:6]
 124            if item.get("id")
 125        },
 126        "memory_entries": db.list_memory(focused_id)[:8] if focused_id else [],
 127        "events": events,
 128        "right_events": db.list_events(job_id=focused_id, limit=max(history_limit * 16, 240)) if focused_id else [],
 129        "summary_events": _summary_events(db, focused_id, history_limit=history_limit) if focused_id else [],
 130        "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
 131        "model": config.model.model,
 132        "base_url": config.model.base_url,
 133        "context_length": config.model.context_length,
 134        "token_usage": token_usage,
 135        "counts": db.job_record_counts(focused_id) if focused_id else {"steps": 0, "artifacts": 0, "memory": 0, "events": len(events)},
 136    }
 137
 138
 139def _safe_job(db: AgentDB, job_id: str | None) -> dict[str, Any] | None:
 140    if not job_id:
 141        return None
 142    try:
 143        return db.get_job(job_id)
 144    except Exception:
 145        return None
 146
 147
 148def _workspace_token_usage(config: AppConfig) -> dict[str, Any]:
 149    return {
 150        "prompt_tokens": 0,
 151        "completion_tokens": 0,
 152        "total_tokens": 0,
 153        "cost": 0.0,
 154        "calls": 0,
 155        "has_cost": False,
 156        "input_cost_per_million": config.model.input_cost_per_million,
 157        "output_cost_per_million": config.model.output_cost_per_million,
 158    }
 159
 160
 161def _summary_events(db: AgentDB, job_id: str, *, history_limit: int) -> list[dict[str, Any]]:
 162    durable_events = db.list_events(
 163        job_id=job_id,
 164        limit=max(history_limit * 24, 360),
 165        event_types=SUMMARY_EVENT_TYPES,
 166    )
 167    tool_events = [
 168        event
 169        for event in db.list_events(
 170            job_id=job_id,
 171            limit=max(history_limit * 10, 160),
 172            event_types=SUMMARY_TOOL_EVENT_TYPES,
 173        )
 174        if is_summary_event_candidate(event)
 175    ]
 176    merged: dict[str, dict[str, Any]] = {}
 177    for event in [*durable_events, *tool_events]:
 178        event_id = str(event.get("id") or "")
 179        if event_id:
 180            merged[event_id] = event
 181        else:
 182            merged[f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"] = event
 183    return sorted(merged.values(), key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
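Note on `_summary_events`: the merge step above deduplicates by event id, lets the later (tool) list overwrite an earlier entry with the same id, gives id-less events a synthetic key, and then sorts by `created_at` and id. The sketch below reproduces only that merge and sort on plain dictionaries, without `AgentDB`. The function name `merge_summary_events`, the event type strings, and the ISO-style timestamps are assumptions made for illustration; the sort is a string comparison, so it is chronological only if stored timestamps sort lexicographically.

```python
from typing import Any


def merge_summary_events(
    durable_events: list[dict[str, Any]],
    tool_events: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical stand-alone copy of the merge/sort tail of _summary_events."""
    merged: dict[str, dict[str, Any]] = {}
    for event in [*durable_events, *tool_events]:
        event_id = str(event.get("id") or "")
        if event_id:
            merged[event_id] = event  # a later duplicate replaces the earlier one
        else:
            key = f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"
            merged[key] = event  # synthetic key keeps id-less events distinct
    return sorted(
        merged.values(),
        key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")),
    )


# Illustrative data only; event types and timestamps are invented.
durable = [
    {"id": "2", "created_at": "2024-01-02T00:00:00", "event_type": "finding", "title": "b"},
    {"id": "1", "created_at": "2024-01-01T00:00:00", "event_type": "finding", "title": "a"},
]
tool = [
    {"id": "2", "created_at": "2024-01-02T00:00:00", "event_type": "tool_result", "title": "b (tool)"},
    {"created_at": "2024-01-01T12:00:00", "event_type": "tool_result", "title": "c"},
]

result = merge_summary_events(durable, tool)
print([event["title"] for event in result])  # ['a', 'c', 'b (tool)']
```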
nipux_cli/llm.py 288 lines
   1"""LLM provider adapters for one bounded worker step."""
   2
   3from __future__ import annotations
   4
   5import json
   6import urllib.error
   7import urllib.parse
   8import urllib.request
   9from dataclasses import dataclass, field
  10from typing import Any, Protocol
  11
  12from openai import OpenAI
  13
  14from nipux_cli.config import ModelConfig
  15
  16
  17@dataclass(frozen=True)
  18class ToolCall:
  19    name: str
  20    arguments: dict[str, Any] = field(default_factory=dict)
  21    id: str = ""
  22
  23
  24@dataclass(frozen=True)
  25class LLMResponse:
  26    content: str = ""
  27    tool_calls: list[ToolCall] = field(default_factory=list)
  28    usage: dict[str, Any] = field(default_factory=dict)
  29    model: str = ""
  30    response_id: str = ""
  31
  32
  33class LLMResponseError(RuntimeError):
  34    """Raised when a provider returns an OpenAI-shaped response without choices."""
  35
  36    def __init__(self, message: str, *, payload: dict[str, Any] | None = None):
  37        super().__init__(message)
  38        self.payload = payload or {}
  39
  40
  41class StepLLM(Protocol):
  42    def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
  43        ...
  44
  45
  46class OpenAIChatLLM:
  47    """OpenAI-compatible chat-completions adapter."""
  48
  49    tool_repair = True
  50
  51    def __init__(self, config: ModelConfig):
  52        self.config = config
  53        headers = {}
  54        if "openrouter.ai" in config.base_url:
  55            headers = {
  56                "HTTP-Referer": "https://github.com/nipuxx/agent-cli",
  57                "X-Title": "Nipux CLI",
  58            }
  59        self._openai = OpenAI(
  60            api_key=config.api_key or "local-no-key",
  61            base_url=config.base_url,
  62            timeout=config.request_timeout_seconds,
  63            max_retries=0,
  64            default_headers=headers or None,
  65        )
  66
  67    def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
  68        request: dict[str, Any] = {
  69            "model": self.config.model,
  70            "messages": messages,
  71            "tools": tools,
  72        }
  73        if tools:
  74            request["tool_choice"] = "required"
  75        try:
  76            response = self._openai.chat.completions.create(**request)
  77        except Exception as exc:
  78            if "tool_choice" not in request or not _is_unsupported_tool_choice_error(exc):
  79                raise
  80            fallback_request = dict(request)
  81            fallback_request.pop("tool_choice", None)
  82            response = self._openai.chat.completions.create(**fallback_request)
  83        choices = response.choices or []
  84        if not choices:
  85            payload = _response_payload(response)
  86            error = payload.get("error") if isinstance(payload.get("error"), dict) else {}
  87            detail = error.get("message") or payload.get("message") or "provider returned no choices"
  88            raise LLMResponseError(str(detail), payload=payload)
  89        message = choices[0].message
  90        calls: list[ToolCall] = []
  91        for call in message.tool_calls or []:
  92            raw_args = call.function.arguments or "{}"
  93            try:
  94                parsed = json.loads(raw_args)
  95            except json.JSONDecodeError:
  96                parsed = {}
  97            calls.append(ToolCall(name=call.function.name, arguments=parsed, id=call.id or ""))
  98        content = message.content or ""
  99        response_id = _response_id(response)
 100        usage = _response_usage(response, messages=messages, content=content, tool_calls=calls)
 101        usage = _enrich_openrouter_generation_usage(
 102            usage,
 103            response_id=response_id,
 104            base_url=self.config.base_url,
 105            api_key=self.config.api_key,
 106        )
 107        return LLMResponse(
 108            content=content,
 109            tool_calls=calls,
 110            usage=usage,
 111            model=_response_model(response),
 112            response_id=response_id,
 113        )
 114
 115    def complete(self, *, messages: list[dict[str, Any]]) -> str:
 116        return self.complete_response(messages=messages).content
 117
 118    def complete_response(self, *, messages: list[dict[str, Any]]) -> LLMResponse:
 119        response = self._openai.chat.completions.create(
 120            model=self.config.model,
 121            messages=messages,
 122        )
 123        choices = response.choices or []
 124        if not choices:
 125            payload = _response_payload(response)
 126            error = payload.get("error") if isinstance(payload.get("error"), dict) else {}
 127            detail = error.get("message") or payload.get("message") or "provider returned no choices"
 128            raise LLMResponseError(str(detail), payload=payload)
 129        content = choices[0].message.content or ""
 130        response_id = _response_id(response)
 131        usage = _response_usage(response, messages=messages, content=content, tool_calls=[])
 132        usage = _enrich_openrouter_generation_usage(
 133            usage,
 134            response_id=response_id,
 135            base_url=self.config.base_url,
 136            api_key=self.config.api_key,
 137        )
 138        return LLMResponse(
 139            content=content,
 140            usage=usage,
 141            model=_response_model(response),
 142            response_id=response_id,
 143        )
 144
 145
 146class ScriptedLLM:
 147    """Tiny deterministic LLM used by tests and CLI dry runs."""
 148
 149    def __init__(self, responses: list[LLMResponse]):
 150        self.responses = list(responses)
 151
 152    def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
 153        del messages, tools
 154        if not self.responses:
 155            return LLMResponse(content="No scripted response left.")
 156        return self.responses.pop(0)
 157
 158
 159def _response_payload(response: Any) -> dict[str, Any]:
 160    if hasattr(response, "model_dump"):
 161        dumped = response.model_dump()
 162        return dumped if isinstance(dumped, dict) else {"response": dumped}
 163    if hasattr(response, "to_dict"):
 164        dumped = response.to_dict()
 165        return dumped if isinstance(dumped, dict) else {"response": dumped}
 166    return {"response": repr(response)}
 167
 168
 169def _response_usage(
 170    response: Any,
 171    *,
 172    messages: list[dict[str, Any]],
 173    content: str,
 174    tool_calls: list[ToolCall],
 175) -> dict[str, Any]:
 176    payload = _response_payload(response)
 177    usage = payload.get("usage")
 178    if isinstance(usage, dict):
 179        normalized = dict(usage)
 180        normalized["estimated"] = False
 181        return normalized
 182    usage_obj = getattr(response, "usage", None)
 183    if usage_obj is not None:
 184        dumped = usage_obj.model_dump() if hasattr(usage_obj, "model_dump") else getattr(usage_obj, "__dict__", {})
 185        if isinstance(dumped, dict) and dumped:
 186            normalized = dict(dumped)
 187            normalized["estimated"] = False
 188            return normalized
 189    prompt_tokens = _estimate_token_count(json.dumps(messages, ensure_ascii=False, default=str))
 190    tool_text = json.dumps([{"name": call.name, "arguments": call.arguments} for call in tool_calls], ensure_ascii=False, default=str)
 191    completion_tokens = _estimate_token_count(content + tool_text)
 192    return {
 193        "prompt_tokens": prompt_tokens,
 194        "completion_tokens": completion_tokens,
 195        "total_tokens": prompt_tokens + completion_tokens,
 196        "estimated": True,
 197    }
 198
 199
 200def _enrich_openrouter_generation_usage(
 201    usage: dict[str, Any],
 202    *,
 203    response_id: str,
 204    base_url: str,
 205    api_key: str,
 206) -> dict[str, Any]:
 207    if usage.get("cost") is not None or not response_id or not api_key:
 208        return usage
 209    if "openrouter.ai" not in base_url:
 210        return usage
 211    parsed = urllib.parse.urlparse(base_url)
 212    root = f"{parsed.scheme or 'https'}://{parsed.netloc or 'openrouter.ai'}"
 213    url = f"{root}/api/v1/generation?id={urllib.parse.quote(response_id)}"
 214    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
 215    try:
 216        with urllib.request.urlopen(request, timeout=5) as response:
 217            payload = json.loads(response.read().decode("utf-8", errors="replace"))
 218    except (OSError, urllib.error.URLError, urllib.error.HTTPError, json.JSONDecodeError):
 219        return usage
 220    data = payload.get("data") if isinstance(payload, dict) else None
 221    if not isinstance(data, dict):
 222        return usage
 223    enriched = dict(usage)
 224    cost = _safe_float(data.get("total_cost") or data.get("cost"))
 225    if cost is not None:
 226        enriched["cost"] = cost
 227    prompt = _safe_int(data.get("native_tokens_prompt") or data.get("tokens_prompt"))
 228    completion = _safe_int(data.get("native_tokens_completion") or data.get("tokens_completion"))
 229    total = _safe_int(data.get("native_tokens_total") or data.get("tokens_total"))
 230    if prompt is not None:
 231        enriched["prompt_tokens"] = prompt
 232    if completion is not None:
 233        enriched["completion_tokens"] = completion
 234    if total is not None:
 235        enriched["total_tokens"] = total
 236    elif prompt is not None or completion is not None:
 237        enriched["total_tokens"] = int(enriched.get("prompt_tokens") or 0) + int(enriched.get("completion_tokens") or 0)
 238    enriched["estimated"] = bool(enriched.get("estimated")) and cost is None
 239    return enriched
 240
 241
 242def _safe_float(value: Any) -> float | None:
 243    try:
 244        if value in (None, ""):
 245            return None
 246        return float(value)
 247    except (TypeError, ValueError):
 248        return None
 249
 250
 251def _safe_int(value: Any) -> int | None:
 252    try:
 253        if value in (None, ""):
 254            return None
 255        return int(float(value))
 256    except (TypeError, ValueError):
 257        return None
 258
 259
 260def _is_unsupported_tool_choice_error(exc: Exception) -> bool:
 261    text = f"{type(exc).__name__} {exc}".lower()
 262    return "tool_choice" in text and any(
 263        marker in text
 264        for marker in (
 265            "unsupported",
 266            "unknown parameter",
 267            "unrecognized",
 268            "not supported",
 269            "invalid_request",
 270            "extra inputs are not permitted",
 271        )
 272    )
 273
 274
 275def _estimate_token_count(text: str) -> int:
 276    if not text:
 277        return 0
 278    return max(1, (len(text) + 3) // 4)
 279
 280
 281def _response_model(response: Any) -> str:
 282    payload = _response_payload(response)
 283    return str(payload.get("model") or getattr(response, "model", "") or "")
 284
 285
 286def _response_id(response: Any) -> str:
 287    payload = _response_payload(response)
 288    return str(payload.get("id") or getattr(response, "id", "") or "")
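The retry in `OpenAIChatLLM.next_action` can be read as a small pattern: send `tool_choice="required"`, and if the provider rejects that parameter, resend the same request without it. The sketch below isolates that pattern under stated assumptions. `create_with_fallback` and `fake_create` are hypothetical names, and `fake_create` stands in for a provider client so the example runs without network access.

```python
from typing import Any, Callable

# Same marker strings as _is_unsupported_tool_choice_error in the listing.
UNSUPPORTED_MARKERS = (
    "unsupported",
    "unknown parameter",
    "unrecognized",
    "not supported",
    "invalid_request",
    "extra inputs are not permitted",
)


def is_unsupported_tool_choice_error(exc: Exception) -> bool:
    text = f"{type(exc).__name__} {exc}".lower()
    return "tool_choice" in text and any(marker in text for marker in UNSUPPORTED_MARKERS)


def create_with_fallback(create: Callable[..., Any], request: dict[str, Any]) -> Any:
    """Call create(**request); retry once without tool_choice if it is rejected."""
    try:
        return create(**request)
    except Exception as exc:
        if "tool_choice" not in request or not is_unsupported_tool_choice_error(exc):
            raise
        fallback_request = dict(request)
        fallback_request.pop("tool_choice", None)
        return create(**fallback_request)


def fake_create(**kwargs: Any) -> dict[str, Any]:
    # Hypothetical provider that does not accept tool_choice.
    if "tool_choice" in kwargs:
        raise ValueError("Unknown parameter: tool_choice")
    return {"sent": sorted(kwargs)}


result = create_with_fallback(
    fake_create,
    {"model": "m", "messages": [], "tools": [{"type": "function"}], "tool_choice": "required"},
)
```

Errors that do not mention `tool_choice` are re-raised unchanged, so genuine failures such as timeouts are not masked by the retry.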
nipux_cli/measurement.py 141 lines
   1"""Measurement parsing helpers for generic progress accounting."""
   2
   3from __future__ import annotations
   4
   5import re
   6from typing import Any
   7
   8
   9MEASUREMENT_PATTERN = re.compile(
  10    r"(?i)(?:"
  11    r"\b\d+(?:\.\d+)?\s*(?:%|(?:ms|s|sec|secs|seconds|msec|us|hz|khz|mhz|ghz|kb/s|mb/s|gb/s|tb/s|"
  12    r"it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|tokens/s|tok/s|t/s)\b)"
  13    r"|(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time|memory|cpu|gpu|ram)\D{0,40}\d+(?:\.\d+)?"
  14    r")"
  15)
  16MEASUREMENT_INTENT_PATTERN = re.compile(
  17    r"(?i)\b(bench(?:mark)?|compare|duration|eval(?:uate)?|experiment|hyperfine|latency|measure|metric|perf|"
  18    r"profile|rate|runtime|speed|test|throughput|time|trial)\b"
  19)
  20DIAGNOSTIC_MEASUREMENT_PATTERN = re.compile(r"(?i)^\s*(?:cpu|gpu|memory|mem|ram)\b")
  21ACTION_MEASUREMENT_PATTERN = re.compile(
  22    r"(?i)^\s*(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time)\b"
  23)
  24LABELED_MEASUREMENT_PATTERN = re.compile(
  25    r"(?i)^\s*(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time)\s*(?:=|:)\s*[-+]?\d"
  26)
  27EXPLICIT_RESULT_UNIT_PATTERN = re.compile(
  28    r"(?i)\b\d+(?:\.\d+)?\s*(?:%|(?:ms|msec|sec|secs|seconds|it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|"
  29    r"tokens/s|tok/s|t/s|kb/s|mb/s|gb/s|tb/s)\b)"
  30)
  31TABLE_UNIT_PATTERN = re.compile(
  32    r"(?i)^(?:%|ms|msec|sec|secs|seconds|it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|"
  33    r"tokens/s|tok/s|t/s|kb/s|mb/s|gb/s|tb/s)$"
  34)
  35TABLE_NUMBER_PATTERN = re.compile(r"[-+]?\d+(?:\.\d+)?(?:\s*(?:±|\+/-)\s*[-+]?\d+(?:\.\d+)?)?")
  36
  37
  38def measurement_candidates(output: dict[str, Any], *, command: str = "", limit: int = 8) -> list[str]:
  39    text = "\n".join(
  40        str(output.get(key) or "")
  41        for key in ("stdout", "stderr", "result", "content")
  42        if output.get(key) is not None
  43    )
  44    if not text.strip():
  45        return []
  46    command_has_measurement_intent = bool(MEASUREMENT_INTENT_PATTERN.search(command))
  47    candidates: list[str] = []
  48    for candidate in _table_measurement_candidates(text, limit=limit):
  49        if candidate not in candidates:
  50            candidates.append(candidate)
  51        if len(candidates) >= limit:
  52            return candidates
  53    for match in MEASUREMENT_PATTERN.finditer(text[:20000]):
  54        candidate = " ".join(match.group(0).split())
  55        if not EXPLICIT_RESULT_UNIT_PATTERN.search(candidate):
  56            expanded = " ".join(text[match.start() : min(len(text), match.end() + 32)].split())
  57            if EXPLICIT_RESULT_UNIT_PATTERN.search(expanded):
  58                candidate = expanded
  59        if _candidate_is_diagnostic_only(candidate, command_has_measurement_intent):
  60            continue
  61        if candidate not in candidates:
  62            candidates.append(candidate[:140])
  63        if len(candidates) >= limit:
  64            break
  65    return candidates
  66
  67
  68def _table_measurement_candidates(text: str, *, limit: int = 8) -> list[str]:
  69    candidates: list[str] = []
  70    table_lines = [line.strip() for line in str(text or "").splitlines() if line.strip().startswith("|") and "|" in line.strip()[1:]]
  71    for index, line in enumerate(table_lines):
  72        headers = _split_markdown_table_row(line)
  73        if not headers or _is_markdown_separator_row(headers):
  74            continue
  75        unit_indexes = [idx for idx, header in enumerate(headers) if TABLE_UNIT_PATTERN.search(header.strip())]
  76        if not unit_indexes:
  77            continue
  78        for row_line in table_lines[index + 1 : index + 16]:
  79            cells = _split_markdown_table_row(row_line)
  80            if not cells or _is_markdown_separator_row(cells):
  81                continue
  82            for unit_index in unit_indexes:
  83                if unit_index >= len(cells):
  84                    continue
  85                value = cells[unit_index].strip()
  86                number = TABLE_NUMBER_PATTERN.search(value)
  87                if not number:
  88                    continue
  89                unit = headers[unit_index].strip()
  90                label = _table_measurement_label(headers, cells, unit_index=unit_index)
  91                candidate = f"{label} {number.group(0).strip()} {unit}".strip()
  92                if candidate not in candidates:
  93                    candidates.append(candidate[:140])
  94                if len(candidates) >= limit:
  95                    return candidates
  96    return candidates
  97
  98
  99def _split_markdown_table_row(line: str) -> list[str]:
 100    raw = str(line or "").strip()
 101    if raw.startswith("|"):
 102        raw = raw[1:]
 103    if raw.endswith("|"):
 104        raw = raw[:-1]
 105    return [" ".join(cell.strip().split()) for cell in raw.split("|")]
 106
 107
 108def _is_markdown_separator_row(cells: list[str]) -> bool:
 109    return bool(cells) and all(re.fullmatch(r":?-{2,}:?", cell.strip()) for cell in cells if cell.strip())
 110
 111
 112def _table_measurement_label(headers: list[str], cells: list[str], *, unit_index: int) -> str:
 113    preferred_headers = {"test", "metric", "name", "case", "benchmark"}
 114    for index, header in enumerate(headers):
 115        if index >= len(cells) or index == unit_index:
 116            continue
 117        if header.strip().lower() in preferred_headers and cells[index].strip():
 118            return cells[index].strip()
 119    for index in range(min(unit_index, len(cells)) - 1, -1, -1):
 120        cell = cells[index].strip()
 121        if cell and not TABLE_NUMBER_PATTERN.fullmatch(cell):
 122            return cell
 123    return "measurement"
 124
 125
 126def measurement_candidates_are_diagnostic_only(candidates: list[Any], *, command: str = "") -> bool:
 127    command_has_measurement_intent = bool(MEASUREMENT_INTENT_PATTERN.search(command))
 128    return all(_candidate_is_diagnostic_only(str(candidate), command_has_measurement_intent) for candidate in candidates)
 129
 130
 131def _candidate_is_diagnostic_only(candidate: str, command_has_measurement_intent: bool) -> bool:
 132    has_structured_metric = bool(EXPLICIT_RESULT_UNIT_PATTERN.search(candidate) or LABELED_MEASUREMENT_PATTERN.search(candidate))
 133    if command_has_measurement_intent:
 134        return not has_structured_metric
 135    if DIAGNOSTIC_MEASUREMENT_PATTERN.search(candidate):
 136        return True
 137    if EXPLICIT_RESULT_UNIT_PATTERN.search(candidate) and not re.search(r"(?i)\b(?:cpu|gpu|ram|mem|memory)\b", candidate):
 138        return False
 139    if ACTION_MEASUREMENT_PATTERN.search(candidate):
 140        return not bool(LABELED_MEASUREMENT_PATTERN.search(candidate))
 141    return True
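The table path in `_table_measurement_candidates` treats any header cell that is a bare unit as a measurement column and pairs each numeric cell below it with a row label. The sketch below is a simplified version under stated assumptions: it uses only the first table row as the header, takes the label from the first column, and uses an abbreviated unit list. The names `split_row`, `is_separator`, and `table_measurements` are hypothetical.

```python
import re

# Abbreviated unit list for illustration; the listing uses a longer one.
UNIT_HEADER = re.compile(r"(?i)(?:%|ms|sec|secs|seconds|ops/s|req/s|tokens/s)")
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def split_row(line: str) -> list[str]:
    raw = line.strip()
    if raw.startswith("|"):
        raw = raw[1:]
    if raw.endswith("|"):
        raw = raw[:-1]
    return [" ".join(cell.split()) for cell in raw.split("|")]


def is_separator(cells: list[str]) -> bool:
    return bool(cells) and all(re.fullmatch(r":?-{2,}:?", cell) for cell in cells if cell)


def table_measurements(text: str) -> list[str]:
    rows = [split_row(line) for line in text.splitlines() if line.strip().startswith("|")]
    rows = [row for row in rows if not is_separator(row)]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Columns whose header is exactly a unit, such as "ms" or "ops/s".
    unit_columns = [index for index, name in enumerate(header) if UNIT_HEADER.fullmatch(name)]
    found: list[str] = []
    for cells in body:
        for index in unit_columns:
            if index >= len(cells):
                continue
            number = NUMBER.search(cells[index])
            if number:
                found.append(f"{cells[0]} {number.group(0)} {header[index]}")
    return found


table = """
| test | ms | ops/s |
| --- | --- | --- |
| parse | 12.5 | 800 |
| render | 40 | 25 |
"""
measurements = table_measurements(table)
```

The listing's version is more general: it scans every table line as a possible header, prefers label columns named test, metric, name, case, or benchmark, and caps the number of candidates.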
nipux_cli/memory_graph.py 302 lines
   1"""Job-local memory graph helpers for long-running workers."""
   2
   3from __future__ import annotations
   4
   5import re
   6from datetime import datetime
   7from typing import Any
   8
   9from nipux_cli.worker_prompt_format import clip_text
  10
  11
  12NODE_KINDS = {
  13    "artifact",
  14    "constraint",
  15    "decision",
  16    "episode",
  17    "experiment",
  18    "fact",
  19    "milestone",
  20    "question",
  21    "skill",
  22    "source",
  23    "strategy",
  24    "task",
  25}
  26NODE_STATUSES = {"active", "blocked", "deprecated", "open", "resolved", "stable"}
  27DEFAULT_NODE_KIND = "fact"
  28DEFAULT_NODE_STATUS = "active"
  29NEGATIVE_MEMORY_MARKERS = (
  30    "0 files",
  31    "0 results",
  32    "blocked until",
  33    "cannot access",
  34    "critical blocker",
  35    "does not exist",
  36    "failed to find",
  37    "missing",
  38    "must be downloaded",
  39    "no ",
  40    "no such",
  41    "none",
  42    "not available",
  43    "not detected",
  44    "not downloaded",
  45    "not found",
  46    "not installed",
  47    "prevents",
  48    "unavailable",
  49    "without",
  50)
  51
  52
  53def memory_graph_from_job(job: dict[str, Any]) -> dict[str, Any]:
  54    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  55    graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
  56    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
  57    edges = graph.get("edges") if isinstance(graph.get("edges"), list) else []
  58    return {
  59        "nodes": [node for node in nodes if isinstance(node, dict)],
  60        "edges": [edge for edge in edges if isinstance(edge, dict)],
  61        "updated_at": graph.get("updated_at") or "",
  62    }
  63
  64
  65def memory_graph_for_prompt(job: dict[str, Any], *, limit: int = 10, stale_tokens: list[str] | None = None) -> str:
  66    graph = memory_graph_from_job(job)
  67    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  68    stale_tokens = [str(token) for token in stale_tokens or [] if str(token).strip()]
  69    stale_node_ids = {
  70        str(record.get("record_id") or "")
  71        for record in metadata.get("stale_negative_records", [])
  72        if isinstance(record, dict) and str(record.get("kind") or "") == "memory_node"
  73    } if isinstance(metadata.get("stale_negative_records"), list) else set()
  74    stale_nodes = [
  75        node
  76        for node in graph["nodes"]
  77        if _node_contains_stale_token(node, stale_tokens) or _node_has_stale_id(node, stale_node_ids)
  78    ]
  79    active_nodes = [node for node in graph["nodes"] if node not in stale_nodes]
  80    nodes = rank_memory_nodes(active_nodes, limit=limit)
  81    durable_count = _durable_signal_count(job)
  82    if not nodes:
  83        hint = (
  84            "No memory graph yet. When a branch produces reusable knowledge, use record_memory_graph "
  85            "to save connected episode, fact, strategy, skill, question, decision, or constraint nodes."
  86        )
  87        if durable_count:
  88            hint += (
  89                f" Durable ledgers already contain {durable_count} reusable item(s); consolidate the most important "
  90                "ones into graph nodes before raw history grows further."
  91            )
  92        return hint
  93    edge_index = _edge_index(graph["edges"])
  94    lines = [
  95        f"Memory graph: nodes={len(graph['nodes'])} edges={len(graph['edges'])}",
  96    ]
  97    if stale_nodes:
  98        lines.append(
  99            f"Suppressed {len(stale_nodes)} stale memory node(s) matching unsupported tokens; "
 100            "do not use them as facts until observed again."
 101        )
 102    if durable_count >= 8 and len(graph["nodes"]) < max(4, durable_count // 4):
 103        lines.append(
 104            "Consolidation pressure: durable ledgers are growing faster than the memory graph. "
 105            "After the next meaningful checkpoint, use record_memory_graph to add or update connected nodes."
 106        )
 107    for node in nodes:
 108        key = str(node.get("key") or "")
 109        title = str(node.get("title") or key or "memory")
 110        kind = str(node.get("kind") or DEFAULT_NODE_KIND)
 111        status = str(node.get("status") or DEFAULT_NODE_STATUS)
 112        summary = str(node.get("summary") or "")
 113        tags = _clean_string_list(node.get("tags"))[:5]
 114        refs = _clean_string_list(node.get("evidence_refs"))[:4]
 115        parent = str(node.get("parent_key") or "")
 116        line = f"- {status} {kind}: {title}"
 117        if parent:
 118            line += f" | parent={parent}"
 119        if tags:
 120            line += f" | tags={', '.join(tags)}"
 121        if refs:
 122            line += f" | evidence={', '.join(refs)}"
 123        if summary:
 124            line += f" | {summary}"
 125        lines.append(clip_text(line, 620))
 126        related = edge_index.get(key, [])[:3]
 127        if related:
 128            lines.append("  links: " + clip_text("; ".join(related), 420))
 129    return "\n".join(lines)
 130
 131
 132def _node_contains_stale_token(node: dict[str, Any], stale_tokens: list[str]) -> bool:
 133    if not stale_tokens:
 134        return False
 135    text_parts = [
 136        node.get("key"),
 137        node.get("title"),
 138        node.get("kind"),
 139        node.get("status"),
 140        node.get("summary"),
 141        " ".join(_clean_string_list(node.get("tags"))),
 142    ]
 143    text = " ".join(str(part or "") for part in text_parts)
 144    text_lower = text.lower()
 145    negative_node = _node_has_negative_memory_marker(text_lower)
 146    for token in stale_tokens:
 147        pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
 148        if re.search(pattern, text, flags=re.IGNORECASE):
 149            return True
 150        if negative_node and token.startswith("."):
 151            bare = token[1:].strip()
 152            if bare and re.search(r"(?<![A-Za-z0-9])" + re.escape(bare) + r"(?![A-Za-z0-9])", text_lower):
 153                return True
 154    return False
 155
 156
 157def _node_has_negative_memory_marker(text_lower: str) -> bool:
 158    return any(marker in text_lower for marker in NEGATIVE_MEMORY_MARKERS)
 159
 160
 161def _node_has_stale_id(node: dict[str, Any], stale_node_ids: set[str]) -> bool:
 162    if not stale_node_ids:
 163        return False
 164    for key in ("key", "event_id", "id"):
 165        value = str(node.get(key) or "").strip()
 166        if value and value in stale_node_ids:
 167            return True
 168    return False
 169
 170
 171def search_memory_graph(graph: dict[str, Any], query: str, *, limit: int = 10) -> dict[str, Any]:
 172    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
 173    edges = graph.get("edges") if isinstance(graph.get("edges"), list) else []
 174    ranked = rank_memory_nodes([node for node in nodes if isinstance(node, dict)], query=query, limit=limit)
 175    keys = {str(node.get("key") or "") for node in ranked}
 176    related_edges = [
 177        edge
 178        for edge in edges
 179        if isinstance(edge, dict)
 180        and (str(edge.get("from_key") or "") in keys or str(edge.get("to_key") or "") in keys)
 181    ][: max(limit * 2, 10)]
 182    return {"nodes": ranked, "edges": related_edges}
 183
 184
 185def rank_memory_nodes(nodes: list[dict[str, Any]], *, query: str = "", limit: int = 10) -> list[dict[str, Any]]:
 186    tokens = _tokens(query)
 187    ranked = sorted(nodes, key=lambda node: _node_score(node, tokens), reverse=True)
 188    if tokens:
 189        ranked = [node for node in ranked if _node_score(node, tokens) > 0]
 190    return ranked[: max(0, limit)]
 191
 192
 193def _node_score(node: dict[str, Any], query_tokens: set[str]) -> float:
 194    haystack = " ".join(
 195        str(value or "")
 196        for value in [
 197            node.get("key"),
 198            node.get("title"),
 199            node.get("kind"),
 200            node.get("status"),
 201            node.get("summary"),
 202            " ".join(_clean_string_list(node.get("tags"))),
 203        ]
 204    ).lower()
 205    score = 0.0
 206    for token in query_tokens:
 207        if token in haystack:
 208            score += 4.0 if token in str(node.get("title") or "").lower() else 2.0
 209    score += _float_between(node.get("salience"), 0.0, 1.0) * 3.0
 210    score += _float_between(node.get("confidence"), 0.0, 1.0)
 211    status = str(node.get("status") or DEFAULT_NODE_STATUS)
 212    if status in {"active", "open"}:
 213        score += 1.2
 214    elif status == "stable":
 215        score += 0.7
 216    elif status == "deprecated":
 217        score -= 1.5
 218    kind = str(node.get("kind") or DEFAULT_NODE_KIND)
 219    if kind in {"strategy", "skill", "decision", "constraint", "question"}:
 220        score += 0.5
 221    score += min(int(node.get("use_count") or 0), 8) * 0.08
 222    score += _recency_score(str(node.get("updated_at") or node.get("created_at") or ""))
 223    return score
 224
 225
 226def _edge_index(edges: list[dict[str, Any]]) -> dict[str, list[str]]:
 227    index: dict[str, list[str]] = {}
 228    for edge in edges:
 229        from_key = str(edge.get("from_key") or "")
 230        to_key = str(edge.get("to_key") or "")
 231        if not from_key or not to_key:
 232            continue
 233        relation = str(edge.get("relation") or "related_to")
 234        index.setdefault(from_key, []).append(f"{relation} -> {to_key}")
 235        index.setdefault(to_key, []).append(f"{relation} <- {from_key}")
 236    return index
 237
 238
 239def _tokens(value: str) -> set[str]:
 240    return {token for token in re.findall(r"[a-z0-9][a-z0-9_-]{1,}", value.lower()) if token not in _STOPWORDS}
 241
 242
 243def _float_between(value: Any, low: float, high: float) -> float:
 244    try:
 245        number = float(value)
 246    except (TypeError, ValueError):
 247        return low
 248    return min(high, max(low, number))
 249
 250
 251def _recency_score(value: str) -> float:
 252    if not value:
 253        return 0.0
 254    try:
 255        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
 256    except ValueError:
 257        return 0.0
 258    age_seconds = max(0.0, (datetime.now(parsed.tzinfo) - parsed).total_seconds())
 259    if age_seconds < 3600:
 260        return 0.8
 261    if age_seconds < 86_400:
 262        return 0.4
 263    if age_seconds < 604_800:
 264        return 0.15
 265    return 0.0
 266
 267
 268def _clean_string_list(value: Any) -> list[str]:
 269    if not isinstance(value, list):
 270        return []
 271    return [" ".join(str(item).split()) for item in value if str(item).strip()]
 272
 273
 274def _durable_signal_count(job: dict[str, Any]) -> int:
 275    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 276    count = 0
 277    for key in (
 278        "experiment_ledger",
 279        "finding_ledger",
 280        "lessons",
 281        "source_ledger",
 282        "task_queue",
 283    ):
 284        values = metadata.get(key)
 285        if isinstance(values, list):
 286            count += sum(1 for value in values if isinstance(value, dict))
 287    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 288    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 289    count += sum(1 for milestone in milestones if isinstance(milestone, dict))
 290    return count
 291
 292
 293_STOPWORDS = {
 294    "and",
 295    "for",
 296    "from",
 297    "into",
 298    "that",
 299    "the",
 300    "this",
 301    "with",
 302}
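Node ranking in `rank_memory_nodes` adds up several weighted signals. The sketch below reproduces a subset of them, assuming only query matches, salience, confidence, and status contribute. It omits the kind bonus, use count, and recency terms so the result does not depend on the current time. The names `score` and `rank` and the sample nodes are hypothetical.

```python
import re
from typing import Any

STOPWORDS = {"and", "for", "from", "into", "that", "the", "this", "with"}
STATUS_BONUS = {"active": 1.2, "open": 1.2, "stable": 0.7, "deprecated": -1.5}


def tokens(value: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9][a-z0-9_-]{1,}", value.lower()) if t not in STOPWORDS}


def clamp(value: Any, low: float, high: float) -> float:
    try:
        number = float(value)
    except (TypeError, ValueError):
        return low
    return min(high, max(low, number))


def score(node: dict[str, Any], query_tokens: set[str]) -> float:
    title = str(node.get("title") or "").lower()
    haystack = " ".join(
        str(node.get(field) or "") for field in ("key", "title", "kind", "status", "summary")
    ).lower()
    total = 0.0
    for token in query_tokens:
        if token in haystack:
            # A title hit counts double compared with a hit elsewhere.
            total += 4.0 if token in title else 2.0
    total += clamp(node.get("salience"), 0.0, 1.0) * 3.0
    total += clamp(node.get("confidence"), 0.0, 1.0)
    total += STATUS_BONUS.get(str(node.get("status") or "active"), 0.0)
    return total


def rank(nodes: list[dict[str, Any]], query: str = "", limit: int = 10) -> list[dict[str, Any]]:
    query_tokens = tokens(query)
    ranked = sorted(nodes, key=lambda node: score(node, query_tokens), reverse=True)
    return ranked[: max(0, limit)]


nodes = [
    {"key": "a", "title": "GPU driver missing", "status": "deprecated", "salience": 0.9, "confidence": 0.9},
    {"key": "b", "title": "Benchmark strategy", "status": "active", "salience": 0.5, "confidence": 0.5,
     "summary": "use hyperfine for latency"},
    {"key": "c", "title": "Latency target", "status": "stable", "salience": 0.2},
]
by_query = [node["key"] for node in rank(nodes, query="latency")]
by_default = [node["key"] for node in rank(nodes)]
```

With a query, the title match lifts node `c` above node `b` despite its lower salience. Without a query, salience and status decide the order, and the deprecated node is penalized.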
nipux_cli/memory_graph_view.py 299 lines
   1"""Self-contained HTML view for a job-local memory graph."""
   2
   3from __future__ import annotations
   4
   5import json
   6from html import escape
   7from typing import Any
   8
   9from nipux_cli.memory_graph import memory_graph_from_job
  10
  11
  12def render_memory_graph_html(job: dict[str, Any]) -> str:
  13    """Return a standalone clickable graph page for a job's memory graph."""
  14
  15    graph = memory_graph_from_job(job)
  16    nodes = [_view_node(node) for node in graph["nodes"]]
  17    edges = [_view_edge(edge) for edge in graph["edges"]]
  18    data = json.dumps(
  19        {
  20            "job": {
  21                "id": str(job.get("id") or ""),
  22                "title": str(job.get("title") or ""),
  23                "objective": str(job.get("objective") or ""),
  24            },
  25            "updated_at": graph.get("updated_at") or "",
  26            "nodes": nodes,
  27            "edges": edges,
  28        },
  29        ensure_ascii=False,
  30    ).replace("</", "<\\/")
  31    title = escape(str(job.get("title") or "Nipux memory graph"))
  32    return f"""<!doctype html>
  33<html lang="en">
  34<head>
  35<meta charset="utf-8">
  36<meta name="viewport" content="width=device-width, initial-scale=1">
  37<title>Nipux Memory Graph - {title}</title>
  38<style>
  39:root {{
  40  color-scheme: dark;
  41  --bg: #101112;
  42  --panel: #17191b;
  43  --line: #303336;
  44  --text: #eeeeea;
  45  --muted: #9a9a94;
  46  --accent: #82e6e1;
  47  --gold: #e6d06f;
  48  --purple: #c99ce8;
  49}}
  50* {{ box-sizing: border-box; }}
  51html, body {{ margin: 0; min-height: 100%; background: var(--bg); color: var(--text); font: 14px/1.45 ui-monospace, SFMono-Regular, Menlo, Consolas, monospace; }}
  52body {{ overflow: hidden; }}
  53.shell {{ display: grid; grid-template-columns: minmax(0, 1fr) 360px; height: 100vh; }}
  54.stage {{ position: relative; min-width: 0; border-right: 1px solid var(--line); }}
  55.top {{ position: absolute; top: 22px; left: 26px; right: 26px; z-index: 2; display: flex; align-items: start; justify-content: space-between; gap: 24px; }}
  56.eyebrow {{ color: var(--muted); letter-spacing: .22em; text-transform: uppercase; font-size: 12px; }}
  57h1 {{ margin: 8px 0 0; font-size: clamp(30px, 4vw, 64px); line-height: .95; letter-spacing: -.04em; }}
  58.stats {{ display: flex; gap: 18px; color: var(--muted); white-space: nowrap; }}
  59.stats b {{ color: var(--text); font-size: 20px; }}
  60canvas {{ display: block; width: 100%; height: 100%; }}
  61.help {{ position: absolute; left: 26px; bottom: 22px; color: var(--muted); z-index: 2; }}
  62aside {{ min-width: 0; background: var(--panel); padding: 26px; overflow: auto; }}
  63.card {{ border: 1px solid var(--line); border-radius: 18px; padding: 18px; margin-top: 18px; background: rgba(255,255,255,.018); }}
  64.label {{ color: var(--muted); font-size: 12px; letter-spacing: .18em; text-transform: uppercase; }}
  65.node-title {{ margin-top: 10px; font-size: 22px; line-height: 1.12; }}
  66.row {{ display: grid; grid-template-columns: 88px minmax(0, 1fr); gap: 12px; margin-top: 12px; }}
  67.row span:first-child {{ color: var(--muted); }}
  68.pill {{ display: inline-block; margin: 5px 6px 0 0; padding: 3px 8px; border: 1px solid var(--line); border-radius: 999px; color: var(--accent); }}
  69.list {{ margin: 8px 0 0; padding-left: 18px; color: var(--muted); }}
  70.empty {{ color: var(--muted); margin-top: 12px; }}
  71.search {{ width: 100%; margin-top: 18px; padding: 12px 14px; border-radius: 12px; border: 1px solid var(--line); background: #0d0e0f; color: var(--text); font: inherit; outline: none; }}
  72.search:focus {{ border-color: var(--accent); }}
  73@media (max-width: 900px) {{
  74  .shell {{ grid-template-columns: 1fr; grid-template-rows: 62vh 38vh; }}
  75  .stage {{ border-right: 0; border-bottom: 1px solid var(--line); }}
  76}}
  77</style>
  78</head>
  79<body>
  80<main class="shell">
  81  <section class="stage">
  82    <div class="top">
  83      <div>
  84        <div class="eyebrow">Nipux memory graph</div>
  85        <h1>{title}</h1>
  86      </div>
  87      <div class="stats">
  88        <div><b id="node-count">0</b><br>nodes</div>
  89        <div><b id="edge-count">0</b><br>links</div>
  90      </div>
  91    </div>
  92    <canvas id="graph"></canvas>
  93    <div class="help">drag to rotate · scroll to zoom · click a node</div>
  94  </section>
  95  <aside>
  96    <div class="label">inspect</div>
  97    <input id="search" class="search" placeholder="search nodes">
  98    <div id="details" class="card">
  99      <div class="label">selected node</div>
 100      <div class="empty">Click a node to inspect its summary, evidence, and links.</div>
 101    </div>
 102    <div id="results" class="card">
 103      <div class="label">visible nodes</div>
 104      <div class="empty">No nodes yet. The worker can create graph memory with record_memory_graph.</div>
 105    </div>
 106  </aside>
 107</main>
 108<script id="graph-data" type="application/json">{data}</script>
 109<script>
 110const data = JSON.parse(document.getElementById("graph-data").textContent);
 111const canvas = document.getElementById("graph");
 112const ctx = canvas.getContext("2d");
 113const details = document.getElementById("details");
 114const results = document.getElementById("results");
 115const search = document.getElementById("search");
 116document.getElementById("node-count").textContent = data.nodes.length;
 117document.getElementById("edge-count").textContent = data.edges.length;
 118
 119let width = 0, height = 0, zoom = 1, rotX = -0.35, rotY = 0.65, dragging = false, last = [0, 0], selected = null;
 120let lastResultsSignature = "";
 121const nodeByKey = new Map(data.nodes.map((node, index) => [node.key, {{ ...node, index }}]));
 122const nodes = data.nodes.map((node, index) => {{
 123  const a = index * 2.399963229728653;
 124  const r = 110 + (index % 7) * 24;
 125  const z = ((index * 53) % 240) - 120;
 126  return {{ ...node, x: Math.cos(a) * r, y: Math.sin(a) * r, z, vx: 0, vy: 0, vz: 0, screen: [0, 0], visible: true }};
 127}});
 128const nodeLookup = new Map(nodes.map(node => [node.key, node]));
 129const edges = data.edges.map(edge => ({{ ...edge, from: nodeLookup.get(edge.from_key), to: nodeLookup.get(edge.to_key) }})).filter(edge => edge.from && edge.to);
 130
 131function resize() {{
 132  const ratio = window.devicePixelRatio || 1;
 133  width = canvas.clientWidth;
 134  height = canvas.clientHeight;
 135  canvas.width = Math.max(1, width * ratio);
 136  canvas.height = Math.max(1, height * ratio);
 137  ctx.setTransform(ratio, 0, 0, ratio, 0, 0);
 138}}
 139window.addEventListener("resize", resize);
 140resize();
 141
 142function project(node) {{
 143  const cy = Math.cos(rotY), sy = Math.sin(rotY), cx = Math.cos(rotX), sx = Math.sin(rotX);
 144  let x = node.x * cy - node.z * sy;
 145  let z = node.x * sy + node.z * cy;
 146  let y = node.y * cx - z * sx;
 147  z = node.y * sx + z * cx;
 148  const scale = zoom * 520 / (520 + z);
 149  return [width / 2 + x * scale, height / 2 + y * scale, scale, z];
 150}}
 151
 152function color(node) {{
 153  if (node.status === "deprecated") return "#7d7d77";
 154  if (node.kind === "question") return "#e6d06f";
 155  if (node.kind === "skill" || node.kind === "strategy") return "#82e6e1";
 156  if (node.kind === "decision" || node.kind === "constraint") return "#c99ce8";
 157  return "#eeeeea";
 158}}
 159
 160function draw() {{
 161  ctx.clearRect(0, 0, width, height);
 162  ctx.fillStyle = "#101112";
 163  ctx.fillRect(0, 0, width, height);
 164  const q = search.value.trim().toLowerCase();
 165  for (const node of nodes) {{
 166    const hay = [node.key, node.title, node.kind, node.status, node.summary, ...(node.tags || [])].join(" ").toLowerCase();
 167    node.visible = !q || hay.includes(q);
 168  }}
 169  for (const edge of edges) {{
 170    if (!edge.from.visible || !edge.to.visible) continue;
 171    const a = project(edge.from), b = project(edge.to);
 172    ctx.strokeStyle = "rgba(154,154,148,.26)";
 173    ctx.lineWidth = 1;
 174    ctx.beginPath();
 175    ctx.moveTo(a[0], a[1]);
 176    ctx.lineTo(b[0], b[1]);
 177    ctx.stroke();
 178  }}
 179  const sorted = [...nodes].map(node => [node, project(node)]).sort((a, b) => a[1][3] - b[1][3]);
 180  for (const [node, p] of sorted) {{
 181    node.screen = p;
 182    if (!node.visible) continue;
 183    const radius = Math.max(5, 9 * p[2]);
 184    ctx.fillStyle = color(node);
 185    ctx.globalAlpha = selected && selected.key !== node.key ? .55 : 1;
 186    ctx.beginPath();
 187    ctx.arc(p[0], p[1], radius, 0, Math.PI * 2);
 188    ctx.fill();
 189    if (selected && selected.key === node.key) {{
 190      ctx.strokeStyle = "#ffffff";
 191      ctx.lineWidth = 2;
 192      ctx.stroke();
 193    }}
 194  }}
 195  ctx.globalAlpha = 1;
 196  renderResults(q);
 197  requestAnimationFrame(draw);
 198}}
 199
 200function renderResults(query) {{
 201  const visible = nodes.filter(node => node.visible).slice(0, 18);
 202  const signature = query + "|" + visible.map(node => node.key).join(",");
 203  if (signature === lastResultsSignature) return;
 204  lastResultsSignature = signature;
 205  results.innerHTML = '<div class="label">visible nodes</div>' + (
 206    visible.length
 207      ? visible.map(node => `<div class="row"><span>${{escapeHtml(node.kind)}}</span><a href="#" data-key="${{escapeHtml(node.key)}}">${{escapeHtml(node.title || node.key)}}</a></div>`).join("")
 208      : '<div class="empty">No nodes match the current search.</div>'
 209  );
 210  for (const link of results.querySelectorAll("a[data-key]")) {{
 211    link.addEventListener("click", event => {{
 212      event.preventDefault();
 213      selectNode(nodeLookup.get(link.dataset.key));
 214    }});
 215  }}
 216}}
 217
 218function selectNode(node) {{
 219  selected = node;
 220  if (!node) return;
 221  const linked = edges.filter(edge => edge.from.key === node.key || edge.to.key === node.key).slice(0, 12);
 222  details.innerHTML = `
 223    <div class="label">${{escapeHtml(node.kind)}} · ${{escapeHtml(node.status)}}</div>
 224    <div class="node-title">${{escapeHtml(node.title || node.key)}}</div>
 225    <div class="row"><span>key</span><div>${{escapeHtml(node.key)}}</div></div>
 226    <div class="row"><span>summary</span><div>${{escapeHtml(node.summary || "No summary recorded.")}}</div></div>
  227    <div class="row"><span>score</span><div>salience ${{escapeHtml(node.salience ?? "n/a")}} · confidence ${{escapeHtml(node.confidence ?? "n/a")}}</div></div>
 228    <div class="row"><span>tags</span><div>${{(node.tags || []).map(tag => `<span class="pill">${{escapeHtml(tag)}}</span>`).join("") || "none"}}</div></div>
 229    <div class="row"><span>evidence</span><ul class="list">${{(node.evidence_refs || []).map(ref => `<li>${{escapeHtml(ref)}}</li>`).join("") || "<li>none</li>"}}</ul></div>
 230    <div class="row"><span>links</span><ul class="list">${{linked.map(edge => `<li>${{escapeHtml(edge.from.key === node.key ? edge.relation + " → " + edge.to.key : edge.relation + " ← " + edge.from.key)}}</li>`).join("") || "<li>none</li>"}}</ul></div>
 231  `;
 232}}
 233
 234function escapeHtml(value) {{
 235  return String(value ?? "").replace(/[&<>"']/g, char => ({{ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }}[char]));
 236}}
 237
 238canvas.addEventListener("mousedown", event => {{ dragging = true; last = [event.clientX, event.clientY]; }});
 239window.addEventListener("mouseup", () => dragging = false);
 240window.addEventListener("mousemove", event => {{
 241  if (!dragging) return;
 242  rotY += (event.clientX - last[0]) * 0.006;
 243  rotX += (event.clientY - last[1]) * 0.006;
 244  last = [event.clientX, event.clientY];
 245}});
 246canvas.addEventListener("wheel", event => {{
 247  event.preventDefault();
 248  zoom = Math.max(.35, Math.min(3.2, zoom * (event.deltaY > 0 ? .92 : 1.08)));
 249}}, {{ passive: false }});
 250canvas.addEventListener("click", event => {{
 251  const rect = canvas.getBoundingClientRect();
 252  const x = event.clientX - rect.left, y = event.clientY - rect.top;
 253  let best = null, bestDistance = 18;
 254  for (const node of nodes) {{
 255    if (!node.visible) continue;
 256    const dx = node.screen[0] - x, dy = node.screen[1] - y;
 257    const distance = Math.hypot(dx, dy);
 258    if (distance < bestDistance) {{ best = node; bestDistance = distance; }}
 259  }}
 260  if (best) selectNode(best);
 261}});
 262search.addEventListener("input", () => {{}});
 263if (nodes[0]) selectNode(nodes[0]);
 264draw();
 265</script>
 266</body>
 267</html>
 268"""
 269
 270
 271def _view_node(node: dict[str, Any]) -> dict[str, Any]:
 272    return {
 273        "key": str(node.get("key") or node.get("title") or "memory"),
 274        "title": str(node.get("title") or node.get("key") or "memory"),
 275        "kind": str(node.get("kind") or "fact"),
 276        "status": str(node.get("status") or "active"),
 277        "summary": str(node.get("summary") or ""),
 278        "salience": node.get("salience"),
 279        "confidence": node.get("confidence"),
 280        "tags": _string_list(node.get("tags")),
 281        "evidence_refs": _string_list(node.get("evidence_refs")),
 282        "created_at": str(node.get("created_at") or ""),
 283        "updated_at": str(node.get("updated_at") or ""),
 284    }
 285
 286
 287def _view_edge(edge: dict[str, Any]) -> dict[str, Any]:
 288    return {
 289        "from_key": str(edge.get("from_key") or ""),
 290        "to_key": str(edge.get("to_key") or ""),
 291        "relation": str(edge.get("relation") or "related_to"),
 292        "summary": str(edge.get("summary") or ""),
 293    }
 294
 295
 296def _string_list(value: Any) -> list[str]:
 297    if not isinstance(value, list):
 298        return []
 299    return [" ".join(str(item).split()) for item in value if str(item).strip()]
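The canvas script in the template above places nodes on a spiral and projects them with a simple perspective divide. The sketch below restates that arithmetic in Python so it can be checked outside a browser. The names `initial_position` and `project` (as a standalone Python function with explicit parameters) are illustrative and are not part of the listed module; the constants are copied from the template.

```python
import math

# Same angular step as the template; it equals the golden angle pi * (3 - sqrt(5)).
GOLDEN_ANGLE = 2.399963229728653


def initial_position(index: int) -> tuple[float, float, int]:
    """Mirror the template's starting layout for the node at `index`."""
    a = index * GOLDEN_ANGLE
    r = 110 + (index % 7) * 24          # seven concentric radii, 24 px apart
    z = ((index * 53) % 240) - 120      # depth spread over [-120, 119]
    return (math.cos(a) * r, math.sin(a) * r, z)


def project(point, rot_x, rot_y, zoom, width, height):
    """Rotate about Y, then X, then apply the 520 / (520 + z) perspective scale."""
    x0, y0, z0 = point
    cy, sy = math.cos(rot_y), math.sin(rot_y)
    cx, sx = math.cos(rot_x), math.sin(rot_x)
    x = x0 * cy - z0 * sy
    z = x0 * sy + z0 * cy
    y = y0 * cx - z * sx
    z = y0 * sx + z * cx
    scale = zoom * 520 / (520 + z)
    # Screen x, screen y, scale factor, and depth (the script sorts nodes by depth).
    return (width / 2 + x * scale, height / 2 + y * scale, scale, z)


first = initial_position(0)
flat = project((100, 50, 0), rot_x=0.0, rot_y=0.0, zoom=1.0, width=800, height=600)
```

With no rotation and a point at depth 0, the scale is exactly 1, so the point lands at the canvas centre plus its own x and y offsets.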
nipux_cli/metric_format.py 17 lines
   1"""Small formatting helpers for measured worker results."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7
   8def format_metric_value(name: Any, value: Any, unit: Any = "") -> str:
   9    """Return a readable metric string such as ``score=0.82`` or ``tokens=4200 tokens``."""
  10
  11    metric_name = str(name or "metric").strip() or "metric"
  12    metric_value = str(value).strip()
  13    metric_unit = str(unit or "").strip()
  14    if not metric_unit:
  15        return f"{metric_name}={metric_value}"
  16    separator = "" if metric_unit.startswith(("%", "/", "°")) else " "
  17    return f"{metric_name}={metric_value}{separator}{metric_unit}"
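As a quick illustration of the separator rule in `format_metric_value`, the sketch below repeats the function body from the listing (without its line numbers) and runs it on a few made-up inputs. The sample metric names and values are invented for the example and do not come from any real job.

```python
from typing import Any


def format_metric_value(name: Any, value: Any, unit: Any = "") -> str:
    """Same logic as the listing: join name, value, and an optional unit."""
    metric_name = str(name or "metric").strip() or "metric"
    metric_value = str(value).strip()
    metric_unit = str(unit or "").strip()
    if not metric_unit:
        return f"{metric_name}={metric_value}"
    # Units starting with %, / or ° attach directly; everything else gets a space.
    separator = "" if metric_unit.startswith(("%", "/", "°")) else " "
    return f"{metric_name}={metric_value}{separator}{metric_unit}"


examples = [
    format_metric_value("score", 0.82),            # score=0.82
    format_metric_value("accuracy", 91.5, "%"),    # accuracy=91.5%
    format_metric_value("throughput", 120, "/s"),  # throughput=120/s
    format_metric_value("latency", 35, "ms"),      # latency=35 ms
    format_metric_value("", 7),                    # metric=7
]
```

An empty or missing name falls back to the literal `metric`, so the helper always returns a `name=value` pair.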
nipux_cli/operator_context.py 83 lines
   1"""Generic filtering for operator messages that enter worker context."""
   2
   3from __future__ import annotations
   4
   5import re
   6from typing import Any
   7
   8
   9CONVERSATION_ONLY_PATTERNS = [
  10    re.compile(r"(?i)^\s*(hi|hello|hey|yo|thanks|thank you|ok|okay|cool|nice|great|hello\?)\s*[.!?]*\s*$"),
  11    re.compile(r"(?i)^\s*(how('?s| is) it going|what('?s| is) going on|any updates?|status|jobs|ls|help|clear|exit|quit)\s*[.!?]*\s*$"),
  12    re.compile(r"(?i)^\s*(how('?s| is) it going)\??\s*(have you got|any)\s+(any\s+)?(improvements?|updates?|results?)\s*(yet)?\s*[.!?]*\s*$"),
  13    re.compile(r"(?i)^\s*(run|start|stop|pause|resume|cancel|work|status|jobs|clear|exit|quit|help)\s+\d+\s*$"),
  14]
  15
  16ACTIONABLE_PATTERNS = [
  17    re.compile(
  18        r"(?i)\b("
  19        r"avoid|because|benchmark|change|constraint|correct|do not|don't|dont|fix|focus|instead|instruction|"
  20        r"measure|must|need|never|only|prefer|priority|remember|should|target|use|wrong"
  21        r"|prioriti[sz]e)\b"
  22    ),
  23    re.compile(r"[\"'`][^\"'`]{2,}[\"'`]"),
  24]
  25
  26
  27def operator_entry_is_active(entry: dict[str, Any]) -> bool:
  28    mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
  29    return (
  30        mode in {"steer", "follow_up"}
  31        and not entry.get("acknowledged_at")
  32        and not entry.get("superseded_at")
  33    )
  34
  35
  36def operator_entry_is_prompt_relevant(entry: dict[str, Any]) -> bool:
  37    mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
  38    message = str(entry.get("message") or "").strip()
  39    if not message:
  40        return False
  41    if mode == "note":
  42        return not _conversation_only(message)
  43    if mode not in {"steer", "follow_up"}:
  44        return False
  45    if entry.get("acknowledged_at") or entry.get("superseded_at"):
  46        return False
  47    return _actionable(message)
  48
  49
  50def active_prompt_operator_entries(messages: list[Any]) -> list[dict[str, Any]]:
  51    return [
  52        entry
  53        for entry in messages
  54        if isinstance(entry, dict)
  55        and operator_entry_is_prompt_relevant(entry)
  56    ]
  57
  58
  59def inactive_prompt_operator_ids(messages: list[Any]) -> list[str]:
  60    ids: list[str] = []
  61    for entry in messages:
  62        if not isinstance(entry, dict):
  63            continue
  64        if not operator_entry_is_active(entry):
  65            continue
  66        if operator_entry_is_prompt_relevant(entry):
  67            continue
  68        event_id = str(entry.get("event_id") or "")
  69        if event_id:
  70            ids.append(event_id)
  71    return ids
  72
  73
  74def _conversation_only(message: str) -> bool:
  75    text = " ".join(message.split())
  76    return any(pattern.search(text) for pattern in CONVERSATION_ONLY_PATTERNS)
  77
  78
  79def _actionable(message: str) -> bool:
  80    text = " ".join(message.split())
  81    if _conversation_only(text):
  82        return False
  83    return any(pattern.search(text) for pattern in ACTIONABLE_PATTERNS)
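To make the filtering rules in `operator_context.py` easier to follow, here is a condensed sketch of the same decision path. It is an approximation under stated assumptions: the pattern lists are trimmed to a few alternatives, and `is_conversation_only`, `is_actionable`, and `is_prompt_relevant` are hypothetical stand-ins for the module's `_conversation_only`, `_actionable`, and `operator_entry_is_prompt_relevant`.

```python
import re

# Trimmed copies of the listing's patterns (assumption: a subset, for brevity).
CONVERSATION_ONLY = [
    re.compile(r"(?i)^\s*(hi|hello|hey|thanks|ok|okay)\s*[.!?]*\s*$"),
    re.compile(r"(?i)^\s*(any updates?|status|jobs|help)\s*[.!?]*\s*$"),
]
ACTIONABLE = [
    re.compile(r"(?i)\b(avoid|do not|don't|fix|focus|must|never|only|prefer|should|use)\b"),
    re.compile(r"[\"'`][^\"'`]{2,}[\"'`]"),  # any short quoted span counts as actionable
]


def is_conversation_only(message: str) -> bool:
    text = " ".join(message.split())
    return any(p.search(text) for p in CONVERSATION_ONLY)


def is_actionable(message: str) -> bool:
    if is_conversation_only(message):
        return False
    text = " ".join(message.split())
    return any(p.search(text) for p in ACTIONABLE)


def is_prompt_relevant(entry: dict) -> bool:
    mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
    message = str(entry.get("message") or "").strip()
    if not message:
        return False
    if mode == "note":
        # Notes are kept unless they are pure small talk.
        return not is_conversation_only(message)
    if mode not in {"steer", "follow_up"}:
        return False
    if entry.get("acknowledged_at") or entry.get("superseded_at"):
        return False
    return is_actionable(message)


samples = [
    {"message": "thanks!"},
    {"message": "Focus on the benchmark instead"},
    {"mode": "note", "message": "The cluster restarts nightly"},
    {"mode": "follow-up", "message": "use the smaller model", "acknowledged_at": "2024-01-01"},
    {"message": "looks interesting"},
]
decisions = [is_prompt_relevant(entry) for entry in samples]
```

Notes pass through unless they are small talk, while steer and follow-up messages must also contain an instruction-like keyword or a quoted span and must not yet be acknowledged or superseded.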
nipux_cli/parser_builder.py 356 lines
   1"""Argparse construction for Nipux CLI commands."""
   2
   3from __future__ import annotations
   4
   5import argparse
   6from collections.abc import Callable, Mapping
   7
   8
   9CommandHandler = Callable[[argparse.Namespace], None]
  10CommandHandlers = Mapping[str, CommandHandler]
  11
  12
  13def _handler(handlers: CommandHandlers, name: str) -> CommandHandler:
  14    return handlers[name]
  15
  16
  17def build_arg_parser(
  18    *,
  19    handlers: CommandHandlers,
  20    version: str,
  21    default_context_length: int,
  22) -> argparse.ArgumentParser:
  23    parser = argparse.ArgumentParser(prog="nipux")
  24    parser.add_argument("--version", action="version", version=f"nipux {version}")
  25    sub = parser.add_subparsers(dest="command", required=True)
  26
  27    init = sub.add_parser("init")
  28    init.add_argument("--path")
  29    init.add_argument("--force", action="store_true")
  30    init.add_argument("--openrouter", action="store_true", help="Write an OpenRouter config that reads OPENROUTER_API_KEY")
  31    init.add_argument("--model", help="Model name to write into config.yaml")
  32    init.add_argument("--base-url", help="OpenAI-compatible API base URL")
  33    init.add_argument("--api-key-env", help="Environment variable that stores the API key")
  34    init.add_argument("--context-length", type=int, default=default_context_length)
  35    init.set_defaults(func=_handler(handlers, "init"))
  36
  37    update = sub.add_parser("update")
  38    update.add_argument("--path", help="Git checkout to update. Defaults to the current Nipux install.")
  39    update.add_argument("--allow-dirty", action="store_true", help="Attempt git pull even when local changes exist")
  40    update.add_argument("--no-restart", action="store_true", help="Do not restart a running daemon after updating")
  41    update.set_defaults(func=_handler(handlers, "update"))
  42
  43    uninstall = sub.add_parser("uninstall")
  44    uninstall.add_argument("--yes", action="store_true", help="Confirm removal without an interactive prompt")
  45    uninstall.add_argument("--dry-run", action="store_true", help="Show what would be removed")
  46    uninstall.add_argument("--keep-legacy", action="store_true", help="Keep legacy ~/.kneepucks state if present")
  47    uninstall.add_argument("--keep-tool", action="store_true", help="Keep the installed nipux command")
  48    uninstall.add_argument("--remove-tool", action="store_true", help=argparse.SUPPRESS)
  49    uninstall.add_argument("--wait", type=float, default=5.0, help="Seconds to wait for daemon shutdown")
  50    uninstall.set_defaults(func=_handler(handlers, "uninstall"))
  51
  52    create = sub.add_parser("create", aliases=["new"])
  53    create.add_argument("objective")
  54    create.add_argument("--title")
  55    create.add_argument("--kind", default="generic")
  56    create.add_argument("--cadence")
  57    create.set_defaults(func=_handler(handlers, "create"))
  58
  59    jobs = sub.add_parser("jobs")
  60    jobs.set_defaults(func=_handler(handlers, "jobs"))
  61
  62    ls_cmd = sub.add_parser("ls")
  63    ls_cmd.set_defaults(func=_handler(handlers, "jobs"))
  64
  65    focus = sub.add_parser("focus")
  66    focus.add_argument("query", nargs="*")
  67    focus.set_defaults(func=_handler(handlers, "focus"))
  68
  69    rename = sub.add_parser("rename")
  70    rename.add_argument("job_id", nargs="*")
  71    rename.add_argument("--title", nargs="+", required=True)
  72    rename.set_defaults(func=_handler(handlers, "rename"))
  73
  74    delete = sub.add_parser("delete", aliases=["rm"])
  75    delete.add_argument("job_id", nargs="*")
  76    delete.add_argument("--keep-files", action="store_true")
  77    delete.set_defaults(func=_handler(handlers, "delete"))
  78
  79    chat = sub.add_parser("chat")
  80    chat.add_argument("job_id", nargs="*")
  81    chat.add_argument("--history-limit", type=int, default=12)
  82    chat.add_argument("--no-history", action="store_true")
  83    chat.set_defaults(func=_handler(handlers, "chat"))
  84
  85    shell = sub.add_parser("shell")
  86    shell.add_argument("--status", action="store_true", help="Render the full dashboard when the shell opens")
  87    shell.add_argument("--no-status", action="store_true", help=argparse.SUPPRESS)
  88    shell.add_argument("--limit", type=int, default=8)
  89    shell.add_argument("--chars", type=int, default=180)
  90    shell.set_defaults(func=_handler(handlers, "shell"))
  91
  92    steer = sub.add_parser("steer", aliases=["say"])
  93    steer.add_argument("--job", dest="job_id")
  94    steer.add_argument("message", nargs="+")
  95    steer.set_defaults(func=_handler(handlers, "steer"))
  96
  97    pause = sub.add_parser("pause")
  98    pause.add_argument("parts", nargs="*", help="Optional job title/id followed by an optional note")
  99    pause.set_defaults(func=_handler(handlers, "pause"))
 100
 101    resume = sub.add_parser("resume")
 102    resume.add_argument("job_id", nargs="*")
 103    resume.set_defaults(func=_handler(handlers, "resume"))
 104
 105    cancel = sub.add_parser("cancel")
 106    cancel.add_argument("parts", nargs="*", help="Optional job title/id followed by an optional note")
 107    cancel.set_defaults(func=_handler(handlers, "cancel"))
 108
 109    status = sub.add_parser("status")
 110    status.add_argument("job_id", nargs="*")
 111    status.add_argument("--limit", type=int, default=8)
 112    status.add_argument("--chars", type=int, default=180)
 113    status.add_argument("--full", action="store_true", help="Render the full dashboard")
 114    status.add_argument("--json", action="store_true")
 115    status.set_defaults(func=_handler(handlers, "status"))
 116
 117    health = sub.add_parser("health")
 118    health.add_argument("--limit", type=int, default=8)
 119    health.add_argument("--chars", type=int, default=180)
 120    health.set_defaults(func=_handler(handlers, "health"))
 121
 122    history = sub.add_parser("history")
 123    history.add_argument("job_id", nargs="*")
 124    history.add_argument("--limit", type=int, default=80)
 125    history.add_argument("--chars", type=int, default=260)
 126    history.add_argument("--full", action="store_true")
 127    history.add_argument("--json", action="store_true")
 128    history.set_defaults(func=_handler(handlers, "history"))
 129
 130    events = sub.add_parser("events")
 131    events.add_argument("job_id", nargs="*")
 132    events.add_argument("--limit", type=int, default=80)
 133    events.add_argument("--chars", type=int, default=260)
 134    events.add_argument("--full", action="store_true")
 135    events.add_argument("--follow", action="store_true")
 136    events.add_argument("--interval", type=float, default=2.0)
 137    events.add_argument("--json", action="store_true")
 138    events.set_defaults(func=_handler(handlers, "events"))
 139
 140    dashboard = sub.add_parser("dashboard", aliases=["dash"])
 141    dashboard.add_argument("job_id", nargs="*")
 142    dashboard.add_argument("--interval", type=float, default=2.0)
 143    dashboard.add_argument("--limit", type=int, default=12)
 144    dashboard.add_argument("--chars", type=int, default=260)
 145    dashboard.add_argument("--no-follow", dest="follow", action="store_false")
 146    dashboard.add_argument("--no-clear", dest="clear", action="store_false")
 147    dashboard.set_defaults(func=_handler(handlers, "dashboard"), follow=True, clear=True)
 148
 149    start = sub.add_parser("start")
 150    start.add_argument("--poll-seconds", type=float, default=0.0)
 151    start.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
 152    start.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
 153    start.add_argument("--log-file")
 154    start.set_defaults(func=_handler(handlers, "start"))
 155
 156    stop = sub.add_parser("stop")
 157    stop.add_argument("job_id", nargs="*", help="Optional job title/id to pause instead of stopping the daemon")
 158    stop.add_argument("--wait", type=float, default=5.0)
 159    stop.set_defaults(func=_handler(handlers, "stop"))
 160
 161    restart = sub.add_parser("restart")
 162    restart.add_argument("--poll-seconds", type=float, default=0.0)
 163    restart.add_argument("--wait", type=float, default=5.0)
 164    restart.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
 165    restart.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
 166    restart.add_argument("--log-file")
 167    restart.set_defaults(func=_handler(handlers, "restart"))
 168
 169    browser_dashboard = sub.add_parser("browser-dashboard")
 170    browser_dashboard.add_argument("--port", type=int, default=4848)
 171    browser_dashboard.add_argument("--foreground", action="store_true")
 172    browser_dashboard.add_argument("--stop", action="store_true")
 173    browser_dashboard.add_argument("--log-file")
 174    browser_dashboard.set_defaults(func=_handler(handlers, "browser_dashboard"))
 175
 176    autostart = sub.add_parser("autostart")
 177    autostart.add_argument("action", choices=["install", "status", "uninstall"])
 178    autostart.add_argument("--poll-seconds", type=float, default=5.0)
 179    autostart.add_argument("--quiet", action="store_true")
 180    autostart.set_defaults(func=_handler(handlers, "autostart"))
 181
 182    service = sub.add_parser("service")
 183    service.add_argument("action", choices=["install", "status", "uninstall"])
 184    service.add_argument("--poll-seconds", type=float, default=0.0)
 185    service.add_argument("--quiet", action="store_true")
 186    service.set_defaults(func=_handler(handlers, "service"))
 187
 188    artifacts = sub.add_parser("artifacts")
 189    artifacts.add_argument("job_id", nargs="*")
 190    artifacts.add_argument("--limit", type=int, default=25)
 191    artifacts.add_argument("--chars", type=int, default=220)
 192    artifacts.add_argument("--paths", action="store_true", help="Show full artifact paths")
 193    artifacts.set_defaults(func=_handler(handlers, "artifacts"))
 194
 195    artifact = sub.add_parser("artifact")
 196    artifact.add_argument("artifact_id_or_path", nargs="+")
 197    artifact.add_argument("--job", dest="job_id")
 198    artifact.add_argument("--chars", type=int, default=12000)
 199    artifact.set_defaults(func=_handler(handlers, "artifact"))
 200
 201    lessons = sub.add_parser("lessons")
 202    lessons.add_argument("job_id", nargs="*")
 203    lessons.add_argument("--limit", type=int, default=25)
 204    lessons.add_argument("--chars", type=int, default=220)
 205    lessons.set_defaults(func=_handler(handlers, "lessons"))
 206
 207    learn = sub.add_parser("learn")
 208    learn.add_argument("--job", dest="job_id")
 209    learn.add_argument("--category", default="operator_preference")
 210    learn.add_argument("--chars", type=int, default=220)
 211    learn.add_argument("lesson", nargs="+")
 212    learn.set_defaults(func=_handler(handlers, "learn"))
 213
 214    findings = sub.add_parser("findings")
 215    findings.add_argument("job_id", nargs="*")
 216    findings.add_argument("--limit", type=int, default=25)
 217    findings.add_argument("--chars", type=int, default=220)
 218    findings.add_argument("--json", action="store_true")
 219    findings.set_defaults(func=_handler(handlers, "findings"))
 220
 221    tasks = sub.add_parser("tasks")
 222    tasks.add_argument("job_id", nargs="*")
 223    tasks.add_argument("--limit", type=int, default=25)
 224    tasks.add_argument("--chars", type=int, default=220)
 225    tasks.add_argument("--status", nargs="+")
 226    tasks.add_argument("--json", action="store_true")
 227    tasks.set_defaults(func=_handler(handlers, "tasks"))
 228
 229    roadmap = sub.add_parser("roadmap")
 230    roadmap.add_argument("job_id", nargs="*")
 231    roadmap.add_argument("--limit", type=int, default=25)
 232    roadmap.add_argument("--features", type=int, default=3)
 233    roadmap.add_argument("--chars", type=int, default=220)
 234    roadmap.add_argument("--json", action="store_true")
 235    roadmap.set_defaults(func=_handler(handlers, "roadmap"))
 236
 237    experiments = sub.add_parser("experiments")
 238    experiments.add_argument("job_id", nargs="*")
 239    experiments.add_argument("--limit", type=int, default=25)
 240    experiments.add_argument("--chars", type=int, default=220)
 241    experiments.add_argument("--status", nargs="+")
 242    experiments.add_argument("--json", action="store_true")
 243    experiments.set_defaults(func=_handler(handlers, "experiments"))
 244
 245    sources = sub.add_parser("sources")
 246    sources.add_argument("job_id", nargs="*")
 247    sources.add_argument("--limit", type=int, default=25)
 248    sources.add_argument("--chars", type=int, default=220)
 249    sources.add_argument("--json", action="store_true")
 250    sources.set_defaults(func=_handler(handlers, "sources"))
 251
 252    memory = sub.add_parser("memory")
 253    memory.add_argument("job_id", nargs="*")
 254    memory.add_argument("--limit", type=int, default=10)
 255    memory.add_argument("--chars", type=int, default=260)
 256    memory.add_argument("--json", action="store_true", help="Print memory graph JSON")
 257    memory.add_argument("--graph", action="store_true", help="Write a clickable HTML memory graph")
 258    memory.add_argument("--output", help="Path for --graph HTML output")
 259    memory.set_defaults(func=_handler(handlers, "memory"))
 260
 261    metrics = sub.add_parser("metrics")
 262    metrics.add_argument("job_id", nargs="*")
 263    metrics.add_argument("--chars", type=int, default=220)
 264    metrics.set_defaults(func=_handler(handlers, "metrics"))
 265
 266    usage = sub.add_parser("usage")
 267    usage.add_argument("job_id", nargs="*")
 268    usage.add_argument("--json", action="store_true")
 269    usage.set_defaults(func=_handler(handlers, "usage"))
 270
 271    logs = sub.add_parser("logs", aliases=["outputs", "output"])
 272    logs.add_argument("job_id", nargs="*")
 273    logs.add_argument("--limit", type=int, default=25)
 274    logs.add_argument("--verbose", action="store_true")
 275    logs.add_argument("--chars", type=int, default=4000)
 276    logs.set_defaults(func=_handler(handlers, "logs"))
 277
 278    activity = sub.add_parser("activity", aliases=["feed", "tail"])
 279    activity.add_argument("job_id", nargs="*")
 280    activity.add_argument("--limit", type=int, default=20)
 281    activity.add_argument("--chars", type=int, default=180)
 282    activity.add_argument("--follow", action="store_true")
 283    activity.add_argument("--interval", type=float, default=2.0)
 284    activity.add_argument("--verbose", action="store_true")
 285    activity.add_argument("--paths", action="store_true", help="Show full artifact paths")
 286    activity.set_defaults(func=_handler(handlers, "activity"))
 287
 288    updates = sub.add_parser("updates", aliases=["outcomes", "outcome"])
 289    updates.add_argument("job_id", nargs="*")
 290    updates.add_argument("--all", action="store_true", help="Show durable outcome summaries for every job")
 291    updates.add_argument("--limit", type=int, default=5)
 292    updates.add_argument("--chars", type=int, default=180)
 293    updates.add_argument("--paths", action="store_true", help="Show full artifact paths")
 294    updates.set_defaults(func=_handler(handlers, "updates"))
 295
 296    watch = sub.add_parser("watch")
 297    watch.add_argument("job_id", nargs="+")
 298    watch.add_argument("--interval", type=float, default=2.0)
 299    watch.add_argument("--limit", type=int, default=20)
 300    watch.add_argument("--verbose", action="store_true")
 301    watch.add_argument("--chars", type=int, default=4000)
 302    watch.add_argument("--no-follow", dest="follow", action="store_false")
 303    watch.set_defaults(func=_handler(handlers, "watch"), follow=True)
 304
 305    run_one = sub.add_parser("run-one")
 306    run_one.add_argument("job_id", nargs="+")
 307    run_one.add_argument("--fake", action="store_true", help="Use a deterministic fake model response")
 308    run_one.set_defaults(func=_handler(handlers, "run_one"))
 309
 310    work = sub.add_parser("work")
 311    work.add_argument("job_id", nargs="*")
 312    work.add_argument("--steps", type=int, default=5)
 313    work.add_argument("--poll-seconds", type=float, default=0.5)
 314    work.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
 315    work.add_argument("--verbose", action="store_true", help="Print step inputs and outputs")
 316    work.add_argument("--dashboard", action="store_true", help="Render a dashboard snapshot after each step")
 317    work.add_argument("--limit", type=int, default=12)
 318    work.add_argument("--chars", type=int, default=4000)
 319    work.add_argument("--continue-on-error", action="store_true")
 320    work.set_defaults(func=_handler(handlers, "work"))
 321
 322    run = sub.add_parser("run")
 323    run.add_argument("job_id", nargs="*")
 324    run.add_argument("--poll-seconds", type=float, default=0.0)
 325    run.add_argument("--interval", type=float, default=2.0)
 326    run.add_argument("--limit", type=int, default=20)
 327    run.add_argument("--chars", type=int, default=180)
 328    run.add_argument("--verbose", action="store_true")
 329    run.add_argument("--paths", action="store_true")
 330    run.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
 331    run.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
 332    run.add_argument("--log-file")
 333    run.add_argument("--no-follow", action="store_true", help="Start daemon and return without tailing activity")
 334    run.set_defaults(func=_handler(handlers, "run"))
 335
 336    digest = sub.add_parser("digest")
 337    digest.add_argument("job_id", nargs="+")
 338    digest.set_defaults(func=_handler(handlers, "digest"))
 339
 340    daily_digest = sub.add_parser("daily-digest")
 341    daily_digest.add_argument("--day", help="YYYY-MM-DD. Defaults to today.")
 342    daily_digest.set_defaults(func=_handler(handlers, "daily_digest"))
 343
 344    daemon = sub.add_parser("daemon")
 345    daemon.add_argument("--once", action="store_true", help="Run at most one job step and exit")
 346    daemon.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
 347    daemon.add_argument("--poll-seconds", type=float, default=0.0)
 348    daemon.add_argument("--quiet", action="store_true", help="Do not print foreground progress lines")
 349    daemon.add_argument("--verbose", action="store_true", help="Print model-visible job state and step results")
 350    daemon.set_defaults(func=_handler(handlers, "daemon"))
 351
 352    doctor = sub.add_parser("doctor")
 353    doctor.add_argument("--check-model", action="store_true", help="Also call the local model /models endpoint")
 354    doctor.set_defaults(func=_handler(handlers, "doctor"))
 355
 356    return parser
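The subcommand wiring listed above follows one pattern: each subparser declares its arguments, then binds a handler through `set_defaults(func=...)`. The sketch below reproduces that pattern in isolation. It assumes a plain dict of callables in place of the `_handler(handlers, name)` helper, whose body is not shown in this listing; `build_parser`, `dispatch`, and `HANDLERS` are hypothetical names used only for illustration.

```python
import argparse


def build_parser(handlers):
    """Hypothetical mini-parser mirroring the subcommand wiring pattern."""
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command")

    # Aliases let "outcomes" and "outcome" reach the same subparser.
    updates = sub.add_parser("updates", aliases=["outcomes", "outcome"])
    updates.add_argument("job_id", nargs="*")
    updates.add_argument("--limit", type=int, default=5)
    updates.set_defaults(func=handlers["updates"])

    watch = sub.add_parser("watch")
    watch.add_argument("job_id", nargs="+")
    # --no-follow stores False into args.follow; set_defaults supplies True otherwise.
    watch.add_argument("--no-follow", dest="follow", action="store_false")
    watch.set_defaults(func=handlers["watch"], follow=True)
    return parser


def dispatch(argv, handlers):
    """Parse argv and call whichever handler the chosen subparser bound."""
    args = build_parser(handlers).parse_args(argv)
    return args.func(args)


# Assumed stand-in handlers that just echo the parsed values.
HANDLERS = {
    "updates": lambda args: ("updates", args.job_id, args.limit),
    "watch": lambda args: ("watch", args.job_id, args.follow),
}
```

In the listing, `watch` exposes `--no-follow` as `args.follow` (via `dest="follow"`), while `run` declares `--no-follow` with `action="store_true"`, so its value lands in `args.no_follow`. Handlers for the two commands therefore read different attributes.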
nipux_cli/planning.py 384 lines
   1"""Generic initial planning primitives for long-running jobs."""
   2
   3from __future__ import annotations
   4
   5import re
   6from typing import Any
   7
   8
   9_PROFILE_TERMS: dict[str, set[str]] = {
  10    "measured": {
  11        "accelerate",
  12        "benchmark",
  13        "compare",
  14        "decrease",
  15        "faster",
  16        "improve",
  17        "increase",
  18        "latency",
  19        "measure",
  20        "metric",
  21        "optimize",
  22        "performance",
  23        "reduce",
  24        "score",
  25        "speed",
  26        "test",
  27        "throughput",
  28    },
  29    "deliverable": {
  30        "article",
  31        "artifact",
  32        "checklist",
  33        "create",
  34        "deck",
  35        "doc",
  36        "document",
  37        "draft",
  38        "file",
  39        "generate",
  40        "guide",
  41        "manual",
  42        "memo",
  43        "outline",
  44        "paper",
  45        "produce",
  46        "presentation",
  47        "report",
  48        "spec",
  49        "template",
  50        "write",
  51    },
  52    "monitor": {
  53        "alert",
  54        "check",
  55        "observe",
  56        "periodic",
  57        "reporting",
  58        "track",
  59        "watch",
  60        "monitor",
  61    },
  62    "implementation": {
  63        "automate",
  64        "build",
  65        "change",
  66        "code",
  67        "debug",
  68        "deploy",
  69        "fix",
  70        "implement",
  71        "install",
  72        "repair",
  73        "run",
  74        "setup",
  75    },
  76    "research": {
  77        "analyze",
  78        "compare",
  79        "explore",
  80        "find",
  81        "investigate",
  82        "learn",
  83        "map",
  84        "research",
  85        "review",
  86        "summarize",
  87        "survey",
  88    },
  89}
  90_PROFILE_PRIORITY = {
  91    "measured": 0,
  92    "monitor": 1,
  93    "implementation": 2,
  94    "deliverable": 3,
  95    "research": 4,
  96}
  97
  98
  99def objective_profiles(objective: str) -> list[str]:
 100    """Infer generic work profiles from an objective without binding to a domain."""
 101
 102    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", objective.lower()))
 103    scores: list[tuple[int, str]] = []
 104    for profile, terms in _PROFILE_TERMS.items():
 105        score = len(tokens & terms)
 106        if score:
 107            scores.append((score, profile))
 108    if not scores:
 109        return ["general"]
 110    scores.sort(key=lambda item: (-item[0], _PROFILE_PRIORITY.get(item[1], 99), item[1]))
 111    profiles = [profile for _score, profile in scores[:2]]
 112    return profiles or ["general"]
 113
 114
 115def initial_plan_for_objective(objective: str) -> dict[str, Any]:
 116    objective_text = " ".join(objective.split())
 117    profiles = objective_profiles(objective_text)
 118    tasks = _initial_tasks_for_profiles(profiles)
 119    questions = _initial_questions_for_profiles(profiles)
 120    return {
 121        "status": "needs_operator_review",
 122        "summary": _initial_summary_for_profiles(profiles),
 123        "profile": profiles[0],
 124        "profiles": profiles,
 125        "tasks": tasks,
 126        "questions": questions,
 127        "objective": objective_text,
 128    }
 129
 130
 131def initial_task_contract(task_title: str) -> dict[str, str]:
 132    lowered = task_title.lower()
 133    if any(word in lowered for word in ("baseline", "benchmark", "compare", "experiment", "measure", "metric", "test")):
 134        return {
 135            "output_contract": "experiment",
 136            "acceptance_criteria": "A baseline, result, comparison, or explicit blocked measurement is recorded.",
 137            "evidence_needed": "Experiment record with metric, environment or inputs, result direction, and next action.",
 138            "stall_behavior": "Record why measurement is blocked and create the smallest follow-up task that can obtain it.",
 139        }
 140    if any(
 141        word in lowered
 142        for word in (
 143            "article",
 144            "checklist",
 145            "draft",
 146            "deliverable",
 147            "document",
 148            "file",
 149            "generate",
 150            "guide",
 151            "manual",
 152            "paper",
 153            "produce",
 154            "report",
 155            "template",
 156            "write",
 157        )
 158    ):
 159        return {
 160            "output_contract": "report",
 161            "acceptance_criteria": "A durable draft, report, or deliverable section is saved with its evidence status.",
 162            "evidence_needed": "Saved output plus cited evidence, assumptions, gaps, or review notes.",
 163            "stall_behavior": "Save the partial output, record the gap, and create the next evidence or revision task.",
 164        }
 165    if any(word in lowered for word in ("validate", "review", "decide", "criteria", "constraint", "success")):
 166        return {
 167            "output_contract": "decision",
 168            "acceptance_criteria": "The decision, validation result, or success criteria are explicit.",
 169            "evidence_needed": "Operator context, durable notes, milestone validation, or task/roadmap updates.",
 170            "stall_behavior": "Ask for the missing constraint or record a reversible assumption and continue.",
 171        }
 172    if any(word in lowered for word in ("monitor", "watch", "track", "check", "observe")):
 173        return {
 174            "output_contract": "monitor",
 175            "acceptance_criteria": "A check cadence, signal, state change, or next observation time is recorded.",
 176            "evidence_needed": "Monitor result, status update, deferred follow-up, or recorded blocker.",
 177            "stall_behavior": "Defer the job until the next useful check or pivot to a diagnostic task.",
 178        }
 179    if any(word in lowered for word in ("act", "apply", "build", "change", "deploy", "fix", "implement", "install", "run")):
 180        return {
 181            "output_contract": "action",
 182            "acceptance_criteria": "The action produces an observable durable change or a clear blocker.",
 183            "evidence_needed": "Tool result plus file, artifact, ledger, task, roadmap, or experiment update.",
 184            "stall_behavior": "Record the blocker and open a smaller follow-up action.",
 185        }
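 186    if "clarify" in lowered or "criteria" in lowered or "constraint" in lowered:  # "criteria"/"constraint" already return in the decision branch above; only "clarify" can match here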
 187        return {
 188            "output_contract": "decision",
 189            "acceptance_criteria": "Success criteria, constraints, and first branches are explicit.",
 190            "evidence_needed": "Operator context, durable notes, or an updated roadmap/task queue.",
 191            "stall_behavior": "Ask for the missing constraint or record a decision with the best current assumption.",
 192        }
 193    if "map" in lowered or "research" in lowered or "branch" in lowered or "explore" in lowered or "source" in lowered:
 194        return {
 195            "output_contract": "research",
 196            "acceptance_criteria": "At least one viable branch is selected and low-value branches are avoided.",
 197            "evidence_needed": "Source notes, branch rationale, source ledger entries, or saved research output.",
 198            "stall_behavior": "Record a low-yield lesson and pivot to a different branch.",
 199        }
 200    if "collect" in lowered or "evidence" in lowered or "save" in lowered or "output" in lowered:
 201        return {
 202            "output_contract": "artifact",
 203            "acceptance_criteria": "A durable output is saved and linked to the task or ledger.",
 204            "evidence_needed": "Artifact, file output, finding record, source record, or experiment record.",
 205            "stall_behavior": "Record what evidence is missing and create the next evidence-producing task.",
 206        }
 207    if "reflect" in lowered or "memory" in lowered or "continue" in lowered:
 208        return {
 209            "output_contract": "monitor",
 210            "acceptance_criteria": "Progress is evaluated from durable deltas and the next branch is chosen.",
 211            "evidence_needed": "Reflection, lesson, task update, roadmap validation, or experiment comparison.",
 212            "stall_behavior": "Record a blocker or pivot when no durable delta was produced.",
 213        }
 214    return {
 215        "output_contract": "action",
 216        "acceptance_criteria": "The task produces an observable durable change.",
 217        "evidence_needed": "Tool result plus artifact, ledger, task, roadmap, or experiment update.",
 218        "stall_behavior": "Record a blocker and open a smaller follow-up task.",
 219    }
 220
 221
 222def initial_roadmap_for_objective(*, title: str, objective: str) -> dict[str, Any]:
 223    profiles = objective_profiles(objective)
 224    execute_contract = _primary_execution_contract(profiles)
 225    return {
 226        "title": title,
 227        "status": "planned",
 228        "objective": objective,
 229        "scope": (
 230            "Initial roadmap generated from the objective and inferred generic work profile "
 231            f"({', '.join(profiles)}). Refine this as evidence and operator context arrive."
 232        ),
 233        "current_milestone": "Clarify and frame the work",
 234        "validation_contract": (
 235            "Each milestone needs observable evidence that its acceptance criteria were met, "
 236            "or a recorded blocker plus follow-up tasks."
 237        ),
 238        "milestones": [
 239            {
 240                "title": "Clarify and frame the work",
 241                "status": "planned",
 242                "priority": 10,
 243                "goal": "Turn the objective into concrete success criteria and constraints.",
 244                "acceptance_criteria": "Success criteria and first branches are explicit.",
 245                "evidence_needed": "Operator context, planning notes, or a recorded task queue.",
 246                "features": [{"title": "Capture success criteria", "status": "planned", "output_contract": "decision"}],
 247            },
 248            {
 249                "title": "Execute first durable branches",
 250                "status": "planned",
 251                "priority": 8,
 252                "goal": "Produce artifacts, findings, actions, or measurements that advance the objective.",
 253                "acceptance_criteria": "At least one branch produces durable evidence.",
 254                "evidence_needed": "Saved outputs, ledger updates, action results, or experiment records.",
 255                "features": [
 256                    {
 257                        "title": "Run the first evidence-producing branch",
 258                        "status": "planned",
 259                        "output_contract": execute_contract,
 260                    }
 261                ],
 262            },
 263            {
 264                "title": "Validate and continue",
 265                "status": "planned",
 266                "priority": 6,
 267                "goal": "Check results against acceptance criteria and create follow-up work.",
 268                "acceptance_criteria": "Validation is passed, failed, or blocked with a next action.",
 269                "evidence_needed": "record_milestone_validation entry and follow-up tasks if needed.",
 270                "features": [{"title": "Validate the checkpoint", "status": "planned", "output_contract": "decision"}],
 271            },
 272        ],
 273        "metadata": {"phase": "initial_plan"},
 274    }
 275
 276
 277def _initial_summary_for_profiles(profiles: list[str]) -> str:
 278    primary = profiles[0] if profiles else "general"
 279    if primary == "measured":
 280        return "I will start by defining the measurable baseline, then iterate on branches that can prove improvement."
 281    if primary == "deliverable":
 282        return "I will start by framing the deliverable, collecting evidence, and saving drafts that can be improved."
 283    if primary == "monitor":
 284        return "I will start by defining the watched signals, first check, cadence, and durable update format."
 285    if primary == "implementation":
 286        return "I will start by inspecting the current state, planning a small action, and validating the result."
 287    if primary == "research":
 288        return "I will start by mapping source branches, collecting evidence, and saving concise findings."
 289    return "I will turn this objective into a durable long-running job before starting tool work."
 290
 291
 292def _initial_tasks_for_profiles(profiles: list[str]) -> list[str]:
 293    tasks: list[str] = ["Clarify success criteria, constraints, and review/report cadence."]
 294    primary = profiles[0] if profiles else "general"
 295    if primary == "measured":
 296        tasks.extend(
 297            [
 298                "Record the baseline metric and measurement method.",
 299                "Run the first measurable branch and record an experiment.",
 300                "Compare the result with the best known baseline and choose the next branch.",
 301            ]
 302        )
 303    elif primary == "deliverable":
 304        tasks.extend(
 305            [
 306                "Map the outline, audience, evidence gaps, and acceptance criteria.",
 307                "Collect evidence for the first section or deliverable unit.",
 308                "Save a durable draft or report checkpoint.",
 309                "Review and revise the latest draft against evidence gaps and acceptance criteria.",
 310            ]
 311        )
 312    elif primary == "monitor":
 313        tasks.extend(
 314            [
 315                "Define watched signals, check cadence, and alert conditions.",
 316                "Run the first status check and save the observation.",
 317                "Defer or continue based on the next useful check time.",
 318            ]
 319        )
 320    elif primary == "implementation":
 321        tasks.extend(
 322            [
 323                "Inspect current state and identify the smallest safe action.",
 324                "Apply one change or execute one action with observable output.",
 325                "Validate the result and record any follow-up branch.",
 326            ]
 327        )
 328    else:
 329        tasks.extend(
 330            [
 331                "Map the first research or execution branches.",
 332                "Collect evidence and save outputs as files.",
 333                "Reflect on what worked, update memory, and continue with the next branch.",
 334            ]
 335        )
 336    return tasks
 337
 338
 339def _initial_questions_for_profiles(profiles: list[str]) -> list[str]:
 340    questions = [
 341        "What result would make this job successful?",
 342        "Are there constraints, risks, or approaches I should avoid?",
 343    ]
 344    primary = profiles[0] if profiles else "general"
 345    if primary == "measured":
 346        questions.insert(1, "What metric should be treated as the primary measure of progress?")
 347    elif primary == "deliverable":
 348        questions.insert(1, "Who is the audience, and what quality bar should the deliverable meet?")
 349    elif primary == "monitor":
 350        questions.insert(1, "How often should I check, and what change should trigger a report?")
 351    elif primary == "implementation":
 352        questions.insert(1, "Which environment or files are in scope, and what requires approval?")
 353    else:
 354        questions.insert(1, "Which sources, artifacts, or signals should I trust most?")
 355    questions.append("Should this run aggressively in the background or wait for review between branches?")
 356    return questions
 357
 358
 359def _primary_execution_contract(profiles: list[str]) -> str:
 360    if "measured" in profiles:
 361        return "experiment"
 362    if "deliverable" in profiles:
 363        return "report"
 364    if "monitor" in profiles:
 365        return "monitor"
 366    if "implementation" in profiles:
 367        return "action"
 368    if "research" in profiles:
 369        return "research"
 370    return "artifact"
 371
 372
 373def format_initial_plan(plan: dict[str, Any]) -> str:
 374    tasks = plan.get("tasks") if isinstance(plan.get("tasks"), list) else []
 375    questions = plan.get("questions") if isinstance(plan.get("questions"), list) else []
 376    lines = [str(plan.get("summary") or "Initial plan created.")]
 377    if tasks:
 378        lines.append("Plan:")
 379        lines.extend(f"- {task}" for task in tasks)
 380    if questions:
 381        lines.append("Questions:")
 382        lines.extend(f"- {question}" for question in questions)
 383    lines.append("Reply with answers, or use the right-side Run control when this plan is good enough to start.")
 384    return "\n".join(lines)
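Two matching styles appear in the planning module above: `objective_profiles` intersects whole tokens with the term sets, while `initial_task_contract` uses substring checks on the lowered title. The sketch below shows how the two behave, under stated assumptions: the term tables are reduced copies, and `rank_profiles`, `matches_substring`, and `matches_token` are hypothetical names, not functions from the module.

```python
import re

# Reduced copies of the term tables, for illustration only.
PROFILE_TERMS = {
    "measured": {"benchmark", "compare", "latency", "reduce"},
    "deliverable": {"report", "write"},
    "research": {"compare", "survey"},
}
PROFILE_PRIORITY = {"measured": 0, "deliverable": 3, "research": 4}

MEASURE_WORDS = ("baseline", "benchmark", "compare", "experiment", "measure", "metric", "test")


def rank_profiles(objective: str) -> list[str]:
    """Score profiles by exact token overlap; ties fall back to priority, then name."""
    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", objective.lower()))
    scored = [(len(tokens & terms), name) for name, terms in PROFILE_TERMS.items()]
    scored = [item for item in scored if item[0]]
    if not scored:
        return ["general"]
    scored.sort(key=lambda item: (-item[0], PROFILE_PRIORITY.get(item[1], 99), item[1]))
    return [name for _score, name in scored[:2]]


def matches_substring(title: str) -> bool:
    """Substring check, in the style of the first branch of initial_task_contract."""
    lowered = title.lower()
    return any(word in lowered for word in MEASURE_WORDS)


def matches_token(title: str) -> bool:
    """Hypothetical stricter variant: compare whole tokens instead of substrings."""
    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", title.lower()))
    return bool(tokens & set(MEASURE_WORDS))
```

Token matching does no stemming, so "Reduced latencies" scores nothing and falls back to `general`. Substring matching errs the other way: "latest" contains "test", so a title such as "Review and revise the latest draft" satisfies the measurement branch before the report branch is consulted.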
nipux_cli/progress.py 213 lines
   1"""Generic progress summaries for long-running jobs."""
   2
   3from __future__ import annotations
   4
   5from dataclasses import dataclass
   6from typing import Any
   7
   8
   9@dataclass(frozen=True)
  10class ProgressCheckpoint:
  11    message: str
  12    category: str
  13    counts: dict[str, int]
  14    deltas: dict[str, int]
  15    updates: dict[str, int]
  16    resolutions: dict[str, int]
  17    recent: str
  18
  19
  20LEDGER_KEYS = ("findings", "sources", "tasks", "experiments", "lessons", "milestones")
  21
  22
  23def build_progress_checkpoint(
  24    metadata: dict[str, Any],
  25    *,
  26    previous_counts: dict[str, Any] | None = None,
  27    step_no: int,
  28    tool_name: str | None,
  29    artifact_id: str = "",
  30    is_finding_output: bool = False,
  31) -> ProgressCheckpoint:
  32    """Create the operator-facing checkpoint text from durable ledger deltas."""
  33    counts = ledger_counts(metadata)
  34    previous = previous_counts or {}
  35    deltas = {key: counts[key] - _as_int(previous.get(key)) for key in LEDGER_KEYS}
  36    updates = ledger_update_counts(metadata, since=str(metadata.get("last_checkpoint_at") or ""))
  37    resolutions = ledger_resolution_counts(metadata, since=str(metadata.get("last_checkpoint_at") or ""))
  38    recent = recent_progress_bits(metadata)
  39    if is_finding_output:
  40        message = (
  41            f"Saved output {artifact_id}; ledgers now have {counts['findings']} findings, "
  42            f"{counts['sources']} sources, {counts['tasks']} tasks, and {counts['experiments']} experiments."
  43        )
  44        category = "finding"
  45    else:
  46        changed_parts = [_count_phrase(value, key, prefix="+") for key, value in deltas.items() if value > 0]
  47        changed_parts.extend(
  48            _count_phrase(value, key, prefix="~", suffix="updated") for key, value in updates.items() if value > 0
  49        )
  50        changed_parts.extend(
  51            _count_phrase(value, key, suffix="resolved") for key, value in resolutions.items() if value > 0
  52        )
  53        changed = ", ".join(changed_parts)
  54        made_progress = bool(changed)
  55        if not changed:
  56            changed = "no new durable ledger entries"
  57        message = (
  58            f"Checkpoint step #{step_no}: {changed}. Totals: {counts['findings']} findings, "
  59            f"{counts['sources']} sources, {counts['tasks']} tasks, {counts['experiments']} experiments, "
  60            f"{counts['lessons']} lessons."
  61        )
  62        category = "progress" if made_progress else "activity"
  63    if recent:
  64        message = f"{message} Recent: {recent}."
  65    return ProgressCheckpoint(
  66        message=message,
  67        category=category,
  68        counts=counts,
  69        deltas=deltas,
  70        updates=updates,
  71        resolutions=resolutions,
  72        recent=recent,
  73    )
  74
  75
  76def ledger_counts(metadata: dict[str, Any]) -> dict[str, int]:
  77    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
  78    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
  79    return {
  80        "findings": len(_metadata_list(metadata, "finding_ledger")),
  81        "sources": len(_metadata_list(metadata, "source_ledger")),
  82        "tasks": len(_metadata_list(metadata, "task_queue")),
  83        "experiments": len(_metadata_list(metadata, "experiment_ledger")),
  84        "lessons": len(_metadata_list(metadata, "lessons")),
  85        "milestones": len(milestones),
  86    }
  87
  88
  89def ledger_update_counts(metadata: dict[str, Any], *, since: str = "") -> dict[str, int]:
  90    """Count durable ledger updates that do not increase ledger size."""
  91    counts = {key: 0 for key in LEDGER_KEYS}
  92    record_map = {
  93        "findings": "last_finding_record",
  94        "sources": "last_source_record",
  95        "tasks": "last_task_record",
  96        "experiments": "last_experiment_record",
  97    }
  98    for key, metadata_key in record_map.items():
  99        record = metadata.get(metadata_key)
 100        if _updated_existing_record(record, since=since):
 101            counts[key] += 1
 102    roadmap = metadata.get("last_roadmap_record")
 103    if isinstance(roadmap, dict) and _record_after_checkpoint(roadmap, since=since):
 104        updated = _as_int(roadmap.get("updated_milestones")) + _as_int(roadmap.get("updated_features"))
 105        if roadmap.get("roadmap_updated"):
 106            updated += 1
 107        added = _as_int(roadmap.get("added_milestones")) + _as_int(roadmap.get("added_features"))
 108        if updated > 0 and added <= 0:
 109            counts["milestones"] += 1
 110    validation = metadata.get("last_milestone_validation")
 111    if isinstance(validation, dict) and _record_after_checkpoint(validation, since=since):
 112        counts["milestones"] += 1
 113    return counts
 114
 115
 116def ledger_resolution_counts(metadata: dict[str, Any], *, since: str = "") -> dict[str, int]:
 117    """Count durable branch resolutions so task updates do not look like empty churn."""
 118    counts = {key: 0 for key in LEDGER_KEYS}
 119    task = metadata.get("last_task_record")
 120    if _updated_existing_record(task, since=since):
 121        status = str(task.get("status") or "").lower() if isinstance(task, dict) else ""
 122        if status in {"done", "blocked", "skipped"} and (task.get("result") or task.get("evidence_needed")):
 123            counts["tasks"] += 1
 124    experiment = metadata.get("last_experiment_record")
 125    if _updated_existing_record(experiment, since=since):
 126        status = str(experiment.get("status") or "").lower() if isinstance(experiment, dict) else ""
 127        if status in {"measured", "failed", "blocked", "skipped"} or experiment.get("metric_value") is not None:
 128            counts["experiments"] += 1
 129    validation = metadata.get("last_milestone_validation")
 130    if isinstance(validation, dict) and _record_after_checkpoint(validation, since=since):
 131        counts["milestones"] += 1
 132    return counts
 133
 134
 135def recent_progress_bits(metadata: dict[str, Any]) -> str:
 136    bits: list[str] = []
 137    findings = _metadata_list(metadata, "finding_ledger")
 138    if findings:
 139        finding = findings[-1]
 140        bits.append(f"finding={_clip_text(str(finding.get('name') or finding.get('title') or 'finding'), 80)}")
 141    active_tasks = [
 142        task
 143        for task in _metadata_list(metadata, "task_queue")
 144        if str(task.get("status") or "open").lower() in {"active", "open", "blocked"}
 145    ]
 146    if active_tasks:
 147        task = sorted(active_tasks, key=lambda entry: -_as_int(entry.get("priority")))[0]
 148        bits.append(f"task={_clip_text(str(task.get('title') or 'task'), 80)}")
 149    measured = [
 150        experiment
 151        for experiment in _metadata_list(metadata, "experiment_ledger")
 152        if experiment.get("metric_value") is not None
 153    ]
 154    if measured:
 155        experiment = measured[-1]
 156        metric = f"{experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
 157        bits.append(f"measurement={_clip_text(metric, 80)}")
 158    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 159    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 160    active_milestones = [
 161        milestone
 162        for milestone in milestones
 163        if isinstance(milestone, dict)
 164        and str(milestone.get("status") or "planned").lower() in {"active", "validating", "blocked"}
 165    ]
 166    if active_milestones:
 167        bits.append(f"milestone={_clip_text(str(active_milestones[-1].get('title') or 'milestone'), 80)}")
 168    return "; ".join(bits)
 169
 170
 171def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
 172    value = metadata.get(key)
 173    if not isinstance(value, list):
 174        return []
 175    return [item for item in value if isinstance(item, dict)]
 176
 177
 178def _clip_text(value: str, limit: int) -> str:
 179    text = " ".join(value.split())
 180    if len(text) <= limit:
 181        return text
 182    return text[: max(0, limit - 1)].rstrip() + "..."
 183
 184
 185def _updated_existing_record(record: Any, *, since: str) -> bool:
 186    return (
 187        isinstance(record, dict)
 188        and record.get("created") is False
 189        and record.get("substantive_update") is not False
 190        and _record_after_checkpoint(record, since=since)
 191    )
 192
 193
 194def _record_after_checkpoint(record: dict[str, Any], *, since: str) -> bool:
 195    if not since:
 196        return True
 197    updated_at = str(record.get("updated_at") or record.get("validated_at") or record.get("last_seen") or record.get("at") or "")
 198    return bool(updated_at and updated_at > since)
 199
 200
 201def _count_phrase(value: int, key: str, *, prefix: str = "", suffix: str = "") -> str:
 202    label = key[:-1] if value == 1 and key.endswith("s") else key
 203    bits = [f"{prefix}{value} {label}"]
 204    if suffix:
 205        bits.append(suffix)
 206    return " ".join(bits)
 207
 208
 209def _as_int(value: Any) -> int:
 210    try:
 211        return int(value)
 212    except (TypeError, ValueError):
 213        return 0
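The checkpoint logic above reduces to two small ideas: per-ledger deltas against the previous counts, and a plain string comparison of timestamps against `last_checkpoint_at`. The sketch below isolates both. It is a simplified illustration, not the module's code: `changed_summary` covers only the `+N` delta phrases (the listing also appends updated and resolved phrases), and the function names are hypothetical.

```python
LEDGER_KEYS = ("findings", "sources", "tasks", "experiments", "lessons", "milestones")


def as_int(value):
    """Coerce loosely typed stored counts; anything unparsable counts as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def count_phrase(value, key, prefix="", suffix=""):
    """Singularize the ledger name when the count is exactly 1."""
    label = key[:-1] if value == 1 and key.endswith("s") else key
    return " ".join(part for part in (f"{prefix}{value} {label}", suffix) if part)


def changed_summary(counts, previous):
    """Describe only the ledgers that grew since the previous checkpoint."""
    deltas = {key: counts[key] - as_int(previous.get(key)) for key in LEDGER_KEYS}
    parts = [count_phrase(value, key, prefix="+") for key, value in deltas.items() if value > 0]
    return ", ".join(parts) or "no new durable ledger entries"


def after_checkpoint(updated_at: str, since: str) -> bool:
    """String comparison: chronological only if both values share one sortable format."""
    if not since:
        return True
    return bool(updated_at and updated_at > since)
```

Because the comparison is lexical, it matches chronological order only when every record timestamp uses the same fixed-width format and timezone convention (for example ISO 8601 in UTC). Whether the stored timestamps satisfy that is not visible in this listing.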
nipux_cli/provider_errors.py 64 lines
   1"""Generic model-provider error classification."""
   2
   3from __future__ import annotations
   4
   5import json
   6from typing import Any
   7
   8
   9PROVIDER_ACTION_MARKERS = (
  10    "authenticationerror",
  11    "permissiondeniederror",
  12    "authentication failed",
  13    "permission denied",
  14    "invalid api key",
  15    "incorrect api key",
  16    "user not found",
  17    "key limit exceeded",
  18    "insufficient_quota",
  19    "insufficient quota",
  20    "insufficient credits",
  21    "billing",
  22    "payment required",
  23    "credit limit",
  24    "quota exceeded",
  25    "401",
  26    "403",
  27)
  28
  29RATE_LIMIT_MARKERS = (
  30    "429",
  31    "rate limit",
  32    "ratelimit",
  33    "too many requests",
  34    "temporarily over capacity",
  35)
  36
  37PROVIDER_ACTION_REQUIRED_NOTE = (
  38    "Model provider requires operator action: authentication, permission, billing, or quota is blocking calls. "
  39    "Paused this job so the daemon does not repeat failing model requests. Update credentials/model access, then resume."
  40)
  41
  42
  43def provider_error_text(error: Any) -> str:
  44    if isinstance(error, str):
  45        return error.lower()
  46    parts = [type(error).__name__, str(error)]
  47    payload = getattr(error, "payload", None)
  48    if isinstance(payload, dict) and payload:
  49        parts.append(json.dumps(payload, ensure_ascii=False, default=str))
  50    return " ".join(parts).lower()
  51
  52
  53def provider_action_required(text_or_error: Any) -> bool:
  54    text = provider_error_text(text_or_error)
  55    return any(marker in text for marker in PROVIDER_ACTION_MARKERS)
  56
  57
  58def provider_action_required_note(text_or_error: Any) -> str:
  59    return PROVIDER_ACTION_REQUIRED_NOTE if provider_action_required(text_or_error) else ""
  60
  61
  62def provider_rate_limited(text_or_error: Any) -> bool:
  63    text = provider_error_text(text_or_error)
  64    return any(marker in text for marker in RATE_LIMIT_MARKERS)
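The classifier above flattens an error into lowercase text (type name, message, and any dict `payload`) and then looks for marker substrings. The sketch below reproduces that flow with shortened marker tuples. `FakeProviderError` and `classify` are hypothetical: the listing exposes two independent predicates, and checking action markers before rate-limit markers is an assumption made here for the example.

```python
import json

# Shortened marker tuples, for illustration only.
ACTION_MARKERS = ("invalid api key", "insufficient_quota", "401", "403")
RATE_MARKERS = ("429", "rate limit", "too many requests")


class FakeProviderError(Exception):
    """Hypothetical error type carrying a JSON-like payload attribute."""

    def __init__(self, message, payload=None):
        super().__init__(message)
        self.payload = payload


def error_text(error):
    """Flatten a string or exception into one lowercase searchable string."""
    if isinstance(error, str):
        return error.lower()
    parts = [type(error).__name__, str(error)]
    payload = getattr(error, "payload", None)
    if isinstance(payload, dict) and payload:
        parts.append(json.dumps(payload, ensure_ascii=False, default=str))
    return " ".join(parts).lower()


def classify(error):
    """Assumed ordering: operator-action markers win over rate-limit markers."""
    text = error_text(error)
    if any(marker in text for marker in ACTION_MARKERS):
        return "action_required"
    if any(marker in text for marker in RATE_MARKERS):
        return "rate_limited"
    return "other"
```

Bare numeric markers such as "401" match any text that contains those digits, so a message like "request id 14015 timed out" is classified as requiring operator action. Callers that pause jobs on this signal may want to keep that in mind.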
nipux_cli/record_commands.py 542 lines
   1"""Read-only CLI commands for job records, ledgers, memory, and usage."""
   2
   3from __future__ import annotations
   4
   5import json
   6from dataclasses import dataclass
   7from pathlib import Path
   8from typing import Any, Callable
   9
  10from nipux_cli.artifacts import ArtifactStore, sha256_text
  11from nipux_cli.cli_render import job_ref_text as default_job_ref_text
  12from nipux_cli.cli_render import json_default, rule
  13from nipux_cli.daemon import daemon_lock_status
  14from nipux_cli.memory_graph import memory_graph_from_job
  15from nipux_cli.memory_graph_view import render_memory_graph_html
  16from nipux_cli.tui_status import active_operator_messages, worker_label
  17from nipux_cli.tui_style import _one_line
  18from nipux_cli.usage import format_usage_report
  19
  20
  21@dataclass(frozen=True)
  22class RecordCommandDeps:
  23    db_factory: Callable[[], tuple[Any, Any]]
  24    resolve_job_id: Callable[[Any, Any], str | None]
  25    job_ref_text: Callable[[Any], str] = default_job_ref_text
  26
  27
  28def cmd_findings_impl(args: Any, deps: RecordCommandDeps) -> None:
  29    db, _ = deps.db_factory()
  30    try:
  31        job_id = _resolve_or_print(db, args, deps)
  32        if not job_id:
  33            return
  34        job = db.get_job(job_id)
  35        findings = _metadata_records(job, "finding_ledger")
  36        if args.json:
  37            print(json.dumps(findings, ensure_ascii=False, indent=2, default=json_default))
  38            return
  39        print(f"findings {job['title']} | {len(findings)} unique")
  40        print(rule("="))
  41        if not findings:
  42            print("none yet")
  43            return
  44        ranked = sorted(findings, key=lambda finding: float(finding.get("score") or 0), reverse=True)
  45        for index, finding in enumerate(ranked[: args.limit], start=1):
  46            score = finding.get("score")
  47            score_text = f" score={score:g}" if isinstance(score, (int, float)) else ""
  48            print(f"{index:>2}. {_one_line(finding.get('name') or 'unknown', 54)}{score_text}")
  49            details = " | ".join(
  50                value
  51                for value in [
  52                    str(finding.get("location") or "").strip(),
  53                    str(finding.get("category") or "").strip(),
  54                    str(finding.get("status") or "").strip(),
  55                ]
  56                if value
  57            )
  58            if details:
  59                print(f"    {details}")
  60            if finding.get("url") or finding.get("source_url"):
  61                print(f"    {finding.get('url') or finding.get('source_url')}")
  62            if finding.get("reason"):
  63                print(f"    {_one_line(finding['reason'], args.chars)}")
  64    finally:
  65        db.close()
  66
  67
  68def cmd_tasks_impl(args: Any, deps: RecordCommandDeps) -> None:
  69    db, _ = deps.db_factory()
  70    try:
  71        job_id = _resolve_or_print(db, args, deps)
  72        if not job_id:
  73            return
  74        job = db.get_job(job_id)
  75        tasks = _metadata_records(job, "task_queue")
  76        if args.status:
  77            wanted = {status.strip().lower() for status in args.status}
  78            tasks = [task for task in tasks if str(task.get("status") or "open").lower() in wanted]
  79        if args.json:
  80            print(json.dumps(tasks, ensure_ascii=False, indent=2, default=json_default))
  81            return
  82        status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
  83        ranked = sorted(
  84            tasks,
  85            key=lambda task: (
  86                status_order.get(str(task.get("status") or "open"), 9),
  87                -int(task.get("priority") or 0),
  88                str(task.get("title") or ""),
  89            ),
  90        )
  91        print(f"tasks {job['title']} | {len(ranked)} tracked")
  92        print(rule("="))
  93        if not ranked:
  94            print("none yet")
  95            return
  96        for index, task in enumerate(ranked[: args.limit], start=1):
  97            status = str(task.get("status") or "open")
  98            priority = int(task.get("priority") or 0)
  99            print(f"{index:>2}. {status:<7} p={priority:<3} {_one_line(task.get('title') or 'untitled', 54)}")
 100            details = " | ".join(
 101                value
 102                for value in [
 103                    f"contract={task.get('output_contract')}" if task.get("output_contract") else "",
 104                    f"accept={task.get('acceptance_criteria')}" if task.get("acceptance_criteria") else "",
 105                    f"evidence={task.get('evidence_needed')}" if task.get("evidence_needed") else "",
 106                    f"stall={task.get('stall_behavior')}" if task.get("stall_behavior") else "",
 107                    str(task.get("goal") or "").strip(),
 108                    str(task.get("source_hint") or "").strip(),
 109                    str(task.get("result") or "").strip(),
 110                ]
 111                if value
 112            )
 113            if details:
 114                print(f"    {_one_line(details, args.chars)}")
 115    finally:
 116        db.close()
 117
 118
 119def cmd_roadmap_impl(args: Any, deps: RecordCommandDeps) -> None:
 120    db, _ = deps.db_factory()
 121    try:
 122        job_id = _resolve_or_print(db, args, deps)
 123        if not job_id:
 124            return
 125        job = db.get_job(job_id)
 126        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 127        roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 128        if args.json:
 129            print(json.dumps(roadmap, ensure_ascii=False, indent=2, default=json_default))
 130            return
 131        print(f"roadmap {job['title']}")
 132        print(rule("="))
 133        if not roadmap:
 134            print("none yet")
 135            print("the worker can create one with record_roadmap when broad work needs milestones")
 136            return
 137        milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 138        print(f"title: {roadmap.get('title') or 'Roadmap'}")
 139        print(f"status: {roadmap.get('status') or 'planned'} | milestones: {len(milestones)}")
 140        if roadmap.get("current_milestone"):
 141            print(f"current: {_one_line(roadmap.get('current_milestone') or '', args.chars)}")
 142        if roadmap.get("scope"):
 143            print(f"scope: {_one_line(roadmap.get('scope') or '', args.chars)}")
 144        if roadmap.get("validation_contract"):
 145            print(f"validation: {_one_line(roadmap.get('validation_contract') or '', args.chars)}")
 146        _print_milestones(milestones, limit=args.limit, features=args.features, chars=args.chars)
 147    finally:
 148        db.close()
 149
 150
 151def cmd_experiments_impl(args: Any, deps: RecordCommandDeps) -> None:
 152    db, _ = deps.db_factory()
 153    try:
 154        job_id = _resolve_or_print(db, args, deps)
 155        if not job_id:
 156            return
 157        job = db.get_job(job_id)
 158        experiments = _metadata_records(job, "experiment_ledger")
 159        if args.status:
 160            wanted = {status.strip().lower() for status in args.status}
 161            experiments = [
 162                experiment for experiment in experiments if str(experiment.get("status") or "planned").lower() in wanted
 163            ]
 164        if args.json:
 165            print(json.dumps(experiments, ensure_ascii=False, indent=2, default=json_default))
 166            return
 167        status_order = {"running": 0, "planned": 1, "measured": 2, "blocked": 3, "failed": 4, "skipped": 5}
 168        ranked = sorted(
 169            experiments,
 170            key=lambda experiment: (
 171                not bool(experiment.get("best_observed")),
 172                status_order.get(str(experiment.get("status") or "planned"), 9),
 173                str(experiment.get("updated_at") or experiment.get("created_at") or ""),
 174            ),
 175        )
 176        print(f"experiments {job['title']} | {len(ranked)} tracked")
 177        print(rule("="))
 178        if not ranked:
 179            print("none yet")
 180            return
 181        for index, experiment in enumerate(ranked[: args.limit], start=1):
 182            status = str(experiment.get("status") or "planned")
 183            best = " *best*" if experiment.get("best_observed") else ""
 184            metric = ""
 185            if experiment.get("metric_value") is not None:
 186                metric = (
 187                    f" {experiment.get('metric_name') or 'metric'}="
 188                    f"{experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
 189                )
 190            print(f"{index:>2}. {status:<8} {_one_line(experiment.get('title') or 'experiment', 54)}{metric}{best}")
 191            details = " | ".join(
 192                value
 193                for value in [
 194                    str(experiment.get("result") or "").strip(),
 195                    f"next: {experiment.get('next_action')}" if experiment.get("next_action") else "",
 196                    f"delta: {experiment.get('delta_from_previous_best')}"
 197                    if experiment.get("delta_from_previous_best") is not None
 198                    else "",
 199                ]
 200                if value
 201            )
 202            if details:
 203                print(f"    {_one_line(details, args.chars)}")
 204    finally:
 205        db.close()
 206
 207
 208def cmd_sources_impl(args: Any, deps: RecordCommandDeps) -> None:
 209    db, _ = deps.db_factory()
 210    try:
 211        job_id = _resolve_or_print(db, args, deps)
 212        if not job_id:
 213            return
 214        job = db.get_job(job_id)
 215        sources = _metadata_records(job, "source_ledger")
 216        if args.json:
 217            print(json.dumps(sources, ensure_ascii=False, indent=2, default=json_default))
 218            return
 219        ranked = sorted(
 220            sources,
 221            key=lambda source: (float(source.get("usefulness_score") or 0), int(source.get("yield_count") or 0)),
 222            reverse=True,
 223        )
 224        print(f"sources {job['title']} | {len(sources)} scored")
 225        print(rule("="))
 226        if not ranked:
 227            print("none yet")
 228            return
 229        for index, source in enumerate(ranked[: args.limit], start=1):
 230            score = float(source.get("usefulness_score") or 0)
 231            print(
 232                f"{index:>2}. {_one_line(source.get('source') or 'unknown', 58)} "
 233                f"score={score:g} findings={source.get('yield_count') or 0} fails={source.get('fail_count') or 0}"
 234            )
 235            detail = " | ".join(
 236                value
 237                for value in [
 238                    str(source.get("source_type") or "").strip(),
 239                    str(source.get("last_outcome") or "").strip(),
 240                ]
 241                if value
 242            )
 243            if detail:
 244                print(f"    {_one_line(detail, args.chars)}")
 245            warnings = source.get("warnings") if isinstance(source.get("warnings"), list) else []
 246            if warnings:
 247                print(f"    warnings: {_one_line(', '.join(str(item) for item in warnings[-3:]), args.chars)}")
 248    finally:
 249        db.close()
 250
 251
 252def cmd_memory_impl(args: Any, deps: RecordCommandDeps) -> None:
 253    db, config = deps.db_factory()
 254    try:
 255        job_id = _resolve_or_print(db, args, deps)
 256        if not job_id:
 257            return
 258        job = db.get_job(job_id)
 259        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 260        lessons = _metadata_records(job, "lessons")
 261        reflections = _metadata_records(job, "reflections")
 262        compact = db.list_memory(job_id)
 263        active_operator = active_operator_messages(metadata)
 264        graph = memory_graph_from_job(job)
 265        if bool(getattr(args, "json", False)):
 266            print(json.dumps(graph, ensure_ascii=False, indent=2, default=json_default))
 267            return
 268        if bool(getattr(args, "graph", False)):
 269            _write_memory_graph_view(db=db, config=config, job=job, graph=graph, output=getattr(args, "output", None))
 270            return
 271        pending_measurement = (
 272            metadata.get("pending_measurement_obligation")
 273            if isinstance(metadata.get("pending_measurement_obligation"), dict)
 274            else {}
 275        )
 276        print(f"memory {job['title']}")
 277        print(rule("="))
 278        print(
 279            f"lessons={len(lessons)} reflections={len(reflections)} compact_entries={len(compact)} "
 280            f"graph_nodes={len(graph['nodes'])} graph_edges={len(graph['edges'])}"
 281        )
 282        _print_memory_sections(
 283            active_operator=active_operator,
 284            pending_measurement=pending_measurement,
 285            graph=graph,
 286            reflections=reflections,
 287            lessons=lessons,
 288            compact=compact,
 289            limit=args.limit,
 290            chars=args.chars,
 291        )
 292    finally:
 293        db.close()
 294
 295
 296def _write_memory_graph_view(
 297    *,
 298    db: Any,
 299    config: Any,
 300    job: dict[str, Any],
 301    graph: dict[str, Any],
 302    output: str | None,
 303) -> None:
 304    html = render_memory_graph_html(job)
 305    summary = f"Clickable memory graph with {len(graph['nodes'])} nodes and {len(graph['edges'])} links."
 306    metadata = {
 307        "memory_graph": True,
 308        "node_count": len(graph["nodes"]),
 309        "edge_count": len(graph["edges"]),
 310    }
 311    if output:
 312        path = Path(output).expanduser()
 313        path.parent.mkdir(parents=True, exist_ok=True)
 314        path.write_text(html, encoding="utf-8")
 315        artifact_id = db.add_artifact(
 316            job_id=str(job["id"]),
 317            path=path,
 318            sha256=sha256_text(html),
 319            artifact_type="html",
 320            title="Memory Graph",
 321            summary=summary,
 322            metadata=metadata,
 323        )
 324        print(f"memory graph written: {path}")
 325        print(f"artifact: {artifact_id}")
 326        return
 327    store = ArtifactStore(config.runtime.home, db)
 328    stored = store.write_text(
 329        job_id=str(job["id"]),
 330        content=html,
 331        title="Memory Graph",
 332        summary=summary,
 333        artifact_type="html",
 334        metadata=metadata,
 335    )
 336    print(f"memory graph written: {stored.path}")
 337    print(f"artifact: {stored.id}")
 338
 339
 340def cmd_metrics_impl(args: Any, deps: RecordCommandDeps) -> None:
 341    db, config = deps.db_factory()
 342    try:
 343        job_id = _resolve_or_print(db, args, deps)
 344        if not job_id:
 345            return
 346        job = db.get_job(job_id)
 347        steps = db.list_steps(job_id=job_id)
 348        artifacts = db.list_artifacts(job_id, limit=1000)
 349        findings = _metadata_records(job, "finding_ledger")
 350        sources = _metadata_records(job, "source_ledger")
 351        tasks = _metadata_records(job, "task_queue")
 352        experiments = _metadata_records(job, "experiment_ledger")
 353        lessons = _metadata_records(job, "lessons")
 354        reflections = _metadata_records(job, "reflections")
 355        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 356        roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 357        milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 358        daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
 359        finding_batches = [
 360            artifact
 361            for artifact in artifacts
 362            if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()
 363        ]
 364        blocked = [step for step in steps if step.get("status") == "blocked"]
 365        failed = [step for step in steps if step.get("status") == "failed"]
 366        print(f"metrics {job['title']}")
 367        print(rule("="))
 368        print(f"daemon: {'running' if daemon['running'] else 'stopped'} | worker: {worker_label(job, bool(daemon['running']))}")
 369        print(f"steps: {_step_count(steps)} | failed: {len(failed)} | blocked/recovered: {len(blocked)}")
 370        print(f"artifacts: {len(artifacts)} | finding_batches: {len(finding_batches)}")
 371        print(
 372            f"findings: {len(findings)} | sources: {len(sources)} | tasks: {len(tasks)} | "
 373            f"milestones: {len(milestones)} | experiments: {len(experiments)} | "
 374            f"lessons: {len(lessons)} | reflections: {len(reflections)}"
 375        )
 376        _print_best_records(sources=sources, findings=findings, experiments=experiments, chars=args.chars)
 377    finally:
 378        db.close()
 379
 380
 381def cmd_usage_impl(args: Any, deps: RecordCommandDeps) -> None:
 382    db, config = deps.db_factory()
 383    try:
 384        job_id = _resolve_or_print(db, args, deps)
 385        if not job_id:
 386            return
 387        job = db.get_job(job_id)
 388        usage = db.job_token_usage(job_id)
 389        usage["input_cost_per_million"] = config.model.input_cost_per_million
 390        usage["output_cost_per_million"] = config.model.output_cost_per_million
 391        usage["max_job_cost_usd"] = config.runtime.max_job_cost_usd
 392        if args.json:
 393            print(json.dumps(usage, ensure_ascii=False, indent=2, sort_keys=True))
 394            return
 395        lines = format_usage_report(
 396            title=str(job.get("title") or job_id),
 397            usage=usage,
 398            context_length=int(config.model.context_length or 0),
 399            model=str(config.model.model),
 400            base_url=str(config.model.base_url),
 401        )
 402        print("\n".join(lines))
 403    finally:
 404        db.close()
 405
 406
 407def _resolve_or_print(db: Any, args: Any, deps: RecordCommandDeps) -> str | None:
 408    job_id = deps.resolve_job_id(db, args.job_id)
 409    if job_id:
 410        return job_id
 411    ref = deps.job_ref_text(args.job_id)
 412    print(f"No job matched: {ref}" if ref else "No jobs found.")
 413    return None
 414
 415
 416def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
 417    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 418    values = metadata.get(key)
 419    if not isinstance(values, list):
 420        return []
 421    return [value for value in values if isinstance(value, dict)]
 422
 423
 424def _print_milestones(milestones: list[Any], *, limit: int, features: int, chars: int) -> None:
 425    if not milestones:
 426        return
 427    print()
 428    status_order = {"active": 0, "validating": 1, "planned": 2, "blocked": 3, "done": 4, "skipped": 5}
 429    ranked = sorted(
 430        [milestone for milestone in milestones if isinstance(milestone, dict)],
 431        key=lambda milestone: (
 432            status_order.get(str(milestone.get("status") or "planned"), 9),
 433            -int(milestone.get("priority") or 0),
 434            str(milestone.get("title") or ""),
 435        ),
 436    )
 437    for index, milestone in enumerate(ranked[:limit], start=1):
 438        status = str(milestone.get("status") or "planned")
 439        validation = str(milestone.get("validation_status") or "not_started")
 440        milestone_features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
 441        open_features = sum(
 442            1 for feature in milestone_features
 443            if isinstance(feature, dict) and str(feature.get("status") or "planned") in {"planned", "active"}
 444        )
 445        print(
 446            f"{index:>2}. {status:<10} validation={validation:<11} "
 447            f"p={int(milestone.get('priority') or 0):<3} {_one_line(milestone.get('title') or 'milestone', 54)}"
 448        )
 449        details = " | ".join(
 450            value
 451            for value in [
 452                f"features={len(milestone_features)}/{open_features} open" if milestone_features else "",
 453                f"accept={milestone.get('acceptance_criteria')}" if milestone.get("acceptance_criteria") else "",
 454                f"evidence={milestone.get('evidence_needed')}" if milestone.get("evidence_needed") else "",
 455                f"result={milestone.get('validation_result')}" if milestone.get("validation_result") else "",
 456                f"next={milestone.get('next_action')}" if milestone.get("next_action") else "",
 457            ]
 458            if value
 459        )
 460        if details:
 461            print(f"    {_one_line(details, chars)}")
 462        for feature in milestone_features[: min(3, features)]:
 463            if isinstance(feature, dict):
 464                print(f"    - {str(feature.get('status') or 'planned'):<7} {_one_line(feature.get('title') or 'feature', max(30, chars - 16))}")
 465
 466
 467def _print_memory_sections(
 468    *,
 469    active_operator: list[dict[str, Any]],
 470    pending_measurement: dict[str, Any],
 471    graph: dict[str, Any],
 472    reflections: list[dict[str, Any]],
 473    lessons: list[dict[str, Any]],
 474    compact: list[dict[str, Any]],
 475    limit: int,
 476    chars: int,
 477) -> None:
 478    if active_operator:
 479        print()
 480        print("active operator context:")
 481        for entry in active_operator[-min(limit, 8) :]:
 482            marker = entry.get("event_id") or "operator"
 483            print(f"  {marker}: {_one_line(entry.get('message') or '', chars)}")
 484    if pending_measurement:
 485        print()
 486        print(f"pending measurement: step #{pending_measurement.get('source_step_no') or '?'}")
 487        candidates = pending_measurement.get("metric_candidates") if isinstance(pending_measurement.get("metric_candidates"), list) else []
 488        if candidates:
 489            print(f"  candidates: {_one_line(', '.join(str(item) for item in candidates[:5]), chars)}")
 490    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
 491    if nodes:
 492        print()
 493        print("memory graph:")
 494        for node in nodes[: min(limit, 8)]:
 495            print(
 496                f"  {node.get('kind') or 'fact'}:{node.get('status') or 'active'} "
 497                f"{_one_line(node.get('title') or node.get('key') or 'memory', chars)}"
 498            )
 499        print("  view: memory --graph")
 500    if reflections:
 501        print()
 502        print("latest reflection:")
 503        reflection = reflections[-1]
 504        print(f"  {_one_line(reflection.get('summary') or '', chars)}")
 505        if reflection.get("strategy"):
 506            print(f"  strategy: {_one_line(reflection['strategy'], chars)}")
 507    if lessons:
 508        print()
 509        print("latest lessons:")
 510        for lesson in lessons[-min(limit, 8) :]:
 511            print(f"  {lesson.get('category') or 'memory'}: {_one_line(lesson.get('lesson') or '', chars)}")
 512    if compact:
 513        print()
 514        print("compact memory:")
 515        for entry in compact[: min(limit, 3)]:
 516            print(f"  {entry.get('key')}: {_one_line(entry.get('summary') or '', chars)}")
 517
 518
 519def _print_best_records(
 520    *,
 521    sources: list[dict[str, Any]],
 522    findings: list[dict[str, Any]],
 523    experiments: list[dict[str, Any]],
 524    chars: int,
 525) -> None:
 526    if sources:
 527        best = max(sources, key=lambda source: float(source.get("usefulness_score") or 0))
 528        print(f"best source: {_one_line(best.get('source') or '', chars)} score={best.get('usefulness_score')}")
 529    if findings:
 530        best_finding = max(findings, key=lambda finding: float(finding.get("score") or 0))
 531        print(f"best finding: {_one_line(best_finding.get('name') or '', chars)} score={best_finding.get('score')}")
 532    measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
 533    best_experiments = [experiment for experiment in measured if experiment.get("best_observed")]
 534    if best_experiments:
 535        best_experiment = best_experiments[-1]
 536        metric = f"{best_experiment.get('metric_name') or 'metric'}={best_experiment.get('metric_value')}{best_experiment.get('metric_unit') or ''}"
 537        print(f"best experiment: {_one_line(best_experiment.get('title') or '', chars)} {metric}")
 538
 539
 540def _step_count(steps: list[dict[str, Any]]) -> int:
 541    numbers = [int(step.get("step_no") or 0) for step in steps]
 542    return max(numbers, default=0)
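
The record commands above share two patterns: a defensive read of a ledger list from job metadata, and a composite sort key (status rank, then descending priority, then title). The sketch below reproduces both in isolation under stated assumptions. The `job` dictionary is made-up sample data, and `metadata_records` and `rank_tasks` are illustrative names rather than the module's own API.

```python
from typing import Any, Dict, List


def metadata_records(job: Dict[str, Any], key: str) -> List[Dict[str, Any]]:
    # Tolerate missing or malformed metadata and drop non-dict entries.
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    values = metadata.get(key)
    if not isinstance(values, list):
        return []
    return [value for value in values if isinstance(value, dict)]


# Status rank mirrors the ordering used by the tasks listing; unknown statuses sort last.
STATUS_ORDER = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}


def rank_tasks(tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return sorted(
        tasks,
        key=lambda task: (
            STATUS_ORDER.get(str(task.get("status") or "open"), 9),
            -int(task.get("priority") or 0),  # higher priority first
            str(task.get("title") or ""),
        ),
    )


# Made-up sample job; real jobs come from the SQLite store.
job = {
    "metadata": {
        "task_queue": [
            {"title": "write report", "status": "done", "priority": 9},
            {"title": "collect sources", "priority": 2},  # status defaults to "open"
            "not a dict",  # silently ignored
            {"title": "run benchmark", "status": "active", "priority": 1},
            {"title": "audit claims", "status": "open", "priority": 5},
        ]
    }
}

ranked_titles = [task["title"] for task in rank_tasks(metadata_records(job, "task_queue"))]
print(ranked_titles)
```

Status takes precedence over priority in this ordering, so a low-priority active task is listed before a high-priority finished one.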
nipux_cli/scheduling.py 76 lines
   1"""Shared scheduling helpers for deferred long-running work."""
   2
   3from __future__ import annotations
   4
   5from datetime import datetime, timezone
   6from typing import Any
   7
   8
   9def job_deferred_until(job: dict[str, Any], *, now: datetime | None = None) -> datetime | None:
  10    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  11    raw_until = str(metadata.get("defer_until") or "").strip()
  12    if not raw_until:
  13        return None
  14    try:
  15        until = datetime.fromisoformat(raw_until.replace("Z", "+00:00"))
  16    except ValueError:
  17        return None
  18    if until.tzinfo is None:
  19        until = until.replace(tzinfo=timezone.utc)
  20    until = until.astimezone(timezone.utc)
  21    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
  22    return until if until > now else None
  23
  24
  25def job_is_deferred(job: dict[str, Any], *, now: datetime | None = None) -> bool:
  26    return job_deferred_until(job, now=now) is not None
  27
  28
  29def job_provider_blocked(job: dict[str, Any]) -> bool:
  30    """Return true when provider calls need operator action before retrying."""
  31
  32    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  33    blocked_raw = str(metadata.get("provider_blocked_at") or "").strip()
  34    if not blocked_raw:
  35        return False
  36    unblocked_raw = str(metadata.get("provider_unblocked_at") or "").strip()
  37    if not unblocked_raw:
  38        return True
  39    blocked_at = _metadata_time(blocked_raw)
  40    unblocked_at = _metadata_time(unblocked_raw)
  41    if blocked_at is None or unblocked_at is None:
  42        return False
  43    return blocked_at > unblocked_at
  44
  45
  46def provider_retry_metadata() -> dict[str, str]:
  47    """Metadata patch used when the operator explicitly retries provider work."""
  48
  49    return {
  50        "provider_blocked_at": "",
  51        "provider_unblocked_at": datetime.now(timezone.utc).isoformat(),
  52    }
  53
  54
  55def operator_resume_metadata() -> dict[str, str]:
  56    """Metadata patch used when the operator explicitly makes a job runnable."""
  57
  58    patch = provider_retry_metadata()
  59    patch.update(
  60        {
  61            "defer_until": "",
  62            "defer_reason": "",
  63            "defer_next_action": "",
  64        }
  65    )
  66    return patch
  67
  68
  69def _metadata_time(value: str) -> datetime | None:
  70    try:
  71        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
  72    except ValueError:
  73        return None
  74    if parsed.tzinfo is None:
  75        parsed = parsed.replace(tzinfo=timezone.utc)
  76    return parsed.astimezone(timezone.utc)
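
The scheduling helpers above reduce to two checks: whether a `defer_until` timestamp is still in the future, and whether the latest provider block is newer than the latest unblock. The sketch below restates both against a fixed clock so the outcomes are reproducible. It takes the metadata dictionary directly rather than a job, and `parse_utc`, `deferred_until`, and `provider_blocked` are illustrative names, not the module's functions.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_utc(value: str) -> Optional[datetime]:
    # Accept a trailing "Z" and treat naive timestamps as UTC.
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def deferred_until(metadata: Dict[str, Any], now: datetime) -> Optional[datetime]:
    until = parse_utc(str(metadata.get("defer_until") or "").strip())
    if until is None:
        return None
    return until if until > now else None


def provider_blocked(metadata: Dict[str, Any]) -> bool:
    blocked_raw = str(metadata.get("provider_blocked_at") or "").strip()
    if not blocked_raw:
        return False
    unblocked_raw = str(metadata.get("provider_unblocked_at") or "").strip()
    if not unblocked_raw:
        return True
    blocked_at, unblocked_at = parse_utc(blocked_raw), parse_utc(unblocked_raw)
    if blocked_at is None or unblocked_at is None:
        return False  # unparseable timestamps are treated as not blocked
    return blocked_at > unblocked_at


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example

print(deferred_until({"defer_until": "2025-01-01T13:00:00Z"}, now))  # still deferred
print(deferred_until({"defer_until": "2025-01-01T11:00:00"}, now))   # already elapsed
print(provider_blocked({"provider_blocked_at": "2025-01-01T10:00:00Z"}))
print(provider_blocked({
    "provider_blocked_at": "2025-01-01T10:00:00Z",
    "provider_unblocked_at": "2025-01-01T11:00:00Z",
}))
```

Clearing `provider_blocked_at` to an empty string, as the retry patch does, makes the first guard return `False` regardless of the unblock timestamp.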
nipux_cli/service_install.py 179 lines
   1"""OS service installation helpers for the Nipux daemon."""
   2
   3from __future__ import annotations
   4
   5import os
   6import shlex
   7import shutil
   8import subprocess
   9import sys
  10from argparse import Namespace
  11from pathlib import Path
  12
  13from nipux_cli.config import load_config
  14
  15
  16def launch_agent_path() -> Path:
  17    return Path.home() / "Library" / "LaunchAgents" / "com.nipux.agent.plist"
  18
  19
  20def launch_agent_plist(*, poll_seconds: float, quiet: bool) -> str:
  21    config = load_config()
  22    config.ensure_dirs()
  23    command = [
  24        sys.executable,
  25        "-m",
  26        "nipux_cli.cli",
  27        "daemon",
  28        "--poll-seconds",
  29        str(poll_seconds),
  30    ]
  31    command.append("--quiet" if quiet else "--verbose")
  32    args_xml = "\n".join(f"        <string>{xml_escape(part)}</string>" for part in command)
  33    log_path = config.runtime.logs_dir / "launchd-daemon.log"
  34    return f"""<?xml version="1.0" encoding="UTF-8"?>
  35<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
  36<plist version="1.0">
  37  <dict>
  38    <key>Label</key>
  39    <string>com.nipux.agent</string>
  40    <key>ProgramArguments</key>
  41    <array>
  42{args_xml}
  43    </array>
  44    <key>EnvironmentVariables</key>
  45    <dict>
  46      <key>NIPUX_HOME</key>
  47      <string>{xml_escape(str(config.runtime.home))}</string>
  48    </dict>
  49    <key>RunAtLoad</key>
  50    <true/>
  51    <key>KeepAlive</key>
  52    <true/>
  53    <key>StandardOutPath</key>
  54    <string>{xml_escape(str(log_path))}</string>
  55    <key>StandardErrorPath</key>
  56    <string>{xml_escape(str(log_path))}</string>
  57    <key>WorkingDirectory</key>
  58    <string>{xml_escape(str(Path.cwd()))}</string>
  59  </dict>
  60</plist>
  61"""
  62
  63
  64def systemd_service_path() -> Path:
  65    return Path.home() / ".config" / "systemd" / "user" / "nipux.service"
  66
  67
  68def systemd_service_text(*, poll_seconds: float, quiet: bool) -> str:
  69    config = load_config()
  70    config.ensure_dirs()
  71    command = [
  72        sys.executable,
  73        "-m",
  74        "nipux_cli.cli",
  75        "daemon",
  76        "--poll-seconds",
  77        str(poll_seconds),
  78    ]
  79    command.append("--quiet" if quiet else "--verbose")
  80    return "\n".join(
  81        [
  82            "[Unit]",
  83            "Description=Nipux 24/7 autonomous worker",
  84            "After=network-online.target",
  85            "Wants=network-online.target",
  86            "",
  87            "[Service]",
  88            "Type=simple",
  89            f"WorkingDirectory={Path.cwd()}",
  90            f'Environment="NIPUX_HOME={config.runtime.home}"',
  91            f"ExecStart={' '.join(shlex.quote(part) for part in command)}",
  92            "Restart=always",
  93            "RestartSec=3",
  94            "",
  95            "[Install]",
  96            "WantedBy=default.target",
  97            "",
  98        ]
  99    )
 100
 101
 102def cmd_autostart(args: Namespace) -> None:
 103    path = launch_agent_path()
 104    label = "gui/" + str(os.getuid()) + "/com.nipux.agent"
 105    if args.action == "status":
 106        status = "installed" if path.exists() else "not installed"
 107        print(f"autostart: {status}")
 108        print(f"plist: {path}")
 109        if path.exists():
 110            result = subprocess.run(
 111                ["launchctl", "print", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
 112            )
 113            print("launchd: loaded" if result.returncode == 0 else "launchd: not loaded")
 114        return
 115    if args.action == "install":
 116        path.parent.mkdir(parents=True, exist_ok=True)
 117        path.write_text(launch_agent_plist(poll_seconds=args.poll_seconds, quiet=args.quiet), encoding="utf-8")
 118        subprocess.run(
 119            ["launchctl", "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
 120        )
 121        result = subprocess.run(["launchctl", "bootstrap", "gui/" + str(os.getuid()), str(path)], check=False)
 122        if result.returncode:
 123            raise SystemExit(result.returncode)
 124        subprocess.run(["launchctl", "enable", label], check=False)
 125        print(f"autostart installed: {path}")
 126        print("daemon will start at login and launchd will keep it alive")
 127        return
 128    if args.action == "uninstall":
 129        subprocess.run(
 130            ["launchctl", "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
 131        )
 132        if path.exists():
 133            path.unlink()
 134        print("autostart uninstalled")
 135        return
 136    raise SystemExit(f"unknown autostart action: {args.action}")
 137
 138
 139def cmd_service(args: Namespace) -> None:
 140    path = systemd_service_path()
 141    systemctl = shutil.which("systemctl")
 142    user_cmd = [systemctl, "--user"] if systemctl else None
 143    if args.action == "status":
 144        print(f"service: {'installed' if path.exists() else 'not installed'}")
 145        print(f"unit: {path}")
 146        if user_cmd:
 147            result = subprocess.run(
 148                [*user_cmd, "is-active", "nipux.service"], check=False, capture_output=True, text=True
 149            )
 150            print(f"systemd: {result.stdout.strip() or result.stderr.strip() or 'unknown'}")
 151        else:
 152            print("systemd: unavailable on this machine")
 153        return
 154    if args.action == "install":
 155        path.parent.mkdir(parents=True, exist_ok=True)
 156        path.write_text(systemd_service_text(poll_seconds=args.poll_seconds, quiet=args.quiet), encoding="utf-8")
 157        print(f"service file written: {path}")
 158        if user_cmd:
 159            subprocess.run([*user_cmd, "daemon-reload"], check=False)
 160            subprocess.run([*user_cmd, "enable", "--now", "nipux.service"], check=False)
 161            print("systemd user service enabled and started")
 162        else:
 163            print(
 164                "systemd not found; copy this service to a Linux server or run: systemctl --user enable --now nipux.service"
 165            )
 166        return
 167    if args.action == "uninstall":
 168        if user_cmd:
 169            subprocess.run([*user_cmd, "disable", "--now", "nipux.service"], check=False)
 170            subprocess.run([*user_cmd, "daemon-reload"], check=False)
 171        if path.exists():
 172            path.unlink()
 173        print("service uninstalled")
 174        return
 175    raise SystemExit(f"unknown service action: {args.action}")
 176
 177
 178def xml_escape(value: str) -> str:
 179    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
nipux_cli/settings.py 153 lines
   1"""Inline config editing helpers for Nipux slash commands."""
   2
   3from __future__ import annotations
   4
   5import os
   6from pathlib import Path
   7from typing import Any
   8
   9import yaml
  10
  11from nipux_cli.config import default_config_yaml, get_agent_home, load_config, write_private_text
  12from nipux_cli.cli_state import clear_model_setup_verified
  13from nipux_cli.tui_commands import SETTINGS_FIELD_TYPES
  14
  15
  16def config_field_value(field: str, config: Any | None = None) -> Any:
  17    config = load_config() if config is None else config
  18    values = {
  19        "model.name": config.model.model,
  20        "model.base_url": config.model.base_url,
  21        "model.api_key_env": config.model.api_key_env,
  22        "model.context_length": config.model.context_length,
  23        "model.request_timeout_seconds": config.model.request_timeout_seconds,
  24        "model.input_cost_per_million": config.model.input_cost_per_million,
  25        "model.output_cost_per_million": config.model.output_cost_per_million,
  26        "runtime.home": str(config.runtime.home),
  27        "runtime.max_step_seconds": config.runtime.max_step_seconds,
  28        "runtime.artifact_inline_char_limit": config.runtime.artifact_inline_char_limit,
  29        "runtime.daily_digest_enabled": config.runtime.daily_digest_enabled,
  30        "runtime.daily_digest_time": config.runtime.daily_digest_time,
  31        "runtime.max_job_cost_usd": config.runtime.max_job_cost_usd,
  32        "tools.browser": config.tools.browser,
  33        "tools.web": config.tools.web,
  34        "tools.shell": config.tools.shell,
  35        "tools.files": config.tools.files,
  36    }
  37    return values.get(field, "")
  38
  39
  40def save_config_field(field: str, raw_value: str) -> Any:
  41    value = _coerce_config_value(field, raw_value)
  42    data = _load_config_yaml()
  43    section, key = field.split(".", 1)
  44    target = data.setdefault(section, {})
  45    if not isinstance(target, dict):
  46        target = {}
  47        data[section] = target
  48    target[key] = value
  49    _save_config_yaml(data)
  50    return value
  51
  52
  53def inline_setting_notice(field: str, raw_value: str) -> str:
  54    value = raw_value.strip()
  55    if not value:
  56        return f"kept {field}"
  57    if field == "secret:model.api_key":
  58        config = load_config()
  59        name = config.model.api_key_env
  60        _save_env_secret(name, value)
  61        clear_model_setup_verified()
  62        return f"saved {name} in {_short_path(get_agent_home() / '.env')}"
  63    try:
  64        saved = save_config_field(field, value)
  65    except ValueError as exc:
  66        return f"{field}: {exc}"
  67    clear_model_setup_verified()
  68    return f"saved {field} = {saved}"
  69
  70
  71def edit_target_label(field: str) -> str:
  72    if field == "secret:model.api_key":
  73        return "API key"
  74    return field
  75
  76
  77def edit_target_hint(field: str, config: Any | None = None) -> str:
  78    config = config or load_config()
  79    if field == "secret:model.api_key":
  80        state = "set" if config.model.api_key else "missing"
  81        return f"Editing API key ({state}). Enter saves, Esc cancels. Input is hidden."
  82    current = config_field_value(field, config)
  83    return f"Editing {field}. Current: {current}. Enter saves, Esc cancels, empty keeps current."
  84
  85
  86def edit_target_masks_input(field: str | None) -> bool:
  87    return field == "secret:model.api_key"
  88
  89
  90def _config_path() -> Path:
  91    return get_agent_home() / "config.yaml"
  92
  93
  94def _load_config_yaml() -> dict[str, Any]:
  95    path = _config_path()
  96    if not path.exists():
  97        loaded = yaml.safe_load(default_config_yaml()) or {}
  98        return loaded if isinstance(loaded, dict) else {}
  99    loaded = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
 100    return loaded if isinstance(loaded, dict) else {}
 101
 102
 103def _save_config_yaml(data: dict[str, Any]) -> None:
 104    path = _config_path()
 105    write_private_text(path, yaml.safe_dump(data, sort_keys=False))
 106
 107
 108def _save_env_secret(name: str, value: str) -> None:
 109    env_path = get_agent_home() / ".env"
 110    env_path.parent.mkdir(parents=True, exist_ok=True)
 111    existing: dict[str, str] = {}
 112    if env_path.exists():
 113        for raw in env_path.read_text(encoding="utf-8", errors="ignore").splitlines():
 114            if "=" not in raw or raw.strip().startswith("#"):
 115                continue
 116            key, current = raw.split("=", 1)
 117            if key.strip():
 118                existing[key.strip()] = current.strip()
 119    existing[name] = value
 120    write_private_text(env_path, "\n".join(f"{key}={current}" for key, current in existing.items()) + "\n")
 121    os.environ[name] = value
 122
 123
 124def _coerce_config_value(field: str, raw_value: str) -> Any:
 125    kind = SETTINGS_FIELD_TYPES.get(field, "str")
 126    value = raw_value.strip()
 127    if field == "runtime.max_job_cost_usd" and value.lower() in {"0", "none", "off", "false", "null"}:
 128        return None
 129    if kind == "int":
 130        return int(value)
 131    if kind == "float":
 132        return float(value)
 133    if kind == "bool":
 134        lowered = value.lower()
 135        if lowered in {"1", "true", "yes", "on"}:
 136            return True
 137        if lowered in {"0", "false", "no", "off"}:
 138            return False
 139        raise ValueError("use true or false")
 140    if kind == "path":
 141        return str(Path(value).expanduser())
 142    return value
 143
 144
 145def _short_path(path: Path | str, *, max_width: int = 80) -> str:
 146    text = str(path)
 147    home = str(Path.home())
 148    if text.startswith(home + os.sep):
 149        text = "~" + text[len(home) :]
 150    if len(text) <= max_width:
 151        return text
 152    keep = max(12, max_width - 4)
 153    return "..." + text[-keep:]
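
The dotted-field update in save_config_field and the coercion in _coerce_config_value can be tried in isolation with the sketch below. It is a minimal illustration that works on a plain dict instead of config.yaml. FIELD_TYPES, coerce, and set_field are hypothetical stand-ins introduced here; the real type table is SETTINGS_FIELD_TYPES in nipux_cli.tui_commands, and its contents are not shown in this listing.

```python
from __future__ import annotations

from typing import Any

# Hypothetical type table for illustration only; the real mapping is
# SETTINGS_FIELD_TYPES in nipux_cli.tui_commands.
FIELD_TYPES = {"model.context_length": "int", "tools.shell": "bool"}


def coerce(field: str, raw: str) -> Any:
    """Convert raw operator input to the type registered for the field."""
    kind = FIELD_TYPES.get(field, "str")
    value = raw.strip()
    if kind == "int":
        return int(value)
    if kind == "bool":
        lowered = value.lower()
        if lowered in {"1", "true", "yes", "on"}:
            return True
        if lowered in {"0", "false", "no", "off"}:
            return False
        raise ValueError("use true or false")
    return value


def set_field(data: dict[str, Any], field: str, raw: str) -> Any:
    """Store the coerced value under data[section][key] for a 'section.key' field."""
    value = coerce(field, raw)
    section, key = field.split(".", 1)
    target = data.setdefault(section, {})
    if not isinstance(target, dict):
        # A malformed (non-mapping) section is replaced rather than crashing.
        target = {}
        data[section] = target
    target[key] = value
    return value


data: dict[str, Any] = {"model": {"name": "demo"}, "tools": "broken"}
set_field(data, "model.context_length", " 8192 ")
set_field(data, "tools.shell", "off")
print(data)
```

The "tools" entry starts as a string on purpose, to show that a malformed section is overwritten with a fresh mapping before the key is stored.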
nipux_cli/settings_commands.py 84 lines
   1"""Slash-command handlers for inline Nipux configuration."""
   2
   3from __future__ import annotations
   4
   5import shlex
   6from contextlib import redirect_stdout
   7from io import StringIO
   8
   9from nipux_cli.config import load_config
  10from nipux_cli.settings import config_field_value, inline_setting_notice
  11from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS
  12
  13
  14def handle_chat_setting_command(command: str, rest: list[str]) -> bool:
  15    if command == "config":
  16        print("\n".join(config_summary_lines()))
  17        return True
  18    if command in {"key", "api-key"}:
  19        if not rest:
  20            config = load_config()
  21            state = "set" if config.model.api_key else "missing"
  22            print(f"API key is {state} via {config.model.api_key_env}. Use /api-key KEY to save a new one.")
  23            return True
  24        print(inline_setting_notice("secret:model.api_key", " ".join(rest)))
  25        return True
  26    if command not in CHAT_SETTING_COMMANDS:
  27        return False
  28    field, placeholder = CHAT_SETTING_COMMANDS[command]
  29    if not rest:
  30        current = config_field_value(field)
  31        print(f"{field} = {current}")
  32        print(f"usage: /{command} {placeholder}")
  33        return True
  34    print(inline_setting_notice(field, " ".join(rest)))
  35    return True
  36
  37
  38def config_summary_lines() -> list[str]:
  39    config = load_config()
  40    key_state = "set" if config.model.api_key else "missing"
  41    input_cost = _rate_text(config.model.input_cost_per_million)
  42    output_cost = _rate_text(config.model.output_cost_per_million)
  43    return [
  44        "config",
  45        f"model: {config.model.model}",
  46        f"endpoint: {config.model.base_url}",
  47        f"key: {key_state} ({config.model.api_key_env})",
  48        f"context: {config.model.context_length}",
  49        f"request timeout: {config.model.request_timeout_seconds}s",
  50        f"cost rates: input {input_cost} / output {output_cost} per 1M tokens",
  51        (
  52            "tools: "
  53            f"browser {config.tools.browser}, web {config.tools.web}, "
  54            f"CLI {config.tools.shell}, files {config.tools.files}"
  55        ),
  56        f"home: {config.runtime.home}",
  57        f"step timeout: {config.runtime.max_step_seconds}s",
  58        f"output preview: {config.runtime.artifact_inline_char_limit} chars",
  59        f"job cost limit: {_cost_limit_text(config.runtime.max_job_cost_usd)}",
  60        f"daily digest: {config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
  61    ]
  62
  63
  64def _rate_text(value: float | None) -> str:
  65    return "provider-reported" if value is None else f"${value:g}"
  66
  67
  68def _cost_limit_text(value: float | None) -> str:
  69    return "none" if value is None else f"${value:g}"
  70
  71
  72def capture_setting_command(line: str) -> list[str]:
  73    try:
  74        parts = shlex.split(line[1:] if line.startswith("/") else line)
  75    except ValueError as exc:
  76        return [f"parse error: {exc}"]
  77    if not parts:
  78        return []
  79    stream = StringIO()
  80    with redirect_stdout(stream):
  81        if not handle_chat_setting_command(parts[0], parts[1:]):
  82            print(f"unknown config command: /{parts[0]}")
  83    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
  84    return lines[-12:] or ["done"]
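
capture_setting_command reuses a print-based handler by redirecting stdout into a buffer and normalising the captured lines. The sketch below shows that pattern on its own. demo_handler is a hypothetical stand-in for handle_chat_setting_command, and the limit parameter is an illustrative generalisation of the fixed 12-line tail.

```python
from __future__ import annotations

from contextlib import redirect_stdout
from io import StringIO


def demo_handler(command: str, rest: list[str]) -> bool:
    """Hypothetical print-based handler; returns False for unknown commands."""
    if command != "model":
        return False
    print(f"model.name   =   {' '.join(rest) or 'unset'}")
    print("")  # blank lines are dropped by the capture step
    return True


def capture(command: str, rest: list[str], *, limit: int = 12) -> list[str]:
    """Run the handler with stdout redirected and return its cleaned output lines."""
    stream = StringIO()
    with redirect_stdout(stream):
        if not demo_handler(command, rest):
            print(f"unknown config command: /{command}")
    # Collapse runs of whitespace and skip empty lines.
    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
    return lines[-limit:] or ["done"]


print(capture("model", ["demo-model"]))
print(capture("nope", []))
```

Returning a list instead of printing lets a caller such as a TUI render the lines wherever it wants without changing the handler.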
nipux_cli/shell_tools.py 348 lines
   1"""Shell and workspace file tools for Nipux workers."""
   2
   3from __future__ import annotations
   4
   5import contextlib
   6import json
   7import os
   8import re
   9import signal
  10import subprocess
  11import time
  12from datetime import datetime, timezone
  13from pathlib import Path
  14from typing import Any
  15
  16
  17def write_file(args: dict[str, Any], ctx: Any) -> str:
  18    del ctx
  19    raw_path = str(args.get("path") or "").strip()
  20    if not raw_path:
  21        return _json({"success": False, "error": "path is required"})
  22    if "content" not in args:
  23        return _json({"success": False, "error": "content is required"})
  24    mode = str(args.get("mode") or "overwrite").strip().lower()
  25    if mode not in {"overwrite", "append"}:
  26        return _json({"success": False, "error": f"invalid mode: {mode}"})
  27    path = Path(raw_path).expanduser()
  28    if not path.is_absolute():
  29        path = Path.cwd() / path
  30    if path.exists() and path.is_dir():
  31        return _json({"success": False, "error": f"path is a directory: {path}"})
  32    create_parents = bool(args.get("create_parents", True))
  33    if create_parents:
  34        path.parent.mkdir(parents=True, exist_ok=True)
  35    content = str(args.get("content") or "")
  36    write_mode = "a" if mode == "append" else "w"
  37    with path.open(write_mode, encoding="utf-8") as fh:
  38        fh.write(content)
  39    return _json({
  40        "success": True,
  41        "path": str(path),
  42        "mode": mode,
  43        "bytes": path.stat().st_size,
  44    })
  45
  46
  47def shell_exec(args: dict[str, Any], ctx: Any) -> str:
  48    command = str(args.get("command") or "").strip()
  49    if not command:
  50        return _json({"success": False, "error": "command is required"})
  51    cwd_raw = str(args.get("cwd") or "").strip()
  52    cwd = cwd_raw or None
  53    if cwd and not Path(cwd).expanduser().exists():
  54        return _json({"success": False, "error": f"cwd does not exist: {cwd}"})
  55    timeout_raw = args.get("timeout_seconds")
  56    timeout = float(timeout_raw) if isinstance(timeout_raw, (int, float)) else 60.0
  57    timeout = max(1.0, min(timeout, 900.0))
  58    max_chars_raw = args.get("max_output_chars")
  59    max_chars = int(max_chars_raw) if isinstance(max_chars_raw, (int, float)) else 12000
  60    max_chars = max(1000, min(max_chars, 50000))
  61    shell = "/bin/zsh" if Path("/bin/zsh").exists() else None
  62    env = dict(os.environ)
  63    env["NIPUX_JOB_ID"] = ctx.job_id
  64    if ctx.run_id:
  65        env["NIPUX_RUN_ID"] = ctx.run_id
  66    started = time.monotonic()
  67    process: subprocess.Popen[str] | None = None
  68    try:
  69        process = subprocess.Popen(
  70            command,
  71            shell=True,
  72            executable=shell,
  73            cwd=str(Path(cwd).expanduser()) if cwd else None,
  74            env=env,
  75            stdout=subprocess.PIPE,
  76            stderr=subprocess.PIPE,
  77            text=True,
  78            start_new_session=True,
  79        )
  80        _register_shell_process(ctx, process, command=command, cwd=cwd or os.getcwd(), timeout_seconds=timeout)
  81        stdout, stderr = process.communicate(timeout=timeout)
  82    except subprocess.TimeoutExpired:
  83        assert process is not None
  84        _terminate_process_group(process)
  85        try:
  86            stdout, stderr = process.communicate(timeout=2)
  87        except subprocess.TimeoutExpired:
  88            _kill_process_group(process)
  89            stdout, stderr = process.communicate()
  90        return _json({
  91            "success": False,
  92            "error": f"command timed out after {timeout:.1f}s",
  93            "timed_out": True,
  94            "command": command,
  95            "cwd": cwd or os.getcwd(),
  96            "timeout_seconds": timeout,
  97            "duration_seconds": round(time.monotonic() - started, 3),
  98            "returncode": None,
  99            "stdout": _truncate_output(stdout, max_chars),
 100            "stderr": _truncate_output(stderr, max_chars),
 101        })
 102    except BaseException:
 103        if process is not None and process.poll() is None:
 104            _terminate_process_group(process)
 105            try:
 106                process.wait(timeout=2)
 107            except subprocess.TimeoutExpired:
 108                _kill_process_group(process)
 109        raise
 110    finally:
 111        if process is not None:
 112            _unregister_shell_process(ctx, process.pid)
 113    error = _shell_error(process.returncode, stdout, stderr, command=command)
 114    return _json({
 115        "success": process.returncode == 0 and not error,
 116        "error": error,
 117        "command": command,
 118        "cwd": cwd or os.getcwd(),
 119        "duration_seconds": round(time.monotonic() - started, 3),
 120        "returncode": process.returncode,
 121        "stdout": _truncate_output(stdout, max_chars),
 122        "stderr": _truncate_output(stderr, max_chars),
 123    })
 124
 125
 126def cleanup_registered_shell_processes(home: str | Path) -> list[dict[str, Any]]:
 127    path = _shell_process_registry_path(home)
 128    records = _read_shell_process_registry(path)
 129    if not records:
 130        return []
 131    cleaned: list[dict[str, Any]] = []
 132    survivors: list[dict[str, Any]] = []
 133    for record in records:
 134        pid = _as_int(record.get("pid"))
 135        if pid <= 0:
 136            continue
 137        if not _pid_exists(pid):
 138            continue
 139        try:
 140            os.killpg(pid, signal.SIGTERM)
 141        except ProcessLookupError:
 142            continue
 143        except PermissionError:
 144            survivors.append(record)
 145            continue
 146        time.sleep(0.05)
 147        if _pid_exists(pid):
 148            with contextlib.suppress(ProcessLookupError, PermissionError):
 149                os.killpg(pid, signal.SIGKILL)
 150        record = dict(record)
 151        record["cleaned_at"] = datetime.now(timezone.utc).isoformat()
 152        cleaned.append(record)
 153    _write_shell_process_registry(path, survivors)
 154    return cleaned
 155
 156
 157def _shell_error(returncode: int | None, stdout: str, stderr: str, *, command: str = "") -> str:
 158    if returncode == 0:
 159        return _shell_success_anomaly(stdout, stderr, command=command)
 160    combined = "\n".join(part.strip() for part in (stderr, stdout) if part and part.strip())
 161    lowered = combined.lower()
 162    if "sudo:" in lowered and ("password" in lowered or "terminal is required" in lowered):
 163        return "command requires interactive sudo/password; configure non-interactive privileges or choose a non-sudo path"
 164    if "permission denied" in lowered:
 165        return "command failed with permission denied"
 166    missing_probe = _missing_executable_probe(command, combined)
 167    if missing_probe:
 168        return f"command probe found no executable: {missing_probe}"
 169    excerpt = " ".join(combined.split())[:500] if combined else "no output"
 170    return f"command exited with status {returncode}: {excerpt}"
 171
 172
 173def _shell_success_anomaly(stdout: str, stderr: str, *, command: str = "") -> str:
 174    combined = "\n".join(part.strip() for part in (stderr, stdout) if part and part.strip())
 175    if not combined:
 176        empty_probe = _empty_observation_probe(command)
 177        if empty_probe:
 178            return f"command probe produced no output despite exit status 0: {empty_probe}"
 179        return ""
 180    lowered = combined.lower()
 181    auth_markers = (
 182        "401 unauthorized",
 183        "403 forbidden",
 184        "authentication failed",
 185        "username/password authentication failed",
 186        "invalid username or password",
 187        "permission denied",
 188    )
 189    if _shell_sudo_password_anomaly(lowered):
 190        return "command output indicates interactive sudo/password requirement despite exit status 0"
 191    if any(marker in lowered for marker in auth_markers):
 192        excerpt = " ".join(combined.split())[:500]
 193        return f"command output indicates authentication or authorization failure despite exit status 0: {excerpt}"
 194    missing_probe = _missing_executable_probe(command, combined)
 195    if missing_probe:
 196        return f"command probe found no executable despite exit status 0: {missing_probe}"
 197    command_missing_match = _shell_missing_command_anomaly(combined)
 198    if command_missing_match:
 199        excerpt = " ".join(combined.split())[:500]
 200        return f"command output indicates missing command despite exit status 0: {excerpt}"
 201    build_error_match = _shell_build_error_anomaly(combined)
 202    if build_error_match:
 203        excerpt = " ".join(combined.split())[:500]
 204        return f"command output indicates build/tool failure despite exit status 0: {excerpt}"
 205    http_error_match = _shell_http_error_anomaly(lowered)
 206    if http_error_match:
 207        excerpt = " ".join(combined.split())[:500]
 208        return f"command output indicates HTTP failure despite exit status 0: {excerpt}"
 209    return ""
 210
 211
 212def _missing_executable_probe(command: str, combined_output: str) -> str:
 213    text = str(command or "").strip()
 214    match = re.match(r"^(?:which|command\s+-v)\s+([A-Za-z0-9_.+-]+)(?:\s|$)", text)
 215    if not match:
 216        return ""
 217    if not combined_output.strip() or "not found" in combined_output.lower():
 218        return match.group(1)
 219    return ""
 220
 221
 222def _empty_observation_probe(command: str) -> str:
 223    text = str(command or "").strip()
 224    if re.match(r"^(?:which|command\s+-v)\s+([A-Za-z0-9_.+-]+)(?:\s|$)", text):
 225        return "probe found no executable: executable lookup returned no path"
 226    if re.match(r"^(?:find|ls|stat|file)\b", text):
 227        return "read-only filesystem probe returned no observation"
 228    return ""
 229
 230
 231def _shell_missing_command_anomaly(text: str) -> bool:
 232    return bool(
 233        re.search(
 234            r"(?im)(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?:(?:/|~)[^\s:'\"]+|[A-Za-z0-9_.+-]+):\s*(?:command not found|not found)\s*$",
 235            text,
 236        )
 237    )
 238
 239
 240def _shell_sudo_password_anomaly(text: str) -> bool:
 241    return "sudo:" in text and ("password" in text or "terminal is required" in text)
 242
 243
 244def _shell_build_error_anomaly(text: str) -> bool:
 245    lowered = text.lower()
 246    if "no rule to make target" in lowered:
 247        return True
 248    if "***" in text and "stop." in lowered:
 249        return True
 250    return bool(re.search(r"(?im)^\s*(?:make(?:\[\d+\])?:\s*)?\*\*\* .*\bstop\.\s*$", text))
 251
 252
 253def _shell_http_error_anomaly(text: str) -> bool:
 254    return any(f" {code} " in f" {text} " for code in ("400", "401", "403", "404", "429", "500", "502", "503", "504")) and any(
 255        marker in text for marker in ("http", "error", "unauthorized", "forbidden", "not found", "too many requests")
 256    )
 257
 258
 259def _terminate_process_group(process: subprocess.Popen[str]) -> None:
 260    try:
 261        os.killpg(process.pid, signal.SIGTERM)
 262    except ProcessLookupError:
 263        return
 264
 265
 266def _kill_process_group(process: subprocess.Popen[str]) -> None:
 267    try:
 268        os.killpg(process.pid, signal.SIGKILL)
 269    except ProcessLookupError:
 270        return
 271
 272
 273def _register_shell_process(ctx: Any, process: subprocess.Popen[str], *, command: str, cwd: str, timeout_seconds: float) -> None:
 274    path = _shell_process_registry_path(ctx.config.runtime.home)
 275    path.parent.mkdir(parents=True, exist_ok=True)
 276    record = {
 277        "pid": process.pid,
 278        "pgid": process.pid,
 279        "job_id": getattr(ctx, "job_id", ""),
 280        "run_id": getattr(ctx, "run_id", "") or "",
 281        "step_id": getattr(ctx, "step_id", "") or "",
 282        "command": command[:1000],
 283        "cwd": cwd,
 284        "timeout_seconds": timeout_seconds,
 285        "started_at": datetime.now(timezone.utc).isoformat(),
 286    }
 287    with path.open("a", encoding="utf-8") as handle:
 288        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
 289
 290
 291def _unregister_shell_process(ctx: Any, pid: int) -> None:
 292    path = _shell_process_registry_path(ctx.config.runtime.home)
 293    records = [record for record in _read_shell_process_registry(path) if _as_int(record.get("pid")) != pid]
 294    _write_shell_process_registry(path, records)
 295
 296
 297def _shell_process_registry_path(home: str | Path) -> Path:
 298    return Path(home).expanduser() / "runtime" / "shell_processes.jsonl"
 299
 300
 301def _read_shell_process_registry(path: Path) -> list[dict[str, Any]]:
 302    if not path.exists():
 303        return []
 304    records: list[dict[str, Any]] = []
 305    for line in path.read_text(encoding="utf-8").splitlines():
 306        try:
 307            record = json.loads(line)
 308        except json.JSONDecodeError:
 309            continue
 310        if isinstance(record, dict):
 311            records.append(record)
 312    return records
 313
 314
 315def _write_shell_process_registry(path: Path, records: list[dict[str, Any]]) -> None:
 316    path.parent.mkdir(parents=True, exist_ok=True)
 317    if not records:
 318        with contextlib.suppress(FileNotFoundError):
 319            path.unlink()
 320        return
 321    path.write_text("".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records), encoding="utf-8")
 322
 323
 324def _pid_exists(pid: int) -> bool:
 325    try:
 326        os.kill(pid, 0)
 327    except OSError as exc:
 328        return isinstance(exc, PermissionError)  # EPERM: the process exists but belongs to another user
 329    return True
 330
 331
 332def _as_int(value: Any) -> int:
 333    try:
 334        return int(value)
 335    except (TypeError, ValueError):
 336        return 0
 337
 338
 339def _truncate_output(value: Any, max_chars: int) -> str:
 340    text = value.decode("utf-8", errors="replace") if isinstance(value, bytes) else str(value or "")
 341    if len(text) <= max_chars:
 342        return text
 343    omitted = len(text) - max_chars
 344    return text[:max_chars] + f"\n... truncated {omitted} chars ..."
 345
 346
 347def _json(value: Any) -> str:
 348    return json.dumps(value, ensure_ascii=False)
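
The timeout handling in shell_exec depends on start_new_session=True. The child becomes the leader of its own process group, so os.killpg(process.pid, ...) reaches the shell and anything it spawned. The sketch below isolates that pattern under POSIX assumptions. run_with_group_timeout and _signal_group are hypothetical names, and the registry bookkeeping and output truncation from the listing are left out.

```python
from __future__ import annotations

import os
import signal
import subprocess
import time


def _signal_group(pgid: int, sig: int) -> None:
    """Send a signal to a whole process group, ignoring an already-dead group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_group_timeout(command: str, timeout: float) -> dict:
    """Run a shell command in its own session; on timeout, stop the entire group."""
    started = time.monotonic()
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child leads a new group, so its pgid equals its pid
    )
    timed_out = False
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(process.pid, signal.SIGTERM)  # polite request first
        try:
            stdout, stderr = process.communicate(timeout=2)
        except subprocess.TimeoutExpired:
            _signal_group(process.pid, signal.SIGKILL)  # escalate if still alive
            stdout, stderr = process.communicate()
    return {
        "timed_out": timed_out,
        "returncode": process.returncode,
        "stdout": stdout,
        "stderr": stderr,
        "duration_seconds": round(time.monotonic() - started, 3),
    }


quick = run_with_group_timeout("echo hi", timeout=5.0)
slow = run_with_group_timeout("sleep 30", timeout=0.5)
print(quick["timed_out"], quick["returncode"], slow["timed_out"])
```

Signalling the group instead of the single PID matters because a surviving grandchild can keep the stdout and stderr pipes open, which would otherwise leave communicate() waiting.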
nipux_cli/source_quality.py 32 lines
   1"""Source quality checks for web and browser tools."""
   2
   3from __future__ import annotations
   4
   5ANTI_BOT_MARKERS = (
   6    "performing security verification",
   7    "cloudflare security challenge",
   8    "verifies you are not a bot",
   9    "verify you are not a bot",
  10    "enable javascript and cookies",
  11    "checking your browser before accessing",
  12    "just a moment...",
  13    "you have been blocked",
  14    "browsing and clicking at a speed much faster than expected",
  15    "there is a robot on the same network",
  16)
  17
  18
  19def anti_bot_reason(*parts: str) -> str | None:
  20    """Return a short reason if text looks like an anti-bot interstitial."""
  21
  22    text = " ".join(part for part in parts if part).lower()
  23    if not text:
  24        return None
  25    for marker in ANTI_BOT_MARKERS:
  26        if marker in text:
  27            if "cloudflare" in text:
  28                return "cloudflare anti-bot challenge"
  29            if "captcha" in text or "you have been blocked" in text:
  30                return "captcha/anti-bot block"
  31            return "anti-bot challenge"
  32    return None
nipux_cli/task_match.py 102 lines
   1"""Task title matching helpers for long-running job queues."""
   2
   3from __future__ import annotations
   4
   5import re
   6from typing import Any
   7
   8TASK_MATCH_STOPWORDS = {
   9    "a",
  10    "an",
  11    "and",
  12    "as",
  13    "at",
  14    "by",
  15    "for",
  16    "from",
  17    "in",
  18    "into",
  19    "of",
  20    "on",
  21    "or",
  22    "the",
  23    "then",
  24    "to",
  25    "via",
  26    "with",
  27}
  28
  29
  30def task_key(parent: str, title: str) -> str:
  31    return re.sub(r"[^a-z0-9]+", "-", f"{parent}|{title}".lower()).strip("-")[:120]
  32
  33
  34def find_semantic_task_match(
  35    *,
  36    title: str,
  37    parent: str,
  38    tasks: list[dict[str, Any]],
  39    statuses: set[str] | None = None,
  40    min_score: float = 0.55,
  41) -> dict[str, Any] | None:
  42    incoming_title = str(title or "").strip()
  43    if not incoming_title:
  44        return None
  45    incoming_parent = str(parent or "").strip()
  46    incoming_key = task_key(incoming_parent, incoming_title)
  47    incoming_tokens = _task_tokens(incoming_title)
  48    if len(incoming_tokens) < 2:
  49        return None
  50    allowed_statuses = statuses or {"active", "open", "blocked"}
  51    best: dict[str, Any] | None = None
  52    best_score = 0.0
  53    for task in tasks:
  54        if not isinstance(task, dict):
  55            continue
  56        candidate_title = str(task.get("title") or "").strip()
  57        if not candidate_title:
  58            continue
  59        candidate_parent = str(task.get("parent") or "").strip()
  60        if incoming_parent and candidate_parent and incoming_parent != candidate_parent:
  61            continue
  62        candidate_key = str(task.get("key") or task_key(candidate_parent, candidate_title))
  63        if candidate_key == incoming_key:
  64            return None
  65        status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
  66        if status not in allowed_statuses:
  67            continue
  68        candidate_tokens = _task_tokens(candidate_title)
  69        if len(candidate_tokens) < 2:
  70            continue
  71        score, overlap = _task_similarity(incoming_tokens, candidate_tokens)
  72        if overlap < 2 or score < min_score or score <= best_score:
  73            continue
  74        best_score = score
  75        best = {
  76            "task": task,
  77            "key": candidate_key,
  78            "title": candidate_title,
  79            "parent": candidate_parent,
  80            "status": status,
  81            "score": round(score, 3),
  82            "overlap": overlap,
  83        }
  84    return best
  85
  86
  87def _task_tokens(text: str) -> set[str]:
  88    tokens = {
  89        token
  90        for token in re.findall(r"[a-z0-9]+", str(text or "").lower())
  91        if len(token) > 1 and token not in TASK_MATCH_STOPWORDS
  92    }
  93    return tokens
  94
  95
  96def _task_similarity(left: set[str], right: set[str]) -> tuple[float, int]:
  97    overlap = len(left & right)
  98    if overlap <= 0:
  99        return 0.0, 0
 100    jaccard = overlap / max(1, len(left | right))
 101    containment = overlap / max(1, min(len(left), len(right)))
 102    return max(jaccard, containment), overlap
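
_task_similarity takes the larger of the Jaccard index and the containment ratio, so a short title fully covered by a longer one can clear min_score even when its Jaccard score alone would not. The worked example below restates the two helpers in a self-contained form. STOPWORDS is a shortened, illustrative subset of TASK_MATCH_STOPWORDS, and the task titles are invented.

```python
from __future__ import annotations

import re

# Illustrative subset of TASK_MATCH_STOPWORDS from the listing above.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, minus stopwords and single characters."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 1 and t not in STOPWORDS}


def similarity(left: set[str], right: set[str]) -> tuple[float, int]:
    """Return (max(jaccard, containment), overlap) for two token sets."""
    overlap = len(left & right)
    if overlap == 0:
        return 0.0, 0
    jaccard = overlap / len(left | right)
    containment = overlap / min(len(left), len(right))
    return max(jaccard, containment), overlap


short = tokens("Benchmark the SQLite writer")
longer = tokens("Benchmark SQLite writer throughput with larger batches and record results")
score, overlap = similarity(short, longer)
jaccard_only = overlap / len(short | longer)

print(sorted(short))        # ['benchmark', 'sqlite', 'writer']
print(overlap)              # 3
print(jaccard_only, score)  # 0.375 1.0
```

Here the Jaccard index is 3/8 = 0.375, below the default min_score of 0.55, while containment is 3/3 = 1.0, so the pair counts as a match.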
nipux_cli/templates.py 67 lines
   1"""Program templates for generic long-running jobs."""
   2
   3from __future__ import annotations
   4
   5
   6def program_for_job(*, kind: str, title: str, objective: str) -> str:
   7    kind = (kind or "generic").strip().lower()
   8    body = _TEMPLATES.get(kind, _generic_template)
   9    return body(title=title, objective=objective).strip() + "\n"
  10
  11
  12def _generic_template(*, title: str, objective: str) -> str:
  13    return f"""# {title}
  14
  15## Objective
  16
  17{objective}
  18
  19## Operating Rules
  20
  21- Work forever in bounded, resumable steps until the operator explicitly cancels or pauses the job.
  22- Treat useful results as checkpoints, not endings: save the result, create the next branch, and continue.
  23- Save important observations as artifacts.
  24- Use report_update for short progress notes or blocked-state notes.
  25- Use record_lesson when a source, mistake, operator preference, or strategy should affect future steps.
  26- Use record_source and record_findings when those tools are available so the job improves its ledgers over time.
  27- Use record_roadmap for broad work that needs milestones, feature groups, validation contracts, and roadmap-level checkpoints.
  28- Use record_milestone_validation to validate milestones from evidence and create follow-up tasks when validation fails or blocks.
  29- Use record_tasks to split broad objectives into durable branches with output contracts, acceptance criteria, required evidence, and stall behavior.
  30- Use record_experiment whenever a branch produces measured results, comparisons, benchmarks, scores, or optimization data.
  31- Use acknowledge_operator_context after incorporating or superseding active operator steering.
  32- Use browser and web tools first. Do not assume memory is exact unless it points to an artifact.
  33- Prefer quantity of attempts over one giant plan.
  34"""
  35
  36
  37def _research_paper_template(*, title: str, objective: str) -> str:
  38    return f"""# {title}
  39
  40## Objective
  41
  42{objective}
  43
  44## Research Rules
  45
  46- Save exact source URLs and extracted text snippets as artifacts.
  47- Keep a rolling citation map with claims, evidence, and open questions.
  48- Separate facts from hypotheses.
  49- Produce drafts only after evidence artifacts exist.
  50- Use report_update for brief progress, gap, or blocked-source notes.
  51- Use record_roadmap for the paper outline, evidence milestones, draft milestones, and validation checkpoints.
  52- Use record_milestone_validation when a section or draft milestone has enough evidence to judge.
  53- Use record_tasks to track source clusters, sections, and unresolved evidence gaps with output contracts and acceptance criteria.
  54- Use acknowledge_operator_context after incorporating or superseding active operator steering.
  55
  56## Step Loop
  57
  581. Search for one source cluster.
  592. Extract and save relevant evidence.
  603. Update the citation/evidence map.
  614. Write or improve one section when enough evidence exists.
  62"""
  63
  64
  65_TEMPLATES = {
  66    "research_paper": _research_paper_template,
  67}
nipux_cli/tools.py 2229 lines
   1"""Static tool registry for the Nipux agent."""
   2
   3from __future__ import annotations
   4
   5import json
   6import re
   7import time
   8from dataclasses import dataclass
   9from datetime import datetime, timedelta, timezone
  10from typing import Any, Callable
  11
  12from nipux_cli.artifacts import ArtifactStore
  13from nipux_cli.config import AppConfig
  14from nipux_cli.db import AgentDB
  15from nipux_cli.metric_format import format_metric_value
  16from nipux_cli.digest import send_digest_email
  17from nipux_cli.memory_graph import memory_graph_from_job, search_memory_graph
  18from nipux_cli.planning import initial_task_contract
  19from nipux_cli.shell_tools import shell_exec as _shell_exec
  20from nipux_cli.shell_tools import write_file as _write_file
  21from nipux_cli.task_match import find_semantic_task_match, task_key
  22
  23
  24@dataclass(frozen=True)
  25class ToolContext:
  26    config: AppConfig
  27    db: AgentDB
  28    artifacts: ArtifactStore
  29    job_id: str
  30    run_id: str | None = None
  31    step_id: str | None = None
  32    task_id: str | None = None
  33
  34
  35Handler = Callable[[dict[str, Any], ToolContext], str]
  36
  37EVIDENCE_OUTPUT_TERMS = {
  38    "audit",
  39    "checkpoint",
  40    "evidence",
  41    "extract",
  42    "extracted",
  43    "notes",
  44    "source",
  45    "sources",
  46}
  47DELIVERABLE_OUTPUT_TERMS = {
  48    "compiled",
  49    "deliverable",
  50    "draft",
  51    "final",
  52    "revision",
  53    "updated",
  54}
  55
  56
  57@dataclass(frozen=True)
  58class ToolSpec:
  59    name: str
  60    description: str
  61    parameters: dict[str, Any]
  62    handler: Handler
  63
  64    def as_openai_tool(self) -> dict[str, Any]:
  65        return {
  66            "type": "function",
  67            "function": {
  68                "name": self.name,
  69                "description": self.description,
  70                "parameters": self.parameters,
  71            },
  72        }
  73
  74
  75def _missing_argument(value: Any) -> bool:
  76    if value is None:
  77        return True
  78    if isinstance(value, str):
  79        stripped = value.strip()
  80        if not stripped:
  81            return True
  82        lowered = stripped.lower()
  83        if lowered in {"...", "…", "<...>", "{...}", "{{...}}", "placeholder", "todo", "tbd"}:
  84            return True
  85        if re.fullmatch(r"[.<{\[(\s]*\.{3,}[\s>}\])]*", stripped):
  86            return True
  87        return False
  88    if isinstance(value, (list, dict, tuple, set)):
  89        return not value
  90    return False
  91
  92
  93REFERENCE_LIKE_FIELD_PATTERN = re.compile(r"(?i)(?:^|_)(?:artifact|id|path|ref|source|url)(?:$|_)")
  94
  95
  96def _placeholder_argument(value: Any) -> bool:
  97    if _missing_argument(value):
  98        return True
  99    if not isinstance(value, str):
 100        return False
 101    stripped = value.strip().strip("'\"")
 102    if not stripped:
 103        return True
 104    if re.search(r"\s", stripped):
 105        return False
 106    return bool(re.search(r"(?:\.{3,}|…)$", stripped))
 107
 108
 109def _schema_placeholder_arguments(schema: dict[str, Any], value: Any, *, path: str = "") -> list[str]:
 110    schema_type = schema.get("type")
 111    placeholders: list[str] = []
 112    if schema_type == "object" and isinstance(value, dict):
 113        properties = schema.get("properties") if isinstance(schema.get("properties"), dict) else {}
 114        for name, child_schema in properties.items():
 115            if name not in value or not isinstance(child_schema, dict):
 116                continue
 117            child_path = f"{path}.{name}" if path else str(name)
 118            if REFERENCE_LIKE_FIELD_PATTERN.search(str(name)) and _placeholder_argument(value.get(name)):
 119                placeholders.append(child_path)
 120                continue
 121            placeholders.extend(_schema_placeholder_arguments(child_schema, value.get(name), path=child_path))
 122    elif schema_type == "array" and isinstance(value, list):
 123        item_schema = schema.get("items") if isinstance(schema.get("items"), dict) else {}
 124        for index, item in enumerate(value[:50]):
 125            placeholders.extend(_schema_placeholder_arguments(item_schema, item, path=f"{path}[{index}]"))
 126    return placeholders
 127
 128
 129def _schema_missing_arguments(schema: dict[str, Any], value: Any, *, path: str = "") -> list[str]:
 130    schema_type = schema.get("type")
 131    missing: list[str] = []
 132    if schema_type == "object" and isinstance(value, dict):
 133        properties = schema.get("properties") if isinstance(schema.get("properties"), dict) else {}
 134        for required in schema.get("required") or []:
 135            name = str(required)
 136            if _missing_argument(value.get(name)):
 137                missing.append(f"{path}.{name}" if path else name)
 138        for name, child_schema in properties.items():
 139            if name in value and isinstance(child_schema, dict) and not _missing_argument(value.get(name)):
 140                child_path = f"{path}.{name}" if path else str(name)
 141                missing.extend(_schema_missing_arguments(child_schema, value.get(name), path=child_path))
 142    elif schema_type == "array" and isinstance(value, list):
 143        item_schema = schema.get("items") if isinstance(schema.get("items"), dict) else {}
 144        for index, item in enumerate(value[:50]):
 145            missing.extend(_schema_missing_arguments(item_schema, item, path=f"{path}[{index}]"))
 146    return missing
 147
 148
 149REQUIRED_ARGUMENT_ALIASES: dict[str, dict[str, tuple[str, ...]]] = {
 150    "record_experiment": {
 151        "title": ("title", "name", "metric_name", "hypothesis", "result", "outcome"),
 152    },
 153}
 154
 155REQUIRED_ARGUMENT_GROUPS: dict[str, tuple[tuple[str, tuple[str, ...]], ...]] = {
 156    "read_artifact": (("artifact reference", ("artifact_id", "path", "title", "ref")),),
 157    "record_memory_graph": (("nodes or edges", ("nodes", "edges")),),
 158}
 159
 160
 161def _json(value: Any) -> str:
 162    return json.dumps(value, ensure_ascii=False)
 163
 164
 165def _write_artifact(args: dict[str, Any], ctx: ToolContext) -> str:
 166    content = str(args.get("content") or "")
 167    if not content:
 168        return _json({"success": False, "error": "content is required"})
 169    stored = ctx.artifacts.write_text(
 170        job_id=ctx.job_id,
 171        run_id=ctx.run_id,
 172        step_id=ctx.step_id,
 173        content=content,
 174        title=args.get("title"),
 175        summary=args.get("summary"),
 176        artifact_type=str(args.get("type") or "text"),
 177        metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else None,
 178    )
 179    return _json({
 180        "success": True,
 181        "artifact_id": stored.id,
 182        "path": str(stored.path),
 183        "sha256": stored.sha256,
 184    })
 185
 186
 187def _read_artifact(args: dict[str, Any], ctx: ToolContext) -> str:
 188    artifact_ref = str(args.get("artifact_id") or args.get("path") or args.get("title") or args.get("ref") or "")
 189    if not artifact_ref:
 190        return _json({"success": False, "error": "artifact_id is required"})
 191    resolved = _resolve_artifact_ref(ctx, artifact_ref)
 192    if not resolved:
 193        recent = _recent_artifact_refs(ctx)
 194        return _json({
 195            "success": False,
 196            "recoverable": True,
 197            "error": f"artifact not found: {artifact_ref}",
 198            "guidance": (
 199                "The requested artifact reference does not exist. Use one of the recent_artifacts refs, "
 200                "call search_artifacts with a concrete query, or continue from already observed evidence."
 201            ),
 202            "recent_artifacts": recent,
 203        })
 204    try:
 205        content = ctx.artifacts.read_text(resolved["id"])
 206    except (OSError, KeyError, ValueError) as exc:
 207        return _json({"success": False, "artifact_id": resolved["id"], "error": str(exc)})
 208    return _json({"success": True, "artifact_id": resolved["id"], "title": resolved.get("title"), "path": resolved.get("path"), "content": content})
 209
 210
 211def _recent_artifact_refs(ctx: ToolContext, limit: int = 8) -> list[dict[str, str]]:
 212    artifacts = ctx.db.list_artifacts(ctx.job_id, limit=limit)
 213    refs: list[dict[str, str]] = []
 214    for index, artifact in enumerate(artifacts, start=1):
 215        refs.append({
 216            "number": str(index),
 217            "id": str(artifact.get("id") or ""),
 218            "title": str(artifact.get("title") or ""),
 219            "path": str(artifact.get("path") or ""),
 220        })
 221    return refs
 222
 223
 224def _resolve_artifact_ref(ctx: ToolContext, artifact_ref: str) -> dict[str, Any] | None:
 225    ref = artifact_ref.strip().strip("'\"")
 226    if not ref:
 227        return None
 228    artifacts = ctx.db.list_artifacts(ctx.job_id, limit=250)
 229    for artifact in artifacts:
 230        if ref == artifact.get("id") or ref == str(artifact.get("path") or ""):
 231            return artifact
 232    if ref.isdigit():
 233        index = int(ref) - 1
 234        if 0 <= index < len(artifacts):
 235            return artifacts[index]
 236    lowered = ref.lower()
 237    for artifact in artifacts:
 238        title = str(artifact.get("title") or "").lower()
 239        if lowered == title:
 240            return artifact
 241    for artifact in artifacts:
 242        haystack = " ".join(str(artifact.get(key) or "") for key in ("title", "summary", "path")).lower()
 243        if lowered and lowered in haystack:
 244            return artifact
 245    return None
 246
 247
 248def _search_artifacts(args: dict[str, Any], ctx: ToolContext) -> str:
 249    query = str(args.get("query") or "")
 250    limit = int(args.get("limit") or 10)
 251    return _json({"success": True, "results": ctx.artifacts.search_text(job_id=ctx.job_id, query=query, limit=limit)})
 252
 253
 254def _update_job_state(args: dict[str, Any], ctx: ToolContext) -> str:
 255    status = str(args.get("status") or "").strip().lower()
 256    if status in {"paused", "cancelled", "completed", "failed"}:
 257        note = str(args.get("note") or "")
 258        follow_up_task = None
 259        if status == "completed":
 260            follow_up_task = _append_completion_audit_task(
 261                ctx,
 262                source="update_job_state",
 263                requested_status=status,
 264                claimed_message=note,
 265            )
 266        metadata = {"requested_status": status, "kept_running": True}
 267        if follow_up_task is not None:
 268            metadata["follow_up_task"] = follow_up_task.get("key")
 269        ctx.db.append_agent_update(
 270            ctx.job_id,
 271            f"Worker requested {status}; job remains running. {note}".strip(),
 272            category="progress" if status == "completed" else "blocked",
 273            metadata=metadata,
 274        )
 275        result = {
 276            "success": True,
 277            "job_id": ctx.job_id,
 278            "status": "running",
 279            "requested_status": status,
 280            "kept_running": True,
 281            "guidance": (
 282                "Jobs are perpetual by default. Do not mark the job complete or failed. "
 283                "Save the current result, create follow-up tasks, report a checkpoint, and continue."
 284            ),
 285        }
 286        if follow_up_task is not None:
 287            result["follow_up_task"] = follow_up_task
 288        return _json(result)
 289    if status not in {"queued", "running"}:
 290        return _json({"success": False, "error": f"invalid status: {status}"})
 291    note = str(args.get("note") or "")
 292    patch = {"last_note": note} if note else None
 293    ctx.db.update_job_status(ctx.job_id, status, metadata_patch=patch)
 294    return _json({"success": True, "job_id": ctx.job_id, "status": status})
 295
 296
 297def _defer_job(args: dict[str, Any], ctx: ToolContext) -> str:
 298    until = _defer_until(args)
 299    reason = str(args.get("reason") or "").strip()
 300    next_action = str(args.get("next_action") or "").strip()
 301    patch = {
 302        "defer_until": until.isoformat(),
 303        "defer_reason": reason,
 304        "defer_next_action": next_action,
 305    }
 306    job = ctx.db.get_job(ctx.job_id)
 307    status = str(job.get("status") or "queued")
 308    if status not in {"queued", "running"}:
 309        status = "queued"
 310    ctx.db.update_job_status(ctx.job_id, status, metadata_patch=patch)
 311    message = f"Deferred until {until.isoformat()}"
 312    if reason:
 313        message += f": {reason}"
 314    if next_action:
 315        message += f" Next: {next_action}"
 316    ctx.db.append_agent_update(
 317        ctx.job_id,
 318        message,
 319        category="progress",
 320        metadata={"defer_until": until.isoformat(), "reason": reason, "next_action": next_action},
 321    )
 322    return _json({
 323        "success": True,
 324        "job_id": ctx.job_id,
 325        "status": status,
 326        "defer_until": until.isoformat(),
 327        "reason": reason,
 328        "next_action": next_action,
 329    })
 330
 331
 332def _defer_until(args: dict[str, Any]) -> datetime:
 333    raw_until = str(args.get("until") or "").strip()
 334    if raw_until:
 335        try:
 336            parsed = datetime.fromisoformat(raw_until.replace("Z", "+00:00"))
 337        except ValueError:
 338            parsed = datetime.now(timezone.utc)
 339        if parsed.tzinfo is None:
 340            parsed = parsed.replace(tzinfo=timezone.utc)
 341        return parsed.astimezone(timezone.utc)
 342    seconds = args.get("seconds", args.get("delay_seconds", 300))
 343    try:
 344        delay = max(1.0, float(seconds))
 345    except (TypeError, ValueError):
 346        delay = 300.0
 347    return datetime.now(timezone.utc) + timedelta(seconds=delay)
 348
 349
 350def _report_update(args: dict[str, Any], ctx: ToolContext) -> str:
 351    message = str(args.get("message") or args.get("summary") or "").strip()
 352    if not message:
 353        return _json({"success": False, "error": "message is required"})
 354    category = str(args.get("category") or "progress").strip().lower()
 355    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
 356    normalized_message = _perpetual_checkpoint_message(message)
 357    if normalized_message != message:
 358        metadata = {**metadata, "original_message": message, "rewritten_completion_claim": True}
 359        message = normalized_message
 360        follow_up_task = _append_completion_audit_task(
 361            ctx,
 362            source="report_update",
 363            requested_status="completed",
 364            claimed_message=str(metadata.get("original_message") or ""),
 365        )
 366        metadata["follow_up_task"] = follow_up_task.get("key")
 367    entry = ctx.db.append_agent_update(ctx.job_id, message, category=category, metadata=metadata)
 368    return _json({"success": True, "job_id": ctx.job_id, "update": entry})
 369
 370
 371def _append_completion_audit_task(
 372    ctx: ToolContext,
 373    *,
 374    source: str,
 375    requested_status: str,
 376    claimed_message: str = "",
 377) -> dict[str, Any]:
 378    return ctx.db.append_task_record(
 379        ctx.job_id,
 380        title="Audit latest checkpoint against objective",
 381        status="open",
 382        priority=7,
 383        goal=(
 384            "Before treating the latest checkpoint as sufficient, compare the objective and operator context "
 385            "against concrete artifacts, files, findings, measurements, validations, and task results."
 386        ),
 387        output_contract="decision",
 388        acceptance_criteria=(
 389            "A prompt-to-artifact checklist maps explicit requirements to evidence, identifies uncovered gaps, "
 390            "and opens or continues the next branch from those gaps."
 391        ),
 392        evidence_needed=(
 393            "Objective text, active operator context, latest durable outputs, recent tool/test results, "
 394            "task queue state, roadmap validations, and measured results when applicable."
 395        ),
 396        stall_behavior=(
 397            "If evidence is missing, mark the checkpoint incomplete, record the gap, and create the smallest "
 398            "follow-up task instead of claiming completion."
 399        ),
 400        metadata={
 401            "source": source,
 402            "requested_status": requested_status,
 403            "completion_audit_required": True,
 404            "claimed_message": claimed_message[:1000],
 405        },
 406    )
 407
 408
 409def _perpetual_checkpoint_message(message: str) -> str:
 410    """Keep worker reports checkpoint-oriented without hiding the underlying audit trail."""
 411
 412    text = " ".join(str(message or "").split())
 413    if not text:
 414        return ""
 415    leading_claim = re.compile(
 416        r"(?i)^\s*(?:the\s+)?(?:job|objective|run|work)\s+"
 417        r"(?:is\s+|was\s+)?(?:complete|completed|done|finished)\b[.!:,\-\s]*"
 418    )
 419    if leading_claim.search(text):
 420        rest = leading_claim.sub("", text, count=1).strip()
 421        if rest:
 422            return f"Checkpoint reported; continuing work. {rest}"
 423        return "Checkpoint reported; continuing work."
 424    whole_job_claim = re.compile(
 425        r"(?i)\b(?:completed|finished|done\s+with)\s+(?:the\s+)?(?:job|objective|run|work)\b"
 426    )
 427    if whole_job_claim.search(text):
 428        return "Checkpoint reported; continuing work. " + whole_job_claim.sub("reached a checkpoint for the work", text, count=1)
 429    return text
 430
 431
 432def _record_lesson(args: dict[str, Any], ctx: ToolContext) -> str:
 433    lesson = str(args.get("lesson") or args.get("memory") or "").strip()
 434    if not lesson:
 435        return _json({"success": False, "error": "lesson is required"})
 436    category = str(args.get("category") or "memory").strip().lower()
 437    confidence_arg = args.get("confidence")
 438    confidence = float(confidence_arg) if isinstance(confidence_arg, (int, float)) else None
 439    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
 440    pending_measurement = _pending_measurement(ctx)
 441    measurement_resolution_category = category in {"constraint", "mistake", "strategy", "memory"}
 442    if pending_measurement and measurement_resolution_category and not _lesson_explains_measurement_obligation(lesson, metadata):
 443        return _json({
 444            "success": False,
 445            "error": "measurement explanation required",
 446            "message": (
 447                "A pending measurement must be resolved with record_experiment, a follow-up task, "
 448                "or a lesson that explicitly explains why the output is not a valid measurement."
 449            ),
 450            "pending_measurement_obligation": pending_measurement,
 451        })
 452    entry = ctx.db.append_lesson(ctx.job_id, lesson, category=category, confidence=confidence, metadata=metadata)
 453    if pending_measurement and measurement_resolution_category:
 454        _resolve_measurement_obligation(
 455            ctx,
 456            status="explained",
 457            reason=lesson,
 458            via_tool="record_lesson",
 459        )
 460    return _json({"success": True, "job_id": ctx.job_id, "lesson": entry})
 461
 462
 463def _lesson_explains_measurement_obligation(lesson: str, metadata: dict[str, Any]) -> bool:
 464    parts = [lesson]
 465    for value in metadata.values():
 466        if isinstance(value, (str, int, float, bool)):
 467            parts.append(str(value))
 468    text = " ".join(parts).lower()
 469    measurement_terms = (
 470        "measure",
 471        "measured",
 472        "measurement",
 473        "metric",
 474        "experiment",
 475        "trial",
 476        "benchmark",
 477        "result",
 478        "obligation",
 479    )
 480    accounting_terms = (
 481        "invalid",
 482        "valid",
 483        "not valid",
 484        "diagnostic",
 485        "missing",
 486        "no metric",
 487        "without metric",
 488        "blocked",
 489        "failed",
 490        "failure",
 491        "timeout",
 492        "permission",
 493        "auth",
 494        "quota",
 495        "rate limit",
 496        "unavailable",
 497        "unable",
 498        "cannot",
 499        "can't",
 500        "could not",
 501        "stale",
 502        "incomplete",
 503        "not comparable",
 504        "not enough",
 505        "rerun",
 506        "re-run",
 507        "retry",
 508    )
 509    return any(term in text for term in measurement_terms) and any(term in text for term in accounting_terms)
 510
 511
 512def _record_memory_graph(args: dict[str, Any], ctx: ToolContext) -> str:
 513    nodes = args.get("nodes") if isinstance(args.get("nodes"), list) else []
 514    edges = args.get("edges") if isinstance(args.get("edges"), list) else []
 515    if not nodes and not edges:
 516        return _json({"success": False, "error": "nodes or edges are required"})
 517    record = ctx.db.append_memory_graph_records(ctx.job_id, nodes=nodes, edges=edges)
 518    ctx.db.append_agent_update(
 519        ctx.job_id,
 520        (
 521            "Memory graph updated: "
 522            f"{record.get('added_nodes')} new nodes, {record.get('updated_nodes')} updated, "
 523            f"{record.get('added_edges')} new links."
 524        ),
 525        category="progress",
 526        metadata={
 527            "memory_graph_event_id": record.get("event_id"),
 528            "added_nodes": record.get("added_nodes"),
 529            "updated_nodes": record.get("updated_nodes"),
 530            "added_edges": record.get("added_edges"),
 531        },
 532    )
 533    return _json({"success": True, "job_id": ctx.job_id, **record})
 534
 535
 536def _search_memory_graph(args: dict[str, Any], ctx: ToolContext) -> str:
 537    query = str(args.get("query") or "")
 538    limit = int(args.get("limit") or 10)
 539    job = ctx.db.get_job(ctx.job_id)
 540    graph = memory_graph_from_job(job)
 541    results = search_memory_graph(graph, query=query, limit=limit)
 542    return _json({"success": True, "job_id": ctx.job_id, "query": query, **results})
 543
 544
 545def _acknowledge_operator_context(args: dict[str, Any], ctx: ToolContext) -> str:
 546    raw_ids = args.get("message_ids")
 547    message_ids = [str(item) for item in raw_ids] if isinstance(raw_ids, list) else []
 548    summary = str(args.get("summary") or args.get("reason") or "").strip()
 549    status = str(args.get("status") or "acknowledged").strip().lower()
 550    pending = _acknowledgeable_operator_messages(ctx.db.get_job(ctx.job_id), message_ids=message_ids)
 551    if not pending:
 552        return _json({
 553            "success": False,
 554            "recoverable": True,
 555            "error": "no active operator context to acknowledge",
 556            "message_ids": message_ids,
 557            "guidance": "Use acknowledge_operator_context only after incorporating claimed operator steering. Use report_update, record_lesson, record_tasks, or record_experiment for ordinary progress.",
 558        })
 559    result = ctx.db.acknowledge_operator_messages(
 560        ctx.job_id,
 561        message_ids=message_ids,
 562        summary=summary,
 563        status=status,
 564    )
 565    message = summary or f"Operator context {result.get('status')}."
 566    ctx.db.append_agent_update(
 567        ctx.job_id,
 568        message,
 569        category="progress",
 570        metadata={
 571            "operator_context_status": result.get("status"),
 572            "operator_message_count": result.get("count"),
 573            "operator_message_ids": [
 574                entry.get("event_id")
 575                for entry in result.get("messages", [])
 576                if isinstance(entry, dict) and entry.get("event_id")
 577            ],
 578        },
 579    )
 580    return _json({"success": True, "job_id": ctx.job_id, **result})
 581
 582
 583def _acknowledgeable_operator_messages(job: dict[str, Any], *, message_ids: list[str]) -> list[dict[str, Any]]:
 584    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 585    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
 586    wanted = {str(message_id).strip() for message_id in message_ids if str(message_id).strip()}
 587    pending = []
 588    for entry in messages:
 589        if not isinstance(entry, dict):
 590            continue
 591        mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
 592        if mode not in {"steer", "follow_up"}:
 593            continue
 594        event_id = str(entry.get("event_id") or "")
 595        if wanted and event_id not in wanted:
 596            continue
 597        if not wanted and not entry.get("claimed_at"):
 598            continue
 599        if entry.get("acknowledged_at") or entry.get("superseded_at"):
 600            continue
 601        pending.append(entry)
 602    return pending
 603
 604
 605def _record_source(args: dict[str, Any], ctx: ToolContext) -> str:
 606    source = str(args.get("source") or args.get("url") or args.get("domain") or "").strip()
 607    if not source:
 608        return _json({"success": False, "error": "source is required"})
 609    warnings_raw = args.get("warnings")
 610    warnings = [str(item) for item in warnings_raw] if isinstance(warnings_raw, list) else []
 611    score_arg = args.get("usefulness_score")
 612    usefulness_score = float(score_arg) if isinstance(score_arg, (int, float)) else None
 613    yield_count = int(args.get("yield_count") or 0)
 614    fail_count_delta = int(args.get("fail_count_delta") or 0)
 615    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
 616    source_type = str(args.get("source_type") or "")
 617    outcome = str(args.get("outcome") or "")
 618    if not _source_has_assessment(
 619        source_type=source_type,
 620        usefulness_score=usefulness_score,
 621        yield_count=yield_count,
 622        fail_count_delta=fail_count_delta,
 623        warnings=warnings,
 624        outcome=outcome,
 625        metadata=metadata,
 626    ):
 627        return _json({
 628            "success": False,
 629            "error": "source assessment is required",
 630            "guidance": (
 631                "record_source must say why the source matters: include usefulness_score, "
 632                "yield_count, fail_count_delta, warnings, outcome, or evidence metadata."
 633            ),
 634        })
 635    entry = ctx.db.append_source_record(
 636        ctx.job_id,
 637        source,
 638        source_type=source_type,
 639        usefulness_score=usefulness_score,
 640        yield_count=yield_count,
 641        fail_count_delta=fail_count_delta,
 642        warnings=warnings,
 643        outcome=outcome,
 644        metadata=metadata,
 645    )
 646    return _json({"success": True, "job_id": ctx.job_id, "source": entry})
 647
 648
 649def _source_has_assessment(
 650    *,
 651    source_type: str,
 652    usefulness_score: float | None,
 653    yield_count: int,
 654    fail_count_delta: int,
 655    warnings: list[str],
 656    outcome: str,
 657    metadata: dict[str, Any],
 658) -> bool:
 659    return bool(
 660        usefulness_score is not None
 661        or yield_count
 662        or fail_count_delta
 663        or warnings
 664        or outcome.strip()
 665        or any(str(value).strip() for value in metadata.values())
 666    )
 667
 668
 669def _record_findings(args: dict[str, Any], ctx: ToolContext) -> str:
 670    raw_findings = args.get("findings")
 671    if isinstance(raw_findings, list):
 672        findings = [item for item in raw_findings if isinstance(item, dict)]
 673    else:
 674        findings = [args]
 675    if not findings:
 676        return _json({"success": False, "error": "findings are required"})
 677    evidence_artifact = str(args.get("evidence_artifact") or args.get("artifact_id") or "")
 678    stored = []
 679    added = 0
 680    updated = 0
 681    unchanged = 0
 682    rejected = []
 683    source_yields: dict[str, int] = {}
 684    for finding in findings[:50]:
 685        name = str(finding.get("name") or finding.get("title") or "").strip()
 686        if not name:
 687            continue
 688        source_url = str(finding.get("source_url") or finding.get("source") or args.get("source_url") or args.get("source") or "")
 689        reason = str(finding.get("reason") or finding.get("rationale") or "")
 690        finding_evidence_artifact = str(finding.get("evidence_artifact") or evidence_artifact)
 691        metadata = finding.get("metadata") if isinstance(finding.get("metadata"), dict) else {}
 692        if not _finding_has_evidence(
 693            url=str(finding.get("url") or ""),
 694            source_url=source_url,
 695            reason=reason,
 696            evidence_artifact=finding_evidence_artifact,
 697            metadata=metadata,
 698        ):
 699            rejected.append({"name": name, "reason": "missing_evidence"})
 700            continue
 701        score_arg = finding.get("score")
 702        score = float(score_arg) if isinstance(score_arg, (int, float)) else None
 703        entry = ctx.db.append_finding_record(
 704            ctx.job_id,
 705            name=name,
 706            url=str(finding.get("url") or ""),
 707            source_url=source_url,
 708            category=str(finding.get("category") or finding.get("type") or ""),
 709            location=str(finding.get("location") or ""),
 710            contact=str(finding.get("contact") or ""),
 711            reason=reason,
 712            status=str(finding.get("status") or "new"),
 713            score=score,
 714            evidence_artifact=finding_evidence_artifact,
 715            metadata=metadata,
 716        )
 717        if entry.get("created"):
 718            added += 1
 719            if source_url:
 720                source_yields[source_url] = source_yields.get(source_url, 0) + 1
 721        elif entry.get("substantive_update"):
 722            updated += 1
 723        else:
 724            unchanged += 1
 725        stored.append(entry)
 726    if not stored:
 727        return _json({
 728            "success": False,
 729            "error": "no valid finding with name/title and evidence was provided",
 730            "rejected": rejected,
 731            "guidance": (
 732                "Each finding must include an evidence anchor such as source_url/url, reason/rationale, "
 733                "evidence_artifact, or evidence metadata. Use record_tasks or record_source for unevidenced candidates."
 734            ),
 735        })
 736    source_records = []
 737    for source_url, count in source_yields.items():
 738        score = round(min(1.0, 0.55 + min(count, 10) * 0.04), 2)
 739        source_records.append(
 740            ctx.db.append_source_record(
 741                ctx.job_id,
 742                source_url,
 743                source_type=str(args.get("source_type") or "finding_source"),
 744                usefulness_score=score,
 745                yield_count=count,
 746                outcome=f"record_findings yielded {count} new candidate(s)",
 747                metadata={"auto_from_record_findings": True, "evidence_artifact": evidence_artifact},
 748            )
 749        )
 750    if added or updated or source_records:
 751        message = f"Finding ledger updated: {added} new, {updated} changed. Source ledger updated: {len(source_records)}."
 752        if unchanged:
 753            message += f" {unchanged} unchanged."
 754        ctx.db.append_agent_update(
 755            ctx.job_id,
 756            message,
 757            category="finding",
 758            metadata={
 759                "added": added,
 760                "updated": updated,
 761                "unchanged": unchanged,
 762                "rejected": len(rejected),
 763                "sources_updated": len(source_records),
 764            },
 765        )
 766    return _json({
 767        "success": True,
 768        "job_id": ctx.job_id,
 769        "added": added,
 770        "updated": updated,
 771        "unchanged": unchanged,
 772        "rejected": rejected,
 773        "sources_updated": len(source_records),
 774        "sources": source_records,
 775        "findings": stored,
 776    })
 777
 778
 779def _finding_has_evidence(
 780    *,
 781    url: str,
 782    source_url: str,
 783    reason: str,
 784    evidence_artifact: str,
 785    metadata: dict[str, Any],
 786) -> bool:
 787    if url.strip() or source_url.strip() or reason.strip() or evidence_artifact.strip():
 788        return True
 789    evidence_keys = {
 790        "artifact_id",
 791        "evidence_artifact",
 792        "experiment_key",
 793        "file_path",
 794        "output_path",
 795        "source_id",
 796        "source_url",
 797        "step_id",
 798    }
 799    return any(str(metadata.get(key) or "").strip() for key in evidence_keys)
 800
 801
 802def _record_tasks(args: dict[str, Any], ctx: ToolContext) -> str:
 803    raw_tasks = args.get("tasks")
 804    if isinstance(raw_tasks, list):
 805        tasks = [item for item in raw_tasks if isinstance(item, dict)]
 806    else:
 807        tasks = [args]
 808    if not tasks:
 809        return _json({"success": False, "error": "tasks are required"})
 810
 811    pending_measurement = _pending_measurement(ctx)
 812    task_queue_pressure = _task_queue_pressure_active(ctx)
 813    prepared_tasks: list[dict[str, Any]] = []
 814    for task in tasks[:50]:
 815        title = str(task.get("title") or task.get("name") or "").strip()
 816        if not title:
 817            continue
 818        parent = str(task.get("parent") or "")
 819        metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
 820        if task_queue_pressure:
 821            match = _semantic_task_match_under_pressure(ctx, title=title, parent=parent)
 822            if match:
 823                metadata = dict(metadata)
 824                metadata.setdefault("original_title", title)
 825                metadata.setdefault(
 826                    "matched_existing_task",
 827                    {
 828                        "key": match.get("key"),
 829                        "title": match.get("title"),
 830                        "score": match.get("score"),
 831                        "overlap": match.get("overlap"),
 832                    },
 833                )
 834                title = str(match.get("title") or title)
 835                parent = str(match.get("parent") or parent)
 836        output_contract = str(
 837            task.get("output_contract")
 838            or task.get("contract")
 839            or metadata.get("output_contract")
 840            or metadata.get("contract")
 841            or ""
 842        )
 843        acceptance_criteria = str(task.get("acceptance_criteria") or "")
 844        evidence_needed = str(task.get("evidence_needed") or "")
 845        stall_behavior = str(task.get("stall_behavior") or "")
 846        output_contract, acceptance_criteria, evidence_needed, stall_behavior, metadata = _complete_task_contract(
 847            title=title,
 848            output_contract=output_contract,
 849            acceptance_criteria=acceptance_criteria,
 850            evidence_needed=evidence_needed,
 851            stall_behavior=stall_behavior,
 852            metadata=metadata,
 853        )
 854        goal = str(task.get("goal") or task.get("description") or "")
 855        source_hint = str(task.get("source_hint") or task.get("source") or "")
 856        result_text = str(task.get("result") or task.get("outcome") or "")
 857        priority_arg = task.get("priority")
 858        priority = int(priority_arg) if isinstance(priority_arg, (int, float)) else 0
 859        status = str(task.get("status") or "open")
 860        if pending_measurement and not _task_targets_measurement_obligation(
 861            title=title,
 862            goal=goal,
 863            source_hint=source_hint,
 864            result=result_text,
 865            output_contract=output_contract,
 866            acceptance_criteria=acceptance_criteria,
 867            evidence_needed=evidence_needed,
 868            stall_behavior=stall_behavior,
 869            metadata=metadata,
 870        ) and not _task_would_be_unchanged(
 871            ctx,
 872            title=title,
 873            status=status,
 874            priority=priority,
 875            goal=goal,
 876            source_hint=source_hint,
 877            result=result_text,
 878            parent=parent,
 879            output_contract=output_contract,
 880            acceptance_criteria=acceptance_criteria,
 881            evidence_needed=evidence_needed,
 882            stall_behavior=stall_behavior,
 883            metadata=metadata,
 884        ):
 885            return _json({
 886                "success": False,
 887                "error": "measurement task required",
 888                "message": (
 889                    "A pending measurement can only be deferred by a task that explicitly obtains, "
 890                    "repairs, validates, or accounts for that measurement."
 891                ),
 892                "rejected_task": title,
 893                "pending_measurement_obligation": pending_measurement,
 894            })
 895        prepared_tasks.append({
 896            "task": task,
 897            "title": title,
 898            "goal": goal,
 899            "source_hint": source_hint,
 900            "result_text": result_text,
 901            "parent": parent,
 902            "priority": priority,
 903            "status": status,
 904            "metadata": metadata,
 905            "output_contract": output_contract,
 906            "acceptance_criteria": acceptance_criteria,
 907            "evidence_needed": evidence_needed,
 908            "stall_behavior": stall_behavior,
 909        })
 910
 911    stored = []
 912    added = 0
 913    updated = 0
 914    unchanged = 0
 915    for prepared in prepared_tasks:
 916        task = prepared["task"]
 917        title = prepared["title"]
 918        status = prepared["status"]
 919        metadata = prepared["metadata"]
 920        output_contract = prepared["output_contract"]
 921        acceptance_criteria = prepared["acceptance_criteria"]
 922        evidence_needed = prepared["evidence_needed"]
 923        stall_behavior = prepared["stall_behavior"]
 924        result_text = prepared["result_text"]
 925        status, metadata = _validated_task_status(
 926            ctx,
 927            status=status,
 928            output_contract=output_contract,
 929            result=result_text,
 930            metadata=metadata,
 931        )
 932        entry = ctx.db.append_task_record(
 933            ctx.job_id,
 934            title=title,
 935            status=status,
 936            priority=prepared["priority"],
 937            goal=prepared["goal"],
 938            source_hint=prepared["source_hint"],
 939            result=result_text,
 940            parent=prepared["parent"],
 941            output_contract=output_contract,
 942            acceptance_criteria=acceptance_criteria,
 943            evidence_needed=evidence_needed,
 944            stall_behavior=stall_behavior,
 945            metadata=metadata,
 946        )
 947        if entry.get("created"):
 948            added += 1
 949        elif entry.get("substantive_update"):
 950            updated += 1
 951        else:
 952            unchanged += 1
 953        stored.append(entry)
 954    if not stored:
 955        return _json({"success": False, "error": "no valid task with title/name was provided"})
 956
 957    if added or updated:
 958        message = f"Task queue updated: {added} new, {updated} changed."
 959        if unchanged:
 960            message += f" {unchanged} unchanged."
 961        ctx.db.append_agent_update(
 962            ctx.job_id,
 963            message,
 964            category="plan",
 965            metadata={"added": added, "updated": updated, "unchanged": unchanged},
 966        )
 967    if (added or updated) and pending_measurement:
 968        _resolve_measurement_obligation(
 969            ctx,
 970            status="deferred",
 971            reason="Created or updated task branch to obtain or handle the pending measurement.",
 972            via_tool="record_tasks",
 973        )
 974    return _json({"success": True, "job_id": ctx.job_id, "added": added, "updated": updated, "unchanged": unchanged, "tasks": stored})
 975
 976
 977def _task_targets_measurement_obligation(
 978    *,
 979    title: str,
 980    goal: str,
 981    source_hint: str,
 982    result: str,
 983    output_contract: str,
 984    acceptance_criteria: str,
 985    evidence_needed: str,
 986    stall_behavior: str,
 987    metadata: dict[str, Any],
 988) -> bool:
 989    text = " ".join([
 990        title,
 991        goal,
 992        source_hint,
 993        result,
 994        output_contract,
 995        acceptance_criteria,
 996        evidence_needed,
 997        stall_behavior,
 998        _metadata_scalar_text(metadata),
 999    ]).lower()
1000    contract = output_contract.strip().lower()
1001    if contract in {"experiment", "monitor"} and _text_mentions_measurement(text):
1002        return True
1003    return _text_mentions_measurement(text) and _text_mentions_measurement_accounting(text)
1004
1005
1006def _task_would_be_unchanged(
1007    ctx: ToolContext,
1008    *,
1009    title: str,
1010    status: str,
1011    priority: int,
1012    goal: str,
1013    source_hint: str,
1014    result: str,
1015    parent: str,
1016    output_contract: str,
1017    acceptance_criteria: str,
1018    evidence_needed: str,
1019    stall_behavior: str,
1020    metadata: dict[str, Any],
1021) -> bool:
1022    try:
1023        job = ctx.db.get_job(ctx.job_id)
1024    except KeyError:
1025        return False
1026    job_metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1027    tasks = job_metadata.get("task_queue") if isinstance(job_metadata.get("task_queue"), list) else []
1028    key = task_key(parent, title)
1029    current = next(
1030        (
1031            entry
1032            for entry in tasks
1033            if isinstance(entry, dict)
1034            and (
1035                entry.get("key") == key
1036                or (not entry.get("key") and task_key(str(entry.get("parent") or ""), str(entry.get("title") or "")) == key)
1037            )
1038        ),
1039        None,
1040    )
1041    if not current:
1042        return False
1043    fields = (
1044        "status",
1045        "priority",
1046        "goal",
1047        "source_hint",
1048        "result",
1049        "parent",
1050        "output_contract",
1051        "acceptance_criteria",
1052        "evidence_needed",
1053        "stall_behavior",
1054        "metadata",
1055    )
1056    before = _task_change_fingerprint(current, fields)
1057    after = dict(current)
1058    cleaned_status = (status.strip().lower() or "open").replace(" ", "_")
1059    after["status"] = cleaned_status if cleaned_status in {"open", "active", "done", "blocked", "skipped"} else "open"
1060    after["priority"] = int(priority)
1061    for field, value in {
1062        "goal": goal.strip(),
1063        "source_hint": source_hint.strip(),
1064        "result": result.strip(),
1065        "parent": parent.strip(),
1066        "output_contract": output_contract.strip().lower().replace(" ", "_"),
1067        "acceptance_criteria": acceptance_criteria.strip(),
1068        "evidence_needed": evidence_needed.strip(),
1069        "stall_behavior": stall_behavior.strip(),
1070    }.items():
1071        if value:
1072            after[field] = value
1073    if metadata:
1074        merged_metadata = after.get("metadata") if isinstance(after.get("metadata"), dict) else {}
1075        merged_metadata = dict(merged_metadata)
1076        merged_metadata.update(metadata)
1077        after["metadata"] = merged_metadata
1078    return before == _task_change_fingerprint(after, fields)
1079
1080
1081def _task_change_fingerprint(entry: dict[str, Any], fields: tuple[str, ...]) -> str:
1082    return json.dumps({field: entry.get(field) for field in fields}, sort_keys=True, separators=(",", ":"))
1083
1084
1085def _task_queue_pressure_active(ctx: ToolContext) -> bool:
1086    try:
1087        job = ctx.db.get_job(ctx.job_id)
1088    except KeyError:
1089        return False
1090    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1091    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
1092    objective_tasks = [task for task in tasks if not _task_is_guard_recovery(task)]
1093    open_tasks = [
1094        task
1095        for task in objective_tasks
1096        if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
1097    ]
1098    return len(objective_tasks) > 80 or len(open_tasks) >= 40
1099
1100
1101def _semantic_task_match_under_pressure(ctx: ToolContext, *, title: str, parent: str) -> dict[str, Any] | None:
1102    try:
1103        job = ctx.db.get_job(ctx.job_id)
1104    except KeyError:
1105        return None
1106    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1107    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
1108    return find_semantic_task_match(
1109        title=title,
1110        parent=parent,
1111        tasks=[task for task in tasks if isinstance(task, dict) and not _task_is_guard_recovery(task)],
1112    )
1113
1114
1115def _task_is_guard_recovery(task: dict[str, Any]) -> bool:
1116    metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
1117    return bool(metadata.get("guard_recovery")) or str(task.get("title") or "").strip().lower().startswith("resolve guard:")
1118
1119
1120def _text_mentions_measurement(text: str) -> bool:
1121    terms = (
1122        "measure",
1123        "measured",
1124        "measurement",
1125        "metric",
1126        "experiment",
1127        "trial",
1128        "benchmark",
1129        "result",
1130        "obligation",
1131    )
1132    return any(term in text for term in terms)
1133
1134
1135def _text_mentions_measurement_accounting(text: str) -> bool:
1136    terms = (
1137        "account",
1138        "accounting",
1139        "obtain",
1140        "repair",
1141        "validate",
1142        "valid",
1143        "invalid",
1144        "diagnostic",
1145        "missing",
1146        "no metric",
1147        "without metric",
1148        "blocked",
1149        "failed",
1150        "failure",
1151        "timeout",
1152        "permission",
1153        "auth",
1154        "quota",
1155        "rate limit",
1156        "unavailable",
1157        "unable",
1158        "cannot",
1159        "can't",
1160        "could not",
1161        "stale",
1162        "incomplete",
1163        "not comparable",
1164        "not enough",
1165        "rerun",
1166        "re-run",
1167        "retry",
1168        "next measured",
1169    )
1170    return any(term in text for term in terms)
1171
1172
1173def _metadata_scalar_text(metadata: dict[str, Any]) -> str:
1174    parts = []
1175    for value in metadata.values():
1176        if isinstance(value, (str, int, float, bool)):
1177            parts.append(str(value))
1178    return " ".join(parts)
1179
1180
1181def _complete_task_contract(
1182    *,
1183    title: str,
1184    output_contract: str,
1185    acceptance_criteria: str,
1186    evidence_needed: str,
1187    stall_behavior: str,
1188    metadata: dict[str, Any],
1189) -> tuple[str, str, str, str, dict[str, Any]]:
1190    defaults = initial_task_contract(title)
1191    inferred = []
1192    if not output_contract.strip():
1193        output_contract = defaults["output_contract"]
1194        inferred.append("output_contract")
1195    if not acceptance_criteria.strip():
1196        acceptance_criteria = defaults["acceptance_criteria"]
1197        inferred.append("acceptance_criteria")
1198    if not evidence_needed.strip():
1199        evidence_needed = defaults["evidence_needed"]
1200        inferred.append("evidence_needed")
1201    if not stall_behavior.strip():
1202        stall_behavior = defaults["stall_behavior"]
1203        inferred.append("stall_behavior")
1204    if inferred:
1205        metadata = dict(metadata)
1206        existing = metadata.get("contract_inferred_fields")
1207        existing_fields = [str(item) for item in existing] if isinstance(existing, list) else []
1208        metadata["contract_inferred_fields"] = sorted(set(existing_fields + inferred))
1209    return output_contract, acceptance_criteria, evidence_needed, stall_behavior, metadata
1210
1211
1212def _validated_task_status(
1213    ctx: ToolContext,
1214    *,
1215    status: str,
1216    output_contract: str,
1217    result: str,
1218    metadata: dict[str, Any],
1219) -> tuple[str, dict[str, Any]]:
1220    normalized_status = status.strip().lower().replace(" ", "_") or "open"
1221    contract = output_contract.strip().lower().replace(" ", "_")
1222    if normalized_status == "done" and not result.strip() and not _task_metadata_has_completion_evidence(metadata, contract=contract):
1223        updated = dict(metadata)
1224        updated["completion_validation"] = "missing_result_evidence"
1225        return "active", updated
1226    if normalized_status != "done":
1227        return status, metadata
1228    if contract in {"artifact", "report"}:
1229        if _recent_deliverable_evidence(ctx):
1230            return status, metadata
1231        updated = dict(metadata)
1232        updated["completion_validation"] = "missing_recent_deliverable_evidence"
1233        if result:
1234            updated["claimed_result"] = result
1235        return "active", updated
1236    if _task_contract_completion_has_evidence(ctx, contract=contract, metadata=metadata):
1237        return status, metadata
1238    updated = dict(metadata)
1239    updated["completion_validation"] = f"missing_{contract}_evidence" if contract else "missing_contract_evidence"
1240    if result:
1241        updated["claimed_result"] = result
1242    return "active", updated
1243
1244
1245def _task_contract_completion_has_evidence(ctx: ToolContext, *, contract: str, metadata: dict[str, Any]) -> bool:
1246    if not contract or contract == "decision":
1247        return True
1248    if _task_metadata_has_completion_evidence(metadata, contract=contract):
1249        return True
1250    recent_evidence_tools = {
1251        "research": {"record_source", "record_findings"},
1252        "experiment": {"record_experiment"},
1253        "action": {"shell_exec", "write_file", "write_artifact"},
1254        "monitor": {"defer_job"},
1255        "validation": {"record_milestone_validation", "shell_exec"},
1256    }
1257    tools = recent_evidence_tools.get(contract)
1258    if not tools:
1259        return True
1260    for step in reversed(ctx.db.list_steps(job_id=ctx.job_id, limit=12)):
1261        if step.get("id") == ctx.step_id or step.get("status") != "completed":
1262            continue
1263        tool_name = str(step.get("tool_name") or "")
1264        if contract == "action" and tool_name == "shell_exec":
1265            input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
1266            args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
1267            if _shell_command_counts_as_action_evidence(str(args.get("command") or "")):
1268                return True
1269            continue
1270        if tool_name in tools:
1271            return True
1272    return False
1273
1274
1275def _task_metadata_has_completion_evidence(metadata: dict[str, Any], *, contract: str = "") -> bool:
1276    evidence_keys = {
1277        "artifact_id",
1278        "evidence_artifact",
1279        "experiment_key",
1280        "file_path",
1281        "output_path",
1282        "validation_event_id",
1283    }
1284    if contract == "research":
1285        evidence_keys.update({"finding_key", "source_key", "source_url", "finding_id", "source_id"})
1286    elif contract == "experiment":
1287        evidence_keys.update({"metric_name", "metric_value", "measurement_id"})
1288    elif contract == "action":
1289        evidence_keys.update({"step_id", "command", "action_id"})
1290    elif contract == "monitor":
1291        evidence_keys.update({"defer_until", "monitor_id", "check_at"})
1292    return any(str(metadata.get(key) or "").strip() for key in evidence_keys)
1293
1294
1295def _recent_deliverable_evidence(ctx: ToolContext, *, limit: int = 12) -> bool:
1296    for step in reversed(ctx.db.list_steps(job_id=ctx.job_id, limit=limit)):
1297        if step.get("id") == ctx.step_id:
1298            continue
1299        if step.get("status") != "completed":
1300            continue
1301        tool_name = str(step.get("tool_name") or "")
1302        input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
1303        args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
1304        if tool_name == "write_artifact" and _artifact_args_look_like_deliverable(args):
1305            return True
1306        if tool_name == "write_file":
1307            return True
1308        if tool_name == "shell_exec" and _shell_command_looks_like_write(str(args.get("command") or "")):
1309            return True
1310    return False
1311
1312
1313def _artifact_args_look_like_deliverable(args: dict[str, Any]) -> bool:
1314    text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "type")).strip().lower()
1315    if not text:
1316        return False
1317    evidence_like = any(term in text for term in EVIDENCE_OUTPUT_TERMS)
1318    deliverable_like = any(term in text for term in DELIVERABLE_OUTPUT_TERMS)
1319    return deliverable_like and not evidence_like
1320
1321
1322def _shell_command_looks_like_write(command: str) -> bool:
1323    text = command.strip()
1324    if not text:
1325        return False
1326    write_patterns = [
1327        r"(?<!\d)>>?\s*[^&]",
1328        r"\b1>>?\s*[^&]",
1329        r"\btee\b",
1330        r"\bcat\s+>\b",
1331        r"\bpython[0-9.]*\b.*\bwrite_text\b",
1332        r"\bpython[0-9.]*\b.*\bopen\([^)]*,\s*['\"]w",
1333        r"\bsed\s+-i\b",
1334    ]
1335    return any(re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL) for pattern in write_patterns)
1336
1337
1338def _shell_command_counts_as_action_evidence(command: str) -> bool:
1339    text = command.strip()
1340    if not text:
1341        return False
1342    if _shell_command_looks_like_write(text):
1343        return True
1344    read_only = re.compile(
1345        r"(?is)^\s*(?:"
1346        r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
1347        r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
1348        r")"
1349    )
1350    if read_only.search(text):
1351        return False
1352    if re.match(r"(?is)^curl\b", text):
1353        mutating_flags = (
1354            r"(?:^|\s)-X\s*(?:POST|PUT|PATCH|DELETE)\b|--request\s+(?:POST|PUT|PATCH|DELETE)\b|"
1355            r"(?:^|\s)(?:-d|--data|--form|-F|-T|--upload-file)\b"
1356        )
1357        return bool(re.search(mutating_flags, text))
1358    return True
1359
1360
1361def _record_roadmap(args: dict[str, Any], ctx: ToolContext) -> str:
1362    title = str(args.get("title") or args.get("name") or "").strip()
1363    if not title:
1364        return _json({"success": False, "error": "title is required"})
1365    milestones_arg = args.get("milestones")
1366    milestones = [item for item in milestones_arg if isinstance(item, dict)] if isinstance(milestones_arg, list) else []
1367    roadmap = ctx.db.append_roadmap_record(
1368        ctx.job_id,
1369        title=title,
1370        status=str(args.get("status") or "planned"),
1371        objective=str(args.get("objective") or ""),
1372        scope=str(args.get("scope") or ""),
1373        current_milestone=str(args.get("current_milestone") or ""),
1374        validation_contract=str(args.get("validation_contract") or ""),
1375        milestones=milestones,
1376        metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else {},
1377    )
1378    ctx.db.append_agent_update(
1379        ctx.job_id,
1380        (
1381            f"Roadmap updated: {roadmap.get('status')} with "
1382            f"{len(roadmap.get('milestones') or [])} milestones."
1383        ),
1384        category="plan",
1385        metadata={
1386            "roadmap_title": roadmap.get("title"),
1387            "roadmap_status": roadmap.get("status"),
1388            "milestone_count": len(roadmap.get("milestones") or []),
1389            "current_milestone": roadmap.get("current_milestone"),
1390        },
1391    )
1392    return _json({"success": True, "job_id": ctx.job_id, "roadmap": roadmap})
1393
1394
1395def _record_milestone_validation(args: dict[str, Any], ctx: ToolContext) -> str:
1396    milestone = str(args.get("milestone") or args.get("milestone_title") or "").strip()
1397    if not milestone:
1398        return _json({"success": False, "error": "milestone is required"})
1399    raw_issues = args.get("issues")
1400    issues = [str(item) for item in raw_issues if str(item).strip()] if isinstance(raw_issues, list) else []
1401    follow_up_items = args.get("follow_up_tasks") if isinstance(args.get("follow_up_tasks"), list) else []
1402    validation_status = str(args.get("validation_status") or args.get("status") or "pending").strip().lower().replace(" ", "_")
1403    if validation_status not in {"pending", "passed", "failed", "blocked"}:
1404        validation_status = "pending"
1405    result_text = str(args.get("result") or args.get("summary") or "").strip()
1406    evidence_text = str(args.get("evidence") or args.get("evidence_artifact") or "").strip()
1407    next_action = str(args.get("next_action") or "").strip()
1408    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
1409    if validation_status == "passed" and not _validation_has_positive_evidence(result_text, evidence_text, metadata):
1410        return _json({
1411            "success": False,
1412            "error": "passed milestone validation requires evidence or result",
1413            "guidance": (
1414                "Use validation_status=passed only after concrete evidence or a validation result proves the milestone. "
1415                "Use pending, failed, or blocked when validation is incomplete or missing evidence."
1416            ),
1417        })
1418    if validation_status in {"failed", "blocked"} and not (
1419        result_text or evidence_text or issues or follow_up_items or next_action
1420    ):
1421        return _json({
1422            "success": False,
1423            "error": f"{validation_status} milestone validation requires a gap, issue, evidence, next_action, or follow-up task",
1424            "guidance": (
1425                "Failed or blocked validation must say what is missing or what should happen next, "
1426                "so the worker can continue from a concrete gap instead of logging a vague checkpoint."
1427            ),
1428        })
1429    validation = ctx.db.append_milestone_validation_record(
1430        ctx.job_id,
1431        milestone=milestone,
1432        validation_status=validation_status,
1433        result=result_text,
1434        evidence=evidence_text,
1435        issues=issues,
1436        next_action=next_action,
1437        metadata=metadata,
1438    )
1439    follow_up_tasks = []
1440    for task in follow_up_items[:25]:
1441        if not isinstance(task, dict):
1442            continue
1443        title = str(task.get("title") or task.get("name") or "").strip()
1444        if not title:
1445            continue
1446        priority_arg = task.get("priority")
1447        priority = int(priority_arg) if isinstance(priority_arg, (int, float)) else 0
1448        follow_up_tasks.append(ctx.db.append_task_record(
1449            ctx.job_id,
1450            title=title,
1451            status=str(task.get("status") or "open"),
1452            priority=priority,
1453            goal=str(task.get("goal") or task.get("description") or ""),
1454            source_hint=str(task.get("source_hint") or task.get("source") or ""),
1455            result=str(task.get("result") or task.get("outcome") or ""),
1456            parent=str(task.get("parent") or milestone),
1457            output_contract=str(
1458                task.get("output_contract")
1459                or task.get("contract")
1460                or (task.get("metadata") if isinstance(task.get("metadata"), dict) else {}).get("output_contract")
1461                or (task.get("metadata") if isinstance(task.get("metadata"), dict) else {}).get("contract")
1462                or "action"
1463            ),
1464            acceptance_criteria=str(task.get("acceptance_criteria") or ""),
1465            evidence_needed=str(task.get("evidence_needed") or ""),
1466            stall_behavior=str(task.get("stall_behavior") or ""),
1467            metadata=task.get("metadata") if isinstance(task.get("metadata"), dict) else {"source": "milestone_validation"},
1468        ))
1469    ctx.db.append_agent_update(
1470        ctx.job_id,
1471        (
1472            f"Milestone validation {validation.get('validation_status')}: "
1473            f"{validation.get('title') or milestone}; follow-up tasks {len(follow_up_tasks)}."
1474        ),
1475        category="plan",
1476        metadata={
1477            "milestone": validation.get("title") or milestone,
1478            "validation_status": validation.get("validation_status"),
1479            "follow_up_tasks": len(follow_up_tasks),
1480        },
1481    )
1482    return _json({
1483        "success": True,
1484        "job_id": ctx.job_id,
1485        "validation": validation,
1486        "follow_up_tasks": follow_up_tasks,
1487    })
1488
1489
1490def _validation_has_positive_evidence(result: str, evidence: str, metadata: dict[str, Any]) -> bool:
1491    if result.strip() or evidence.strip():
1492        return True
1493    evidence_keys = {
1494        "artifact_id",
1495        "evidence_artifact",
1496        "experiment_key",
1497        "file_path",
1498        "output_path",
1499        "source_url",
1500        "finding_key",
1501        "validation_event_id",
1502    }
1503    return any(str(metadata.get(key) or "").strip() for key in evidence_keys)
1504
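The two validation guards above can be tried in isolation. What follows is a minimal sketch, not the project's code: `validation_guard_error` and `has_positive_evidence` are hypothetical standalone helpers that mirror the checks in the listing and return an error string instead of a JSON payload. Nothing here touches `ctx.db`.

```python
from __future__ import annotations

from typing import Any

# Keys copied from _validation_has_positive_evidence in the listing.
EVIDENCE_KEYS = {
    "artifact_id", "evidence_artifact", "experiment_key", "file_path",
    "output_path", "source_url", "finding_key", "validation_event_id",
}


def has_positive_evidence(result: str, evidence: str, metadata: dict[str, Any]) -> bool:
    """True when free text or any evidence-bearing metadata key is non-empty."""
    if result.strip() or evidence.strip():
        return True
    return any(str(metadata.get(key) or "").strip() for key in EVIDENCE_KEYS)


def validation_guard_error(
    status: str,
    *,
    result: str = "",
    evidence: str = "",
    metadata: dict[str, Any] | None = None,
    issues: list[str] | None = None,
    follow_ups: list[dict[str, Any]] | None = None,
    next_action: str = "",
) -> str | None:
    """Hypothetical helper: return an error message, or None if the record may be stored."""
    metadata = metadata or {}
    # "passed" needs positive proof: text or an evidence-bearing metadata key.
    if status == "passed" and not has_positive_evidence(result, evidence, metadata):
        return "passed milestone validation requires evidence or result"
    # "failed"/"blocked" need at least one concrete gap or next step.
    if status in {"failed", "blocked"} and not (
        result or evidence or issues or follow_ups or next_action
    ):
        return (
            f"{status} milestone validation requires a gap, issue, evidence, "
            "next_action, or follow-up task"
        )
    return None


if __name__ == "__main__":
    print(validation_guard_error("passed"))
    print(validation_guard_error("passed", metadata={"artifact_id": "a1"}))
    print(validation_guard_error("failed", issues=["missing benchmark"]))
```

In this sketch a `pending` status is never rejected, which matches the listing: only `passed`, `failed`, and `blocked` carry a guard.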
1505
1506def _record_experiment(args: dict[str, Any], ctx: ToolContext) -> str:
1507    title = str(
1508        args.get("title")
1509        or args.get("name")
1510        or args.get("metric_name")
1511        or args.get("hypothesis")
1512        or args.get("result")
1513        or args.get("outcome")
1514        or "Experiment checkpoint"
1515    ).strip()
1516    metric_name = str(args.get("metric_name") or "").strip()
1517    metric_value = _optional_float(args.get("metric_value"))
1518    baseline_value = _optional_float(args.get("baseline_value"))
1519    status = str(args.get("status") or "planned").strip().lower() or "planned"
1520    next_action = str(args.get("next_action") or "").strip()
1521    if status == "measured" and (not metric_name or metric_value is None):
1522        return _json({
1523            "success": False,
1524            "error": "measured experiments require metric_name and numeric metric_value",
1525            "guidance": (
1526                "Use status=measured only for a real measurement with a metric name and numeric value. "
1527                "Use status=failed, blocked, skipped, running, or planned when the trial did not produce a valid metric."
1528            ),
1529        })
1530    if status in {"measured", "failed", "blocked", "skipped"} and not next_action:
1531        return _json({
1532            "success": False,
1533            "error": "next_action is required for measured, failed, blocked, or skipped experiments",
1534            "guidance": (
1535                "Experiment records that close out a trial must leave a concrete next action, "
1536                "such as the next experiment, action branch, monitor branch, pivot, or blocked condition."
1537            ),
1538        })
1539    if status in {"failed", "blocked", "skipped"} and not _experiment_has_closed_trial_context(args):
1540        return _json({
1541            "success": False,
1542            "error": f"{status} experiments require result, evidence, config, or metadata",
1543            "guidance": (
1544                "Closed non-measured trials must record what happened or what was attempted. "
1545                "Include result/outcome, evidence_artifact, config, or metadata with the blocker/context."
1546            ),
1547        })
1548    record = ctx.db.append_experiment_record(
1549        ctx.job_id,
1550        title=title,
1551        hypothesis=str(args.get("hypothesis") or ""),
1552        status=status,
1553        metric_name=metric_name,
1554        metric_value=metric_value,
1555        metric_unit=str(args.get("metric_unit") or ""),
1556        higher_is_better=bool(args.get("higher_is_better", True)),
1557        baseline_value=baseline_value,
1558        config=args.get("config") if isinstance(args.get("config"), dict) else {},
1559        result=str(args.get("result") or args.get("outcome") or ""),
1560        evidence_artifact=str(args.get("evidence_artifact") or args.get("artifact_id") or ""),
1561        next_action=next_action,
1562        metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else {},
1563    )
1564    metric = ""
1565    if record.get("metric_value") is not None:
1566        metric = " " + format_metric_value(
1567            record.get("metric_name") or "metric",
1568            record.get("metric_value"),
1569            record.get("metric_unit") or "",
1570        )
1571    best = " best" if record.get("best_observed") else ""
1572    ctx.db.append_agent_update(
1573        ctx.job_id,
1574        f"Experiment {record.get('status')}: {record.get('title')}{metric}{best}.",
1575        category="progress",
1576        metadata={
1577            "experiment_key": record.get("key"),
1578            "metric_name": record.get("metric_name"),
1579            "metric_value": record.get("metric_value"),
1580            "best_observed": record.get("best_observed"),
1581            "delta_from_previous_best": record.get("delta_from_previous_best"),
1582        },
1583    )
1584    if record.get("metric_value") is not None or str(record.get("status") or "") in {"measured", "failed", "blocked", "skipped"}:
1585        _resolve_measurement_obligation(
1586            ctx,
1587            status="recorded",
1588            reason=f"Recorded experiment {record.get('title')}.",
1589            via_tool="record_experiment",
1590            experiment_key=str(record.get("key") or ""),
1591        )
1592    return _json({"success": True, "job_id": ctx.job_id, "experiment": record})
1593
1594
1595def _experiment_has_closed_trial_context(args: dict[str, Any]) -> bool:
1596    if str(args.get("result") or args.get("outcome") or "").strip():
1597        return True
1598    if str(args.get("evidence_artifact") or args.get("artifact_id") or "").strip():
1599        return True
1600    config = args.get("config")
1601    if isinstance(config, dict) and any(str(value).strip() for value in config.values()):
1602        return True
1603    metadata = args.get("metadata")
1604    if isinstance(metadata, dict) and any(str(value).strip() for value in metadata.values()):
1605        return True
1606    return False
1607
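The order of the three experiment guards matters: metric check first, then `next_action`, then closed-trial context. A hedged sketch of that order follows. `experiment_guard_error` is a hypothetical helper, and it simplifies the metric check to real numbers only, whereas the listing also accepts numeric strings through `_optional_float`.

```python
from __future__ import annotations

from typing import Any

CLOSED_STATUSES = {"measured", "failed", "blocked", "skipped"}


def has_closed_trial_context(args: dict[str, Any]) -> bool:
    """Mirror of _experiment_has_closed_trial_context in the listing."""
    if str(args.get("result") or args.get("outcome") or "").strip():
        return True
    if str(args.get("evidence_artifact") or args.get("artifact_id") or "").strip():
        return True
    for field in ("config", "metadata"):
        value = args.get(field)
        if isinstance(value, dict) and any(str(item).strip() for item in value.values()):
            return True
    return False


def experiment_guard_error(args: dict[str, Any]) -> str | None:
    """Hypothetical helper: first failing guard as a message, else None."""
    status = str(args.get("status") or "planned").strip().lower() or "planned"
    metric_name = str(args.get("metric_name") or "").strip()
    value = args.get("metric_value")
    # Simplification (assumption): only int/float count as numeric here.
    has_value = isinstance(value, (int, float)) and not isinstance(value, bool)
    next_action = str(args.get("next_action") or "").strip()
    if status == "measured" and (not metric_name or not has_value):
        return "measured experiments require metric_name and numeric metric_value"
    if status in CLOSED_STATUSES and not next_action:
        return "next_action is required for measured, failed, blocked, or skipped experiments"
    if status in CLOSED_STATUSES - {"measured"} and not has_closed_trial_context(args):
        return f"{status} experiments require result, evidence, config, or metadata"
    return None


if __name__ == "__main__":
    print(experiment_guard_error({"status": "measured", "metric_name": "latency"}))
    print(experiment_guard_error({"status": "failed", "next_action": "retry smaller batch"}))
    print(experiment_guard_error({
        "status": "failed",
        "next_action": "retry smaller batch",
        "result": "OOM at batch 64",
    }))
```

One property of the context check is worth knowing: dict values are passed through `str()`, so a `config` entry whose value is `None` becomes the non-empty string `"None"` and still counts as context.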
1608
1609def _optional_float(value: Any) -> float | None:
1610    if isinstance(value, bool) or value is None:
1611        return None
1612    if isinstance(value, (int, float)):
1613        return float(value)
1614    if isinstance(value, str):
1615        text = value.strip().replace(",", "")
1616        if not text:
1617            return None
1618        try:
1619            return float(text)
1620        except ValueError:
1621            return None
1622    return None
1623
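A few sample inputs make the lenient parsing in `_optional_float` concrete. The sketch below copies the function under the name `optional_float` so it runs on its own. `finite_or_none` is a hypothetical stricter variant, not part of the source, that also rejects NaN and infinities.

```python
from __future__ import annotations

import math
from typing import Any


def optional_float(value: Any) -> float | None:
    """Copy of _optional_float from the listing, renamed for a standalone demo."""
    if isinstance(value, bool) or value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Thousands separators are dropped before parsing.
        text = value.strip().replace(",", "")
        if not text:
            return None
        try:
            return float(text)
        except ValueError:
            return None
    return None


def finite_or_none(value: Any) -> float | None:
    """Hypothetical stricter variant: also reject NaN and infinities."""
    parsed = optional_float(value)
    return parsed if parsed is not None and math.isfinite(parsed) else None


if __name__ == "__main__":
    for raw in ["1,234.5", " 42 ", 7, True, "", "fast", "nan"]:
        print(repr(raw), "->", optional_float(raw), "| finite:", finite_or_none(raw))
```

Python's `float()` accepts strings such as `"nan"` and `"inf"`, so the original helper returns non-finite values for them. Whether the experiment ledger should accept those is not addressed in the listing; the stricter wrapper is only one possible answer.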
1624
1625def _pending_measurement(ctx: ToolContext) -> dict[str, Any] | None:
1626    try:
1627        job = ctx.db.get_job(ctx.job_id)
1628    except KeyError:
1629        return None
1630    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1631    obligation = metadata.get("pending_measurement_obligation")
1632    if isinstance(obligation, dict) and not obligation.get("resolved_at"):
1633        return obligation
1634    return None
1635
1636
1637def _resolve_measurement_obligation(
1638    ctx: ToolContext,
1639    *,
1640    status: str,
1641    reason: str,
1642    via_tool: str,
1643    experiment_key: str = "",
1644) -> None:
1645    obligation = _pending_measurement(ctx)
1646    if not obligation:
1647        return
1648    resolved = dict(obligation)
1649    resolved.update({
1650        "resolved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
1651        "resolution_status": status,
1652        "resolution_reason": reason[:1000],
1653        "resolution_tool": via_tool,
1654    })
1655    if experiment_key:
1656        resolved["experiment_key"] = experiment_key
1657    ctx.db.update_job_metadata(
1658        ctx.job_id,
1659        {
1660            "pending_measurement_obligation": {},
1661            "last_measurement_obligation": resolved,
1662        },
1663    )
1664    ctx.db.append_agent_update(
1665        ctx.job_id,
1666        f"Measurement obligation {status}: {reason[:220]}",
1667        category="progress" if status == "recorded" else "blocked",
1668        metadata={"measurement_obligation": resolved},
1669    )
1670
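The obligation lifecycle above (pending, then resolved and moved to `last_measurement_obligation`) can be sketched against a plain dict. This is an illustration under assumptions: `job_metadata` stands in for the job's stored metadata, `update_job_metadata` is assumed to merge keys, and the `reason` field on the sample obligation is hypothetical.

```python
from __future__ import annotations

import time
from typing import Any


def pending_measurement(job_metadata: dict[str, Any]) -> dict[str, Any] | None:
    """Return the unresolved obligation, if any (mirrors _pending_measurement)."""
    obligation = job_metadata.get("pending_measurement_obligation")
    if isinstance(obligation, dict) and not obligation.get("resolved_at"):
        return obligation
    return None


def resolve_obligation(
    job_metadata: dict[str, Any],
    *,
    status: str,
    reason: str,
    via_tool: str,
    experiment_key: str = "",
) -> bool:
    """Resolve the pending obligation in place; return False when there is none."""
    obligation = pending_measurement(job_metadata)
    if not obligation:  # None and the cleared {} placeholder both land here
        return False
    resolved = dict(obligation)
    resolved.update({
        "resolved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "resolution_status": status,
        "resolution_reason": reason[:1000],
        "resolution_tool": via_tool,
    })
    if experiment_key:
        resolved["experiment_key"] = experiment_key
    # Assumption: the real update merges these two keys into job metadata.
    job_metadata["pending_measurement_obligation"] = {}
    job_metadata["last_measurement_obligation"] = resolved
    return True


if __name__ == "__main__":
    meta = {"pending_measurement_obligation": {"reason": "benchmark output seen"}}
    print(resolve_obligation(meta, status="recorded", reason="Recorded experiment A.",
                             via_tool="record_experiment", experiment_key="exp-1"))
    print(resolve_obligation(meta, status="recorded", reason="again", via_tool="record_experiment"))
```

The second call is a no-op because the cleared placeholder `{}` is falsy, which is why resolving twice does not emit a second agent update in this sketch.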
1671
1672def _send_digest_email(args: dict[str, Any], ctx: ToolContext) -> str:
1673    subject = str(args.get("subject") or "Agent digest")
1674    body = str(args.get("body") or "")
1675    if not body:
1676        return _json({"success": False, "error": "body is required"})
1677    result = send_digest_email(ctx.config.email, subject=subject, body=body, to_addr=args.get("to_addr"))
1678    stored = ctx.artifacts.write_text(
1679        job_id=ctx.job_id,
1680        run_id=ctx.run_id,
1681        step_id=ctx.step_id,
1682        content=body,
1683        title=subject,
1684        summary="Digest email body",
1685        artifact_type="digest",
1686        metadata={"email": result},
1687    )
1688    return _json({"success": True, "email": result, "artifact_id": stored.id, "path": str(stored.path)})
1689
1690
1691def _browser_call(name: str, args: dict[str, Any], ctx: ToolContext) -> str:
1692    from nipux_cli import browser
1693
1694    task_id = ctx.task_id or ctx.job_id
1695    if name == "browser_navigate":
1696        return _json(browser.navigate(ctx.config, task_id=task_id, url=str(args.get("url") or "")))
1697    if name == "browser_snapshot":
1698        return _json(browser.snapshot(ctx.config, task_id=task_id, full=bool(args.get("full", False))))
1699    if name == "browser_click":
1700        return _json(browser.click(ctx.config, task_id=task_id, ref=str(args.get("ref") or "")))
1701    if name == "browser_type":
1702        return _json(browser.fill(ctx.config, task_id=task_id, ref=str(args.get("ref") or ""), text=str(args.get("text") or "")))
1703    if name == "browser_scroll":
1704        return _json(browser.scroll(ctx.config, task_id=task_id, direction=str(args.get("direction") or "down")))
1705    if name == "browser_back":
1706        return _json(browser.back(ctx.config, task_id=task_id))
1707    if name == "browser_press":
1708        return _json(browser.press(ctx.config, task_id=task_id, key=str(args.get("key") or "")))
1709    if name == "browser_console":
1710        return _json(browser.console(ctx.config, task_id=task_id, clear=bool(args.get("clear", False)), expression=args.get("expression")))
1711    raise KeyError(name)
1712
1713
1714def _web_call(name: str, args: dict[str, Any], ctx: ToolContext) -> str:
1715    del ctx
1716    from nipux_cli.web import web_extract, web_search
1717
1718    if name == "web_search":
1719        return _json(web_search(str(args.get("query") or ""), limit=int(args.get("limit") or 5)))
1720    if name == "web_extract":
1721        urls = args.get("urls") if isinstance(args.get("urls"), list) else []
1722        return _json(web_extract(urls[:5]))
1723    raise KeyError(name)
1724
1725
1726def _browser_handler(name: str) -> Handler:
1727    return lambda args, ctx: _browser_call(name, args, ctx)
1728
1729
1730def _web_handler(name: str) -> Handler:
1731    return lambda args, ctx: _web_call(name, args, ctx)
1732
1733
1734BROWSER_SCHEMAS: list[ToolSpec] = [
1735    ToolSpec("browser_navigate", "Navigate to a URL and return a compact browser snapshot.", {
1736        "type": "object",
1737        "properties": {"url": {"type": "string"}},
1738        "required": ["url"],
1739    }, _browser_handler("browser_navigate")),
1740    ToolSpec("browser_snapshot", "Refresh the current page accessibility snapshot.", {
1741        "type": "object",
1742        "properties": {"full": {"type": "boolean", "default": False}},
1743        "required": [],
1744    }, _browser_handler("browser_snapshot")),
1745    ToolSpec("browser_click", "Click an element by snapshot ref, for example @e5.", {
1746        "type": "object",
1747        "properties": {"ref": {"type": "string"}},
1748        "required": ["ref"],
1749    }, _browser_handler("browser_click")),
1750    ToolSpec("browser_type", "Fill an input by snapshot ref.", {
1751        "type": "object",
1752        "properties": {"ref": {"type": "string"}, "text": {"type": "string"}},
1753        "required": ["ref", "text"],
1754    }, _browser_handler("browser_type")),
1755    ToolSpec("browser_scroll", "Scroll the current page up or down.", {
1756        "type": "object",
1757        "properties": {"direction": {"type": "string", "enum": ["up", "down"]}},
1758        "required": ["direction"],
1759    }, _browser_handler("browser_scroll")),
1760    ToolSpec("browser_back", "Navigate back in browser history.", {"type": "object", "properties": {}, "required": []}, _browser_handler("browser_back")),
1761    ToolSpec("browser_press", "Press a keyboard key in the browser.", {
1762        "type": "object",
1763        "properties": {"key": {"type": "string"}},
1764        "required": ["key"],
1765    }, _browser_handler("browser_press")),
1766    ToolSpec("browser_console", "Read console errors or evaluate JavaScript in the current page.", {
1767        "type": "object",
1768        "properties": {"clear": {"type": "boolean", "default": False}, "expression": {"type": "string"}},
1769        "required": [],
1770    }, _browser_handler("browser_console")),
1771]
1772
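Entries like the ones in `BROWSER_SCHEMAS` pair a JSON Schema with a handler built by a small factory. The sketch below shows how such a spec could be exposed as an OpenAI-style tool definition and dispatched by name. The `ToolSpec` dataclass here is a hypothetical stand-in whose field order follows the positional arguments in the listing; `echo_handler`, `to_openai_tool`, and `dispatch` are illustrative names, not the project's registry API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Callable

Handler = Callable[[dict, Any], str]


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical stand-in: name, description, JSON Schema parameters, handler."""
    name: str
    description: str
    parameters: dict[str, Any]
    handler: Handler


def echo_handler(name: str) -> Handler:
    # The factory binds `name` when the spec is built, the same shape as
    # _browser_handler and _web_handler in the listing.
    return lambda args, ctx: json.dumps({"tool": name, "args": args})


SPECS: list[ToolSpec] = [
    ToolSpec("browser_back", "Navigate back in browser history.",
             {"type": "object", "properties": {}, "required": []},
             echo_handler("browser_back")),
    ToolSpec("browser_press", "Press a keyboard key in the browser.",
             {"type": "object", "properties": {"key": {"type": "string"}}, "required": ["key"]},
             echo_handler("browser_press")),
]


def to_openai_tool(spec: ToolSpec) -> dict[str, Any]:
    """Wrap a spec in the function-tool envelope used by OpenAI-style chat APIs."""
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.description,
            "parameters": spec.parameters,
        },
    }


def dispatch(name: str, args: dict[str, Any], ctx: Any = None) -> str:
    """Look up a spec by name and run its handler; unknown names raise KeyError."""
    by_name = {spec.name: spec for spec in SPECS}
    return by_name[name].handler(args, ctx)


if __name__ == "__main__":
    print(json.dumps([to_openai_tool(spec) for spec in SPECS], indent=2))
    print(dispatch("browser_press", {"key": "Enter"}))
```

Keeping the handler inside the spec means the schema shown to the model and the code that runs for it cannot drift apart by name, which is one plausible reason for the factory pattern in the listing.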
1773
1774SUPPORT_SCHEMAS: list[ToolSpec] = [
1775    ToolSpec("web_search", "Search the web for candidate sources.", {
1776        "type": "object",
1777        "properties": {"query": {"type": "string"}, "limit": {"type": "integer", "default": 5}},
1778        "required": ["query"],
1779    }, _web_handler("web_search")),
1780    ToolSpec("web_extract", "Extract markdown text from up to five URLs.", {
1781        "type": "object",
1782        "properties": {"urls": {"type": "array", "items": {"type": "string"}, "maxItems": 5}},
1783        "required": ["urls"],
1784    }, _web_handler("web_extract")),
1785    ToolSpec("shell_exec", "Run a local shell command for CLI work. Use small read-only probes first. For long downloads, builds, training, crawls, or benchmarks, set a meaningful timeout, prefer resumable commands, and record or defer monitoring instead of repeatedly restarting short timed-out commands. Do not run destructive or high-risk cyber commands.", {
1786        "type": "object",
1787        "properties": {
1788            "command": {"type": "string"},
1789            "cwd": {"type": "string"},
1790            "timeout_seconds": {"type": "number", "default": 60},
1791            "max_output_chars": {"type": "integer", "default": 12000},
1792        },
1793        "required": ["command"],
1794    }, _shell_exec),
1795    ToolSpec("write_file", "Create, overwrite, or append a concrete workspace/local file for deliverables, code, documents, configs, or other file outputs.", {
1796        "type": "object",
1797        "properties": {
1798            "path": {"type": "string"},
1799            "content": {"type": "string"},
1800            "mode": {"type": "string", "enum": ["overwrite", "append"], "default": "overwrite"},
1801            "create_parents": {"type": "boolean", "default": True},
1802        },
1803        "required": ["path", "content"],
1804    }, _write_file),
1805    ToolSpec("write_artifact", "Persist important findings, evidence, reports, or checkpoints to the job artifact store.", {
1806        "type": "object",
1807        "properties": {
1808            "title": {"type": "string"},
1809            "type": {"type": "string", "default": "text"},
1810            "summary": {"type": "string"},
1811            "content": {"type": "string"},
1812            "metadata": {"type": "object"},
1813        },
1814        "required": ["content"],
1815    }, _write_artifact),
1816    ToolSpec("read_artifact", "Read a saved artifact by artifact_id, visible number, exact saved path, or title.", {
1817        "type": "object",
1818        "properties": {
1819            "artifact_id": {"type": "string", "description": "Artifact id, visible number, saved path, or title."},
1820            "path": {"type": "string"},
1821            "title": {"type": "string"},
1822            "ref": {"type": "string"},
1823        },
1824        "required": [],
1825    }, _read_artifact),
1826    ToolSpec("search_artifacts", "Search stored artifacts for exact evidence from prior steps.", {
1827        "type": "object",
1828        "properties": {"query": {"type": "string"}, "limit": {"type": "integer", "default": 10}},
1829        "required": ["query"],
1830    }, _search_artifacts),
1831    ToolSpec("update_job_state", "Keep the current job runnable. Completion, failure, pausing, and cancellation are operator-only; workers should report checkpoints and continue.", {
1832        "type": "object",
1833        "properties": {
1834            "status": {"type": "string", "enum": ["queued", "running"]},
1835            "note": {"type": "string"},
1836        },
1837        "required": ["status"],
1838    }, _update_job_state),
1839    ToolSpec("defer_job", "Wait before the next worker turn for this job. Use for long external processes, monitor/check-later tasks, or scheduled follow-up without completing or pausing the job.", {
1840        "type": "object",
1841        "properties": {
1842            "seconds": {"type": "number", "description": "Delay in seconds before this job is runnable again.", "default": 300},
1843            "until": {"type": "string", "description": "Optional ISO timestamp to resume after."},
1844            "reason": {"type": "string"},
1845            "next_action": {"type": "string"},
1846        },
1847        "required": [],
1848    }, _defer_job),
1849    ToolSpec("report_update", "Leave a short operator-readable progress note. Do not use this instead of write_artifact for durable evidence.", {
1850        "type": "object",
1851        "properties": {
1852            "message": {"type": "string"},
1853            "category": {"type": "string", "enum": ["progress", "finding", "blocked", "plan"], "default": "progress"},
1854            "metadata": {"type": "object"},
1855        },
1856        "required": ["message"],
1857    }, _report_update),
1858    ToolSpec("record_lesson", "Save durable learning for this job: bad source patterns, success criteria, strategy changes, mistakes to avoid, or operator preferences.", {
1859        "type": "object",
1860        "properties": {
1861            "lesson": {"type": "string"},
1862            "category": {
1863                "type": "string",
1864                "enum": [
1865                    "source_quality",
1866                    "task_profile",
1867                    "strategy",
1868                    "mistake",
1869                    "constraint",
1870                    "operator_preference",
1871                    "memory",
1872                ],
1873                "default": "memory",
1874            },
1875            "confidence": {"type": "number"},
1876            "metadata": {"type": "object"},
1877        },
1878        "required": ["lesson"],
1879    }, _record_lesson),
1880    ToolSpec("record_memory_graph", "Create or update the job's connected memory graph: reusable episodes, facts, strategies, skills, questions, decisions, constraints, and links between them. Use this to build a durable brain for long-running work instead of relying on raw history.", {
1881        "type": "object",
1882        "properties": {
1883            "nodes": {
1884                "type": "array",
1885                "maxItems": 50,
1886                "items": {
1887                    "type": "object",
1888                    "properties": {
1889                        "key": {"type": "string"},
1890                        "title": {"type": "string"},
1891                        "kind": {
1892                            "type": "string",
1893                            "enum": [
1894                                "artifact",
1895                                "constraint",
1896                                "decision",
1897                                "episode",
1898                                "experiment",
1899                                "fact",
1900                                "milestone",
1901                                "question",
1902                                "skill",
1903                                "source",
1904                                "strategy",
1905                                "task",
1906                            ],
1907                        },
1908                        "status": {
1909                            "type": "string",
1910                            "enum": ["active", "blocked", "deprecated", "open", "resolved", "stable"],
1911                            "default": "active",
1912                        },
1913                        "summary": {"type": "string"},
1914                        "salience": {"type": "number"},
1915                        "confidence": {"type": "number"},
1916                        "tags": {"type": "array", "items": {"type": "string"}},
1917                        "parent_key": {"type": "string"},
1918                        "links": {"type": "array", "items": {"type": "string"}},
1919                        "evidence_refs": {"type": "array", "items": {"type": "string"}},
1920                        "metadata": {"type": "object"},
1921                    },
1922                    "required": ["title"],
1923                },
1924            },
1925            "edges": {
1926                "type": "array",
1927                "maxItems": 100,
1928                "items": {
1929                    "type": "object",
1930                    "properties": {
1931                        "from_key": {"type": "string"},
1932                        "to_key": {"type": "string"},
1933                        "relation": {"type": "string"},
1934                        "evidence_refs": {"type": "array", "items": {"type": "string"}},
1935                        "metadata": {"type": "object"},
1936                    },
1937                    "required": ["from_key", "to_key", "relation"],
1938                },
1939            },
1940        },
1941        "required": [],
1942    }, _record_memory_graph),
1943    ToolSpec("search_memory_graph", "Search the job's connected memory graph for reusable facts, decisions, strategies, skills, questions, constraints, and related links.", {
1944        "type": "object",
1945        "properties": {
1946            "query": {"type": "string"},
1947            "limit": {"type": "integer", "default": 10},
1948        },
1949        "required": ["query"],
1950    }, _search_memory_graph),
1951    ToolSpec("acknowledge_operator_context", "Acknowledge that durable operator steering has been incorporated or superseded. Use this after acting on a chat correction so it can leave the active context while remaining in history.", {
1952        "type": "object",
1953        "properties": {
1954            "message_ids": {"type": "array", "items": {"type": "string"}},
1955            "summary": {"type": "string"},
1956            "status": {"type": "string", "enum": ["acknowledged", "superseded"], "default": "acknowledged"},
1957        },
1958        "required": ["summary"],
1959    }, _acknowledge_operator_context),
1960    ToolSpec("record_source", "Update the source ledger with source quality, finding yield, failures, warnings, and last outcome.", {
1961        "type": "object",
1962        "properties": {
1963            "source": {"type": "string"},
1964            "source_type": {"type": "string"},
1965            "usefulness_score": {"type": "number"},
1966            "yield_count": {"type": "integer", "default": 0},
1967            "fail_count_delta": {"type": "integer", "default": 0},
1968            "warnings": {"type": "array", "items": {"type": "string"}},
1969            "outcome": {"type": "string"},
1970            "metadata": {"type": "object"},
1971        },
1972        "required": ["source"],
1973    }, _record_source),
1974    ToolSpec("record_findings", "Update the finding ledger with evidence-backed useful results. Each finding needs an evidence anchor such as source_url/url, reason, evidence_artifact, or evidence metadata.", {
1975        "type": "object",
1976        "properties": {
1977            "evidence_artifact": {"type": "string"},
1978            "findings": {
1979                "type": "array",
1980                "maxItems": 50,
1981                "items": {
1982                    "type": "object",
1983                    "properties": {
1984                        "name": {"type": "string"},
1985                        "url": {"type": "string"},
1986                        "source_url": {"type": "string"},
1987                        "category": {"type": "string"},
1988                        "location": {"type": "string"},
1989                        "contact": {"type": "string"},
1990                        "reason": {"type": "string"},
1991                        "status": {"type": "string"},
1992                        "score": {"type": "number"},
1993                        "evidence_artifact": {"type": "string"},
1994                        "metadata": {"type": "object"},
1995                    },
1996                    "required": ["name"],
1997                },
1998            },
1999        },
2000        "required": ["findings"],
2001    }, _record_findings),
2002    ToolSpec("record_tasks", "Create or update a durable queue of objective-neutral work branches. Use this to split long jobs into next actions, mark blocked branches, and keep the agent from cycling on one path. Missing task contract fields are filled with generic defaults from the task title. When the queue is saturated, near-duplicate task titles are folded into the matching existing task instead of creating another branch.", {
2003        "type": "object",
2004        "properties": {
2005            "tasks": {
2006                "type": "array",
2007                "maxItems": 50,
2008                "items": {
2009                    "type": "object",
2010                    "properties": {
2011                        "title": {"type": "string"},
2012                        "status": {"type": "string", "enum": ["open", "active", "done", "blocked", "skipped"], "default": "open"},
2013                        "priority": {"type": "integer", "default": 0},
2014                        "goal": {"type": "string"},
2015                        "source_hint": {"type": "string"},
2016                        "result": {"type": "string"},
2017                        "parent": {"type": "string"},
2018                        "output_contract": {
2019                            "type": "string",
2020                            "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report"],
2021                        },
2022                        "acceptance_criteria": {"type": "string"},
2023                        "evidence_needed": {"type": "string"},
2024                        "stall_behavior": {"type": "string"},
2025                        "metadata": {"type": "object"},
2026                    },
2027                    "required": ["title"],
2028                },
2029            },
2030        },
2031        "required": ["tasks"],
2032    }, _record_tasks),
2033    ToolSpec("record_roadmap", "Create or update a generic roadmap for broad work: milestones, features, success criteria, validation contract, scope, and current roadmap state. Use this before or during long-running work when task lists need higher-level structure.", {
2034        "type": "object",
2035        "properties": {
2036            "title": {"type": "string"},
2037            "status": {"type": "string", "enum": ["planned", "active", "validating", "done", "blocked", "paused"], "default": "planned"},
2038            "objective": {"type": "string"},
2039            "scope": {"type": "string"},
2040            "current_milestone": {"type": "string"},
2041            "validation_contract": {"type": "string"},
2042            "milestones": {
2043                "type": "array",
2044                "maxItems": 100,
2045                "items": {
2046                    "type": "object",
2047                    "properties": {
2048                        "key": {"type": "string"},
2049                        "title": {"type": "string"},
2050                        "status": {"type": "string", "enum": ["planned", "active", "validating", "done", "blocked", "skipped"], "default": "planned"},
2051                        "priority": {"type": "integer", "default": 0},
2052                        "goal": {"type": "string"},
2053                        "acceptance_criteria": {"type": "string"},
2054                        "evidence_needed": {"type": "string"},
2055                        "validation_status": {"type": "string", "enum": ["not_started", "pending", "passed", "failed", "blocked"], "default": "not_started"},
2056                        "validation_result": {"type": "string"},
2057                        "next_action": {"type": "string"},
2058                        "features": {
2059                            "type": "array",
2060                            "maxItems": 100,
2061                            "items": {
2062                                "type": "object",
2063                                "properties": {
2064                                    "key": {"type": "string"},
2065                                    "title": {"type": "string"},
2066                                    "status": {"type": "string", "enum": ["planned", "active", "done", "blocked", "skipped"], "default": "planned"},
2067                                    "goal": {"type": "string"},
2068                                    "output_contract": {"type": "string", "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report", "validation"]},
2069                                    "acceptance_criteria": {"type": "string"},
2070                                    "evidence_needed": {"type": "string"},
2071                                    "result": {"type": "string"},
2072                                    "metadata": {"type": "object"},
2073                                },
2074                                "required": ["title"],
2075                            },
2076                        },
2077                        "metadata": {"type": "object"},
2078                    },
2079                    "required": ["title"],
2080                },
2081            },
2082            "metadata": {"type": "object"},
2083        },
2084        "required": ["title"],
2085    }, _record_roadmap),
2086    ToolSpec("record_milestone_validation", "Record validation for a roadmap milestone and optionally create follow-up tasks for gaps. Use fresh evidence, acceptance criteria, and clear pass/fail/blocker reasons.", {
2087        "type": "object",
2088        "properties": {
2089            "milestone": {"type": "string"},
2090            "validation_status": {"type": "string", "enum": ["pending", "passed", "failed", "blocked"], "default": "pending"},
2091            "result": {"type": "string"},
2092            "evidence": {"type": "string"},
2093            "issues": {"type": "array", "items": {"type": "string"}},
2094            "next_action": {"type": "string"},
2095            "follow_up_tasks": {
2096                "type": "array",
2097                "maxItems": 25,
2098                "items": {
2099                    "type": "object",
2100                    "properties": {
2101                        "title": {"type": "string"},
2102                        "status": {"type": "string", "enum": ["open", "active", "done", "blocked", "skipped"], "default": "open"},
2103                        "priority": {"type": "integer", "default": 0},
2104                        "goal": {"type": "string"},
2105                        "source_hint": {"type": "string"},
2106                        "result": {"type": "string"},
2107                        "parent": {"type": "string"},
2108                        "output_contract": {"type": "string", "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report"]},
2109                        "acceptance_criteria": {"type": "string"},
2110                        "evidence_needed": {"type": "string"},
2111                        "stall_behavior": {"type": "string"},
2112                        "metadata": {"type": "object"},
2113                    },
2114                    "required": ["title"],
2115                },
2116            },
2117            "metadata": {"type": "object"},
2118        },
2119        "required": ["milestone", "validation_status"],
2120    }, _record_milestone_validation),
2121    ToolSpec("record_experiment", "Track a measurable trial, benchmark, comparison, hypothesis test, or optimization attempt. Use this after any command or source produces a concrete result so future steps compare against the best observed result instead of treating notes as progress. Closed trials must include next_action so long-running work can continue from the result.", {
2122        "type": "object",
2123        "properties": {
2124            "title": {"type": "string"},
2125            "hypothesis": {"type": "string"},
2126            "status": {"type": "string", "enum": ["planned", "running", "measured", "failed", "blocked", "skipped"], "default": "planned"},
2127            "metric_name": {"type": "string"},
2128            "metric_value": {"type": "number"},
2129            "metric_unit": {"type": "string"},
2130            "higher_is_better": {"type": "boolean", "default": True},
2131            "baseline_value": {"type": "number"},
2132            "config": {"type": "object"},
2133            "result": {"type": "string"},
2134            "evidence_artifact": {"type": "string"},
2135            "next_action": {"type": "string", "description": "Concrete next experiment, action, monitor branch, pivot, or blocked condition. Required when status is measured, failed, blocked, or skipped."},
2136            "metadata": {"type": "object"},
2137        },
2138        "required": ["title"],
2139    }, _record_experiment),
2140    ToolSpec("send_digest_email", "Send or dry-run a digest email and save the body as an artifact.", {
2141        "type": "object",
2142        "properties": {"subject": {"type": "string"}, "body": {"type": "string"}, "to_addr": {"type": "string"}},
2143        "required": ["body"],
2144    }, _send_digest_email),
2145]
2146
2147
2148APPROVED_TOOL_NAMES = tuple(spec.name for spec in [*BROWSER_SCHEMAS, *SUPPORT_SCHEMAS])
2149
2150
2151class ToolRegistry:
2152    def __init__(self, specs: list[ToolSpec] | None = None):
2153        self._specs = {spec.name: spec for spec in (specs or [*BROWSER_SCHEMAS, *SUPPORT_SCHEMAS])}
2154
2155    def names(self) -> list[str]:
2156        return sorted(self._specs)
2157
2158    def openai_tools(self, config: AppConfig | None = None) -> list[dict[str, Any]]:
2159        return [self._specs[name].as_openai_tool() for name in self.names() if _tool_enabled(name, config)]
2160
2161    def validate_arguments(self, name: str, args: dict[str, Any], config: AppConfig | None = None) -> dict[str, Any] | None:
2162        if name not in self._specs or not _tool_enabled(name, config):
2163            return None
2164        args = args if isinstance(args, dict) else {}
2165        spec = self._specs[name]
2166        missing: list[str] = []
2167        aliases = REQUIRED_ARGUMENT_ALIASES.get(name, {})
2168        for required in spec.parameters.get("required") or []:
2169            candidates = aliases.get(str(required), (str(required),))
2170            if all(_missing_argument(args.get(candidate)) for candidate in candidates):
2171                missing.append(" or ".join(candidates))
2172        for label, fields in REQUIRED_ARGUMENT_GROUPS.get(name, ()):
2173            if all(_missing_argument(args.get(field)) for field in fields):
2174                missing.append(label)
2175        nested_schema = dict(spec.parameters)
2176        nested_schema["required"] = []
2177        missing.extend(item for item in _schema_missing_arguments(nested_schema, args) if item not in missing)
2178        placeholders = [] if missing else [item for item in _schema_placeholder_arguments(nested_schema, args) if item not in missing]
2179        if not missing and not placeholders:
2180            return None
2181        concrete_fields = [*missing, *placeholders]
2182        return {
2183            "success": True,
2184            "recoverable": True,
2185            "error": "missing required tool arguments" if missing else "placeholder tool arguments",
2186            "missing_arguments": missing,
2187            "placeholder_arguments": placeholders,
2188            "blocked_tool": name,
2189            "guidance": (
2190                f"Retry {name} with concrete values for: {', '.join(concrete_fields)}. "
2191                "Do not call a tool with placeholder or empty arguments."
2192            ),
2193        }
2194
2195    def handle(self, name: str, args: dict[str, Any], ctx: ToolContext) -> str:
2196        if name not in self._specs:
2197            return _json({"success": False, "error": f"unknown tool: {name}"})
2198        if not _tool_enabled(name, ctx.config):
2199            group = _tool_access_group(name) or "tool"
2200            return _json({
2201                "success": False,
2202                "error": f"{name} is disabled by tool access config",
2203                "tool_access": group,
2204            })
2205        return self._specs[name].handler(args, ctx)
2206
2207
2208def _tool_enabled(name: str, config: AppConfig | None) -> bool:
2209    if config is None:
2210        return True
2211    group = _tool_access_group(name)
2212    if group is None:
2213        return True
2214    return bool(getattr(config.tools, group))
2215
2216
2217def _tool_access_group(name: str) -> str | None:
2218    if name.startswith("browser_"):
2219        return "browser"
2220    if name.startswith("web_"):
2221        return "web"
2222    if name == "shell_exec":
2223        return "shell"
2224    if name == "write_file":
2225        return "files"
2226    return None
2227
2228
2229DEFAULT_REGISTRY = ToolRegistry()
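
The access gating at the end of this listing can be illustrated in isolation. The following is a minimal sketch, not the project's code: `ToolAccess` and `SketchConfig` are hypothetical stand-ins for the real `AppConfig` (whose definition is not shown here), and `enabled_names` is an illustrative helper. Only the name-to-group mapping is copied from `_tool_access_group` above.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAccess:
    # Hypothetical stand-in for config.tools; one flag per access group.
    browser: bool = True
    web: bool = True
    shell: bool = True
    files: bool = True


@dataclass
class SketchConfig:
    # Hypothetical stand-in for AppConfig; only the tools attribute is modeled.
    tools: ToolAccess = field(default_factory=ToolAccess)


def tool_access_group(name):
    # Same mapping as the listing: prefix or exact name decides the group.
    if name.startswith("browser_"):
        return "browser"
    if name.startswith("web_"):
        return "web"
    if name == "shell_exec":
        return "shell"
    if name == "write_file":
        return "files"
    return None  # ungrouped tools are never gated


def tool_enabled(name, config):
    # No config, or no group, means the tool is always available.
    if config is None:
        return True
    group = tool_access_group(name)
    if group is None:
        return True
    return bool(getattr(config.tools, group))


def enabled_names(names, config):
    # Illustrative helper: the sorted subset a registry would expose.
    return sorted(name for name in names if tool_enabled(name, config))


names = ["browser_navigate", "web_search", "shell_exec", "write_file", "record_tasks"]
locked_down = SketchConfig(tools=ToolAccess(browser=False, shell=False))
print(enabled_names(names, locked_down))
```

Under these assumptions, bookkeeping tools such as `record_tasks` stay available regardless of the access flags, because they belong to no group.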
nipux_cli/tui_commands.py 296 lines
   1"""Slash command metadata and command-palette helpers for the TUI."""
   2
   3from __future__ import annotations
   4
   5from nipux_cli.tui_style import _accent, _bold, _fit_ansi, _muted
   6
   7
   8FIRST_RUN_SLASH_COMMANDS = [
   9    ("/model", "set model"),
  10    ("/base-url", "set endpoint"),
  11    ("/api-key", "save key"),
  12    ("/api-key-env", "key env var"),
  13    ("/config", "runtime config"),
  14    ("/context", "token budget"),
  15    ("/input-cost", "input $/1M"),
  16    ("/timeout", "request timeout"),
  17    ("/browser", "browser on/off"),
  18    ("/web", "web on/off"),
  19    ("/cli-access", "CLI on/off"),
  20    ("/file-access", "files on/off"),
  21    ("/home", "state directory"),
  22    ("/step-limit", "worker timeout"),
  23    ("/output-chars", "output preview size"),
  24    ("/output-cost", "output $/1M"),
  25    ("/max-cost", "job cost limit"),
  26    ("/daily-digest", "daily digest on/off"),
  27    ("/digest-time", "digest time"),
  28    ("/doctor", "check setup"),
  29    ("/init", "write config"),
  30    ("/help", "show commands"),
  31    ("/clear", "clear notices"),
  32    ("/exit", "quit"),
  33]
  34
  35CHAT_SLASH_COMMANDS = [
  36    ("/new", "create and start a worker"),
  37    ("/run", "resume focused work"),
  38    ("/jobs", "switch or inspect jobs"),
  39    ("/settings", "configure provider/tools"),
  40    ("/status", "focused job state"),
  41    ("/help", "core commands"),
  42    ("/outcomes", "durable work"),
  43    ("/artifacts", "saved files"),
  44    ("/activity", "tool calls"),
  45    ("/work", "one step"),
  46    ("/work-verbose", "verbose step"),
  47    ("/focus", "set focus"),
  48    ("/switch", "set focus"),
  49    ("/history", "timeline"),
  50    ("/events", "event feed"),
  51    ("/updates", "durable work"),
  52    ("/outputs", "raw runs"),
  53    ("/artifact", "open output"),
  54    ("/findings", "finding ledger"),
  55    ("/tasks", "task queue"),
  56    ("/roadmap", "milestones"),
  57    ("/experiments", "measurements"),
  58    ("/sources", "source ledger"),
  59    ("/memory", "learning"),
  60    ("/metrics", "counts"),
  61    ("/lessons", "lessons"),
  62    ("/usage", "tokens/cost"),
  63    ("/config", "runtime config"),
  64    ("/health", "daemon health"),
  65    ("/start", "start daemon"),
  66    ("/restart", "restart daemon"),
  67    ("/model", "set model"),
  68    ("/base-url", "set endpoint"),
  69    ("/api-key", "save key"),
  70    ("/api-key-env", "key env var"),
  71    ("/context", "token budget"),
  72    ("/input-cost", "input $/1M"),
  73    ("/timeout", "request timeout"),
  74    ("/browser", "browser on/off"),
  75    ("/web", "web on/off"),
  76    ("/cli-access", "CLI on/off"),
  77    ("/file-access", "files on/off"),
  78    ("/home", "state directory"),
  79    ("/step-limit", "worker timeout"),
  80    ("/output-chars", "output preview size"),
  81    ("/output-cost", "output $/1M"),
  82    ("/daily-digest", "daily digest on/off"),
  83    ("/digest-time", "digest time"),
  84    ("/doctor", "check setup"),
  85    ("/init", "write config"),
  86    ("/pause", "pause job"),
  87    ("/resume", "resume job"),
  88    ("/stop", "pause job"),
  89    ("/cancel", "cancel job"),
  90    ("/delete", "delete job"),
  91    ("/learn", "save lesson"),
  92    ("/note", "save note"),
  93    ("/follow", "queue follow-up"),
  94    ("/digest", "digest"),
  95    ("/clear", "clear notices"),
  96    ("/exit", "quit"),
  97]
  98
  99SETTINGS_FIELD_TYPES = {
 100    "model.name": "str",
 101    "model.base_url": "str",
 102    "model.api_key_env": "str",
 103    "model.context_length": "int",
 104    "model.request_timeout_seconds": "float",
 105    "model.input_cost_per_million": "float",
 106    "model.output_cost_per_million": "float",
 107    "runtime.home": "path",
 108    "runtime.max_step_seconds": "int",
 109    "runtime.artifact_inline_char_limit": "int",
 110    "runtime.daily_digest_enabled": "bool",
 111    "runtime.daily_digest_time": "str",
 112    "runtime.max_job_cost_usd": "float",
 113    "tools.browser": "bool",
 114    "tools.web": "bool",
 115    "tools.shell": "bool",
 116    "tools.files": "bool",
 117}
 118
 119CHAT_SETTING_COMMANDS = {
 120    "model": ("model.name", "MODEL"),
 121    "base-url": ("model.base_url", "URL"),
 122    "api-key-env": ("model.api_key_env", "ENV_NAME"),
 123    "context": ("model.context_length", "TOKENS"),
 124    "input-cost": ("model.input_cost_per_million", "DOLLARS_PER_1M_INPUT_TOKENS"),
 125    "output-cost": ("model.output_cost_per_million", "DOLLARS_PER_1M_OUTPUT_TOKENS"),
 126    "max-cost": ("runtime.max_job_cost_usd", "DOLLARS_OR_0"),
 127    "timeout": ("model.request_timeout_seconds", "SECONDS"),
 128    "browser": ("tools.browser", "true|false"),
 129    "web": ("tools.web", "true|false"),
 130    "cli-access": ("tools.shell", "true|false"),
 131    "file-access": ("tools.files", "true|false"),
 132    "home": ("runtime.home", "PATH"),
 133    "step-limit": ("runtime.max_step_seconds", "SECONDS"),
 134    "output-chars": ("runtime.artifact_inline_char_limit", "CHARS"),
 135    "daily-digest": ("runtime.daily_digest_enabled", "true|false"),
 136    "digest-time": ("runtime.daily_digest_time", "HH:MM"),
 137}
 138
 139SLASH_ARGUMENT_HINTS = {
 140    "new": "OBJECTIVE",
 141    "focus": "JOB_TITLE",
 142    "switch": "JOB_TITLE",
 143    "delete": "JOB_TITLE",
 144    "history": "LIMIT",
 145    "events": "LIMIT",
 146    "outputs": "LIMIT",
 147    "outcomes": "all",
 148    "updates": "all",
 149    "artifact": "QUERY_OR_ID",
 150    "work": "N",
 151    "work-verbose": "N",
 152    "learn": "LESSON",
 153    "note": "MESSAGE",
 154    "follow": "MESSAGE",
 155    **{command: placeholder for command, (_field, placeholder) in CHAT_SETTING_COMMANDS.items()},
 156    "api-key": "KEY",
 157    "key": "KEY",
 158}
 159
 160REQUIRED_SLASH_ARGUMENTS = {"new", "artifact", "learn", "note", "follow"}
 161
 162
 163def slash_suggestion_lines(
 164    input_buffer: str,
 165    commands: list[tuple[str, str]],
 166    *,
 167    width: int,
 168    limit: int = 5,
 169) -> list[str]:
 170    if not input_buffer.startswith("/"):
 171        return []
 172    parts = input_buffer[1:].split(maxsplit=1)
 173    token = parts[0].lower() if parts else ""
 174    if " " in input_buffer[1:]:
 175        hint = SLASH_ARGUMENT_HINTS.get(token)
 176        description = next((desc for cmd, desc in commands if cmd == f"/{token}"), "")
 177        if not hint:
 178            return []
 179        body = f"{_accent('/' + token)} {_muted(hint)}"
 180        if description:
 181            body += f"  {_muted(description)}"
 182        return [
 183            _muted("╭─ command " + "─" * max(0, width - 11)),
 184            _fit_ansi(_muted("│ ") + body, width),
 185            _fit_ansi(_muted(_slash_argument_footer(parts, hint)), width),
 186        ]
 187    command_names = [cmd for cmd, _desc in commands]
 188    selected_command = f"/{token}" if token else ""
 189    exact_selection = selected_command in command_names and input_buffer.rstrip() == selected_command
 190    if exact_selection:
 191        all_matches = commands
 192    else:
 193        all_matches = [(cmd, desc) for cmd, desc in commands if cmd[1:].startswith(token)]
 194        if not all_matches and token:
 195            all_matches = [(cmd, desc) for cmd, desc in commands if token in cmd[1:]]
 196    if exact_selection:
 197        selected_index = next((index for index, (cmd, _desc) in enumerate(all_matches) if cmd == selected_command), 0)
 198        start = max(0, min(selected_index - max(0, limit // 2), max(0, len(all_matches) - limit)))
 199        matches = all_matches[start : start + limit]
 200    else:
 201        matches = all_matches[:limit]
 202    if not matches:
 203        return [
 204            _muted("╭─ commands " + "─" * max(0, width - 12)),
 205            _fit_ansi(_muted("│ no matches"), width),
 206            _muted("╰" + "─" * max(0, width - 1)),
 207        ]
 208    cmd_width = min(14, max(len(cmd) for cmd, _ in matches) + 2)
 209    lines = [_fit_ansi(_muted("╭─ commands " + "─" * max(0, width - 12)), width)]
 210    for index, (cmd, desc) in enumerate(matches):
 211        active = cmd == selected_command if exact_selection else index == 0
 212        marker = _accent("›") if active else _muted(" ")
 213        hint = SLASH_ARGUMENT_HINTS.get(cmd[1:])
 214        command_text = cmd if not hint else f"{cmd} {hint}"
 215        command_width = cmd_width + (len(hint) + 1 if hint else 0)
 216        name = _bold(_accent(command_text)) if active else _accent(command_text)
 217        body = f"{_muted('│')} {marker} {_fit_ansi(name, command_width)} {_muted(desc)}"
 218        lines.append(_fit_ansi(body, width))
 219    hidden = max(0, len(all_matches) - len(matches))
 220    if hidden:
 221        lines.append(_fit_ansi(_muted("╰─ type to filter · enter selects first match"), width))
 222    else:
 223        lines.append(_fit_ansi(_muted("╰─ enter selects · tab fills · ↑↓ moves"), width))
 224    return lines
 225
 226
 227def autocomplete_slash(input_buffer: str, commands: list[tuple[str, str]]) -> str:
 228    if not input_buffer.startswith("/") or " " in input_buffer[1:]:
 229        return input_buffer
 230    matches = _slash_command_matches(input_buffer, commands)
 231    if not matches:
 232        return input_buffer
 233    return matches[0] + " "
 234
 235
 236def slash_completion_for_submit(input_buffer: str, commands: list[tuple[str, str]]) -> tuple[str, bool]:
 237    """Return the buffer to use and whether Enter should submit it now."""
 238
 239    if not input_buffer.startswith("/"):
 240        return input_buffer, True
 241    if " " in input_buffer[1:]:
 242        command = input_buffer[1:].split(maxsplit=1)[0].lower()
 243        if command in REQUIRED_SLASH_ARGUMENTS and not _slash_argument_text(input_buffer).strip():
 244            return input_buffer, False
 245        return input_buffer, True
 246    current = input_buffer.rstrip()
 247    if not current:
 248        return input_buffer, True
 249    command_names = {cmd for cmd, _desc in commands}
 250    token = current[1:].lower()
 251    exact = current in command_names
 252    if exact and token not in REQUIRED_SLASH_ARGUMENTS:
 253        return input_buffer, True
 254    matches = _slash_command_matches(input_buffer, commands)
 255    if not matches:
 256        return input_buffer, True
 257    selected = current if exact else matches[0]
 258    if selected[1:] not in REQUIRED_SLASH_ARGUMENTS:
 259        return selected, True
 260    suffix = " "
 261    completed = selected + suffix
 262    return completed, completed == input_buffer
 263
 264
 265def _slash_argument_text(input_buffer: str) -> str:
 266    parts = input_buffer[1:].split(maxsplit=1)
 267    return parts[1] if len(parts) > 1 else ""
 268
 269
 270def _slash_argument_footer(parts: list[str], hint: str) -> str:
 271    if len(parts) == 1:
 272        return f"╰─ type {hint}, then enter"
 273    return "╰─ enter sends"
 274
 275
 276def cycle_slash(input_buffer: str, commands: list[tuple[str, str]], *, direction: int) -> str:
 277    if not input_buffer.startswith("/") or " " in input_buffer[1:]:
 278        return input_buffer
 279    current = input_buffer.rstrip()
 280    command_names = [cmd for cmd, _desc in commands]
 281    matches = command_names if current in command_names else _slash_command_matches(input_buffer, commands)
 282    if not matches:
 283        return input_buffer
 284    try:
 285        index = matches.index(current)
 286    except ValueError:
 287        return matches[0] if direction >= 0 else matches[-1]
 288    return matches[(index + direction) % len(matches)]
 289
 290
 291def _slash_command_matches(input_buffer: str, commands: list[tuple[str, str]]) -> list[str]:
 292    token = input_buffer.strip()[1:].lower()
 293    matches = [cmd for cmd, _desc in commands if cmd[1:].startswith(token)]
 294    if not matches:
 295        matches = [cmd for cmd, _desc in commands if token in cmd[1:]]
 296    return matches
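
The two-stage lookup in `_slash_command_matches` (prefix match first, substring match only as a fallback) can be tried on a small command table. This is a sketch under stated assumptions: `SAMPLE_COMMANDS` is an illustrative subset of the tables above, and `slash_matches` and `autocomplete` are local re-statements of the listed logic rather than imports from the package.

```python
# Illustrative subset of the (command, description) tuples listed above.
SAMPLE_COMMANDS = [
    ("/roadmap", "milestones"),
    ("/run", "resume focused work"),
    ("/input-cost", "input $/1M"),
    ("/output-cost", "output $/1M"),
]


def slash_matches(input_buffer, commands):
    # Strip the leading slash and compare case-insensitively.
    token = input_buffer.strip()[1:].lower()
    # Stage 1: commands whose name starts with the typed token.
    matches = [cmd for cmd, _desc in commands if cmd[1:].startswith(token)]
    if not matches:
        # Stage 2: fall back to substring matches anywhere in the name.
        matches = [cmd for cmd, _desc in commands if token in cmd[1:]]
    return matches


def autocomplete(input_buffer, commands):
    # Mirrors autocomplete_slash: only completes a bare command token.
    if not input_buffer.startswith("/") or " " in input_buffer[1:]:
        return input_buffer
    matches = slash_matches(input_buffer, commands)
    return matches[0] + " " if matches else input_buffer


print(slash_matches("/r", SAMPLE_COMMANDS))
print(slash_matches("/cost", SAMPLE_COMMANDS))
print(repr(autocomplete("/ro", SAMPLE_COMMANDS)))
```

Because the fallback runs only when no prefix matches exist, typing `/r` never surfaces `/input-cost`, even though that name contains an `r`-free substring match for longer tokens such as `/cost`.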
nipux_cli/tui_event_format.py 262 lines
   1"""Shared event formatting helpers for Nipux terminal renderers."""
   2
   3from __future__ import annotations
   4
   5import os
   6import re
   7import shlex
   8from pathlib import Path
   9from typing import Any
  10
  11from nipux_cli.metric_format import format_metric_value
  12from nipux_cli.tui_style import _one_line
  13
  14
  15def event_tool_args(metadata: dict[str, Any]) -> dict[str, Any]:
  16    input_data = metadata.get("input") if isinstance(metadata.get("input"), dict) else {}
  17    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
  18    return args
  19
  20
  21def shell_write_target(command: str) -> str:
  22    if not command.strip():
  23        return ""
  24    redirect = re.search(r"(?:^|\s)1?>>?\s*([^\s;&|>]+)", command)
  25    if redirect:
  26        target = redirect.group(1).strip("'\"")
  27        if target and not target.startswith("&"):
  28            return target
  29    try:
  30        parts = shlex.split(command)
  31    except ValueError:
  32        parts = command.split()
  33    for index, part in enumerate(parts):
  34        if part != "tee":
  35            continue
  36        for candidate in parts[index + 1 :]:
  37            if candidate.startswith("-"):
  38                continue
  39            return candidate
  40    return ""
  41
  42
  43def event_title_body(title: str, body: str, *, fallback: str) -> str:
  44    if title and body and title not in body:
  45        return f"{title} - {body}"
  46    return title or body or fallback
  47
  48
  49def experiment_metric_text(metadata: dict[str, Any]) -> str:
  50    value = metadata.get("metric_value")
  51    if value in (None, ""):
  52        return ""
  53    name = metadata.get("metric_name") or "metric"
  54    unit = metadata.get("metric_unit") or ""
  55    direction = metadata.get("result_direction") or metadata.get("decision") or ""
  56    return " ".join(part for part in [format_metric_value(name, value, unit), str(direction)] if part)
  57
  58
  59def event_clock(event: dict[str, Any]) -> str:
  60    compact = _compact_time(str(event.get("created_at") or ""))
  61    if len(compact) >= 16 and compact[10:11] == " ":
  62        return compact[11:16]
  63    return "" if compact == "?" else _one_line(compact, 5)
  64
  65
  66def event_hour(event: dict[str, Any]) -> str:
  67    compact = _compact_time(str(event.get("created_at") or ""))
  68    if len(compact) >= 13 and compact[10:11] == " ":
  69        return f"{compact[:13]}:00"
  70    if len(compact) >= 2:
  71        return compact
  72    return "recent"
  73
  74
  75def friendly_error_text(text: str) -> str:
  76    lowered = text.lower()
  77    if "key limit exceeded" in lowered:
  78        return "Provider key limit exceeded. Update the key limit or switch models."
  79    if "authenticationerror" in lowered or "user not found" in lowered or "401" in lowered:
  80        return "Model authentication failed. Update the API key with /api-key, then try again."
  81    if "permissiondeniederror" in lowered or "403" in lowered:
  82        return "Provider permission denied. Check model access or key limits."
  83    if (
  84        "apiconnectionerror" in lowered
  85        or "connection error" in lowered
  86        or "connection refused" in lowered
  87        or "failed to establish a new connection" in lowered
  88    ):
  89        return "Model endpoint is unreachable. Check /base-url or start the configured model server, then run /doctor."
  90    if "timeout" in lowered or "timed out" in lowered:
  91        return "Model request timed out. Check the endpoint/model or adjust /timeout, then run /doctor."
  92    return _one_line(clean_step_summary(text), 220)
  93
  94
  95def brief_reflection_text(text: str) -> str:
  96    clean = clean_step_summary(text)
  97    match = re.search(r"Reflection through step #?([0-9]+):\s*(.*?)(?:\. Best |\.\s*$|$)", clean)
  98    if match:
  99        counts = match.group(2)
 100        counts = counts.replace(", 0 active operator messages", "")
 101        counts = counts.replace(", 0 recent finding artifacts", "")
 102        return _one_line(f"reflected #{match.group(1)}: {counts}", 140)
 103    return _one_line(clean, 140)
 104
 105
 106def generic_display_text(value: Any) -> str:
 107    return " ".join(str(value).split())
 108
 109
 110def clean_step_summary(summary: Any) -> str:
 111    text = " ".join(str(summary).split())
 112    if text.startswith("write_artifact saved ") and " at /" in text:
 113        return text.split(" at /", 1)[0]
 114    return text
 115
 116
 117def chat_message_paragraphs(value: Any) -> list[str]:
 118    text = str(value).replace("\r\n", "\n").replace("\r", "\n")
 119    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)
 120    text = re.sub(r"`([^`]+)`", r"\1", text)
 121    text = re.sub(r"(?<!^)\s(?=(?:[0-9]+\.|[-*])\s+)", "\n", text)
 122    paragraphs: list[str] = []
 123    for raw in text.splitlines():
 124        line = " ".join(raw.strip().split())
 125        if line:
 126            paragraphs.append(line)
 127    return paragraphs or [""]
 128
 129
 130def chat_agent_message_text(title: str, body: str) -> str:
 131    lowered = title.lower()
 132    if lowered == "chat":
 133        return body
 134    if lowered in {"plan", "planning"}:
 135        plan_body = body.split("Questions:", 1)[0]
 136        tasks = len(re.findall(r"(?:^|\s)- ", plan_body))
 137        if tasks:
 138            return f"Plan drafted with {tasks} items. Reply with changes or start work from the controls."
 139        return "Plan drafted. Reply with changes or start work from the controls."
 140    if lowered in {"progress", "update", "report"}:
 141        return _one_line(clean_step_summary(body), 220)
 142    return ""
 143
 144
 145def tool_live_summary(tool: str, metadata: dict[str, Any], body: str) -> str:
 146    args = event_tool_args(metadata)
 147    clean_body = clean_step_summary(body)
 148    if tool == "web_search":
 149        query = str(args.get("query") or _regex_group(r"query='([^']+)'", clean_body) or "")
 150        return f"search {query}" if query else "search web"
 151    if tool == "web_extract":
 152        urls = args.get("urls") if isinstance(args.get("urls"), list) else []
 153        count = len(urls)
 154        fetched = _regex_group(r"fetched ([0-9]+/[0-9]+ pages)", clean_body)
 155        return f"extract {fetched or (str(count) + ' pages' if count else 'pages')}"
 156    if tool == "shell_exec":
 157        command = str(args.get("command") or _regex_group(r"cmd='([^']+)'", clean_body) or "")
 158        prefix = f"shell {_short_command(command)}" if command else "shell command"
 159        rc = metadata.get("output", {}).get("returncode") if isinstance(metadata.get("output"), dict) else None
 160        return f"{prefix} rc={rc}" if rc is not None else prefix
 161    if tool == "browser_navigate":
 162        url = str(args.get("url") or _regex_group(r"<([^>]+)>", clean_body) or "")
 163        return f"open {_short_url(url)}" if url else "open page"
 164    if tool == "browser_snapshot":
 165        return "snapshot page"
 166    if tool == "browser_click":
 167        ref = str(args.get("ref") or "")
 168        return f"click {ref}" if ref else "click page"
 169    if tool == "browser_scroll":
 170        return f"scroll {args.get('direction') or 'page'}"
 171    if tool == "write_artifact":
 172        return "save output"
 173    if tool == "write_file":
 174        args_path = str(args.get("path") or "")
 175        output = metadata.get("output") if isinstance(metadata.get("output"), dict) else {}
 176        path = str(output.get("path") or args_path)
 177        return f"update {short_path(path, max_width=36)}" if path else "update file"
 178    if tool == "defer_job":
 179        seconds = args.get("seconds") or args.get("delay_seconds")
 180        until = args.get("until")
 181        if until:
 182            return f"wait until {until}"
 183        return f"wait {seconds}s" if seconds else "wait before next check"
 184    if tool == "record_lesson":
 185        return "learn memory"
 186    if tool == "record_memory_graph":
 187        return "map memory"
 188    if tool == "search_memory_graph":
 189        return "search memory"
 190    if tool == "record_source":
 191        return "score source"
 192    if tool == "record_findings":
 193        return "record findings"
 194    if tool == "record_tasks":
 195        return "update tasks"
 196    if tool == "record_roadmap":
 197        return "update roadmap"
 198    if tool == "record_milestone_validation":
 199        return "validate roadmap"
 200    if tool == "record_experiment":
 201        return "record experiment"
 202    if tool == "acknowledge_operator_context":
 203        return "ack operator"
 204    if tool == "report_update":
 205        return "report update"
 206    if tool == "read_artifact":
 207        return "read output"
 208    if tool == "search_artifacts":
 209        return "search outputs"
 210    return tool or clean_body or "step"
 211
 212
 213def short_path(path: Path | str, *, max_width: int = 80) -> str:
 214    text = str(path)
 215    home = str(Path.home())
 216    if text.startswith(home + os.sep):
 217        text = "~" + text[len(home) :]
 218    if len(text) <= max_width:
 219        return text
 220    keep = max(12, max_width - 4)
 221    return "..." + text[-keep:]
 222
 223
 224def _regex_group(pattern: str, text: str) -> str:
 225    match = re.search(pattern, text)
 226    return match.group(1) if match else ""
 227
 228
 229def _short_url(url: str) -> str:
 230    if not url:
 231        return ""
 232    stripped = url.replace("https://", "").replace("http://", "")
 233    return stripped.split("/", 1)[0] or stripped
 234
 235
 236def _short_command(command: str) -> str:
 237    if not command:
 238        return ""
 239    try:
 240        parts = shlex.split(command)
 241    except ValueError:
 242        parts = command.split()
 243    if not parts:
 244        return ""
 245    if parts[0] == "ssh":
 246        host = next((part for part in parts[1:] if not part.startswith("-") and "=" not in part), "")
 247        remote = " ".join(parts[parts.index(host) + 1 :]) if host in parts else ""
 248        if remote:
 249            remote_parts = remote.split()
 250            remote_head = remote_parts[0] if remote_parts else "remote"
 251            return f"ssh {host} {remote_head}"
 252        return f"ssh {host}".strip()
 253    if parts[0] in {"python", "python3", "uv", "npm", "pnpm", "yarn", "node"} and len(parts) > 1:
 254        return " ".join(parts[:3])
 255    return " ".join(parts[:2])
 256
 257
 258def _compact_time(value: str) -> str:
 259    text = value.replace("T", " ")
 260    if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
 261        return text[:16]
 262    return _one_line(text, 16)
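
The timestamp helpers above (`_compact_time`, `event_clock`, `event_hour`) reduce an ISO-style `created_at` value to a minute-level string, a clock label, and an hour bucket. The sketch below re-states that slicing so it can be run on its own. One assumption is labeled in the code: the real helpers delegate their fallback truncation to `_one_line` from `nipux_cli.tui_style`, which is not shown in this listing, so a plain whitespace collapse and slice is substituted here.

```python
def compact_time(value):
    # "2024-05-01T13:45:12" -> "2024-05-01 13:45"
    text = value.replace("T", " ")
    if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
        return text[:16]
    # Assumption: simplified stand-in for the _one_line(text, 16) fallback.
    return " ".join(text.split())[:16]


def clock(created_at):
    # HH:MM slice of a well-formed timestamp.
    compact = compact_time(created_at)
    if len(compact) >= 16 and compact[10:11] == " ":
        return compact[11:16]
    # Assumption: simplified stand-in for the _one_line(compact, 5) fallback.
    return compact[:5]


def hour_bucket(created_at):
    # Timestamps in the same hour share one bucket label.
    compact = compact_time(created_at)
    if len(compact) >= 13 and compact[10:11] == " ":
        return f"{compact[:13]}:00"
    return compact if len(compact) >= 2 else "recent"


stamp = "2024-05-01T13:45:12+00:00"  # illustrative value, not taken from real data
print(compact_time(stamp))
print(clock(stamp))
print(hour_bucket(stamp))
print(hour_bucket(""))
```

The fixed character offsets mean the helpers assume a `YYYY-MM-DD HH:MM` prefix; anything else falls through to the truncation path instead of raising.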
nipux_cli/tui_events.py 371 lines
   1"""Compact event rendering helpers for the Nipux terminal UI."""
   2
   3from __future__ import annotations
   4
   5import re
   6import textwrap
   7from typing import Any
   8
   9from nipux_cli.tui_event_format import (
  10    brief_reflection_text,
  11    chat_message_paragraphs,
  12    event_clock,
  13    event_title_body,
  14    friendly_error_text,
  15    generic_display_text,
  16    tool_live_summary,
  17)
  18from nipux_cli.tui_style import (
  19    _accent,
  20    _bold,
  21    _center_ansi,
  22    _fit_ansi,
  23    _muted,
  24    _one_line,
  25    _style,
  26)
  27
  28THINKING_NOTICE_PREFIX = "__nipux_thinking__:"
  29WAITING_NOTICE_PREFIX = "__nipux_waiting__:"
  30
  31LOW_VALUE_CHAT_NOTICES = (
  32    "sent; waiting for model",
  33    "sent, waiting for model",
  34    "waiting for model",
  35    "waiting for the next worker step",
  36)
  37
  38NIPUX_HERO = [
  39    "███╗   ██╗██╗██████╗ ██╗   ██╗██╗  ██╗",
  40    "████╗  ██║██║██╔══██╗██║   ██║╚██╗██╔╝",
  41    "██╔██╗ ██║██║██████╔╝██║   ██║ ╚███╔╝ ",
  42    "██║╚██╗██║██║██╔═══╝ ██║   ██║ ██╔██╗ ",
  43    "██║ ╚████║██║██║     ╚██████╔╝██╔╝ ██╗",
  44    "╚═╝  ╚═══╝╚═╝╚═╝      ╚═════╝ ╚═╝  ╚═╝",
  45]
  46
  47LOW_SIGNAL_FRAME_TOOLS = {
  48    "acknowledge_operator_context",
  49    "read_artifact",
  50    "record_experiment",
  51    "record_findings",
  52    "record_lesson",
  53    "record_milestone_validation",
  54    "record_roadmap",
  55    "record_source",
  56    "record_tasks",
  57    "reflect",
  58    "report_update",
  59    "search_artifacts",
  60    "update_job_state",
  61    "write_artifact",
  62}
  63
  64GENERIC_CHAT_NOTICE_PREFIXES = (
  65    "opened ",
  66    "focus set",
  67    "paused ",
  68    "resumed ",
  69    "cancelled ",
  70    "deleted ",
  71)
  72
  73
  74def chat_event_parts(event: dict[str, Any]) -> tuple[str, str, str] | None:
  75    kind = str(event.get("event_type") or "")
  76    title = str(event.get("title") or "").strip()
  77    body = str(event.get("body") or "")
  78    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
  79    clock = event_clock(event)
  80    if kind == "operator_message":
  81        return "YOU", body, clock
  82    if kind == "agent_message" and title == "chat":
  83        if metadata.get("error"):
  84            body = friendly_error_text(body)
  85        if _is_low_value_chat_notice(_normalized_chat_body(body)) or _is_waiting_notice(_normalized_chat_body(body)):
  86            return None
  87        return "AGENT", body, clock
  88    return None
  89
  90
  91def append_chat_output(lines: list[str], label: str, body: Any, *, clock: str, width: int) -> None:
  92    label_text = _chat_label(label)
  93    meta = f"{label_text} {_muted(clock)}" if clock else label_text
  94    lines.append(_fit_ansi(meta, width))
  95    prefix = "  "
  96    available = max(18, width - len(prefix))
  97    for paragraph in chat_message_paragraphs(body):
  98        wrapped = textwrap.wrap(paragraph, width=available) or [""]
  99        for part in wrapped:
 100            lines.append(_fit_ansi(prefix + part, width))
 101    if len(lines) == 1:
 102        lines.append(_fit_ansi(prefix, width))
 103
 104
 105def chat_pane_lines(events: list[dict[str, Any]], notices: list[str], *, width: int, rows: int) -> list[str]:
 106    items: list[tuple[str, str, str]] = []
 107    seen_chat_bodies: set[tuple[str, str]] = set()
 108    seen_bodies: set[str] = set()
 109    for event in events:
 110        rendered = chat_event_parts(event)
 111        if not rendered:
 112            continue
 113        label, body, clock = rendered
 114        items.append((label, body, clock))
 115        normalized = _normalized_chat_body(body)
 116        seen_chat_bodies.add((label, normalized))
 117        seen_bodies.add(normalized)
 118    for notice in notices:
 119        if notice.startswith(THINKING_NOTICE_PREFIX):
 120            body = notice.removeprefix(THINKING_NOTICE_PREFIX).strip() or "thinking"
 121            items.append(("THINKING", body, ""))
 122        elif notice.startswith(WAITING_NOTICE_PREFIX):
 123            body = notice.removeprefix(WAITING_NOTICE_PREFIX).strip() or "waiting"
 124            items.append(("WAITING", body, ""))
 125        elif notice.startswith("> "):
 126            body = notice[2:]
 127            normalized = _normalized_chat_body(body)
 128            if normalized and normalized not in seen_bodies and ("YOU", normalized) not in seen_chat_bodies:
 129                items.append(("YOU", body, ""))
 130                seen_bodies.add(normalized)
 131        else:
 132            body = notice
 133            normalized = _normalized_chat_body(body)
 134            if _is_low_value_chat_notice(normalized) or _is_waiting_notice(normalized) or _is_generic_chat_notice(normalized):
 135                continue
 136            if normalized and normalized not in seen_bodies and ("AGENT", normalized) not in seen_chat_bodies:
 137                items.append(("NIPUX", body, ""))
 138                seen_bodies.add(normalized)
 139    if not items:
 140        return chat_empty_state_lines(width=width, rows=rows)
 141    rendered_items = [_chat_item_lines(label, body, clock=clock, width=width) for label, body, clock in items[-max(4, rows) :]]
 142    output_rows = _flatten_chat_blocks(rendered_items)
 143    if len(output_rows) <= rows:
 144        return output_rows
 145    if rows <= 1:
 146        return output_rows[-rows:]
 147    newest = rendered_items[-1]
 148    if len(newest) >= rows:
 149        if rows <= 3:
 150            visible = newest[: rows - 1]
 151            hidden = len(newest) - len(visible)
 152            marker = _fit_ansi(_muted(f"... {hidden} more lines in /history."), width)
 153            return [*visible, marker]
 154        head = max(1, min(4, rows // 3))
 155        tail = max(1, rows - head - 1)
 156        hidden = max(0, len(newest) - head - tail)
 157        marker = _fit_ansi(_muted(f"... {hidden} middle lines hidden; /history shows all."), width)
 158        return [*newest[:head], marker, *newest[-tail:]]
 159    visible_blocks: list[list[str]] = [newest]
 160    used = len(newest) + 1  # reserve one row for the hidden-lines marker
 161    hidden_lines = 0
 162    for block in reversed(rendered_items[:-1]):
 163        if used + len(block) + 1 <= rows:
 164            visible_blocks.insert(0, block)
 165            used += len(block) + 1  # count the blank separator row as well
 166        else:
 167            hidden_lines += len(block)
 168    marker = _fit_ansi(_muted(f"... {hidden_lines} older chat lines hidden; /history shows all."), width)
 169    return [marker, *_flatten_chat_blocks(visible_blocks)][:rows]
 170
 171
 172def _chat_item_lines(label: str, body: Any, *, clock: str, width: int) -> list[str]:
 173    if label == "THINKING":
 174        return [_fit_ansi(f"{_style('AGENT', '36;1')}  {_accent(str(body))}", width)]
 175    if label == "WAITING":
 176        return [_fit_ansi(f"{_style('AGENT', '36;1')}  {_muted(str(body))}", width)]
 177    lines: list[str] = []
 178    append_chat_output(lines, label, body, clock=clock, width=width)
 179    return lines
 180
 181
 182def _flatten_chat_blocks(blocks: list[list[str]]) -> list[str]:
 183    rows: list[str] = []
 184    for block in blocks:
 185        if rows:
 186            rows.append("")
 187        rows.extend(block)
 188    return rows
 189
 190
 191def _chat_label(label: str) -> str:
 192    if label == "YOU":
 193        return _style("you", "35;1")
 194    if label == "AGENT":
 195        return _style("nipux", "36;1")
 196    if label == "THINKING":
 197        return _style("thinking", "36")
 198    if label == "WAITING":
 199        return _style("waiting", "36")
 200    return _muted(label.lower())
 201
 202
 203def _normalized_chat_body(value: Any) -> str:
 204    return " ".join(str(value or "").split())
 205
 206
 207def _is_low_value_chat_notice(normalized: str) -> bool:
 208    lowered = normalized.lower()
 209    return any(phrase in lowered for phrase in LOW_VALUE_CHAT_NOTICES)
 210
 211
 212def _is_waiting_notice(normalized: str) -> bool:
 213    lowered = normalized.lower()
 214    return (
 215        lowered.startswith("waiting:")
 216        or lowered.startswith("waiting for ")
 217        or "message saved for the worker" in lowered
 218    )
 219
 220
 221def _is_generic_chat_notice(normalized: str) -> bool:
 222    lowered = normalized.lower()
 223    return any(lowered.startswith(prefix) for prefix in GENERIC_CHAT_NOTICE_PREFIXES)
 224
 225
 226def chat_empty_state_lines(*, width: int, rows: int) -> list[str]:
 227    content = [
 228        _center_ansi(_bold(_accent("NIPUX")), width),
 229        "",
 230        *_centered_wrapped_hint("Type a goal in plain English to start a worker.", width=width),
 231        *_centered_wrapped_hint("Or use /new OBJECTIVE.", width=width),
 232        *_centered_wrapped_hint("/settings configures provider/tools.", width=width),
 233    ]
 234    top_pad = max(0, (rows - len(content)) // 2)
 235    return ([""] * top_pad + content)[:rows]
 236
 237
 238def _centered_wrapped_hint(text: str, *, width: int) -> list[str]:
 239    available = max(24, min(68, width - 4))
 240    return [_center_ansi(_muted(part), width) for part in textwrap.wrap(text, width=available) or [text]]
 241
 242
 243def worker_activity_lines(events: list[dict[str, Any]], *, width: int, limit: int) -> list[str]:
 244    items: list[dict[str, Any]] = []
 245    for event in events:
 246        line = minimal_live_event_line(event, chars=max(16, width - 12))
 247        if not line:
 248            continue
 249        if items and items[-1].get("key") == line:
 250            items[-1]["count"] = int(items[-1].get("count") or 1) + 1
 251            continue
 252        items.append({"line": line, "count": 1, "key": line})
 253    rendered = []
 254    for item in items[-limit:]:
 255        line = str(item["line"])
 256        count = int(item.get("count") or 1)
 257        rendered.append(f"{live_badge(line)} {_one_line(live_display_text(line, count=count), max(16, width - 9))}")
 258    return rendered
 259
 260
 261def minimal_live_event_line(event: dict[str, Any], *, chars: int = 92) -> str:
 262    kind = str(event.get("event_type") or "")
 263    title = str(event.get("title") or "").strip()
 264    body = generic_display_text(event.get("body") or "")
 265    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
 266    status = str(metadata.get("status") or "")
 267    if kind == "operator_message":
 268        return ""
 269    if kind == "agent_message" and title == "chat":
 270        return ""
 271    if kind == "operator_context":
 272        return _one_line(f"operator {title or body}", chars)
 273    if kind == "tool_call":
 274        if title in LOW_SIGNAL_FRAME_TOOLS:
 275            return ""
 276        return _one_line("start " + tool_live_summary(title, metadata, body), chars)
 277    if kind == "tool_result":
 278        if title in LOW_SIGNAL_FRAME_TOOLS and status == "completed":
 279            return ""
 280        if title == "llm" and status == "failed":
 281            return ""
 282        prefix = "blocked" if status == "blocked" else "failed" if status == "failed" else "done"
 283        detail = friendly_error_text(body or title) if status in {"blocked", "failed"} else tool_live_summary(title, metadata, body)
 284        return _one_line(f"{prefix} {detail}", chars)
 285    if kind == "error":
 286        detail = friendly_error_text(body or title or "error")
 287        return _one_line(f"error {detail}", chars)
 288    if kind == "artifact":
 289        return _one_line(f"saved {title or body or 'output'}", chars)
 290    if kind == "finding":
 291        return _one_line(f"finding {title or body}", chars)
 292    if kind == "source":
 293        return _one_line(f"source {title or body}", chars)
 294    if kind == "task":
 295        return _one_line(f"task {title or body}", chars)
 296    if kind == "roadmap":
 297        return _one_line(f"roadmap {title or body}", chars)
 298    if kind == "milestone_validation":
 299        validation = str(metadata.get("validation_status") or metadata.get("status") or "")
 300        return _one_line(f"validate {validation} {title or body}".strip(), chars)
 301    if kind == "experiment":
 302        return _one_line(f"experiment {title or body}", chars)
 303    if kind == "lesson":
 304        detail = event_title_body(title, body, fallback="lesson")
 305        return _one_line(f"learned {detail}", chars)
 306    if kind == "reflection":
 307        return _one_line(f"reflect {brief_reflection_text(body or title)}", chars)
 308    if kind == "agent_message":
 309        if title not in {"chat", "progress", "update", "report"}:
 310            return ""
 311        return _one_line(f"update {body or title}", chars)
 312    if kind in {"daemon", "loop"}:
 313        return ""
 314    return ""
 315
 316
 317def live_badge(text: str) -> str:
 318    badge_text = re.sub(r"^x[0-9]+\s+", "", text)
 319    if badge_text.startswith("error") or badge_text.startswith("failed"):
 320        return _style("FAIL", "31")
 321    if badge_text.startswith("blocked"):
 322        return _style("BLOCK", "33")
 323    if badge_text.startswith("start"):
 324        return _style("run ", "36")
 325    if badge_text.startswith("done"):
 326        return _style("done", "32")
 327    if badge_text.startswith("saved"):
 328        return _style("save", "32")
 329    if badge_text.startswith("finding"):
 330        return _style("find", "32")
 331    if badge_text.startswith("source"):
 332        return _style("src ", "36")
 333    if badge_text.startswith("experiment"):
 334        return _style("test", "33")
 335    if badge_text.startswith("task"):
 336        return _style("task", "33")
 337    if badge_text.startswith("learned"):
 338        return _style("mem ", "36")
 339    if badge_text.startswith("reflect"):
 340        return _style("plan", "35")
 341    if badge_text.startswith("update"):
 342        return _style("note", "35")
 343    return _style("info", "2")
 344
 345
 346def live_display_text(text: str, *, count: int = 1) -> str:
 347    if count > 1 and (
 348        text.startswith("error")
 349        or text.startswith("failed")
 350        or text.startswith("blocked")
 351    ):
 352        return f"x{count} {text}"
 353    base = text
 354    for prefix in (
 355        "start ",
 356        "done ",
 357        "saved ",
 358        "finding ",
 359        "source ",
 360        "experiment ",
 361        "task ",
 362        "learned ",
 363        "reflect ",
 364        "update ",
 365    ):
 366        if text.startswith(prefix):
 367            base = text[len(prefix) :]
 368            break
 369    if count > 1:
 370        return f"{base} x{count}"
 371    return base
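The repeat handling in `worker_activity_lines` and `live_display_text` above can be sketched on its own. This is a minimal standalone restatement for illustration, not the module itself: `collapse_repeats`, `display_text`, and `PREFIXES` are hypothetical names, and the sketch skips the empty-line filtering, the `limit` slice, and the badge rendering that the real functions perform.

```python
from itertools import groupby

# Prefixes that the live pane strips because the badge already conveys them.
PREFIXES = (
    "start ", "done ", "saved ", "finding ", "source ",
    "experiment ", "task ", "learned ", "reflect ", "update ",
)


def display_text(text: str, count: int = 1) -> str:
    # Repeated failures keep their prefix and put the repeat count first.
    if count > 1 and text.startswith(("error", "failed", "blocked")):
        return f"x{count} {text}"
    base = text
    for prefix in PREFIXES:
        if text.startswith(prefix):
            base = text[len(prefix):]
            break
    # Other repeated lines get the count as a suffix.
    return f"{base} x{count}" if count > 1 else base


def collapse_repeats(lines: list[str]) -> list[str]:
    # Consecutive identical lines fold into a single entry with a count.
    return [display_text(line, len(list(group))) for line, group in groupby(lines)]


if __name__ == "__main__":
    for row in collapse_repeats([
        "start web_search cats",
        "failed shell_exec timeout",
        "failed shell_exec timeout",
        "saved report.md",
    ]):
        print(row)
```

Only adjacent duplicates are merged, so the same line appearing again after a different event starts a new entry.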
nipux_cli/tui_input.py 80 lines
   1"""Terminal input helpers for full-screen Nipux frames."""
   2
   3from __future__ import annotations
   4
   5import os
   6import re
   7import select
   8import sys
   9import time
  10
  11
  12def read_terminal_char(fd: int) -> str:
  13    data = os.read(fd, 1)
  14    return data.decode("latin1", errors="ignore")
  15
  16
  17def read_escape_sequence(first: str, *, fd: int | None = None) -> str:
  18    fd = sys.stdin.fileno() if fd is None else fd
  19    sequence = first
  20    deadline = time.monotonic() + 0.12
  21    while len(sequence) < 96:
  22        timeout = max(0.0, min(0.04, deadline - time.monotonic()))
  23        if timeout <= 0:
  24            break
  25        readable, _, _ = select.select([fd], [], [], timeout)
  26        if not readable:
  27            break
  28        sequence += read_terminal_char(fd)
  29        if terminal_escape_complete(sequence):
  30            break
  31    return sequence
  32
  33
  34def terminal_escape_complete(sequence: str) -> bool:
  35    if sequence in {"\x1b[A", "\x1b[B", "\x1b[C", "\x1b[D", "\x1bOA", "\x1bOB", "\x1bOC", "\x1bOD"}:
  36        return True
  37    if re.match(r"^\x1b\[[0-9;?]*[ABCD]$", sequence):
  38        return True
  39    if re.match(r"^\x1b\[<\d+;\d+;\d+[mM]$", sequence):
  40        return True
  41    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
  42        return True
  43    return False
  44
  45
  46def decode_terminal_escape(sequence: str) -> tuple[str, tuple[int, int] | None]:
  47    arrows = {
  48        "\x1b[A": "up",
  49        "\x1b[B": "down",
  50        "\x1b[C": "right",
  51        "\x1b[D": "left",
  52        "\x1bOA": "up",
  53        "\x1bOB": "down",
  54        "\x1bOC": "right",
  55        "\x1bOD": "left",
  56    }
  57    if sequence in arrows:
  58        return arrows[sequence], None
  59    csi_arrow = re.match(r"^\x1b\[[0-9;?]*([ABCD])$", sequence)
  60    if csi_arrow:
  61        return {"A": "up", "B": "down", "C": "right", "D": "left"}[csi_arrow.group(1)], None
  62    match = re.match(r"^\x1b\[<(\d+);(\d+);(\d+)([mM])$", sequence)
  63    if match and match.group(4) == "M":
  64        button = int(match.group(1))
  65        if button == 0:
  66            return "click", (int(match.group(2)), int(match.group(3)))
  67    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
  68        button = ord(sequence[3]) - 32
  69        if button == 0:
  70            return "click", (ord(sequence[4]) - 32, ord(sequence[5]) - 32)
  71    return "unknown", None
  72
  73
  74def drain_pending_input(fd: int | None = None) -> None:
  75    fd = sys.stdin.fileno() if fd is None else fd
  76    while True:
  77        readable, _, _ = select.select([fd], [], [], 0)
  78        if not readable:
  79            return
  80        os.read(fd, 1)
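To make the return values of `decode_terminal_escape` concrete, here is a compact standalone sketch of the same decoding rules, assuming the behavior shown in the listing above. The function name `decode` and the regex constant names are illustrative only and do not exist in the module.

```python
from __future__ import annotations

import re

CSI_ARROW = re.compile(r"^\x1b\[[0-9;?]*([ABCD])$")      # e.g. ESC [ A, ESC [ 1;5C
SS3_ARROW = re.compile(r"^\x1bO([ABCD])$")               # application cursor mode
SGR_MOUSE = re.compile(r"^\x1b\[<(\d+);(\d+);(\d+)([mM])$")
DIRECTIONS = {"A": "up", "B": "down", "C": "right", "D": "left"}


def decode(sequence: str) -> tuple[str, tuple[int, int] | None]:
    arrow = CSI_ARROW.match(sequence) or SS3_ARROW.match(sequence)
    if arrow:
        return DIRECTIONS[arrow.group(1)], None
    mouse = SGR_MOUSE.match(sequence)
    # SGR mouse: only a left-button press ("M", button 0) counts as a click.
    if mouse and mouse.group(4) == "M" and int(mouse.group(1)) == 0:
        return "click", (int(mouse.group(2)), int(mouse.group(3)))
    # Legacy X10 mouse: three bytes after ESC [ M, each offset by 32.
    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
        if ord(sequence[3]) - 32 == 0:
            return "click", (ord(sequence[4]) - 32, ord(sequence[5]) - 32)
    return "unknown", None


if __name__ == "__main__":
    print(decode("\x1b[1;5C"))      # modified arrow key
    print(decode("\x1b[<0;12;7M"))  # SGR left-button press at column 12, row 7
    print(decode("\x1b[<0;12;7m"))  # the matching release is ignored
```

Button releases and non-left buttons fall through to `"unknown"`, matching the listing, so callers only react to a left-button press.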
nipux_cli/tui_layout.py 234 lines
   1"""Reusable terminal layout primitives for Nipux frames."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.tui_style import (
   8    _accent,
   9    _bold,
  10    _fit_ansi,
  11    _muted,
  12    _one_line,
  13    _strip_ansi,
  14    _style,
  15)
  16
  17
  18def _top_bar(
  19    width: int,
  20    *,
  21    state: str,
  22    daemon: str,
  23    model: str,
  24    token_usage: dict[str, Any] | None = None,
  25    context_length: int = 0,
  26    base_url: str = "",
  27) -> list[str]:
  28    del state, daemon
  29    title = _style("NIPUX", "38;5;123;1")
  30    usage_text = _token_usage_topline(token_usage or {}, context_length=context_length, model=model, base_url=base_url)
  31    model_line = f"{_muted('model')} {_style(_one_line(model, max(16, width // 3)), '36')}"
  32    if width >= 118:
  33        compact_model = f"{_muted('model')} {_style(_one_line(model, max(14, width // 5)), '36')}"
  34        return [
  35            _edge_line(title, f"{compact_model}  {usage_text}", width=width),
  36            _muted("━" * width),
  37        ]
  38    first = _edge_line(title, model_line, width=width)
  39    second = _edge_line("", usage_text, width=width)
  40    return [
  41        first,
  42        second,
  43        _muted("━" * width),
  44    ]
  45
  46
  47def _two_col_title(left_width: int, right_width: int, left: str, right: str) -> str:
  48    left_title = _style(left.upper(), "38;5;252;1")
  49    right_title = _style(right.upper(), "38;5;252;1")
  50    return _fit_ansi(left_title, left_width) + _muted(" │ ") + _fit_ansi(right_title, right_width)
  51
  52
  53def _two_col_line(left: str, right: str, *, left_width: int, right_width: int) -> str:
  54    return _fit_ansi(left, left_width) + _muted(" │ ") + _fit_ansi(right, right_width)
  55
  56
  57def _edge_line(left: str, right: str, *, width: int) -> str:
  58    right_len = len(_strip_ansi(right))
  59    left_width = max(0, width - right_len - 2)
  60    left_text = _fit_ansi(left, left_width)
  61    gap = max(1, width - len(_strip_ansi(left_text)) - right_len)
  62    return _fit_ansi(left_text + " " * gap + right, width)
  63
  64
  65def _triple_line(left: str, center: str, right: str, *, width: int) -> str:
  66    right_len = len(_strip_ansi(right))
  67    center_len = len(_strip_ansi(center))
  68    left_len = len(_strip_ansi(left))
  69    center_start = max(left_len + 2, (width - center_len) // 2)
  70    right_start = max(center_start + center_len + 1, width - right_len)
  71    if right_start >= width:
  72        return _edge_line(center, right, width=width)
  73    parts = [
  74        left,
  75        " " * max(1, center_start - left_len),
  76        center,
  77        " " * max(1, right_start - center_start - center_len),
  78        right,
  79    ]
  80    return _fit_ansi("".join(parts), width)
  81
  82
  83def _compose_bar(
  84    input_buffer: str,
  85    *,
  86    width: int,
  87    hint: str | None = None,
  88    suggestions: list[str] | None = None,
  89    prompt_label: str = "❯",
  90    title: str = "message",
  91    mask_input: bool = False,
  92) -> list[str]:
  93    if mask_input:
  94        visible_input = "•" * min(len(input_buffer), max(8, width - 8))
  95    else:
  96        visible_input = input_buffer[-max(8, width - 8) :]
  97    hint = _muted(hint or "Enter send  ·  / commands  ·  arrows navigate")
  98    label = _accent(prompt_label) if prompt_label == "❯" else _muted(prompt_label)
  99    prompt = f"{label} {visible_input}{_accent('▌')}"
 100    lines = []
 101    if suggestions:
 102        lines.extend(suggestions)
 103    title = f" {title.strip()} "
 104    lines.extend([
 105        _muted("╭─" + title + "─" * max(0, width - len(title) - 2)),
 106        _fit_ansi(_muted("│ ") + prompt, width),
 107        _fit_ansi(_muted("╰─ ") + hint, width),
 108    ])
 109    return lines
 110
 111
 112def _metric_strip(items: list[tuple[str, Any]], *, width: int) -> str:
 113    parts = [f"{_muted(label)} {_bold(value)}" for label, value in items]
 114    text = "  ".join(parts)
 115    if len(_strip_ansi(text)) <= width:
 116        return text
 117    compact = [f"{label}:{value}" for label, value in items]
 118    return _one_line("  ".join(compact), width)
 119
 120
 121def _pill(label: str, value: Any) -> str:
 122    value_text = str(value)
 123    color = "36"
 124    lowered = value_text.lower()
 125    if any(term in lowered for term in ("running", "active", "advancing", "ok")):
 126        color = "32"
 127    elif any(term in lowered for term in ("paused", "idle", "queued", "planning")):
 128        color = "33"
 129    elif any(term in lowered for term in ("failed", "cancelled", "error", "stopped")):
 130        color = "31"
 131    return f"{_muted(label)} {_style(value_text, color)}"
 132
 133
 134def _token_usage_topline(
 135    usage: dict[str, Any],
 136    *,
 137    context_length: int,
 138    model: str,
 139    base_url: str,
 140) -> str:
 141    calls = _safe_int(usage.get("calls"))
 142    if calls <= 0:
 143        return (
 144            f"{_muted('ctx')} {_style('0', '36')}  "
 145            f"{_muted('out')} {_style('0', '36')}  "
 146            f"{_muted('tok')} {_style('0', '36')}  "
 147            f"{_muted('cost')} {_style('$0.00', '36')}"
 148        )
 149    latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
 150    completion = _safe_int(usage.get("completion_tokens"))
 151    total = _safe_int(usage.get("total_tokens")) or latest_prompt + completion
 152    ctx_text = _format_compact_count(latest_prompt)
 153    if context_length > 0:
 154        ctx_text = f"{ctx_text}/{_format_compact_count(context_length)}"
 155    cost_text = _format_usage_cost(usage, model=model, base_url=base_url)
 156    return (
 157        f"{_muted('ctx')} {_style(ctx_text, '36')}  "
 158        f"{_muted('out')} {_style(_format_compact_count(completion), '36')}  "
 159        f"{_muted('tok')} {_style(_format_compact_count(total), '36')}  "
 160        f"{_muted('cost')} {_style(cost_text, '36')}"
 161    )
 162
 163
 164def _model_cost_is_zero(*, model: str, base_url: str) -> bool:
 165    lowered_model = model.lower()
 166    lowered_url = base_url.lower()
 167    return (
 168        lowered_model.endswith(":free")
 169        or lowered_model in {"local-model", "fake", "test"}
 170        or "localhost" in lowered_url
 171        or "127.0.0.1" in lowered_url
 172    )
 173
 174
 175def _format_usage_cost(usage: dict[str, Any], *, model: str, base_url: str) -> str:
 176    if bool(usage.get("has_cost")):
 177        return f"${_safe_float(usage.get('cost')):.4f}"
 178    if _model_cost_is_zero(model=model, base_url=base_url):
 179        return "$0.00"
 180    input_rate = _safe_optional_float(usage.get("input_cost_per_million"))
 181    output_rate = _safe_optional_float(usage.get("output_cost_per_million"))
 182    if input_rate is not None and output_rate is not None:
 183        prompt = _safe_int(usage.get("prompt_tokens"))
 184        completion = _safe_int(usage.get("completion_tokens"))
 185        if prompt > 0 or completion > 0:
 186            estimated = (prompt / 1_000_000 * input_rate) + (completion / 1_000_000 * output_rate)
 187            return f"~${estimated:.4f}"
 188    # No billed cost and no usable per-million rates: the cost is not known yet.
 189    # Usage that only has estimated calls is reported the same way.
 190    return "pending"
 191
 192
 193def _format_compact_count(value: Any) -> str:
 194    number = _safe_int(value)
 195    if number >= 1_000_000_000:
 196        return f"{number / 1_000_000_000:.1f}B"
 197    if number >= 1_000_000:
 198        return f"{number / 1_000_000:.1f}M"
 199    if number >= 1_000:
 200        return f"{number / 1_000:.1f}K"
 201    return str(number)
 202
 203
 204def _safe_int(value: Any) -> int:
 205    try:
 206        return int(float(value))
 207    except (TypeError, ValueError):
 208        return 0
 209
 210
 211def _safe_float(value: Any) -> float:
 212    try:
 213        return float(value)
 214    except (TypeError, ValueError):
 215        return 0.0
 216
 217
 218def _safe_optional_float(value: Any) -> float | None:
 219    if value in (None, ""):
 220        return None
 221    try:
 222        return float(value)
 223    except (TypeError, ValueError):
 224        return None
 225
 226
 227def _status_dot(state: str) -> str:
 228    if state in {"advancing", "running", "active"}:
 229        return _style("●", "32")
 230    if state in {"paused", "queued", "planning", "idle"}:
 231        return _style("●", "33")
 232    if state in {"failed", "cancelled"}:
 233        return _style("●", "31")
 234    return _style("●", "36")
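The top-bar figures produced by `_format_compact_count` and the rate-based branch of `_format_usage_cost` can be checked with a small standalone sketch. The names `compact_count` and `estimated_cost` are hypothetical, and the per-million rates in the example are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def compact_count(number: int) -> str:
    # Same thresholds as the listing: B, M, K with one decimal place.
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if number >= threshold:
            return f"{number / threshold:.1f}{suffix}"
    return str(number)


def estimated_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_rate: float,
    output_rate: float,
) -> str:
    # Rates are dollars per million tokens; "~" marks the value as an estimate.
    estimated = (
        prompt_tokens / 1_000_000 * input_rate
        + completion_tokens / 1_000_000 * output_rate
    )
    return f"~${estimated:.4f}"


if __name__ == "__main__":
    print(compact_count(128_000))
    print(compact_count(2_340_000))
    # 120K prompt tokens at $3/M plus 8K completion tokens at $15/M.
    print(estimated_cost(120_000, 8_000, 3.0, 15.0))
```

Because the count is rounded within its unit, a value just under a threshold such as 999_999 renders as `1000.0K` rather than `1.0M`.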
nipux_cli/tui_outcomes.py 508 lines
   1"""Durable outcome summaries for the Nipux terminal UI."""
   2
   3from __future__ import annotations
   4
   5import textwrap
   6from typing import Any
   7
   8from nipux_cli.tui_event_format import (
   9    brief_reflection_text,
  10    chat_agent_message_text,
  11    event_clock,
  12    event_hour,
  13    event_title_body,
  14    event_tool_args,
  15    experiment_metric_text,
  16    generic_display_text,
  17    shell_write_target,
  18    short_path,
  19    tool_live_summary,
  20)
  21from nipux_cli.tui_style import _bold, _event_badge, _fit_ansi, _muted, _one_line, _page_indicator, _strip_ansi
  22
  23
  24CHAT_RIGHT_PAGES = [("updates", "Updates"), ("status", "Jobs")]
  25
  26DURABLE_OUTCOME_LABELS = {
  27    "SAVE",
  28    "FIND",
  29    "SOURCE",
  30    "TEST",
  31    "TASK",
  32    "ROAD",
  33    "VALID",
  34    "LEARN",
  35    "FILE",
  36}
  37
  38SUMMARY_COUNT_LABELS = DURABLE_OUTCOME_LABELS | {"DONE", "FAIL"}
  39PRIMARY_OUTCOME_LABELS = DURABLE_OUTCOME_LABELS | {"FAIL"}
  40OUTCOME_LABEL_ORDER = [
  41    "SAVE",
  42    "FIND",
  43    "TEST",
  44    "FILE",
  45    "TASK",
  46    "ROAD",
  47    "VALID",
  48    "SOURCE",
  49    "LEARN",
  50    "PLAN",
  51    "UPDATE",
  52    "FAIL",
  53    "DONE",
  54]
  55
  56OUTCOME_SUMMARY_NAMES = {
  57    "SAVE": "outputs",
  58    "FIND": "findings",
  59    "SOURCE": "sources",
  60    "TEST": "measurements",
  61    "TASK": "tasks",
  62    "ROAD": "roadmap",
  63    "VALID": "validations",
  64    "LEARN": "lessons",
  65    "PLAN": "plans",
  66    "UPDATE": "updates",
  67    "FAIL": "blocks",
  68    "FILE": "files",
  69    "DONE": "research",
  70}
  71
  72SUMMARY_EVENT_TYPES = (
  73    "agent_message",
  74    "artifact",
  75    "error",
  76    "experiment",
  77    "finding",
  78    "lesson",
  79    "milestone_validation",
  80    "reflection",
  81    "roadmap",
  82    "source",
  83    "task",
  84)
  85
  86SUMMARY_TOOL_EVENT_TYPES = ("tool_result",)
  87
  88
  89def model_update_event_parts(event: dict[str, Any], *, width: int, compact: bool = True) -> tuple[str, str, str] | None:
  90    kind = str(event.get("event_type") or "")
  91    title = generic_display_text(event.get("title") or "")
  92    body = generic_display_text(event.get("body") or "")
  93    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
  94    status = str(metadata.get("status") or "")
  95    clock = event_clock(event)
  96    chars = max(24, width - 16)
  97    if kind == "error":
  98        return "FAIL", _outcome_text(event_title_body(title, body, fallback="error"), chars=chars, compact=compact), clock
  99    if kind == "artifact":
 100        detail = title or body or str(metadata.get("summary") or "") or "saved output"
 101        return "SAVE", _outcome_text(detail, chars=chars, compact=compact), clock
 102    if kind == "finding":
 103        return "FIND", _outcome_text(event_title_body(title, body, fallback="finding"), chars=chars, compact=compact), clock
 104    if kind == "source":
 105        return "SOURCE", _outcome_text(event_title_body(title, body, fallback="source"), chars=chars, compact=compact), clock
 106    if kind == "experiment":
 107        metric = experiment_metric_text(metadata)
 108        detail = event_title_body(title, body, fallback="measurement")
 109        if metric and metric not in detail:
 110            detail = f"{detail} - {metric}"
 111        return "TEST", _outcome_text(detail, chars=chars, compact=compact), clock
 112    if kind == "task":
 113        task_status = str(metadata.get("status") or "")
 114        detail = event_title_body(title, body, fallback="task")
 115        prefix = f"{task_status} " if task_status else ""
 116        return "TASK", _outcome_text(prefix + detail, chars=chars, compact=compact), clock
 117    if kind == "roadmap":
 118        return "ROAD", _outcome_text(event_title_body(title, body, fallback="roadmap"), chars=chars, compact=compact), clock
 119    if kind == "milestone_validation":
 120        validation = str(metadata.get("validation_status") or metadata.get("status") or "")
 121        detail = event_title_body(title, body, fallback="milestone")
 122        return "VALID", _outcome_text(f"{validation} {detail}".strip(), chars=chars, compact=compact), clock
 123    if kind == "lesson":
 124        return "LEARN", _outcome_text(event_title_body(title, body, fallback="lesson"), chars=chars, compact=compact), clock
 125    if kind == "reflection":
 126        return "PLAN", _outcome_text(brief_reflection_text(body or title), chars=chars, compact=compact), clock
 127    if kind == "agent_message" and title.lower() in {"error", "blocked"}:
 128        detail = body or chat_agent_message_text(title, body) or event_title_body(title, body, fallback="error")
 129        return "FAIL", _outcome_text(detail, chars=chars, compact=compact), clock
 130    if kind == "agent_message" and title.lower() in {"progress", "update", "report", "plan", "planning"}:
 131        durable_progress = _durable_progress_event_parts(metadata, body=body, chars=chars, compact=compact, clock=clock)
 132        if durable_progress:
 133            return durable_progress
 134        detail = chat_agent_message_text(title, body) or event_title_body(title, body, fallback="update")
 135        return "UPDATE", _outcome_text(detail, chars=chars, compact=compact), clock
 136    if kind == "tool_result" and status == "completed":
 137        tool = title
 138        if tool in {"web_search", "web_extract"}:
 139            return "DONE", _outcome_text(tool_live_summary(tool, metadata, body), chars=chars, compact=compact), clock
 140        if tool == "shell_exec":
 141            command = str(event_tool_args(metadata).get("command") or "")
 142            target = shell_write_target(command)
 143            if target:
 144                return "FILE", _outcome_text(f"updated {short_path(target, max_width=chars - 8)} via shell", chars=chars, compact=compact), clock
 145        if tool == "write_file":
 146            output = metadata.get("output") if isinstance(metadata.get("output"), dict) else {}
 147            path = str(output.get("path") or event_tool_args(metadata).get("path") or "")
 148            return "FILE", _outcome_text(f"updated {short_path(path, max_width=chars - 8)}", chars=chars, compact=compact), clock
 149    return None
 150
 151
 152def is_summary_event_candidate(event: dict[str, Any]) -> bool:
 153    kind = str(event.get("event_type") or "")
 154    if kind in SUMMARY_EVENT_TYPES:
 155        return True
 156    if kind != "tool_result":
 157        return False
 158    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
 159    if str(metadata.get("status") or "") != "completed":
 160        return False
 161    title = str(event.get("title") or "")
 162    if title == "write_file":
 163        return True
 164    if title == "shell_exec":
 165        command = str(event_tool_args(metadata).get("command") or "")
 166        return bool(shell_write_target(command))
 167    return False
 168
 169
 170def latest_durable_outcome_line(events: list[dict[str, Any]], *, width: int) -> str:
 171    fallback: tuple[str, str, str] | None = None
 172    for event in reversed(events):
 173        parsed = model_update_event_parts(event, width=width)
 174        if not parsed:
 175            continue
 176        label, text, _clock = parsed
 177        if label == "DONE":
 178            fallback = fallback or parsed
 179            continue
 180        if label not in PRIMARY_OUTCOME_LABELS:
 181            continue
 182        prefix = f"{_muted('Outcome')} {_event_badge(label)} "
 183        return _fit_ansi(prefix + _one_line(text, max(12, width - len(_strip_ansi(prefix)))), width)
 184    if fallback:
 185        label, text, _clock = fallback
 186        prefix = f"{_muted('Outcome')} {_event_badge(label)} "
 187        return _fit_ansi(prefix + _one_line(text, max(12, width - len(_strip_ansi(prefix)))), width)
 188    return ""
 189
 190
 191def latest_hour_outcome_summary_line(events: list[dict[str, Any]], *, width: int) -> str:
 192    """Return a single compact count summary for the newest visible activity hour."""
 193
 194    buckets: dict[str, dict[str, int]] = {}
 195    order: list[str] = []
 196    for event in events:
 197        parsed = model_update_event_parts(event, width=max(width, 180), compact=False)
 198        if not parsed:
 199            continue
 200        label, _text, _clock = parsed
 201        if label not in SUMMARY_COUNT_LABELS:
 202            continue
 203        hour = event_hour(event)
 204        if hour not in buckets:
 205            buckets[hour] = {}
 206            order.append(hour)
 207        buckets[hour][label] = int(buckets[hour].get(label) or 0) + 1
 208    if not order:
 209        return ""
 210    summary = hourly_outcome_summary(buckets[order[-1]])
 211    if not summary:
 212        return ""
 213    prefix = f"{_muted('Latest hour')} "
 214    return _fit_ansi(prefix + _bold(_one_line(summary, max(12, width - len(_strip_ansi(prefix))))), width)
 215
 216
 217def visible_outcome_summary_line(events: list[dict[str, Any]], *, width: int) -> str:
 218    """Return a stable summary of the durable outcomes available to the pane."""
 219
 220    counts = outcome_counts(events, include_research=False, include_failures=True)
 221    summary = hourly_outcome_summary(counts)
 222    if not summary:
 223        return ""
 224    prefix = f"{_muted('Visible')} "
 225    return _fit_ansi(prefix + _bold(_one_line(summary, max(12, width - len(_strip_ansi(prefix))))), width)
 226
 227
 228def job_outcome_summary(events: list[dict[str, Any]], *, width: int) -> str:
 229    """Return a short per-job durable outcome mix for compact job cards."""
 230
 231    counts = outcome_counts(events, include_research=False, include_failures=False)
 232    summary = hourly_outcome_summary(counts)
 233    if not summary:
 234        return ""
 235    return _one_line(summary, width)
 236
 237
 238def outcome_counts(
 239    events: list[dict[str, Any]],
 240    *,
 241    include_research: bool,
 242    include_failures: bool,
 243) -> dict[str, int]:
 244    counts: dict[str, int] = {}
 245    for event in events:
 246        parsed = model_update_event_parts(event, width=220, compact=False)
 247        if not parsed:
 248            continue
 249        label, _text, _clock = parsed
 250        if label == "DONE" and not include_research:
 251            continue
 252        if label == "FAIL" and not include_failures:
 253            continue
 254        if label not in SUMMARY_COUNT_LABELS:
 255            continue
 256        counts[label] = int(counts.get(label) or 0) + 1
 257    return counts
 258
 259
 260def recent_model_update_lines(
 261    events: list[dict[str, Any]],
 262    *,
 263    width: int,
 264    limit: int,
 265    include_research: bool = False,
 266    wrap: bool = True,
 267) -> list[str]:
 268    """Render recent durable worker outcomes for the compact status pane."""
 269    if limit <= 0:
 270        return []
 271    lines: list[str] = []
 272    items: list[dict[str, Any]] = []
 273    index_by_key: dict[tuple[str, str], int] = {}
 274    for event in reversed(events):
 275        parsed = model_update_event_parts(event, width=max(width, 180), compact=False)
 276        if not parsed:
 277            continue
 278        label, text, clock = parsed
 279        if label == "DONE" and not include_research:
 280            continue
 281        if label not in PRIMARY_OUTCOME_LABELS and not (include_research and label == "DONE"):
 282            continue
 283        key = (label, text)
 284        if key in index_by_key:
 285            items[index_by_key[key]]["count"] = int(items[index_by_key[key]].get("count") or 1) + 1
 286            continue
 287        index_by_key[key] = len(items)
 288        items.append({"label": label, "text": text, "clock": clock, "count": 1})
 289        if len(items) >= max(limit * 2, limit + 8):
 290            break
 291    for item in items:
 292        label = str(item["label"])
 293        text = str(item["text"])
 294        clock = str(item["clock"])
 295        count = int(item.get("count") or 1)
 296        prefix = f"{_muted(clock)} {_event_badge(label)} " if clock else f"{_event_badge(label)} "
 297        prefix_width = len(_strip_ansi(prefix))
 298        available = max(12, width - prefix_width - 2)
 299        if count > 1:
 300            text = f"{text} x{count}"
 301        if not wrap:
 302            lines.append(_fit_ansi(prefix + _one_line(text, available), width))
 303            if len(lines) >= limit:
 304                return lines
 305            continue
 306        wrapped = textwrap.wrap(text, width=available) or [""]
 307        lines.append(_fit_ansi(prefix + wrapped[0], width))
 308        if len(lines) >= limit:
 309            return lines
 310        continuation_prefix = " " * prefix_width
 311        for part in wrapped[1:]:
 312            lines.append(_fit_ansi(continuation_prefix + part, width))
 313            if len(lines) >= limit:
 314                return lines
 315        if len(lines) >= limit:
 316            return lines
 317    return lines
 318
 319
 320def chat_updates_pane_lines(
 321    *,
 322    job: dict[str, Any],
 323    events: list[dict[str, Any]],
 324    width: int,
 325    rows: int,
 326) -> list[str]:
 327    lines = [
 328        f"{_muted('Page')}   {_page_indicator('updates', CHAT_RIGHT_PAGES)}",
 329        f"{_muted('Focus')}  {_bold(_one_line(job.get('title') or 'untitled', width - 8))}",
 330    ]
 331    counts = outcome_counts(events, include_research=False, include_failures=True)
 332    summary = hourly_outcome_summary(counts)
 333    if summary:
 334        lines.extend([*_wrapped_label_line("Visible", summary, width=width), ""])
 335    update_lines = recent_model_update_lines(events, width=width, limit=max(4, rows - len(lines)), wrap=False)
 336    if update_lines:
 337        lines.extend(update_lines)
 338    else:
 339        lines.extend(["", _muted("No model updates yet.")])
 340    return [_fit_ansi(line, width) for line in lines[:rows]]
 341
 342
 343def _wrapped_label_line(label: str, text: str, *, width: int) -> list[str]:
 344    prefix = f"{_muted(label)} "
 345    prefix_width = len(_strip_ansi(prefix))
 346    available = max(12, width - prefix_width)
 347    wrapped = textwrap.wrap(text, width=available) or [""]
 348    lines = [_fit_ansi(prefix + _bold(wrapped[0]), width)]
 349    continuation = " " * prefix_width
 350    for part in wrapped[1:]:
 351        lines.append(_fit_ansi(continuation + _bold(part), width))
 352    return lines
 353
 354
 355def hourly_update_lines(events: list[dict[str, Any]], *, width: int, limit: int) -> list[str]:
 356    if limit <= 0:
 357        return []
 358    buckets: dict[str, dict[str, Any]] = {}
 359    order: list[str] = []
 360    for event in events:
 361        parsed = model_update_event_parts(event, width=max(width, 220), compact=False)
 362        if not parsed:
 363            continue
 364        label, text, clock = parsed
 365        if label not in SUMMARY_COUNT_LABELS:
 366            continue
 367        hour = event_hour(event)
 368        if hour not in buckets:
 369            buckets[hour] = {"counts": {}, "items": [], "clock": clock}
 370            order.append(hour)
 371        bucket = buckets[hour]
 372        counts = bucket["counts"]
 373        counts[label] = int(counts.get(label) or 0) + 1
 374        item = (label, text)
 375        if item not in bucket["items"]:
 376            bucket["items"].append(item)
 377    rendered: list[str] = []
 378    # Each visible hour needs a header and at least a couple durable outcomes.
 379    # Showing too many buckets makes the pane churn and can trim off the hour
 380    # label, which is harder to scan during long-running jobs.
 381    max_visible_hours = max(1, min(len(order), max(1, limit // 4)))
 382    recent_hours = order[-max_visible_hours:]
 383    available_items = max(1, limit - len(recent_hours))
 384    per_bucket = max(1, min(6, available_items // max(1, len(recent_hours))))
 385    for hour in recent_hours:
 386        bucket = buckets[hour]
 387        counts = bucket["counts"]
 388        summary = hourly_outcome_summary(counts)
 389        rendered.append(_fit_ansi(f"{_muted(hour)} {_bold(summary or 'activity')}", width))
 390        primary_items = [item for item in bucket["items"] if item[0] in PRIMARY_OUTCOME_LABELS]
 391        visible_items = primary_items or bucket["items"]
 392        for label, text in visible_items[-per_bucket:]:
 393            prefix = f"  {_event_badge(label)} "
 394            available = max(16, width - len(_strip_ansi(prefix)))
 395            parts = textwrap.wrap(text, width=available) or [""]
 396            rendered.append(_fit_ansi(prefix + parts[0], width))
 397            for part in parts[1:]:
 398                rendered.append(_fit_ansi(" " * len(_strip_ansi(prefix)) + part, width))
 399                if len(rendered) >= limit:
 400                    return rendered[:limit]
 401        if len(rendered) >= limit:
 402            return rendered[:limit]
 403    return rendered[:limit]
 404
 405
 406def hourly_outcome_summary(counts: dict[str, Any]) -> str:
 407    pieces: list[str] = []
 408    ordered = [label for label in OUTCOME_LABEL_ORDER if label in counts]
 409    ordered.extend(sorted(label for label in counts if label not in set(OUTCOME_LABEL_ORDER)))
 410    for label in ordered:
 411        count = int(counts.get(label) or 0)
 412        if count <= 0:
 413            continue
 414        name = OUTCOME_SUMMARY_NAMES.get(label, label.lower())
 415        pieces.append(f"{count} {name}")
 416    return " ".join(pieces)
 417
 418
 419def _durable_progress_event_parts(
 420    metadata: dict[str, Any],
 421    *,
 422    body: str,
 423    chars: int,
 424    compact: bool,
 425    clock: str,
 426) -> tuple[str, str, str] | None:
 427    deltas = _count_map(metadata.get("deltas"))
 428    updates = _count_map(metadata.get("updates"))
 429    resolutions = _count_map(metadata.get("resolutions"))
 430    totals = {
 431        key: int(deltas.get(key) or 0) + int(updates.get(key) or 0) + int(resolutions.get(key) or 0)
 432        for key in set(deltas) | set(updates) | set(resolutions)
 433    }
 434    if not any(value > 0 for value in totals.values()):
 435        return None
 436    key = _dominant_progress_key(totals, resolutions=resolutions)
 437    if not key:
 438        return None
 439    label = _progress_label_for_key(key, resolution=bool(resolutions.get(key)))
 440    pieces: list[str] = []
 441    for record_key in ("findings", "experiments", "sources", "tasks", "milestones", "lessons"):
 442        if deltas.get(record_key):
 443            pieces.append(_progress_count_phrase(int(deltas[record_key]), record_key, prefix="+"))
 444        if updates.get(record_key):
 445            pieces.append(_progress_count_phrase(int(updates[record_key]), record_key, prefix="~", suffix="updated"))
 446        if resolutions.get(record_key):
 447            pieces.append(_progress_count_phrase(int(resolutions[record_key]), record_key, suffix="resolved"))
 448    detail = ", ".join(pieces)
 449    if body:
 450        detail = f"{detail} - {generic_display_text(body)}"
 451    return label, _outcome_text(detail, chars=chars, compact=compact), clock
 452
 453
 454def _count_map(value: Any) -> dict[str, int]:
 455    if not isinstance(value, dict):
 456        return {}
 457    result: dict[str, int] = {}
 458    for key, raw_count in value.items():
 459        try:
 460            count = int(raw_count)
 461        except (TypeError, ValueError):
 462            continue
 463        if count > 0:
 464            result[str(key)] = count
 465    return result
 466
 467
 468def _dominant_progress_key(totals: dict[str, int], *, resolutions: dict[str, int]) -> str:
 469    for key in ("experiments", "milestones", "findings", "sources", "tasks", "lessons"):
 470        if resolutions.get(key):
 471            return key
 472    for key in ("experiments", "findings", "milestones", "sources", "tasks", "lessons"):
 473        if totals.get(key):
 474            return key
 475    return ""
 476
 477
 478def _progress_label_for_key(key: str, *, resolution: bool) -> str:
 479    if key == "findings":
 480        return "FIND"
 481    if key == "sources":
 482        return "SOURCE"
 483    if key == "experiments":
 484        return "TEST"
 485    if key == "tasks":
 486        return "TASK"
 487    if key == "milestones":
 488        return "VALID" if resolution else "ROAD"
 489    if key == "lessons":
 490        return "LEARN"
 491    return "UPDATE"
 492
 493
 494def _progress_count_phrase(value: int, key: str, *, prefix: str = "", suffix: str = "") -> str:
 495    label = OUTCOME_SUMMARY_NAMES.get(_progress_label_for_key(key, resolution=False), key)
 496    if value == 1 and label.endswith("s"):
 497        label = label[:-1]
 498    parts = [f"{prefix}{value} {label}"]
 499    if suffix:
 500        parts.append(suffix)
 501    return " ".join(parts)
 502
 503
 504def _outcome_text(text: str, *, chars: int, compact: bool) -> str:
 505    clean = generic_display_text(text)
 506    if compact:
 507        return _one_line(clean, chars)
 508    return _one_line(clean, 900)
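
Illustrative sketch (not part of the tracked source): the progress-event helpers listed above (`_count_map`, `_dominant_progress_key`, `_progress_label_for_key`) together decide which badge a durable progress event receives. The condensed version below mirrors that logic as read from the listing. The names `count_map`, `progress_label`, `RESOLUTION_PRIORITY`, `TOTAL_PRIORITY`, and `LABELS` are hypothetical stand-ins introduced for this example only, and text rendering is left out.

```python
from typing import Any, Optional


def count_map(value: Any) -> dict:
    """Keep only entries whose value parses as a positive int (mirrors _count_map)."""
    if not isinstance(value, dict):
        return {}
    result = {}
    for key, raw_count in value.items():
        try:
            count = int(raw_count)
        except (TypeError, ValueError):
            continue  # non-numeric counts are dropped silently
        if count > 0:
            result[str(key)] = count
    return result


# Priority orders copied from _dominant_progress_key in the listing above.
RESOLUTION_PRIORITY = ("experiments", "milestones", "findings", "sources", "tasks", "lessons")
TOTAL_PRIORITY = ("experiments", "findings", "milestones", "sources", "tasks", "lessons")
# Badge names copied from _progress_label_for_key (milestones handled separately).
LABELS = {"findings": "FIND", "sources": "SOURCE", "experiments": "TEST", "tasks": "TASK", "lessons": "LEARN"}


def progress_label(metadata: dict) -> Optional[str]:
    """Hypothetical helper: return only the badge label chosen for a progress event."""
    deltas = count_map(metadata.get("deltas"))
    updates = count_map(metadata.get("updates"))
    resolutions = count_map(metadata.get("resolutions"))
    totals = {
        key: deltas.get(key, 0) + updates.get(key, 0) + resolutions.get(key, 0)
        for key in set(deltas) | set(updates) | set(resolutions)
    }
    # A resolved record kind wins first; otherwise the first kind with any activity.
    key = next((k for k in RESOLUTION_PRIORITY if resolutions.get(k)), "")
    if not key:
        key = next((k for k in TOTAL_PRIORITY if totals.get(k)), "")
    if not key:
        return None  # nothing countable, or only unknown record kinds
    if key == "milestones":
        return "VALID" if resolutions.get(key) else "ROAD"
    return LABELS.get(key, "UPDATE")


if __name__ == "__main__":
    print(progress_label({"deltas": {"findings": 2, "tasks": 1}}))
    print(progress_label({"deltas": {"findings": 3}, "resolutions": {"milestones": 1}}))
```

Under these assumptions, a resolved milestone outranks newly added findings, which is why the second call yields a validation badge rather than a finding badge. Events whose counts are all zero, non-numeric, or keyed by an unknown record kind produce no durable-progress label, so the caller falls through to the plain update path.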
nipux_cli/tui_status.py 540 lines
   1"""Status and work-pane renderers for the Nipux terminal UI."""
   2
   3from __future__ import annotations
   4
   5import textwrap
   6from typing import Any
   7
   8from nipux_cli.config import AppConfig
   9from nipux_cli.operator_context import active_prompt_operator_entries
  10from nipux_cli.scheduling import job_deferred_until, job_provider_blocked
  11from nipux_cli.tui_event_format import experiment_metric_text
  12from nipux_cli.tui_events import (
  13    worker_activity_lines,
  14)
  15from nipux_cli.tui_outcomes import (
  16    CHAT_RIGHT_PAGES,
  17    job_outcome_summary,
  18    latest_durable_outcome_line,
  19    latest_hour_outcome_summary_line,
  20    model_update_event_parts,
  21    recent_model_update_lines,
  22)
  23from nipux_cli.tui_layout import _format_compact_count, _metric_strip
  24from nipux_cli.tui_style import (
  25    _accent,
  26    _bold,
  27    _event_badge,
  28    _fit_ansi,
  29    _muted,
  30    _one_line,
  31    _page_indicator,
  32    _status_badge,
  33)
  34
  35
  36def worker_label(job: dict[str, Any], daemon_running: bool) -> str:
  37    status = str(job.get("status") or "")
  38    if job_provider_blocked(job):
  39        return "provider wait"
  40    if status == "planning":
  41        return "waiting"
  42    if status in {"paused", "completed", "cancelled", "failed"}:
  43        return status
  44    if job_deferred_until(job):
  45        return "waiting"
  46    return "active" if daemon_running and status in {"running", "queued"} else "idle"
  47
  48
  49def job_display_state(job: dict[str, Any], daemon_running: bool) -> str:
  50    status = str(job.get("status") or "")
  51    if job_provider_blocked(job):
  52        return "provider wait"
  53    if status in {"running", "queued"}:
  54        if job_deferred_until(job):
  55            return "waiting"
  56        return "advancing" if daemon_running else "open"
  57    return status or "unknown"
  58
  59
  60def active_operator_messages(metadata: dict[str, Any]) -> list[dict[str, Any]]:
  61    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
  62    return [
  63        entry
  64        for entry in messages
  65        if isinstance(entry, dict)
  66        and entry in active_prompt_operator_entries(messages)
  67        and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
  68    ]
  69
  70
  71def right_pane_lines(
  72    *,
  73    job: dict[str, Any],
  74    jobs: list[dict[str, Any]],
  75    job_artifacts: dict[str, list[dict[str, Any]]],
  76    job_summary_events: dict[str, list[dict[str, Any]]],
  77    job_counts: dict[str, dict[str, Any]],
  78    job_id: str,
  79    daemon_running: bool,
  80    state: str,
  81    worker: str,
  82    daemon_text: str,
  83    model: str,
  84    goal_text: str,
  85    latest_text: str,
  86    metrics: list[tuple[str, Any]],
  87    events: list[dict[str, Any]],
  88    token_usage: dict[str, Any],
  89    context_length: int,
  90    width: int,
  91    rows: int,
  92    right_view: str = "status",
  93) -> list[str]:
  94    del model, latest_text, daemon_text
  95    if _is_workspace_placeholder(job) and not jobs:
  96        return _empty_workspace_status_lines(right_view=right_view, width=width, rows=rows)
  97    info_lines = _chat_workspace_lines(
  98        right_view=right_view,
  99        job=job,
 100        state=state,
 101        worker=worker,
 102        goal_text=goal_text,
 103        token_usage=token_usage,
 104        context_length=context_length,
 105        width=width,
 106    )
 107    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 108    active_operator = active_operator_messages(metadata)
 109    pending_measurement = (
 110        metadata.get("pending_measurement_obligation")
 111        if isinstance(metadata.get("pending_measurement_obligation"), dict)
 112        else {}
 113    )
 114    if active_operator:
 115        info_lines.append(f"{_muted('Operator')} {len(active_operator)} active")
 116        info_lines.append(f"{_muted('Context')} {_one_line(active_operator[-1].get('message') or '', width - 8)}")
 117    if pending_measurement:
 118        info_lines.append(f"{_muted('Measure')} pending step #{pending_measurement.get('source_step_no') or '?'}")
 119    if job_provider_blocked(job):
 120        info_lines.append(_fit_ansi(f"{_muted('Provider')} action needed before retrying model calls", width))
 121    defer_line = _defer_status_line(job, width=width)
 122    if defer_line:
 123        info_lines.append(defer_line)
 124    spacious = rows >= 18
 125    if spacious:
 126        info_lines.append("")
 127        info_lines.append(_bold("Now"))
 128    latest_hour = latest_hour_outcome_summary_line(events, width=width) if rows >= 18 else ""
 129    if latest_hour:
 130        info_lines.append(latest_hour)
 131    latest_outcome = latest_durable_outcome_line(events, width=width)
 132    if latest_outcome:
 133        info_lines.append(latest_outcome)
 134    if spacious and not latest_hour and not latest_outcome:
 135        info_lines.append(_muted("No durable outcome yet."))
 136    if spacious:
 137        info_lines.append("")
 138        info_lines.append(_bold("Progress"))
 139        info_lines.extend(_metrics_grid_lines(metrics, width=width))
 140        yield_line = _yield_line(metrics, width=width)
 141        if yield_line:
 142            info_lines.append(yield_line)
 143    else:
 144        info_lines.append(_metric_strip(metrics[:5], width=width))
 145    info_lines.append("")
 146    info_lines.append(_bold("Jobs"))
 147    info_lines.extend(
 148        frame_jobs_lines(
 149            jobs[:5],
 150            focused_job_id=job_id,
 151            daemon_running=daemon_running,
 152            width=width,
 153            job_artifacts=job_artifacts,
 154            job_summary_events=job_summary_events,
 155            job_counts=job_counts,
 156            show_outputs=True,
 157        )
 158    )
 159    info_lines.append("")
 160    info_lines.append(_bold("Recent outcomes"))
 161    outcome_lines = recent_model_update_lines(events, width=width, limit=max(3, rows - len(info_lines)))
 162    if outcome_lines:
 163        info_lines.extend(outcome_lines)
 164    else:
 165        current_outputs = job_artifacts.get(job_id) or []
 166        if current_outputs:
 167            for artifact in current_outputs[:4]:
 168                title = _one_line(str(artifact.get("title") or artifact.get("id") or "output"), max(10, width - 8))
 169                info_lines.append(_fit_ansi(f"{_event_badge('SAVE')} {title}", width))
 170        else:
 171            info_lines.append(_muted("No durable outcomes yet."))
 172    return info_lines[:rows]
 173
 174
 175def _is_workspace_placeholder(job: dict[str, Any]) -> bool:
 176    return str(job.get("kind") or "") == "workspace"
 177
 178
 179def _empty_workspace_status_lines(*, right_view: str, width: int, rows: int) -> list[str]:
 180    lines = [
 181        f"{_muted('Page')}   {_page_indicator(right_view, CHAT_RIGHT_PAGES)}",
 182        _bold("No workers yet"),
 183        _muted("Type a goal in chat to start one."),
 184        "",
 185        f"{_muted('Start')}  {_bold('plain English goal')} or {_bold('/new OBJECTIVE')}",
 186        f"{_muted('Setup')}  {_bold('/settings')}",
 187        f"{_muted('Check')}  {_bold('/doctor')}",
 188    ]
 189    return [_fit_ansi(line, width) for line in lines[:rows]]
 190
 191
 192def chat_work_pane_lines(
 193    *,
 194    job: dict[str, Any],
 195    events: list[dict[str, Any]],
 196    tasks: list[dict[str, Any]],
 197    experiments: list[dict[str, Any]],
 198    width: int,
 199    rows: int,
 200) -> list[str]:
 201    lines = [
 202        f"{_muted('Page')}   {_page_indicator('work', CHAT_RIGHT_PAGES)}",
 203        f"{_muted('Focus')}  {_bold(_one_line(job.get('title') or 'untitled', width - 8))}",
 204    ]
 205    done_lines = recent_model_update_lines(events, width=width, limit=max(2, rows // 4))
 206    if done_lines:
 207        lines.extend(["", _bold("Done")])
 208        lines.extend(done_lines)
 209    lines.extend([
 210        "",
 211        _bold("Tool / console"),
 212    ])
 213    tool_budget = max(3, min(max(4, rows // 3), rows - len(lines) - 5))
 214    tool_lines = worker_activity_lines(events, width=width, limit=tool_budget)
 215    if tool_lines:
 216        lines.extend(tool_lines)
 217    else:
 218        lines.append(_muted("No recent tool calls."))
 219    remaining = max(0, rows - len(lines))
 220    if remaining > 4:
 221        lines.append("")
 222        lines.append(_bold("Tasks"))
 223        for task in _rank_visible_tasks(tasks)[: max(1, remaining // 2)]:
 224            status = str(task.get("status") or "open")
 225            title = _one_line(str(task.get("title") or "task"), max(10, width - 15))
 226            lines.append(_fit_ansi(f"{_status_badge(status)} {title}", width))
 227    remaining = max(0, rows - len(lines))
 228    if remaining > 3 and experiments:
 229        lines.append("")
 230        lines.append(_bold("Measurements"))
 231        for experiment in experiments[-max(1, remaining - 2) :]:
 232            metric = experiment_metric_text(experiment)
 233            title = _one_line(str(experiment.get("title") or "experiment"), max(10, width - 16))
 234            suffix = f" {_muted(metric)}" if metric else ""
 235            lines.append(_fit_ansi(f"{_event_badge('TEST')} {title}{suffix}", width))
 236    return [_fit_ansi(line, width) for line in lines[:rows]]
 237
 238
 239def chat_settings_pane_lines(
 240    *,
 241    config: AppConfig,
 242    width: int,
 243    rows: int,
 244) -> list[str]:
 245    key_state = "set" if config.model.api_key else "missing"
 246    input_cost = _rate_text(config.model.input_cost_per_million)
 247    output_cost = _rate_text(config.model.output_cost_per_million)
 248    lines = [
 249        f"{_muted('Page')}   {_page_indicator('settings', CHAT_RIGHT_PAGES)}",
 250        _bold("Model"),
 251        _setting_line("id", config.model.model, command="/model MODEL", width=width),
 252        _setting_line("endpoint", config.model.base_url, command="/base-url URL", width=width),
 253        _setting_line("key", f"{key_state} via {config.model.api_key_env}", command="/api-key KEY", width=width),
 254        _setting_line("context", str(config.model.context_length), command="/context TOKENS", width=width),
 255        "",
 256        _bold("Runtime"),
 257        _setting_line("home", str(config.runtime.home), command="/home PATH", width=width),
 258        _setting_line("step", f"{config.runtime.max_step_seconds}s", command="/step-limit SECONDS", width=width),
 259        _setting_line("preview", f"{config.runtime.artifact_inline_char_limit} chars", command="/output-chars CHARS", width=width),
 260        "",
 261        _bold("Cost"),
 262        _setting_line("input", input_cost, command="/input-cost DOLLARS", width=width),
 263        _setting_line("output", output_cost, command="/output-cost DOLLARS", width=width),
 264        "",
 265        _bold("Digest"),
 266        _setting_line(
 267            "daily",
 268            f"{config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
 269            command="/daily-digest true|false",
 270            width=width,
 271        ),
 272        _muted("Type a command in the composer to edit."),
 273    ]
 274    return [_fit_ansi(line, width) for line in lines[:rows]]
 275
 276
 277def frame_jobs_lines(
 278    jobs: list[dict[str, Any]],
 279    *,
 280    focused_job_id: str,
 281    daemon_running: bool,
 282    width: int,
 283    job_artifacts: dict[str, list[dict[str, Any]]] | None = None,
 284    job_summary_events: dict[str, list[dict[str, Any]]] | None = None,
 285    job_counts: dict[str, dict[str, Any]] | None = None,
 286    show_outputs: bool = False,
 287) -> list[str]:
 288    rendered = []
 289    for index, item in enumerate(jobs[:5], start=1):
 290        item_id = str(item.get("id") or "")
 291        marker = _accent("●") if item_id == focused_job_id else _muted("○")
 292        title_width = max(14, min(30, width - 34))
 293        title = _one_line(str(item.get("title") or item.get("id") or "job"), title_width)
 294        state = _status_badge(job_display_state(item, daemon_running))
 295        worker = _status_badge(worker_label(item, daemon_running))
 296        kind = _one_line(item.get("kind") or "", max(0, width - title_width - 33))
 297        rendered.append(
 298            _fit_ansi(
 299                f"{marker} {index:<2} {_fit_ansi(title, title_width)} "
 300                f"{_fit_ansi(state, 10)} {_fit_ansi(worker, 10)} {kind}",
 301                width,
 302            )
 303        )
 304        if show_outputs:
 305            rendered.extend(_job_compact_work_lines(
 306                outputs=(job_artifacts or {}).get(item_id) or [],
 307                counts=(job_counts or {}).get(item_id) or {},
 308                events=(job_summary_events or {}).get(item_id) or [],
 309                width=width,
 310                focused=item_id == focused_job_id,
 311            ))
 312    return rendered
 313
 314
 315def _job_compact_work_lines(
 316    *,
 317    outputs: list[dict[str, Any]],
 318    counts: dict[str, Any],
 319    events: list[dict[str, Any]],
 320    width: int,
 321    focused: bool = False,
 322) -> list[str]:
 323    lines: list[str] = []
 324    summary = job_outcome_summary(events, width=max(12, width - 13))
 325    if summary:
 326        lines.append(_fit_ansi(f"   {_muted('work')} {_bold(summary)}", width))
 327    if outputs:
 328        latest = outputs[0]
 329        output_total = int(counts.get("artifacts") or len(outputs))
 330        output_count = f"{output_total} outputs" if output_total != 1 else "1 output"
 331        title_budget = max(12, width - 13 - len(output_count))
 332        output_title = _one_line(str(latest.get("title") or latest.get("id") or "saved output"), title_budget)
 333        lines.append(_fit_ansi(f"   {_muted('made')} {_bold(output_count)} · {output_title}", width))
 334        if focused and len(outputs) > 1 and width >= 42:
 335            second = outputs[1]
 336            second_title = _one_line(str(second.get("title") or second.get("id") or "saved output"), max(12, width - 10))
 337            lines.append(_fit_ansi(f"   {_muted('also')} {second_title}", width))
 338    for outcome in _job_recent_non_output_pieces(
 339        events,
 340        width=max(12, width - 10),
 341        skip_save=bool(outputs),
 342        limit=2 if focused else 1,
 343    ):
 344        lines.append(_fit_ansi(f"   {_muted('did')}  {outcome}", width))
 345    return lines
 346
 347
 348def _job_recent_non_output_pieces(
 349    events: list[dict[str, Any]],
 350    *,
 351    width: int,
 352    skip_save: bool,
 353    limit: int,
 354) -> list[str]:
 355    pieces: list[str] = []
 356    seen: set[str] = set()
 357    for event in reversed(events):
 358        parsed = model_update_event_parts(event, width=max(width, 120))
 359        if not parsed:
 360            continue
 361        label, text, _clock = parsed
 362        if label == "DONE":
 363            continue
 364        if skip_save and label == "SAVE":
 365            continue
 366        prefix = _compact_outcome_label(label)
 367        piece = f"{_muted(prefix)} {_one_line(text, max(12, width - len(prefix) - 1))}"
 368        dedupe_key = _one_line(f"{prefix} {text}", 120)
 369        if dedupe_key in seen:
 370            continue
 371        seen.add(dedupe_key)
 372        pieces.append(piece)
 373        if len(pieces) >= max(1, limit):
 374            break
 375    return pieces
 376
 377
 378def _compact_outcome_label(label: str) -> str:
 379    return {
 380        "FIND": "find",
 381        "SOURCE": "src",
 382        "TEST": "test",
 383        "TASK": "task",
 384        "ROAD": "road",
 385        "VALID": "valid",
 386        "LEARN": "learn",
 387        "FILE": "file",
 388        "SAVE": "out",
 389        "FAIL": "fail",
 390        "PLAN": "plan",
 391        "UPDATE": "note",
 392    }.get(label, label.lower())
 393
 394
 395def _defer_status_line(job: dict[str, Any], *, width: int) -> str:
 396    until = job_deferred_until(job)
 397    if not until:
 398        return ""
 399    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 400    reason = str(metadata.get("defer_reason") or metadata.get("defer_next_action") or "").strip()
 401    time_text = until.astimezone().strftime("%b %d %H:%M")
 402    detail = f"next check {time_text}"
 403    if reason:
 404        detail += f" - {reason}"
 405    return _fit_ansi(f"{_muted('Wait')}   {_one_line(detail, max(12, width - 7))}", width)
 406
 407
 408def _rank_visible_tasks(tasks: list[dict[str, Any]]) -> list[dict[str, Any]]:
 409    status_order = {"active": 0, "open": 1, "blocked": 2, "validating": 3, "done": 4, "skipped": 5}
 410    return sorted(
 411        [task for task in tasks if isinstance(task, dict)],
 412        key=lambda task: (
 413            status_order.get(str(task.get("status") or "open"), 9),
 414            -int(task.get("priority") or 0),
 415            str(task.get("title") or ""),
 416        ),
 417    )
 418
 419
 420def _chat_workspace_lines(
 421    *,
 422    right_view: str,
 423    job: dict[str, Any],
 424    state: str,
 425    worker: str,
 426    goal_text: str,
 427    token_usage: dict[str, Any],
 428    context_length: int,
 429    width: int,
 430) -> list[str]:
 431    goal_lines = textwrap.wrap(goal_text, width=max(20, width - 8))[:2] or [""]
 432    while len(goal_lines) < 2:
 433        goal_lines.append("")
 434    title = _one_line(str(job.get("title") or "untitled"), max(10, width))
 435    lines = [
 436        f"{_muted('Page')}   {_page_indicator(right_view, CHAT_RIGHT_PAGES)}",
 437        _bold(title),
 438        f"{_muted('State')}  {_status_badge(state)}  {_muted('worker')} {_status_badge(worker)}",
 439        f"{_muted('Goal')}   {goal_lines[0]}",
 440        f"{_muted('       ')}{goal_lines[1]}",
 441    ]
 442    task_line = _current_task_line(job, width=width)
 443    if task_line:
 444        lines.append(task_line)
 445    context_line = _context_pressure_line(token_usage, context_length=context_length, width=width)
 446    if context_line:
 447        lines.append(context_line)
 448    return lines
 449
 450
 451def _current_task_line(job: dict[str, Any], *, width: int) -> str:
 452    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 453    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
 454    visible = [
 455        task
 456        for task in tasks
 457        if isinstance(task, dict)
 458        and str(task.get("status") or "open") in {"active", "open", "blocked"}
 459    ]
 460    if not visible:
 461        return ""
 462    ranked = _rank_visible_tasks(visible)
 463    task = ranked[0]
 464    status = str(task.get("status") or "open")
 465    title = _one_line(str(task.get("title") or "task"), max(12, width - 16))
 466    return _fit_ansi(f"{_muted('Task')}   {_status_badge(status)} {title}", width)
 467
 468
 469def _context_pressure_line(usage: dict[str, Any], *, context_length: int, width: int) -> str:
 470    latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
 471    context_limit = _safe_int(usage.get("latest_context_length")) or context_length
 472    if latest_prompt <= 0 or context_limit <= 0:
 473        return ""
 474    fraction = latest_prompt / max(1, context_limit)
 475    if fraction < 0.65:
 476        return ""
 477    label = "high" if fraction >= 0.85 else "watch" if fraction >= 0.65 else "ok"
 478    detail = (
 479        f"{_format_compact_count(latest_prompt)}/{_format_compact_count(context_limit)} "
 480        f"{fraction:.0%} {label}"
 481    )
 482    return _fit_ansi(f"{_muted('Context')} {_one_line(detail, max(12, width - 8))}", width)
 483
 484
 485def _metrics_grid_lines(metrics: list[tuple[str, Any]], *, width: int) -> list[str]:
 486    wanted = ["actions", "outputs", "findings", "sources", "tasks", "experiments", "memory"]
 487    lookup = {label: value for label, value in metrics}
 488    items = [(label, lookup[label]) for label in wanted if label in lookup]
 489    if width < 40:
 490        return [_metric_strip(items, width=width)]
 491    lines: list[str] = []
 492    col_width = max(16, (width - 2) // 2)
 493    for index in range(0, len(items), 2):
 494        left = _metric_cell(items[index], width=col_width)
 495        right = _metric_cell(items[index + 1], width=col_width) if index + 1 < len(items) else ""
 496        lines.append(_fit_ansi(left + "  " + right, width))
 497    return lines
 498
 499
 500def _metric_cell(item: tuple[str, Any], *, width: int) -> str:
 501    label, value = item
 502    return _fit_ansi(f"{_muted(label)} {_bold(value)}", width)
 503
 504
 505def _yield_line(metrics: list[tuple[str, Any]], *, width: int) -> str:
 506    lookup = {label: value for label, value in metrics}
 507    actions = _safe_int(lookup.get("actions"))
 508    if actions < 20:
 509        return ""
 510    outputs = _safe_int(lookup.get("outputs"))
 511    findings = _safe_int(lookup.get("findings"))
 512    sources = _safe_int(lookup.get("sources"))
 513    experiments = _safe_int(lookup.get("experiments"))
 514    durable = outputs + findings + sources + experiments
 515    if durable <= 0:
 516        return _fit_ansi(f"{_muted('Yield')}  {_status_badge('blocked')} no durable outcomes after {actions} actions", width)
 517    actions_per = actions / durable
 518    label = "watch" if actions_per >= 25 else "ok"
 519    if actions_per < 8:
 520        return ""
 521    detail = f"{actions_per:.1f} actions/outcome"
 522    return _fit_ansi(f"{_muted('Yield')}  {_status_badge(label)} {detail}", width)
 523
 524
 525def _setting_line(label: str, value: str, *, command: str, width: int) -> str:
 526    left = f"{_muted(label)} {_bold(_one_line(value, max(8, width - 24)))}"
 527    if width < 46:
 528        return _fit_ansi(left, width)
 529    return _fit_ansi(left + "  " + _muted(command), width)
 530
 531
 532def _rate_text(value: float | None) -> str:
 533    return "provider-reported" if value is None else f"${value:g}/1M"
 534
 535
 536def _safe_int(value: Any) -> int:
 537    try:
 538        return int(float(value))
 539    except (TypeError, ValueError):
 540        return 0
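The two health indicators in the listing above, `_yield_line` and `_context_pressure_line`, each reduce to a small threshold rule. The sketch below restates those rules as pure functions so the cut-offs can be checked without a terminal. The names `yield_label` and `context_label` are hypothetical and do not exist in the source. The thresholds are copied from the listing, and returning `None` stands in for the empty string the originals return when the line is hidden.

```python
from __future__ import annotations


def yield_label(actions: int, durable: int) -> str | None:
    """Classify actions-per-durable-outcome the way _yield_line appears to."""
    if actions < 20:
        return None  # too early to judge; the line is hidden
    if durable <= 0:
        return "blocked"  # many actions, nothing durable saved
    actions_per = actions / durable
    if actions_per < 8:
        return None  # healthy ratio; the line is hidden
    return "watch" if actions_per >= 25 else "ok"


def context_label(prompt_tokens: int, context_limit: int) -> str | None:
    """Classify prompt size against the context window."""
    if prompt_tokens <= 0 or context_limit <= 0:
        return None
    fraction = prompt_tokens / context_limit
    if fraction < 0.65:
        return None  # below 65% the line is hidden
    # Values below 0.65 returned early, so only two labels can appear here.
    return "high" if fraction >= 0.85 else "watch"


if __name__ == "__main__":
    print(yield_label(100, 2), context_label(90, 100))
```

Under these rules a job shows a Yield line only once it has at least 20 actions and either no durable outcomes or 8 or more actions per outcome.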
nipux_cli/tui_style.py 153 lines
   1"""Small terminal styling helpers shared by the CLI frame renderers."""
   2
   3from __future__ import annotations
   4
   5import os
   6import re
   7import sys
   8from typing import Any
   9
  10
  11def _fancy_ui() -> bool:
  12    return (
  13        sys.stdout.isatty()
  14        and os.environ.get("NO_COLOR") is None
  15        and os.environ.get("NIPUX_PLAIN") is None
  16        and os.environ.get("TERM", "") not in {"", "dumb"}
  17    )
  18
  19
  20def _style(text: Any, code: str) -> str:
  21    value = str(text)
  22    if not _fancy_ui():
  23        return value
  24    return f"\033[{code}m{value}\033[0m"
  25
  26
  27def _accent(text: Any) -> str:
  28    return _style(text, "38;5;123")
  29
  30
  31def _muted(text: Any) -> str:
  32    return _style(text, "38;5;248")
  33
  34
  35def _bold(text: Any) -> str:
  36    return _style(text, "1")
  37
  38
  39def _one_line(value: Any, width: int) -> str:
  40    text = " ".join(str(value).split())
  41    if len(text) <= width:
  42        return text
  43    return text[: max(0, width - 3)] + "..."
  44
  45
  46def _strip_ansi(text: str) -> str:
  47    return re.sub(r"\x1b\[[0-9;]*m", "", text)
  48
  49
  50def _fit_ansi(text: Any, width: int) -> str:
  51    width = max(0, int(width))
  52    content = str(text)
  53    visible = _strip_ansi(content)
  54    if len(visible) > width:
  55        content = _one_line(visible, width)
  56        visible = content
  57    return content + " " * max(0, width - len(visible))
  58
  59
  60def _center_ansi(text: str, width: int) -> str:
  61    text_width = len(_strip_ansi(text))
  62    if text_width >= width:
  63        return _fit_ansi(text, width)
  64    left_pad = max(0, (width - text_width) // 2)
  65    return _fit_ansi(" " * left_pad + text, width)
  66
  67
  68def _themed_lines(lines: list[str], *, width: int) -> list[str]:
  69    if not _fancy_ui():
  70        return [_fit_ansi(line, width) for line in lines]
  71    bg = "\033[48;5;234m\033[38;5;252m"
  72    reset = "\033[0m"
  73    return [bg + _fit_ansi(line, width).replace(reset, reset + bg) + reset for line in lines]
  74
  75
  76def _frame_enter_sequence() -> str:
  77    theme = "\033[48;5;234m\033[38;5;252m" if _fancy_ui() else ""
  78    return f"\033[?1049h{theme}\033[2J\033[H\033[?25l\033[?1000h\033[?1002h\033[?1006h"
  79
  80
  81def _frame_exit_sequence() -> str:
  82    return "\033[?1006l\033[?1002l\033[?1000l\033[?25h\033[0m\033[?1049l"
  83
  84
  85def _page_indicator(active: str, pages: list[tuple[str, str]]) -> str:
  86    parts: list[str] = []
  87    for key, label in pages:
  88        if key == active:
  89            parts.append(f"{_accent('●')} {_bold(label)}")
  90        else:
  91            parts.append(f"{_muted('○')} {_muted(label)}")
  92    return "  ".join(parts)
  93
  94
  95def _status_badge(value: Any) -> str:
  96    text = str(value)
  97    color = {
  98        "active": "32",
  99        "advancing": "32",
 100        "running": "32",
 101        "queued": "33",
 102        "planning": "35",
 103        "waiting": "35",
 104        "open": "33",
 105        "idle": "33",
 106        "paused": "33",
 107        "cancelled": "31",
 108        "failed": "31",
 109        "completed": "36",
 110        "ok": "32",
 111        "watch": "33",
 112        "ready": "32",
 113        "switch": "36",
 114        "missing": "31",
 115        "check": "33",
 116        "next": "35",
 117        "recommended": "36",
 118    }.get(text, "37")
 119    return _style(text, color)
 120
 121
 122def _event_badge(label: str) -> str:
 123    padded = f"{label:<8}"
 124    color = {
 125        "AGENT": "36",
 126        "USER": "35",
 127        "FOLLOW": "35",
 128        "YOU": "35",
 129        "NIPUX": "36",
 130        "RUN": "34",
 131        "TOOL": "34",
 132        "DONE": "32",
 133        "FILE": "32",
 134        "SAVE": "32",
 135        "OUTPUT": "32",
 136        "FIND": "32",
 137        "SOURCE": "36",
 138        "TASK": "33",
 139        "ROAD": "35",
 140        "VALID": "35",
 141        "TEST": "33",
 142        "UPDATE": "36",
 143        "ACK": "36",
 144        "FAIL": "31",
 145        "LEARN": "36",
 146        "PLAN": "36",
 147        "DIGEST": "36",
 148        "MEMORY": "36",
 149        "SYSTEM": "2",
 150        "BLOCK": "33",
 151        "ERROR": "31",
 152    }.get(label, "37")
 153    return _style(padded, color)
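The padding helpers in `tui_style.py` measure visible width by stripping SGR colour sequences first. The sketch below is a standalone restatement of `_strip_ansi`, `_one_line`, and `_fit_ansi` under public names, which are hypothetical and chosen only for illustration. It is meant to show two behaviours that follow from the listing: styling survives when the text fits, and it is dropped when the text must be truncated.

```python
from __future__ import annotations

import re
from typing import Any

# Matches SGR sequences such as "\x1b[1m" or "\x1b[38;5;248m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    return ANSI_SGR.sub("", text)


def one_line(value: Any, width: int) -> str:
    # Collapse all whitespace runs, then truncate with an ellipsis.
    text = " ".join(str(value).split())
    if len(text) <= width:
        return text
    return text[: max(0, width - 3)] + "..."


def fit_ansi(text: Any, width: int) -> str:
    width = max(0, int(width))
    content = str(text)
    visible = strip_ansi(content)
    if len(visible) > width:
        # Truncation works on the stripped text, so colour codes are lost.
        content = one_line(visible, width)
        visible = content
    return content + " " * max(0, width - len(visible))


padded = fit_ansi("\033[1mok\033[0m", 6)
truncated = fit_ansi("\033[1m" + "x" * 20 + "\033[0m", 10)
narrow = fit_ansi("abcdef", 2)
```

One edge case follows from the same logic: for widths below 3 the ellipsis alone is longer than the requested width, so `narrow` is three characters wide rather than two. Callers in the listing mostly clamp their widths with `max(12, ...)` or similar, which avoids that case.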
nipux_cli/uninstall.py 217 lines
   1"""Uninstall helpers for local Nipux runtime state."""
   2
   3from __future__ import annotations
   4
   5import os
   6import shutil
   7import subprocess
   8from dataclasses import dataclass
   9from pathlib import Path
  10from typing import Callable, Sequence
  11
  12from nipux_cli.config import get_agent_home
  13from nipux_cli.service_install import launch_agent_path, systemd_service_path
  14
  15
  16Runner = Callable[..., subprocess.CompletedProcess[str]]
  17CommandRunner = Callable[[Sequence[str]], subprocess.CompletedProcess[str]]
  18
  19
  20@dataclass(frozen=True)
  21class UninstallPlan:
  22    paths: tuple[Path, ...]
  23    service_paths: tuple[Path, ...]
  24
  25
  26def build_uninstall_plan(*, runtime_home: Path | None = None, include_legacy: bool = True) -> UninstallPlan:
  27    """Return all local runtime paths that a full uninstall should remove."""
  28
  29    homes = [runtime_home.expanduser() if runtime_home else get_agent_home(), get_agent_home(), Path.home() / ".nipux"]
  30    if include_legacy:
  31        homes.append(Path.home() / ".kneepucks")
  32    paths = tuple(_dedupe_paths(homes))
  33    service_paths = tuple(_dedupe_paths([launch_agent_path(), systemd_service_path()]))
  34    return UninstallPlan(paths=paths, service_paths=service_paths)
  35
  36
  37def uninstall_runtime(
  38    *,
  39    runtime_home: Path | None = None,
  40    dry_run: bool = False,
  41    include_legacy: bool = True,
  42    runner: Runner = subprocess.run,
  43) -> list[str]:
  44    """Remove local Nipux state, logs, service files, and legacy state dirs."""
  45
  46    plan = build_uninstall_plan(runtime_home=runtime_home, include_legacy=include_legacy)
  47    lines: list[str] = []
  48    lines.extend(_disable_services(dry_run=dry_run, runner=runner))
  49    for path in (*plan.service_paths, *plan.paths):
  50        target = path.expanduser()
  51        _assert_safe_delete_target(target)
  52        if dry_run:
  53            lines.append(f"would remove {target}")
  54            continue
  55        if target.is_dir() and not target.is_symlink():
  56            shutil.rmtree(target)
  57            lines.append(f"removed {target}")
  58        elif target.exists() or target.is_symlink():
  59            target.unlink()
  60            lines.append(f"removed {target}")
  61        else:
  62            lines.append(f"not found {target}")
  63    return lines
  64
  65
  66def uninstall_installed_tool(
  67    *,
  68    dry_run: bool = False,
  69    runner: CommandRunner | None = None,
  70) -> tuple[int, list[str]]:
  71    """Remove the installed `nipux` command from common uv-tool locations."""
  72
  73    uv = shutil.which("uv")
  74    run = runner or _run_command
  75    lines: list[str] = []
  76    if dry_run:
  77        lines.append("would run uv tool uninstall nipux")
  78        for path in installed_tool_paths():
  79            lines.append(f"would remove installed command path {path}")
  80        return 0, lines
  81    if uv:
  82        result = run([uv, "tool", "uninstall", "nipux"])
  83        lines.extend(_process_lines(result))
  84        if result.returncode == 0:
  85            if not lines:
  86                lines.append("removed installed nipux command")
  87            return 0, lines
  88        lines.append("uv tool uninstall failed; checking safe local tool paths")
  89    else:
  90        lines.append("uv not found; checking safe local tool paths")
  91
  92    removed = False
  93    errors = 0
  94    for path in installed_tool_paths():
  95        try:
  96            if path.is_dir() and not path.is_symlink():
  97                shutil.rmtree(path)
  98                lines.append(f"removed {path}")
  99                removed = True
 100            elif path.exists() or path.is_symlink():
 101                path.unlink()
 102                lines.append(f"removed {path}")
 103                removed = True
 104        except OSError as exc:
 105            lines.append(f"failed to remove {path}: {exc}")
 106            errors += 1
 107    if removed and not errors:
 108        return 0, lines
 109    if not removed:
 110        lines.append("installed nipux command not found")
 111    return (1 if errors else 0), lines
 112
 113
 114def installed_tool_paths() -> tuple[Path, ...]:
 115    """Return safe user-level paths for uv-tool Nipux installs."""
 116
 117    home = Path.home().expanduser().resolve(strict=False)
 118    candidates = [
 119        home / ".local" / "bin" / "nipux",
 120        home / ".local" / "share" / "uv" / "tools" / "nipux",
 121    ]
 122    current = shutil.which("nipux")
 123    if current:
 124        candidates.append(Path(current))
 125    safe: list[Path] = []
 126    for path in _dedupe_paths(candidates):
 127        expanded = path.expanduser()
 128        if _is_safe_installed_tool_path(expanded, home=home):
 129            safe.append(expanded)
 130    return tuple(safe)
 131
 132
 133def _disable_services(*, dry_run: bool, runner: Runner) -> list[str]:
 134    lines: list[str] = []
 135    launch_path = launch_agent_path()
 136    label = "gui/" + str(os.getuid()) + "/com.nipux.agent"
 137    launchctl = shutil.which("launchctl")
 138    if dry_run:
 139        lines.append(f"would unload launchd {label}")
 140    elif launchctl:
 141        runner([launchctl, "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
 142        lines.append(f"unloaded launchd {label}")
 143    else:
 144        lines.append("launchd unavailable")
 145
 146    systemctl = shutil.which("systemctl")
 147    service_path = systemd_service_path()
 148    if systemctl and service_path.exists():
 149        if dry_run:
 150            lines.append("would disable systemd user service nipux.service")
 151        else:
 152            runner(
 153                [systemctl, "--user", "disable", "--now", "nipux.service"],
 154                check=False,
 155                stdout=subprocess.DEVNULL,
 156                stderr=subprocess.DEVNULL,
 157            )
 158            runner([systemctl, "--user", "daemon-reload"], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
 159            lines.append("disabled systemd user service nipux.service")
 160    elif service_path.exists():
 161        lines.append("systemd unavailable; removing service file only")
 162
 163    if not launch_path.exists() and not service_path.exists():
 164        lines.append("no installed service files found")
 165    return lines
 166
 167
 168def _run_command(command: Sequence[str]) -> subprocess.CompletedProcess[str]:
 169    return subprocess.run(
 170        list(command),
 171        text=True,
 172        stdout=subprocess.PIPE,
 173        stderr=subprocess.STDOUT,
 174        check=False,
 175    )
 176
 177
 178def _process_lines(process: subprocess.CompletedProcess[str]) -> list[str]:
 179    output = process.stdout if isinstance(process.stdout, str) else ""
 180    stderr = process.stderr if isinstance(process.stderr, str) else ""
 181    return [line.rstrip() for line in f"{output}\n{stderr}".splitlines() if line.strip()]
 182
 183
 184def _dedupe_paths(paths: list[Path]) -> list[Path]:
 185    seen: set[str] = set()
 186    result: list[Path] = []
 187    for path in paths:
 188        key = str(path.expanduser())
 189        if key in seen:
 190            continue
 191        seen.add(key)
 192        result.append(path)
 193    return result
 194
 195
 196def _assert_safe_delete_target(path: Path) -> None:
 197    resolved = path.expanduser().resolve(strict=False)
 198    home = Path.home().resolve(strict=False)
 199    forbidden = {Path("/").resolve(strict=False), home}
 200    if resolved in forbidden:
 201        raise ValueError(f"refusing to remove unsafe path: {path}")
 202    if len(resolved.parts) < 3:
 203        raise ValueError(f"refusing to remove broad path: {path}")
 204
 205
 206def _is_safe_installed_tool_path(path: Path, *, home: Path) -> bool:
 207    expanded = path.expanduser()
 208    resolved = expanded.resolve(strict=False)
 209    user_bin = home / ".local" / "bin" / "nipux"
 210    uv_tool_root = home / ".local" / "share" / "uv" / "tools" / "nipux"
 211    return (
 212        expanded == user_bin
 213        or resolved == user_bin
 214        or expanded == uv_tool_root
 215        or resolved == uv_tool_root
 216        or uv_tool_root in resolved.parents
 217    )
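The guard `_assert_safe_delete_target` is what keeps `uninstall_runtime` from removing the filesystem root, the home directory, or any top-level directory. The sketch below is a minimal restatement for POSIX-style paths. It takes the home directory as a parameter so it can be exercised without touching the real one, and the wrapper `is_safe_delete_target` is a hypothetical helper added only for the demonstration.

```python
from __future__ import annotations

from pathlib import Path


def assert_safe_delete_target(path: Path, *, home: Path) -> None:
    """Raise ValueError for the root, the home directory, or top-level paths."""
    resolved = path.expanduser().resolve(strict=False)
    forbidden = {Path("/").resolve(strict=False), home.resolve(strict=False)}
    if resolved in forbidden:
        raise ValueError(f"refusing to remove unsafe path: {path}")
    # ("/", "name") has two parts, so any top-level directory is rejected.
    if len(resolved.parts) < 3:
        raise ValueError(f"refusing to remove broad path: {path}")


def is_safe_delete_target(path: Path, *, home: Path) -> bool:
    # Hypothetical convenience wrapper: turns the exception into a boolean.
    try:
        assert_safe_delete_target(path, home=home)
    except ValueError:
        return False
    return True


demo_home = Path("/home/demo")  # assumed example home, not read from the system
print(is_safe_delete_target(demo_home / ".nipux", home=demo_home))
```

Because the check runs on the resolved path, a symlink that points at the home directory is rejected as well. The guard constrains what may be deleted; it does not confirm that the target belongs to Nipux.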
nipux_cli/updater.py 179 lines
   1"""Self-update helpers for source checkouts and installed tools."""
   2
   3from __future__ import annotations
   4
   5import os
   6import shutil
   7import subprocess
   8from collections.abc import Callable, Sequence
   9from pathlib import Path
  10
  11
  12GitRunner = Callable[[Sequence[str], Path], subprocess.CompletedProcess[str]]
  13CommandRunner = Callable[[Sequence[str]], subprocess.CompletedProcess[str]]
  14
  15DEFAULT_UPDATE_REPO = "https://github.com/nipuxx/agent-cli.git"
  16DEFAULT_UPDATE_REF = "main"
  17
  18
  19def find_checkout_root(start: str | Path | None = None) -> Path | None:
  20    """Return the nearest enclosing git checkout for the Nipux install."""
  21
  22    current = Path(start).expanduser().resolve() if start else Path(__file__).resolve()
  23    if current.is_file():
  24        current = current.parent
  25    for candidate in (current, *current.parents):
  26        if (candidate / ".git").exists():
  27            return candidate
  28    return None
  29
  30
  31def update_checkout(
  32    *,
  33    path: str | Path | None = None,
  34    allow_dirty: bool = False,
  35    runner: GitRunner | None = None,
  36    command_runner: CommandRunner | None = None,
  37) -> tuple[int, list[str]]:
  38    """Update the current Nipux install and return output lines.
  39
  40    Source checkouts are fast-forwarded with git. Installed tools are refreshed
  41    from the configured source repository so `nipux update` works from anywhere.
  42    """
  43
  44    root = Path(path).expanduser().resolve() if path else find_checkout_root()
  45    if not root or not (root / ".git").exists():
  46        prefix = []
  47        if path is not None:
  48            prefix.append(f"{_short_path(root)} is not a source checkout; updating the installed Nipux tool instead.")
  49        code, lines = _update_uv_tool_install(runner=command_runner)
  50        return code, [*prefix, *lines]
  51    run = runner or _run_git
  52    top_level = run(["git", "rev-parse", "--show-toplevel"], root)
  53    if top_level.returncode != 0:
  54        return top_level.returncode, ["Cannot update: git could not identify the checkout.", *_process_lines(top_level)]
  55    checkout = Path(top_level.stdout.strip() or root).expanduser().resolve()
  56    before = _git_text(run(["git", "rev-parse", "--short", "HEAD"], checkout), fallback="unknown")
  57    branch = _git_text(run(["git", "branch", "--show-current"], checkout), fallback="detached")
  58    dirty = run(["git", "status", "--porcelain"], checkout)
  59    if dirty.returncode != 0:
  60        return dirty.returncode, ["Cannot update: git status failed.", *_process_lines(dirty)]
  61    if dirty.stdout.strip() and not allow_dirty:
  62        return (
  63            1,
  64            [
  65                f"Cannot update: local changes exist in {_short_path(checkout)}.",
  66                "Commit or stash them first, then run `nipux update` again.",
  67            ],
  68        )
  69    lines = [f"Updating Nipux in {_short_path(checkout)}", f"Current: {branch} @ {before}"]
  70    pulled = run(["git", "pull", "--ff-only"], checkout)
  71    lines.extend(_process_lines(pulled))
  72    if pulled.returncode != 0:
  73        return pulled.returncode, ["Update failed.", *lines]
  74    after = _git_text(run(["git", "rev-parse", "--short", "HEAD"], checkout), fallback=before)
  75    if after == before:
  76        lines.append("Nipux is already up to date.")
  77    else:
  78        lines.append(f"Updated Nipux: {before} -> {after}.")
  79    lines.append("Update complete.")
  80    return 0, lines
  81
  82
  83def _update_uv_tool_install(*, runner: CommandRunner | None = None) -> tuple[int, list[str]]:
  84    uv = shutil.which("uv")
  85    if not uv:
  86        return (
  87            1,
  88            [
  89                "Cannot update automatically because `uv` was not found.",
  90                "Install uv, then run `nipux update` again.",
  91            ],
  92        )
  93    run = runner or _run_command
  94    spec = _uv_tool_update_spec()
  95    lines = [
  96        "Updating installed Nipux command.",
  97        f"Source: {spec}",
  98    ]
  99    current = shutil.which("nipux")
 100    if current:
 101        lines.append(f"Command: {current}")
 102    updated = run([uv, "tool", "install", "--force", "--upgrade", "--reinstall", "--refresh", spec])
 103    lines.extend(_process_lines(updated))
 104    if updated.returncode != 0:
 105        return updated.returncode, ["Update failed.", *lines]
 106    lines.append("Nipux command refreshed from source.")
 107    verified = _verify_updated_command(runner=run)
 108    if verified:
 109        lines.append(verified)
 110    lines.append("Update complete.")
 111    return 0, lines
 112
 113
 114def _verify_updated_command(*, runner: CommandRunner) -> str:
 115    nipux = shutil.which("nipux")
 116    if not nipux:
 117        return ""
 118    checked = runner([nipux, "--version"])
 119    version_line = " ".join(_process_lines(checked)).strip()
 120    if checked.returncode != 0 or not version_line:
 121        return ""
 122    return f"Verified: {version_line}"
 123
 124
 125def _uv_tool_update_spec() -> str:
 126    """Return the direct source uv should use for installed-tool updates."""
 127
 128    explicit = os.environ.get("NIPUX_UPDATE_SPEC", "").strip()
 129    if explicit:
 130        return explicit
 131    repo = os.environ.get("NIPUX_REPO_URL", DEFAULT_UPDATE_REPO).strip() or DEFAULT_UPDATE_REPO
 132    ref = os.environ.get("NIPUX_REF", DEFAULT_UPDATE_REF).strip() or DEFAULT_UPDATE_REF
 133    if repo.startswith(("git+", "http://", "https://", "ssh://", "file://")):
 134        prefix = repo if repo.startswith("git+") else f"git+{repo}"
 135        return f"{prefix}@{ref}"
 136    return f"git+{repo}@{ref}"
 137
 138
 139def _run_git(command: Sequence[str], cwd: Path) -> subprocess.CompletedProcess[str]:
 140    return subprocess.run(
 141        list(command),
 142        cwd=cwd,
 143        text=True,
 144        stdout=subprocess.PIPE,
 145        stderr=subprocess.STDOUT,
 146        check=False,
 147    )
 148
 149
 150def _run_command(command: Sequence[str]) -> subprocess.CompletedProcess[str]:
 151    return subprocess.run(
 152        list(command),
 153        text=True,
 154        stdout=subprocess.PIPE,
 155        stderr=subprocess.STDOUT,
 156        check=False,
 157    )
 158
 159
 160def _process_lines(process: subprocess.CompletedProcess[str]) -> list[str]:
 161    output = process.stdout if isinstance(process.stdout, str) else ""
 162    return [line.rstrip() for line in output.splitlines() if line.strip()]
 163
 164
 165def _git_text(process: subprocess.CompletedProcess[str], *, fallback: str) -> str:
 166    if process.returncode != 0:
 167        return fallback
 168    value = process.stdout.strip() if isinstance(process.stdout, str) else ""
 169    return value or fallback
 170
 171
 172def _short_path(path: Path | str, *, max_width: int = 96) -> str:
 173    text = str(path)
 174    home = str(Path.home())
 175    if text.startswith(home + os.sep):
 176        text = "~" + text[len(home) :]
 177    if len(text) <= max_width:
 178        return text
 179    return "..." + text[-max(12, max_width - 4) :]
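How `_uv_tool_update_spec` resolves its three environment variables can be sketched as a pure function. The version below takes the environment as a dictionary instead of reading `os.environ`, and folds the two branches of the listing into a single prefix check. As far as the listing shows, both branches produce `git+<repo>@<ref>` unless the repository already starts with `git+`. The function name and the `env` parameter are hypothetical.

```python
from __future__ import annotations

DEFAULT_UPDATE_REPO = "https://github.com/nipuxx/agent-cli.git"
DEFAULT_UPDATE_REF = "main"


def uv_tool_update_spec(env: dict[str, str]) -> str:
    """Build the source spec that would be passed to `uv tool install`."""
    explicit = env.get("NIPUX_UPDATE_SPEC", "").strip()
    if explicit:
        return explicit  # a full override wins over repo and ref
    repo = env.get("NIPUX_REPO_URL", DEFAULT_UPDATE_REPO).strip() or DEFAULT_UPDATE_REPO
    ref = env.get("NIPUX_REF", DEFAULT_UPDATE_REF).strip() or DEFAULT_UPDATE_REF
    # Add the "git+" scheme prefix only when it is not already present.
    prefix = repo if repo.startswith("git+") else f"git+{repo}"
    return f"{prefix}@{ref}"


if __name__ == "__main__":
    print(uv_tool_update_spec({}))
    print(uv_tool_update_spec({"NIPUX_REF": "v1"}))
```

Blank or whitespace-only values fall back to the defaults, so exporting an empty `NIPUX_REF` behaves the same as leaving it unset.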
nipux_cli/updates.py 132 lines
   1"""Readable durable progress reports for jobs."""
   2
   3from __future__ import annotations
   4
   5import shlex
   6from typing import Any
   7
   8from nipux_cli.config import AppConfig
   9from nipux_cli.daemon import daemon_lock_status
  10from nipux_cli.db import AgentDB
  11from nipux_cli.tui_outcomes import hourly_update_lines, recent_model_update_lines
  12from nipux_cli.tui_status import job_display_state
  13from nipux_cli.tui_style import _one_line
  14
  15
  16def render_updates_report(
  17    db: AgentDB,
  18    config: AppConfig,
  19    job_id: str,
  20    *,
  21    limit: int = 5,
  22    chars: int = 180,
  23    paths: bool = False,
  24) -> list[str]:
  25    job = db.get_job(job_id)
  26    artifacts = db.list_artifacts(job_id, limit=limit)
  27    events = db.list_timeline_events(job_id, limit=max(250, limit * 80))
  28    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  29    operator_messages = _metadata_list(metadata, "operator_messages")
  30    agent_updates = _metadata_list(metadata, "agent_updates")
  31    lessons = _metadata_list(metadata, "lessons")
  32    daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
  33    lines = [
  34        f"updates {job['title']} | state {job_display_state(job, bool(daemon['running']))}",
  35        "=" * 80,
  36    ]
  37    if operator_messages:
  38        latest = operator_messages[-1]
  39        lines.append(f"last steering: {_one_line(latest.get('message') or '', chars)}")
  40    lines.append("outcomes by hour:")
  41    outcome_lines = hourly_update_lines(events, width=max(72, chars), limit=max(8, limit * 4))
  42    if outcome_lines:
  43        lines.extend(f"  {line}" for line in outcome_lines)
  44    else:
  45        lines.append("  none yet")
  46    if agent_updates:
  47        lines.extend(["", "latest agent notes:"])
  48        for update in agent_updates[-min(limit, 5) :]:
  49            category = update.get("category") or "progress"
  50            lines.append(f"  {category}: {_one_line(update.get('message') or '', chars)}")
  51    if lessons:
  52        lines.extend(["", "latest lessons:"])
  53        for lesson in lessons[-min(limit, 5) :]:
  54            category = lesson.get("category") or "memory"
  55            lines.append(f"  {category}: {_one_line(lesson.get('lesson') or '', chars)}")
  56    lines.extend(["", "latest saved outputs:"])
  57    if not artifacts:
  58        lines.append("  none yet")
  59    for artifact in artifacts:
  60        title = artifact.get("title") or artifact["id"]
  61        summary = f" - {_one_line(artifact['summary'], chars)}" if artifact.get("summary") else ""
  62        lines.append(f"  {artifact['created_at']} {title}{summary}")
  63        lines.append(f"    view: artifact {shlex.quote(title)}")
  64        if paths:
  65            lines.append(f"    {artifact['path']}")
  66    lines.extend(["", "raw tool stream: activity"])
  67    return lines
  68
  69
  70def render_all_updates_report(
  71    db: AgentDB,
  72    config: AppConfig,
  73    *,
  74    limit: int = 5,
  75    chars: int = 180,
  76    paths: bool = False,
  77) -> list[str]:
  78    jobs = db.list_jobs()
  79    daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
  80    lines = [
  81        f"outcomes all jobs | {len(jobs)} tracked",
  82        "=" * 80,
  83    ]
  84    if not jobs:
  85        lines.append("No jobs yet.")
  86        return lines
  87    for job in jobs[: max(1, limit)]:
  88        job_id = str(job["id"])
  89        counts = db.job_record_counts(job_id)
  90        state = job_display_state(job, bool(daemon["running"]))
  91        lines.append("")
  92        lines.append(f"{job['title']} | {state}")
  93        lines.append(
  94            "  "
  95            + " ".join(
  96                [
  97                    f"actions={counts.get('steps', 0)}",
  98                    f"outputs={counts.get('artifacts', 0)}",
  99                    f"findings={_metadata_count(job, 'finding_ledger')}",
 100                    f"tasks={_metadata_count(job, 'task_queue')}",
 101                    f"experiments={_metadata_count(job, 'experiment_ledger')}",
 102                ]
 103            )
 104        )
 105        events = db.list_events(job_id=job_id, limit=max(200, limit * 60))
 106        outcome_lines = recent_model_update_lines(events, width=max(72, chars), limit=max(2, min(4, limit)))
 107        if outcome_lines:
 108            lines.extend(f"  {line}" for line in outcome_lines)
 109        else:
 110            lines.append("  no durable outcomes yet")
 111        artifacts = db.list_artifacts(job_id, limit=2)
 112        for artifact in artifacts:
 113            title = artifact.get("title") or artifact["id"]
 114            summary = f" - {_one_line(artifact['summary'], chars)}" if artifact.get("summary") else ""
 115            lines.append(f"  output: {_one_line(title, chars)}{summary}")
 116            if paths:
 117                lines.append(f"    {artifact['path']}")
 118    if len(jobs) > max(1, limit):
 119        lines.append("")
 120        lines.append(f"... {len(jobs) - max(1, limit)} more jobs hidden. Increase --limit to show more.")
 121    return lines
 122
 123
 124def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
 125    values = metadata.get(key)
 126    return [value for value in values if isinstance(value, dict)] if isinstance(values, list) else []
 127
 128
 129def _metadata_count(job: dict[str, Any], key: str) -> int:
 130    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 131    values = metadata.get(key)
 132    return len(values) if isinstance(values, list) else 0
nipux_cli/usage.py 78 lines
   1"""Formatting helpers for model token and cost usage."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.tui_layout import _format_compact_count, _format_usage_cost
   8
   9
  10def format_usage_report(
  11    *,
  12    title: str,
  13    usage: dict[str, Any],
  14    context_length: int,
  15    model: str,
  16    base_url: str,
  17) -> list[str]:
  18    calls = _safe_int(usage.get("calls"))
  19    prompt = _safe_int(usage.get("prompt_tokens"))
  20    completion = _safe_int(usage.get("completion_tokens"))
  21    total = _safe_int(usage.get("total_tokens")) or prompt + completion
  22    latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
  23    latest_completion = _safe_int(usage.get("latest_completion_tokens"))
  24    latest_total = _safe_int(usage.get("latest_total_tokens")) or latest_prompt + latest_completion
  25    estimated = _safe_int(usage.get("estimated_calls"))
  26    cached = _safe_int(usage.get("cached_tokens"))
  27    reasoning = _safe_int(usage.get("reasoning_tokens"))
  28    cost = _format_usage_cost(usage, model=model, base_url=base_url)
  29    cost_limit = _safe_optional_float(usage.get("max_job_cost_usd"))
  30    context_text = _format_compact_count(latest_prompt)
  31    if context_length > 0:
  32        context_text = f"{context_text}/{_format_compact_count(context_length)}"
  33    lines = [
  34        f"usage {title}",
  35        "=" * 80,
  36        f"model: {model}",
  37        f"calls: {calls} | estimated: {estimated}",
  38        f"tokens: total={_format_compact_count(total)} prompt={_format_compact_count(prompt)} output={_format_compact_count(completion)}",
  39        f"latest: ctx={context_text} output={_format_compact_count(latest_completion)} total={_format_compact_count(latest_total)}",
  40        f"details: cached={_format_compact_count(cached)} reasoning={_format_compact_count(reasoning)} cost={cost}",
  41    ]
  42    if cost_limit is not None and cost_limit > 0:
  43        current_cost = _safe_float(usage.get("cost"))
  44        remaining = max(0.0, cost_limit - current_cost)
  45        lines.append(f"limit: max job cost=${cost_limit:g} remaining=${remaining:.4f}")
  46    if calls <= 0:
  47        lines.append("no model usage has been recorded for this job yet")
  48    elif estimated:
  49        lines.append("some usage is estimated because the provider did not return complete token/cost metadata")
  50    elif not bool(usage.get("has_cost")):
  51        lines.append(
  52            "cost is pending unless the provider returns cost metadata, configured token rates are set, "
  53            "or the model is local/free"
  54        )
  55    return lines
  56
  57
  58def _safe_int(value: Any) -> int:
  59    try:
  60        return int(float(value))
  61    except (TypeError, ValueError, OverflowError):
  62        return 0
  63
  64
  65def _safe_float(value: Any) -> float:
  66    try:
  67        return float(value)
  68    except (TypeError, ValueError):
  69        return 0.0
  70
  71
  72def _safe_optional_float(value: Any) -> float | None:
  73    if value in (None, ""):
  74        return None
  75    try:
  76        return float(value)
  77    except (TypeError, ValueError):
  78        return None
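A minimal sketch of the coercion-and-fallback pattern used by format_usage_report above, assuming provider usage metadata may arrive as strings, floats, or None. The names safe_int, total_tokens, and sample are illustrative stand-ins, not part of nipux_cli, and the sample values are made up.

```python
from typing import Any


def safe_int(value: Any) -> int:
    """Coerce loosely typed metadata to an int, falling back to 0."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return 0


def total_tokens(usage: dict[str, Any]) -> int:
    """Prefer the reported total; otherwise sum prompt and completion tokens."""
    prompt = safe_int(usage.get("prompt_tokens"))
    completion = safe_int(usage.get("completion_tokens"))
    # A missing or zero total coerces to 0, which is falsy, so `or` picks the sum.
    return safe_int(usage.get("total_tokens")) or prompt + completion


# Hypothetical usage dict with mixed value types.
sample = {"prompt_tokens": "1200", "completion_tokens": 300.0, "total_tokens": None}
print(total_tokens(sample))
```

Because the fallback relies on `or`, a provider that explicitly reports a total of 0 is treated the same as one that reports nothing.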
nipux_cli/web.py 121 lines
   1"""Small web search/extract helpers without external web tool dependencies."""
   2
   3from __future__ import annotations
   4
   5import html
   6import re
   7import urllib.parse
   8import urllib.request
   9from html.parser import HTMLParser
  10from typing import Any
  11
  12from nipux_cli.source_quality import anti_bot_reason
  13
  14
  15class _TextExtractor(HTMLParser):
  16    def __init__(self):
  17        super().__init__()
  18        self.parts: list[str] = []
  19        self.skip_depth = 0
  20
  21    def handle_starttag(self, tag: str, attrs):
  22        del attrs
  23        if tag in {"script", "style", "noscript"}:
  24            self.skip_depth += 1
  25        if tag in {"p", "br", "div", "section", "article", "li", "h1", "h2", "h3"}:
  26            self.parts.append("\n")
  27
  28    def handle_endtag(self, tag: str):
  29        if tag in {"script", "style", "noscript"} and self.skip_depth:
  30            self.skip_depth -= 1
  31        if tag in {"p", "div", "section", "article", "li"}:
  32            self.parts.append("\n")
  33
  34    def handle_data(self, data: str):
  35        if not self.skip_depth:
  36            text = data.strip()
  37            if text:
  38                self.parts.append(text)
  39
  40    def text(self) -> str:
  41        raw = " ".join(self.parts)
  42        raw = re.sub(r"[ \t\r\f\v]+", " ", raw)
  43        raw = re.sub(r"\n\s+", "\n", raw)
  44        raw = re.sub(r"\n{3,}", "\n\n", raw)
  45        return html.unescape(raw).strip()
  46
  47
  48def _request(url: str, *, timeout: int = 20, max_bytes: int = 2_000_000) -> tuple[str, str]:
  49    request = urllib.request.Request(
  50        url,
  51        headers={
  52            "User-Agent": "Mozilla/5.0 (compatible; nipux/0.1; +https://github.com/nipuxx/agent-cli)",
  53            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.5",
  54        },
  55    )
  56    with urllib.request.urlopen(request, timeout=timeout) as response:
  57        content_type = response.headers.get("content-type", "")
  58        body = response.read(max_bytes + 1)
  59    if len(body) > max_bytes:
  60        raise ValueError(f"response exceeded {max_bytes} bytes")
  61    charset = "utf-8"
  62    match = re.search(r"charset=([\w.-]+)", content_type, re.I)
  63    if match:
  64        charset = match.group(1)
  65    return body.decode(charset, errors="replace"), content_type
  66
  67
  68def _strip_html(markup: str) -> str:
  69    parser = _TextExtractor()
  70    parser.feed(markup)
  71    return parser.text()
  72
  73
  74def _duckduckgo_link(raw: str) -> str:
  75    parsed = urllib.parse.urlparse(html.unescape(raw))
  76    query = urllib.parse.parse_qs(parsed.query)
  77    if "uddg" in query and query["uddg"]:
  78        return query["uddg"][0]
  79    return html.unescape(raw)
  80
  81
  82def web_search(query: str, *, limit: int = 5) -> dict[str, Any]:
  83    url = "https://duckduckgo.com/html/?" + urllib.parse.urlencode({"q": query})
  84    markup, _ = _request(url)
  85    pattern = re.compile(
  86        r'<a[^>]+class="result__a"[^>]+href="(?P<href>[^"]+)"[^>]*>(?P<title>.*?)</a>',
  87        re.I | re.S,
  88    )
  89    results = []
  90    for match in pattern.finditer(markup):
  91        title = re.sub(r"<[^>]+>", "", match.group("title"))
  92        results.append({"title": html.unescape(title).strip(), "url": _duckduckgo_link(match.group("href"))})
  93        if len(results) >= limit:
  94            break
  95    return {"success": True, "query": query, "results": results}
  96
  97
  98def web_extract(urls: list[str], *, limit_chars: int = 12_000) -> dict[str, Any]:
  99    pages = []
 100    for url in urls[:5]:
 101        try:
 102            body, content_type = _request(url)
 103            text = body if "text/plain" in content_type else _strip_html(body)
 104            reason = anti_bot_reason(url, text[:2000])
 105            page = {
 106                "url": url,
 107                "content_type": content_type,
 108                "text": text[:limit_chars],
 109                "truncated": len(text) > limit_chars,
 110            }
 111            if reason:
 112                page["source_warning"] = reason
 113                page["warnings"] = [{
 114                    "type": "anti_bot",
 115                    "message": reason,
 116                    "guidance": "This page may require normal human browser verification. Do not bypass protections.",
 117                }]
 118            pages.append(page)
 119        except Exception as exc:
 120            pages.append({"url": url, "error": str(exc)})
 121    return {"success": True, "pages": pages}
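A minimal sketch of the redirect unwrapping that _duckduckgo_link performs above, using only the standard library. The name unwrap_redirect and both sample hrefs are hypothetical; the exact shape of real search-result links is an assumption here.

```python
import html
import urllib.parse


def unwrap_redirect(raw: str) -> str:
    """Return the target URL carried in a `uddg` query parameter, if present."""
    unescaped = html.unescape(raw)  # hrefs scraped from HTML carry &amp; entities
    query = urllib.parse.parse_qs(urllib.parse.urlparse(unescaped).query)
    if query.get("uddg"):
        # parse_qs has already percent-decoded the value.
        return query["uddg"][0]
    return unescaped


# Hypothetical hrefs: one wrapped in a redirect, one direct.
wrapped = "//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fdocs%3Fa%3D1&amp;rut=abc"
direct = "https://example.com/a?b=1&amp;c=2"
print(unwrap_redirect(wrapped))
print(unwrap_redirect(direct))
```

Unescaping before parsing matters: otherwise the second query parameter would be read as `amp;rut` rather than `rut`, and a direct link would be returned with its entities intact.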
nipux_cli/worker.py 7538 lines
   1"""Bounded worker loop for one restartable agent step."""
   2
   3from __future__ import annotations
   4
   5import json
   6import re
   7import shlex
   8import signal
   9import threading
  10import time
  11from dataclasses import dataclass
  12from datetime import datetime, timezone
  13from pathlib import Path
  14from typing import Any
  15from urllib.parse import urlparse
  16
  17from nipux_cli.artifacts import ArtifactStore
  18from nipux_cli.config import AppConfig, load_config
  19from nipux_cli.compression import refresh_memory_index
  20from nipux_cli.context_pressure import (
  21    context_pressure_for_prompt,
  22    emit_context_pressure_update,
  23    emit_usage_pressure_update,
  24    usage_pressure_for_prompt,
  25)
  26from nipux_cli.db import AgentDB
  27from nipux_cli.llm import LLMResponse, LLMResponseError, OpenAIChatLLM, StepLLM, ToolCall
  28from nipux_cli.measurement import measurement_candidates, measurement_candidates_are_diagnostic_only
  29from nipux_cli.memory_graph import memory_graph_from_job
  30from nipux_cli.metric_format import format_metric_value
  31from nipux_cli.operator_context import (
  32    inactive_prompt_operator_ids,
  33)
  34from nipux_cli.progress import build_progress_checkpoint
  35from nipux_cli.provider_errors import provider_action_required_note
  36from nipux_cli.source_quality import anti_bot_reason
  37from nipux_cli.task_match import find_semantic_task_match, task_key
  38from nipux_cli.tools import DEFAULT_REGISTRY, ToolContext, ToolRegistry
  39from nipux_cli.worker_policy import (
  40    ACTIVITY_STAGNATION_BLOCKED_TOOLS,
  41    ACTIVITY_STAGNATION_CHECKPOINTS,
  42    ANTI_BOT_ACK_TERMS,
  43    ARTIFACT_ACCOUNTING_BLOCKED_TOOLS,
  44    ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS,
  45    BRANCH_WORK_TOOLS,
  46    CHURN_TOOLS,
  47    DELIVERABLE_ARTIFACT_TERMS,
  48    DELIVERABLE_PROGRESS_BLOCKED_TOOLS,
  49    DELIVERABLE_RESEARCH_BUDGET_STEPS,
  50    EVIDENCE_ARTIFACT_TERMS,
  51    EXPERIMENT_DELIVERY_ACTION_TERMS,
  52    EXPERIMENT_INFORMATION_ACTION_TERMS,
  53    EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS,
  54    FILE_VALIDATION_BLOCKED_TOOLS,
  55    FILE_VALIDATION_RESOLUTION_TOOLS,
  56    LEDGER_PROGRESS_TOOLS,
  57    MAX_WORKER_PROMPT_CHARS,
  58    MEASURABLE_ACTION_BUDGET_STEPS,
  59    MEASURABLE_PROGRESS_PATTERN,
  60    MEASURABLE_RESEARCH_BLOCKED_TOOLS,
  61    MEASURABLE_RESEARCH_BUDGET_STEPS,
  62    MEASUREMENT_BLOCKED_TOOLS,
  63    MEASUREMENT_RESOLUTION_TOOLS,
  64    MEMORY_CONSOLIDATION_BLOCKED_TOOLS,
  65    MEMORY_ENTRY_PROMPT_CHARS,
  66    MEMORY_PROMPT_CHARS,
  67    MILESTONE_VALIDATION_BLOCKED_TOOLS,
  68    PROGRAM_PROMPT_CHARS,
  69    QUERY_STOPWORDS,
  70    READ_ONLY_SHELL_COMMAND_PATTERN,
  71    RECENT_STATE_PROMPT_CHARS,
  72    RECENT_STATE_STEPS,
  73    RECOVERABLE_GUARD_ERRORS,
  74    REFLECTION_INTERVAL_STEPS,
  75    RESEARCH_BALANCE_BLOCKED_TOOLS,
  76    ROADMAP_STALENESS_BLOCKED_TOOLS,
  77    SOURCE_YIELD_BLOCKED_TOOLS,
  78    SYSTEM_PROMPT,
  79    TASK_DELIVERABLE_ACTION_TERMS,
  80    TASK_PLANNING_STAGNATION_CHECKPOINTS,
  81    TASK_QUEUE_SATURATION_OPEN_TASKS,
  82    TASK_QUEUE_TOTAL_SOFT_LIMIT,
  83    TEXT_TOKEN_STOPWORDS,
  84)
  85from nipux_cli.worker_prompt_context import (
  86    _as_float,
  87    _as_int,
  88    _experiments_for_prompt,
  89    _ledgers_for_prompt,
  90    _lessons_for_prompt,
  91    _memory_graph_for_prompt,
  92    _memory_entries_for_prompt,
  93    _metadata_list,
  94    _operator_messages_for_prompt,
  95    _outcomes_for_prompt,
  96    _render_worker_prompt,
  97    _roadmap_for_prompt,
  98    _tasks_for_prompt,
  99    _timeline_for_prompt,
 100)
 101from nipux_cli.worker_prompt_format import (
 102    clip_text as _clip_text,
 103    format_step_for_prompt as _format_step_for_prompt,
 104    observation_for_prompt as _observation_for_prompt,
 105)
 106from nipux_cli.worker_tool_summary import summarize_tool_result as _summarize_tool_result
 107from nipux_cli.worker_usage import turn_usage_metadata
 108
 109
 110__all__ = ["MAX_WORKER_PROMPT_CHARS", "_render_worker_prompt", "build_messages", "run_one_step"]
 111
 112
 113LESSON_SPRAWL_MIN_LESSONS = 30
 114LESSON_SPRAWL_RECENT_LESSONS = 3
 115EXPERIMENT_STAGNATION_MIN_TRIALS = 6
 116EXPERIMENT_STAGNATION_NON_IMPROVING = 4
 117SOURCE_YIELD_MIN_SOURCES = 12
 118SOURCE_YIELD_MIN_RECENT_GATHERING = 5
 119
 120
 121@dataclass(frozen=True)
 122class StepExecution:
 123    job_id: str
 124    run_id: str
 125    step_id: str
 126    tool_name: str | None
 127    status: str
 128    result: dict[str, Any]
 129
 130
 131EXPERIMENT_NEXT_ACTION_VERIFY_SHELL_PATTERN = re.compile(
 132    r"(?is)^\s*(?:command\s+-v\b|which\b|type\b|test\b|ls\b|find\b|stat\b|file\b)"
 133)
 134EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS = {
 135    "action",
 136    "after",
 137    "before",
 138    "from",
 139    "into",
 140    "next",
 141    "real",
 142    "then",
 143    "using",
 144    "with",
 145}
 146MILESTONE_MATCH_STOPWORDS = {
 147    "acceptance",
 148    "blocked",
 149    "criteria",
 150    "current",
 151    "done",
 152    "evidence",
 153    "failed",
 154    "issue",
 155    "issues",
 156    "milestone",
 157    "needed",
 158    "pending",
 159    "passed",
 160    "result",
 161    "roadmap",
 162    "status",
 163    "title",
 164    "validating",
 165    "validation",
 166    "validate",
 167}
 168
 169
 170def build_messages(
 171    job: dict[str, Any],
 172    recent_steps: list[dict[str, Any]],
 173    memory_entries: list[dict[str, Any]] | None = None,
 174    program_text: str = "",
 175    timeline_events: list[dict[str, Any]] | None = None,
 176    active_operator_messages: list[dict[str, Any]] | None = None,
 177    include_unclaimed_operator_messages: bool = True,
 178    token_usage: dict[str, Any] | None = None,
 179) -> list[dict[str, Any]]:
 180    step_lines = []
 181    for step in recent_steps[-RECENT_STATE_STEPS:]:
 182        step_lines.append(_clip_text(_format_step_for_prompt(step), 720))
 183    state = _clip_text("\n".join(step_lines), RECENT_STATE_PROMPT_CHARS) if step_lines else "No prior steps."
 184    memory_lines = []
 185    for entry in _memory_entries_for_prompt(memory_entries or []):
 186        refs = ", ".join((entry.get("artifact_refs") or [])[:8])
 187        suffix = f"\nArtifact refs: {refs}" if refs else ""
 188        memory_lines.append(
 189            _clip_text(f"### {entry.get('key') or 'memory'}\n{entry.get('summary') or ''}{suffix}", MEMORY_ENTRY_PROMPT_CHARS)
 190        )
 191    memory_text = _clip_text("\n\n".join(memory_lines), MEMORY_PROMPT_CHARS) if memory_lines else "No compact memory yet."
 192    program = _clip_text(program_text.strip(), PROGRAM_PROMPT_CHARS) if program_text else "No program.md saved yet."
 193    operator_messages = _operator_messages_for_prompt(
 194        job,
 195        active_messages=active_operator_messages or [],
 196        include_unclaimed=include_unclaimed_operator_messages,
 197    )
 198    current_execution_focus = _current_execution_focus_for_prompt(job, recent_steps)
 199    measurement_obligation = _measurement_obligation_for_prompt(job)
 200    recent_measurement_evidence = _recent_measurement_evidence_for_prompt(job, recent_steps)
 201    file_validation_obligation = _file_validation_obligation_for_prompt(job)
 202    candidate_file_discovery = _candidate_file_discovery_for_prompt(job, recent_steps)
 203    shell_path_recovery = _shell_path_recovery_for_prompt(recent_steps)
 204    shell_permission_recovery = _shell_permission_recovery_for_prompt(recent_steps)
 205    measured_progress_guard = _measured_progress_guard_for_prompt(job, recent_steps)
 206    experiment_stagnation_guard = _experiment_stagnation_guard_for_prompt(job, recent_steps)
 207    research_balance_guard = _research_balance_guard_for_prompt(job, recent_steps)
 208    source_yield_guard = _source_yield_guard_for_prompt(job, recent_steps)
 209    deliverable_progress_guard = _deliverable_progress_guard_for_prompt(job, recent_steps)
 210    progress_accounting_guard = _progress_accounting_for_prompt(recent_steps)
 211    evidence_checkpoint_guard = _evidence_checkpoint_accounting_for_prompt(job, recent_steps)
 212    activity_stagnation = _activity_stagnation_for_prompt(job)
 213    task_planning_guard = _task_planning_guard_for_prompt(job)
 214    task_queue_saturation = _task_queue_saturation_for_prompt(job, recent_steps)
 215    memory_consolidation_guard = _memory_consolidation_guard_for_prompt(job, recent_steps)
 216    lesson_consolidation_guard = _lesson_consolidation_guard_for_prompt(job, recent_steps)
 217    durable_yield = _durable_yield_for_prompt(job, recent_steps)
 218    context_pressure = context_pressure_for_prompt(job)
 219    usage_pressure = usage_pressure_for_prompt(job, token_usage)
 220    lessons = _lessons_for_prompt(job)
 221    memory_graph = _memory_graph_for_prompt(job)
 222    roadmap = _roadmap_for_prompt(job)
 223    tasks = _tasks_for_prompt(job)
 224    ledgers = _ledgers_for_prompt(job)
 225    experiments = _experiments_for_prompt(job)
 226    reflections = _reflections_for_prompt(job)
 227    timeline = _timeline_for_prompt(timeline_events or [])
 228    outcomes = _outcomes_for_prompt(timeline_events or [])
 229    next_constraint = _next_action_constraint(job, recent_steps)
 230    content = _render_worker_prompt(
 231        job,
 232        sections=[
 233            (
 234                "Workspace",
 235                "\n".join([
 236                    "- shell_exec runs on the machine hosting this Nipux worker, in the current worker directory unless the command changes it",
 237                    "- saved artifacts are separate Nipux outputs; read_artifact is only for those saved outputs",
 238                    "- use shell_exec for workspace/project files unless the file is a saved artifact",
 239                ]),
 240            ),
 241            ("Operator context", operator_messages),
 242            ("Current execution focus", current_execution_focus),
 243            ("Pending measurement obligation", measurement_obligation),
 244            ("Recent measurement evidence", recent_measurement_evidence),
 245            ("Pending file validation obligation", file_validation_obligation),
 246            ("Candidate file discovery", candidate_file_discovery),
 247            ("Shell path recovery", shell_path_recovery),
 248            ("Shell permission recovery", shell_permission_recovery),
 249            ("Measured progress guard", measured_progress_guard),
 250            ("Experiment stagnation guard", experiment_stagnation_guard),
 251            ("Research balance guard", research_balance_guard),
 252            ("Source yield guard", source_yield_guard),
 253            ("Deliverable progress guard", deliverable_progress_guard),
 254            ("Progress accounting guard", progress_accounting_guard),
 255            ("Evidence checkpoint accounting guard", evidence_checkpoint_guard),
 256            ("Activity stagnation", activity_stagnation),
 257            ("Task planning guard", task_planning_guard),
 258            ("Task queue saturation", task_queue_saturation),
 259            ("Memory consolidation guard", memory_consolidation_guard),
 260            ("Lesson consolidation guard", lesson_consolidation_guard),
 261            ("Durable progress yield", durable_yield),
 262            ("Context pressure", context_pressure),
 263            ("Usage pressure", usage_pressure),
 264            ("Program", program),
 265            ("Lessons learned", lessons),
 266            ("Memory graph", memory_graph),
 267            ("Roadmap", roadmap),
 268            ("Task queue", tasks),
 269            ("Durable outcomes", outcomes),
 270            ("Ledgers", ledgers),
 271            ("Experiment ledger", experiments),
 272            ("Reflections", reflections),
 273            ("Compact memory", memory_text),
 274            ("Recent visible timeline", timeline),
 275            ("Recent state", state),
 276            ("Next-action constraint", next_constraint),
 277        ],
 278    )
 279    return [
 280        {"role": "system", "content": SYSTEM_PROMPT},
 281        {"role": "user", "content": content},
 282    ]
 283
 284
 285def _acknowledge_non_prompt_operator_context(db: AgentDB, job_id: str) -> int:
 286    job = db.get_job(job_id)
 287    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 288    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
 289    message_ids = inactive_prompt_operator_ids(messages)
 290    if not message_ids:
 291        return 0
 292    result = db.acknowledge_operator_messages(
 293        job_id,
 294        message_ids=message_ids,
 295        summary="conversation-only message retained in history, not used as worker constraint",
 296    )
 297    return int(result.get("count") or 0)
 298
 299
 300def _measured_progress_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
 301    context = _measured_progress_guard_context(job, recent_steps)
 302    if not context:
 303        return "None."
 304    if _as_int(context.get("shell_actions_since_last_experiment")) >= _as_int(context.get("shell_action_budget")):
 305        candidate_context = _candidate_file_discovery_context(job, recent_steps)
 306        shell_guidance = "Do not call shell_exec or do more research next."
 307        if candidate_context:
 308            shell_guidance = (
 309                "Do not call broad shell_exec or do more research next. A single bounded shell_exec is allowed only "
 310                "when it validates one exact candidate path already listed in Candidate file discovery."
 311            )
 312        return (
 313            "This objective or active task is measurably framed, and the shell/action budget since the last experiment "
 314            f"is exhausted. completed_since_last_experiment={context.get('completed_since_last_experiment')} "
 315            f"shell_actions={context.get('shell_actions_since_last_experiment')} shell_budget={context.get('shell_action_budget')} "
 316            f"reason={context.get('reason')}. {shell_guidance} Use record_experiment "
 317            "for a known result, record_tasks to create a missing experiment/monitor branch, or record_lesson if the "
 318            "branch is blocked or the recent outputs were not valid measurements."
 319        )
 320    return (
 321        "This objective or active task is measurably framed, but recent work has not produced "
 322        f"new experiment records. completed_since_last_experiment={context.get('completed_since_last_experiment')} "
 323        f"research_budget={context.get('research_budget')} shell_actions={context.get('shell_actions_since_last_experiment')} "
 324        f"shell_budget={context.get('shell_action_budget')} reason={context.get('reason')}. "
 325        "Next useful actions: run a small measuring action, call record_experiment for a known result, "
 326        "or use record_tasks to create an experiment/action/monitor task with acceptance criteria and evidence."
 327    )
 328
 329
 330def _deliverable_progress_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
 331    context = _deliverable_progress_guard_context(job, recent_steps)
 332    if not context:
 333        return "None."
 334    return (
 335        "This objective or active task expects a durable deliverable, but recent branch work has not produced a "
 336        "draft/report/file checkpoint. "
 337        f"completed_since_last_deliverable={context.get('completed_since_last_deliverable')} "
 338        f"research_budget={context.get('research_budget')} reason={context.get('reason')}. "
 339        "Next useful actions: write_file or write_artifact for a partial deliverable, record_tasks for a smaller "
 340        "deliverable branch, record_roadmap/record_milestone_validation for validation, or record_lesson if the "
 341        "deliverable is blocked."
 342    )
 343
 344
 345def _experiment_stagnation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
 346    context = _experiment_stagnation_context(job, recent_steps)
 347    if not context:
 348        return "None."
 349    return (
 350        "Recent measured trials have not improved the best observed result. "
 351        f"metric={context.get('metric_name')} unit={context.get('metric_unit')} "
 352        f"best={context.get('best_value')} latest={context.get('latest_value')} "
 353        f"non_improving={context.get('non_improving_count')} recent_trials={context.get('recent_trials')}. "
 354        "Before more experiments, shell execution, research, or output churn, record a decision: reject or block the "
 355        "stale branch, pivot to a materially different branch, update the roadmap/task queue, or explain why the "
 356        "stagnant measurements are still useful."
 357    )
 358
 359
 360def _measurement_obligation_for_prompt(job: dict[str, Any]) -> str:
 361    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 362    obligation = metadata.get("pending_measurement_obligation")
 363    if not isinstance(obligation, dict) or not obligation or obligation.get("resolved_at"):
 364        return "None."
 365    candidates = obligation.get("metric_candidates") if isinstance(obligation.get("metric_candidates"), list) else []
 366    lines = [
 367        f"source_step=#{obligation.get('source_step_no') or '?'} tool={obligation.get('tool') or ''}",
 368        f"summary={obligation.get('summary') or ''}",
 369    ]
 370    command = str(obligation.get("command") or "")
 371    if command:
 372        lines.append(f"command={_clip_text(command, 360)}")
 373    if candidates:
 374        lines.append("metric_candidates=" + "; ".join(str(item) for item in candidates[:6]))
 375    lines.append(
 376        "Before more research or artifact churn, call record_experiment with the measured result, "
 377        "record_lesson explaining why it is not a valid measurement, or record_tasks to create the missing measurement branch."
 378    )
 379    return "\n".join(lines)
 380
 381
 382def _recent_measurement_evidence_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]], *, window: int = 140) -> str:
 383    if _pending_measurement_obligation(job):
 384        return "Covered by the pending measurement obligation."
 385    latest_experiment_step_no = max(
 386        (
 387            _as_int(step.get("step_no"))
 388            for step in recent_steps
 389            if step.get("tool_name") == "record_experiment" and step.get("status") == "completed"
 390        ),
 391        default=0,
 392    )
 393    lines: list[str] = []
 394    for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
 395        if step.get("tool_name") != "shell_exec":
 396            continue
 397        output = step.get("output") if isinstance(step.get("output"), dict) else {}
 398        if not output:
 399            continue
 400        command = _step_command(step) or str(output.get("command") or "")
 401        candidates = measurement_candidates(output, command=command, limit=4)
 402        if not candidates or measurement_candidates_are_diagnostic_only(candidates, command=command):
 403            continue
 404        step_no = _as_int(step.get("step_no"))
 405        relation = "after last experiment" if step_no > latest_experiment_step_no else "before last experiment"
 406        prefix = f"- step #{step_no or '?'} {step.get('status') or ''} ({relation})"
 407        detail = "; ".join(str(candidate) for candidate in candidates[:3])
 408        command_detail = f" command={_clip_text(command, 180)}" if command else ""
 409        lines.append(_clip_text(f"{prefix}: {detail}.{command_detail}", 520))
 410        if len(lines) >= 6:
 411            break
 412    if not lines:
 413        return "None."
 414    return "\n".join([
 415        "Recent shell output contains measurable-looking values. Reconcile valid values with record_experiment; "
 416        "if a value is invalid, record why before treating the branch as complete.",
 417        *reversed(lines),
 418    ])
 419
 420
 421def _file_validation_obligation_for_prompt(job: dict[str, Any]) -> str:
 422    obligation = _pending_file_validation_obligation(job)
 423    if not obligation:
 424        return "None."
 425    lines = [
 426        f"path={obligation.get('path') or ''}",
 427        f"source_step=#{obligation.get('source_step_no') or '?'}",
 428        f"reason={obligation.get('reason') or 'recent file output needs validation'}",
 429    ]
 430    suggested = str(obligation.get("suggested_validation") or "").strip()
 431    if suggested:
 432        lines.append(f"suggested_validation={suggested}")
 433    lines.append(
 434        "Before more research/output churn, validate the file with shell_exec, "
 435        "corroborating any `file` output with header/signature bytes, checksum/size, or a parser/loader when the expected "
 436        "format matters, "
 437        "or use record_tasks/record_lesson/record_experiment to explain the blocked or deferred validation."
 438    )
 439    return "\n".join(lines)
 440
 441
 442def _current_execution_focus_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
 443    focus = _current_execution_focus_context(job, recent_steps)
 444    if not focus:
 445        return (
 446            "No isolated focus yet. Follow the next-action constraint, choose one bounded branch, "
 447            "and account for the result before expanding the backlog."
 448        )
 449    lines = [
 450        f"phase={focus['phase']}",
 451        "focus=" + _clip_text(str(focus.get("focus") or ""), 620),
 452        "next=" + _clip_text(str(focus.get("next") or ""), 620),
 453    ]
 454    evidence = str(focus.get("evidence") or "").strip()
 455    if evidence:
 456        lines.append("evidence=" + _clip_text(evidence, 520))
 457    task = str(focus.get("task") or "").strip()
 458    if task:
 459        lines.append("task=" + _clip_text(task, 520))
 460    backlog = focus.get("backlog") if isinstance(focus.get("backlog"), dict) else {}
 461    if backlog:
 462        lines.append(
 463            "backlog="
 464            f"{backlog.get('total')} tasks, {backlog.get('open')} runnable/open. "
 465            "Treat it as advisory until this focus is resolved; do not add new branches unless directly closing, "
 466            "merging, or relabeling existing work."
 467        )
 468    lines.append(
 469        "Boundary: do not switch to unrelated search, task creation, or stale branches unless this focus is blocked "
 470        "by fresh tool evidence. If it is blocked, record the blocker and the next concrete recovery action."
 471    )
 472    return "\n".join(lines)
 473
 474
 475def _current_execution_focus_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
 476    backlog = _task_backlog_pressure_context(job)
 477    measurement_obligation = _pending_measurement_obligation(job)
 478    if measurement_obligation:
 479        return {
 480            "phase": "account_measurement",
 481            "focus": f"Resolve pending measurement from step #{measurement_obligation.get('source_step_no') or '?'}",
 482            "next": "Use record_experiment for a valid measured result, or record_lesson/record_tasks if the result is invalid or missing.",
 483            "evidence": str(measurement_obligation.get("summary") or ""),
 484            "backlog": backlog,
 485        }
 486    file_validation = _pending_file_validation_obligation(job)
 487    if file_validation:
 488        return {
 489            "phase": "validate_file",
 490            "focus": f"Validate recently written file {file_validation.get('path') or ''}",
 491            "next": str(file_validation.get("suggested_validation") or "Run one bounded validation command, then account for the result."),
 492            "evidence": str(file_validation.get("reason") or ""),
 493            "backlog": backlog,
 494        }
 495    checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
 496    if checkpoint:
 497        if checkpoint.get("checkpoint_read"):
 498            next_action = "Use a durable ledger tool to account for the already-read evidence checkpoint."
 499        else:
 500            next_action = "Read the specific checkpoint artifact or account for it from existing evidence."
 501        return {
 502            "phase": "account_checkpoint",
 503            "focus": f"Resolve auto-saved evidence checkpoint {checkpoint.get('artifact_id') or checkpoint.get('title') or ''}",
 504            "next": next_action,
 505            "evidence": str(checkpoint.get("summary") or checkpoint.get("title") or ""),
 506            "backlog": backlog,
 507        }
 508    candidate_files = _candidate_file_discovery_context(job, recent_steps)
 509    if candidate_files:
 510        paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
 511        invalid_paths = {str(path) for path in candidate_files.get("invalid_paths") or []}
 512        primary_path = next((str(path) for path in paths if str(path) not in invalid_paths), str(paths[0]) if paths else "")
 513        validated = _candidate_file_recently_validated(primary_path, recent_steps)
 514        if validated:
 515            return {
 516                "phase": "execute_with_validated_candidate",
 517                "focus": f"Use the recently validated candidate path: {primary_path}",
 518                "next": (
 519                    "Run the next bounded action or measurement that uses this validated path; do not repeat "
 520                    "existence checks unless a new command requires a different property."
 521                ),
 522                "evidence": validated,
 523                "task": str(candidate_files.get("task_text") or ""),
 524                "backlog": backlog,
 525            }
 526        return {
 527            "phase": "execute_candidate_validation",
 528            "focus": f"Validate the highest-confidence candidate path: {primary_path}",
 529            "next": "Run one bounded shell/file validation against the primary path, then record the measurement, finding, lesson, or blocker.",
 530            "evidence": f"Ranked candidates: {'; '.join(str(path) for path in paths[:4])}",
 531            "task": str(candidate_files.get("task_text") or ""),
 532            "backlog": backlog,
 533        }
 534    grounding_block = _latest_evidence_grounding_block(recent_steps)
 535    if grounding_block:
 536        return {
 537            "phase": "repair_record",
 538            "focus": "Rewrite or replace the blocked durable record using only observed evidence.",
 539            "next": "Use exact observed tokens/paths from recent evidence, or record why the attempted claim is invalid.",
 540            "evidence": str(grounding_block.get("error") or grounding_block.get("summary") or ""),
 541            "backlog": backlog,
 542        }
 543    experiment_next_action = _latest_experiment_next_action_context(job)
 544    if experiment_next_action:
 545        return {
 546            "phase": "execute_measured_next_action",
 547            "focus": "Continue from the latest measured experiment decision.",
 548            "next": str(experiment_next_action.get("next_action") or ""),
 549            "evidence": str(experiment_next_action.get("title") or experiment_next_action.get("summary") or ""),
 550            "backlog": backlog,
 551        }
 552    milestone_validation = _milestone_validation_needed(job)
 553    if milestone_validation:
 554        return {
 555            "phase": "validate_milestone",
 556            "focus": f"Validate milestone: {milestone_validation.get('title') or 'current milestone'}",
 557            "next": "Use record_milestone_validation with pass/fail/blocker status from observed evidence.",
 558            "evidence": str(milestone_validation.get("evidence_needed") or milestone_validation.get("acceptance_criteria") or ""),
 559            "backlog": backlog,
 560        }
 561    task = _primary_execution_task(job)
 562    if task and backlog:
 563        return {
 564            "phase": "execute_task",
 565            "focus": str(task.get("title") or "current task"),
 566            "next": str(task.get("next_action") or task.get("goal") or task.get("acceptance_criteria") or "Take one bounded action for this task."),
 567            "evidence": str(task.get("evidence_needed") or task.get("result") or ""),
 568            "task": str(task),
 569            "backlog": backlog,
 570        }
 571    return None
 572
 573
 574def _task_backlog_pressure_context(job: dict[str, Any]) -> dict[str, int] | None:
 575    tasks = [task for task in _metadata_list(job, "task_queue") if isinstance(task, dict)]
 576    if not tasks:
 577        return None
 578    runnable_statuses = {"active", "open", "waiting", "blocked"}
 579    open_count = sum(1 for task in tasks if str(task.get("status") or "open").lower() in runnable_statuses)
 580    total_count = len(tasks)
 581    if total_count < TASK_QUEUE_TOTAL_SOFT_LIMIT and open_count < TASK_QUEUE_SATURATION_OPEN_TASKS:
 582        return None
 583    return {"total": total_count, "open": open_count}
 584
 585
 586def _primary_execution_task(job: dict[str, Any]) -> dict[str, Any] | None:
 587    tasks = [task for task in _metadata_list(job, "task_queue") if isinstance(task, dict)]
 588    status_rank = {"active": 0, "open": 1, "waiting": 2, "blocked": 3}
 589    runnable = [task for task in tasks if str(task.get("status") or "open").lower() in status_rank]
 590    if not runnable:
 591        return None
 592    return sorted(
 593        runnable,
 594        key=lambda task: (
 595            status_rank.get(str(task.get("status") or "open").lower(), 9),
 596            -_as_int(task.get("priority")),
 597            str(task.get("title") or ""),
 598        ),
 599    )[0]
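# A minimal sketch of the ordering rule used above: lowest status rank first,
# then highest priority, then title as a tie-breaker. The name pick_primary_task
# and the int(... or 0) coercion are assumptions of this example; the source
# relies on its own _metadata_list and _as_int helpers, which are not shown here.

```python
from __future__ import annotations

from typing import Any

# Lower rank is preferred; statuses outside this map are not runnable.
STATUS_RANK = {"active": 0, "open": 1, "waiting": 2, "blocked": 3}


def pick_primary_task(tasks: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Return the task that sorts first by (status rank, -priority, title)."""

    def status_of(task: dict[str, Any]) -> str:
        # A missing status is treated as "open", as in the listing above.
        return str(task.get("status") or "open").lower()

    runnable = [task for task in tasks if status_of(task) in STATUS_RANK]
    if not runnable:
        return None
    return min(
        runnable,
        key=lambda task: (
            STATUS_RANK[status_of(task)],
            -int(task.get("priority") or 0),  # hypothetical stand-in for _as_int
            str(task.get("title") or ""),
        ),
    )


tasks = [
    {"title": "b-open-high", "status": "open", "priority": 9},
    {"title": "a-active-low", "status": "active", "priority": 1},
    {"title": "c-done", "status": "done", "priority": 99},
    {"title": "d-active-high", "status": "active", "priority": 5},
]
primary = pick_primary_task(tasks)
```

# Here the active task with the higher priority wins even though an open task
# carries a larger priority value, because status rank is compared first.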
 600
 601
 602def _candidate_file_discovery_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
 603    context = _candidate_file_discovery_context(job, recent_steps)
 604    if not context:
 605        return "None."
 606    paths = context["paths"]
 607    source_text = context["source_text"]
 608    lines = [
 609        f"{source_text} while open work depends on file/path validation.",
 610        "Candidate paths:",
 611    ]
 612    for path in paths[:8]:
 613        lines.append(f"- {path}")
 614    primary_path = str(paths[0]) if paths else ""
 615    validation = _candidate_file_recently_validated(primary_path, recent_steps)
 616    if validation:
 617        lines.append(
 618            "Highest-ranked candidate has recent positive validation evidence. "
 619            f"Use it for the next action instead of repeating existence checks: {_clip_text(validation, 420)}"
 620        )
 621    invalid_paths = context.get("invalid_paths") if isinstance(context.get("invalid_paths"), list) else []
 622    if invalid_paths:
 623        lines.append(
 624            "Recently invalid or stub-like candidates: "
 625            + "; ".join(str(path) for path in invalid_paths[:5])
 626            + ". Prefer higher-confidence candidates before retrying these."
 627        )
 628    lines.append(
 629        "Validate likely candidates with shell_exec before recording a no-file/no-progress claim or searching for alternatives. "
 630        "Do not reject a non-empty candidate binary from `file` output alone; corroborate with header/signature bytes, "
 631        "checksum/size, or a parser/loader for the expected format, or record uncertainty. "
 632        "Treat durable-record candidates as candidates until revalidated. This supersedes stale no-candidate/no-file memory "
 633        "until validation proves those candidates are irrelevant."
 634    )
 635    lines.append(f"Relevant open work: {_clip_text(context['task_text'], 500)}")
 636    return "\n".join(lines)
 637
 638
 639def _shell_path_recovery_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
 640    context = _shell_path_recovery_context(recent_steps)
 641    if not context:
 642        return "None."
 643    paths = context.get("missing_paths") if isinstance(context.get("missing_paths"), list) else []
 644    commands = context.get("missing_commands") if isinstance(context.get("missing_commands"), list) else []
 645    candidate_executables = (
 646        context.get("candidate_executables") if isinstance(context.get("candidate_executables"), dict) else {}
 647    )
 648    observed_executables = (
 649        context.get("observed_executables") if isinstance(context.get("observed_executables"), list) else []
 650    )
 651    lines = [
 652        f"Recent shell step #{context.get('step_no') or '?'} reported a missing command or path.",
 653    ]
 654    if commands:
 655        lines.append("Missing commands: " + ", ".join(str(command) for command in commands[:6]))
 656    if candidate_executables:
 657        for command, command_paths in list(candidate_executables.items())[:6]:
 658            if not isinstance(command_paths, list) or not command_paths:
 659                continue
 660            lines.append(
 661                f"Observed candidate executable for {command}: "
 662                + ", ".join(str(path) for path in command_paths[:4])
 663            )
 664        lines.append("Recovery priority: try the exact candidate path or add its directory to PATH before package-manager/install retries.")
 665    if paths:
 666        lines.append("Missing paths: " + ", ".join(str(path) for path in paths[:6]))
 667    if observed_executables:
 668        lines.append("Observed executable paths in partial shell output: " + ", ".join(str(path) for path in observed_executables[:8]))
 669    if not commands and not paths:
 670        lines.append("Missing command/path was not parsed.")
 671    command = str(context.get("command") or "")
 672    if command:
 673        lines.append(f"Failed command: {_clip_text(command, 420)}")
 674    excerpt = str(context.get("excerpt") or "")
 675    if excerpt:
 676        lines.append(f"Observed output: {_clip_text(excerpt, 360)}")
 677    lines.append(
 678        "Do not treat this output as a successful measurement or deliverable. Next, locate or verify the real "
 679        "executable/file path with a bounded shell probe such as command -v, find, ls, or an equivalent platform "
 680        "tool. Retry using only a validated path, or record the branch as blocked/skipped with the observed reason."
 681    )
 682    return "\n".join(lines)
 683
 684
 685def _shell_path_recovery_context(recent_steps: list[dict[str, Any]], *, window: int = 16) -> dict[str, Any] | None:
 686    for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
 687        if step.get("tool_name") != "shell_exec":
 688            continue
 689        output = step.get("output") if isinstance(step.get("output"), dict) else {}
 690        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
 691        if not text.strip():
 692            continue
 693        missing_paths = _missing_paths_from_shell_output(text)
 694        if not missing_paths and not _shell_output_has_missing_command(text):
 695            continue
 696        commands = _missing_commands_from_shell_output(text)
 697        return {
 698            "step_no": step.get("step_no"),
 699            "command": output.get("command"),
 700            "missing_commands": commands,
 701            "candidate_executables": _candidate_executable_paths_for_missing_commands(recent_steps, commands),
 702            "observed_executables": _observed_executable_paths_from_recent_shell(
 703                recent_steps,
 704                exclude_paths=missing_paths,
 705            ),
 706            "missing_paths": missing_paths,
 707            "excerpt": text.strip(),
 708        }
 709    return None
 710
 711
 712def _shell_permission_recovery_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
 713    context = _recent_privileged_shell_failure_context(recent_steps)
 714    if not context:
 715        return "None."
 716    lines = [
 717        f"Recent shell step #{context.get('step_no') or '?'} failed because a privileged/package-manager command lacked permission.",
 718    ]
 719    command = str(context.get("command") or "")
 720    if command:
 721        lines.append(f"Failed command: {_clip_text(command, 420)}")
 722    excerpt = str(context.get("excerpt") or "")
 723    if excerpt:
 724        lines.append(f"Observed output: {_clip_text(excerpt, 360)}")
 725    lines.append("Recovery priority: try non-privileged alternatives first; record when operator credentials are required.")
 726    lines.append(
 727        "Do not retry the same privileged/package-manager path. Prefer observed executables, user-writable installs, "
 728        "existing project files, or other non-privileged alternatives; otherwise record the branch as blocked, skipped, "
 729        "or needing operator credentials."
 730    )
 731    return "\n".join(lines)
 732
 733
 734def _shell_step_failure_text(step: dict[str, Any]) -> str:
 735    output = step.get("output") if isinstance(step.get("output"), dict) else {}
 736    return "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
 737
 738
 739def _shell_output_has_missing_command(text: str) -> bool:
 740    lowered = text.lower()
 741    return any(marker in lowered for marker in ("command not found", ": not found", "no such file or directory"))
 742
 743
 744def _missing_paths_from_shell_output(text: str) -> list[str]:
 745    patterns = [
 746        r"(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?P<path>/[^\s:'\"]+):\s*(?:not found|No such file or directory|command not found)",
 747        r"(?:cannot access|cannot stat|can't stat|stat: cannot statx?) ['\"](?P<quoted>[^'\"]+)['\"]:\s*No such file or directory",
 748        r"(?:^|\n)(?P<plain>/[^\s:'\"]+):\s*No such file or directory",
 749    ]
 750    paths: list[str] = []
 751    seen: set[str] = set()
 752    for pattern in patterns:
 753        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
 754            path = str(match.groupdict().get("path") or match.groupdict().get("quoted") or match.groupdict().get("plain") or "").strip()
 755            if not path or path in seen:
 756                continue
 757            seen.add(path)
 758            paths.append(path)
 759            if len(paths) >= 12:
 760                return paths
 761    return paths
 762
 763
 764def _missing_commands_from_shell_output(text: str) -> list[str]:
 765    patterns = [
 766        r"(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?P<cmd>[A-Za-z0-9_.+-]+):\s*(?:not found|command not found)(?!:)",
 767        r"(?:^|\n)(?:sh|bash|zsh):\s*(?P<shell_cmd>[A-Za-z0-9_.+-]+):\s*command not found",
 768        r"command not found:\s*(?P<suffix_cmd>[A-Za-z0-9_.+-]+)",
 769    ]
 770    commands: list[str] = []
 771    seen: set[str] = set()
 772    for pattern in patterns:
 773        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
 774            command = str(
 775                match.groupdict().get("cmd")
 776                or match.groupdict().get("shell_cmd")
 777                or match.groupdict().get("suffix_cmd")
 778                or ""
 779            ).strip()
 780            if not command or "/" in command or command in seen:
 781                continue
 782            seen.add(command)
 783            commands.append(command)
 784            if len(commands) >= 12:
 785                return commands
 786    return commands
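# A small sketch of how the shell-prefixed and suffix-style patterns above pull
# command names out of error text. It reuses the second and third patterns from
# the listing; the function name extract_missing_commands and the sample output
# are assumptions of this example, not captured logs.

```python
import re

PATTERNS = [
    # "bash: ffmpeg: command not found"
    r"(?:^|\n)(?:sh|bash|zsh):\s*(?P<shell_cmd>[A-Za-z0-9_.+-]+):\s*command not found",
    # "zsh: command not found: jq"
    r"command not found:\s*(?P<suffix_cmd>[A-Za-z0-9_.+-]+)",
]


def extract_missing_commands(text: str) -> list[str]:
    """Collect unique command names in first-seen order."""
    commands: list[str] = []
    seen: set[str] = set()
    for pattern in PATTERNS:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            groups = match.groupdict()
            command = str(groups.get("shell_cmd") or groups.get("suffix_cmd") or "").strip()
            # Names containing "/" are paths, which the listing handles separately.
            if not command or "/" in command or command in seen:
                continue
            seen.add(command)
            commands.append(command)
    return commands


sample = (
    "bash: ffmpeg: command not found\n"
    "zsh: command not found: jq\n"
    "bash: ffmpeg: command not found"
)
missing = extract_missing_commands(sample)
```

# Results follow pattern order rather than line order, and repeated reports of
# the same command collapse into one entry.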
 787
 788
 789def _candidate_executable_paths_for_missing_commands(
 790    recent_steps: list[dict[str, Any]], missing_commands: list[str], *, window: int = 20, max_paths_per_command: int = 6
 791) -> dict[str, list[str]]:
 792    command_names = {str(command or "").strip().lower() for command in missing_commands}
 793    command_names = {command for command in command_names if command}
 794    if not command_names:
 795        return {}
 796    matches: dict[str, list[str]] = {command: [] for command in command_names}
 797    seen: set[tuple[str, str]] = set()
 798    for path in _observed_executable_paths_from_recent_shell(recent_steps, command_names=command_names, window=window):
 799        name = Path(path).name.lower()
 800        if name not in command_names:
 801            continue
 802        key = (name, path.lower())
 803        if key in seen or len(matches.get(name, [])) >= max_paths_per_command:
 804            continue
 805        seen.add(key)
 806        matches.setdefault(name, []).append(path)
 807    return {command: paths for command, paths in matches.items() if paths}
 808
 809
 810def _observed_executable_paths_from_recent_shell(
 811    recent_steps: list[dict[str, Any]],
 812    *,
 813    command_names: set[str] | None = None,
 814    exclude_paths: list[str] | None = None,
 815    window: int = 20,
 816    max_paths: int = 12,
 817) -> list[str]:
 818    excluded = {str(path or "").lower() for path in (exclude_paths or []) if path}
 819    paths: list[str] = []
 820    seen: set[str] = set()
 821    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
 822        if step.get("tool_name") != "shell_exec":
 823            continue
 824        output = step.get("output") if isinstance(step.get("output"), dict) else {}
 825        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
 826        for line in text.splitlines():
 827            if _shell_line_reports_missing_candidate(line):
 828                continue
 829            for path in _extract_candidate_executable_paths(line, command_names):
 830                key = path.lower()
 831                if key in excluded or key in seen:
 832                    continue
 833                seen.add(key)
 834                paths.append(path)
 835                if len(paths) >= max_paths:
 836                    return paths
 837    return paths
 838
 839
 840def _shell_line_reports_missing_candidate(line: str) -> bool:
 841    lowered = str(line or "").lower()
 842    return any(
 843        marker in lowered
 844        for marker in (
 845            "not found",
 846            "no such file or directory",
 847            "cannot access",
 848            "cannot stat",
 849            "can't stat",
 850            "missing",
 851        )
 852    )
 853
 854
 855def _extract_candidate_executable_paths(text: str, command_names: set[str] | None = None) -> list[str]:
 856    commands = {command.lower() for command in (command_names or set()) if command}
 857    paths: list[str] = []
 858    seen: set[str] = set()
 859    for match in re.finditer(r"(?<![A-Za-z0-9])(?:~|/)[^\s'\"<>|;&]{2,}", text or ""):
 860        raw = _clean_candidate_file_path(match.group(0))
 861        if not _looks_like_candidate_executable_path(raw):
 862            continue
 863        name = Path(raw).name.lower()
 864        if commands and name not in commands:
 865            continue
 866        key = raw.lower()
 867        if key in seen:
 868            continue
 869        seen.add(key)
 870        paths.append(raw)
 871    return paths
 872
 873
 874def _looks_like_candidate_executable_path(value: str) -> bool:
 875    raw = str(value or "").strip()
 876    if not raw or len(raw) > 500:
 877        return False
 878    if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
 879        return False
 880    if not raw.startswith(("/", "~")):
 881        return False
 882    name = Path(raw).name
 883    if not name or name.startswith(".") or name in {".", ".."}:
 884        return False
 885    if any(char in name for char in ("$", "{", "}", "`")):
 886        return False
 887    return True
 888
 889
 890PACKAGE_MANAGER_WRITE_COMMAND_PATTERN = re.compile(
 891    r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo\s+|doas\s+|pkexec\s+)?"
 892    r"(?:apt-get|apt|dnf|yum|apk|pacman|zypper|brew|port)\s+"
 893    r"(?:install|upgrade|update|remove|erase|add|sync|build-dep)\b"
 894)
 895PRIVILEGED_COMMAND_PATTERN = re.compile(r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo|doas|pkexec)\b")
 896
 897
 898def _shell_command_looks_privileged_or_package_manager(command: str) -> bool:
 899    text = str(command or "").strip()
 900    if not text:
 901        return False
 902    return bool(PRIVILEGED_COMMAND_PATTERN.search(text) or PACKAGE_MANAGER_WRITE_COMMAND_PATTERN.search(text))
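# A quick sketch of what the two patterns above accept and reject. The regexes
# are copied from the listing; the wrapper name looks_privileged and the sample
# commands are assumptions of this example.

```python
import re

PACKAGE_MANAGER_WRITE_COMMAND_PATTERN = re.compile(
    r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo\s+|doas\s+|pkexec\s+)?"
    r"(?:apt-get|apt|dnf|yum|apk|pacman|zypper|brew|port)\s+"
    r"(?:install|upgrade|update|remove|erase|add|sync|build-dep)\b"
)
PRIVILEGED_COMMAND_PATTERN = re.compile(r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo|doas|pkexec)\b")


def looks_privileged(command: str) -> bool:
    """True when a command starts a privileged or package-manager write step."""
    text = str(command or "").strip()
    if not text:
        return False
    return bool(
        PRIVILEGED_COMMAND_PATTERN.search(text)
        or PACKAGE_MANAGER_WRITE_COMMAND_PATTERN.search(text)
    )


checks = {
    "sudo make install": looks_privileged("sudo make install"),
    "apt-get install -y cmake": looks_privileged("apt-get install -y cmake"),
    "echo ok; brew update": looks_privileged("echo ok; brew update"),
    "apt list --installed": looks_privileged("apt list --installed"),
    "cd /tmp && pip install x": looks_privileged("cd /tmp && pip install x"),
}
```

# Only the first three entries are flagged: read-only package-manager verbs such
# as "apt list" and tools outside the listed managers pass through.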
 903
 904
 905def _shell_output_has_permission_failure(text: str) -> bool:
 906    lowered = str(text or "").lower()
 907    return any(
 908        marker in lowered
 909        for marker in (
 910            "permission denied",
 911            "not permitted",
 912            "operation not permitted",
 913            "authentication",
 914            "authorization",
 915            "are you root",
 916            "sudo:",
 917            "password is required",
 918            "unable to acquire the dpkg frontend lock",
 919            "could not open lock file",
 920        )
 921    )
 922
 923
 924def _recent_privileged_shell_failure_context(recent_steps: list[dict[str, Any]], *, window: int = 12) -> dict[str, Any] | None:
 925    accounting_tools = {"record_experiment", "record_tasks", "record_lesson", "record_roadmap", "record_milestone_validation"}
 926    latest_accounting_step = max(
 927        (
 928            _as_int(step.get("step_no"))
 929            for step in recent_steps[-window:]
 930            if step.get("status") == "completed" and step.get("tool_name") in accounting_tools
 931        ),
 932        default=0,
 933    )
 934    for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
 935        step_no = _as_int(step.get("step_no"))
 936        if latest_accounting_step and step_no <= latest_accounting_step:
 937            continue
 938        if step.get("tool_name") != "shell_exec":
 939            continue
 940        output = step.get("output") if isinstance(step.get("output"), dict) else {}
 941        command = _step_command(step) or str(output.get("command") or "")
 942        text = _shell_step_failure_text(step)
 943        if not _shell_output_has_permission_failure(text):
 944            continue
 945        if not _shell_command_looks_privileged_or_package_manager(command):
 946            continue
 947        return {
 948            "step_no": step.get("step_no"),
 949            "command": command,
 950            "excerpt": text.strip(),
 951        }
 952    return None
 953
 954
 955def _observed_candidate_recovery_required_context(recent_steps: list[dict[str, Any]], args: dict[str, Any]) -> dict[str, Any] | None:
 956    command = str(args.get("command") or "")
 957    if not command.strip():
 958        return None
 959    context = _shell_path_recovery_context(recent_steps)
 960    if not context:
 961        return None
 962    candidate_executables = (
 963        context.get("candidate_executables") if isinstance(context.get("candidate_executables"), dict) else {}
 964    )
 965    if not candidate_executables:
 966        return None
 967    for missing_command, paths in candidate_executables.items():
 968        if not isinstance(paths, list) or not paths:
 969            continue
 970        missing_name = str(missing_command or "").strip()
 971        if not missing_name:
 972            continue
 973        if not _shell_command_invokes_bare_executable(command, missing_name):
 974            continue
 975        if _shell_command_mentions_candidate_path(command, paths):
 976            continue
 977        return {
 978            "step_no": context.get("step_no"),
 979            "missing_command": missing_name,
 980            "candidate_executables": paths[:6],
 981            "blocked_command": command,
 982        }
 983    return None
 984
 985
 986def _shell_command_invokes_bare_executable(command: str, executable_name: str) -> bool:
 987    name = str(executable_name or "").strip()
 988    if not name:
 989        return False
 990    return bool(re.search(rf"(?<![A-Za-z0-9_./-]){re.escape(name)}(?![A-Za-z0-9_.-])", command))
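# A short sketch of the bare-executable check above: the lookbehind rejects a
# name that is part of a path or a longer word, and the lookahead rejects a
# longer name that merely starts with it. The function name invokes_bare and the
# sample commands are assumptions of this example.

```python
import re


def invokes_bare(command: str, executable_name: str) -> bool:
    """True when the command uses the executable by bare name, not by path."""
    name = str(executable_name or "").strip()
    if not name:
        return False
    pattern = rf"(?<![A-Za-z0-9_./-]){re.escape(name)}(?![A-Za-z0-9_.-])"
    return bool(re.search(pattern, command))


bare = invokes_bare("cmake --version", "cmake")
with_path = invokes_bare("/opt/cmake/bin/cmake --version", "cmake")
longer_name = invokes_bare("ccmake .", "cmake")
after_env = invokes_bare("PATH=/opt/bin:$PATH cmake .", "cmake")
```

# A command that already spells out a full path is not treated as a bare
# invocation, so it would not trigger the candidate-path recovery check.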
 991
 992
 993def _shell_command_mentions_candidate_path(command: str, candidate_paths: list[Any]) -> bool:
 994    text = str(command or "")
 995    for path_value in candidate_paths:
 996        path = str(path_value or "").strip()
 997        if not path:
 998            continue
 999        if path in text:
1000            return True
1001        parent = str(Path(path).parent)
1002        if parent and parent not in {".", "/"} and parent in text:
1003            return True
1004    return False
1005
1006
1007def _candidate_file_discovery_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
1008    task_text = _open_file_dependent_task_text(job)
1009    if not task_text:
1010        return None
1011    recent_paths = [
1012        *_candidate_file_paths_from_recent_shell(recent_steps),
1013        *_candidate_file_paths_from_recent_grounding_blocks(recent_steps),
1014    ]
1015    durable_paths = _candidate_file_paths_from_durable_records(job)
1016    paths: list[str] = []
1017    seen: set[str] = set()
1018    for path in [*recent_paths, *durable_paths]:
1019        key = path.lower()
1020        if key in seen:
1021            continue
1022        seen.add(key)
1023        paths.append(path)
1024    if not paths:
1025        return None
1026    source_text = "Recent shell output or durable records listed candidate file paths"
1027    if recent_paths and not durable_paths:
1028        source_text = "Recent shell output listed candidate file paths"
1029    elif durable_paths and not recent_paths:
1030        source_text = "Durable records mention candidate file paths"
1031    return {
1032        "paths": _rank_candidate_file_paths(job, task_text, paths, recent_steps=recent_steps),
1033        "invalid_paths": _invalid_candidate_file_paths(paths, recent_steps),
1034        "source_text": source_text,
1035        "task_text": task_text,
1036    }
1037
1038
1039def _shell_exec_targets_candidate_file(job: dict[str, Any], recent_steps: list[dict[str, Any]], args: dict[str, Any]) -> bool:
1040    command = str(args.get("command") or "")
1041    if not command.strip():
1042        return False
1043    context = _candidate_file_discovery_context(job, recent_steps)
1044    if not context:
1045        return False
1046    command_text = command.replace("\\ ", " ")
1047    return any(path and path in command_text for path in context.get("paths", [])[:12])
1048
1049
1050def _rank_candidate_file_paths(
1051    job: dict[str, Any],
1052    task_text: str,
1053    paths: list[str],
1054    *,
1055    recent_steps: list[dict[str, Any]] | None = None,
1056) -> list[str]:
1057    context_tokens = _candidate_context_tokens(job, task_text)
1058    indexed = list(enumerate(paths))
1059    ranked = sorted(
1060        indexed,
1061        key=lambda item: _candidate_file_path_score(
1062            item[1],
1063            context_tokens,
1064            item[0],
1065            recent_steps=recent_steps,
1066        ),
1067        reverse=True,
1068    )
1069    return [path for _, path in ranked]
1070
1071
1072def _candidate_context_tokens(job: dict[str, Any], task_text: str) -> set[str]:
1073    text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")) + " " + task_text
1074    tokens = set()
1075    for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9._-]{2,}", text.lower()):
1076        cleaned = token.strip("._-")
1077        if not cleaned or cleaned in QUERY_STOPWORDS or cleaned in TEXT_TOKEN_STOPWORDS:
1078            continue
1079        tokens.add(cleaned)
1080        for part in re.split(r"[._-]+", cleaned):
1081            if len(part) >= 3 and part not in QUERY_STOPWORDS and part not in TEXT_TOKEN_STOPWORDS:
1082                tokens.add(part)
1083    return tokens
1084
1085
1086def _candidate_file_path_score(
1087    path: str,
1088    context_tokens: set[str],
1089    original_index: int,
1090    *,
1091    recent_steps: list[dict[str, Any]] | None = None,
1092) -> float:
1093    lowered_path = path.lower()
1094    name = Path(path).name.lower()
1095    stem = Path(name).stem.lower()
1096    path_tokens = set()
1097    for token in re.findall(r"[a-z0-9][a-z0-9._-]{1,}", lowered_path):
1098        path_tokens.add(token.strip("._-"))
1099        path_tokens.update(part for part in re.split(r"[._-]+", token) if len(part) >= 2)
1100    score = 0.0
1101    matches = context_tokens & {token for token in path_tokens if token}
1102    score += len(matches) * 8.0
1103    if any(token and token in stem for token in context_tokens):
1104        score += 6.0
1105    if "/" in path:
1106        score += min(path.count("/"), 8) * 0.15
1107    auxiliary_markers = (
1108        "vocab",
1109        "tokenizer",
1110        "tokeniser",
1111        "mmproj",
1112        "adapter",
1113        "config",
1114        "readme",
1115        "license",
1116        "metadata",
1117        "sample",
1118        "example",
1119        "stub",
1120    )
1121    if any(marker in name for marker in auxiliary_markers):
1122        score -= 18.0
1123    if name.startswith("."):
1124        score -= 20.0
1125    suffix = Path(name).suffix.lower()
1126    if suffix:
1127        score += 1.0
1128    score += _candidate_file_observation_score(path, recent_steps or [])
1129    score -= original_index * 0.01
1130    return score
1131
1132
1133def _invalid_candidate_file_paths(paths: list[str], recent_steps: list[dict[str, Any]]) -> list[str]:
1134    invalid: list[str] = []
1135    for path in paths:
1136        if _candidate_file_observation_score(path, recent_steps) <= -30:
1137            invalid.append(path)
1138    return invalid
1139
1140
1141def _candidate_file_observation_score(path: str, recent_steps: list[dict[str, Any]], *, window: int = 12) -> float:
1142    if not path:
1143        return 0.0
1144    path_key = path.lower()
1145    score = 0.0
1146    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
1147        if step.get("tool_name") != "shell_exec":
1148            continue
1149        output = step.get("output") if isinstance(step.get("output"), dict) else {}
1150        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
1151        for line in text.splitlines():
1152            lowered = line.lower()
1153            if path_key not in lowered:
1154                continue
1155            if _shell_line_reports_missing_candidate(line):
1156                score -= 70.0
1157            if any(marker in lowered for marker in ("ascii text", "html document", "json data", "with no line terminators")):
1158                score -= 45.0
1159            score += _candidate_file_size_score_from_line(line)
1160    return score
1161
1162
1163def _candidate_file_recently_validated(path: str, recent_steps: list[dict[str, Any]], *, window: int = 12) -> str:
1164    if not path or _candidate_file_observation_score(path, recent_steps, window=window) < 30:
1165        return ""
1166    path_key = path.lower()
1167    evidence_lines: list[str] = []
1168    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
1169        if step.get("tool_name") != "shell_exec":
1170            continue
1171        output = step.get("output") if isinstance(step.get("output"), dict) else {}
1172        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
1173        for line in text.splitlines():
1174            if path_key not in line.lower():
1175                continue
1176            if _shell_line_reports_missing_candidate(line):
1177                continue
1178            evidence_lines.append(" ".join(line.split()))
1179            if len(evidence_lines) >= 3:
1180                return " | ".join(evidence_lines)
1181    return " | ".join(evidence_lines) or "recent shell evidence showed a non-trivial candidate file size or positive file metadata"
1182
1183
1184def _candidate_file_size_score_from_line(line: str) -> float:
1185    lowered = str(line or "").lower()
1186    if re.search(r"\b\d+(?:\.\d+)?\s*(?:t|tb|tib|g|gb|gib)\b", lowered):
1187        return 55.0
1188    if re.search(r"\b\d+(?:\.\d+)?\s*(?:m|mb|mib)\b", lowered):
1189        return 18.0
1190    integers = [int(match) for match in re.findall(r"(?<![\w.])\d{1,15}(?![\w.])", lowered)]
1191    if any(value >= 1_000_000_000 for value in integers):
1192        return 55.0
1193    if any(value >= 1_000_000 for value in integers):
1194        return 18.0
1195    return 0.0
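
To illustrate how `_candidate_file_size_score_from_line` weighs shell output, the following is a minimal standalone sketch. It copies the scoring rules from the listing into a function named `size_score` (a hypothetical name used only here) and applies it to made-up `du`/`ls -l` style lines. The sample lines are assumptions for illustration, not output from a real run.

```python
import re


def size_score(line: str) -> float:
    """Standalone copy of the size heuristics in the listing (hypothetical name)."""
    lowered = str(line or "").lower()
    # Human-readable sizes: terabyte/gigabyte units score highest, megabytes lower.
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:t|tb|tib|g|gb|gib)\b", lowered):
        return 55.0
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:m|mb|mib)\b", lowered):
        return 18.0
    # Bare integers are treated as byte counts.
    integers = [int(m) for m in re.findall(r"(?<![\w.])\d{1,15}(?![\w.])", lowered)]
    if any(value >= 1_000_000_000 for value in integers):
        return 55.0
    if any(value >= 1_000_000 for value in integers):
        return 18.0
    return 0.0


# Made-up sample lines, for illustration only.
samples = [
    "4.5G /data/models/model.gguf",
    "12M /data/cache/index.bin",
    "-rw-r--r-- 1 user group 4831838208 Jan 1 /data/models/model.gguf",
    "-rw-r--r-- 1 user group 120 Jan 1 /data/notes.txt",
]
for sample in samples:
    print(size_score(sample), sample)
```

In `_candidate_file_observation_score` these values are added per matching line, so a single gigabyte-sized observation (55.0) is enough to clear the `>= 30` threshold checked by `_candidate_file_recently_validated`, while one megabyte-sized observation (18.0) is not.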
1196
1197
1198def _open_file_dependent_task_text(job: dict[str, Any]) -> str:
1199    tasks = _metadata_list(job, "task_queue")
1200    parts: list[str] = []
1201    for task in tasks:
1202        if not isinstance(task, dict):
1203            continue
1204        status = str(task.get("status") or "open").lower()
1205        if status not in {"open", "active", "waiting", "blocked"}:
1206            continue
1207        text = " ".join(
1208            str(task.get(key) or "")
1209            for key in ("title", "description", "acceptance_criteria", "evidence_needed", "stall_behavior", "contract")
1210        )
1211        lowered = text.lower()
1212        if any(term in lowered for term in ("file", "path", "download", "artifact", "validate", "benchmark", "script", "config")):
1213            parts.append(" ".join(text.split()))
1214        if len(parts) >= 4:
1215            break
1216    return " | ".join(parts)
1217
1218
1219def _candidate_file_paths_from_recent_shell(
1220    recent_steps: list[dict[str, Any]], *, window: int = 8, max_paths: int = 80
1221) -> list[str]:
1222    paths: list[str] = []
1223    seen: set[str] = set()
1224    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
1225        if step.get("tool_name") != "shell_exec":
1226            continue
1227        output = step.get("output") if isinstance(step.get("output"), dict) else {}
1228        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
1229        for path in _extract_candidate_file_paths(text):
1230            key = path.lower()
1231            if key in seen:
1232                continue
1233            seen.add(key)
1234            paths.append(path)
1235            if len(paths) >= max_paths:
1236                return paths
1237    return paths
1238
1239
1240def _candidate_file_paths_from_recent_grounding_blocks(
1241    recent_steps: list[dict[str, Any]], *, window: int = 8, max_paths: int = 80
1242) -> list[str]:
1243    paths: list[str] = []
1244    seen: set[str] = set()
1245    for step in recent_steps[-window:]:
1246        output = step.get("output") if isinstance(step.get("output"), dict) else {}
1247        grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
1248        candidates = grounding.get("missing_candidate_paths")
1249        if not isinstance(candidates, list):
1250            continue
1251        for candidate in candidates:
1252            path = _clean_candidate_file_path(str(candidate or ""))
1253            if not _looks_like_exact_candidate_file_path(path):
1254                continue
1255            key = path.lower()
1256            if key in seen:
1257                continue
1258            seen.add(key)
1259            paths.append(path)
1260            if len(paths) >= max_paths:
1261                return paths
1262    return paths
1263
1264
1265def _candidate_file_paths_from_durable_records(
1266    job: dict[str, Any], *, max_records: int = 80, max_paths: int = 80
1267) -> list[str]:
1268    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1269    paths: list[str] = []
1270    seen: set[str] = set()
1271    record_groups = [
1272        _metadata_list(job, "experiment_ledger"),
1273        _metadata_list(job, "finding_ledger"),
1274        _metadata_list(job, "lessons"),
1275        _metadata_list(job, "source_ledger"),
1276        _metadata_list(job, "task_queue"),
1277    ]
1278    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1279    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1280    record_groups.append([item for item in milestones if isinstance(item, dict)])
1281    checked = 0
1282    for records in record_groups:
1283        for record in reversed(records[-max_records:]):
1284            if not isinstance(record, dict):
1285                continue
1286            checked += 1
1287            try:
1288                text = json.dumps(record, ensure_ascii=False, sort_keys=True)
1289            except (TypeError, ValueError):
1290                text = str(record)
1291            for path in _extract_candidate_file_paths(text):
1292                key = path.lower()
1293                if key in seen:
1294                    continue
1295                seen.add(key)
1296                paths.append(path)
1297                if len(paths) >= max_paths:
1298                    return paths
1299            if checked >= max_records * len(record_groups):
1300                return paths
1301    return paths
1302
1303
1304def _extract_candidate_file_paths(text: str) -> list[str]:
1305    paths: list[str] = []
1306    for match in re.finditer(r"(?<![A-Za-z0-9])(?:~|/)[^\s'\"<>|;&]{2,}", text or ""):
1307        raw = _clean_candidate_file_path(match.group(0))
1308        if not _looks_like_exact_candidate_file_path(raw):
1309            continue
1310        paths.append(raw)
1311    for match in re.finditer(r'"path"\s*:\s*"([^"]+\.[A-Za-z0-9][A-Za-z0-9_-]{1,12})"', text or ""):
1312        raw = _clean_candidate_file_path(match.group(1))
1313        if not _looks_like_exact_candidate_file_path(raw, allow_relative=True):
1314            continue
1315        paths.append(raw)
1316    return paths
1317
1318
1319def _looks_like_exact_candidate_file_path(value: str, *, allow_relative: bool = False) -> bool:
1320    raw = str(value or "").strip()
1321    if not raw or len(raw) > 500:
1322        return False
1323    if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
1324        return False
1325    if not allow_relative and not raw.startswith(("/", "~")):
1326        return False
1327    name = Path(raw).name
1328    if not name or name.startswith("."):
1329        return False
1330    suffix = Path(name).suffix
1331    if not suffix or not re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix) or not any(ch.isalpha() for ch in suffix):
1332        return False
1333    return True
1334
1335
1336def _clean_candidate_file_path(value: str) -> str:
1337    raw = str(value or "").strip().rstrip(".,:;)")
1338    for separator in ("\\n", "\\r", "\\t", "\n", "\r", "\t"):
1339        raw = raw.split(separator, 1)[0]
1340    return raw.strip().rstrip(".,:;)")
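
The three helpers above (`_extract_candidate_file_paths`, `_looks_like_exact_candidate_file_path`, `_clean_candidate_file_path`) can be exercised together. The sketch below copies their logic into standalone functions with shortened, hypothetical names (`extract_paths`, `looks_like_file_path`, `clean_path`) and runs them on a made-up string; the sample text is an assumption for illustration only.

```python
import re
from pathlib import Path


def clean_path(value: str) -> str:
    # Trim trailing punctuation and cut at the first real or escaped line break/tab.
    raw = str(value or "").strip().rstrip(".,:;)")
    for separator in ("\\n", "\\r", "\\t", "\n", "\r", "\t"):
        raw = raw.split(separator, 1)[0]
    return raw.strip().rstrip(".,:;)")


def looks_like_file_path(value: str, *, allow_relative: bool = False) -> bool:
    raw = str(value or "").strip()
    if not raw or len(raw) > 500:
        return False
    # Reject URLs, protocol-relative paths, elided paths, and globs.
    if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
        return False
    if not allow_relative and not raw.startswith(("/", "~")):
        return False
    name = Path(raw).name
    if not name or name.startswith("."):
        return False
    suffix = Path(name).suffix
    return bool(
        suffix
        and re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix)
        and any(ch.isalpha() for ch in suffix)
    )


def extract_paths(text: str) -> list[str]:
    paths: list[str] = []
    # Pass 1: absolute or home-relative paths anywhere in the text.
    for match in re.finditer(r"(?<![A-Za-z0-9])(?:~|/)[^\s'\"<>|;&]{2,}", text or ""):
        raw = clean_path(match.group(0))
        if looks_like_file_path(raw):
            paths.append(raw)
    # Pass 2: JSON-style "path" fields, where relative paths are allowed.
    for match in re.finditer(r'"path"\s*:\s*"([^"]+\.[A-Za-z0-9][A-Za-z0-9_-]{1,12})"', text or ""):
        raw = clean_path(match.group(1))
        if looks_like_file_path(raw, allow_relative=True):
            paths.append(raw)
    return paths


# Made-up sample text, for illustration only.
text = 'Saved /tmp/out/model.gguf, see https://example.com/a.zip and ~/notes.txt. {"path": "outputs/report.md"}'
print(extract_paths(text))
```

The URL is rejected because its matched fragment starts with `//`. Note that, as written, the suffix pattern requires at least two characters after the dot, so a path such as `/src/main.c` is not treated as a candidate file.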
1341
1342
1343def _progress_accounting_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
1344    context = _artifact_accounting_context(recent_steps)
1345    if not context:
1346        return "None."
1347    return (
1348        "Recent saved outputs need accounting before more output/research. "
1349        f"artifact_count={context.get('artifact_count')} since_step={context.get('since_step')} "
1350        f"artifact_titles={'; '.join(str(title) for title in context.get('artifact_titles', [])[:4])}. "
1351        "Next use record_tasks or record_roadmap to mark progress/reopen branches, "
1352        "record_findings or record_source for reusable evidence, record_experiment for measured results, "
1353        "record_milestone_validation for milestone checks, or record_lesson if these outputs are not useful."
1354    )
1355
1356
1357def _activity_stagnation_for_prompt(job: dict[str, Any]) -> str:
1358    context = _activity_stagnation_context(job)
1359    if not context:
1360        return "None."
1361    return (
1362        "Recent checkpoints have reported activity without durable progress. "
1363        f"activity_checkpoint_streak={context.get('streak')} threshold={context.get('threshold')} "
1364        f"last_counts={context.get('counts')}. "
1365        "Next classify the branch with record_findings, record_source, record_experiment, record_tasks, "
1366        "record_roadmap, record_milestone_validation, or record_lesson. If the branch is low-yield, mark it "
1367        "blocked/skipped and pivot before doing more read-only work or saving more outputs."
1368    )
1369
1370
1371def _research_balance_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1372    context = _research_balance_context(job, recent_steps)
1373    if not context:
1374        return "None."
1375    return (
1376        "Recent work is execution-heavy but has little source-backed research recorded. "
1377        f"completed_window={context.get('completed_window')} execution_actions={context.get('execution_actions')} "
1378        f"research_actions={context.get('research_actions')} sources={context.get('sources')} findings={context.get('findings')} "
1379        f"experiments={context.get('experiments')} files={context.get('files')}. "
1380        "Before another deep execution/testing loop, use available research, browser, source, documentation, or local-inspection tools "
1381        "to gather evidence and record it with record_source, record_findings, record_lesson, record_tasks, or an artifact. "
1382        "If external research is not relevant or tools are unavailable, explicitly record why and what evidence substitutes for it."
1383    )
1384
1385
1386def _source_yield_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1387    context = _source_yield_context(job, recent_steps)
1388    if not context:
1389        return "None."
1390    return (
1391        "Many sources have been gathered without enough durable synthesis. "
1392        f"sources={context.get('sources')} findings={context.get('findings')} "
1393        f"yielded_sources={context.get('yielded_sources')} recent_gathering={context.get('recent_gathering')} "
1394        f"recent_source_titles={'; '.join(str(title) for title in context.get('recent_source_titles', [])[:4])}. "
1395        "Before more search, extraction, browsing, shell work, file/output writing, or report chatter, distill the "
1396        "source set into record_findings with evidence, update record_source with yield/fail outcomes, or update "
1397        "tasks/roadmap/lessons to reject or pivot the low-yield source branch."
1398    )
1399
1400
1401def _task_planning_guard_for_prompt(job: dict[str, Any]) -> str:
1402    context = _task_planning_stagnation_context(job)
1403    if not context:
1404        return "None."
1405    return (
1406        "Recent checkpoints only added or updated tasks without durable evidence, measurements, validations, "
1407        f"or lessons. task_only_checkpoints={context.get('task_only_checkpoints')} "
1408        f"open_tasks={context.get('open_tasks')} total_tasks={context.get('total_tasks')}. "
1409        "Do not create more new open tasks next. Execute, measure, validate, write a checkpoint, mark existing "
1410        "tasks done/blocked/skipped, or record a lesson from the branch."
1411    )
1412
1413
1414def _task_queue_saturation_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1415    context = _recent_task_queue_saturation_context(recent_steps)
1416    persistent_pressure = False
1417    if not context:
1418        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1419        pressure = metadata.get("task_backlog_pressure") if isinstance(metadata.get("task_backlog_pressure"), dict) else {}
1420        current_pressure = _current_task_backlog_pressure_context(job)
1421        if not pressure and not current_pressure:
1422            return "None."
1423        if current_pressure:
1424            guard_recovery = pressure.get("guard_recovery") if isinstance(pressure.get("guard_recovery"), dict) else {}
1425            task_queue = guard_recovery.get("task_queue") if isinstance(guard_recovery.get("task_queue"), dict) else {}
1426            context = {
1427                "step_no": pressure.get("latest_step_no") or guard_recovery.get("latest_step_no") or "current",
1428                "source": pressure.get("source") or ("guard_recovery" if guard_recovery else "current_queue"),
1429                "reason": pressure.get("reason") or task_queue.get("reason") or current_pressure.get("reason"),
1430                "open_count": current_pressure.get("open_count"),
1431                "total_count": current_pressure.get("total_count"),
1432                "open_titles": current_pressure.get("open_titles") or [],
1433            }
1434        else:
1435            return "None."
1436        persistent_pressure = True
1437    counts = []
1438    if context.get("open_count") is not None:
1439        counts.append(f"open_tasks={context.get('open_count')}")
1440    if context.get("total_count") is not None:
1441        counts.append(f"total_tasks={context.get('total_count')}")
1442    count_text = " ".join(counts) or "queue is saturated"
1443    open_titles = [str(title).strip() for title in context.get("open_titles") or [] if str(title).strip()]
1444    title_text = f"Existing runnable task titles: {json.dumps(open_titles[:8], ensure_ascii=False)}. " if open_titles else ""
1445    if context.get("source") == "blocked_record_tasks":
1446        source_label = "record_tasks block"
1447    elif context.get("source") == "current_queue":
1448        source_label = "current queue"
1449    else:
1450        source_label = "guard recovery"
1451    opening = (
1452        f"Task backlog pressure remains active from {source_label} #{context.get('step_no')}: "
1453        if persistent_pressure
1454        else f"Task queue saturation was just hit at step #{context.get('step_no')}: "
1455    )
1456    return (
1457        opening
1458        + f"{context.get('reason') or 'task queue saturated'} ({count_text}). "
1459        f"{title_text}"
1460        "Do not create new task branches. Either execute an existing high-priority branch, "
1461        "or use record_tasks only to update existing task titles to active, done, blocked, or skipped "
1462        "with concise result/evidence. If you have a near-duplicate task, update the closest existing "
1463        "task instead of inventing a fresh title. Consolidate branch sprawl into roadmap/milestones when useful. "
1464        "If this repeats, record_tasks is temporarily withheld so the worker must use a non-planning action."
1465    )
1466
1467
1468def _current_task_backlog_pressure_context(job: dict[str, Any]) -> dict[str, Any] | None:
1469    tasks = _metadata_list(job, "task_queue")
1470    objective_tasks = [task for task in tasks if not _is_guard_recovery_task(task)]
1471    open_tasks = [
1472        task
1473        for task in objective_tasks
1474        if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
1475    ]
1476    if len(objective_tasks) <= TASK_QUEUE_TOTAL_SOFT_LIMIT and len(open_tasks) < TASK_QUEUE_SATURATION_OPEN_TASKS:
1477        return None
1478    reason = "total task queue is too large" if len(objective_tasks) > TASK_QUEUE_TOTAL_SOFT_LIMIT else "too many open tasks"
1479    return {
1480        "reason": reason,
1481        "open_count": len(open_tasks),
1482        "total_count": len(objective_tasks),
1483        "open_titles": [
1484            str(task.get("title") or "").strip()
1485            for task in open_tasks[:8]
1486            if str(task.get("title") or "").strip()
1487        ],
1488    }
1489
1490
1491def _memory_consolidation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1492    context = _memory_graph_consolidation_context(job, recent_steps)
1493    if not context:
1494        return "None."
1495    return (
1496        "Durable job memory is growing faster than the connected memory graph. "
1497        f"durable_records={context.get('durable_records')} graph_nodes={context.get('graph_nodes')} "
1498        f"graph_edges={context.get('graph_edges')} reason={context.get('reason')}. "
1499        "Before more branch work, use record_memory_graph to consolidate the most reusable facts, strategies, "
1500        "decisions, questions, skills, constraints, episodes, and evidence links. If there is truly nothing "
1501        "reusable, record a lesson explaining why this branch should not become graph memory."
1502    )
1503
1504
1505def _lesson_consolidation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1506    context = _lesson_sprawl_context(job, recent_steps)
1507    if not context:
1508        return "None."
1509    return (
1510        "Raw lessons are accumulating faster than consolidated memory. "
1511        f"lessons={context.get('lessons')} recent_lessons={context.get('recent_lessons')} "
1512        f"graph_nodes={context.get('graph_nodes')} reason={context.get('reason')}. "
1513        "Do not add another raw lesson next. Consolidate the reusable strategy, mistake, constraint, decision, "
1514        "or question into record_memory_graph, or update existing tasks/roadmap state if the lesson only describes "
1515        "branch status."
1516    )
1517
1518
1519def _memory_graph_consolidation_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
1520    if any(step.get("tool_name") == "record_memory_graph" and step.get("status") == "completed" for step in recent_steps[-8:]):
1521        return None
1522    graph = memory_graph_from_job(job)
1523    node_count = len(graph["nodes"])
1524    edge_count = len(graph["edges"])
1525    durable_records = _durable_memory_signal_count(job)
1526    if durable_records < 6:
1527        return None
1528    reason = ""
1529    if node_count == 0:
1530        reason = "durable ledgers exist but no graph nodes have been consolidated"
1531    elif durable_records >= 12 and node_count * 5 < durable_records:
1532        reason = "graph is sparse relative to reusable durable records"
1533    elif node_count >= 3 and edge_count == 0 and durable_records >= 10:
1534        reason = "graph nodes exist but have no links"
1535    if not reason:
1536        return None
1537    return {
1538        "durable_records": durable_records,
1539        "graph_nodes": node_count,
1540        "graph_edges": edge_count,
1541        "reason": reason,
1542    }
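
The threshold logic in `_memory_graph_consolidation_context` can be read as a pure function of three counts. The sketch below restates it under that assumption; `consolidation_reason` is a hypothetical name, and the early exit for a recently completed `record_memory_graph` step is deliberately left out.

```python
from typing import Optional


def consolidation_reason(durable_records: int, node_count: int, edge_count: int) -> Optional[str]:
    """Return the guard reason for the given counts, or None when no guard applies."""
    # Fewer than six durable records never triggers the guard.
    if durable_records < 6:
        return None
    if node_count == 0:
        return "durable ledgers exist but no graph nodes have been consolidated"
    # Sparse graph: fewer than one node per five durable records.
    if durable_records >= 12 and node_count * 5 < durable_records:
        return "graph is sparse relative to reusable durable records"
    # Nodes exist but nothing links them together.
    if node_count >= 3 and edge_count == 0 and durable_records >= 10:
        return "graph nodes exist but have no links"
    return None


for counts in [(5, 0, 0), (6, 0, 0), (12, 2, 4), (10, 3, 0), (10, 3, 1)]:
    print(counts, "->", consolidation_reason(*counts))
```

Because `_lesson_sprawl_context` returns early when this context is empty, the lesson guard can only fire when one of these reasons is present.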
1543
1544
1545def _lesson_sprawl_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
1546    memory_context = _memory_graph_consolidation_context(job, recent_steps)
1547    if not memory_context:
1548        return None
1549    lessons = _metadata_list(job, "lessons")
1550    lesson_count = len(lessons)
1551    if lesson_count < LESSON_SPRAWL_MIN_LESSONS:
1552        return None
1553    recent_lessons = [
1554        step
1555        for step in recent_steps[-12:]
1556        if step.get("tool_name") == "record_lesson" and str(step.get("status") or "").lower() == "completed"
1557    ]
1558    if len(recent_lessons) < LESSON_SPRAWL_RECENT_LESSONS and lesson_count < LESSON_SPRAWL_MIN_LESSONS * 2:
1559        return None
1560    return {
1561        "lessons": lesson_count,
1562        "recent_lessons": len(recent_lessons),
1563        "graph_nodes": memory_context.get("graph_nodes"),
1564        "graph_edges": memory_context.get("graph_edges"),
1565        "durable_records": memory_context.get("durable_records"),
1566        "reason": "raw lesson backlog needs graph consolidation",
1567    }
1568
1569
1570def _durable_memory_signal_count(job: dict[str, Any]) -> int:
1571    count = (
1572        len(_metadata_list(job, "finding_ledger"))
1573        + len(_metadata_list(job, "source_ledger"))
1574        + len(_metadata_list(job, "experiment_ledger"))
1575        + len(_metadata_list(job, "lessons"))
1576    )
1577    tasks = _metadata_list(job, "task_queue")
1578    count += sum(
1579        1
1580        for task in tasks
1581        if str(task.get("status") or "open").lower() in {"done", "blocked", "skipped"}
1582        and (task.get("result") or task.get("evidence_needed") or task.get("acceptance_criteria"))
1583    )
1584    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1585    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1586    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1587    count += sum(
1588        1
1589        for milestone in milestones
1590        if isinstance(milestone, dict)
1591        and str(milestone.get("status") or "planned").lower() in {"active", "validating", "done", "blocked", "skipped"}
1592    )
1593    return count
1594
1595
1596def _durable_yield_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1597    completed = [step for step in recent_steps if step.get("status") == "completed"]
1598    if len(completed) < 20:
1599        return "None."
1600    durable_tools = LEDGER_PROGRESS_TOOLS | {"write_artifact", "write_file"}
1601    durable_indexes = [
1602        index
1603        for index, step in enumerate(completed)
1604        if step.get("tool_name") in durable_tools
1605    ]
1606    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1607    durable_records = (
1608        len(_metadata_list(job, "finding_ledger"))
1609        + len(_metadata_list(job, "source_ledger"))
1610        + len(_metadata_list(job, "experiment_ledger"))
1611        + len(_metadata_list(job, "lessons"))
1612    )
1613    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1614    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1615    durable_records += len(milestones)
1616    if not durable_indexes and durable_records <= 0:
1617        return (
1618            f"No durable progress records after {len(completed)} completed actions. "
1619            "Next action should save an output, record findings/source/experiment/lesson/roadmap progress, "
1620            "or mark the branch blocked/skipped before more read-only work."
1621        )
1622    last_durable_index = durable_indexes[-1] if durable_indexes else -1
1623    actions_since = len(completed) - last_durable_index - 1
1624    durable_steps = len(durable_indexes)
1625    actions_per_durable = len(completed) / max(1, durable_steps + durable_records)
1626    if actions_since < 25 and actions_per_durable < 30:
1627        return "None."
1628    return (
1629        f"Durable yield is low: completed_actions={len(completed)} durable_steps={durable_steps} "
1630        f"durable_records={durable_records} actions_since_last_durable={actions_since} "
1631        f"actions_per_durable~{actions_per_durable:.1f}. "
1632        "Prefer a concrete checkpoint next: write/save output, record measured or reusable evidence, validate a milestone, "
1633        "or reject/pivot the branch with a lesson."
1634    )
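
The arithmetic in `_durable_yield_for_prompt` is easier to follow with concrete numbers. The sketch below restates the decision as a pure function; `classify_durable_yield` and its string labels are hypothetical, and the inputs stand in for the number of completed steps, the indexes of steps that used a durable tool, and the count of ledger and milestone records.

```python
def classify_durable_yield(completed: int, durable_indexes: list[int], durable_records: int) -> str:
    """Classify yield using the thresholds from the listing (labels are hypothetical)."""
    # The guard stays silent until at least 20 actions have completed.
    if completed < 20:
        return "none"
    if not durable_indexes and durable_records <= 0:
        return "no_durable_progress"
    last_durable_index = durable_indexes[-1] if durable_indexes else -1
    actions_since = completed - last_durable_index - 1
    actions_per_durable = completed / max(1, len(durable_indexes) + durable_records)
    # Both conditions must hold for the guard to stay silent.
    if actions_since < 25 and actions_per_durable < 30:
        return "none"
    return "low_yield"


print(classify_durable_yield(40, [3, 10], 1))  # 29 actions since the last durable step
print(classify_durable_yield(40, [3, 38], 1))  # 1 action since, about 13.3 actions per durable
print(classify_durable_yield(100, [99], 0))    # 0 actions since, but 100 actions per durable
```

The third call shows that a very recent durable step does not silence the guard when the overall ratio of actions to durable outputs is still 30 or more.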
1635
1636
1637def _reflections_for_prompt(job: dict[str, Any]) -> str:
1638    reflections = _metadata_list(job, "reflections")
1639    if not reflections:
1640        return "No reflection checkpoints yet."
1641    lines = []
1642    for reflection in reflections[-2:]:
1643        strategy = f" strategy={reflection.get('strategy')}" if reflection.get("strategy") else ""
1644        lines.append("- " + _clip_text(f"{reflection.get('summary') or ''}{strategy}", 520))
1645    return "\n".join(lines)
1646
1647
1648def _next_action_constraint(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1649    measurement_obligation = _pending_measurement_obligation(job)
1650    if measurement_obligation:
1651        return (
1652            "A pending measurement obligation is active from "
1653            f"step #{measurement_obligation.get('source_step_no') or '?'}. "
1654            "Resolve it with record_experiment, record_lesson explaining why it is invalid, "
1655            "or record_tasks creating the missing measurement branch before more research/artifact churn."
1656        )
1657    file_validation = _pending_file_validation_obligation(job)
1658    if file_validation:
1659        return (
1660            "A recently written file needs validation before more branch work. "
1661            f"File: {_clip_text(str(file_validation.get('path') or ''), 260)}. "
1662            f"Suggested validation: {_clip_text(str(file_validation.get('suggested_validation') or ''), 360)}. "
1663            "Use shell_exec to validate it, or record_tasks/record_lesson/record_experiment if validation is blocked or deferred."
1664        )
1665    artifact_accounting = _artifact_accounting_context(recent_steps)
1666    if artifact_accounting:
1667        return (
1668            "Recent saved outputs need durable accounting. Before more artifact writing, reading, research, browsing, "
1669            "or shell work, use record_tasks, record_roadmap, record_milestone_validation, record_findings, record_source, record_experiment, or record_lesson "
1670            "to explain what changed and what branch is next."
1671        )
1672    checkpoint_accounting = _auto_checkpoint_accounting_context(job, recent_steps)
1673    if checkpoint_accounting:
1674        if not checkpoint_accounting.get("checkpoint_read"):
1675            return (
1676                "An auto-saved evidence checkpoint is pending. Read that specific checkpoint artifact, or use a durable "
1677                "ledger tool to account for the checkpoint from existing evidence before more branch work."
1678            )
1679        return (
1680            "An already-read evidence checkpoint is pending durable accounting. Use record_findings, record_source, "
1681            "record_experiment, record_tasks, record_roadmap, record_milestone_validation, or record_lesson before "
1682            "more shell, search, file, report, or artifact work."
1683        )
1684    grounding_block = _latest_evidence_grounding_block(recent_steps)
1685    if grounding_block:
1686        raw_missing_paths = grounding_block.get("missing_candidate_paths") if isinstance(grounding_block.get("missing_candidate_paths"), list) else []
1687        missing_paths = [
1688            path
1689            for path in (_clean_candidate_file_path(str(item or "")) for item in raw_missing_paths)
1690            if _looks_like_exact_candidate_file_path(path)
1691        ]
1692        path_text = "; ".join(str(path) for path in missing_paths[:6])
1693        detail = f" Missing exact paths: {path_text}." if path_text else ""
1694        candidate_files = _candidate_file_discovery_context(job, recent_steps)
1695        if candidate_files:
1696            paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
1697            current_path_text = "; ".join(str(path) for path in paths[:4])
1698            if current_path_text:
1699                return (
1700                    "Recent durable-record grounding failed, but current ranked candidate paths are available. "
1701                    "Treat the failed record as rejected for now and validate the highest-confidence candidate next "
1702                    "instead of repairing stale wording. "
1703                    f"Candidate paths: {_clip_text(current_path_text, 520)}."
1704                )
1705        return (
1706            "Recent evidence grounding blocked a durable record. Next, rewrite the record using only observed evidence, "
1707            "include the exact observed paths/tokens when claiming candidates or files, or explicitly record why they "
1708            f"are irrelevant/invalid.{detail}"
1709        )
1710    action_failure = _experiment_next_action_failure_context(job, recent_steps)
1711    if action_failure:
1712        return (
1713            "The latest experiment next action was attempted, but the observed shell output reports a missing "
1714            f"command/path/prerequisite at step #{action_failure.get('step_no') or '?'}. "
1715            f"Observed output: {_clip_text(str(action_failure.get('excerpt') or ''), 260)}. "
1716            "Next, account for this attempted action with record_experiment, record_tasks, or record_lesson: "
1717            "mark the branch failed/blocked or create the concrete recovery branch. Do not run more read-only probes "
1718            "until the failed action is durable."
1719        )
1720    measured_guard = _measured_progress_guard_context(job, recent_steps)
1721    if measured_guard:
1722        return (
1723            "This job needs measured progress, not more research-only activity. "
1724            "Do one of: run a small measuring command/action, call record_experiment for a known measurement, "
1725            "record_tasks with an experiment/action/monitor contract, or record_lesson if measurement is blocked."
1726        )
1727    activity_stagnation = _activity_stagnation_context(job)
1728    if activity_stagnation:
1729        return (
1730            "Recent checkpoints show activity without durable progress. "
1731            "Use a ledger or planning tool to classify what changed, reject the low-yield branch, or open a better branch "
1732            "before more read-only work or output churn."
1733        )
1734    task_planning_guard = _task_planning_stagnation_context(job)
1735    if task_planning_guard:
1736        return (
1737            "Recent progress is only task planning. Do not create more new open tasks next. Execute an existing task, "
1738            "record evidence/measurements/validation, write a checkpoint, mark tasks done/blocked/skipped, or record "
1739            "a lesson before expanding the queue again."
1740        )
1741    task_queue_saturation = _recent_task_queue_saturation_context(recent_steps)
1742    if task_queue_saturation:
1743        return (
1744            "The durable task queue is saturated. Do not create new task branches. Execute a current task, "
1745            "or use record_tasks only to update existing task titles to active/done/blocked/skipped with evidence."
1746        )
1747    memory_consolidation = _memory_graph_consolidation_context(job, recent_steps)
1748    if memory_consolidation:
1749        return (
1750            "Consolidate durable progress into the job memory graph before more branch work. "
1751            "Use record_memory_graph for connected reusable knowledge, or record_lesson if the recent branch has no "
1752            "reusable memory value."
1753        )
1754    deliverable_guard = _deliverable_progress_guard_context(job, recent_steps)
1755    if deliverable_guard:
1756        return (
1757            "This job needs a durable deliverable checkpoint, not more background collection. "
1758            "Use write_file or write_artifact to save a partial draft/report/file, or use record_tasks, "
1759            "record_roadmap, record_milestone_validation, or record_lesson to explain the specific blocker "
1760            "and the next deliverable branch."
1761        )
1762    research_balance = _research_balance_context(job, recent_steps)
1763    if research_balance:
1764        return (
1765            "Balance execution with research before the next deep action loop. "
1766            "Gather source-backed evidence with available web/browser/documentation/local-inspection tools and record it, "
1767            "or record why research is not applicable and what evidence replaces it."
1768        )
1769    candidate_files = _candidate_file_discovery_context(job, recent_steps)
1770    if candidate_files:
1771        paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
1772        path_text = "; ".join(str(path) for path in paths[:4])
1773        primary_path = str(paths[0]) if paths else ""
1774        validation = _candidate_file_recently_validated(primary_path, recent_steps)
1775        if validation:
1776            return (
1777                "A highest-confidence candidate file path has recent positive validation evidence. "
1778                "Use it in the next bounded action or measurement instead of repeating existence checks. "
1779                f"Candidate path: {_clip_text(primary_path, 420)}. Evidence: {_clip_text(validation, 420)}."
1780            )
1781        return (
1782            "Concrete candidate file paths are available while file/path-dependent work is open. "
1783            "Validate likely candidates next with shell_exec before retrying downloads, searching for alternatives, "
1784            f"or recording no-file/no-progress claims. Candidate paths: {_clip_text(path_text, 520)}."
1785        )
1786    experiment_next_action = _latest_experiment_next_action_context(job)
1787    if experiment_next_action:
1788        return (
1789            "The latest measured experiment selected a concrete next action. "
1790            f"Next action: {_clip_text(experiment_next_action.get('next_action') or '', 520)}. "
1791            "Act on it with the appropriate tool, or use record_tasks/record_lesson if it is invalid or blocked. "
1792            "Do not bury it under more checkpoints or unrelated research."
1793        )
1794    milestone_validation = _milestone_validation_needed(job)
1795    if milestone_validation:
1796        return (
1797            f"Roadmap milestone '{milestone_validation.get('title')}' is ready for validation or is marked validating. "
1798            "Use record_milestone_validation with evidence and pass/fail/blocker status, then create follow-up tasks for gaps."
1799        )
1800    roadmap_staleness = _roadmap_staleness_context(job, recent_steps)
1801    if roadmap_staleness:
1802        return (
1803            "The roadmap has not advanced despite durable task/artifact activity. "
1804            "Use record_roadmap to mark the current milestone active/done/blocked, or record_milestone_validation "
1805            "if acceptance criteria can be judged from existing evidence, before more branch work."
1806        )
1807    if _roadmap_missing_for_broad_job(job):
1808        return (
1809            "The objective is broad enough to benefit from roadmap control. Use record_roadmap to define compact milestones, "
1810            "features, acceptance criteria, and validation checkpoints before expanding the task queue further."
1811        )
1812    evidence_step = _unpersisted_evidence_step(recent_steps)
1813    if evidence_step:
1814        return (
1815            f"You have unsaved evidence from step #{evidence_step['step_no']} "
1816            f"({evidence_step.get('tool_name') or evidence_step['kind']}). "
1817            "Your next tool call should usually be write_artifact. If this evidence taught a durable rule, record_lesson after saving it."
1818        )
1819    if _task_queue_exhausted(job):
1820        return (
1821            "All durable task branches are done, skipped, or blocked. Before more research or execution, "
1822            "use record_tasks to open the next concrete branch, or report_update if the operator needs a checkpoint."
1823        )
1824    for step in reversed(recent_steps[-5:]):
1825        if step.get("status") == "failed" and step.get("tool_name") == "read_artifact":
1826            output = step.get("output") if isinstance(step.get("output"), dict) else {}
1827            if "artifact not found" in str(output.get("error") or step.get("summary") or "").lower():
1828                return (
1829                    "The last artifact read used a reference that does not exist. Do not invent or retry artifact ids. "
1830                    "Use a valid recent artifact ref, call search_artifacts with a concrete query, or continue from "
1831                    "already observed evidence with a durable record."
1832                )
1833        error = str(step.get("error") or "")
1834        if error == "artifact required before more research":
1835            return "The last blocked action needs write_artifact, not another search or browser action."
1836        if error == "task branch required before more work":
1837            return "Create or reopen a task branch with record_tasks before doing more research or execution."
1838        if error in {"duplicate tool call blocked", "similar search query blocked", "search loop blocked"}:
1839            output = step.get("output") if isinstance(step.get("output"), dict) else {}
1840            blocked_tool = str(output.get("blocked_tool") or "")
1841            if blocked_tool == "read_artifact":
1842                return "Do not read the same artifact again. Use its content to choose a concrete next action: inspect a specific item, record findings/tasks, or write a report artifact."
1843            if blocked_tool == "shell_exec":
1844                return "Do not rerun the same shell discovery command. Use the prior output to inspect a specific file/item, save it, or update findings/tasks."
1845            return "Change source, extract an existing result, save an artifact, or record a lesson about the failed strategy."
1846    return "No special constraint beyond taking one bounded useful action."
1847
1848
1849def _latest_evidence_grounding_block(recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
1850    resolution_after_block = False
1851    for step in reversed(recent_steps[-8:]):
1852        if (
1853            step.get("status") == "completed"
1854            and step.get("tool_name") in EVIDENCE_GROUNDING_RESOLUTION_TOOLS
1855        ):
1856            resolution_after_block = True
1857            continue
1858        if step.get("status") != "blocked":
1859            continue
1860        output = step.get("output") if isinstance(step.get("output"), dict) else {}
1861        if output.get("error") != "evidence grounding required":
1862            continue
1863        if resolution_after_block:
1864            return None
1865        grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
1866        return grounding or {"unsupported_tokens": []}
1867    return None
1868
1869
1870def _milestone_validation_needed(job: dict[str, Any]) -> dict[str, Any] | None:
1871    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1872    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1873    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1874    for milestone in milestones:
1875        if not isinstance(milestone, dict):
1876            continue
1877        status = str(milestone.get("status") or "planned")
1878        validation_status = str(milestone.get("validation_status") or "not_started")
1879        if status == "validating" or validation_status == "pending":
1880            return milestone
1881        features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1882        if status == "active" and features and all(
1883            isinstance(feature, dict) and str(feature.get("status") or "planned") in {"done", "skipped"}
1884            for feature in features
1885        ):
1886            return milestone
1887    return None
1888
1889
1890def _tool_call_matches_pending_milestone_need(tool_name: str, args: dict[str, Any], milestone: dict[str, Any]) -> bool:
1891    if str(milestone.get("validation_status") or "").strip().lower() != "pending":
1892        return False
1893    if tool_name not in BRANCH_WORK_TOOLS:
1894        return False
1895    return _text_matches_pending_milestone_need(_json_value_text(args), milestone)
1896
1897
1898def _text_matches_pending_milestone_need(text: str, milestone: dict[str, Any]) -> bool:
1899    parts = [
1900        str(milestone.get("title") or ""),
1901        str(milestone.get("next_action") or ""),
1902        str(milestone.get("acceptance_criteria") or ""),
1903        str(milestone.get("evidence_needed") or ""),
1904        str(milestone.get("validation_evidence") or ""),
1905        str(milestone.get("validation_result") or ""),
1906        " ".join(str(item) for item in milestone.get("validation_issues") or [] if item),
1907    ]
1908    features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1909    for feature in features:
1910        if not isinstance(feature, dict):
1911            continue
1912        parts.extend([
1913            str(feature.get("title") or ""),
1914            str(feature.get("goal") or ""),
1915            str(feature.get("acceptance_criteria") or ""),
1916            str(feature.get("evidence_needed") or ""),
1917        ])
1918    need_tokens = _substantive_next_action_tokens(" ".join(parts)) - MILESTONE_MATCH_STOPWORDS
1919    if not need_tokens:
1920        return False
1921    call_tokens = _substantive_next_action_tokens(text) - MILESTONE_MATCH_STOPWORDS
1922    if not call_tokens:
1923        return False
1924    return bool(need_tokens & call_tokens)
1925
1926
1927def _milestone_validation_call_matches_current(args: dict[str, Any], milestone: dict[str, Any]) -> bool:
1928    requested = _norm_task_key("", str(args.get("milestone") or args.get("title") or ""))
1929    if not requested:
1930        return False
1931    candidates = [
1932        _norm_task_key("", str(milestone.get("title") or "")),
1933        _norm_task_key("", str(milestone.get("key") or "")),
1934    ]
1935    for candidate in candidates:
1936        if not candidate:
1937            continue
1938        if requested == candidate or requested in candidate or candidate in requested:
1939            return True
1940    return False
1941
1942
1943def _normalize_milestone_validation_args_for_active_gate(
1944    tool_name: str,
1945    args: dict[str, Any],
1946    job: dict[str, Any],
1947) -> dict[str, Any]:
1948    if tool_name != "record_milestone_validation":
1949        return args
1950    milestone = _milestone_validation_needed(job)
1951    if not milestone or _milestone_validation_call_matches_current(args, milestone):
1952        return args
1953    if not _text_matches_pending_milestone_need(_json_value_text(args), milestone):
1954        return args
1955    normalized = dict(args)
1956    normalized["milestone"] = str(milestone.get("title") or args.get("milestone") or "")
1957    metadata = normalized.get("metadata") if isinstance(normalized.get("metadata"), dict) else {}
1958    normalized["metadata"] = {
1959        **metadata,
1960        "normalized_from_milestone": str(args.get("milestone") or ""),
1961        "normalized_to_active_gate": True,
1962    }
1963    return normalized
1964
1965
1966def _latest_experiment_next_action_context(job: dict[str, Any]) -> dict[str, Any] | None:
1967    experiments = _metadata_list(job, "experiment_ledger")
1968    for experiment in reversed(experiments):
1969        if not isinstance(experiment, dict):
1970            continue
1971        status = str(experiment.get("status") or "").strip().lower()
1972        next_action = str(experiment.get("next_action") or "").strip()
1973        if not next_action:
1974            continue
1975        if status in {"measured", "failed", "blocked"} or experiment.get("metric_value") is not None:
1976            return {
1977                "title": experiment.get("title"),
1978                "status": status,
1979                "metric_name": experiment.get("metric_name"),
1980                "metric_value": experiment.get("metric_value"),
1981                "next_action": next_action,
1982            }
1983    return None
1984
1985
1986def _experiment_next_action_requires_delivery(context: dict[str, Any] | None) -> bool:
1987    if not context:
1988        return False
1989    next_action = str(context.get("next_action") or "").lower()
1990    if not next_action:
1991        return False
1992    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", next_action))
1993    if not tokens & EXPERIMENT_DELIVERY_ACTION_TERMS:
1994        return False
1995    return not bool(tokens & EXPERIMENT_INFORMATION_ACTION_TERMS)
1996
1997
1998def _experiment_next_action_failure_context(job: dict[str, Any], recent_steps: list[dict[str, Any]], *, window: int = 8) -> dict[str, Any] | None:
1999    context = _latest_experiment_next_action_context(job)
2000    if not _experiment_next_action_requires_delivery(context):
2001        return None
2002    latest_experiment_step_no = max(
2003        (
2004            _as_int(step.get("step_no"))
2005            for step in recent_steps
2006            if step.get("tool_name") == "record_experiment" and step.get("status") == "completed"
2007        ),
2008        default=0,
2009    )
2010    next_action = str(context.get("next_action") or "") if context else ""
2011    for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
2012        if step.get("tool_name") != "shell_exec":
2013            continue
2014        if latest_experiment_step_no and _as_int(step.get("step_no")) <= latest_experiment_step_no:
2015            continue
2016        text = _shell_step_failure_text(step)
2017        if not text.strip() or not _shell_output_has_missing_command(text):
2018            continue
2019        command = _step_command(step)
2020        if not _shell_command_matches_next_action(command, next_action):
2021            continue
2022        return {
2023            "step_no": step.get("step_no"),
2024            "command": command,
2025            "excerpt": text.strip(),
2026            "missing_commands": _missing_commands_from_shell_output(text),
2027            "missing_paths": _missing_paths_from_shell_output(text),
2028            "experiment_next_action": context,
2029        }
2030    return None
2031
2032
2033def _shell_command_looks_like_write(command: str) -> bool:
2034    text = command.strip()
2035    if not text:
2036        return False
2037    if re.match(r"(?is)^curl\b", text):
2038        download_flags = (
2039            r"(?:^|\s)(?:-o\s*\S+|-O\b|--output(?:=|\s+)\S+|--remote-name\b|--output-dir(?:=|\s+)\S+)"
2040        )
2041        if re.search(download_flags, text):
2042            return True
2043    if re.match(r"(?is)^(?:wget|aria2c)\b", text):
2044        return True
2045    write_patterns = [
2046        r"(?<!\d)>>?\s*[^&]",
2047        r"\b1>>?\s*[^&]",
2048        r"\btee\b",
2049        r"\bcat\s+>\b",
2050        r"\bpython[0-9.]*\b.*\bwrite_text\b",
2051        r"\bpython[0-9.]*\b.*\bopen\([^)]*,\s*['\"]w",
2052        r"\bsed\s+-i\b",
2053    ]
2054    return any(re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL) for pattern in write_patterns)
2055
2056
2057def _shell_command_looks_read_only(command: str) -> bool:
2058    text = command.strip()
2059    if not text:
2060        return False
2061    if _shell_command_looks_like_write(text):
2062        return False
2063    if READ_ONLY_SHELL_COMMAND_PATTERN.search(text):
2064        return True
2065    if re.match(r"(?is)^curl\b", text):
2066        mutating_flags = r"(?:^|\s)-X\s*(?:POST|PUT|PATCH|DELETE)\b|--request\s+(?:POST|PUT|PATCH|DELETE)\b|(?:^|\s)(?:-d|--data|--form|-F|-T|--upload-file)\b"
2067        return not bool(re.search(mutating_flags, text))
2068    return False
2069
2070
2071def _shell_command_supports_experiment_next_action(command: str, context: dict[str, Any] | None) -> bool:
2072    if not context:
2073        return False
2074    text = command.strip()
2075    if not text or not EXPERIMENT_NEXT_ACTION_VERIFY_SHELL_PATTERN.search(text):
2076        return False
2077    next_action = str(context.get("next_action") or "")
2078    if not next_action.strip():
2079        return False
2080    action_tokens = _substantive_next_action_tokens(next_action)
2081    if not action_tokens:
2082        return False
2083    command_tokens = _substantive_next_action_tokens(text)
2084    return bool(action_tokens & command_tokens)
2085
2086
2087def _shell_command_matches_next_action(command: str, next_action: str) -> bool:
2088    if not command.strip() or not next_action.strip():
2089        return False
2090    action_tokens = _substantive_next_action_tokens(next_action)
2091    command_tokens = _substantive_next_action_tokens(command)
2092    return bool(action_tokens & command_tokens)
2093
2094
2095def _substantive_next_action_tokens(text: str) -> set[str]:
2096    tokens = set()
2097    for token in re.findall(r"[a-z0-9][a-z0-9_.-]{2,}", text.lower()):
2098        token = token.strip("._-")
2099        if len(token) < 3:
2100            continue
2101        if token in TEXT_TOKEN_STOPWORDS or token in EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS:
2102            continue
2103        tokens.add(token)
2104        for part in re.split(r"[._/-]+", token):
2105            if len(part) >= 3 and part not in TEXT_TOKEN_STOPWORDS and part not in EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS:
2106                tokens.add(part)
2107    return tokens
2108
2109
2110def _roadmap_staleness_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
2111    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2112    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
2113    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
2114    if not milestones:
2115        return None
2116    if any(step.get("tool_name") in {"record_roadmap", "record_milestone_validation"} for step in recent_steps):
2117        return None
2118    if any(
2119        isinstance(milestone, dict)
2120        and (
2121            str(milestone.get("status") or "planned") != "planned"
2122            or str(milestone.get("validation_status") or "not_started") != "not_started"
2123        )
2124        for milestone in milestones
2125    ):
2126        return None
2127    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
2128    completed_artifacts = [
2129        step for step in recent_steps
2130        if step.get("status") == "completed" and step.get("tool_name") == "write_artifact"
2131    ]
2132    task_updates = [
2133        step for step in recent_steps
2134        if step.get("status") == "completed" and step.get("tool_name") == "record_tasks"
2135    ]
2136    if len(completed_artifacts) < 2 and len(task_updates) < 2 and len(tasks) < 8:
2137        return None
2138    return {
2139        "title": roadmap.get("title") or "Roadmap",
2140        "status": roadmap.get("status") or "planned",
2141        "milestone_count": len(milestones),
2142        "task_count": len(tasks),
2143        "artifact_count": len(completed_artifacts),
2144        "task_update_count": len(task_updates),
2145    }
2146
2147
2148def _roadmap_missing_for_broad_job(job: dict[str, Any]) -> bool:
2149    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2150    if isinstance(metadata.get("roadmap"), dict):
2151        return False
2152    objective = str(job.get("objective") or "")
2153    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
2154    if len(tasks) >= 6:
2155        return True
2156    words = re.findall(r"[A-Za-z0-9_]+", objective)
2157    broad_terms = {"build", "create", "develop", "implement", "research", "improve", "optimize", "migrate", "write", "analyze"}
2158    return len(words) >= 14 and any(term in objective.lower() for term in broad_terms)
2159
2160
2161def _task_queue_exhausted(job: dict[str, Any]) -> bool:
2162    tasks = _metadata_list(job, "task_queue")
2163    if not tasks:
2164        return False
2165    runnable = {"open", "active"}
2166    return not any(str(task.get("status") or "open").strip().lower() in runnable for task in tasks)
2167
2168
2169def _task_queue_saturation_context(job: dict[str, Any], args: dict[str, Any]) -> dict[str, Any] | None:
2170    tasks = _metadata_list(job, "task_queue")
2171    objective_tasks = [task for task in tasks if not _is_guard_recovery_task(task)]
2172    open_tasks = [task for task in objective_tasks if str(task.get("status") or "open").strip().lower() in {"open", "active"}]
2173    incoming = args.get("tasks") if isinstance(args.get("tasks"), list) else []
2174    if not incoming:
2175        return None
2176    existing_keys = {
2177        _norm_task_key(str(task.get("parent") or ""), str(task.get("title") or ""))
2178        for task in tasks
2179    }
2180    semantic_matches = []
2181    new_open_titles = []
2182    new_titles = []
2183    for task in incoming:
2184        if not isinstance(task, dict):
2185            continue
2186        status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
2187        title = str(task.get("title") or task.get("name") or "").strip()
2188        parent = str(task.get("parent") or "")
2189        key = _norm_task_key(parent, title)
2190        matched_existing = key in existing_keys
2191        semantic_match = None
2192        if not matched_existing and (len(objective_tasks) > TASK_QUEUE_TOTAL_SOFT_LIMIT or len(open_tasks) >= TASK_QUEUE_SATURATION_OPEN_TASKS):
2193            semantic_match = find_semantic_task_match(
2194                title=title,
2195                parent=parent,
2196                tasks=[existing for existing in tasks if not _is_guard_recovery_task(existing)],
2197            )
2198            matched_existing = bool(semantic_match)
2199        if semantic_match:
2200            semantic_matches.append({
2201                "title": title,
2202                "matched_title": semantic_match.get("title"),
2203                "score": semantic_match.get("score"),
2204            })
2205        if not matched_existing:
2206            new_titles.append(str(task.get("title") or "").strip())
2207        if status in {"open", "active"} and not matched_existing:
2208            new_open_titles.append(str(task.get("title") or "").strip())
2209    projected_total = len(objective_tasks) + len(new_titles)
2210    projected_open = len(open_tasks) + len(new_open_titles)
2211    if projected_total > TASK_QUEUE_TOTAL_SOFT_LIMIT and new_titles:
2212        return {
2213            "reason": "total task queue is too large",
2214            "total_count": len(objective_tasks),
2215            "projected_total_count": projected_total,
2216            "total_threshold": TASK_QUEUE_TOTAL_SOFT_LIMIT,
2217            "open_count": len(open_tasks),
2218            "open_titles": [
2219                str(task.get("title") or "").strip()
2220                for task in open_tasks[:8]
2221                if str(task.get("title") or "").strip()
2222            ],
2223            "new_count": len(new_titles),
2224            "new_titles": new_titles[:8],
2225            "semantic_matches": semantic_matches[:8],
2226            "recovery_task_count": len(tasks) - len(objective_tasks),
2227        }
2228    if projected_open < TASK_QUEUE_SATURATION_OPEN_TASKS:
2229        return None
2230    if not new_open_titles:
2231        return None
2232    return {
2233        "reason": "too many open tasks",
2234        "open_count": len(open_tasks),
2235        "projected_open_count": projected_open,
2236        "open_threshold": TASK_QUEUE_SATURATION_OPEN_TASKS,
2237        "total_count": len(objective_tasks),
2238        "open_titles": [
2239            str(task.get("title") or "").strip()
2240            for task in open_tasks[:8]
2241            if str(task.get("title") or "").strip()
2242        ],
2243        "new_open_count": len(new_open_titles),
2244        "new_open_titles": new_open_titles[:8],
2245        "semantic_matches": semantic_matches[:8],
2246        "recovery_task_count": len(tasks) - len(objective_tasks),
2247    }
2248
2249
2250def _recent_task_queue_saturation_context(recent_steps: list[dict[str, Any]], *, window: int = 6) -> dict[str, Any] | None:
2251    for step in reversed(recent_steps[-window:]):
2252        if step.get("tool_name") != "record_tasks" or step.get("status") != "blocked":
2253            continue
2254        output = step.get("output") if isinstance(step.get("output"), dict) else {}
2255        if output.get("error") != "task queue saturated":
2256            continue
2257        task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
2258        return {
2259            "step_no": step.get("step_no"),
2260            "reason": task_queue.get("reason") or "task queue saturated",
2261            "open_count": task_queue.get("open_count"),
2262            "total_count": task_queue.get("total_count"),
2263            "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
2264        }
2265    return None
2266
2267
2268def _record_task_backlog_pressure(
2269    *,
2270    db: AgentDB,
2271    job_id: str,
2272    step_no: int | str | None,
2273    task_queue: dict[str, Any],
2274    source: str,
2275) -> None:
2276    if not isinstance(task_queue, dict) or not task_queue:
2277        return
2278    pressure = {
2279        "detected_at": datetime.now(timezone.utc).isoformat(),
2280        "source": source,
2281        "latest_step_no": step_no,
2282        "reason": task_queue.get("reason") or "task queue saturated",
2283        "open_count": task_queue.get("open_count"),
2284        "total_count": task_queue.get("total_count"),
2285        "projected_open_count": task_queue.get("projected_open_count"),
2286        "projected_total_count": task_queue.get("projected_total_count"),
2287        "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
2288    }
2289    db.update_job_metadata(job_id, {"task_backlog_pressure": pressure})
2290    db.append_agent_update(
2291        job_id,
2292        (
2293            "Task backlog pressure is active; next worker turns should execute, complete, block, skip, "
2294            "or consolidate existing tasks instead of adding new branches."
2295        ),
2296        category="blocked",
2297        metadata={"task_backlog_pressure": pressure},
2298    )
2299
2300
2301def _clear_stale_task_backlog_pressure(db: AgentDB, job_id: str, job: dict[str, Any]) -> bool:
2302    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2303    pressure = metadata.get("task_backlog_pressure")
2304    if not isinstance(pressure, dict) or not pressure:
2305        return False
2306    if _current_task_backlog_pressure_context(job):
2307        return False
2308    cleared = dict(pressure)
2309    cleared["resolved_at"] = datetime.now(timezone.utc).isoformat()
2310    db.update_job_metadata(job_id, {"task_backlog_pressure": {}})
2311    db.append_agent_update(
2312        job_id,
2313        "Task backlog pressure cleared; the active task queue is back under saturation limits.",
2314        category="progress",
2315        metadata={"cleared_task_backlog_pressure": cleared},
2316    )
2317    return True
2318
2319
2320def _repeated_task_queue_saturation_context(recent_steps: list[dict[str, Any]], *, window: int = 8, threshold: int = 2) -> dict[str, Any] | None:
2321    matches = []
2322    for step in recent_steps[-window:]:
2323        if step.get("tool_name") != "record_tasks" or step.get("status") != "blocked":
2324            continue
2325        output = step.get("output") if isinstance(step.get("output"), dict) else {}
2326        if output.get("error") == "task queue saturated":
2327            matches.append(step)
2328    if len(matches) < threshold:
2329        return None
2330    latest = matches[-1]
2331    output = latest.get("output") if isinstance(latest.get("output"), dict) else {}
2332    task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
2333    return {
2334        "count": len(matches),
2335        "latest_step_no": latest.get("step_no"),
2336        "reason": task_queue.get("reason") or "task queue saturated",
2337    }
2338
2339
2340def _task_planning_stagnation_context(job: dict[str, Any]) -> dict[str, Any] | None:
2341    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2342    streak = _as_int(metadata.get("task_planning_checkpoint_streak"))
2343    if streak < TASK_PLANNING_STAGNATION_CHECKPOINTS:
2344        return None
2345    tasks = _metadata_list(job, "task_queue")
2346    open_tasks = [
2347        task
2348        for task in tasks
2349        if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
2350    ]
2351    return {
2352        "task_only_checkpoints": streak,
2353        "threshold": TASK_PLANNING_STAGNATION_CHECKPOINTS,
2354        "total_tasks": len(tasks),
2355        "open_tasks": len(open_tasks),
2356    }
2357
2358
2359def _is_guard_recovery_task(task: dict[str, Any]) -> bool:
2360    metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
2361    return bool(metadata.get("guard_recovery")) or str(task.get("title") or "").strip().lower().startswith("resolve guard:")
2362
2363
2364def _record_tasks_adds_new_open_work(args: dict[str, Any], job: dict[str, Any]) -> bool:
2365    incoming = args.get("tasks") if isinstance(args.get("tasks"), list) else []
2366    if not incoming:
2367        incoming = [args]
2368    tasks = _metadata_list(job, "task_queue")
2369    existing_keys = {
2370        _norm_task_key(str(task.get("parent") or ""), str(task.get("title") or ""))
2371        for task in tasks
2372    }
2373    for task in incoming:
2374        if not isinstance(task, dict):
2375            continue
2376        title = str(task.get("title") or task.get("name") or "").strip()
2377        if not title:
2378            continue
2379        status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
2380        key = _norm_task_key(str(task.get("parent") or ""), title)
2381        if status in {"open", "active"} and key not in existing_keys:
2382            return True
2383    return False
2384
2385
2386def _norm_task_key(parent: str, title: str) -> str:
2387    return task_key(parent, title)
2388
2389
2390def _parse_tool_result(raw: str) -> dict[str, Any]:
2391    try:
2392        parsed = json.loads(raw)
2393        return parsed if isinstance(parsed, dict) else {"result": parsed}
2394    except json.JSONDecodeError:
2395        return {"result": raw}
2396
2397
2398def _load_program_text(config: AppConfig, job_id: str) -> str:
2399    path = config.runtime.jobs_dir / job_id / "program.md"
2400    if not path.exists():
2401        return ""
2402    return path.read_text(encoding="utf-8", errors="replace")
2403
2404
2405def _browser_warning_context(output: dict[str, Any]) -> dict[str, str] | None:
2406    data = output.get("data") if isinstance(output.get("data"), dict) else {}
2407    title = str(data.get("title") or "")
2408    url = str(data.get("url") or data.get("origin") or output.get("url") or "")
2409    snapshot = str(output.get("snapshot") or data.get("snapshot") or output.get("data") or "")
2410    reason = anti_bot_reason(title, url, snapshot)
2411    if not reason:
2412        return None
2413    return {"reason": reason, "url": url, "title": title}
2414
2415
2416def _recent_anti_bot_context(recent_steps: list[dict[str, Any]], *, window: int = 8) -> dict[str, Any] | None:
2417    for step in reversed(recent_steps[-window:]):
2418        if step.get("status") != "completed" or step.get("tool_name") not in {"browser_navigate", "browser_snapshot"}:
2419            continue
2420        output = step.get("output") if isinstance(step.get("output"), dict) else {}
2421        warning = _browser_warning_context(output)
2422        if warning:
2423            return {**warning, "step_id": step.get("id"), "step_no": step.get("step_no")}
2424    return None
2425
2426
2427def _artifact_args_acknowledge_block(args: dict[str, Any]) -> bool:
2428    text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "content")).lower()
2429    return any(term in text for term in ANTI_BOT_ACK_TERMS)
2430
2431
2432def _same_source_url(left: str, right: str) -> bool:
2433    if not left or not right:
2434        return False
2435    return left.split("#", 1)[0].rstrip("/") == right.split("#", 1)[0].rstrip("/")
2436
2437
2438def _normalized_source_url(value: str) -> str:
2439    value = str(value or "").strip()
2440    if not value:
2441        return ""
2442    if "://" not in value:
2443        return f"https://{value}"
2444    return value
2445
2446
2447def _source_host(value: str) -> str:
2448    parsed = urlparse(_normalized_source_url(value))
2449    return parsed.netloc.lower().removeprefix("www.")
2450
2451
2452def _source_matches(left: str, right: str) -> bool:
2453    if _same_source_url(left, right):
2454        return True
2455    left_host, left_path = _source_path_key(left)
2456    right_host, right_path = _source_path_key(right)
2457    if not left_host or left_host != right_host:
2458        return False
2459    if right_path in {"", "/"} or left_path in {"", "/"}:
2460        return False
2461    return left_path == right_path or left_path.startswith(right_path + "/") or right_path.startswith(left_path + "/")
2462
2463
2464def _source_path_key(value: str) -> tuple[str, str]:
2465    parsed = urlparse(_normalized_source_url(value))
2466    host = parsed.netloc.lower().removeprefix("www.")
2467    path = (parsed.path or "/").rstrip("/") or "/"
2468    return host, path
2469
2470
2471def _shell_source_matches(left: str, right: str) -> bool:
2472    if _same_source_url(left, right):
2473        return True
2474    left_host, left_path = _source_path_key(left)
2475    right_host, right_path = _source_path_key(right)
2476    if not left_host or left_host != right_host:
2477        return False
2478    if right_path in {"", "/"} or left_path in {"", "/"}:
2479        return False
2480    return left_path == right_path or left_path.startswith(right_path + "/") or right_path.startswith(left_path + "/")
2481
2482
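A minimal sketch of how the source-matching rule listed above behaves, assuming the helpers are re-declared standalone as shown here (same names as the listing; the example.com URLs are made up for illustration). Both `_source_matches` and `_shell_source_matches` apply this same host-plus-path-prefix rule in the listing.

```python
from urllib.parse import urlparse


def _normalized_source_url(value: str) -> str:
    # Ledger entries without a scheme are treated as https.
    value = str(value or "").strip()
    if not value:
        return ""
    return value if "://" in value else f"https://{value}"


def _source_path_key(value: str) -> tuple[str, str]:
    # (host without "www.", path without trailing slash; root stays "/").
    parsed = urlparse(_normalized_source_url(value))
    host = parsed.netloc.lower().removeprefix("www.")
    path = (parsed.path or "/").rstrip("/") or "/"
    return host, path


def _same_source_url(left: str, right: str) -> bool:
    if not left or not right:
        return False
    return left.split("#", 1)[0].rstrip("/") == right.split("#", 1)[0].rstrip("/")


def _source_matches(left: str, right: str) -> bool:
    if _same_source_url(left, right):
        return True
    left_host, left_path = _source_path_key(left)
    right_host, right_path = _source_path_key(right)
    if not left_host or left_host != right_host:
        return False
    # A bare host never matches a deeper page on the same host.
    if right_path in {"", "/"} or left_path in {"", "/"}:
        return False
    return (
        left_path == right_path
        or left_path.startswith(right_path + "/")
        or right_path.startswith(left_path + "/")
    )


same_page = _source_matches("https://www.example.com/docs/", "example.com/docs")
child_page = _source_matches("https://example.com/docs/api/v1", "https://example.com/docs")
sibling_page = _source_matches("https://example.com/docs-old", "https://example.com/docs")
bare_host = _source_matches("https://example.com/docs", "https://example.com/")
```

Under these inputs the first two calls match (same page after normalization, and a child path), while the sibling path `/docs-old` and the bare host do not, because the prefix test requires a `/` boundary and root paths are excluded.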
2483def _urls_from_text(text: str) -> list[str]:
2484    urls: list[str] = []
2485    seen: set[str] = set()
2486    for match in re.finditer(r"https?://[^\s'\"<>)}\]]+", str(text or "")):
2487        url = match.group(0).rstrip(".,;:")
2488        key = url.lower()
2489        if key in seen:
2490            continue
2491        seen.add(key)
2492        urls.append(url)
2493    return urls
2494
2495
2496def _source_url_has_path(value: str) -> bool:
2497    _, path = _source_path_key(value)
2498    return path not in {"", "/"}
2499
2500
2501def _shell_guard_urls(text: str) -> list[str]:
2502    urls = _urls_from_text(text)
2503    if len(urls) <= 1:
2504        return urls
2505    path_urls = [url for url in urls if _source_url_has_path(url)]
2506    return path_urls or urls
2507
2508
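A minimal sketch of the URL extraction and shell-guard filtering listed above, re-declared under shortened, hypothetical names (`urls_from_text`, `has_path`, `shell_guard_urls`) so it runs on its own. The regex is copied from the listing; the sample command and example.com URLs are invented.

```python
import re
from urllib.parse import urlparse


def urls_from_text(text: str) -> list[str]:
    # Collect http(s) URLs, strip trailing punctuation, dedupe case-insensitively.
    urls: list[str] = []
    seen: set[str] = set()
    for match in re.finditer(r"https?://[^\s'\"<>)}\]]+", str(text or "")):
        url = match.group(0).rstrip(".,;:")
        key = url.lower()
        if key in seen:
            continue
        seen.add(key)
        urls.append(url)
    return urls


def has_path(url: str) -> bool:
    # True when the URL points below the site root.
    path = (urlparse(url).path or "/").rstrip("/") or "/"
    return path not in {"", "/"}


def shell_guard_urls(text: str) -> list[str]:
    # With several URLs in one command, prefer those that carry a path.
    urls = urls_from_text(text)
    if len(urls) <= 1:
        return urls
    return [url for url in urls if has_path(url)] or urls


command = (
    "curl -H 'Referer: https://example.com/' https://example.com/files/a.bin; "
    "curl https://EXAMPLE.com/files/a.bin."
)
found = urls_from_text(command)
guarded = shell_guard_urls(command)
```

Here `found` keeps the referer URL and one copy of the download URL (the upper-case repeat is dropped as a duplicate), and `guarded` keeps only the download URL because the referer has no path.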
2509SHELL_PLACEHOLDER_URL_HOSTS = {
2510    "domain",
2511    "endpoint",
2512    "example",
2513    "file",
2514    "host",
2515    "input",
2516    "output",
2517    "path",
2518    "placeholder",
2519    "source",
2520    "target",
2521    "url",
2522    "uri",
2523}
2524
2525SHELL_PLACEHOLDER_FIELD_NAMES = (
2526    "command",
2527    "domain",
2528    "endpoint",
2529    "file",
2530    "host",
2531    "input",
2532    "output",
2533    "path",
2534    "source",
2535    "target",
2536    "url",
2537    "uri",
2538)
2539
2540
2541def _shell_placeholder_context(command: str) -> dict[str, Any] | None:
2542    command = str(command or "").strip()
2543    if not command:
2544        return None
2545    if "```" in command:
2546        return {
2547            "kind": "markdown_code_fence",
2548            "value": "```",
2549            "reason": "command contains markdown code fences instead of executable shell only",
2550        }
2551    if re.search(r"(?m)^\s*-{3,}\s+\S", command) or re.search(r"(?m)^\s*\d+\.\s+```", command):
2552        return {
2553            "kind": "markdown_prose",
2554            "value": "markdown prose",
2555            "reason": "command contains copied markdown prose instead of executable shell only",
2556        }
2557    for url in _urls_from_text(command):
2558        parsed = urlparse(url)
2559        host = (parsed.hostname or "").lower()
2560        if host in SHELL_PLACEHOLDER_URL_HOSTS:
2561            return {
2562                "kind": "placeholder_url",
2563                "value": url,
2564                "reason": "URL host looks like an unresolved placeholder field",
2565            }
2566    fields = "|".join(re.escape(name) for name in SHELL_PLACEHOLDER_FIELD_NAMES)
2567    placeholder_patterns = [
2568        rf"<\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*>",
2569        rf"\{{\{{\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*\}}\}}",
2570        rf"\{{\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*\}}",
2571        r"</?\s*(?:parameter|arguments?|tool_call|function_call)\b[^>]*>",
2572        r"\b(?:YOUR|REPLACE|TODO|INSERT)_[A-Z0-9_]{3,}\b",
2573    ]
2574    for pattern in placeholder_patterns:
2575        match = re.search(pattern, command, flags=re.IGNORECASE)
2576        if match:
2577            return {
2578                "kind": "placeholder_token",
2579                "value": match.group(0),
2580                "reason": "command contains an unresolved placeholder token",
2581            }
2582    return None
2583
2584
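A reduced sketch of the placeholder-token check listed above, assuming only a subset of the listed field names and three of the five patterns. `FIELDS`, `PATTERNS`, and `first_placeholder` are hypothetical names for illustration; the regex bodies are copied from the listing.

```python
import re

# Subset of SHELL_PLACEHOLDER_FIELD_NAMES, shortened for the example.
FIELDS = ("command", "file", "host", "input", "output", "path", "url")
fields = "|".join(re.escape(name) for name in FIELDS)

PATTERNS = [
    rf"<\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*>",            # <file_name>
    rf"\{{\{{\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*\}}\}}",  # {{ path }}
    r"\b(?:YOUR|REPLACE|TODO|INSERT)_[A-Z0-9_]{3,}\b",         # YOUR_API_KEY
]


def first_placeholder(command: str):
    # Return the first unresolved placeholder token, or None.
    for pattern in PATTERNS:
        match = re.search(pattern, command, flags=re.IGNORECASE)
        if match:
            return match.group(0)
    return None


angle = first_placeholder("curl -o <file_name> https://example.com/a")
braces = first_placeholder("cat {{ path }}")
shouty = first_placeholder("export KEY=YOUR_API_KEY")
clean = first_placeholder("ls -la /tmp")
redirect = first_placeholder("sort <input >output")
```

One behavior worth knowing when reading the guard: because the angle-bracket pattern allows whitespace before `>`, a genuine redirection such as `sort <input >output` is reported as the token `<input >` in this sketch, and with `re.IGNORECASE` a lowercase name like `todo_list` also satisfies the last pattern.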
2585def _shell_syntax_preflight_context(command: str) -> dict[str, Any] | None:
2586    command = str(command or "").strip()
2587    if not command:
2588        return None
2589    try:
2590        shlex.split(command, posix=True)
2591    except ValueError as exc:
2592        return {
2593            "kind": "shell_syntax",
2594            "value": str(exc),
2595            "reason": "command is not parseable shell syntax; usually an unmatched quote, escape, or partial pasted command",
2596        }
2597    return None
2598
2599
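A minimal sketch of the `shlex`-based preflight listed above, using a hypothetical helper name (`shell_syntax_error`) that returns the parser message instead of a context dict. It illustrates that the check catches quoting and escaping problems only, not full shell grammar.

```python
import shlex
from typing import Optional


def shell_syntax_error(command: str) -> Optional[str]:
    """Return the shlex error message for an unparseable command, else None."""
    try:
        shlex.split(command, posix=True)
    except ValueError as exc:
        return str(exc)
    return None


ok = shell_syntax_error("grep -r 'needle' ./src")
broken = shell_syntax_error("echo 'unterminated")
unbalanced = shell_syntax_error("echo $(date")
```

The unmatched quote yields an error string, while the unbalanced `$(` passes, since `shlex.split` tokenizes words and quotes but does not parse command substitution.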
2600def _source_failure_family_url(value: str) -> str:
2601    parsed = urlparse(_normalized_source_url(value))
2602    if not parsed.scheme or not parsed.netloc:
2603        return ""
2604    segments = [segment for segment in (parsed.path or "").split("/") if segment]
2605    if len(segments) < 2:
2606        return ""
2607    last = segments[-1]
2608    looks_file_like = "." in last
2609    family_segments = segments[:-1] if looks_file_like else segments
2610    if len(family_segments) < 2:
2611        return ""
2612    return f"{parsed.scheme}://{parsed.netloc}/{'/'.join(family_segments)}"
2613
2614
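A minimal sketch of the "failure family" derivation listed above, under a hypothetical standalone name (`failure_family_url`) with the scheme normalization inlined. The example.com paths are invented.

```python
from urllib.parse import urlparse


def failure_family_url(value: str) -> str:
    # Normalize: assume https when no scheme is given.
    value = str(value or "").strip()
    if value and "://" not in value:
        value = f"https://{value}"
    parsed = urlparse(value)
    if not parsed.scheme or not parsed.netloc:
        return ""
    segments = [segment for segment in (parsed.path or "").split("/") if segment]
    if len(segments) < 2:
        return ""
    # A last segment containing "." is treated as a file name and dropped.
    family = segments[:-1] if "." in segments[-1] else segments
    if len(family) < 2:
        return ""
    return f"{parsed.scheme}://{parsed.netloc}/{'/'.join(family)}"


deep_file = failure_family_url("https://example.com/org/repo/main/weights.bin")
shallow_file = failure_family_url("https://example.com/org/weights.bin")
directory = failure_family_url("example.com/org/repo")
dotted_dir = failure_family_url("https://example.com/org/repo/v1.2")
```

A family needs at least two path segments after any file-like tail is removed, so `shallow_file` is empty. Note that the "contains a dot" heuristic also trims directory names such as `v1.2`.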
2615def _known_bad_sources(job: dict[str, Any]) -> list[dict[str, Any]]:
2616    bad_sources = []
2617    for source in _metadata_list(job, "source_ledger"):
2618        if (
2619            _as_float(source.get("usefulness_score")) < 0.2
2620            and _as_int(source.get("yield_count")) <= 0
2621            and (_as_int(source.get("fail_count")) > 0 or source.get("warnings"))
2622        ):
2623            bad_sources.append(source)
2624    return bad_sources
2625
2626
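A minimal sketch of the known-bad-source predicate listed above. The listing calls `_as_float` and `_as_int`, which are not shown in this section; this sketch assumes they behave like `float(x or 0)` and `int(x or 0)`, and `is_known_bad` is a hypothetical name.

```python
def is_known_bad(source: dict) -> bool:
    # Assumed stand-ins for _as_float / _as_int from the listing.
    score = float(source.get("usefulness_score") or 0)
    yields = int(source.get("yield_count") or 0)
    fails = int(source.get("fail_count") or 0)
    # Low score, nothing ever yielded, and at least one failure or warning.
    return score < 0.2 and yields <= 0 and (fails > 0 or bool(source.get("warnings")))


failed = is_known_bad({"usefulness_score": 0.1, "yield_count": 0, "fail_count": 2})
warned = is_known_bad({"usefulness_score": 0.1, "yield_count": 0, "warnings": ["captcha"]})
yielded = is_known_bad({"usefulness_score": 0.1, "yield_count": 3, "fail_count": 2})
untested = is_known_bad({"usefulness_score": 0.1, "yield_count": 0, "fail_count": 0})
```

All three conditions must hold together: a source that has yielded anything, or that has a low score but no recorded failure or warning, is not treated as bad.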
2627def _known_bad_source_for_call(name: str, args: dict[str, Any], job: dict[str, Any]) -> dict[str, Any] | None:
2628    if name not in {"browser_navigate", "web_extract", "shell_exec"}:
2629        return None
2630    bad_sources = _known_bad_sources(job)
2631    if not bad_sources:
2632        return None
2633    urls: list[str] = []
2634    if name == "browser_navigate":
2635        urls = [str(args.get("url") or "")]
2636    elif isinstance(args.get("urls"), list):
2637        urls = [str(url) for url in args["urls"]]
2638    elif name == "shell_exec":
2639        urls = _shell_guard_urls(str(args.get("command") or ""))
2640    for url in [url for url in urls if url.strip()]:
2641        for source in bad_sources:
2642            source_value = str(source.get("source") or "")
2643            if not source_value:
2644                continue
2645            matches = _shell_source_matches(url, source_value) if name == "shell_exec" else _source_matches(url, source_value)
2646            if matches:
2647                return source
2648            if name == "shell_exec":
2649                source_family = _source_failure_family_url(source_value)
2650                if source_family and _shell_source_matches(url, source_family):
2651                    metadata = source.get("metadata") if isinstance(source.get("metadata"), dict) else {}
2652                    return {
2653                        **source,
2654                        "source": source_family,
2655                        "source_type": "shell_exec_family",
2656                        "metadata": {**metadata, "source_family": True, "source_family_from": source_value},
2657                    }
2658    return None
2659
2660
2661def _tool_signature(name: str, args: dict[str, Any]) -> str:
2662    return f"{name}:{json.dumps(args, ensure_ascii=False, sort_keys=True)}"
2663
2664
2665def _duplicate_recent_tool_call(
2666    name: str,
2667    args: dict[str, Any],
2668    recent_steps: list[dict[str, Any]],
2669    *,
2670    window: int = 24,
2671) -> dict[str, Any] | None:
2672    if name in {"browser_snapshot", "defer_job"}:
2673        return None
2674    signature = _tool_signature(name, args)
2675    for step in reversed(recent_steps[-window:]):
2676        if step.get("status") != "completed" or step.get("tool_name") != name:
2677            continue
2678        input_data = step.get("input") or {}
2679        previous_args = input_data.get("arguments") if isinstance(input_data, dict) else None
2680        if isinstance(previous_args, dict) and _tool_signature(name, previous_args) == signature:
2681            return step
2682    return None
2683
2684
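A minimal sketch of the duplicate-call check listed above, with hypothetical names (`tool_signature`, `find_duplicate`) and invented step records. `web_extract` and its `urls` argument appear in the listing; the `max_chars` argument is made up for the example.

```python
import json


def tool_signature(name: str, args: dict) -> str:
    # sort_keys makes the signature independent of argument order.
    return f"{name}:{json.dumps(args, ensure_ascii=False, sort_keys=True)}"


def find_duplicate(name: str, args: dict, recent_steps: list, window: int = 24):
    signature = tool_signature(name, args)
    for step in reversed(recent_steps[-window:]):
        # Only completed steps of the same tool count as duplicates.
        if step.get("status") != "completed" or step.get("tool_name") != name:
            continue
        input_data = step.get("input") or {}
        previous = input_data.get("arguments") if isinstance(input_data, dict) else None
        if isinstance(previous, dict) and tool_signature(name, previous) == signature:
            return step
    return None


steps = [
    {"step_no": 1, "status": "completed", "tool_name": "web_extract",
     "input": {"arguments": {"urls": ["https://example.com/a"], "max_chars": 500}}},
    {"step_no": 2, "status": "failed", "tool_name": "web_extract",
     "input": {"arguments": {"urls": ["https://example.com/b"], "max_chars": 500}}},
]
reordered = find_duplicate("web_extract", {"max_chars": 500, "urls": ["https://example.com/a"]}, steps)
after_failure = find_duplicate("web_extract", {"max_chars": 500, "urls": ["https://example.com/b"]}, steps)
```

The reordered arguments still match step 1, while the call that previously failed is not treated as a duplicate, so a retry after a failure stays allowed.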
2685def _completed_recent_steps(recent_steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
2686    return [step for step in recent_steps if step.get("status") == "completed"]
2687
2688
2689def _completed_or_failed_recent_steps(recent_steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
2690    return [step for step in recent_steps if step.get("status") in {"completed", "failed"}]
2691
2692
2693BROWSER_RUNTIME_UNAVAILABLE_TERMS = (
2694    "browser runtime unavailable",
2695    "browser not found",
2696    "browser executable",
2697    "chrome not found",
2698    "could not find chrome",
2699    "chromium executable",
2700    "executable doesn't exist",
2701    "playwright browser cache",
2702    "puppeteer browser cache",
2703)
2704
2705
2706SELF_DEFER_TERMS = (
2707    "next worker turn",
2708    "next worker step",
2709    "picked up by next worker",
2710    "picked up by the next worker",
2711    "picked up by next turn",
2712    "picked up by the next turn",
2713)
2714
2715
2716def _is_browser_tool(name: str | None) -> bool:
2717    return bool(str(name or "").startswith("browser_"))
2718
2719
2720def _browser_runtime_unavailable_context(
2721    recent_steps: list[dict[str, Any]],
2722    *,
2723    window: int = 512,
2724) -> dict[str, Any] | None:
2725    latest_browser_success_no = max(
2726        (
2727            int(step.get("step_no") or 0)
2728            for step in recent_steps[-window:]
2729            if _is_browser_tool(step.get("tool_name")) and step.get("status") == "completed"
2730        ),
2731        default=0,
2732    )
2733    for step in reversed(recent_steps[-window:]):
2734        if not _is_browser_tool(step.get("tool_name")):
2735            continue
2736        step_no = int(step.get("step_no") or 0)
2737        if step_no <= latest_browser_success_no:
2738            continue
2739        if step.get("status") not in {"failed", "blocked"}:
2740            continue
2741        output = step.get("output") if isinstance(step.get("output"), dict) else {}
2742        text = " ".join(
2743            str(part or "")
2744            for part in (
2745                step.get("summary"),
2746                step.get("error"),
2747                output.get("error"),
2748                output.get("summary"),
2749                output.get("stderr"),
2750                output.get("stdout"),
2751            )
2752        ).lower()
2753        if any(term in text for term in BROWSER_RUNTIME_UNAVAILABLE_TERMS):
2754            error = str(output.get("error") or step.get("error") or step.get("summary") or "")
2755            return {
2756                "step_no": step.get("step_no"),
2757                "tool": step.get("tool_name"),
2758                "status": step.get("status"),
2759                "error": _clip_text(error, 500),
2760            }
2761    return None
2762
2763
2764def _self_defer_context(args: dict[str, Any]) -> dict[str, Any] | None:
2765    reason = str(args.get("reason") or "")
2766    next_action = str(args.get("next_action") or "")
2767    text = f"{reason} {next_action}".lower()
2768    matched = next((term for term in SELF_DEFER_TERMS if term in text), "")
2769    if not matched and next_action.strip() and not reason.strip():
2770        matched = "missing wait reason"
2771    if not matched:
2772        return None
2773    return {
2774        "matched": matched,
2775        "reason": reason,
2776        "next_action": next_action,
2777    }
2778
2779
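A minimal sketch of the self-defer detection listed above, returning only the matched term. `self_defer_match` is a hypothetical name and the sample reasons are invented; the term tuple is copied from the listing.

```python
SELF_DEFER_TERMS = (
    "next worker turn",
    "next worker step",
    "picked up by next worker",
    "picked up by the next worker",
    "picked up by next turn",
    "picked up by the next turn",
)


def self_defer_match(reason: str, next_action: str) -> str:
    text = f"{reason} {next_action}".lower()
    matched = next((term for term in SELF_DEFER_TERMS if term in text), "")
    # A next action with no stated wait reason is also flagged.
    if not matched and next_action.strip() and not reason.strip():
        matched = "missing wait reason"
    return matched


real_wait = self_defer_match("Download still running; check after 10 minutes", "re-check file size")
handoff = self_defer_match("Will be picked up by the next worker", "")
no_reason = self_defer_match("", "re-check file size")
```

A defer with a concrete external wait reason passes, while a defer that only hands work to the next worker turn, or that names a next action without any reason, is flagged.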
2780EVIDENCE_GROUNDED_TOOLS = {
2781    "record_experiment",
2782    "record_findings",
2783    "record_lesson",
2784    "record_memory_graph",
2785    "record_roadmap",
2786    "report_update",
2787    "write_artifact",
2788}
2789NARRATIVE_EVIDENCE_GROUNDED_TOOLS = {
2790    "record_findings",
2791    "record_lesson",
2792    "record_memory_graph",
2793    "record_roadmap",
2794    "report_update",
2795    "write_artifact",
2796}
2797EVIDENCE_GROUNDING_RESOLUTION_TOOLS = {
2798    "record_experiment",
2799    "record_findings",
2800    "record_lesson",
2801    "record_memory_graph",
2802    "record_milestone_validation",
2803    "record_roadmap",
2804    "record_source",
2805    "record_tasks",
2806    "report_update",
2807    "write_artifact",
2808}
2809EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS = {
2810    "record_experiment",
2811    "record_findings",
2812    "record_lesson",
2813    "record_milestone_validation",
2814    "record_roadmap",
2815    "record_source",
2816    "record_tasks",
2817}
2818EVIDENCE_CHECKPOINT_ACCOUNTING_TOOLS = EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS | {"guard_recovery"}
2819EVIDENCE_CHECKPOINT_PROMPT_TOOLS = {
2820    "record_experiment",
2821    "record_findings",
2822    "record_lesson",
2823    "record_source",
2824}
2825EVIDENCE_TOKEN_IGNORE = {
2826    "acceptance",
2827    "action",
2828    "actions",
2829    "active",
2830    "agent",
2831    "artifact",
2832    "api",
2833    "baseline",
2834    "branch",
2835    "branches",
2836    "candidate",
2837    "candidates",
2838    "cdn",
2839    "checkpoint",
2840    "compare",
2841    "complete",
2842    "constraint",
2843    "criteria",
2844    "current",
2845    "data",
2846    "deliverable",
2847    "direct",
2848    "done",
2849    "download",
2850    "downloadable",
2851    "downloaded",
2852    "downloading",
2853    "downloads",
2854    "discovered",
2855    "discovery",
2856    "environment",
2857    "existing",
2858    "evidence",
2859    "experiment",
2860    "experiments",
2861    "feature",
2862    "features",
2863    "file",
2864    "files",
2865    "format",
2866    "finding",
2867    "findings",
2868    "file-level",
2869    "html",
2870    "http",
2871    "https",
2872    "inspect",
2873    "inspection",
2874    "investigate",
2875    "investigation",
2876    "json",
2877    "goal",
2878    "gguf",
2879    "hardware",
2880    "improve",
2881    "located",
2882    "memory",
2883    "metric",
2884    "milestone",
2885    "milestones",
2886    "model",
2887    "next",
2888    "observation",
2889    "observations",
2890    "open",
2891    "oid",
2892    "output",
2893    "outputs",
2894    "plan",
2895    "planned",
2896    "pending",
2897    "priority",
2898    "progress",
2899    "parse",
2900    "parsed",
2901    "parsing",
2902    "record",
2903    "report",
2904    "rest",
2905    "research",
2906    "result",
2907    "roadmap",
2908    "runtime",
2909    "search",
2910    "server",
2911    "source",
2912    "sources",
2913    "status",
2914    "sha",
2915    "sha256",
2916    "step",
2917    "steps",
2918    "task",
2919    "tasks",
2920    "test",
2921    "throughput",
2922    "tool",
2923    "tools",
2924    "false",
2925    "none",
2926    "null",
2927    "true",
2928    "url",
2929    "usable",
2930    "unvalidated",
2931    "valid",
2932    "validity",
2933    "validate",
2934    "validated",
2935    "validating",
2936    "validation",
2937    "worker",
2938    "xml",
2939    "yaml",
2940    "yml",
2941    "confirmed",
2942    "consider",
2943    "checking",
2944    "ongoing",
2945    "proceed",
2946    "proceeding",
2947}
2948EVIDENCE_TOKEN_IGNORE.update({f"p{index}" for index in range(10)})
2949STALE_CLAIM_TOKEN_IGNORE = {
2950    "api",
2951    "ascii",
2952    "blocked",
2953    "broken",
2954    "cdn",
2955    "cli",
2956    "critical",
2957    "cpu",
2958    "cuda",
2959    "discovered",
2960    "ggml",
2961    "gguf",
2962    "gpu",
2963    "hf_token",
2964    "html",
2965    "http",
2966    "https",
2967    "incomplete",
2968    "json",
2969    "lfs",
2970    "not_found",
2971    "oid",
2972    "onnx",
2973    "planned",
2974    "python",
2975    "python3",
2976    "ram",
2977    "rest",
2978    "severe",
2979    "sha",
2980    "sha256",
2981    "vram",
2982    "xet",
2983    "xml",
2984    "yaml",
2985    "yml",
2986}
2987NEGATIVE_EXISTENCE_MARKERS = (
2988    "0 files",
2989    "0 results",
2990    "cannot access",
2991    "does not exist",
2992    "failed to find",
2993    "has not been",
2994    "is not installed",
2995    "missing",
2996    "no ",
2997    "no such",
2998    "none",
2999    "not available",
3000    "not detected",
3001    "not downloaded",
3002    "not found",
3003    "not installed",
3004    "unavailable",
3005    "was not",
3006    "without",
3007)
3008NEGATIVE_ROLE_CLASSIFICATION_MARKERS = (
3009    "not a primary",
3010    "not a required",
3011    "not a target",
3012    "not an expected",
3013    "not suitable as",
3014    "not suitable for",
3015    "not the expected",
3016    "not the needed",
3017    "not the primary",
3018    "not the required",
3019    "not the target",
3020    "not usable as",
3021    "not usable for",
3022    "only support",
3023    "support file",
3024    "support files",
3025)
3026EVIDENCE_NEGATIVE_LINE_MARKERS = (
3027    "0 files",
3028    "0 results",
3029    "cannot access",
3030    "denied",
3031    "does not exist",
3032    "error",
3033    "failed",
3034    "failure",
3035    "has not been",
3036    "missing",
3037    "no such",
3038    "not available",
3039    "not detected",
3040    "not downloaded",
3041    "not found",
3042    "not installed",
3043    "permission",
3044    "timeout",
3045    "unavailable",
3046    "was not",
3047)
3048
3049
3050def _stale_claim_tokens_from_unsupported(tokens: list[str], *, reference_text: str = "") -> list[str]:
3051    stale_tokens: list[str] = []
3052    seen: set[str] = set()
3053    reference_norm = _normalize_claim_text(reference_text)
3054    for token in tokens:
3055        cleaned = str(token or "").strip()
3056        if not cleaned:
3057            continue
3058        key = cleaned.lower()
3059        if key in seen or key in STALE_CLAIM_TOKEN_IGNORE or key in EVIDENCE_TOKEN_IGNORE:
3060            continue
3061        if reference_norm and _normalize_claim_text(cleaned) in reference_norm:
3062            continue
3063        if _looks_like_generated_or_file_token(cleaned):
3064            continue
3065        if len(cleaned) < 4:
3066            continue
3067        distinctive = any(ch.isalpha() for ch in cleaned) and any(ch.isdigit() for ch in cleaned)
3068        distinctive = distinctive or (cleaned.isupper() and len(cleaned) >= 4)
3069        if not distinctive:
3070            continue
3071        seen.add(key)
3072        stale_tokens.append(cleaned)
3073    return stale_tokens
3074
3075
3076def _looks_like_generated_or_file_token(token: str) -> bool:
3077    lowered = token.lower()
3078    if lowered.startswith((
3079        "art_",
3080        "step_",
3081        "step-",
3082        "shell_",
3083        "shell-",
3084        "web_",
3085        "web-",
3086        "episode-",
3087        "fact-",
3088        "source-",
3089        "quality-",
3090        "constraint-",
3091        "baseline-",
3092        "question-",
3093        "verified_",
3094        "verified-",
3095        "timeout_",
3096        "timeout-",
3097    )):
3098        return True
3099    if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
3100        return True
3101    if lowered.startswith(("python-", "pip", "pip3")):
3102        return True
3103    if "_" in lowered and any(ch.isdigit() for ch in lowered) and any(ch.isalpha() for ch in lowered):
3104        return True
3105    return False
3106
3107
3108def _normalize_claim_text(text: str) -> str:
3109    return re.sub(r"[^a-z0-9]+", "", str(text or "").lower())
3110
3111
3112def _evidence_grounding_context(
3113    job: dict[str, Any],
3114    recent_steps: list[dict[str, Any]],
3115    *,
3116    tool_name: str,
3117    args: dict[str, Any],
3118    window: int = 8,
3119) -> dict[str, Any] | None:
3120    if tool_name not in EVIDENCE_GROUNDED_TOOLS:
3121        return None
3122    full_proposed_text = _json_text(args)
3123    proposed_text = _evidence_grounding_proposed_text(tool_name, args)
3124    if len(full_proposed_text.strip()) < 80:
3125        return None
3126    cited_steps = _cited_step_numbers(full_proposed_text)
3127    evidence_text = _recent_evidence_text(job, recent_steps, window=window, step_numbers=cited_steps or None)
3128    fresh_evidence_text = _recent_evidence_text(
3129        job,
3130        recent_steps,
3131        window=window,
3132        step_numbers=cited_steps or None,
3133        include_durable=False,
3134        include_job_context=False,
3135    )
3136    recent_grounding_paths = _candidate_file_paths_from_recent_grounding_blocks(recent_steps, window=window)
3137    if len(evidence_text.strip()) < 80 and not recent_grounding_paths:
3138        return None
3139    job_reference_text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind"))
3140    proposed_tokens = [
3141        token
3142        for token in _concrete_evidence_tokens_for_grounding(tool_name, proposed_text)
3143        if not _grounding_token_in_reference_text(token, job_reference_text)
3144    ]
3145    positive_path_conflicts = _positive_path_claim_conflicts_for_grounding(
3146        tool_name=tool_name,
3147        proposed_text=proposed_text,
3148        full_proposed_text=full_proposed_text,
3149        fresh_evidence_text=fresh_evidence_text,
3150    )
3151    if positive_path_conflicts:
3152        conflict_paths = [item["path"] for item in positive_path_conflicts]
3153        return {
3154            "unsupported_tokens": conflict_paths[:12],
3155            "negative_path_conflicts": positive_path_conflicts[:6],
3156            "evidence_steps": [
3157                step.get("step_no")
3158                for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3159            ],
3160            "cited_steps": sorted(cited_steps),
3161            "guidance": (
3162                "The proposed durable record claims a path or executable is present/available, but recent shell "
3163                "evidence says that same path is missing or inaccessible. Inspect again, record it as missing, "
3164                "or cite a newer positive check before saving the claim."
3165            ),
3166        }
3167    negative_conflicts = _negative_claim_conflicts_for_grounding(
3168        tool_name=tool_name,
3169        proposed_text=proposed_text,
3170        fresh_evidence_text=fresh_evidence_text,
3171        tokens=proposed_tokens,
3172    )
3173    if negative_conflicts:
3174        conflict_tokens = [item["token"] for item in negative_conflicts]
3175        return {
3176            "unsupported_tokens": conflict_tokens[:12],
3177            "negative_claim_conflicts": negative_conflicts[:6],
3178            "evidence_steps": [
3179                step.get("step_no")
3180                for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3181            ],
3182            "cited_steps": sorted(cited_steps),
3183            "guidance": (
3184                "The proposed durable record negates a concrete item or file pattern that appears in recent positive evidence. "
3185                "Inspect the evidence again or record uncertainty instead of saving a conflicting claim."
3186            ),
3187        }
3188    missing_paths = _missing_candidate_paths_for_grounding(
3189        job=job,
3190        recent_steps=recent_steps,
3191        recent_grounding_paths=recent_grounding_paths,
3192        tool_name=tool_name,
3193        proposed_text=proposed_text,
3194        full_proposed_text=full_proposed_text,
3195        fresh_evidence_text=fresh_evidence_text,
3196    )
3197    if missing_paths:
3198        return {
3199            "unsupported_tokens": missing_paths[:8],
3200            "missing_candidate_paths": missing_paths[:8],
3201            "evidence_steps": [
3202                step.get("step_no")
3203                for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3204            ],
3205            "cited_steps": sorted(cited_steps),
3206            "guidance": (
3207                "Recent evidence contains concrete file/path candidates, but the durable record only summarized them. "
3208                "Record the exact observed candidate paths, or explicitly state why those paths are not relevant."
3209            ),
3210        }
3211    stale_tokens = _active_stale_claim_token_set(job)
3212    proposed_stale_tokens = [
3213        token
3214        for token in _concrete_evidence_tokens_for_grounding(tool_name, full_proposed_text)
3215        if not _grounding_token_in_reference_text(token, job_reference_text)
3216        if token.lower() in stale_tokens
3217    ]
3218    if tool_name == "record_lesson" and not proposed_stale_tokens:
3219        return None
3220    unsupported_threshold = 1 if cited_steps or proposed_stale_tokens else 3
3221    candidate_tokens = proposed_stale_tokens if tool_name == "record_lesson" else proposed_tokens + proposed_stale_tokens
3222    candidate_high_risk = [token for token in candidate_tokens if _high_risk_evidence_token(token)]
3223    if len(candidate_tokens) < unsupported_threshold and not candidate_high_risk:
3224        return None
3225    evidence_lower = evidence_text.lower()
3226    fresh_evidence_lower = fresh_evidence_text.lower()
3227    unsupported = []
3228    for token in candidate_tokens:
3229        lowered = token.lower()
3230        if lowered in fresh_evidence_lower:
3231            continue
3232        if lowered in evidence_lower and lowered not in stale_tokens:
3233            continue
3234        unsupported.append(token)
3235    unique = []
3236    seen = set()
3237    for token in unsupported:
3238        key = token.lower()
3239        if key in seen:
3240            continue
3241        seen.add(key)
3242        unique.append(token)
3243    high_risk_unique = [token for token in unique if _high_risk_evidence_token(token)]
3244    if len(unique) < unsupported_threshold and not high_risk_unique:
3245        return None
3246    return {
3247        "unsupported_tokens": (high_risk_unique or unique)[:12],
3248        "evidence_steps": [
3249            step.get("step_no")
3250            for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3251        ],
3252        "cited_steps": sorted(cited_steps),
3253        "guidance": (
3254            "The proposed durable record contains concrete tokens that are not present in recent evidence. "
3255            "Use exact observed evidence, inspect the source again, or record uncertainty instead of writing unsupported claims."
3256        ),
3257    }
3258
3259
3260def _concrete_evidence_tokens_for_grounding(tool_name: str, text: str) -> list[str]:
3261    tokens = _concrete_evidence_tokens(text)
3262    if tool_name not in NARRATIVE_EVIDENCE_GROUNDED_TOOLS:
3263        return tokens
3264    return [token for token in tokens if _high_risk_evidence_token(token)]
3265
3266
3267def _grounding_token_in_reference_text(token: str, reference_text: str) -> bool:
3268    normalized_token = _normalize_claim_text(token)
3269    if not normalized_token:
3270        return False
3271    return normalized_token in _normalize_claim_text(reference_text)
3272
3273
3274def _missing_candidate_paths_for_grounding(
3275    *,
3276    job: dict[str, Any],
3277    recent_steps: list[dict[str, Any]],
3278    recent_grounding_paths: list[str] | None = None,
3279    tool_name: str,
3280    proposed_text: str,
3281    full_proposed_text: str,
3282    fresh_evidence_text: str,
3283) -> list[str]:
3284    if tool_name not in {"record_findings", "record_experiment", "write_artifact", "report_update"}:
3285        return []
3286    proposed_lower = f"{proposed_text}\n{full_proposed_text}".lower()
3287    if not any(term in proposed_lower for term in ("file", "files", "path", "paths", "candidate", "found", "discovered")):
3288        return []
3289    positive_evidence_text = "\n".join(
3290        line
3291        for line in str(fresh_evidence_text or "").splitlines()
3292        if not _evidence_line_is_negative(line.lower())
3293    )
3294    evidence_paths = [
3295        *_extract_candidate_file_paths(positive_evidence_text),
3296        *(recent_grounding_paths or _candidate_file_paths_from_recent_grounding_blocks(recent_steps)),
3297    ]
3298    if not evidence_paths:
3299        return []
3300    if any(_path_mentioned_in_text(path, proposed_lower) for path in evidence_paths):
3301        return []
3302    distinctive_paths: list[str] = []
3303    seen: set[str] = set()
3304    for path in _rank_candidate_file_paths(job, full_proposed_text, evidence_paths):
3305        key = path.lower()
3306        if key in seen:
3307            continue
3308        seen.add(key)
3309        distinctive_paths.append(path)
3310        if len(distinctive_paths) >= 8:
3311            break
3312    return distinctive_paths
3313
3314
3315POSITIVE_PATH_CLAIM_MARKERS = (
3316    "available",
3317    "exists",
3318    "found",
3319    "is at",
3320    "located",
3321    "present",
3322    "ready",
3323    "succeed",
3324    "usable",
3325    "valid",
3326    "verified",
3327)
3328
3329
3330def _positive_path_claim_conflicts_for_grounding(
3331    *,
3332    tool_name: str,
3333    proposed_text: str,
3334    full_proposed_text: str,
3335    fresh_evidence_text: str,
3336) -> list[dict[str, str]]:
3337    if tool_name not in {"record_findings", "record_experiment", "record_source", "record_lesson", "write_artifact", "report_update"}:
3338        return []
3339    proposed_combined = f"{proposed_text}\n{full_proposed_text}"
3340    proposed_lower = proposed_combined.lower()
3341    conflicts: list[dict[str, str]] = []
3342    seen: set[str] = set()
3343    for line in str(fresh_evidence_text or "").splitlines():
3344        line_lower = line.lower()
3345        if not _evidence_line_is_negative(line_lower):
3346            continue
3347        paths = [
3348            *_extract_candidate_file_paths(line),
3349            *_extract_candidate_executable_paths(line),
3350        ]
3351        for path in paths:
3352            path = str(path or "").strip()
3353            if not path:
3354                continue
3355            key = path.lower()
3356            if key in seen:
3357                continue
3358            if key not in proposed_lower:
3359                continue
3360            if not _path_near_positive_claim(proposed_combined, path):
3361                continue
3362            seen.add(key)
3363            conflicts.append({
3364                "path": path,
3365                "evidence": _clip_text(line.strip(), 220),
3366                "claim": _clip_text(_excerpt_around(proposed_combined, path, window=96), 220),
3367            })
3368            if len(conflicts) >= 8:
3369                return conflicts
3370    return conflicts
3371
3372
3373def _path_near_positive_claim(text: str, path: str, *, window: int = 96) -> bool:
3374    for excerpt in _excerpts_around_all(text, path, window=window):
3375        excerpt_lower = excerpt.lower()
3376        if _evidence_line_is_negative(excerpt_lower):
3377            continue
3378        if any(marker in excerpt_lower for marker in POSITIVE_PATH_CLAIM_MARKERS):
3379            return True
3380    return False
3381
3382
3383def _excerpt_around(text: str, needle: str, *, window: int = 80) -> str:
3384    excerpts = _excerpts_around_all(text, needle, window=window, max_matches=1)
3385    return excerpts[0] if excerpts else ""
3386
3387
3388def _excerpts_around_all(text: str, needle: str, *, window: int = 80, max_matches: int = 8) -> list[str]:
3389    source = str(text or "")
3390    needle_text = str(needle or "")
3391    if not source or not needle_text:
3392        return []
3393    source_lower = source.lower()
3394    needle_lower = needle_text.lower()
3395    excerpts: list[str] = []
3396    index = 0
3397    while len(excerpts) < max_matches:
3398        found = source_lower.find(needle_lower, index)
3399        if found < 0:
3400            break
3401        start = max(0, found - window)
3402        end = min(len(source), found + len(needle_text) + window)
3403        excerpts.append(source[start:end])
3404        index = found + max(1, len(needle_text))
3405    return excerpts
3406
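A minimal usage sketch for the excerpt helper above. It assumes a standalone copy named `excerpts_around_all` (a hypothetical name for illustration; the listing's function is `_excerpts_around_all`) and an invented sample string. It suggests that matching is case-insensitive while each excerpt keeps the original casing, clipped to `window` characters on either side of the match.

```python
def excerpts_around_all(text, needle, *, window=80, max_matches=8):
    """Standalone copy of the listing's helper, renamed for illustration."""
    source = str(text or "")
    needle_text = str(needle or "")
    if not source or not needle_text:
        return []
    source_lower = source.lower()
    needle_lower = needle_text.lower()
    excerpts = []
    index = 0
    while len(excerpts) < max_matches:
        # Search the lowercased copy, but slice the original text.
        found = source_lower.find(needle_lower, index)
        if found < 0:
            break
        start = max(0, found - window)
        end = min(len(source), found + len(needle_text) + window)
        excerpts.append(source[start:end])
        # Resume after this match so matches never overlap.
        index = found + max(1, len(needle_text))
    return excerpts


# Invented sample text; the path is hypothetical.
sample = "Found /tmp/a.txt on disk. Later, /TMP/A.TXT was missing."
hits = excerpts_around_all(sample, "/tmp/a.txt", window=6)
print(hits)  # ['Found /tmp/a.txt on di', 'ater, /TMP/A.TXT was m']
```

Because every occurrence yields its own excerpt, a caller such as `_path_near_positive_claim` can test each window separately instead of judging the whole text at once.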
3407
3408def _path_mentioned_in_text(path: str, text_lower: str) -> bool:
3409    path_lower = path.lower()
3410    if path_lower in text_lower:
3411        return True
3412    name = Path(path).name.lower()
3413    return bool(name and name in text_lower)
3414
3415
3416def _refresh_contradicted_negative_claims(
3417    db: AgentDB,
3418    job_id: str,
3419    job: dict[str, Any],
3420    recent_steps: list[dict[str, Any]],
3421) -> int:
3422    fresh_evidence_text = _recent_evidence_text(
3423        job,
3424        recent_steps,
3425        window=8,
3426        include_durable=False,
3427        include_job_context=False,
3428    )
3429    if len(fresh_evidence_text.strip()) < 80:
3430        return 0
3431    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
3432    existing = metadata.get("stale_negative_records") if isinstance(metadata.get("stale_negative_records"), list) else []
3433    seen = {
3434        (
3435            str(item.get("kind") or ""),
3436            str(item.get("record_id") or ""),
3437            str(item.get("token") or "").lower(),
3438        )
3439        for item in existing
3440        if isinstance(item, dict)
3441    }
3442    now = datetime.now(timezone.utc).isoformat()
3443    new_records: list[dict[str, Any]] = []
3444    for kind, records in (
3445        ("finding", _metadata_list(job, "finding_ledger")[-80:]),
3446        ("lesson", _metadata_list(job, "lessons")[-80:]),
3447        (
3448            "memory_node",
3449            (
3450                metadata.get("memory_graph", {}).get("nodes", [])
3451                if isinstance(metadata.get("memory_graph"), dict)
3452                and isinstance(metadata.get("memory_graph", {}).get("nodes"), list)
3453                else []
3454            ),
3455        ),
3456    ):
3457        for record in records:
3458            if not isinstance(record, dict):
3459                continue
3460            record_text = _negative_record_text(kind, record)
3461            if not record_text:
3462                continue
3463            conflicts = _negative_claim_conflicts_for_grounding(
3464                tool_name="record_findings",
3465                proposed_text=record_text,
3466                fresh_evidence_text=fresh_evidence_text,
3467                tokens=_concrete_evidence_tokens(record_text),
3468            )
3469            if not conflicts:
3470                continue
3471            record_id = _negative_record_id(kind, record)
3472            for conflict in conflicts[:4]:
3473                token = str(conflict.get("token") or "")
3474                key = (kind, record_id, token.lower())
3475                if key in seen:
3476                    continue
3477                seen.add(key)
3478                new_records.append({
3479                    "kind": kind,
3480                    "record_id": record_id,
3481                    "title": _negative_record_title(kind, record),
3482                    "token": token,
3483                    "evidence": conflict.get("evidence") or "",
3484                    "observed_at": now,
3485                })
3486    if not new_records:
3487        return 0
3488    db.update_job_metadata(job_id, {"stale_negative_records": (existing + new_records)[-120:]})
3489    db.append_agent_update(
3490        job_id,
3491        f"Suppressed {len(new_records)} contradicted negative durable claim(s) after fresh evidence.",
3492        category="memory",
3493        metadata={"stale_negative_records": new_records[:12]},
3494    )
3495    return len(new_records)
3496
3497
3498def _negative_record_text(kind: str, record: dict[str, Any]) -> str:
3499    if kind == "lesson":
3500        return str(record.get("lesson") or "")
3501    if kind == "memory_node":
3502        return " ".join(
3503            str(record.get(key) or "")
3504            for key in ("key", "title", "kind", "status", "summary")
3505        )
3506    return " ".join(
3507        str(record.get(key) or "")
3508        for key in ("name", "category", "reason", "status", "source_url", "url")
3509    )
3510
3511
3512def _negative_record_id(kind: str, record: dict[str, Any]) -> str:
3513    for key in ("key", "event_id", "id"):
3514        value = str(record.get(key) or "").strip()
3515        if value:
3516            return value
3517    return _normalize_claim_text(f"{kind}:{_negative_record_title(kind, record)}")[:120]
3518
3519
3520def _negative_record_title(kind: str, record: dict[str, Any]) -> str:
3521    if kind == "lesson":
3522        return _clip_text(str(record.get("lesson") or "lesson"), 120)
3523    return str(record.get("name") or record.get("title") or "finding")
3524
3525
3526def _negative_claim_conflicts_for_grounding(
3527    *,
3528    tool_name: str,
3529    proposed_text: str,
3530    fresh_evidence_text: str,
3531    tokens: list[str],
3532) -> list[dict[str, str]]:
3533    if tool_name not in EVIDENCE_GROUNDED_TOOLS:
3534        return []
3535    proposed_lower = proposed_text.lower()
3536    if not any(marker in proposed_lower for marker in NEGATIVE_EXISTENCE_MARKERS):
3537        return []
3538    evidence_lines = [line.strip() for line in fresh_evidence_text.splitlines() if line.strip()]
3539    if not evidence_lines:
3540        return []
3541    candidates = tokens + _file_pattern_tokens_for_grounding(proposed_text)
3542    conflicts: list[dict[str, str]] = []
3543    seen: set[str] = set()
3544    for token in candidates:
3545        key = token.lower()
3546        if key in seen:
3547            continue
3548        seen.add(key)
3549        if not token.startswith(".") and "/" not in token and not _high_risk_evidence_token(token):
3550            continue
3551        if not _token_near_negative_claim(proposed_text, token):
3552            continue
3553        positive_line = _positive_evidence_line_for_token(evidence_lines, token)
3554        if not positive_line:
3555            continue
3556        conflicts.append({"token": token, "evidence": _clip_text(positive_line, 220)})
3557    return conflicts
3558
3559
3560def _file_pattern_tokens_for_grounding(text: str) -> list[str]:
3561    tokens: list[str] = []
3562    seen: set[str] = set()
3563    for match in re.finditer(r"(?<![A-Za-z0-9])(?:\*\.)?\.?([A-Za-z0-9][A-Za-z0-9_-]{1,12})(?![A-Za-z0-9_-])", text or ""):
3564        raw = match.group(0).strip("'\"`")
3565        if not raw:
3566            continue
3567        if not raw.startswith((".", "*.")):
3568            continue
3569        if "." not in raw and not raw.startswith("*."):
3570            continue
3571        if raw.startswith(".") and not raw.startswith("*."):
3572            previous_char = text[match.start() - 1] if match.start() > 0 else ""
3573            next_char = text[match.end()] if match.end() < len(text) else ""
3574            if previous_char == "/" or next_char == "/":
3575                continue
3576        ext = "." + match.group(1).lower().lstrip(".")
3577        if ext in {".app", ".co", ".com", ".dev", ".edu", ".gov", ".io", ".net", ".org", ".www", ".http", ".https"}:
3578            continue
3579        if ext in seen:
3580            continue
3581        seen.add(ext)
3582        tokens.append(ext)
3583    return tokens
3584
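A small sketch of how the extension-pattern extraction above behaves, assuming a standalone copy named `file_pattern_tokens` (a hypothetical name; the listing's function is `_file_pattern_tokens_for_grounding`) and invented sample sentences. Two checks that can never fire after the `startswith` filter are left out of the copy; the results for these inputs should be unchanged.

```python
import re

# Extensions that usually come from domain names or URL schemes, copied from the listing.
IGNORED = {".app", ".co", ".com", ".dev", ".edu", ".gov", ".io", ".net", ".org", ".www", ".http", ".https"}
PATTERN = r"(?<![A-Za-z0-9])(?:\*\.)?\.?([A-Za-z0-9][A-Za-z0-9_-]{1,12})(?![A-Za-z0-9_-])"


def file_pattern_tokens(text):
    """Return de-duplicated extensions written as `*.ext` or `.ext` in text."""
    tokens = []
    seen = set()
    text = text or ""
    for match in re.finditer(PATTERN, text):
        raw = match.group(0).strip("'\"`")
        if not raw.startswith((".", "*.")):
            continue  # plain words are matched by the regex but are not patterns
        if raw.startswith("."):
            # Skip dot-directories such as `/home/.cache/`.
            previous_char = text[match.start() - 1] if match.start() > 0 else ""
            next_char = text[match.end()] if match.end() < len(text) else ""
            if previous_char == "/" or next_char == "/":
                continue
        ext = "." + match.group(1).lower().lstrip(".")
        if ext in IGNORED or ext in seen:
            continue
        seen.add(ext)
        tokens.append(ext)
    return tokens


# Invented sample sentences.
claim = "No *.gguf or .safetensors files under ./models; see example.com"
print(file_pattern_tokens(claim))  # ['.gguf', '.safetensors']
print(file_pattern_tokens("nothing in /home/.cache/x"))  # []
```

In this sample `example.com` contributes nothing: the lookbehind rejects a dot that directly follows a letter, so `.com` is never a candidate here and the ignore set is not even consulted.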
3585
3586def _token_near_negative_claim(text: str, token: str, *, window: int = 64) -> bool:
3587    text_lower = text.lower()
3588    token_lower = token.lower()
3589    start = 0
3590    while True:
3591        index = text_lower.find(token_lower, start)
3592        if index < 0:
3593            return False
3594        nearby = text_lower[max(0, index - window): index + len(token_lower) + window]
3595        if any(marker in nearby for marker in NEGATIVE_EXISTENCE_MARKERS):
3596            if _nearby_negative_is_role_classification(nearby):
3597                start = index + len(token_lower)
3598                continue
3599            if _nearby_negative_is_positive_validation(nearby):
3600                start = index + len(token_lower)
3601                continue
3602            return True
3603        start = index + len(token_lower)
3604
3605
3606def _nearby_negative_is_role_classification(text: str) -> bool:
3607    return any(marker in text for marker in NEGATIVE_ROLE_CLASSIFICATION_MARKERS)
3608
3609
3610def _nearby_negative_is_positive_validation(text: str) -> bool:
3611    return bool(re.search(r"\bnot\s+(?:a|an|the)?\s*(?:[\w.-]+\s+){0,5}(?:stub|placeholder|empty file)\b", text))
3612
3613
3614def _positive_evidence_line_for_token(lines: list[str], token: str) -> str:
3615    token_lower = token.lower()
3616    for line in lines:
3617        line_lower = line.lower()
3618        if token_lower not in line_lower:
3619            continue
3620        if _evidence_line_is_negative(line_lower):
3621            continue
3622        return line
3623    return ""
3624
3625
3626def _evidence_line_is_negative(line_lower: str) -> bool:
3627    if any(marker in line_lower for marker in EVIDENCE_NEGATIVE_LINE_MARKERS):
3628        return True
3629    return line_lower.startswith("no ") or " no " in line_lower or line_lower.startswith("zero ") or " zero " in line_lower
3630
3631
3632def _evidence_grounding_proposed_text(tool_name: str, args: dict[str, Any]) -> str:
3633    if tool_name == "record_experiment":
3634        return "\n".join(
3635            _json_value_text(args.get(key))
3636            for key in (
3637                "action",
3638                "baseline",
3639                "command",
3640                "config",
3641                "decision",
3642                "environment",
3643                "evidence",
3644                "evidence_artifact",
3645                "metric_name",
3646                "metric_unit",
3647                "metric_value",
3648                "result",
3649                "status",
3650            )
3651            if args.get(key) is not None
3652        )
3653    if tool_name != "record_memory_graph":
3654        return _json_value_text(args)
3655    parts: list[str] = []
3656    nodes = args.get("nodes") if isinstance(args.get("nodes"), list) else []
3657    for node in nodes:
3658        if not isinstance(node, dict):
3659            continue
3660        for key in ("title", "summary", "tags", "metadata"):
3661            value = node.get(key)
3662            if value:
3663                parts.append(_json_text(value))
3664    edges = args.get("edges") if isinstance(args.get("edges"), list) else []
3665    for edge in edges:
3666        if not isinstance(edge, dict):
3667            continue
3668        for key in ("evidence_refs", "metadata"):
3669            value = edge.get(key)
3670            if value:
3671                parts.append(_json_text(value))
3672    return "\n".join(parts)
3673
3674
3675def _json_text(value: Any) -> str:
3676    try:
3677        return json.dumps(value, ensure_ascii=False, sort_keys=True)
3678    except TypeError:
3679        return str(value)
3680
3681
3682def _json_value_text(value: Any) -> str:
3683    if isinstance(value, dict):
3684        return "\n".join(_json_value_text(item) for item in value.values())
3685    if isinstance(value, list):
3686        return "\n".join(_json_value_text(item) for item in value)
3687    return "" if value is None else str(value)
3688
3689
3690def _cited_step_numbers(text: str) -> set[int]:
3691    numbers = set()
3692    patterns = [
3693        r"(?i)\bsteps?\s*(?:#|-)?\s*(\d+)\b",
3694        r"(?i)\bstep[_-](\d+)\b",
3695        r"(?i)\bshell_exec[_\s-]*step[_\s#-]*(\d+)\b",
3696        r"(?i)\btool[_\s-]*step[_\s#-]*(\d+)\b",
3697    ]
3698    for pattern in patterns:
3699        for match in re.finditer(pattern, text):
3700            raw = match.group(1)
3701            try:
3702                value = int(raw)
3703            except (TypeError, ValueError):
3704                continue
3705            if value > 0:
3706                numbers.add(value)
3707    return numbers
3708
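A brief sketch of the citation patterns above, assuming a standalone copy named `cited_step_numbers` (a hypothetical name; the listing's function is `_cited_step_numbers`) and an invented sample sentence. It shows which spellings of a step reference are picked up and that non-positive numbers are dropped.

```python
import re

# Patterns copied from the listing.
PATTERNS = [
    r"(?i)\bsteps?\s*(?:#|-)?\s*(\d+)\b",
    r"(?i)\bstep[_-](\d+)\b",
    r"(?i)\bshell_exec[_\s-]*step[_\s#-]*(\d+)\b",
    r"(?i)\btool[_\s-]*step[_\s#-]*(\d+)\b",
]


def cited_step_numbers(text):
    """Collect positive step numbers cited in free text."""
    numbers = set()
    for pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            value = int(match.group(1))
            if value > 0:  # "step 0" is not a valid citation
                numbers.add(value)
    return numbers


# Invented sample sentence.
note = "See step 12 and step_7; shell_exec step #30 confirmed it. Steps 0 ignored."
print(sorted(cited_step_numbers(note)))  # [7, 12, 30]
```

The resulting set is the kind of value `_evidence_steps_for_grounding` accepts as `step_numbers` to narrow the evidence window to the cited steps.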
3709
3710def _evidence_steps_for_grounding(
3711    recent_steps: list[dict[str, Any]],
3712    *,
3713    window: int,
3714    step_numbers: set[int] | None = None,
3715) -> list[dict[str, Any]]:
3716    completed = _completed_recent_steps(recent_steps)
3717    if step_numbers:
3718        steps = [step for step in completed if int(step.get("step_no") or 0) in step_numbers]
3719    else:
3720        steps = completed[-window:]
3721    evidence_steps = [
3722        step
3723        for step in steps
3724        if step.get("tool_name") in {"browser_snapshot", "shell_exec", "web_extract", "web_search", "read_artifact"}
3725    ]
3726    if evidence_steps or not step_numbers:
3727        return evidence_steps
3728    return [
3729        step
3730        for step in completed[-window:]
3731        if step.get("tool_name") in {"browser_snapshot", "shell_exec", "web_extract", "web_search", "read_artifact"}
3732    ]
3733
3734
3735def _recent_evidence_text(
3736    job: dict[str, Any],
3737    recent_steps: list[dict[str, Any]],
3738    *,
3739    window: int,
3740    step_numbers: set[int] | None = None,
3741    include_durable: bool = True,
3742    include_job_context: bool = True,
3743) -> str:
3744    parts: list[str] = []
3745    if include_job_context:
3746        parts.extend([str(job.get("title") or ""), str(job.get("objective") or ""), str(job.get("kind") or "")])
3747    durable_text = _durable_records_for_grounding(job) if include_durable else ""
3748    if include_durable and durable_text:
3749        parts.append(durable_text)
3750    for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=step_numbers):
3751        parts.append(str(step.get("summary") or ""))
3752        input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
3753        if input_data:
3754            parts.append(_json_text(input_data))
3755        output = step.get("output") if isinstance(step.get("output"), dict) else {}
3756        if not output:
3757            continue
3758        for key in ("stdout", "stderr", "text", "content", "excerpt", "query", "command"):
3759            if output.get(key):
3760                parts.append(str(output.get(key)))
3761        pages = output.get("pages") if isinstance(output.get("pages"), list) else []
3762        for page in pages[:6]:
3763            if isinstance(page, dict):
3764                parts.append(_json_text({key: page.get(key) for key in ("url", "title", "text", "error", "source_warning")}))
3765        results = output.get("results") if isinstance(output.get("results"), list) else []
3766        for item in results[:8]:
3767            if isinstance(item, dict):
3768                parts.append(_json_text({key: item.get(key) for key in ("url", "title", "snippet")}))
3769    return "\n".join(parts)
3770
3771
3772def _active_stale_claim_token_set(job: dict[str, Any]) -> set[str]:
3773    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
3774    raw_tokens = metadata.get("unsupported_claim_tokens") if isinstance(metadata.get("unsupported_claim_tokens"), list) else []
3775    filtered = _stale_claim_tokens_from_unsupported(
3776        [str(token) for token in raw_tokens],
3777        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
3778    )
3779    return {str(token).strip().lower() for token in filtered if str(token).strip()}
3780
3781
3782def _durable_records_for_grounding(job: dict[str, Any]) -> str:
3783    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
3784    parts: list[str] = []
3785    for finding in _metadata_list(job, "finding_ledger")[-20:]:
3786        parts.append(_json_text({
3787            "finding": finding.get("name") or finding.get("title"),
3788            "category": finding.get("category"),
3789            "reason": finding.get("reason") or finding.get("summary"),
3790            "location": finding.get("location"),
3791            "status": finding.get("status"),
3792            "evidence_artifact": finding.get("evidence_artifact"),
3793            "url": finding.get("url"),
3794            "metadata": finding.get("metadata") if isinstance(finding.get("metadata"), dict) else {},
3795        }))
3796    for experiment in _metadata_list(job, "experiment_ledger")[-12:]:
3797        parts.append(_json_text({
3798            "experiment": experiment.get("title") or experiment.get("name"),
3799            "hypothesis": experiment.get("hypothesis"),
3800            "status": experiment.get("status"),
3801            "metric_name": experiment.get("metric_name"),
3802            "metric_value": experiment.get("metric_value"),
3803            "metric_unit": experiment.get("metric_unit"),
3804            "result": experiment.get("result"),
3805            "next_action": experiment.get("next_action"),
3806            "config": experiment.get("config") if isinstance(experiment.get("config"), dict) else {},
3807        }))
3808    for source in _metadata_list(job, "source_ledger")[-12:]:
3809        parts.append(_json_text({
3810            "source": source.get("source") or source.get("url"),
3811            "source_type": source.get("source_type"),
3812            "outcome": source.get("outcome"),
3813            "score": source.get("score"),
3814        }))
3815    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
3816    if roadmap:
3817        parts.append(_json_text({
3818            "roadmap": roadmap.get("title"),
3819            "objective": roadmap.get("objective"),
3820            "current_milestone": roadmap.get("current_milestone"),
3821            "validation_contract": roadmap.get("validation_contract"),
3822        }))
3823    graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
3824    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
3825    for node in [item for item in nodes if isinstance(item, dict)][-20:]:
3826        parts.append(_json_text({
3827            "memory_node": node.get("key"),
3828            "kind": node.get("kind"),
3829            "title": node.get("title"),
3830            "summary": node.get("summary"),
3831        }))
3832    return "\n".join(parts)
3833
3834
3835def _concrete_evidence_tokens(text: str) -> list[str]:
3836    text = text.replace("\\n", "\n").replace("\\t", "\t")
3837    tokens: list[str] = []
3838    seen_numeric: set[str] = set()
3839    for raw in re.findall(
3840        r"(?i)\b\d+(?:\.\d+)?\s*(?:[KMGTPE]i?B|[KMGTPE]|bytes?|tok/s|t/s|tokens/sec|tokens/s|ms|sec|secs|seconds?|minutes?|hours?|%)\b",
3841        text,
3842    ):
3843        token = re.sub(r"\s+", "", raw.strip())
3844        key = token.lower()
3845        if key in seen_numeric:
3846            continue
3847        seen_numeric.add(key)
3848        tokens.append(token)
3849    for raw in re.findall(r"\b[A-Za-z][A-Za-z0-9_.+-]{1,}\b", text):
3850        token = raw.strip("._+-")
3851        if not token:
3852            continue
3853        lowered = token.lower()
3854        if lowered in EVIDENCE_TOKEN_IGNORE:
3855            continue
3856        if re.match(r"^[a-z]\d+$", token):
3857            continue
3858        if _looks_like_generated_evidence_token(token):
3859            continue
3860        if lowered.startswith("art_"):
3861            continue
3862        if lowered.startswith("step_"):
3863            continue
3864        if lowered.endswith("_output") or lowered.endswith("_stdout") or lowered.endswith("_stderr"):
3865            continue
3866        if token.isupper() and len(token) >= 3:
3867            tokens.append(token)
3868            continue
3869        if any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token):
3870            tokens.append(token)
3871            continue
3872        if token[:1].isupper() and token[1:].islower() and len(token) >= 4:
3873            tokens.append(token)
3874            continue
3875    return tokens
3876
3877
3878def _high_risk_evidence_token(token: str) -> bool:
3879    lowered = token.lower()
3880    if not token or lowered in EVIDENCE_TOKEN_IGNORE:
3881        return False
3882    if _looks_like_generated_evidence_token(token):
3883        return False
3884    if lowered.startswith(("art_", "step_")):
3885        return False
3886    if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
3887        return True
3888    if any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token):
3889        return True
3890    if token.isupper() and len(token) >= 3:
3891        return True
3892    return False
3893
3894
3895def _looks_like_generated_evidence_token(token: str) -> bool:
3896    lowered = token.lower().strip()
3897    if re.match(
3898        r"^(?:art|step|shell|web|episode|fact|source|quality|constraint|baseline|question|verified|timeout)[_-]\d+[a-z]*$",
3899        lowered,
3900    ):
3901        return True
3902    return bool(re.match(r"^(?:shell|web|browser|tool)[a-z0-9_-]*[_-]step[_-]?\d+[a-z]*$", lowered))
3903
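A short sketch of the generated-token filter above, assuming a standalone copy named `looks_generated` (a hypothetical name; the listing's function is `_looks_like_generated_evidence_token`) and invented sample tokens. It illustrates which identifiers are treated as internal bookkeeping rather than as concrete evidence.

```python
import re

# Both patterns are copied from the listing.
ID_PATTERN = (
    r"^(?:art|step|shell|web|episode|fact|source|quality|constraint|baseline|"
    r"question|verified|timeout)[_-]\d+[a-z]*$"
)
STEP_PATTERN = r"^(?:shell|web|browser|tool)[a-z0-9_-]*[_-]step[_-]?\d+[a-z]*$"


def looks_generated(token):
    """True when a token looks like an auto-generated artifact or step id."""
    lowered = token.lower().strip()
    if re.match(ID_PATTERN, lowered):
        return True
    return bool(re.match(STEP_PATTERN, lowered))


# Invented sample tokens.
for sample in ("art_12", "STEP-7b", "shell_exec_step_4", "llama3", "Qwen2.5-7B"):
    print(sample, looks_generated(sample))
# art_12 True
# STEP-7b True
# shell_exec_step_4 True
# llama3 False
# Qwen2.5-7B False
```

Tokens that pass this filter are skipped by both `_concrete_evidence_tokens` and `_high_risk_evidence_token`, so references such as `art_12` do not need to appear in fresh evidence.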
3904
3905def _step_has_evidence(step: dict[str, Any]) -> bool:
3906    tool_name = step.get("tool_name")
3907    output = step.get("output") if isinstance(step.get("output"), dict) else {}
3908    if tool_name == "web_extract":
3909        pages = output.get("pages") if isinstance(output.get("pages"), list) else []
3910        for page in pages:
3911            if not isinstance(page, dict) or page.get("error"):
3912                continue
3913            if str(page.get("text") or "").strip():
3914                return True
3915    if tool_name in {"browser_navigate", "browser_snapshot"}:
3916        data = output.get("data") if isinstance(output.get("data"), dict) else {}
3917        snapshot = str(output.get("snapshot") or data.get("snapshot") or "")
3918        if anti_bot_reason(str(data.get("title") or ""), str(data.get("url") or data.get("origin") or ""), snapshot):
3919            return False
3920        return len(snapshot.strip()) >= 500
3921    if tool_name == "shell_exec":
3922        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr"))
3923        return len(text.strip()) >= 1000
3924    return False
3925
3926
3927def _unpersisted_evidence_step(recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
3928    for step in reversed(recent_steps):
3929        if step.get("status") not in {"completed", "blocked"}:
3930            continue
3931        output = step.get("output") if isinstance(step.get("output"), dict) else {}
3932        if step.get("tool_name") == "write_artifact":
3933            return None
3934        if isinstance(output.get("auto_checkpoint"), dict):
3935            return None
3936        if step.get("status") == "completed" and _step_has_evidence(step):
3937            return step
3938    return None
3939
3940
3941def _evidence_checkpoint_accounting_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
3942    context = _auto_checkpoint_accounting_context(job, recent_steps)
3943    if not context:
3944        return "None."
3945    read_text = "already read" if context.get("checkpoint_read") else "not read yet"
3946    next_action = (
3947        "Next use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
3948        "record_milestone_validation, or record_lesson to account for it. Do not read the checkpoint again. "
3949        if context.get("checkpoint_read")
3950        else "Next either read that checkpoint artifact, or use record_findings, record_source, record_experiment, "
3951        "record_tasks, record_roadmap, record_milestone_validation, or record_lesson to account for it. "
3952    )
3953    return (
3954        "An auto-saved evidence checkpoint is waiting for durable accounting. "
3955        f"artifact={context.get('artifact_id') or '?'} title={context.get('title') or ''} "
3956        f"evidence_step={context.get('evidence_step_no') or context.get('evidence_step') or '?'} "
3957        f"blocked_tool={context.get('blocked_tool') or ''} status={read_text}. "
3958        f"{next_action}"
3959        "Do not continue shell, search, file, artifact, report, or other branch work until this is resolved."
3960    )
3961
3962
3963def _pending_evidence_checkpoint(job: dict[str, Any]) -> dict[str, Any] | None:
3964    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
3965    checkpoint = metadata.get("pending_evidence_checkpoint")
3966    if isinstance(checkpoint, dict) and checkpoint and not checkpoint.get("resolved_at"):
3967        return checkpoint
3968    return None
3969
3970
3971def _step_created_auto_checkpoint(step: dict[str, Any]) -> dict[str, Any] | None:
3972    output = step.get("output") if isinstance(step.get("output"), dict) else {}
3973    checkpoint = output.get("auto_checkpoint")
3974    if not isinstance(checkpoint, dict):
3975        return None
3976    if not checkpoint.get("artifact_id"):
3977        return None
3978    # Only auto-persisted checkpoints have a stored path. Guard-context payloads use a
3979    # different key so they cannot reset the read/accounting state.
3980    if not checkpoint.get("path"):
3981        return None
3982    return checkpoint
3983
3984
3985def _auto_checkpoint_accounting_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
3986    pending = _pending_evidence_checkpoint(job)
3987    if pending:
3988        return {
3989            "artifact_id": str(pending.get("artifact_id") or ""),
3990            "title": str(pending.get("title") or ""),
3991            "checkpoint_step_no": pending.get("checkpoint_step_no"),
3992            "evidence_step": pending.get("evidence_step"),
3993            "evidence_step_no": pending.get("evidence_step_no"),
3994            "blocked_tool": pending.get("blocked_tool"),
3995            "checkpoint_read": bool(pending.get("read_at")),
3996            "read_at": pending.get("read_at"),
3997            "created_at": pending.get("created_at"),
3998            "source": "job_metadata",
3999        }
4000    checkpoint_step = None
4001    checkpoint = None
4002    for step in reversed(recent_steps):
4003        created = _step_created_auto_checkpoint(step)
4004        if created:
4005            checkpoint_step = step
4006            checkpoint = created
4007            break
4008    if not checkpoint_step or not checkpoint:
4009        return None
4010    checkpoint_step_no = int(checkpoint_step.get("step_no") or 0)
4011    tail = [step for step in recent_steps if int(step.get("step_no") or 0) > checkpoint_step_no]
4012    if any(step.get("tool_name") in EVIDENCE_CHECKPOINT_ACCOUNTING_TOOLS for step in tail if step.get("status") == "completed"):
4013        return None
4014    artifact_id = str(checkpoint.get("artifact_id") or "")
4015    artifact_title = str(checkpoint.get("title") or "")
4016    checkpoint_read = any(
4017        step.get("tool_name") == "read_artifact"
4018        and step.get("status") == "completed"
4019        and _read_artifact_args_match_checkpoint(step, artifact_id=artifact_id, artifact_title=artifact_title)
4020        for step in tail
4021    )
4022    return {
4023        "artifact_id": artifact_id,
4024        "title": artifact_title,
4025        "checkpoint_step_no": checkpoint_step.get("step_no"),
4026        "evidence_step": checkpoint.get("evidence_step"),
4027        "blocked_tool": checkpoint.get("blocked_tool"),
4028        "checkpoint_read": checkpoint_read,
4029    }
4030
4031
4032def _evidence_checkpoint_blocks_tool(name: str, args: dict[str, Any], context: dict[str, Any] | None) -> bool:
4033    if not context:
4034        return False
4035    if name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS or name == "acknowledge_operator_context":
4036        return False
4037    if (
4038        name == "read_artifact"
4039        and not context.get("checkpoint_read")
4040        and _read_artifact_call_matches_checkpoint(
4041            args,
4042            artifact_id=str(context.get("artifact_id") or ""),
4043            artifact_title=str(context.get("title") or ""),
4044        )
4045    ):
4046        return False
4047    return True
4048
4049
4050def _evidence_checkpoint_block_guidance(context: dict[str, Any]) -> str:
4051    tools = (
4052        "record_findings, record_source, record_experiment, record_tasks, "
4053        "record_roadmap, record_milestone_validation, or record_lesson"
4054    )
4055    if context.get("checkpoint_read"):
4056        return (
4057            "The auto-saved evidence checkpoint has already been read. Do not read it again. "
4058            f"Use {tools} to account for what the checkpoint proved, rejected, changed, or blocked "
4059            "before more shell, search, file, report, artifact, or branch work."
4060        )
4061    return (
4062        "An auto-saved evidence checkpoint is waiting to be converted into durable progress. "
4063        f"Read that checkpoint artifact once, or use {tools} to account for it before more shell, "
4064        "search, file, report, artifact, or other branch work."
4065    )
4066
4067
4068def _read_artifact_args_match_checkpoint(step: dict[str, Any], *, artifact_id: str, artifact_title: str) -> bool:
4069    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4070    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4071    return _read_artifact_call_matches_checkpoint(args, artifact_id=artifact_id, artifact_title=artifact_title)
4072
4073
4074def _read_artifact_call_matches_checkpoint(args: dict[str, Any], *, artifact_id: str, artifact_title: str) -> bool:
4075    values = [str(args.get(key) or "").strip() for key in ("artifact_id", "id", "title", "query")]
4076    values = [value for value in values if value]
4077    if artifact_id and artifact_id in values:
4078        return True
4079    return bool(artifact_title and any(value == artifact_title for value in values))
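
The matching rule above can be exercised in isolation. The following is a minimal sketch: `matches_checkpoint` is a hypothetical standalone copy of the logic in `_read_artifact_call_matches_checkpoint`, and the artifact id and title values are invented for illustration.

```python
from typing import Any


def matches_checkpoint(args: dict[str, Any], *, artifact_id: str, artifact_title: str) -> bool:
    """Standalone copy of the rule: exact id match or exact title match, after stripping."""
    values = [str(args.get(key) or "").strip() for key in ("artifact_id", "id", "title", "query")]
    values = [value for value in values if value]
    if artifact_id and artifact_id in values:
        return True
    return bool(artifact_title and any(value == artifact_title for value in values))


# Invented example values, not taken from a real job.
by_id = matches_checkpoint({"artifact_id": " art_123 "}, artifact_id="art_123", artifact_title="Checkpoint 7")
by_title = matches_checkpoint({"query": "Checkpoint 7"}, artifact_id="art_123", artifact_title="Checkpoint 7")
partial = matches_checkpoint({"query": "Checkpoint"}, artifact_id="art_123", artifact_title="Checkpoint 7")
empty = matches_checkpoint({"id": ""}, artifact_id="", artifact_title="")
print(by_id, by_title, partial, empty)
```

Whitespace around the id is tolerated, but a partial title does not match, and an empty id or title never matches anything.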
4080
4081
4082def _recent_search_streak(recent_steps: list[dict[str, Any]]) -> int:
4083    return _recent_tool_streak(recent_steps, "web_search")
4084
4085
4086def _pending_measurement_obligation(job: dict[str, Any]) -> dict[str, Any] | None:
4087    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
4088    obligation = metadata.get("pending_measurement_obligation")
4089    if isinstance(obligation, dict) and obligation and not obligation.get("resolved_at"):
4090        return obligation
4091    return None
4092
4093
4094CODELIKE_FILE_SUFFIXES = {
4095    ".bash",
4096    ".cfg",
4097    ".cjs",
4098    ".conf",
4099    ".cpp",
4100    ".css",
4101    ".go",
4102    ".h",
4103    ".hpp",
4104    ".ini",
4105    ".java",
4106    ".js",
4107    ".json",
4108    ".jsx",
4109    ".lua",
4110    ".mjs",
4111    ".php",
4112    ".pl",
4113    ".py",
4114    ".rb",
4115    ".rs",
4116    ".sh",
4117    ".sql",
4118    ".toml",
4119    ".ts",
4120    ".tsx",
4121    ".yaml",
4122    ".yml",
4123    ".zsh",
4124}
4125
4126
4127def _pending_file_validation_obligation(job: dict[str, Any]) -> dict[str, Any] | None:
4128    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
4129    obligation = metadata.get("pending_file_validation_obligation")
4130    if isinstance(obligation, dict) and obligation and not obligation.get("resolved_at"):
4131        return obligation
4132    return None
4133
4134
4135def _file_output_needs_validation(path: str, content: str) -> bool:
4136    suffix = Path(path).suffix.lower()
4137    if suffix in CODELIKE_FILE_SUFFIXES:
4138        return True
4139    first_line = content.lstrip().splitlines()[0] if content.strip() else ""
4140    if first_line.startswith("#!"):
4141        return True
4142    lowered = Path(path).name.lower()
4143    return lowered in {"dockerfile", "makefile", "justfile", "procfile"}
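
The three checks in `_file_output_needs_validation` (suffix, shebang, well-known file name) can be tried on a few paths. This is a sketch only: `needs_validation` is a hypothetical copy of the logic, and `SUFFIXES` is a trimmed, illustrative subset of `CODELIKE_FILE_SUFFIXES`.

```python
from pathlib import Path

# Trimmed, illustrative subset of CODELIKE_FILE_SUFFIXES from the listing.
SUFFIXES = {".py", ".sh", ".json", ".yaml", ".yml", ".toml"}
SPECIAL_NAMES = {"dockerfile", "makefile", "justfile", "procfile"}


def needs_validation(path: str, content: str) -> bool:
    """Return True when the file looks like code, config, or a script."""
    if Path(path).suffix.lower() in SUFFIXES:
        return True
    # A shebang on the first non-blank line marks a script even without a suffix.
    first_line = content.lstrip().splitlines()[0] if content.strip() else ""
    if first_line.startswith("#!"):
        return True
    return Path(path).name.lower() in SPECIAL_NAMES


checks = {
    "tool.PY": needs_validation("tool.PY", "print('hi')\n"),
    "bin/run": needs_validation("bin/run", "\n#!/usr/bin/env bash\necho hi\n"),
    "build/Dockerfile": needs_validation("build/Dockerfile", "FROM python:3.12\n"),
    "notes.md": needs_validation("notes.md", "# Notes\n"),
}
print(checks)
```

The suffix comparison is case-insensitive, leading blank lines before a shebang are ignored, and a Markdown heading that starts with `# ` is not mistaken for a shebang.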
4144
4145
4146def _suggested_file_validation(path: str) -> str:
4147    suffix = Path(path).suffix.lower()
4148    quoted = shlex_quote(path)
4149    if suffix == ".py":
4150        return f"python3 -m py_compile {quoted}"
4151    if suffix in {".sh", ".bash", ".zsh"}:
4152        return f"bash -n {quoted}"
4153    if suffix == ".json":
4154        return f"python3 -m json.tool {quoted}"
4155    if suffix in {".yaml", ".yml"}:
4156        return f"python3 - <<'PY'\nimport pathlib, yaml\nyaml.safe_load(pathlib.Path({path!r}).read_text())\nPY"
4157    return f"run the narrowest available syntax check, test, or dry-run for {quoted}"
4158
4159
4160def shlex_quote(value: str) -> str:
4161    return "'" + str(value).replace("'", "'\"'\"'") + "'"
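
As a sanity check on the quoting expression above, the sketch below round-trips a few invented paths through the standard library's `shlex.split`. `always_quote` is a hypothetical name for a copy of the same expression. The only difference from `shlex.quote` that this sketch relies on is that `shlex.quote` leaves already-safe strings unquoted.

```python
import shlex


def always_quote(value: str) -> str:
    """Same expression as shlex_quote in the listing: wrap in single quotes, escape embedded ones."""
    return "'" + str(value).replace("'", "'\"'\"'") + "'"


# Invented sample paths covering spaces, an embedded single quote, and the empty string.
samples = ["plain.py", "dir with space/run.sh", "it's.json", ""]
round_trips = [shlex.split(always_quote(sample)) == [sample] for sample in samples]
same_as_stdlib = shlex.quote("it's.json") == always_quote("it's.json")
print(round_trips, same_as_stdlib)
```

Every sample survives the round trip, so the quoted form is safe to embed in a suggested shell command.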
4162
4163
4164def _clear_invalid_measurement_obligation(db: AgentDB, job_id: str) -> bool:
4165    job = db.get_job(job_id)
4166    obligation = _pending_measurement_obligation(job)
4167    if not obligation:
4168        return False
4169    candidates = obligation.get("metric_candidates") if isinstance(obligation.get("metric_candidates"), list) else []
4170    if not candidates:
4171        return False
4172    command = str(obligation.get("command") or "")
4173    if not measurement_candidates_are_diagnostic_only(candidates, command=command):
4174        return False
4175    db.update_job_metadata(job_id, {"pending_measurement_obligation": {}})
4176    db.append_agent_update(
4177        job_id,
4178        "Cleared measurement obligation because the output was diagnostic context, not a trial result.",
4179        category="progress",
4180        metadata={"cleared_measurement_obligation": obligation},
4181    )
4182    return True
4183
4184
4185def _progress_churn_context(recent_steps: list[dict[str, Any]], *, window: int = 10) -> dict[str, Any] | None:
4186    completed = [step for step in recent_steps if step.get("status") == "completed"]
4187    tail = completed[-window:]
4188    if len(tail) < 8:
4189        return None
4190    if any(step.get("tool_name") in LEDGER_PROGRESS_TOOLS for step in tail):
4191        return None
4192    churn_count = sum(1 for step in tail if step.get("tool_name") in CHURN_TOOLS)
4193    if churn_count < 7:
4194        return None
4195    return {
4196        "window": len(tail),
4197        "churn_count": churn_count,
4198        "since_step": tail[0].get("step_no"),
4199        "tools": [step.get("tool_name") or step.get("kind") for step in tail],
4200    }
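
The churn rule above fires only when the recent window holds at least 8 completed steps, none of them a ledger tool, and at least 7 of them churn tools. A minimal sketch follows. The contents of `CHURN_TOOLS` and `LEDGER_PROGRESS_TOOLS` here are assumptions, since the real sets are defined elsewhere in the module, and `churn_context` and `make_steps` are hypothetical names.

```python
from typing import Any, Optional

# Hypothetical tool sets: the real constants are defined outside this listing.
CHURN_TOOLS = {"web_search", "shell_exec", "read_artifact"}
LEDGER_PROGRESS_TOOLS = {"record_findings", "record_tasks"}


def churn_context(recent_steps: list[dict[str, Any]], *, window: int = 10) -> Optional[dict[str, Any]]:
    """Return a small summary when the recent window is almost entirely churn."""
    completed = [step for step in recent_steps if step.get("status") == "completed"]
    tail = completed[-window:]
    if len(tail) < 8:
        return None
    if any(step.get("tool_name") in LEDGER_PROGRESS_TOOLS for step in tail):
        return None
    churn_count = sum(1 for step in tail if step.get("tool_name") in CHURN_TOOLS)
    if churn_count < 7:
        return None
    return {"window": len(tail), "churn_count": churn_count, "since_step": tail[0].get("step_no")}


def make_steps(tools: list[str]) -> list[dict[str, Any]]:
    """Build completed steps numbered from 1 for the given tool names."""
    return [{"step_no": i, "status": "completed", "tool_name": tool} for i, tool in enumerate(tools, start=1)]


stuck = churn_context(make_steps(["web_search"] * 9))
recovered = churn_context(make_steps(["web_search"] * 8 + ["record_findings"]))
too_short = churn_context(make_steps(["web_search"] * 7))
print(stuck, recovered, too_short)
```

A single ledger write anywhere in the window is enough to clear the signal, which is why `recovered` is `None`.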
4201
4202
4203def _activity_stagnation_context(job: dict[str, Any]) -> dict[str, Any] | None:
4204    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
4205    streak = _as_int(metadata.get("activity_checkpoint_streak"))
4206    if streak < ACTIVITY_STAGNATION_CHECKPOINTS:
4207        return None
4208    counts = metadata.get("last_checkpoint_counts") if isinstance(metadata.get("last_checkpoint_counts"), dict) else {}
4209    return {
4210        "streak": streak,
4211        "threshold": ACTIVITY_STAGNATION_CHECKPOINTS,
4212        "counts": {key: _as_int(counts.get(key)) for key in ("findings", "sources", "tasks", "experiments", "lessons", "milestones")},
4213    }
4214
4215
4216def _research_balance_context(
4217    job: dict[str, Any],
4218    recent_steps: list[dict[str, Any]],
4219    *,
4220    window: int = 28,
4221    min_execution_actions: int = 5,
4222) -> dict[str, Any] | None:
4223    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
4224    sources = len(_metadata_list(job, "source_ledger"))
4225    findings = len(_metadata_list(job, "finding_ledger"))
4226    experiments = len(_metadata_list(job, "experiment_ledger"))
4227    if sources > 0 or findings > 0:
4228        return None
4229    if metadata.get("pending_measurement_obligation"):
4230        return None
4231    completed = [step for step in recent_steps if step.get("status") == "completed"]
4232    if not completed:
4233        return None
4234    tail = completed[-window:]
4235    execution_tools = {"shell_exec", "write_file", "record_experiment", "write_artifact"}
4236    research_tools = {
4237        "web_search",
4238        "web_extract",
4239        "browser_navigate",
4240        "browser_snapshot",
4241        "browser_click",
4242        "record_source",
4243        "record_findings",
4244    }
4245    execution_actions = [step for step in tail if step.get("tool_name") in execution_tools]
4246    research_actions = [step for step in tail if step.get("tool_name") in research_tools]
4247    file_actions = [step for step in tail if step.get("tool_name") in {"write_file", "shell_exec"}]
4248    tasks = _metadata_list(job, "task_queue")
4249    active_research_tasks = [
4250        task
4251        for task in tasks
4252        if str(task.get("status") or "open") in {"open", "active", "blocked"}
4253        and str(task.get("output_contract") or "") == "research"
4254    ]
4255    has_research_intent = bool(active_research_tasks) or any(
4256        "research" in str(job.get(key) or "").lower()
4257        for key in ("title", "objective", "kind")
4258    )
4259    if len(execution_actions) < min_execution_actions and not (has_research_intent and len(file_actions) >= 3):
4260        return None
4261    if research_actions and not (has_research_intent and len(execution_actions) >= min_execution_actions * 2):
4262        return None
4263    if experiments <= 0 and len(execution_actions) < min_execution_actions + 2:
4264        return None
4265    return {
4266        "completed_window": len(tail),
4267        "execution_actions": len(execution_actions),
4268        "research_actions": len(research_actions),
4269        "sources": sources,
4270        "findings": findings,
4271        "experiments": experiments,
4272        "files": len(file_actions),
4273    }
4274
4275
4276def _source_yield_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
4277    sources = _metadata_list(job, "source_ledger")
4278    source_count = len(sources)
4279    if source_count < SOURCE_YIELD_MIN_SOURCES:
4280        return None
4281    findings = _metadata_list(job, "finding_ledger")
4282    yielded_sources = [
4283        source
4284        for source in sources
4285        if _as_int(source.get("yield_count")) > 0
4286        or _as_float(source.get("usefulness_score")) >= 0.8
4287    ]
4288    required_yield = max(2, source_count // 8)  # floor of 2, then one yielding item per 8 sources
4289    if len(findings) + len(yielded_sources) >= required_yield:
4290        return None
4291    completed = [step for step in recent_steps if step.get("status") == "completed"]
4292    last_synthesis_no = 0
4293    for step in completed:
4294        if step.get("tool_name") in {
4295            "record_findings",
4296            "record_source",
4297            "record_tasks",
4298            "record_roadmap",
4299            "record_milestone_validation",
4300            "record_lesson",
4301        }:
4302            last_synthesis_no = max(last_synthesis_no, _as_int(step.get("step_no")))
4303    gathering_after_synthesis = [
4304        step
4305        for step in completed
4306        if _as_int(step.get("step_no")) > last_synthesis_no
4307        and step.get("tool_name") in {
4308            "web_search",
4309            "web_extract",
4310            "browser_navigate",
4311            "browser_snapshot",
4312            "browser_click",
4313            "browser_scroll",
4314        }
4315    ]
4316    recent_gathering = gathering_after_synthesis[-24:]
4317    if len(recent_gathering) < SOURCE_YIELD_MIN_RECENT_GATHERING:
4318        return None
4319    recent_source_titles = [
4320        str(source.get("source") or source.get("title") or "").strip()
4321        for source in sources[-8:]
4322        if str(source.get("source") or source.get("title") or "").strip()
4323    ]
4324    return {
4325        "sources": source_count,
4326        "findings": len(findings),
4327        "yielded_sources": len(yielded_sources),
4328        "required_yield": required_yield,
4329        "recent_gathering": len(recent_gathering),
4330        "since_step": recent_gathering[0].get("step_no") if recent_gathering else None,
4331        "recent_source_titles": recent_source_titles,
4332    }
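
The yield threshold in `_source_yield_context` is plain integer arithmetic, `max(2, source_count // 8)`, compared against findings plus yielding sources. The sketch below tabulates it for a few invented source counts. `required_yield` and `yield_is_low` are hypothetical helper names, and the value of `SOURCE_YIELD_MIN_SOURCES` is not shown in this listing, so it is left out.

```python
def required_yield(source_count: int) -> int:
    """Minimum number of findings plus yielding sources expected for a ledger of this size."""
    return max(2, source_count // 8)


def yield_is_low(*, source_count: int, findings: int, yielded_sources: int) -> bool:
    """True when the combined yield falls short of the threshold."""
    return findings + yielded_sources < required_yield(source_count)


# Invented source counts, chosen to show the floor of 2 and the one-per-eight growth.
thresholds = {count: required_yield(count) for count in (8, 16, 24, 40)}
short = yield_is_low(source_count=40, findings=2, yielded_sources=2)
enough = yield_is_low(source_count=40, findings=3, yielded_sources=2)
print(thresholds, short, enough)
```

With 40 recorded sources the threshold is 5, so four yielding items still count as low yield while five do not.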
4333
4334
4335def _artifact_accounting_context(
4336    recent_steps: list[dict[str, Any]],
4337    *,
4338    threshold: int = 3,
4339    window: int = 12,
4340) -> dict[str, Any] | None:
4341    completed = [step for step in recent_steps if step.get("status") == "completed"]
4342    tail: list[dict[str, Any]] = []
4343    for step in reversed(completed[-window:]):
4344        if step.get("tool_name") in LEDGER_PROGRESS_TOOLS:
4345            break
4346        tail.append(step)
4347    tail.reverse()
4348    artifact_steps = [step for step in tail if step.get("tool_name") == "write_artifact"]
4349    if len(artifact_steps) < threshold:
4350        return None
4351    titles = []
4352    for step in artifact_steps[-5:]:
4353        input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4354        args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4355        title = str(args.get("title") or step.get("summary") or f"step #{step.get('step_no')}")
4356        titles.append(_clip_text(title, 120))
4357    return {
4358        "artifact_count": len(artifact_steps),
4359        "since_step": tail[0].get("step_no") if tail else None,
4360        "artifact_steps": [step.get("step_no") for step in artifact_steps],
4361        "artifact_titles": titles,
4362        "tools": [step.get("tool_name") or step.get("kind") for step in tail],
4363    }
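
`_artifact_accounting_context` walks backward from the newest completed step and stops at the first ledger tool, so only artifacts written after the last ledger update are counted. A reduced sketch over bare tool names follows. `unaccounted_artifacts` is a hypothetical name and the contents of `LEDGER_PROGRESS_TOOLS` are assumed.

```python
from typing import Optional

# Hypothetical contents: the real LEDGER_PROGRESS_TOOLS is defined outside this listing.
LEDGER_PROGRESS_TOOLS = {"record_findings", "record_tasks"}


def unaccounted_artifacts(tool_names: list[str], *, threshold: int = 3, window: int = 12) -> Optional[int]:
    """Count write_artifact calls made since the most recent ledger tool in the window."""
    tail: list[str] = []
    for name in reversed(tool_names[-window:]):
        if name in LEDGER_PROGRESS_TOOLS:
            break  # everything older than this step is already accounted for
        tail.append(name)
    tail.reverse()
    count = sum(1 for name in tail if name == "write_artifact")
    return count if count >= threshold else None


accounted = unaccounted_artifacts(
    ["write_artifact", "record_findings", "write_artifact", "web_search", "write_artifact"]
)
piling_up = unaccounted_artifacts(
    ["record_findings", "write_artifact", "write_artifact", "shell_exec", "write_artifact"]
)
print(accounted, piling_up)
```

In the first sequence one artifact predates the ledger update, leaving two unaccounted, which is below the default threshold of 3.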
4364
4365
4366def _job_requires_measured_progress(job: dict[str, Any]) -> bool:
4367    text_parts = [
4368        str(job.get("title") or ""),
4369        str(job.get("objective") or ""),
4370        str(job.get("kind") or ""),
4371    ]
4372    tasks = _metadata_list(job, "task_queue")
4373    for task in tasks:
4374        status = str(task.get("status") or "open")
4375        if status in {"done", "skipped"}:
4376            continue
4377        contract = str(task.get("output_contract") or "")
4378        if contract in {"experiment", "monitor"}:
4379            return True
4380        if contract == "action" and _task_text_requires_measurement(task):
4381            return True
4382        text_parts.extend(
4383            str(task.get(key) or "")
4384            for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "stall_behavior")
4385        )
4386    return any(MEASURABLE_PROGRESS_PATTERN.search(part) for part in text_parts if part)
4387
4388
4389def _task_text_requires_measurement(task: dict[str, Any]) -> bool:
4390    return any(
4391        MEASURABLE_PROGRESS_PATTERN.search(str(task.get(key) or ""))
4392        for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "stall_behavior")
4393    )
4394
4395
4396def _job_requires_deliverable_progress(job: dict[str, Any]) -> bool:
4397    tasks = _metadata_list(job, "task_queue")
4398    report_tasks: list[dict[str, Any]] = []
4399    competing_execution_tasks: list[dict[str, Any]] = []
4400    for task in tasks:
4401        status = str(task.get("status") or "open").strip().lower()
4402        if status in {"done", "skipped"}:
4403            continue
4404        contract = str(task.get("output_contract") or "").strip().lower()
4405        if contract == "report":
4406            report_tasks.append(task)
4407        elif contract in {"action", "experiment", "monitor"}:
4408            competing_execution_tasks.append(task)
4409    if report_tasks:
4410        active_report = any(str(task.get("status") or "open").strip().lower() == "active" for task in report_tasks)
4411        active_competing = any(
4412            str(task.get("status") or "open").strip().lower() == "active"
4413            for task in competing_execution_tasks
4414        )
4415        max_report_priority = max(_as_int(task.get("priority")) for task in report_tasks)
4416        higher_priority_competing = any(
4417            _as_int(task.get("priority")) >= max_report_priority  # a tie in priority also counts as competing
4418            for task in competing_execution_tasks
4419        )
4420        if active_report or (not active_competing and not higher_priority_competing):
4421            return True
4422    text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")).lower()
4423    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", text))
4424    objective_terms = DELIVERABLE_ARTIFACT_TERMS - {"compiled", "final", "revision", "section", "updated"}
4425    return bool(tokens & objective_terms)
4426
4427
4428def _step_is_deliverable_checkpoint(step: dict[str, Any]) -> bool:
4429    tool = step.get("tool_name")
4430    if tool == "write_file":
4431        return True
4432    if tool != "write_artifact":
4433        return False
4434    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4435    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4436    text = " ".join(
4437        str(value or "")
4438        for value in (
4439            args.get("title"),
4440            args.get("summary"),
4441            args.get("artifact_type"),
4442            step.get("summary"),
4443        )
4444    ).lower()
4445    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", text))
4446    if tokens & EVIDENCE_ARTIFACT_TERMS:
4447        return False
4448    return bool(tokens & DELIVERABLE_ARTIFACT_TERMS)
4449
4450
4451def _deliverable_progress_guard_context(
4452    job: dict[str, Any],
4453    recent_steps: list[dict[str, Any]],
4454    *,
4455    budget: int = DELIVERABLE_RESEARCH_BUDGET_STEPS,
4456) -> dict[str, Any] | None:
4457    if not _job_requires_deliverable_progress(job):
4458        return None
4459    completed = [step for step in recent_steps if step.get("status") == "completed"]
4460    if not completed:
4461        return None
4462    last_checkpoint_index = -1
4463    for index, step in enumerate(completed):
4464        if _step_is_deliverable_checkpoint(step):
4465            last_checkpoint_index = index
4466    tail = completed[last_checkpoint_index + 1 :]
4467    branch_activity = [
4468        step
4469        for step in tail
4470        if step.get("tool_name") in BRANCH_WORK_TOOLS
4471        or (
4472            step.get("tool_name") == "shell_exec"
4473            and _shell_command_looks_read_only(_step_command(step))
4474        )
4475    ]
4476    if len(branch_activity) < budget:
4477        return None
4478    deliverable_accounting_tools = {"record_tasks", "record_roadmap", "record_milestone_validation", "record_lesson"}
4479    if any(step.get("tool_name") in deliverable_accounting_tools for step in tail[-6:]):
4480        return None
4481    return {
4482        "reason": "no deliverable checkpoint yet" if last_checkpoint_index < 0 else "no recent deliverable checkpoint",
4483        "research_budget": budget,
4484        "completed_since_last_deliverable": len(tail),
4485        "branch_activity": len(branch_activity),
4486        "since_step": branch_activity[0].get("step_no") if branch_activity else None,
4487        "tools": [step.get("tool_name") or step.get("kind") for step in branch_activity[-10:]],
4488    }
4489
4490
4491def _step_command(step: dict[str, Any]) -> str:
4492    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4493    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4494    return str(args.get("command") or "")
4495
4496
4497def _read_only_shell_churn_context(recent_steps: list[dict[str, Any]], *, window: int = 10, threshold: int = 3) -> dict[str, Any] | None:
4498    completed = [step for step in recent_steps if step.get("status") == "completed"]
4499    if not completed:
4500        return None
4501    tail = completed[-window:]
4502    read_only_shell = [
4503        step
4504        for step in tail
4505        if step.get("tool_name") == "shell_exec" and _shell_command_looks_read_only(_step_command(step))
4506    ]
4507    if len(read_only_shell) < threshold:
4508        return None
4509    action_steps = [
4510        step
4511        for step in tail
4512        if step.get("tool_name") in {"write_file", "write_artifact", "defer_job"}
4513        or step.get("tool_name") in {
4514            "record_experiment",
4515            "record_findings",
4516            "record_lesson",
4517            "record_milestone_validation",
4518            "record_roadmap",
4519            "record_source",
4520            "record_tasks",
4521            "report_update",
4522        }
4523        or (step.get("tool_name") == "shell_exec" and not _shell_command_looks_read_only(_step_command(step)))
4524    ]
4525    if action_steps:
4526        return None
4527    return {
4528        "read_only_shell_count": len(read_only_shell),
4529        "threshold": threshold,
4530        "window": len(tail),
4531        "since_step": read_only_shell[0].get("step_no"),
4532        "commands": [_clip_text(_step_command(step), 140) for step in read_only_shell[-5:]],
4533    }
4534
4535
4536def _experiment_metric_group_key(experiment: dict[str, Any]) -> tuple[str, str, bool] | None:
4537    metric_name = str(experiment.get("metric_name") or "").strip().lower()
4538    if not metric_name:
4539        return None
4540    if experiment.get("metric_value") is None:
4541        return None
4542    return (
4543        metric_name,
4544        str(experiment.get("metric_unit") or "").strip().lower(),
4545        bool(experiment.get("higher_is_better", True)),
4546    )
4547
4548
4549def _experiment_metric_number(experiment: dict[str, Any]) -> float | None:
4550    try:
4551        return float(experiment.get("metric_value"))
4552    except (TypeError, ValueError):
4553        return None
4554
4555
4556def _experiment_value_improves(*, value: float, best_value: float, higher_is_better: bool) -> bool:
4557    return value > best_value if higher_is_better else value < best_value
4558
4559
4560def _experiment_stagnation_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
4561    if not _job_requires_measured_progress(job):
4562        return None
4563    experiments = [
4564        experiment
4565        for experiment in _metadata_list(job, "experiment_ledger")
4566        if str(experiment.get("status") or "").lower() == "measured"
4567        and _experiment_metric_group_key(experiment) is not None
4568    ]
4569    if len(experiments) < EXPERIMENT_STAGNATION_MIN_TRIALS:
4570        return None
4571    latest = experiments[-1]
4572    key = _experiment_metric_group_key(latest)
4573    if key is None:
4574        return None
4575    group = [experiment for experiment in experiments if _experiment_metric_group_key(experiment) == key]
4576    if len(group) < EXPERIMENT_STAGNATION_MIN_TRIALS:
4577        return None
4578    higher_is_better = bool(latest.get("higher_is_better", True))
4579    best_index = 0
4580    best_value = _experiment_metric_number(group[0])
4581    for index, experiment in enumerate(group[1:], start=1):
4582        value = _experiment_metric_number(experiment)
4583        if value is None:
4584            continue
4585        if best_value is None or _experiment_value_improves(
4586            value=value,
4587            best_value=best_value,
4588            higher_is_better=higher_is_better,
4589        ):
4590            best_index = index
4591            best_value = value
4592    if best_value is None:
4593        return None
4594    non_improving = group[best_index + 1:]
4595    if len(non_improving) < EXPERIMENT_STAGNATION_NON_IMPROVING:
4596        return None
4597    last_experiment_step_no = 0
4598    for step in recent_steps:
4599        if step.get("tool_name") == "record_experiment" and str(step.get("status") or "").lower() == "completed":
4600            last_experiment_step_no = max(last_experiment_step_no, _as_int(step.get("step_no")))
4601    if last_experiment_step_no > 0:
4602        decision_tools = {"record_lesson", "record_tasks", "record_roadmap", "record_milestone_validation"}
4603        if any(
4604            _as_int(step.get("step_no")) > last_experiment_step_no
4605            and str(step.get("status") or "").lower() == "completed"
4606            and step.get("tool_name") in decision_tools
4607            for step in recent_steps
4608        ):
4609            return None
4610    best = group[best_index]
4611    return {
4612        "metric_name": latest.get("metric_name"),
4613        "metric_unit": latest.get("metric_unit"),
4614        "higher_is_better": higher_is_better,
4615        "best_title": best.get("title"),
4616        "best_value": best.get("metric_value"),
4617        "latest_title": latest.get("title"),
4618        "latest_value": latest.get("metric_value"),
4619        "non_improving_count": len(non_improving),
4620        "recent_trials": len(group),
4621        "recent_titles": [str(experiment.get("title") or "") for experiment in non_improving[-5:]],
4622    }
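
The core of `_experiment_stagnation_context` is a best-so-far scan: find the index of the best measured value, then treat every later trial as non-improving. The sketch below isolates that scan over a plain list of numbers. `non_improving_tail` is a hypothetical name, the metric values are invented, and the thresholds `EXPERIMENT_STAGNATION_MIN_TRIALS` and `EXPERIMENT_STAGNATION_NON_IMPROVING` are not reproduced because their values are not shown here.

```python
from typing import Optional


def non_improving_tail(values: list[Optional[float]], *, higher_is_better: bool = True) -> list[Optional[float]]:
    """Return the trials recorded after the best value seen so far."""
    best_index = 0
    best_value = values[0] if values else None
    for index, value in enumerate(values[1:], start=1):
        if value is None:
            continue
        # Strict comparison: a tie with the current best does not move the best index.
        improves = best_value is None or (value > best_value if higher_is_better else value < best_value)
        if improves:
            best_index, best_value = index, value
    if best_value is None:
        return []
    return values[best_index + 1:]


# Invented metric series.
accuracy_tail = non_improving_tail([0.61, 0.72, 0.70, 0.72, 0.69])
latency_tail = non_improving_tail([120.0, 95.0, 97.0, 99.0], higher_is_better=False)
steady_gain = non_improving_tail([1.0, 2.0, 3.0])
print(accuracy_tail, latency_tail, steady_gain)
```

Because the comparison is strict, a trial that merely ties the best value stays in the non-improving tail, as the second 0.72 does above.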
4623
4624
4625def _measured_progress_guard_context(
4626    job: dict[str, Any],
4627    recent_steps: list[dict[str, Any]],
4628    *,
4629    budget: int = MEASURABLE_RESEARCH_BUDGET_STEPS,
4630) -> dict[str, Any] | None:
4631    if not _job_requires_measured_progress(job):
4632        return None
4633    if _pending_measurement_obligation(job):
4634        return None
4635    completed = [step for step in recent_steps if step.get("status") == "completed"]
4636    if not completed:
4637        return None
4638    last_experiment_index = -1
4639    for index, step in enumerate(completed):
4640        if step.get("tool_name") == "record_experiment":
4641            last_experiment_index = index
4642    tail = completed[last_experiment_index + 1 :]
4643    branch_activity = [step for step in tail if step.get("tool_name") in BRANCH_WORK_TOOLS | {"write_artifact"}]
4644    shell_actions = [step for step in tail if step.get("tool_name") == "shell_exec"]
4645    if len(branch_activity) < budget and len(shell_actions) < MEASURABLE_ACTION_BUDGET_STEPS:
4646        return None
4647    if any(_step_accounts_for_measured_progress_guard(step) for step in tail[-6:]):
4648        return None
4649    experiments = _metadata_list(job, "experiment_ledger")
4650    reason = "no experiment records yet" if not experiments else "no recent experiment update"
4651    return {
4652        "reason": reason,
4653        "research_budget": budget,
4654        "shell_action_budget": MEASURABLE_ACTION_BUDGET_STEPS,
4655        "completed_since_last_experiment": len(tail),
4656        "branch_activity": len(branch_activity),
4657        "shell_actions_since_last_experiment": len(shell_actions),
4658        "since_step": branch_activity[0].get("step_no") if branch_activity else None,
4659        "tools": [step.get("tool_name") or step.get("kind") for step in branch_activity[-10:]],
4660    }
4661
4662
4663def _step_accounts_for_measured_progress_guard(step: dict[str, Any]) -> bool:
4664    tool_name = step.get("tool_name")
4665    if tool_name == "record_lesson":
4666        return True
4667    if tool_name != "record_tasks":
4668        return False
4669    output = step.get("output") if isinstance(step.get("output"), dict) else {}
4670    tasks = output.get("tasks") if isinstance(output.get("tasks"), list) else []
4671    for task in tasks:
4672        if not isinstance(task, dict):
4673            continue
4674        status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
4675        if status in {"done", "skipped"}:
4676            continue
4677        contract = str(task.get("output_contract") or "").strip().lower().replace(" ", "_")
4678        if contract in {"experiment", "monitor"}:
4679            return True
4680        if contract == "action" and _task_text_requires_measurement(task):
4681            return True
4682    return False
4683
4684
4685def _maybe_create_measurement_obligation(
4686    *,
4687    db: AgentDB,
4688    job_id: str,
4689    step: dict[str, Any] | None,
4690    tool_name: str,
4691    args: dict[str, Any],
4692    result: dict[str, Any],
4693) -> None:
4694    if tool_name != "shell_exec":
4695        return
4696    command = str(args.get("command") or result.get("command") or "")
4697    candidates = measurement_candidates(result, command=command)
4698    if not candidates:
4699        return
4700    metadata = db.get_job(job_id).get("metadata")
4701    if isinstance(metadata, dict):
4702        existing = metadata.get("pending_measurement_obligation")
4703        if isinstance(existing, dict) and existing and not existing.get("resolved_at"):
4704            return
4705    obligation = {
4706        "created_at": datetime.now(timezone.utc).isoformat(),
4707        "source_step_id": step.get("id") if step else "",
4708        "source_step_no": step.get("step_no") if step else None,
4709        "tool": tool_name,
4710        "summary": "Tool output contains measurable-looking results that need experiment accounting.",
4711        "metric_candidates": candidates,
4712        "command": command[:1000],
4713    }
4714    db.update_job_metadata(job_id, {"pending_measurement_obligation": obligation})
4715    db.append_agent_update(
4716        job_id,
4717        f"Measured output needs accounting: {', '.join(candidates[:3])}.",
4718        category="blocked",
4719        metadata={"pending_measurement_obligation": obligation},
4720    )
4721
4722
4723def _maybe_create_file_validation_obligation(
4724    *,
4725    db: AgentDB,
4726    job_id: str,
4727    step: dict[str, Any] | None,
4728    args: dict[str, Any],
4729    result: dict[str, Any],
4730) -> None:
4731    path = str(result.get("path") or args.get("path") or "").strip()
4732    content = str(args.get("content") or "")
4733    if not path or not _file_output_needs_validation(path, content):
4734        return
4735    metadata = db.get_job(job_id).get("metadata")
4736    if isinstance(metadata, dict):
4737        existing = metadata.get("pending_file_validation_obligation")
4738        if isinstance(existing, dict) and existing and not existing.get("resolved_at"):
4739            return
4740    obligation = {
4741        "created_at": datetime.now(timezone.utc).isoformat(),
4742        "source_step_id": step.get("id") if step else "",
4743        "source_step_no": step.get("step_no") if step else None,
4744        "tool": "write_file",
4745        "path": path,
4746        "reason": "code/config/script-like file was written and needs validation before more branch work",
4747        "suggested_validation": _suggested_file_validation(path),
4748    }
4749    db.update_job_metadata(job_id, {"pending_file_validation_obligation": obligation})
4750    db.append_agent_update(
4751        job_id,
4752        f"File output needs validation: {path}",
4753        category="blocked",
4754        metadata={"pending_file_validation_obligation": obligation},
4755    )
4756
4757
4758def _command_references_path(command: str, path: str) -> bool:
4759    if not command or not path:
4760        return False
4761    path_obj = Path(path)
4762    needles = {str(path_obj), path_obj.name}
4763    try:
4764        needles.add(str(path_obj.expanduser().resolve()))
4765    except OSError:
4766        pass
4767    return any(needle and needle in command for needle in needles)
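
`_command_references_path` is a substring test against the path as written, its base name, and its resolved absolute form. The sketch below copies that logic under the hypothetical name `command_references_path` and runs it on invented commands, including one where the base-name match is looser than it might first appear.

```python
from pathlib import Path


def command_references_path(command: str, path: str) -> bool:
    """Substring test against the path as given, its base name, and its resolved form."""
    if not command or not path:
        return False
    path_obj = Path(path)
    needles = {str(path_obj), path_obj.name}
    try:
        needles.add(str(path_obj.expanduser().resolve()))
    except OSError:
        pass
    return any(needle and needle in command for needle in needles)


# Invented commands and path.
direct = command_references_path("python3 -m py_compile scripts/run.py", "scripts/run.py")
lookalike = command_references_path("pytest tests/test_run.py", "scripts/run.py")
unrelated = command_references_path("ls -la", "scripts/run.py")
print(direct, lookalike, unrelated)
```

The `lookalike` case returns `True` because `run.py` is a substring of `test_run.py`. The check is therefore permissive: any command that mentions a file name ending in the same base name is treated as referencing the path.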
4768
4769
4770def _resolve_file_validation_obligation(
4771    db: AgentDB,
4772    job_id: str,
4773    *,
4774    status: str,
4775    reason: str,
4776    via_tool: str,
4777    result: dict[str, Any] | None = None,
4778) -> None:
4779    job = db.get_job(job_id)
4780    obligation = _pending_file_validation_obligation(job)
4781    if not obligation:
4782        return
4783    resolved = dict(obligation)
4784    resolved.update({
4785        "resolved_at": datetime.now(timezone.utc).isoformat(),
4786        "resolution_status": status,
4787        "resolution_reason": reason[:1000],
4788        "resolution_tool": via_tool,
4789    })
4790    if result:
4791        resolved["validation_result"] = {
4792            key: result.get(key)
4793            for key in ("success", "returncode", "error", "summary")
4794            if key in result
4795        }
4796    db.update_job_metadata(
4797        job_id,
4798        {
4799            "pending_file_validation_obligation": {},
4800            "last_file_validation_obligation": resolved,
4801        },
4802    )
4803    db.append_agent_update(
4804        job_id,
4805        f"File validation {status}: {reason[:220]}",
4806        category="progress" if status == "validated" else "blocked",
4807        metadata={"file_validation_obligation": resolved},
4808    )
4809
4810
4811def _maybe_resolve_file_validation_obligation(
4812    *,
4813    db: AgentDB,
4814    job_id: str,
4815    tool_name: str,
4816    args: dict[str, Any],
4817    result: dict[str, Any],
4818    ok: bool,
4819) -> None:
4820    obligation = _pending_file_validation_obligation(db.get_job(job_id))
4821    if not obligation:
4822        return
4823    if tool_name == "shell_exec":
4824        command = str(args.get("command") or result.get("command") or "")
4825        path = str(obligation.get("path") or "")
4826        if not _command_references_path(command, path):
4827            return
4828        status = "validated" if ok else "failed"
4829        reason = "Validation command completed." if ok else f"Validation command failed: {result.get('error') or 'non-zero result'}"
4830        _resolve_file_validation_obligation(db, job_id, status=status, reason=reason, via_tool=tool_name, result=result)
4831        return
4832    if ok and tool_name in {"record_lesson", "record_tasks", "record_experiment", "record_milestone_validation"}:
4833        _resolve_file_validation_obligation(
4834            db,
4835            job_id,
4836            status="deferred",
4837            reason=f"Validation was handled or deferred via {tool_name}.",
4838            via_tool=tool_name,
4839            result=result,
4840        )
4841
4842
4843def _step_by_id(db: AgentDB, job_id: str, step_id: str) -> dict[str, Any] | None:
4844    for step in db.list_steps(job_id=job_id):
4845        if str(step.get("id") or "") == step_id:
4846            return step
4847    return None
4848
4849
4850def _search_query(args: dict[str, Any]) -> str:
4851    return str(args.get("query") or "").strip()
4852
4853
4854def _query_tokens(query: str) -> set[str]:
4855    return {
4856        token
4857        for token in re.findall(r"[a-z0-9]+", query.lower())
4858        if len(token) > 2 and token not in QUERY_STOPWORDS
4859    }
4860
4861
4862def _text_tokens(value: str) -> set[str]:
4863    return {
4864        token
4865        for token in re.findall(r"[a-z0-9]+", str(value or "").lower())
4866        if len(token) > 2 and token not in TEXT_TOKEN_STOPWORDS
4867    }
4868
4869
4870def _similar_recent_search(
4871    args: dict[str, Any],
4872    recent_steps: list[dict[str, Any]],
4873    *,
4874    window: int = 12,
4875) -> dict[str, Any] | None:
4876    return _similar_recent_query_tool("web_search", args, recent_steps, window=window)
4877
4878
4879def _similar_recent_query_tool(
4880    tool_name: str,
4881    args: dict[str, Any],
4882    recent_steps: list[dict[str, Any]],
4883    *,
4884    window: int = 12,
4885) -> dict[str, Any] | None:
4886    query = _search_query(args)
4887    tokens = _query_tokens(query)
4888    if len(tokens) < 2:
4889        return None
4890    for step in reversed(_completed_recent_steps(recent_steps)[-window:]):
4891        if step.get("tool_name") != tool_name:
4892            continue
4893        input_data = step.get("input") or {}
4894        previous_args = input_data.get("arguments") if isinstance(input_data, dict) else None
4895        if not isinstance(previous_args, dict):
4896            continue
4897        previous_query = _search_query(previous_args)
4898        previous_tokens = _query_tokens(previous_query)
4899        if len(previous_tokens) < 2:
4900            continue
4901        overlap = len(tokens & previous_tokens) / max(len(tokens), len(previous_tokens))
4902        if overlap >= 0.72:
4903            return step
4904    return None
4905
4906
4907def _recent_tool_streak(recent_steps: list[dict[str, Any]], tool_name: str) -> int:
4908    streak = 0
4909    for step in reversed(_completed_recent_steps(recent_steps)):
4910        current_tool = step.get("tool_name")
4911        if current_tool == tool_name:
4912            streak += 1
4913            continue
4914        if current_tool:
4915            break
4916    return streak
4917
4918
4919def _repeated_guard_block_context(
4920    recent_steps: list[dict[str, Any]],
4921    *,
4922    threshold: int = 3,
4923    window: int = 12,
4924) -> dict[str, Any] | None:
4925    recoveries = [
4926        step
4927        for step in recent_steps
4928        if step.get("tool_name") == "guard_recovery" and step.get("status") == "completed"
4929    ]
4930    last_recovery = max(
4931        recoveries,
4932        key=lambda step: int(step.get("step_no") or 0),
4933        default=None,
4934    )
4935    last_recovery_no = int(last_recovery.get("step_no") or 0) if last_recovery else 0
4936    last_recovery_error = ""
4937    if last_recovery:
4938        recovery_output = last_recovery.get("output") if isinstance(last_recovery.get("output"), dict) else {}
4939        recovery_context = recovery_output.get("guard_recovery") if isinstance(recovery_output.get("guard_recovery"), dict) else {}
4940        last_recovery_error = str(recovery_context.get("error") or "")
4941    operational_steps = [
4942        step
4943        for step in recent_steps
4944        if int(step.get("step_no") or 0) > last_recovery_no
4945        if step.get("kind") in {"tool", "recovery", "assistant"} and step.get("tool_name") != "guard_recovery"
4946    ]
4947    tail = operational_steps[-window:]
4948    latest_blocked = next((step for step in reversed(tail) if step.get("status") == "blocked"), None)
4949    if not latest_blocked:
4950        return None
4951    output = latest_blocked.get("output") if isinstance(latest_blocked.get("output"), dict) else {}
4952    error = str(output.get("error") or latest_blocked.get("error") or "")
4953    if error not in RECOVERABLE_GUARD_ERRORS:
4954        return None
4955    count = 0
4956    blocked_tools = []
4957    first_step_no = None
4958    for step in tail:
4959        step_output = step.get("output") if isinstance(step.get("output"), dict) else {}
4960        step_error = str(step_output.get("error") or step.get("error") or "")
4961        if step.get("status") == "blocked" and step_error == error:
4962            count += 1
4963            first_step_no = first_step_no or step.get("step_no")
4964            blocked_tools.append(str(step.get("tool_name") or step.get("kind") or "tool"))
4965    effective_threshold = 1 if _already_read_checkpoint_accounting_block(latest_blocked) else threshold
4966    if count < effective_threshold:
4967        return None
4968    progress_after_recovery = any(
4969        step.get("status") == "completed"
4970        and step.get("tool_name") != "guard_recovery"
4971        for step in operational_steps
4972    )
4973    if last_recovery_error == error and not progress_after_recovery:
4974        return None
4975    context = {
4976        "error": error,
4977        "count": count,
4978        "first_step_no": first_step_no,
4979        "latest_step_no": latest_blocked.get("step_no"),
4980        "blocked_tools": blocked_tools[-8:],
4981    }
4982    if error == "task queue saturated":
4983        task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
4984        context["task_queue"] = {
4985            "reason": task_queue.get("reason") or "task queue saturated",
4986            "open_count": task_queue.get("open_count"),
4987            "total_count": task_queue.get("total_count"),
4988            "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
4989        }
4990    return context
4991
4992
4993def _already_read_checkpoint_accounting_block(step: dict[str, Any]) -> bool:
4994    output = step.get("output") if isinstance(step.get("output"), dict) else {}
4995    checkpoint = output.get("pending_evidence_checkpoint") if isinstance(output.get("pending_evidence_checkpoint"), dict) else {}
4996    return (
4997        output.get("error") == "evidence checkpoint accounting required"
4998        and (bool(output.get("checkpoint_already_read")) or bool(checkpoint.get("checkpoint_read")))
4999    )
5000
5001
5002def _step_error_text(step: dict[str, Any]) -> str:
5003    output = step.get("output") if isinstance(step.get("output"), dict) else {}
5004    parts = [
5005        output.get("error"),
5006        output.get("error_type"),
5007        output.get("detail"),
5008        output.get("message"),
5009        step.get("error"),
5010        step.get("summary"),
5011    ]
5012    return " ".join(str(part) for part in parts if part)
5013
5014
5015def _blocked_tool_call_result(
5016    name: str,
5017    args: dict[str, Any],
5018    recent_steps: list[dict[str, Any]],
5019    job: dict[str, Any],
5020) -> tuple[dict[str, Any], str] | None:
5021    if name == "defer_job":
5022        self_defer = _self_defer_context(args)
5023        if self_defer:
5024            result = {
5025                "success": False,
5026                "error": "self-defer blocked",
5027                "blocked_tool": name,
5028                "blocked_arguments": args,
5029                "self_defer": self_defer,
5030                "guidance": (
5031                    "Do not defer merely for a future worker turn to pick up ordinary work. Use defer_job only when "
5032                    "waiting for a real external process, scheduled monitor interval, long-running command, "
5033                    "or other time-based condition. Otherwise execute, measure, record a task/experiment/lesson, or "
5034                    "mark the branch blocked now."
5035                ),
5036            }
5037            return result, "blocked defer_job; self-defer is not progress"
5038
5039    if name == "record_tasks":
5040        saturated = _task_queue_saturation_context(job, args)
5041        if saturated:
5042            result = {
5043                "success": False,
5044                "error": "task queue saturated",
5045                "blocked_tool": name,
5046                "blocked_arguments": args,
5047                "task_queue": saturated,
5048                "guidance": (
5049                    "The durable task queue already has many branches. Do not create more branch sprawl. "
5050                    "Choose an existing high-priority task and execute it, update existing tasks to active, "
5051                    "done, blocked, or skipped, or consolidate the queue into roadmap/milestone state."
5052                ),
5053            }
5054            return result, f"blocked record_tasks; {saturated['reason']}"
5055        task_planning_stagnation = _task_planning_stagnation_context(job)
5056        if task_planning_stagnation and _record_tasks_adds_new_open_work(args, job):
5057            result = {
5058                "success": False,
5059                "error": "task execution required",
5060                "blocked_tool": name,
5061                "blocked_arguments": args,
5062                "task_planning": task_planning_stagnation,
5063                "guidance": (
5064                    "Recent checkpoints only expanded the task queue. Do not add more new open tasks yet. "
5065                    "Execute or validate an existing branch, save a durable checkpoint, record findings/source/"
5066                    "experiment evidence, mark existing tasks done/blocked/skipped, or record a lesson."
5067                ),
5068            }
5069            return result, "blocked record_tasks; task-only planning needs execution"
5070
5071    current_milestone_validation = _milestone_validation_needed(job)
5072    if (
5073        name == "record_milestone_validation"
5074        and current_milestone_validation
5075        and not _milestone_validation_call_matches_current(args, current_milestone_validation)
5076    ):
5077        result = {
5078            "success": False,
5079            "error": "current milestone validation required",
5080            "blocked_tool": name,
5081            "blocked_arguments": args,
5082            "milestone": {
5083                "title": current_milestone_validation.get("title"),
5084                "status": current_milestone_validation.get("status"),
5085                "validation_status": current_milestone_validation.get("validation_status"),
5086                "acceptance_criteria": current_milestone_validation.get("acceptance_criteria"),
5087                "evidence_needed": current_milestone_validation.get("evidence_needed"),
5088            },
5089            "guidance": (
5090                "A milestone validation gate is already active. Validate that current milestone by name, "
5091                "or update the roadmap to make a different milestone current before validating another one."
5092            ),
5093        }
5094        return result, "blocked record_milestone_validation; current milestone validation required"
5095
5096    auto_checkpoint_accounting = _auto_checkpoint_accounting_context(job, recent_steps)
5097    checkpoint_read_call = bool(
5098        auto_checkpoint_accounting
5099        and name == "read_artifact"
5100        and not auto_checkpoint_accounting.get("checkpoint_read")
5101        and _read_artifact_call_matches_checkpoint(
5102            args,
5103            artifact_id=str(auto_checkpoint_accounting.get("artifact_id") or ""),
5104            artifact_title=str(auto_checkpoint_accounting.get("title") or ""),
5105        )
5106    )
5107    if _evidence_checkpoint_blocks_tool(name, args, auto_checkpoint_accounting):
5108        checkpoint_already_read = bool(auto_checkpoint_accounting and auto_checkpoint_accounting.get("checkpoint_read"))
5109        result = {
5110            "success": False,
5111            "error": "evidence checkpoint accounting required",
5112            "blocked_tool": name,
5113            "blocked_arguments": args,
5114            "pending_evidence_checkpoint": auto_checkpoint_accounting,
5115            "checkpoint_already_read": checkpoint_already_read,
5116            "required_next_action": "durable_checkpoint_accounting" if checkpoint_already_read else "read_or_account_checkpoint",
5117            "allowed_resolution_tools": sorted(EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS),
5118            "guidance": _evidence_checkpoint_block_guidance(auto_checkpoint_accounting or {}),
5119        }
5120        return result, f"blocked {name}; evidence checkpoint accounting required"
5121    checkpoint_resolution_call = bool(auto_checkpoint_accounting and name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS)
5122
5123    if name == "shell_exec":
5124        placeholder = _shell_placeholder_context(str(args.get("command") or ""))
5125        if placeholder:
5126            result = {
5127                "success": False,
5128                "error": "unresolved placeholder in shell command",
5129                "blocked_tool": name,
5130                "blocked_arguments": args,
5131                "placeholder": placeholder,
5132                "guidance": (
5133                    "Do not execute shell commands that still contain placeholder URLs, paths, hosts, or template "
5134                    "tokens. Resolve the concrete value from evidence, ask the operator if it is genuinely unknown, "
5135                    "or record a blocked task/source before continuing."
5136                ),
5137            }
5138            return result, "blocked shell_exec; unresolved placeholder in command"
5139        syntax_error = _shell_syntax_preflight_context(str(args.get("command") or ""))
5140        if syntax_error:
5141            result = {
5142                "success": False,
5143                "recoverable": True,
5144                "error": "malformed shell command",
5145                "blocked_tool": name,
5146                "blocked_arguments": args,
5147                "syntax": syntax_error,
5148                "guidance": (
5149                    "Do not execute partial or malformed shell. Rebuild the command from exact observed paths, "
5150                    "or use a simpler bounded probe before retrying."
5151                ),
5152            }
5153            return result, "blocked shell_exec; malformed command syntax"
5154        candidate_recovery = _observed_candidate_recovery_required_context(recent_steps, args)
5155        if candidate_recovery:
5156            result = {
5157                "success": False,
5158                "error": "observed executable recovery required",
5159                "blocked_tool": name,
5160                "blocked_arguments": args,
5161                "candidate_recovery": candidate_recovery,
5162                "guidance": (
5163                    "A recent shell step reported this command as missing, and later evidence showed candidate "
5164                    "executable paths. Retry with an exact observed executable path, add its directory to PATH, "
5165                    "or record why that observed candidate is invalid before running the bare command again."
5166                ),
5167            }
5168            return result, "blocked shell_exec; observed executable recovery required"
5169        privileged_failure = _recent_privileged_shell_failure_context(recent_steps)
5170        if privileged_failure and _shell_command_looks_privileged_or_package_manager(str(args.get("command") or "")):
5171            result = {
5172                "success": False,
5173                "error": "privileged command recovery required",
5174                "blocked_tool": name,
5175                "blocked_arguments": args,
5176                "privileged_failure": privileged_failure,
5177                "guidance": (
5178                    "A recent privileged/package-manager shell command failed due to permission or authorization. "
5179                    "Do not retry that class of command until the failure is accounted for. Use observed executable "
5180                    "paths, user-writable installs, existing project files, or record_tasks/record_lesson/"
5181                    "record_experiment to mark the branch blocked or choose a non-privileged recovery."
5182                ),
5183            }
5184            return result, "blocked shell_exec; privileged command recovery required"
5185
5186    unpersisted_evidence = _unpersisted_evidence_step(recent_steps)
5187    if unpersisted_evidence and name in BRANCH_WORK_TOOLS:
5188        result = {
5189            "success": False,
5190            "error": "artifact required before more research",
5191            "blocked_tool": name,
5192            "blocked_arguments": args,
5193            "previous_step": unpersisted_evidence["id"],
5194            "guidance": (
5195                "Fresh browser, extracted, or shell evidence is waiting. Save or account for that evidence with "
5196                "write_artifact, record_findings, record_source, record_experiment, record_tasks, "
5197                "record_roadmap, record_milestone_validation, or record_lesson before doing more search, "
5198                "browsing, shell work, or artifact review."
5199            ),
5200        }
5201        return result, f"blocked {name}; write_artifact required after evidence step #{unpersisted_evidence['step_no']}"
5202
5203    duplicate_step = _duplicate_recent_tool_call(name, args, recent_steps)
5204    if duplicate_step:
5205        guidance = "Use a different query, extract one of the prior result URLs, open a result in the browser, or write an artifact."
5206        if name == "read_artifact":
5207            guidance = (
5208                "This artifact was already read. Do not read it again; use its content to inspect a concrete item, "
5209                "record findings/tasks, or write a report artifact."
5210            )
5211        elif name == "shell_exec":
5212            guidance = (
5213                "This shell command was already run. Do not rerun discovery; use the previous output to inspect a "
5214                "specific file/item, write an artifact, or update findings/tasks."
5215            )
5216        result = {
5217            "success": False,
5218            "recoverable": name == "read_artifact",
5219            "error": "duplicate tool call blocked",
5220            "blocked_tool": name,
5221            "blocked_arguments": args,
5222            "previous_step": duplicate_step["id"],
5223            "guidance": guidance,
5224        }
5225        return result, f"blocked duplicate {name}; previous step #{duplicate_step['step_no']}"
5226
5227    if checkpoint_read_call:
5228        return None
5229
5230    browser_runtime_unavailable = _browser_runtime_unavailable_context(recent_steps)
5231    if browser_runtime_unavailable and _is_browser_tool(name):
5232        result = {
5233            "success": False,
5234            "error": "browser runtime unavailable",
5235            "blocked_tool": name,
5236            "blocked_arguments": args,
5237            "browser_runtime": browser_runtime_unavailable,
5238            "guidance": (
5239                "Browser automation is unavailable on this host. Do not retry browser tools until the runtime is "
5240                "installed or configured. Use web_search, web_extract, shell_exec, source/ledger tools, or record "
5241                "a blocked task/source and continue through a non-browser branch."
5242            ),
5243        }
5244        return result, f"blocked {name}; browser runtime unavailable"
5245
5246    measurement_obligation = _pending_measurement_obligation(job)
5247    if (
5248        measurement_obligation
5249        and not checkpoint_resolution_call
5250        and name in MEASUREMENT_BLOCKED_TOOLS
5251        and name not in MEASUREMENT_RESOLUTION_TOOLS
5252    ):
5253        result = {
5254            "success": False,
5255            "error": "measurement obligation pending",
5256            "blocked_tool": name,
5257            "blocked_arguments": args,
5258            "pending_measurement_obligation": measurement_obligation,
5259            "guidance": (
5260                "A recent action produced measurable output. Record it with record_experiment, "
5261                "explain why it is invalid with record_lesson, or create the missing measurement branch with record_tasks "
5262                "before doing more research, artifact writing, or finding/source updates."
5263            ),
5264        }
5265        return result, f"blocked {name}; record_experiment required after measured output"
5266
5267    file_validation_obligation = _pending_file_validation_obligation(job)
5268    if (
5269        file_validation_obligation
5270        and not checkpoint_resolution_call
5271        and name in FILE_VALIDATION_BLOCKED_TOOLS
5272        and name not in FILE_VALIDATION_RESOLUTION_TOOLS
5273    ):
5274        result = {
5275            "success": False,
5276            "error": "file validation pending",
5277            "blocked_tool": name,
5278            "blocked_arguments": args,
5279            "pending_file_validation_obligation": file_validation_obligation,
5280            "guidance": (
5281                "A recent file output needs validation before more research/output churn. "
5282                "Use shell_exec to run a syntax check, dry-run, test, or other narrow validation for the file, "
5283                "or use record_tasks/record_lesson/record_experiment if validation is blocked or deferred."
5284            ),
5285        }
5286        return result, f"blocked {name}; file validation required after write_file"
5287
5288    early_anti_bot_context = _recent_anti_bot_context(recent_steps)
5289    if early_anti_bot_context and name == "write_artifact" and not _artifact_args_acknowledge_block(args):
5290        result = {
5291            "success": False,
5292            "error": "misleading blocked-source artifact blocked",
5293            "blocked_tool": name,
5294            "blocked_arguments": args,
5295            "anti_bot_source": early_anti_bot_context,
5296            "guidance": "The latest browser evidence is an anti-bot/CAPTCHA block. Write only a blocked-source note or pivot.",
5297        }
5298        return result, f"blocked misleading write_artifact; anti-bot source at step #{early_anti_bot_context.get('step_no')}"
5299
5300    evidence_grounding = _evidence_grounding_context(job, recent_steps, tool_name=name, args=args)
5301    if evidence_grounding:
5302        result = {
5303            "success": False,
5304            "error": "evidence grounding required",
5305            "blocked_tool": name,
5306            "blocked_arguments": args,
5307            "evidence_grounding": evidence_grounding,
5308            "guidance": evidence_grounding["guidance"],
5309        }
5310        return result, f"blocked {name}; evidence grounding required"
5311
5312    measured_progress_guard = _measured_progress_guard_context(job, recent_steps)
5313    experiment_stagnation = _experiment_stagnation_context(job, recent_steps)
5314    deliverable_progress_guard = _deliverable_progress_guard_context(job, recent_steps)
5315    source_yield = _source_yield_context(job, recent_steps)
5316    progress_churn = _progress_churn_context(recent_steps)
5317    artifact_accounting = _artifact_accounting_context(recent_steps)
5318    activity_stagnation = _activity_stagnation_context(job)
5319    memory_consolidation = _memory_graph_consolidation_context(job, recent_steps)
5320    shell_read_only = name == "shell_exec" and _shell_command_looks_read_only(str(args.get("command") or ""))
5321    if (
5322        artifact_accounting
5323        and name in ARTIFACT_ACCOUNTING_BLOCKED_TOOLS
5324        and name not in ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS
5325    ):
5326        result = {
5327            "success": False,
5328            "error": "progress accounting required",
5329            "blocked_tool": name,
5330            "blocked_arguments": args,
5331            "artifact_accounting": artifact_accounting,
5332            "guidance": (
5333                "Recent saved outputs have not been reflected in durable progress state. "
5334                "Use record_tasks or record_roadmap to mark completed/open branches, "
5335                "record_milestone_validation for milestone checks, record_findings or record_source "
5336                "for reusable evidence, record_experiment for measurements, or record_lesson "
5337                "if the outputs were low-value before continuing."
5338            ),
5339        }
5340        return result, f"blocked {name}; progress accounting required after saved outputs"
5341
5342    if progress_churn and not measured_progress_guard and name in CHURN_TOOLS:
5343        result = {
5344            "success": False,
5345            "error": "progress ledger update required",
5346            "blocked_tool": name,
5347            "blocked_arguments": args,
5348            "progress_churn": progress_churn,
5349            "guidance": (
5350                "Recent activity has not changed findings, experiments, tasks, lessons, or sources. "
5351                "Use a ledger tool to record progress, reject the branch, or create a pivot task before continuing."
5352            ),
5353        }
5354        return result, f"blocked {name}; progress ledger update required"
5355
5356    read_only_shell_churn = _read_only_shell_churn_context(recent_steps)
5357    if read_only_shell_churn and shell_read_only:
5358        result = {
5359            "success": False,
5360            "error": "action decision required",
5361            "blocked_tool": name,
5362            "blocked_arguments": args,
5363            "read_only_shell_churn": read_only_shell_churn,
5364            "guidance": (
5365                "Recent shell work only inspected or listed state. Stop re-probing the same branch. "
5366                "Run the next concrete action, write/persist the candidate decision, record an experiment/monitor task, "
5367                "or record why the branch is blocked before another read-only shell command."
5368            ),
5369        }
5370        return result, f"blocked {name}; action decision required"
5371
5372    if activity_stagnation and name in ACTIVITY_STAGNATION_BLOCKED_TOOLS:
5373        result = {
5374            "success": False,
5375            "error": "durable progress required",
5376            "blocked_tool": name,
5377            "blocked_arguments": args,
5378            "activity_stagnation": activity_stagnation,
5379            "guidance": (
5380                "Several checkpoints have produced no durable ledger delta. "
5381                "Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
5382                "record_milestone_validation, or record_lesson to classify the branch, mark it blocked/skipped, "
5383                "or open a better branch before more research, shell, file, report, or artifact work."
5384            ),
5385        }
5386        return result, f"blocked {name}; durable progress required after activity-only checkpoints"
5387
5388    if source_yield and name in SOURCE_YIELD_BLOCKED_TOOLS:
5389        result = {
5390            "success": False,
5391            "error": "source yield accounting required",
5392            "blocked_tool": name,
5393            "blocked_arguments": args,
5394            "source_yield": source_yield,
5395            "guidance": (
5396                "The job has gathered enough sources without enough durable findings or yielded source outcomes. "
5397                "Before more search, extraction, browsing, shell execution, file/output work, or report chatter, "
5398                "use record_findings to save source-backed facts/candidates, record_source to mark source yield "
5399                "or low-yield outcomes, or update tasks/roadmap/lessons to pivot from the source branch."
5400            ),
5401        }
5402        return result, f"blocked {name}; source yield accounting required"
5403
5404    if memory_consolidation and name in MEMORY_CONSOLIDATION_BLOCKED_TOOLS:
5405        result = {
5406            "success": False,
5407            "error": "memory graph consolidation required",
5408            "blocked_tool": name,
5409            "blocked_arguments": args,
5410            "memory_consolidation": memory_consolidation,
5411            "guidance": (
5412                "The job has enough reusable durable records that raw ledgers should be consolidated into connected "
5413                "memory. Use record_memory_graph to add/update nodes and links before more branch work, or record_lesson "
5414                "if there is no reusable memory to preserve."
5415            ),
5416        }
5417        return result, f"blocked {name}; memory graph consolidation required"
5418
5419    record_experiment_closes_branch = (
5420        name == "record_experiment"
5421        and str(args.get("status") or "").strip().lower().replace(" ", "_") in {"failed", "blocked", "skipped"}
5422    )
5423    if (
5424        experiment_stagnation
5425        and not record_experiment_closes_branch
5426        and (
5427            name in BRANCH_WORK_TOOLS
5428            or name in {"record_experiment", "write_artifact", "write_file", "report_update"}
5429        )
5430    ):
5431        result = {
5432            "success": False,
5433            "error": "experiment stagnation decision required",
5434            "blocked_tool": name,
5435            "blocked_arguments": args,
5436            "experiment_stagnation": experiment_stagnation,
5437            "guidance": (
5438                "Recent measured trials have not improved the best observed result. Before more experiments, "
5439                "execution, research, file/output work, or report chatter, make a durable decision: use "
5440                "record_tasks, record_roadmap, record_milestone_validation, record_lesson, or a blocked/skipped/"
5441                "failed record_experiment to reject, block, or pivot the stagnant branch."
5442            ),
5443        }
5444        return result, f"blocked {name}; experiment stagnation decision required"
5445
5446    lesson_sprawl = _lesson_sprawl_context(job, recent_steps)
5447    if lesson_sprawl and name == "record_lesson":
5448        result = {
5449            "success": False,
5450            "error": "lesson consolidation required",
5451            "blocked_tool": name,
5452            "blocked_arguments": args,
5453            "lesson_consolidation": lesson_sprawl,
5454            "guidance": (
5455                "This job already has many raw lessons and the connected memory graph is behind. "
5456                "Do not add another raw lesson. Use record_memory_graph to consolidate reusable strategy, mistake, "
5457                "constraint, decision, question, skill, or episode nodes with evidence links, or update existing "
5458                "tasks/roadmap/milestone state if this is only branch status."
5459            ),
5460        }
5461        return result, "blocked record_lesson; lesson consolidation required"
5462
5463    if deliverable_progress_guard and (name in DELIVERABLE_PROGRESS_BLOCKED_TOOLS or shell_read_only):
5464        result = {
5465            "success": False,
5466            "error": "deliverable checkpoint required",
5467            "blocked_tool": name,
5468            "blocked_arguments": args,
5469            "deliverable_progress_guard": deliverable_progress_guard,
5470            "guidance": (
5471                "This job is deliverable-framed and has done enough background work without a draft/report/file "
5472                "checkpoint. Save a partial deliverable with write_file or write_artifact, or record_tasks, "
5473                "record_roadmap, record_milestone_validation, or record_lesson if the deliverable is blocked."
5474            ),
5475        }
5476        return result, f"blocked {name}; deliverable checkpoint required"
5477
5478    research_balance = _research_balance_context(job, recent_steps)
5479    if research_balance and name in RESEARCH_BALANCE_BLOCKED_TOOLS:
5480        result = {
5481            "success": False,
5482            "error": "research balance required",
5483            "blocked_tool": name,
5484            "blocked_arguments": args,
5485            "research_balance": research_balance,
5486            "guidance": (
5487                "Recent work is execution-heavy but has no durable sources or findings. "
5488                "Use web/browser/documentation/local-inspection tools and record_source or record_findings "
5489                "before continuing execution, artifact review, raw lesson accumulation, report updates, or file churn."
5490            ),
5491        }
5492        return result, f"blocked {name}; research balance required"
5493
5494    roadmap_staleness = _roadmap_staleness_context(job, recent_steps)
5495    if roadmap_staleness and not checkpoint_resolution_call and name in ROADMAP_STALENESS_BLOCKED_TOOLS:
5496        result = {
5497            "success": False,
5498            "error": "roadmap update required",
5499            "blocked_tool": name,
5500            "blocked_arguments": args,
5501            "roadmap_staleness": roadmap_staleness,
5502            "guidance": (
5503                "The roadmap has not advanced despite durable task/artifact activity. "
5504                "Use record_roadmap to mark milestone progress, record_milestone_validation "
5505                "to judge an evidence-backed checkpoint, or record_lesson if the roadmap is wrong."
5506            ),
5507        }
5508        return result, f"blocked {name}; roadmap update required"
5509
5510    milestone_validation = _milestone_validation_needed(job)
5511    milestone_validation_action = milestone_validation and _tool_call_matches_pending_milestone_need(
5512        name,
5513        args,
5514        milestone_validation,
5515    )
5516    if (
5517        milestone_validation
5518        and not milestone_validation_action
5519        and not checkpoint_resolution_call
5520        and name in MILESTONE_VALIDATION_BLOCKED_TOOLS
5521    ):
5522        result = {
5523            "success": False,
5524            "error": "milestone validation required",
5525            "blocked_tool": name,
5526            "blocked_arguments": args,
5527            "milestone": {
5528                "title": milestone_validation.get("title"),
5529                "status": milestone_validation.get("status"),
5530                "validation_status": milestone_validation.get("validation_status"),
5531                "acceptance_criteria": milestone_validation.get("acceptance_criteria"),
5532                "evidence_needed": milestone_validation.get("evidence_needed"),
5533            },
5534            "guidance": (
5535                "The current milestone is ready for validation. Use record_milestone_validation "
5536                "with evidence and pass/fail/blocker status, read an existing artifact if needed, "
5537                "or create follow-up tasks for validation gaps before starting more branch work."
5538            ),
5539        }
5540        return result, f"blocked {name}; milestone validation required"
5541
5542    anti_bot_context = _recent_anti_bot_context(recent_steps)
5543    if anti_bot_context:
5544        blocked_browser_followups = {"browser_click", "browser_console", "browser_press", "browser_scroll", "browser_snapshot", "browser_type"}
5545        if name in blocked_browser_followups:
5546            result = {
5547                "success": False,
5548                "error": "anti-bot source loop blocked",
5549                "blocked_tool": name,
5550                "blocked_arguments": args,
5551                "anti_bot_source": anti_bot_context,
5552                "guidance": "This page is blocked by anti-bot/CAPTCHA. Record the source as blocked and pivot to a different public source.",
5553            }
5554            return result, f"blocked {name}; anti-bot source at step #{anti_bot_context.get('step_no')}"
5555        if name == "browser_navigate" and _same_source_url(str(args.get("url") or ""), str(anti_bot_context.get("url") or "")):
5556            result = {
5557                "success": False,
5558                "error": "anti-bot source loop blocked",
5559                "blocked_tool": name,
5560                "blocked_arguments": args,
5561                "anti_bot_source": anti_bot_context,
5562                "guidance": "Do not reopen the same blocked source. Pivot to another source.",
5563            }
5564            return result, f"blocked {name}; repeated blocked source from step #{anti_bot_context.get('step_no')}"
5565        if name == "web_extract":
5566            urls = args.get("urls") if isinstance(args.get("urls"), list) else []
5567            if any(_same_source_url(str(url), str(anti_bot_context.get("url") or "")) for url in urls):
5568                result = {
5569                    "success": False,
5570                    "error": "anti-bot source loop blocked",
5571                    "blocked_tool": name,
5572                    "blocked_arguments": args,
5573                    "anti_bot_source": anti_bot_context,
5574                    "guidance": "Do not extract the same blocked source. Record it as low-yield and pivot.",
5575                }
5576                return result, f"blocked {name}; blocked source from step #{anti_bot_context.get('step_no')}"
5577        if name == "write_artifact" and not _artifact_args_acknowledge_block(args):
5578            result = {
5579                "success": False,
5580                "error": "misleading blocked-source artifact blocked",
5581                "blocked_tool": name,
5582                "blocked_arguments": args,
5583                "anti_bot_source": anti_bot_context,
5584                "guidance": "The latest browser evidence is an anti-bot/CAPTCHA block. Write only a blocked-source note or pivot.",
5585            }
5586            return result, f"blocked misleading write_artifact; anti-bot source at step #{anti_bot_context.get('step_no')}"
5587
5588    experiment_next_action = _latest_experiment_next_action_context(job)
5589    action_failure = _experiment_next_action_failure_context(job, recent_steps)
5590    if (
5591        action_failure
5592        and name not in {"record_experiment", "record_tasks", "record_lesson", "record_milestone_validation"}
5593    ):
5594        result = {
5595            "success": False,
5596            "error": "action result accounting required",
5597            "blocked_tool": name,
5598            "blocked_arguments": args,
5599            "action_failure": action_failure,
5600            "guidance": (
5601                "The latest experiment next action was attempted and the observed output reports a missing command, "
5602                "path, or prerequisite. Before more work, use record_experiment, record_tasks, or record_lesson to "
5603                "account for the failed/blocked action and choose a concrete recovery branch."
5604            ),
5605        }
5606        return result, f"blocked {name}; action result accounting required"
5607    if (
5608        _experiment_next_action_requires_delivery(experiment_next_action)
5609        and (
5610            name in EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS
5611            or (
5612                name == "shell_exec"
5613                and _shell_command_looks_read_only(str(args.get("command") or ""))
5614                and not _shell_command_supports_experiment_next_action(str(args.get("command") or ""), experiment_next_action)
5615            )
5616        )
5617    ):
5618        result = {
5619            "success": False,
5620            "error": "experiment next action pending",
5621            "blocked_tool": name,
5622            "blocked_arguments": args,
5623            "experiment_next_action": experiment_next_action,
5624            "guidance": (
5625                "The latest measured experiment selected a delivery/action next step. "
5626                "Act on that next action with an execution or ledger tool, or use record_experiment/record_tasks/record_lesson "
5627                "to explain why it is invalid or blocked before doing more research or artifact review."
5628            ),
5629        }
5630        return result, f"blocked {name}; experiment next action pending"
5631
5632    shell_budget_exhausted = (
5633        name == "shell_exec"
5634        and _as_int(measured_progress_guard.get("shell_actions_since_last_experiment")) >= MEASURABLE_ACTION_BUDGET_STEPS
5635    ) if measured_progress_guard else False
5636    candidate_validation_shell = (
5637        name == "shell_exec" and _shell_exec_targets_candidate_file(job, recent_steps, args)
5638    )
5639    if (
5640        measured_progress_guard
5641        and not checkpoint_resolution_call
5642        and (name in MEASURABLE_RESEARCH_BLOCKED_TOOLS or (shell_budget_exhausted and not candidate_validation_shell))
5643    ):
5644        result = {
5645            "success": False,
5646            "error": "measured progress required",
5647            "blocked_tool": name,
5648            "blocked_arguments": args,
5649            "measured_progress_guard": measured_progress_guard,
5650            "guidance": (
5651                "This job is measurably framed and has exhausted its research budget without new experiment records. "
5652                "If the shell/action budget is exhausted, do not call shell_exec again; call record_experiment for a "
5653                "known measurement, record_tasks with an experiment/action/monitor contract, or record_lesson if "
5654                "measurement is blocked."
5655            ),
5656        }
5657        return result, f"blocked {name}; measured progress required"
5658
5659    if name in BRANCH_WORK_TOOLS and _task_queue_exhausted(job):
5660        result = {
5661            "success": False,
5662            "error": "task branch required before more work",
5663            "blocked_tool": name,
5664            "blocked_arguments": args,
5665            "guidance": (
5666                "The durable task queue has no open or active branch. Use record_tasks to open the next concrete "
5667                "branch before doing more research or execution, or report_update if the operator needs a checkpoint."
5668            ),
5669        }
5670        return result, f"blocked {name}; no open task branch"
5671
5672    known_bad_source = _known_bad_source_for_call(name, args, job)
5673    if known_bad_source:
5674        result = {
5675            "success": False,
5676            "error": "known bad source blocked",
5677            "blocked_tool": name,
5678            "blocked_arguments": args,
5679            "known_bad_source": known_bad_source,
5680            "guidance": (
5681                "The source ledger marks this source as blocked or low-yield for this job. "
5682                "Choose a different source, or record a fresh operator reason before retrying it."
5683            ),
5684        }
5685        return result, f"blocked {name}; known bad source {known_bad_source.get('source')}"
5686
5687    if name == "web_search":
5688        similar_step = _similar_recent_search(args, recent_steps)
5689        if similar_step:
5690            result = {
5691                "success": False,
5692                "error": "similar search query blocked",
5693                "blocked_tool": name,
5694                "blocked_arguments": args,
5695                "previous_step": similar_step["id"],
5696                "guidance": "Use an existing result URL, extract a page, or search a clearly different topic/location/source.",
5697            }
5698            return result, f"blocked similar web_search; previous step #{similar_step['step_no']}"
5699        streak = _recent_search_streak(recent_steps)
5700        if streak >= 3:
5701            result = {
5702                "success": False,
5703                "error": "search loop blocked",
5704                "blocked_tool": name,
5705                "blocked_arguments": args,
5706                "recent_search_streak": streak,
5707                "guidance": "Stop searching. Extract or open one of the prior results, then write an artifact.",
5708            }
5709            return result, f"blocked web_search after {streak} consecutive searches"
5710
5711    if name == "search_artifacts":
5712        similar_step = _similar_recent_query_tool("search_artifacts", args, recent_steps)
5713        if similar_step:
5714            result = {
5715                "success": False,
5716                "error": "similar artifact search blocked",
5717                "blocked_tool": name,
5718                "blocked_arguments": args,
5719                "previous_step": similar_step["id"],
5720                "guidance": (
5721                    "Use a returned artifact, record what the prior artifact searches proved, "
5722                    "or create the next concrete task instead of searching saved outputs again."
5723                ),
5724            }
5725            return result, f"blocked similar search_artifacts; previous step #{similar_step['step_no']}"
5726        streak = _recent_tool_streak(recent_steps, "search_artifacts")
5727        if streak >= 3:
5728            result = {
5729                "success": False,
5730                "error": "artifact search loop blocked",
5731                "blocked_tool": name,
5732                "blocked_arguments": args,
5733                "recent_artifact_search_streak": streak,
5734                "guidance": (
5735                    "Stop searching saved outputs. Read a specific returned artifact, update tasks/findings/lessons, "
5736                    "or write the next report artifact from already-read evidence."
5737                ),
5738            }
5739            return result, f"blocked search_artifacts after {streak} consecutive artifact searches"
5740
5741    return None
5742
5743
5744def _error_result(exc: Exception) -> dict[str, Any]:
5745    result: dict[str, Any] = {
5746        "success": False,
5747        "error": str(exc),
5748        "error_type": type(exc).__name__,
5749    }
5750    if isinstance(exc, LLMResponseError) and exc.payload:
5751        result["provider_payload"] = exc.payload
5752    return result
5753
5754
5755def _hard_llm_provider_failure_note(exc: Exception) -> str:
5756    return provider_action_required_note(exc)
5757
5758
5759def _max_step_no(steps: list[dict[str, Any]]) -> int:
5760    return max((int(step.get("step_no") or 0) for step in steps), default=0)
5761
5762
5763def _should_reflect(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> bool:
5764    if not recent_steps:
5765        return False
5766    if recent_steps[-1].get("kind") == "reflection":
5767        return False
5768    step_no = _max_step_no(recent_steps)
5769    if step_no == 0 or step_no % REFLECTION_INTERVAL_STEPS != 0:  # reflect only on exact multiples of the interval
5770        return False
5771    reflections = _metadata_list(job, "reflections")
5772    if not reflections:
5773        return True
5774    last_reflected = 0
5775    metadata = reflections[-1].get("metadata") if isinstance(reflections[-1].get("metadata"), dict) else {}
5776    if isinstance(metadata.get("through_step"), int):
5777        last_reflected = metadata["through_step"]
5778    return step_no > last_reflected  # skip when the latest reflection already covers this step
5779
5780
5781def _lesson_already_recorded(job: dict[str, Any], lesson: str, *, category: str) -> bool:
5782    text = " ".join(str(lesson or "").split())  # collapse whitespace so formatting differences do not defeat the duplicate check
5783    wanted_category = str(category or "memory").strip().lower() or "memory"
5784    return any(
5785        str(entry.get("category") or "memory").strip().lower() == wanted_category
5786        and " ".join(str(entry.get("lesson") or "").split()) == text
5787        for entry in _metadata_list(job, "lessons")
5788    )
5789
5790
5791def _reflection_strategy(
5792    *,
5793    failures: list[dict[str, Any]],
5794    findings: list[Any],
5795    sources: list[Any],
5796    tasks: list[Any],
5797    measured_experiments: list[dict[str, Any]],
5798    pending_measurement: bool,
5799    validating_milestones: list[dict[str, Any]],
5800    active_operator_messages: list[dict[str, Any]],
5801) -> str:
5802    if pending_measurement:
5803        return "Resolve the pending measurement obligation before expanding research, outputs, or branch work."
5804    if active_operator_messages:
5805        return "Incorporate or supersede active operator context before choosing new autonomous branches."
5806    if validating_milestones:
5807        return "Validate the current roadmap milestone from evidence before adding more milestone scope."
5808    if measured_experiments:
5809        return "Continue from the best measured result; reject or pivot branches that do not improve the active metric."
5810    yielded_sources = [
5811        source
5812        for source in sources
5813        if isinstance(source, dict)
5814        and (_as_int(source.get("yield_count")) > 0 or _as_float(source.get("usefulness_score")) >= 0.8)
5815    ]
5816    if len(sources) >= SOURCE_YIELD_MIN_SOURCES and len(findings) + len(yielded_sources) < max(2, len(sources) // 8):
5817        return "Distill gathered sources into durable findings or source yield decisions before collecting more sources."
5818    if failures:
5819        return "Classify blocked or failed steps into durable task, source, experiment, or lesson outcomes before retrying."
5820    open_tasks = [
5821        task
5822        for task in tasks
5823        if isinstance(task, dict)
5824        and str(task.get("status") or "open").lower() in {"open", "active", "blocked"}
5825    ]
5826    if open_tasks:
5827        return "Execute or resolve the highest-priority open task before creating more task branches."
5828    return "Choose the next branch from durable evidence, then record the result as findings, tasks, experiments, sources, or memory."
5829
5830
5831def _claim_operator_queue(db: AgentDB, job_id: str) -> list[dict[str, Any]]:
5832    steering = db.claim_operator_messages(job_id, modes=("steer",), limit=1)  # steering messages take priority over follow-ups
5833    if steering:
5834        return steering
5835    return db.claim_operator_messages(job_id, modes=("follow_up",), limit=1)
5836
5837
5838def _emit_loop_start(db: AgentDB, job_id: str, run_id: str) -> None:
5839    db.append_event(
5840        job_id,
5841        event_type="loop",
5842        title="agent_start",
5843        ref_table="job_runs",
5844        ref_id=run_id,
5845        metadata={"run_id": run_id},
5846    )
5847    db.append_event(
5848        job_id,
5849        event_type="loop",
5850        title="turn_start",
5851        ref_table="job_runs",
5852        ref_id=run_id,
5853        metadata={"run_id": run_id},
5854    )
5855
5856
5857def _emit_assistant_message_event(
5858    db: AgentDB,
5859    job_id: str,
5860    run_id: str,
5861    response: LLMResponse,
5862    *,
5863    messages: list[dict[str, Any]],
5864    context_length: int,
5865    duration_seconds: float | None = None,
5866) -> dict[str, Any]:
5867    if response.tool_calls:
5868        body = ", ".join(call.name for call in response.tool_calls)
5869        metadata = {"run_id": run_id, "tool_calls": [call.name for call in response.tool_calls]}
5870    else:
5871        body = response.content[:1000]
5872        metadata = {"run_id": run_id, "tool_calls": []}
5873    metadata["usage"] = turn_usage_metadata(response, messages=messages, context_length=context_length)
5874    if duration_seconds is not None:
5875        metadata["duration_seconds"] = round(max(0.0, float(duration_seconds)), 3)
5876    if response.model:
5877        metadata["model"] = response.model
5878    if response.response_id:
5879        metadata["response_id"] = response.response_id
5880    db.append_event(
5881        job_id,
5882        event_type="loop",
5883        title="message_end",
5884        body=body,
5885        ref_table="job_runs",
5886        ref_id=run_id,
5887        metadata=metadata,
5888    )
5889    return metadata["usage"]
5890
5891
5892def _emit_loop_end(
5893    db: AgentDB,
5894    job_id: str,
5895    run_id: str,
5896    *,
5897    status: str,
5898    step_id: str | None = None,
5899    tool_name: str | None = None,
5900    detail: str = "",
5901) -> None:
5902    metadata = {"run_id": run_id, "status": status, "step_id": step_id or "", "tool": tool_name or ""}
5903    db.append_event(
5904        job_id,
5905        event_type="loop",
5906        title="turn_end",
5907        body=detail[:1000],
5908        ref_table="job_runs",
5909        ref_id=run_id,
5910        metadata=metadata,
5911    )
5912    db.append_event(
5913        job_id,
5914        event_type="loop",
5915        title="agent_end",
5916        body=status,
5917        ref_table="job_runs",
5918        ref_id=run_id,
5919        metadata=metadata,
5920    )
5921
5922
5923def _run_reflection_step(
5924    job: dict[str, Any],
5925    recent_steps: list[dict[str, Any]],
5926    *,
5927    db: AgentDB,
5928    job_id: str,
5929    run_id: str,
5930) -> StepExecution:
5931    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="reflection", tool_name="reflect")
5932    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
5933    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
5934    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
5935    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
5936    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
5937    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
5938    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
5939    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
5940    validating_milestones = [
5941        milestone for milestone in milestones
5942        if isinstance(milestone, dict)
5943        and (
5944            str(milestone.get("status") or "planned") == "validating"
5945            or str(milestone.get("validation_status") or "not_started") == "pending"
5946        )
5947    ]
5948    operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
5949    active_operator_messages = [
5950        entry for entry in operator_messages
5951        if isinstance(entry, dict)
5952        and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
5953        and not entry.get("acknowledged_at")
5954        and not entry.get("superseded_at")
5955    ]
5956    pending_measurement = _pending_measurement_obligation(job)
5957    artifacts = db.list_artifacts(job_id, limit=12)
5958    failures = [step for step in recent_steps[-REFLECTION_INTERVAL_STEPS:] if step.get("status") in ("failed", "blocked")]
5959    step_no = _max_step_no(recent_steps)
5960    finding_batches = [artifact for artifact in artifacts if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()]
5961    best_sources = sorted(
5962        [
5963            source for source in sources
5964            if isinstance(source, dict)
5965            and (
5966                _as_int(source.get("yield_count")) > 0
5967                or _as_float(source.get("usefulness_score")) >= 0.2
5968            )
5969            and _as_int(source.get("fail_count")) <= max(0, _as_int(source.get("yield_count")))
5970        ],
5971        key=lambda source: (_as_float(source.get("usefulness_score")), _as_int(source.get("yield_count"))),
5972        reverse=True,
5973    )[:3]
5974    source_text = ", ".join(str(source.get("source") or "") for source in best_sources) or "no high-yield source yet"
5975    measured_experiments = [experiment for experiment in experiments if isinstance(experiment, dict) and experiment.get("metric_value") is not None]
5976    best_experiments = [experiment for experiment in measured_experiments if experiment.get("best_observed")]
5977    best_experiment_text = "no measured experiment yet"
5978    if best_experiments:
5979        best_experiment_text = "; ".join(
5980            f"{experiment.get('title')} " + format_metric_value(
5981                experiment.get("metric_name") or "metric",
5982                experiment.get("metric_value"),
5983                experiment.get("metric_unit") or "",
5984            )
5985            for experiment in best_experiments[-3:]
5986        )
5987    summary = (
5988        f"Reflection through step #{step_no}: {len(findings)} findings, {len(sources)} sources, "
5989        f"{len(tasks)} tasks, {len(experiments)} experiments, {len(milestones)} roadmap milestones, "
5990        f"{len(lessons)} lessons, "
5991        f"{len(active_operator_messages)} active operator messages, "
5992        f"{len(finding_batches)} recent finding artifacts, {len(failures)} recent blocked/failed steps. "
5993        f"Best source direction: {source_text}. Best measured result: {best_experiment_text}."
5994        + (f" Roadmap '{roadmap.get('title')}' has {len(validating_milestones)} milestone(s) needing validation." if roadmap else "")
5995        + (" Pending measurement obligation needs resolution." if pending_measurement else "")
5996    )
5997    strategy = _reflection_strategy(
5998        failures=failures,
5999        findings=findings,
6000        sources=sources,
6001        tasks=tasks,
6002        measured_experiments=measured_experiments,
6003        pending_measurement=bool(pending_measurement),
6004        validating_milestones=validating_milestones,
6005        active_operator_messages=active_operator_messages,
6006    )
6007    reflection = db.append_reflection(
6008        job_id,
6009        summary,
6010        strategy=strategy,
6011        metadata={
6012            "through_step": step_no,
6013            "finding_count": len(findings),
6014            "source_count": len(sources),
6015            "task_count": len(tasks),
6016            "experiment_count": len(experiments),
6017            "roadmap_milestone_count": len(milestones),
6018            "roadmap_validation_needed_count": len(validating_milestones),
6019            "measured_experiment_count": len(measured_experiments),
6020            "active_operator_message_count": len(active_operator_messages),
6021            "pending_measurement_obligation": bool(pending_measurement),
6022        },
6023    )
6024    lesson = None
6025    if not _lesson_already_recorded(job, strategy, category="strategy"):
6026        lesson = db.append_lesson(
6027            job_id,
6028            strategy,
6029            category="strategy",
6030            confidence=0.75,
6031            metadata={"source": "reflection", "through_step": step_no},
6032        )
6033    db.append_agent_update(job_id, summary, category="plan", metadata={"reflection": reflection})
6034    result = {"success": True, "reflection": reflection, "lesson_recorded": bool(lesson)}
6035    db.finish_step(step_id, status="completed", summary=summary, output_data=result)
6036    db.finish_run(run_id, "completed")
6037    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="reflect", detail=summary)
6038    refresh_memory_index(db, job_id)
6039    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="reflect", status="completed", result=result)
6040
6041
6042def _run_guard_recovery_step(
6043    context: dict[str, Any],
6044    *,
6045    db: AgentDB,
6046    job_id: str,
6047    run_id: str,
6048) -> StepExecution:
6049    error = str(context.get("error") or "recoverable guard")
6050    checkpoint_accounting = error == "evidence checkpoint accounting required"
6051    task_queue_saturated = error == "task queue saturated"
6052    task_goal = "Convert the repeated guard block into durable progress before retrying the blocked action."
6053    acceptance = (
6054        "Use record_tasks, record_findings, record_source, record_experiment, or record_lesson to state what "
6055        "changed, what branch is rejected, or what concrete branch should run next."
6056    )
6057    stall_behavior = "If the same guard appears again, pivot to a different branch or record the branch as blocked."
6058    if checkpoint_accounting:
6059        task_goal = (
6060            "Account for the already-read evidence checkpoint as durable progress, a rejected branch, "
6061            "or a blocked branch before continuing."
6062        )
6063        acceptance = (
6064            "Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
6065            "record_milestone_validation, or record_lesson to state exactly what the checkpoint proved, "
6066            "invalidated, changed, or failed to provide. Do not read the same checkpoint again."
6067        )
6068        stall_behavior = (
6069            "If the checkpoint cannot produce durable progress, record a lesson or task that names the blocker "
6070            "and choose a different branch."
6071        )
6072    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="guard_recovery")
6073    if task_queue_saturated:
6074        task_queue = context.get("task_queue") if isinstance(context.get("task_queue"), dict) else {}
6075        lesson = db.append_lesson(
6076            job_id,
6077            (
6078                f"Repeated task queue saturation occurred {context.get('count')} times. "
6079                "Do not open guard-recovery tasks for saturation; consolidate, complete, block, or skip existing branches "
6080                "before adding new work."
6081            ),
6082            category="strategy",
6083            confidence=0.85,
6084            metadata={"guard_recovery": context},
6085        )
6086        db.update_job_metadata(
6087            job_id,
6088            {
6089                "task_backlog_pressure": {
6090                    "detected_at": datetime.now(timezone.utc).isoformat(),
6091                    "guard_recovery": context,
6092                    "reason": task_queue.get("reason") or "task queue saturated",
6093                    "open_count": task_queue.get("open_count"),
6094                    "total_count": task_queue.get("total_count"),
6095                }
6096            },
6097        )
6098        message = (
6099            f"Guard recovery recorded task queue saturation from step #{context.get('first_step_no')} "
6100            f"to #{context.get('latest_step_no')}; no new task was opened."
6101        )
6102        update = db.append_agent_update(
6103            job_id,
6104            message,
6105            category="blocked",
6106            metadata={"guard_recovery": context, "lesson_key": lesson.get("key"), "task_queue_saturation": True},
6107        )
6108        result = {
6109            "success": True,
6110            "guard_recovery": context,
6111            "lesson": lesson,
6112            "update": update,
6113            "task_opened": False,
6114        }
6115        db.finish_step(step_id, status="completed", summary=message, output_data=result)
6116        finished_step = _step_by_id(db, job_id, step_id)
6117        _resolve_evidence_checkpoint(
6118            db=db,
6119            job_id=job_id,
6120            tool_name="guard_recovery",
6121            step=finished_step,
6122        )
6123        db.finish_run(run_id, "completed")
6124        _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="guard_recovery", detail=message)
6125        refresh_memory_index(db, job_id)
6126        return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="guard_recovery", status="completed", result=result)
6127
6128    lesson = db.append_lesson(
6129        job_id,
6130        (
6131            f"Repeated guard block '{error}' occurred {context.get('count')} times. "
6132            + (
6133                "The checkpoint has already been read; do not reread it. Account for the evidence with a durable "
6134                "record or reject/block that branch before continuing."
6135                if checkpoint_accounting
6136                else "Do not retry the same blocked tool pattern; update durable progress state, create a new branch, "
6137                "or explicitly reject the branch before continuing."
6138            )
6139        ),
6140        category="strategy",
6141        confidence=0.75,
6142        metadata={"guard_recovery": context},
6143    )
6144    task = db.append_task_record(
6145        job_id,
6146        title=f"Resolve guard: {error}",
6147        status="open",
6148        priority=9,
6149        goal=task_goal,
6150        output_contract="decision",
6151        acceptance_criteria=acceptance,
6152        evidence_needed=f"Recent blocked tools: {', '.join(context.get('blocked_tools') or [])}",
6153        stall_behavior=stall_behavior,
6154        metadata={"guard_recovery": context, "resolves_evidence_checkpoint": checkpoint_accounting},
6155    )
6156    message = (
6157        f"Guard recovery opened a task after repeated '{error}' blocks "
6158        f"from step #{context.get('first_step_no')} to #{context.get('latest_step_no')}."
6159    )
6160    update = db.append_agent_update(
6161        job_id,
6162        message,
6163        category="blocked",
6164        metadata={"guard_recovery": context, "task_key": task.get("key"), "lesson_key": lesson.get("key")},
6165    )
6166    result = {
6167        "success": True,
6168        "guard_recovery": context,
6169        "lesson": lesson,
6170        "task": task,
6171        "update": update,
6172    }
6173    db.finish_step(step_id, status="completed", summary=message, output_data=result)
6174    finished_step = _step_by_id(db, job_id, step_id)
6175    _resolve_evidence_checkpoint(
6176        db=db,
6177        job_id=job_id,
6178        tool_name="guard_recovery",
6179        step=finished_step,
6180    )
6181    db.finish_run(run_id, "completed")
6182    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="guard_recovery", detail=message)
6183    refresh_memory_index(db, job_id)
6184    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="guard_recovery", status="completed", result=result)
6185
6186
6187def _usage_budget_limit_context(config: AppConfig, usage: dict[str, Any]) -> dict[str, Any] | None:
6188    limit = config.runtime.max_job_cost_usd
6189    if limit is None or limit <= 0 or not bool(usage.get("has_cost")):
6190        return None
6191    cost = _as_float(usage.get("cost"))
6192    if cost < float(limit):
6193        return None
6194    return {
6195        "limit": float(limit),
6196        "cost": cost,
6197        "calls": _as_int(usage.get("calls")),
6198        "total_tokens": _as_int(usage.get("total_tokens")),
6199        "prompt_tokens": _as_int(usage.get("prompt_tokens")),
6200        "completion_tokens": _as_int(usage.get("completion_tokens")),
6201    }
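Annotation (not part of the listed source): a minimal standalone sketch of the budget gate in `_usage_budget_limit_context` above. The name `budget_limit_context` and the plain `float`/`int` conversions are stand-ins for illustration; the real `_as_float` and `_as_int` helpers are defined elsewhere and may behave differently on malformed input.

```python
from __future__ import annotations

from typing import Any, Optional


def budget_limit_context(limit: Optional[float], usage: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a context dict once the reported cost reaches the limit, else None."""
    # No limit configured, a non-positive limit, or no cost data: never trip the gate.
    if limit is None or limit <= 0 or not usage.get("has_cost"):
        return None
    cost = float(usage.get("cost") or 0.0)  # stand-in for _as_float
    if cost < float(limit):
        return None
    # The comparison is inclusive: cost == limit already counts as reached.
    return {"limit": float(limit), "cost": cost, "calls": int(usage.get("calls") or 0)}


print(budget_limit_context(5.0, {"has_cost": True, "cost": 4.99, "calls": 40}))
print(budget_limit_context(5.0, {"has_cost": True, "cost": 5.0, "calls": 41}))
print(budget_limit_context(5.0, {"has_cost": False, "cost": 9.0}))
```

In this sketch, usage without cost data never trips the gate, even when the raw `cost` field is large, which mirrors the `has_cost` check in the listing.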
6202
6203
6204def _run_usage_budget_limit_step(
6205    context: dict[str, Any],
6206    *,
6207    db: AgentDB,
6208    job_id: str,
6209    run_id: str,
6210) -> StepExecution:
6211    limit = float(context.get("limit") or 0.0)
6212    cost = float(context.get("cost") or 0.0)
6213    message = (
6214        f"Paused job: configured model cost limit ${limit:g} reached "
6215        f"(current cost ${cost:.4f}, {context.get('calls')} model calls, "
6216        f"{_compact_usage_tokens(context.get('total_tokens'))} tokens). "
6217        "Raise the limit, switch model/provider, or resume after deciding the budget is acceptable."
6218    )
6219    metadata = {
6220        "reason": "usage_budget_limit",
6221        "usage_budget_limit": context,
6222        "last_note": message,
6223        "usage_budget_blocked_at": datetime.now(timezone.utc).isoformat(),
6224    }
6225    db.update_job_status(job_id, "paused", metadata_patch=metadata)
6226    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="budget_limit")
6227    result = {"success": True, "job_id": job_id, "paused": True, **context}
6228    db.append_agent_update(
6229        job_id,
6230        message,
6231        category="blocked",
6232        metadata={"reason": "usage_budget_limit", "usage_budget_limit": context},
6233    )
6234    db.finish_step(step_id, status="completed", summary=message, output_data=result)
6235    db.finish_run(run_id, "completed")
6236    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="budget_limit", detail=message)
6237    refresh_memory_index(db, job_id)
6238    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="budget_limit", status="completed", result=result)
6239
6240
6241def _compact_usage_tokens(value: object) -> str:
6242    number = _as_int(value)
6243    if number >= 1_000_000:
6244        return f"{number / 1_000_000:.1f}M"
6245    if number >= 1_000:
6246        return f"{number / 1_000:.1f}K"
6247    return str(number)
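Annotation (not part of the listed source): a small sketch of the thresholds used by `_compact_usage_tokens` above. `compact_tokens` is an illustrative name, and the `try`/`except` conversion is only an assumed approximation of `_as_int`, which is defined elsewhere.

```python
def compact_tokens(value: object) -> str:
    """Format a token count as a short string such as 15.4K or 2.5M."""
    try:
        number = int(value)  # assumed stand-in for _as_int
    except (TypeError, ValueError):
        number = 0
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


for sample in (999, 1_000, 15_432, 2_500_000, None):
    print(sample, "->", compact_tokens(sample))
```

One property of this formatting worth knowing: counts just below one million round up within the K tier, so `compact_tokens(999_999)` yields `1000.0K` rather than `1.0M`.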
6248
6249
6250def _evidence_checkpoint_content(evidence_step: dict[str, Any]) -> str:
6251    output = evidence_step.get("output") if isinstance(evidence_step.get("output"), dict) else {}
6252    input_data = evidence_step.get("input") if isinstance(evidence_step.get("input"), dict) else {}
6253    observation = _observation_for_prompt(evidence_step.get("tool_name"), output)
6254    return "\n\n".join([
6255        "# Auto Evidence Checkpoint",
6256        f"Source step: #{evidence_step.get('step_no')} {evidence_step.get('tool_name') or evidence_step.get('kind')}",
6257        f"Summary: {evidence_step.get('summary') or ''}",
6258        f"Arguments:\n```json\n{json.dumps(input_data.get('arguments') or {}, ensure_ascii=False, indent=2)[:3000]}\n```",
6259        f"Observed:\n{observation or 'No compact observation available.'}",
6260        f"Raw output excerpt:\n```json\n{json.dumps(output, ensure_ascii=False, indent=2)[:9000]}\n```",
6261    ])
6262
6263
6264def _auto_persist_evidence(
6265    *,
6266    db: AgentDB,
6267    artifacts: ArtifactStore,
6268    job_id: str,
6269    run_id: str,
6270    step_id: str,
6271    blocked_tool: str,
6272    evidence_step: dict[str, Any],
6273) -> dict[str, Any]:
6274    stored = artifacts.write_text(
6275        job_id=job_id,
6276        run_id=run_id,
6277        step_id=step_id,
6278        title=f"Auto Evidence Checkpoint after step {evidence_step.get('step_no')}",
6279        summary=f"Auto-saved evidence before allowing more research; blocked tool was {blocked_tool}.",
6280        content=_evidence_checkpoint_content(evidence_step),
6281        artifact_type="text",
6282        metadata={"auto_checkpoint": True, "evidence_step": evidence_step.get("id"), "blocked_tool": blocked_tool},
6283    )
6284    lesson = db.append_lesson(
6285        job_id,
6286        (
6287            f"Evidence from step #{evidence_step.get('step_no')} must be persisted before more research; "
6288            f"auto-saved checkpoint {stored.id} after blocked {blocked_tool}."
6289        ),
6290        category="mistake",
6291        confidence=0.8,
6292        metadata={"artifact_id": stored.id, "blocked_tool": blocked_tool},
6293    )
6294    db.append_agent_update(
6295        job_id,
6296        f"Auto-saved evidence checkpoint {stored.id} after the model tried {blocked_tool} before persisting evidence.",
6297        category="blocked",
6298        metadata={"artifact_id": stored.id, "blocked_tool": blocked_tool},
6299    )
6300    db.update_job_metadata(
6301        job_id,
6302        {
6303            "pending_evidence_checkpoint": {
6304                "artifact_id": stored.id,
6305                "title": stored.title or f"Auto Evidence Checkpoint after step {evidence_step.get('step_no')}",
6306                "path": str(stored.path),
6307                "created_at": datetime.now(timezone.utc).isoformat(),
6308                "checkpoint_step_id": step_id,
6309                "evidence_step": evidence_step.get("id"),
6310                "evidence_step_no": evidence_step.get("step_no"),
6311                "evidence_tool": evidence_step.get("tool_name") or evidence_step.get("kind"),
6312                "blocked_tool": blocked_tool,
6313            }
6314        },
6315    )
6316    return {"artifact_id": stored.id, "path": str(stored.path), "lesson": lesson}
6317
6318
6319def _auto_record_grounding_block_lesson(*, db: AgentDB, job_id: str, result: dict[str, Any]) -> None:
6320    if result.get("error") != "evidence grounding required":
6321        return
6322    grounding = result.get("evidence_grounding") if isinstance(result.get("evidence_grounding"), dict) else {}
6323    unsupported = grounding.get("unsupported_tokens") if isinstance(grounding.get("unsupported_tokens"), list) else []
6324    unsupported = [str(token) for token in unsupported if str(token).strip()]
6325    if not unsupported:
6326        return
6327    cited_steps = grounding.get("cited_steps") if isinstance(grounding.get("cited_steps"), list) else []
6328    blocked_tool = str(result.get("blocked_tool") or "")
6329    fingerprint = "|".join([blocked_tool, ",".join(unsupported[:8]), ",".join(str(step) for step in cited_steps[:8])])
6330    job = db.get_job(job_id)
6331    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
6332    seen = metadata.get("grounding_block_fingerprints") if isinstance(metadata.get("grounding_block_fingerprints"), list) else []
6333    if fingerprint in seen:
6334        return
6335    db.append_lesson(
6336        job_id,
6337        (
6338            f"Evidence grounding rejected unsupported concrete tokens for {blocked_tool or 'a durable record'}: "
6339            f"{', '.join(unsupported[:8])}. Treat matching prior ledger, artifact, or memory claims as stale until "
6340            "they are re-verified from the cited evidence."
6341        ),
6342        category="mistake",
6343        confidence=0.9,
6344        metadata={"evidence_grounding": grounding, "blocked_tool": blocked_tool},
6345    )
6346    metadata_patch: dict[str, Any] = {"grounding_block_fingerprints": (seen + [fingerprint])[-100:]}
6347    stale_tokens = _stale_claim_tokens_from_unsupported(
6348        unsupported,
6349        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
6350    )
6351    if stale_tokens:
6352        existing_tokens = [
6353            str(token)
6354            for token in metadata.get("unsupported_claim_tokens", [])
6355            if str(token).strip()
6356        ] if isinstance(metadata.get("unsupported_claim_tokens"), list) else []
6357        combined: list[str] = []
6358        combined_seen: set[str] = set()
6359        for token in existing_tokens + stale_tokens:
6360            key = token.lower()
6361            if key in combined_seen:
6362                continue
6363            combined_seen.add(key)
6364            combined.append(token)
6365        metadata_patch["unsupported_claim_tokens"] = combined[-80:]
6366    db.update_job_metadata(job_id, metadata_patch)
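Annotation (not part of the listed source): the token merge at the end of `_auto_record_grounding_block_lesson` above is an order-preserving, case-insensitive de-duplication followed by a tail cap. A minimal sketch, with `merge_tokens` and its `cap` parameter as illustrative names:

```python
from __future__ import annotations


def merge_tokens(existing: list[str], new: list[str], cap: int = 80) -> list[str]:
    """Merge two token lists, dropping case-insensitive repeats and keeping the newest `cap` entries."""
    combined: list[str] = []
    seen: set[str] = set()
    for token in existing + new:
        key = token.lower()
        if key in seen:
            continue  # the first spelling encountered wins
        seen.add(key)
        combined.append(token)
    # Slicing from the tail discards the oldest tokens once the cap is exceeded.
    return combined[-cap:]


print(merge_tokens(["RTX 4090", "42 ms"], ["rtx 4090", "v2.1"]))
# ['RTX 4090', '42 ms', 'v2.1']
print(merge_tokens(["a", "b", "c"], ["d"], cap=2))
# ['c', 'd']
```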
6367
6368
6369def _mark_evidence_checkpoint_read(
6370    *,
6371    db: AgentDB,
6372    job_id: str,
6373    tool_name: str,
6374    args: dict[str, Any],
6375    step: dict[str, Any] | None,
6376) -> None:
6377    if tool_name != "read_artifact":
6378        return
6379    job = db.get_job(job_id)
6380    pending = _pending_evidence_checkpoint(job)
6381    if not pending or pending.get("read_at"):
6382        return
6383    if not _read_artifact_call_matches_checkpoint(
6384        args,
6385        artifact_id=str(pending.get("artifact_id") or ""),
6386        artifact_title=str(pending.get("title") or ""),
6387    ):
6388        return
6389    updated = dict(pending)
6390    updated["read_at"] = datetime.now(timezone.utc).isoformat()
6391    if step:
6392        updated["read_step_id"] = step.get("id")
6393        updated["read_step_no"] = step.get("step_no")
6394    db.update_job_metadata(job_id, {"pending_evidence_checkpoint": updated})
6395    db.append_agent_update(
6396        job_id,
6397        f"Read evidence checkpoint {pending.get('artifact_id')}; durable accounting is required next.",
6398        category="blocked",
6399        metadata={"pending_evidence_checkpoint": updated},
6400    )
6401
6402
6403def _resolve_evidence_checkpoint(
6404    *,
6405    db: AgentDB,
6406    job_id: str,
6407    tool_name: str,
6408    step: dict[str, Any] | None,
6409) -> None:
6410    if tool_name not in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS and tool_name != "guard_recovery":
6411        return
6412    job = db.get_job(job_id)
6413    pending = _pending_evidence_checkpoint(job)
6414    if not pending:
6415        return
6416    updated = dict(pending)
6417    updated["resolved_at"] = datetime.now(timezone.utc).isoformat()
6418    updated["resolved_by_tool"] = tool_name
6419    if step:
6420        updated["resolved_by_step_id"] = step.get("id")
6421        updated["resolved_by_step_no"] = step.get("step_no")
6422    db.update_job_metadata(job_id, {"pending_evidence_checkpoint": updated})
6423    db.append_agent_update(
6424        job_id,
6425        f"Evidence checkpoint {pending.get('artifact_id')} accounted for with {tool_name}.",
6426        category="progress",
6427        metadata={"pending_evidence_checkpoint": updated},
6428    )
6429
6430
6431def _auto_record_blocked_source(
6432    *,
6433    db: AgentDB,
6434    job_id: str,
6435    context: dict[str, Any],
6436    blocked_tool: str,
6437) -> dict[str, Any]:
6438    source = str(context.get("url") or context.get("title") or "unknown blocked browser source")
6439    reason = str(context.get("reason") or "anti-bot challenge")
6440    record = db.append_source_record(
6441        job_id,
6442        source,
6443        source_type="blocked_browser_source",
6444        usefulness_score=0.02,
6445        fail_count_delta=1,
6446        warnings=[reason],
6447        outcome=f"blocked by {reason}; pivot to an alternate source for the current objective",
6448        metadata={"blocked_tool": blocked_tool, "source_step": context.get("step_id")},
6449    )
6450    lesson = None
6451    if int(record.get("fail_count") or 0) <= 2:
6452        lesson = db.append_lesson(
6453            job_id,
6454            "Blocked, CAPTCHA, login, paywall, or anti-bot pages are not usable evidence for any long-running task; record the source outcome and pivot instead of repeating browser actions.",
6455            category="source_quality",
6456            confidence=0.9,
6457            metadata={"source": source, "blocked_tool": blocked_tool},
6458        )
6459    db.append_agent_update(
6460        job_id,
6461        f"Blocked source guard: current source is {reason}; pivoting away instead of looping.",
6462        category="blocked",
6463        metadata={"source": source, "blocked_tool": blocked_tool, "reason": reason},
6464    )
6465    return {"source": record, "lesson": lesson}
6466
6467
6468def _auto_record_tool_source_quality(
6469    *,
6470    db: AgentDB,
6471    job_id: str,
6472    tool_name: str | None,
6473    result: dict[str, Any],
6474) -> None:
6475    if tool_name == "web_search":
6476        query = str(result.get("query") or "").strip()
6477        results = result.get("results") if isinstance(result.get("results"), list) else []
6478        for item in results[:8]:
6479            if not isinstance(item, dict):
6480                continue
6481            url = str(item.get("url") or "").strip()
6482            if not url:
6483                continue
6484            title = str(item.get("title") or "").strip()
6485            db.append_source_record(
6486                job_id,
6487                url,
6488                source_type="web_search",
6489                usefulness_score=0.35,
6490                yield_count=0,
6491                outcome=f"search result for {query or 'query'}: {title[:160]}",
6492                metadata={"auto_from_tool": "web_search", "query": query, "title": title},
6493            )
6494        return
6495    if tool_name == "web_extract":
6496        pages = result.get("pages") if isinstance(result.get("pages"), list) else []
6497        for page in pages[:12]:
6498            if not isinstance(page, dict):
6499                continue
6500            url = str(page.get("url") or "").strip()
6501            if not url:
6502                continue
6503            text = str(page.get("text") or "")
6504            error = str(page.get("error") or "")
6505            if error:
6506                db.append_source_record(
6507                    job_id,
6508                    url,
6509                    source_type="web_extract",
6510                    usefulness_score=0.1,
6511                    fail_count_delta=1,
6512                    warnings=[error[:180]],
6513                    outcome=f"extract failed: {error[:180]}",
6514                    metadata={"auto_from_tool": "web_extract"},
6515                )
6516                continue
6517            score = 0.35
6518            if len(text.strip()) >= 500:
6519                score = 0.55
6520            if len(text.strip()) >= 3000:
6521                score = 0.7
6522            db.append_source_record(
6523                job_id,
6524                url,
6525                source_type="web_extract",
6526                usefulness_score=score,
6527                yield_count=0,
6528                outcome=f"extracted {len(text.strip())} chars for possible use",
6529                metadata={"auto_from_tool": "web_extract"},
6530            )
6531        return
6532    if tool_name in {"browser_navigate", "browser_snapshot"}:
6533        context = _browser_warning_context(result)
6534        if not context:
6535            return
6536        result["source_warning"] = context["reason"]
6537        result["source_url"] = context.get("url") or ""
6538        _auto_record_blocked_source(db=db, job_id=job_id, context=context, blocked_tool=tool_name or "browser")
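Annotation (not part of the listed source): the `web_extract` branch of `_auto_record_tool_source_quality` above assigns a usefulness score from the page outcome and the length of the extracted text. The sketch below isolates only that scoring rule; `extract_score` is an illustrative name and nothing is written to a database.

```python
def extract_score(text: str, error: str = "") -> float:
    """Score an extracted page: failures score lowest, longer text scores higher."""
    if error:
        return 0.1
    length = len(text.strip())
    if length >= 3000:
        return 0.7
    if length >= 500:
        return 0.55
    return 0.35


for label, text, error in (
    ("failed", "", "timeout"),
    ("short", "x" * 499, ""),
    ("medium", "x" * 500, ""),
    ("long", "x" * 3000, ""),
):
    print(label, extract_score(text, error))
```

Whitespace is stripped before measuring, so a page consisting mostly of padding stays in the lowest non-error tier.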
6539
6540
6541def _auto_record_failed_shell_sources(
6542    *,
6543    db: AgentDB,
6544    job_id: str,
6545    args: dict[str, Any],
6546    result: dict[str, Any],
6547) -> None:
6548    error_text = " ".join(str(result.get(key) or "") for key in ("error", "stderr", "stdout"))
6549    lowered = error_text.lower()
6550    if not any(
6551        marker in lowered
6552        for marker in (
6553            "authentication",
6554            "authorization",
6555            "unauthorized",
6556            "forbidden",
6557            "http failure",
6558            "http 401",
6559            "http 403",
6560            "401 unauthorized",
6561            "403 forbidden",
6562        )
6563    ):
6564        return
6565    recorded: set[str] = set()
6566    for url in _shell_guard_urls(str(args.get("command") or ""))[:3]:
6567        candidates = [url]
6568        family_url = _source_failure_family_url(url)
6569        if family_url and not _same_source_url(family_url, url):
6570            candidates.append(family_url)
6571        for candidate in candidates:
6572            if candidate.lower() in recorded:
6573                continue
6574            recorded.add(candidate.lower())
6575            is_family = candidate != url
6576            warning = (
6577                "shell command reported authentication/authorization or HTTP failure for this source family"
6578                if is_family
6579                else "shell command reported authentication/authorization or HTTP failure"
6580            )
6581            outcome = (
6582                f"Source family blocked after failed child URL {url}: {_clip_text(str(result.get('error') or error_text), 420)}"
6583                if is_family
6584                else _clip_text(str(result.get("error") or error_text), 500)
6585            )
6586            metadata = {"auto_from_tool": "shell_exec", "failure_kind": "auth_or_http"}
6587            if is_family:
6588                metadata.update({"source_family": True, "failed_child_url": url})
6589            db.append_source_record(
6590                job_id,
6591                candidate,
6592                source_type="shell_exec_family" if is_family else "shell_exec",
6593                usefulness_score=0.01,
6594                fail_count_delta=1,
6595                warnings=[warning],
6596                outcome=outcome,
6597                metadata=metadata,
6598            )
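Annotation (not part of the listed source): `_auto_record_failed_shell_sources` above only records sources when the combined shell output contains one of a fixed set of substrings. A minimal sketch of that detection step, with `looks_like_auth_or_http_failure` as an illustrative name; URL extraction and source-family handling are left out.

```python
AUTH_OR_HTTP_MARKERS = (
    "authentication",
    "authorization",
    "unauthorized",
    "forbidden",
    "http failure",
    "http 401",
    "http 403",
    "401 unauthorized",
    "403 forbidden",
)


def looks_like_auth_or_http_failure(result: dict) -> bool:
    """Return True when error, stderr, or stdout mentions an auth or HTTP failure marker."""
    text = " ".join(str(result.get(key) or "") for key in ("error", "stderr", "stdout")).lower()
    return any(marker in text for marker in AUTH_OR_HTTP_MARKERS)


print(looks_like_auth_or_http_failure({"stderr": "The requested URL returned error: 403 Forbidden"}))
print(looks_like_auth_or_http_failure({"stdout": "download complete"}))
print(looks_like_auth_or_http_failure({"stdout": "sent header Authorization: Bearer"}))
```

Because this is plain substring matching, the third call also returns `True` even though nothing failed; in this sketch, any output that merely mentions one of the markers is treated as a failure signal.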
6599
6600
6601def _auto_reconcile_artifact_tasks(
6602    *,
6603    db: AgentDB,
6604    job_id: str,
6605    args: dict[str, Any],
6606    result: dict[str, Any],
6607) -> list[dict[str, Any]]:
6608    artifact_id = str(result.get("artifact_id") or "")
6609    if not artifact_id:
6610        return []
6611    artifact_title = str(args.get("title") or "")
6612    artifact_summary = str(args.get("summary") or "")
6613    artifact_content = str(args.get("content") or "")
6614    artifact_text = " ".join([artifact_title, artifact_summary, artifact_content[:4000]])
6615    artifact_tokens = _text_tokens(artifact_text)
6616    if len(artifact_tokens) < 2:
6617        return []
6618    job = db.get_job(job_id)
6619    reconciled = []
6620    for task in _metadata_list(job, "task_queue"):
6621        status = str(task.get("status") or "open").strip().lower()
6622        if status not in {"open", "active"}:
6623            continue
6624        contract = str(task.get("output_contract") or "").strip().lower()
6625        if contract in {"experiment", "action", "monitor"}:
6626            continue
6627        task_text = " ".join(
6628            str(task.get(key) or "")
6629            for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "source_hint")
6630        )
6631        if not _artifact_can_reconcile_task(
6632            contract=contract,
6633            task_text=task_text,
6634            artifact_title=artifact_title,
6635            artifact_summary=artifact_summary,
6636        ):
6637            continue
6638        task_tokens = _text_tokens(task_text)
6639        if len(task_tokens) < 2:
6640            continue
6641        overlap = task_tokens & artifact_tokens
6642        needed = max(2, min(4, (len(task_tokens) + 1) // 2))
6643        if len(overlap) < needed:
6644            continue
6645        updated = db.append_task_record(
6646            job_id,
6647            title=str(task.get("title") or ""),
6648            status="done",
6649            priority=_as_int(task.get("priority")),
6650            goal=str(task.get("goal") or ""),
6651            source_hint=str(task.get("source_hint") or ""),
6652            result=f"Saved output {artifact_id}: {_clip_text(artifact_title or artifact_summary, 180)}",
6653            parent=str(task.get("parent") or ""),
6654            output_contract=contract,
6655            acceptance_criteria=str(task.get("acceptance_criteria") or ""),
6656            evidence_needed=str(task.get("evidence_needed") or ""),
6657            stall_behavior=str(task.get("stall_behavior") or ""),
6658            metadata={
6659                **(task.get("metadata") if isinstance(task.get("metadata"), dict) else {}),
6660                "auto_reconciled_from_artifact": artifact_id,
6661                "matched_tokens": sorted(overlap)[:12],
6662            },
6663        )
6664        reconciled.append(updated)
6665    if reconciled:
6666        titles = ", ".join(str(task.get("title") or "") for task in reconciled[:4])
6667        db.append_agent_update(
6668            job_id,
6669            f"Task progress reconciled from saved output {artifact_id}: {titles}.",
6670            category="plan",
6671            metadata={"artifact_id": artifact_id, "task_count": len(reconciled)},
6672        )
6673    return reconciled
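Annotation (not part of the listed source): `_auto_reconcile_artifact_tasks` above closes a task when enough of its tokens also appear in the saved artifact. The sketch below isolates the overlap threshold. `text_tokens` is a hypothetical tokenizer (lowercase alphanumeric words of three or more characters); the real `_text_tokens` is defined elsewhere and may differ. The contract and `_artifact_can_reconcile_task` gates are omitted.

```python
from __future__ import annotations

import re


def text_tokens(text: str) -> set[str]:
    # Hypothetical tokenizer, assumed for illustration only.
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


def needed_overlap(task_token_count: int) -> int:
    """About half the task tokens must match, clamped to the range 2..4."""
    return max(2, min(4, (task_token_count + 1) // 2))


def artifact_matches_task(task_text: str, artifact_text: str) -> bool:
    task_tokens = text_tokens(task_text)
    artifact_tokens = text_tokens(artifact_text)
    if len(task_tokens) < 2 or len(artifact_tokens) < 2:
        return False
    return len(task_tokens & artifact_tokens) >= needed_overlap(len(task_tokens))


print([needed_overlap(n) for n in (2, 3, 5, 7, 20)])
# [2, 2, 3, 4, 4]
print(artifact_matches_task("Draft pricing comparison report", "Pricing comparison report for vendors"))
print(artifact_matches_task("Draft pricing comparison report", "Benchmark latency notes"))
```

The clamp means a long task description never needs more than four shared tokens, so matching gets relatively looser as the task text grows.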
6674
6675
6676def _auto_open_revision_task_for_deliverable(
6677    *,
6678    db: AgentDB,
6679    job_id: str,
6680    args: dict[str, Any],
6681    result: dict[str, Any],
6682) -> dict[str, Any] | None:
6683    artifact_id = str(result.get("artifact_id") or "")
6684    if not artifact_id:
6685        return None
6686    artifact_title = str(args.get("title") or "")
6687    artifact_summary = str(args.get("summary") or "")
6688    if not _artifact_can_reconcile_task(
6689        contract="report",
6690        task_text="review revise draft report deliverable",
6691        artifact_title=artifact_title,
6692        artifact_summary=artifact_summary,
6693    ):
6694        return None
6695    job = db.get_job(job_id)
6696    for task in _metadata_list(job, "task_queue"):
6697        if str(task.get("status") or "open").strip().lower() not in {"open", "active"}:
6698            continue
6699        metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
6700        if metadata.get("revision_source_artifact_id") == artifact_id:
6701            return None
6702        if metadata.get("source") == "auto_revision_loop":
6703            db.append_task_record(
6704                job_id,
6705                title=str(task.get("title") or ""),
6706                status="skipped",
6707                priority=_as_int(task.get("priority")),
6708                goal=str(task.get("goal") or ""),
6709                source_hint=str(task.get("source_hint") or ""),
6710                result=f"Superseded by newer saved output {artifact_id}.",
6711                parent=str(task.get("parent") or ""),
6712                output_contract=str(task.get("output_contract") or ""),
6713                acceptance_criteria=str(task.get("acceptance_criteria") or ""),
6714                evidence_needed=str(task.get("evidence_needed") or ""),
6715                stall_behavior=str(task.get("stall_behavior") or ""),
6716                metadata={**metadata, "superseded_by_artifact_id": artifact_id},
6717            )
6718    task = db.append_task_record(
6719        job_id,
6720        title=f"Review and revise saved output {artifact_id}",
6721        status="open",
6722        priority=4,
6723        goal="Use the latest saved deliverable as a baseline, check it against evidence and acceptance criteria, then improve it.",
6724        source_hint=artifact_id,
6725        output_contract="report",
6726        acceptance_criteria="The saved output is reviewed and either revised, validated, or given concrete follow-up gaps.",
6727        evidence_needed="Saved output, relevant evidence artifacts or files, and explicit gap/revision notes.",
6728        stall_behavior="If no useful revision is possible, record why and open the next evidence, validation, or monitoring branch.",
6729        metadata={
6730            "source": "auto_revision_loop",
6731            "revision_source_artifact_id": artifact_id,
6732            "source_title": artifact_title,
6733        },
6734    )
6735    db.append_agent_update(
6736        job_id,
6737        f"Opened revision branch for saved output {artifact_id}: {_clip_text(artifact_title or artifact_summary, 160)}.",
6738        category="plan",
6739        metadata={"artifact_id": artifact_id, "task_key": task.get("key"), "source": "auto_revision_loop"},
6740    )
6741    return task
6742
6743
6744def _artifact_can_reconcile_task(
6745    *,
6746    contract: str,
6747    task_text: str,
6748    artifact_title: str,
6749    artifact_summary: str,
6750) -> bool:
6751    contract = contract.strip().lower()
6752    if contract in {"experiment", "action", "monitor"}:
6753        return False
6754    if contract == "research":
6755        return True
6756    artifact_text = f"{artifact_title} {artifact_summary}".lower()
6757    task_lower = task_text.lower()
6758    evidence_like = any(term in artifact_text for term in EVIDENCE_ARTIFACT_TERMS)
6759    deliverable_like = any(term in artifact_text for term in DELIVERABLE_ARTIFACT_TERMS)
6760    task_needs_deliverable_action = any(term in task_lower for term in TASK_DELIVERABLE_ACTION_TERMS)
6761    if evidence_like:
6762        return False
6763    if task_needs_deliverable_action and not deliverable_like:
6764        return False
6765    return True
6766
6767
6768def _auto_checkpoint_update(
6769    *,
6770    db: AgentDB,
6771    job_id: str,
6772    step_no: int,
6773    tool_name: str | None,
6774    args: dict[str, Any],
6775    result: dict[str, Any],
6776) -> None:
6777    title_text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "type")).lower()
6778    is_finding_batch = tool_name == "write_artifact" and "finding" in title_text
6779    if not is_finding_batch and step_no % 10 != 0:
6780        return
6781    job = db.get_job(job_id)
6782    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
6783    previous = metadata.get("last_checkpoint_counts") if isinstance(metadata.get("last_checkpoint_counts"), dict) else {}
6784    checkpoint = build_progress_checkpoint(
6785        metadata,
6786        previous_counts=previous,
6787        step_no=step_no,
6788        tool_name=tool_name,
6789        artifact_id=str(result.get("artifact_id") or ""),
6790        is_finding_output=is_finding_batch,
6791    )
6792    db.append_agent_update(
6793        job_id,
6794        checkpoint.message,
6795        category=checkpoint.category,
6796        metadata={
6797            "step_no": step_no,
6798            "tool": tool_name,
6799            "deltas": checkpoint.deltas,
6800            "updates": checkpoint.updates,
6801            "resolutions": checkpoint.resolutions,
6802        },
6803    )
6804    streak = _as_int(metadata.get("activity_checkpoint_streak"))
6805    streak = streak + 1 if checkpoint.category == "activity" else 0
6806    task_durable_change = checkpoint.deltas.get("tasks", 0) + checkpoint.updates.get("tasks", 0)
6807    non_task_durable_change = any(
6808        checkpoint.deltas.get(key, 0) > 0
6809        or checkpoint.updates.get(key, 0) > 0
6810        or checkpoint.resolutions.get(key, 0) > 0
6811        for key in ("findings", "sources", "experiments", "lessons", "milestones")
6812    )
6813    task_resolution = checkpoint.resolutions.get("tasks", 0) > 0
6814    task_only_progress = task_durable_change > 0 and not non_task_durable_change and not task_resolution
6815    task_planning_streak = _as_int(metadata.get("task_planning_checkpoint_streak"))
6816    task_planning_streak = task_planning_streak + 1 if task_only_progress else 0
6817    db.update_job_metadata(
6818        job_id,
6819        {
6820            "last_checkpoint_counts": checkpoint.counts,
6821            "last_checkpoint_at": datetime.now(timezone.utc).isoformat(),
6822            "activity_checkpoint_streak": streak,
6823            "task_planning_checkpoint_streak": task_planning_streak,
6824        },
6825    )
6826
6827
6828def _execute_tool_call(
6829    call: Any,
6830    *,
6831    job: dict[str, Any],
6832    recent_steps: list[dict[str, Any]],
6833    config: AppConfig,
6834    db: AgentDB,
6835    artifacts: ArtifactStore,
6836    registry: ToolRegistry,
6837    job_id: str,
6838    run_id: str,
6839) -> tuple[StepExecution, bool, str, str | None]:
6840    args = _normalize_milestone_validation_args_for_active_gate(call.name, call.arguments, job)
6841    input_data = {"tool_call_id": call.id, "arguments": args}
6842    if args != call.arguments:
6843        input_data["original_arguments"] = call.arguments
6844    step_id = db.add_step(
6845        job_id=job_id,
6846        run_id=run_id,
6847        kind="tool",
6848        tool_name=call.name,
6849        input_data=input_data,
6850    )
6851    validate_arguments = getattr(registry, "validate_arguments", None)
6852    argument_block = validate_arguments(call.name, args, config) if callable(validate_arguments) else None
6853    if argument_block:
6854        concrete_fields = [*(argument_block.get("missing_arguments") or []), *(argument_block.get("placeholder_arguments") or [])]
6855        reason = "missing required arguments" if argument_block.get("missing_arguments") else str(argument_block.get("error") or "invalid tool arguments")
6856        summary = f"blocked {call.name}; {reason}: {', '.join(concrete_fields)}"
6857        db.finish_step(
6858            step_id,
6859            status="blocked",
6860            summary=summary,
6861            output_data=argument_block,
6862            error=None,
6863        )
6864        db.append_agent_update(
6865            job_id,
6866            summary,
6867            category="blocked",
6868            metadata={
6869                "reason": "tool_arguments_missing",
6870                "tool": call.name,
6871                "missing_arguments": argument_block.get("missing_arguments") or [],
6872                "placeholder_arguments": argument_block.get("placeholder_arguments") or [],
6873            },
6874        )
6875        return (
6876            StepExecution(
6877                job_id=job_id,
6878                run_id=run_id,
6879                step_id=step_id,
6880                tool_name=call.name,
6881                status="blocked",
6882                result=argument_block,
6883            ),
6884            True,
6885            summary,
6886            None,
6887        )
6888    blocked = _blocked_tool_call_result(call.name, args, recent_steps, job)
6889    if blocked:
6890        result, summary = blocked
6891        result = {**result, "success": True, "recoverable": True}
6892        evidence_checkpoint = None
6893        if result.get("error") == "artifact required before more research":
6894            evidence_step = next(
6895                (step for step in recent_steps if step.get("id") == result.get("previous_step")),
6896                None,
6897            )
6898            if evidence_step:
6899                evidence_checkpoint = _auto_persist_evidence(
6900                    db=db,
6901                    artifacts=artifacts,
6902                    job_id=job_id,
6903                    run_id=run_id,
6904                    step_id=step_id,
6905                    blocked_tool=call.name,
6906                    evidence_step=evidence_step,
6907                )
6908                result["auto_checkpoint"] = evidence_checkpoint
6909                summary = f"blocked {call.name}; auto-saved evidence checkpoint {evidence_checkpoint['artifact_id']}"
6910        anti_bot_source = result.get("anti_bot_source") if isinstance(result.get("anti_bot_source"), dict) else None
6911        if anti_bot_source:
6912            result["auto_source_record"] = _auto_record_blocked_source(
6913                db=db,
6914                job_id=job_id,
6915                context=anti_bot_source,
6916                blocked_tool=call.name,
6917            )
6918        known_bad_source = result.get("known_bad_source") if isinstance(result.get("known_bad_source"), dict) else None
6919        if known_bad_source:
6920            db.append_agent_update(
6921                job_id,
6922                f"Source ledger blocked retry of {known_bad_source.get('source')}; choosing a different route next.",
6923                category="blocked",
6924                metadata={"source": known_bad_source, "blocked_tool": call.name},
6925            )
6926        if result.get("error") == "task queue saturated":
6927            step = _step_by_id(db, job_id, step_id)
6928            task_queue = result.get("task_queue") if isinstance(result.get("task_queue"), dict) else {}
6929            _record_task_backlog_pressure(
6930                db=db,
6931                job_id=job_id,
6932                step_no=(step or {}).get("step_no"),
6933                task_queue=task_queue,
6934                source="blocked_record_tasks",
6935            )
6936        _auto_record_grounding_block_lesson(db=db, job_id=job_id, result=result)
6937        db.finish_step(
6938            step_id,
6939            status="blocked",
6940            summary=summary,
6941            output_data=result,
6942            error=None,
6943        )
6944        return (
6945            StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status="blocked", result=result),
6946            True,
6947            summary,
6948            None,
6949        )
6950
6951    ctx = ToolContext(
6952        config=config,
6953        db=db,
6954        artifacts=artifacts,
6955        job_id=job_id,
6956        run_id=run_id,
6957        step_id=step_id,
6958        task_id=job_id,
6959    )
6960    try:
6961        raw_result = registry.handle(call.name, args, ctx)
6962        result = _parse_tool_result(raw_result)
6963        ok = bool(result.get("success", True)) and not result.get("error")
6964        status = "completed" if ok else "blocked" if result.get("recoverable") is True else "failed"
6965        if ok:
6966            _auto_record_tool_source_quality(db=db, job_id=job_id, tool_name=call.name, result=result)
6967        elif call.name == "shell_exec":
6968            _auto_record_failed_shell_sources(db=db, job_id=job_id, args=args, result=result)
6969        summary = _summarize_tool_result(call.name, args, result, ok=ok)
6970        db.finish_step(step_id, status=status, summary=summary, output_data=result, error=result.get("error"))
6971        if call.name == "shell_exec":
6972            _maybe_resolve_file_validation_obligation(
6973                db=db,
6974                job_id=job_id,
6975                tool_name=call.name,
6976                args=args,
6977                result=result,
6978                ok=ok,
6979            )
6980            finished_step = _step_by_id(db, job_id, step_id)
6981            _maybe_create_measurement_obligation(
6982                db=db,
6983                job_id=job_id,
6984                step=finished_step,
6985                tool_name=call.name,
6986                args=args,
6987                result=result,
6988            )
6989        if ok:
6990            finished_step = _step_by_id(db, job_id, step_id)
6991            _mark_evidence_checkpoint_read(
6992                db=db,
6993                job_id=job_id,
6994                tool_name=call.name,
6995                args=args,
6996                step=finished_step,
6997            )
6998            _resolve_evidence_checkpoint(
6999                db=db,
7000                job_id=job_id,
7001                tool_name=call.name,
7002                step=finished_step,
7003            )
7004            if call.name == "write_file":
7005                _maybe_create_file_validation_obligation(
7006                    db=db,
7007                    job_id=job_id,
7008                    step=finished_step,
7009                    args=args,
7010                    result=result,
7011                )
7012            elif call.name in {"record_lesson", "record_tasks", "record_experiment", "record_milestone_validation"}:
7013                _maybe_resolve_file_validation_obligation(
7014                    db=db,
7015                    job_id=job_id,
7016                    tool_name=call.name,
7017                    args=args,
7018                    result=result,
7019                    ok=ok,
7020                )
7021            _auto_checkpoint_update(
7022                db=db,
7023                job_id=job_id,
7024                step_no=(finished_step or db.list_steps(job_id=job_id)[-1])["step_no"],
7025                tool_name=call.name,
7026                args=args,
7027                result=result,
7028            )
7029            if call.name == "write_artifact":
7030                reconciled_tasks = _auto_reconcile_artifact_tasks(
7031                    db=db,
7032                    job_id=job_id,
7033                    args=args,
7034                    result=result,
7035                )
7036                if reconciled_tasks:
7037                    result["auto_reconciled_tasks"] = [
7038                        {"title": task.get("title"), "status": task.get("status")}
7039                        for task in reconciled_tasks[:8]
7040                    ]
7041                revision_task = _auto_open_revision_task_for_deliverable(
7042                    db=db,
7043                    job_id=job_id,
7044                    args=args,
7045                    result=result,
7046                )
7047                if revision_task:
7048                    result["auto_revision_task"] = {
7049                        "title": revision_task.get("title"),
7050                        "status": revision_task.get("status"),
7051                        "key": revision_task.get("key"),
7052                    }
7053        return (
7054            StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status=status, result=result),
7055            status != "completed",
7056            summary,
7057            result.get("error") if status == "failed" else None,
7058        )
7059    except Exception as exc:
7060        result = _error_result(exc)
7061        db.finish_step(step_id, status="failed", summary=f"{call.name} raised", output_data=result, error=str(exc))
7062        return (
7063            StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status="failed", result=result),
7064            True,
7065            str(exc),
7066            str(exc),
7067        )
7068
7069
7070def _is_continuable_recoverable_input_block(execution: StepExecution) -> bool:
7071    result = execution.result if isinstance(execution.result, dict) else {}
7072    error = str(result.get("error") or "")
7073    if execution.status != "blocked" or result.get("recoverable") is not True:
7074        return False
7075    if error in {"missing required tool arguments", "placeholder tool arguments"}:
7076        return bool(result.get("missing_arguments") or result.get("placeholder_arguments"))
7077    if error == "malformed shell command" and execution.tool_name == "shell_exec":
7078        return True
7079    if error == "duplicate tool call blocked" and execution.tool_name == "read_artifact":
7080        return True
7081    return error.startswith("artifact not found:") or error == "no active operator context to acknowledge"
7082
7083
7084def _ordered_tool_calls_for_execution(
7085    tool_calls: list[ToolCall],
7086    *,
7087    job: dict[str, Any],
7088    recent_steps: list[dict[str, Any]],
7089) -> list[ToolCall]:
7090    """Run guard-unblocking calls before branch work when a model batches both."""
7091
7092    if len(tool_calls) < 2:
7093        return tool_calls
7094    if _browser_runtime_unavailable_context(recent_steps) and any(not _is_browser_tool(call.name) for call in tool_calls):
7095        tool_calls = [call for call in tool_calls if not _is_browser_tool(call.name)]
7096        if len(tool_calls) < 2:
7097            return tool_calls
7098    checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
7099    saturated_record_tasks = any(
7100        call.name == "record_tasks" and _task_queue_saturation_context(job, call.arguments)
7101        for call in tool_calls
7102    )
7103    if not checkpoint and not saturated_record_tasks:
7104        return tool_calls
7105
7106    artifact_id = str(checkpoint.get("artifact_id") or "") if checkpoint else ""
7107    artifact_title = str(checkpoint.get("title") or "") if checkpoint else ""
7108    checkpoint_read = bool(checkpoint and checkpoint.get("checkpoint_read"))
7109    accounting_tools = {
7110        "record_experiment",
7111        "record_findings",
7112        "record_lesson",
7113        "record_memory_graph",
7114        "record_milestone_validation",
7115        "record_roadmap",
7116        "record_source",
7117        "report_update",
7118        "write_artifact",
7119    }
7120
7121    def priority(call: ToolCall) -> int:
7122        if checkpoint:
7123            if call.name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS:
7124                return 0
7125            if (
7126                not checkpoint_read
7127                and call.name == "read_artifact"
7128                and _read_artifact_call_matches_checkpoint(
7129                    call.arguments,
7130                    artifact_id=artifact_id,
7131                    artifact_title=artifact_title,
7132                )
7133            ):
7134                return 0
7135        if saturated_record_tasks:
7136            if call.name == "record_tasks" and _task_queue_saturation_context(job, call.arguments):
7137                return 2
7138            if call.name in accounting_tools:
7139                return 0
7140        return 1
7141
7142    ordered = sorted(enumerate(tool_calls), key=lambda item: (priority(item[1]), item[0]))
7143    return [call for _, call in ordered]
7144
7145
7146def _registry_tools(registry: ToolRegistry, config: AppConfig) -> list[dict[str, Any]]:
7147    try:
7148        return registry.openai_tools(config=config)
7149    except TypeError:
7150        return registry.openai_tools()
7151
7152
7153def _registry_tools_for_step(
7154    registry: ToolRegistry,
7155    config: AppConfig,
7156    recent_steps: list[dict[str, Any]],
7157    *,
7158    job: dict[str, Any] | None = None,
7159) -> list[dict[str, Any]]:
7160    tools = _registry_tools(registry, config)
7161    resolution_tools = _active_obligation_tool_names(job, recent_steps) if job else None
7162    if resolution_tools:
7163        tools = [tool for tool in tools if _openai_tool_name(tool) in resolution_tools]
7164    suppressed_tools = _suppressed_tool_names(job, recent_steps)
7165    if resolution_tools:
7166        suppressed_tools -= resolution_tools
7167    if suppressed_tools:
7168        tools = [tool for tool in tools if _openai_tool_name(tool) not in suppressed_tools]
7169    if not _browser_runtime_unavailable_context(recent_steps):
7170        return tools
7171    return [tool for tool in tools if not _is_browser_tool(_openai_tool_name(tool))]
7172
7173
7174def _active_obligation_tool_names(job: dict[str, Any] | None, recent_steps: list[dict[str, Any]]) -> set[str] | None:
7175    if not job:
7176        return None
7177    allowed: set[str] = set()
7178    checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
7179    if checkpoint:
7180        if not checkpoint.get("checkpoint_read"):
7181            allowed.add("read_artifact")
7182        allowed.update(EVIDENCE_CHECKPOINT_PROMPT_TOOLS)
7183    if _pending_measurement_obligation(job):
7184        allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7185    if _experiment_next_action_failure_context(job, recent_steps):
7186        allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7187    measured_progress = _measured_progress_guard_context(job, recent_steps)
7188    if measured_progress:
7189        allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7190        if _as_int(measured_progress.get("shell_actions_since_last_experiment")) < MEASURABLE_ACTION_BUDGET_STEPS:
7191            allowed.add("shell_exec")
7192    if _pending_file_validation_obligation(job):
7193        allowed.update(FILE_VALIDATION_RESOLUTION_TOOLS)
7194    return allowed or None
7195
7196
7197def _suppressed_tool_names(job: dict[str, Any] | None, recent_steps: list[dict[str, Any]]) -> set[str]:
7198    if not job:
7199        return set()
7200    suppressed: set[str] = set()
7201    if _repeated_task_queue_saturation_context(recent_steps):
7202        suppressed.add("record_tasks")
7203    elif (
7204        (backlog := _task_backlog_pressure_context(job))
7205        and _as_int(backlog.get("total")) > TASK_QUEUE_TOTAL_SOFT_LIMIT
7206        and not _pending_measurement_obligation(job)
7207        and not _pending_file_validation_obligation(job)
7208        and not _auto_checkpoint_accounting_context(job, recent_steps)
7209        and not _task_queue_exhausted(job)
7210    ):
7211        suppressed.add("record_tasks")
7212    if not _has_acknowledgeable_operator_context(job):
7213        suppressed.add("acknowledge_operator_context")
7214    return suppressed
7215
7216
7217def _has_acknowledgeable_operator_context(job: dict[str, Any]) -> bool:
7218    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
7219    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
7220    for entry in messages:
7221        if not isinstance(entry, dict):
7222            continue
7223        mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
7224        if mode not in {"steer", "follow_up"}:
7225            continue
7226        if not entry.get("claimed_at"):
7227            continue
7228        if entry.get("acknowledged_at") or entry.get("superseded_at"):
7229            continue
7230        return True
7231    return False
7232
7233
7234def _openai_tool_name(tool: dict[str, Any]) -> str:
7235    function = tool.get("function") if isinstance(tool, dict) else None
7236    if isinstance(function, dict):
7237        return str(function.get("name") or "")
7238    return str(tool.get("name") or "") if isinstance(tool, dict) else ""
7239
7240
7241def _call_next_action_with_timeout(
7242    llm: StepLLM,
7243    *,
7244    messages: list[dict[str, Any]],
7245    tools: list[dict[str, Any]],
7246    timeout_seconds: float,
7247) -> LLMResponse:
7248    timeout = max(0.0, float(timeout_seconds or 0.0))
7249    if timeout <= 0 or threading.current_thread() is not threading.main_thread():
7250        return llm.next_action(messages=messages, tools=tools)
7251
7252    previous_handler = signal.getsignal(signal.SIGALRM)
7253    previous_timer = signal.getitimer(signal.ITIMER_REAL)
7254    started = time.monotonic()
7255
7256    def _raise_timeout(_signum: int, _frame: Any) -> None:
7257        raise TimeoutError(f"model call timed out after {timeout:g}s")
7258
7259    signal.signal(signal.SIGALRM, _raise_timeout)
7260    signal.setitimer(signal.ITIMER_REAL, timeout)
7261    try:
7262        return llm.next_action(messages=messages, tools=tools)
7263    finally:
7264        signal.setitimer(signal.ITIMER_REAL, 0)
7265        signal.signal(signal.SIGALRM, previous_handler)
7266        if previous_timer[0] > 0:
7267            elapsed = max(0.0, time.monotonic() - started)
7268            remaining = max(0.001, previous_timer[0] - elapsed)
7269            signal.setitimer(signal.ITIMER_REAL, remaining, previous_timer[1])
7270
7271
7272def _tool_repair_messages(messages: list[dict[str, Any]], response: LLMResponse) -> list[dict[str, Any]]:
7273    content = str(response.content or "").strip()
7274    if len(content) > 2000:
7275        content = content[:2000] + " ..."
7276    repair_prompt = (
7277        "Your previous worker response did not call a tool. This worker must advance by calling exactly "
7278        "one available tool now. Do not answer in prose. Choose one bounded action that fits the current "
7279        "state, such as executing existing work, recording a measurement, updating an existing task, "
7280        "saving an evidence-backed output, recording a lesson/finding/source, or deferring only for a real wait."
7281    )
7282    repaired = list(messages)
7283    if content:
7284        repaired.append({"role": "assistant", "content": content})
7285    repaired.append({"role": "user", "content": repair_prompt})
7286    return repaired
7287
7288
7289def run_one_step(
7290    job_id: str,
7291    *,
7292    config: AppConfig | None = None,
7293    db: AgentDB | None = None,
7294    llm: StepLLM | None = None,
7295    registry: ToolRegistry = DEFAULT_REGISTRY,
7296) -> StepExecution:
7297    config = config or load_config()
7298    config.ensure_dirs()
7299    owns_db = db is None
7300    db = db or AgentDB(config.runtime.state_db_path)
7301    try:
7302        artifacts = ArtifactStore(config.runtime.home, db=db)
7303        job = db.get_job(job_id)
7304        if _acknowledge_non_prompt_operator_context(db, job_id):
7305            job = db.get_job(job_id)
7306        if _clear_invalid_measurement_obligation(db, job_id):
7307            job = db.get_job(job_id)
7308        if _clear_stale_task_backlog_pressure(db, job_id, job):
7309            job = db.get_job(job_id)
7310        run_id = db.start_run(job_id, model=config.model.model)
7311        _emit_loop_start(db, job_id, run_id)
7312        recent_steps = db.list_steps(job_id=job_id)
7313        if _refresh_contradicted_negative_claims(db, job_id, job, recent_steps):
7314            job = db.get_job(job_id)
7315        model_config = config.model
7316        if _should_reflect(job, recent_steps):
7317            return _run_reflection_step(job, recent_steps, db=db, job_id=job_id, run_id=run_id)
7318        guard_recovery = _repeated_guard_block_context(recent_steps)
7319        if guard_recovery:
7320            return _run_guard_recovery_step(guard_recovery, db=db, job_id=job_id, run_id=run_id)
7321        active_operator_messages = _claim_operator_queue(db, job_id)
7322        if active_operator_messages:
7323            job = db.get_job(job_id)
7324        usage = db.job_token_usage(job_id)
7325        usage_budget_limit = _usage_budget_limit_context(config, usage)
7326        if usage_budget_limit:
7327            return _run_usage_budget_limit_step(
7328                usage_budget_limit,
7329                db=db,
7330                job_id=job_id,
7331                run_id=run_id,
7332            )
7333        messages = build_messages(
7334            job,
7335            recent_steps,
7336            memory_entries=db.list_memory(job_id),
7337            program_text=_load_program_text(config, job_id),
7338            timeline_events=db.list_timeline_events(job_id, limit=30),
7339            active_operator_messages=active_operator_messages,
7340            include_unclaimed_operator_messages=True,
7341            token_usage=usage,
7342        )
7343        llm = llm or OpenAIChatLLM(model_config)
7344        llm_started = time.monotonic()
7345        try:
7346            response: LLMResponse = _call_next_action_with_timeout(
7347                llm,
7348                messages=messages,
7349                tools=_registry_tools_for_step(registry, config, recent_steps, job=job),
7350                timeout_seconds=model_config.request_timeout_seconds,
7351            )
7352        except Exception as exc:
7353            llm_duration_seconds = round(max(0.0, time.monotonic() - llm_started), 3)
7354            step_id = db.add_step(
7355                job_id=job_id,
7356                run_id=run_id,
7357                kind="llm",
7358                status="failed",
7359                summary=f"model call failed: {type(exc).__name__}",
7360                input_data={
7361                    "model": config.model.model,
7362                    "duration_seconds": llm_duration_seconds,
7363                    "request_timeout_seconds": model_config.request_timeout_seconds,
7364                },
7365            )
7366            result = _error_result(exc)
7367            result["duration_seconds"] = llm_duration_seconds
7368            hard_failure_note = _hard_llm_provider_failure_note(exc)
7369            if hard_failure_note:
7370                result["provider_action_required"] = True
7371                result["pause_reason"] = "llm_provider_blocked"
7372                db.update_job_status(
7373                    job_id,
7374                    "paused",
7375                    metadata_patch={
7376                        "last_note": hard_failure_note,
7377                        "provider_blocked_at": datetime.now(timezone.utc).isoformat(),
7378                    },
7379                )
7380                db.append_agent_update(
7381                    job_id,
7382                    hard_failure_note,
7383                    category="error",
7384                    metadata={"reason": "llm_provider_blocked", "error_type": type(exc).__name__},
7385                )
7386            db.finish_step(step_id, status="failed", output_data=result, error=str(exc))
7387            db.finish_run(run_id, "failed", error=str(exc))
7388            _emit_loop_end(db, job_id, run_id, status="failed", step_id=step_id, detail=str(exc))
7389            refresh_memory_index(db, job_id)
7390            return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=None, status="failed", result=result)
7391
7392        llm_duration_seconds = round(max(0.0, time.monotonic() - llm_started), 3)
7393        job = db.get_job(job_id)
7394        usage = _emit_assistant_message_event(
7395            db,
7396            job_id,
7397            run_id,
7398            response,
7399            messages=messages,
7400            context_length=config.model.context_length,
7401            duration_seconds=llm_duration_seconds,
7402        )
7403        emit_context_pressure_update(db, job_id, usage)
7404        emit_usage_pressure_update(db, job_id, db.job_token_usage(job_id))
7405
7406        tool_repair_attempted = False
7407        tool_repair_error: dict[str, Any] | None = None
7408        original_content = response.content
7409        if not response.tool_calls and getattr(llm, "tool_repair", False):
7410            tool_repair_attempted = True
7411            repair_messages = _tool_repair_messages(messages, response)
7412            repair_started = time.monotonic()
7413            try:
7414                repair_response = _call_next_action_with_timeout(
7415                    llm,
7416                    messages=repair_messages,
7417                    tools=_registry_tools_for_step(registry, config, recent_steps, job=job),
7418                    timeout_seconds=model_config.request_timeout_seconds,
7419                )
7420            except Exception as exc:
7421                tool_repair_error = _error_result(exc)
7422                tool_repair_error["duration_seconds"] = round(max(0.0, time.monotonic() - repair_started), 3)
7423            else:
7424                repair_duration_seconds = round(max(0.0, time.monotonic() - repair_started), 3)
7425                repair_usage = _emit_assistant_message_event(
7426                    db,
7427                    job_id,
7428                    run_id,
7429                    repair_response,
7430                    messages=repair_messages,
7431                    context_length=config.model.context_length,
7432                    duration_seconds=repair_duration_seconds,
7433                )
7434                emit_context_pressure_update(db, job_id, repair_usage)
7435                emit_usage_pressure_update(db, job_id, db.job_token_usage(job_id))
7436                if repair_response.tool_calls:
7437                    response = repair_response
7438
7439        if response.tool_calls:
7440            executions: list[StepExecution] = []
7441            details: list[str] = []
7442            run_error: str | None = None
7443            ordered_tool_calls = _ordered_tool_calls_for_execution(
7444                response.tool_calls,
7445                job=db.get_job(job_id),
7446                recent_steps=db.list_steps(job_id=job_id),
7447            )
7448            for index, call in enumerate(ordered_tool_calls):
7449                current_job = db.get_job(job_id)
7450                current_recent_steps = db.list_steps(job_id=job_id)
7451                execution, stop_batch, detail, error = _execute_tool_call(
7452                    call,
7453                    job=current_job,
7454                    recent_steps=current_recent_steps,
7455                    config=config,
7456                    db=db,
7457                    artifacts=artifacts,
7458                    registry=registry,
7459                    job_id=job_id,
7460                    run_id=run_id,
7461                )
7462                executions.append(execution)
7463                details.append(detail)
7464                if error:
7465                    run_error = error
7466                if stop_batch:
7467                    if index < len(ordered_tool_calls) - 1 and _is_continuable_recoverable_input_block(execution):
7468                        details.append(f"continued after recoverable {call.name} input block")
7469                        continue
7470                    break
7471
7472            final_execution = executions[-1]
7473            run_status = "failed" if any(item.status == "failed" for item in executions) else "completed"
7474            db.finish_run(run_id, run_status, error=run_error)
7475            detail = f"executed {len(executions)}/{len(response.tool_calls)} tool calls"
7476            if details:
7477                detail = f"{detail}; last: {details[-1]}"
7478            _emit_loop_end(
7479                db,
7480                job_id,
7481                run_id,
7482                status=final_execution.status,
7483                step_id=final_execution.step_id,
7484                tool_name=final_execution.tool_name,
7485                detail=detail,
7486            )
7487            refresh_memory_index(db, job_id)
7488            return final_execution
7489
7490        step_id = db.add_step(
7491            job_id=job_id,
7492            run_id=run_id,
7493            kind="assistant",
7494            status="blocked",
7495            summary="worker returned content without a tool call",
7496            input_data={},
7497        )
7498        result = {
7499            "success": False,
7500            "recoverable": True,
7501            "error": "worker tool call required",
7502            "content": response.content,
7503            "original_content": original_content,
7504            "tool_repair_attempted": tool_repair_attempted,
7505            "tool_repair_error": tool_repair_error,
7506            "next": (
7507                "Worker turns must use a tool call. Continue by choosing one bounded action such as "
7508                "record_tasks, report_update, write_artifact, write_file, shell_exec, record_findings, "
7509                "record_source, record_experiment, record_lesson, or defer_job."
7510            ),
7511        }
7512        db.append_agent_update(
7513            job_id,
7514            "Worker returned a message without a tool call; continuing with a tool-action recovery constraint.",
7515            category="blocked",
7516            metadata={"reason": "worker_tool_call_required", "step_id": step_id},
7517        )
7518        db.finish_step(
7519            step_id,
7520            status="blocked",
7521            summary="blocked assistant-only worker turn; tool call required",
7522            output_data=result,
7523            error="worker tool call required",
7524        )
7525        db.finish_run(run_id, "blocked", error="worker tool call required")
7526        _emit_loop_end(
7527            db,
7528            job_id,
7529            run_id,
7530            status="blocked",
7531            step_id=step_id,
7532            detail="worker tool call required",
7533        )
7534        refresh_memory_index(db, job_id)
7535        return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=None, status="blocked", result=result)
7536    finally:
7537        if owns_db:
7538            db.close()
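
The multi-tool-call branch near the end of the listing above derives a run status and a one-line detail string from the list of executions. A minimal sketch of that summary logic follows. The `Execution` dataclass and the `summarize_run` helper are hypothetical names introduced for illustration; the real code works on `StepExecution` objects and database calls that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """Hypothetical stand-in for the StepExecution objects in the listing."""
    tool_name: str
    status: str


def summarize_run(executions, requested_calls, details):
    """Return (run_status, detail) the way the listing builds them."""
    # Any failed tool call marks the whole run failed; otherwise it is completed.
    run_status = "failed" if any(item.status == "failed" for item in executions) else "completed"
    # Report how many of the requested tool calls were actually executed.
    detail = f"executed {len(executions)}/{requested_calls} tool calls"
    if details:
        # Append only the most recent per-call detail.
        detail = f"{detail}; last: {details[-1]}"
    return run_status, detail


executions = [Execution("web_search", "completed"), Execution("shell_exec", "failed")]
status, detail = summarize_run(executions, 3, ["search ok", "shell timed out"])
print(status)
print(detail)
```

In the listing the loop-end event uses the status of the last execution, while the run status reflects a failure anywhere in the batch, so the two can differ.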
nipux_cli/worker_policy.py 489 lines
   1"""Static worker prompt and loop policy constants."""
   2
   3from __future__ import annotations
   4
   5import re
   6
   7
   8REFLECTION_INTERVAL_STEPS = 12
   9WORKER_PROTOCOL_VERSION = "2026-05-01-contract-first-v1"
  10
  11SYSTEM_PROMPT = """You are a long-running local work agent.
  12
  13Operate as a bounded worker, not a chat assistant. Choose one useful next step,
  14call one of the available tools, and persist important evidence as artifacts.
  15Do not claim the whole job is complete. A strong result is only a checkpoint:
  16save it, report it, add the next tasks, and continue improving or broadening.
  17
  18Use a contract-first durable cycle. Read the objective, operator context,
  19roadmap, active task, and recent evidence; choose the next action that satisfies
  20the active output contract; produce or measure concrete evidence; update the
  21right ledger; report the checkpoint; then open or continue the next branch.
  22Research is only one possible contract. For action, experiment, monitor, report,
  23or file-deliverable work, prefer execution, measurement, validation, or writing
  24over more background collection. Keep moving forever until the operator pauses
  25or cancels the job.
  26The worker must not mark jobs completed or failed; use record_tasks,
  27record_lesson, report_update, and artifacts to describe checkpoints, blockers,
  28and next branches while the job stays runnable.
  29
  30Avoid loops. Do not repeat the same search query or the same exact tool call.
  31If search results already exist, move forward by extracting source pages,
  32opening a useful site in the browser, or saving a finding/evidence artifact.
  33If a page has already been extracted and contains useful evidence, save that
  34evidence with write_artifact before doing more searching or browsing.
  35Only click or type browser refs from the most recent successful browser snapshot
  36or navigation result. If a click/type fails with an unknown ref, use the fresh
  37recovery snapshot or call browser_snapshot before retrying.
  38If a source shows Cloudflare, login, paywall, or anti-bot verification, keep it
  39visible in the trace. Do not bypass protections. Continue with normal visible
  40browser actions when possible, persist what you have, or use alternate public
  41sources if stuck.
  42If a tool returns a list of actionable candidates such as files, packages,
  43configurations, commands, sources, venues, records, branches, or options, do not
  44keep re-listing the same candidate set with small formatting changes. Persist the
  45candidate list once, choose the best candidate for the active contract, and move
  46to execution, measurement, validation, or an explicit blocked decision.
  47If a probe discovers a local/runtime candidate that might satisfy the active
  48contract, promote that candidate immediately: record the fact, validate it with
  49the smallest relevant action, and measure it before continuing external
  50acquisition or research retries. Do not let an available local candidate fall
  51out of context while pursuing lower-confidence external sources.
  52If repeated external acquisition attempts fail with authentication, permission,
  53quota, missing credentials, or unavailable resources, mark that branch blocked
  54or low-yield and pivot to another source, local candidate, monitor/defer branch,
  55or operator-visible credential requirement instead of retrying small variants.
  56If a browser page says blocked, CAPTCHA, bot check, login required, paywall, or
  57anti-bot, treat that page as a failed/low-yield source for the current job. Do
  58not write an artifact that claims usable evidence exists unless the evidence is
  59actually visible. Record the source outcome or pivot to another public source.
  60Use report_update for short operator-readable progress notes when you need to
  61say what you found or why you are blocked. Do not use report_update instead of
  62write_artifact when you have durable evidence, findings, or report content to save.
  63Use write_file when the objective requires a concrete file deliverable, source
  64file, document, config, dataset, or other workspace output. If a measured
  65experiment says the next action is to write, merge, update, compile, or insert
  66content, prefer write_file or an execution command that actually changes the
  67target over more read-only inspection.
  68Use defer_job when the next useful step is to wait for an external process,
  69scheduled check, long-running command, or monitor interval. Do not
  70simulate waiting with repeated searches, reports, or shell probes.
  71Use record_lesson when you learn something that should change future behavior:
  72bad source patterns, task-specific success criteria, repeated mistakes, operator
  73preferences, or a better strategy. Keep lessons short and reusable.
  74Use record_memory_graph when work produces reusable connected knowledge: an
  75episode worth remembering, a stable fact, a strategy, a reusable skill, an open
  76question, a decision, or a constraint. Link nodes to their evidence and to each
  77other. Treat this as the job's durable brain: recent events are fast episodic
  78memory, while stable graph nodes are consolidated knowledge that should guide
  79future branches without replaying raw history.
  80Durable memory is not automatically true forever. If newer evidence contradicts
  81an older memory-graph fact, constraint, strategy, or finding, update the older
  82record as deprecated/resolved/stale and link the newer evidence before acting.
  83Prefer fresh measured or directly observed evidence over stale summaries.
  84Use record_source when a source is high-yield, low-yield, blocked, repetitive,
  85or otherwise useful to score for future behavior.
  86Use record_findings after finding durable candidates, facts, opportunities,
  87experiments, files, bugs, sources, or other reusable outputs. Dedupe against the
  88finding ledger and artifacts before saving.
  89Use record_tasks to maintain a durable queue of objective-neutral branches:
  90open work, active branch, blocked branch, completed branch, and skipped branch.
  91Each task should include an output_contract (research, artifact, experiment,
  92action, monitor, decision, or report), acceptance criteria, evidence needed,
  93and stall behavior so progress is judged by evidence, not activity volume.
  94Before marking a task or milestone done, audit the claim against the objective:
  95list the requirement, the concrete artifact/file/finding/measurement/validation
  96that proves it, and any remaining gap. If the audit is incomplete, keep the
  97branch active or blocked and create the next smallest follow-up task.
  98When the job is broad or starts looping, split it into tasks and move to the
  99highest-priority open task rather than staying on one source or tactic forever.
 100Use record_roadmap for broad, multi-phase, or ambiguous objectives that need a
 101higher-level orchestration plan. A roadmap is generic: milestones group related
 102features or work units; each milestone has acceptance criteria, evidence needed,
 103and a validation contract. Use record_milestone_validation at milestone checkpoints
 104to pass, fail, block, or create follow-up tasks from validation gaps. Keep the
 105roadmap compact and update it from durable evidence, not from activity count.
 106Use record_experiment for measurable trials, benchmarks, comparisons,
 107optimization attempts, or hypothesis tests. A saved note, source, or artifact is
 108not enough progress for a measurable objective: record the exact configuration,
 109metric, result, whether higher or lower is better, and the next experiment. Keep
 110improving against the best observed result instead of declaring victory after a
 111single measurement.
 112Use shell_exec for command-line work, repository inspection, diagnostics,
 113benchmarks, repeatable experiments, and other command execution that the
 114objective requires. Prefer small read-only probes before changing anything, use
 115explicit timeouts, and save important command output with write_artifact before
 116continuing. Do not run destructive or high-risk cyber commands.
 117For long downloads, builds, training runs, crawls, benchmarks, or other slow
 118actions, treat the action as a monitored branch: choose a timeout that can make
 119meaningful progress, use resumable commands when available, record partial
 120progress as an experiment/task/checkpoint, and use defer_job when the next useful
 121step is to wait and check again. Do not repeatedly restart the same long action
 122with short timeouts without recording what changed and how the next attempt will
 123resume or differ.
 124If a probe shows a partial output, incomplete file, running process, cache entry,
 125checkpoint, or other unfinished artifact from an action branch, stop re-listing
 126the same state. Either resume/continue the action with a resumable command,
 127record a monitor/defer step for the still-running work, or record the branch as
 128blocked with the concrete missing condition and next action.
 129read_artifact only reads saved Nipux artifacts. Use shell_exec for repository,
 130workspace, project, or filesystem files that are not saved artifacts.
 131write_file writes workspace/local files directly; write_artifact writes Nipux's
 132separate saved-output store. Use the right one for the operator-facing result.
 133Operator messages are durable context from the human operator. Messages marked
 134steer are active constraints until acknowledged or superseded. Messages marked
 135follow_up are lower-priority queued work; keep them in the task queue and act on
 136them after the current active branch has a durable checkpoint. Messages marked
 137note are durable preferences. Use acknowledge_operator_context only after you
 138have incorporated or intentionally superseded a steer/follow_up message.
 139"""
 140
 141INFORMATION_GATHERING_TOOLS = {
 142    "browser_back",
 143    "browser_click",
 144    "browser_console",
 145    "browser_navigate",
 146    "browser_press",
 147    "browser_scroll",
 148    "browser_snapshot",
 149    "browser_type",
 150    "web_extract",
 151    "web_search",
 152}
 153
 154ARTIFACT_REVIEW_TOOLS = {"read_artifact", "search_artifacts"}
 155MEMORY_REVIEW_TOOLS = {"search_memory_graph"}
 156BRANCH_WORK_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {"shell_exec"}
 157LEDGER_PROGRESS_TOOLS = {
 158    "guard_recovery",
 159    "record_findings",
 160    "record_memory_graph",
 161    "record_source",
 162    "record_tasks",
 163    "record_roadmap",
 164    "record_milestone_validation",
 165    "record_experiment",
 166    "record_lesson",
 167}
 168MEASUREMENT_RESOLUTION_TOOLS = {"record_experiment", "record_lesson", "record_tasks", "record_milestone_validation"}
 169FILE_VALIDATION_RESOLUTION_TOOLS = {
 170    "shell_exec",
 171    "record_experiment",
 172    "record_lesson",
 173    "record_tasks",
 174    "record_milestone_validation",
 175    "record_memory_graph",
 176    "acknowledge_operator_context",
 177}
 178ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS = LEDGER_PROGRESS_TOOLS | {"acknowledge_operator_context"}
 179ARTIFACT_ACCOUNTING_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
 180    "shell_exec",
 181    "write_file",
 182    "write_artifact",
 183    "read_artifact",
 184    "search_artifacts",
 185    "report_update",
 186}
 187MEASUREMENT_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
 188    "shell_exec",
 189    "write_file",
 190    "write_artifact",
 191    "record_findings",
 192    "record_memory_graph",
 193    "record_source",
 194    "acknowledge_operator_context",
 195    "report_update",
 196}
 197FILE_VALIDATION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
 198    "write_file",
 199    "write_artifact",
 200    "record_findings",
 201    "record_source",
 202    "report_update",
 203}
 204MILESTONE_VALIDATION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
 205    "shell_exec",
 206    "write_file",
 207    "write_artifact",
 208    "record_findings",
 209    "record_source",
 210    "record_experiment",
 211    "report_update",
 212}
 213ROADMAP_STALENESS_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
 214    "shell_exec",
 215    "write_file",
 216    "write_artifact",
 217    "record_findings",
 218    "record_source",
 219    "record_tasks",
 220    "record_experiment",
 221    "report_update",
 222}
 223CHURN_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {"shell_exec"}
 224MEMORY_CONSOLIDATION_BLOCKED_TOOLS = CHURN_TOOLS | {"write_artifact", "write_file", "report_update"}
 225ACTIVITY_STAGNATION_BLOCKED_TOOLS = CHURN_TOOLS | {"write_artifact", "write_file", "report_update"}
 226DELIVERABLE_PROGRESS_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | {"report_update"}
 227RESEARCH_BALANCE_BLOCKED_TOOLS = ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
 228    "shell_exec",
 229    "write_file",
 230    "write_artifact",
 231    "record_lesson",
 232    "report_update",
 233}
 234SOURCE_YIELD_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
 235    "shell_exec",
 236    "write_file",
 237    "write_artifact",
 238    "report_update",
 239}
 240MEASURABLE_RESEARCH_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
 241    "write_artifact",
 242    "record_findings",
 243    "record_source",
 244    "report_update",
 245}
 246MEASURABLE_PROGRESS_PATTERN = re.compile(
 247    r"(?i)\b("
 248    r"benchmark|baseline|compare|comparison|experiment|improv(?:e|ing|ement)|increase|latency|"
 249    r"measure|metric|minimi[sz]e|maximi[sz]e|optim(?:ize|ise|ization|isation)|performance|"
 250    r"rate|reduce|score|speed|throughput|tune|tuning"
 251    r")\b"
 252)
 253RECOVERABLE_GUARD_ERRORS = {
 254    "artifact search loop blocked",
 255    "browser runtime unavailable",
 256    "deliverable checkpoint required",
 257    "durable progress required",
 258    "evidence checkpoint accounting required",
 259    "evidence grounding required",
 260    "duplicate tool call blocked",
 261    "experiment stagnation decision required",
 262    "experiment next action pending",
 263    "known bad source blocked",
 264    "lesson consolidation required",
 265    "memory graph consolidation required",
 266    "measurement obligation pending",
 267    "measured progress required",
 268    "progress accounting required",
 269    "progress ledger update required",
 270    "action decision required",
 271    "similar artifact search blocked",
 272    "similar search query blocked",
 273    "source yield accounting required",
 274    "task execution required",
 275    "task branch required before more work",
 276    "task queue saturated",
 277    "usage pressure recovery required",
 278    "worker tool call required",
 279}
 280MEASURABLE_RESEARCH_BUDGET_STEPS = 18
 281MEASURABLE_ACTION_BUDGET_STEPS = 4
 282DELIVERABLE_RESEARCH_BUDGET_STEPS = 18
 283ACTIVITY_STAGNATION_CHECKPOINTS = 3
 284TASK_QUEUE_SATURATION_OPEN_TASKS = 40
 285TASK_QUEUE_TOTAL_SOFT_LIMIT = 80
 286TASK_PLANNING_STAGNATION_CHECKPOINTS = 2
 287PROGRAM_PROMPT_CHARS = 2000
 288MEMORY_ENTRY_PROMPT_CHARS = 700
 289MEMORY_PROMPT_CHARS = 1800
 290RECENT_STATE_STEPS = 5
 291RECENT_STATE_PROMPT_CHARS = 3000
 292TIMELINE_PROMPT_EVENTS = 8
 293SECTION_ITEM_CHARS = 420
 294MAX_WORKER_PROMPT_CHARS = 18_000
 295TIMELINE_PROMPT_EVENT_TYPES = {
 296    "agent_message",
 297    "artifact",
 298    "error",
 299    "experiment",
 300    "finding",
 301    "lesson",
 302    "memory_node",
 303    "milestone_validation",
 304    "reflection",
 305    "roadmap",
 306    "source",
 307    "task",
 308}
 309TIMELINE_PROMPT_AGENT_TITLES = {"blocked", "error", "plan", "progress", "report", "update"}
 310TIMELINE_PROMPT_TOOL_STATUSES = {"blocked", "failed"}
 311PROMPT_SECTION_BUDGETS = {
 312    "Workspace": 520,
 313    "Operator context": 2_200,
 314    "Current execution focus": 1_600,
 315    "Pending measurement obligation": 1_100,
 316    "Candidate file discovery": 2_000,
 317    "Measured progress guard": 1_000,
 318    "Experiment stagnation guard": 1_000,
 319    "Source yield guard": 1_000,
 320    "Deliverable progress guard": 1_000,
 321    "Progress accounting guard": 900,
 322    "Activity stagnation": 900,
 323    "Task planning guard": 900,
 324    "Memory consolidation guard": 900,
 325    "Lesson consolidation guard": 900,
 326    "Durable progress yield": 900,
 327    "Program": 1_400,
 328    "Usage pressure": 900,
 329    "Lessons learned": 1_100,
 330    "Memory graph": 1_800,
 331    "Roadmap": 2_000,
 332    "Task queue": 2_400,
 333    "Durable outcomes": 1_200,
 334    "Ledgers": 2_400,
 335    "Experiment ledger": 2_200,
 336    "Reflections": 900,
 337    "Compact memory": 1_100,
 338    "Recent visible timeline": 1_000,
 339    "Recent state": 1_800,
 340    "Next-action constraint": 1_100,
 341}
 342
 343QUERY_STOPWORDS = {
 344    "and",
 345    "are",
 346    "does",
 347    "for",
 348    "from",
 349    "how",
 350    "offer",
 351    "product",
 352    "service",
 353    "services",
 354    "the",
 355    "they",
 356    "what",
 357    "with",
 358}
  359TEXT_TOKEN_STOPWORDS = {
 360    "and",
 361    "are",
 362    "for",
 363    "from",
 364    "into",
 365    "that",
 366    "the",
 367    "this",
 368    "with",
 369}
 370
 371EVIDENCE_ARTIFACT_TERMS = {
 372    "checkpoint",
 373    "evidence",
 374    "extract",
 375    "extracted",
 376    "notes",
 377    "source",
 378    "sources",
 379}
 380DELIVERABLE_ARTIFACT_TERMS = {
 381    "article",
 382    "checklist",
 383    "compiled",
 384    "deck",
 385    "deliverable",
 386    "doc",
 387    "document",
 388    "draft",
 389    "final",
 390    "guide",
 391    "manual",
 392    "memo",
 393    "outline",
 394    "paper",
 395    "presentation",
 396    "report",
 397    "revision",
 398    "section",
 399    "spec",
 400    "template",
 401    "updated",
 402    "writeup",
 403}
 404TASK_DELIVERABLE_ACTION_TERMS = {
 405    "add",
 406    "append",
 407    "compile",
 408    "create",
 409    "edit",
 410    "insert",
 411    "polish",
 412    "rewrite",
 413    "update",
 414    "write",
 415}
 416EXPERIMENT_DELIVERY_ACTION_TERMS = {
 417    "append",
 418    "apply",
 419    "build",
 420    "compile",
 421    "create",
 422    "edit",
 423    "finish",
 424    "fix",
 425    "generate",
 426    "implement",
 427    "insert",
 428    "merge",
 429    "patch",
 430    "produce",
 431    "publish",
 432    "replace",
 433    "rewrite",
 434    "save",
 435    "update",
 436    "write",
 437}
 438EXPERIMENT_INFORMATION_ACTION_TERMS = {
 439    "audit",
 440    "collect",
 441    "extract",
 442    "find",
 443    "gather",
 444    "inspect",
 445    "read",
 446    "research",
 447    "review",
 448    "search",
 449    "source",
 450    "survey",
 451}
 452EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {"report_update"}
 453READ_ONLY_SHELL_COMMAND_PATTERN = re.compile(
 454    r"(?is)^\s*(?:"
 455    r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
 456    r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
 457    r")"
 458)
 459
 460BROWSER_REF_IGNORE_NAMES = {
 461    "about us",
 462    "back to top",
 463    "careers",
 464    "click here",
 465    "clutch rating",
 466    "organization name",
 467    "contact",
 468    "contact us",
 469    "go",
 470    "headquarters",
 471    "help",
 472    "latest links",
 473    "learn more",
 474    "privacy",
 475    "read more",
 476    "readmore",
 477    "services",
 478    "submit",
 479    "top hits",
 480}
 481
 482ANTI_BOT_ACK_TERMS = (
 483    "anti-bot",
 484    "blocked",
 485    "bot check",
 486    "captcha",
 487    "not usable",
 488    "verification",
 489)
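
The two compiled patterns in worker_policy.py are easier to reason about with concrete inputs. The sketch below copies both patterns from the listing above and probes them with sample strings. The helper names `looks_measurable` and `looks_read_only` are hypothetical, and the choice of `search` versus `match` is an assumption, since the calling code is not shown in this section.

```python
import re

# Both patterns are copied from the worker_policy.py listing above.
MEASURABLE_PROGRESS_PATTERN = re.compile(
    r"(?i)\b("
    r"benchmark|baseline|compare|comparison|experiment|improv(?:e|ing|ement)|increase|latency|"
    r"measure|metric|minimi[sz]e|maximi[sz]e|optim(?:ize|ise|ization|isation)|performance|"
    r"rate|reduce|score|speed|throughput|tune|tuning"
    r")\b"
)
READ_ONLY_SHELL_COMMAND_PATTERN = re.compile(
    r"(?is)^\s*(?:"
    r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
    r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
    r")"
)


def looks_measurable(objective: str) -> bool:
    """Hypothetical helper: True when the text contains a measurement term."""
    return MEASURABLE_PROGRESS_PATTERN.search(objective) is not None


def looks_read_only(command: str) -> bool:
    """Hypothetical helper: True when the command starts with a listed read-only prefix."""
    return READ_ONLY_SHELL_COMMAND_PATTERN.match(command) is not None


for objective in ("Reduce p95 latency of the API", "Write a summary of the meeting notes"):
    print(objective, "->", looks_measurable(objective))

for command in ("git status --short", "sed -n '1,5p' notes.txt", "git commit -m wip"):
    print(command, "->", looks_read_only(command))
```

Two properties follow from the patterns themselves. The trailing `\b` means inflected forms such as "benchmarks" do not match the measurement pattern. The shell pattern inspects only the start of the command, so `echo hi > out.txt` still matches even though the redirect writes a file.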
nipux_cli/worker_prompt_context.py 921 lines
   1"""Prompt-context renderers for the Nipux worker loop."""
   2
   3from __future__ import annotations
   4
   5import re
   6from typing import Any
   7
   8from nipux_cli.memory_graph import memory_graph_for_prompt
   9from nipux_cli.metric_format import format_metric_value
  10from nipux_cli.operator_context import active_prompt_operator_entries, operator_entry_is_prompt_relevant
  11from nipux_cli.tui_outcomes import hourly_outcome_summary, model_update_event_parts, outcome_counts
  12from nipux_cli.worker_policy import (
  13    MAX_WORKER_PROMPT_CHARS,
  14    PROMPT_SECTION_BUDGETS,
  15    SECTION_ITEM_CHARS,
  16    TIMELINE_PROMPT_AGENT_TITLES,
  17    TIMELINE_PROMPT_EVENT_TYPES,
  18    TIMELINE_PROMPT_EVENTS,
  19    TIMELINE_PROMPT_TOOL_STATUSES,
  20)
  21from nipux_cli.worker_prompt_format import clip_text as _clip_text
  22
  23
  24NEGATIVE_EXISTENCE_MARKERS = (
  25    "cannot access",
  26    "does not exist",
  27    "failed to find",
  28    "has not been",
  29    "missing",
  30    "no ",
  31    "no such",
  32    "none",
  33    "not available",
  34    "not detected",
  35    "not downloaded",
  36    "not found",
  37    "not installed",
  38    "unavailable",
  39    "was not",
  40    "without",
  41)
  42NEGATIVE_EVIDENCE_LINE_MARKERS = (
  43    "cannot access",
  44    "denied",
  45    "does not exist",
  46    "error",
  47    "failed",
  48    "failure",
  49    "has not been",
  50    "missing",
  51    "no such",
  52    "not available",
  53    "not detected",
  54    "not downloaded",
  55    "not found",
  56    "not installed",
  57    "permission",
  58    "timeout",
  59    "unavailable",
  60    "was not",
  61)
  62
  63
  64def _memory_entries_for_prompt(memory_entries: list[dict[str, Any]], *, limit: int = 2) -> list[dict[str, Any]]:
  65    entries = [entry for entry in memory_entries if isinstance(entry, dict)]
  66    rolling = next((entry for entry in entries if entry.get("key") == "rolling_state"), None)
  67    selected: list[dict[str, Any]] = []
  68    if rolling:
  69        selected.append(rolling)
  70    for entry in entries:
  71        if len(selected) >= limit:
  72            break
  73        if rolling is not None and entry is rolling:
  74            continue
  75        selected.append(entry)
  76    return selected[:limit]
  77
  78
  79def _render_worker_prompt(job: dict[str, Any], *, sections: list[tuple[str, str]]) -> str:
  80    objective = _clip_text(job.get("objective") or "", 2_000)
  81    header = f"Job: {job['title']}\nKind: {job['kind']}\nObjective:\n{objective}"
  82    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
  83    stale_tokens = _stale_claim_tokens_for_prompt(
  84        metadata,
  85        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
  86    )
  87    instruction = (
  88        "Take exactly one bounded next action. If recent state contains search results, do not search the same query again. "
  89        "If recent state contains extracted page evidence, write an artifact before doing more search or browsing."
  90    )
  91    scale = 1.0
  92    while True:
  93        parts = [header]
  94        for title, body in sections:
  95            base_budget = PROMPT_SECTION_BUDGETS.get(title, SECTION_ITEM_CHARS)
  96            budget = max(260, int(base_budget * scale))
  97            safe_body = _redact_stale_tokens_for_prompt(body, stale_tokens)
  98            parts.append(f"{title}:\n{_clip_text(safe_body, budget)}")
  99        parts.append(instruction)
 100        content = "\n\n".join(parts)
 101        if len(content) <= MAX_WORKER_PROMPT_CHARS or scale <= 0.45:
 102            break
 103        scale -= 0.12
 104    if len(content) <= MAX_WORKER_PROMPT_CHARS:
 105        return content
 106    suffix_sections: list[str] = []
 107    for title, body in sections:
 108        if title == "Operator context":
 109            suffix_sections.append(f"Operator context:\n{_clip_text(_redact_stale_tokens_for_prompt(body, stale_tokens), 900)}")
 110        elif title == "Next-action constraint":
 111            suffix_sections.append(f"Next-action constraint:\n{_clip_text(_redact_stale_tokens_for_prompt(body, stale_tokens), 900)}")
 112    suffix = "\n\n".join(suffix_sections + [instruction])
 113    marker = "\n\n...[middle context clipped; operator context and next action repeated below]...\n"
 114    head_budget = max(0, MAX_WORKER_PROMPT_CHARS - len(suffix) - len(marker))
 115    return _clip_text(content, head_budget) + marker + suffix
 116
 117
 118def _redact_stale_tokens_for_prompt(text: str, stale_tokens: list[str]) -> str:
 119    redacted = str(text or "")
 120    for token in sorted((str(token) for token in stale_tokens if str(token).strip()), key=len, reverse=True):
 121        pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
 122        redacted = re.sub(
 123            pattern,
 124            lambda match: match.group(0) if _match_inside_path_like_span(match.string, match.start(), match.end()) else "[unsupported-stale-claim]",
 125            redacted,
 126            flags=re.IGNORECASE,
 127        )
 128    return redacted
 129
 130
 131def _match_inside_path_like_span(text: str, start: int, end: int) -> bool:
 132    left = start
 133    while left > 0 and not text[left - 1].isspace() and text[left - 1] not in "'\"`<>|;":
 134        left -= 1
 135    right = end
 136    while right < len(text) and not text[right].isspace() and text[right] not in "'\"`<>|;":
 137        right += 1
 138    span = text[left:right]
 139    return "/" in span or "\\" in span or span.startswith(("~", "."))
 140
 141
 142def _operator_messages_for_prompt(
 143    job: dict[str, Any],
 144    *,
 145    active_messages: list[dict[str, Any]] | None = None,
 146    include_unclaimed: bool = True,
 147) -> str:
 148    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 149    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
 150    lines = []
 151    active_messages = active_prompt_operator_entries(active_messages or [])
 152    active_ids = {active.get("event_id") for active in active_messages if isinstance(active, dict)}
 153    if active_messages:
 154        lines.append("Newly delivered operator messages for this turn:")
 155    for entry in active_messages:
 156        line = _operator_message_line(entry)
 157        if line:
 158            lines.append(line)
 159    active_context = [
 160        entry
 161        for entry in messages
 162        if isinstance(entry, dict)
 163        and operator_entry_is_prompt_relevant(entry)
 164        and _operator_message_visible_in_prompt(entry, include_unclaimed=include_unclaimed)
 165        and entry.get("event_id") not in active_ids
 166    ]
 167    if active_context:
 168        if lines:
 169            lines.append("Still-active durable operator context:")
 170        for entry in active_context[-6:]:
 171            line = _operator_message_line(entry)
 172            if line:
 173                lines.append(line)
 174    return "\n".join(lines) if lines else "No active operator context."
 175
 176
 177def _operator_message_line(entry: dict[str, Any]) -> str:
 178    if not isinstance(entry, dict):
 179        return ""
 180    at = str(entry.get("at") or "")
 181    source = str(entry.get("source") or "operator")
 182    mode = str(entry.get("mode") or "steer")
 183    event_id = str(entry.get("event_id") or "")
 184    message = " ".join(str(entry.get("message") or "").split())
 185    if message:
 186        states = []
 187        if entry.get("claimed_at"):
 188            states.append("delivered")
 189        if entry.get("acknowledged_at"):
 190            states.append("acknowledged")
 191        if entry.get("superseded_at"):
 192            states.append("superseded")
 193        state_text = f" ({', '.join(states)})" if states else ""
 194        id_text = f" id={event_id}" if event_id else ""
 195        return f"-{id_text} {at} {source} {mode}{state_text}: {_clip_text(message, 420)}"
 196    return ""
 197
 198
 199def _lessons_for_prompt(job: dict[str, Any]) -> str:
 200    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 201    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
 202    if not lessons:
 203        return "No durable lessons yet."
 204    reference_text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind"))
 205    positive_lines = _positive_durable_lines_for_lesson_conflicts(metadata)
 206    stale_lesson_ids = _stale_negative_record_ids(metadata, kind="lesson")
 207    lines = []
 208    for entry in lessons[-5:]:
 209        if not isinstance(entry, dict):
 210            continue
 211        category = str(entry.get("category") or "memory")
 212        raw_lesson = str(entry.get("lesson") or "")
 213        record_id = _record_id_for_staleness(entry)
 214        conflicting_tokens = _negative_lesson_conflict_tokens(raw_lesson, positive_lines)
 215        if conflicting_tokens:
 216            lesson = (
 217                "Potentially stale negative lesson suppressed for "
 218                + ", ".join(conflicting_tokens[:6])
 219                + ". Re-verify against fresh evidence before using this claim."
 220            )
 221        elif record_id in stale_lesson_ids:
 222            lesson = "Potentially stale negative lesson suppressed after fresh contradictory evidence. Re-verify before using this claim."
 223        else:
 224            lesson = _lesson_prompt_text(raw_lesson, reference_text=reference_text)
 225        if lesson:
 226            lines.append(f"- {category}: {_clip_text(lesson, SECTION_ITEM_CHARS)}")
 227    return "\n".join(lines) if lines else "No durable lessons yet."
 228
 229
 230def _lesson_prompt_text(lesson: str, *, reference_text: str = "") -> str:
 231    lesson = " ".join(str(lesson or "").split())
 232    if "unsupported concrete tokens" not in lesson.lower():
 233        return lesson
 234    reference_norm = _normalize_claim_text(reference_text)
 235    stale_tokens = []
 236    seen = set()
 237    for token in _unsupported_tokens_from_lesson(lesson):
 238        cleaned = " ".join(str(token or "").split())
 239        if not cleaned:
 240            continue
 241        key = cleaned.lower()
 242        if reference_norm and _normalize_claim_text(cleaned) in reference_norm:
 243            continue
 244        if key in seen or not _stale_token_is_distinctive(cleaned):
 245            continue
 246        seen.add(key)
 247        stale_tokens.append(cleaned)
 248    if stale_tokens:
 249        return (
 250            "Evidence grounding rejected unsupported durable-record claims: "
 251            + ", ".join(stale_tokens[:8])
 252            + ". Re-verify them from fresh evidence before using them."
 253        )
 254    return "Evidence grounding rejected an unsupported durable record. Re-verify from fresh evidence before using it."
 255
 256
 257def _positive_durable_lines_for_lesson_conflicts(metadata: dict[str, Any]) -> list[str]:
 258    lines: list[str] = []
 259    for key in ("finding_ledger", "experiment_ledger", "source_ledger"):
 260        records = metadata.get(key) if isinstance(metadata.get(key), list) else []
 261        for record in records[-30:]:
 262            if isinstance(record, dict):
 263                text = _dict_scalar_text(record)
 264                if text:
 265                    lines.append(text)
 266    graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
 267    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
 268    for node in nodes[-30:]:
 269        if isinstance(node, dict):
 270            text = _dict_scalar_text(node)
 271            if text:
 272                lines.append(text)
 273    return lines
 274
 275
 276def _negative_lesson_conflict_tokens(lesson: str, positive_lines: list[str]) -> list[str]:
 277    lesson = " ".join(str(lesson or "").split())
 278    lesson_lower = lesson.lower()
 279    if not positive_lines or not any(marker in lesson_lower for marker in NEGATIVE_EXISTENCE_MARKERS):
 280        return []
 281    tokens = _distinctive_claim_tokens(lesson)
 282    conflicts: list[str] = []
 283    seen: set[str] = set()
 284    for token in tokens:
 285        key = token.lower()
 286        if key in seen:
 287            continue
 288        seen.add(key)
 289        if not _token_near_negative_marker(lesson, token):
 290            continue
 291        if _positive_line_contains_token(positive_lines, token):
 292            conflicts.append(token)
 293    return conflicts
 294
 295
 296def _dict_scalar_text(record: dict[str, Any]) -> str:
 297    parts: list[str] = []
 298    for key, value in record.items():
 299        if key in {"created_at", "updated_at", "event_id", "id"}:
 300            continue
 301        if isinstance(value, (str, int, float, bool)):
 302            parts.append(str(value))
 303        elif isinstance(value, dict):
 304            parts.append(_dict_scalar_text(value))
 305    return " ".join(part for part in parts if part)
 306
 307
 308def _distinctive_claim_tokens(text: str) -> list[str]:
 309    tokens: list[str] = []
 310    for raw in re.findall(r"\b[A-Za-z][A-Za-z0-9_.+-]{1,}\b", text):
 311        token = raw.strip("._+-")
 312        if token and _stale_token_is_distinctive(token):
 313            tokens.append(token)
 314    return tokens
 315
 316
 317def _token_near_negative_marker(text: str, token: str, *, window: int = 140) -> bool:
 318    text_lower = text.lower()
 319    token_lower = token.lower()
 320    start = 0
 321    while True:
 322        index = text_lower.find(token_lower, start)
 323        if index < 0:
 324            return False
 325        nearby = text_lower[max(0, index - window): index + len(token_lower) + window]
 326        if any(marker in nearby for marker in NEGATIVE_EXISTENCE_MARKERS):
 327            return True
 328        start = index + len(token_lower)
 329
 330
 331def _positive_line_contains_token(lines: list[str], token: str) -> bool:
 332    token_lower = token.lower()
 333    for line in lines:
 334        line_lower = line.lower()
 335        if token_lower not in line_lower:
 336            continue
 337        if any(marker in line_lower for marker in NEGATIVE_EVIDENCE_LINE_MARKERS):
 338            continue
 339        return True
 340    return False
 341
 342
 343def _memory_graph_for_prompt(job: dict[str, Any]) -> str:
 344    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 345    stale_tokens = _stale_claim_tokens_for_prompt(
 346        metadata,
 347        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
 348    )
 349    return memory_graph_for_prompt(job, limit=10, stale_tokens=stale_tokens)
 350
 351
 352def _roadmap_for_prompt(job: dict[str, Any]) -> str:
 353    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 354    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
 355    if not roadmap:
 356        return (
 357            "No roadmap yet. If the objective is broad, multi-phase, or needs validation checkpoints, "
 358            "use record_roadmap to define compact milestones, features, acceptance criteria, and validation evidence."
 359        )
 360    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
 361    status_counts: dict[str, int] = {}
 362    validation_counts: dict[str, int] = {}
 363    for milestone in milestones:
 364        if not isinstance(milestone, dict):
 365            continue
 366        status = str(milestone.get("status") or "planned")
 367        validation_status = str(milestone.get("validation_status") or "not_started")
 368        status_counts[status] = status_counts.get(status, 0) + 1
 369        validation_counts[validation_status] = validation_counts.get(validation_status, 0) + 1
 370    lines = [
 371        _clip_text(
 372            f"{roadmap.get('status') or 'planned'}: {roadmap.get('title') or 'Roadmap'}"
 373            + (f" | current={roadmap.get('current_milestone')}" if roadmap.get("current_milestone") else ""),
 374            520,
 375        ),
 376        "Milestone counts: " + (", ".join(f"{key}={value}" for key, value in sorted(status_counts.items())) or "none"),
 377        "Validation counts: " + (", ".join(f"{key}={value}" for key, value in sorted(validation_counts.items())) or "none"),
 378    ]
 379    if roadmap.get("scope"):
 380        lines.append("Scope: " + _clip_text(str(roadmap.get("scope") or ""), 420))
 381    if roadmap.get("validation_contract"):
 382        lines.append("Validation contract: " + _clip_text(str(roadmap.get("validation_contract") or ""), 520))
 383    selected = [
 384        milestone for milestone in milestones
 385        if isinstance(milestone, dict)
 386        and str(milestone.get("status") or "planned") in {"active", "validating", "planned", "blocked"}
 387    ][:6]
 388    if not selected:
 389        selected = [milestone for milestone in milestones if isinstance(milestone, dict)][-4:]
 390    for milestone in selected[:6]:
 391        features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
 392        open_features = sum(1 for feature in features if isinstance(feature, dict) and str(feature.get("status") or "planned") in {"planned", "active"})
 393        detail = " | ".join(
 394            bit
 395            for bit in [
 396                str(milestone.get("status") or "planned"),
 397                f"validation={milestone.get('validation_status') or 'not_started'}",
 398                f"p={milestone.get('priority') or 0}",
 399                str(milestone.get("title") or "milestone"),
 400                f"features={len(features)}/{open_features} open" if features else "",
 401            ]
 402            if bit
 403        )
 404        if milestone.get("acceptance_criteria"):
 405            detail += f" | accept={milestone.get('acceptance_criteria')}"
 406        if milestone.get("evidence_needed"):
 407            detail += f" | evidence={milestone.get('evidence_needed')}"
 408        if milestone.get("validation_result"):
 409            detail += f" | validation_result={milestone.get('validation_result')}"
 410        if milestone.get("next_action"):
 411            detail += f" | next={milestone.get('next_action')}"
 412        lines.append("- " + _clip_text(detail, 620))
 413    return "\n".join(lines)
 414
 415
 416def _tasks_for_prompt(job: dict[str, Any]) -> str:
 417    tasks = _metadata_list(job, "task_queue")
 418    if not tasks:
 419        return (
 420            "No durable task queue yet. If the objective is broad, use record_tasks "
 421            "to create a few concrete open branches with output contracts and acceptance criteria before continuing."
 422        )
 423    status_rank = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
 424    ranked = sorted(
 425        tasks,
 426        key=lambda task: (status_rank.get(str(task.get("status") or "open"), 9), -_as_int(task.get("priority"))),
 427    )
 428    counts: dict[str, int] = {}
 429    for task in tasks:
 430        status = str(task.get("status") or "open")
 431        counts[status] = counts.get(status, 0) + 1
 432    lines = ["Task counts: " + ", ".join(f"{key}={value}" for key, value in sorted(counts.items()))]
 433    selected = [task for task in ranked if str(task.get("status") or "open") in {"active", "open"}][:6]
 434    if len(selected) < 6:
 435        selected.extend([task for task in ranked if str(task.get("status") or "open") == "blocked"][: 6 - len(selected)])
 436    if len(selected) < 6:
 437        selected.extend([task for task in ranked if task not in selected][: 6 - len(selected)])
 438    for task in selected[:6]:
 439        output_contract = _task_output_contract(task)
 440        bits = [
 441            str(task.get("status") or "open"),
 442            f"priority={task.get('priority') or 0}",
 443            str(task.get("title") or "untitled"),
 444        ]
 445        if output_contract:
 446            bits.append(f"contract={output_contract}")
 447        detail = " | ".join(bit for bit in bits if bit)
 448        if task.get("goal"):
 449            detail += f" | goal={task.get('goal')}"
 450        if task.get("acceptance_criteria"):
 451            detail += f" | accept={task.get('acceptance_criteria')}"
 452        if task.get("evidence_needed"):
 453            detail += f" | evidence={task.get('evidence_needed')}"
 454        if task.get("stall_behavior"):
 455            detail += f" | stall={task.get('stall_behavior')}"
 456        if task.get("source_hint"):
 457            detail += f" | source_hint={task.get('source_hint')}"
 458        if task.get("result"):
 459            detail += f" | result={task.get('result')}"
 460        lines.append("- " + _clip_text(detail, 520))
 461    return "\n".join(lines)
 462
 463
 464def _task_output_contract(task: dict[str, Any]) -> str:
 465    metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
 466    return str(task.get("output_contract") or task.get("contract") or metadata.get("output_contract") or metadata.get("contract") or "")
 467
 468
 469def _timeline_for_prompt(events: list[dict[str, Any]]) -> str:
 470    if not events:
 471        return "No timeline events yet."
 472    selected: list[tuple[str, str, str]] = []
 473    counts: dict[str, int] = {}
 474    for event in events:
 475        rendered = _timeline_event_for_prompt(event)
 476        if not rendered:
 477            continue
 478        at, event_type, detail = rendered
 479        counts[event_type] = counts.get(event_type, 0) + 1
 480        selected.append((at, event_type, detail))
 481    if not selected:
 482        return "No high-signal timeline events yet. Recent state covers raw tool activity."
 483    summary = ", ".join(f"{key}={value}" for key, value in sorted(counts.items()))
 484    lines = [f"High-signal timeline counts: {summary}"]
 485    for at, event_type, detail in selected[-TIMELINE_PROMPT_EVENTS:]:
 486        prefix = f"- {at} {event_type}: " if at else f"- {event_type}: "
 487        lines.append(prefix + _clip_text(detail, SECTION_ITEM_CHARS))
 488    return "\n".join(lines)
 489
 490
 491def _outcomes_for_prompt(events: list[dict[str, Any]]) -> str:
 492    """Summarize durable outputs so the worker sees progress, not just activity."""
 493
 494    if not events:
 495        return "No durable outcomes visible in recent timeline."
 496    counts = outcome_counts(events, include_research=False, include_failures=True)
 497    summary = hourly_outcome_summary(counts)
 498    lines = [f"Outcome counts: {summary or 'none'}."]
 499    seen: set[str] = set()
 500    for event in reversed(events):
 501        parsed = model_update_event_parts(event, width=240, compact=False)
 502        if not parsed:
 503            continue
 504        label, text, _clock = parsed
 505        if label in {"DONE", "PLAN", "UPDATE"}:
 506            continue
 507        key = f"{label}:{text}"
 508        if key in seen:
 509            continue
 510        seen.add(key)
 511        lines.append(f"- {label.lower()}: {_clip_text(text, 360)}")
 512        if len(lines) >= 8:
 513            break
 514    if len(lines) == 1:
 515        lines.append("No durable output/finding/measurement records are visible; prioritize creating or accounting for one.")
 516    return "\n".join(lines)
 517
 518
 519def _ledgers_for_prompt(job: dict[str, Any]) -> str:
 520    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 521    findings = _metadata_list(job, "finding_ledger")
 522    sources = _metadata_list(job, "source_ledger")
 523    stale_tokens = _stale_claim_tokens_for_prompt(
 524        metadata,
 525        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
 526    )
 527    stale_record_ids = _stale_negative_record_ids(metadata, kind="finding")
 528    stale_findings = [
 529        finding
 530        for finding in findings
 531        if _record_contains_stale_token(finding, stale_tokens)
 532        or _record_id_for_staleness(finding) in stale_record_ids
 533    ]
 534    active_findings = [finding for finding in findings if finding not in stale_findings]
 535    lines = [
 536        f"Finding ledger: {len(findings)} unique candidates.",
 537        f"Source ledger: {len(sources)} scored sources.",
 538    ]
 539    if stale_tokens:
 540        lines.append(
 541            "Unsupported/stale claim tokens to avoid until re-verified: "
 542            + ", ".join(stale_tokens[:12])
 543        )
 544    if active_findings:
 545        lines.append("Recent findings:")
 546        for finding in active_findings[-5:]:
 547            bits = [
 548                str(finding.get("name") or "unknown"),
 549                str(finding.get("category") or "").strip(),
 550                str(finding.get("location") or "").strip(),
 551                f"score={finding.get('score')}" if finding.get("score") is not None else "",
 552            ]
 553            lines.append("- " + _clip_text(" | ".join(bit for bit in bits if bit), 360))
 554    if stale_findings:
 555        lines.append(
 556            f"Suppressed {len(stale_findings)} stale finding(s) matching unsupported tokens; "
 557            "do not use them as facts until observed again."
 558        )
 559    stale_negative_records = _stale_negative_records_for_prompt(metadata, kind="finding")
 560    if stale_negative_records:
 561        lines.append("Contradicted negative findings suppressed:")
 562        for record in stale_negative_records[-4:]:
 563            lines.append(
 564                "- "
 565                + _clip_text(
 566                    f"{record.get('title') or 'finding'} token={record.get('token') or ''} evidence={record.get('evidence') or ''}",
 567                    360,
 568                )
 569            )
 570    if sources:
 571        usable_sources = [
 572            source
 573            for source in sources
 574            if _as_float(source.get("usefulness_score")) >= 0.2
 575            or _as_int(source.get("yield_count")) > 0
 576        ]
 577        low_quality_sources = [
 578            source
 579            for source in sources
 580            if _as_float(source.get("usefulness_score")) < 0.2
 581            and _as_int(source.get("yield_count")) <= 0
 582            and (_as_int(source.get("fail_count")) > 0 or source.get("warnings"))
 583        ]
 584        ranked = sorted(
 585            usable_sources,
 586            key=lambda item: (_as_float(item.get("usefulness_score")), _as_int(item.get("yield_count"))),
 587            reverse=True,
 588        )
 589        if ranked:
 590            lines.append("High-yield/current sources:")
 591            for source in ranked[:4]:
 592                lines.append(
 593                    "- "
 594                    + _clip_text(
 595                        f"{source.get('source')} type={source.get('source_type') or 'unknown'} "
 596                        f"score={source.get('usefulness_score')} findings={source.get('yield_count') or 0} "
 597                        f"fails={source.get('fail_count') or 0} outcome={source.get('last_outcome') or ''}",
 598                        420,
 599                    )
 600                )
 601        if low_quality_sources:
 602            lines.append("Low-yield/blocked source patterns to avoid:")
 603            for source in low_quality_sources[-3:]:
 604                lines.append(
 605                    "- "
 606                    + _clip_text(
 607                        f"{source.get('source')} type={source.get('source_type') or 'unknown'} "
 608                        f"score={source.get('usefulness_score')} fails={source.get('fail_count') or 0} "
 609                        f"warnings={', '.join(source.get('warnings') or [])} outcome={source.get('last_outcome') or ''}",
 610                        420,
 611                    )
 612                )
 613    return "\n".join(lines)
 614
 615
 616def _stale_claim_tokens_for_prompt(metadata: dict[str, Any], *, reference_text: str = "") -> list[str]:
 617    raw_tokens = metadata.get("unsupported_claim_tokens")
 618    tokens: list[str] = []
 619    seen: set[str] = set()
 620    candidates: list[Any] = []
 621    reference_norm = _normalize_claim_text(reference_text)
 622    if isinstance(raw_tokens, list):
 623        candidates.extend(raw_tokens)
 624    stale_records = metadata.get("stale_negative_records")
 625    if isinstance(stale_records, list):
 626        for record in stale_records:
 627            if isinstance(record, dict):
 628                candidates.append(record.get("token"))
 629    lessons = metadata.get("lessons")
 630    if isinstance(lessons, list):
 631        for lesson in lessons[-25:]:
 632            if not isinstance(lesson, dict):
 633                continue
 634            candidates.extend(_unsupported_tokens_from_lesson(str(lesson.get("lesson") or "")))
 635    for raw in candidates:
 636        token = " ".join(str(raw or "").split())
 637        if not token:
 638            continue
 639        if reference_norm and _normalize_claim_text(token) in reference_norm:
 640            continue
 641        if not _stale_token_is_distinctive(token):
 642            continue
 643        key = token.lower()
 644        if key in seen:
 645            continue
 646        seen.add(key)
 647        tokens.append(token)
 648    return tokens[-100:]
 649
 650
 651def _unsupported_tokens_from_lesson(lesson: str) -> list[str]:
 652    marker = "unsupported concrete tokens"
 653    if marker not in lesson.lower():
 654        return []
 655    match = re.search(r"unsupported concrete tokens for .*?:\s*(.*?)(?:\.\s+Treat matching|$)", lesson, flags=re.IGNORECASE)
 656    if not match:
 657        match = re.search(r"unsupported concrete tokens for .*?:\s*(.*?)(?:\.|$)", lesson, flags=re.IGNORECASE)
 658    if not match:
 659        return []
 660    return [part.strip() for part in match.group(1).split(",") if part.strip()]
 661
 662
 663def _stale_token_is_distinctive(token: str) -> bool:
 664    lowered = token.lower()
 665    if lowered.startswith(".") and re.match(r"^\.[a-z0-9][a-z0-9_-]{1,12}$", lowered):
 666        return lowered not in {".app", ".co", ".com", ".dev", ".edu", ".gov", ".io", ".net", ".org", ".www"}
 667    if lowered in {
 668        "api",
 669        "ascii",
 670        "blocked",
 671        "broken",
 672        "candidate",
 673        "candidates",
 674        "cdn",
 675        "cli",
 676        "critical",
 677        "cpu",
 678        "cuda",
 679        "discovered",
 680        "discovery",
 681        "ggml",
 682        "gguf",
 683        "gpu",
 684        "hf_token",
 685        "html",
 686        "http",
 687        "https",
 688        "incomplete",
 689        "json",
 690        "lfs",
 691        "not_found",
 692        "oid",
 693        "onnx",
 694        "planned",
 695        "python",
 696        "python3",
 697        "ram",
 698        "rest",
 699        "severe",
 700        "sha",
 701        "sha256",
 702        "search",
 703        "usable",
 704        "unvalidated",
 705        "valid",
 706        "validity",
 707        "validate",
 708        "validated",
 709        "vram",
 710        "xml",
 711        "xet",
 712        "yaml",
 713        "yml",
 714    }:
 715        return False
 716    if lowered.startswith((
 717        "art_",
 718        "step_",
 719        "step-",
 720        "shell_",
 721        "shell-",
 722        "web_",
 723        "web-",
 724        "episode-",
 725        "fact-",
 726        "source-",
 727        "quality-",
 728        "constraint-",
 729        "baseline-",
 730        "question-",
 731        "verified_",
 732        "verified-",
 733        "timeout_",
 734        "timeout-",
 735    )):
 736        return False
 737    if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
 738        return False
 739    if lowered.startswith(("python-", "pip", "pip3")):
 740        return False
 741    if len(token) < 4:
 742        return False
 743    return (any(ch.isalpha() for ch in token) and any(ch.isdigit() for ch in token)) or (token.isupper() and len(token) >= 4)
 744
 745
 746def _normalize_claim_text(text: str) -> str:
 747    return re.sub(r"[^a-z0-9]+", "", str(text or "").lower())
 748
 749
 750def _record_contains_stale_token(record: dict[str, Any], stale_tokens: list[str]) -> bool:
 751    if not stale_tokens:
 752        return False
 753    text = " ".join(
 754        str(record.get(key) or "")
 755        for key in ("name", "category", "location", "contact", "reason", "source_url", "url")
 756    )
 757    metadata = record.get("metadata") if isinstance(record.get("metadata"), dict) else {}
 758    text += " " + " ".join(str(value) for value in metadata.values() if isinstance(value, (str, int, float)))
 759    for token in stale_tokens:
 760        pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
 761        if re.search(pattern, text, flags=re.IGNORECASE):
 762            return True
 763    return False
 764
 765
 766def _stale_negative_records_for_prompt(metadata: dict[str, Any], *, kind: str) -> list[dict[str, Any]]:
 767    records = metadata.get("stale_negative_records")
 768    if not isinstance(records, list):
 769        return []
 770    return [
 771        record
 772        for record in records
 773        if isinstance(record, dict) and str(record.get("kind") or "") == kind
 774    ]
 775
 776
 777def _stale_negative_record_ids(metadata: dict[str, Any], *, kind: str) -> set[str]:
 778    ids: set[str] = set()
 779    for record in _stale_negative_records_for_prompt(metadata, kind=kind):
 780        record_id = str(record.get("record_id") or "").strip()
 781        if record_id:
 782            ids.add(record_id)
 783    return ids
 784
 785
 786def _record_id_for_staleness(record: dict[str, Any]) -> str:
 787    for key in ("key", "event_id", "id"):
 788        value = str(record.get(key) or "").strip()
 789        if value:
 790            return value
 791    return _normalize_claim_text(str(record.get("name") or record.get("title") or ""))[:120]
 792
 793
 794def _experiments_for_prompt(job: dict[str, Any]) -> str:
 795    experiments = _metadata_list(job, "experiment_ledger")
 796    if not experiments:
 797        return (
 798            "No experiments tracked yet. If this objective involves improving, "
 799            "comparing, benchmarking, reducing, increasing, or otherwise measuring something, "
 800            "turn candidate ideas into record_experiment entries with exact config, metric, result, and next action."
 801        )
 802    measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
 803    best = [
 804        experiment
 805        for experiment in measured
 806        if bool(experiment.get("best_observed"))
 807    ]
 808    status_counts: dict[str, int] = {}
 809    for experiment in experiments:
 810        status = str(experiment.get("status") or "planned")
 811        status_counts[status] = status_counts.get(status, 0) + 1
 812    lines = [
 813        f"Experiment counts: {', '.join(f'{key}={value}' for key, value in sorted(status_counts.items()))}.",
 814        f"Measured results: {len(measured)}.",
 815    ]
 816    if best:
 817        lines.append("Best observed results:")
 818        for experiment in best[-3:]:
 819            metric = format_metric_value(
 820                experiment.get("metric_name") or "metric",
 821                experiment.get("metric_value"),
 822                experiment.get("metric_unit") or "",
 823            )
 824            lines.append(
 825                "- "
 826                + _clip_text(" | ".join(
 827                    bit
 828                    for bit in [
 829                        str(experiment.get("title") or "experiment"),
 830                        metric,
 831                        f"result={experiment.get('result')}" if experiment.get("result") else "",
 832                        f"next={experiment.get('next_action')}" if experiment.get("next_action") else "",
 833                    ]
 834                    if bit
 835                ), 520)
 836            )
 837    recent = experiments[-4:]
 838    if recent:
 839        lines.append("Recent experiments:")
 840        for experiment in recent:
 841            metric = ""
 842            if experiment.get("metric_value") is not None:
 843                metric = format_metric_value(
 844                    experiment.get("metric_name") or "metric",
 845                    experiment.get("metric_value"),
 846                    experiment.get("metric_unit") or "",
 847                )
 848            delta = ""
 849            if experiment.get("delta_from_previous_best") is not None:
 850                delta = f"delta={experiment.get('delta_from_previous_best')}"
 851            lines.append(
 852                "- "
 853                + _clip_text(" | ".join(
 854                    bit
 855                    for bit in [
 856                        str(experiment.get("status") or "planned"),
 857                        str(experiment.get("title") or "experiment"),
 858                        metric,
 859                        delta,
 860                        f"next={experiment.get('next_action')}" if experiment.get("next_action") else "",
 861                    ]
 862                    if bit
 863                ), 520)
 864            )
 865    return "\n".join(lines)
 866
 867
 868def _operator_message_visible_in_prompt(entry: dict[str, Any], *, include_unclaimed: bool) -> bool:
 869    mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
 870    if entry.get("claimed_at") or mode == "note":
 871        return True
 872    return include_unclaimed and mode == "steer"
 873
 874
 875def _metadata_list(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
 876    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
 877    values = metadata.get(key)
 878    if not isinstance(values, list):
 879        return []
 880    return [value for value in values if isinstance(value, dict)]
 881
 882
 883def _timeline_event_for_prompt(event: dict[str, Any]) -> tuple[str, str, str] | None:
 884    event_type = str(event.get("event_type") or "event")
 885    if event_type == "operator_message":
 886        return None
 887    title = " ".join(str(event.get("title") or "").split())
 888    body = " ".join(str(event.get("body") or "").split())
 889    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
 890    title_lower = title.lower()
 891    if event_type == "tool_result":
 892        status = str(metadata.get("status") or "").lower()
 893        if status not in TIMELINE_PROMPT_TOOL_STATUSES:
 894            return None
 895    elif event_type == "agent_message":
 896        if title_lower not in TIMELINE_PROMPT_AGENT_TITLES:
 897            return None
 898    elif event_type not in TIMELINE_PROMPT_EVENT_TYPES:
 899        return None
 900    at = str(event.get("created_at") or "")
 901    detail = title if title else event_type
 902    if body:
 903        detail = f"{detail}: {body}"
 904    if event_type == "tool_result":
 905        status = str(metadata.get("status") or "").lower()
 906        detail = f"{status} {detail}".strip()
 907    return at, event_type, detail
 908
 909
 910def _as_float(value: Any, default: float = 0.0) -> float:
 911    try:
 912        return float(value)
 913    except (TypeError, ValueError):
 914        return default
 915
 916
 917def _as_int(value: Any, default: int = 0) -> int:
 918    try:
 919        return int(value)
 920    except (TypeError, ValueError):
 921        return default
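Sketch (not part of the tracked source): the stale-token handling listed above can be hard to follow from the helpers alone, so here is a minimal, self-contained illustration of the three steps it appears to combine: extracting tokens from an "unsupported concrete tokens" lesson, filtering them for distinctiveness, and matching them against record text with alphanumeric boundaries. The function names without leading underscores, the shortened `GENERIC_TOKENS` stoplist, and the sample lesson string are assumptions made for illustration; the real helpers use a much longer stoplist plus prefix and suffix rules.

```python
import re

# Hypothetical, shortened stoplist; the listing's version is much longer.
GENERIC_TOKENS = {"api", "cuda", "gpu", "json", "sha256"}


def unsupported_tokens_from_lesson(lesson: str) -> list[str]:
    """Pull the comma-separated token list out of an 'unsupported concrete tokens' lesson."""
    if "unsupported concrete tokens" not in lesson.lower():
        return []
    match = re.search(
        r"unsupported concrete tokens for .*?:\s*(.*?)(?:\.\s+Treat matching|$)",
        lesson,
        flags=re.IGNORECASE,
    )
    if not match:
        return []
    return [part.strip() for part in match.group(1).split(",") if part.strip()]


def is_distinctive(token: str) -> bool:
    """Simplified filter: keep mixed letter/digit tokens or all-caps tokens of length >= 4."""
    if token.lower() in GENERIC_TOKENS or len(token) < 4:
        return False
    mixed = any(ch.isalpha() for ch in token) and any(ch.isdigit() for ch in token)
    return mixed or token.isupper()


def record_mentions(text: str, token: str) -> bool:
    """Match the token only when it is not embedded in a longer alphanumeric run."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical lesson text shaped like the marker the regex expects.
lesson = (
    "Evidence grounding found unsupported concrete tokens for finding: "
    "rig-A7, H100, api. Treat matching records as stale."
)
stale = [token for token in unsupported_tokens_from_lesson(lesson) if is_distinctive(token)]
print(stale)  # ['rig-A7', 'H100']
print(record_mentions("benchmark ran on an H100 node", "H100"))   # True
print(record_mentions("benchmark ran on an H1000 node", "H100"))  # False
```

The boundary lookarounds are what keep `H100` from matching inside `H1000`; a plain substring test would flag both records as stale.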
nipux_cli/worker_prompt_format.py 223 lines
   1"""Prompt-facing summaries for worker history and tool observations."""
   2
   3from __future__ import annotations
   4
   5import json
   6import re
   7from pathlib import Path
   8from typing import Any
   9
  10from nipux_cli.metric_format import format_metric_value
  11from nipux_cli.source_quality import anti_bot_reason
  12from nipux_cli.worker_policy import BROWSER_REF_IGNORE_NAMES
  13
  14
  15def compact(value: Any, limit: int = 500) -> str:
  16    text = json.dumps(value, ensure_ascii=False, sort_keys=True) if not isinstance(value, str) else value
  17    text = " ".join(text.split())
  18    return text if len(text) <= limit else text[:limit] + "..."
  19
  20
  21def clip_text(value: Any, limit: int) -> str:
  22    text = " ".join(str(value or "").split())
  23    if len(text) <= limit:
  24        return text
  25    return text[: max(0, limit - 3)].rstrip() + "..."
  26
  27
  28def format_step_for_prompt(step: dict[str, Any]) -> str:
  29    tool = f" tool={step['tool_name']}" if step.get("tool_name") else ""
  30    summary = step.get("summary") or step.get("error") or ""
  31    pieces = [f"- #{step['step_no']} {step['kind']} {step['status']}{tool}: {summary}"]
  32    input_data = step.get("input") or {}
  33    args = input_data.get("arguments") if isinstance(input_data, dict) else None
  34    if args:
  35        pieces.append(f"  args: {compact(args, 320)}")
  36    output = step.get("output") or {}
  37    observation = observation_for_prompt(step.get("tool_name"), output)
  38    if observation:
  39        pieces.append(f"  observed: {observation}")
  40    return "\n".join(pieces)
  41
  42
  43def observation_for_prompt(tool_name: str | None, output: dict[str, Any]) -> str:
  44    if not output:
  45        return ""
  46    if output.get("error"):
  47        if tool_name in {"browser_click", "browser_type"}:
  48            recovery = output.get("recovery_snapshot") if isinstance(output.get("recovery_snapshot"), dict) else {}
  49            candidates = browser_candidates_for_prompt(recovery)
  50            suffix = f"; recovery_candidates={candidates}" if candidates else ""
  51            return clip_text(f"error={output.get('error')}; guidance={output.get('recovery_guidance', '')}{suffix}", 700)
  52        evidence_grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
  53        if evidence_grounding:
  54            missing_paths = evidence_grounding.get("missing_candidate_paths")
  55            if isinstance(missing_paths, list) and missing_paths:
  56                missing_paths = [path for path in (clean_prompt_candidate_path(item) for item in missing_paths) if path]
  57                return clip_text(
  58                    "error=evidence grounding required; missing_exact_paths="
  59                    + ", ".join(str(path) for path in missing_paths[:8])
  60                    + "; rewrite the durable record with exact observed paths or state why they are irrelevant",
  61                    900,
  62                )
  63            unsupported = evidence_grounding.get("unsupported_tokens")
  64            if isinstance(unsupported, list) and unsupported:
  65                return clip_text(
  66                    "error=evidence grounding required; unsupported="
  67                    + ", ".join(str(token) for token in unsupported[:10])
  68                    + "; use only tokens present in recent observed evidence or record uncertainty",
  69                    700,
  70                )
  71        recent_artifacts = output.get("recent_artifacts") if isinstance(output.get("recent_artifacts"), list) else []
  72        if tool_name == "read_artifact" and recent_artifacts:
  73            refs = []
  74            for artifact in recent_artifacts[:6]:
  75                if not isinstance(artifact, dict):
  76                    continue
  77                number = str(artifact.get("number") or "").strip()
  78                artifact_id = str(artifact.get("id") or "").strip()
  79                title = str(artifact.get("title") or "").strip()
  80                label = artifact_id or number
  81                if label:
  82                    refs.append(f"{label}={title}" if title else label)
  83            suffix = f"; valid_recent_artifacts={'; '.join(refs)}" if refs else ""
  84            return clip_text(f"error={output.get('error')}; guidance={output.get('guidance') or ''}{suffix}", 900)
  85        return clip_text(f"error={output.get('error')}; guidance={output.get('guidance') or ''}", 700)
  86    if tool_name == "web_search":
  87        results = output.get("results") if isinstance(output.get("results"), list) else []
  88        titles = []
  89        for result in (item for item in results[:5] if isinstance(item, dict)):
  90            title = result.get("title") or "untitled"
  91            url = result.get("url") or ""
  92            titles.append(f"{title} <{url}>")
  93        return clip_text(f"query={output.get('query')!r}; results={'; '.join(titles)}", 650)
  94    if tool_name == "web_extract":
  95        pages = output.get("pages") if isinstance(output.get("pages"), list) else []
  96        parts = []
  97        for page in (item for item in pages[:3] if isinstance(item, dict)):
  98            if page.get("error"):
  99                parts.append(f"{page.get('url')}: ERROR {page.get('error')}")
 100            else:
 101                text = str(page.get("text") or "")
 102                parts.append(f"{page.get('url')}: {clip_text(text, 160)}")
 103        return clip_text("; ".join(parts), 650)
 104    if tool_name == "shell_exec":
 105        stdout = str(output.get("stdout") or "")
 106        stderr = str(output.get("stderr") or "")
 107        excerpt = stdout.strip() or stderr.strip()
 108        return (
 109            f"command={output.get('command')!r}; rc={output.get('returncode')}; "
 110            f"duration={output.get('duration_seconds')}s; output={clip_text(excerpt, 360)}"
 111        )[:650]
 112    if tool_name == "write_artifact":
 113        return f"saved artifact={output.get('artifact_id')} path={output.get('path')}"
 114    if tool_name == "report_update":
 115        update = output.get("update") if isinstance(output.get("update"), dict) else {}
 116        return clip_text(f"agent_update={update.get('message') or ''}", 420)
 117    if tool_name == "record_lesson":
 118        lesson = output.get("lesson") if isinstance(output.get("lesson"), dict) else {}
 119        return clip_text(f"lesson={lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}", 420)
 120    if tool_name == "record_memory_graph":
 121        return (
 122            f"memory_graph added_nodes={output.get('added_nodes')} "
 123            f"updated_nodes={output.get('updated_nodes')} added_edges={output.get('added_edges')}"
 124        )[:520]
 125    if tool_name == "search_memory_graph":
 126        nodes = output.get("nodes") if isinstance(output.get("nodes"), list) else []
 127        titles = [str(node.get("title") or node.get("key") or "memory") for node in nodes[:5] if isinstance(node, dict)]
 128        return clip_text(f"memory_query={output.get('query')!r}; nodes={'; '.join(titles)}", 520)
 129    if tool_name == "record_source":
 130        source = output.get("source") if isinstance(output.get("source"), dict) else {}
 131        return (
 132            f"source={source.get('source')} score={source.get('usefulness_score')} "
 133            f"findings={source.get('yield_count')} fails={source.get('fail_count')} outcome={source.get('last_outcome')}"
 134        )[:420]
 135    if tool_name == "record_findings":
 136        return f"finding ledger updated added={output.get('added')} updated={output.get('updated')}"[:700]
 137    if tool_name == "record_experiment":
 138        experiment = output.get("experiment") if isinstance(output.get("experiment"), dict) else {}
 139        metric = ""
 140        if experiment.get("metric_value") is not None:
 141            metric = format_metric_value(
 142                experiment.get("metric_name") or "metric",
 143                experiment.get("metric_value"),
 144                experiment.get("metric_unit") or "",
 145            )
 146        delta = f" delta={experiment.get('delta_from_previous_best')}" if experiment.get("delta_from_previous_best") is not None else ""
 147        best = " best_observed" if experiment.get("best_observed") else ""
 148        return clip_text(f"experiment={experiment.get('title')} status={experiment.get('status')} {metric}{delta}{best}", 520)
 149    if tool_name == "acknowledge_operator_context":
 150        return f"operator_context {output.get('status')} count={output.get('count')}"[:700]
 151    if tool_name == "browser_navigate":
 152        data = output.get("data") if isinstance(output.get("data"), dict) else {}
 153        title = data.get("title") or ""
 154        url = data.get("url") or ""
 155        snapshot = str(output.get("snapshot") or "")
 156        warning = anti_bot_reason(title, url, snapshot)
 157        suffix = f"; source_warning={warning}" if warning else ""
 158        candidates = browser_candidates_for_prompt(output)
 159        candidate_suffix = f"; candidates={candidates}" if candidates else ""
 160        return clip_text(f"opened {title} <{url}>; snapshot_chars={len(snapshot)}{suffix}{candidate_suffix}", 700)
 161    if tool_name == "browser_snapshot":
 162        data = output.get("data") if isinstance(output.get("data"), dict) else {}
 163        snapshot = str(output.get("snapshot") or data.get("snapshot") or output.get("data") or "")
 164        warning = anti_bot_reason(snapshot)
 165        suffix = f"; source_warning={warning}" if warning else ""
 166        candidates = browser_candidates_for_prompt(output)
 167        candidate_suffix = f"; candidates={candidates}" if candidates else ""
 168        return clip_text(f"snapshot_chars={len(snapshot)}{suffix}{candidate_suffix}", 700)
 169    return compact(output, 700)
 170
 171
 172def clean_prompt_candidate_path(value: Any) -> str:
 173    raw = str(value or "").strip().rstrip(".,:;)")
 174    if not raw or "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
 175        return ""
 176    name = Path(raw).name
 177    suffix = Path(name).suffix
 178    if not name or name.startswith(".") or not suffix:
 179        return ""
 180    if not re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix) or not any(ch.isalpha() for ch in suffix):
 181        return ""
 182    return raw
 183
 184
 185def browser_candidates_for_prompt(output: dict[str, Any], *, limit: int = 18) -> str:
 186    refs = output.get("refs") if isinstance(output.get("refs"), dict) else None
 187    if refs is None:
 188        data = output.get("data") if isinstance(output.get("data"), dict) else {}
 189        refs = data.get("refs") if isinstance(data.get("refs"), dict) else {}
 190    candidates = []
 191    seen = set()
 192    for ref, item in refs.items():
 193        if not isinstance(item, dict):
 194            continue
 195        role = str(item.get("role") or "")
 196        if role not in {"link", "heading", "cell"}:
 197            continue
 198        name = " ".join(str(item.get("name") or "").split())
 199        key = name.lower().strip()
 200        if not name or key in BROWSER_REF_IGNORE_NAMES:
 201            continue
 202        if len(name) < 3 or len(name) > 90 or key in seen:
 203            continue
 204        if role == "cell" and (_looks_like_metric_cell(name) or _looks_like_service_description(name)):
 205            continue
 206        seen.add(key)
 207        candidates.append(f"{name} (@{ref})")
 208        if len(candidates) >= limit:
 209            break
 210    return "; ".join(candidates)
 211
 212
 213def _looks_like_metric_cell(name: str) -> bool:
 214    text = name.strip()
 215    return bool(re.fullmatch(r"(?:n/?a|na|[-+]?\d+(?:\.\d+)?(?:/5)?|[$€£]?\d[\d,]*(?:\.\d+)?%?)", text, re.I))
 216
 217
 218def _looks_like_service_description(name: str) -> bool:
 219    text = name.lower()
 220    if "," in text and len(text.split()) >= 6:
 221        return True
 222    service_terms = ("custom ecommerce", "ux/ui", "payment integration", "mobile responsiveness", "headless commerce")
 223    return any(term in text for term in service_terms) and len(text.split()) >= 5
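The two truncation helpers at the top of this file treat `limit` differently. `compact` slices to `limit` and then appends the ellipsis, so its result can run three characters over. `clip_text` reserves room for the ellipsis and stays within `limit`. A minimal sketch, with both helpers copied from the listing so the block runs on its own; the sample inputs are illustrative only:

```python
import json
from typing import Any


def compact(value: Any, limit: int = 500) -> str:
    # Copied from the listing: JSON-encode non-strings, collapse whitespace,
    # slice to `limit`, then append "..." (the result may be limit + 3 chars).
    text = json.dumps(value, ensure_ascii=False, sort_keys=True) if not isinstance(value, str) else value
    text = " ".join(text.split())
    return text if len(text) <= limit else text[:limit] + "..."


def clip_text(value: Any, limit: int) -> str:
    # Copied from the listing: the ellipsis is counted against `limit`.
    text = " ".join(str(value or "").split())
    if len(text) <= limit:
        return text
    return text[: max(0, limit - 3)].rstrip() + "..."


long_text = "a" * 600  # illustrative input, longer than either limit
compact_result = compact(long_text, 500)
clip_result = clip_text(long_text, 500)
structured = compact({"b": 1, "a": [1, 2]})  # keys are sorted before encoding

print(len(compact_result), len(clip_result))
print(structured)
```

Callers that need a hard upper bound on prompt size would want `clip_text`; `compact` is the only one of the two that serializes dicts and lists as JSON.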
nipux_cli/worker_tool_summary.py 99 lines
   1"""Compact result summaries for worker tool executions."""
   2
   3from __future__ import annotations
   4
   5from typing import Any
   6
   7from nipux_cli.metric_format import format_metric_value
   8
   9
  10def summarize_tool_result(name: str, args: dict[str, Any], result: dict[str, Any], *, ok: bool) -> str:
  11    if not ok:
  12        return f"{name} failed: {result.get('error') or 'unknown error'}"
  13    if name == "web_search":
  14        results = result.get("results") if isinstance(result.get("results"), list) else []
  15        top = "; ".join((item.get("title") or "untitled") for item in results[:3] if isinstance(item, dict))
  16        return f"web_search query={args.get('query')!r} returned {len(results)} results: {top}"
  17    if name == "web_extract":
  18        pages = result.get("pages") if isinstance(result.get("pages"), list) else []
  19        ok_pages = [page for page in pages if isinstance(page, dict) and not page.get("error")]
  20        return f"web_extract fetched {len(ok_pages)}/{len(pages)} pages"
  21    if name == "shell_exec":
  22        command = str(result.get("command") or args.get("command") or "")
  23        return (
  24            f"shell_exec rc={result.get('returncode')} "
  25            f"duration={result.get('duration_seconds')}s cmd={command!r}"
  26        )
  27    if name == "write_artifact":
  28        return f"write_artifact saved {result.get('artifact_id')} at {result.get('path')}"
  29    if name == "write_file":
  30        return f"write_file {result.get('mode') or 'overwrite'} {result.get('path')} bytes={result.get('bytes')}"
  31    if name == "defer_job":
  32        return f"defer_job until {result.get('defer_until')}"
  33    if name == "report_update":
  34        update = result.get("update") if isinstance(result.get("update"), dict) else {}
  35        return f"report_update saved: {str(update.get('message') or '')[:160]}"
  36    if name == "record_lesson":
  37        lesson = result.get("lesson") if isinstance(result.get("lesson"), dict) else {}
  38        category = lesson.get("category") or "memory"
  39        text = str(lesson.get("lesson") or "")[:160]
  40        return f"record_lesson saved {category}: {text}"
  41    if name == "record_memory_graph":
  42        return (
  43            f"record_memory_graph updated: {result.get('added_nodes', 0)} new nodes, "
  44            f"{result.get('updated_nodes', 0)} updated, {result.get('added_edges', 0)} links"
  45        )
  46    if name == "search_memory_graph":
  47        nodes = result.get("nodes") if isinstance(result.get("nodes"), list) else []
  48        return f"search_memory_graph returned {len(nodes)} nodes for {args.get('query')!r}"
  49    if name == "record_source":
  50        source = result.get("source") if isinstance(result.get("source"), dict) else {}
  51        return f"record_source updated {source.get('source')} score={source.get('usefulness_score')} yield={source.get('yield_count')}"
  52    if name == "record_findings":
  53        return (
  54            f"record_findings updated ledger: {result.get('added', 0)} new, "
  55            f"{result.get('updated', 0)} updated, {result.get('sources_updated', 0)} sources"
  56        )
  57    if name == "record_tasks":
  58        return f"record_tasks updated queue: {result.get('added', 0)} new, {result.get('updated', 0)} updated"
  59    if name == "record_roadmap":
  60        roadmap = result.get("roadmap") if isinstance(result.get("roadmap"), dict) else {}
  61        return (
  62            f"record_roadmap {roadmap.get('status')}: {roadmap.get('title')} "
  63            f"milestones={len(roadmap.get('milestones') or [])}"
  64        )
  65    if name == "record_milestone_validation":
  66        validation = result.get("validation") if isinstance(result.get("validation"), dict) else {}
  67        return (
  68            f"record_milestone_validation {validation.get('validation_status')}: "
  69            f"{validation.get('title')} followups={len(result.get('follow_up_tasks') or [])}"
  70        )
  71    if name == "record_experiment":
  72        experiment = result.get("experiment") if isinstance(result.get("experiment"), dict) else {}
  73        metric = ""
  74        if experiment.get("metric_value") is not None:
  75            metric = " " + format_metric_value(
  76                experiment.get("metric_name") or "metric",
  77                experiment.get("metric_value"),
  78                experiment.get("metric_unit") or "",
  79            )
  80        best = " best" if experiment.get("best_observed") else ""
  81        return f"record_experiment {experiment.get('status')}: {experiment.get('title')}{metric}{best}"
  82    if name == "acknowledge_operator_context":
  83        return f"acknowledge_operator_context {result.get('status')} count={result.get('count', 0)}"
  84    if name == "browser_navigate":
  85        data = result.get("data") if isinstance(result.get("data"), dict) else {}
  86        title = data.get("title") or ""
  87        url = data.get("url") or ""
  88        warning = f" | warning={result.get('source_warning')}" if result.get("source_warning") else ""
  89        return f"browser_navigate opened {title} <{url}>{warning}"
  90    if name == "browser_snapshot":
  91        snapshot = str(result.get("snapshot") or result.get("data") or "")
  92        warning = f" | warning={result.get('source_warning')}" if result.get("source_warning") else ""
  93        return f"browser_snapshot returned {len(snapshot)} chars{warning}"
  94    if name == "read_artifact":
  95        return f"read_artifact read {result.get('artifact_id')}"
  96    if name == "search_artifacts":
  97        results = result.get("results") if isinstance(result.get("results"), list) else []
  98        return f"search_artifacts returned {len(results)} results for {args.get('query')!r}"
  99    return f"{name} completed"
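A usage sketch for the dispatch pattern in `summarize_tool_result`: a failure short-circuit, per-tool branches, and a generic fallback. The function below is a trimmed, hypothetical stand-in (`summarize_tool_result_sketch`) that keeps only three branches from the listing, and the sample payloads are invented for illustration:

```python
from typing import Any


def summarize_tool_result_sketch(name: str, args: dict[str, Any], result: dict[str, Any], *, ok: bool) -> str:
    # Hypothetical trimmed copy of summarize_tool_result: same branch order,
    # but only the failure, web_extract, and fallback branches are kept.
    if not ok:
        return f"{name} failed: {result.get('error') or 'unknown error'}"
    if name == "web_extract":
        pages = result.get("pages") if isinstance(result.get("pages"), list) else []
        ok_pages = [page for page in pages if isinstance(page, dict) and not page.get("error")]
        return f"web_extract fetched {len(ok_pages)}/{len(pages)} pages"
    return f"{name} completed"


# Invented payloads, shaped like the fields the listing reads.
failed = summarize_tool_result_sketch("shell_exec", {}, {"error": "timeout"}, ok=False)
extracted = summarize_tool_result_sketch(
    "web_extract",
    {},
    {"pages": [{"url": "https://example.com/a"}, {"url": "https://example.com/b", "error": "404"}]},
    ok=True,
)
fallback = summarize_tool_result_sketch("custom_tool", {}, {}, ok=True)

print(failed)
print(extracted)
print(fallback)
```

Because the failure check runs first, a tool-specific branch never sees an error payload, and any tool without its own branch still gets a one-line summary.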
nipux_cli/worker_usage.py 48 lines
   1"""Usage accounting for worker model turns."""
   2
   3from __future__ import annotations
   4
   5import json
   6from typing import Any
   7
   8from nipux_cli.llm import LLMResponse
   9
  10
  11def turn_usage_metadata(
  12    response: LLMResponse,
  13    *,
  14    messages: list[dict[str, Any]],
  15    context_length: int,
  16) -> dict[str, Any]:
  17    prompt_text = json.dumps(messages, ensure_ascii=False, default=str)
  18    completion_text = (response.content or "") + json.dumps(
  19        [{"name": call.name, "arguments": call.arguments} for call in response.tool_calls],
  20        ensure_ascii=False,
  21        default=str,
  22    )
  23    usage = dict(response.usage) if isinstance(response.usage, dict) else {}
  24    prompt_tokens = _as_int(usage.get("prompt_tokens")) or estimate_token_count(prompt_text)
  25    completion_tokens = _as_int(usage.get("completion_tokens")) or estimate_token_count(completion_text)
  26    usage.setdefault("prompt_tokens", prompt_tokens)
  27    usage.setdefault("completion_tokens", completion_tokens)
  28    usage.setdefault("total_tokens", prompt_tokens + completion_tokens)
  29    usage.setdefault("estimated", not bool(response.usage))
  30    usage["prompt_chars"] = len(prompt_text)
  31    usage["completion_chars"] = len(completion_text)
  32    if context_length > 0:
  33        usage["context_length"] = context_length
  34        usage["context_fraction"] = round(prompt_tokens / max(1, context_length), 6)
  35    return usage
  36
  37
  38def estimate_token_count(text: str) -> int:
  39    if not text:
  40        return 0
  41    return max(1, (len(text) + 3) // 4)
  42
  43
  44def _as_int(value: Any) -> int:
  45    try:
  46        return int(float(value))
  47    except (TypeError, ValueError):
  48        return 0
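The fallback estimate in this file is roughly four characters per token, rounded up, and `context_fraction` is the prompt token count divided by the context length. A small worked sketch: `estimate_token_count` is copied from the listing, while `context_fraction` is a hypothetical helper that isolates the expression used inside `turn_usage_metadata`. The 1,000-character prompt and 8,192-token window are made-up numbers:

```python
def estimate_token_count(text: str) -> int:
    # Copied from the listing: ceil(len / 4), with a floor of 1 for non-empty text.
    if not text:
        return 0
    return max(1, (len(text) + 3) // 4)


def context_fraction(prompt_tokens: int, context_length: int) -> float:
    # Hypothetical helper: the same expression turn_usage_metadata stores
    # under usage["context_fraction"] when context_length > 0.
    return round(prompt_tokens / max(1, context_length), 6)


prompt_text = "x" * 1000  # illustrative prompt size
prompt_tokens = estimate_token_count(prompt_text)
fraction = context_fraction(prompt_tokens, 8192)

print(prompt_tokens, fraction)
```

The estimate is only used when the provider reports no usage (or a zero count), which is also when the `estimated` flag ends up true.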
plans/nipux-runtime-notes.md 50 lines
   1# Nipux Runtime Notes
   2
   3Nipux is a narrow, restartable worker for long-running browser, web research,
   4and command-line jobs. The active implementation is intentionally small and
   5centered on `nipux_cli/`, `tests/nipux_cli/`, and the `nipux` console script.
   6
   7## Runtime Shape
   8
   9- Package: `nipux_cli/`
  10- CLI entry point: `nipux`
  11- State home: `~/.nipux` or `NIPUX_HOME`
  12- Config file: `~/.nipux/config.yaml`
  13- Database: SQLite with WAL
  14- Artifacts: per-job files under the configured state home
  15- Browser profiles: per-job `agent-browser` profiles
  16- Model API: OpenAI-compatible chat completions endpoint
  17
  18## Design Constraints
  19
  20- Keep every worker step bounded and restartable.
  21- Persist useful evidence as artifacts before summarizing it.
  22- Keep summaries compact and point back to artifacts.
  23- Maintain source, finding, task, experiment, and lesson ledgers.
  24- Keep jobs runnable until the operator pauses or cancels them.
  25- Keep the tool registry explicit and small.
  26- Keep runtime behavior domain-neutral.
  27
  28## Active Tools
  29
  30- Browser: `browser_navigate`, `browser_snapshot`, `browser_click`,
  31  `browser_type`, `browser_scroll`, `browser_back`, `browser_press`,
  32  `browser_console`
  33- Web: `web_search`, `web_extract`
  34- Local command work: `shell_exec`
  35- Artifacts: `write_artifact`, `read_artifact`, `search_artifacts`
  36- Job state and visibility: `update_job_state`, `report_update`,
  37  `send_digest_email`
  38- Learning ledgers: `record_lesson`, `record_source`, `record_findings`,
  39  `record_tasks`, `record_experiment`
  40
  41## Validation
  42
  43```bash
  44PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
  45uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
  46uv run nipux doctor
  47```
  48
  49Use `uv run nipux daemon --once --fake` for a deterministic no-model smoke
  50test after CLI or daemon changes.
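The "Runtime Shape" list above names a state home (`~/.nipux` or `NIPUX_HOME`) and a SQLite database in WAL mode. A minimal sketch of how those two pieces could fit together with the standard library. This is an assumption-laden illustration, not the project's code: `resolve_state_home`, `open_state_db`, the `state.db` filename, and the rule that `NIPUX_HOME` takes precedence are all hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def resolve_state_home(env=None) -> Path:
    # Hypothetical: prefer NIPUX_HOME when set, otherwise fall back to ~/.nipux.
    env = os.environ if env is None else env
    return Path(env.get("NIPUX_HOME") or "~/.nipux").expanduser()


def open_state_db(path: Path) -> sqlite3.Connection:
    # Hypothetical: create the parent directory, open the file, and switch the
    # journal to write-ahead logging so readers do not block the writer.
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    home = resolve_state_home({"NIPUX_HOME": tmp})
    conn = open_state_db(home / "state.db")  # "state.db" is an assumed filename
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()

print(journal_mode)
```

WAL mode is a property of the database file, so it persists across connections once set; an in-memory database would report `memory` instead.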
pyproject.toml 54 lines
   1[build-system]
   2requires = ["setuptools>=77.0"]
   3build-backend = "setuptools.build_meta"
   4
   5[project]
   6name = "nipux"
   7version = "0.1.0"
   8description = "A restartable CLI worker for long-running agent jobs"
   9readme = "README.md"
  10requires-python = ">=3.11"
  11authors = [{ name = "Nipux" }]
  12license = "MIT"
  13keywords = ["agent", "cli", "automation", "daemon", "openai-compatible"]
  14classifiers = [
  15  "Development Status :: 3 - Alpha",
  16  "Environment :: Console",
  17  "Intended Audience :: Developers",
  18  "Programming Language :: Python :: 3",
  19  "Programming Language :: Python :: 3.11",
  20  "Programming Language :: Python :: 3.12",
  21  "Topic :: Software Development :: Libraries :: Application Frameworks",
  22]
  23dependencies = [
  24  "openai>=2.21.0,<3",
  25  "pyyaml>=6.0.2,<7",
  26]
  27
  28[project.urls]
  29Homepage = "https://nipux.com"
  30Source = "https://github.com/nipuxx/agent-cli"
  31Issues = "https://github.com/nipuxx/agent-cli/issues"
  32
  33[project.optional-dependencies]
  34dev = ["pytest>=9.0.2,<10", "ruff"]
  35
  36[dependency-groups]
  37dev = ["pytest>=9.0.2,<10", "ruff"]
  38
  39[project.scripts]
  40nipux = "nipux_cli.cli:main"
  41
  42[tool.setuptools.packages.find]
  43include = ["nipux_cli", "nipux_cli.*"]
  44
  45[tool.pytest.ini_options]
  46testpaths = ["tests/nipux_cli"]
  47addopts = "-q"
  48
  49[tool.ruff]
  50line-length = 120
  51target-version = "py311"
  52
  53[tool.uv]
  54exclude-newer = "7 days"
scripts/generate_project_atlas.py 619 lines
   1#!/usr/bin/env python3
   2"""Generate docs/project-atlas.html from the tracked Nipux source tree."""
   3
   4from __future__ import annotations
   5
   6import ast
   7import html
   8import re
   9import subprocess
  10from dataclasses import dataclass
  11from pathlib import Path
  12from typing import Any
  13
  14
  15ROOT = Path(__file__).resolve().parents[1]
  16OUT = ROOT / "docs" / "project-atlas.html"
  17SOURCE_SUFFIXES = {".py", ".md", ".toml", ".yaml", ".yml"}
  18EXCLUDED = {
  19    "docs/project-atlas.html",
  20    "uv.lock",
  21}
  22SENSITIVE_ASSIGNMENT_RE = re.compile(
  23    r"^(\s*)([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)(\s*)=(.*)$"
  24)
  25
  26
  27@dataclass
  28class SourceFile:
  29    path: str
  30    text: str
  31    lines: list[str]
  32    tree: ast.AST | None
  33    error: str = ""
  34
  35
  36@dataclass
  37class Symbol:
  38    path: str
  39    kind: str
  40    name: str
  41    line: int
  42    end_line: int
  43    doc: str
  44    calls: list[str]
  45
  46
  47@dataclass
  48class Prompt:
  49    path: str
  50    name: str
  51    line: int
  52    text: str
  53    context: str
  54
  55
  56def main() -> None:
  57    files = load_source_files()
  58    symbols = extract_symbols(files)
  59    prompts = extract_prompts(files)
  60    tools = extract_tools(files)
  61    tables = extract_tables(files)
  62    commit = git(["rev-parse", "--short", "HEAD"]) or "working-tree"
  63    html_text = render(files, symbols, prompts, tools, tables, commit=commit)
  64    OUT.parent.mkdir(parents=True, exist_ok=True)
  65    OUT.write_text(html_text, encoding="utf-8")
  66    print(f"wrote {OUT.relative_to(ROOT)} ({len(html_text):,} chars)")
  67
  68
  69def load_source_files() -> list[SourceFile]:
  70    paths = tracked_paths()
  71    files: list[SourceFile] = []
  72    for path in paths:
  73        if path in EXCLUDED:
  74            continue
  75        full = ROOT / path
  76        if full.suffix not in SOURCE_SUFFIXES or not full.is_file():
  77            continue
  78        text = full.read_text(encoding="utf-8", errors="replace")
  79        tree = None
  80        error = ""
  81        if full.suffix == ".py":
  82            try:
  83                tree = ast.parse(text, filename=path)
  84            except SyntaxError as exc:
  85                error = str(exc)
  86        files.append(SourceFile(path=path, text=text, lines=text.splitlines(), tree=tree, error=error))
  87    return files
  88
  89
  90def tracked_paths() -> list[str]:
  91    output = git(["ls-files"])
  92    if not output:
  93        return []
  94    return sorted(line.strip() for line in output.splitlines() if line.strip())
  95
  96
  97def git(args: list[str]) -> str:
  98    try:
  99        result = subprocess.run(["git", *args], cwd=ROOT, check=False, capture_output=True, text=True)
 100    except OSError:
 101        return ""
 102    return result.stdout.strip() if result.returncode == 0 else ""
 103
 104
 105def extract_symbols(files: list[SourceFile]) -> list[Symbol]:
 106    symbols: list[Symbol] = []
 107    for source in files:
 108        if source.tree is None:
 109            continue
 110        for node in ast.walk(source.tree):
 111            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
 112                calls = sorted(call_names(node))[:16]
 113                symbols.append(
 114                    Symbol(
 115                        path=source.path,
 116                        kind="class" if isinstance(node, ast.ClassDef) else "function",
 117                        name=node.name,
 118                        line=getattr(node, "lineno", 0),
 119                        end_line=getattr(node, "end_lineno", getattr(node, "lineno", 0)),
 120                        doc=ast.get_docstring(node) or "",
 121                        calls=calls,
 122                    )
 123                )
 124    return sorted(symbols, key=lambda item: (item.path, item.line, item.name))
 125
 126
 127def call_names(node: ast.AST) -> set[str]:
 128    names: set[str] = set()
 129    for child in ast.walk(node):
 130        if not isinstance(child, ast.Call):
 131            continue
 132        name = dotted_name(child.func)
 133        if name:
 134            names.add(name)
 135    return names
 136
 137
 138def dotted_name(node: ast.AST) -> str:
 139    if isinstance(node, ast.Name):
 140        return node.id
 141    if isinstance(node, ast.Attribute):
 142        base = dotted_name(node.value)
 143        return f"{base}.{node.attr}" if base else node.attr
 144    return ""
 145
 146
 147def extract_prompts(files: list[SourceFile]) -> list[Prompt]:
 148    prompts: list[Prompt] = []
 149    for source in files:
 150        if source.tree is None:
 151            continue
 152        for node in ast.walk(source.tree):
 153            if isinstance(node, (ast.Assign, ast.AnnAssign)):
 154                names = assignment_names(node)
 155                value = getattr(node, "value", None)
 156                text = literal_string(value)
 157                if not text:
 158                    continue
 159                if any(is_prompt_name(name) for name in names) or is_prompt_text(text):
 160                    prompts.append(
 161                        Prompt(
 162                            path=source.path,
 163                            name=", ".join(names) or "string",
 164                            line=getattr(node, "lineno", 0),
 165                            text=text,
 166                            context="assignment",
 167                        )
 168                    )
 169            elif isinstance(node, ast.Constant) and isinstance(node.value, str) and is_prompt_text(node.value):
 170                prompts.append(
 171                    Prompt(
 172                        path=source.path,
 173                        name="inline string",
 174                        line=getattr(node, "lineno", 0),
 175                        text=node.value,
 176                        context="inline",
 177                    )
 178                )
 179    deduped: dict[tuple[str, int, str], Prompt] = {}
 180    for prompt in prompts:
 181        key = (prompt.path, prompt.line, prompt.text[:120])
 182        existing = deduped.get(key)
 183        if existing is None or (existing.context == "inline" and prompt.context != "inline"):
 184            deduped[key] = prompt
 185    return sorted(deduped.values(), key=lambda item: (item.path, item.line))
 186
 187
 188def assignment_names(node: ast.Assign | ast.AnnAssign) -> list[str]:
 189    targets = node.targets if isinstance(node, ast.Assign) else [node.target]
 190    names: list[str] = []
 191    for target in targets:
 192        if isinstance(target, ast.Name):
 193            names.append(target.id)
 194        elif isinstance(target, ast.Attribute):
 195            names.append(target.attr)
 196    return names
 197
 198
 199def literal_string(node: ast.AST | None) -> str:
 200    if isinstance(node, ast.Constant) and isinstance(node.value, str):
 201        return node.value
 202    if isinstance(node, ast.JoinedStr):
 203        return "".join(part.value for part in node.values if isinstance(part, ast.Constant) and isinstance(part.value, str))
 204    return ""
 205
 206
 207def is_prompt_name(name: str) -> bool:
 208    upper = name.upper()
 209    return any(term in upper for term in ("PROMPT", "SYSTEM", "INSTRUCTION", "GUIDANCE")) and not upper.endswith("_PATH")
 210
 211
 212def is_prompt_text(text: str) -> bool:
 213    clean = " ".join(text.split())
 214    if len(clean) < 240:
 215        return False
 216    lowered = clean.lower()
 217    return (
 218        "you are" in lowered
 219        or ("do not" in lowered and "use " in lowered)
 220        or ("operator" in lowered and "context" in lowered and "prompt" in lowered)
 221        or "next-action" in lowered
 222    )
 223
 224
 225def extract_tools(files: list[SourceFile]) -> list[dict[str, str]]:
 226    tools_text = next((source.text for source in files if source.path == "nipux_cli/tools.py"), "")
 227    tools: list[dict[str, str]] = []
 228    pattern = re.compile(r"ToolSpec\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]", re.S)
 229    for match in pattern.finditer(tools_text):
 230        line = tools_text[: match.start()].count("\n") + 1
 231        tools.append({"name": match.group(1), "description": " ".join(match.group(2).split()), "line": str(line)})
 232    return tools
 233
 234
 235def extract_tables(files: list[SourceFile]) -> list[dict[str, Any]]:
 236    db_text = next((source.text for source in files if source.path == "nipux_cli/db.py"), "")
 237    tables: list[dict[str, Any]] = []
 238    for match in re.finditer(r"CREATE TABLE IF NOT EXISTS\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\((.*?)\)", db_text, re.S):
 239        raw_columns = [line.strip().rstrip(",") for line in match.group(2).splitlines()]
 240        columns = [line for line in raw_columns if line and not line.upper().startswith(("FOREIGN", "UNIQUE", "PRIMARY KEY"))]
 241        tables.append({"name": match.group(1), "columns": columns, "line": db_text[: match.start()].count("\n") + 1})
 242    return tables
 243
 244
 245def render(
 246    files: list[SourceFile],
 247    symbols: list[Symbol],
 248    prompts: list[Prompt],
 249    tools: list[dict[str, str]],
 250    tables: list[dict[str, Any]],
 251    *,
 252    commit: str,
 253) -> str:
 254    python_files = [source for source in files if source.path.endswith(".py")]
 255    total_lines = sum(len(source.lines) for source in files)
 256    file_cards = "\n".join(render_file_card(source, symbols) for source in files)
 257    source_browser = "\n".join(render_source_file(source) for source in files)
 258    symbol_cards = "\n".join(render_symbol(symbol) for symbol in symbols)
 259    prompt_cards = "\n".join(render_prompt(prompt) for prompt in prompts[:80])
 260    tool_rows = "\n".join(
 261        f"<tr><td><code>{esc(tool['name'])}</code></td><td>{esc(tool['description'])}</td><td>{tool['line']}</td></tr>"
 262        for tool in tools
 263    )
 264    table_cards = "\n".join(render_table(table) for table in tables)
 265    risk_cards = render_review_points(files, symbols, prompts, tools)
 266    return f"""<!doctype html>
 267<html lang="en">
 268<head>
 269<meta charset="utf-8">
 270<meta name="viewport" content="width=device-width, initial-scale=1">
 271<title>Nipux Project Atlas</title>
 272<style>
 273:root {{
 274  color-scheme: dark;
 275  --bg: #080909; --panel: #101112; --panel-2: #151717; --text: #ecebe6;
 276  --muted: #9b9b96; --faint: #5f615e; --line: #303332; --accent: #9ad6d1;
 277  --accent-2: #d8d06d; --warn: #ee9b66; --bad: #e36d78; --green: #9fca7f;
 278  --mono: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace;
 279  --sans: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
 280}}
 281* {{ box-sizing: border-box; }}
 282body {{ margin: 0; background: radial-gradient(circle at 70% 0%, rgba(154,214,209,.10), transparent 36%), var(--bg); color: var(--text); font: 15px/1.55 var(--sans); }}
 283a {{ color: var(--accent); text-decoration: none; }}
 284a:hover {{ text-decoration: underline; }}
 285.shell {{ display: grid; grid-template-columns: 292px minmax(0, 1fr); min-height: 100vh; }}
 286.sidebar {{ position: sticky; top: 0; height: 100vh; overflow: auto; border-right: 1px solid var(--line); padding: 24px 22px; background: rgba(8,9,9,.94); }}
 287.logo {{ font: 750 22px var(--mono); letter-spacing: .08em; color: var(--accent); }}
 288.subtitle {{ color: var(--muted); margin: 6px 0 22px; }}
 289.search {{ width: 100%; background: #050606; border: 1px solid var(--line); color: var(--text); border-radius: 8px; padding: 11px 12px; font: 14px var(--mono); outline: none; }}
 290.search:focus {{ border-color: var(--accent); box-shadow: 0 0 0 3px rgba(154,214,209,.12); }}
 291.nav {{ margin: 22px 0; display: grid; gap: 7px; }}
 292.nav a {{ color: var(--muted); padding: 6px 0; font: 13px var(--mono); text-transform: uppercase; letter-spacing: .12em; }}
 293.stats {{ margin-top: 24px; display: grid; gap: 10px; }}
 294.stat {{ border: 1px solid var(--line); background: var(--panel); border-radius: 10px; padding: 12px; }}
 295.stat b {{ display: block; font: 650 24px/1 var(--mono); }}
 296.stat span {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .12em; }}
 297main {{ min-width: 0; padding: 34px 38px 80px; }}
 298.hero {{ border-bottom: 1px solid var(--line); padding-bottom: 28px; margin-bottom: 28px; }}
 299.eyebrow {{ color: var(--accent); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .22em; }}
 300h1 {{ font-size: clamp(38px, 6vw, 86px); line-height: .9; margin: 14px 0 18px; letter-spacing: -.04em; }}
 301h2 {{ margin: 0; font-size: 28px; letter-spacing: -.02em; }}
 302h3 {{ margin: 0 0 6px; font-size: 16px; }}
 303.lede {{ max-width: 980px; color: #c7c6bf; font-size: 19px; }}
 304.section {{ margin: 44px 0; scroll-margin-top: 20px; }}
 305.section > header {{ display: flex; align-items: end; justify-content: space-between; gap: 20px; border-bottom: 1px solid var(--line); margin-bottom: 18px; padding-bottom: 10px; }}
 306.kicker {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .16em; }}
 307.grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 14px; }}
 308.card, .file-card, .prompt, .tool, .db-card, .symbol, .test-card {{ border: 1px solid var(--line); background: linear-gradient(180deg, rgba(255,255,255,.035), rgba(255,255,255,.012)); border-radius: 12px; padding: 16px; overflow: hidden; }}
 309.file-card header, .prompt header, .tool header {{ display: flex; justify-content: space-between; gap: 12px; align-items: baseline; border-bottom: 1px solid rgba(255,255,255,.06); margin: -2px 0 10px; padding-bottom: 8px; }}
 310.muted, .file-card header span, .prompt header span, .tool header span {{ color: var(--muted); }}
 311.meta {{ display: flex; flex-wrap: wrap; gap: 8px; margin: 10px 0; }}
 312.meta span, .pill {{ border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; color: var(--muted); font: 12px var(--mono); }}
 313.arch {{ display: grid; grid-template-columns: repeat(3, minmax(210px, 1fr)); gap: 12px; }}
 314.node {{ text-align: left; min-height: 126px; cursor: pointer; border: 1px solid var(--line); color: var(--text); background: var(--panel); border-radius: 14px; padding: 15px; font: inherit; transition: .15s ease; }}
 315.node:hover {{ transform: translateY(-2px); border-color: var(--accent); background: var(--panel-2); }}
 316.node strong {{ display: block; font-size: 17px; }}
 317.node span {{ display: block; color: var(--accent); font: 12px var(--mono); margin: 4px 0 8px; }}
 318.node em {{ display: block; color: var(--muted); font-style: normal; }}
 319.flow {{ counter-reset: flow; display: grid; gap: 10px; padding: 0; list-style: none; }}
 320.flow li {{ counter-increment: flow; border-left: 2px solid var(--accent); background: var(--panel); padding: 12px 14px 12px 48px; position: relative; border-radius: 8px; }}
 321.flow li:before {{ content: counter(flow); position: absolute; left: 14px; top: 12px; color: var(--accent-2); font: 700 16px var(--mono); }}
 322.flow li span {{ display: block; color: var(--muted); }}
 323details {{ margin-top: 10px; }}
 324summary {{ cursor: pointer; color: var(--accent); font: 13px var(--mono); }}
 325table {{ width: 100%; border-collapse: collapse; margin-top: 10px; font-size: 13px; }}
 326th, td {{ text-align: left; border-bottom: 1px solid rgba(255,255,255,.07); padding: 7px 8px; vertical-align: top; }}
 327th {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .08em; }}
 328code, pre {{ font-family: var(--mono); }}
 329pre {{ max-height: 540px; overflow: auto; background: #050606; border: 1px solid var(--line); border-radius: 10px; padding: 14px; color: #dad8cf; white-space: pre-wrap; }}
 330.mini-grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 12px; }}
 331.symbol p {{ margin: 8px 0; }}
 332.calls {{ color: var(--muted); }}
 333.warning {{ border-color: rgba(238,155,102,.45); background: rgba(238,155,102,.08); }}
 334.hidden {{ display: none !important; }}
 335.source-list {{ display: grid; gap: 10px; }}
 336.source-file {{ border: 1px solid var(--line); border-radius: 10px; background: var(--panel); padding: 0 12px 10px; }}
 337.source-file summary {{ padding: 12px 0; color: var(--text); }}
 338.source-file summary span {{ color: var(--muted); margin-left: 8px; }}
 339.source-code {{ max-height: 620px; font-size: 12px; line-height: 1.45; white-space: pre; }}
 340.src-line {{ display: grid; grid-template-columns: 52px minmax(0, 1fr); min-height: 17px; }}
 341.src-line b {{ color: var(--faint); user-select: none; font-weight: 500; }}
 342.src-line code {{ color: #d6d3ca; }}
 343@media (max-width: 980px) {{ .shell {{ grid-template-columns: 1fr; }} .sidebar {{ position: relative; height: auto; }} main {{ padding: 24px 18px 60px; }} .arch {{ grid-template-columns: 1fr; }} }}
 344</style>
 345</head>
 346<body>
 347<div class="shell">
 348  <aside class="sidebar">
 349    <div class="logo">NIPUX ATLAS</div>
 350    <div class="subtitle">Generated from tracked source on {esc(commit)}.</div>
 351    <input id="search" class="search" placeholder="filter files, prompts, tools..." autocomplete="off">
 352    <nav class="nav">
 353      <a href="#architecture">Architecture</a>
 354      <a href="#runtime-flow">Runtime Flow</a>
 355      <a href="#prompts">Prompt Surfaces</a>
 356      <a href="#tools">Tool Registry</a>
 357      <a href="#database">Database</a>
 358      <a href="#files">Files</a>
 359      <a href="#symbols">Symbols</a>
 360      <a href="#source-browser">Source Browser</a>
 361      <a href="#tests">Tests</a>
 362      <a href="#risks">Review Points</a>
 363    </nav>
 364    <div class="stats">
 365      <div class="stat"><b>{len(files)}</b><span>tracked files</span></div>
 366      <div class="stat"><b>{total_lines:,}</b><span>tracked lines mapped</span></div>
 367      <div class="stat"><b>{len(symbols)}</b><span>python symbols</span></div>
 368      <div class="stat"><b>{len(tools)}</b><span>runtime tools</span></div>
 369    </div>
 370  </aside>
 371  <main>
 372    <section class="hero">
 373      <div class="eyebrow">Backend map / prompt audit / source index</div>
 374      <h1>Nipux Project Atlas</h1>
 375      <p class="lede">A self-contained visual map of the current backend: entrypoints, daemon loop, worker prompt assembly, durable memory, tools, SQLite schema, UI control plane, tests, and every tracked source file with parsed functions/classes and line references.</p>
 376    </section>
 377
 378    <section id="architecture" class="section">
 379      <header><div><div class="kicker">Mind map</div><h2>Architecture</h2></div><span class="muted">Click a node to jump into related detail.</span></header>
 380      <div class="arch">{architecture_nodes()}</div>
 381    </section>
 382
 383    <section id="runtime-flow" class="section">
 384      <header><div><div class="kicker">Lifecycle</div><h2>Runtime Flow</h2></div><span class="muted">What happens from terminal input to durable progress.</span></header>
 385      <ol class="flow">{runtime_flow()}</ol>
 386    </section>
 387
 388    <section id="prompts" class="section">
 389      <header><div><div class="kicker">Exact text</div><h2>Prompt Surfaces</h2></div><span class="muted">System/program prompts and instruction-like strings extracted from source.</span></header>
 390      <div class="grid">{prompt_cards}</div>
 391    </section>
 392
 393    <section id="tools" class="section">
 394      <header><div><div class="kicker">Tools</div><h2>Tool Registry</h2></div><span class="muted">Static ToolSpec definitions from nipux_cli/tools.py.</span></header>
 395      <table><thead><tr><th>Name</th><th>Description</th><th>Line</th></tr></thead><tbody>{tool_rows}</tbody></table>
 396    </section>
 397
 398    <section id="database" class="section">
 399      <header><div><div class="kicker">Persistence</div><h2>SQLite Tables</h2></div><span class="muted">CREATE TABLE blocks found in nipux_cli/db.py.</span></header>
 400      <div class="grid">{table_cards}</div>
 401    </section>
 402
 403    <section id="files" class="section">
 404      <header><div><div class="kicker">Source index</div><h2>Important Files</h2></div><span class="muted">{len(python_files)} Python modules plus docs/config files.</span></header>
 405      <div class="grid">{file_cards}</div>
 406    </section>
 407
 408    <section id="symbols" class="section">
 409      <header><div><div class="kicker">Functions and classes</div><h2>Symbol Map</h2></div><span class="muted">Parsed with Python AST.</span></header>
 410      <div class="mini-grid">{symbol_cards}</div>
 411    </section>
 412
 413    <section id="source-browser" class="section">
 414      <header><div><div class="kicker">Line-by-line</div><h2>Source Browser</h2></div><span class="muted">Collapsed raw tracked source so the backend can be inspected directly in this page.</span></header>
 415      <div class="source-list">{source_browser}</div>
 416    </section>
 417
 418    <section id="tests" class="section">
 419      <header><div><div class="kicker">Verification</div><h2>Test Coverage Map</h2></div><span class="muted">Test files included in the source index.</span></header>
 420      <div class="grid">{test_cards(files)}</div>
 421    </section>
 422
 423    <section id="risks" class="section">
 424      <header><div><div class="kicker">Audit cues</div><h2>Review Points</h2></div><span class="muted">Generated signals for where to inspect next.</span></header>
 425      <div class="grid">{risk_cards}</div>
 426    </section>
 427  </main>
 428</div>
 429<script>
 430const search = document.getElementById('search');
 431search?.addEventListener('input', () => {{
 432  const term = search.value.toLowerCase().trim();
 433  document.querySelectorAll('.searchable').forEach((node) => {{
 434    const hay = (node.getAttribute('data-search') || node.textContent || '').toLowerCase();
 435    node.classList.toggle('hidden', term && !hay.includes(term));
 436  }});
 437}});
 438document.querySelectorAll('.node[data-target]').forEach((node) => {{
 439  node.addEventListener('click', () => {{
 440    const target = document.getElementById(node.getAttribute('data-target'));
 441    if (target) target.scrollIntoView({{ behavior: 'smooth', block: 'start' }});
 442  }});
 443}});
 444</script>
 445</body>
 446</html>
 447"""
 448
 449
 450def architecture_nodes() -> str:
 451    nodes = [
 452        ("cli-tui", "CLI / TUI", "nipux_cli/cli.py", "Chat-first terminal UI, first-run menu, slash commands, job switching, event panes."),
 453        ("sqlite-state", "SQLite state", "nipux_cli/db.py", "Jobs, runs, steps, artifacts, events, ledgers, usage, and memory index."),
 454        ("daemon", "Daemon", "nipux_cli/daemon.py", "Single-instance forever loop, stale runtime fingerprint, heartbeat, work scheduling."),
 455        ("worker-loop", "Worker loop", "nipux_cli/worker.py", "Builds prompts, chooses one tool step, guards loops, records durable progress."),
 456        ("llm-adapter", "LLM adapter", "nipux_cli/llm.py", "OpenAI-compatible chat calls, usage/cost tracking, tool call parsing."),
 457        ("tool-registry", "Tool registry", "nipux_cli/tools.py", "Browser, web, shell, artifact, ledger, task, experiment, digest tools."),
 458        ("browser-web", "Browser/web", "nipux_cli/browser.py / web.py", "Visible browsing, snapshots, search/extract helpers, anti-bot source scoring."),
 459        ("artifacts-files", "Artifacts/files", "nipux_cli/artifacts.py", "Saved outputs and concrete workspace file writing."),
 460        ("memory", "Memory", "compression.py / operator_context.py", "Compact rolling memory and durable operator context."),
 461    ]
 462    return "".join(
 463        f"<button id='{esc(anchor)}' class='node searchable' data-target='source-browser' data-search='{esc(title + ' ' + path + ' ' + desc)}'>"
 464        f"<strong>{esc(title)}</strong><span>{esc(path)}</span><em>{esc(desc)}</em></button>"
 465        for anchor, title, path, desc in nodes
 466    )
 467
 468
 469def runtime_flow() -> str:
 470    steps = [
 471        ("Startup", "pyproject entrypoint calls nipux_cli.cli:main. With no args, the chat/TUI opens on the focused job or first-run workspace."),
 472        ("Operator input", "Plain chat is stored as visible events and, when relevant, durable operator context for future worker prompts."),
 473        ("Daemon scheduling", "The daemon claims runnable jobs, keeps a lock/heartbeat, starts runs, and calls one bounded worker step repeatedly."),
 474        ("Prompt assembly", "worker.build_messages layers system prompt, program template, operator context, roadmaps, tasks, ledgers, experiments, memory, timeline, and recent steps."),
 475        ("Tool call", "The LLM selects one OpenAI-style tool. The registry executes it with ToolContext and stores input/output in steps/events."),
 476        ("Progress accounting", "Guards require artifacts, findings, tasks, experiments, or milestone validation when evidence or measurements appear."),
 477        ("Persistence", "Artifacts go to the job output directory. SQLite stores steps, events, ledgers, runtime state, and usage/cost metadata."),
 478        ("UI refresh", "The TUI reads timeline/events and compact job metrics, splitting chat from worker activity and status."),
 479    ]
 480    return "".join(f"<li><strong>{esc(title)}</strong><span>{esc(body)}</span></li>" for title, body in steps)
 481
 482
 483def render_file_card(source: SourceFile, symbols: list[Symbol]) -> str:
 484    local_symbols = [symbol for symbol in symbols if symbol.path == source.path]
 485    top_names = ", ".join(symbol.name for symbol in local_symbols[:10]) or "none"
 486    doc = module_doc(source) or source.error or "No module docstring."
 487    imports = ", ".join(module_imports(source)[:12]) or "none"
 488    return f"""<article class="file-card searchable" data-search="{esc(source.path + ' ' + doc + ' ' + top_names)}">
 489<header><h3>{esc(source.path)}</h3><span>{len(source.lines)} lines</span></header>
 490<p>{esc(short(doc, 260))}</p>
 491<div class="meta"><span>{len(local_symbols)} symbols</span><span>{source.text.count('TODO')} TODOs</span></div>
 492<details><summary>Imports and top symbols</summary><p><strong>Imports:</strong> {esc(imports)}</p><p><strong>Symbols:</strong> {esc(top_names)}</p></details>
 493</article>"""
 494
 495
 496def module_doc(source: SourceFile) -> str:
 497    if source.tree is None:
 498        for line in source.lines:
 499            stripped = line.strip()
 500            if stripped and not stripped.startswith("#"):
 501                return stripped
 502        return ""
 503    return ast.get_docstring(source.tree) or ""
 504
 505
 506def module_imports(source: SourceFile) -> list[str]:
 507    if source.tree is None:
 508        return []
 509    names: list[str] = []
 510    for node in source.tree.body:
 511        if isinstance(node, ast.Import):
 512            names.extend(alias.name for alias in node.names)
 513        elif isinstance(node, ast.ImportFrom):
 514            module = "." * node.level + (node.module or "")
 515            names.append(module)
 516    return names
 517
 518
 519def render_source_file(source: SourceFile) -> str:
 520    rendered_lines = [redact_source_line(line) for line in source.lines]
 521    search_text = "\n".join(rendered_lines)
 522    code = "\n".join(
 523        f"<span class='src-line'><b>{index:>4}</b><code>{esc(line)}</code></span>"
 524        for index, line in enumerate(rendered_lines, start=1)
 525    )
 526    return f"""<details class="source-file searchable" data-search="{esc(source.path + ' ' + search_text[:4000])}">
 527<summary>{esc(source.path)} <span>{len(source.lines)} lines</span></summary>
 528<pre class="source-code">{code}</pre>
 529</details>"""
 530
 531
 532def redact_source_line(line: str) -> str:
 533    match = SENSITIVE_ASSIGNMENT_RE.match(line)
 534    if not match:
 535        return line
 536    indent, name, _space, _value = match.groups()
 537    return f"{indent}{name} = <redacted>"
 538
 539
 540def render_symbol(symbol: Symbol) -> str:
 541    doc = short(symbol.doc or "No docstring.", 180)
 542    calls = ", ".join(symbol.calls) or "none"
 543    return f"""<article class="symbol searchable" data-search="{esc(symbol.path + ' ' + symbol.name + ' ' + doc + ' ' + calls)}">
 544<h3>{esc(symbol.name)}</h3>
 545<p><span class="pill">{esc(symbol.kind)}</span> <span class="pill">{esc(symbol.path)}:{symbol.line}</span></p>
 546<p>{esc(doc)}</p>
 547<p class="calls"><strong>Calls:</strong> {esc(short(calls, 280))}</p>
 548</article>"""
 549
 550
 551def render_prompt(prompt: Prompt) -> str:
 552    title = prompt.name or "prompt"
 553    return f"""<article class="prompt searchable" data-search="{esc(prompt.path + ' ' + title + ' ' + prompt.text)}">
 554<header><h3>{esc(title)}</h3><span>{esc(prompt.path)}:{prompt.line}</span></header>
 555<p class="muted">Context: <code>{esc(prompt.context)}</code> · {len(prompt.text):,} chars</p>
 556<pre><code>{esc(prompt.text)}</code></pre>
 557</article>"""
 558
 559
 560def render_table(table: dict[str, Any]) -> str:
 561    columns = "".join(f"<li><code>{esc(column)}</code></li>" for column in table["columns"][:40])
 562    return f"""<article class="db-card searchable" data-search="{esc(table['name'] + ' ' + ' '.join(table['columns']))}">
 563<h3>{esc(table['name'])}</h3>
 564<p class="muted">nipux_cli/db.py:{table['line']}</p>
 565<ul>{columns}</ul>
 566</article>"""
 567
 568
 569def test_cards(files: list[SourceFile]) -> str:
 570    tests = [source for source in files if source.path.startswith("tests/")]
 571    cards = []
 572    for source in tests:
 573        names = []
 574        if source.tree:
 575            names = [node.name for node in ast.walk(source.tree) if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_")]
 576        cards.append(
 577            f"""<article class="test-card searchable" data-search="{esc(source.path + ' ' + ' '.join(names))}">
 578<h3>{esc(source.path)}</h3><p><strong>{len(names)}</strong> tests · {len(source.lines)} lines</p>
 579<p class="muted">{esc(short(', '.join(names), 320))}</p></article>"""
 580        )
 581    return "\n".join(cards)
 582
 583
 584def render_review_points(
 585    files: list[SourceFile],
 586    symbols: list[Symbol],
 587    prompts: list[Prompt],
 588    tools: list[dict[str, str]],
 589) -> str:
 590    largest = sorted(files, key=lambda source: len(source.lines), reverse=True)[:5]
 591    large_text = ", ".join(f"{source.path} ({len(source.lines)} lines)" for source in largest)
 592    prompt_text = f"{len(prompts)} prompt/instruction-like strings were extracted. Inspect this section after any agent-behavior change."
 593    tool_text = f"{len(tools)} tools are exposed to the worker. Review descriptions whenever generic behavior changes."
 594    symbol_text = f"{len(symbols)} symbols were parsed. Large modules are candidates for refactoring once behavior stabilizes."
 595    cards = [
 596        ("Large modules", large_text),
 597        ("Prompt surfaces", prompt_text),
 598        ("Tool surface", tool_text),
 599        ("Symbol map", symbol_text),
 600    ]
 601    return "\n".join(
 602        f"<article class='card warning searchable' data-search='{esc(title + ' ' + body)}'><h3>{esc(title)}</h3><p>{esc(body)}</p></article>"
 603        for title, body in cards
 604    )
 605
 606
 607def short(text: str, limit: int) -> str:
 608    clean = " ".join(str(text).split())
 609    if len(clean) <= limit:
 610        return clean
 611    return clean[: max(0, limit - 3)] + "..."
 612
 613
 614def esc(value: Any) -> str:
 615    return html.escape(str(value), quote=True)
 616
 617
 618if __name__ == "__main__":
 619    main()
scripts/live_memory_graph_smoke.py 226 lines
   1#!/usr/bin/env python3
   2"""Run an opt-in real-model smoke test for memory-graph tool calling.
   3
   4This script is intentionally outside the normal Nipux runtime path. It creates
   5an isolated temporary Nipux home, seeds generic durable job state, and verifies
   6that a configured OpenAI-compatible model can consolidate that state with the
   7`record_memory_graph` tool.
   8"""
   9
  10from __future__ import annotations
  11
  12import argparse
  13import json
  14import os
  15import shutil
  16import sys
  17import tempfile
  18from pathlib import Path
  19from typing import Any
  20
  21from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig, ToolAccessConfig
  22from nipux_cli.db import AgentDB
  23from nipux_cli.memory_graph import memory_graph_from_job
  24from nipux_cli.worker import run_one_step
  25
  26
  27DEFAULT_MODEL = "qwen/qwen3.6-27b"
  28DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
  29DEFAULT_API_KEY_ENV = <redacted>
  30
  31
  32def main() -> int:
  33    parser = argparse.ArgumentParser(description=__doc__)
  34    parser.add_argument("--model", default=DEFAULT_MODEL, help=f"OpenAI-compatible model name. Default: {DEFAULT_MODEL}")
  35    parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help=f"Provider base URL. Default: {DEFAULT_BASE_URL}")
  36    parser.add_argument("--api-key-env", default=DEFAULT_API_KEY_ENV, help=f"API key env var. Default: {DEFAULT_API_KEY_ENV}")
  37    parser.add_argument("--context-length", type=int, default=262_144)
  38    parser.add_argument("--steps", type=int, default=3, help="Maximum worker turns to try.")
  39    parser.add_argument("--keep-home", action="store_true", help="Keep the temporary Nipux home for inspection.")
  40    parser.add_argument("--json", action="store_true", help="Print a machine-readable result.")
  41    args = parser.parse_args()
  42
  43    api_key = os.environ.get(args.api_key_env, "")
  44    if not api_key:
  45        return _finish(
  46            {
  47                "success": False,
  48                "error": f"{args.api_key_env} is not set",
  49                "action": f"Export {args.api_key_env} before running this live smoke. The key is never printed.",
  50            },
  51            json_output=args.json,
  52        )
  53
  54    home = Path(tempfile.mkdtemp(prefix="nipux-memory-graph-live-"))
  55    try:
  56        config = AppConfig(
  57            runtime=RuntimeConfig(home=home, max_steps_per_run=1),
  58            model=ModelConfig(
  59                model=args.model,
  60                base_url=args.base_url.rstrip("/"),
  61                api_key_env=args.api_key_env,
  62                context_length=args.context_length,
  63                request_timeout_seconds=180,
  64            ),
  65            tools=ToolAccessConfig(browser=False, web=False, shell=False, files=False),
  66        )
  67        config.ensure_dirs()
  68        db = AgentDB(config.runtime.state_db_path)
  69        try:
  70            job_id = db.create_job(
  71                "Consolidate generic durable job knowledge into an inspectable memory graph.",
  72                title="memory graph live smoke",
  73                metadata=_seed_metadata(),
  74            )
  75            db.update_job_status(job_id, "running")
  76            executions = []
  77            for _ in range(max(1, args.steps)):
  78                execution = run_one_step(job_id, config=config, db=db)
  79                executions.append(_execution_summary(execution))
  80                job = db.get_job(job_id)
  81                graph = memory_graph_from_job(job)
  82                if graph["nodes"]:
  83                    return _finish(
  84                        {
  85                            "success": True,
  86                            "home": str(home),
  87                            "model": args.model,
  88                            "base_url": args.base_url,
  89                            "job_id": job_id,
  90                            "node_count": len(graph["nodes"]),
  91                            "edge_count": len(graph["edges"]),
  92                            "executions": executions,
  93                        },
  94                        json_output=args.json,
  95                    )
  96            job = db.get_job(job_id)
  97            graph = memory_graph_from_job(job)
  98            return _finish(
  99                {
 100                    "success": False,
 101                    "home": str(home),
 102                    "model": args.model,
 103                    "base_url": args.base_url,
 104                    "job_id": job_id,
 105                    "node_count": len(graph["nodes"]),
 106                    "edge_count": len(graph["edges"]),
 107                    "executions": executions,
 108                    "error": "model did not create memory graph nodes within the step budget",
 109                },
 110                json_output=args.json,
 111            )
 112        finally:
 113            db.close()
 114    finally:
 115        if args.keep_home:
 116            print(f"kept temporary Nipux home: {home}", file=sys.stderr)
 117        else:
 118            shutil.rmtree(home, ignore_errors=True)
 119
 120
 121def _seed_metadata() -> dict[str, Any]:
 122    return {
 123        "finding_ledger": [
 124            {
 125                "name": "Durable outputs need reusable summaries",
 126                "category": "process",
 127                "reason": "Saved outputs are easier to reuse when connected to decisions and tasks.",
 128                "score": 0.82,
 129            },
 130            {
 131                "name": "Repeated branch work needs explicit rejection criteria",
 132                "category": "process",
 133                "reason": "A branch should either improve evidence, produce a deliverable, or be deprecated.",
 134                "score": 0.78,
 135            },
 136        ],
 137        "source_ledger": [
 138            {
 139                "source": "internal://recent-events",
 140                "source_type": "job_history",
 141                "usefulness_score": 0.8,
 142                "yield_count": 2,
 143                "last_outcome": "Recent events exposed reusable process knowledge.",
 144            },
 145            {
 146                "source": "internal://saved-outputs",
 147                "source_type": "artifact_index",
 148                "usefulness_score": 0.7,
 149                "yield_count": 1,
 150                "last_outcome": "Saved outputs provide evidence refs for future graph nodes.",
 151            },
 152        ],
 153        "lessons": [
 154            {
 155                "category": "strategy",
 156                "lesson": "Prefer measured or validated progress over activity counts.",
 157                "confidence": 0.86,
 158            },
 159            {
 160                "category": "memory",
 161                "lesson": "Consolidate stable findings into linked graph nodes before context grows.",
 162                "confidence": 0.9,
 163            },
 164        ],
 165        "task_queue": [
 166            {
 167                "title": "Create a connected memory graph from durable signals",
 168                "status": "open",
 169                "output_contract": "decision",
 170                "acceptance_criteria": "At least one reusable node connected to evidence or strategy.",
 171            }
 172        ],
 173        "roadmap": {
 174            "title": "Long-running job memory",
 175            "status": "active",
 176            "milestones": [
 177                {
 178                    "title": "Consolidate reusable knowledge",
 179                    "status": "open",
 180                    "validation_contract": "Future turns can retrieve the key decisions without replaying raw history.",
 181                }
 182            ],
 183        },
 184    }
 185
 186
 187def _execution_summary(execution: Any) -> dict[str, Any]:
 188    result = execution.result if isinstance(execution.result, dict) else {}
 189    return {
 190        "status": execution.status,
 191        "tool": execution.tool_name,
 192        "step_id": execution.step_id,
 193        "success": result.get("success"),
 194        "error": result.get("error"),
 195        "added_nodes": result.get("added_nodes"),
 196        "added_edges": result.get("added_edges"),
 197    }
 198
 199
 200def _finish(payload: dict[str, Any], *, json_output: bool) -> int:
 201    if json_output:
 202        print(json.dumps(payload, indent=2, sort_keys=True))
 203    else:
 204        print(_human_summary(payload))
 205    return 0 if payload.get("success") else 1
 206
 207
 208def _human_summary(payload: dict[str, Any]) -> str:
 209    lines = [f"success: {bool(payload.get('success'))}"]
 210    for key in ("model", "base_url", "home", "job_id", "node_count", "edge_count", "error", "action"):
 211        if payload.get(key) is not None:
 212            lines.append(f"{key}: {payload[key]}")
 213    executions = payload.get("executions")
 214    if isinstance(executions, list) and executions:
 215        lines.append("executions:")
 216        for item in executions:
 217            lines.append(
 218                "  - "
 219                f"status={item.get('status')} tool={item.get('tool')} "
 220                f"success={item.get('success')} error={item.get('error')}"
 221            )
 222    return "\n".join(lines)
 223
 224
 225if __name__ == "__main__":
 226    raise SystemExit(main())
scripts/render_nipux_ascii_video.py 523 lines
   1#!/usr/bin/env python3
   2"""Render a Nipux ASCII-art CLI intro as an MP4.
   3
   4The renderer is dependency-light on purpose: it draws a small embedded
   5bitmap font into raw RGB frames and pipes those frames directly to ffmpeg.
   6"""
   7
   8from __future__ import annotations
   9
  10import argparse
  11import math
  12import random
  13import shutil
  14import subprocess
  15from dataclasses import dataclass
  16from pathlib import Path
  17
  18
  19WIDTH = 1440
  20HEIGHT = 900
  21FPS = 30
  22DURATION = 8.0
  23COLS = 96
  24ROWS = 34
  25CELL_W = 13
  26CELL_H = 22
  27ORIGIN_X = (WIDTH - COLS * CELL_W) // 2
  28ORIGIN_Y = (HEIGHT - ROWS * CELL_H) // 2
  29SCALE = 2
  30
  31BG = (2, 5, 5)
  32BG_SCAN = (1, 3, 4)
  33PANEL = (4, 10, 8)
  34PANEL_EDGE = (10, 56, 37)
  35PANEL_GLOW = (4, 30, 24)
  36DIM = (33, 92, 62)
  37MID = (72, 180, 112)
  38MAIN = (126, 255, 164)
  39HOT = (104, 238, 255)
  40AMBER = (255, 184, 70)
  41MAGENTA = (255, 95, 154)
  42WHITE = (220, 255, 238)
  43
  44LOGO = [
  45    r"##   ##  ####  ######  ##   ##  ##   ##",
  46    r"###  ##   ##   ##   ## ##   ##   ## ## ",
  47    r"#### ##   ##   ##   ## ##   ##    ###  ",
  48    r"## ####   ##   ######  ##   ##    ###  ",
  49    r"##  ###   ##   ##      ##   ##   ## ## ",
  50    r"##   ##  ####  ##       #####   ##   ##",
  51]
  52
  53GLITCH_CHARS = ".:-=+*#%@/\\|<>[]{}01"
  54RAIN_CHARS = ".:-+*#01/\\<>"
  55
  56
  57GLYPHS: dict[str, tuple[str, ...]] = {
  58    " ": ("00000", "00000", "00000", "00000", "00000", "00000", "00000"),
  59    "A": ("01110", "10001", "10001", "11111", "10001", "10001", "10001"),
  60    "B": ("11110", "10001", "10001", "11110", "10001", "10001", "11110"),
  61    "C": ("01111", "10000", "10000", "10000", "10000", "10000", "01111"),
  62    "D": ("11110", "10001", "10001", "10001", "10001", "10001", "11110"),
  63    "E": ("11111", "10000", "10000", "11110", "10000", "10000", "11111"),
  64    "F": ("11111", "10000", "10000", "11110", "10000", "10000", "10000"),
  65    "G": ("01111", "10000", "10000", "10011", "10001", "10001", "01110"),
  66    "H": ("10001", "10001", "10001", "11111", "10001", "10001", "10001"),
  67    "I": ("01110", "00100", "00100", "00100", "00100", "00100", "01110"),
  68    "J": ("00111", "00010", "00010", "00010", "10010", "10010", "01100"),
  69    "K": ("10001", "10010", "10100", "11000", "10100", "10010", "10001"),
  70    "L": ("10000", "10000", "10000", "10000", "10000", "10000", "11111"),
  71    "M": ("10001", "11011", "10101", "10101", "10001", "10001", "10001"),
  72    "N": ("10001", "11001", "10101", "10011", "10001", "10001", "10001"),
  73    "O": ("01110", "10001", "10001", "10001", "10001", "10001", "01110"),
  74    "P": ("11110", "10001", "10001", "11110", "10000", "10000", "10000"),
  75    "Q": ("01110", "10001", "10001", "10001", "10101", "10010", "01101"),
  76    "R": ("11110", "10001", "10001", "11110", "10100", "10010", "10001"),
  77    "S": ("01111", "10000", "10000", "01110", "00001", "00001", "11110"),
  78    "T": ("11111", "00100", "00100", "00100", "00100", "00100", "00100"),
  79    "U": ("10001", "10001", "10001", "10001", "10001", "10001", "01110"),
  80    "V": ("10001", "10001", "10001", "10001", "10001", "01010", "00100"),
  81    "W": ("10001", "10001", "10001", "10101", "10101", "10101", "01010"),
  82    "X": ("10001", "10001", "01010", "00100", "01010", "10001", "10001"),
  83    "Y": ("10001", "10001", "01010", "00100", "00100", "00100", "00100"),
  84    "Z": ("11111", "00001", "00010", "00100", "01000", "10000", "11111"),
  85    "a": ("00000", "00000", "01110", "00001", "01111", "10001", "01111"),
  86    "b": ("10000", "10000", "10110", "11001", "10001", "10001", "11110"),
  87    "c": ("00000", "00000", "01110", "10000", "10000", "10001", "01110"),
  88    "d": ("00001", "00001", "01101", "10011", "10001", "10001", "01111"),
  89    "e": ("00000", "00000", "01110", "10001", "11111", "10000", "01110"),
  90    "f": ("00110", "01001", "01000", "11100", "01000", "01000", "01000"),
  91    "g": ("00000", "01111", "10001", "10001", "01111", "00001", "01110"),
  92    "h": ("10000", "10000", "10110", "11001", "10001", "10001", "10001"),
  93    "i": ("00100", "00000", "01100", "00100", "00100", "00100", "01110"),
  94    "j": ("00010", "00000", "00110", "00010", "00010", "10010", "01100"),
  95    "k": ("10000", "10000", "10010", "10100", "11000", "10100", "10010"),
  96    "l": ("01100", "00100", "00100", "00100", "00100", "00100", "01110"),
  97    "m": ("00000", "00000", "11010", "10101", "10101", "10101", "10101"),
  98    "n": ("00000", "00000", "10110", "11001", "10001", "10001", "10001"),
  99    "o": ("00000", "00000", "01110", "10001", "10001", "10001", "01110"),
 100    "p": ("00000", "00000", "11110", "10001", "11110", "10000", "10000"),
 101    "q": ("00000", "00000", "01101", "10011", "01111", "00001", "00001"),
 102    "r": ("00000", "00000", "10110", "11001", "10000", "10000", "10000"),
 103    "s": ("00000", "00000", "01111", "10000", "01110", "00001", "11110"),
 104    "t": ("01000", "01000", "11100", "01000", "01000", "01001", "00110"),
 105    "u": ("00000", "00000", "10001", "10001", "10001", "10011", "01101"),
 106    "v": ("00000", "00000", "10001", "10001", "10001", "01010", "00100"),
 107    "w": ("00000", "00000", "10001", "10001", "10101", "10101", "01010"),
 108    "x": ("00000", "00000", "10001", "01010", "00100", "01010", "10001"),
 109    "y": ("00000", "00000", "10001", "10001", "01111", "00001", "01110"),
 110    "z": ("00000", "00000", "11111", "00010", "00100", "01000", "11111"),
 111    "0": ("01110", "10001", "10011", "10101", "11001", "10001", "01110"),
 112    "1": ("00100", "01100", "00100", "00100", "00100", "00100", "01110"),
 113    "2": ("01110", "10001", "00001", "00010", "00100", "01000", "11111"),
 114    "3": ("11110", "00001", "00001", "01110", "00001", "00001", "11110"),
 115    "4": ("00010", "00110", "01010", "10010", "11111", "00010", "00010"),
 116    "5": ("11111", "10000", "10000", "11110", "00001", "00001", "11110"),
 117    "6": ("01110", "10000", "10000", "11110", "10001", "10001", "01110"),
 118    "7": ("11111", "00001", "00010", "00100", "01000", "01000", "01000"),
 119    "8": ("01110", "10001", "10001", "01110", "10001", "10001", "01110"),
 120    "9": ("01110", "10001", "10001", "01111", "00001", "00001", "01110"),
 121    ".": ("00000", "00000", "00000", "00000", "00000", "01100", "01100"),
 122    ",": ("00000", "00000", "00000", "00000", "00000", "01100", "01000"),
 123    ":": ("00000", "01100", "01100", "00000", "01100", "01100", "00000"),
 124    ";": ("00000", "01100", "01100", "00000", "01100", "01000", "10000"),
 125    "!": ("00100", "00100", "00100", "00100", "00100", "00000", "00100"),
 126    "?": ("01110", "10001", "00001", "00010", "00100", "00000", "00100"),
 127    "'": ("00100", "00100", "01000", "00000", "00000", "00000", "00000"),
 128    '"': ("01010", "01010", "01010", "00000", "00000", "00000", "00000"),
 129    "-": ("00000", "00000", "00000", "11111", "00000", "00000", "00000"),
 130    "_": ("00000", "00000", "00000", "00000", "00000", "00000", "11111"),
 131    "+": ("00000", "00100", "00100", "11111", "00100", "00100", "00000"),
 132    "=": ("00000", "00000", "11111", "00000", "11111", "00000", "00000"),
 133    "*": ("00000", "10101", "01110", "11111", "01110", "10101", "00000"),
 134    "#": ("01010", "11111", "01010", "01010", "11111", "01010", "01010"),
 135    "@": ("01110", "10001", "10111", "10101", "10111", "10000", "01110"),
 136    "%": ("11001", "11010", "00100", "01000", "10110", "00110", "00000"),
 137    "$": ("00100", "01111", "10100", "01110", "00101", "11110", "00100"),
 138    "&": ("01100", "10010", "10100", "01000", "10101", "10010", "01101"),
 139    "/": ("00001", "00010", "00100", "01000", "10000", "00000", "00000"),
 140    "\\": ("10000", "01000", "00100", "00010", "00001", "00000", "00000"),
 141    "|": ("00100", "00100", "00100", "00100", "00100", "00100", "00100"),
 142    "<": ("00010", "00100", "01000", "10000", "01000", "00100", "00010"),
 143    ">": ("01000", "00100", "00010", "00001", "00010", "00100", "01000"),
 144    "(": ("00010", "00100", "01000", "01000", "01000", "00100", "00010"),
 145    ")": ("01000", "00100", "00010", "00010", "00010", "00100", "01000"),
 146    "[": ("01110", "01000", "01000", "01000", "01000", "01000", "01110"),
 147    "]": ("01110", "00010", "00010", "00010", "00010", "00010", "01110"),
 148    "{": ("00010", "00100", "00100", "01000", "00100", "00100", "00010"),
 149    "}": ("01000", "00100", "00100", "00010", "00100", "00100", "01000"),
 150    "~": ("00000", "00000", "01001", "10110", "00000", "00000", "00000"),
 151    "^": ("00100", "01010", "10001", "00000", "00000", "00000", "00000"),
 152}
 153
 154
 155@dataclass(frozen=True)
 156class Cell:
 157    char: str = " "
 158    color: tuple[int, int, int] = DIM
 159
 160
 161class TextGrid:
 162    def __init__(self) -> None:
 163        self.cells = [[Cell() for _ in range(COLS)] for _ in range(ROWS)]
 164
 165    def set(self, x: int, y: int, char: str, color: tuple[int, int, int]) -> None:
 166        if 0 <= x < COLS and 0 <= y < ROWS and char:
 167            self.cells[y][x] = Cell(char[0], color)
 168
 169    def put(self, x: int, y: int, text: str, color: tuple[int, int, int]) -> None:
 170        for offset, char in enumerate(text):
 171            self.set(x + offset, y, char, color)
 172
 173    def center(self, y: int, text: str, color: tuple[int, int, int]) -> None:
 174        self.put((COLS - len(text)) // 2, y, text, color)
 175
 176    def box(self, title: str, color: tuple[int, int, int]) -> None:
 177        top = "+" + "-" * (COLS - 2) + "+"
 178        bottom = "+" + "-" * (COLS - 2) + "+"
 179        self.put(0, 0, top, color)
 180        self.put(0, ROWS - 1, bottom, color)
 181        for y in range(1, ROWS - 1):
 182            self.set(0, y, "|", color)
 183            self.set(COLS - 1, y, "|", color)
 184        label = f" {title} "
 185        self.put(3, 0, label[: COLS - 8], color)
 186
 187
 188def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
 189    return max(low, min(high, value))
 190
 191
 192def ease(value: float) -> float:
 193    value = clamp(value)
 194    return value * value * (3.0 - 2.0 * value)
 195
 196
 197def mix(a: tuple[int, int, int], b: tuple[int, int, int], amount: float) -> tuple[int, int, int]:
 198    amount = clamp(amount)
 199    return tuple(int(a[i] + (b[i] - a[i]) * amount) for i in range(3))
 200
 201
 202def logo_origin() -> tuple[int, int]:
 203    max_width = max(len(line) for line in LOGO)
 204    return (COLS - max_width) // 2, 8
 205
 206
 207def put_logo(grid: TextGrid, frame: int, reveal: float, stable: bool = False) -> None:
 208    left, top = logo_origin()
 209    for y, line in enumerate(LOGO):
 210        for x, char in enumerate(line):
 211            if char == " ":
 212                continue
 213            rng = random.Random(frame * 1009 + x * 97 + y * 53)
 214            if stable or rng.random() < reveal:
 215                shown = char
 216                color = HOT if stable or rng.random() > 0.18 else AMBER
 217                if stable and rng.random() < 0.015:
 218                    shown = rng.choice("*+#")
 219                    color = WHITE
 220                elif not stable and rng.random() > reveal + 0.18:
 221                    shown = rng.choice(GLITCH_CHARS)
 222                    color = MAGENTA
 223                grid.set(left + x, top + y, shown, color)
 224            elif rng.random() < 0.08 + 0.22 * reveal:
 225                grid.set(left + x, top + y, rng.choice(GLITCH_CHARS), mix(DIM, HOT, 0.35))
 226
 227
 228def put_collapsing_logo(grid: TextGrid, frame: int, progress: float) -> None:
 229    left, top = logo_origin()
 230    cursor_y = 24
 231    glyph_index = 0
 232    for y, line in enumerate(LOGO):
 233        for x, char in enumerate(line):
 234            if char == " ":
 235                continue
 236            rng = random.Random(frame * 1493 + x * 31 + y * 41)
 237            target_x = 8 + (glyph_index % 30)
 238            target_y = cursor_y + (glyph_index // 30) % 2
 239            wobble = math.sin(frame * 0.35 + glyph_index * 0.9) * (1.0 - progress) * 2.0
 240            px = round((left + x) * (1.0 - progress) + target_x * progress + wobble)
 241            py = round((top + y) * (1.0 - progress) + target_y * progress)
 242            shown = char if progress < 0.68 else rng.choice("nipux$>_-/")
 243            color = mix(HOT, MAIN, progress)
 244            if rng.random() < 0.08:
 245                shown = rng.choice(GLITCH_CHARS)
 246                color = MAGENTA
 247            grid.set(px, py, shown, color)
 248            glyph_index += 1
 249
 250
 251def put_progress_bar(grid: TextGrid, x: int, y: int, width: int, progress: float) -> None:
 252    progress = clamp(progress)
 253    filled = int(round(width * progress))
 254    bar = "[" + "#" * filled + "-" * (width - filled) + "]"
 255    grid.put(x, y, bar, MID)
 256    grid.put(x + 1, y, "#" * filled, HOT if progress > 0.88 else MAIN)
 257    grid.put(x + width + 4, y, f"{int(progress * 100):03d}%", AMBER if progress < 1.0 else HOT)
 258
 259
 260def put_rain(grid: TextGrid, frame: int, intensity: float) -> None:
 261    if intensity <= 0:
 262        return
 263    for x in range(2, COLS - 2):
 264        rng = random.Random(9001 + x * 113)
 265        stream_speed = 1 + rng.randint(0, 2)
 266        head = (frame * stream_speed + rng.randint(0, ROWS * 3)) % (ROWS + 16) - 8
 267        for trail in range(5):
 268            y = head - trail
 269            if 2 <= y < ROWS - 2 and random.Random(frame * 313 + x * 17 + trail).random() < intensity:
 270                char = random.Random(frame * 997 + x * 19 + trail * 11).choice(RAIN_CHARS)
 271                fade = max(0.18, 1.0 - trail * 0.18)
 272                color = mix((8, 28, 20), MID, fade * intensity)
 273                grid.set(x, y, char, color)
 274
 275
 276def put_boot_lines(grid: TextGrid, frame: int, t: float) -> None:
 277    lines = [
 278        "$ nipux video --ascii --into-cli",
 279        "[scan] terminal cells online",
 280        "[map ] routing logo glyphs",
 281        "[sync] prompt target locked",
 282    ]
 283    start_y = 22
 284    for i, line in enumerate(lines):
 285        reveal = int(clamp((t - 0.12 - i * 0.22) / 0.32) * len(line))
 286        color = MAIN if i == 0 else MID
 287        grid.put(7, start_y + i, line[:reveal], color)
 288    put_progress_bar(grid, 7, 28, 38, clamp(t / 1.12))
 289
 290
 291def put_cli(grid: TextGrid, frame: int, t: float) -> None:
 292    command = "$ nipux enter --render ascii"
 293    start = 4.78
 294    typed = int(clamp((t - start) / 1.08) * len(command))
 295    grid.put(7, 24, command[:typed], MAIN)
 296    cursor_x = 7 + typed
 297    if frame // 8 % 2 == 0 and typed < len(command):
 298        grid.set(cursor_x, 24, "_", HOT)
 299
 300    if t > 5.9:
 301        grid.put(7, 26, "[ok] word packed into cli prompt", MID)
 302    if t > 6.26:
 303        grid.put(7, 27, "[ok] ascii signal clean", MID)
 304    if t > 6.58:
 305        put_progress_bar(grid, 7, 29, 42, clamp((t - 6.55) / 0.62))
 306    if t > 7.18:
 307        final = "nipux> "
 308        grid.put(7, 31, final, HOT)
 309        if frame // 10 % 2 == 0:
 310            grid.set(7 + len(final), 31, "_", HOT)
 311
 312
 313def build_grid(frame: int, total_frames: int) -> TextGrid:
 314    t = frame / FPS
 315    grid = TextGrid()
 316    grid.box("nipux ascii cli capture", DIM)
 317    grid.put(COLS - 25, 0, " render:rawrgb->mp4 ", DIM)
 318    grid.put(4, 2, "MODE ASCII/CLI", MID)
 319    grid.put(COLS - 23, 2, f"FRAME {frame:03d}/{total_frames - 1:03d}", DIM)
 320
 321    rain_intensity = 0.36
 322    if t > 5.0:
 323        rain_intensity *= 0.35
 324    put_rain(grid, frame, rain_intensity)
 325
 326    if t < 1.18:
 327        put_boot_lines(grid, frame, t)
 328    elif t < 2.75:
 329        put_boot_lines(grid, frame, 1.18)
 330        reveal = ease((t - 1.18) / 1.42)
 331        put_logo(grid, frame, reveal)
 332        grid.center(16, "glyphs are snapping into nipux", DIM)
 333    elif t < 3.72:
 334        put_logo(grid, frame, 1.0, stable=True)
 335        grid.center(16, "nipux", WHITE if frame // 7 % 2 == 0 else HOT)
 336        grid.center(18, "pressing the word into a command line", DIM)
 337    elif t < 5.08:
 338        progress = ease((t - 3.72) / 1.36)
 339        put_collapsing_logo(grid, frame, progress)
 340        grid.put(7, 24, "$ ", MAIN)
 341        if progress > 0.45:
 342            partial = "nipux"[: int((progress - 0.45) / 0.55 * 5)]
 343            grid.put(9, 24, partial, HOT)
 344        grid.center(18, "collapsing ascii mass -> cli input", AMBER)
 345    else:
 346        put_cli(grid, frame, t)
 347
 348    if t > 4.8:
 349        grid.put(COLS - 27, 30, "STATUS: PROMPT CONTROL", DIM)
 350    elif t > 2.0:
 351        grid.put(COLS - 24, 30, "STATUS: GLYPH LOCK", DIM)
 352    else:
 353        grid.put(COLS - 23, 30, "STATUS: BOOT RAIL", DIM)
 354
 355    return grid
 356
 357
 358def draw_rect(buf: bytearray, x: int, y: int, w: int, h: int, color: tuple[int, int, int]) -> None:
 359    x0 = max(0, x)
 360    y0 = max(0, y)
 361    x1 = min(WIDTH, x + w)
 362    y1 = min(HEIGHT, y + h)
 363    if x0 >= x1 or y0 >= y1:
 364        return
 365    row = bytes(color) * (x1 - x0)
 366    for py in range(y0, y1):
 367        start = (py * WIDTH + x0) * 3
 368        buf[start : start + len(row)] = row
 369
 370
 371def build_base_frame() -> bytearray:
 372    buf = bytearray(WIDTH * HEIGHT * 3)
 373    for y in range(HEIGHT):
 374        color = BG_SCAN if y % 4 == 0 else BG
 375        row = bytes(color) * WIDTH
 376        start = y * WIDTH * 3
 377        buf[start : start + len(row)] = row
 378
 379    panel_x = ORIGIN_X - 28
 380    panel_y = ORIGIN_Y - 28
 381    panel_w = COLS * CELL_W + 56
 382    panel_h = ROWS * CELL_H + 56
 383    draw_rect(buf, panel_x - 8, panel_y - 8, panel_w + 16, panel_h + 16, PANEL_GLOW)
 384    draw_rect(buf, panel_x, panel_y, panel_w, panel_h, PANEL)
 385    draw_rect(buf, panel_x, panel_y, panel_w, 2, PANEL_EDGE)
 386    draw_rect(buf, panel_x, panel_y + panel_h - 2, panel_w, 2, PANEL_EDGE)
 387    draw_rect(buf, panel_x, panel_y, 2, panel_h, PANEL_EDGE)
 388    draw_rect(buf, panel_x + panel_w - 2, panel_y, 2, panel_h, PANEL_EDGE)
 389
 390    for y in range(panel_y + 36, panel_y + panel_h - 20, 44):
 391        draw_rect(buf, panel_x + 18, y, panel_w - 36, 1, (5, 24, 18))
 392    return buf
 393
 394
 395BASE_FRAME = build_base_frame()
 396
 397
 398def glyph_for(char: str) -> tuple[str, ...]:
 399    return GLYPHS.get(char, GLYPHS.get(char.upper(), GLYPHS["?"]))
 400
 401
 402GLYPHS["?"] = GLYPHS["?"] if "?" in GLYPHS else ("01110", "10001", "00010", "00100", "00100", "00000", "00100")
 403
 404
 405def draw_glyph(buf: bytearray, char: str, x: int, y: int, color: tuple[int, int, int], glow: bool) -> None:
 406    glyph = glyph_for(char)
 407    if glyph is GLYPHS[" "]:
 408        return
 409    if glow:
 410        glow_color = tuple(max(0, int(c * 0.16)) for c in color)
 411    for row_i, row in enumerate(glyph):
 412        for col_i, bit in enumerate(row):
 413            if bit != "1":
 414                continue
 415            px = x + col_i * SCALE
 416            py = y + row_i * SCALE
 417            if glow:
 418                draw_rect(buf, px - 1, py - 1, SCALE + 2, SCALE + 2, glow_color)
 419            draw_rect(buf, px, py, SCALE, SCALE, color)
 420
 421
 422def render_frame(frame: int, total_frames: int) -> bytes:
 423    grid = build_grid(frame, total_frames)
 424    buf = bytearray(BASE_FRAME)
 425    jitter = 1 if frame % 37 == 0 else 0
 426    for y, row in enumerate(grid.cells):
 427        for x, cell in enumerate(row):
 428            if cell.char == " ":
 429                continue
 430            px = ORIGIN_X + x * CELL_W + 1 + jitter
 431            py = ORIGIN_Y + y * CELL_H + 4
 432            bright = sum(cell.color) > 420
 433            draw_glyph(buf, cell.char, px, py, cell.color, bright)
 434
 435    # A light CRT sweep, sparse enough to keep text readable.
 436    sweep_y = int((frame * 9) % HEIGHT)
 437    draw_rect(buf, 0, sweep_y, WIDTH, 2, (4, 18, 16))
 438    return bytes(buf)
 439
 440
 441def render_video(output: Path, poster: Path | None) -> None:
 442    ffmpeg = shutil.which("ffmpeg")
 443    if not ffmpeg:
 444        raise SystemExit("ffmpeg was not found on PATH")
 445
 446    output.parent.mkdir(parents=True, exist_ok=True)
 447    total_frames = int(FPS * DURATION)
 448    cmd = [
 449        ffmpeg,
 450        "-hide_banner",
 451        "-loglevel",
 452        "error",
 453        "-y",
 454        "-f",
 455        "rawvideo",
 456        "-pix_fmt",
 457        "rgb24",
 458        "-s",
 459        f"{WIDTH}x{HEIGHT}",
 460        "-r",
 461        str(FPS),
 462        "-i",
 463        "-",
 464        "-an",
 465        "-c:v",
 466        "libx264",
 467        "-preset",
 468        "medium",
 469        "-crf",
 470        "18",
 471        "-pix_fmt",
 472        "yuv420p",
 473        "-movflags",
 474        "+faststart",
 475        str(output),
 476    ]
 477    process = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
 478    assert process.stdin is not None
 479    for frame in range(total_frames):
 480        process.stdin.write(render_frame(frame, total_frames))
 481        if frame % FPS == 0:
 482            print(f"rendered {frame // FPS:02d}s/{int(DURATION):02d}s")
 483    process.stdin.close()
 484    stderr = process.stderr.read().decode("utf-8", errors="replace") if process.stderr else ""
 485    return_code = process.wait()
 486    if return_code != 0:
 487        raise SystemExit(f"ffmpeg failed with exit code {return_code}\n{stderr}")
 488
 489    if poster:
 490        poster.parent.mkdir(parents=True, exist_ok=True)
 491        poster_cmd = [
 492            ffmpeg,
 493            "-hide_banner",
 494            "-loglevel",
 495            "error",
 496            "-y",
 497            "-ss",
 498            "00:00:03.05",
 499            "-i",
 500            str(output),
 501            "-frames:v",
 502            "1",
 503            str(poster),
 504        ]
 505        subprocess.run(poster_cmd, check=True)
 506
 507
 508def parse_args() -> argparse.Namespace:
 509    parser = argparse.ArgumentParser(description="Render the Nipux ASCII CLI intro video.")
 510    parser.add_argument("--output", type=Path, default=Path("docs/nipux_ascii_cli.mp4"))
 511    parser.add_argument("--poster", type=Path, default=Path("docs/nipux_ascii_cli_poster.png"))
 512    return parser.parse_args()
 513
 514
 515def main() -> None:
 516    args = parse_args()
 517    render_video(args.output, args.poster)
 518    print(f"video:  {args.output}")
 519    print(f"poster: {args.poster}")
 520
 521
 522if __name__ == "__main__":
 523    main()
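The glyph drawing in scripts/render_nipux_ascii_video.py above comes down to two steps: clip a rectangle to the frame, then stamp one scaled square per set bit of a 5x7 bitmap into a flat RGB24 buffer. The sketch below is a miniature of that idea, not the script itself. The 16x16 frame size and the names `fill_rect`, `draw_bitmap`, and `pixel` are assumptions made for illustration.

```python
# Miniature sketch of bitmap-glyph rasterisation into a raw RGB24 buffer.
# Frame size and helper names are illustrative assumptions, not the script's own.
WIDTH, HEIGHT, SCALE = 16, 16, 2

# 5x7 bitmap for "T": each string is one row, "1" marks a lit cell.
GLYPH_T = ("11111", "00100", "00100", "00100", "00100", "00100", "00100")


def fill_rect(buf: bytearray, x: int, y: int, w: int, h: int, color: tuple[int, int, int]) -> None:
    """Fill a rectangle, clipped to the frame, by copying one prebuilt row per scanline."""
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(WIDTH, x + w), min(HEIGHT, y + h)
    if x0 >= x1 or y0 >= y1:
        return  # fully outside the frame
    row = bytes(color) * (x1 - x0)
    for py in range(y0, y1):
        start = (py * WIDTH + x0) * 3  # 3 bytes per pixel, row-major
        buf[start:start + len(row)] = row


def draw_bitmap(buf: bytearray, glyph: tuple[str, ...], x: int, y: int, color: tuple[int, int, int]) -> None:
    """Stamp a SCALE x SCALE square for every set bit of the glyph."""
    for row_i, row in enumerate(glyph):
        for col_i, bit in enumerate(row):
            if bit == "1":
                fill_rect(buf, x + col_i * SCALE, y + row_i * SCALE, SCALE, SCALE, color)


def pixel(buf: bytearray, x: int, y: int) -> tuple[int, ...]:
    """Read back one pixel as an (r, g, b) tuple."""
    i = (y * WIDTH + x) * 3
    return tuple(buf[i:i + 3])


frame = bytearray(WIDTH * HEIGHT * 3)  # starts as all-black
draw_bitmap(frame, GLYPH_T, 1, 1, (126, 255, 164))
```

Building one `bytes` row and slice-assigning it per scanline keeps the inner loop out of per-pixel Python code, which matters when every frame is piped to an encoder.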
tests/nipux_cli/test_artifacts.py 34 lines
   1import pytest
   2
   3from nipux_cli.artifacts import ArtifactStore
   4from nipux_cli.db import AgentDB
   5
   6
   7def test_artifact_store_writes_reads_and_searches(tmp_path):
   8    db = AgentDB(tmp_path / "state.db")
   9    try:
  10        job_id = db.create_job("Collect findings")
  11        store = ArtifactStore(tmp_path, db=db)
  12
  13        stored = store.write_text(
  14            job_id=job_id,
  15            title="Findings list",
  16            summary="contains acme finding",
  17            content="Acme Corp\ncontact: founder@example.com\n",
  18        )
  19
  20        assert store.read_text(stored.id).startswith("Acme Corp")
  21        results = store.search_text(job_id=job_id, query="founder", limit=5)
  22        assert results[0]["id"] == stored.id
  23        assert "founder@example.com" in results[0]["excerpt"]
  24    finally:
  25        db.close()
  26
  27
  28def test_artifact_store_rejects_paths_outside_home(tmp_path):
  29    store = ArtifactStore(tmp_path)
  30    outside = tmp_path.parent / "outside.txt"
  31    outside.write_text("nope", encoding="utf-8")
  32
  33    with pytest.raises(ValueError):
  34        store.read_text(str(outside))
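The second test above expects `read_text` to raise `ValueError` for a path outside the store's home directory. One common way to enforce that kind of containment is sketched below. This is an assumption about the general technique, not `ArtifactStore`'s actual implementation, and `resolve_inside` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path


def resolve_inside(home: Path, candidate: str) -> Path:
    """Hypothetical helper: resolve candidate and reject anything outside home."""
    root = home.resolve()
    path = Path(candidate)
    if not path.is_absolute():
        path = root / path
    # resolve() collapses ".." segments and symlinks before the containment check.
    resolved = path.resolve()
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"path escapes home: {resolved}")
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    home.mkdir()
    outside = Path(tmp) / "outside.txt"
    outside.write_text("nope", encoding="utf-8")

    # A relative path under home is accepted.
    inside_ok = resolve_inside(home, "notes/a.txt").name == "a.txt"

    # An absolute path outside home and a ".." escape are both rejected.
    rejected = []
    for attempt in (str(outside), "../outside.txt"):
        try:
            resolve_inside(home, attempt)
        except ValueError:
            rejected.append(attempt)
```

Resolving both sides first is the important design choice here: comparing unresolved strings would let `..` segments or symlinks slip past the check.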
tests/nipux_cli/test_browser_web.py 118 lines
   1import json
   2
   3from nipux_cli.browser import _annotate_source_quality, _session_name, _socket_dir
   4from nipux_cli.tools import DEFAULT_REGISTRY, ToolContext
   5from nipux_cli.web import _strip_html
   6from nipux_cli.artifacts import ArtifactStore
   7from nipux_cli.config import AppConfig, RuntimeConfig
   8from nipux_cli.db import AgentDB
   9
  10
  11def test_session_name_is_stable_and_safe():
  12    assert _session_name("job_abc/def") == "nipux_job_abc_def"
  13
  14
  15def test_long_session_name_is_short_and_hashed():
  16    task_id = "research-a-very-long-objective-title-that-needs-a-short-browser-session-name"
  17    name = _session_name(task_id)
  18    socket_dir = _socket_dir(task_id)
  19
  20    assert name.startswith("nipux_research-a-very-long")
  21    assert len(name) <= 37
  22    assert len(str(socket_dir)) < 80
  23
  24
  25def test_strip_html_removes_scripts_and_keeps_text():
  26    text = _strip_html("<html><script>bad()</script><h1>Hello</h1><p>World</p></html>")
  27    assert "bad" not in text
  28    assert "Hello" in text
  29    assert "World" in text
  30
  31
  32def test_browser_marks_anti_bot_interstitial_as_warning():
  33    result = _annotate_source_quality({
  34        "success": True,
  35        "data": {"title": "Just a moment...", "url": "https://clutch.co/example"},
  36        "snapshot": "Performing security verification. Cloudflare security challenge.",
  37    })
  38
  39    assert result["success"] is True
  40    assert "error" not in result
  41    assert result["source_warning"] == "cloudflare anti-bot challenge"
  42    assert result["warnings"][0]["type"] == "anti_bot"
  43
  44
  45def test_browser_marks_captcha_block_as_warning():
  46    result = _annotate_source_quality({
  47        "success": True,
  48        "data": {"title": "Source search", "url": "https://source.example/search"},
  49        "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
  50    })
  51
  52    assert result["source_warning"] == "captcha/anti-bot block"
  53    assert result["warnings"][0]["type"] == "anti_bot"
  54
  55
  56def test_web_extract_marks_anti_bot_pages_as_warning(monkeypatch):
  57    from nipux_cli import web
  58
  59    def fake_request(url):
  60        del url
  61        return "<h1>Performing security verification</h1><p>Cloudflare security challenge</p>", "text/html"
  62
  63    monkeypatch.setattr(web, "_request", fake_request)
  64    result = web.web_extract(["https://clutch.co/example"])
  65
  66    page = result["pages"][0]
  67    assert "error" not in page
  68    assert page["source_warning"] == "cloudflare anti-bot challenge"
  69    assert page["warnings"][0]["type"] == "anti_bot"
  70    assert "Cloudflare security challenge" in page["text"]
  71
  72
  73def test_browser_tool_uses_native_wrapper(monkeypatch, tmp_path):
  74    from nipux_cli import browser
  75
  76    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  77    db = AgentDB(tmp_path / "state.db")
  78    try:
  79        job_id = db.create_job("Browse")
  80        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id)
  81
  82        def fake_navigate(cfg, *, task_id, url):
  83            return {"success": True, "task_id": task_id, "url": url}
  84
  85        monkeypatch.setattr(browser, "navigate", fake_navigate)
  86        result = json.loads(DEFAULT_REGISTRY.handle("browser_navigate", {"url": "https://example.com"}, ctx))
  87
  88        assert result == {"success": True, "task_id": job_id, "url": "https://example.com"}
  89    finally:
  90        db.close()
  91
  92
  93def test_browser_click_adds_recovery_snapshot_for_stale_ref(monkeypatch, tmp_path):
  94    from nipux_cli import browser
  95
  96    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  97    calls = []
  98
  99    def fake_command(cfg, *, task_id, command, args=None, timeout=60):
 100        del cfg, task_id, args, timeout
 101        calls.append(command)
 102        if command == "click":
 103            return {"success": False, "error": "Unknown ref: e102"}
 104        return {
 105            "success": True,
 106            "data": {
 107                "snapshot": "Directory",
 108                "refs": {"e1": {"role": "link", "name": "New Result"}},
 109            },
 110        }
 111
 112    monkeypatch.setattr(browser, "run_browser_command", fake_command)
 113    result = browser.click(config, task_id="job_abc", ref="@e102")
 114
 115    assert calls == ["click", "snapshot"]
 116    assert result["success"] is False
 117    assert result["recovery_snapshot"]["data"]["refs"]["e1"]["name"] == "New Result"
 118    assert "stale" in result["recovery_guidance"]
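The two `_session_name` tests above pin down a contract: unsafe characters become underscores, short identifiers pass through with a `nipux_` prefix, and long ones are truncated to at most 37 characters while staying unique. A minimal sketch that satisfies those assertions follows. The SHA-1 suffix, the 8-character digest length, and the function name `session_name` are assumptions, and the real `nipux_cli.browser._session_name` may work differently.

```python
import hashlib
import re

PREFIX = "nipux_"
MAX_LEN = 37      # upper bound asserted by the test above
DIGEST_LEN = 8    # assumed suffix length, chosen for illustration


def session_name(task_id: str) -> str:
    """Hypothetical sketch: safe, bounded, deterministic session name."""
    # Replace anything outside [A-Za-z0-9_-] so the name is safe in paths and sockets.
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", task_id)
    name = PREFIX + safe
    if len(name) <= MAX_LEN:
        return name
    # Too long: keep a readable head and append a short digest of the full id,
    # so two long ids sharing a prefix still map to different names.
    digest = hashlib.sha1(task_id.encode("utf-8")).hexdigest()[:DIGEST_LEN]
    keep = MAX_LEN - len(PREFIX) - 1 - DIGEST_LEN
    return f"{PREFIX}{safe[:keep]}_{digest}"


short = session_name("job_abc/def")
long_id = "research-a-very-long-objective-title-that-needs-a-short-browser-session-name"
long_name = session_name(long_id)
```

Hashing the original identifier rather than the truncated one keeps the mapping stable across runs and avoids collisions between ids that differ only past the cut-off point.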
tests/nipux_cli/test_cli.py 4970 lines
   1import json
   2import queue
   3import subprocess
   4import time
   5from pathlib import Path
   6
   7from nipux_cli.artifacts import ArtifactStore
   8from nipux_cli import __version__
   9from nipux_cli.chat_frame_runtime import ChatFrameDeps as _ChatFrameDeps
  10from nipux_cli.chat_frame_runtime import THINKING_NOTICE as _THINKING_NOTICE
  11from nipux_cli.chat_frame_runtime import WAITING_NOTICE as _WAITING_NOTICE
  12from nipux_cli.chat_frame_runtime import _display_notices as _display_chat_notices
  13from nipux_cli.chat_frame_runtime import _drain_async_notices as _drain_chat_async_notices
  14from nipux_cli.chat_frame_runtime import _handle_edit_input as _handle_chat_edit_input
  15from nipux_cli.chat_frame_runtime import _handle_chat_submit
  16from nipux_cli.chat_frame_runtime import _safe_render_frame as _safe_chat_render_frame
  17from nipux_cli.chat_frame_runtime import frame_next_job_id as _frame_next_job_id
  18from nipux_cli.chat_frame_runtime import frame_refresh_interval as _frame_refresh_interval
  19from nipux_cli.cli import (
  20    _build_first_run_frame,
  21    _build_chat_frame,
  22    _build_chat_messages,
  23    _chat_handle_line,
  24    _chat_control_command,
  25    _capture_chat_command,
  26    _config_field_value,
  27    _emit_frame_if_changed,
  28    _first_run_click_action,
  29    _handle_first_run_action,
  30    _handle_first_run_menu_line,
  31    _handle_first_run_frame_line,
  32    _handle_chat_message,
  33    _handle_workspace_chat_message,
  34    _is_plain_chat_line,
  35    _launch_agent_plist,
  36    _load_frame_snapshot,
  37    _minimal_live_event_line,
  38    _print_shell_help,
  39    _run_shell_line,
  40    _save_config_field,
  41    _slash_suggestion_lines,
  42    _systemd_service_text,
  43    _verify_model_setup_from_first_run,
  44    _workspace_chat_job_dossier,
  45    build_parser,
  46    main,
  47)
  48from nipux_cli.config import load_config
  49from nipux_cli.cli_state import mark_model_setup_verified as _mark_model_setup_verified
  50from nipux_cli.cli_state import model_setup_verified as _model_setup_verified
  51from nipux_cli.cli_state import read_shell_state as _read_shell_state
  52from nipux_cli.cli_state import write_shell_state as _write_shell_state
  53from nipux_cli.daemon import append_daemon_event
  54from nipux_cli.db import AgentDB
  55from nipux_cli.doctor import Check
  56from nipux_cli.llm import LLMResponse
  57from nipux_cli.settings import inline_setting_notice as _inline_setting_notice
  58from nipux_cli.first_run_frame_runtime import FirstRunRuntimeDeps as _FirstRunRuntimeDeps
  59from nipux_cli.first_run_frame_runtime import _handle_edit_input as _handle_first_run_edit_input
  60from nipux_cli.first_run_frame_runtime import _safe_render_frame as _safe_first_run_render_frame
  61from nipux_cli.first_run_frame_runtime import _submit_first_run_line as _submit_first_run_line
  62from nipux_cli.first_run_frame_runtime import directional_first_run_action as _directional_first_run_action
  63from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID
  64from nipux_cli.tui_commands import (
  65    CHAT_SLASH_COMMANDS,
  66    FIRST_RUN_SLASH_COMMANDS,
  67    autocomplete_slash as _autocomplete_slash,
  68    cycle_slash as _cycle_slash,
  69    slash_completion_for_submit as _slash_completion_for_submit,
  70)
  71from nipux_cli.tui_events import chat_pane_lines
  72from nipux_cli.tui_input import decode_terminal_escape as _decode_terminal_escape
  73from nipux_cli.tui_outcomes import hourly_update_lines, recent_model_update_lines
  74from nipux_cli.updater import update_checkout as _update_checkout
  75
  76
  77def _mode(path):
  78    return path.stat().st_mode & 0o777
  79
  80
  81def _mark_test_model_ready() -> None:
  82    load_config().ensure_dirs()
  83    _mark_model_setup_verified(load_config())
  84
  85
  86def test_cli_has_operator_commands():
  87    parser = build_parser()
  88
  89    assert parser.parse_args(["shell", "--status"]).func.__name__ == "cmd_shell"
  90    assert parser.parse_args(["status", "--full"]).func.__name__ == "cmd_status"
  91    assert parser.parse_args(["health"]).func.__name__ == "cmd_health"
  92    assert parser.parse_args(["history"]).func.__name__ == "cmd_history"
  93    assert parser.parse_args(["events", "--follow"]).func.__name__ == "cmd_events"
  94    assert parser.parse_args(["activity", "--follow"]).func.__name__ == "cmd_activity"
  95    assert parser.parse_args(["feed"]).func.__name__ == "cmd_activity"
  96    assert parser.parse_args(["update"]).func.__name__ == "cmd_update"
  97    assert parser.parse_args(["update", "--no-restart"]).no_restart is True
  98    assert parser.parse_args(["uninstall", "--dry-run"]).func.__name__ == "cmd_uninstall"
  99    assert parser.parse_args(["uninstall", "--keep-tool"]).keep_tool is True
 100    assert parser.parse_args(["new", "Research topic"]).func.__name__ == "cmd_create"
 101    assert parser.parse_args(["updates"]).func.__name__ == "cmd_updates"
 102    assert parser.parse_args(["outcomes"]).func.__name__ == "cmd_updates"
 103    assert parser.parse_args(["outcomes", "--all"]).all is True
 104    assert parser.parse_args(["steer", "focus", "sources"]).func.__name__ == "cmd_steer"
 105    assert parser.parse_args(["say", "focus", "sources"]).func.__name__ == "cmd_steer"
 106    assert parser.parse_args(["pause"]).func.__name__ == "cmd_pause"
 107    assert parser.parse_args(["resume"]).func.__name__ == "cmd_resume"
 108    assert parser.parse_args(["resume", "research", "finder"]).job_id == ["research", "finder"]
 109    assert parser.parse_args(["cancel"]).func.__name__ == "cmd_cancel"
 110    assert parser.parse_args(["dashboard", "--no-follow"]).func.__name__ == "cmd_dashboard"
 111    assert parser.parse_args(["dash", "--no-follow"]).func.__name__ == "cmd_dashboard"
 112    assert parser.parse_args(["focus", "research"]).func.__name__ == "cmd_focus"
 113    assert parser.parse_args(["rename", "research", "--title", "new research"]).func.__name__ == "cmd_rename"
 114    assert parser.parse_args(["delete", "research"]).func.__name__ == "cmd_delete"
 115    assert parser.parse_args(["rm", "research"]).func.__name__ == "cmd_delete"
 116    assert parser.parse_args(["chat", "research", "finder"]).func.__name__ == "cmd_chat"
 117    assert parser.parse_args(["start", "--poll-seconds", "1"]).func.__name__ == "cmd_start"
 118    assert parser.parse_args(["stop"]).func.__name__ == "cmd_stop"
 119    assert parser.parse_args(["restart"]).func.__name__ == "cmd_restart"
 120    assert parser.parse_args(["stop", "research", "finder"]).func.__name__ == "cmd_stop"
 121    assert parser.parse_args(["stop", "research", "finder"]).job_id == ["research", "finder"]
 122    assert parser.parse_args(["ls"]).func.__name__ == "cmd_jobs"
 123    assert parser.parse_args(["autostart", "status"]).func.__name__ == "cmd_autostart"
 124    assert parser.parse_args(["browser-dashboard", "--port", "4848"]).func.__name__ == "cmd_browser_dashboard"
 125    assert parser.parse_args(["artifacts"]).func.__name__ == "cmd_artifacts"
 126    assert parser.parse_args(["artifact", "art_123"]).func.__name__ == "cmd_artifact"
 127    assert parser.parse_args(["artifact", "Findings", "Batch"]).func.__name__ == "cmd_artifact"
 128    assert parser.parse_args(["lessons"]).func.__name__ == "cmd_lessons"
 129    assert parser.parse_args(["learn", "low-evidence", "pages", "are", "bad"]).func.__name__ == "cmd_learn"
 130    assert parser.parse_args(["findings"]).func.__name__ == "cmd_findings"
 131    assert parser.parse_args(["tasks"]).func.__name__ == "cmd_tasks"
 132    assert parser.parse_args(["roadmap"]).func.__name__ == "cmd_roadmap"
 133    assert parser.parse_args(["experiments"]).func.__name__ == "cmd_experiments"
 134    assert parser.parse_args(["sources"]).func.__name__ == "cmd_sources"
 135    assert parser.parse_args(["memory"]).func.__name__ == "cmd_memory"
 136    assert parser.parse_args(["memory", "--graph"]).graph is True
 137    assert parser.parse_args(["metrics"]).func.__name__ == "cmd_metrics"
 138    assert parser.parse_args(["usage"]).func.__name__ == "cmd_usage"
 139    assert parser.parse_args(["outputs", "research", "finder"]).func.__name__ == "cmd_logs"
 140    assert parser.parse_args(["outputs"]).func.__name__ == "cmd_logs"
 141    assert parser.parse_args(["service", "status"]).func.__name__ == "cmd_service"
 142    assert parser.parse_args(["work", "--steps", "2", "--fake"]).func.__name__ == "cmd_work"
 143    assert parser.parse_args(["run", "--no-follow"]).func.__name__ == "cmd_run"
 144
 145
 146def test_cli_version_flag(capsys):
 147    try:
 148        main(["--version"])
 149    except SystemExit as exc:
 150        assert exc.code == 0
 151
 152    assert f"nipux {__version__}" in capsys.readouterr().out
 153
 154
 155def test_main_catches_keyboard_interrupt_without_traceback(monkeypatch, capsys):
 156    def interrupt(_args):
 157        raise KeyboardInterrupt
 158
 159    monkeypatch.setattr("nipux_cli.cli.cmd_home", interrupt)
 160
 161    main([])
 162
 163    assert capsys.readouterr().err == ""
 164
 165
 166def test_python_module_entrypoint_uses_cli_main():
 167    import nipux_cli.__main__ as module_entrypoint
 168
 169    assert module_entrypoint.main is main
 170
 171
 172def test_init_openrouter_writes_secret_free_config_and_env_template(monkeypatch, tmp_path, capsys):
 173    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 174
 175    main(["init", "--openrouter", "--model", "provider/model"])
 176
 177    out = capsys.readouterr().out
 178    config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
 179    env_text = (tmp_path / ".env").read_text(encoding="utf-8")
 180    assert "Wrote" in out
 181    assert "name: provider/model" in config_text
 182    assert "base_url: https://openrouter.ai/api/v1" in config_text
 183    assert "api_key_env: OPENROUTER_API_KEY" in config_text
 184    assert "sk-" not in config_text
 185    assert env_text.strip().endswith("OPENROUTER_API_KEY" + "=")
 186    assert "sk-" not in env_text
 187    assert _mode(tmp_path / "config.yaml") == 0o600
 188    assert _mode(tmp_path / ".env") == 0o600
 189
 190
 191def test_init_defaults_to_local_endpoint(monkeypatch, tmp_path):
 192    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 193
 194    main(["init"])
 195
 196    config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
 197    env_text = (tmp_path / ".env").read_text(encoding="utf-8")
 198    assert "name: local-model" in config_text
 199    assert "base_url: http://localhost:8000/v1" in config_text
 200    assert "api_key_env: OPENAI_API_KEY" in config_text
 201    assert env_text.strip().endswith("OPENAI_API_KEY" + "=")
 202
 203
 204def test_init_openrouter_defaults_to_generic_route(monkeypatch, tmp_path):
 205    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 206
 207    main(["init", "--openrouter"])
 208
 209    config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
 210    assert "name: openrouter/auto" in config_text
 211    assert "base_url: https://openrouter.ai/api/v1" in config_text
 212    assert "api_key_env: OPENROUTER_API_KEY" in config_text
 213
 214
 215def test_shell_freeform_text_adds_operator_message(monkeypatch, tmp_path, capsys):
 216    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 217    db = AgentDB(tmp_path / "state.db")
 218    try:
 219        job_id = db.create_job("Research topic", title="research")
 220    finally:
 221        db.close()
 222
 223    assert _run_shell_line("focus on real evidence sources, not irrelevant sources") is True
 224
 225    out = capsys.readouterr().out
 226    db = AgentDB(tmp_path / "state.db")
 227    try:
 228        job = db.get_job(job_id)
 229        assert "waiting for research" in out
 230        assert (
 231            job["metadata"]["operator_messages"][-1]["message"]
 232            == "focus on real evidence sources, not irrelevant sources"
 233        )
 234    finally:
 235        db.close()
 236
 237
 238def test_main_no_args_enters_chat_first_home(monkeypatch, tmp_path, capsys):
 239    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 240    _mark_test_model_ready()
 241    db = AgentDB(tmp_path / "state.db")
 242    try:
 243        job_id = db.create_job("Research topic", title="research")
 244        db.append_operator_message(job_id, "remember this visible note", source="test")
 245        db.append_agent_update(job_id, "visible agent update", category="chat")
 246    finally:
 247        db.close()
 248
 249    def eof_input(_prompt):
 250        raise EOFError
 251
 252    monkeypatch.setattr("builtins.input", eof_input)
 253
 254    main([])
 255
 256    out = capsys.readouterr().out
 257    assert "WORKSPACE" in out
 258    assert "Jobs" in out
 259    assert "research" in out
 260
 261
 262def test_main_no_args_with_no_jobs_requires_setup_frame(monkeypatch, tmp_path, capsys):
 263    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 264
 265    def eof_input(_prompt):
 266        raise EOFError
 267
 268    monkeypatch.setattr("builtins.input", eof_input)
 269
 270    main([])
 271
 272    out = capsys.readouterr().out
 273    assert "Nipux setup requires an interactive terminal." in out
 274    assert "choose model, endpoint, and tool access" in out
 275    assert "first job" not in out
 276    assert "new       create a long-running job" not in out
 277    assert "doctor    check local setup" not in out
 278    assert "_   _" not in out
 279    assert "nipux menu >" not in out
 280
 281
 282def test_main_no_args_with_old_setup_marker_still_requires_model_verification(monkeypatch, tmp_path, capsys):
 283    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 284    load_config().ensure_dirs()
 285    from nipux_cli.cli_state import write_shell_state as write_shell_state
 286
 287    write_shell_state({"setup_completed": True})
 288
 289    main([])
 290
 291    out = capsys.readouterr().out
 292    assert "Nipux setup requires an interactive terminal." in out
 293    assert "No jobs are saved in this profile." not in out
 294
 295
 296def test_main_no_args_after_setup_complete_does_not_reopen_setup(monkeypatch, tmp_path, capsys):
 297    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 298    _mark_test_model_ready()
 299
 300    main([])
 301
 302    out = capsys.readouterr().out
 303    assert "No jobs are saved in this profile." in out
 304    assert "Nipux setup requires" not in out
 305    assert "Begin setup" not in out
 306
 307
 308def test_main_no_args_autoverifies_existing_model_config(monkeypatch, tmp_path, capsys):
 309    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 310    monkeypatch.setenv("TEST_MODEL_KEY", "working-key")
 311    tmp_path.mkdir(parents=True, exist_ok=True)
 312    (tmp_path / "config.yaml").write_text(
 313        "model:\n"
 314        "  name: provider/model\n"
 315        "  base_url: https://provider.example/v1\n"
 316        "  api_key_env: TEST_MODEL_KEY\n",
 317        encoding="utf-8",
 318    )
 319
 320    def fake_doctor(*, config, check_model):
 321        assert check_model is True
 322        assert config.model.model == "provider/model"
 323        return [
 324            Check("state_dir_writable", True, "ok"),
 325            Check("sqlite", True, "ok"),
 326            Check("model_config", True, "ok"),
 327            Check("model_endpoint", True, "ok"),
 328            Check("model_generation", True, "ok"),
 329        ]
 330
 331    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
 332
 333    main([])
 334
 335    out = capsys.readouterr().out
 336    assert "No jobs are saved in this profile." in out
 337    assert "Model setup is not verified." not in out
 338    assert _model_setup_verified(load_config())
 339
 340
 341def test_main_no_args_enters_setup_when_existing_model_config_fails(monkeypatch, tmp_path, capsys):
 342    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 343    tmp_path.mkdir(parents=True, exist_ok=True)
 344    (tmp_path / "config.yaml").write_text(
 345        "model:\n"
 346        "  name: provider/model\n"
 347        "  base_url: https://provider.example/v1\n"
 348        "  api_key_env: TEST_MODEL_KEY\n",
 349        encoding="utf-8",
 350    )
 351
 352    def fake_doctor(*, config, check_model):
 353        assert check_model is True
 354        return [Check("model_generation", False, "provider rejected request")]
 355
 356    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
 357
 358    main([])
 359
 360    out = capsys.readouterr().out
 361    assert "Nipux setup requires an interactive terminal." in out
 362    assert "No jobs are saved in this profile." not in out
 363    assert not _model_setup_verified(load_config())
 364
 365
 366def test_main_no_args_keeps_workspace_locked_after_completed_setup_if_provider_fails(monkeypatch, tmp_path, capsys):
 367    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 368    tmp_path.mkdir(parents=True, exist_ok=True)
 369    (tmp_path / "config.yaml").write_text(
 370        "model:\n"
 371        "  name: provider/model\n"
 372        "  base_url: https://provider.example/v1\n"
 373        "  api_key_env: TEST_MODEL_KEY\n",
 374        encoding="utf-8",
 375    )
 376    _write_shell_state({"setup_completed": True})
 377
 378    def fake_doctor(*, config, check_model):
 379        assert check_model is True
 380        return [Check("model_generation", False, "provider rejected request")]
 381
 382    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
 383
 384    main([])
 385
 386    out = capsys.readouterr().out
 387    assert "Nipux setup requires an interactive terminal." in out
 388    assert "No jobs are saved in this profile." not in out
 389    assert not _model_setup_verified(load_config())
 390
 391
 392def test_first_run_refuses_job_before_model_is_verified(monkeypatch, tmp_path):
 393    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 394
 395    result = _handle_first_run_frame_line("new Build a durable workflow")
 396
 397    assert result[0] == "notice"
 398    assert "Finish setup first" in result[1]
 399    create_result = _handle_first_run_frame_line('create "Build a durable workflow"')
 400    assert create_result[0] == "notice"
 401    assert "Finish setup first" in create_result[1]
 402    jobs_result = _handle_first_run_frame_line("jobs")
 403    assert jobs_result[0] == "notice"
 404    assert "Jobs are available after Doctor verifies" in jobs_result[1]
 405    db = AgentDB(tmp_path / "state.db")
 406    try:
 407        assert db.list_jobs() == []
 408    finally:
 409        db.close()
 410
 411
 412def test_doctor_check_model_marks_model_setup_verified(monkeypatch, tmp_path, capsys):
 413    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 414
 415    def fake_doctor(*, config, check_model):
 416        assert check_model is True
 417        return [
 418            Check("state_dir_writable", True, "ok"),
 419            Check("sqlite", True, "ok"),
 420            Check("model_config", True, "ok"),
 421            Check("model_endpoint", True, "ok"),
 422            Check("model_generation", True, "ok"),
 423        ]
 424
 425    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
 426    args = build_parser().parse_args(["doctor", "--check-model"])
 427    args.func(args)
 428
 429    out = capsys.readouterr().out
 430    assert "model_setup\tverified" in out
 431    assert _model_setup_verified(load_config())
 432
 433
 434def test_first_run_doctor_failure_shows_inline_fix_commands(monkeypatch, tmp_path):
 435    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 436
 437    def fake_doctor(*, config, check_model):
 438        assert check_model is True
 439        return [Check("model_endpoint", False, "connection refused")]
 440
 441    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
 442
 443    lines = _verify_model_setup_from_first_run()
 444    rendered = "\n".join(lines)
 445
 446    assert "Model setup is not ready" in rendered
 447    assert "/base-url URL" in rendered
 448    assert "/api-key KEY" in rendered
 449    assert "/model MODEL" in rendered
 450    assert "local server" in rendered
 451
 452
 453def test_setting_change_clears_model_setup_verification(monkeypatch, tmp_path):
 454    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 455    _mark_test_model_ready()
 456
 457    assert _model_setup_verified(load_config())
 458    _inline_setting_notice("model.name", "provider/other-model")
 459
 460    assert not _model_setup_verified(load_config())
 461
 462
 463def test_first_run_menu_blocks_job_creation_until_workspace_chat(monkeypatch, tmp_path, capsys):
 464    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 465    _mark_test_model_ready()
 466
 467    assert _handle_first_run_menu_line("new Build a durable workflow") is True
 468
 469    out = capsys.readouterr().out
 470    db = AgentDB(tmp_path / "state.db")
 471    try:
 472        assert db.list_jobs() == []
 473        assert "Finish setup first" in out
 474    finally:
 475        db.close()
 476
 477
 478def test_first_run_plain_greeting_does_not_create_job(monkeypatch, tmp_path, capsys):
 479    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 480
 481    assert _handle_first_run_menu_line("Hello") is True
 482
 483    out = capsys.readouterr().out
 484    db = AgentDB(tmp_path / "state.db")
 485    try:
 486        assert db.list_jobs() == []
 487    finally:
 488        db.close()
 489    assert "Setup must be completed" in out
 490
 491
 492def test_first_run_frame_uses_full_screen_ui_not_banner(monkeypatch, tmp_path):
 493    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 494
 495    frame = _build_first_run_frame("", [], width=100, height=24)
 496    lines = frame.splitlines()
 497
 498    assert "workspace" not in lines[0].lower()
 499    assert "Endpoint" in lines[0]
 500    assert "Enter the endpoint first" in frame
 501    assert "Begin setup" not in frame
 502    assert "Long-running work, installed in-session." not in frame
 503    assert "Required: type an OpenAI-compatible endpoint URL" in frame
 504    assert "controls on the right" not in frame
 505    assert "Control" not in frame
 506    assert "SETUP" not in frame
 507    assert "│ SETUP" not in frame
 508    assert "daemon stopped" not in frame
 509    assert "FIRST RUN" not in frame
 510    assert "nipux menu >" not in frame
 511    assert "/shell" not in frame
 512
 513
 514def test_first_run_frame_hides_command_popup_during_setup(monkeypatch, tmp_path):
 515    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 516
 517    frame = _build_first_run_frame("/", [], width=100, height=26)
 518
 519    assert "commands" not in frame
 520    assert "/new" not in frame
 521    assert "/jobs" not in frame
 522    assert "/model" not in frame
 523    assert "/settings" not in frame
 524    assert "/shell" not in frame
 525    assert "Enter the endpoint first" in frame
 526
 527
 528def test_first_run_frame_walks_setup_screens(monkeypatch, tmp_path):
 529    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 530
 531    model = _build_first_run_frame("", [], width=100, height=26, view="model", selected=0)
 532    endpoint = _build_first_run_frame("", [], width=100, height=26, view="endpoint", selected=0)
 533    api = _build_first_run_frame("", [], width=100, height=26, view="api", selected=0)
 534    access = _build_first_run_frame("", [], width=100, height=26, view="access", selected=0)
 535    doctor = _build_first_run_frame("", [], width=100, height=28, view="doctor", selected=0)
 536    invalid = _build_first_run_frame("", [], width=100, height=26, view="settings", selected=1)
 537
 538    assert "Enter the model id" in model
 539    assert "Blank input is not accepted" in model
 540    assert "Enter the endpoint first" in endpoint
 541    assert "BASE URL" in endpoint
 542    assert "Enter the API key" in api
 543    assert "type skip" in api
 544    assert "Choose tool access" in access
 545    assert "Browser" in access
 546    assert "CLI" in access
 547    assert "Run checks" in doctor
 548    assert "/base-url" in doctor
 549    assert "/api-key" in doctor
 550    assert "/model" in doctor
 551    assert "Enter the model id" not in endpoint
 552    assert "Enter the endpoint first" not in api
 553    assert "Enter the endpoint first" in invalid
 554    assert "/shell" not in model
 555
 556
 557def test_first_run_frame_does_not_use_command_palette_for_setup(monkeypatch, tmp_path):
 558    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 559
 560    frame = _build_first_run_frame("/model", [], width=100, height=26)
 561
 562    assert "/model" in frame
 563    assert "set model" not in frame
 564    assert "Settings" not in frame
 565    assert "Enter the endpoint first" in frame
 566
 567
 568def test_settings_editor_persists_model_config(monkeypatch, tmp_path):
 569    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 570
 571    assert _save_config_field("model.name", "demo/model") == "demo/model"
 572    assert _save_config_field("model.context_length", "4096") == 4096
 573    assert _save_config_field("runtime.daily_digest_enabled", "false") is False
 574
 575    assert _config_field_value("model.name") == "demo/model"
 576    assert _config_field_value("model.context_length") == 4096
 577    assert _config_field_value("runtime.daily_digest_enabled") is False
 578    text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
 579    assert "demo/model" in text
 580    assert _inline_setting_notice("model.name", "") == "kept model.name"
 581
 582
 583def test_slash_autocomplete_filters_commands():
 584    assert _autocomplete_slash("/do", FIRST_RUN_SLASH_COMMANDS) == "/doctor "
 585    assert _autocomplete_slash("/mo", FIRST_RUN_SLASH_COMMANDS) == "/model "
 586    assert _autocomplete_slash("/sta", CHAT_SLASH_COMMANDS) == "/status "
 587    assert _autocomplete_slash("/rest", CHAT_SLASH_COMMANDS) == "/restart "
 588    assert _autocomplete_slash("/step", CHAT_SLASH_COMMANDS) == "/step-limit "
 589    assert _autocomplete_slash("/out", FIRST_RUN_SLASH_COMMANDS) == "/output-chars "
 590    assert _cycle_slash("/", CHAT_SLASH_COMMANDS, direction=1) == "/new"
 591    assert _cycle_slash("/", CHAT_SLASH_COMMANDS, direction=-1) == "/exit"
 592    assert _cycle_slash("/work ", CHAT_SLASH_COMMANDS, direction=1) == "/work "
 593    assert _cycle_slash("/run", CHAT_SLASH_COMMANDS, direction=1) == "/jobs"
 594    assert _cycle_slash("/", FIRST_RUN_SLASH_COMMANDS, direction=1) == "/model"
 595    assert _cycle_slash("/", FIRST_RUN_SLASH_COMMANDS, direction=-1) == "/exit"
 596    assert _cycle_slash("/model", FIRST_RUN_SLASH_COMMANDS, direction=1) == "/base-url"
 597    assert _cycle_slash("/model", FIRST_RUN_SLASH_COMMANDS, direction=-1) == "/exit"
 598    assert _cycle_slash("/out", CHAT_SLASH_COMMANDS, direction=1) == "/outcomes"
 599    assert _cycle_slash("/out", CHAT_SLASH_COMMANDS, direction=-1) == "/output-cost"
 600    assert _slash_completion_for_submit("/", CHAT_SLASH_COMMANDS) == ("/new ", False)
 601    assert _slash_completion_for_submit("/", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
 602    assert _slash_completion_for_submit("/mo", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
 603    assert _slash_completion_for_submit("/model", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
 604    assert _slash_completion_for_submit("/new ", CHAT_SLASH_COMMANDS) == ("/new ", False)
 605    assert _slash_completion_for_submit("/new research agents", CHAT_SLASH_COMMANDS) == ("/new research agents", True)
 606    assert _slash_completion_for_submit("/model ", CHAT_SLASH_COMMANDS) == ("/model ", True)
 607    assert _slash_completion_for_submit("/j", CHAT_SLASH_COMMANDS) == ("/jobs", True)
 608    assert _slash_completion_for_submit("/set", CHAT_SLASH_COMMANDS) == ("/settings", True)
 609    assert _slash_completion_for_submit("/model", CHAT_SLASH_COMMANDS) == ("/model", True)
 610    assert _slash_completion_for_submit("/mo", CHAT_SLASH_COMMANDS) == ("/model", True)
 611    assert _slash_completion_for_submit("/run", CHAT_SLASH_COMMANDS) == ("/run", True)
 612    assert _slash_completion_for_submit("/settings", CHAT_SLASH_COMMANDS) == ("/settings", True)
 613    assert _slash_completion_for_submit("/model demo/model", CHAT_SLASH_COMMANDS) == ("/model demo/model", True)
 614    assert _autocomplete_slash("plain text", CHAT_SLASH_COMMANDS) == "plain text"
 615    lines = _slash_suggestion_lines("/art", CHAT_SLASH_COMMANDS, width=80)
 616    text = "\n".join(lines)
 617    assert "/artifacts" in text
 618    assert "/artifact" in text
 619    assert "/run" not in text
 620    hint_text = "\n".join(_slash_suggestion_lines("/model ", CHAT_SLASH_COMMANDS, width=80))
 621    assert "/model" in hint_text
 622    assert "MODEL" in hint_text
 623    partial_hint_text = "\n".join(_slash_suggestion_lines("/mo", CHAT_SLASH_COMMANDS, width=80))
 624    assert "/model MODEL" in partial_hint_text
 625    assert "↑↓ moves" in partial_hint_text
 626    full_palette_text = "\n".join(_slash_suggestion_lines("/", CHAT_SLASH_COMMANDS, width=80, limit=5))
 627    assert "enter selects" in full_palette_text
 628    assert "/new OBJECTIVE" in full_palette_text
 629    assert "/run" in full_palette_text
 630    assert "/settings" in full_palette_text
 631    assert "type OBJECTIVE" in "\n".join(_slash_suggestion_lines("/new ", CHAT_SLASH_COMMANDS, width=80))
 632    assert "/shell" not in "\n".join(_slash_suggestion_lines("/", CHAT_SLASH_COMMANDS, width=80, limit=20))
 633    assert "/restart" in "\n".join(_slash_suggestion_lines("/re", CHAT_SLASH_COMMANDS, width=80, limit=20))
 634
 635
 636def test_terminal_escape_decodes_arrows_and_mouse_click():
 637    assert _decode_terminal_escape("\x1b[A") == ("up", None)
 638    assert _decode_terminal_escape("\x1b[B") == ("down", None)
 639    assert _decode_terminal_escape("\x1b[C") == ("right", None)
 640    assert _decode_terminal_escape("\x1b[D") == ("left", None)
 641    assert _decode_terminal_escape("\x1bOB") == ("down", None)
 642    assert _decode_terminal_escape("\x1b[1;2B") == ("down", None)
 643    assert _decode_terminal_escape("\x1b[<0;88;12M") == ("click", (88, 12))
 644    assert _decode_terminal_escape("\x1b[M !!") == ("click", (1, 1))
 645
 646
 647def test_first_run_click_maps_right_pane_actions(monkeypatch):
 648    monkeypatch.setattr("shutil.get_terminal_size", lambda fallback=(100, 30): (100, 30))
 649
 650    assert _first_run_click_action(5, 15, view="endpoint") is None
 651    assert _first_run_click_action(5, 15, view="access") == 0
 652    assert _first_run_click_action(25, 15, view="access") == 1
 653    assert _first_run_click_action(1, 4, view="access") is None
 654    assert _first_run_click_action(5, 1, view="model") is None
 655
 656
 657def test_first_run_arrow_navigation_changes_setup_screens():
 658    assert _directional_first_run_action(
 659        [
 660            ("view:model", "Begin setup", "walk through setup"),
 661            ("doctor", "Doctor", "check"),
 662        ],
 663        direction=1,
 664    ) == "view:model"
 665    assert _directional_first_run_action(
 666        [
 667            ("edit:model.name", "Edit model", "set model"),
 668            ("view:connector", "Continue", "choose connector"),
 669            ("view:start", "Back", "intro"),
 670        ],
 671        direction=1,
 672    ) == "view:connector"
 673    assert _directional_first_run_action(
 674        [
 675            ("edit:model.name", "Edit model", "set model"),
 676            ("view:connector", "Continue", "choose connector"),
 677            ("view:start", "Back", "intro"),
 678        ],
 679        direction=-1,
 680    ) == "view:start"
 681
 682
 683def test_frame_next_job_cycles_jobs():
 684    snapshot = {"jobs": [{"id": "one"}, {"id": "two"}, {"id": "three"}]}
 685
 686    assert _frame_next_job_id(snapshot, "one", direction=1) == "two"
 687    assert _frame_next_job_id(snapshot, "one", direction=-1) == "three"
 688    assert _frame_next_job_id(snapshot, "missing", direction=1) == "two"
 689
 690
 691def test_frame_refresh_slows_background_updates_while_typing():
 692    assert _frame_refresh_interval("") < _frame_refresh_interval("drafting a message")
 693
 694
 695def test_first_run_empty_submit_without_actions_does_not_crash():
 696    deps = _FirstRunRuntimeDeps(
 697        render_frame=lambda _buffer, _notices, _selected, _view, _editing, _previous: "",
 698        actions=lambda _view: [],
 699        handle_action=lambda _action: ("notice", "unused"),
 700        handle_line=lambda _line: ("notice", "unused"),
 701        click_action=lambda _x, _y, _view: None,
 702    )
 703
 704    assert _submit_first_run_line("", selected=0, view="empty", deps=deps) == (
 705        "notice",
 706        "This setup step requires an explicit value.",
 707    )
 708
 709
 710def test_first_run_required_edit_cancel_and_clear_stay_on_same_field():
 711    notices: list[str] = []
 712
 713    buffer, editing_field, should_exit = _handle_first_run_edit_input(
 714        "\x15",
 715        buffer="not-a-url",
 716        editing_field="model.base_url",
 717        notices=notices,
 718        stdin_fd=0,
 719    )
 720
 721    assert buffer == ""
 722    assert editing_field == "model.base_url"
 723    assert should_exit is False
 724
 725    buffer, editing_field, should_exit = _handle_first_run_edit_input(
 726        "\x03",
 727        buffer="partial",
 728        editing_field="model.base_url",
 729        notices=notices,
 730        stdin_fd=0,
 731    )
 732
 733    assert buffer == ""
 734    assert editing_field == "model.base_url"
 735    assert should_exit is False
 736    assert "cancelled edit" in "\n".join(notices)
 737
 738
 739def test_chat_settings_edit_supports_ctrl_u_clear():
 740    buffer, editing_field, should_exit = _handle_chat_edit_input(
 741        "\x15",
 742        buffer="wrong-model",
 743        editing_field="model.name",
 744        notices=[],
 745        stdin_fd=0,
 746    )
 747
 748    assert buffer == ""
 749    assert editing_field == "model.name"
 750    assert should_exit is False
 751
 752
 753def test_first_run_render_failure_uses_safe_mode(capsys):
 754    deps = _FirstRunRuntimeDeps(
 755        render_frame=lambda *_args: (_ for _ in ()).throw(RuntimeError("bad frame")),
 756        actions=lambda _view: [],
 757        handle_action=lambda _action: ("notice", "unused"),
 758        handle_line=lambda _line: ("notice", "unused"),
 759        click_action=lambda _x, _y, _view: None,
 760    )
 761    notices: list[str] = []
 762
 763    frame = _safe_first_run_render_frame(
 764        deps,
 765        buffer="hello",
 766        notices=notices,
 767        selected=0,
 768        view="start",
 769        editing_field=None,
 770        previous_frame="",
 771    )
 772
 773    assert "safe mode" in frame
 774    assert "render failed" in "\n".join(notices)
 775    assert "bad frame" in capsys.readouterr().out
 776
 777
 778def test_chat_submit_failure_stays_in_frame():
 779    snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
 780    async_messages: queue.Queue[str] = queue.Queue()
 781
 782    deps = _ChatFrameDeps(
 783        load_snapshot=lambda _job_id, _history_limit: snapshot,
 784        render_frame=lambda *_args: "",
 785        handle_chat_message=lambda _job_id, _line: (_ for _ in ()).throw(RuntimeError("model blew up")),
 786        capture_chat_command=lambda _job_id, _line: (True, ""),
 787        write_shell_state=lambda _state: None,
 788        is_plain_chat_line=lambda _line: True,
 789        page_click=lambda _x, _y, _right_view: None,
 790    )
 791
 792    keep_running, _snapshot, job_id, notices, right_view, modal = _handle_chat_submit(
 793        "hello",
 794        job_id="job_demo",
 795        history_limit=12,
 796        snapshot=snapshot,
 797        notices=[],
 798        right_view="status",
 799        modal_view=None,
 800        deps=deps,
 801        async_messages=async_messages,
 802    )
 803
 804    assert keep_running is True
 805    assert job_id == "job_demo"
 806    assert right_view == "status"
 807    assert modal is None
 808    assert _THINKING_NOTICE in notices
 809    assert "> hello" not in "\n".join(notices)
 810    queued = async_messages.get(timeout=1)
 811    assert "message failed" in queued
 812    assert "model blew up" in queued
 813    async_messages.put(queued)
 814    assert _drain_chat_async_notices(async_messages, notices) is True
 815    assert _THINKING_NOTICE not in notices
 816    assert "message failed" in "\n".join(notices)
 817
 818
 819def test_chat_submit_plain_message_returns_without_waiting_for_model():
 820    snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
 821    async_messages: queue.Queue[str] = queue.Queue()
 822
 823    def slow_chat(_job_id, _line):
 824        time.sleep(0.3)
 825        return True, "done later"
 826
 827    deps = _ChatFrameDeps(
 828        load_snapshot=lambda _job_id, _history_limit: snapshot,
 829        render_frame=lambda *_args: "",
 830        handle_chat_message=slow_chat,
 831        capture_chat_command=lambda _job_id, _line: (True, ""),
 832        write_shell_state=lambda _state: None,
 833        is_plain_chat_line=lambda _line: True,
 834        page_click=lambda _x, _y, _right_view: None,
 835    )
 836
 837    started = time.monotonic()
 838    keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
 839        "hello",
 840        job_id="job_demo",
 841        history_limit=12,
 842        snapshot=snapshot,
 843        notices=[],
 844        right_view="status",
 845        modal_view=None,
 846        deps=deps,
 847        async_messages=async_messages,
 848    )
 849
 850    assert keep_running is True
 851    assert time.monotonic() - started < 0.1
 852    assert _THINKING_NOTICE in notices
 853    assert "> hello" not in "\n".join(notices)
 854    assert async_messages.get(timeout=1) == "__refresh__"
 855
 856
 857def test_chat_submit_plain_message_renders_thinking_notice_without_echoing_message():
 858    snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
 859    async_messages: queue.Queue[str] = queue.Queue()
 860
 861    deps = _ChatFrameDeps(
 862        load_snapshot=lambda _job_id, _history_limit: snapshot,
 863        render_frame=lambda *_args: "",
 864        handle_chat_message=lambda _job_id, _line: (True, "done"),
 865        capture_chat_command=lambda _job_id, _line: (True, ""),
 866        write_shell_state=lambda _state: None,
 867        is_plain_chat_line=lambda _line: True,
 868        page_click=lambda _x, _y, _right_view: None,
 869    )
 870
 871    _keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
 872        "Hello",
 873        job_id="job_demo",
 874        history_limit=12,
 875        snapshot=snapshot,
 876        notices=[],
 877        right_view="status",
 878        modal_view=None,
 879        deps=deps,
 880        async_messages=async_messages,
 881    )
 882
 883    rendered_notices = _display_chat_notices(notices)
 884    lines = chat_pane_lines([], rendered_notices, width=60, rows=4)
 885    rendered = "\n".join(lines)
 886    assert "thinking" in rendered
 887    assert "Hello" not in rendered
 888    assert "waiting for model" not in rendered
 889
 890
 891def test_chat_submit_waiting_command_output_becomes_animation():
 892    snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
 893
 894    deps = _ChatFrameDeps(
 895        load_snapshot=lambda _job_id, _history_limit: snapshot,
 896        render_frame=lambda *_args: "",
 897        handle_chat_message=lambda _job_id, _line: (True, ""),
 898        capture_chat_command=lambda _job_id, _line: (
 899            True,
 900            "waiting for demo: what has it done so far?\nWaiting for the next worker step.",
 901        ),
 902        write_shell_state=lambda _state: None,
 903        is_plain_chat_line=lambda _line: False,
 904        page_click=lambda _x, _y, _right_view: None,
 905    )
 906
 907    _keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
 908        "/pause",
 909        job_id="job_demo",
 910        history_limit=12,
 911        snapshot=snapshot,
 912        notices=[],
 913        right_view="status",
 914        modal_view=None,
 915        deps=deps,
 916    )
 917
 918    lines = chat_pane_lines([], _display_chat_notices(notices), width=80, rows=6)
 919    rendered = "\n".join(lines)
 920    assert "waiting" in rendered
 921    assert "waiting for demo" not in rendered
 922    assert "Waiting for the next worker step" not in rendered
 923    assert "NIPUX" not in rendered
 924
 925
 926def test_chat_submit_new_refreshes_focused_job_from_shell_state():
 927    old_snapshot = {"job_id": "old", "job": {"id": "old", "title": "old"}, "jobs": []}
 928    new_snapshot = {"job_id": "new", "job": {"id": "new", "title": "new"}, "jobs": []}
 929    loaded: list[str] = []
 930
 931    def load_snapshot(job_id, _history_limit):
 932        loaded.append(job_id)
 933        return new_snapshot if job_id == "" else old_snapshot
 934
 935    deps = _ChatFrameDeps(
 936        load_snapshot=load_snapshot,
 937        render_frame=lambda *_args: "",
 938        handle_chat_message=lambda _job_id, _line: (True, ""),
 939        capture_chat_command=lambda _job_id, _line: (True, "created new\nfocus set to new"),
 940        write_shell_state=lambda _state: None,
 941        is_plain_chat_line=lambda _line: False,
 942        page_click=lambda _x, _y, _right_view: None,
 943    )
 944
 945    keep_running, _snapshot, job_id, notices, _right_view, _modal = _handle_chat_submit(
 946        "/new Build a durable workflow",
 947        job_id="old",
 948        history_limit=12,
 949        snapshot=old_snapshot,
 950        notices=[],
 951        right_view="status",
 952        modal_view=None,
 953        deps=deps,
 954    )
 955
 956    assert keep_running is True
 957    assert job_id == "new"
 958    assert loaded == [""]
 959    assert "focus set to new" in "\n".join(notices)
 960
 961
 962def test_workspace_chat_submit_new_keeps_workspace_chat_left_pane():
 963    old_snapshot = {"job_id": WORKSPACE_CHAT_ID, "job": {"id": WORKSPACE_CHAT_ID, "title": "Nipux"}, "jobs": []}
 964    workspace_snapshot = {
 965        "job_id": WORKSPACE_CHAT_ID,
 966        "job": {"id": WORKSPACE_CHAT_ID, "title": "Nipux"},
 967        "right_job": {"id": "new", "title": "new worker"},
 968        "jobs": [],
 969    }
 970    loaded: list[str] = []
 971
 972    def load_snapshot(job_id, _history_limit):
 973        loaded.append(job_id)
 974        return workspace_snapshot
 975
 976    deps = _ChatFrameDeps(
 977        load_snapshot=load_snapshot,
 978        render_frame=lambda *_args: "",
 979        handle_chat_message=lambda _job_id, _line: (True, ""),
 980        capture_chat_command=lambda _job_id, _line: (True, "created new\nfocus set to new"),
 981        write_shell_state=lambda _state: None,
 982        is_plain_chat_line=lambda _line: False,
 983        page_click=lambda _x, _y, _right_view: None,
 984    )
 985
 986    keep_running, snapshot, job_id, notices, _right_view, _modal = _handle_chat_submit(
 987        "/new Build a durable workflow",
 988        job_id=WORKSPACE_CHAT_ID,
 989        history_limit=12,
 990        snapshot=old_snapshot,
 991        notices=[],
 992        right_view="updates",
 993        modal_view=None,
 994        deps=deps,
 995    )
 996
 997    assert keep_running is True
 998    assert job_id == WORKSPACE_CHAT_ID
 999    assert snapshot["right_job"]["title"] == "new worker"
1000    assert loaded == [WORKSPACE_CHAT_ID]
1001    assert "focus set to new" in "\n".join(notices)
1002
1003
1004def test_chat_render_failure_uses_safe_mode(capsys):
1005    snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
1006    deps = _ChatFrameDeps(
1007        load_snapshot=lambda _job_id, _history_limit: snapshot,
1008        render_frame=lambda *_args: (_ for _ in ()).throw(RuntimeError("bad chat frame")),
1009        handle_chat_message=lambda _job_id, _line: (True, ""),
1010        capture_chat_command=lambda _job_id, _line: (True, ""),
1011        write_shell_state=lambda _state: None,
1012        is_plain_chat_line=lambda _line: True,
1013        page_click=lambda _x, _y, _right_view: None,
1014    )
1015    notices: list[str] = []
1016
1017    frame = _safe_chat_render_frame(
1018        deps,
1019        snapshot=snapshot,
1020        buffer="hello",
1021        notices=notices,
1022        right_view="status",
1023        selected_control=0,
1024        editing_field=None,
1025        modal_view=None,
1026        previous_frame="",
1027    )
1028
1029    assert "safe mode" in frame
1030    assert "render failed" in "\n".join(notices)
1031    assert "bad chat frame" in capsys.readouterr().out
1032
1033
1034def test_chat_help_has_config_slash_commands_without_settings_page(monkeypatch, tmp_path, capsys):
1035    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1036    db = AgentDB(tmp_path / "state.db")
1037    try:
1038        job_id = db.create_job("Research topic", title="research")
1039    finally:
1040        db.close()
1041
1042    assert _chat_handle_line(job_id, "/help") is True
1043
1044    out = capsys.readouterr().out
1045    assert "Core workflow:" in out
1046    assert "/new OBJECTIVE       create a job and start work" in out
1047    assert "/run                 resume/start the focused job" in out
1048    assert "/activity            tool calls" in out
1049    assert "/settings" in out
1050    assert "/usage" in out
1051    assert "/config" in out
1052    assert "/outcomes" in out
1053    assert "/model MODEL" in out
1054    assert "/api-key KEY" in out
1055    assert "/timeout SECONDS" in out
1056    assert "/browser true|false" in out
1057    assert "/cli-access true|false" in out
1058    assert "/home PATH" in out
1059    assert "/digest-time HH:MM" in out
1060    assert "/shell" not in out
1061
1062
1063def test_chat_slash_palette_matches_public_chat_commands():
1064    palette = {command for command, _description in CHAT_SLASH_COMMANDS}
1065    assert len(palette) == len(CHAT_SLASH_COMMANDS)
1066    advertised = {
1067        "/jobs",
1068        "/focus",
1069        "/switch",
1070        "/new",
1071        "/delete",
1072        "/history",
1073        "/events",
1074        "/activity",
1075        "/outputs",
1076        "/updates",
1077        "/outcomes",
1078        "/status",
1079        "/usage",
1080        "/config",
1081        "/settings",
1082        "/health",
1083        "/help",
1084        "/artifacts",
1085        "/artifact",
1086        "/findings",
1087        "/tasks",
1088        "/roadmap",
1089        "/experiments",
1090        "/sources",
1091        "/memory",
1092        "/metrics",
1093        "/lessons",
1094        "/model",
1095        "/base-url",
1096        "/api-key",
1097        "/api-key-env",
1098        "/context",
1099        "/input-cost",
1100        "/output-cost",
1101        "/timeout",
1102        "/browser",
1103        "/web",
1104        "/cli-access",
1105        "/file-access",
1106        "/home",
1107        "/step-limit",
1108        "/output-chars",
1109        "/daily-digest",
1110        "/digest-time",
1111        "/doctor",
1112        "/run",
1113        "/start",
1114        "/restart",
1115        "/work",
1116        "/work-verbose",
1117        "/stop",
1118        "/pause",
1119        "/resume",
1120        "/cancel",
1121        "/learn",
1122        "/note",
1123        "/follow",
1124        "/digest",
1125        "/clear",
1126        "/exit",
1127    }
1128
1129    assert advertised <= palette
1130    assert "/shell" not in palette
1131
1132
1133def test_workspace_help_is_minimal_and_actionable(monkeypatch, tmp_path):
1134    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1135    _mark_test_model_ready()
1136
1137    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/help")
1138
1139    assert keep_running is True
1140    assert "Create: type a goal" in output
1141    assert "/new OBJECTIVE" in output
1142    assert "/settings" in output
1143    assert "Navigate:" in output
1144    assert "Common" not in output
1145    assert "/shell" not in output
1146
1147
1148def test_first_run_slash_palette_matches_setup_commands():
1149    palette = {command for command, _description in FIRST_RUN_SLASH_COMMANDS}
1150    assert len(palette) == len(FIRST_RUN_SLASH_COMMANDS)
1151
1152    advertised = {
1153        "/model",
1154        "/base-url",
1155        "/api-key",
1156        "/api-key-env",
1157        "/config",
1158        "/context",
1159        "/input-cost",
1160        "/output-cost",
1161        "/timeout",
1162        "/browser",
1163        "/web",
1164        "/cli-access",
1165        "/file-access",
1166        "/home",
1167        "/step-limit",
1168        "/output-chars",
1169        "/daily-digest",
1170        "/digest-time",
1171        "/doctor",
1172        "/init",
1173        "/help",
1174        "/clear",
1175        "/exit",
1176    }
1177
1178    assert advertised <= palette
1179    assert "/new" not in palette
1180    assert "/jobs" not in palette
1181    assert "/shell" not in palette
1182    assert "/settings" not in palette
1183
1184
1185def test_chat_settings_slash_commands_persist_config(monkeypatch, tmp_path, capsys):
1186    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1187    monkeypatch.setenv("NIPUX_TEST_KEY", "")
1188    db = AgentDB(tmp_path / "state.db")
1189    try:
1190        job_id = db.create_job("Research topic", title="research")
1191    finally:
1192        db.close()
1193
1194    assert _chat_handle_line(job_id, "/model provider/model") is True
1195    assert _chat_handle_line(job_id, "/base-url https://example.com/v1") is True
1196    assert _chat_handle_line(job_id, "/context 8192") is True
1197    assert _chat_handle_line(job_id, "/input-cost 0.10") is True
1198    assert _chat_handle_line(job_id, "/output-cost 0.20") is True
1199    assert _chat_handle_line(job_id, "/max-cost 15") is True
1200    assert _chat_handle_line(job_id, "/timeout 45") is True
1201    assert _chat_handle_line(job_id, "/browser false") is True
1202    assert _chat_handle_line(job_id, "/web false") is True
1203    assert _chat_handle_line(job_id, "/cli-access false") is True
1204    assert _chat_handle_line(job_id, "/file-access false") is True
1205    assert _chat_handle_line(job_id, "/step-limit 90") is True
1206    assert _chat_handle_line(job_id, "/output-chars 4096") is True
1207    assert _chat_handle_line(job_id, "/daily-digest false") is True
1208    assert _chat_handle_line(job_id, "/digest-time 08:30") is True
1209    assert _chat_handle_line(job_id, "/api-key-env NIPUX_TEST_KEY") is True
1210    assert _chat_handle_line(job_id, "/api-key sk-test-value") is True
1211    out = capsys.readouterr().out
1212
1213    assert "saved model.name = provider/model" in out
1214    assert "saved model.base_url = https://example.com/v1" in out
1215    assert "saved model.context_length = 8192" in out
1216    assert "saved model.input_cost_per_million = 0.1" in out
1217    assert "saved model.output_cost_per_million = 0.2" in out
1218    assert "saved runtime.max_job_cost_usd = 15.0" in out
1219    assert "saved model.request_timeout_seconds = 45.0" in out
1220    assert "saved tools.browser = False" in out
1221    assert "saved tools.web = False" in out
1222    assert "saved tools.shell = False" in out
1223    assert "saved tools.files = False" in out
1224    assert "saved runtime.max_step_seconds = 90" in out
1225    assert "saved runtime.artifact_inline_char_limit = 4096" in out
1226    assert "saved runtime.daily_digest_enabled = False" in out
1227    assert "saved runtime.daily_digest_time = 08:30" in out
1228    assert "saved model.api_key_env = NIPUX_TEST_KEY" in out
1229    assert "saved NIPUX_TEST_KEY" in out
1230    assert "sk-test-value" not in out
1231    assert _mode(tmp_path / "config.yaml") == 0o600
1232    assert _mode(tmp_path / ".env") == 0o600
1233    assert _config_field_value("model.name") == "provider/model"
1234    assert _config_field_value("model.base_url") == "https://example.com/v1"
1235    assert _config_field_value("model.context_length") == 8192
1236    assert _config_field_value("model.input_cost_per_million") == 0.1
1237    assert _config_field_value("model.output_cost_per_million") == 0.2
1238    assert _config_field_value("runtime.max_job_cost_usd") == 15.0
1239    assert _config_field_value("model.request_timeout_seconds") == 45.0
1240    assert _config_field_value("tools.browser") is False
1241    assert _config_field_value("tools.web") is False
1242    assert _config_field_value("tools.shell") is False
1243    assert _config_field_value("tools.files") is False
1244    assert _config_field_value("runtime.max_step_seconds") == 90
1245    assert _config_field_value("runtime.artifact_inline_char_limit") == 4096
1246    assert _config_field_value("runtime.daily_digest_enabled") is False
1247    assert _config_field_value("runtime.daily_digest_time") == "08:30"
1248    assert "NIPUX_TEST_KEY=sk-test-value" in (tmp_path / ".env").read_text(encoding="utf-8")
1249
1250
1251def test_chat_init_slash_command_does_not_crash(monkeypatch, tmp_path, capsys):
1252    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1253    db = AgentDB(tmp_path / "state.db")
1254    try:
1255        job_id = db.create_job("Research topic", title="research")
1256    finally:
1257        db.close()
1258
1259    assert _chat_handle_line(job_id, "/init") is True
1260
1261    out = capsys.readouterr().out
1262    assert "Wrote" in out
1263    assert (tmp_path / "config.yaml").exists()
1264    assert (tmp_path / ".env").exists()
1265
1266
1267def test_chat_config_slash_command_summarizes_runtime_without_secret(monkeypatch, tmp_path, capsys):
1268    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1269    monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1270    db = AgentDB(tmp_path / "state.db")
1271    try:
1272        job_id = db.create_job("Research topic", title="research")
1273    finally:
1274        db.close()
1275    (tmp_path / "config.yaml").write_text(
1276        """
1277model:
1278  name: provider/model
1279  base_url: https://example.com/v1
1280  api_key_env: NIPUX_TEST_KEY
1281  context_length: 8192
1282  request_timeout_seconds: 45
1283  input_cost_per_million: 0.1
1284  output_cost_per_million: 0.2
1285runtime:
1286  max_step_seconds: 90
1287  max_job_cost_usd: 15
1288  artifact_inline_char_limit: 4096
1289  daily_digest_enabled: false
1290  daily_digest_time: "08:30"
1291""",
1292        encoding="utf-8",
1293    )
1294
1295    assert _chat_handle_line(job_id, "/config") is True
1296
1297    out = capsys.readouterr().out
1298    assert "config" in out
1299    assert "model: provider/model" in out
1300    assert "endpoint: https://example.com/v1" in out
1301    assert "key: set (NIPUX_TEST_KEY)" in out
1302    assert "context: 8192" in out
1303    assert "cost rates: input $0.1 / output $0.2 per 1M tokens" in out
1304    assert "job cost limit: $15" in out
1305    assert "sk-test-value" not in out
1306
1307
1308def test_chat_usage_slash_command_reports_tokens(monkeypatch, tmp_path, capsys):
1309    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1310    db = AgentDB(tmp_path / "state.db")
1311    try:
1312        job_id = db.create_job("Research topic", title="research")
1313        db.append_event(
1314            job_id,
1315            event_type="loop",
1316            title="message_end",
1317            metadata={
1318                "usage": {
1319                    "prompt_tokens": 1000,
1320                    "completion_tokens": 250,
1321                    "total_tokens": 1250,
1322                    "cost": 0.0042,
1323                }
1324            },
1325        )
1326    finally:
1327        db.close()
1328
1329    assert _chat_handle_line(job_id, "/usage") is True
1330
1331    out = capsys.readouterr().out
1332    assert "usage research" in out
1333    assert "tokens: total=1.2K prompt=1.0K output=250" in out
1334    assert "cost=$0.0042" in out
1335
1336
1337def test_chat_usage_estimates_cost_from_configured_rates(monkeypatch, tmp_path, capsys):
1338    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1339    (tmp_path / "config.yaml").write_text(
1340        """
1341model:
1342  name: provider/model
1343  base_url: https://example.com/v1
1344  input_cost_per_million: 1.0
1345  output_cost_per_million: 2.0
1346""",
1347        encoding="utf-8",
1348    )
1349    db = AgentDB(tmp_path / "state.db")
1350    try:
1351        job_id = db.create_job("Research topic", title="research")
1352        db.append_event(
1353            job_id,
1354            event_type="loop",
1355            title="message_end",
1356            metadata={
1357                "usage": {
1358                    "prompt_tokens": 1000,
1359                    "completion_tokens": 500,
1360                    "total_tokens": 1500,
1361                    "estimated": True,
1362                }
1363            },
1364        )
1365    finally:
1366        db.close()
1367
1368    assert _chat_handle_line(job_id, "/usage") is True
1369
1370    out = capsys.readouterr().out
1371    assert "cost=~$0.0020" in out
1372
1373
1374def test_chat_usage_shows_configured_job_cost_limit(monkeypatch, tmp_path, capsys):
1375    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1376    (tmp_path / "config.yaml").write_text(
1377        """
1378model:
1379  name: provider/model
1380  base_url: https://example.com/v1
1381runtime:
1382  max_job_cost_usd: 5
1383""",
1384        encoding="utf-8",
1385    )
1386    db = AgentDB(tmp_path / "state.db")
1387    try:
1388        job_id = db.create_job("Research topic", title="research")
1389        db.append_event(
1390            job_id,
1391            event_type="loop",
1392            title="message_end",
1393            metadata={"usage": {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500, "cost": 1.25}},
1394        )
1395    finally:
1396        db.close()
1397
1398    assert _chat_handle_line(job_id, "/usage") is True
1399
1400    out = capsys.readouterr().out
1401    assert "limit: max job cost=$5 remaining=$3.7500" in out
1402
1403
1404def test_first_run_settings_slash_commands_persist_config(monkeypatch, tmp_path):
1405    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1406
1407    action, payload = _handle_first_run_frame_line("/model provider/model")
1408
1409    assert action == "notice"
1410    assert isinstance(payload, list)
1411    assert any("saved model.name = provider/model" in line for line in payload)
1412    assert _config_field_value("model.name") == "provider/model"
1413
1414
1415def test_first_run_local_connector_action_sets_generic_local_endpoint(monkeypatch, tmp_path):
1416    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1417
1418    action, payload = _handle_first_run_action("preset:local")
1419
1420    assert action == "notice"
1421    assert isinstance(payload, list)
1422    assert any("saved model.name = local-model" in line for line in payload)
1423    assert any("saved model.base_url = http://localhost:8000/v1" in line for line in payload)
1424    assert any("then run Doctor" in line for line in payload)
1425    assert _config_field_value("model.name") == "local-model"
1426    assert _config_field_value("model.base_url") == "http://localhost:8000/v1"
1427
1428
1429def test_first_run_access_action_toggles_generic_tools(monkeypatch, tmp_path):
1430    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1431
1432    action, payload = _handle_first_run_action("toggle:tools.shell")
1433
1434    assert action == "notice"
1435    assert isinstance(payload, list)
1436    assert any("saved tools.shell = False" in line for line in payload)
1437    assert _config_field_value("tools.shell") is False
1438
1439
1440def test_first_run_doctor_success_opens_workspace_chat(monkeypatch, tmp_path):
1441    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1442
1443    def fake_verify():
1444        _mark_test_model_ready()
1445        return ["ok model_setup verified"]
1446
1447    monkeypatch.setattr("nipux_cli.cli._verify_model_setup_from_first_run", fake_verify)
1448
1449    action, payload = _handle_first_run_action("doctor")
1450
1451    assert action == "open"
1452    assert payload == WORKSPACE_CHAT_ID
1453
1454
1455def test_first_run_open_workspace_action_requires_verified_model(monkeypatch, tmp_path):
1456    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1457
1458    action, payload = _handle_first_run_action("open_workspace")
1459
1460    assert action == "notice"
1461    assert "Run Doctor first" in str(payload)
1462
1463
1464def test_first_run_open_workspace_action_opens_after_verified_model(monkeypatch, tmp_path):
1465    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1466    _mark_test_model_ready()
1467
1468    action, payload = _handle_first_run_action("open_workspace")
1469
1470    assert action == "open"
1471    assert payload == WORKSPACE_CHAT_ID
1472
1473
1474def test_workspace_frame_snapshot_exists_without_jobs(monkeypatch, tmp_path):
1475    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1476    _mark_test_model_ready()
1477
1478    snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1479
1480    assert snapshot["job_id"] == WORKSPACE_CHAT_ID
1481    assert snapshot["job"]["kind"] == "workspace"
1482    assert snapshot["jobs"] == []
1483
1484    frame = _build_chat_frame(snapshot, "", [], width=118, height=28)
1485    assert "Type a goal in plain English to start a worker" in frame
1486    assert "Type a goal to create the first worker" in frame
1487    assert "Enter sends" not in frame
1488
1489
1490def test_workspace_frame_right_pane_tracks_focused_worker(monkeypatch, tmp_path):
1491    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1492    _mark_test_model_ready()
1493    db = AgentDB(tmp_path / "state.db")
1494    try:
1495        job_id = db.create_job("Research browser automation libraries", title="browser research")
1496        db.add_artifact(
1497            job_id=job_id,
1498            path=tmp_path / "comparison.md",
1499            sha256="abc123",
1500            artifact_type="text",
1501            title="Browser Automation Comparison Draft",
1502            summary="Checkpoint: saved comparison draft.",
1503        )
1504        _write_shell_state({"focus_job_id": job_id})
1505    finally:
1506        db.close()
1507
1508    snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1509    frame = _build_chat_frame(snapshot, "", [], width=128, height=28, right_view="updates")
1510
1511    assert snapshot["job"]["kind"] == "workspace"
1512    assert snapshot["right_job"]["title"] == "browser research"
1513    assert "browser research" in frame
1514    assert "Browser Automation" in frame
1515    assert "Comparison Draft" in frame
1516
1517
1518def test_workspace_slash_new_creates_and_focuses_job(monkeypatch, tmp_path):
1519    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1520    _mark_test_model_ready()
1521    started = {}
1522
1523    def fake_start(**kwargs):
1524        started.update(kwargs)
1525
1526    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1527
1528    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new Build a durable workflow")
1529
1530    assert keep_running is True
1531    assert "Created worker job: Build a durable workflow" in output
1532    assert "Started worker" in output
1533    assert started["poll_seconds"] == 0.0
1534    assert started["quiet"] is True
1535    db = AgentDB(tmp_path / "state.db")
1536    try:
1537        jobs = db.list_jobs()
1538        assert len(jobs) == 1
1539        assert jobs[0]["title"] == "Build a durable workflow"
1540        assert _read_shell_state().get("focus_job_id") == jobs[0]["id"]
1541    finally:
1542        db.close()
1543
1544
1545def test_workspace_chat_job_dossier_includes_progress_outputs_and_outcomes(tmp_path):
1546    db = AgentDB(tmp_path / "state.db")
1547    try:
1548        job_id = db.create_job("Research and validate a workflow", title="workflow research")
1549        run_id = db.start_run(job_id, model="fake")
1550        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
1551        db.finish_step(step_id, status="completed", output_data={"artifact_id": "art_demo"})
1552        db.add_artifact(
1553            job_id=job_id,
1554            path=tmp_path / "workflow.md",
1555            sha256="abc123",
1556            artifact_type="text",
1557            title="Workflow Evidence",
1558            summary="saved research notes",
1559        )
1560        db.append_task_record(job_id, title="Compare source-backed options", status="active", output_contract="research")
1561        db.append_source_record(job_id, "https://example.com/source", source_type="web")
1562        db.append_finding_record(job_id, name="Useful research finding", source_url="https://example.com/source")
1563        db.append_experiment_record(
1564            job_id,
1565            title="Validation check",
1566            status="measured",
1567            metric_name="pass_rate",
1568            metric_value=0.9,
1569        )
1570        job = db.get_job(job_id)
1571        dossier = _workspace_chat_job_dossier(db, [job])
1572    finally:
1573        db.close()
1574
1575    assert "workflow research" in dossier
1576    assert "outputs=1" in dossier
1577    assert "findings=1" in dossier
1578    assert "sources=1" in dossier
1579    assert "experiments=1" in dossier
1580    assert "active task: active Compare source-backed options [research]" in dossier
1581    assert "latest outputs: Workflow Evidence" in dossier
1582    assert "recent outcomes:" in dossier
1583
1584
1585def test_workspace_run_with_objective_creates_worker_when_no_job_matches(monkeypatch, tmp_path):
1586    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1587    _mark_test_model_ready()
1588    started = {}
1589
1590    def fake_start(**kwargs):
1591        started.update(kwargs)
1592
1593    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1594    monkeypatch.setattr(
1595        "nipux_cli.cli._refine_job_objective_for_worker",
1596        lambda *, message, objective: objective,
1597    )
1598
1599    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/run research browser automation libraries")
1600
1601    assert keep_running is True
1602    assert "Created worker job" in output
1603    assert started["quiet"] is True
1604    db = AgentDB(tmp_path / "state.db")
1605    try:
1606        jobs = db.list_jobs()
1607        assert len(jobs) == 1
1608        assert jobs[0]["title"] == "research browser automation libraries"
1609    finally:
1610        db.close()
1611
1612
1613def test_workspace_run_with_existing_job_does_not_create_duplicate(monkeypatch, tmp_path):
1614    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1615    _mark_test_model_ready()
1616    db = AgentDB(tmp_path / "state.db")
1617    try:
1618        job_id = db.create_job("research browser automation libraries", title="research browser automation libraries")
1619    finally:
1620        db.close()
1621    started = {}
1622
1623    def fake_start(**kwargs):
1624        started.update(kwargs)
1625
1626    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1627
1628    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/run research browser automation libraries")
1629
1630    assert keep_running is True
1631    assert "Created worker job" not in output
1632    assert "focus set" in output
1633    assert started["quiet"] is True
1634    db = AgentDB(tmp_path / "state.db")
1635    try:
1636        assert len(db.list_jobs()) == 1
1637        assert _read_shell_state().get("focus_job_id") == job_id
1638    finally:
1639        db.close()
1640
1641
1642def test_workspace_start_with_existing_job_runs_without_parser_error(monkeypatch, tmp_path):
1643    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1644    _mark_test_model_ready()
1645    db = AgentDB(tmp_path / "state.db")
1646    try:
1647        job_id = db.create_job("research browser automation libraries", title="research browser automation libraries")
1648    finally:
1649        db.close()
1650    started = {}
1651
1652    def fake_start(**kwargs):
1653        started.update(kwargs)
1654
1655    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1656
1657    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/start research browser automation libraries")
1658
1659    assert keep_running is True
1660    assert "command exited" not in output
1661    assert "Created worker job" not in output
1662    assert "focus set" in output
1663    assert started["quiet"] is True
1664    db = AgentDB(tmp_path / "state.db")
1665    try:
1666        assert len(db.list_jobs()) == 1
1667        assert _read_shell_state().get("focus_job_id") == job_id
1668    finally:
1669        db.close()
1670
1671
1672def test_workspace_slash_new_without_objective_is_minimal(monkeypatch, tmp_path):
1673    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1674    _mark_test_model_ready()
1675
1676    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new")
1677
1678    assert keep_running is True
1679    assert output.strip() == "usage: /new OBJECTIVE"
1680    assert "for example" not in output.lower()
1681
1682
1683def test_workspace_slash_new_hides_model_preflight_noise(monkeypatch, tmp_path):
1684    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1685    _mark_test_model_ready()
1686
1687    def fake_start(**_kwargs):
1688        print("model is not ready; daemon not started")
1689        print("  fail model_endpoint: http://localhost:8000/v1/models: connection refused")
1690        print("Run `nipux doctor --check-model` after fixing the model configuration.")
1691
1692    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1693
1694    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new Build a durable workflow")
1695
1696    assert keep_running is True
1697    assert "Created worker job: Build a durable workflow" in output
1698    assert "Worker is waiting for a working model" in output
1699    assert "model_endpoint" not in output
1700    assert "connection refused" not in output
1701
1702
1703def test_workspace_settings_slash_commands_persist_config(monkeypatch, tmp_path):
1704    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1705    _mark_test_model_ready()
1706
1707    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/model provider/model")
1708
1709    assert keep_running is True
1710    assert "saved model.name = provider/model" in output
1711    assert _config_field_value("model.name") == "provider/model"
1712
1713    db = AgentDB(tmp_path / "state.db")
1714    try:
1715        assert db.list_jobs() == []
1716    finally:
1717        db.close()
1718
1719
1720def test_workspace_settings_slash_command_summarizes_config(monkeypatch, tmp_path):
1721    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1722    monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1723    _mark_test_model_ready()
1724    (tmp_path / "config.yaml").write_text(
1725        """
1726model:
1727  name: provider/model
1728  base_url: https://example.com/v1
1729  api_key_env: NIPUX_TEST_KEY
1730  context_length: 8192
1731  request_timeout_seconds: 45
1732""",
1733        encoding="utf-8",
1734    )
1735
1736    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/settings")
1737
1738    assert keep_running is True
1739    assert "config" in output
1740    assert "model: provider/model" in output
1741    assert "endpoint: https://example.com/v1" in output
1742    assert "key: set (NIPUX_TEST_KEY)" in output
1743    assert "sk-test-value" not in output
1744
1745
1746def test_workspace_natural_control_phrase_uses_mapped_command(monkeypatch, tmp_path):
1747    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1748    _mark_test_model_ready()
1749
1750    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "change model")
1751
1752    assert keep_running is True
1753    assert "model.name =" in output
1754    assert "usage: /model MODEL" in output
1755
1756    db = AgentDB(tmp_path / "state.db")
1757    try:
1758        assert db.list_jobs() == []
1759    finally:
1760        db.close()
1761
1762
1763def test_workspace_natural_settings_phrase_opens_settings_summary(monkeypatch, tmp_path):
1764    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1765    monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1766    _mark_test_model_ready()
1767    (tmp_path / "config.yaml").write_text(
1768        """
1769model:
1770  name: provider/model
1771  base_url: https://example.com/v1
1772  api_key_env: NIPUX_TEST_KEY
1773""",
1774        encoding="utf-8",
1775    )
1776
1777    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "settings")
1778
1779    assert keep_running is True
1780    assert "config" in output
1781    assert "model: provider/model" in output
1782    assert "key: set (NIPUX_TEST_KEY)" in output
1783    assert "usage: /model MODEL" not in output
1784    assert "sk-test-value" not in output
1785
1786
1787def test_workspace_how_to_start_job_question_uses_local_help(monkeypatch, tmp_path):
1788    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1789    _mark_test_model_ready()
1790
1791    def fail_model(_line):
1792        raise AssertionError("model should not be called for local help")
1793
1794    monkeypatch.setattr("nipux_cli.cli._reply_to_workspace_chat", fail_model)
1795
1796    keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "how do I start a job?")
1797
1798    assert keep_running is True
1799    assert "Create: type a goal" in output
1800    assert "/new OBJECTIVE" in output
1801
1802
1803def test_workspace_chat_connection_error_is_operator_friendly(monkeypatch, tmp_path):
1804    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1805    _mark_test_model_ready()
1806
1807    def raise_connection(_line):
1808        raise RuntimeError("APIConnectionError: Connection error.")
1809
1810    monkeypatch.setattr("nipux_cli.cli._reply_to_workspace_chat", raise_connection)
1811
1812    ok, message = _handle_workspace_chat_message("hello", quiet=True)
1813
1814    assert ok is True
1815    assert message == (
1816        "Model endpoint is unreachable. Check /base-url or start the configured model server, then run /doctor."
1817    )
1818    snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1819    bodies = "\n".join(str(event.get("body") or "") for event in snapshot["events"])
1820    assert "APIConnectionError" not in bodies
1821    assert "Model endpoint is unreachable" in bodies
1822
1823
1824def test_chat_start_reports_model_provider_not_ready(monkeypatch, tmp_path, capsys):
1825    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1826    _mark_test_model_ready()
1827    db = AgentDB(tmp_path / "state.db")
1828    try:
1829        job_id = db.create_job("Research topic", title="research")
1830    finally:
1831        db.close()
1832
1833    def fake_start(**_kwargs):
1834        print("model is not ready; daemon not started")
1835        print("  fail model_endpoint: http://localhost:8000/v1/models: connection refused")
1836
1837    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1838
1839    assert _chat_handle_line(job_id, "/start") is True
1840
1841    out = capsys.readouterr().out
1842    assert "worker not started: model provider is not ready. Use /settings, then /doctor." in out
1843    assert "model_endpoint" not in out
1844    assert "connection refused" not in out
1845
1846
1847def test_chat_doctor_checks_configured_model(monkeypatch, tmp_path, capsys):
1848    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1849    _mark_test_model_ready()
1850    db = AgentDB(tmp_path / "state.db")
1851    try:
1852        job_id = db.create_job("Research topic", title="research")
1853    finally:
1854        db.close()
1855
1856    seen = {}
1857
1858    def fake_doctor(*, config, check_model):
1859        seen["check_model"] = check_model
1860        return [Check("model_generation", True, "ok")]
1861
1862    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
1863
1864    assert _chat_handle_line(job_id, "/doctor") is True
1865
1866    assert seen["check_model"] is True
1867    assert "model_generation" in capsys.readouterr().out
1868
1869
1870def test_workspace_chat_control_phrase_runs_job_command(monkeypatch, tmp_path):
1871    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1872    _mark_test_model_ready()
1873    db = AgentDB(tmp_path / "state.db")
1874    try:
1875        job_id = db.create_job("Research topic", title="research")
1876    finally:
1877        db.close()
1878
1879    keep_running, message = _handle_workspace_chat_message("stop the job", quiet=True)
1880
1881    assert keep_running is True
1882    assert "paused research" in message
1883    db = AgentDB(tmp_path / "state.db")
1884    try:
1885        job = db.get_job(job_id)
1886        assert job["status"] == "paused"
1887        events = _read_shell_state().get("workspace_chat_events") or []
1888        assert any(
1889            event["event_type"] == "agent_message"
1890            and event["metadata"].get("command") == "/pause"
1891            and "paused research" in event["body"]
1892            for event in events
1893        )
1894    finally:
1895        db.close()
1896
1897
1898def test_shell_ls_alias_lists_jobs_instead_of_steering(monkeypatch, tmp_path, capsys):
1899    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1900    db = AgentDB(tmp_path / "state.db")
1901    try:
1902        db.create_job("Research topic", title="research")
1903    finally:
1904        db.close()
1905
1906    assert _run_shell_line("ls") is True
1907
1908    out = capsys.readouterr().out
1909    assert "research" in out
1910    assert "queued for" not in out
1911
1912
1913def test_roadmap_command_renders_roadmap(monkeypatch, tmp_path, capsys):
1914    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1915    db = AgentDB(tmp_path / "state.db")
1916    try:
1917        job_id = db.create_job("Broad work", title="broad")
1918        db.append_roadmap_record(
1919            job_id,
1920            title="Broad Roadmap",
1921            status="active",
1922            current_milestone="Foundation",
1923            milestones=[{
1924                "title": "Foundation",
1925                "status": "validating",
1926                "validation_status": "pending",
1927                "features": [{"title": "First feature", "status": "done"}],
1928            }],
1929        )
1930    finally:
1931        db.close()
1932
1933    main(["roadmap", "broad"])
1934
1935    out = capsys.readouterr().out
1936    assert "roadmap broad" in out
1937    assert "Broad Roadmap" in out
1938    assert "Foundation" in out
1939    assert "validation=pending" in out
1940
1941
1942def test_shell_focus_controls_default_steering_job(monkeypatch, tmp_path, capsys):
1943    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1944    db = AgentDB(tmp_path / "state.db")
1945    try:
1946        first = db.create_job("Research topic", title="first research")
1947        second = db.create_job("Find investors", title="investor search")
1948    finally:
1949        db.close()
1950
1951    assert _run_shell_line("focus investor") is True
1952    assert _run_shell_line("prioritize Toronto findings") is True
1953
1954    out = capsys.readouterr().out
1955    db = AgentDB(tmp_path / "state.db")
1956    try:
1957        first_job = db.get_job(first)
1958        second_job = db.get_job(second)
1959        assert "focus set:" in out
1960        assert first_job["metadata"].get("operator_messages") is None
1961        assert second_job["metadata"]["operator_messages"][-1]["message"] == "prioritize Toronto findings"
1962    finally:
1963        db.close()
1964
1965
1966def test_shell_rename_updates_job_title_and_program(monkeypatch, tmp_path, capsys):
1967    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1968    db = AgentDB(tmp_path / "state.db")
1969    try:
1970        job_id = db.create_job("Research topic", title="old title")
1971        program = tmp_path / "jobs" / job_id / "program.md"
1972        program.parent.mkdir(parents=True, exist_ok=True)
1973        program.write_text("# old title\n\nBody\n", encoding="utf-8")
1974    finally:
1975        db.close()
1976
1977    assert _run_shell_line("rename old title --title new title") is True
1978
1979    out = capsys.readouterr().out
1980    db = AgentDB(tmp_path / "state.db")
1981    try:
1982        job = db.get_job(job_id)
1983        assert "renamed old title -> new title" in out
1984        assert job["title"] == "new title"
1985        assert program.read_text(encoding="utf-8").startswith("# new title\n")
1986    finally:
1987        db.close()
1988
1989
1990def test_shell_delete_removes_job_and_artifact_dir(monkeypatch, tmp_path, capsys):
1991    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1992    db = AgentDB(tmp_path / "state.db")
1993    try:
1994        job_id = db.create_job("Research topic", title="delete me")
1995        run_id = db.start_run(job_id, model="fake")
1996        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
1997        store = ArtifactStore(tmp_path, db=db)
1998        stored = store.write_text(
1999            job_id=job_id,
2000            run_id=run_id,
2001            step_id=step_id,
2002            title="Artifact",
2003            summary="saved",
2004            content="content",
2005        )
2006        artifact_path = stored.path
2007    finally:
2008        db.close()
2009
2010    assert artifact_path.exists()
2011    assert _run_shell_line("delete delete me") is True
2012
2013    out = capsys.readouterr().out
2014    db = AgentDB(tmp_path / "state.db")
2015    try:
2016        assert "deleted delete me" in out
2017        try:
2018            db.get_job(job_id)
2019        except KeyError:
2020            pass
2021        else:
2022            raise AssertionError("job still exists after shell delete")
2023        assert not artifact_path.exists()
2024        assert not (tmp_path / "jobs" / job_id).exists()
2025    finally:
2026        db.close()
2027
2028
2029def test_shell_help_has_no_examples_or_control_run_sections(capsys):
2030    _print_shell_help()
2031
2032    out = capsys.readouterr().out
2033    assert "Examples:" not in out
2034    assert "\nControl\n" not in out
2035    assert "\nRun\n" not in out
2036    assert "delete JOB_TITLE" in out
2037    assert "usage [JOB_TITLE]" in out
2038    assert "update" in out
2039    assert "Jobs" in out
2040    assert "Worker" in out
2041
2042
2043def test_update_checkout_falls_back_to_tool_install_for_non_git_path(monkeypatch, tmp_path):
2044    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
2045
2046    def runner(command):
2047        assert command == [
2048            "/usr/bin/uv",
2049            "tool",
2050            "install",
2051            "--force",
2052            "--upgrade",
2053            "--reinstall",
2054            "--refresh",
2055            "git+https://github.com/nipuxx/agent-cli.git@main",
2056        ]
2057        return subprocess.CompletedProcess(command, 0, stdout="Installed nipux\n")
2058
2059    code, lines = _update_checkout(path=tmp_path, command_runner=runner)
2060
2061    assert code == 0
2062    rendered = "\n".join(lines)
2063    assert "is not a source checkout; updating the installed Nipux tool instead" in rendered
2064    assert "not a git checkout" not in rendered
2065    assert "Update complete" in rendered
2066
2067
2068def test_update_checkout_upgrades_uv_tool_when_installed_package(monkeypatch):
2069    monkeypatch.setattr("nipux_cli.updater.find_checkout_root", lambda: None)
2070    monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
2071    calls: list[tuple[str, ...]] = []
2072
2073    def runner(command):
2074        calls.append(tuple(command))
2075        return subprocess.CompletedProcess(command, 0, stdout="Resolved 1 package\nInstalled nipux\n")
2076
2077    code, lines = _update_checkout(command_runner=runner)
2078
2079    assert code == 0
2080    assert calls == [
2081        (
2082            "/usr/bin/uv",
2083            "tool",
2084            "install",
2085            "--force",
2086            "--upgrade",
2087            "--reinstall",
2088            "--refresh",
2089            "git+https://github.com/nipuxx/agent-cli.git@main",
2090        )
2091    ]
2092    rendered = "\n".join(lines)
2093    assert "not a git checkout" not in rendered
2094    assert "Updating installed Nipux command" in rendered
2095    assert "Nipux command refreshed from source" in rendered
2096    assert "Update complete" in rendered
2097
2098
2099def test_update_checkout_fast_forwards_git_checkout(tmp_path):
2100    repo = tmp_path / "repo"
2101    repo.mkdir()
2102    (repo / ".git").mkdir()
2103    calls: list[tuple[str, ...]] = []
2104    rev_calls = 0
2105
2106    def runner(command, cwd):
2107        nonlocal rev_calls
2108        assert cwd == repo
2109        calls.append(tuple(command))
2110        if command == ["git", "rev-parse", "--show-toplevel"]:
2111            return subprocess.CompletedProcess(command, 0, stdout=str(repo) + "\n")
2112        if command == ["git", "rev-parse", "--short", "HEAD"]:
2113            rev_calls += 1
2114            return subprocess.CompletedProcess(command, 0, stdout=("aaa111\n" if rev_calls == 1 else "bbb222\n"))
2115        if command == ["git", "branch", "--show-current"]:
2116            return subprocess.CompletedProcess(command, 0, stdout="main\n")
2117        if command == ["git", "status", "--porcelain"]:
2118            return subprocess.CompletedProcess(command, 0, stdout="")
2119        if command == ["git", "pull", "--ff-only"]:
2120            return subprocess.CompletedProcess(command, 0, stdout="Fast-forward\n")
2121        raise AssertionError(f"unexpected command: {command}")
2122
2123    code, lines = _update_checkout(path=repo, runner=runner)
2124
2125    assert code == 0
2126    assert ("git", "pull", "--ff-only") in calls
2127    rendered = "\n".join(lines)
2128    assert "Fast-forward" in rendered
2129    assert "aaa111 -> bbb222" in rendered
2130
2131
2132def test_update_checkout_verifies_installed_command(monkeypatch):
2133    monkeypatch.setattr("nipux_cli.updater.find_checkout_root", lambda: None)
2134
2135    def which(name):
2136        if name == "uv":
2137            return "/usr/bin/uv"
2138        if name == "nipux":
2139            return "/Users/me/.local/bin/nipux"
2140        return None
2141
2142    monkeypatch.setattr("shutil.which", which)
2143    calls: list[tuple[str, ...]] = []
2144
2145    def runner(command):
2146        calls.append(tuple(command))
2147        if command == ["/Users/me/.local/bin/nipux", "--version"]:
2148            return subprocess.CompletedProcess(command, 0, stdout="nipux 0.1.0\n")
2149        return subprocess.CompletedProcess(command, 0, stdout="Installed nipux\n")
2150
2151    code, lines = _update_checkout(command_runner=runner)
2152
2153    assert code == 0
2154    assert calls[-1] == ("/Users/me/.local/bin/nipux", "--version")
2155    assert "Verified: nipux 0.1.0" in "\n".join(lines)
2156
2157
2158def test_update_command_reports_no_restart_when_daemon_is_stopped(monkeypatch, tmp_path, capsys):
2159    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2160    monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2161    monkeypatch.setattr(
2162        "nipux_cli.cli.daemon_lock_status",
2163        lambda _path: {"running": False, "metadata": {}},
2164    )
2165
2166    args = build_parser().parse_args(["update"])
2167    args.func(args)
2168
2169    out = capsys.readouterr().out
2170    assert "Update complete." in out
2171    assert "No daemon is running; no restart needed." in out
2172
2173
2174def test_update_command_restarts_running_daemon(monkeypatch, tmp_path, capsys):
2175    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2176    monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2177    monkeypatch.setattr(
2178        "nipux_cli.cli.daemon_lock_status",
2179        lambda _path: {"running": True, "metadata": {"pid": 123}},
2180    )
2181    restarted = {}
2182
2183    def fake_restart(args):
2184        restarted["wait"] = args.wait
2185        restarted["quiet"] = args.quiet
2186        print("restart ok")
2187
2188    monkeypatch.setattr("nipux_cli.cli.cmd_restart", fake_restart)
2189
2190    args = build_parser().parse_args(["update"])
2191    args.func(args)
2192
2193    out = capsys.readouterr().out
2194    assert "Restarting running daemon" in out
2195    assert "restart ok" in out
2196    assert restarted == {"wait": 5.0, "quiet": True}
2197
2198
2199def test_update_command_no_restart_flag_skips_running_daemon(monkeypatch, tmp_path, capsys):
2200    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2201    monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2202    monkeypatch.setattr(
2203        "nipux_cli.cli.daemon_lock_status",
2204        lambda _path: {"running": True, "metadata": {"pid": 123}},
2205    )
2206    monkeypatch.setattr("nipux_cli.cli.cmd_restart", lambda _args: (_ for _ in ()).throw(AssertionError("restart")))
2207
2208    args = build_parser().parse_args(["update", "--no-restart"])
2209    args.func(args)
2210
2211    assert "Daemon restart skipped by --no-restart." in capsys.readouterr().out
2212
2213
2214def test_uninstall_dry_run_removes_installed_tool_by_default(monkeypatch, tmp_path, capsys):
2215    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2216
2217    args = build_parser().parse_args(["uninstall", "--dry-run"])
2218    args.func(args)
2219
2220    out = capsys.readouterr().out
2221    assert "would remove" in out
2222    assert "would run uv tool uninstall nipux" in out
2223    assert "runtime removed" not in out
2224
2225
2226def test_uninstall_keep_tool_skips_tool_removal(monkeypatch, tmp_path, capsys):
2227    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2228
2229    args = build_parser().parse_args(["uninstall", "--dry-run", "--keep-tool"])
2230    args.func(args)
2231
2232    out = capsys.readouterr().out
2233    assert "would remove" in out
2234    assert "uv tool uninstall nipux" not in out
2235
2236
2237def test_uninstall_runtime_skips_missing_systemd_service_without_runner_noise(monkeypatch, tmp_path):
2238    from nipux_cli.uninstall import uninstall_runtime
2239
2240    monkeypatch.setenv("HOME", str(tmp_path))
2241    runtime = tmp_path / ".nipux"
2242    runtime.mkdir()
2243    calls = []
2244
2245    def which(name):
2246        if name == "systemctl":
2247            return "/bin/systemctl"
2248        return None
2249
2250    def runner(command, **_kwargs):
2251        calls.append(command)
2252        return subprocess.CompletedProcess(command, 0, stdout="", stderr="")
2253
2254    monkeypatch.setattr("nipux_cli.uninstall.shutil.which", which)
2255
2256    lines = uninstall_runtime(runtime_home=runtime, dry_run=False, runner=runner)
2257
2258    assert calls == []
2259    rendered = "\n".join(lines)
2260    assert "disabled systemd" not in rendered
2261    assert "no installed service files found" in rendered
2262    assert f"removed {runtime}" in rendered
2263
2264
2265def test_chat_clear_does_not_queue_operator_message(monkeypatch, tmp_path, capsys):
2266    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2267    db = AgentDB(tmp_path / "state.db")
2268    try:
2269        job_id = db.create_job("Research topic", title="research")
2270    finally:
2271        db.close()
2272
2273    assert _chat_handle_line(job_id, "clear") is True
2274
2275    out = capsys.readouterr().out
2276    db = AgentDB(tmp_path / "state.db")
2277    try:
2278        job = db.get_job(job_id)
2279        assert "\033[2J\033[H" in out
2280        assert job["metadata"].get("operator_messages") is None
2281    finally:
2282        db.close()
2283
2284
2285def test_minimal_live_event_line_summarizes_tool_steps():
2286    line = _minimal_live_event_line(
2287        {
2288            "event_type": "tool_call",
2289            "title": "shell_exec",
2290            "body": "",
2291            "metadata": {"input": {"arguments": {"command": "ssh server nvidia-smi"}}},
2292        }
2293    )
2294
2295    assert line == "start shell ssh server nvidia-smi"
2296
2297
2298def test_chat_frame_is_bounded_and_has_composer():
2299    snapshot = {
2300        "job_id": "job_demo",
2301        "job": {
2302            "id": "job_demo",
2303            "title": "demo job",
2304            "objective": "keep a generic long-running job visible",
2305            "status": "running",
2306            "kind": "generic",
2307            "metadata": {"task_queue": [{"status": "active", "title": "Draft next deliverable", "priority": 7}]},
2308        },
2309        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2310        "steps": [
2311            {
2312                "step_no": 3,
2313                "status": "completed",
2314                "kind": "tool",
2315                "tool_name": "web_search",
2316                "summary": "web_search returned sources",
2317            }
2318        ],
2319        "artifacts": [{"id": "art_demo"}],
2320        "memory_entries": [{}],
2321        "events": [
2322            {
2323                "event_type": "agent_message",
2324                "title": "plan",
2325                "body": "I will plan this.\nPlan:\n- one\n- two\nQuestions:\n- answer?",
2326                "metadata": {},
2327            },
2328            {
2329                "event_type": "task",
2330                "title": "internal task",
2331                "body": "internal task body",
2332                "metadata": {},
2333            },
2334            {
2335                "event_type": "tool_result",
2336                "title": "web_search",
2337                "body": "web_search query='demo' returned 1 results",
2338                "metadata": {"status": "completed", "input": {"arguments": {"query": "demo"}}},
2339            }
2340        ],
2341        "daemon": {"running": True, "metadata": {"pid": 123}},
2342        "model": "model/demo",
2343        "base_url": "https://openrouter.ai/api/v1",
2344        "context_length": 8192,
2345        "token_usage": {
2346            "calls": 2,
2347            "latest_prompt_tokens": 4096,
2348            "completion_tokens": 1234,
2349            "total_tokens": 5330,
2350            "cost": 0.0123,
2351            "has_cost": True,
2352        },
2353    }
2354
2355    frame = _build_chat_frame(snapshot, "hello", [], width=100, height=22)
2356    wide_frame = _build_chat_frame(snapshot, "", [], width=140, height=22)
2357
2358    assert len(frame.splitlines()) <= 22
2359    assert "NIPUX" in frame
2360    assert "CHAT" in frame
2361    assert "MODEL UPDATES" in frame
2362    assert "NAV" not in frame
2363    assert "Visible" in frame
2364    assert "#3" not in frame
2365    assert "Jobs" in frame
2366    assert "Recent outcomes" not in frame
2367    assert "ctx" in frame
2368    assert "4.1K/8.2K" in frame
2369    assert "out" in frame
2370    assert "1.2K" in frame
2371    assert "tok" in frame
2372    assert "5.3K" in frame
2373    assert "$0.0123" in frame
2374    assert "NIPUX" in wide_frame
2375    assert "model model/demo  ctx 4.1K/8.2K" in wide_frame
2376    assert "daemon running" not in wide_frame
2377    assert wide_frame.splitlines()[1].startswith("━")
2378    assert "Enter sends" in frame
2379    assert "❯ hello" in frame
2380    jobs = _build_chat_frame(snapshot, "", [], width=100, height=26, right_view="status")
2381    assert "Draft next deli" in jobs
2382
2383    legacy_work = _build_chat_frame(snapshot, "", [], width=100, height=24, right_view="work")
2384    assert "MODEL UPDATES" in legacy_work
2385    assert "Tool / console" not in legacy_work
2386
2387    updates = _build_chat_frame(snapshot, "", [], width=100, height=24, right_view="updates")
2388    assert "MODEL UPDATES" in updates
2389    assert "Visible" in updates
2390
2391    settings = _build_chat_frame(snapshot, "", [], width=120, height=30, modal_view="settings")
2392    assert "Settings" in settings
2393    assert "/model MODEL" in settings
2394    assert "/api-key KEY" in settings
2395    assert "/base-url URL" in settings
2396    assert "Esc closes" in settings
2397    assert "NAV" not in settings
2398
2399    secret = _build_chat_frame(
2400        snapshot,
2401        "secret-value",
2402        [],
2403        width=100,
2404        height=24,
2405        editing_field="secret:model.api_key",
2406    )
2407    assert "Editing API key" in secret
2408    assert "secret-value" not in secret
2409    assert "••••" in secret
2410
2411
2412def test_chat_frame_separates_chat_from_worker_activity():
2413    snapshot = {
2414        "job_id": "job_demo",
2415        "job": {
2416            "id": "job_demo",
2417            "title": "demo job",
2418            "objective": "keep chat separate",
2419            "status": "running",
2420            "kind": "generic",
2421            "metadata": {},
2422        },
2423        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2424        "steps": [],
2425        "artifacts": [],
2426        "memory_entries": [],
2427        "events": [
2428            {"event_type": "operator_message", "body": "start a benchmark job", "metadata": {}},
2429            {"event_type": "agent_message", "title": "chat", "body": "I created the job and started it.", "metadata": {}},
2430            {"event_type": "tool_call", "title": "shell_exec", "body": "", "metadata": {"input": {"arguments": {"command": "python bench.py"}}}},
2431            {"event_type": "tool_result", "title": "shell_exec", "body": "shell_exec rc=0", "metadata": {"status": "completed", "input": {"arguments": {"command": "python bench.py"}}}},
2432        ],
2433        "daemon": {"running": True, "metadata": {"pid": 123}},
2434        "model": "model/demo",
2435    }
2436
2437    frame = _build_chat_frame(snapshot, "", [], width=130, height=24, right_view="updates")
2438    chat_side = frame.split(" │ ", 1)[0]
2439
2440    assert "start a benchmark job" in frame
2441    assert "I created the job" in frame
2442    assert "Tool / console" not in frame
2443    assert "python bench.py" not in frame
2444    assert "python bench.py" not in chat_side
2445
2446
2447def test_chat_frame_empty_state_is_minimal_and_actionable():
2448    snapshot = {
2449        "job_id": "job_demo",
2450        "job": {
2451            "id": "job_demo",
2452            "title": "demo job",
2453            "objective": "keep chat visible",
2454            "status": "running",
2455            "kind": "generic",
2456            "metadata": {},
2457        },
2458        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2459        "steps": [],
2460        "artifacts": [],
2461        "memory_entries": [],
2462        "events": [],
2463        "daemon": {"running": True, "metadata": {"pid": 123}},
2464        "model": "model/demo",
2465    }
2466
2467    frame = _build_chat_frame(snapshot, "", [], width=120, height=28)
2468
2469    assert "NIPUX" in frame
2470    assert "plain English" in frame
2471    assert "/new OBJECTIVE" in frame
2472    assert "/settings" in frame
2473    assert "███" not in frame
2474    assert "No chat yet." not in frame
2475    assert "star..." not in frame
2476
2477
2478def test_frame_emit_skips_unchanged_render(capsys):
2479    first = _emit_frame_if_changed("line one\nline two")
2480    second = _emit_frame_if_changed("frame", first)
2481    third = _emit_frame_if_changed("frame\nline three", second)
2482
2483    out = capsys.readouterr().out
2484    assert first == "line one\nline two"
2485    assert second == "frame"
2486    assert third == "frame\nline three"
2487    assert out.count("\033[H") == 1
2488    assert "\033[1;1H\033[2Kframe" in out
2489    assert "\033[2K" in out
2490    assert "\033[J" not in out
2491
2492
2493def test_chat_frame_does_not_cap_long_agent_messages():
2494    long_reply = (
2495        "**Completed Work:** "
2496        "1. Test suite analysis finished. "
2497        "2. Code analysis findings documented. "
2498        "3. Market readiness gaps identified. "
2499        "4. Packaging risks summarized. "
2500        "5. Daemon reliability checked. "
2501        "6. UI ergonomics reviewed. "
2502        "7. Final recommendation included."
2503    )
2504    snapshot = {
2505        "job_id": "job_demo",
2506        "job": {
2507            "id": "job_demo",
2508            "title": "demo job",
2509            "objective": "keep chat readable",
2510            "status": "running",
2511            "kind": "generic",
2512            "metadata": {},
2513        },
2514        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2515        "steps": [],
2516        "artifacts": [],
2517        "memory_entries": [],
2518        "events": [
2519            {"event_type": "operator_message", "body": "what have you done so far", "metadata": {}},
2520            {"event_type": "agent_message", "title": "chat", "body": long_reply, "metadata": {}},
2521        ],
2522        "daemon": {"running": True, "metadata": {"pid": 123}},
2523        "model": "model/demo",
2524    }
2525
2526    frame = _build_chat_frame(snapshot, "", [], width=118, height=32)
2527
2528    assert "Completed Work:" in frame
2529    assert "Final recommendation included" in frame
2530    assert "…" not in frame
2531
2532
2533def test_plain_chat_control_intents_map_to_commands():
2534    assert _chat_control_command("how is it going?") == "/status"
2535    assert _chat_control_command("what is blocking it?") == "/status"
2536    assert _chat_control_command("check status") == "/status"
2537    assert _chat_control_command("what's happening?") == "/status"
2538    assert _chat_control_command("start working") == "/run"
2539    assert _chat_control_command("run it") == "/run"
2540    assert _chat_control_command("start worker") == "/run"
2541    assert _chat_control_command("pause this job") == "/pause"
2542    assert _chat_control_command("pause the job") == "/pause"
2543    assert _chat_control_command("pause it") == "/pause"
2544    assert _chat_control_command("stop the job") == "/pause"
2545    assert _chat_control_command("stop worker") == "/pause"
2546    assert _chat_control_command("resume the job") == "/resume"
2547    assert _chat_control_command("resume it") == "/resume"
2548    assert _chat_control_command("show jobs") == "/jobs"
2549    assert _chat_control_command("change model") == "/model"
2550    assert _chat_control_command("settings") == "/settings"
2551    assert _chat_control_command("show settings") == "/settings"
2552    assert _chat_control_command("how do I start a job?") == "/help"
2553    assert _chat_control_command("how much did it cost") == "/usage"
2554    assert _chat_control_command("what has it done") == "/outcomes"
2555    assert _chat_control_command("what have you done so far") == "/outcomes"
2556    assert _chat_control_command("what did the model do") == "/outcomes"
2557    assert _chat_control_command("what have all jobs done") == "/outcomes all"
2558    assert _chat_control_command("what files did it create") == "/artifacts"
2559    assert _chat_control_command("show me the saved files") == "/artifacts"
2560    assert _chat_control_command("what tool calls did it run") == "/activity"
2561    assert _chat_control_command("show console output") == "/outputs"
2562    assert _chat_control_command("what tasks are open") == "/tasks"
2563    assert _chat_control_command("show the current plan") == "/roadmap"
2564    assert _chat_control_command("show benchmarks") == "/experiments"
2565    assert _chat_control_command("how many tokens did it use") == "/usage"
2566    assert _chat_control_command("restart daemon") == "/restart"
2567    assert _chat_control_command("prefer artifact-backed findings") == ""
2568
2569
2570def test_plain_chat_classifier_keeps_natural_controls_out_of_model_path():
2571    assert _is_plain_chat_line("hello there") is True
2572    assert _is_plain_chat_line("stop the job") is False
2573    assert _is_plain_chat_line("what has it done") is False
2574    assert _is_plain_chat_line("show me the saved files") is False
2575
2576
2577def test_plain_chat_control_intent_does_not_queue_operator_context(monkeypatch, tmp_path):
2578    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2579    _mark_test_model_ready()
2580    db = AgentDB(tmp_path / "state.db")
2581    try:
2582        job_id = db.create_job("Research topic", title="research")
2583    finally:
2584        db.close()
2585
2586    captured = {}
2587
2588    def fake_capture(job_id_arg, command):
2589        captured["job_id"] = job_id_arg
2590        captured["command"] = command
2591        return True, "status output\n"
2592
2593    monkeypatch.setattr("nipux_cli.cli._capture_chat_command", fake_capture)
2594
2595    keep_running, message = _handle_chat_message(job_id, "how is it going?", quiet=True)
2596
2597    assert keep_running is True
2598    assert message == "status output"
2599    assert captured == {"job_id": job_id, "command": "/status"}
2600    db = AgentDB(tmp_path / "state.db")
2601    try:
2602        job = db.get_job(job_id)
2603        assert job["metadata"].get("operator_messages") is None
2604    finally:
2605        db.close()
2606
2607
2608def test_plain_chat_reply_usage_is_recorded(monkeypatch, tmp_path):
2609    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2610    _mark_test_model_ready()
2611    db = AgentDB(tmp_path / "state.db")
2612    try:
2613        job_id = db.create_job("Research topic", title="research")
2614    finally:
2615        db.close()
2616
2617    keep_running, message = _handle_chat_message(
2618        job_id,
2619        "hello",
2620        quiet=True,
2621        reply_fn=lambda _job_id, _line: LLMResponse(
2622            content="reply",
2623            usage={"prompt_tokens": 120, "completion_tokens": 20, "total_tokens": 140, "cost": 0.001},
2624            model="provider/model",
2625            response_id="gen_chat",
2626        ),
2627    )
2628
2629    db = AgentDB(tmp_path / "state.db")
2630    try:
2631        usage = db.job_token_usage(job_id)
2632        events = db.list_events(job_id=job_id, event_types=["loop"], limit=5)
2633    finally:
2634        db.close()
2635    assert keep_running is True
2636    assert message == ""
2637    assert usage["calls"] == 1
2638    assert usage["prompt_tokens"] == 120
2639    assert usage["completion_tokens"] == 20
2640    assert usage["cost"] == 0.001
2641    assert events[-1]["metadata"]["source"] == "chat"
2642    assert events[-1]["metadata"]["response_id"] == "gen_chat"
2643
2644
2645def test_chat_frame_surfaces_actual_work_events():
2646    snapshot = {
2647        "job_id": "job_demo",
2648        "job": {
2649            "id": "job_demo",
2650            "title": "demo job",
2651            "objective": "produce visible work",
2652            "status": "running",
2653            "kind": "generic",
2654            "metadata": {
2655                "task_queue": [{"status": "open"}],
2656                "roadmap": {"milestones": [{"title": "Draft", "status": "active"}]},
2657            },
2658        },
2659        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2660        "steps": [],
2661        "artifacts": [{"id": "art_demo"}],
2662        "memory_entries": [{}],
2663        "events": [
2664            {"event_type": "operator_message", "body": "please keep improving", "metadata": {"mode": "steer"}},
2665            {"event_type": "tool_call", "title": "web_search", "body": "", "metadata": {"input": {"arguments": {"query": "agent harness distillation"}}}},
2666            {"event_type": "tool_result", "title": "web_search", "body": "web_search query='agent harness distillation' returned 5 results", "metadata": {"status": "completed", "input": {"arguments": {"query": "agent harness distillation"}}}},
2667            {"event_type": "artifact", "title": "Research Paper Draft", "body": "", "metadata": {"summary": "saved first complete draft"}},
2668            {"event_type": "finding", "title": "Distillation finding", "body": "tool traces improve student behavior", "metadata": {}},
2669            {"event_type": "task", "title": "Compare methods", "body": "", "metadata": {"status": "open"}},
2670            {"event_type": "roadmap", "title": "Paper roadmap", "body": "", "metadata": {"status": "active"}},
2671            {"event_type": "milestone_validation", "title": "Draft", "body": "", "metadata": {"validation_status": "passed"}},
2672            {"event_type": "experiment", "title": "Citation coverage check", "body": "", "metadata": {"metric_name": "sources", "metric_value": 18, "metric_unit": ""}},
2673            {"event_type": "lesson", "title": "strategy", "body": "prefer measured updates", "metadata": {}},
2674            {"event_type": "reflection", "title": "reflection", "body": "Reflection through step #10: next branch is evaluation.", "metadata": {}},
2675        ],
2676        "daemon": {"running": True, "metadata": {"pid": 123}},
2677        "model": "model/demo",
2678        "counts": {"steps": 10, "artifacts": 1, "memory": 1},
2679    }
2680
2681    updates = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="updates")
2682    jobs = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="status")
2683    frame = updates + "\n" + jobs
2684
2685    assert "Research Paper Draft" in frame
2686    assert "Distillation finding" in frame
2687    assert "Compare methods" in frame
2688    assert "Paper roadmap" in frame
2689    assert "passed Draft" in frame
2690    assert "Citation coverage check" in frame
2691    assert "LEARN" in frame
2692    assert "strategy" in frame
2693
2694
2695def test_chat_frame_has_model_updates_page():
2696    snapshot = {
2697        "job_id": "job_demo",
2698        "job": {
2699            "id": "job_demo",
2700            "title": "paper job",
2701            "objective": "write a paper",
2702            "status": "running",
2703            "kind": "generic",
2704            "metadata": {},
2705        },
2706        "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
2707        "steps": [],
2708        "artifacts": [],
2709        "memory_entries": [],
2710        "events": [
2711            {"event_type": "tool_result", "title": "web_search", "body": "web_search query='distillation agents' returned 5 results", "metadata": {"status": "completed", "input": {"arguments": {"query": "distillation agents"}}}},
2712            {"event_type": "artifact", "title": "Literature Review Draft", "body": "saved draft", "metadata": {}},
2713            {"event_type": "finding", "title": "Trajectory distillation", "body": "teacher traces improve tool use", "metadata": {}},
2714            {"event_type": "experiment", "title": "Citation density check", "body": "", "metadata": {"metric_name": "citations", "metric_value": 12, "metric_unit": "count"}},
2715            {"event_type": "tool_result", "title": "write_file", "body": "write_file overwrite /tmp/paper.md", "metadata": {"status": "completed", "input": {"arguments": {"path": "/tmp/paper.md"}}, "output": {"path": "/tmp/paper.md"}}},
2716            {"event_type": "tool_result", "title": "shell_exec", "body": "shell_exec rc=0", "metadata": {"status": "completed", "input": {"arguments": {"command": "printf draft | tee /tmp/outline.md"}}}},
2717        ],
2718        "daemon": {"running": True, "metadata": {"pid": 123}},
2719        "model": "model/demo",
2720    }
2721
2722    frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="updates")
2723
2724    assert "MODEL UPDATES" in frame
2725    assert "Page" in frame
2726    assert "1 outputs" in frame
2727    assert "measurements" in frame
2728    assert "Literature Review Draft" in frame
2729    assert "Trajectory distillation" in frame
2730    assert "Citation density check" in frame
2731    assert "paper.md" in frame
2732    assert "outline.md" in frame
2733
2734
2735def test_workspace_status_page_does_not_render_fake_worker_when_no_jobs():
2736    snapshot = {
2737        "job_id": WORKSPACE_CHAT_ID,
2738        "job": {
2739            "id": WORKSPACE_CHAT_ID,
2740            "title": "Nipux",
2741            "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
2742            "status": "ready",
2743            "kind": "workspace",
2744            "metadata": {},
2745        },
2746        "right_job": {
2747            "id": WORKSPACE_CHAT_ID,
2748            "title": "Nipux",
2749            "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
2750            "status": "ready",
2751            "kind": "workspace",
2752            "metadata": {},
2753        },
2754        "right_job_id": WORKSPACE_CHAT_ID,
2755        "jobs": [],
2756        "steps": [],
2757        "artifacts": [],
2758        "job_artifacts": {},
2759        "job_summary_events": {},
2760        "job_counts": {},
2761        "memory_entries": [],
2762        "events": [],
2763        "right_events": [],
2764        "summary_events": [],
2765        "daemon": {"running": False, "metadata": {}},
2766        "model": "model/demo",
2767        "base_url": "http://127.0.0.1:8000/v1",
2768        "context_length": 0,
2769        "token_usage": {},
2770        "counts": {"steps": 0, "artifacts": 0, "memory": 0},
2771    }
2772
2773    frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="status")
2774
2775    assert "No workers yet" in frame
2776    assert "plain English goal" in frame
2777    assert "/new OBJECTIVE" in frame
2778    assert "Chat with Nipux to create" not in frame
2779    assert "actions:0" not in frame
2780
2781
2782def test_status_job_cards_show_durable_work_mix():
2783    events = [
2784        {"event_type": "artifact", "title": "Paper draft", "body": "", "metadata": {}},
2785        {"event_type": "finding", "title": "Method taxonomy", "body": "", "metadata": {}},
2786        {
2787            "event_type": "experiment",
2788            "title": "Citation coverage check",
2789            "body": "",
2790            "metadata": {"metric_name": "citations", "metric_value": 12, "metric_unit": "count"},
2791        },
2792    ]
2793    snapshot = {
2794        "job_id": "job_demo",
2795        "job": {
2796            "id": "job_demo",
2797            "title": "paper job",
2798            "objective": "write a paper",
2799            "status": "running",
2800            "kind": "generic",
2801            "metadata": {},
2802        },
2803        "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
2804        "steps": [],
2805        "artifacts": [{"id": "art_1", "title": "Paper draft"}],
2806        "job_artifacts": {"job_demo": [{"id": "art_1", "title": "Paper draft"}]},
2807        "job_summary_events": {"job_demo": events},
2808        "job_counts": {"job_demo": {"artifacts": 1}},
2809        "memory_entries": [],
2810        "events": events,
2811        "summary_events": events,
2812        "daemon": {"running": True, "metadata": {"pid": 123}},
2813        "model": "model/demo",
2814    }
2815
2816    frame = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="status")
2817
2818    assert "work 1 outputs 1 findings 1 measurements" in frame
2819    assert "made 1 output" in frame
2820    assert "Paper draft" in frame
2821
2822
2823def test_recent_outcome_lines_wrap_long_updates():
2824    lines = recent_model_update_lines(
2825        [
2826            {
2827                "event_type": "finding",
2828                "title": "Trajectory distillation improves agentic tool selection when teacher traces include failures and recovery actions",
2829                "body": "",
2830                "metadata": {},
2831                "created_at": "2026-05-01T12:34:00+00:00",
2832            }
2833        ],
2834        width=62,
2835        limit=4,
2836    )
2837
2838    rendered = "\n".join(lines)
2839    assert len(lines) >= 2
2840    assert "Trajectory distillation improves" in rendered
2841    assert "teacher traces include" in rendered
2842    assert "failures" in rendered
2843
2844
2845def test_recent_outcome_lines_do_not_pretruncate_actual_work():
2846    events = [
2847        {
2848            "event_type": "artifact",
2849            "title": (
2850                "Research paper draft rewritten with a new methods section, expanded evaluation table, "
2851                "and integrated citations from teacher trajectory distillation, agent workflow distillation, "
2852                "and self-improvement harness papers"
2853            ),
2854            "body": "",
2855            "metadata": {},
2856            "created_at": "2026-05-01T12:34:00+00:00",
2857        }
2858    ]
2859
2860    rendered = "\n".join(recent_model_update_lines(events, width=72, limit=6))
2861
2862    assert "methods" in rendered
2863    assert "section" in rendered
2864    assert "integrated" in rendered
2865    assert "citations" in rendered
2866    assert "self-improvement harness" in rendered
2867    assert "papers" in rendered
2868    assert "..." not in rendered
2869
2870
2871def test_chat_updates_page_keeps_updates_to_one_line_each():
2872    long_task = (
2873        "open Publish a concise progress update and keep working on the next useful branch. "
2874        "This task title is deliberately long enough to wrap several times if the compact pane does not constrain it."
2875    )
2876    events = [
2877        {
2878            "event_type": "task",
2879            "title": "Publish progress",
2880            "body": long_task,
2881            "metadata": {"status": "open"},
2882            "created_at": "2026-04-25T20:00:00+00:00",
2883        }
2884    ]
2885
2886    lines = recent_model_update_lines(events, width=56, limit=4, wrap=False)
2887
2888    assert len(lines) == 1
2889    assert "Publish progress" in lines[0]
2890
2891
2892def test_chat_pane_marks_hidden_overflow():
2893    events = [
2894        {
2895            "event_type": "agent_message",
2896            "title": "chat",
2897            "body": " ".join(f"word{i}" for i in range(80)),
2898            "metadata": {},
2899            "created_at": "2026-04-25T12:00:00Z",
2900        }
2901    ]
2902
2903    lines = chat_pane_lines(events, [], width=48, rows=4)
2904
2905    assert "nipux" in lines[0]
2906    assert "middle lines hidden" in "\n".join(lines)
2907    assert "word" in lines[-1]
2908    assert len(lines) == 4
2909
2910
2911def test_chat_pane_groups_multiline_command_output_under_one_label():
2912    lines = chat_pane_lines(
2913        [],
2914        [
2915            "> /help",
2916            "Create: type a goal, or /new OBJECTIVE.\n"
2917            "Run: /run, /pause, /resume. Inspect: /jobs, /outcomes, /artifacts, /activity.\n"
2918            "Config: /settings, /model, /base-url, /api-key. Navigate: ←→ pages, ↑↓ jobs.",
2919        ],
2920        width=78,
2921        rows=12,
2922    )
2923
2924    rendered = "\n".join(lines)
2925    assert rendered.count("nipux") == 1
2926    assert "Create: type a goal" in rendered
2927    assert "Inspect: /jobs" in rendered
2928    assert "Config: /settings" in rendered
2929
2930
2931def test_chat_pane_suppresses_transient_duplicates_after_events_arrive():
2932    events = [
2933        {
2934            "event_type": "operator_message",
2935            "title": "chat",
2936            "body": "Hello",
2937            "metadata": {},
2938            "created_at": "2026-04-25T20:00:00+00:00",
2939        },
2940        {
2941            "event_type": "agent_message",
2942            "title": "chat",
2943            "body": "Hello! I can help with worker jobs.",
2944            "metadata": {},
2945            "created_at": "2026-04-25T20:00:01+00:00",
2946        },
2947    ]
2948
2949    lines = chat_pane_lines(
2950        events,
2951        ["> Hello", "sent; waiting for model", "Hello! I can help with worker jobs."],
2952        width=80,
2953        rows=12,
2954    )
2955
2956    rendered = "\n".join(lines)
2957    assert rendered.count("Hello") == 2
2958    assert "waiting for model" not in rendered
2959
2960
2961def test_chat_pane_hides_persisted_legacy_waiting_notice():
2962    lines = chat_pane_lines(
2963        [
2964            {
2965                "event_type": "operator_message",
2966                "title": "chat",
2967                "body": "Hello",
2968                "metadata": {},
2969                "created_at": "2026-04-25T20:00:00+00:00",
2970            },
2971            {
2972                "event_type": "agent_message",
2973                "title": "chat",
2974                "body": "sent; waiting for model",
2975                "metadata": {},
2976                "created_at": "2026-04-25T20:00:01+00:00",
2977            },
2978            {
2979                "event_type": "agent_message",
2980                "title": "chat",
2981                "body": "Hello! I can help with worker jobs.",
2982                "metadata": {},
2983                "created_at": "2026-04-25T20:00:02+00:00",
2984            },
2985        ],
2986        [],
2987        width=80,
2988        rows=12,
2989    )
2990
2991    rendered = "\n".join(lines)
2992    assert "waiting for model" not in rendered
2993    assert "Hello! I can help with worker jobs." in rendered
2994
2995
2996def test_chat_pane_renders_waiting_notice_as_animation_only():
2997    rendered_notices = _display_chat_notices([_WAITING_NOTICE])
2998    lines = chat_pane_lines([], rendered_notices, width=64, rows=4)
2999
3000    rendered = "\n".join(lines)
3001    assert "AGENT" in rendered
3002    assert "waiting" in rendered
3003    assert "Waiting for the next worker step" not in rendered
3004    assert "waiting for model" not in rendered
3005
3006
3007def test_chat_pane_hides_persisted_worker_waiting_text():
3008    lines = chat_pane_lines(
3009        [
3010            {
3011                "event_type": "operator_message",
3012                "title": "chat",
3013                "body": "what has it done so far?",
3014                "metadata": {},
3015                "created_at": "2026-04-25T20:00:00+00:00",
3016            },
3017            {
3018                "event_type": "agent_message",
3019                "title": "chat",
3020                "body": "waiting for demo job: what has it done so far?",
3021                "metadata": {},
3022                "created_at": "2026-04-25T20:00:01+00:00",
3023            },
3024            {
3025                "event_type": "agent_message",
3026                "title": "chat",
3027                "body": "Waiting for the next worker step.",
3028                "metadata": {},
3029                "created_at": "2026-04-25T20:00:02+00:00",
3030            },
3031        ],
3032        [],
3033        width=80,
3034        rows=12,
3035    )
3036
3037    rendered = "\n".join(lines)
3038    assert "what has it done so far?" in rendered
3039    assert "waiting for demo job" not in rendered
3040    assert "Waiting for the next worker step" not in rendered
3041    assert "NIPUX" not in rendered
3042
3043
3044def test_chat_pane_renders_stored_provider_errors_as_actions():
3045    lines = chat_pane_lines(
3046        [
3047            {
3048                "event_type": "agent_message",
3049                "title": "chat",
3050                "body": "APIConnectionError: Connection error.",
3051                "metadata": {"error": True},
3052                "created_at": "2026-04-25T20:00:00+00:00",
3053            }
3054        ],
3055        [],
3056        width=80,
3057        rows=6,
3058    )
3059
3060    rendered = "\n".join(lines)
3061    assert "APIConnectionError" not in rendered
3062    assert "Model endpoint is unreachable" in rendered
3063    assert "/doctor" in rendered
3064
3065
3066def test_chat_updates_page_uses_deeper_summary_events():
3067    snapshot = {
3068        "job_id": "job_demo",
3069        "job": {
3070            "id": "job_demo",
3071            "title": "paper job",
3072            "objective": "write a paper",
3073            "status": "running",
3074            "kind": "generic",
3075            "metadata": {},
3076        },
3077        "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
3078        "steps": [],
3079        "artifacts": [],
3080        "memory_entries": [],
3081        "events": [
3082            {"event_type": "tool_call", "title": "web_search", "body": "", "metadata": {}},
3083        ],
3084        "summary_events": [
3085            {"event_type": "artifact", "title": "Full Paper Draft", "body": "saved draft", "metadata": {}},
3086            {"event_type": "finding", "title": "Distillation method map", "body": "", "metadata": {}},
3087        ],
3088        "daemon": {"running": True, "metadata": {"pid": 123}},
3089        "model": "model/demo",
3090    }
3091
3092    frame = _build_chat_frame(snapshot, "", [], width=132, height=26, right_view="updates")
3093
3094    assert "Full Paper Draft" in frame
3095    assert "Distillation method map" in frame
3096
3097
3098def test_hourly_outcomes_prioritize_durable_work_over_research_noise():
3099    events = [
3100        {
3101            "event_type": "tool_result",
3102            "title": "web_search",
3103            "body": "web_search query='generic harness patterns' returned 5 results",
3104            "metadata": {"status": "completed", "input": {"arguments": {"query": "generic harness patterns"}}},
3105            "created_at": "2026-05-01T12:05:00+00:00",
3106        },
3107        {
3108            "event_type": "tool_result",
3109            "title": "web_extract",
3110            "body": "web_extract fetched 3/3 pages",
3111            "metadata": {"status": "completed"},
3112            "created_at": "2026-05-01T12:08:00+00:00",
3113        },
3114        {
3115            "event_type": "artifact",
3116            "title": "Harness Architecture Notes",
3117            "body": "saved design notes",
3118            "metadata": {},
3119            "created_at": "2026-05-01T12:20:00+00:00",
3120        },
3121        {
3122            "event_type": "experiment",
3123            "title": "Context budget check",
3124            "body": "",
3125            "metadata": {"metric_name": "prompt_tokens", "metric_value": 4200, "metric_unit": "tokens"},
3126            "created_at": "2026-05-01T12:30:00+00:00",
3127        },
3128    ]
3129
3130    rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3131
3132    assert "2 research" in rendered
3133    assert "1 outputs" in rendered
3134    assert "1 measurements" in rendered
3135    assert "Harness Architecture Notes" in rendered
3136    assert "Context budget check" in rendered
3137    assert "generic harness patterns" not in rendered
3138
3139
3140def test_status_recent_outcomes_hide_research_noise():
3141    events = [
3142        {
3143            "event_type": "tool_result",
3144            "title": "web_search",
3145            "body": "web_search query='generic harness patterns' returned 5 results",
3146            "metadata": {"status": "completed", "input": {"arguments": {"query": "generic harness patterns"}}},
3147            "created_at": "2026-05-01T12:05:00+00:00",
3148        },
3149        {
3150            "event_type": "artifact",
3151            "title": "Harness Architecture Notes",
3152            "body": "saved design notes",
3153            "metadata": {},
3154            "created_at": "2026-05-01T12:20:00+00:00",
3155        },
3156    ]
3157
3158    rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3159
3160    assert "Harness Architecture Notes" in rendered
3161    assert "generic harness patterns" not in rendered
3162
3163
3164def test_status_recent_outcomes_hide_plan_update_noise():
3165    events = [
3166        {
3167            "event_type": "reflection",
3168            "title": "reflection",
3169            "body": "summarized current counts",
3170            "metadata": {},
3171            "created_at": "2026-05-01T12:05:00+00:00",
3172        },
3173        {
3174            "event_type": "agent_message",
3175            "title": "progress",
3176            "body": "Checkpoint at step #100.",
3177            "metadata": {},
3178            "created_at": "2026-05-01T12:08:00+00:00",
3179        },
3180        {
3181            "event_type": "finding",
3182            "title": "Teacher trace distillation pattern",
3183            "body": "",
3184            "metadata": {},
3185            "created_at": "2026-05-01T12:20:00+00:00",
3186        },
3187    ]
3188
3189    rendered = "\n".join(recent_model_update_lines(events, width=96, limit=5))
3190
3191    assert "Teacher trace distillation pattern" in rendered
3192    assert "Checkpoint at step" not in rendered
3193    assert "summarized current counts" not in rendered
3194
3195
3196def test_status_recent_outcomes_show_durable_checkpoint_updates():
3197    events = [
3198        {
3199            "event_type": "agent_message",
3200            "title": "progress",
3201            "body": "Checkpoint step #90: ~1 task updated, 1 task resolved.",
3202            "metadata": {
3203                "updates": {"tasks": 1},
3204                "resolutions": {"tasks": 1},
3205                "deltas": {"findings": 0},
3206            },
3207            "created_at": "2026-05-01T12:08:00+00:00",
3208        }
3209    ]
3210
3211    rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3212
3213    assert "TASK" in rendered
3214    assert "~1 task updated" in rendered
3215    assert "1 task resolved" in rendered
3216    assert "Checkpoint step #90" in rendered
3217
3218
3219def test_status_recent_outcomes_compact_repeated_updates():
3220    events = [
3221        {
3222            "event_type": "agent_message",
3223            "title": "error",
3224            "body": "Model provider requires operator action.",
3225            "metadata": {},
3226            "created_at": f"2026-05-01T12:0{index}:00+00:00",
3227        }
3228        for index in range(3)
3229    ]
3230
3231    rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3232
3233    assert rendered.count("Model provider requires operator action") == 1
3234    assert "x3" in rendered
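
The test above expects three identical updates to render once with an `x3` marker. A minimal sketch of that kind of compaction follows; `compact_repeats` is a hypothetical helper, not the function in `nipux_cli`, and it assumes only consecutive duplicates are merged, whereas the real renderer may also merge across interleaved events.

```python
from itertools import groupby


def compact_repeats(messages: list[str]) -> list[str]:
    """Collapse runs of identical consecutive messages into one line.

    Hypothetical helper: a run of N > 1 equal messages becomes
    "<message> xN"; single messages pass through unchanged.
    """
    lines = []
    for text, run in groupby(messages):
        count = sum(1 for _ in run)
        lines.append(f"{text} x{count}" if count > 1 else text)
    return lines


repeated = ["Model provider requires operator action."] * 3
compacted = compact_repeats(repeated)
```

With the three repeated provider errors from the test, `compacted` holds a single line that contains the message once and ends in `x3`.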
3235
3236
3237def test_hourly_outcomes_hide_plan_update_noise():
3238    events = [
3239        {
3240            "event_type": "reflection",
3241            "title": "reflection",
3242            "body": "summarized current counts",
3243            "metadata": {},
3244            "created_at": "2026-05-01T12:05:00+00:00",
3245        },
3246        {
3247            "event_type": "agent_message",
3248            "title": "progress",
3249            "body": "Checkpoint at step #100.",
3250            "metadata": {},
3251            "created_at": "2026-05-01T12:08:00+00:00",
3252        },
3253        {
3254            "event_type": "artifact",
3255            "title": "Saved research draft",
3256            "body": "",
3257            "metadata": {},
3258            "created_at": "2026-05-01T12:20:00+00:00",
3259        },
3260    ]
3261
3262    rendered = "\n".join(hourly_update_lines(events, width=96, limit=6))
3263
3264    assert "Saved research draft" in rendered
3265    assert "Checkpoint at step" not in rendered
3266    assert "summarized current counts" not in rendered
3267
3268
3269def test_hourly_outcomes_count_durable_checkpoint_updates():
3270    events = [
3271        {
3272            "event_type": "agent_message",
3273            "title": "progress",
3274            "body": "Checkpoint step #110: ~1 experiment updated, 1 experiment resolved.",
3275            "metadata": {
3276                "updates": {"experiments": 1},
3277                "resolutions": {"experiments": 1},
3278            },
3279            "created_at": "2026-05-01T12:08:00+00:00",
3280        }
3281    ]
3282
3283    rendered = "\n".join(hourly_update_lines(events, width=96, limit=6))
3284
3285    assert "1 measurements" in rendered
3286    assert "~1 measurement updated" in rendered
3287    assert "1 measurement resolved" in rendered
3288
3289
3290def test_hourly_outcome_summary_uses_progress_order():
3291    events = [
3292        {
3293            "event_type": "source",
3294            "title": "source scored",
3295            "body": "",
3296            "metadata": {},
3297            "created_at": "2026-05-01T12:01:00+00:00",
3298        },
3299        {
3300            "event_type": "artifact",
3301            "title": "draft saved",
3302            "body": "",
3303            "metadata": {},
3304            "created_at": "2026-05-01T12:02:00+00:00",
3305        },
3306        {
3307            "event_type": "experiment",
3308            "title": "metric checked",
3309            "body": "",
3310            "metadata": {"metric_name": "score", "metric_value": 1, "metric_unit": "point"},
3311            "created_at": "2026-05-01T12:03:00+00:00",
3312        },
3313    ]
3314
3315    rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3316
3317    assert "1 outputs 1 measurements 1 sources" in rendered
3318
3319
3320def test_hourly_outcomes_wrap_long_durable_updates_without_pre_truncation():
3321    events = [
3322        {
3323            "event_type": "finding",
3324            "title": (
3325                "Distillation survey breakthrough: teacher trajectories should include failed tool calls, "
3326                "operator corrections, recovery steps, and measured validation so the student learns the "
3327                "whole harness loop instead of only final answers"
3328            ),
3329            "body": "",
3330            "metadata": {},
3331            "created_at": "2026-05-01T12:05:00+00:00",
3332        },
3333    ]
3334
3335    rendered = "\n".join(hourly_update_lines(events, width=82, limit=6))
3336
3337    assert "operator corrections" in rendered
3338    assert "measured" in rendered
3339    assert "validation" in rendered
3340    assert "only" in rendered
3341    assert "final answers" in rendered
3342    assert "..." not in rendered
3343
3344
3345def test_hourly_outcomes_limit_visible_hours_without_losing_headers():
3346    events = []
3347    for hour in range(8):
3348        events.extend(
3349            [
3350                {
3351                    "event_type": "artifact",
3352                    "title": f"Draft saved hour {hour}",
3353                    "body": "",
3354                    "metadata": {},
3355                    "created_at": f"2026-05-01T{hour:02d}:05:00+00:00",
3356                },
3357                {
3358                    "event_type": "finding",
3359                    "title": f"Finding hour {hour}",
3360                    "body": "",
3361                    "metadata": {},
3362                    "created_at": f"2026-05-01T{hour:02d}:20:00+00:00",
3363                },
3364            ]
3365        )
3366
3367    rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3368
3369    assert "2026-05-01 06:00" in rendered
3370    assert "2026-05-01 07:00" in rendered
3371    assert "Draft saved hour 7" in rendered
3372    assert "Finding hour 7" in rendered
3373    assert "Draft saved hour 0" not in rendered
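
The test above feeds eight hours of events and expects only the newest hours (06:00 and 07:00) to stay visible, each with its header. One way to get that behaviour is to bucket events by hour and keep the newest buckets. This is a sketch under stated assumptions: `latest_hour_buckets` and its `max_hours` parameter are hypothetical, and the real `hourly_update_lines` is driven by a line `limit`, not an hour count.

```python
from collections import defaultdict
from datetime import datetime


def latest_hour_buckets(events: list[dict], max_hours: int = 2) -> dict[str, list[str]]:
    """Group event titles by hour and keep only the newest hours.

    Hypothetical helper: keys look like "2026-05-01 07:00", so sorting
    them as strings is also chronological for a single timezone.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for event in events:
        stamp = datetime.fromisoformat(event["created_at"])
        buckets[stamp.strftime("%Y-%m-%d %H:00")].append(event["title"])
    newest = sorted(buckets)[-max_hours:]
    return {hour: buckets[hour] for hour in newest}


sample_events = [
    {"title": f"Draft saved hour {hour}", "created_at": f"2026-05-01T{hour:02d}:05:00+00:00"}
    for hour in range(8)
]
visible = latest_hour_buckets(sample_events, max_hours=2)
```

Dropping whole buckets rather than individual lines is what keeps each visible hour paired with its header.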
3374
3375
3376def test_chat_updates_page_includes_agent_error_updates():
3377    snapshot = {
3378        "job_id": "job_demo",
3379        "job": {
3380            "id": "job_demo",
3381            "title": "provider job",
3382            "objective": "keep provider state visible",
3383            "status": "paused",
3384            "kind": "generic",
3385            "metadata": {},
3386        },
3387        "jobs": [{"id": "job_demo", "title": "provider job", "status": "paused", "kind": "generic", "metadata": {}}],
3388        "steps": [],
3389        "artifacts": [],
3390        "memory_entries": [],
3391        "events": [],
3392        "summary_events": [
3393            {
3394                "event_type": "agent_message",
3395                "title": "error",
3396                "body": "Model provider requires operator action.",
3397                "metadata": {"reason": "llm_provider_blocked"},
3398            },
3399        ],
3400        "daemon": {"running": True, "metadata": {"pid": 123}},
3401        "model": "model/demo",
3402    }
3403
3404    updates = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="updates")
3405    status = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="status")
3406
3407    assert "Model provider requires" in updates
3408    assert "operator action" in updates
3409    assert "Outcome" in status
3410    assert "Model provider re" in status
3411
3412
3413def test_chat_status_marks_provider_blocked_jobs_before_daemon_retry():
3414    job = {
3415        "id": "job_demo",
3416        "title": "provider job",
3417        "objective": "keep provider state visible",
3418        "status": "running",
3419        "kind": "generic",
3420        "metadata": {"provider_blocked_at": "2026-05-01T00:00:00+00:00"},
3421    }
3422    snapshot = {
3423        "job_id": "job_demo",
3424        "job": job,
3425        "jobs": [job],
3426        "steps": [],
3427        "artifacts": [],
3428        "memory_entries": [],
3429        "events": [],
3430        "summary_events": [],
3431        "daemon": {"running": True, "metadata": {"pid": 123}},
3432        "model": "model/demo",
3433    }
3434
3435    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")
3436
3437    assert "provider wait" in frame
3438    assert "Provider" in frame
3439    assert "action needed" in frame
3440    assert "advancing" not in frame
3441
3442
3443def test_chat_status_page_surfaces_context_pressure():
3444    snapshot = {
3445        "job_id": "job_demo",
3446        "job": {
3447            "id": "job_demo",
3448            "title": "context job",
3449            "objective": "keep context pressure visible",
3450            "status": "running",
3451            "kind": "generic",
3452            "metadata": {},
3453        },
3454        "jobs": [{"id": "job_demo", "title": "context job", "status": "running", "kind": "generic", "metadata": {}}],
3455        "steps": [],
3456        "artifacts": [],
3457        "memory_entries": [],
3458        "events": [],
3459        "daemon": {"running": True, "metadata": {"pid": 123}},
3460        "model": "model/demo",
3461        "context_length": 8192,
3462        "token_usage": {"calls": 3, "latest_prompt_tokens": 7000, "total_tokens": 9000, "completion_tokens": 2000},
3463    }
3464
3465    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")
3466
3467    assert "Context" in frame
3468    assert "7.0K/8.2K" in frame
3469    assert "85%" in frame
3470    assert "high" in frame
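
The expected strings in the test above follow from simple arithmetic: 7000 prompt tokens against a context length of 8192 is about 85%, rendered as `7.0K/8.2K`. The sketch below reproduces that arithmetic. The helper names, the 0.8 threshold for `high`, and the `ok` label are assumptions made for illustration; the source shows only the expected output, not how it is computed.

```python
def compact_count(value: int) -> str:
    """Format a token count compactly, e.g. 7000 -> '7.0K'."""
    if value >= 1000:
        return f"{value / 1000:.1f}K"
    return str(value)


def context_pressure(prompt_tokens: int, context_length: int, high_at: float = 0.8) -> str:
    """Describe how full the context window is.

    `high_at` is an assumed threshold, not a value taken from nipux_cli.
    """
    ratio = prompt_tokens / context_length
    level = "high" if ratio >= high_at else "ok"
    used = compact_count(prompt_tokens)
    total = compact_count(context_length)
    return f"{used}/{total} {ratio:.0%} {level}"


summary = context_pressure(7000, 8192)
```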
3471
3472
3473def test_chat_status_page_surfaces_low_durable_yield():
3474    snapshot = {
3475        "job_id": "job_demo",
3476        "job": {
3477            "id": "job_demo",
3478            "title": "yield job",
3479            "objective": "keep durable progress visible",
3480            "status": "running",
3481            "kind": "generic",
3482            "metadata": {},
3483        },
3484        "jobs": [{"id": "job_demo", "title": "yield job", "status": "running", "kind": "generic", "metadata": {}}],
3485        "steps": [],
3486        "artifacts": [{"id": "art_demo", "title": "Only Saved Output"}],
3487        "memory_entries": [],
3488        "events": [],
3489        "daemon": {"running": True, "metadata": {"pid": 123}},
3490        "model": "model/demo",
3491        "counts": {"steps": 120, "artifacts": 1, "memory": 0},
3492    }
3493
3494    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")
3495
3496    assert "Yield" in frame
3497    assert "watch" in frame
3498    assert "120.0 actions/outcome" in frame
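
The `120.0 actions/outcome` figure expected above is consistent with dividing the 120 steps by the single saved artifact. A minimal sketch of that ratio follows. `actions_per_outcome` is a hypothetical helper, and which counters the real status page treats as outcomes is an assumption here.

```python
def actions_per_outcome(steps: int, outcomes: int) -> str:
    """Return a durable-yield label such as '120.0 actions/outcome'.

    Hypothetical helper: guards against division by zero when a job
    has not produced any durable outcome yet.
    """
    if outcomes <= 0:
        return "no outcomes yet"
    return f"{steps / outcomes:.1f} actions/outcome"


yield_label = actions_per_outcome(steps=120, outcomes=1)
```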
3499
3500
3501def test_chat_status_page_shows_job_outputs():
3502    snapshot = {
3503        "job_id": "job_demo",
3504        "job": {
3505            "id": "job_demo",
3506            "title": "demo job",
3507            "objective": "show created outputs per job",
3508            "status": "running",
3509            "kind": "generic",
3510            "metadata": {},
3511        },
3512        "jobs": [
3513            {"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}},
3514            {"id": "job_other", "title": "other job", "status": "queued", "kind": "generic", "metadata": {}},
3515        ],
3516        "steps": [],
3517        "artifacts": [
3518            {"id": "art_demo", "title": "Primary Saved Draft"},
3519            {"id": "art_second", "title": "Secondary Saved Note"},
3520        ],
3521        "job_artifacts": {
3522            "job_demo": [
3523                {"id": "art_demo", "title": "Primary Saved Draft"},
3524                {"id": "art_second", "title": "Secondary Saved Note"},
3525            ],
3526            "job_other": [{"id": "art_other", "title": "Other Job Deliverable"}],
3527        },
3528        "job_counts": {
3529            "job_demo": {"artifacts": 2},
3530            "job_other": {"artifacts": 4},
3531        },
3532        "job_summary_events": {
3533            "job_demo": [
3534                {"event_type": "artifact", "title": "Primary Saved Draft", "body": "", "metadata": {}},
3535                {"event_type": "experiment", "title": "Primary quality check", "body": "", "metadata": {"metric_name": "score", "metric_value": 8}},
3536            ],
3537            "job_other": [
3538                {"event_type": "finding", "title": "Other job durable finding", "body": "", "metadata": {}},
3539            ],
3540        },
3541        "memory_entries": [],
3542        "events": [],
3543        "summary_events": [
3544            {"event_type": "finding", "title": "Latest durable milestone", "body": "", "metadata": {}},
3545        ],
3546        "daemon": {"running": True, "metadata": {"pid": 123}},
3547        "model": "model/demo",
3548        "counts": {"steps": 0, "artifacts": 1, "memory": 0},
3549    }
3550
3551    frame = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="status")
3552
3553    assert "Jobs" in frame
3554    assert "Latest hour" in frame
3555    assert "1 findings" in frame
3556    assert "Outcome" in frame
3557    assert "Latest durable milestone" in frame
3558    assert "2 outputs" in frame
3559    assert "Primary Saved Draft" in frame
3560    assert "Secondary Saved Note" in frame
3561    assert "Primary quality check" in frame
3562    assert "4 outputs" in frame
3563    assert "Other Job Deliverable" in frame
3564    assert "Other job durable finding" in frame
3565
3566
3567def test_frame_snapshot_keeps_summary_events_durable(monkeypatch, tmp_path):
3568    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3569    db = AgentDB(tmp_path / "state.db")
3570    try:
3571        job_id = db.create_job("keep frame refresh focused", title="focused")
3572        for index in range(30):
3573            db.append_event(
3574                job_id=job_id,
3575                event_type="tool_result",
3576                title="web_search",
3577                body=f"search noise {index}",
3578                metadata={"status": "completed"},
3579            )
3580        db.append_event(
3581            job_id=job_id,
3582            event_type="tool_result",
3583            title="write_file",
3584            body="write_file overwrite /tmp/paper.md",
3585            metadata={"status": "completed", "input": {"arguments": {"path": "/tmp/paper.md"}}, "output": {"path": "/tmp/paper.md"}},
3586        )
3587        db.append_event(
3588            job_id=job_id,
3589            event_type="tool_result",
3590            title="shell_exec",
3591            body="shell_exec rc=0",
3592            metadata={"status": "completed", "input": {"arguments": {"command": "printf draft | tee /tmp/outline.md"}}},
3593        )
3594        db.append_event(job_id=job_id, event_type="artifact", title="Durable Paper Draft", body="", metadata={})
3595        db.append_event(job_id=job_id, event_type="finding", title="Actual finding", body="", metadata={})
3596    finally:
3597        db.close()
3598
3599    snapshot = _load_frame_snapshot(job_id, history_limit=4)
3600    summary_text = "\n".join(str(event.get("title") or event.get("body") or "") for event in snapshot["summary_events"])
3601
3602    assert "Durable Paper Draft" in summary_text
3603    assert "Actual finding" in summary_text
3604    assert "write_file" in summary_text
3605    assert "shell_exec" in summary_text
3606    assert "web_search" not in summary_text
3607    assert "search noise" not in summary_text
3608
3609
3610def test_frame_snapshot_respects_explicit_job_over_saved_focus(monkeypatch, tmp_path):
3611    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3612    db = AgentDB(tmp_path / "state.db")
3613    try:
3614        focused_id = db.create_job("saved focus", title="saved focus")
3615        requested_id = db.create_job("requested focus", title="requested focus")
3616        (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": focused_id}), encoding="utf-8")
3617    finally:
3618        db.close()
3619
3620    snapshot = _load_frame_snapshot(requested_id, history_limit=4)
3621
3622    assert snapshot["job_id"] == requested_id
3623    assert snapshot["job"]["title"] == "requested focus"
3624
3625
3626def test_chat_status_page_marks_deferred_jobs_waiting():
3627    snapshot = {
3628        "job_id": "job_demo",
3629        "job": {
3630            "id": "job_demo",
3631            "title": "deferred job",
3632            "objective": "check a long-running process later",
3633            "status": "running",
3634            "kind": "generic",
3635            "metadata": {"defer_until": "2999-01-01T00:00:00+00:00", "defer_reason": "external process running"},
3636        },
3637        "jobs": [
3638            {
3639                "id": "job_demo",
3640                "title": "deferred job",
3641                "status": "running",
3642                "kind": "generic",
3643                "metadata": {"defer_until": "2999-01-01T00:00:00+00:00", "defer_reason": "external process running"},
3644            }
3645        ],
3646        "steps": [],
3647        "artifacts": [],
3648        "job_artifacts": {},
3649        "memory_entries": [],
3650        "events": [],
3651        "daemon": {"running": True, "metadata": {"pid": 123}},
3652        "model": "model/demo",
3653    }
3654
3655    frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="status")
3656
3657    assert "waiting" in frame
3658    assert "Wait" in frame
3659    assert "next check" in frame
3660    assert "external" in frame
3661    assert "active" not in frame
3662
3663
3664def test_chat_frame_collapses_repeated_failures_and_hides_memory_noise():
3665    repeated_error = {
3666        "event_type": "error",
3667        "title": "llm",
3668        "body": "Error code: 403 - {'error': {'message': 'Key limit exceeded (total limit).'}}",
3669        "metadata": {},
3670    }
3671    snapshot = {
3672        "job_id": "job_demo",
3673        "job": {
3674            "id": "job_demo",
3675            "title": "demo job",
3676            "objective": "stay readable",
3677            "status": "running",
3678            "kind": "generic",
3679            "metadata": {"task_queue": []},
3680        },
3681        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
3682        "steps": [],
3683        "artifacts": [],
3684        "memory_entries": [{}],
3685        "events": [
3686            repeated_error,
3687            {
3688                "event_type": "compaction",
3689                "title": "rolling_state",
3690                "body": "very long compact memory " * 80,
3691                "metadata": {},
3692            },
3693            repeated_error,
3694            repeated_error,
3695        ],
3696        "daemon": {"running": True, "metadata": {"pid": 123}},
3697        "model": "model/demo",
3698        "counts": {"steps": 3, "artifacts": 0, "memory": 1},
3699    }
3700
3701    frame = _build_chat_frame(snapshot, "", [], width=120, height=24, right_view="updates")
3702
3703    assert "3 blocks" in frame
3704    assert "FAIL" in frame
3705    assert "very long compact memory" not in frame
3706
3707
3708def test_work_pane_uses_badges_without_duplicate_action_verbs():
3709    snapshot = {
3710        "job_id": "job_demo",
3711        "job": {
3712            "id": "job_demo",
3713            "title": "demo job",
3714            "objective": "stay readable",
3715            "status": "running",
3716            "kind": "generic",
3717            "metadata": {"task_queue": []},
3718        },
3719        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
3720        "steps": [],
3721        "artifacts": [],
3722        "memory_entries": [],
3723        "events": [
3724            {"event_type": "artifact", "title": "Demo Output", "body": "", "metadata": {}},
3725            {"event_type": "finding", "title": "Demo Finding", "body": "", "metadata": {}},
3726            {"event_type": "experiment", "title": "Demo Measurement", "body": "", "metadata": {}},
3727        ],
3728        "daemon": {"running": True, "metadata": {"pid": 123}},
3729        "model": "model/demo",
3730    }
3731
3732    frame = _build_chat_frame(snapshot, "", [], width=120, height=24, right_view="updates")
3733
3734    assert "Demo Output" in frame
3735    assert "Demo Finding" in frame
3736    assert "TEST" in frame
3737    assert "Demo Measurement" in frame
3738    assert "save saved" not in frame
3739    assert "find finding" not in frame
3740    assert "test experiment" not in frame
3741
3742
3743def test_run_reopens_completed_focused_job(monkeypatch, tmp_path, capsys):
3744    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3745    _mark_test_model_ready()
3746    parser = build_parser()
3747    db = AgentDB(tmp_path / "state.db")
3748    try:
3749        job_id = db.create_job("Keep improving", title="perpetual")
3750        db.update_job_status(job_id, "completed")
3751    finally:
3752        db.close()
3753    started = {}
3754
3755    def fake_start(**kwargs):
3756        started.update(kwargs)
3757
3758    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
3759    args = parser.parse_args(["run", "perpetual", "--no-follow"])
3760
3761    args.func(args)
3762
3763    out = capsys.readouterr().out
3764    db = AgentDB(tmp_path / "state.db")
3765    try:
3766        job = db.get_job(job_id)
3767        assert "focus set: perpetual" in out
3768        assert job["status"] == "queued"
3769        assert job["metadata"]["last_note"] == "reopened from completed by operator run command"
3770        assert started["poll_seconds"] == 0.0
3771    finally:
3772        db.close()
3773
3774
3775def test_run_delegates_unverified_provider_state_to_daemon_start(monkeypatch, tmp_path, capsys):
3776    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3777    parser = build_parser()
3778    db = AgentDB(tmp_path / "state.db")
3779    try:
3780        db.create_job("Keep checking provider recovery", title="provider recovery")
3781    finally:
3782        db.close()
3783    started = {}
3784
3785    def fake_start(**kwargs):
3786        started.update(kwargs)
3787        print("model provider is not ready; starting daemon in recovery monitor mode")
3788
3789    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: [])
3790    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
3791    args = parser.parse_args(["run", "provider recovery", "--no-follow"])
3792
3793    args.func(args)
3794
3795    out = capsys.readouterr().out
3796    assert "Model setup is not verified." not in out
3797    assert "recovery monitor mode" in out
3798    assert started["poll_seconds"] == 0.0
3799
3800
3801def test_run_marks_job_waiting_when_provider_recovery_is_needed(monkeypatch, tmp_path, capsys):
3802    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3803    parser = build_parser()
3804    db = AgentDB(tmp_path / "state.db")
3805    try:
3806        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
3807    finally:
3808        db.close()
3809
3810    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_generation: key limit exceeded"])
3811
3812    def fake_start(**_kwargs):
3813        print("model provider is not ready; starting daemon in recovery monitor mode")
3814
3815    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
3816    args = parser.parse_args(["run", "provider recovery", "--no-follow"])
3817
3818    args.func(args)
3819
3820    out = capsys.readouterr().out
3821    assert "recovery monitor mode" in out
3822    db = AgentDB(tmp_path / "state.db")
3823    try:
3824        job = db.get_job(job_id)
3825        assert job["status"] == "paused"
3826        assert job["metadata"]["provider_blocked_at"]
3827        assert "monitor and resume" in job["metadata"]["last_note"]
3828    finally:
3829        db.close()
3830
3831
3832def test_run_does_not_reopen_already_provider_blocked_job(monkeypatch, tmp_path, capsys):
3833    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3834    parser = build_parser()
3835    db = AgentDB(tmp_path / "state.db")
3836    blocked_at = "2026-05-01T00:00:00+00:00"
3837    try:
3838        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
3839        db.update_job_status(
3840            job_id,
3841            "paused",
3842            metadata_patch={
3843                "provider_blocked_at": blocked_at,
3844                "last_note": "Model provider is unavailable; daemon will monitor and resume this job when calls succeed.",
3845            },
3846        )
3847        event_count = len(db.list_events(job_id=job_id, limit=20))
3848    finally:
3849        db.close()
3850
3851    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_generation: key limit exceeded"])
3852
3853    def fake_start(**_kwargs):
3854        print("model provider is not ready; starting daemon in recovery monitor mode")
3855
3856    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
3857    args = parser.parse_args(["run", "provider recovery", "--no-follow"])
3858
3859    args.func(args)
3860
3861    out = capsys.readouterr().out
3862    assert "recovery monitor mode" in out
3863    db = AgentDB(tmp_path / "state.db")
3864    try:
3865        job = db.get_job(job_id)
3866        assert job["status"] == "paused"
3867        assert job["metadata"]["provider_blocked_at"] == blocked_at
3868        assert "still unavailable" in job["metadata"]["last_note"]
3869        events = db.list_events(job_id=job_id, limit=20)
3870        assert len(events) == event_count
3871        assert all("Reopened from" not in str(event.get("body") or "") for event in events)
3872    finally:
3873        db.close()
3874
3875
3876def test_run_does_not_reopen_job_when_provider_preflight_is_hard_failure(monkeypatch, tmp_path):
3877    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3878    parser = build_parser()
3879    db = AgentDB(tmp_path / "state.db")
3880    try:
3881        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
3882        db.update_job_status(job_id, "paused", metadata_patch={"last_note": "operator paused"})
3883    finally:
3884        db.close()
3885
3886    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_auth: user not found"])
3887    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", lambda **_kwargs: None)
3888    args = parser.parse_args(["run", "provider recovery", "--no-follow"])
3889
3890    args.func(args)
3891
3892    db = AgentDB(tmp_path / "state.db")
3893    try:
3894        job = db.get_job(job_id)
3895        assert job["status"] == "paused"
3896        assert job["metadata"]["last_note"] == "operator paused"
3897    finally:
3898        db.close()
3899
3900
3901def test_create_sets_new_job_as_shell_focus(monkeypatch, tmp_path, capsys):
3902    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3903    _mark_test_model_ready()
3904    parser = build_parser()
3905    args = parser.parse_args(["create", "Research new topic", "--title", "new research", "--kind", "generic"])
3906
3907    args.func(args)
3908    created = capsys.readouterr().out.strip()
3909    assert _run_shell_line("focus") is True
3910
3911    out = capsys.readouterr().out
3912    assert created == "created new research"
3913    assert "new research" in out
3914    assert (tmp_path / "jobs" / "new-research" / "program.md").exists()
3915    db = AgentDB(tmp_path / "state.db")
3916    try:
3917        job = db.get_job("new-research")
3918        assert job["status"] == "queued"
3919        assert job["metadata"]["planning_status"] == "auto_accepted"
3920        assert job["metadata"]["planning"]["questions"]
3921        tasks = job["metadata"]["task_queue"]
3922        assert tasks
3923        assert all(task["output_contract"] for task in tasks)
3924        assert all(task["acceptance_criteria"] for task in tasks)
3925        assert all(task["evidence_needed"] for task in tasks)
3926        assert all(task["stall_behavior"] for task in tasks)
3927    finally:
3928        db.close()
3929
3930
3931def test_commands_accept_unquoted_job_titles_in_shell(monkeypatch, tmp_path, capsys):
3932    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3933    db = AgentDB(tmp_path / "state.db")
3934    try:
3935        db.create_job("Research topic", title="nightly research")
3936    finally:
3937        db.close()
3938
3939    assert _run_shell_line("status nightly research") is True
3940
3941    out = capsys.readouterr().out
3942    assert "focus: nightly research" in out
3943    assert "state: open" in out
3944    assert "job_" not in out
3945
3946
3947def test_shell_stop_job_title_pauses_job_instead_of_stopping_daemon(monkeypatch, tmp_path, capsys):
3948    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3949    db = AgentDB(tmp_path / "state.db")
3950    try:
3951        job_id = db.create_job("Research topic", title="nightly research")
3952        db.update_job_status(job_id, "running")
3953    finally:
3954        db.close()
3955
3956    assert _run_shell_line("stop nightly research") is True
3957
3958    out = capsys.readouterr().out
3959    db = AgentDB(tmp_path / "state.db")
3960    try:
3961        job = db.get_job(job_id)
3962        assert "stopped nightly research" in out
3963        assert job["status"] == "paused"
3964        assert job["metadata"]["last_note"] == "stopped by operator"
3965    finally:
3966        db.close()
3967
3968
3969def test_resume_clears_provider_block_before_retry(monkeypatch, tmp_path, capsys):
3970    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
3971    db = AgentDB(tmp_path / "state.db")
3972    try:
3973        job_id = db.create_job("Research topic", title="nightly research")
3974        db.update_job_status(
3975            job_id,
3976            "paused",
3977            metadata_patch={
3978                "provider_blocked_at": "2026-05-01T00:00:00+00:00",
3979                "defer_until": "2999-01-01T00:00:00+00:00",
3980                "defer_reason": "waiting for a monitor interval",
3981                "defer_next_action": "check later",
3982            },
3983        )
3984    finally:
3985        db.close()
3986
3987    main(["resume", "nightly research"])
3988
3989    out = capsys.readouterr().out
3990    db = AgentDB(tmp_path / "state.db")
3991    try:
3992        job = db.get_job(job_id)
3993        assert "resumed nightly research" in out
3994        assert job["status"] == "queued"
3995        assert job["metadata"]["provider_blocked_at"] == ""
3996        assert job["metadata"]["provider_unblocked_at"]
3997        assert job["metadata"]["defer_until"] == ""
3998        assert job["metadata"]["defer_reason"] == ""
3999        assert job["metadata"]["defer_next_action"] == ""
4000    finally:
4001        db.close()
4002
4003
4004def test_shell_cancel_prefers_multiword_job_title_over_note(monkeypatch, tmp_path, capsys):
4005    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4006    db = AgentDB(tmp_path / "state.db")
4007    try:
4008        job_id = db.create_job("Research topic", title="nightly research")
4009        db.update_job_status(job_id, "running")
4010    finally:
4011        db.close()
4012
4013    assert _run_shell_line("cancel nightly research") is True
4014
4015    out = capsys.readouterr().out
4016    db = AgentDB(tmp_path / "state.db")
4017    try:
4018        job = db.get_job(job_id)
4019        assert "cancelled nightly research" in out
4020        assert ": finder" not in out
4021        assert job["status"] == "cancelled"
4022        assert "last_note" not in job["metadata"]
4023    finally:
4024        db.close()
4025
4026
4027def test_shell_pause_splits_note_after_longest_matching_job_title(monkeypatch, tmp_path, capsys):
4028    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4029    db = AgentDB(tmp_path / "state.db")
4030    try:
4031        job_id = db.create_job("Research topic", title="nightly research")
4032        db.update_job_status(job_id, "running")
4033    finally:
4034        db.close()
4035
4036    assert _run_shell_line("pause nightly research checking costs") is True
4037
4038    out = capsys.readouterr().out
4039    db = AgentDB(tmp_path / "state.db")
4040    try:
4041        job = db.get_job(job_id)
4042        assert "paused nightly research: checking costs" in out
4043        assert job["status"] == "paused"
4044        assert job["metadata"]["last_note"] == "checking costs"
4045    finally:
4046        db.close()
4047
4048
4049def test_chat_handle_line_adds_operator_message(monkeypatch, tmp_path, capsys):
4050    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4051    _mark_test_model_ready()
4052    db = AgentDB(tmp_path / "state.db")
4053    try:
4054        job_id = db.create_job("Research topic", title="nightly research")
4055    finally:
4056        db.close()
4057
4058    assert (
4059        _chat_handle_line(
4060            job_id, "prefer artifact-backed findings", reply_fn=lambda _job_id, _message: "Okay, I will focus there."
4061        )
4062        is True
4063    )
4064
4065    out = capsys.readouterr().out
4066    db = AgentDB(tmp_path / "state.db")
4067    try:
4068        job = db.get_job(job_id)
4069        assert "waiting:" in out
4070        assert "Okay, I will focus there." in out
4071        assert job["metadata"]["operator_messages"][-1]["source"] == "chat"
4072        assert job["metadata"]["operator_messages"][-1]["message"] == "prefer artifact-backed findings"
4073        assert job["metadata"]["last_agent_update"]["category"] == "chat"
4074    finally:
4075        db.close()
4076
4077
4078def test_chat_can_spawn_new_job_from_plain_message(monkeypatch, tmp_path, capsys):
4079    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4080    _mark_test_model_ready()
4081    db = AgentDB(tmp_path / "state.db")
4082    try:
4083        original_id = db.create_job("Research topic", title="nightly research")
4084    finally:
4085        db.close()
4086    started = {}
4087
4088    def fake_start(**kwargs):
4089        started.update(kwargs)
4090
4091    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4092
4093    assert (
4094        _chat_handle_line(
4095            original_id,
4096            "create a job to monitor nightly benchmarks and report regressions",
4097            reply_fn=lambda _job_id, _message: "should not call model",
4098        )
4099        is True
4100    )
4101
4102    out = capsys.readouterr().out
4103    db = AgentDB(tmp_path / "state.db")
4104    try:
4105        jobs = db.list_jobs()
4106        assert len(jobs) == 2
4107        created = [job for job in jobs if job["id"] != original_id][0]
4108        assert "monitor nightly benchmarks" in created["objective"]
4109        assert created["status"] == "queued"
4110        assert created["metadata"]["planning_status"] == "auto_accepted"
4111        assert "should not call model" not in out
4112        assert "Created job" in out
4113        assert "Started worker" in out
4114        assert started["poll_seconds"] == 0.0
4115        assert started["quiet"] is True
4116    finally:
4117        db.close()
4118
4119
4120def test_workspace_chat_can_create_refined_worker_job(monkeypatch, tmp_path):
4121    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4122    _mark_test_model_ready()
4123    started = {}
4124
4125    def fake_start(**kwargs):
4126        started.update(kwargs)
4127
4128    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4129    monkeypatch.setattr(
4130        "nipux_cli.cli._refine_job_objective_for_worker",
4131        lambda *, message, objective: f"{objective}\n\nRefined durable objective with success criteria and artifacts.",
4132    )
4133
4134    ok, message = _handle_chat_message(
4135        WORKSPACE_CHAT_ID,
4136        "create a job to research browser automation libraries",
4137        quiet=True,
4138    )
4139
4140    assert ok is True
4141    assert "Created worker job" in message
4142    assert started["quiet"] is True
4143    db = AgentDB(tmp_path / "state.db")
4144    try:
4145        jobs = db.list_jobs()
4146        assert len(jobs) == 1
4147        assert "Refined durable objective" in jobs[0]["objective"]
4148        snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
4149        bodies = "\n".join(str(event.get("body") or "") for event in snapshot["events"])
4150        assert "create a job" in bodies
4151        assert "Created worker job" in bodies
4152    finally:
4153        db.close()
4154
4155
4156def test_workspace_chat_start_objective_creates_worker_without_model_reply(monkeypatch, tmp_path):
4157    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4158    _mark_test_model_ready()
4159    started = {}
4160
4161    def fake_start(**kwargs):
4162        started.update(kwargs)
4163
4164    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4165    monkeypatch.setattr(
4166        "nipux_cli.cli._refine_job_objective_for_worker",
4167        lambda *, message, objective: objective,
4168    )
4169
4170    ok, message = _handle_workspace_chat_message("start research browser automation libraries", quiet=True)
4171
4172    assert ok is True
4173    assert "Created worker job" in message
4174    assert started["quiet"] is True
4175    db = AgentDB(tmp_path / "state.db")
4176    try:
4177        jobs = db.list_jobs()
4178        assert len(jobs) == 1
4179        assert jobs[0]["title"] == "research browser automation libraries"
4180        assert _read_shell_state().get("focus_job_id") == jobs[0]["id"]
4181    finally:
4182        db.close()
4183
4184
4185def test_workspace_chat_accepts_natural_worker_and_task_phrasing(monkeypatch, tmp_path):
4186    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4187    _mark_test_model_ready()
4188    started = {}
4189
4190    def fake_start(**kwargs):
4191        started.setdefault("calls", []).append(kwargs)
4192
4193    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4194    monkeypatch.setattr(
4195        "nipux_cli.cli._refine_job_objective_for_worker",
4196        lambda *, message, objective: objective,
4197    )
4198
4199    ok, worker_message = _handle_workspace_chat_message(
4200        "spin up a worker to monitor docs and report changes",
4201        quiet=True,
4202    )
4203    ok2, task_message = _handle_workspace_chat_message(
4204        "run a task to audit onboarding and write a report",
4205        quiet=True,
4206    )
4207
4208    assert ok is True
4209    assert ok2 is True
4210    assert "Created worker job" in worker_message
4211    assert "Created worker job" in task_message
4212    assert len(started["calls"]) == 2
4213    db = AgentDB(tmp_path / "state.db")
4214    try:
4215        titles = [job["title"] for job in db.list_jobs()]
4216        assert "monitor docs and report changes" in titles
4217        assert "audit onboarding and write a report" in titles
4218    finally:
4219        db.close()
4220
4221
4222def test_chat_can_queue_new_job_without_starting(monkeypatch, tmp_path, capsys):
4223    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4224    _mark_test_model_ready()
4225    db = AgentDB(tmp_path / "state.db")
4226    try:
4227        original_id = db.create_job("Research topic", title="nightly research")
4228    finally:
4229        db.close()
4230    started = {}
4231
4232    def fake_start(**kwargs):
4233        started.update(kwargs)
4234
4235    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4236
4237    assert (
4238        _chat_handle_line(
4239            original_id,
4240            "create only a job to monitor nightly benchmarks and report regressions",
4241            reply_fn=lambda _job_id, _message: "should not call model",
4242        )
4243        is True
4244    )
4245
4246    out = capsys.readouterr().out
4247    assert "Created job" in out
4248    assert "Started worker" not in out
4249    assert started == {}
4250
4251
4252def test_chat_can_spawn_generic_deliverable_job_from_plain_message(monkeypatch, tmp_path):
4253    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4254    _mark_test_model_ready()
4255    db = AgentDB(tmp_path / "state.db")
4256    try:
4257        original_id = db.create_job("Research topic", title="nightly research")
4258    finally:
4259        db.close()
4260    started = {}
4261
4262    def fake_start(**kwargs):
4263        started.update(kwargs)
4264
4265    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4266
4267    assert (
4268        _chat_handle_line(
4269            original_id,
4270            "generate a polished launch checklist for this repository",
4271            reply_fn=lambda _job_id, _message: "should not call model",
4272        )
4273        is True
4274    )
4275
4276    db = AgentDB(tmp_path / "state.db")
4277    try:
4278        jobs = db.list_jobs()
4279        assert len(jobs) == 2
4280        created = [job for job in jobs if job["id"] != original_id][0]
4281        assert "launch checklist" in created["objective"]
4282        assert started["quiet"] is True
4283    finally:
4284        db.close()
4285
4286
4287def test_chat_start_job_message_starts_daemon(monkeypatch, tmp_path, capsys):
4288    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4289    _mark_test_model_ready()
4290    db = AgentDB(tmp_path / "state.db")
4291    try:
4292        original_id = db.create_job("Research topic", title="nightly research")
4293    finally:
4294        db.close()
4295    started = {}
4296
4297    def fake_start(**kwargs):
4298        started.update(kwargs)
4299
4300    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4301
4302    assert (
4303        _chat_handle_line(
4304            original_id,
4305            "start a job to monitor nightly benchmarks and report regressions",
4306            reply_fn=lambda _job_id, _message: "should not call model",
4307        )
4308        is True
4309    )
4310
4311    out = capsys.readouterr().out
4312    assert started["poll_seconds"] == 0.0
4313    assert started["quiet"] is True
4314    assert "Started worker" in out
4315
4316
4317def test_chat_create_job_and_run_it_starts_daemon(monkeypatch, tmp_path, capsys):
4318    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4319    _mark_test_model_ready()
4320    db = AgentDB(tmp_path / "state.db")
4321    try:
4322        original_id = db.create_job("Research topic", title="nightly research")
4323    finally:
4324        db.close()
4325    started = {}
4326
4327    def fake_start(**kwargs):
4328        started.update(kwargs)
4329
4330    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4331
4332    assert (
4333        _chat_handle_line(
4334            original_id,
4335            "create a job to monitor nightly benchmarks and then run it",
4336            reply_fn=lambda _job_id, _message: "should not call model",
4337        )
4338        is True
4339    )
4340
4341    out = capsys.readouterr().out
4342    assert started["poll_seconds"] == 0.0
4343    assert started["quiet"] is True
4344    assert "Started worker" in out
4345
4346
4347def test_chat_jobs_command_lists_jobs_instead_of_steering(monkeypatch, tmp_path, capsys):
4348    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4349    db = AgentDB(tmp_path / "state.db")
4350    try:
4351        job_id = db.create_job("Research topic", title="nightly research")
4352    finally:
4353        db.close()
4354
4355    assert _chat_handle_line(job_id, "/jobs", reply_fn=lambda _job_id, _message: "should not run") is True
4356
4357    out = capsys.readouterr().out
4358    db = AgentDB(tmp_path / "state.db")
4359    try:
4360        job = db.get_job(job_id)
4361        assert "nightly research" in out
4362        assert "should not run" not in out
4363        assert job["metadata"].get("operator_messages") is None
4364    finally:
4365        db.close()
4366
4367
4368def test_chat_command_inside_chat_is_not_queued(monkeypatch, tmp_path, capsys):
4369    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4370    db = AgentDB(tmp_path / "state.db")
4371    try:
4372        job_id = db.create_job("Research topic", title="nightly research")
4373    finally:
4374        db.close()
4375
4376    assert (
4377        _chat_handle_line(job_id, 'chat "nightly research"', reply_fn=lambda _job_id, _message: "should not run")
4378        is True
4379    )
4380
4381    out = capsys.readouterr().out
4382    db = AgentDB(tmp_path / "state.db")
4383    try:
4384        job = db.get_job(job_id)
4385        assert "already chatting with nightly research" in out
4386        assert "should not run" not in out
4387        assert job["metadata"].get("operator_messages") is None
4388    finally:
4389        db.close()
4390
4391
4392def test_chat_run_accepts_initial_plan_before_starting(monkeypatch, tmp_path):
4393    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4394    _mark_test_model_ready()
4395    parser = build_parser()
4396    args = parser.parse_args(["create", "Research new topic", "--title", "new research", "--kind", "generic"])
4397    args.func(args)
4398    job_id = "new-research"
4399    captured = {}
4400
4401    def fake_run(run_args):
4402        captured["job_id"] = run_args.job_id
4403
4404    monkeypatch.setattr("nipux_cli.cli.cmd_run", fake_run)
4405
4406    assert _chat_handle_line(job_id, "/run") is True
4407
4408    db = AgentDB(tmp_path / "state.db")
4409    try:
4410        job = db.get_job(job_id)
4411        assert job["status"] == "queued"
4412        assert captured["job_id"] == job_id
4413    finally:
4414        db.close()
4415
4416
4417def test_run_without_jobs_does_not_start_empty_daemon(monkeypatch, tmp_path, capsys):
4418    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4419    _mark_test_model_ready()
4420    started = {}
4421
4422    def fake_start(**kwargs):
4423        started.update(kwargs)
4424
4425    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
4426
4427    assert _run_shell_line("run") is True
4428
4429    out = capsys.readouterr().out
4430    assert "No jobs found. Create one with /new OBJECTIVE." in out
4431    assert started == {}
4432
4433
4434def test_build_chat_messages_includes_recent_job_state(tmp_path):
4435    db = AgentDB(tmp_path / "state.db")
4436    try:
4437        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
4438        db.create_job("Monitor another branch", title="other branch", kind="generic")
4439        run_id = db.start_run(job_id, model="fake")
4440        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
4441        db.finish_step(step_id, status="completed", summary="web_search returned useful sources")
4442        job = db.get_job(job_id)
4443
4444        messages = _build_chat_messages(db, job, "what is going on?")
4445
4446        content = messages[-1]["content"]
4447        assert "Job title: nightly research" in content
4448        assert "Jobs:" in content
4449        assert "* 1. nightly research" in content
4450        assert "- 2. other branch" in content
4451        assert "web_search returned useful sources" in content
4452        assert "what is going on?" in content
4453    finally:
4454        db.close()
4455
4456
4457def test_build_chat_messages_includes_durable_outcome_summary(tmp_path):
4458    db = AgentDB(tmp_path / "state.db")
4459    try:
4460        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
4461        db.append_event(job_id=job_id, event_type="artifact", title="First draft", body="saved report", metadata={})
4462        db.append_event(job_id=job_id, event_type="finding", title="Evidence map", body="", metadata={})
4463        db.append_event(
4464            job_id=job_id,
4465            event_type="experiment",
4466            title="Citation coverage",
4467            body="",
4468            metadata={"metric_name": "citations", "metric_value": 12, "metric_unit": "count"},
4469        )
4470        job = db.get_job(job_id)
4471
4472        messages = _build_chat_messages(db, job, "what has it actually done?")
4473
4474        content = messages[-1]["content"]
4475        assert "Durable outcomes:" in content
4476        assert "summary: 1 outputs 1 findings 1 measurements" in content
4477        assert "save: First draft" in content
4478        assert "find: Evidence map" in content
4479        assert "test: Citation coverage" in content
4480    finally:
4481        db.close()
4482
4483
4484def test_build_chat_messages_does_not_include_local_machine_context(monkeypatch, tmp_path):
4485    monkeypatch.setenv("HOME", str(tmp_path))
4486    ssh_dir = tmp_path / ".ssh"
4487    ssh_dir.mkdir()
4488    (ssh_dir / "config").write_text("Host private-box\n  HostName 10.9.8.7\n  User private\n", encoding="utf-8")
4489    db = AgentDB(tmp_path / "state.db")
4490    try:
4491        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
4492        job = db.get_job(job_id)
4493
4494        messages = _build_chat_messages(db, job, "what is going on?")
4495
4496        content = messages[-1]["content"]
4497        assert "Local CLI context" not in content
4498        assert "private-box" not in content
4499        assert "10.9.8.7" not in content
4500    finally:
4501        db.close()
4502
4503
4504def test_build_chat_messages_points_to_artifact_and_lessons(tmp_path):
4505    db = AgentDB(tmp_path / "state.db")
4506    try:
4507        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
4508        run_id = db.start_run(job_id, model="fake")
4509        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
4510        ArtifactStore(tmp_path, db=db).write_text(
4511            job_id=job_id,
4512            run_id=run_id,
4513            step_id=step_id,
4514            title="Findings Batch",
4515            summary="15 reusable findings",
4516            content="Acme",
4517        )
4518        db.append_lesson(job_id, "Prefer actual evidence sources over low-evidence pages.", category="strategy")
4519        job = db.get_job(job_id)
4520
4521        messages = _build_chat_messages(db, job, "where are the findings?")
4522
4523        content = messages[-1]["content"]
4524        assert "/artifact 1" in content
4525        assert "Prefer actual evidence sources over low-evidence pages" in content
4526    finally:
4527        db.close()
4528
4529
4530def test_build_chat_messages_clip_large_visible_state(tmp_path):
4531    db = AgentDB(tmp_path / "state.db")
4532    try:
4533        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
4534        for index in range(30):
4535            db.append_event(
4536                job_id=job_id,
4537                event_type="finding",
4538                title=f"large finding {index}",
4539                body="evidence " * 400,
4540                metadata={},
4541            )
4542        job = db.get_job(job_id)
4543
4544        messages = _build_chat_messages(db, job, "keep this exact operator question visible")
4545
4546        content = messages[-1]["content"]
4547        assert len(content) < 14_000
4548        assert "clipped" in content
4549        assert "keep this exact operator question visible" in content
4550    finally:
4551        db.close()
4552
4553
4554def test_artifact_command_resolves_title_query(monkeypatch, tmp_path, capsys):
4555    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4556    db = AgentDB(tmp_path / "state.db")
4557    try:
4558        job_id = db.create_job("Research topic", title="nightly research")
4559        run_id = db.start_run(job_id, model="fake")
4560        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
4561        ArtifactStore(tmp_path, db=db).write_text(
4562            job_id=job_id,
4563            run_id=run_id,
4564            step_id=step_id,
4565            title="Findings Batch",
4566            summary="saved findings",
4567            content="Acme Corp\n",
4568        )
4569    finally:
4570        db.close()
4571
4572    parser = build_parser()
4573    args = parser.parse_args(["artifact", "Findings", "Batch"])
4574    args.func(args)
4575
4576    out = capsys.readouterr().out
4577    assert "artifact: Findings Batch" in out
4578    assert "Acme Corp" in out
4579
4580
4581def test_artifacts_command_prints_compact_view_command(monkeypatch, tmp_path, capsys):
4582    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4583    db = AgentDB(tmp_path / "state.db")
4584    try:
4585        job_id = db.create_job("Research topic", title="nightly research")
4586        run_id = db.start_run(job_id, model="fake")
4587        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
4588        ArtifactStore(tmp_path, db=db).write_text(
4589            job_id=job_id,
4590            run_id=run_id,
4591            step_id=step_id,
4592            title="Findings Batch",
4593            summary="saved findings",
4594            content="Acme Corp\n",
4595        )
4596    finally:
4597        db.close()
4598
4599    parser = build_parser()
4600    args = parser.parse_args(["artifacts"])
4601    args.func(args)
4602
4603    out = capsys.readouterr().out
4604    assert "saved outputs nightly research" in out
4605    assert "view: artifact 1" in out
4606    assert "/jobs/" not in out
4607
4608
4609def test_artifact_command_opens_recent_output_by_number(monkeypatch, tmp_path, capsys):
4610    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4611    db = AgentDB(tmp_path / "state.db")
4612    try:
4613        job_id = db.create_job("Research topic", title="nightly research")
4614        run_id = db.start_run(job_id, model="fake")
4615        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
4616        ArtifactStore(tmp_path, db=db).write_text(
4617            job_id=job_id,
4618            run_id=run_id,
4619            step_id=step_id,
4620            title="Findings Batch",
4621            summary="saved findings",
4622            content="Acme Corp\n",
4623        )
4624    finally:
4625        db.close()
4626
4627    parser = build_parser()
4628    args = parser.parse_args(["artifact", "1"])
4629    args.func(args)
4630
4631    out = capsys.readouterr().out
4632    assert "artifact: Findings Batch" in out
4633    assert "Acme Corp" in out
4634
4635
4636def test_chat_work_defaults_to_compact_output(monkeypatch, tmp_path):
4637    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4638    db = AgentDB(tmp_path / "state.db")
4639    try:
4640        job_id = db.create_job("Research topic", title="nightly research")
4641    finally:
4642        db.close()
4643    captured = {}
4644
4645    def fake_work(args):
4646        captured["verbose"] = args.verbose
4647        captured["chars"] = args.chars
4648
4649    monkeypatch.setattr("nipux_cli.cli.cmd_work", fake_work)
4650
4651    assert _chat_handle_line(job_id, "/work") is True
4652
4653    assert captured == {"verbose": False, "chars": 260}
4654
4655
4656def test_chat_learn_adds_lesson(monkeypatch, tmp_path, capsys):
4657    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4658    db = AgentDB(tmp_path / "state.db")
4659    try:
4660        job_id = db.create_job("Research topic", title="nightly research")
4661    finally:
4662        db.close()
4663
4664    assert _chat_handle_line(job_id, "/learn low-evidence pages are not research findings") is True
4665
4666    out = capsys.readouterr().out
4667    db = AgentDB(tmp_path / "state.db")
4668    try:
4669        job = db.get_job(job_id)
4670        assert "learned for nightly research" in out
4671        assert job["metadata"]["last_lesson"]["lesson"] == "low-evidence pages are not research findings"
4672    finally:
4673        db.close()
4674
4675
4676def test_chat_follow_queues_follow_up_message(monkeypatch, tmp_path, capsys):
4677    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4678    db = AgentDB(tmp_path / "state.db")
4679    try:
4680        job_id = db.create_job("Research topic", title="nightly research")
4681    finally:
4682        db.close()
4683
4684    assert _chat_handle_line(job_id, "/follow after this branch, check another source") is True
4685
4686    out = capsys.readouterr().out
4687    db = AgentDB(tmp_path / "state.db")
4688    try:
4689        job = db.get_job(job_id)
4690        message = job["metadata"]["operator_messages"][-1]
4691        assert "waiting after current branch" in out
4692        assert message["mode"] == "follow_up"
4693        assert message["message"] == "after this branch, check another source"
4694    finally:
4695        db.close()
4696
4697
4698def test_findings_sources_memory_metrics_commands(monkeypatch, tmp_path, capsys):
4699    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4700    db = AgentDB(tmp_path / "state.db")
4701    try:
4702        job_id = db.create_job("Research topic", title="nightly research")
4703        db.append_finding_record(job_id, name="Acme Finding", category="example category", score=0.8)
4704        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
4705        db.append_experiment_record(job_id, title="Variant A", status="measured", metric_name="score", metric_value=1.5)
4706        db.append_source_record(job_id, "https://example.com", usefulness_score=0.9, yield_count=1)
4707        db.append_lesson(job_id, "Source indexes work.", category="strategy")
4708        db.append_reflection(job_id, "Keep using source indexes.", strategy="Try primary records.")
4709        db.append_memory_graph_records(
4710            job_id,
4711            nodes=[
4712                {
4713                    "key": "source-indexes-work",
4714                    "kind": "strategy",
4715                    "title": "Source indexes work",
4716                    "summary": "Use source indexes when they produce durable records.",
4717                    "salience": 0.8,
4718                    "confidence": 0.9,
4719                }
4720            ],
4721        )
4722    finally:
4723        db.close()
4724
4725    parser = build_parser()
4726    for command in (["findings"], ["tasks"], ["experiments"], ["sources"], ["memory"], ["metrics"]):
4727        args = parser.parse_args(command)
4728        args.func(args)
4729
4730    out = capsys.readouterr().out
4731    assert "Acme Finding" in out
4732    assert "Explore primary sources" in out
4733    assert "Variant A" in out
4734    assert "https://example.com" in out
4735    assert "Keep using source indexes" in out
4736    assert "graph_nodes=1" in out
4737    assert "Source indexes work" in out
4738    assert "tasks: 1" in out
4739    assert "experiments: 1" in out
4740    assert "findings: 1" in out
4741
4742
4743def test_memory_graph_html_command_writes_clickable_artifact(monkeypatch, tmp_path, capsys):
4744    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4745    db = AgentDB(tmp_path / "state.db")
4746    try:
4747        job_id = db.create_job("Research topic", title="nightly research")
4748        db.append_memory_graph_records(
4749            job_id,
4750            nodes=[
4751                {
4752                    "key": "validated-loop",
4753                    "kind": "skill",
4754                    "status": "stable",
4755                    "title": "Validated loop",
4756                    "summary": "Check progress against measured evidence before expanding scope.",
4757                    "tags": ["validation", "progress"],
4758                    "evidence_refs": ["artifact:report"],
4759                    "salience": 0.9,
4760                    "confidence": 0.8,
4761                },
4762                {
4763                    "key": "open-risk",
4764                    "kind": "question",
4765                    "status": "open",
4766                    "title": "Open risk",
4767                    "summary": "Needs another validation pass.",
4768                },
4769            ],
4770            edges=[{"from_key": "validated-loop", "to_key": "open-risk", "relation": "raises"}],
4771        )
4772    finally:
4773        db.close()
4774
4775    args = build_parser().parse_args(["memory", "--graph"])
4776    args.func(args)
4777
4778    out = capsys.readouterr().out
4779    assert "memory graph written:" in out
4780    db = AgentDB(tmp_path / "state.db")
4781    try:
4782        artifacts = db.list_artifacts(job_id)
4783        assert artifacts[0]["type"] == "html"
4784        html = Path(artifacts[0]["path"]).read_text(encoding="utf-8")
4785        assert "<canvas id=\"graph\"" in html
4786        assert "click a node" in html
4787        assert "Validated loop" in html
4788        assert "open-risk" in html
4789    finally:
4790        db.close()
4791
4792
4793def test_shell_natural_update_phrase_shows_updates(monkeypatch, tmp_path, capsys):
4794    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4795    db = AgentDB(tmp_path / "state.db")
4796    try:
4797        db.create_job("Research topic", title="research")
4798    finally:
4799        db.close()
4800
4801    assert _run_shell_line("tell me updates") is True
4802    assert _run_shell_line("show outcomes") is True
4803
4804    out = capsys.readouterr().out
4805    assert "updates" in out
4806    assert "queued for" not in out
4807
4808
4809def test_updates_command_summarizes_durable_outcomes(monkeypatch, tmp_path, capsys):
4810    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4811    db = AgentDB(tmp_path / "state.db")
4812    try:
4813        job_id = db.create_job("Research topic", title="research")
4814        artifact_path = tmp_path / "artifact.md"
4815        artifact_path.write_text("saved", encoding="utf-8")
4816        db.append_event(
4817            job_id,
4818            event_type="tool_call",
4819            title="web_search",
4820            metadata={"input": {"arguments": {"query": "raw search"}}},
4821        )
4822        db.append_finding_record(job_id, name="Durable Result", category="evidence", reason="real outcome", score=0.7)
4823        db.add_artifact(
4824            job_id=job_id,
4825            path=artifact_path,
4826            sha256="abc",
4827            artifact_type="text",
4828            title="Saved Report",
4829            summary="durable output",
4830        )
4831    finally:
4832        db.close()
4833
4834    args = build_parser().parse_args(["updates", "research", "--limit", "3", "--chars", "120"])
4835    args.func(args)
4836
4837    out = capsys.readouterr().out
4838    assert "outcomes by hour:" in out
4839    assert "Durable Result" in out
4840    assert "Saved Report" in out
4841    assert "latest saved outputs:" in out
4842    assert "raw tool stream: activity" in out
4843    assert "recent tool calls:" not in out
4844
4845
4846def test_updates_all_summarizes_durable_work_across_jobs(monkeypatch, tmp_path, capsys):
4847    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4848    db = AgentDB(tmp_path / "state.db")
4849    try:
4850        first_id = db.create_job("Research first topic", title="first")
4851        second_id = db.create_job("Research second topic", title="second")
4852        first_path = tmp_path / "first.md"
4853        first_path.write_text("first", encoding="utf-8")
4854        second_path = tmp_path / "second.md"
4855        second_path.write_text("second", encoding="utf-8")
4856        db.append_finding_record(first_id, name="First durable finding", category="evidence")
4857        db.add_artifact(
4858            job_id=first_id,
4859            path=first_path,
4860            sha256="abc",
4861            artifact_type="text",
4862            title="First saved output",
4863            summary="first summary",
4864        )
4865        db.append_experiment_record(
4866            second_id,
4867            title="Second measured result",
4868            status="measured",
4869            metric_name="quality",
4870            metric_value=9,
4871            metric_unit="points",
4872        )
4873        db.add_artifact(
4874            job_id=second_id,
4875            path=second_path,
4876            sha256="def",
4877            artifact_type="text",
4878            title="Second saved output",
4879            summary="second summary",
4880        )
4881    finally:
4882        db.close()
4883
4884    args = build_parser().parse_args(["outcomes", "--all", "--limit", "5", "--chars", "120"])
4885    args.func(args)
4886
4887    out = capsys.readouterr().out
4888    assert "outcomes all jobs | 2 tracked" in out
4889    assert "first |" in out
4890    assert "second |" in out
4891    assert "First durable finding" in out
4892    assert "First saved output" in out
4893    assert "Second measured result" in out
4894    assert "Second saved output" in out
4895
4896
4897def test_history_and_events_commands_render_visible_timeline(monkeypatch, tmp_path, capsys):
4898    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4899    db = AgentDB(tmp_path / "state.db")
4900    try:
4901        job_id = db.create_job("Research topic", title="research")
4902        db.append_operator_message(job_id, "operator timeline note", source="test")
4903        db.append_agent_update(job_id, "agent timeline note", category="chat")
4904    finally:
4905        db.close()
4906
4907    parser = build_parser()
4908    parser.parse_args(["history", "research"]).func(parser.parse_args(["history", "research"]))
4909    parser.parse_args(["events", "research"]).func(parser.parse_args(["events", "research"]))
4910
4911    out = capsys.readouterr().out
4912    assert "history research" in out
4913    assert "events research" in out
4914    assert "operator timeline note" in out
4915    assert "agent timeline note" in out
4916
4917
4918def test_shell_natural_health_phrase_shows_health(monkeypatch, tmp_path, capsys):
4919    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4920    db = AgentDB(tmp_path / "state.db")
4921    try:
4922        db.create_job("Research topic", title="research")
4923    finally:
4924        db.close()
4925
4926    assert _run_shell_line("is it running") is True
4927
4928    out = capsys.readouterr().out
4929    assert "Nipux Health" in out
4930    assert "queued for" not in out
4931
4932
4933def test_health_prints_recent_daemon_events(monkeypatch, tmp_path, capsys):
4934    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4935
4936    config = load_config()
4937    append_daemon_event(config, "daemon_error", error_type="RuntimeError", error="provider fell over")
4938
4939    parser = build_parser()
4940    args = parser.parse_args(["health", "--limit", "3"])
4941    args.func(args)
4942
4943    out = capsys.readouterr().out
4944    assert "Nipux Health" in out
4945    assert "daemon_error" in out
4946    assert "RuntimeError" in out
4947
4948
4949def test_launch_agent_plist_contains_daemon_command(monkeypatch, tmp_path):
4950    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4951
4952    plist = _launch_agent_plist(poll_seconds=7, quiet=True)
4953
4954    assert "com.nipux.agent" in plist
4955    assert "<string>daemon</string>" in plist
4956    assert "<string>--poll-seconds</string>" in plist
4957    assert "<string>7</string>" in plist
4958    assert str(tmp_path) in plist
4959
4960
4961def test_systemd_service_text_contains_daemon_command(monkeypatch, tmp_path):
4962    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
4963
4964    service = _systemd_service_text(poll_seconds=0, quiet=True)
4965
4966    assert "[Service]" in service
4967    assert "ExecStart=" in service
4968    assert "daemon --poll-seconds 0" in service
4969    assert f"Environment=NIPUX_HOME={tmp_path}" in service
4970    assert "Restart=always" in service
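The systemd test above only checks a handful of substrings in the generated unit. A minimal sketch of a renderer that would satisfy those checks follows. `render_service_text` is a hypothetical helper, and the `ExecStart` command line is an assumption modeled on the `python -m nipux_cli.cli daemon` invocation that appears elsewhere in the tests; the real `_systemd_service_text` may build its command differently.

```python
import sys
from pathlib import Path


def render_service_text(home: Path, poll_seconds: float, quiet: bool = True) -> str:
    """Render a user-level systemd unit (hypothetical helper, not nipux_cli's own)."""
    # Assumed command line: the current interpreter running the CLI module.
    command = f"{sys.executable} -m nipux_cli.cli daemon --poll-seconds {poll_seconds}"
    if quiet:
        command += " --quiet"
    lines = [
        "[Unit]",
        "Description=Nipux agent daemon",
        "",
        "[Service]",
        # The daemon locates its state directory through NIPUX_HOME.
        f"Environment=NIPUX_HOME={home}",
        f"ExecStart={command}",
        # Ask systemd to restart the daemon whenever it exits.
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=default.target",
    ]
    return "\n".join(lines) + "\n"
```

Building the unit from a list of lines keeps each directive easy to assert on individually, which matches the substring-style checks the test uses.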
tests/nipux_cli/test_cli_model_preflight.py 86 lines
   1from types import SimpleNamespace
   2
   3from nipux_cli.cli import _ensure_remote_model_ready_for_worker, build_parser
   4from nipux_cli.doctor import Check
   5
   6
   7def _config(base_url: str):
   8    return SimpleNamespace(
   9        model=SimpleNamespace(
  10            model="provider/model",
  11            base_url=base_url,
  12            api_key="",
  13            api_key_env="TEST_API_KEY",
  14        )
  15    )
  16
  17
  18def test_remote_model_preflight_blocks_rejected_auth(monkeypatch, capsys):
  19    def fake_doctor(*, config, check_model):
  20        assert check_model is True
  21        return [Check("model_auth", False, "OpenRouter rejected API key: User not found")]
  22
  23    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
  24
  25    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=False) is False
  26
  27    out = capsys.readouterr().out
  28    assert "model is not ready; daemon not started" in out
  29    assert "model_auth: OpenRouter rejected API key" in out
  30    assert "doctor --check-model" in out
  31
  32
  33def test_remote_model_preflight_allows_recovery_monitor_for_quota(monkeypatch, capsys):
  34    def fake_doctor(*, config, check_model):
  35        assert check_model is True
  36        return [Check("model_generation", False, "Key limit exceeded (total limit) (code=403)")]
  37
  38    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
  39
  40    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=False) is True
  41
  42    out = capsys.readouterr().out
  43    assert "recovery monitor mode" in out
  44    assert "Key limit exceeded" in out
  45
  46
  47def test_remote_model_preflight_skips_fake_runs(monkeypatch):
  48    def fake_doctor(*, config, check_model):
  49        raise AssertionError("fake runs should not need remote model auth")
  50
  51    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
  52
  53    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=True) is True
  54
  55
  56def test_model_preflight_checks_local_endpoints(monkeypatch):
  57    called = {}
  58
  59    def fake_doctor(*, config, check_model):
  60        called["check_model"] = check_model
  61        return []
  62
  63    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
  64
  65    assert _ensure_remote_model_ready_for_worker(_config("http://localhost:11434/v1"), fake=False) is True
  66    assert called["check_model"] is True
  67
  68
  69def test_start_does_not_spawn_daemon_when_model_preflight_fails(monkeypatch, tmp_path):
  70    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
  71    checked = {}
  72
  73    def fake_ready(config, *, fake):
  74        checked["fake"] = fake
  75        return False
  76
  77    def fake_popen(*args, **kwargs):
  78        raise AssertionError("daemon should not spawn when model preflight fails")
  79
  80    monkeypatch.setattr("nipux_cli.cli._ensure_remote_model_ready_for_worker", fake_ready)
  81    monkeypatch.setattr("subprocess.Popen", fake_popen)
  82
  83    args = build_parser().parse_args(["start", "--quiet"])
  84    args.func(args)
  85
  86    assert checked["fake"] is False
tests/nipux_cli/test_compression.py 101 lines
   1from nipux_cli.compression import refresh_memory_index
   2from nipux_cli.db import AgentDB
   3
   4
   5def test_refresh_memory_index_includes_durable_progress_ledgers(tmp_path):
   6    db = AgentDB(tmp_path / "state.db")
   7    try:
   8        job_id = db.create_job(
   9            "Keep improving a report",
  10            title="report",
  11            metadata={
  12                "task_queue": [
  13                    {
  14                        "title": "Draft evidence-backed section",
  15                        "status": "active",
  16                        "priority": 10,
  17                        "output_contract": "report",
  18                    }
  19                ],
  20                "finding_ledger": [{"name": "Teacher traces improve tool use"}],
  21                "source_ledger": [{"source": "https://example.test/paper", "usefulness_score": 0.8}],
  22                "experiment_ledger": [
  23                    {
  24                        "title": "Citation density check",
  25                        "status": "measured",
  26                        "metric_name": "citations",
  27                        "metric_value": 12,
  28                        "metric_unit": "count",
  29                    }
  30                ],
  31                "roadmap": {
  32                    "title": "Research paper roadmap",
  33                    "status": "active",
  34                    "current_milestone": "Improve literature review",
  35                },
  36                "memory_graph": {
  37                    "nodes": [
  38                        {
  39                            "title": "Validated evidence loop",
  40                            "kind": "strategy",
  41                            "status": "active",
  42                            "summary": "Evidence-backed checkpoints should drive the next branch.",
  43                            "salience": 0.9,
  44                        }
  45                    ],
  46                    "edges": [
  47                        {
  48                            "from_key": "validated-evidence-loop",
  49                            "to_key": "research-paper-roadmap",
  50                            "relation": "supports",
  51                        }
  52                    ],
  53                },
  54                "pending_measurement_obligation": {
  55                    "source_step_no": 42,
  56                    "tool": "shell_exec",
  57                    "summary": "benchmark output needs accounting",
  58                    "metric_candidates": ["latency 120ms", "throughput 9 req/s"],
  59                },
  60            },
  61        )
  62        db.append_event(
  63            job_id,
  64            event_type="loop",
  65            title="message_end",
  66            metadata={
  67                "usage": {
  68                    "prompt_tokens": 1200,
  69                    "completion_tokens": 300,
  70                    "total_tokens": 1500,
  71                    "estimated": True,
  72                    "context_length": 1600,
  73                    "context_fraction": 0.75,
  74                }
  75            },
  76        )
  77
  78        refresh_memory_index(db, job_id)
  79
  80        memory = db.list_memory(job_id)[0]["summary"]
  81        assert "Durable progress ledgers:" in memory
  82        assert "tasks=1" in memory
  83        assert "findings=1" in memory
  84        assert "sources=1" in memory
  85        assert "experiments=1" in memory
  86        assert "memory_nodes=1" in memory
  87        assert "Validated evidence loop" in memory
  88        assert "memory_links=1" in memory
  89        assert "Draft evidence-backed section" in memory
  90        assert "Citation density check" in memory
  91        assert "Teacher traces improve tool use" in memory
  92        assert "Research paper roadmap" in memory
  93        assert "pending_measurement step=#42 tool=shell_exec" in memory
  94        assert "latency 120ms" in memory
  95        assert "Model usage:" in memory
  96        assert "total_tokens=1.5K" in memory
  97        assert "estimated_calls=1" in memory
  98        assert "context_pressure" in memory
  99        assert "latest_context=1.2K/1.6K" in memory
 100    finally:
 101        db.close()
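The usage assertions above expect compact token counts such as `total_tokens=1.5K` and `latest_context=1.2K/1.6K`. A minimal sketch of that kind of formatting follows. `compact_count` and `context_summary` are hypothetical names, and the rounding rule (one decimal place, `K` and `M` suffixes) is an assumption inferred from the expected strings, not the actual `nipux_cli.compression` code.

```python
def compact_count(value: int) -> str:
    """Format a count compactly, e.g. 1500 -> '1.5K' (hypothetical helper)."""
    if value >= 1_000_000:
        return f"{value / 1_000_000:.1f}M"
    if value >= 1_000:
        return f"{value / 1_000:.1f}K"
    return str(value)


def context_summary(prompt_tokens: int, context_length: int) -> str:
    """Describe how much of the context window the latest prompt used."""
    return f"latest_context={compact_count(prompt_tokens)}/{compact_count(context_length)}"


def context_fraction(prompt_tokens: int, context_length: int) -> float:
    """Fraction of the context window in use; 0.0 when the length is unknown."""
    if context_length <= 0:
        return 0.0
    return prompt_tokens / context_length
```

With the values from the test event (1200 prompt tokens in a 1600-token window), the fraction works out to the 0.75 stored in the event metadata.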
tests/nipux_cli/test_config.py 143 lines
   1from pathlib import Path
   2
   3from nipux_cli.config import DEFAULT_CONTEXT_LENGTH, default_config_yaml, load_config
   4
   5
   6def _mode(path):
   7    return path.stat().st_mode & 0o777
   8
   9
  10def test_load_config_defaults_to_local_endpoint(tmp_path, monkeypatch):
  11    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
  12
  13    config = load_config()
  14
  15    assert config.runtime.home == tmp_path
  16    assert config.model.model == "local-model"
  17    assert config.model.base_url == "http://localhost:8000/v1"
  18    assert config.model.api_key_env == "OPENAI_API_KEY"
  19    assert config.model.context_length == DEFAULT_CONTEXT_LENGTH
  20    assert config.runtime.state_db_path == tmp_path / "state.db"
  21    assert config.runtime.daily_digest_enabled is True
  22    assert config.runtime.daily_digest_time == "08:00"
  23    assert config.runtime.max_job_cost_usd is None
  24    assert config.tools.browser is True
  25    assert config.tools.web is True
  26    assert config.tools.shell is True
  27    assert config.tools.files is True
  28
  29
  30def test_load_config_from_yaml(tmp_path, monkeypatch):
  31    monkeypatch.setenv("NIPUX_HOME", str(tmp_path / "home"))
  32    cfg = tmp_path / "config.yaml"
  33    cfg.write_text(
  34        """
  35model:
  36  name: local-test
  37  base_url: http://127.0.0.1:9999/v1/
  38  context_length: 12345
  39  input_cost_per_million: 0.1
  40  output_cost_per_million: 0.2
  41runtime:
  42  home: ./agent-home
  43  max_step_seconds: 42
  44  max_job_cost_usd: 12.5
  45  daily_digest_enabled: false
  46  daily_digest_time: "07:30"
  47tools:
  48  browser: false
  49  web: true
  50  shell: false
  51  files: true
  52email:
  53  enabled: true
  54  to_addr: kai@example.com
  55""",
  56        encoding="utf-8",
  57    )
  58
  59    config = load_config(cfg)
  60
  61    assert config.model.model == "local-test"
  62    assert config.model.base_url == "http://127.0.0.1:9999/v1"
  63    assert config.model.context_length == 12345
  64    assert config.model.input_cost_per_million == 0.1
  65    assert config.model.output_cost_per_million == 0.2
  66    assert config.runtime.home == Path("./agent-home")
  67    assert config.runtime.max_step_seconds == 42
  68    assert config.runtime.max_job_cost_usd == 12.5
  69    assert config.runtime.daily_digest_enabled is False
  70    assert config.runtime.daily_digest_time == "07:30"
  71    assert config.tools.browser is False
  72    assert config.tools.web is True
  73    assert config.tools.shell is False
  74    assert config.tools.files is True
  75    assert config.email.enabled is True
  76    assert config.email.to_addr == "kai@example.com"
  77
  78
  79def test_load_config_reads_local_env_file(tmp_path, monkeypatch):
  80    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
  81    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
  82    (tmp_path / ".env").write_text("OPENROUTER_API_KEY" + "=secret-test-key\n", encoding="utf-8")
  83    (tmp_path / "config.yaml").write_text(
  84        """
  85model:
  86  name: provider/test-model
  87  base_url: https://openrouter.ai/api/v1
  88  api_key_env: OPENROUTER_API_KEY
  89""",
  90        encoding="utf-8",
  91    )
  92
  93    config = load_config()
  94
  95    assert config.model.api_key == "secret-test-key"
  96
  97
  98def test_load_config_tightens_local_env_permissions(tmp_path, monkeypatch):
  99    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
 100    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
 101    env_path = tmp_path / ".env"
 102    env_path.write_text("OPENROUTER_API_KEY" + "=secret-test-key\n", encoding="utf-8")
 103    env_path.chmod(0o644)
 104
 105    load_config()
 106
 107    assert _mode(env_path) == 0o600
 108
 109
 110def test_default_config_yaml_allows_provider_template_without_secret():
 111    text = default_config_yaml(
 112        model="provider/model",
 113        base_url="https://openrouter.ai/api/v1/",
 114        api_key_env="OPENROUTER_API_KEY",
 115        context_length=8192,
 116    )
 117
 118    assert "name: provider/model" in text
 119    assert "base_url: https://openrouter.ai/api/v1" in text
 120    assert "api_key_env: OPENROUTER_API_KEY" in text
 121    assert "context_length: 8192" in text
 122    assert "input_cost_per_million: null" in text
 123    assert "output_cost_per_million: null" in text
 124    assert "max_job_cost_usd: null" in text
 125    assert "tools:" in text
 126    assert "browser: true" in text
 127    assert "shell: true" in text
 128    assert "sk-" not in text
 129
 130
 131def test_config_example_matches_default_local_endpoint():
 132    root = Path(__file__).resolve().parents[2]
 133    text = (root / "config.example.yaml").read_text(encoding="utf-8")
 134
 135    assert "name: local-model" in text
 136    assert "base_url: http://localhost:8000/v1" in text
 137    assert "api_key_env: OPENAI_API_KEY" in text
 138    assert "input_cost_per_million: null" in text
 139    assert "output_cost_per_million: null" in text
 140    assert "max_job_cost_usd: null" in text
 141    assert "tools:" in text
 142    assert "browser: true" in text
 143    assert "shell: true" in text
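`test_load_config_tightens_local_env_permissions` above expects a world-readable `.env` file to end up at mode `0o600` after `load_config()` runs. A minimal sketch of that tightening step follows, assuming a POSIX filesystem. `tighten_env_permissions` is a hypothetical helper; how `load_config` actually performs the change is not shown in the tests.

```python
import stat
from pathlib import Path


def tighten_env_permissions(env_path: Path) -> bool:
    """Restrict a secrets file to owner read/write; return True if the mode changed."""
    if not env_path.exists():
        return False
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    current = stat.S_IMODE(env_path.stat().st_mode)
    if current == 0o600:
        return False
    env_path.chmod(0o600)
    return True
```

Returning whether the mode changed lets a caller log the fix once instead of on every config load.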
tests/nipux_cli/test_daemon.py 560 lines
   1from datetime import datetime, timedelta, timezone
   2import json
   3import threading
   4import time
   5
   6import pytest
   7
   8from nipux_cli.config import AppConfig, RuntimeConfig
   9from nipux_cli.daemon import (
  10    Daemon,
  11    DaemonAlreadyRunning,
  12    RUNTIME_CODE_FILES,
  13    append_daemon_event,
  14    current_runtime_fingerprint,
  15    daemon_lock_status,
  16    read_daemon_events,
  17    runtime_stale,
  18    single_instance_lock,
  19    update_lock_metadata,
  20    _exception_backoff,
  21    _parse_retry_after,
  22    _step_failure_backoff,
  23)
  24from nipux_cli.daemon_control import stop_daemon_process_impl
  25from nipux_cli.db import AgentDB
  26from nipux_cli.worker import StepExecution
  27from nipux_cli.doctor import Check
  28
  29
  30def test_single_instance_lock_rejects_second_holder(tmp_path):
  31    lock_path = tmp_path / "agentd.lock"
  32    with single_instance_lock(lock_path):
  33        with pytest.raises(DaemonAlreadyRunning):
  34            with single_instance_lock(lock_path):
  35                pass
  36
  37
  38def test_daemon_lock_status_reports_free_lock(tmp_path):
  39    status = daemon_lock_status(tmp_path / "agentd.lock")
  40
  41    assert status["running"] is False
  42    assert status["detail"] == "daemon lock is free"
  43
  44
  45def test_lock_metadata_can_be_updated_while_held(tmp_path):
  46    lock_path = tmp_path / "agentd.lock"
  47    with single_instance_lock(lock_path) as handle:
  48        update_lock_metadata(handle, last_state="step", consecutive_failures=2)
  49        status = daemon_lock_status(lock_path)
  50
  51    assert status["running"] is True
  52    assert status["metadata"]["last_state"] == "step"
  53    assert status["metadata"]["consecutive_failures"] == 2
  54    assert status["metadata"]["pid"]
  55    assert status["metadata"]["started_at"]
  56
  57
  58def test_lock_metadata_update_restores_missing_process_fields(tmp_path):
  59    lock_path = tmp_path / "agentd.lock"
  60    with single_instance_lock(lock_path) as handle:
  61        handle.seek(0)
  62        handle.truncate()
  63        handle.write(json.dumps({"last_state": "idle"}))
  64        handle.flush()
  65
  66        update_lock_metadata(handle, last_state="step")
  67        status = daemon_lock_status(lock_path)
  68
  69    assert status["running"] is True
  70    assert status["metadata"]["pid"]
  71    assert status["metadata"]["started_at"]
  72    assert status["metadata"]["last_state"] == "step"
  73
  74
  75def test_daemon_lock_heartbeat_updates_while_worker_turn_runs(monkeypatch, tmp_path):
  76    monkeypatch.setattr("nipux_cli.daemon.WORK_HEARTBEAT_INTERVAL_SECONDS", 0.01)
  77    monkeypatch.setattr("nipux_cli.daemon.signal.getsignal", lambda _sig: None)
  78    monkeypatch.setattr("nipux_cli.daemon.signal.signal", lambda _sig, _handler: None)
  79
  80    class SlowDaemon(Daemon):
  81        def run_once(self, *, fake: bool = False, verbose: bool = False):  # noqa: ARG002
  82            time.sleep(0.2)
  83            return None
  84
  85    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
  86    db = AgentDB(tmp_path / "state.db")
  87    try:
  88        daemon = SlowDaemon(config=config, db=db)
  89        thread = threading.Thread(
  90            target=daemon.run_forever,
  91            kwargs={"poll_seconds": 0, "quiet": True, "max_iterations": 1},
  92            daemon=True,
  93        )
  94        thread.start()
  95        seen_working: dict | None = None
  96        deadline = time.time() + 1.0
  97        while time.time() < deadline:
  98            status = daemon_lock_status(tmp_path / "agentd.lock")
  99            metadata = status.get("metadata") or {}
 100            if status.get("running") and metadata.get("last_state") == "working":
 101                seen_working = metadata
 102                break
 103            time.sleep(0.01)
 104
 105        thread.join(timeout=2.0)
 106
 107        assert seen_working is not None
 108        assert seen_working["last_heartbeat"]
 109        assert seen_working["runtime"]["runtime_hash"]
 110        assert not thread.is_alive()
 111    finally:
 112        db.close()
 113
 114
 115def test_stop_daemon_recovers_pidless_lock_from_process_list(tmp_path, monkeypatch):
 116    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 117    lock_path = tmp_path / "agentd.lock"
 118    killed = []
 119
 120    class PsResult:
 121        returncode = 0
 122        stdout = " 99999 /venv/bin/python -m nipux_cli.cli daemon --poll-seconds 0.0 --quiet\n"
 123
 124    monkeypatch.setattr("nipux_cli.daemon_control.subprocess.run", lambda *_args, **_kwargs: PsResult())
 125    monkeypatch.setattr("nipux_cli.daemon_control.os.kill", lambda pid, sig: killed.append((pid, sig)))
 126
 127    with single_instance_lock(lock_path) as handle:
 128        handle.seek(0)
 129        handle.truncate()
 130        handle.write(json.dumps({"last_state": "idle"}))
 131        handle.flush()
 132
 133        stopped = stop_daemon_process_impl(config, wait=0.1, quiet=True, pid_alive=lambda _pid: False)
 134
 135    assert stopped is True
 136    assert killed and killed[0][0] == 99999
 137
 138
 139def test_daemon_lock_status_detects_stale_runtime(tmp_path):
 140    lock_path = tmp_path / "agentd.lock"
 141    with single_instance_lock(lock_path) as handle:
 142        update_lock_metadata(handle, runtime={"runtime_hash": "old"})
 143        status = daemon_lock_status(lock_path)
 144
 145    assert status["running"] is True
 146    assert status["stale"] is True
 147    assert status["current_runtime"]["code_hash"]
 148    assert status["current_runtime"]["code_mtime"]
 149    assert runtime_stale({"runtime": {"runtime_hash": "old"}}) is True
 150    assert runtime_stale({"runtime": current_runtime_fingerprint()}) is False
 151
 152
 153def test_runtime_fingerprint_tracks_progress_code():
 154    assert "progress.py" in RUNTIME_CODE_FILES
 155    assert "parser_builder.py" in RUNTIME_CODE_FILES
 156
 157
 158def test_rate_limit_backoff_uses_retry_after_header():
 159    class RateLimit(Exception):
 160        status_code = 429
 161        response = type("Response", (), {"headers": {"Retry-After": "42"}})()
 162
 163    assert _exception_backoff(RateLimit("too many requests"), poll_seconds=0, consecutive_failures=1) == 42
 164
 165
 166def test_rate_limit_backoff_has_conservative_fallback():
 167    class RateLimit(Exception):
 168        status_code = 429
 169
 170    assert _exception_backoff(RateLimit("rate limit exceeded"), poll_seconds=0, consecutive_failures=1) == 10
 171
 172
 173def test_failed_step_provider_config_error_uses_normal_backoff():
 174    result = StepExecution(
 175        job_id="job",
 176        run_id="run",
 177        step_id="step",
 178        tool_name=None,
 179        status="failed",
 180        result={
 181            "error_type": "PermissionDeniedError",
 182            "error": "Error code: 403 - key limit exceeded",
 183        },
 184    )
 185
 186    assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
 187
 188
 189def test_failed_tool_auth_error_uses_normal_backoff():
 190    result = StepExecution(
 191        job_id="job",
 192        run_id="run",
 193        step_id="step",
 194        tool_name="shell_exec",
 195        status="failed",
 196        result={
 197            "error": "command output indicates authentication or authorization failure: permission denied",
 198        },
 199    )
 200
 201    assert _step_failure_backoff(result, poll_seconds=3, consecutive_failures=1) == 3
 202
 203
 204def test_failed_step_rate_limit_uses_normal_backoff():
 205    result = StepExecution(
 206        job_id="job",
 207        run_id="run",
 208        step_id="step",
 209        tool_name=None,
 210        status="failed",
 211        result={
 212            "error_type": "RateLimitError",
 213            "error": "429 too many requests",
 214        },
 215    )
 216
 217    assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
 218
 219
 220def test_failed_step_provider_timeout_uses_normal_backoff():
 221    result = StepExecution(
 222        job_id="job",
 223        run_id="run",
 224        step_id="step",
 225        tool_name=None,
 226        status="failed",
 227        result={
 228            "error_type": "APITimeoutError",
 229            "error": "Request timed out.",
 230        },
 231    )
 232
 233    assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
 234
 235
 236def test_retry_after_parses_epoch_milliseconds():
 237    future_ms = str(int((time.time() + 5) * 1000))
 238
 239    parsed = _parse_retry_after(future_ms)
 240
 241    assert parsed is not None
 242    assert 0 <= parsed <= 6
 243
 244
 245def test_daemon_run_once_claims_next_job_with_fake_step(tmp_path):
 246    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 247    db = AgentDB(tmp_path / "state.db")
 248    try:
 249        job_id = db.create_job("Run forever in small steps")
 250        daemon = Daemon(config=config, db=db)
 251
 252        result = daemon.run_once(fake=True)
 253
 254        assert result is not None
 255        assert result.job_id == job_id
 256        assert result.status == "completed"
 257        assert db.list_artifacts(job_id)[0]["title"] == "daemon-fake-step"
 258    finally:
 259        db.close()
 260
 261
 262def test_daemon_ignores_ui_focus_for_worker_scheduling(tmp_path):
 263    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 264    db = AgentDB(tmp_path / "state.db")
 265    try:
 266        first = db.create_job("First job", title="first")
 267        second = db.create_job("Second job", title="second")
 268        (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": second}), encoding="utf-8")
 269        daemon = Daemon(config=config, db=db)
 270
 271        job = daemon.next_runnable_job()
 272
 273        assert first != second
 274        assert job is not None
 275        assert job["id"] == first
 276    finally:
 277        db.close()
 278
 279
 280def test_daemon_skips_deferred_jobs_until_due(tmp_path):
 281    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 282    db = AgentDB(tmp_path / "state.db")
 283    try:
 284        deferred = db.create_job("Deferred job", title="deferred")
 285        ready = db.create_job("Ready job", title="ready")
 286        db.update_job_status(
 287            deferred,
 288            "queued",
 289            metadata_patch={"defer_until": (datetime.now(timezone.utc) + timedelta(hours=1)).isoformat()},
 290        )
 291        daemon = Daemon(config=config, db=db)
 292
 293        job = daemon.next_runnable_job()
 294
 295        assert job is not None
 296        assert job["id"] == ready
 297    finally:
 298        db.close()
 299
 300
 301def test_daemon_quarantines_provider_blocked_jobs(tmp_path):
 302    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 303    db = AgentDB(tmp_path / "state.db")
 304    try:
 305        blocked = db.create_job("Provider blocked job", title="blocked")
 306        ready = db.create_job("Ready job", title="ready")
 307        db.update_job_status(
 308            blocked,
 309            "running",
 310            metadata_patch={"provider_blocked_at": datetime.now(timezone.utc).isoformat()},
 311        )
 312        daemon = Daemon(config=config, db=db)
 313
 314        job = daemon.next_runnable_job()
 315
 316        assert job is not None
 317        assert job["id"] == ready
 318        blocked_job = db.get_job(blocked)
 319        assert blocked_job["status"] == "paused"
 320        assert "provider" in blocked_job["metadata"]["last_note"].lower()
 321        events = db.list_events(job_id=blocked, limit=10)
 322        assert any(event["event_type"] == "agent_message" and event["metadata"].get("reason") == "llm_provider_blocked" for event in events)
 323    finally:
 324        db.close()
 325
 326
 327def test_daemon_leaves_provider_blocked_job_paused_until_model_recovers(monkeypatch, tmp_path):
 328    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 329    db = AgentDB(tmp_path / "state.db")
 330    try:
 331        blocked_at = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
 332        job_id = db.create_job("Provider blocked job", title="blocked")
 333        db.update_job_status(
 334            job_id,
 335            "paused",
 336            metadata_patch={"provider_blocked_at": blocked_at},
 337        )
 338
 339        def fake_doctor(*, config, check_model):
 340            assert check_model is True
 341            return [Check("model_generation", False, "key limit exceeded")]
 342
 343        monkeypatch.setattr("nipux_cli.daemon.run_doctor", fake_doctor)
 344        daemon = Daemon(config=config, db=db)
 345
 346        assert daemon.next_runnable_job() is None
 347        job = db.get_job(job_id)
 348        assert job["status"] == "paused"
 349        assert job["metadata"]["provider_last_probe_detail"].startswith("model_generation")
 350        assert read_daemon_events(config, limit=1)[0]["event"] == "provider_recovery_wait"
 351    finally:
 352        db.close()
 353
 354
 355def test_daemon_resumes_provider_blocked_job_when_model_recovers(monkeypatch, tmp_path):
 356    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 357    db = AgentDB(tmp_path / "state.db")
 358    try:
 359        blocked_at = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
 360        job_id = db.create_job("Provider blocked job", title="blocked")
 361        db.update_job_status(
 362            job_id,
 363            "paused",
 364            metadata_patch={"provider_blocked_at": blocked_at},
 365        )
 366
 367        def fake_doctor(*, config, check_model):
 368            assert check_model is True
 369            return []
 370
 371        monkeypatch.setattr("nipux_cli.daemon.run_doctor", fake_doctor)
 372        daemon = Daemon(config=config, db=db)
 373
 374        job = daemon.next_runnable_job()
 375        stored = db.get_job(job_id)
 376
 377        assert job is not None
 378        assert job["id"] == job_id
 379        assert stored["status"] == "queued"
 380        assert stored["metadata"]["provider_unblocked_at"]
 381        events = db.list_events(job_id=job_id, limit=10)
 382        assert any(event["event_type"] == "agent_message" and event["metadata"].get("reason") == "llm_provider_recovered" for event in events)
 383    finally:
 384        db.close()
 385
 386
 387def test_daemon_idle_sleep_wakes_for_deferred_job(tmp_path):
 388    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 389    db = AgentDB(tmp_path / "state.db")
 390    try:
 391        now = datetime.now(timezone.utc)
 392        job_id = db.create_job("Deferred job", title="deferred")
 393        db.update_job_status(
 394            job_id,
 395            "queued",
 396            metadata_patch={"defer_until": (now + timedelta(seconds=2)).isoformat()},
 397        )
 398        daemon = Daemon(config=config, db=db)
 399
 400        sleep_seconds = daemon.idle_sleep_seconds(poll_seconds=30, now=now)
 401
 402        assert 1.9 <= sleep_seconds <= 2.1
 403    finally:
 404        db.close()
 405
 406
 407def test_daemon_idle_sleep_uses_poll_when_no_deferred_jobs(tmp_path):
 408    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 409    db = AgentDB(tmp_path / "state.db")
 410    try:
 411        db.create_job("Ready job", title="ready")
 412        daemon = Daemon(config=config, db=db)
 413
 414        assert daemon.idle_sleep_seconds(poll_seconds=30) == 30
 415        assert daemon.idle_sleep_seconds(poll_seconds=0) == 5.0
 416    finally:
 417        db.close()
 418
 419
 420def test_daemon_advances_multiple_runnable_jobs_without_focus_starvation(tmp_path):
 421    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
 422    db = AgentDB(tmp_path / "state.db")
 423    try:
 424        first = db.create_job("First job", title="first")
 425        second = db.create_job("Second job", title="second")
 426        (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": second}), encoding="utf-8")
 427        daemon = Daemon(config=config, db=db)
 428
 429        daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=4, fake=True)
 430
 431        assert db.list_steps(job_id=first)
 432        assert db.list_steps(job_id=second)
 433    finally:
 434        db.close()
 435
 436
 437def test_daemon_writes_due_daily_digest_once(tmp_path):
 438    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_time="00:00"))
 439    db = AgentDB(tmp_path / "state.db")
 440    try:
 441        db.create_job("Keep finding findings", title="findings")
 442        daemon = Daemon(config=config, db=db)
 443        now = datetime(2026, 4, 23, 8, 30)
 444
 445        first = daemon.send_due_daily_digest(now=now)
 446        second = daemon.send_due_daily_digest(now=now)
 447
 448        assert first is not None
 449        assert first["status"] == "dry_run"
 450        assert second is None
 451        assert (tmp_path / "digests" / "2026-04-23-daily.md").exists()
 452    finally:
 453        db.close()
 454
 455
 456def test_daemon_event_log_round_trips_jsonl(tmp_path):
 457    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 458
 459    path = append_daemon_event(config, "step", job_id="job_1", status="completed")
 460    events = read_daemon_events(config, limit=3)
 461
 462    assert path.name == "daemon-events.jsonl"
 463    assert events[-1]["event"] == "step"
 464    assert events[-1]["job_id"] == "job_1"
 465
 466
 467def test_daemon_recovers_stale_running_steps_on_start(tmp_path):
 468    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
 469    db = AgentDB(tmp_path / "state.db")
 470    try:
 471        job_id = db.create_job("Recover stale work", title="stale")
 472        run_id = db.start_run(job_id, model="fake")
 473        stale_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_navigate")
 474        daemon = Daemon(config=config, db=db)
 475
 476        daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=1, fake=True)
 477
 478        steps = db.list_steps(job_id=job_id)
 479        stale = next(step for step in steps if step["id"] == stale_step)
 480        events = read_daemon_events(config, limit=5)
 481        assert stale["status"] == "failed"
 482        assert stale["error"] == "daemon recovered abandoned running work from a previous process"
 483        assert db.list_runs(job_id, limit=10)[-1]["status"] == "failed"
 484        assert any(event.get("event") == "stale_work_recovered" for event in events)
 485    finally:
 486        db.close()
 487
 488
 489def test_daemon_survives_unexpected_step_exception(tmp_path):
 490    class ExplodingDaemon(Daemon):
 491        def run_once(self, *, fake: bool = False, verbose: bool = False):  # noqa: ARG002
 492            raise RuntimeError("provider fell over")
 493
 494    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
 495    db = AgentDB(tmp_path / "state.db")
 496    try:
 497        daemon = ExplodingDaemon(config=config, db=db)
 498
 499        daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=1)
 500
 501        status = daemon_lock_status(tmp_path / "agentd.lock")
 502        events = read_daemon_events(config, limit=5)
 503        assert status["metadata"]["last_state"] == "error"
 504        assert status["metadata"]["consecutive_failures"] == 1
 505        assert any(event.get("event") == "daemon_error" for event in events)
 506    finally:
 507        db.close()
 508
 509
 510def test_daemon_treats_blocked_steps_as_recoverable(tmp_path):
 511    class BlockedDaemon(Daemon):
 512        def run_once(self, *, fake: bool = False, verbose: bool = False):  # noqa: ARG002
 513            return StepExecution(
 514                job_id="job",
 515                run_id="run",
 516                step_id="step",
 517                tool_name="web_search",
 518                status="blocked",
 519                result={"error": "search loop blocked", "recoverable": True},
 520            )
 521
 522    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
 523    db = AgentDB(tmp_path / "state.db")
 524    try:
 525        daemon = BlockedDaemon(config=config, db=db)
 526
 527        daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=3)
 528
 529        status = daemon_lock_status(tmp_path / "agentd.lock")
 530        events = read_daemon_events(config, limit=10)
 531        assert status["metadata"]["consecutive_failures"] == 0
 532        assert sum(1 for event in events if event.get("event") == "step") == 3
 533        assert not any(event.get("event") == "daemon_error" for event in events)
 534    finally:
 535        db.close()
 536
 537
 538def test_fake_daemon_can_run_100_iterations_without_auto_stop(tmp_path):
 539    config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
 540    db = AgentDB(tmp_path / "state.db")
 541    try:
 542        job_id = db.create_job("Run a long fake worker", title="long")
 543        daemon = Daemon(config=config, db=db)
 544
 545        daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=100, fake=True)
 546
 547        steps = db.list_steps(job_id=job_id)
 548        assert len(steps) == 100
 549        assert any(step["kind"] == "reflection" for step in steps)
 550        assert db.list_artifacts(job_id)
 551        memory = db.list_memory(job_id)
 552        assert memory
 553        assert memory[0]["key"] == "rolling_state"
 554        assert "Recent steps:" in memory[0]["summary"]
 555        assert db.get_job(job_id)["status"] in {"queued", "running"}
 556        step_events = [event for event in read_daemon_events(config, limit=120) if event.get("event") == "step"]
 557        assert len(step_events) == 100
 558        assert daemon_lock_status(tmp_path / "agentd.lock")["running"] is False
 559    finally:
 560        db.close()
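Note on the idle-sleep tests in the listing above: they pin down three observable behaviours of `Daemon.idle_sleep_seconds`. A job deferred by 2 seconds shortens a 30-second poll to roughly 2 seconds, a 30-second poll with nothing deferred stays at 30, and a poll of 0 falls back to 5.0. The sketch below is one hypothetical way to satisfy those three assertions. It is not the project's implementation: the free function, the `defer_untils` parameter, and the `fallback_seconds` name are assumptions made for illustration, and how the real method treats deferrals that are already due is not shown in the tests.

```python
from datetime import datetime, timedelta, timezone


def idle_sleep_seconds(poll_seconds, defer_untils, now=None, fallback_seconds=5.0):
    """Hypothetical sketch: how long an idle loop could sleep before re-checking.

    defer_untils is an assumed list of timezone-aware datetimes, one per
    deferred job. The real daemon reads deferral data from job metadata.
    """
    now = now or datetime.now(timezone.utc)
    # A non-positive poll interval falls back to a fixed default (5.0 in the tests).
    base = float(poll_seconds) if poll_seconds > 0 else fallback_seconds
    # Only deferrals that are still in the future can shorten the sleep.
    waits = [(due - now).total_seconds() for due in defer_untils if due > now]
    if not waits:
        return base
    # Wake for the earliest deferred job, but never sleep longer than the poll.
    return min(base, min(waits))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(idle_sleep_seconds(30, [now + timedelta(seconds=2)], now=now))
    print(idle_sleep_seconds(30, []))
    print(idle_sleep_seconds(0, []))
```

Taking the minimum of the poll interval and the earliest future deferral keeps the loop responsive to deferred jobs without busy-waiting when nothing is scheduled.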
tests/nipux_cli/test_dashboard.py 86 lines
   1from datetime import datetime, timedelta, timezone
   2
   3from nipux_cli.artifacts import ArtifactStore
   4from nipux_cli.config import AppConfig, RuntimeConfig
   5from nipux_cli.dashboard import collect_dashboard_state, render_dashboard, render_overview
   6from nipux_cli.db import AgentDB
   7
   8
   9def test_dashboard_collects_jobs_steps_and_artifacts(tmp_path):
  10    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  11    db = AgentDB(tmp_path / "state.db")
  12    try:
  13        job_id = db.create_job("Research topic every morning", title="research", kind="generic")
  14        run_id = db.start_run(job_id, model="fake-model")
  15        step_id = db.add_step(
  16            job_id=job_id,
  17            run_id=run_id,
  18            kind="tool",
  19            tool_name="write_artifact",
  20            input_data={"arguments": {"title": "Findings"}},
  21        )
  22        ArtifactStore(tmp_path, db=db).write_text(
  23            job_id=job_id,
  24            run_id=run_id,
  25            step_id=step_id,
  26            title="Findings",
  27            summary="first saved finding",
  28            content="Acme Corp",
  29        )
  30        db.finish_step(step_id, status="completed", summary="saved finding", output_data={"success": True})
  31        db.finish_run(run_id, "completed")
  32        db.append_lesson(job_id, "Low-evidence summaries are not finding batches.", category="source_quality")
  33        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
  34
  35        state = collect_dashboard_state(db, config, job_id=job_id)
  36        rendered = render_dashboard(state, width=100)
  37        overview = render_overview(state, width=100)
  38
  39        assert state["daemon"]["running"] is False
  40        assert state["focus"]["counts"]["artifacts"] == 1
  41        assert state["focus"]["counts"]["tasks"] == 1
  42        assert "Nipux CLI Dashboard" in rendered
  43        assert "research" in rendered
  44        assert "write_artifact" in rendered
  45        assert "Findings" in rendered
  46        assert "Low-evidence summaries are not finding batches" in rendered
  47        assert "Explore primary sources" in rendered
  48        assert "Nipux Status" in overview
  49        assert "latest artifact: Findings" in overview
  50        assert "latest lesson:" in overview
  51    finally:
  52        db.close()
  53
  54
  55def test_overview_marks_idle_daemon_as_ready_for_work(tmp_path):
  56    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  57    db = AgentDB(tmp_path / "state.db")
  58    try:
  59        db.create_job("Research topic", title="research")
  60        state = collect_dashboard_state(db, config)
  61        overview = render_overview(state, width=100)
  62
  63        assert "ready when work starts" in overview
  64    finally:
  65        db.close()
  66
  67
  68def test_overview_marks_old_heartbeat_as_busy_for_running_step(tmp_path):
  69    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  70    db = AgentDB(tmp_path / "state.db")
  71    try:
  72        job_id = db.create_job("Measure a process", title="measure")
  73        run_id = db.start_run(job_id, model="fake-model")
  74        db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", status="running")
  75        state = collect_dashboard_state(db, config, job_id=job_id)
  76        state["daemon"]["running"] = True
  77        state["daemon"]["metadata"] = {
  78            "last_heartbeat": (datetime.now(timezone.utc) - timedelta(seconds=180)).isoformat(),
  79        }
  80
  81        overview = render_overview(state, width=100)
  82
  83        assert "busy #1 shell_exec" in overview
  84        assert "heartbeat 180s ago (stale)" not in overview
  85    finally:
  86        db.close()
tests/nipux_cli/test_db.py 609 lines
   1from nipux_cli.db import AgentDB
   2
   3
   4def test_db_job_run_step_and_artifact_roundtrip(tmp_path):
   5    db = AgentDB(tmp_path / "state.db")
   6    try:
   7        job_id = db.create_job("Research topic every day", title="research", kind="generic")
   8        assert job_id == "research"
   9        job = db.get_job(job_id)
  10        assert job["status"] == "queued"
  11        assert job["kind"] == "generic"
  12
  13        run_id = db.start_run(job_id, model="local-test-model")
  14        step_id = db.add_step(
  15            job_id=job_id,
  16            run_id=run_id,
  17            kind="tool",
  18            tool_name="write_artifact",
  19            input_data={"x": 1},
  20        )
  21        db.finish_step(step_id, status="completed", summary="wrote artifact", output_data={"ok": True})
  22        artifact_id = db.add_artifact(
  23            job_id=job_id,
  24            run_id=run_id,
  25            step_id=step_id,
  26            path=tmp_path / "artifact.md",
  27            sha256="abc",
  28            artifact_type="text",
  29            title="A",
  30        )
  31        db.finish_run(run_id, "completed")
  32
  33        assert db.get_job(job_id)["status"] == "running"
  34        assert db.list_steps(run_id=run_id)[0]["output"]["ok"] is True
  35        assert db.list_runs(job_id)[0]["id"] == run_id
  36        assert db.get_artifact(artifact_id)["title"] == "A"
  37        assert db.list_artifacts(job_id)[0]["id"] == artifact_id
  38    finally:
  39        db.close()
  40
  41
  42def test_create_job_uses_unique_readable_slug_ids(tmp_path):
  43    db = AgentDB(tmp_path / "state.db")
  44    try:
  45        first = db.create_job("Research topic", title="Nightly Research")
  46        second = db.create_job("Research more topics", title="Nightly Research")
  47
  48        assert first == "nightly-research"
  49        assert second == "nightly-research-2"
  50    finally:
  51        db.close()
  52
  53
  54def test_step_numbers_increment_across_runs_for_a_job(tmp_path):
  55    db = AgentDB(tmp_path / "state.db")
  56    try:
  57        job_id = db.create_job("Long job")
  58        run_1 = db.start_run(job_id, model="fake")
  59        step_1 = db.add_step(job_id=job_id, run_id=run_1, kind="tool")
  60        db.finish_step(step_1, status="completed")
  61        db.finish_run(run_1, "completed")
  62
  63        run_2 = db.start_run(job_id, model="fake")
  64        step_2 = db.add_step(job_id=job_id, run_id=run_2, kind="tool")
  65
  66        steps = db.list_steps(job_id=job_id)
  67        assert step_2 != step_1
  68        assert [step["step_no"] for step in steps] == [1, 2]
  69    finally:
  70        db.close()
  71
  72
  73def test_job_token_usage_aggregates_message_usage(tmp_path):
  74    db = AgentDB(tmp_path / "state.db")
  75    try:
  76        job_id = db.create_job("Long job")
  77        db.append_event(
  78            job_id,
  79            event_type="loop",
  80            title="message_end",
  81            metadata={
  82                "usage": {
  83                    "prompt_tokens": 100,
  84                    "completion_tokens": 25,
  85                    "total_tokens": 125,
  86                    "cost": 0.001,
  87                    "prompt_tokens_details": {"cached_tokens": 10},
  88                    "completion_tokens_details": {"reasoning_tokens": 3},
  89                }
  90            },
  91        )
  92        db.append_event(
  93            job_id,
  94            event_type="loop",
  95            title="message_end",
  96            metadata={
  97                "usage": {
  98                    "prompt_tokens": 150,
  99                    "completion_tokens": 50,
 100                    "total_tokens": 200,
 101                    "estimated": True,
 102                    "context_length": 1000,
 103                    "context_fraction": 0.15,
 104                }
 105            },
 106        )
 107
 108        usage = db.job_token_usage(job_id)
 109
 110        assert usage["prompt_tokens"] == 250
 111        assert usage["completion_tokens"] == 75
 112        assert usage["total_tokens"] == 325
 113        assert usage["latest_prompt_tokens"] == 150
 114        assert usage["cost"] == 0.001
 115        assert usage["has_cost"] is True
 116        assert usage["estimated_calls"] == 1
 117        assert usage["reasoning_tokens"] == 3
 118        assert usage["cached_tokens"] == 10
 119        assert usage["latest_context_length"] == 1000
 120        assert usage["latest_context_fraction"] == 0.15
 121    finally:
 122        db.close()
 123
 124
 125def test_append_operator_message_roundtrip(tmp_path):
 126    db = AgentDB(tmp_path / "state.db")
 127    try:
 128        job_id = db.create_job("Research topic")
 129
 130        entry = db.append_operator_message(job_id, "Focus on artifact-backed findings", source="shell")
 131        job = db.get_job(job_id)
 132
 133        assert entry["message"] == "Focus on artifact-backed findings"
 134        assert job["metadata"]["operator_messages"][0]["source"] == "shell"
 135        assert job["metadata"]["operator_messages"][0]["mode"] == "steer"
 136        assert job["metadata"]["operator_messages"][0]["message"] == "Focus on artifact-backed findings"
 137        assert job["metadata"]["last_operator_message"]["message"] == "Focus on artifact-backed findings"
 138        events = db.list_timeline_events(job_id)
 139        assert events[-1]["event_type"] == "operator_message"
 140        assert events[-1]["body"] == "Focus on artifact-backed findings"
 141    finally:
 142        db.close()
 143
 144
 145def test_claim_operator_messages_marks_one_message_at_a_time(tmp_path):
 146    db = AgentDB(tmp_path / "state.db")
 147    try:
 148        job_id = db.create_job("Research topic")
 149        first = db.append_operator_message(job_id, "first steer", source="chat")
 150        db.append_operator_message(job_id, "second steer", source="chat")
 151
 152        claimed = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
 153        second_claim = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
 154
 155        job = db.get_job(job_id)
 156        messages = job["metadata"]["operator_messages"]
 157        events = db.list_timeline_events(job_id, limit=20)
 158
 159        assert [item["message"] for item in claimed] == ["first steer"]
 160        assert claimed[0]["event_id"] == first["event_id"]
 161        assert [item["message"] for item in second_claim] == ["second steer"]
 162        assert all(message.get("claimed_at") for message in messages)
 163        assert any(event["event_type"] == "loop" and event["title"] == "steering claimed" for event in events)
 164    finally:
 165        db.close()
 166
 167
 168def test_acknowledge_operator_messages_marks_delivered_context(tmp_path):
 169    db = AgentDB(tmp_path / "state.db")
 170    try:
 171        job_id = db.create_job("Research topic")
 172        entry = db.append_operator_message(job_id, "correct the target before continuing", source="chat")
 173        db.claim_operator_messages(job_id, modes=("steer",), limit=1)
 174
 175        result = db.acknowledge_operator_messages(
 176            job_id,
 177            message_ids=[entry["event_id"]],
 178            summary="target correction incorporated",
 179        )
 180
 181        job = db.get_job(job_id)
 182        message = job["metadata"]["operator_messages"][0]
 183        events = db.list_timeline_events(job_id, limit=20)
 184
 185        assert result["count"] == 1
 186        assert message["acknowledged_at"]
 187        assert job["metadata"]["last_operator_context_ack"]["summary"] == "target correction incorporated"
 188        assert any(event["event_type"] == "operator_context" for event in events)
 189    finally:
 190        db.close()
 191
 192
 193def test_rename_job_updates_title_without_changing_id(tmp_path):
 194    db = AgentDB(tmp_path / "state.db")
 195    try:
 196        job_id = db.create_job("Research topic", title="old title")
 197
 198        renamed = db.rename_job(job_id, "new title")
 199        job = db.get_job(job_id)
 200
 201        assert renamed["id"] == job_id
 202        assert renamed["title"] == "new title"
 203        assert job["title"] == "new title"
 204    finally:
 205        db.close()
 206
 207
 208def test_delete_job_removes_related_rows(tmp_path):
 209    db = AgentDB(tmp_path / "state.db")
 210    try:
 211        job_id = db.create_job("Research topic", title="delete me")
 212        run_id = db.start_run(job_id, model="fake")
 213        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
 214        artifact_path = tmp_path / "artifact.md"
 215        artifact_path.write_text("artifact", encoding="utf-8")
 216        db.add_artifact(
 217            job_id=job_id,
 218            run_id=run_id,
 219            step_id=step_id,
 220            path=artifact_path,
 221            sha256="abc",
 222            artifact_type="text",
 223            title="Artifact",
 224        )
 225        db.upsert_memory(job_id=job_id, key="rolling_state", summary="summary")
 226
 227        result = db.delete_job(job_id)
 228
 229        assert result["job"]["title"] == "delete me"
 230        assert result["counts"]["runs"] == 1
 231        assert result["counts"]["steps"] == 1
 232        assert result["counts"]["artifacts"] == 1
 233        assert result["counts"]["memory"] == 1
 234        try:
 235            db.get_job(job_id)
 236        except KeyError:
 237            pass
 238        else:
 239            raise AssertionError("job still exists after delete")
 240        assert db.list_steps(job_id=job_id) == []
 241        assert db.list_artifacts(job_id) == []
 242        assert db.list_memory(job_id) == []
 243    finally:
 244        db.close()
 245
 246
 247def test_append_lesson_roundtrip(tmp_path):
 248    db = AgentDB(tmp_path / "state.db")
 249    try:
 250        job_id = db.create_job("Research topic")
 251
 252        entry = db.append_lesson(
 253            job_id,
 254            "Low-evidence pages are not useful evidence sources.",
 255            category="source_quality",
 256            confidence=0.9,
 257        )
 258        job = db.get_job(job_id)
 259
 260        assert entry["category"] == "source_quality"
 261        assert job["metadata"]["lessons"][0]["lesson"] == "Low-evidence pages are not useful evidence sources."
 262        assert job["metadata"]["last_lesson"]["confidence"] == 0.9
 263    finally:
 264        db.close()
 265
 266
 267def test_append_lesson_dedupes_repeated_memory(tmp_path):
 268    db = AgentDB(tmp_path / "state.db")
 269    try:
 270        job_id = db.create_job("Research topic")
 271
 272        first = db.append_lesson(job_id, "Use primary source indexes.", category="strategy", metadata={"step": 1})
 273        second = db.append_lesson(job_id, "Use primary source indexes.", category="strategy", metadata={"step": 2})
 274        job = db.get_job(job_id)
 275
 276        assert first["lesson"] == second["lesson"]
 277        assert len(job["metadata"]["lessons"]) == 1
 278        assert job["metadata"]["lessons"][0]["seen_count"] == 2
 279        assert job["metadata"]["lessons"][0]["metadata"]["step"] == 2
 280        assert first["created"] is True
 281        assert second["created"] is False
 282        assert second["substantive_update"] is False
 283        assert len(db.list_events(job_id=job_id, event_types=["lesson"])) == 1
 284    finally:
 285        db.close()
 286
 287
 288def test_source_and_finding_ledgers_dedupe_and_update(tmp_path):
 289    db = AgentDB(tmp_path / "state.db")
 290    try:
 291        job_id = db.create_job("Research topic")
 292
 293        source = db.append_source_record(
 294            job_id,
 295            "https://example.com/source",
 296            source_type="web_source",
 297            usefulness_score=0.7,
 298            yield_count=3,
 299            outcome="yielded reusable findings",
 300        )
 301        updated_source = db.append_source_record(
 302            job_id,
 303            "https://example.com/source",
 304            usefulness_score=0.9,
 305            yield_count=2,
 306            fail_count_delta=1,
 307        )
 308        finding = db.append_finding_record(
 309            job_id,
 310            name="Acme Finding",
 311            url="https://acme.example",
 312            category="example category",
 313            score=0.8,
 314        )
 315        updated_finding = db.append_finding_record(
 316            job_id,
 317            name="Acme Finding",
 318            url="https://acme.example",
 319            contact="source note",
 320            score=0.85,
 321        )
 322        reflection = db.append_reflection(job_id, "Keep using source indexes", strategy="Prioritize primary records")
 323        job = db.get_job(job_id)
 324
 325        assert source["key"] == updated_source["key"]
 326        assert source["created"] is True
 327        assert updated_source["created"] is False
 328        assert updated_source["yield_count"] == 5
 329        assert updated_source["fail_count"] == 1
 330        assert finding["created"] is True
 331        assert updated_finding["created"] is False
 332        assert updated_finding["contact"] == "source note"
 333        assert len(job["metadata"]["source_ledger"]) == 1
 334        assert len(job["metadata"]["finding_ledger"]) == 1
 335        assert job["metadata"]["last_reflection"]["summary"] == reflection["summary"]
 336    finally:
 337        db.close()
 338
 339
 340def test_repeated_source_and_finding_records_mark_non_substantive_touches(tmp_path):
 341    db = AgentDB(tmp_path / "state.db")
 342    try:
 343        job_id = db.create_job("Research topic")
 344
 345        db.append_source_record(
 346            job_id,
 347            "https://example.com/source",
 348            source_type="web_source",
 349            usefulness_score=0.7,
 350            outcome="yielded reusable findings",
 351        )
 352        repeated_source = db.append_source_record(
 353            job_id,
 354            "https://example.com/source",
 355            source_type="web_source",
 356            usefulness_score=0.7,
 357            outcome="yielded reusable findings",
 358        )
 359        changed_source = db.append_source_record(
 360            job_id,
 361            "https://example.com/source",
 362            source_type="web_source",
 363            usefulness_score=0.8,
 364            outcome="yielded reusable findings",
 365        )
 366
 367        db.append_finding_record(job_id, name="Acme Finding", url="https://acme.example", score=0.8)
 368        repeated_finding = db.append_finding_record(job_id, name="Acme Finding", url="https://acme.example", score=0.8)
 369        changed_finding = db.append_finding_record(
 370            job_id,
 371            name="Acme Finding",
 372            url="https://acme.example",
 373            score=0.9,
 374        )
 375
 376        assert repeated_source["created"] is False
 377        assert repeated_source["substantive_update"] is False
 378        assert changed_source["substantive_update"] is True
 379        assert repeated_finding["created"] is False
 380        assert repeated_finding["substantive_update"] is False
 381        assert changed_finding["substantive_update"] is True
 382    finally:
 383        db.close()
 384
 385
 386def test_task_queue_dedupes_and_updates(tmp_path):
 387    db = AgentDB(tmp_path / "state.db")
 388    try:
 389        job_id = db.create_job("Research topic")
 390
 391        first = db.append_task_record(
 392            job_id,
 393            title="Explore primary sources",
 394            status="open",
 395            priority=3,
 396            goal="Find direct evidence",
 397        )
 398        second = db.append_task_record(
 399            job_id,
 400            title="Explore primary sources",
 401            status="done",
 402            priority=5,
 403            result="Saved source artifact",
 404        )
 405        job = db.get_job(job_id)
 406
 407        assert first["created"] is True
 408        assert second["created"] is False
 409        assert len(job["metadata"]["task_queue"]) == 1
 410        assert job["metadata"]["task_queue"][0]["status"] == "done"
 411        assert job["metadata"]["task_queue"][0]["priority"] == 5
 412        assert job["metadata"]["task_queue"][0]["result"] == "Saved source artifact"
 413    finally:
 414        db.close()
 415
 416
 417def test_repeated_task_and_experiment_records_mark_non_substantive_touches(tmp_path):
 418    db = AgentDB(tmp_path / "state.db")
 419    try:
 420        job_id = db.create_job("Research topic")
 421
 422        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
 423        repeated_task = db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
 424        changed_task = db.append_task_record(job_id, title="Explore primary sources", status="done", priority=3)
 425
 426        db.append_experiment_record(
 427            job_id,
 428            title="Trial",
 429            status="measured",
 430            metric_name="score",
 431            metric_value=0.8,
 432        )
 433        repeated_experiment = db.append_experiment_record(
 434            job_id,
 435            title="Trial",
 436            status="measured",
 437            metric_name="score",
 438            metric_value=0.8,
 439        )
 440        changed_experiment = db.append_experiment_record(
 441            job_id,
 442            title="Trial",
 443            status="measured",
 444            metric_name="score",
 445            metric_value=0.9,
 446        )
 447
 448        assert repeated_task["created"] is False
 449        assert repeated_task["substantive_update"] is False
 450        assert changed_task["substantive_update"] is True
 451        assert repeated_experiment["created"] is False
 452        assert repeated_experiment["substantive_update"] is False
 453        assert changed_experiment["substantive_update"] is True
 454    finally:
 455        db.close()
 456
 457
 458def test_non_substantive_ledger_touches_do_not_emit_visible_events(tmp_path):
 459    db = AgentDB(tmp_path / "state.db")
 460    try:
 461        job_id = db.create_job("Research topic")
 462
 463        db.append_source_record(job_id, "https://example.com", outcome="useful")
 464        db.append_source_record(job_id, "https://example.com", outcome="useful")
 465        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
 466        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
 467        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
 468        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
 469        db.append_roadmap_record(
 470            job_id,
 471            title="Roadmap",
 472            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
 473        )
 474        db.append_roadmap_record(
 475            job_id,
 476            title="Roadmap",
 477            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
 478        )
 479        db.append_experiment_record(
 480            job_id,
 481            title="Trial",
 482            status="measured",
 483            metric_name="score",
 484            metric_value=0.8,
 485        )
 486        db.append_experiment_record(
 487            job_id,
 488            title="Trial",
 489            status="measured",
 490            metric_name="score",
 491            metric_value=0.8,
 492        )
 493
 494        events = db.list_timeline_events(job_id, limit=50)
 495        counts = {event_type: sum(1 for event in events if event["event_type"] == event_type) for event_type in {
 496            "source",
 497            "finding",
 498            "task",
 499            "roadmap",
 500            "experiment",
 501        }}
 502
 503        assert counts == {
 504            "source": 1,
 505            "finding": 1,
 506            "task": 1,
 507            "roadmap": 1,
 508            "experiment": 1,
 509        }
 510    finally:
 511        db.close()
 512
 513
 514def test_roadmap_last_records_include_progress_accounting_metadata(tmp_path):
 515    db = AgentDB(tmp_path / "state.db")
 516    try:
 517        job_id = db.create_job("Research topic")
 518        db.append_roadmap_record(
 519            job_id,
 520            title="Roadmap",
 521            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
 522        )
 523        db.append_roadmap_record(
 524            job_id,
 525            title="Roadmap",
 526            milestones=[{"title": "Foundation", "status": "validating", "priority": 6}],
 527        )
 528        db.append_milestone_validation_record(
 529            job_id,
 530            milestone="Foundation",
 531            validation_status="passed",
 532            result="Evidence satisfies acceptance criteria.",
 533        )
 534        metadata = db.get_job(job_id)["metadata"]
 535        roadmap_record = metadata["last_roadmap_record"]
 536        validation_record = metadata["last_milestone_validation"]
 537
 538        assert roadmap_record["created"] is False
 539        assert roadmap_record["updated_at"]
 540        assert roadmap_record["added_milestones"] == 0
 541        assert roadmap_record["updated_milestones"] == 1
 542        assert validation_record["validated_at"]
 543        assert validation_record["validation_status"] == "passed"
 544    finally:
 545        db.close()
 546
 547
 548def test_repeated_roadmap_records_do_not_create_fake_milestone_updates(tmp_path):
 549    db = AgentDB(tmp_path / "state.db")
 550    try:
 551        job_id = db.create_job("Research topic")
 552        milestone = {"title": "Foundation", "status": "active", "priority": 5}
 553        db.append_roadmap_record(job_id, title="Roadmap", milestones=[milestone])
 554        repeated = db.append_roadmap_record(job_id, title="Roadmap", milestones=[milestone])
 555        metadata = db.get_job(job_id)["metadata"]
 556        roadmap_record = metadata["last_roadmap_record"]
 557
 558        assert repeated["created"] is False
 559        assert repeated["substantive_update"] is False
 560        assert roadmap_record["updated_milestones"] == 0
 561        assert roadmap_record["updated_features"] == 0
 562        assert roadmap_record["roadmap_updated"] is False
 563    finally:
 564        db.close()
 565
 566
 567def test_timeline_events_cover_visible_activity(tmp_path):
 568    db = AgentDB(tmp_path / "state.db")
 569    try:
 570        job_id = db.create_job("Research topic", title="research")
 571        db.append_operator_message(job_id, "operator note", source="test")
 572        db.append_agent_update(job_id, "agent note", category="chat")
 573        db.append_lesson(job_id, "durable lesson", category="strategy")
 574        db.append_source_record(job_id, "https://example.com", usefulness_score=0.7, outcome="useful")
 575        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
 576        db.append_task_record(job_id, title="Explore branch", status="open")
 577        db.append_reflection(job_id, "reflect summary", strategy="next strategy")
 578        run_id = db.start_run(job_id, model="fake")
 579        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search", input_data={"query": "x"})
 580        db.finish_step(step_id, status="completed", summary="searched", output_data={"ok": True})
 581        db.add_artifact(
 582            job_id=job_id,
 583            run_id=run_id,
 584            step_id=step_id,
 585            path=tmp_path / "artifact.md",
 586            sha256="abc",
 587            artifact_type="text",
 588            title="Artifact",
 589            summary="saved",
 590        )
 591        db.upsert_memory(job_id=job_id, key="rolling_state", summary="compact state")
 592
 593        events = db.list_timeline_events(job_id, limit=50)
 594        event_types = {event["event_type"] for event in events}
 595
 596        assert "operator_message" in event_types
 597        assert "agent_message" in event_types
 598        assert "lesson" in event_types
 599        assert "source" in event_types
 600        assert "finding" in event_types
 601        assert "task" in event_types
 602        assert "reflection" in event_types
 603        assert "tool_call" in event_types
 604        assert "tool_result" in event_types
 605        assert "artifact" in event_types
 606        assert "compaction" in event_types
 607        assert any(event["body"] == "operator note" for event in events)
 608    finally:
 609        db.close()
tests/nipux_cli/test_digest.py 43 lines
   1from nipux_cli.config import AppConfig, RuntimeConfig
   2from nipux_cli.db import AgentDB
   3from nipux_cli.digest import render_daily_digest, render_job_digest, write_daily_digest
   4
   5
   6def test_daily_digest_includes_ledgers_lessons_sources_and_strategy(tmp_path):
   7    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
   8    db = AgentDB(tmp_path / "state.db")
   9    try:
  10        job_id = db.create_job("Research topic", title="research", kind="generic")
  11        db.append_finding_record(job_id, name="Acme Finding", category="example category", reason="reusable result", score=0.8)
  12        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
  13        db.append_source_record(job_id, "https://example.com", usefulness_score=0.9, yield_count=1, outcome="yielded findings")
  14        db.append_lesson(job_id, "Low-evidence pages are not finding sources.", category="source_quality")
  15        db.append_reflection(job_id, "Primary source map is working.", strategy="Try archival sources next.")
  16        db.start_run(job_id, model="test-model")
  17        db.append_event(
  18            job_id,
  19            event_type="loop",
  20            title="message_end",
  21            metadata={"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500, "cost": 0.0025}},
  22        )
  23
  24        body = render_daily_digest(db)
  25        job_body = render_job_digest(db, job_id)
  26        result = write_daily_digest(config, db, day="2026-04-25")
  27
  28        assert "Counts: 1 findings, 1 sources, 1 tasks, 0 experiments, 1 lessons" in body
  29        assert "Model usage:" in body
  30        assert "1.5K tokens" in body
  31        assert "cost=$0.0025" in body
  32        assert "## Model Usage" in job_body
  33        assert "test-model: 1 calls" in job_body
  34        assert "Experiments:" in body
  35        assert "Acme Finding" in body
  36        assert "Explore primary sources" in body
  37        assert "Low-evidence pages are not finding sources." in body
  38        assert "https://example.com" in body
  39        assert "Try archival sources next." in body
  40        assert result["status"] == "dry_run"
  41        assert (tmp_path / "digests" / "2026-04-25-daily.md").exists()
  42    finally:
  43        db.close()
tests/nipux_cli/test_doctor.py 157 lines
   1import io
   2import json
   3import urllib.error
   4
   5from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig
   6from nipux_cli.doctor import run_doctor
   7
   8
   9class FakeHTTPResponse:
  10    def __init__(self, payload: dict):
  11        self.payload = payload
  12
  13    def __enter__(self):
  14        return self
  15
  16    def __exit__(self, exc_type, exc, tb):
  17        return False
  18
  19    def read(self, _limit=-1):
  20        return json.dumps(self.payload).encode("utf-8")
  21
  22
  23def test_doctor_checks_local_runtime_without_model_call(tmp_path):
  24    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  25
  26    checks = run_doctor(config=config, check_model=False)
  27
  28    assert {check.name for check in checks} == {"state_dir_writable", "sqlite", "model_config", "tool_surface", "browser_runtime"}
  29    assert all(check.ok for check in checks)
  30
  31
  32def test_doctor_warns_when_remote_model_key_is_missing(tmp_path, monkeypatch):
  33    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
  34    config = AppConfig(
  35        runtime=RuntimeConfig(home=tmp_path),
  36        model=ModelConfig(
  37            model="provider/model",
  38            base_url="https://openrouter.ai/api/v1",
  39            api_key_env="OPENROUTER_API_KEY",
  40        ),
  41    )
  42
  43    checks = run_doctor(config=config, check_model=False)
  44    model_check = next(check for check in checks if check.name == "model_config")
  45
  46    assert not model_check.ok
  47    assert "OPENROUTER_API_KEY is not set" in model_check.detail
  48    assert "sk-" not in model_check.detail
  49
  50
  51def test_doctor_reports_openrouter_auth_failure(tmp_path, monkeypatch):
  52    monkeypatch.setenv("TEST_OPENROUTER_KEY", "bad-key")
  53    config = AppConfig(
  54        runtime=RuntimeConfig(home=tmp_path),
  55        model=ModelConfig(
  56            model="nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
  57            base_url="https://openrouter.ai/api/v1",
  58            api_key_env="TEST_OPENROUTER_KEY",
  59        ),
  60    )
  61
  62    def fake_urlopen(_request, timeout):
  63        raise urllib.error.HTTPError(
  64            "https://openrouter.ai/api/v1/key",
  65            401,
  66            "Unauthorized",
  67            hdrs=None,
  68            fp=None,
  69        )
  70
  71    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)
  72
  73    checks = run_doctor(config=config, check_model=True)
  74    model_check = checks[-1]
  75
  76    assert model_check.name == "model_auth"
  77    assert model_check.ok is False
  78    assert "OpenRouter rejected API key" in model_check.detail
  79
  80
  81def test_doctor_reports_generation_limit_after_model_listing(tmp_path, monkeypatch):
  82    monkeypatch.setenv("TEST_OPENROUTER_KEY", "limited-key")
  83    config = AppConfig(
  84        runtime=RuntimeConfig(home=tmp_path),
  85        model=ModelConfig(
  86            model="provider/test-model",
  87            base_url="https://openrouter.ai/api/v1",
  88            api_key_env="TEST_OPENROUTER_KEY",
  89        ),
  90    )
  91
  92    def fake_urlopen(request, timeout):
  93        url = request.full_url
  94        if url.endswith("/key"):
  95            return FakeHTTPResponse({})
  96        if url.endswith("/models"):
  97            return FakeHTTPResponse({"data": [{"id": "provider/test-model"}]})
  98        if url.endswith("/chat/completions"):
  99            body = b'{"error":{"message":"Key limit exceeded (total limit).","code":403}}'
 100            raise urllib.error.HTTPError(url, 403, "Forbidden", hdrs=None, fp=io.BytesIO(body))
 101        raise AssertionError(url)
 102
 103    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)
 104
 105    checks = run_doctor(config=config, check_model=True)
 106    model_check = checks[-1]
 107
 108    assert model_check.name == "model_generation"
 109    assert model_check.ok is False
 110    assert "Key limit exceeded" in model_check.detail
 111
 112
 113def test_doctor_reports_nested_provider_generation_error(tmp_path, monkeypatch):
 114    monkeypatch.setenv("TEST_OPENROUTER_KEY", "limited-key")
 115    config = AppConfig(
 116        runtime=RuntimeConfig(home=tmp_path),
 117        model=ModelConfig(
 118            model="provider/test-model",
 119            base_url="https://openrouter.ai/api/v1",
 120            api_key_env="TEST_OPENROUTER_KEY",
 121        ),
 122    )
 123
 124    def fake_urlopen(request, timeout):
 125        url = request.full_url
 126        if url.endswith("/key"):
 127            return FakeHTTPResponse({})
 128        if url.endswith("/models"):
 129            return FakeHTTPResponse({"data": [{"id": "provider/test-model"}]})
 130        if url.endswith("/chat/completions"):
 131            body = json.dumps(
 132                {
 133                    "error": {
 134                        "message": "Provider returned error",
 135                        "code": 429,
 136                        "metadata": {
 137                            "raw": "provider/test-model is temporarily rate-limited upstream.",
 138                            "provider_name": "ExampleProvider",
 139                            "is_byok": False,
 140                        },
 141                    }
 142                }
 143            ).encode("utf-8")
 144            raise urllib.error.HTTPError(url, 429, "Too Many Requests", hdrs=None, fp=io.BytesIO(body))
 145        raise AssertionError(url)
 146
 147    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)
 148
 149    checks = run_doctor(config=config, check_model=True)
 150    model_check = checks[-1]
 151
 152    assert model_check.name == "model_generation"
 153    assert model_check.ok is False
 154    assert "Provider returned error" in model_check.detail
 155    assert "temporarily rate-limited upstream" in model_check.detail
 156    assert "provider=ExampleProvider" in model_check.detail
 157    assert "byok=False" in model_check.detail
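Note on the nested-provider test above: it expects the doctor detail string to carry the top-level message, the upstream `metadata.raw` text, and `provider=`/`byok=` markers. A minimal sketch of how such a detail string could be assembled from an OpenRouter-style error body follows. The helper name `describe_provider_error` and the `"; "` separator are assumptions for illustration, not the code in `nipux_cli.doctor`.

```python
import json


def describe_provider_error(body: bytes) -> str:
    """Hypothetical helper: flatten an OpenRouter-style error body into one line."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: fall back to the raw text.
        return text.strip()
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return text.strip()
    parts = [str(error.get("message", "")).strip()]
    metadata = error.get("metadata")
    if isinstance(metadata, dict):
        if metadata.get("raw"):
            parts.append(str(metadata["raw"]))
        if metadata.get("provider_name"):
            parts.append(f"provider={metadata['provider_name']}")
        if "is_byok" in metadata:
            parts.append(f"byok={metadata['is_byok']}")
    # Drop empty fragments so a missing message does not leave a stray separator.
    return "; ".join(part for part in parts if part)
```

The sketch never echoes the API key, which matches the spirit of the `"sk-" not in model_check.detail` assertion earlier in the file.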
tests/nipux_cli/test_generic_runtime_audit.py 35 lines
   1from pathlib import Path
   2
   3
   4FORBIDDEN_RUNTIME_LITERALS = {
   5    "192.168",
   6    "9060 xt",
   7    "canadian",
   8    "client finder",
   9    "lead batch",
  10    "lead ledger",
  11    "client prospect",
  12    "edmonton",
  13    "home ssh",
  14    "home-ssh.local",
  15    "home@",
  16    "huggingface.co/qwen",
  17    "livebusiness",
  18    "qwen_qwen",
  19    "ssh home",
  20    "treefrog",
  21    "yelp",
  22    "llama.cpp",
  23}
  24
  25
  26def test_runtime_code_has_no_task_specific_literals():
  27    root = Path(__file__).resolve().parents[2] / "nipux_cli"
  28    haystack = "\n".join(
  29        path.read_text(encoding="utf-8", errors="replace").lower()
  30        for path in sorted(root.glob("*.py"))
  31        if path.name != "__init__.py"
  32    )
  33
  34    for literal in FORBIDDEN_RUNTIME_LITERALS:
  35        assert literal not in haystack
tests/nipux_cli/test_live_memory_graph_smoke.py 37 lines
   1import importlib.util
   2import sys
   3from pathlib import Path
   4
   5from nipux_cli.memory_graph import memory_graph_for_prompt
   6
   7
   8def _load_live_smoke():
   9    path = Path(__file__).resolve().parents[2] / "scripts" / "live_memory_graph_smoke.py"
  10    spec = importlib.util.spec_from_file_location("live_memory_graph_smoke", path)
  11    assert spec is not None
  12    module = importlib.util.module_from_spec(spec)
  13    assert spec.loader is not None
  14    sys.modules[spec.name] = module
  15    spec.loader.exec_module(module)
  16    return module
  17
  18
  19def test_live_memory_graph_smoke_fails_cleanly_without_key(monkeypatch, capsys):
  20    smoke = _load_live_smoke()
  21    monkeypatch.delenv("NIPUX_LIVE_TEST_KEY", raising=False)
  22    monkeypatch.setattr(sys, "argv", ["live_memory_graph_smoke.py", "--api-key-env", "NIPUX_LIVE_TEST_KEY", "--json"])
  23
  24    assert smoke.main() == 1
  25
  26    out = capsys.readouterr().out
  27    assert '"success": false' in out
  28    assert "NIPUX_LIVE_TEST_KEY is not set" in out
  29    assert "secret" not in out.lower()
  30
  31
  32def test_live_memory_graph_smoke_seed_pushes_generic_consolidation():
  33    smoke = _load_live_smoke()
  34    prompt = memory_graph_for_prompt({"metadata": smoke._seed_metadata()})
  35
  36    assert "No memory graph yet" in prompt
  37    assert "Durable ledgers already contain" in prompt
tests/nipux_cli/test_llm.py 151 lines
   1from types import SimpleNamespace
   2
   3from nipux_cli.config import ModelConfig
   4from nipux_cli.llm import OpenAIChatLLM, _enrich_openrouter_generation_usage
   5
   6
   7class _FakeCompletions:
   8    def __init__(self):
   9        self.kwargs = None
  10        self.calls = []
  11
  12    def create(self, **kwargs):
  13        self.kwargs = kwargs
  14        self.calls.append(kwargs)
  15        usage = SimpleNamespace(prompt_tokens=11, completion_tokens=7, total_tokens=18, cost=0.00042)
  16        message = SimpleNamespace(content="ok", tool_calls=[])
  17        choice = SimpleNamespace(message=message)
  18        return SimpleNamespace(id="gen_test", model="provider/model", choices=[choice], usage=usage)
  19
  20
  21def test_chat_llm_requires_tool_choice_for_worker_actions(monkeypatch):
  22    fake_completions = _FakeCompletions()
  23    monkeypatch.setenv("TEST_API_KEY", "test")
  24
  25    class FakeOpenAI:
  26        def __init__(self, **kwargs):
  27            pass
  28
  29        chat = SimpleNamespace(completions=fake_completions)
  30
  31    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)
  32
  33    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
  34    response = llm.next_action(messages=[{"role": "user", "content": "hi"}], tools=[{"type": "function", "function": {"name": "noop"}}])
  35
  36    assert response.content == "ok"
  37    assert response.usage["prompt_tokens"] == 11
  38    assert response.usage["completion_tokens"] == 7
  39    assert response.usage["cost"] == 0.00042
  40    assert response.model == "provider/model"
  41    assert response.response_id == "gen_test"
  42    assert fake_completions.kwargs["tools"]
  43    assert fake_completions.kwargs["tool_choice"] == "required"
  44
  45
  46def test_chat_llm_retries_without_tool_choice_when_provider_rejects_it(monkeypatch):
  47    monkeypatch.setenv("TEST_API_KEY", "test")
  48
  49    class RejectingCompletions(_FakeCompletions):
  50        def create(self, **kwargs):
  51            self.calls.append(kwargs)
  52            if kwargs.get("tool_choice") == "required":
  53                raise RuntimeError("unsupported parameter: tool_choice")
  54            return super().create(**kwargs)
  55
  56    fake_completions = RejectingCompletions()
  57
  58    class FakeOpenAI:
  59        def __init__(self, **kwargs):
  60            pass
  61
  62        chat = SimpleNamespace(completions=fake_completions)
  63
  64    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)
  65
  66    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
  67    response = llm.next_action(messages=[{"role": "user", "content": "hi"}], tools=[{"type": "function", "function": {"name": "noop"}}])
  68
  69    assert response.content == "ok"
  70    assert fake_completions.calls[0]["tool_choice"] == "required"
  71    assert "tool_choice" not in fake_completions.calls[-1]
  72
  73
  74def test_chat_llm_complete_response_returns_usage(monkeypatch):
  75    fake_completions = _FakeCompletions()
  76    monkeypatch.setenv("TEST_API_KEY", "test")
  77
  78    class FakeOpenAI:
  79        def __init__(self, **kwargs):
  80            pass
  81
  82        chat = SimpleNamespace(completions=fake_completions)
  83
  84    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)
  85
  86    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
  87    response = llm.complete_response(messages=[{"role": "user", "content": "hi"}])
  88
  89    assert response.content == "ok"
  90    assert response.usage["prompt_tokens"] == 11
  91    assert response.usage["completion_tokens"] == 7
  92    assert response.usage["cost"] == 0.00042
  93    assert response.model == "provider/model"
  94    assert response.response_id == "gen_test"
  95    assert fake_completions.kwargs["model"] == "test/model"
  96
  97
  98def test_chat_llm_disables_provider_sdk_retries(monkeypatch):
  99    captured = {}
 100    monkeypatch.setenv("TEST_API_KEY", "test")
 101
 102    class FakeOpenAI:
 103        def __init__(self, **kwargs):
 104            captured.update(kwargs)
 105
 106        chat = SimpleNamespace(completions=_FakeCompletions())
 107
 108    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)
 109
 110    OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY", request_timeout_seconds=37))
 111
 112    assert captured["timeout"] == 37
 113    assert captured["max_retries"] == 0
 114
 115
 116def test_openrouter_generation_usage_enriches_cost_and_tokens(monkeypatch):
 117    class FakeHTTPResponse:
 118        def __enter__(self):
 119            return self
 120
 121        def __exit__(self, *_args):
 122            return False
 123
 124        def read(self):
 125            return (
 126                b'{"data":{"total_cost":"0.0042","native_tokens_prompt":123,'
 127                b'"native_tokens_completion":45,"native_tokens_total":168}}'
 128            )
 129
 130    captured = {}
 131
 132    def fake_urlopen(request, timeout):
 133        captured["url"] = request.full_url
 134        captured["timeout"] = timeout
 135        return FakeHTTPResponse()
 136
 137    monkeypatch.setattr("nipux_cli.llm.urllib.request.urlopen", fake_urlopen)
 138
 139    usage = _enrich_openrouter_generation_usage(
 140        {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15, "estimated": False},
 141        response_id="gen_123",
 142        base_url="https://openrouter.ai/api/v1",
 143        api_key="sk-test",
 144    )
 145
 146    assert captured["url"] == "https://openrouter.ai/api/v1/generation?id=gen_123"
 147    assert captured["timeout"] == 5
 148    assert usage["cost"] == 0.0042
 149    assert usage["prompt_tokens"] == 123
 150    assert usage["completion_tokens"] == 45
 151    assert usage["total_tokens"] == 168
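Note on the `tool_choice` tests above: they check that the first request carries `tool_choice="required"` and that a rejected request is retried without that parameter. A minimal sketch of that fallback pattern follows. The function name `create_with_tool_choice_fallback` and the substring check on the error text are assumptions; the actual logic in `nipux_cli.llm` may detect rejection differently.

```python
from typing import Any, Callable


def create_with_tool_choice_fallback(create: Callable[..., Any], **kwargs: Any) -> Any:
    """Hypothetical helper: require a tool call first, retry once without it."""
    try:
        return create(tool_choice="required", **kwargs)
    except Exception as exc:
        # Only retry when the provider complains about the parameter itself;
        # any other failure is re-raised unchanged.
        if "tool_choice" not in str(exc).lower():
            raise
        return create(**kwargs)
```

Retrying only on a parameter-specific error keeps unrelated failures (auth, rate limits) visible to the caller instead of masking them with a second request.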
tests/nipux_cli/test_measurement.py 32 lines
   1from nipux_cli.measurement import measurement_candidates, measurement_candidates_are_diagnostic_only
   2
   3
   4def test_measurement_candidates_extract_markdown_table_unit_columns():
   5    output = {
   6        "stdout": (
   7            "| model                          |       size | backend    | threads |            test |                  t/s |\n"
   8            "| ------------------------------ | ---------: | ---------- | ------: | --------------: | -------------------: |\n"
   9            "| example model                  |  11.71 GiB | CPU        |      24 |            pp32 |          5.48 ± 0.11 |\n"
  10            "| example model                  |  11.71 GiB | CPU        |      24 |           tg128 |          3.44 ± 0.05 |\n"
  11        )
  12    }
  13
  14    candidates = measurement_candidates(output, command="run benchmark")
  15
  16    assert "pp32 5.48 ± 0.11 t/s" in candidates
  17    assert "tg128 3.44 ± 0.05 t/s" in candidates
  18    assert not measurement_candidates_are_diagnostic_only(candidates, command="run benchmark")
  19
  20
  21def test_measurement_candidates_extract_generic_table_metrics():
  22    output = {
  23        "stdout": (
  24            "| benchmark | latency | req/s |\n"
  25            "| --- | ---: | ---: |\n"
  26            "| warm path | 18.4 | 42.7 |\n"
  27        )
  28    }
  29
  30    candidates = measurement_candidates(output, command="profile throughput")
  31
  32    assert "warm path 42.7 req/s" in candidates
tests/nipux_cli/test_metric_format.py 11 lines
   1from nipux_cli.metric_format import format_metric_value
   2
   3
   4def test_format_metric_value_spaces_named_units():
   5    assert format_metric_value("citations", 42, "count") == "citations=42 count"
   6    assert format_metric_value("speed", 2.7, "tokens/s") == "speed=2.7 tokens/s"
   7
   8
   9def test_format_metric_value_keeps_attached_symbol_units():
  10    assert format_metric_value("accuracy", 98.2, "%") == "accuracy=98.2%"
  11    assert format_metric_value("throughput", 120, "/s") == "throughput=120/s"
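Note on the two formatting tests above: named units such as `count` or `tokens/s` are separated from the value by a space, while symbol units such as `%` or `/s` attach directly. A minimal sketch of one rule that satisfies both tests follows. The name `sketch_format_metric_value` and the "first character is alphanumeric" rule are assumptions inferred from the assertions, not the implementation in `nipux_cli.metric_format`.

```python
def sketch_format_metric_value(name: str, value: object, unit: str = "") -> str:
    """Hypothetical formatter inferred from the tests, not the project code."""
    text = f"{name}={value}"
    if not unit:
        return text
    # Units beginning with a letter or digit read as words, so add a space;
    # units beginning with a symbol ("%", "/s") attach to the value.
    if unit[0].isalnum():
        return f"{text} {unit}"
    return f"{text}{unit}"
```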
tests/nipux_cli/test_operator_context.py 30 lines
   1from nipux_cli.operator_context import inactive_prompt_operator_ids, operator_entry_is_prompt_relevant
   2
   3
   4def _entry(message: str, *, event_id: str = "op_1", mode: str = "steer") -> dict:
   5    return {"event_id": event_id, "mode": mode, "message": message}
   6
   7
   8def test_conversation_only_operator_messages_do_not_enter_worker_prompt():
   9    for message in ("hello", "how is it going?", "clear", "stop 1", "jobs"):
  10        assert not operator_entry_is_prompt_relevant(_entry(message))
  11
  12
  13def test_actionable_operator_messages_remain_worker_constraints():
  14    for message in (
  15        "do not run local testing on my computer",
  16        "use the corrected target from the chat",
  17        "focus on measured results instead of saved notes",
  18        "the address is wrong, use `target-box`",
  19    ):
  20        assert operator_entry_is_prompt_relevant(_entry(message))
  21
  22
  23def test_inactive_prompt_operator_ids_returns_only_conversation_active_messages():
  24    messages = [
  25        _entry("hello", event_id="op_chat"),
  26        _entry("use the corrected target", event_id="op_use"),
  27        {**_entry("clear", event_id="op_done"), "acknowledged_at": "2026-04-26T00:00:00+00:00"},
  28    ]
  29
  30    assert inactive_prompt_operator_ids(messages) == ["op_chat"]
tests/nipux_cli/test_planning.py 90 lines
   1from nipux_cli.planning import initial_plan_for_objective, initial_roadmap_for_objective, initial_task_contract, objective_profiles
   2
   3
   4def test_initial_task_contracts_are_generic_and_complete():
   5    for title in [
   6        "Clarify the exact success criteria and constraints.",
   7        "Map the first research or execution branches.",
   8        "Collect evidence and save outputs as files.",
   9        "Reflect on what worked, update memory, and continue with the next branch.",
  10    ]:
  11        contract = initial_task_contract(title)
  12
  13        assert contract["output_contract"] in {"research", "artifact", "experiment", "action", "monitor", "decision", "report"}
  14        assert contract["acceptance_criteria"]
  15        assert contract["evidence_needed"]
  16        assert contract["stall_behavior"]
  17
  18
  19def test_initial_roadmap_uses_valid_generic_contracts():
  20    roadmap = initial_roadmap_for_objective(title="paper", objective="write a paper")
  21
  22    for milestone in roadmap["milestones"]:
  23        for feature in milestone["features"]:
  24            assert feature["output_contract"] in {
  25                "research",
  26                "artifact",
  27                "experiment",
  28                "action",
  29                "monitor",
  30                "decision",
  31                "report",
  32            }
  33
  34
  35def test_initial_plan_adapts_to_measurable_objectives():
  36    plan = initial_plan_for_objective("optimize a generic process for lower latency and higher throughput")
  37    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]
  38
  39    assert plan["profile"] == "measured"
  40    assert "experiment" in contracts
  41    assert any("baseline" in title.lower() for title in plan["tasks"])
  42    assert any("metric" in question.lower() for question in plan["questions"])
  43
  44
  45def test_initial_plan_adapts_to_deliverable_objectives():
  46    plan = initial_plan_for_objective("write a full research paper from evidence")
  47    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]
  48
  49    assert plan["profile"] == "deliverable"
  50    assert "report" in contracts
  51    assert any("draft" in title.lower() or "report" in title.lower() for title in plan["tasks"])
  52    assert any("revise" in title.lower() and "evidence" in title.lower() for title in plan["tasks"])
  53
  54
  55def test_initial_plan_treats_generated_files_as_deliverables():
  56    plan = initial_plan_for_objective("generate a polished launch checklist for this repository")
  57    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]
  58
  59    assert plan["profile"] == "deliverable"
  60    assert "report" in contracts
  61    assert any("audience" in question.lower() for question in plan["questions"])
  62
  63
  64def test_initial_plan_adapts_to_monitoring_objectives():
  65    plan = initial_plan_for_objective("monitor a recurring process and report important changes")
  66    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]
  67
  68    assert plan["profile"] == "monitor"
  69    assert "monitor" in contracts
  70    assert any("cadence" in question.lower() or "check" in question.lower() for question in plan["questions"])
  71
  72
  73def test_initial_plan_does_not_add_meta_progress_update_task():
  74    for objective in [
  75        "optimize a generic process for lower latency and higher throughput",
  76        "write a full research paper from evidence",
  77        "monitor a recurring process and report important changes",
  78        "investigate build quality and compare output changes",
  79    ]:
  80        plan = initial_plan_for_objective(objective)
  81
  82        assert all("progress update" not in title.lower() for title in plan["tasks"])
  83        assert all("keep working on the next useful branch" not in title.lower() for title in plan["tasks"])
  84
  85
  86def test_objective_profiles_stay_generic():
  87    profiles = objective_profiles("investigate build quality and compare output changes")
  88
  89    assert profiles
  90    assert all(profile in {"measured", "deliverable", "monitor", "implementation", "research", "general"} for profile in profiles)
tests/nipux_cli/test_progress.py 214 lines
   1from nipux_cli.progress import build_progress_checkpoint, ledger_counts, recent_progress_bits
   2
   3
   4def test_progress_checkpoint_reports_deltas_and_recent_durable_work():
   5    metadata = {
   6        "finding_ledger": [{"title": "First finding"}, {"title": "Better branch"}],
   7        "source_ledger": [{"url": "https://example.test"}],
   8        "task_queue": [
   9            {"title": "Draft report", "status": "done", "priority": 2},
  10            {"title": "Validate report", "status": "open", "priority": 8},
  11        ],
  12        "experiment_ledger": [{"title": "Quality check", "metric_name": "score", "metric_value": 0.82}],
  13        "lessons": [{"lesson": "Prefer measured output"}],
  14        "roadmap": {"milestones": [{"title": "Publishable draft", "status": "validating"}]},
  15    }
  16
  17    checkpoint = build_progress_checkpoint(
  18        metadata,
  19        previous_counts={"findings": 1, "sources": 0, "tasks": 2, "experiments": 0, "lessons": 1, "milestones": 0},
  20        step_no=40,
  21        tool_name="record_findings",
  22    )
  23
  24    assert checkpoint.counts == {
  25        "findings": 2,
  26        "sources": 1,
  27        "tasks": 2,
  28        "experiments": 1,
  29        "lessons": 1,
  30        "milestones": 1,
  31    }
  32    assert checkpoint.deltas["findings"] == 1
  33    assert checkpoint.deltas["sources"] == 1
  34    assert checkpoint.deltas["tasks"] == 0
  35    assert checkpoint.category == "progress"
  36    assert "+1 finding" in checkpoint.message
  37    assert "+1 source" in checkpoint.message
  38    assert "+1 experiment" in checkpoint.message
  39    assert "finding=Better branch" in checkpoint.message
  40    assert "task=Validate report" in checkpoint.message
  41    assert "measurement=score=0.82" in checkpoint.message
  42    assert "milestone=Publishable draft" in checkpoint.message
  43
  44
  45def test_progress_checkpoint_for_saved_output_is_concise():
  46    metadata = {"finding_ledger": [{}], "source_ledger": [{}, {}], "task_queue": [{}], "experiment_ledger": []}
  47
  48    checkpoint = build_progress_checkpoint(
  49        metadata,
  50        step_no=12,
  51        tool_name="write_artifact",
  52        artifact_id="art_123",
  53        is_finding_output=True,
  54    )
  55
  56    assert checkpoint.category == "finding"
  57    assert checkpoint.message.startswith("Saved output art_123")
  58    assert "1 findings, 2 sources, 1 tasks, and 0 experiments" in checkpoint.message
  59
  60
  61def test_progress_checkpoint_without_delta_is_activity_not_progress():
  62    metadata = {"finding_ledger": [{}], "source_ledger": [{}], "task_queue": [{}], "experiment_ledger": []}
  63
  64    checkpoint = build_progress_checkpoint(
  65        metadata,
  66        previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 0, "lessons": 0, "milestones": 0},
  67        step_no=50,
  68        tool_name="web_extract",
  69    )
  70
  71    assert checkpoint.category == "activity"
  72    assert "no new durable ledger entries" in checkpoint.message
  73
  74
  75def test_progress_checkpoint_counts_existing_record_updates_as_progress():
  76    metadata = {
  77        "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
  78        "finding_ledger": [{}],
  79        "source_ledger": [{}],
  80        "task_queue": [{"title": "Existing branch", "status": "done"}],
  81        "experiment_ledger": [{"title": "Trial", "status": "measured"}],
  82        "last_task_record": {
  83            "title": "Existing branch",
  84            "status": "done",
  85            "result": "Validated the branch.",
  86            "created": False,
  87            "updated_at": "2026-01-01T00:01:00+00:00",
  88        },
  89        "last_source_record": {
  90            "source": "https://example.test",
  91            "created": False,
  92            "last_seen": "2026-01-01T00:01:30+00:00",
  93        },
  94        "last_experiment_record": {
  95            "title": "Trial",
  96            "status": "measured",
  97            "metric_name": "score",
  98            "metric_value": 0.9,
  99            "created": False,
 100            "updated_at": "2026-01-01T00:02:00+00:00",
 101        },
 102    }
 103
 104    checkpoint = build_progress_checkpoint(
 105        metadata,
 106        previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 1, "lessons": 0, "milestones": 0},
 107        step_no=60,
 108        tool_name="record_tasks",
 109    )
 110
 111    assert checkpoint.category == "progress"
 112    assert checkpoint.deltas["tasks"] == 0
 113    assert checkpoint.updates["tasks"] == 1
 114    assert checkpoint.updates["sources"] == 1
 115    assert checkpoint.resolutions["tasks"] == 1
 116    assert checkpoint.updates["experiments"] == 1
 117    assert checkpoint.resolutions["experiments"] == 1
 118    assert "~1 task updated" in checkpoint.message
 119    assert "~1 source updated" in checkpoint.message
 120    assert "1 task resolved" in checkpoint.message
 121    assert "~1 experiment updated" in checkpoint.message
 122
 123
 124def test_progress_checkpoint_ignores_non_substantive_record_touches():
 125    metadata = {
 126        "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
 127        "finding_ledger": [{}],
 128        "source_ledger": [{}],
 129        "task_queue": [{"title": "Existing branch", "status": "open"}],
 130        "experiment_ledger": [{"title": "Trial", "status": "planned"}],
 131        "last_task_record": {
 132            "title": "Existing branch",
 133            "status": "open",
 134            "created": False,
 135            "substantive_update": False,
 136            "updated_at": "2026-01-01T00:01:00+00:00",
 137        },
 138        "last_source_record": {
 139            "source": "https://example.test",
 140            "created": False,
 141            "substantive_update": False,
 142            "last_seen": "2026-01-01T00:01:30+00:00",
 143        },
 144        "last_experiment_record": {
 145            "title": "Trial",
 146            "status": "planned",
 147            "created": False,
 148            "substantive_update": False,
 149            "updated_at": "2026-01-01T00:02:00+00:00",
 150        },
 151    }
 152
 153    checkpoint = build_progress_checkpoint(
 154        metadata,
 155        previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 1, "lessons": 0, "milestones": 0},
 156        step_no=61,
 157        tool_name="record_tasks",
 158    )
 159
 160    assert checkpoint.category == "activity"
 161    assert checkpoint.updates["tasks"] == 0
 162    assert checkpoint.updates["sources"] == 0
 163    assert checkpoint.updates["experiments"] == 0
 164    assert "no new durable ledger entries" in checkpoint.message
 165
 166
 167def test_progress_checkpoint_counts_roadmap_updates_and_validations():
 168    metadata = {
 169        "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
 170        "roadmap": {"milestones": [{"title": "Foundation", "status": "validating"}]},
 171        "last_roadmap_record": {
 172            "title": "Roadmap",
 173            "created": False,
 174            "updated_at": "2026-01-01T00:01:00+00:00",
 175            "added_milestones": 0,
 176            "updated_milestones": 1,
 177            "added_features": 0,
 178            "updated_features": 0,
 179        },
 180        "last_milestone_validation": {
 181            "milestone": "Foundation",
 182            "validation_status": "passed",
 183            "validated_at": "2026-01-01T00:02:00+00:00",
 184        },
 185    }
 186
 187    checkpoint = build_progress_checkpoint(
 188        metadata,
 189        previous_counts={"findings": 0, "sources": 0, "tasks": 0, "experiments": 0, "lessons": 0, "milestones": 1},
 190        step_no=70,
 191        tool_name="record_milestone_validation",
 192    )
 193
 194    assert checkpoint.category == "progress"
 195    assert checkpoint.deltas["milestones"] == 0
 196    assert checkpoint.updates["milestones"] == 2
 197    assert checkpoint.resolutions["milestones"] == 1
 198    assert "~2 milestones updated" in checkpoint.message
 199    assert "1 milestone resolved" in checkpoint.message
 200
 201
 202def test_progress_helpers_ignore_malformed_metadata():
 203    metadata = {
 204        "finding_ledger": "bad",
 205        "source_ledger": [None, {"url": "ok"}],
 206        "task_queue": [{"title": "Task", "status": "blocked", "priority": "bad"}],
 207        "roadmap": {"milestones": ["bad", {"title": "Milestone", "status": "active"}]},
 208    }
 209
 210    assert ledger_counts(metadata)["sources"] == 1
 211    assert ledger_counts(metadata)["milestones"] == 2
 212    bits = recent_progress_bits(metadata)
 213    assert "task=Task" in bits
 214    assert "milestone=Milestone" in bits
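Note on the checkpoint tests above: they compare the current ledger counts against previous_counts and expect category "progress" when something was added or substantively updated, and "activity" otherwise. The sketch below illustrates that arithmetic only. The helper names count_deltas and classify are hypothetical, the clamp to zero is an assumption, and none of this is taken from the real build_progress_checkpoint implementation.

```python
# Hypothetical helpers; the real nipux_cli progress module may differ.
LEDGER_KEYS = ("findings", "sources", "tasks", "experiments", "lessons", "milestones")


def count_deltas(current, previous):
    """Return how much each ledger grew since the previous checkpoint.

    Assumption: shrinking ledgers are reported as 0 rather than negative.
    """
    return {
        key: max(0, int(current.get(key, 0)) - int(previous.get(key, 0)))
        for key in LEDGER_KEYS
    }


def classify(deltas, updates=None):
    """Label a step "progress" if any ledger grew or was updated, else "activity"."""
    updates = updates or {}
    changed = any(deltas.values()) or any(updates.values())
    return "progress" if changed else "activity"


current = {"findings": 2, "sources": 1, "tasks": 2, "experiments": 1, "lessons": 1, "milestones": 1}
previous = {"findings": 1, "sources": 0, "tasks": 2, "experiments": 0, "lessons": 1, "milestones": 0}
deltas = count_deltas(current, previous)
category = classify(deltas)
```

With the counts from the first test, deltas["findings"] is 1, deltas["tasks"] is 0, and the category is "progress". Passing all-zero deltas together with updates={"tasks": 1} also gives "progress", which mirrors the existing-record-update test.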
tests/nipux_cli/test_project_atlas.py 46 lines
   1import importlib.util
   2import sys
   3from pathlib import Path
   4
   5
   6def _load_generator():
   7    path = Path(__file__).resolve().parents[2] / "scripts" / "generate_project_atlas.py"
   8    spec = importlib.util.spec_from_file_location("generate_project_atlas", path)
   9    assert spec is not None
  10    module = importlib.util.module_from_spec(spec)
  11    assert spec.loader is not None
  12    sys.modules[spec.name] = module
  13    spec.loader.exec_module(module)
  14    return module
  15
  16
  17def test_project_atlas_generator_maps_prompts_tools_and_source_without_self_embedding():
  18    generator = _load_generator()
  19
  20    files = generator.load_source_files()
  21    prompts = generator.extract_prompts(files)
  22    tools = generator.extract_tools(files)
  23
  24    assert "docs/project-atlas.html" not in {source.path for source in files}
  25    assert any(prompt.path == "nipux_cli/worker_policy.py" and "SYSTEM_PROMPT" in prompt.name for prompt in prompts)
  26    assert any(tool["name"] == "web_search" for tool in tools)
  27
  28
  29def test_project_atlas_redacts_secret_assignments_from_rendered_source():
  30    generator = _load_generator()
  31    openrouter_key = "OPENROUTER_API_KEY"
  32    openai_key = "OPENAI_API_KEY"
  33    source = generator.SourceFile(
  34        path=".env.example",
  35        text=f"{openrouter_key}=\n{openai_key}=secret\nNORMAL=value",
  36        lines=[openrouter_key + "=", openai_key + "=secret", "NORMAL=value"],
  37        tree=None,
  38    )
  39
  40    rendered = generator.render_source_file(source)
  41
  42    assert openrouter_key + "=" not in rendered
  43    assert openai_key + "=secret" not in rendered
  44    assert f"{openrouter_key} = &lt;redacted&gt;" in rendered
  45    assert f"{openai_key} = &lt;redacted&gt;" in rendered
  46    assert "NORMAL=value" in rendered
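Note on the redaction test above: it expects secret-looking assignments to be rewritten as NAME = &lt;redacted&gt; in the rendered HTML while ordinary assignments pass through unchanged. A minimal sketch of one way to get that behavior follows. The function names redact_line and render_lines and the hint list are assumptions for illustration; the generator's actual render_source_file is not shown in this listing and may work differently.

```python
import html
import re

# Hypothetical pattern: an identifier, optional spaces, then "=".
ASSIGNMENT = re.compile(r"^(\s*)([A-Za-z_][A-Za-z0-9_]*)\s*=(.*)$")
# Assumed hint list; a name containing any of these is treated as a secret.
SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact_line(line):
    """Replace the value of a secret-looking assignment with a placeholder."""
    match = ASSIGNMENT.match(line)
    if match and any(hint in match.group(2).upper() for hint in SECRET_HINTS):
        return f"{match.group(1)}{match.group(2)} = <redacted>"
    return line


def render_lines(lines):
    """Redact first, then HTML-escape, so the placeholder is escaped too."""
    return "\n".join(html.escape(redact_line(line)) for line in lines)


rendered = render_lines(["OPENROUTER_API_KEY=", "OPENAI_API_KEY=secret", "NORMAL=value"])
```

Redacting before escaping keeps the placeholder's angle brackets from being interpreted as markup. The substring hint match is deliberately broad and would also redact a name such as MONKEY, so a real implementation would probably want a tighter rule.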
tests/nipux_cli/test_provider_errors.py 21 lines
   1from nipux_cli.provider_errors import (
   2    provider_action_required,
   3    provider_action_required_note,
   4    provider_rate_limited,
   5)
   6
   7
   8class ProviderPayloadError(Exception):
   9    payload = {"error": {"message": "Key limit exceeded", "code": 403}}
  10
  11
  12def test_provider_action_required_detects_payload_and_status_text():
  13    assert provider_action_required(ProviderPayloadError("provider rejected request"))
  14    assert provider_action_required("PermissionDeniedError: Error code: 403")
  15    assert "operator action" in provider_action_required_note("invalid api key")
  16
  17
  18def test_provider_rate_limited_detects_transient_rate_text():
  19    assert provider_rate_limited("429 too many requests")
  20    assert provider_rate_limited("provider temporarily over capacity")
  21    assert not provider_rate_limited("invalid api key")
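Note on the provider error tests above: they separate errors that need operator action (bad key, 403, key limit) from transient rate limiting (429, over capacity). The sketch below shows one plausible text-matching approach under stated assumptions. The marker tuples and the names needs_operator_action and looks_rate_limited are hypothetical and are not the contents of nipux_cli.provider_errors.

```python
# Assumed marker lists, chosen only to match the strings used in the tests above.
ACTION_MARKERS = ("invalid api key", "error code: 401", "error code: 403", "key limit exceeded")
RATE_MARKERS = ("429", "too many requests", "rate limit", "over capacity")


def error_text(error):
    """Flatten an exception or string, plus any dict payload, into lowercase text."""
    parts = [str(error)]
    payload = getattr(error, "payload", None)
    if isinstance(payload, dict):
        parts.append(str(payload))
    return " ".join(parts).lower()


def needs_operator_action(error):
    """True when the error text suggests a credential or quota problem."""
    text = error_text(error)
    return any(marker in text for marker in ACTION_MARKERS)


def looks_rate_limited(error):
    """True when the error text suggests a transient capacity problem."""
    text = error_text(error)
    return any(marker in text for marker in RATE_MARKERS)


class PayloadError(Exception):
    payload = {"error": {"message": "Key limit exceeded", "code": 403}}
```

Reading the payload attribute as well as the message lets the classifier catch cases like PayloadError("provider rejected request"), where only the payload carries the real reason.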
tests/nipux_cli/test_templates.py 15 lines
   1from nipux_cli.templates import program_for_job
   2
   3
   4def test_generic_template_pushes_artifacts_and_updates():
   5    program = program_for_job(kind="generic", title="research", objective="Find findings")
   6
   7    assert "Save important observations as artifacts" in program
   8    assert "Use report_update" in program
   9    assert "Use record_lesson" in program
  10    assert "record_source" in program
  11    assert "record_findings" in program
  12    assert "record_tasks" in program
  13    assert "record_roadmap" in program
  14    assert "record_milestone_validation" in program
  15    assert "record_findings" in program
tests/nipux_cli/test_tools.py 2166 lines
   1import json
   2import os
   3import signal
   4import subprocess
   5import time
   6
   7from nipux_cli.artifacts import ArtifactStore
   8from nipux_cli.config import AppConfig, RuntimeConfig, ToolAccessConfig
   9from nipux_cli.db import AgentDB
  10from nipux_cli.shell_tools import cleanup_registered_shell_processes
  11from nipux_cli.tools import APPROVED_TOOL_NAMES, DEFAULT_REGISTRY, ToolContext
  12
  13
  14def test_static_tool_surface_is_focused():
  15    assert tuple(DEFAULT_REGISTRY.names()) == tuple(sorted(APPROVED_TOOL_NAMES))
  16    assert "terminal" not in DEFAULT_REGISTRY.names()
  17    assert "delegate_task" not in DEFAULT_REGISTRY.names()
  18    assert "skill_manage" not in DEFAULT_REGISTRY.names()
  19    assert "browser_navigate" in DEFAULT_REGISTRY.names()
  20    assert "shell_exec" in DEFAULT_REGISTRY.names()
  21    assert "write_file" in DEFAULT_REGISTRY.names()
  22    assert "write_artifact" in DEFAULT_REGISTRY.names()
  23    assert "defer_job" in DEFAULT_REGISTRY.names()
  24    assert "report_update" in DEFAULT_REGISTRY.names()
  25    assert "record_lesson" in DEFAULT_REGISTRY.names()
  26    assert "record_memory_graph" in DEFAULT_REGISTRY.names()
  27    assert "search_memory_graph" in DEFAULT_REGISTRY.names()
  28    assert "record_source" in DEFAULT_REGISTRY.names()
  29    assert "record_findings" in DEFAULT_REGISTRY.names()
  30    assert "record_tasks" in DEFAULT_REGISTRY.names()
  31    assert "record_roadmap" in DEFAULT_REGISTRY.names()
  32    assert "record_milestone_validation" in DEFAULT_REGISTRY.names()
  33    assert "record_experiment" in DEFAULT_REGISTRY.names()
  34    assert "acknowledge_operator_context" in DEFAULT_REGISTRY.names()
  35
  36
  37def test_tool_registry_validates_required_arguments(tmp_path):
  38    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  39
  40    missing = DEFAULT_REGISTRY.validate_arguments("shell_exec", {}, config)
  41    assert missing is not None
  42    assert missing["missing_arguments"] == ["command"]
  43    assert missing["recoverable"] is True
  44
  45    artifact_ref = DEFAULT_REGISTRY.validate_arguments("read_artifact", {}, config)
  46    assert artifact_ref is not None
  47    assert artifact_ref["missing_arguments"] == ["artifact reference"]
  48
  49    graph = DEFAULT_REGISTRY.validate_arguments("record_memory_graph", {}, config)
  50    assert graph is not None
  51    assert graph["missing_arguments"] == ["nodes or edges"]
  52
  53    experiment = DEFAULT_REGISTRY.validate_arguments("record_experiment", {"metric_name": "throughput"}, config)
  54    assert experiment is None
  55
  56    nested = DEFAULT_REGISTRY.validate_arguments("record_findings", {"findings": [{}]}, config)
  57    assert nested is not None
  58    assert nested["missing_arguments"] == ["findings[0].name"]
  59
  60    nested_task = DEFAULT_REGISTRY.validate_arguments("record_tasks", {"tasks": [{"goal": "do work"}]}, config)
  61    assert nested_task is not None
  62    assert nested_task["missing_arguments"] == ["tasks[0].title"]
  63
  64
  65def test_tool_registry_blocks_truncated_reference_arguments(tmp_path):
  66    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
  67
  68    experiment = DEFAULT_REGISTRY.validate_arguments(
  69        "record_experiment",
  70        {
  71            "title": "Measure local files",
  72            "evidence_artifact": "art_fb73...",
  73            "next_action": "validate the exact artifact",
  74        },
  75        config,
  76    )
  77
  78    assert experiment is not None
  79    assert experiment["error"] == "placeholder tool arguments"
  80    assert experiment["placeholder_arguments"] == ["evidence_artifact"]
  81
  82
  83def test_tool_access_config_filters_worker_schema_and_blocks_calls(tmp_path):
  84    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), tools=ToolAccessConfig(browser=False, web=False, shell=False, files=False))
  85    db = AgentDB(tmp_path / "state.db")
  86    try:
  87        job_id = db.create_job("Restricted tools")
  88        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id)
  89
  90        names = {tool["function"]["name"] for tool in DEFAULT_REGISTRY.openai_tools(config=config)}
  91        assert "browser_navigate" not in names
  92        assert "web_search" not in names
  93        assert "shell_exec" not in names
  94        assert "write_file" not in names
  95        assert "write_artifact" in names
  96
  97        result = json.loads(DEFAULT_REGISTRY.handle("shell_exec", {"command": "printf no"}, ctx))
  98        assert result["success"] is False
  99        assert result["tool_access"] == "shell"
 100    finally:
 101        db.close()
 102
 103
 104def test_artifact_tools_roundtrip(tmp_path):
 105    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 106    db = AgentDB(tmp_path / "state.db")
 107    try:
 108        job_id = db.create_job("Save evidence")
 109        run_id = db.start_run(job_id, model="fake")
 110        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
 111        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 112
 113        raw = DEFAULT_REGISTRY.handle("write_artifact", {"content": "needle text", "title": "Evidence"}, ctx)
 114        result = json.loads(raw)
 115        assert result["success"] is True
 116
 117        read_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": result["artifact_id"]}, ctx)
 118        assert json.loads(read_raw)["content"] == "needle text"
 119
 120        path_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": result["path"]}, ctx)
 121        assert json.loads(path_raw)["artifact_id"] == result["artifact_id"]
 122
 123        title_raw = DEFAULT_REGISTRY.handle("read_artifact", {"title": "Evidence"}, ctx)
 124        assert json.loads(title_raw)["content"] == "needle text"
 125
 126        number_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": "1"}, ctx)
 127        assert json.loads(number_raw)["content"] == "needle text"
 128
 129        search_raw = DEFAULT_REGISTRY.handle("search_artifacts", {"query": "needle"}, ctx)
 130        assert json.loads(search_raw)["results"][0]["id"] == result["artifact_id"]
 131    finally:
 132        db.close()
 133
 134
 135def test_read_artifact_missing_ref_returns_valid_recent_refs(tmp_path):
 136    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 137    db = AgentDB(tmp_path / "state.db")
 138    try:
 139        job_id = db.create_job("Save evidence")
 140        run_id = db.start_run(job_id, model="fake")
 141        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
 142        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 143        stored = ctx.artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Useful Evidence", content="saved")
 144
 145        raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": "art_missing"}, ctx)
 146        result = json.loads(raw)
 147
 148        assert result["success"] is False
 149        assert result["recoverable"] is True
 150        assert result["error"] == "artifact not found: art_missing"
 151        assert "search_artifacts" in result["guidance"]
 152        assert result["recent_artifacts"][0]["id"] == stored.id
 153        assert result["recent_artifacts"][0]["title"] == "Useful Evidence"
 154    finally:
 155        db.close()
 156
 157
 158def test_defer_job_records_resume_time_without_pausing(tmp_path):
 159    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 160    db = AgentDB(tmp_path / "state.db")
 161    try:
 162        job_id = db.create_job("Monitor a long process")
 163        run_id = db.start_run(job_id, model="fake")
 164        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="defer_job")
 165        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 166
 167        raw = DEFAULT_REGISTRY.handle(
 168            "defer_job",
 169            {"seconds": 60, "reason": "process is still running", "next_action": "check status"},
 170            ctx,
 171        )
 172        result = json.loads(raw)
 173
 174        assert result["success"] is True
 175        assert result["status"] == "running"
 176        job = db.get_job(job_id)
 177        assert job["status"] == "running"
 178        assert job["metadata"]["defer_until"]
 179        assert job["metadata"]["defer_reason"] == "process is still running"
 180        assert job["metadata"]["defer_next_action"] == "check status"
 181        assert any(event["event_type"] == "agent_message" for event in db.list_events(job_id=job_id, limit=10))
 182    finally:
 183        db.close()
 184
 185
 186def test_shell_exec_tool_runs_bounded_command(tmp_path):
 187    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 188    db = AgentDB(tmp_path / "state.db")
 189    try:
 190        job_id = db.create_job("Run command")
 191        run_id = db.start_run(job_id, model="fake")
 192        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 193        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 194
 195        raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "printf hello", "timeout_seconds": 5}, ctx)
 196        result = json.loads(raw)
 197
 198        assert result["success"] is True
 199        assert result["returncode"] == 0
 200        assert result["stdout"] == "hello"
 201    finally:
 202        db.close()
 203
 204
 205def test_shell_exec_flags_masked_auth_failure_output(tmp_path):
 206    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 207    db = AgentDB(tmp_path / "state.db")
 208    try:
 209        job_id = db.create_job("Run command")
 210        run_id = db.start_run(job_id, model="fake")
 211        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 212        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 213
 214        raw = DEFAULT_REGISTRY.handle(
 215            "shell_exec",
 216            {
 217                "command": (
 218                    "printf 'HTTP request sent, awaiting response... 401 Unauthorized\\n"
 219                    "Username/Password Authentication Failed.\\nDownloaded: file.bin (29 bytes)\\n'"
 220                ),
 221                "timeout_seconds": 5,
 222            },
 223            ctx,
 224        )
 225        result = json.loads(raw)
 226
 227        assert result["returncode"] == 0
 228        assert result["success"] is False
 229        assert "authentication or authorization failure" in result["error"]
 230    finally:
 231        db.close()
 232
 233
 234def test_write_file_tool_writes_and_appends_workspace_file(tmp_path, monkeypatch):
 235    monkeypatch.chdir(tmp_path)
 236    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 237    db = AgentDB(tmp_path / "state.db")
 238    try:
 239        job_id = db.create_job("Write deliverable")
 240        run_id = db.start_run(job_id, model="fake")
 241        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_file")
 242        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 243
 244        raw = DEFAULT_REGISTRY.handle("write_file", {"path": "out/report.md", "content": "one\n"}, ctx)
 245        result = json.loads(raw)
 246        append_raw = DEFAULT_REGISTRY.handle(
 247            "write_file",
 248            {"path": "out/report.md", "content": "two\n", "mode": "append"},
 249            ctx,
 250        )
 251        append_result = json.loads(append_raw)
 252
 253        assert result["success"] is True
 254        assert append_result["success"] is True
 255        assert (tmp_path / "out" / "report.md").read_text() == "one\ntwo\n"
 256    finally:
 257        db.close()
 258
 259
 260def test_shell_exec_timeout_kills_process_group(tmp_path):
 261    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 262    db = AgentDB(tmp_path / "state.db")
 263    try:
 264        job_id = db.create_job("Run command")
 265        run_id = db.start_run(job_id, model="fake")
 266        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 267        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 268
 269        raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "sleep 5 | cat", "timeout_seconds": 1}, ctx)
 270        result = json.loads(raw)
 271
 272        assert result["success"] is False
 273        assert result["timed_out"] is True
 274        assert result["duration_seconds"] < 4
 275    finally:
 276        db.close()
 277
 278
 279def test_cleanup_registered_shell_processes_kills_orphaned_group(tmp_path):
 280    process = subprocess.Popen("sleep 30", shell=True, start_new_session=True)
 281    for _ in range(20):
 282        if process.poll() is None:
 283            try:
 284                os.kill(process.pid, 0)
 285                break
 286            except ProcessLookupError:
 287                pass
 288        time.sleep(0.02)
 289    registry = tmp_path / "runtime" / "shell_processes.jsonl"
 290    registry.parent.mkdir(parents=True)
 291    registry.write_text(json.dumps({"pid": process.pid, "command": "sleep 30"}) + "\n", encoding="utf-8")
 292    try:
 293        cleaned = cleanup_registered_shell_processes(tmp_path)
 294
 295        assert cleaned and cleaned[0]["pid"] == process.pid
 296        process.wait(timeout=3)
 297        assert not registry.exists()
 298    finally:
 299        if process.poll() is None:
 300            try:
 301                os.killpg(process.pid, signal.SIGKILL)
 302            except ProcessLookupError:
 303                pass
 304            process.wait(timeout=3)
 305
 306
 307def test_shell_exec_does_not_attach_local_ssh_config(tmp_path):
 308    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 309    db = AgentDB(tmp_path / "state.db")
 310    try:
 311        job_id = db.create_job("Run command")
 312        run_id = db.start_run(job_id, model="fake")
 313        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 314        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 315
 316        raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "ssh -V", "timeout_seconds": 5}, ctx)
 317        result = json.loads(raw)
 318
 319        assert "ssh_config" not in result
 320    finally:
 321        db.close()
 322
 323
 324def test_shell_exec_reports_nonzero_stderr_as_error(tmp_path):
 325    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 326    db = AgentDB(tmp_path / "state.db")
 327    try:
 328        job_id = db.create_job("Run command")
 329        run_id = db.start_run(job_id, model="fake")
 330        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 331        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 332
 333        raw = DEFAULT_REGISTRY.handle(
 334            "shell_exec",
 335            {"command": "printf 'sudo: a terminal is required to read the password\\n' >&2; exit 1", "timeout_seconds": 5},
 336            ctx,
 337        )
 338        result = json.loads(raw)
 339
 340        assert result["success"] is False
 341        assert "interactive sudo/password" in result["error"]
 342    finally:
 343        db.close()
 344
 345
 346def test_shell_exec_flags_sudo_password_hidden_by_success_status(tmp_path):
 347    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 348    db = AgentDB(tmp_path / "state.db")
 349    try:
 350        job_id = db.create_job("Run command")
 351        run_id = db.start_run(job_id, model="fake")
 352        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 353        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 354
 355        raw = DEFAULT_REGISTRY.handle(
 356            "shell_exec",
 357            {
 358                "command": (
 359                    "printf 'sudo: a terminal is required to read the password\\n"
 360                    "sudo: a password is required\\n'"
 361                ),
 362                "timeout_seconds": 5,
 363            },
 364            ctx,
 365        )
 366        result = json.loads(raw)
 367
 368        assert result["returncode"] == 0
 369        assert result["success"] is False
 370        assert "interactive sudo/password" in result["error"]
 371    finally:
 372        db.close()
 373
 374
 375def test_shell_exec_flags_missing_command_hidden_by_success_status(tmp_path):
 376    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 377    db = AgentDB(tmp_path / "state.db")
 378    try:
 379        job_id = db.create_job("Run command")
 380        run_id = db.start_run(job_id, model="fake")
 381        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 382        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 383
 384        raw = DEFAULT_REGISTRY.handle(
 385            "shell_exec",
 386            {"command": "printf '/bin/sh: 1: build-tool: not found\\n'", "timeout_seconds": 5},
 387            ctx,
 388        )
 389        result = json.loads(raw)
 390
 391        assert result["success"] is False
 392        assert "missing command" in result["error"]
 393    finally:
 394        db.close()
 395
 396
 397def test_shell_exec_flags_missing_absolute_executable_hidden_by_success_status(tmp_path):
 398    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 399    db = AgentDB(tmp_path / "state.db")
 400    try:
 401        job_id = db.create_job("Run command")
 402        run_id = db.start_run(job_id, model="fake")
 403        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 404        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 405
 406        raw = DEFAULT_REGISTRY.handle(
 407            "shell_exec",
 408            {"command": "printf '/bin/sh: 1: /tmp/tools/build-tool: not found\\n'", "timeout_seconds": 5},
 409            ctx,
 410        )
 411        result = json.loads(raw)
 412
 413        assert result["returncode"] == 0
 414        assert result["success"] is False
 415        assert "missing command" in result["error"]
 416        assert "/tmp/tools/build-tool: not found" in result["error"]
 417    finally:
 418        db.close()
 419
 420
 421def test_shell_exec_reports_empty_which_probe_as_missing_executable(tmp_path):
 422    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 423    db = AgentDB(tmp_path / "state.db")
 424    try:
 425        job_id = db.create_job("Run command")
 426        run_id = db.start_run(job_id, model="fake")
 427        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 428        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 429
 430        raw = DEFAULT_REGISTRY.handle(
 431            "shell_exec",
 432            {"command": "which definitely-missing-nipux-test-command", "timeout_seconds": 5},
 433            ctx,
 434        )
 435        result = json.loads(raw)
 436
 437        assert result["success"] is False
 438        assert result["returncode"] == 1
 439        assert result["error"] == "command probe found no executable: definitely-missing-nipux-test-command"
 440    finally:
 441        db.close()
 442
 443
 444def test_shell_exec_flags_empty_successful_probe_as_no_observation(tmp_path):
 445    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 446    db = AgentDB(tmp_path / "state.db")
 447    try:
 448        job_id = db.create_job("Run command")
 449        run_id = db.start_run(job_id, model="fake")
 450        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 451        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 452
 453        raw = DEFAULT_REGISTRY.handle(
 454            "shell_exec",
 455            {"command": "find /tmp/definitely-missing-nipux-test-path -maxdepth 1 2>/dev/null || true", "timeout_seconds": 5},
 456            ctx,
 457        )
 458        result = json.loads(raw)
 459
 460        assert result["returncode"] == 0
 461        assert result["success"] is False
 462        assert "produced no output" in result["error"]
 463        assert "filesystem probe" in result["error"]
 464    finally:
 465        db.close()
 466
 467
 468def test_shell_exec_flags_missing_which_probe_hidden_by_true(tmp_path):
 469    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 470    db = AgentDB(tmp_path / "state.db")
 471    try:
 472        job_id = db.create_job("Run command")
 473        run_id = db.start_run(job_id, model="fake")
 474        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 475        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 476
 477        raw = DEFAULT_REGISTRY.handle(
 478            "shell_exec",
 479            {"command": "which definitely-missing-nipux-test-command || true", "timeout_seconds": 5},
 480            ctx,
 481        )
 482        result = json.loads(raw)
 483
 484        assert result["returncode"] == 0
 485        assert result["success"] is False
 486        assert "probe found no executable" in result["error"]
 487    finally:
 488        db.close()
 489
 490
 491def test_shell_exec_flags_make_failure_hidden_by_pipe_status(tmp_path):
 492    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 493    db = AgentDB(tmp_path / "state.db")
 494    try:
 495        job_id = db.create_job("Run command")
 496        run_id = db.start_run(job_id, model="fake")
 497        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
 498        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 499
 500        raw = DEFAULT_REGISTRY.handle(
 501            "shell_exec",
 502            {"command": "printf 'Makefile:6: *** Build system changed:\\n.  Stop.\\n'", "timeout_seconds": 5},
 503            ctx,
 504        )
 505        result = json.loads(raw)
 506
 507        assert result["success"] is False
 508        assert "build/tool failure" in result["error"]
 509    finally:
 510        db.close()
 511
 512
 513def test_update_job_state_keeps_terminal_statuses_operator_only(tmp_path):
 514    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 515    db = AgentDB(tmp_path / "state.db")
 516    try:
 517        job_id = db.create_job("Keep running")
 518        run_id = db.start_run(job_id, model="fake")
 519        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="update_job_state")
 520        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 521
 522        for requested in ("paused", "cancelled", "completed", "failed"):
 523            raw = DEFAULT_REGISTRY.handle("update_job_state", {"status": requested}, ctx)
 524            result = json.loads(raw)
 525
 526            assert result["success"] is True
 527            assert result["requested_status"] == requested
 528            assert result["kept_running"] is True
 529            assert db.get_job(job_id)["status"] == "running"
 530            if requested == "completed":
 531                assert result["follow_up_task"]["title"] == "Audit latest checkpoint against objective"
 532                assert result["follow_up_task"]["status"] == "open"
 533                assert result["follow_up_task"]["output_contract"] == "decision"
 534                assert "prompt-to-artifact checklist" in result["follow_up_task"]["acceptance_criteria"]
 535                assert result["follow_up_task"]["evidence_needed"]
 536                assert result["follow_up_task"]["stall_behavior"]
 537                assert result["follow_up_task"]["metadata"]["source"] == "update_job_state"
 538                assert result["follow_up_task"]["metadata"]["completion_audit_required"] is True
 539            else:
 540                assert "follow_up_task" not in result
 541
 542        tasks = db.get_job(job_id)["metadata"]["task_queue"]
 543        assert [task["title"] for task in tasks] == ["Audit latest checkpoint against objective"]
 544    finally:
 545        db.close()
 546
 547
 548def test_report_update_tool_records_operator_visible_note(tmp_path):
 549    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 550    db = AgentDB(tmp_path / "state.db")
 551    try:
 552        job_id = db.create_job("Research topic")
 553        run_id = db.start_run(job_id, model="fake")
 554        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="report_update")
 555        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 556
 557        raw = DEFAULT_REGISTRY.handle("report_update", {"message": "Found a usable finding source", "category": "finding"}, ctx)
 558        result = json.loads(raw)
 559        job = db.get_job(job_id)
 560
 561        assert result["success"] is True
 562        assert job["metadata"]["agent_updates"][-1]["message"] == "Found a usable finding source"
 563        assert job["metadata"]["last_agent_update"]["category"] == "finding"
 564    finally:
 565        db.close()
 566
 567
 568def test_record_lesson_tool_records_durable_learning(tmp_path):
 569    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 570    db = AgentDB(tmp_path / "state.db")
 571    try:
 572        job_id = db.create_job("Research topic")
 573        run_id = db.start_run(job_id, model="fake")
 574        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
 575        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 576
 577        raw = DEFAULT_REGISTRY.handle(
 578            "record_lesson",
 579            {"lesson": "Competitor low-evidence lists are not finding sources.", "category": "source_quality", "confidence": 0.8},
 580            ctx,
 581        )
 582        result = json.loads(raw)
 583        job = db.get_job(job_id)
 584
 585        assert result["success"] is True
 586        assert job["metadata"]["lessons"][-1]["lesson"] == "Competitor low-evidence lists are not finding sources."
 587        assert job["metadata"]["last_lesson"]["category"] == "source_quality"
 588    finally:
 589        db.close()
 590
 591
 592def test_record_lesson_cannot_clear_measurement_obligation_with_vague_lesson(tmp_path):
 593    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 594    db = AgentDB(tmp_path / "state.db")
 595    try:
 596        job_id = db.create_job(
 597            "Improve a measurable process",
 598            metadata={
 599                "pending_measurement_obligation": {
 600                    "source_step_no": 4,
 601                    "tool": "shell_exec",
 602                    "metric_candidates": ["42 units/s"],
 603                    "command": "run trial",
 604                }
 605            },
 606        )
 607        run_id = db.start_run(job_id, model="fake")
 608        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
 609        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 610
 611        raw = DEFAULT_REGISTRY.handle(
 612            "record_lesson",
 613            {"lesson": "continue focused work", "category": "strategy"},
 614            ctx,
 615        )
 616        result = json.loads(raw)
 617        job = db.get_job(job_id)
 618
 619        assert result["success"] is False
 620        assert result["error"] == "measurement explanation required"
 621        assert job["metadata"]["pending_measurement_obligation"]["source_step_no"] == 4
 622        assert "lessons" not in job["metadata"]
 623    finally:
 624        db.close()
 625
 626
 627def test_record_lesson_can_explain_invalid_measurement_obligation(tmp_path):
 628    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 629    db = AgentDB(tmp_path / "state.db")
 630    try:
 631        job_id = db.create_job(
 632            "Improve a measurable process",
 633            metadata={
 634                "pending_measurement_obligation": {
 635                    "source_step_no": 4,
 636                    "tool": "shell_exec",
 637                    "metric_candidates": ["42 units/s"],
 638                    "command": "run trial",
 639                }
 640            },
 641        )
 642        run_id = db.start_run(job_id, model="fake")
 643        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
 644        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 645
 646        raw = DEFAULT_REGISTRY.handle(
 647            "record_lesson",
 648            {
 649                "lesson": (
 650                    "The output was diagnostic only and did not contain a valid metric; "
 651                    "rerun the branch with a measured trial."
 652                ),
 653                "category": "mistake",
 654            },
 655            ctx,
 656        )
 657        result = json.loads(raw)
 658        job = db.get_job(job_id)
 659
 660        assert result["success"] is True
 661        assert job["metadata"].get("pending_measurement_obligation") == {}
 662        assert job["metadata"]["last_measurement_obligation"]["resolution_status"] == "explained"
 663        assert job["metadata"]["last_measurement_obligation"]["resolution_tool"] == "record_lesson"
 664    finally:
 665        db.close()
 666
 667
 668def test_memory_graph_tools_roundtrip(tmp_path):
 669    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 670    db = AgentDB(tmp_path / "state.db")
 671    try:
 672        job_id = db.create_job("Build durable project understanding")
 673        run_id = db.start_run(job_id, model="fake")
 674        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_memory_graph")
 675        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 676
 677        raw = DEFAULT_REGISTRY.handle(
 678            "record_memory_graph",
 679            {
 680                "nodes": [
 681                    {
 682                        "title": "Use measured checkpoints before expanding scope",
 683                        "kind": "strategy",
 684                        "status": "active",
 685                        "summary": "Convert branch outcomes into evidence-backed decisions before opening more work.",
 686                        "salience": 0.9,
 687                        "tags": ["progress", "validation"],
 688                        "evidence_refs": ["art_123"],
 689                    },
 690                    {
 691                        "title": "Open question: missing evaluator",
 692                        "kind": "question",
 693                        "status": "open",
 694                        "summary": "The job needs a concrete validation signal for the next branch.",
 695                    },
 696                ],
 697                "edges": [
 698                    {
 699                        "from_key": "Use measured checkpoints before expanding scope",
 700                        "to_key": "Open question: missing evaluator",
 701                        "relation": "raises",
 702                    }
 703                ],
 704            },
 705            ctx,
 706        )
 707        result = json.loads(raw)
 708
 709        assert result["success"] is True
 710        assert result["added_nodes"] == 2
 711        assert result["added_edges"] == 1
 712        job = db.get_job(job_id)
 713        graph = job["metadata"]["memory_graph"]
 714        assert len(graph["nodes"]) == 2
 715        assert graph["nodes"][0]["kind"] == "strategy"
 716        assert graph["nodes"][0]["evidence_refs"] == ["art_123"]
 717        assert db.list_events(job_id=job_id, event_types=["memory_node"])[0]["title"] == "memory graph"
 718
 719        search_raw = DEFAULT_REGISTRY.handle("search_memory_graph", {"query": "evaluator"}, ctx)
 720        search = json.loads(search_raw)
 721        assert search["success"] is True
 722        assert search["nodes"][0]["title"] == "Open question: missing evaluator"
 723        assert search["edges"][0]["relation"] == "raises"
 724    finally:
 725        db.close()
 726
 727
 728def test_record_source_and_findings_tools_update_ledgers(tmp_path):
 729    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 730    db = AgentDB(tmp_path / "state.db")
 731    try:
 732        job_id = db.create_job("Research topic")
 733        run_id = db.start_run(job_id, model="fake")
 734        source_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
 735        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=source_step)
 736
 737        source_raw = DEFAULT_REGISTRY.handle(
 738            "record_source",
 739            {"source": "https://example.com", "source_type": "web_source", "usefulness_score": 0.8, "yield_count": 2},
 740            ctx,
 741        )
 742        finding_raw = DEFAULT_REGISTRY.handle(
 743            "record_findings",
 744            {
 745                "findings": [
 746                    {
 747                        "name": "Acme Finding",
 748                        "url": "https://acme.example",
 749                        "source_url": "https://example-source.com/acme",
 750                        "location": "Toronto",
 751                        "category": "example category",
 752                        "reason": "reusable result",
 753                        "score": 0.75,
 754                    }
 755                ]
 756            },
 757            ctx,
 758        )
 759        job = db.get_job(job_id)
 760
 761        assert json.loads(source_raw)["source"]["yield_count"] == 2
 762        finding_result = json.loads(finding_raw)
 763        assert finding_result["added"] == 1
 764        assert finding_result["sources_updated"] == 1
 765        assert job["metadata"]["source_ledger"][0]["source"] == "https://example.com"
 766        assert any(source["source"] == "https://example-source.com/acme" for source in job["metadata"]["source_ledger"])
 767        assert job["metadata"]["finding_ledger"][0]["name"] == "Acme Finding"
 768        assert job["metadata"]["last_agent_update"]["category"] == "finding"
 769    finally:
 770        db.close()
 771
 772
 773def test_record_source_requires_assessment(tmp_path):
 774    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 775    db = AgentDB(tmp_path / "state.db")
 776    try:
 777        job_id = db.create_job("Research topic")
 778        run_id = db.start_run(job_id, model="fake")
 779        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
 780        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 781
 782        raw = DEFAULT_REGISTRY.handle("record_source", {"source": "https://example.com"}, ctx)
 783        result = json.loads(raw)
 784
 785        assert result["success"] is False
 786        assert result["error"] == "source assessment is required"
 787        assert db.get_job(job_id)["metadata"].get("source_ledger") is None
 788    finally:
 789        db.close()
 790
 791
 792def test_record_source_does_not_accept_type_without_assessment(tmp_path):
 793    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 794    db = AgentDB(tmp_path / "state.db")
 795    try:
 796        job_id = db.create_job("Research topic")
 797        run_id = db.start_run(job_id, model="fake")
 798        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
 799        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 800
 801        raw = DEFAULT_REGISTRY.handle(
 802            "record_source",
 803            {"source": "https://example.com", "source_type": "web_source"},
 804            ctx,
 805        )
 806        result = json.loads(raw)
 807
 808        assert result["success"] is False
 809        assert result["error"] == "source assessment is required"
 810    finally:
 811        db.close()
 812
 813
 814def test_record_findings_reports_unchanged_duplicates_without_agent_update_noise(tmp_path):
 815    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 816    db = AgentDB(tmp_path / "state.db")
 817    try:
 818        job_id = db.create_job("Research topic")
 819        run_id = db.start_run(job_id, model="fake")
 820        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
 821        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 822        args = {
 823            "findings": [
 824                {
 825                    "name": "Reusable finding",
 826                    "source_url": "https://example-source.com/finding",
 827                    "reason": "Evidence-backed result",
 828                    "score": 0.75,
 829                }
 830            ]
 831        }
 832
 833        first = json.loads(DEFAULT_REGISTRY.handle("record_findings", args, ctx))
 834        agent_events_after_first = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
 835        repeated = json.loads(DEFAULT_REGISTRY.handle("record_findings", args, ctx))
 836        agent_events_after_repeat = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
 837
 838        assert first["added"] == 1
 839        assert first["updated"] == 0
 840        assert first["unchanged"] == 0
 841        assert repeated["added"] == 0
 842        assert repeated["updated"] == 0
 843        assert repeated["unchanged"] == 1
 844        assert agent_events_after_repeat == agent_events_after_first
 845    finally:
 846        db.close()
 847
 848
 849def test_record_findings_requires_evidence_anchor(tmp_path):
 850    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 851    db = AgentDB(tmp_path / "state.db")
 852    try:
 853        job_id = db.create_job("Research topic")
 854        run_id = db.start_run(job_id, model="fake")
 855        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
 856        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 857
 858        raw = DEFAULT_REGISTRY.handle(
 859            "record_findings",
 860            {"findings": [{"name": "Unsupported label", "category": "candidate"}]},
 861            ctx,
 862        )
 863        result = json.loads(raw)
 864
 865        assert result["success"] is False
 866        assert result["error"] == "no valid finding with name/title and evidence was provided"
 867        assert result["rejected"] == [{"name": "Unsupported label", "reason": "missing_evidence"}]
 868        assert db.get_job(job_id)["metadata"].get("finding_ledger") is None
 869    finally:
 870        db.close()
 871
 872
 873def test_record_findings_reports_rejected_unevidenced_items_in_mixed_batch(tmp_path):
 874    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 875    db = AgentDB(tmp_path / "state.db")
 876    try:
 877        job_id = db.create_job("Research topic")
 878        run_id = db.start_run(job_id, model="fake")
 879        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
 880        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 881
 882        raw = DEFAULT_REGISTRY.handle(
 883            "record_findings",
 884            {
 885                "findings": [
 886                    {"name": "Unsupported label"},
 887                    {"name": "Evidence-backed result", "metadata": {"source_url": "file:///tmp/evidence.txt"}},
 888                ]
 889            },
 890            ctx,
 891        )
 892        result = json.loads(raw)
 893        job = db.get_job(job_id)
 894
 895        assert result["success"] is True
 896        assert result["added"] == 1
 897        assert result["rejected"] == [{"name": "Unsupported label", "reason": "missing_evidence"}]
 898        assert job["metadata"]["finding_ledger"][0]["name"] == "Evidence-backed result"
 899        assert job["metadata"]["last_agent_update"]["metadata"]["rejected"] == 1
 900    finally:
 901        db.close()
 902
 903
 904def test_record_tasks_tool_updates_task_queue(tmp_path):
 905    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 906    db = AgentDB(tmp_path / "state.db")
 907    try:
 908        job_id = db.create_job("Research topic")
 909        run_id = db.start_run(job_id, model="fake")
 910        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
 911        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 912
 913        raw = DEFAULT_REGISTRY.handle(
 914            "record_tasks",
 915            {
 916                "tasks": [
 917                    {
 918                        "title": "Explore primary sources",
 919                        "status": "open",
 920                        "priority": 5,
 921                        "goal": "Find artifact-backed evidence",
 922                        "source_hint": "official docs",
 923                    }
 924                ]
 925            },
 926            ctx,
 927        )
 928        result = json.loads(raw)
 929        job = db.get_job(job_id)
 930
 931        assert result["success"] is True
 932        assert result["added"] == 1
 933        task = job["metadata"]["task_queue"][0]
 934        assert task["title"] == "Explore primary sources"
 935        assert task["priority"] == 5
 936        assert task["output_contract"] == "research"
 937        assert task["acceptance_criteria"]
 938        assert task["evidence_needed"]
 939        assert task["stall_behavior"]
 940        assert task["metadata"]["contract_inferred_fields"] == [
 941            "acceptance_criteria",
 942            "evidence_needed",
 943            "output_contract",
 944            "stall_behavior",
 945        ]
 946        assert job["metadata"]["last_agent_update"]["category"] == "plan"
 947    finally:
 948        db.close()
 949
 950
 951def test_record_tasks_dedupes_semantic_task_under_backlog_pressure(tmp_path):
 952    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 953    db = AgentDB(tmp_path / "state.db")
 954    try:
 955        job_id = db.create_job(
 956            "Keep a long-running job focused",
 957            metadata={
 958                "task_queue": [
 959                    {
 960                        "title": "Validate model files and run baseline benchmark",
 961                        "status": "open",
 962                        "priority": 5,
 963                        "goal": "Get a measured baseline.",
 964                    },
 965                    *[
 966                        {"title": f"Done branch {index}", "status": "done", "priority": 0}
 967                        for index in range(81)
 968                    ],
 969                ]
 970            },
 971        )
 972        run_id = db.start_run(job_id, model="fake")
 973        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
 974        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
 975
 976        raw = DEFAULT_REGISTRY.handle(
 977            "record_tasks",
 978            {
 979                "tasks": [
 980                    {
 981                        "title": "Validate candidate model files and run baseline benchmark",
 982                        "status": "active",
 983                        "priority": 10,
 984                        "goal": "Use the existing validation branch for the first measured run.",
 985                    }
 986                ]
 987            },
 988            ctx,
 989        )
 990        result = json.loads(raw)
 991        job = db.get_job(job_id)
 992        task_queue = job["metadata"]["task_queue"]
 993
 994        assert result["success"] is True
 995        assert result["added"] == 0
 996        assert result["updated"] == 1
 997        assert len(task_queue) == 82
 998        task = task_queue[0]
 999        assert task["title"] == "Validate model files and run baseline benchmark"
1000        assert task["status"] == "active"
1001        assert task["metadata"]["original_title"] == "Validate candidate model files and run baseline benchmark"
1002        assert task["metadata"]["matched_existing_task"]["title"] == "Validate model files and run baseline benchmark"
1003    finally:
1004        db.close()
1005
1006
1007def test_record_tasks_reports_unchanged_duplicates_without_agent_update_noise(tmp_path):
1008    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1009    db = AgentDB(tmp_path / "state.db")
1010    try:
1011        job_id = db.create_job("Research topic")
1012        run_id = db.start_run(job_id, model="fake")
1013        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1014        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1015        args = {
1016            "tasks": [
1017                {
1018                    "title": "Explore primary sources",
1019                    "status": "open",
1020                    "priority": 5,
1021                    "goal": "Find artifact-backed evidence",
1022                }
1023            ]
1024        }
1025
1026        first = json.loads(DEFAULT_REGISTRY.handle("record_tasks", args, ctx))
1027        agent_events_after_first = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
1028        db.update_job_metadata(
1029            job_id,
1030            {"pending_measurement_obligation": {"source_step_no": 1, "metric_candidates": ["score"]}},
1031        )
1032        repeated = json.loads(DEFAULT_REGISTRY.handle("record_tasks", args, ctx))
1033        agent_events_after_repeat = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
1034
1035        assert first["added"] == 1
1036        assert first["updated"] == 0
1037        assert first["unchanged"] == 0
1038        assert repeated["added"] == 0
1039        assert repeated["updated"] == 0
1040        assert repeated["unchanged"] == 1
1041        assert agent_events_after_repeat == agent_events_after_first
1042        assert db.get_job(job_id)["metadata"]["pending_measurement_obligation"]["source_step_no"] == 1
1043    finally:
1044        db.close()
1045
1046
1047def test_record_tasks_cannot_defer_measurement_with_unrelated_task(tmp_path):
1048    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1049    db = AgentDB(tmp_path / "state.db")
1050    try:
1051        job_id = db.create_job(
1052            "Improve measurable process",
1053            metadata={
1054                "pending_measurement_obligation": {
1055                    "source_step_no": 8,
1056                    "tool": "shell_exec",
1057                    "metric_candidates": ["42 units/s"],
1058                }
1059            },
1060        )
1061        run_id = db.start_run(job_id, model="fake")
1062        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1063        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1064
1065        raw = DEFAULT_REGISTRY.handle(
1066            "record_tasks",
1067            {"tasks": [{"title": "Read more background sources", "status": "open", "output_contract": "research"}]},
1068            ctx,
1069        )
1070        result = json.loads(raw)
1071        job = db.get_job(job_id)
1072
1073        assert result["success"] is False
1074        assert result["error"] == "measurement task required"
1075        assert job["metadata"]["pending_measurement_obligation"]["source_step_no"] == 8
1076        assert "task_queue" not in job["metadata"]
1077    finally:
1078        db.close()
1079
1080
1081def test_record_tasks_can_defer_measurement_with_explicit_measurement_task(tmp_path):
1082    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1083    db = AgentDB(tmp_path / "state.db")
1084    try:
1085        job_id = db.create_job(
1086            "Improve measurable process",
1087            metadata={
1088                "pending_measurement_obligation": {
1089                    "source_step_no": 8,
1090                    "tool": "shell_exec",
1091                    "metric_candidates": ["42 units/s"],
1092                }
1093            },
1094        )
1095        run_id = db.start_run(job_id, model="fake")
1096        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1097        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1098
1099        raw = DEFAULT_REGISTRY.handle(
1100            "record_tasks",
1101            {
1102                "tasks": [{
1103                    "title": "Rerun the branch and record the missing measurement",
1104                    "status": "open",
1105                    "output_contract": "experiment",
1106                    "acceptance_criteria": "valid metric recorded",
1107                    "evidence_needed": "measured command output",
1108                    "stall_behavior": "record blocker if measurement cannot be obtained",
1109                }]
1110            },
1111            ctx,
1112        )
1113        result = json.loads(raw)
1114        job = db.get_job(job_id)
1115
1116        assert result["success"] is True
1117        assert result["added"] == 1
1118        assert job["metadata"].get("pending_measurement_obligation") == {}
1119        assert job["metadata"]["last_measurement_obligation"]["resolution_status"] == "deferred"
1120        assert job["metadata"]["last_measurement_obligation"]["resolution_tool"] == "record_tasks"
1121    finally:
1122        db.close()
1123
1124
1125def test_record_roadmap_tool_updates_roadmap(tmp_path):
1126    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1127    db = AgentDB(tmp_path / "state.db")
1128    try:
1129        job_id = db.create_job("Build a broad generic outcome")
1130        run_id = db.start_run(job_id, model="fake")
1131        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_roadmap")
1132        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1133
1134        raw = DEFAULT_REGISTRY.handle(
1135            "record_roadmap",
1136            {
1137                "title": "Generic Roadmap",
1138                "status": "active",
1139                "scope": "Coordinate broad work through milestones.",
1140                "current_milestone": "Foundation",
1141                "validation_contract": "Each milestone needs evidence.",
1142                "milestones": [{
1143                    "title": "Foundation",
1144                    "status": "active",
1145                    "priority": 7,
1146                    "acceptance_criteria": "first durable output exists",
1147                    "evidence_needed": "artifact and ledger update",
1148                    "features": [{
1149                        "title": "Create first checkpoint",
1150                        "status": "active",
1151                        "output_contract": "artifact",
1152                    }],
1153                }],
1154            },
1155            ctx,
1156        )
1157        result = json.loads(raw)
1158        job = db.get_job(job_id)
1159        roadmap = job["metadata"]["roadmap"]
1160
1161        assert result["success"] is True
1162        assert roadmap["title"] == "Generic Roadmap"
1163        assert roadmap["status"] == "active"
1164        assert roadmap["milestones"][0]["title"] == "Foundation"
1165        assert roadmap["milestones"][0]["features"][0]["title"] == "Create first checkpoint"
1166        assert job["metadata"]["last_agent_update"]["metadata"]["roadmap_status"] == "active"
1167    finally:
1168        db.close()
1169
1170
1171def test_record_roadmap_dedupes_milestone_titles_even_when_keys_change(tmp_path):
1172    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1173    db = AgentDB(tmp_path / "state.db")
1174    try:
1175        job_id = db.create_job("Keep broad work coordinated")
1176        run_id = db.start_run(job_id, model="fake")
1177        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_roadmap")
1178        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1179
1180        DEFAULT_REGISTRY.handle(
1181            "record_roadmap",
1182            {
1183                "title": "Generic Roadmap",
1184                "milestones": [{
1185                    "key": "initial-key",
1186                    "title": "Foundation",
1187                    "status": "planned",
1188                    "features": [{"key": "feature-a", "title": "First feature", "status": "planned"}],
1189                }],
1190            },
1191            ctx,
1192        )
1193        DEFAULT_REGISTRY.handle(
1194            "record_roadmap",
1195            {
1196                "title": "Generic Roadmap",
1197                "milestones": [{
1198                    "key": "model-invented-key",
1199                    "title": "Foundation",
1200                    "status": "active",
1201                    "features": [{"key": "different-feature-key", "title": "First feature", "status": "done"}],
1202                }],
1203            },
1204            ctx,
1205        )
1206        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
1207
1208        assert len(roadmap["milestones"]) == 1
1209        assert roadmap["milestones"][0]["status"] == "active"
1210        assert len(roadmap["milestones"][0]["features"]) == 1
1211        assert roadmap["milestones"][0]["features"][0]["status"] == "done"
1212    finally:
1213        db.close()
1214
1215
1216def test_record_milestone_validation_creates_follow_up_tasks(tmp_path):
1217    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1218    db = AgentDB(tmp_path / "state.db")
1219    try:
1220        job_id = db.create_job("Validate broad work")
1221        run_id = db.start_run(job_id, model="fake")
1222        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1223        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1224
1225        raw = DEFAULT_REGISTRY.handle(
1226            "record_milestone_validation",
1227            {
1228                "milestone": "Foundation",
1229                "validation_status": "failed",
1230                "result": "Missing durable evidence.",
1231                "issues": ["no artifact"],
1232                "next_action": "Create evidence.",
1233                "follow_up_tasks": [{
1234                    "title": "Produce missing evidence",
1235                    "output_contract": "artifact",
1236                    "acceptance_criteria": "saved output exists",
1237                }],
1238            },
1239            ctx,
1240        )
1241        result = json.loads(raw)
1242        job = db.get_job(job_id)
1243        roadmap = job["metadata"]["roadmap"]
1244
1245        assert result["success"] is True
1246        assert result["validation"]["validation_status"] == "failed"
1247        assert result["follow_up_tasks"][0]["title"] == "Produce missing evidence"
1248        assert roadmap["milestones"][0]["status"] == "blocked"
1249        assert job["metadata"]["task_queue"][0]["parent"] == "Foundation"
1250        assert job["metadata"]["last_agent_update"]["metadata"]["validation_status"] == "failed"
1251    finally:
1252        db.close()
1253
1254
1255def test_record_milestone_validation_requires_evidence_for_passed_status(tmp_path):
1256    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1257    db = AgentDB(tmp_path / "state.db")
1258    try:
1259        job_id = db.create_job("Validate broad work")
1260        run_id = db.start_run(job_id, model="fake")
1261        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1262        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1263
1264        raw = DEFAULT_REGISTRY.handle(
1265            "record_milestone_validation",
1266            {
1267                "milestone": "Foundation",
1268                "validation_status": "passed",
1269            },
1270            ctx,
1271        )
1272        result = json.loads(raw)
1273
1274        assert result["success"] is False
1275        assert result["error"] == "passed milestone validation requires evidence or result"
1276        assert db.get_job(job_id)["metadata"].get("roadmap") is None
1277    finally:
1278        db.close()
1279
1280
1281def test_record_milestone_validation_allows_passed_status_with_metadata_evidence(tmp_path):
1282    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1283    db = AgentDB(tmp_path / "state.db")
1284    try:
1285        job_id = db.create_job("Validate broad work")
1286        run_id = db.start_run(job_id, model="fake")
1287        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1288        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1289
1290        raw = DEFAULT_REGISTRY.handle(
1291            "record_milestone_validation",
1292            {
1293                "milestone": "Foundation",
1294                "validation_status": "passed",
1295                "metadata": {"artifact_id": "art_123"},
1296            },
1297            ctx,
1298        )
1299        result = json.loads(raw)
1300
1301        assert result["success"] is True
1302        assert result["validation"]["validation_status"] == "passed"
1303    finally:
1304        db.close()
1305
1306
1307def test_record_milestone_validation_requires_gap_for_failed_or_blocked_status(tmp_path):
1308    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1309    db = AgentDB(tmp_path / "state.db")
1310    try:
1311        job_id = db.create_job("Validate broad work")
1312        run_id = db.start_run(job_id, model="fake")
1313        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1314        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1315
1316        failed = json.loads(DEFAULT_REGISTRY.handle(
1317            "record_milestone_validation",
1318            {
1319                "milestone": "Foundation",
1320                "validation_status": "failed",
1321            },
1322            ctx,
1323        ))
1324        blocked = json.loads(DEFAULT_REGISTRY.handle(
1325            "record_milestone_validation",
1326            {
1327                "milestone": "Foundation",
1328                "validation_status": "blocked",
1329            },
1330            ctx,
1331        ))
1332
1333        assert failed["success"] is False
1334        assert failed["error"] == "failed milestone validation requires a gap, issue, evidence, next_action, or follow-up task"
1335        assert blocked["success"] is False
1336        assert blocked["error"] == "blocked milestone validation requires a gap, issue, evidence, next_action, or follow-up task"
1337        assert db.get_job(job_id)["metadata"].get("roadmap") is None
1338    finally:
1339        db.close()
1340
1341
1342def test_record_experiment_tool_tracks_best_measured_result(tmp_path):
1343    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1344    db = AgentDB(tmp_path / "state.db")
1345    try:
1346        job_id = db.create_job("Improve a measurable process")
1347        run_id = db.start_run(job_id, model="fake")
1348        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1349        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1350
1351        first = DEFAULT_REGISTRY.handle(
1352            "record_experiment",
1353            {
1354                "title": "baseline attempt",
1355                "status": "measured",
1356                "metric_name": "score",
1357                "metric_value": 2.0,
1358                "metric_unit": "units",
1359                "higher_is_better": True,
1360                "config": {"variant": "a"},
1361                "result": "baseline measured",
1362                "next_action": "try variant b",
1363            },
1364            ctx,
1365        )
1366        second = DEFAULT_REGISTRY.handle(
1367            "record_experiment",
1368            {
1369                "title": "second attempt",
1370                "status": "measured",
1371                "metric_name": "score",
1372                "metric_value": 3.5,
1373                "metric_unit": "units",
1374                "higher_is_better": True,
1375                "config": {"variant": "b"},
1376                "result": "improved",
1377                "next_action": "test a different branch",
1378            },
1379            ctx,
1380        )
1381        job = db.get_job(job_id)
1382        experiments = job["metadata"]["experiment_ledger"]
1383
1384        assert json.loads(first)["experiment"]["best_observed"] is True
1385        assert json.loads(second)["experiment"]["best_observed"] is True
1386        assert experiments[0]["best_observed"] is False
1387        assert experiments[1]["best_observed"] is True
1388        assert experiments[1]["delta_from_previous_best"] == 1.5
1389        assert job["metadata"]["best_experiment_record"]["title"] == "second attempt"
1390        assert job["metadata"]["last_agent_update"]["metadata"]["best_observed"] is True
1391    finally:
1392        db.close()
1393
1394
1395def test_record_experiment_synthesizes_missing_title(tmp_path):
1396    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1397    db = AgentDB(tmp_path / "state.db")
1398    try:
1399        job_id = db.create_job("Improve a measurable process")
1400        run_id = db.start_run(job_id, model="fake")
1401        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1402        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1403
1404        raw = DEFAULT_REGISTRY.handle(
1405            "record_experiment",
1406            {
1407                "status": "planned",
1408                "metric_name": "download_progress_bytes",
1409                "result": "download incomplete",
1410            },
1411            ctx,
1412        )
1413        result = json.loads(raw)
1414
1415        assert result["success"] is True
1416        assert result["experiment"]["title"] == "download_progress_bytes"
1417    finally:
1418        db.close()
1419
1420
1421def test_record_experiment_requires_next_action_for_closed_trials(tmp_path):
1422    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1423    db = AgentDB(tmp_path / "state.db")
1424    try:
1425        job_id = db.create_job("Improve a measurable process")
1426        run_id = db.start_run(job_id, model="fake")
1427        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1428        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1429
1430        raw = DEFAULT_REGISTRY.handle(
1431            "record_experiment",
1432            {
1433                "title": "blocked attempt",
1434                "status": "blocked",
1435                "metric_name": "score",
1436                "result": "no valid measurement",
1437            },
1438            ctx,
1439        )
1440        result = json.loads(raw)
1441
1442        assert result["success"] is False
1443        assert result["error"] == "next_action is required for measured, failed, blocked, or skipped experiments"
1444        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
1445    finally:
1446        db.close()
1447
1448
1449def test_record_experiment_requires_context_for_closed_non_measured_trials(tmp_path):
1450    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1451    db = AgentDB(tmp_path / "state.db")
1452    try:
1453        job_id = db.create_job("Improve a measurable process")
1454        run_id = db.start_run(job_id, model="fake")
1455        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1456        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1457
1458        raw = DEFAULT_REGISTRY.handle(
1459            "record_experiment",
1460            {
1461                "title": "blocked attempt",
1462                "status": "blocked",
1463                "metric_name": "score",
1464                "next_action": "try a different branch",
1465            },
1466            ctx,
1467        )
1468        result = json.loads(raw)
1469
1470        assert result["success"] is False
1471        assert result["error"] == "blocked experiments require result, evidence, config, or metadata"
1472        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
1473    finally:
1474        db.close()
1475
1476
1477def test_record_experiment_accepts_blocked_trial_with_context(tmp_path):
1478    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1479    db = AgentDB(tmp_path / "state.db")
1480    try:
1481        job_id = db.create_job("Improve a measurable process")
1482        run_id = db.start_run(job_id, model="fake")
1483        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1484        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1485
1486        raw = DEFAULT_REGISTRY.handle(
1487            "record_experiment",
1488            {
1489                "title": "blocked attempt",
1490                "status": "blocked",
1491                "metric_name": "score",
1492                "result": "required input was unavailable",
1493                "next_action": "try a different branch",
1494            },
1495            ctx,
1496        )
1497        result = json.loads(raw)
1498
1499        assert result["success"] is True
1500        assert result["experiment"]["status"] == "blocked"
1501        assert result["experiment"]["result"] == "required input was unavailable"
1502    finally:
1503        db.close()
1504
1505
1506def test_record_experiment_requires_metric_for_measured_trials(tmp_path):
1507    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1508    db = AgentDB(tmp_path / "state.db")
1509    try:
1510        job_id = db.create_job("Improve a measurable process")
1511        run_id = db.start_run(job_id, model="fake")
1512        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1513        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1514
1515        missing_value = json.loads(DEFAULT_REGISTRY.handle(
1516            "record_experiment",
1517            {
1518                "title": "invalid measurement",
1519                "status": "measured",
1520                "metric_name": "score",
1521                "result": "looked better but no numeric metric",
1522                "next_action": "run a real measurement",
1523            },
1524            ctx,
1525        ))
1526        missing_name = json.loads(DEFAULT_REGISTRY.handle(
1527            "record_experiment",
1528            {
1529                "title": "invalid measurement",
1530                "status": "measured",
1531                "metric_value": 2.7,
1532                "result": "numeric result with no metric name",
1533                "next_action": "label the metric and retry",
1534            },
1535            ctx,
1536        ))
1537
1538        assert missing_value["success"] is False
1539        assert missing_value["error"] == "measured experiments require metric_name and numeric metric_value"
1540        assert missing_name["success"] is False
1541        assert missing_name["error"] == "measured experiments require metric_name and numeric metric_value"
1542        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
1543    finally:
1544        db.close()
1545
1546
1547def test_record_experiment_accepts_numeric_metric_strings(tmp_path):
1548    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1549    db = AgentDB(tmp_path / "state.db")
1550    try:
1551        job_id = db.create_job("Improve a measurable process")
1552        run_id = db.start_run(job_id, model="fake")
1553        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1554        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1555
1556        raw = DEFAULT_REGISTRY.handle(
1557            "record_experiment",
1558            {
1559                "title": "string metric",
1560                "status": "measured",
1561                "metric_name": "score",
1562                "metric_value": "2.7",
1563                "metric_unit": "units",
1564                "result": "measured from output",
1565                "next_action": "try the next branch",
1566            },
1567            ctx,
1568        )
1569        result = json.loads(raw)
1570
1571        assert result["success"] is True
1572        assert result["experiment"]["metric_value"] == 2.7
1573    finally:
1574        db.close()
1575
1576
1577def test_acknowledge_operator_context_tool_marks_context(tmp_path):
1578    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1579    db = AgentDB(tmp_path / "state.db")
1580    try:
1581        job_id = db.create_job("Run with operator corrections")
1582        entry = db.append_operator_message(job_id, "use the corrected target", source="chat")
1583        db.claim_operator_messages(job_id, modes=("steer",), limit=1)
1584        run_id = db.start_run(job_id, model="fake")
1585        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="acknowledge_operator_context")
1586        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1587
1588        raw = DEFAULT_REGISTRY.handle(
1589            "acknowledge_operator_context",
1590            {"message_ids": [entry["event_id"]], "summary": "correction incorporated"},
1591            ctx,
1592        )
1593        result = json.loads(raw)
1594        job = db.get_job(job_id)
1595
1596        assert result["success"] is True
1597        assert result["count"] == 1
1598        assert job["metadata"]["operator_messages"][0]["acknowledged_at"]
1599        assert job["metadata"]["last_operator_context_ack"]["summary"] == "correction incorporated"
1600    finally:
1601        db.close()
1602
1603
1604def test_acknowledge_operator_context_requires_active_context(tmp_path):
1605    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1606    db = AgentDB(tmp_path / "state.db")
1607    try:
1608        job_id = db.create_job("Run without operator corrections")
1609        run_id = db.start_run(job_id, model="fake")
1610        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="acknowledge_operator_context")
1611        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1612
1613        raw = DEFAULT_REGISTRY.handle(
1614            "acknowledge_operator_context",
1615            {"summary": "ordinary progress note"},
1616            ctx,
1617        )
1618        result = json.loads(raw)
1619
1620        assert result["success"] is False
1621        assert result["recoverable"] is True
1622        assert result["error"] == "no active operator context to acknowledge"
1623        assert "report_update" in result["guidance"]
1624        assert "last_operator_context_ack" not in db.get_job(job_id)["metadata"]
1625    finally:
1626        db.close()
1627
1628
1629def test_record_tasks_accepts_generic_output_contracts(tmp_path):
1630    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1631    db = AgentDB(tmp_path / "state.db")
1632    try:
1633        job_id = db.create_job("Improve measurable process")
1634        run_id = db.start_run(job_id, model="fake")
1635        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1636        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1637
1638        raw = DEFAULT_REGISTRY.handle(
1639            "record_tasks",
1640            {
1641                "tasks": [{
1642                    "title": "Run one comparison",
1643                    "status": "open",
1644                    "output_contract": "experiment",
1645                    "acceptance_criteria": "metric recorded",
1646                    "evidence_needed": "command output or artifact",
1647                    "stall_behavior": "record blocker and pivot",
1648                }]
1649            },
1650            ctx,
1651        )
1652        result = json.loads(raw)
1653        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1654
1655        assert result["success"] is True
1656        assert task["output_contract"] == "experiment"
1657        assert task["acceptance_criteria"] == "metric recorded"
1658        assert task["evidence_needed"] == "command output or artifact"
1659        assert task["stall_behavior"] == "record blocker and pivot"
1660    finally:
1661        db.close()
1662
1663
1664def test_record_tasks_promotes_output_contract_from_metadata(tmp_path):
1665    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1666    db = AgentDB(tmp_path / "state.db")
1667    try:
1668        job_id = db.create_job("Improve measurable process")
1669        run_id = db.start_run(job_id, model="fake")
1670        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1671        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1672
1673        raw = DEFAULT_REGISTRY.handle(
1674            "record_tasks",
1675            {
1676                "tasks": [{
1677                    "title": "Validate concrete candidate",
1678                    "status": "open",
1679                    "metadata": {"output_contract": "action", "source": "planner"},
1680                    "acceptance_criteria": "candidate is tested",
1681                }]
1682            },
1683            ctx,
1684        )
1685        result = json.loads(raw)
1686        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1687
1688        assert result["success"] is True
1689        assert task["output_contract"] == "action"
1690        assert task["metadata"]["source"] == "planner"
1691    finally:
1692        db.close()
1693
1694
1695def test_record_tasks_downgrades_done_artifact_without_delivery_evidence(tmp_path):
1696    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1697    db = AgentDB(tmp_path / "state.db")
1698    try:
1699        job_id = db.create_job("Update a deliverable")
1700        run_id = db.start_run(job_id, model="fake")
1701        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1702        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1703
1704        raw = DEFAULT_REGISTRY.handle(
1705            "record_tasks",
1706            {
1707                "tasks": [{
1708                    "title": "Update report draft",
1709                    "status": "done",
1710                    "output_contract": "artifact",
1711                    "result": "Updated the report",
1712                }]
1713            },
1714            ctx,
1715        )
1716        result = json.loads(raw)
1717        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1718
1719        assert result["success"] is True
1720        assert task["status"] == "active"
1721        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
1722        assert task["metadata"]["claimed_result"] == "Updated the report"
1723    finally:
1724        db.close()
1725
1726
1727def test_record_tasks_downgrades_done_without_result_evidence(tmp_path):
1728    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1729    db = AgentDB(tmp_path / "state.db")
1730    try:
1731        job_id = db.create_job("Validate generic work")
1732        run_id = db.start_run(job_id, model="fake")
1733        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1734        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1735
1736        raw = DEFAULT_REGISTRY.handle(
1737            "record_tasks",
1738            {
1739                "tasks": [{
1740                    "title": "Check current branch",
1741                    "status": "done",
1742                    "output_contract": "decision",
1743                }]
1744            },
1745            ctx,
1746        )
1747        result = json.loads(raw)
1748        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1749
1750        assert result["success"] is True
1751        assert task["status"] == "active"
1752        assert task["metadata"]["completion_validation"] == "missing_result_evidence"
1753    finally:
1754        db.close()
1755
1756
1757def test_record_tasks_downgrades_done_research_without_durable_evidence(tmp_path):
1758    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1759    db = AgentDB(tmp_path / "state.db")
1760    try:
1761        job_id = db.create_job("Research a topic")
1762        run_id = db.start_run(job_id, model="fake")
1763        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1764        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1765
1766        raw = DEFAULT_REGISTRY.handle(
1767            "record_tasks",
1768            {
1769                "tasks": [{
1770                    "title": "Synthesize source evidence",
1771                    "status": "done",
1772                    "output_contract": "research",
1773                    "result": "Found useful background.",
1774                }]
1775            },
1776            ctx,
1777        )
1778        result = json.loads(raw)
1779        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1780
1781        assert result["success"] is True
1782        assert task["status"] == "active"
1783        assert task["metadata"]["completion_validation"] == "missing_research_evidence"
1784        assert task["metadata"]["claimed_result"] == "Found useful background."
1785    finally:
1786        db.close()
1787
1788
1789def test_record_tasks_allows_done_research_after_source_evidence(tmp_path):
1790    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1791    db = AgentDB(tmp_path / "state.db")
1792    try:
1793        job_id = db.create_job("Research a topic")
1794        run_id = db.start_run(job_id, model="fake")
1795        source_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
1796        db.finish_step(source_step, status="completed", summary="source recorded", output_data={"success": True})
1797        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1798        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1799
1800        raw = DEFAULT_REGISTRY.handle(
1801            "record_tasks",
1802            {
1803                "tasks": [{
1804                    "title": "Synthesize source evidence",
1805                    "status": "done",
1806                    "output_contract": "research",
1807                    "result": "Source ledger records the useful branch.",
1808                }]
1809            },
1810            ctx,
1811        )
1812        result = json.loads(raw)
1813        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1814
1815        assert result["success"] is True
1816        assert task["status"] == "done"
1817        assert "completion_validation" not in task.get("metadata", {})
1818    finally:
1819        db.close()
1820
1821
1822def test_record_tasks_allows_done_research_with_metadata_evidence(tmp_path):
1823    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1824    db = AgentDB(tmp_path / "state.db")
1825    try:
1826        job_id = db.create_job("Research a topic")
1827        run_id = db.start_run(job_id, model="fake")
1828        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1829        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1830
1831        raw = DEFAULT_REGISTRY.handle(
1832            "record_tasks",
1833            {
1834                "tasks": [{
1835                    "title": "Synthesize source evidence",
1836                    "status": "done",
1837                    "output_contract": "research",
1838                    "metadata": {"source_url": "https://example.com/source"},
1839                }]
1840            },
1841            ctx,
1842        )
1843        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1844
1845        assert json.loads(raw)["success"] is True
1846        assert task["status"] == "done"
1847        assert "completion_validation" not in task.get("metadata", {})
1848    finally:
1849        db.close()
1850
1851
1852def test_record_tasks_downgrades_done_experiment_without_measurement_evidence(tmp_path):
1853    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1854    db = AgentDB(tmp_path / "state.db")
1855    try:
1856        job_id = db.create_job("Improve a measurable process")
1857        run_id = db.start_run(job_id, model="fake")
1858        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1859        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1860
1861        raw = DEFAULT_REGISTRY.handle(
1862            "record_tasks",
1863            {
1864                "tasks": [{
1865                    "title": "Run comparison",
1866                    "status": "done",
1867                    "output_contract": "experiment",
1868                    "result": "The comparison improved.",
1869                }]
1870            },
1871            ctx,
1872        )
1873        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1874
1875        assert json.loads(raw)["success"] is True
1876        assert task["status"] == "active"
1877        assert task["metadata"]["completion_validation"] == "missing_experiment_evidence"
1878    finally:
1879        db.close()
1880
1881
1882def test_record_tasks_allows_done_experiment_after_measurement_evidence(tmp_path):
1883    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1884    db = AgentDB(tmp_path / "state.db")
1885    try:
1886        job_id = db.create_job("Improve a measurable process")
1887        run_id = db.start_run(job_id, model="fake")
1888        experiment_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
1889        db.finish_step(experiment_step, status="completed", summary="experiment measured", output_data={"success": True})
1890        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1891        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1892
1893        raw = DEFAULT_REGISTRY.handle(
1894            "record_tasks",
1895            {
1896                "tasks": [{
1897                    "title": "Run comparison",
1898                    "status": "done",
1899                    "output_contract": "experiment",
1900                    "result": "Experiment ledger records the measured comparison.",
1901                }]
1902            },
1903            ctx,
1904        )
1905        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1906
1907        assert json.loads(raw)["success"] is True
1908        assert task["status"] == "done"
1909        assert "completion_validation" not in task.get("metadata", {})
1910    finally:
1911        db.close()
1912
1913
1914def test_record_tasks_downgrades_done_action_after_read_only_shell(tmp_path):
1915    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1916    db = AgentDB(tmp_path / "state.db")
1917    try:
1918        job_id = db.create_job("Change a local workspace")
1919        run_id = db.start_run(job_id, model="fake")
1920        shell_step = db.add_step(
1921            job_id=job_id,
1922            run_id=run_id,
1923            kind="tool",
1924            tool_name="shell_exec",
1925            input_data={"arguments": {"command": "ls -la"}},
1926        )
1927        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
1928        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1929        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1930
1931        raw = DEFAULT_REGISTRY.handle(
1932            "record_tasks",
1933            {
1934                "tasks": [{
1935                    "title": "Apply change",
1936                    "status": "done",
1937                    "output_contract": "action",
1938                    "result": "Inspected the workspace.",
1939                }]
1940            },
1941            ctx,
1942        )
1943        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1944
1945        assert json.loads(raw)["success"] is True
1946        assert task["status"] == "active"
1947        assert task["metadata"]["completion_validation"] == "missing_action_evidence"
1948    finally:
1949        db.close()
1950
1951
1952def test_record_tasks_allows_done_action_after_action_shell(tmp_path):
1953    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1954    db = AgentDB(tmp_path / "state.db")
1955    try:
1956        job_id = db.create_job("Change a local workspace")
1957        run_id = db.start_run(job_id, model="fake")
1958        shell_step = db.add_step(
1959            job_id=job_id,
1960            run_id=run_id,
1961            kind="tool",
1962            tool_name="shell_exec",
1963            input_data={"arguments": {"command": "python run_branch.py"}},
1964        )
1965        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
1966        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1967        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1968
1969        raw = DEFAULT_REGISTRY.handle(
1970            "record_tasks",
1971            {
1972                "tasks": [{
1973                    "title": "Apply change",
1974                    "status": "done",
1975                    "output_contract": "action",
1976                    "result": "Ran the action branch.",
1977                }]
1978            },
1979            ctx,
1980        )
1981        task = db.get_job(job_id)["metadata"]["task_queue"][0]
1982
1983        assert json.loads(raw)["success"] is True
1984        assert task["status"] == "done"
1985        assert "completion_validation" not in task.get("metadata", {})
1986    finally:
1987        db.close()
1988
1989
1990def test_record_tasks_downgrades_done_monitor_without_defer_evidence(tmp_path):
1991    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1992    db = AgentDB(tmp_path / "state.db")
1993    try:
1994        job_id = db.create_job("Monitor long-running work")
1995        run_id = db.start_run(job_id, model="fake")
1996        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1997        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1998
1999        raw = DEFAULT_REGISTRY.handle(
2000            "record_tasks",
2001            {
2002                "tasks": [{
2003                    "title": "Wait and check later",
2004                    "status": "done",
2005                    "output_contract": "monitor",
2006                    "result": "Will check later.",
2007                }]
2008            },
2009            ctx,
2010        )
2011        task = db.get_job(job_id)["metadata"]["task_queue"][0]
2012
2013        assert json.loads(raw)["success"] is True
2014        assert task["status"] == "active"
2015        assert task["metadata"]["completion_validation"] == "missing_monitor_evidence"
2016    finally:
2017        db.close()
2018
2019
2020def test_record_tasks_allows_done_monitor_after_defer_evidence(tmp_path):
2021    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2022    db = AgentDB(tmp_path / "state.db")
2023    try:
2024        job_id = db.create_job("Monitor long-running work")
2025        run_id = db.start_run(job_id, model="fake")
2026        defer_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="defer_job")
2027        db.finish_step(defer_step, status="completed", summary="deferred")
2028        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
2029        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
2030
2031        raw = DEFAULT_REGISTRY.handle(
2032            "record_tasks",
2033            {
2034                "tasks": [{
2035                    "title": "Wait and check later",
2036                    "status": "done",
2037                    "output_contract": "monitor",
2038                    "result": "A monitor/defer branch is scheduled.",
2039                }]
2040            },
2041            ctx,
2042        )
2043        task = db.get_job(job_id)["metadata"]["task_queue"][0]
2044
2045        assert json.loads(raw)["success"] is True
2046        assert task["status"] == "done"
2047        assert "completion_validation" not in task.get("metadata", {})
2048    finally:
2049        db.close()
2050
2051
2052def test_record_tasks_allows_done_artifact_after_delivery_evidence(tmp_path):
2053    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2054    db = AgentDB(tmp_path / "state.db")
2055    try:
2056        job_id = db.create_job("Update a deliverable")
2057        run_id = db.start_run(job_id, model="fake")
2058        artifact_step = db.add_step(
2059            job_id=job_id,
2060            run_id=run_id,
2061            kind="tool",
2062            tool_name="write_artifact",
2063            input_data={"arguments": {"title": "Final report draft", "summary": "Updated report deliverable"}},
2064        )
2065        db.finish_step(artifact_step, status="completed", summary="write_artifact saved art_demo")
2066        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
2067        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
2068
2069        raw = DEFAULT_REGISTRY.handle(
2070            "record_tasks",
2071            {
2072                "tasks": [{
2073                    "title": "Update report draft",
2074                    "status": "done",
2075                    "output_contract": "artifact",
2076                    "result": "Saved final report draft",
2077                }]
2078            },
2079            ctx,
2080        )
2081        result = json.loads(raw)
2082        task = db.get_job(job_id)["metadata"]["task_queue"][0]
2083
2084        assert result["success"] is True
2085        assert task["status"] == "done"
2086        assert "completion_validation" not in task.get("metadata", {})
2087    finally:
2088        db.close()
2089
2090
2091def test_record_tasks_does_not_treat_stderr_redirect_as_delivery_write(tmp_path):
2092    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2093    db = AgentDB(tmp_path / "state.db")
2094    try:
2095        job_id = db.create_job("Update a deliverable")
2096        run_id = db.start_run(job_id, model="fake")
2097        shell_step = db.add_step(
2098            job_id=job_id,
2099            run_id=run_id,
2100            kind="tool",
2101            tool_name="shell_exec",
2102            input_data={"arguments": {"command": "cat draft.md 2>/dev/null"}},
2103        )
2104        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
2105        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
2106        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
2107
2108        raw = DEFAULT_REGISTRY.handle(
2109            "record_tasks",
2110            {
2111                "tasks": [{
2112                    "title": "Update report draft",
2113                    "status": "done",
2114                    "output_contract": "artifact",
2115                    "result": "Saved final report draft",
2116                }]
2117            },
2118            ctx,
2119        )
2120        result = json.loads(raw)
2121        task = db.get_job(job_id)["metadata"]["task_queue"][0]
2122
2123        assert result["success"] is True
2124        assert task["status"] == "active"
2125        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
2126    finally:
2127        db.close()
2128
2129
2130def test_record_tasks_rejects_checkpoint_as_delivery_evidence(tmp_path):
2131    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2132    db = AgentDB(tmp_path / "state.db")
2133    try:
2134        job_id = db.create_job("Update a deliverable")
2135        run_id = db.start_run(job_id, model="fake")
2136        artifact_step = db.add_step(
2137            job_id=job_id,
2138            run_id=run_id,
2139            kind="tool",
2140            tool_name="write_artifact",
2141            input_data={"arguments": {"title": "Compiled report checkpoint", "summary": "Checkpoint before final rewrite"}},
2142        )
2143        db.finish_step(artifact_step, status="completed", summary="write_artifact saved art_demo")
2144        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
2145        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
2146
2147        raw = DEFAULT_REGISTRY.handle(
2148            "record_tasks",
2149            {
2150                "tasks": [{
2151                    "title": "Update report draft",
2152                    "status": "done",
2153                    "output_contract": "artifact",
2154                    "result": "Saved final report draft",
2155                }]
2156            },
2157            ctx,
2158        )
2159        result = json.loads(raw)
2160        task = db.get_job(job_id)["metadata"]["task_queue"][0]
2161
2162        assert result["success"] is True
2163        assert task["status"] == "active"
2164        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
2165    finally:
2166        db.close()
tests/nipux_cli/test_uninstall.py 136 lines
   1import subprocess
   2from pathlib import Path
   3
   4from nipux_cli.uninstall import build_uninstall_plan, installed_tool_paths, uninstall_installed_tool, uninstall_runtime
   5
   6
   7def _completed(*_args, **_kwargs):
   8    return subprocess.CompletedProcess(args=[], returncode=0)
   9
  10
  11def test_uninstall_plan_includes_runtime_and_legacy_state(monkeypatch, tmp_path):
  12    home = tmp_path / "user"
  13    profile = tmp_path / "profile"
  14    monkeypatch.setenv("HOME", str(home))
  15    monkeypatch.setenv("NIPUX_HOME", str(profile))
  16
  17    plan = build_uninstall_plan()
  18
  19    assert profile in plan.paths
  20    assert home / ".nipux" in plan.paths
  21    assert home / ".kneepucks" in plan.paths
  22    assert home / "Library" / "LaunchAgents" / "com.nipux.agent.plist" in plan.service_paths
  23    assert home / ".config" / "systemd" / "user" / "nipux.service" in plan.service_paths
  24
  25
  26def test_uninstall_plan_includes_configured_runtime_home(monkeypatch, tmp_path):
  27    home = tmp_path / "user"
  28    profile = tmp_path / "profile"
  29    configured = tmp_path / "configured"
  30    monkeypatch.setenv("HOME", str(home))
  31    monkeypatch.setenv("NIPUX_HOME", str(profile))
  32
  33    plan = build_uninstall_plan(runtime_home=configured)
  34
  35    assert configured in plan.paths
  36    assert profile in plan.paths
  37
  38
  39def test_uninstall_runtime_removes_state_and_service_files(monkeypatch, tmp_path):
  40    home = tmp_path / "user"
  41    profile = tmp_path / "profile"
  42    monkeypatch.setenv("HOME", str(home))
  43    monkeypatch.setenv("NIPUX_HOME", str(profile))
  44
  45    paths = [
  46        profile,
  47        home / ".nipux",
  48        home / ".kneepucks",
  49        home / "Library" / "LaunchAgents",
  50        home / ".config" / "systemd" / "user",
  51    ]
  52    for path in paths:
  53        path.mkdir(parents=True, exist_ok=True)
  54    (home / "Library" / "LaunchAgents" / "com.nipux.agent.plist").write_text("plist", encoding="utf-8")
  55    (home / ".config" / "systemd" / "user" / "nipux.service").write_text("unit", encoding="utf-8")
  56    (profile / "state.db").write_text("state", encoding="utf-8")
  57
  58    lines = uninstall_runtime(runner=_completed)
  59
  60    assert any("removed" in line and str(profile) in line for line in lines)
  61    assert not profile.exists()
  62    assert not (home / ".nipux").exists()
  63    assert not (home / ".kneepucks").exists()
  64    assert not (home / "Library" / "LaunchAgents" / "com.nipux.agent.plist").exists()
  65    assert not (home / ".config" / "systemd" / "user" / "nipux.service").exists()
  66
  67
  68def test_uninstall_runtime_dry_run_keeps_files(monkeypatch, tmp_path):
  69    home = tmp_path / "user"
  70    profile = tmp_path / "profile"
  71    monkeypatch.setenv("HOME", str(home))
  72    monkeypatch.setenv("NIPUX_HOME", str(profile))
  73    profile.mkdir(parents=True)
  74
  75    lines = uninstall_runtime(dry_run=True, runner=_completed)
  76
  77    assert any("would remove" in line and str(profile) in line for line in lines)
  78    assert profile.exists()
  79
  80
  81def test_uninstall_installed_tool_uses_uv_when_available(monkeypatch, tmp_path):
  82    home = tmp_path / "user"
  83    monkeypatch.setenv("HOME", str(home))
  84    monkeypatch.setattr("nipux_cli.uninstall.shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
  85    calls = []
  86
  87    def runner(command):
  88        calls.append(tuple(command))
  89        return subprocess.CompletedProcess(command, 0, stdout="Uninstalled 1 executable: nipux\n")
  90
  91    code, lines = uninstall_installed_tool(runner=runner)
  92
  93    assert code == 0
  94    assert calls == [("/usr/bin/uv", "tool", "uninstall", "nipux")]
  95    assert "Uninstalled 1 executable: nipux" in "\n".join(lines)
  96
  97
  98def test_uninstall_installed_tool_falls_back_to_safe_uv_paths(monkeypatch, tmp_path):
  99    home = tmp_path / "user"
 100    shim = home / ".local" / "bin" / "nipux"
 101    tool_dir = home / ".local" / "share" / "uv" / "tools" / "nipux"
 102    tool_bin = tool_dir / "bin"
 103    tool_bin.mkdir(parents=True)
 104    shim.parent.mkdir(parents=True)
 105    target = tool_bin / "nipux"
 106    target.write_text("script", encoding="utf-8")
 107    shim.symlink_to(target)
 108    monkeypatch.setenv("HOME", str(home))
 109
 110    def which(name):
 111        if name == "nipux":
 112            return str(shim)
 113        return None
 114
 115    monkeypatch.setattr("nipux_cli.uninstall.shutil.which", which)
 116
 117    code, lines = uninstall_installed_tool()
 118
 119    assert code == 0
 120    assert not shim.exists()
 121    assert not tool_dir.exists()
 122    rendered = "\n".join(lines)
 123    assert "uv not found; checking safe local tool paths" in rendered
 124    assert f"removed {shim}" in rendered
 125    assert f"removed {tool_dir}" in rendered
 126
 127
 128def test_installed_tool_paths_ignore_non_user_tool(monkeypatch, tmp_path):
 129    home = tmp_path / "user"
 130    monkeypatch.setenv("HOME", str(home))
 131    monkeypatch.setattr("nipux_cli.uninstall.shutil.which", lambda name: "/usr/local/bin/nipux" if name == "nipux" else None)
 132
 133    paths = installed_tool_paths()
 134
 135    assert Path("/usr/local/bin/nipux") not in paths
 136    assert home / ".local" / "bin" / "nipux" in paths
tests/nipux_cli/test_worker.py 11339 lines
   1import json
   2from pathlib import Path
   3
   4from nipux_cli.artifacts import ArtifactStore
   5from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig
   6from nipux_cli.db import AgentDB
   7from nipux_cli.llm import LLMResponse, LLMResponseError, ScriptedLLM, ToolCall
   8from nipux_cli.worker import (
   9    MAX_WORKER_PROMPT_CHARS,
  10    SYSTEM_PROMPT,
  11    _concrete_evidence_tokens,
  12    _cited_step_numbers,
  13    _extract_candidate_file_paths,
  14    _file_pattern_tokens_for_grounding,
  15    _rank_candidate_file_paths,
  16    _render_worker_prompt,
  17    build_messages,
  18    run_one_step,
  19)
  20
  21
  22class SnapshotRegistry:
  23    def openai_tools(self):
  24        return []
  25
  26    def handle(self, name, args, ctx):
  27        del name, args, ctx
  28        return json.dumps({"success": True, "data": {"snapshot": "short snapshot"}})
  29
  30
  31class SuccessRegistry:
  32    def openai_tools(self):
  33        return []
  34
  35    def handle(self, name, args, ctx):
  36        del ctx
  37        return json.dumps({"success": True, "tool": name, "args": args, "results": []})
  38
  39
  40class MeasuredShellRegistry:
  41    def openai_tools(self):
  42        return []
  43
  44    def handle(self, name, args, ctx):
  45        del args, ctx
  46        if name == "shell_exec":
  47            return json.dumps({"success": True, "command": "run test", "returncode": 0, "stdout": "score 2.7 units/s", "stderr": ""})
  48        return json.dumps({"success": True, "results": []})
  49
  50
  51class DiagnosticShellRegistry:
  52    def openai_tools(self):
  53        return []
  54
  55    def handle(self, name, args, ctx):
  56        del args, ctx
  57        if name == "shell_exec":
  58            return json.dumps({
  59                "success": True,
  60                "command": "df -h && nproc && free -h",
  61                "returncode": 0,
  62                "stdout": "Filesystem Size Used Avail Use% Mounted on\\n/dev/root 233G 198G 23G 90% /\\nCPU COUNT 24\\nRAM 93Gi",
  63                "stderr": "",
  64            })
  65        return json.dumps({"success": True})
  66
  67
  68class TableBenchmarkShellRegistry:
  69    def openai_tools(self):
  70        return []
  71
  72    def handle(self, name, args, ctx):
  73        del args, ctx
  74        if name == "shell_exec":
  75            return json.dumps({
  76                "success": True,
  77                "command": "run benchmark",
  78                "returncode": 0,
  79                "stdout": (
  80                    "| model | test | t/s |\n"
  81                    "| --- | ---: | ---: |\n"
  82                    "| example | pp32 | 5.48 ± 0.11 |\n"
  83                    "| example | tg128 | 3.44 ± 0.05 |\n"
  84                ),
  85                "stderr": "",
  86            })
  87        return json.dumps({"success": True})
  88
  89
  90class FailedTableBenchmarkShellRegistry:
  91    def openai_tools(self):
  92        return []
  93
  94    def handle(self, name, args, ctx):
  95        del args, ctx
  96        if name == "shell_exec":
  97            return json.dumps({
  98                "success": False,
  99                "error": "command timed out",
 100                "command": "run benchmark",
 101                "returncode": 124,
 102                "stdout": (
 103                    "| model | test | t/s |\n"
 104                    "| --- | ---: | ---: |\n"
 105                    "| example | pp32 | 5.48 ± 0.11 |\n"
 106                ),
 107                "stderr": "",
 108            })
 109        return json.dumps({"success": True})
 110
 111
 112class FailedUrlShellRegistry:
 113    def openai_tools(self):
 114        return []
 115
 116    def handle(self, name, args, ctx):
 117        del ctx
 118        if name == "shell_exec":
 119            return json.dumps({
 120                "success": False,
 121                "command": args.get("command"),
 122                "returncode": 0,
 123                "stdout": "401 Unauthorized",
 124                "stderr": "",
 125                "error": (
 126                    "command output indicates authentication or authorization failure "
 127                    "despite exit status 0: 401 Unauthorized"
 128                ),
 129            })
 130        return json.dumps({"success": True})
 131
 132
 133class HangingLLM:
 134    def next_action(self, *, messages, tools):
 135        del messages, tools
 136        import time
 137
 138        time.sleep(5)
 139        return LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "late"})])
 140
 141
 142class SlowLLM:
 143    def __init__(self, sleep_seconds: float):
 144        self.sleep_seconds = sleep_seconds
 145
 146    def next_action(self, *, messages, tools):
 147        del messages, tools
 148        import time
 149
 150        time.sleep(self.sleep_seconds)
 151        return LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "slow but recovered"})])
 152
 153
 154class RepairableLLM:
 155    tool_repair = True
 156
 157    def __init__(self, responses):
 158        self.responses = list(responses)
 159        self.messages = []
 160        self.tools = []
 161
 162    def next_action(self, *, messages, tools):
 163        self.messages.append(messages)
 164        self.tools.append(tools)
 165        if not self.responses:
 166            return LLMResponse(content="No response left.")
 167        return self.responses.pop(0)
 168
 169
 170class SourceCodeShellRegistry:
 171    def openai_tools(self):
 172        return []
 173
 174    def handle(self, name, args, ctx):
 175        del args, ctx
 176        if name == "shell_exec":
 177            return json.dumps({
 178                "success": True,
 179                "command": "git show HEAD:nipux_cli/cli.py",
 180                "returncode": 0,
 181                "stdout": 'for index, task in enumerate(plan["tasks"], start=1):\n    rate(plan["tasks"], start=1)\n',
 182                "stderr": "",
 183            })
 184        return json.dumps({"success": True})
 185
 186
 187class LargeShellEvidenceRegistry:
 188    def openai_tools(self):
 189        return []
 190
 191    def handle(self, name, args, ctx):
 192        del args, ctx
 193        if name == "shell_exec":
 194            return json.dumps({
 195                "success": True,
 196                "command": "find . -type f",
 197                "returncode": 0,
 198                "stdout": "\n".join(f"./file_{index}.py" for index in range(200)),
 199                "stderr": "",
 200            })
 201        return json.dumps({"success": True})
 202
 203
 204class ExtractRegistry:
 205    def openai_tools(self):
 206        return []
 207
 208    def handle(self, name, args, ctx):
 209        del args, ctx
 210        if name == "web_extract":
 211            return json.dumps({
 212                "success": True,
 213                "pages": [
 214                    {"url": "https://source.example/a", "text": "useful source text " * 250},
 215                    {"url": "https://source.example/b", "error": "timeout"},
 216                ],
 217            })
 218        return json.dumps({"success": True})
 219
 220
 221class SearchRegistry:
 222    def openai_tools(self):
 223        return []
 224
 225    def handle(self, name, args, ctx):
 226        del args, ctx
 227        if name == "web_search":
 228            return json.dumps({
 229                "success": True,
 230                "query": "durable progress research",
 231                "results": [
 232                    {"title": "Primary reference", "url": "https://source.example/primary"},
 233                    {"title": "Secondary reference", "url": "https://source.example/secondary"},
 234                ],
 235            })
 236        return json.dumps({"success": True})
 237
 238
 239class BrowserAndWebRegistry:
 240    def openai_tools(self, config=None):
 241        del config
 242        return [
 243            {"type": "function", "function": {"name": "browser_navigate", "parameters": {"type": "object"}}},
 244            {"type": "function", "function": {"name": "web_search", "parameters": {"type": "object"}}},
 245        ]
 246
 247    def handle(self, name, args, ctx):
 248        del args, ctx
 249        return json.dumps({"success": True, "tool": name})
 250
 251
 252class CapturingLLM:
 253    def __init__(self, response):
 254        self.response = response
 255        self.messages = None
 256        self.tools = None
 257
 258    def next_action(self, *, messages, tools):
 259        self.messages = messages
 260        self.tools = tools
 261        return self.response
 262
 263
 264class ExplodingLLM:
 265    def next_action(self, *, messages, tools):
 266        del messages, tools
 267        raise AssertionError("LLM should not be called")
 268
 269
 270class AntiBotBrowserRegistry:
 271    def openai_tools(self):
 272        return []
 273
 274    def handle(self, name, args, ctx):
 275        del args, ctx
 276        if name == "browser_snapshot":
 277            return json.dumps({
 278                "success": True,
 279                "data": {
 280                    "origin": "https://source.example/search",
 281                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
 282                },
 283            })
 284        return json.dumps({"success": True})
 285
 286
 287def test_system_prompt_is_contract_first_not_research_first():
 288    assert "Use a contract-first durable cycle" in SYSTEM_PROMPT
 289    assert "Research is only one possible contract" in SYSTEM_PROMPT
 290    assert "Prefer fresh measured or directly observed evidence over stale summaries" in SYSTEM_PROMPT
 291    assert "available local candidate fall" in SYSTEM_PROMPT
 292    assert "Use this durable cycle: discover one source" not in SYSTEM_PROMPT
 293
 294
 295def test_run_one_step_executes_scripted_tool_call(tmp_path):
 296    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 297    db = AgentDB(tmp_path / "state.db")
 298    try:
 299        job_id = db.create_job("Find 10 durable research findings", title="research", kind="generic")
 300        llm = ScriptedLLM([
 301            LLMResponse(tool_calls=[
 302                ToolCall(
 303                    name="write_artifact",
 304                    arguments={
 305                        "title": "first finding",
 306                        "summary": "smoke finding",
 307                        "content": "Acme Design, https://example.com",
 308                    },
 309                )
 310            ])
 311        ])
 312
 313        result = run_one_step(job_id, config=config, db=db, llm=llm)
 314
 315        assert result.status == "completed"
 316        assert result.tool_name == "write_artifact"
 317        artifacts = db.list_artifacts(job_id)
 318        assert artifacts[0]["title"] == "first finding"
 319        steps = db.list_steps(job_id=job_id)
 320        assert steps[0]["tool_name"] == "write_artifact"
 321        assert steps[0]["status"] == "completed"
 322        memory = db.list_memory(job_id)
 323        assert memory[0]["key"] == "rolling_state"
 324        assert artifacts[0]["id"] in memory[0]["artifact_refs"]
 325    finally:
 326        db.close()
 327
 328
 329def test_run_one_step_records_estimated_usage_for_scripted_model(tmp_path):
 330    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 331    db = AgentDB(tmp_path / "state.db")
 332    try:
 333        job_id = db.create_job("Summarize progress", title="usage", kind="generic")
 334
 335        run_one_step(
 336            job_id,
 337            config=config,
 338            db=db,
 339            llm=ScriptedLLM([LLMResponse(content="No tool this turn.")]),
 340        )
 341
 342        usage = db.job_token_usage(job_id)
 343        assert usage["calls"] == 1
 344        assert usage["prompt_tokens"] > 0
 345        assert usage["completion_tokens"] > 0
 346        assert usage["estimated_calls"] == 1
 347        event = next(
 348            event
 349            for event in db.list_events(job_id=job_id, event_types=["loop"])
 350            if event.get("title") == "message_end"
 351        )
 352        event_usage = event["metadata"]["usage"]
 353        assert event_usage["prompt_chars"] > 0
 354        assert event_usage["context_length"] == config.model.context_length
 355        assert event_usage["context_fraction"] > 0
 356    finally:
 357        db.close()
 358
 359
 360def test_run_one_step_blocks_content_only_worker_turn(tmp_path):
 361    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 362    db = AgentDB(tmp_path / "state.db")
 363    try:
 364        job_id = db.create_job("Keep taking bounded tool actions", title="no tool", kind="generic")
 365
 366        result = run_one_step(
 367            job_id,
 368            config=config,
 369            db=db,
 370            llm=ScriptedLLM([LLMResponse(content="What should I do next?")]),
 371        )
 372
 373        assert result.status == "blocked"
 374        assert result.result["error"] == "worker tool call required"
 375        assert "What should I do next?" in result.result["content"]
 376        step = db.list_steps(job_id=job_id)[0]
 377        assert step["kind"] == "assistant"
 378        assert step["status"] == "blocked"
 379        assert step["error"] == "worker tool call required"
 380        prompt = build_messages(
 381            db.get_job(job_id),
 382            db.list_steps(job_id=job_id),
 383            timeline_events=db.list_timeline_events(job_id, limit=30),
 384        )[-1]["content"]
 385        assert "What should I do next?" not in prompt
 386        assert "worker tool call required" in prompt
 387        job = db.get_job(job_id)
 388        assert job["metadata"]["last_agent_update"]["category"] == "blocked"
 389    finally:
 390        db.close()
 391
 392
 393def test_run_one_step_repairs_content_only_worker_turn_with_tool_retry(tmp_path):
 394    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 395    db = AgentDB(tmp_path / "state.db")
 396    try:
 397        job_id = db.create_job("Keep taking bounded tool actions", title="tool repair", kind="generic")
 398        llm = RepairableLLM([
 399            LLMResponse(content="I should inspect the state next."),
 400            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "Continuing with a bounded action."})]),
 401        ])
 402
 403        result = run_one_step(job_id, config=config, db=db, llm=llm)
 404
 405        assert result.status == "completed"
 406        assert result.tool_name == "report_update"
 407        assert len(llm.messages) == 2
 408        assert "did not call a tool" in llm.messages[1][-1]["content"]
 409        steps = db.list_steps(job_id=job_id)
 410        assert len(steps) == 1
 411        assert steps[0]["tool_name"] == "report_update"
 412        usage = db.job_token_usage(job_id)
 413        assert usage["calls"] == 2
 414    finally:
 415        db.close()
 416
 417
 418def test_run_one_step_recovers_repeated_content_only_worker_turns(tmp_path):
 419    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 420    db = AgentDB(tmp_path / "state.db")
 421    try:
 422        job_id = db.create_job("Keep taking bounded tool actions", title="no tool", kind="generic")
 423        llm = ScriptedLLM([
 424            LLMResponse(content="What should I do next?"),
 425            LLMResponse(content="I can continue if you want."),
 426            LLMResponse(content="Please confirm the next step."),
 427        ])
 428
 429        run_one_step(job_id, config=config, db=db, llm=llm)
 430        run_one_step(job_id, config=config, db=db, llm=llm)
 431        run_one_step(job_id, config=config, db=db, llm=llm)
 432        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
 433
 434        assert result.status == "completed"
 435        assert result.tool_name == "guard_recovery"
 436        assert result.result["guard_recovery"]["error"] == "worker tool call required"
 437        job = db.get_job(job_id)
 438        assert any(task["title"] == "Resolve guard: worker tool call required" for task in job["metadata"]["task_queue"])
 439    finally:
 440        db.close()
 441
 442
 443def test_run_one_step_records_context_pressure_without_spam(tmp_path):
 444    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=10_000))
 445    db = AgentDB(tmp_path / "state.db")
 446    try:
 447        job_id = db.create_job("Keep a long-running task stable", title="context pressure", kind="generic")
 448        llm = ScriptedLLM([
 449            LLMResponse(content="first", usage={"prompt_tokens": 7_000, "completion_tokens": 10, "total_tokens": 7_010}),
 450            LLMResponse(content="second", usage={"prompt_tokens": 7_200, "completion_tokens": 10, "total_tokens": 7_210}),
 451            LLMResponse(content="third", usage={"prompt_tokens": 8_600, "completion_tokens": 10, "total_tokens": 8_610}),
 452        ])
 453
 454        run_one_step(job_id, config=config, db=db, llm=llm)
 455        run_one_step(job_id, config=config, db=db, llm=llm)
 456        run_one_step(job_id, config=config, db=db, llm=llm)
 457
 458        pressure_events = [
 459            event
 460            for event in db.list_events(job_id=job_id, event_types=["agent_message"])
 461            if event["metadata"].get("kind") == "context_pressure"
 462        ]
 463        assert len(pressure_events) == 2
 464        assert "Context pressure watch" in pressure_events[0]["body"]
 465        assert "Context pressure high" in pressure_events[1]["body"]
 466        job = db.get_job(job_id)
 467        pressure = job["metadata"]["context_pressure"]
 468        assert pressure["band"] == "high"
 469        assert pressure["prompt_tokens"] == 8_600
 470    finally:
 471        db.close()
 472
 473
 474def test_run_one_step_executes_tool_call_batch_in_order(tmp_path):
 475    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 476    db = AgentDB(tmp_path / "state.db")
 477    try:
 478        job_id = db.create_job("Build a durable report", title="batch", kind="generic")
 479        llm = ScriptedLLM([
 480            LLMResponse(tool_calls=[
 481                ToolCall(
 482                    name="write_artifact",
 483                    arguments={
 484                        "title": "evidence checkpoint",
 485                        "summary": "first useful output",
 486                        "content": "The worker saved evidence before updating the task queue.",
 487                    },
 488                ),
 489                ToolCall(
 490                    name="record_tasks",
 491                    arguments={
 492                        "tasks": [
 493                            {
 494                                "title": "Review saved output",
 495                                "status": "open",
 496                                "priority": 5,
 497                                "output_contract": "report",
 498                                "acceptance_criteria": "Saved evidence has been inspected and summarized.",
 499                                "evidence_needed": "Artifact reference and concrete next action.",
 500                                "stall_behavior": "Record a lesson and pivot if the artifact is not useful.",
 501                            }
 502                        ]
 503                    },
 504                ),
 505            ])
 506        ])
 507
 508        result = run_one_step(job_id, config=config, db=db, llm=llm)
 509
 510        assert result.status == "completed"
 511        assert result.tool_name == "record_tasks"
 512        steps = db.list_steps(job_id=job_id)
 513        assert [step["tool_name"] for step in steps] == ["write_artifact", "record_tasks"]
 514        assert [step["status"] for step in steps] == ["completed", "completed"]
 515        artifacts = db.list_artifacts(job_id)
 516        assert artifacts[0]["title"] == "evidence checkpoint"
 517        job = db.get_job(job_id)
 518        tasks = job["metadata"]["task_queue"]
 519        assert any(task["title"] == "Review saved output" and task["output_contract"] == "report" for task in tasks)
 520        run = db.list_runs(job_id, limit=1)[0]
 521        assert run["status"] == "completed"
 522    finally:
 523        db.close()
 524
 525
 526def test_write_artifact_reconciles_matching_report_task(tmp_path):
 527    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 528    db = AgentDB(tmp_path / "state.db")
 529    try:
 530        job_id = db.create_job(
 531            "Write a durable report",
 532            title="report",
 533            kind="generic",
 534            metadata={
 535                "task_queue": [
 536                    {
 537                        "title": "Draft paper - Methods section",
 538                        "status": "open",
 539                        "priority": 5,
 540                        "output_contract": "report",
 541                        "acceptance_criteria": "Methods section is saved as an output.",
 542                    }
 543                ]
 544            },
 545        )
 546        llm = ScriptedLLM([
 547            LLMResponse(tool_calls=[
 548                ToolCall(
 549                    name="write_artifact",
 550                    arguments={
 551                        "title": "Paper Draft - Section 3: Methods",
 552                        "summary": "Methods section for the report",
 553                        "content": "This methods section explains the approach and evidence.",
 554                    },
 555                )
 556            ])
 557        ])
 558
 559        result = run_one_step(job_id, config=config, db=db, llm=llm)
 560
 561        assert result.status == "completed"
 562        job = db.get_job(job_id)
 563        task = job["metadata"]["task_queue"][0]
 564        assert task["status"] == "done"
 565        assert task["metadata"]["auto_reconciled_from_artifact"]
 566        assert "Saved output" in task["result"]
 567        revision_tasks = [
 568            item
 569            for item in job["metadata"]["task_queue"]
 570            if item["status"] == "open" and item.get("metadata", {}).get("source") == "auto_revision_loop"
 571        ]
 572        assert len(revision_tasks) == 1
 573        assert revision_tasks[0]["output_contract"] == "report"
 574        assert revision_tasks[0]["metadata"]["revision_source_artifact_id"]
 575    finally:
 576        db.close()
 577
 578
 579def test_evidence_artifact_does_not_complete_deliverable_task(tmp_path):
 580    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 581    db = AgentDB(tmp_path / "state.db")
 582    try:
 583        job_id = db.create_job(
 584            "Improve a durable report",
 585            title="report",
 586            kind="generic",
 587            metadata={
 588                "task_queue": [
 589                    {
 590                        "title": "Update report with new citations",
 591                        "status": "open",
 592                        "priority": 5,
 593                        "output_contract": "artifact",
 594                        "acceptance_criteria": "Report text is updated with citations.",
 595                        "evidence_needed": "Updated report draft, not just source notes.",
 596                    }
 597                ]
 598            },
 599        )
 600        llm = ScriptedLLM([
 601            LLMResponse(tool_calls=[
 602                ToolCall(
 603                    name="write_artifact",
 604                    arguments={
 605                        "title": "Evidence: citation sources",
 606                        "summary": "Extracted source notes for citations",
 607                        "content": "These notes describe sources that could later be used in the report.",
 608                    },
 609                )
 610            ])
 611        ])
 612
 613        result = run_one_step(job_id, config=config, db=db, llm=llm)
 614
 615        assert result.status == "completed"
 616        job = db.get_job(job_id)
 617        task = job["metadata"]["task_queue"][0]
 618        assert task["status"] == "open"
 619        assert "auto_reconciled_from_artifact" not in task.get("metadata", {})
 620        assert not [
 621            item
 622            for item in job["metadata"]["task_queue"]
 623            if item.get("metadata", {}).get("source") == "auto_revision_loop"
 624        ]
 625    finally:
 626        db.close()
 627
 628
 629def test_new_deliverable_supersedes_old_auto_revision_task(tmp_path):
 630    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 631    db = AgentDB(tmp_path / "state.db")
 632    try:
 633        job_id = db.create_job(
 634            "Keep improving a durable report",
 635            title="report",
 636            kind="generic",
 637            metadata={
 638                "task_queue": [
 639                    {
 640                        "title": "Review and revise saved output art_old",
 641                        "status": "open",
 642                        "priority": 4,
 643                        "output_contract": "report",
 644                        "metadata": {
 645                            "source": "auto_revision_loop",
 646                            "revision_source_artifact_id": "art_old",
 647                        },
 648                    }
 649                ]
 650            },
 651        )
 652        llm = ScriptedLLM([
 653            LLMResponse(tool_calls=[
 654                ToolCall(
 655                    name="write_artifact",
 656                    arguments={
 657                        "title": "Report Draft Revision",
 658                        "summary": "Updated durable report draft",
 659                        "content": "This revised report draft supersedes the previous saved output.",
 660                    },
 661                )
 662            ])
 663        ])
 664
 665        result = run_one_step(job_id, config=config, db=db, llm=llm)
 666
 667        assert result.status == "completed"
 668        tasks = db.get_job(job_id)["metadata"]["task_queue"]
 669        old = next(task for task in tasks if task["metadata"].get("revision_source_artifact_id") == "art_old")
 670        new = next(task for task in tasks if task["metadata"].get("revision_source_artifact_id") != "art_old")
 671        assert old["status"] == "skipped"
 672        assert old["metadata"]["superseded_by_artifact_id"]
 673        assert new["status"] == "open"
 674        assert new["metadata"]["source"] == "auto_revision_loop"
 675    finally:
 676        db.close()
 677
 678
 679def test_audit_report_draft_counts_as_deliverable_output(tmp_path):
 680    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 681    db = AgentDB(tmp_path / "state.db")
 682    try:
 683        job_id = db.create_job(
 684            "Write a durable audit report",
 685            title="audit report",
 686            kind="generic",
 687            metadata={
 688                "task_queue": [
 689                    {
 690                        "title": "Write audit report draft",
 691                        "status": "open",
 692                        "priority": 5,
 693                        "output_contract": "artifact",
 694                        "acceptance_criteria": "A report draft is saved.",
 695                        "evidence_needed": "Saved report draft, not only notes.",
 696                    }
 697                ]
 698            },
 699        )
 700        llm = ScriptedLLM([
 701            LLMResponse(tool_calls=[
 702                ToolCall(
 703                    name="write_artifact",
 704                    arguments={
 705                        "title": "Market Readiness Audit Report Draft",
 706                        "summary": "Saved audit report draft with current findings and recommendations",
 707                        "content": "This is the current audit report draft.",
 708                    },
 709                )
 710            ])
 711        ])
 712
 713        result = run_one_step(job_id, config=config, db=db, llm=llm)
 714
 715        assert result.status == "completed"
 716        job = db.get_job(job_id)
 717        task = job["metadata"]["task_queue"][0]
 718        assert task["status"] == "done"
 719        assert task["metadata"]["auto_reconciled_from_artifact"]
 720    finally:
 721        db.close()
 722
 723
 724def test_checkpoint_artifact_does_not_complete_deliverable_task(tmp_path):
 725    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 726    db = AgentDB(tmp_path / "state.db")
 727    try:
 728        job_id = db.create_job(
 729            "Compile a durable report",
 730            title="report",
 731            kind="generic",
 732            metadata={
 733                "task_queue": [
 734                    {
 735                        "title": "Compile full report",
 736                        "status": "open",
 737                        "priority": 5,
 738                        "output_contract": "artifact",
 739                        "acceptance_criteria": "Final compiled report is saved.",
 740                    }
 741                ]
 742            },
 743        )
 744        llm = ScriptedLLM([
 745            LLMResponse(tool_calls=[
 746                ToolCall(
 747                    name="write_artifact",
 748                    arguments={
 749                        "title": "Compiled report checkpoint",
 750                        "summary": "Current state checkpoint, not a final compiled report",
 751                        "content": "This checkpoint describes what still needs to be written.",
 752                    },
 753                )
 754            ])
 755        ])
 756
 757        result = run_one_step(job_id, config=config, db=db, llm=llm)
 758
 759        assert result.status == "completed"
 760        job = db.get_job(job_id)
 761        task = job["metadata"]["task_queue"][0]
 762        assert task["status"] == "open"
 763        assert "auto_reconciled_from_artifact" not in task.get("metadata", {})
 764    finally:
 765        db.close()
 766
 767
 768def test_evidence_artifact_can_complete_research_task(tmp_path):
 769    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 770    db = AgentDB(tmp_path / "state.db")
 771    try:
 772        job_id = db.create_job(
 773            "Gather source evidence",
 774            title="research",
 775            kind="generic",
 776            metadata={
 777                "task_queue": [
 778                    {
 779                        "title": "Collect citation source evidence",
 780                        "status": "open",
 781                        "priority": 5,
 782                        "output_contract": "research",
 783                        "acceptance_criteria": "Evidence sources are saved.",
 784                    }
 785                ]
 786            },
 787        )
 788        llm = ScriptedLLM([
 789            LLMResponse(tool_calls=[
 790                ToolCall(
 791                    name="write_artifact",
 792                    arguments={
 793                        "title": "Evidence: citation sources",
 794                        "summary": "Extracted source evidence",
 795                        "content": "Citation source evidence for later report writing.",
 796                    },
 797                )
 798            ])
 799        ])
 800
 801        result = run_one_step(job_id, config=config, db=db, llm=llm)
 802
 803        assert result.status == "completed"
 804        job = db.get_job(job_id)
 805        task = job["metadata"]["task_queue"][0]
 806        assert task["status"] == "done"
 807        assert task["metadata"]["auto_reconciled_from_artifact"]
 808    finally:
 809        db.close()
 810
 811
 812def test_run_one_step_blocks_artifact_churn_until_progress_accounting(tmp_path):
 813    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 814    db = AgentDB(tmp_path / "state.db")
 815    try:
 816        job_id = db.create_job("Keep a durable progress ledger", title="ledger", kind="generic")
 817        for index in range(3):
 818            run_id = db.start_run(job_id, model="test")
 819            step_id = db.add_step(
 820                job_id=job_id,
 821                run_id=run_id,
 822                kind="tool",
 823                tool_name="write_artifact",
 824                input_data={"arguments": {"title": f"Output {index}", "content": "notes"}},
 825            )
 826            db.finish_step(
 827                step_id,
 828                status="completed",
 829                summary=f"write_artifact saved art_{index}",
 830                output_data={"success": True, "artifact_id": f"art_{index}"},
 831            )
 832            db.finish_run(run_id, "completed")
 833
 834        blocked = run_one_step(
 835            job_id,
 836            config=config,
 837            db=db,
 838            llm=ScriptedLLM([
 839                LLMResponse(tool_calls=[
 840                    ToolCall(name="write_artifact", arguments={"title": "Another output", "content": "more notes"})
 841                ])
 842            ]),
 843        )
 844
 845        assert blocked.status == "blocked"
 846        assert blocked.result["error"] == "progress accounting required"
 847        allowed = run_one_step(
 848            job_id,
 849            config=config,
 850            db=db,
 851            llm=ScriptedLLM([
 852                LLMResponse(tool_calls=[
 853                    ToolCall(
 854                        name="record_tasks",
 855                        arguments={"tasks": [{"title": "Review saved outputs", "status": "open", "priority": 2}]},
 856                    )
 857                ])
 858            ]),
 859        )
 860        assert allowed.status == "completed"
 861        assert allowed.tool_name == "record_tasks"
 862    finally:
 863        db.close()
 864
 865
 866def test_activity_checkpoint_streak_blocks_more_churn_until_ledger_update(tmp_path):
 867    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 868    db = AgentDB(tmp_path / "state.db")
 869    try:
 870        job_id = db.create_job("Keep working until durable progress appears", title="stagnation", kind="generic")
 871        db.update_job_metadata(
 872            job_id,
 873            {
 874                "activity_checkpoint_streak": 3,
 875                "last_checkpoint_counts": {
 876                    "findings": 0,
 877                    "sources": 0,
 878                    "tasks": 1,
 879                    "experiments": 0,
 880                    "lessons": 0,
 881                    "milestones": 0,
 882                },
 883            },
 884        )
 885
 886        blocked = run_one_step(
 887            job_id,
 888            config=config,
 889            db=db,
 890            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
 891        )
 892
 893        assert blocked.status == "blocked"
 894        assert blocked.result["error"] == "durable progress required"
 895
 896        allowed = run_one_step(
 897            job_id,
 898            config=config,
 899            db=db,
 900            llm=ScriptedLLM([
 901                LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Pivot branch", "status": "open"}]})])
 902            ]),
 903        )
 904
 905        assert allowed.status == "completed"
 906        assert allowed.tool_name == "record_tasks"
 907    finally:
 908        db.close()
 909
 910
 911def test_task_only_checkpoint_streak_blocks_new_task_sprawl(tmp_path):
 912    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 913    db = AgentDB(tmp_path / "state.db")
 914    try:
 915        job_id = db.create_job("Keep executing durable work", title="task-sprawl", kind="generic")
 916        db.update_job_metadata(
 917            job_id,
 918            {
 919                "task_planning_checkpoint_streak": 2,
 920                "task_queue": [
 921                    {
 922                        "key": "existing-branch",
 923                        "title": "Existing branch",
 924                        "status": "open",
 925                    }
 926                ],
 927            },
 928        )
 929
 930        blocked = run_one_step(
 931            job_id,
 932            config=config,
 933            db=db,
 934            llm=ScriptedLLM([
 935                LLMResponse(tool_calls=[
 936                    ToolCall(
 937                        name="record_tasks",
 938                        arguments={"tasks": [{"title": "Another open branch", "status": "open"}]},
 939                    )
 940                ])
 941            ]),
 942        )
 943
 944        assert blocked.status == "blocked"
 945        assert blocked.result["error"] == "task execution required"
 946
 947        allowed = run_one_step(
 948            job_id,
 949            config=config,
 950            db=db,
 951            llm=ScriptedLLM([
 952                LLMResponse(tool_calls=[
 953                    ToolCall(
 954                        name="record_tasks",
 955                        arguments={
 956                            "tasks": [
 957                                {
 958                                    "title": "Existing branch",
 959                                    "status": "done",
 960                                    "result": "Executed and checkpointed.",
 961                                }
 962                            ]
 963                        },
 964                    )
 965                ])
 966            ]),
 967        )
 968
 969        assert allowed.status == "completed"
 970        assert allowed.tool_name == "record_tasks"
 971    finally:
 972        db.close()
 973
 974
 975def test_task_only_checkpoint_updates_planning_streak(tmp_path):
 976    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
 977    db = AgentDB(tmp_path / "state.db")
 978    try:
 979        job_id = db.create_job("Track planning-only progress", title="task-streak", kind="generic")
 980        for index in range(9):
 981            run_id = db.start_run(job_id, model="test")
 982            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
 983            db.finish_step(step_id, status="completed", summary=f"search {index}", output_data={"success": True})
 984            db.finish_run(run_id, "completed")
 985
 986        result = run_one_step(
 987            job_id,
 988            config=config,
 989            db=db,
 990            llm=ScriptedLLM([
 991                LLMResponse(tool_calls=[
 992                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "First branch", "status": "open"}]})
 993                ])
 994            ]),
 995        )
 996
 997        assert result.status == "completed"
 998        job = db.get_job(job_id)
 999        assert job["metadata"]["task_planning_checkpoint_streak"] == 1
1000
1001        db.append_finding_record(job_id, name="Durable finding")
1002        for index in range(9):
1003            run_id = db.start_run(job_id, model="test")
1004            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
1005            db.finish_step(step_id, status="completed", summary=f"search reset {index}", output_data={"success": True})
1006            db.finish_run(run_id, "completed")
1007        result = run_one_step(
1008            job_id,
1009            config=config,
1010            db=db,
1011            llm=ScriptedLLM([
1012                LLMResponse(tool_calls=[
1013                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Second branch", "status": "open"}]})
1014                ])
1015            ]),
1016        )
1017
1018        assert result.status == "completed"
1019        job = db.get_job(job_id)
1020        assert job["metadata"]["task_planning_checkpoint_streak"] == 0
1021    finally:
1022        db.close()
1023
1024
1025def test_task_resolution_checkpoint_resets_planning_streak(tmp_path):
1026    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1027    db = AgentDB(tmp_path / "state.db")
1028    try:
1029        job_id = db.create_job("Resolve existing durable branches", title="task-resolution", kind="generic")
1030        db.append_task_record(job_id, title="Existing branch", status="open", priority=5)
1031        db.update_job_metadata(
1032            job_id,
1033            {
1034                "last_checkpoint_counts": {
1035                    "findings": 0,
1036                    "sources": 0,
1037                    "tasks": 1,
1038                    "experiments": 0,
1039                    "lessons": 0,
1040                    "milestones": 0,
1041                },
1042                "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
1043                "task_planning_checkpoint_streak": 2,
1044            },
1045        )
1046        for index in range(9):
1047            run_id = db.start_run(job_id, model="test")
1048            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
1049            db.finish_step(step_id, status="completed", summary=f"search {index}", output_data={"success": True})
1050            db.finish_run(run_id, "completed")
1051
1052        result = run_one_step(
1053            job_id,
1054            config=config,
1055            db=db,
1056            llm=ScriptedLLM([
1057                LLMResponse(tool_calls=[
1058                    ToolCall(
1059                        name="record_tasks",
1060                        arguments={
1061                            "tasks": [
1062                                {
1063                                    "title": "Existing branch",
1064                                    "status": "done",
1065                                    "result": "Resolved using the latest evidence.",
1066                                    "metadata": {"source_url": "file:///tmp/latest-evidence"},
1067                                }
1068                            ]
1069                        },
1070                    )
1071                ])
1072            ]),
1073        )
1074
1075        assert result.status == "completed"
1076        job = db.get_job(job_id)
1077        assert job["metadata"]["task_planning_checkpoint_streak"] == 0
1078        assert job["metadata"]["last_agent_update"]["category"] == "progress"
1079        assert job["metadata"]["last_agent_update"]["metadata"]["updates"]["tasks"] == 1
1080        assert job["metadata"]["last_agent_update"]["metadata"]["resolutions"]["tasks"] == 1
1081    finally:
1082        db.close()
1083
1084
1085def test_run_one_step_blocks_similar_artifact_search(tmp_path):
1086    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1087    db = AgentDB(tmp_path / "state.db")
1088    try:
1089        job_id = db.create_job("Review saved outputs", title="artifact-search", kind="generic")
1090        run_id = db.start_run(job_id, model="test")
1091        step_id = db.add_step(
1092            job_id=job_id,
1093            run_id=run_id,
1094            kind="tool",
1095            tool_name="search_artifacts",
1096            input_data={"arguments": {"query": "distillation agentic paper evidence", "limit": 20}},
1097        )
1098        db.finish_step(
1099            step_id,
1100            status="completed",
1101            summary="search_artifacts returned 0 results",
1102            output_data={"success": True, "results": []},
1103        )
1104        db.finish_run(run_id, "completed")
1105
1106        result = run_one_step(
1107            job_id,
1108            config=config,
1109            db=db,
1110            llm=ScriptedLLM([
1111                LLMResponse(tool_calls=[
1112                    ToolCall(name="search_artifacts", arguments={"query": "paper evidence for agentic distillation", "limit": 20})
1113                ])
1114            ]),
1115        )
1116
1117        assert result.status == "blocked"
1118        assert result.result["error"] == "similar artifact search blocked"
1119        assert result.result["blocked_tool"] == "search_artifacts"
1120    finally:
1121        db.close()
1122
1123
1124def test_run_one_step_blocks_artifact_review_when_tasks_are_exhausted(tmp_path):
1125    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1126    db = AgentDB(tmp_path / "state.db")
1127    try:
1128        job_id = db.create_job(
1129            "Review saved outputs",
1130            title="review-exhausted",
1131            kind="generic",
1132            metadata={"task_queue": [{"title": "Review first output", "status": "done", "priority": 5}]},
1133        )
1134
1135        result = run_one_step(
1136            job_id,
1137            config=config,
1138            db=db,
1139            llm=ScriptedLLM([
1140                LLMResponse(tool_calls=[ToolCall(name="search_artifacts", arguments={"query": "paper evidence"})])
1141            ]),
1142        )
1143
1144        assert result.status == "blocked"
1145        assert result.result["error"] == "task branch required before more work"
1146        assert result.result["blocked_tool"] == "search_artifacts"
1147    finally:
1148        db.close()
1149
1150
1151def test_run_one_step_recovers_repeated_guard_blocks_without_llm(tmp_path):
1152    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1153    db = AgentDB(tmp_path / "state.db")
1154    try:
1155        job_id = db.create_job("Recover repeated blocked work", title="guard", kind="generic")
1156        for index, tool_name in enumerate(["search_artifacts", "shell_exec", "read_artifact"], start=1):
1157            run_id = db.start_run(job_id, model="test")
1158            step_id = db.add_step(
1159                job_id=job_id,
1160                run_id=run_id,
1161                kind="tool",
1162                tool_name=tool_name,
1163                input_data={"arguments": {"query": f"blocked {index}"}},
1164            )
1165            db.finish_step(
1166                step_id,
1167                status="blocked",
1168                summary=f"blocked {tool_name}; progress ledger update required",
1169                output_data={"success": True, "recoverable": True, "error": "progress ledger update required"},
1170            )
1171            db.finish_run(run_id, "completed")
1172
1173        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())  # assumed: ExplodingLLM (defined elsewhere) fails if invoked, so recovery must not call the model
1174
1175        assert result.status == "completed"
1176        assert result.tool_name == "guard_recovery"
1177        assert result.result["guard_recovery"]["error"] == "progress ledger update required"
1178        job = db.get_job(job_id)
1179        assert any(task["title"] == "Resolve guard: progress ledger update required" for task in job["metadata"]["task_queue"])
1180        assert any("Repeated guard block" in lesson["lesson"] for lesson in job["metadata"]["lessons"])
1181    finally:
1182        db.close()
1183
1184
1185def test_guard_recovery_does_not_add_task_for_queue_saturation(tmp_path):
1186    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1187    db = AgentDB(tmp_path / "state.db")
1188    try:
1189        job_id = db.create_job(
1190            "Consolidate a saturated backlog",
1191            title="guard-saturated-tasks",
1192            kind="generic",
1193            metadata={
1194                "task_queue": [
1195                    {"title": f"Existing branch {index}", "status": "open", "priority": index}
1196                    for index in range(40)
1197                ]
1198            },
1199        )
1200        for index in range(3):
1201            run_id = db.start_run(job_id, model="test")
1202            step_id = db.add_step(
1203                job_id=job_id,
1204                run_id=run_id,
1205                kind="tool",
1206                tool_name="record_tasks",
1207                input_data={"arguments": {"tasks": [{"title": f"New branch {index}", "status": "open"}]}},
1208            )
1209            db.finish_step(
1210                step_id,
1211                status="blocked",
1212                summary="blocked record_tasks; total task queue is too large",
1213                output_data={
1214                    "success": False,
1215                    "recoverable": True,
1216                    "error": "task queue saturated",
1217                    "task_queue": {
1218                        "reason": "total task queue is too large",
1219                        "open_count": 40,
1220                        "total_count": 40,
1221                    },
1222                },
1223            )
1224            db.finish_run(run_id, "completed")
1225
1226        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1227
1228        assert result.status == "completed"
1229        assert result.tool_name == "guard_recovery"
1230        assert result.result["task_opened"] is False
1231        job = db.get_job(job_id)
1232        tasks = job["metadata"]["task_queue"]
1233        assert len(tasks) == 40
1234        assert not any(task["title"].startswith("Resolve guard:") for task in tasks)
1235        assert job["metadata"]["task_backlog_pressure"]["total_count"] == 40
1236        assert any("Do not open guard-recovery tasks for saturation" in lesson["lesson"] for lesson in job["metadata"]["lessons"])
1237    finally:
1238        db.close()
1239
1240
1241def test_run_one_step_recovers_repeated_evidence_grounding_blocks(tmp_path):
1242    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1243    db = AgentDB(tmp_path / "state.db")
1244    try:
1245        job_id = db.create_job("Recover repeated grounding failures", title="grounding-guard", kind="generic")
1246        for index in range(3):
1247            run_id = db.start_run(job_id, model="test")
1248            step_id = db.add_step(
1249                job_id=job_id,
1250                run_id=run_id,
1251                kind="tool",
1252                tool_name="record_experiment",
1253                input_data={"arguments": {"title": f"Unsupported record {index}"}},
1254            )
1255            db.finish_step(
1256                step_id,
1257                status="blocked",
1258                summary="blocked record_experiment; evidence grounding required",
1259                output_data={"success": False, "recoverable": True, "error": "evidence grounding required"},
1260            )
1261            db.finish_run(run_id, "completed")
1262
1263        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1264
1265        assert result.status == "completed"
1266        assert result.tool_name == "guard_recovery"
1267        assert result.result["guard_recovery"]["error"] == "evidence grounding required"
1268    finally:
1269        db.close()
1270
1271
1272def test_run_one_step_recovers_repeated_known_bad_source_blocks(tmp_path):
1273    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1274    db = AgentDB(tmp_path / "state.db")
1275    try:
1276        job_id = db.create_job("Avoid repeatedly blocked sources", title="guard")
1277        for index in range(3):
1278            run_id = db.start_run(job_id, model="test")
1279            step_id = db.add_step(
1280                job_id=job_id,
1281                run_id=run_id,
1282                kind="tool",
1283                tool_name="web_extract",
1284                input_data={"arguments": {"urls": ["https://bad.example/source"]}},
1285            )
1286            db.finish_step(
1287                step_id,
1288                status="blocked",
1289                summary="blocked web_extract; known bad source https://bad.example/source",
1290                output_data={"success": False, "error": "known bad source blocked"},
1291            )
1292            db.finish_run(run_id, "completed")
1293
1294        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1295
1296        assert result.status == "completed"
1297        assert result.tool_name == "guard_recovery"
1298        assert result.result["guard_recovery"]["error"] == "known bad source blocked"
1299        job = db.get_job(job_id)
1300        assert any(task["title"] == "Resolve guard: known bad source blocked" for task in job["metadata"]["task_queue"])
1301    finally:
1302        db.close()
1303
1304
1305def test_guard_recovery_does_not_repeat_after_recovery_step(tmp_path):
1306    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1307    db = AgentDB(tmp_path / "state.db")
1308    try:
1309        job_id = db.create_job("Recover repeated blocked work once", title="guard-once", kind="generic")
1310        for index in range(3):
1311            run_id = db.start_run(job_id, model="test")
1312            step_id = db.add_step(
1313                job_id=job_id,
1314                run_id=run_id,
1315                kind="tool",
1316                tool_name="search_artifacts",
1317                input_data={"arguments": {"query": f"blocked {index}"}},
1318            )
1319            db.finish_step(
1320                step_id,
1321                status="blocked",
1322                summary="blocked search_artifacts; progress ledger update required",
1323                output_data={"success": True, "recoverable": True, "error": "progress ledger update required"},
1324            )
1325            db.finish_run(run_id, "completed")
1326
1327        first = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1328        assert first.tool_name == "guard_recovery"
1329
1330        second = run_one_step(
1331            job_id,
1332            config=config,
1333            db=db,
1334            llm=ScriptedLLM([
1335                LLMResponse(tool_calls=[
1336                    ToolCall(name="record_lesson", arguments={"lesson": "Recovered guard and chose a new branch", "category": "strategy"})
1337                ])
1338            ]),
1339        )
1340
1341        assert second.status == "completed"
1342        assert second.tool_name == "record_lesson"
1343        assert [step["tool_name"] for step in db.list_steps(job_id=job_id)[-2:]] == ["guard_recovery", "record_lesson"]
1344    finally:
1345        db.close()
1346
1347
1348def test_guard_recovery_does_not_keep_reopening_same_guard(tmp_path):
1349    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1350    db = AgentDB(tmp_path / "state.db")
1351    try:
1352        job_id = db.create_job("Recover repeated blocked work once", title="guard-repeat", kind="generic")
1353        for batch in range(2):
1354            for index in range(3):
1355                run_id = db.start_run(job_id, model="test")
1356                step_id = db.add_step(
1357                    job_id=job_id,
1358                    run_id=run_id,
1359                    kind="tool",
1360                    tool_name="search_artifacts",
1361                    input_data={"arguments": {"query": f"blocked {batch}-{index}"}},
1362                )
1363                db.finish_step(
1364                    step_id,
1365                    status="blocked",
1366                    summary="blocked search_artifacts; progress ledger update required",
1367                    output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1368                )
1369                db.finish_run(run_id, "completed")
1370            result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM() if batch == 0 else ScriptedLLM([
1371                LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "Use a different branch", "category": "strategy"})])
1372            ]))
1373
1374        steps = db.list_steps(job_id=job_id)
1375        assert sum(1 for step in steps if step["tool_name"] == "guard_recovery") == 1
1376        assert result.tool_name == "record_lesson"
1377    finally:
1378        db.close()
1379
1380
1381def test_guard_recovery_reopens_same_guard_after_progress(tmp_path):
1382    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1383    db = AgentDB(tmp_path / "state.db")
1384    try:
1385        job_id = db.create_job("Recover repeated blocked work after progress", title="guard-progress", kind="generic")
1386        for index in range(3):
1387            run_id = db.start_run(job_id, model="test")
1388            step_id = db.add_step(
1389                job_id=job_id,
1390                run_id=run_id,
1391                kind="tool",
1392                tool_name="search_artifacts",
1393                input_data={"arguments": {"query": f"blocked first {index}"}},
1394            )
1395            db.finish_step(
1396                step_id,
1397                status="blocked",
1398                summary="blocked search_artifacts; progress ledger update required",
1399                output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1400            )
1401            db.finish_run(run_id, "completed")
1402
1403        first = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1404        assert first.tool_name == "guard_recovery"
1405
1406        run_id = db.start_run(job_id, model="test")
1407        progress_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
1408        db.finish_step(progress_step, status="completed", output_data={"success": True, "lesson": "Recovered once."})
1409        db.finish_run(run_id, "completed")
1410
1411        for index in range(3):
1412            run_id = db.start_run(job_id, model="test")
1413            step_id = db.add_step(
1414                job_id=job_id,
1415                run_id=run_id,
1416                kind="tool",
1417                tool_name="read_artifact",
1418                input_data={"arguments": {"query": f"blocked second {index}"}},
1419            )
1420            db.finish_step(
1421                step_id,
1422                status="blocked",
1423                summary="blocked read_artifact; progress ledger update required",
1424                output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1425            )
1426            db.finish_run(run_id, "completed")
1427
1428        second = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1429        assert second.tool_name == "guard_recovery"
1430        assert sum(1 for step in db.list_steps(job_id=job_id) if step["tool_name"] == "guard_recovery") == 2
1431    finally:
1432        db.close()
1433
1434
1435def test_guard_recovery_accounts_pending_evidence_checkpoint(tmp_path):
1436    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1437    db = AgentDB(tmp_path / "state.db")
1438    try:
1439        job_id = db.create_job("Recover checkpoint accounting deadlock", title="checkpoint-recovery", kind="generic")
1440        db.update_job_metadata(
1441            job_id,
1442            {
1443                "pending_evidence_checkpoint": {
1444                    "artifact_id": "art_checkpoint",
1445                    "title": "Auto Evidence Checkpoint after step 1",
1446                    "read_at": "2026-01-01T00:00:00+00:00",
1447                    "evidence_step_no": 1,
1448                    "blocked_tool": "shell_exec",
1449                }
1450            },
1451        )
1452        for index in range(3):
1453            run_id = db.start_run(job_id, model="test")
1454            step_id = db.add_step(
1455                job_id=job_id,
1456                run_id=run_id,
1457                kind="tool",
1458                tool_name="read_artifact",
1459                input_data={"arguments": {"artifact_id": "art_checkpoint", "retry": index}},
1460            )
1461            db.finish_step(
1462                step_id,
1463                status="blocked",
1464                summary="blocked read_artifact; evidence checkpoint accounting required",
1465                output_data={"success": False, "recoverable": True, "error": "evidence checkpoint accounting required"},
1466            )
1467            db.finish_run(run_id, "completed")
1468
1469        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1470
1471        assert result.tool_name == "guard_recovery"
1472        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
1473        assert pending["resolved_at"]
1474        assert pending["resolved_by_tool"] == "guard_recovery"
1475    finally:
1476        db.close()
1477
1478
1479def test_guard_recovery_immediately_recovers_already_read_checkpoint_reread(tmp_path):
1480    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1481    db = AgentDB(tmp_path / "state.db")
1482    try:
1483        job_id = db.create_job("Recover checkpoint reread deadlock", title="checkpoint-recovery", kind="generic")
1484        run_id = db.start_run(job_id, model="test")
1485        step_id = db.add_step(
1486            job_id=job_id,
1487            run_id=run_id,
1488            kind="tool",
1489            tool_name="read_artifact",
1490            input_data={"arguments": {"artifact_id": "art_checkpoint"}},
1491        )
1492        db.finish_step(
1493            step_id,
1494            status="blocked",
1495            summary="blocked read_artifact; evidence checkpoint accounting required",
1496            output_data={
1497                "success": False,
1498                "recoverable": True,
1499                "error": "evidence checkpoint accounting required",
1500                "blocked_tool": "read_artifact",
1501                "pending_evidence_checkpoint": {
1502                    "artifact_id": "art_checkpoint",
1503                    "checkpoint_read": True,
1504                    "read_at": "2026-01-01T00:00:00+00:00",
1505                },
1506            },
1507        )
1508        db.finish_run(run_id, "completed")
1509
1510        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1511
1512        assert result.tool_name == "guard_recovery"
1513        assert result.result["guard_recovery"]["count"] == 1  # one blocked step suffices here because the checkpoint was already read
1514        assert result.result["guard_recovery"]["error"] == "evidence checkpoint accounting required"
1515    finally:
1516        db.close()
1517
1518
1519def test_prompt_does_not_tell_worker_to_reread_checkpoint_after_it_was_read(tmp_path):
1520    db = AgentDB(tmp_path / "state.db")
1521    try:
1522        job_id = db.create_job("Account for checkpoint", title="checkpoint-prompt", kind="generic")
1523        db.update_job_metadata(
1524            job_id,
1525            {
1526                "pending_evidence_checkpoint": {
1527                    "artifact_id": "art_checkpoint",
1528                    "title": "Auto Evidence Checkpoint after step 1",
1529                    "read_at": "2026-01-01T00:00:00+00:00",
1530                    "evidence_step_no": 1,
1531                    "blocked_tool": "shell_exec",
1532                }
1533            },
1534        )
1535
1536        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
1537
1538        assert "Do not read the checkpoint again" in content
1539        assert "Next either read that checkpoint artifact" not in content
1540    finally:
1541        db.close()
1542
1543
1544def test_checkpoint_reread_block_requires_accounting_not_more_reads(tmp_path):
1545    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1546    db = AgentDB(tmp_path / "state.db")
1547    try:
1548        job_id = db.create_job("Account for checkpoint", title="checkpoint-reread", kind="generic")
1549        db.update_job_metadata(
1550            job_id,
1551            {
1552                "pending_evidence_checkpoint": {
1553                    "artifact_id": "art_checkpoint",
1554                    "title": "Auto Evidence Checkpoint after step 1",
1555                    "read_at": "2026-01-01T00:00:00+00:00",
1556                    "evidence_step_no": 1,
1557                    "blocked_tool": "shell_exec",
1558                }
1559            },
1560        )
1561
1562        blocked = run_one_step(
1563            job_id,
1564            config=config,
1565            db=db,
1566            llm=ScriptedLLM([
1567                LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"})])
1568            ]),
1569        )
1570
1571        assert blocked.status == "blocked"
1572        assert blocked.result["error"] == "evidence checkpoint accounting required"
1573        assert blocked.result["checkpoint_already_read"] is True
1574        assert blocked.result["required_next_action"] == "durable_checkpoint_accounting"
1575        assert "Do not read it again" in blocked.result["guidance"]
1576
1577        recovery = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1578
1579        assert recovery.tool_name == "guard_recovery"
1580        task = recovery.result["task"]
1581        assert task["metadata"]["resolves_evidence_checkpoint"] is True
1582        assert "Do not read the same checkpoint again" in task["acceptance_criteria"]
1583    finally:
1584        db.close()
1585
1586
1587def test_already_read_checkpoint_branch_block_recovers_immediately(tmp_path):
1588    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1589    db = AgentDB(tmp_path / "state.db")
1590    try:
1591        job_id = db.create_job("Recover checkpoint branch deadlock", title="checkpoint-branch", kind="generic")
1592        db.update_job_metadata(
1593            job_id,
1594            {
1595                "pending_evidence_checkpoint": {
1596                    "artifact_id": "art_checkpoint",
1597                    "title": "Auto Evidence Checkpoint after step 1",
1598                    "read_at": "2026-01-01T00:00:00+00:00",
1599                    "evidence_step_no": 1,
1600                    "blocked_tool": "shell_exec",
1601                }
1602            },
1603        )
1604        run_id = db.start_run(job_id, model="test")
1605        step_id = db.add_step(
1606            job_id=job_id,
1607            run_id=run_id,
1608            kind="tool",
1609            tool_name="shell_exec",
1610            input_data={"arguments": {"command": "echo more branch work"}},
1611        )
1612        db.finish_step(
1613            step_id,
1614            status="blocked",
1615            summary="blocked shell_exec; evidence checkpoint accounting required",
1616            output_data={
1617                "success": False,
1618                "recoverable": True,
1619                "error": "evidence checkpoint accounting required",
1620                "checkpoint_already_read": True,
1621                "pending_evidence_checkpoint": {
1622                    "artifact_id": "art_checkpoint",
1623                    "checkpoint_read": True,
1624                },
1625            },
1626        )
1627        db.finish_run(run_id, "completed")
1628
1629        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1630
1631        assert result.tool_name == "guard_recovery"
1632        assert result.result["guard_recovery"]["count"] == 1
1633        assert result.result["task"]["metadata"]["resolves_evidence_checkpoint"] is True
1634        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
1635        assert pending["resolved_by_tool"] == "guard_recovery"
1636    finally:
1637        db.close()
1638
1639
1640def test_evidence_grounding_ignores_format_protocol_tokens():
1641    tokens = _concrete_evidence_tokens(
1642        "Parsed JSON from HTTPS REST API URL and saved HTML/YAML/XML CDN SHA256 GGUF excerpts for Model-7B step_123_shell_output. "
1643        "Download investigation parsed direct API results. Discovery step-2678 located a candidate file after shell_exec_step_1037."
1644    )
1645
1646    assert "JSON" not in tokens
1647    assert "HTTPS" not in tokens
1648    assert "REST" not in tokens
1649    assert "API" not in tokens
1650    assert "CDN" not in tokens
1651    assert "SHA256" not in tokens
1652    assert "GGUF" not in tokens
1653    assert "URL" not in tokens
1654    assert "Download" not in tokens
1655    assert "Discovery" not in tokens
1656    assert "investigation" not in tokens
1657    assert "direct" not in tokens
1658    assert "step_123_shell_output" not in tokens
1659    assert "step-2678" not in tokens
1660    assert "shell_exec_step_1037" not in tokens
1661    assert "Model-7B" in tokens
1662
1663
1664def test_evidence_grounding_ignores_lowercase_command_shorthand_tokens():
1665    tokens = _concrete_evidence_tokens("Build with cmake --build . -j16 on H100 hardware if observed.")
1666
1667    assert "j16" not in tokens
1668    assert "H100" in tokens
1669
1670
1671def test_record_experiment_allows_not_stub_validation_for_observed_token(tmp_path):
1672    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1673    db = AgentDB(tmp_path / "state.db")
1674    try:
1675        job_id = db.create_job("Validate discovered file", title="grounding", kind="generic")
1676        run_id = db.start_run(job_id, model="test")
1677        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1678        db.finish_step(
1679            step_id,
1680            status="completed",
1681            output_data={
1682                "success": True,
1683                "stdout": "-rw-r--r-- 1 user user 12G /srv/models/AlphaModel-99-Q4.foo\n",
1684                "stderr": "",
1685            },
1686        )
1687        db.finish_run(run_id, "completed")
1688
1689        result = run_one_step(
1690            job_id,
1691            config=config,
1692            db=db,
1693            llm=ScriptedLLM([
1694                LLMResponse(tool_calls=[
1695                    ToolCall(
1696                        name="record_experiment",
1697                        arguments={
1698                            "title": "Candidate File Validation",
1699                            "status": "measured",
1700                            "metric_name": "usable_files_found",
1701                            "metric_value": 1,
1702                            "metric_unit": "files",
1703                            "result": (
1704                                "Observed /srv/models/AlphaModel-99-Q4.foo at 12G. "
1705                                "AlphaModel-99-Q4.foo is not a 29-byte stub."
1706                            ),
1707                            "next_action": "Run the next bounded benchmark.",
1708                        },
1709                    )
1710                ])
1711            ]),
1712        )
1713
1714        assert result.status == "completed"
1715        assert result.tool_name == "record_experiment"
1716    finally:
1717        db.close()
1718
1719
1720def test_record_findings_ignores_generated_step_labels_as_claims(tmp_path):
1721    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1722    db = AgentDB(tmp_path / "state.db")
1723    try:
1724        job_id = db.create_job("Record observed file candidates", title="grounding", kind="generic")
1725        run_id = db.start_run(job_id, model="test")
1726        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1727        db.finish_step(
1728            step_id,
1729            status="completed",
1730            output_data={
1731                "success": True,
1732                "stdout": "/srv/models/AlphaModel-99-Q4.foo\n",
1733                "stderr": "",
1734            },
1735        )
1736        db.finish_run(run_id, "completed")
1737
1738        result = run_one_step(
1739            job_id,
1740            config=config,
1741            db=db,
1742            llm=ScriptedLLM([
1743                LLMResponse(tool_calls=[
1744                    ToolCall(
1745                        name="record_findings",
1746                        arguments={
1747                            "findings": [
1748                                {
1749                                    "name": "Model file candidate located",
1750                                    "category": "file_candidate",
1751                                    "location": "/srv/models/AlphaModel-99-Q4.foo",
1752                                    "evidence_artifact": "step-2678 shell_exec output",
1753                                    "reason": "Found via step-2678 shell output.",
1754                                    "status": "candidate",
1755                                }
1756                            ]
1757                        },
1758                    )
1759                ])
1760            ]),
1761        )
1762
1763        assert result.status == "completed"
1764        assert result.tool_name == "record_findings"
1765    finally:
1766        db.close()
1767
1768
1769def test_write_artifact_allows_plain_prose_headings_without_evidence(tmp_path):
1770    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1771    db = AgentDB(tmp_path / "state.db")
1772    try:
1773        job_id = db.create_job("Summarize observed evidence", title="artifact-grounding", kind="generic")
1774        run_id = db.start_run(job_id, model="test")
1775        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1776        db.finish_step(
1777            step_id,
1778            status="completed",
1779            output_data={
1780                "success": True,
1781                "stdout": "Observed status: candidate file exists and benchmark setup is ready for the next measured action.",
1782                "stderr": "",
1783            },
1784        )
1785        db.finish_run(run_id, "completed")
1786
1787        result = run_one_step(
1788            job_id,
1789            config=config,
1790            db=db,
1791            llm=ScriptedLLM([
1792                LLMResponse(tool_calls=[
1793                    ToolCall(
1794                        name="write_artifact",
1795                        arguments={
1796                            "title": "Evidence Consolidation Summary",
1797                            "content": (
1798                                "## Discovered\n"
1799                                "The available observations were consolidated into a concise summary.\n\n"
1800                                "## Significance\n"
1801                                "This output records narrative context only and does not introduce a new model, file, or hardware identifier."
1802                            ),
1803                        },
1804                    )
1805                ])
1806            ]),
1807        )
1808
1809        assert result.status == "completed"
1810        assert result.tool_name == "write_artifact"
1811    finally:
1812        db.close()
1813
1814
1815def test_write_artifact_blocks_unsupported_high_risk_identifier(tmp_path):
1816    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1817    db = AgentDB(tmp_path / "state.db")
1818    try:
1819        job_id = db.create_job("Summarize observed evidence", title="artifact-grounding", kind="generic")
1820        run_id = db.start_run(job_id, model="test")
1821        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1822        db.finish_step(
1823            step_id,
1824            status="completed",
1825            output_data={
1826                "success": True,
1827                "stdout": "Observed model identifier: AlphaModel-99. No other model identifiers were observed.",
1828                "stderr": "",
1829            },
1830        )
1831        db.finish_run(run_id, "completed")
1832
1833        result = run_one_step(
1834            job_id,
1835            config=config,
1836            db=db,
1837            llm=ScriptedLLM([
1838                LLMResponse(tool_calls=[
1839                    ToolCall(
1840                        name="write_artifact",
1841                        arguments={
1842                            "title": "Benchmark Summary",
1843                            "content": (
1844                                "The observed candidate was AlphaModel-99.\n"
1845                                "The final recommendation uses FakeModel-42 for the next benchmark branch."
1846                            ),
1847                        },
1848                    )
1849                ])
1850            ]),
1851        )
1852
1853        assert result.status == "blocked"
1854        assert result.result["error"] == "evidence grounding required"
1855        assert "FakeModel-42" in result.result["evidence_grounding"]["unsupported_tokens"]
1856    finally:
1857        db.close()
1858
1859
1860def test_web_search_auto_records_source_quality(tmp_path):
1861    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1862    db = AgentDB(tmp_path / "state.db")
1863    try:
1864        job_id = db.create_job("Track search sources", title="search-sources", kind="generic")
1865
1866        result = run_one_step(
1867            job_id,
1868            config=config,
1869            db=db,
1870            llm=ScriptedLLM([
1871                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "durable progress research"})])
1872            ]),
1873            registry=SearchRegistry(),
1874        )
1875
1876        assert result.status == "completed"
1877        sources = db.get_job(job_id)["metadata"]["source_ledger"]
1878        assert {source["source"] for source in sources} == {
1879            "https://source.example/primary",
1880            "https://source.example/secondary",
1881        }
1882        assert all(source["source_type"] == "web_search" for source in sources)
1883    finally:
1884        db.close()
1885
1886
1887def test_web_extract_auto_records_source_quality(tmp_path):
1888    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1889    db = AgentDB(tmp_path / "state.db")
1890    try:
1891        job_id = db.create_job("Track source quality", title="sources", kind="generic")
1892
1893        result = run_one_step(
1894            job_id,
1895            config=config,
1896            db=db,
1897            llm=ScriptedLLM([
1898                LLMResponse(tool_calls=[ToolCall(name="web_extract", arguments={"urls": ["https://source.example/a"]})])
1899            ]),
1900            registry=ExtractRegistry(),
1901        )
1902
1903        assert result.status == "completed"
1904        sources = db.get_job(job_id)["metadata"]["source_ledger"]
1905        assert {source["source"] for source in sources} == {"https://source.example/a", "https://source.example/b"}
1906        useful = next(source for source in sources if source["source"] == "https://source.example/a")
1907        failed = next(source for source in sources if source["source"] == "https://source.example/b")
1908        assert useful["usefulness_score"] >= 0.55
1909        assert failed["fail_count"] == 1
1910    finally:
1911        db.close()
1912
1913
1914def test_worker_cannot_mark_job_completed_by_default(tmp_path):
1915    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1916    db = AgentDB(tmp_path / "state.db")
1917    try:
1918        job_id = db.create_job("Keep improving forever", title="perpetual", kind="generic")
1919        llm = ScriptedLLM([
1920            LLMResponse(tool_calls=[
1921                ToolCall(
1922                    name="update_job_state",
1923                    arguments={"status": "completed", "note": "best result saved"},
1924                )
1925            ])
1926        ])
1927
1928        result = run_one_step(job_id, config=config, db=db, llm=llm)
1929        job = db.get_job(job_id)
1930
1931        assert result.status == "completed"
1932        assert result.result["kept_running"] is True
1933        assert job["status"] == "running"
1934        assert job["metadata"]["agent_updates"][-1]["metadata"]["requested_status"] == "completed"
1935    finally:
1936        db.close()
1937
1938
1939def test_report_update_completion_claim_is_rewritten_as_checkpoint(tmp_path):
1940    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1941    db = AgentDB(tmp_path / "state.db")
1942    try:
1943        job_id = db.create_job("Keep improving forever", title="perpetual", kind="generic")
1944        result = run_one_step(
1945            job_id,
1946            config=config,
1947            db=db,
1948            llm=ScriptedLLM([
1949                LLMResponse(tool_calls=[
1950                    ToolCall(
1951                        name="report_update",
1952                        arguments={"message": "Job completed. Best result saved.", "category": "progress"},
1953                    )
1954                ])
1955            ]),
1956        )
1957
1958        assert result.status == "completed"
1959        update = db.get_job(job_id)["metadata"]["last_agent_update"]
1960        assert update["message"] == "Checkpoint reported; continuing work. Best result saved."
1961        assert update["metadata"]["rewritten_completion_claim"] is True
1962        assert update["metadata"]["original_message"] == "Job completed. Best result saved."
1963        assert update["metadata"]["follow_up_task"]
1964        tasks = db.get_job(job_id)["metadata"]["task_queue"]
1965        follow_up = next(task for task in tasks if task["key"] == update["metadata"]["follow_up_task"])
1966        assert follow_up["title"] == "Audit latest checkpoint against objective"
1967        assert follow_up["status"] == "open"
1968        assert follow_up["output_contract"] == "decision"
1969        assert follow_up["metadata"]["completion_audit_required"] is True
1970    finally:
1971        db.close()
1972
1973
1974def test_run_one_step_claims_one_message_but_keeps_all_steering_in_prompt(tmp_path):
1975    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1976    db = AgentDB(tmp_path / "state.db")
1977    try:
1978        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
1979        db.append_operator_message(job_id, "first instruction", source="chat")
1980        db.append_operator_message(job_id, "second instruction", source="chat")
1981        llm = CapturingLLM(LLMResponse(content="No tool this turn."))
1982
1983        result = run_one_step(job_id, config=config, db=db, llm=llm)
1984
1985        assert result.status == "blocked"
1986        assert result.result["error"] == "worker tool call required"
1987        prompt = llm.messages[-1]["content"]
1988        job = db.get_job(job_id)
1989        events = db.list_timeline_events(job_id, limit=30)
1990        assert "first instruction" in prompt
1991        assert "second instruction" in prompt
1992        assert job["metadata"]["operator_messages"][0]["claimed_at"]
1993        assert not job["metadata"]["operator_messages"][1].get("claimed_at")
1994        assert any(event["event_type"] == "loop" and event["title"] == "agent_start" for event in events)
1995        assert any(event["event_type"] == "loop" and event["title"] == "turn_end" for event in events)
1996    finally:
1997        db.close()
1998
1999
2000class FailingLLM:
2001    def next_action(self, *, messages, tools):
2002        del messages, tools
2003        raise RuntimeError("provider returned no choices")
2004
2005
2006class HardProviderFailingLLM:
2007    def next_action(self, *, messages, tools):
2008        del messages, tools
2009        raise LLMResponseError(
2010            "Key limit exceeded (total limit)",
2011            payload={"error": {"message": "Key limit exceeded (total limit)", "code": 403}},
2012        )
2013
2014
2015def test_run_one_step_records_model_failures_instead_of_raising(tmp_path):
2016    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2017    db = AgentDB(tmp_path / "state.db")
2018    try:
2019        job_id = db.create_job("Keep running despite provider failures", title="provider")
2020
2021        result = run_one_step(job_id, config=config, db=db, llm=FailingLLM())
2022
2023        assert result.status == "failed"
2024        assert result.result["error"] == "provider returned no choices"
2025        assert result.result["duration_seconds"] >= 0
2026        steps = db.list_steps(job_id=job_id)
2027        assert steps[0]["kind"] == "llm"
2028        assert steps[0]["status"] == "failed"
2029        assert steps[0]["error"] == "provider returned no choices"
2030        assert steps[0]["input"]["duration_seconds"] >= 0
2031        assert db.list_runs(job_id)[0]["status"] == "failed"
2032    finally:
2033        db.close()
2034
2035
2036def test_run_one_step_blocks_missing_tool_arguments_as_recoverable(tmp_path):
2037    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2038    db = AgentDB(tmp_path / "state.db")
2039    try:
2040        job_id = db.create_job("Keep running despite malformed tool calls", title="tool args")
2041        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={})])])
2042
2043        result = run_one_step(job_id, config=config, db=db, llm=llm)
2044
2045        assert result.status == "blocked"
2046        assert result.result["recoverable"] is True
2047        assert result.result["missing_arguments"] == ["command"]
2048        step = db.list_steps(job_id=job_id)[0]
2049        assert step["status"] == "blocked"
2050        assert "missing required arguments" in step["summary"]
2051        assert not step["error"]
2052    finally:
2053        db.close()
2054
2055
2056def test_run_one_step_continues_after_malformed_tool_arguments_when_batch_has_more_work(tmp_path):
2057    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2058    db = AgentDB(tmp_path / "state.db")
2059    try:
2060        job_id = db.create_job("Keep running through recoverable malformed tool calls", title="tool batch")
2061        llm = ScriptedLLM([
2062            LLMResponse(tool_calls=[
2063                ToolCall(name="shell_exec", arguments={}),
2064                ToolCall(name="record_lesson", arguments={"lesson": "continue with the remaining valid tool call"}),
2065            ])
2066        ])
2067
2068        result = run_one_step(job_id, config=config, db=db, llm=llm)
2069
2070        assert result.tool_name == "record_lesson"
2071        assert result.status == "completed"
2072        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
2073        assert [step["tool_name"] for step in tool_steps] == ["shell_exec", "record_lesson"]
2074        assert tool_steps[0]["status"] == "blocked"
2075        assert tool_steps[0]["output"]["error"] == "missing required tool arguments"
2076        assert tool_steps[1]["status"] == "completed"
2077        assert db.list_runs(job_id)[0]["status"] == "completed"
2078    finally:
2079        db.close()
2080
2081
2082def test_run_one_step_continues_after_missing_artifact_when_batch_has_more_work(tmp_path):
2083    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2084    db = AgentDB(tmp_path / "state.db")
2085    try:
2086        job_id = db.create_job("Recover from invented artifact references", title="artifact batch")
2087        llm = ScriptedLLM([
2088            LLMResponse(tool_calls=[
2089                ToolCall(name="read_artifact", arguments={"artifact_id": "art_missing"}),
2090                ToolCall(name="record_lesson", arguments={"lesson": "search artifacts before reading unknown artifact ids"}),
2091            ])
2092        ])
2093
2094        result = run_one_step(job_id, config=config, db=db, llm=llm)
2095
2096        assert result.tool_name == "record_lesson"
2097        assert result.status == "completed"
2098        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
2099        assert [step["tool_name"] for step in tool_steps] == ["read_artifact", "record_lesson"]
2100        assert tool_steps[0]["status"] == "blocked"
2101        assert tool_steps[0]["output"]["recoverable"] is True
2102        assert tool_steps[0]["output"]["error"] == "artifact not found: art_missing"
2103        assert tool_steps[1]["status"] == "completed"
2104        assert db.list_runs(job_id)[0]["status"] == "completed"
2105    finally:
2106        db.close()
2107
2108
2109def test_run_one_step_continues_after_empty_operator_ack_when_batch_has_more_work(tmp_path):
2110    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2111    db = AgentDB(tmp_path / "state.db")
2112    try:
2113        job_id = db.create_job("Recover from harmless no-op acknowledgements", title="ack batch")
2114        llm = ScriptedLLM([
2115            LLMResponse(tool_calls=[
2116                ToolCall(name="acknowledge_operator_context", arguments={"summary": "already handled"}),
2117                ToolCall(name="record_lesson", arguments={"lesson": "continue with ordinary progress when no operator ack is needed"}),
2118            ])
2119        ])
2120
2121        result = run_one_step(job_id, config=config, db=db, llm=llm)
2122
2123        assert result.tool_name == "record_lesson"
2124        assert result.status == "completed"
2125        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
2126        assert [step["tool_name"] for step in tool_steps] == ["acknowledge_operator_context", "record_lesson"]
2127        assert tool_steps[0]["status"] == "blocked"
2128        assert tool_steps[0]["output"]["recoverable"] is True
2129        assert tool_steps[0]["output"]["error"] == "no active operator context to acknowledge"
2130        assert tool_steps[1]["status"] == "completed"
2131        assert db.list_runs(job_id)[0]["status"] == "completed"
2132    finally:
2133        db.close()
2134
2135
2136def test_run_one_step_blocks_placeholder_tool_arguments_as_recoverable(tmp_path):
2137    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2138    db = AgentDB(tmp_path / "state.db")
2139    try:
2140        job_id = db.create_job("Keep running despite placeholder tool calls", title="placeholder args")
2141        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "..."})])])
2142
2143        result = run_one_step(job_id, config=config, db=db, llm=llm)
2144
2145        assert result.status == "blocked"
2146        assert result.result["recoverable"] is True
2147        assert result.result["error"] == "missing required tool arguments"
2148        assert result.result["missing_arguments"] == ["artifact reference"]
2149        step = db.list_steps(job_id=job_id)[0]
2150        assert step["status"] == "blocked"
2151        assert "missing required arguments" in step["summary"]
2152    finally:
2153        db.close()
2154
2155
2156def test_run_one_step_blocks_truncated_optional_reference_arguments(tmp_path):
2157    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2158    db = AgentDB(tmp_path / "state.db")
2159    try:
2160        job_id = db.create_job("Resolve concrete optional references before recording", title="truncated optional")
2161        llm = ScriptedLLM([
2162            LLMResponse(tool_calls=[
2163                ToolCall(
2164                    name="record_experiment",
2165                    arguments={
2166                        "title": "Validate artifact",
2167                        "evidence_artifact": "art_123...",
2168                        "next_action": "read the concrete artifact id",
2169                    },
2170                )
2171            ])
2172        ])
2173
2174        result = run_one_step(job_id, config=config, db=db, llm=llm)
2175
2176        assert result.status == "blocked"
2177        assert result.result["recoverable"] is True
2178        assert result.result["error"] == "placeholder tool arguments"
2179        assert result.result["placeholder_arguments"] == ["evidence_artifact"]
2180        step = db.list_steps(job_id=job_id)[0]
2181        assert step["status"] == "blocked"
2182        assert "placeholder tool arguments" in step["summary"]
2183    finally:
2184        db.close()
2185
2186
2187def test_run_one_step_blocks_placeholder_shell_command_before_execution(tmp_path):
2188    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2189    db = AgentDB(tmp_path / "state.db")
2190    try:
2191        job_id = db.create_job("Resolve concrete shell inputs before execution", title="placeholder shell")
2192        llm = ScriptedLLM([
2193            LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "wget http://output/"})])
2194        ])
2195
2196        result = run_one_step(job_id, config=config, db=db, llm=llm)
2197
2198        assert result.status == "blocked"
2199        assert result.result["error"] == "unresolved placeholder in shell command"
2200        assert result.result["placeholder"]["value"] == "http://output/"
2201        assert result.result["recoverable"] is True
2202        step = db.list_steps(job_id=job_id)[0]
2203        assert step["status"] == "blocked"
2204        assert "unresolved placeholder" in step["summary"]
2205    finally:
2206        db.close()
2207
2208
2209def test_run_one_step_blocks_tool_markup_shell_command_before_execution(tmp_path):
2210    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2211    db = AgentDB(tmp_path / "state.db")
2212    try:
2213        job_id = db.create_job("Reject malformed tool markup before shell execution", title="tool markup shell")
2214        llm = ScriptedLLM([
2215            LLMResponse(tool_calls=[
2216                ToolCall(
2217                    name="shell_exec",
2218                    arguments={"command": "echo ok\n</parameter> }, {"},
2219                )
2220            ])
2221        ])
2222
2223        result = run_one_step(job_id, config=config, db=db, llm=llm)
2224
2225        assert result.status == "blocked"
2226        assert result.result["error"] == "unresolved placeholder in shell command"
2227        assert result.result["placeholder"]["value"] == "</parameter>"
2228        step = db.list_steps(job_id=job_id)[0]
2229        assert step["status"] == "blocked"
2230    finally:
2231        db.close()
2232
2233
2234def test_run_one_step_blocks_unbalanced_shell_quotes_before_execution(tmp_path):
2235    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2236    db = AgentDB(tmp_path / "state.db")
2237    try:
2238        job_id = db.create_job("Reject partial shell before execution", title="bad shell syntax")
2239        llm = ScriptedLLM([
2240            LLMResponse(tool_calls=[
2241                ToolCall(
2242                    name="shell_exec",
2243                    arguments={"command": "echo 'start && ls /tmp"},
2244                )
2245            ])
2246        ])
2247
2248        result = run_one_step(job_id, config=config, db=db, llm=llm)
2249
2250        assert result.status == "blocked"
2251        assert result.result["error"] == "malformed shell command"
2252        assert result.result["recoverable"] is True
2253        assert result.result["syntax"]["kind"] == "shell_syntax"
2254        step = db.list_steps(job_id=job_id)[0]
2255        assert step["status"] == "blocked"
2256        assert "malformed command syntax" in step["summary"]
2257    finally:
2258        db.close()
2259
2260
2261def test_run_one_step_blocks_markdown_fenced_shell_command_before_execution(tmp_path):
2262    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2263    db = AgentDB(tmp_path / "state.db")
2264    try:
2265        job_id = db.create_job("Reject markdown prose before shell execution", title="markdown shell")
2266        llm = ScriptedLLM([
2267            LLMResponse(tool_calls=[
2268                ToolCall(
2269                    name="shell_exec",
2270                    arguments={
2271                        "command": (
2272                            "ls -la /srv/models/model.bin\n\n"
2273                            "--- Chapter 2\n\n"
2274                            "1. ```shell\n"
2275                            "   chmod +x /tmp/example\n"
2276                            "```"
2277                        )
2278                    },
2279                )
2280            ])
2281        ])
2282
2283        result = run_one_step(job_id, config=config, db=db, llm=llm)
2284
2285        assert result.status == "blocked"
2286        assert result.result["error"] == "unresolved placeholder in shell command"
2287        assert result.result["placeholder"]["kind"] == "markdown_code_fence"
2288        step = db.list_steps(job_id=job_id)[0]
2289        assert step["status"] == "blocked"
2290        assert "unresolved placeholder" in step["summary"]
2291    finally:
2292        db.close()
2293
2294
2295def test_run_one_step_times_out_stalled_model_call(tmp_path):
2296    config = AppConfig(
2297        runtime=RuntimeConfig(home=tmp_path),
2298        model=ModelConfig(request_timeout_seconds=0.05),
2299    )
2300    db = AgentDB(tmp_path / "state.db")
2301    try:
2302        job_id = db.create_job("Keep daemon moving through stalled model calls", title="provider")
2303
2304        result = run_one_step(job_id, config=config, db=db, llm=HangingLLM())
2305
2306        assert result.status == "failed"
2307        assert "model call timed out" in result.result["error"]
2308        assert result.result["duration_seconds"] >= 0.04
2309        step = db.list_steps(job_id=job_id)[0]
2310        assert step["kind"] == "llm"
2311        assert step["status"] == "failed"
2312        assert step["input"]["duration_seconds"] >= 0.04
2313    finally:
2314        db.close()
2315
2316
2317def test_repeated_model_failures_do_not_create_automatic_defer(tmp_path):
2318    config = AppConfig(
2319        runtime=RuntimeConfig(home=tmp_path),
2320        model=ModelConfig(request_timeout_seconds=120),
2321    )
2322    db = AgentDB(tmp_path / "state.db")
2323    try:
2324        job_id = db.create_job("Keep running through provider instability", title="provider failures")
2325        for _index in range(2):
2326            run_id = db.start_run(job_id, model="test")
2327            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="llm", status="failed")
2328            db.finish_step(
2329                step_id,
2330                status="failed",
2331                summary="model call failed: APITimeoutError",
2332                output_data={"success": False, "error": "Request timed out.", "error_type": "APITimeoutError"},
2333                error="Request timed out.",
2334            )
2335            db.finish_run(run_id, "failed", error="Request timed out.")
2336
2337        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
2338
2339        assert result.status == "failed"
2340        assert result.tool_name is None
2341        job = db.get_job(job_id)
2342        assert not job["metadata"].get("defer_until")
2343        assert all(step.get("tool_name") != "defer_job" for step in db.list_steps(job_id=job_id))
2344    finally:
2345        db.close()
2346
2347
2348def test_legacy_model_cooldown_metadata_is_ignored(tmp_path):
2349    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2350    db = AgentDB(tmp_path / "state.db")
2351    try:
2352        job_id = db.create_job("Continue after provider instability", title="provider recovered")
2353        db.update_job_metadata(job_id, {"transient_model_cooldown_streak": 3})
2354        llm = ScriptedLLM([
2355            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "provider recovered"})])
2356        ])
2357
2358        result = run_one_step(job_id, config=config, db=db, llm=llm)
2359
2360        assert result.status == "completed"
2361        assert result.tool_name == "report_update"
2362        job = db.get_job(job_id)
2363        assert job["metadata"]["transient_model_cooldown_streak"] == 3
2364        assert "transient_model_recovered_at" not in job["metadata"]
2365        message_end = next(event for event in db.list_events(job_id=job_id, limit=10) if event["event_type"] == "loop" and event["title"] == "message_end")
2366        assert message_end["metadata"]["duration_seconds"] >= 0
2367    finally:
2368        db.close()
2369
2370
2371def test_run_one_step_pauses_job_on_hard_provider_failure(tmp_path):
2372    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2373    db = AgentDB(tmp_path / "state.db")
2374    try:
2375        job_id = db.create_job("Keep running when provider is configured", title="provider")
2376
2377        result = run_one_step(job_id, config=config, db=db, llm=HardProviderFailingLLM())
2378
2379        assert result.status == "failed"
2380        assert result.result["provider_action_required"] is True
2381        assert result.result["pause_reason"] == "llm_provider_blocked"
2382        job = db.get_job(job_id)
2383        assert job["status"] == "paused"
2384        assert "operator action" in job["metadata"]["last_note"]
2385        assert job["metadata"]["provider_blocked_at"]
2386        events = db.list_events(job_id=job_id, limit=10)
2387        assert any(event["event_type"] == "agent_message" and event["title"] == "error" for event in events)
2388    finally:
2389        db.close()
2390
2391
2392def test_prompt_includes_recent_tool_arguments_and_observations():
2393    job = {"title": "research", "kind": "generic", "objective": "find research"}
2394    steps = [{
2395        "step_no": 7,
2396        "kind": "tool",
2397        "status": "completed",
2398        "tool_name": "web_search",
2399        "summary": "web_search query='target model docs' returned 1 results",
2400        "input": {"arguments": {"query": "target model docs", "limit": 5}},
2401        "output": {"query": "target model docs", "results": [{"title": "Target Docs", "url": "https://example.com"}]},
2402    }]
2403
2404    messages = build_messages(job, steps)
2405
2406    content = messages[-1]["content"]
2407    assert "target model docs" in content
2408    assert "Target Docs <https://example.com>" in content
2409    assert "do not search the same query again" in content
2410    assert "shell_exec runs on the machine hosting this Nipux worker" in content
2411    assert str(Path.cwd()) not in content
2412    assert "read_artifact is only for those saved outputs" in content
2413
2414
2415def test_prompt_recovers_from_missing_artifact_reference():
2416    job = {"title": "artifact recovery", "kind": "generic", "objective": "use saved evidence"}
2417    steps = [{
2418        "step_no": 12,
2419        "kind": "tool",
2420        "status": "failed",
2421        "tool_name": "read_artifact",
2422        "summary": "read_artifact failed: artifact not found: art_missing",
2423        "input": {"arguments": {"artifact_id": "art_missing"}},
2424        "output": {
2425            "success": False,
2426            "error": "artifact not found: art_missing",
2427            "guidance": "Use one of the recent_artifacts refs, call search_artifacts, or continue from evidence.",
2428            "recent_artifacts": [{"number": "1", "id": "art_real", "title": "Real Evidence"}],
2429        },
2430    }]
2431
2432    messages = build_messages(job, steps)
2433    content = messages[-1]["content"]
2434
2435    assert "valid_recent_artifacts=art_real=Real Evidence" in content
2436    assert "Do not invent or retry artifact ids" in content
2437    assert "search_artifacts" in content
2438
2439
2440def test_prompt_does_not_inject_local_ssh_alias_context(monkeypatch, tmp_path):
2441    monkeypatch.setenv("HOME", str(tmp_path))
2442    ssh_dir = tmp_path / ".ssh"
2443    ssh_dir.mkdir()
2444    (ssh_dir / "config").write_text("Host remote-box\n  HostName 100.64.0.1\n  User operator\n", encoding="utf-8")
2445    job = {"title": "remote work", "kind": "generic", "objective": "benchmark remote target"}
2446
2447    messages = build_messages(job, [])
2448
2449    content = messages[-1]["content"]
2450    assert "Local CLI context:" not in content
2451    assert "100.64.0.1" not in content
2452    assert "remote-box ->" not in content
2453
2454
2455def test_prompt_includes_operator_steering_messages():
2456    job = {
2457        "title": "research",
2458        "kind": "generic",
2459        "objective": "find research",
2460        "metadata": {
2461            "operator_messages": [{
2462                "at": "2026-04-24T20:40:00+00:00",
2463                "source": "shell",
2464                "message": "Focus on actual strong evidence sources, not competing irrelevant sources.",
2465            }],
2466        },
2467    }
2468
2469    messages = build_messages(job, [])
2470
2471    assert "Operator context:" in messages[-1]["content"]
2472    assert "Focus on actual strong evidence sources" in messages[-1]["content"]
2473
2474
2475def test_prompt_keeps_claimed_operator_context_until_acknowledged(tmp_path):
2476    db = AgentDB(tmp_path / "state.db")
2477    try:
2478        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
2479        entry = db.append_operator_message(job_id, "use the corrected target from chat", source="chat")
2480        claimed = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
2481        assert claimed[0]["event_id"] == entry["event_id"]
2482
2483        job = db.get_job(job_id)
2484        messages = build_messages(job, [], include_unclaimed_operator_messages=False)
2485        content = messages[-1]["content"]
2486
2487        assert "Operator context:" in content
2488        assert "use the corrected target from chat" in content
2489        assert "delivered" in content
2490
2491        db.acknowledge_operator_messages(job_id, message_ids=[entry["event_id"]], summary="incorporated correction")
2492        job = db.get_job(job_id)
2493        messages = build_messages(job, [], include_unclaimed_operator_messages=False)
2494
2495        assert "use the corrected target from chat" not in messages[-1]["content"]
2496    finally:
2497        db.close()
2498
2499
2500def test_prompt_keeps_unclaimed_steering_but_not_followup_until_claimed(tmp_path):
2501    db = AgentDB(tmp_path / "state.db")
2502    try:
2503        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
2504        db.append_operator_message(job_id, "use the corrected target from chat", source="chat", mode="steer")
2505        db.append_operator_message(job_id, "after this branch settles, write a recap", source="chat", mode="follow_up")
2506
2507        job = db.get_job(job_id)
2508        content = build_messages(job, [], include_unclaimed_operator_messages=True)[-1]["content"]
2509
2510        assert "use the corrected target from chat" in content
2511        assert "after this branch settles" not in content
2512    finally:
2513        db.close()
2514
2515
2516def test_prompt_includes_context_pressure_constraint():
2517    job = {
2518        "title": "context pressure",
2519        "kind": "generic",
2520        "objective": "keep a long-running job stable",
2521        "metadata": {
2522            "context_pressure": {
2523                "band": "high",
2524                "prompt_tokens": 8_600,
2525                "context_length": 10_000,
2526                "fraction": 0.86,
2527            }
2528        },
2529    }
2530
2531    content = build_messages(job, [])[-1]["content"]
2532
2533    assert "Context pressure:" in content
2534    assert "Context pressure is high" in content
2535    assert "8.6K/10.0K" in content
2536    assert "artifact references" in content
2537
2538
2539def test_prompt_includes_cumulative_usage_pressure():
2540    job = {
2541        "title": "usage pressure",
2542        "kind": "generic",
2543        "objective": "keep a long-running job useful",
2544        "metadata": {
2545            "finding_ledger": [{"name": "durable fact"}],
2546            "source_ledger": [{"source": "local evidence"}],
2547            "experiment_ledger": [{"title": "trial", "metric_value": 1}],
2548            "task_queue": [{"title": "done branch", "status": "done", "result": "validated"}],
2549        },
2550    }
2551
2552    content = build_messages(
2553        job,
2554        [],
2555        token_usage={
2556            "calls": 2_100,
2557            "prompt_tokens": 21_000_000,
2558            "completion_tokens": 1_000_000,
2559            "total_tokens": 22_000_000,
2560            "latest_prompt_tokens": 10_000,
2561            "latest_context_length": 262_144,
2562            "cost": 10.25,
2563            "has_cost": True,
2564        },
2565    )[-1]["content"]
2566
2567    assert "Usage pressure:" in content
2568    assert "Cumulative model usage pressure is critical" in content
2569    assert "calls=2100" in content
2570    assert "tokens=22.0M" in content
2571    assert "cost=$10.2500" in content
2572    assert "high leverage" in content
2573
2574
2575def test_prompt_renders_task_contract_from_metadata_for_existing_tasks():
2576    job = {
2577        "title": "contract fallback",
2578        "kind": "generic",
2579        "objective": "keep existing task contracts visible",
2580        "metadata": {
2581            "task_queue": [
2582                {
2583                    "title": "Validate concrete candidate",
2584                    "status": "active",
2585                    "priority": 9,
2586                    "metadata": {"output_contract": "action"},
2587                    "acceptance_criteria": "candidate tested",
2588                }
2589            ],
2590        },
2591    }
2592
2593    content = build_messages(job, [])[-1]["content"]
2594
2595    assert "Task queue:" in content
2596    assert "Validate concrete candidate" in content
2597    assert "contract=action" in content
2598
2599
2600def test_prompt_keeps_persistent_task_backlog_pressure_visible():
2601    job = {
2602        "title": "persistent backlog pressure",
2603        "kind": "generic",
2604        "objective": "keep a long-running job focused",
2605        "metadata": {
2606            "task_backlog_pressure": {
2607                "reason": "total task queue is too large",
2608                "open_count": 42,
2609                "total_count": 81,
2610                "guard_recovery": {
2611                    "latest_step_no": 123,
2612                    "task_queue": {"open_titles": ["Existing branch"]},
2613                },
2614            },
2615            "task_queue": [
2616                {"title": f"Existing branch {index}", "status": "open", "priority": 9}
2617                for index in range(81)
2618            ],
2619        },
2620    }
2621
2622    content = build_messages(job, [])[-1]["content"]
2623
2624    assert "Task queue saturation:" in content
2625    assert "Task backlog pressure remains active from guard recovery #123" in content
2626    assert "open_tasks=81" in content
2627    assert "total_tasks=81" in content
2628    assert "Do not create new task branches" in content
2629
2630
2631def test_prompt_shows_current_task_backlog_pressure_without_prior_block():
2632    job = {
2633        "title": "large backlog",
2634        "kind": "generic",
2635        "objective": "execute a broad long-running job",
2636        "metadata": {
2637            "task_queue": [
2638                {"title": f"Existing branch {index}", "status": "done", "priority": index}
2639                for index in range(81)
2640            ],
2641        },
2642    }
2643
2644    content = build_messages(job, [])[-1]["content"]
2645
2646    assert "Task queue saturation:" in content
2647    assert "Task backlog pressure remains active from current queue #current" in content
2648    assert "total_tasks=81" in content
2649    assert "Do not create new task branches" in content
2650
2651
2652def test_prompt_ignores_stale_task_backlog_pressure_after_queue_is_cleaned_up():
2653    job = {
2654        "title": "clean backlog",
2655        "kind": "generic",
2656        "objective": "execute a focused long-running job",
2657        "metadata": {
2658            "task_backlog_pressure": {
2659                "reason": "total task queue is too large",
2660                "open_count": 42,
2661                "total_count": 80,
2662                "latest_step_no": 123,
2663                "source": "blocked_record_tasks",
2664            },
2665            "task_queue": [
2666                {"title": "Focused branch", "status": "active", "priority": 9},
2667            ],
2668        },
2669    }
2670
2671    content = build_messages(job, [])[-1]["content"]
2672
2673    assert "Task backlog pressure remains active" not in content
2674    assert "Task queue saturation:\nNone." in content
2675
2676
2677def test_run_one_step_clears_stale_task_backlog_pressure(tmp_path):
2678    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2679    db = AgentDB(tmp_path / "state.db")
2680    try:
2681        job_id = db.create_job(
2682            "Continue after backlog cleanup",
2683            title="cleaned-backlog",
2684            kind="generic",
2685            metadata={
2686                "task_backlog_pressure": {
2687                    "reason": "total task queue is too large",
2688                    "open_count": 42,
2689                    "total_count": 80,
2690                    "source": "blocked_record_tasks",
2691                },
2692                "task_queue": [
2693                    {"title": "Focused branch", "status": "active", "priority": 9},
2694                ],
2695            },
2696        )
2697        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "continue focused work"})]))
2698
2699        result = run_one_step(job_id, config=config, db=db, llm=llm)
2700
2701        assert result.status == "completed"
2702        job = db.get_job(job_id)
2703        assert job["metadata"]["task_backlog_pressure"] == {}
2704        assert "Task queue saturation:\nNone." in llm.messages[-1]["content"]
2705        assert any(
2706            event["event_type"] == "agent_message"
2707            and event["title"] == "progress"
2708            and "Task backlog pressure cleared" in event["body"]
2709            for event in db.list_events(job_id=job_id, limit=20)
2710        )
2711    finally:
2712        db.close()
2713
2714
2715def test_run_one_step_records_usage_pressure_without_spam(tmp_path):
2716    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=10_000_000))
2717    db = AgentDB(tmp_path / "state.db")
2718    try:
2719        job_id = db.create_job("Keep a long-running task efficient", title="usage pressure", kind="generic")
2720        llm = ScriptedLLM([
2721            LLMResponse(
2722                tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "consolidate before spending more", "category": "strategy"})],
2723                usage={"prompt_tokens": 1_100_000, "completion_tokens": 100, "total_tokens": 1_100_100, "cost": 1.1},
2724            ),
2725            LLMResponse(
2726                tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "second consolidation", "category": "strategy"})],
2727                usage={"prompt_tokens": 300_000, "completion_tokens": 100, "total_tokens": 300_100, "cost": 0.3},
2728            ),
2729        ])
2730
2731        run_one_step(job_id, config=config, db=db, llm=llm)
2732        run_one_step(job_id, config=config, db=db, llm=llm)
2733
2734        pressure_events = [
2735            event
2736            for event in db.list_events(job_id=job_id, event_types=["agent_message"])
2737            if event["metadata"].get("kind") == "usage_pressure"
2738        ]
2739        assert len(pressure_events) == 1
2740        assert "Usage pressure watch" in pressure_events[0]["body"]
2741        job = db.get_job(job_id)
2742        pressure = job["metadata"]["usage_pressure"]
2743        assert pressure["band"] == "watch"
2744        assert pressure["calls"] == 2
2745        assert pressure["total_tokens"] == 1_400_200
2746    finally:
2747        db.close()
2748
2749
2750def test_critical_usage_does_not_create_automatic_defer(tmp_path):
2751    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=262_144))
2752    db = AgentDB(tmp_path / "state.db")
2753    try:
2754        job_id = db.create_job("Keep a long-running task efficient", title="usage pressure", kind="generic")
2755        db.append_event(
2756            job_id,
2757            event_type="loop",
2758            title="message_end",
2759            metadata={"usage": {"prompt_tokens": 21_000_000, "completion_tokens": 10_000, "total_tokens": 21_010_000, "cost": 11.0}},
2760        )
2761        llm = ScriptedLLM([
2762            LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "keep useful work moving", "category": "strategy"})])
2763        ])
2764
2765        result = run_one_step(job_id, config=config, db=db, llm=llm)
2766
2767        assert result.status == "completed"
2768        assert result.tool_name == "record_lesson"
2769        job = db.get_job(job_id)
2770        assert not job["metadata"].get("defer_until")
2771        assert "usage_pressure_circuit_breaker" not in job["metadata"]
2772    finally:
2773        db.close()
2774
2775
2776def test_prompt_ignores_legacy_usage_pressure_recovery_metadata():
2777    job = {
2778        "title": "usage recovery",
2779        "kind": "generic",
2780        "objective": "Keep long-running work efficient.",
2781        "metadata": {
2782            "usage_pressure_circuit_breaker": {
2783                "latest_step_no": 12,
2784                "streak": 2,
2785                "calls": 2200,
2786                "total_tokens": 25_000_000,
2787                "cost": 12.5,
2788                "has_cost": True,
2789            },
2790            "task_queue": [{"title": "Focused task", "status": "active", "priority": 9}],
2791        },
2792    }
2793    steps = [
2794        {"step_no": 13, "kind": "recovery", "status": "completed", "tool_name": "defer_job", "summary": "legacy cooldown"},
2795        {"step_no": 14, "kind": "tool", "status": "blocked", "tool_name": "web_search", "summary": "blocked search"},
2796    ]
2797
2798    content = build_messages(job, steps)[-1]["content"]
2799
2800    assert "Usage pressure:" in content
2801    assert "Usage pressure recovery" not in content
2802    assert "cooldown is still unresolved" not in content
2803
2804
2805def test_run_one_step_pauses_when_configured_cost_limit_is_reached(tmp_path):
2806    config = AppConfig(
2807        runtime=RuntimeConfig(home=tmp_path, max_job_cost_usd=5.0),
2808        model=ModelConfig(context_length=262_144),
2809    )
2810    db = AgentDB(tmp_path / "state.db")
2811    try:
2812        job_id = db.create_job("Keep a long-running task inside budget", title="budget limit", kind="generic")
2813        db.append_event(
2814            job_id,
2815            event_type="loop",
2816            title="message_end",
2817            metadata={"usage": {"prompt_tokens": 1_000_000, "completion_tokens": 10_000, "total_tokens": 1_010_000, "cost": 5.25}},
2818        )
2819
2820        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
2821
2822        assert result.status == "completed"
2823        assert result.tool_name == "budget_limit"
2824        assert result.result["paused"] is True
2825        assert result.result["cost"] == 5.25
2826        job = db.get_job(job_id)
2827        assert job["status"] == "paused"
2828        assert job["metadata"]["usage_budget_limit"]["limit"] == 5.0
2829        assert "configured model cost limit" in job["metadata"]["last_note"]
2830    finally:
2831        db.close()
2832
2833
2834def test_run_one_step_ignores_cost_limit_without_provider_cost_metadata(tmp_path):
2835    config = AppConfig(
2836        runtime=RuntimeConfig(home=tmp_path, max_job_cost_usd=5.0),
2837        model=ModelConfig(context_length=262_144),
2838    )
2839    db = AgentDB(tmp_path / "state.db")
2840    try:
2841        job_id = db.create_job("Keep a long-running task inside budget", title="budget estimate", kind="generic")
2842        db.append_event(
2843            job_id,
2844            event_type="loop",
2845            title="message_end",
2846            metadata={"usage": {"prompt_tokens": 1_000_000, "completion_tokens": 10_000, "total_tokens": 1_010_000}},
2847        )
2848        llm = ScriptedLLM([
2849            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "cost not provider reported"})])
2850        ])
2851
2852        result = run_one_step(job_id, config=config, db=db, llm=llm)
2853
2854        assert result.status == "completed"
2855        assert result.tool_name == "report_update"
2856        assert db.get_job(job_id)["status"] == "running"
2857    finally:
2858        db.close()
2859
2860
2861def test_run_one_step_does_not_defer_critical_usage_after_progress(tmp_path):
2862    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=262_144))
2863    db = AgentDB(tmp_path / "state.db")
2864    try:
2865        job_id = db.create_job("Keep a long-running task efficient", title="usage progress", kind="generic")
2866        db.append_event(
2867            job_id,
2868            event_type="loop",
2869            title="message_end",
2870            metadata={"usage": {"prompt_tokens": 21_000_000, "completion_tokens": 10_000, "total_tokens": 21_010_000, "cost": 11.0}},
2871        )
2872        for error_type in ["APITimeoutError", "APITimeoutError"]:
2873            run_id = db.start_run(job_id, model="test")
2874            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="llm", status="failed")
2875            db.finish_step(
2876                step_id,
2877                status="failed",
2878                summary=f"model call failed: {error_type}",
2879                output_data={"success": False, "error": "timeout", "error_type": error_type},
2880                error="timeout",
2881            )
2882            db.finish_run(run_id, "failed", error="timeout")
2883        run_id = db.start_run(job_id, model="test")
2884        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
2885        db.finish_step(
2886            step_id,
2887            status="completed",
2888            summary="recorded measured result",
2889            output_data={
2890                "success": True,
2891                "experiment": {
2892                    "title": "measured result",
2893                    "status": "measured",
2894                    "metric_name": "score",
2895                    "metric_value": 1.0,
2896                },
2897            },
2898        )
2899        db.finish_run(run_id, "completed")
2900        llm = ScriptedLLM([
2901            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "continuing from measured progress"})])
2902        ])
2903
2904        result = run_one_step(job_id, config=config, db=db, llm=llm)
2905
2906        assert result.status == "completed"
2907        assert result.tool_name == "report_update"
2908    finally:
2909        db.close()
2910
2911
2912def test_run_one_step_drops_conversation_only_chat_from_worker_prompt(tmp_path):
2913    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2914    db = AgentDB(tmp_path / "state.db")
2915    try:
2916        job_id = db.create_job("Keep improving a generic task", title="context", kind="generic")
2917        chat = db.append_operator_message(job_id, "hello", source="chat")
2918        correction = db.append_operator_message(job_id, "use the corrected target from chat", source="chat")
2919        llm = CapturingLLM(
2920            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "noted", "category": "progress"})])
2921        )
2922
2923        run_one_step(job_id, config=config, db=db, llm=llm)
2924
2925        content = llm.messages[-1]["content"]
2926        assert "hello" not in content
2927        assert "use the corrected target from chat" in content
2928        job = db.get_job(job_id)
2929        messages = {entry["event_id"]: entry for entry in job["metadata"]["operator_messages"]}
2930        assert messages[chat["event_id"]]["acknowledged_at"]
2931        assert messages[correction["event_id"]]["claimed_at"]
2932        assert not messages[correction["event_id"]].get("acknowledged_at")
2933    finally:
2934        db.close()
2935
2936
2937def test_build_messages_keeps_generic_context_under_budget():
2938    job = {
2939        "title": "large context",
2940        "kind": "generic",
2941        "objective": "Improve a measurable process without looping.",
2942        "metadata": {
2943            "operator_messages": [
2944                {"event_id": "chat", "mode": "steer", "message": "how is it going?"},
2945                {"event_id": "use", "mode": "steer", "message": "use the corrected target from chat"},
2946            ],
2947            "lessons": [{"category": "memory", "lesson": "lesson " + "x" * 700} for _ in range(30)],
2948            "task_queue": [
2949                {
2950                    "title": f"Task {index}",
2951                    "status": "open" if index % 3 else "done",
2952                    "priority": index,
2953                    "output_contract": "experiment",
2954                    "acceptance_criteria": "accept " + "x" * 500,
2955                    "evidence_needed": "evidence " + "x" * 500,
2956                    "stall_behavior": "stall " + "x" * 500,
2957                }
2958                for index in range(40)
2959            ],
2960            "finding_ledger": [{"name": f"Finding {index}", "category": "generic", "score": index} for index in range(200)],
2961            "source_ledger": [
2962                {
2963                    "source": f"https://source{index}.example",
2964                    "source_type": "web",
2965                    "usefulness_score": index / 100,
2966                    "yield_count": index % 4,
2967                    "fail_count": index % 3,
2968                    "last_outcome": "outcome " + "x" * 500,
2969                }
2970                for index in range(90)
2971            ],
2972            "experiment_ledger": [
2973                {
2974                    "title": f"Experiment {index}",
2975                    "status": "measured",
2976                    "metric_name": "score",
2977                    "metric_value": index,
2978                    "metric_unit": "units",
2979                    "best_observed": index in {38, 39},
2980                    "result": "result " + "x" * 600,
2981                    "next_action": "next " + "x" * 600,
2982                }
2983                for index in range(40)
2984            ],
2985            "reflections": [{"summary": "summary " + "x" * 800, "strategy": "strategy " + "x" * 800} for _ in range(20)],
2986        },
2987    }
2988    steps = [
2989        {
2990            "step_no": index,
2991            "kind": "tool",
2992            "status": "completed",
2993            "tool_name": "shell_exec",
2994            "summary": "summary " + "x" * 800,
2995            "input": {"arguments": {"command": "command " + "x" * 800}},
2996            "output": {"success": True, "command": "command", "returncode": 0, "stdout": "stdout " + "x" * 3000},
2997        }
2998        for index in range(30)
2999    ]
3000    memory_entries = [{"key": "rolling_state", "summary": "memory " + "x" * 20000, "artifact_refs": [f"art_{i}" for i in range(40)]}]
3001    timeline = [{"event_type": "tool_result", "title": "event", "body": "body " + "x" * 900} for _ in range(40)]
3002
3003    messages = build_messages(job, steps, memory_entries=memory_entries, timeline_events=timeline)
3004    content = messages[-1]["content"]
3005
3006    assert "use the corrected target from chat" in content
3007    assert "how is it going" not in content
3008    assert len(content) < MAX_WORKER_PROMPT_CHARS
3009    assert "Next-action constraint:" in content
3010
3011
3012def test_prompt_timeline_filters_low_signal_tool_noise():
3013    job = {
3014        "title": "timeline",
3015        "kind": "generic",
3016        "objective": "keep useful context visible",
3017        "metadata": {},
3018    }
3019    timeline = [
3020        {
3021            "event_type": "tool_result",
3022            "title": "web_search",
3023            "body": f"search noise {index}",
3024            "metadata": {"status": "completed"},
3025            "created_at": f"2026-05-01T12:{index:02d}:00+00:00",
3026        }
3027        for index in range(20)
3028    ]
3029    timeline.extend([
3030        {
3031            "event_type": "artifact",
3032            "title": "Saved durable report",
3033            "body": "operator-visible output",
3034            "metadata": {},
3035            "created_at": "2026-05-01T13:00:00+00:00",
3036        },
3037        {
3038            "event_type": "finding",
3039            "title": "Useful durable finding",
3040            "body": "result worth keeping",
3041            "metadata": {},
3042            "created_at": "2026-05-01T13:01:00+00:00",
3043        },
3044        {
3045            "event_type": "tool_result",
3046            "title": "shell_exec",
3047            "body": "command failed with actionable blocker",
3048            "metadata": {"status": "failed"},
3049            "created_at": "2026-05-01T13:02:00+00:00",
3050        },
3051    ])
3052
3053    content = build_messages(job, [], timeline_events=timeline)[-1]["content"]
3054
3055    assert "Recent visible timeline:" in content
3056    assert "High-signal timeline counts:" in content
3057    assert "Saved durable report" in content
3058    assert "Useful durable finding" in content
3059    assert "command failed with actionable blocker" in content
3060    assert "search noise" not in content
3061
3062
3063def test_prompt_includes_durable_outcome_summary():
3064    job = {
3065        "title": "outcomes",
3066        "kind": "generic",
3067        "objective": "keep useful durable progress visible",
3068        "metadata": {},
3069    }
3070    events = [
3071        {
3072            "event_type": "artifact",
3073            "title": "Draft checkpoint",
3074            "body": "",
3075            "metadata": {},
3076        },
3077        {
3078            "event_type": "finding",
3079            "title": "Reusable finding",
3080            "body": "",
3081            "metadata": {},
3082        },
3083        {
3084            "event_type": "experiment",
3085            "title": "Quality check",
3086            "body": "",
3087            "metadata": {"metric_name": "score", "metric_value": 0.82, "metric_unit": ""},
3088        },
3089        {
3090            "event_type": "tool_result",
3091            "title": "web_search",
3092            "body": "web_search query='background' returned 5 results",
3093            "metadata": {"status": "completed"},
3094        },
3095    ]
3096
3097    content = build_messages(job, [], timeline_events=events)[-1]["content"]
3098    outcome_section = content.split("Durable outcomes:", 1)[1].split("Ledgers:", 1)[0]
3099
3100    assert "Outcome counts: 1 outputs 1 findings 1 measurements." in outcome_section
3101    assert "save: Draft checkpoint" in outcome_section
3102    assert "find: Reusable finding" in outcome_section
3103    assert "test: Quality check" in outcome_section
3104    assert "background" not in outcome_section
3105
3106
3107def test_emergency_prompt_clipping_repeats_operator_and_next_action():
3108    job = {"title": "clip", "kind": "generic", "objective": "keep context safe"}
3109    sections = [(f"Noise {index}", "noise " * 2000) for index in range(90)]
3110    sections.insert(45, ("Operator context", "Still-active durable operator context: use the corrected target."))
3111    sections.append(("Next-action constraint", "Next use the validated branch."))
3112
3113    content = _render_worker_prompt(job, sections=sections)
3114
3115    assert len(content) <= MAX_WORKER_PROMPT_CHARS
3116    assert "middle context clipped" in content
3117    suffix = content.split("middle context clipped", 1)[1]
3118    assert "Operator context:" in suffix
3119    assert "use the corrected target" in suffix
3120    assert "Next-action constraint:" in suffix
3121    assert "Next use the validated branch" in suffix
3122
3123
3124def test_build_messages_keeps_rolling_memory_when_not_first():
3125    job = {"title": "memory order", "kind": "generic", "objective": "keep long-running context stable"}
3126    memory_entries = [
3127        {"key": "newer_note", "summary": "newer side note"},
3128        {"key": "other_note", "summary": "less important side note"},
3129        {"key": "rolling_state", "summary": "durable rolling state with usage and task progress"},
3130    ]
3131
3132    content = build_messages(job, [], memory_entries=memory_entries)[-1]["content"]
3133
3134    assert "durable rolling state with usage and task progress" in content
3135    assert "newer side note" in content
3136    assert "less important side note" not in content
3137
3138
3139def test_build_messages_surfaces_recent_measurement_evidence_outside_state_window():
3140    job = {"title": "measure", "kind": "generic", "objective": "improve a measurable process", "metadata": {}}
3141    recent_steps = [
3142        {
3143            "step_no": 1,
3144            "kind": "tool",
3145            "status": "completed",
3146            "tool_name": "shell_exec",
3147            "input": {"arguments": {"command": "run benchmark"}},
3148            "output": {
3149                "success": True,
3150                "stdout": (
3151                    "| model | test | t/s |\n"
3152                    "| --- | ---: | ---: |\n"
3153                    "| example | pp32 | 5.48 ± 0.11 |\n"
3154                    "| example | tg128 | 3.44 ± 0.05 |\n"
3155                ),
3156            },
3157        },
3158        *[
3159            {
3160                "step_no": index,
3161                "kind": "tool",
3162                "status": "completed",
3163                "tool_name": "record_lesson",
3164                "summary": f"later step {index}",
3165                "input": {},
3166                "output": {},
3167            }
3168            for index in range(2, 12)
3169        ],
3170    ]
3171
3172    content = build_messages(job, recent_steps)[-1]["content"]
3173
3174    assert "Recent measurement evidence:" in content
3175    assert "step #1 completed" in content
3176    assert "pp32 5.48 ± 0.11 t/s" in content
3177    assert "tg128 3.44 ± 0.05 t/s" in content
3178
3179
3180def test_measurement_obligation_blocks_research_until_recorded(tmp_path):
3181    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3182    db = AgentDB(tmp_path / "state.db")
3183    try:
3184        job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3185
3186        first = run_one_step(
3187            job_id,
3188            config=config,
3189            db=db,
3190            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run test"})])]),
3191            registry=MeasuredShellRegistry(),
3192        )
3193        job = db.get_job(job_id)
3194        assert first.tool_name == "shell_exec"
3195        assert job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3196
3197        second = run_one_step(
3198            job_id,
3199            config=config,
3200            db=db,
3201            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more notes"})])]),
3202            registry=MeasuredShellRegistry(),
3203        )
3204        assert second.status == "blocked"
3205        assert second.result["error"] == "measurement obligation pending"
3206
3207        third = run_one_step(
3208            job_id,
3209            config=config,
3210            db=db,
3211            llm=ScriptedLLM([
3212                LLMResponse(tool_calls=[
3213                    ToolCall(
3214                        name="record_experiment",
3215                        arguments={
3216                            "title": "measured trial",
3217                            "status": "measured",
3218                            "metric_name": "score",
3219                            "metric_value": 2.7,
3220                            "metric_unit": "units/s",
3221                            "next_action": "compare the next concrete variant",
3222                        },
3223                    )
3224                ])
3225            ]),
3226        )
3227        job = db.get_job(job_id)
3228        assert third.tool_name == "record_experiment"
3229        assert job["metadata"].get("pending_measurement_obligation") == {}
3230        assert job["metadata"]["experiment_ledger"][0]["metric_value"] == 2.7
3231    finally:
3232        db.close()
3233
3234
3235def test_measurement_obligation_preserves_table_metric_candidates(tmp_path):
3236    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3237    db = AgentDB(tmp_path / "state.db")
3238    try:
3239        job_id = db.create_job("Improve a measurable process", title="measure-table", kind="generic")
3240
3241        step = run_one_step(
3242            job_id,
3243            config=config,
3244            db=db,
3245            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run benchmark"})])]),
3246            registry=TableBenchmarkShellRegistry(),
3247        )
3248
3249        job = db.get_job(job_id)
3250        candidates = job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3251        assert step.tool_name == "shell_exec"
3252        assert "pp32 5.48 ± 0.11 t/s" in candidates
3253        assert "tg128 3.44 ± 0.05 t/s" in candidates
3254    finally:
3255        db.close()
3256
3257
3258def test_failed_shell_measurement_output_still_requires_accounting(tmp_path):
3259    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3260    db = AgentDB(tmp_path / "state.db")
3261    try:
3262        job_id = db.create_job("Improve a measurable process", title="measure-failed-table", kind="generic")
3263
3264        step = run_one_step(
3265            job_id,
3266            config=config,
3267            db=db,
3268            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run benchmark"})])]),
3269            registry=FailedTableBenchmarkShellRegistry(),
3270        )
3271
3272        job = db.get_job(job_id)
3273        candidates = job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3274        assert step.status == "failed"
3275        assert "pp32 5.48 ± 0.11 t/s" in candidates
3276    finally:
3277        db.close()
3278
3279
3280def test_measurement_obligation_blocks_operator_acknowledgement_churn(tmp_path):
3281    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3282    db = AgentDB(tmp_path / "state.db")
3283    try:
3284        job_id = db.create_job("Improve a measurable process", title="measure-ack", kind="generic")
3285        db.update_job_metadata(
3286            job_id,
3287            {
3288                "pending_measurement_obligation": {
3289                    "source_step_no": 12,
3290                    "tool": "shell_exec",
3291                    "metric_candidates": ["2.7 tok/s"],
3292                    "command": "run benchmark",
3293                }
3294            },
3295        )
3296
3297        result = run_one_step(
3298            job_id,
3299            config=config,
3300            db=db,
3301            llm=ScriptedLLM([
3302                LLMResponse(tool_calls=[
3303                    ToolCall(
3304                        name="acknowledge_operator_context",
3305                        arguments={"message_ids": [], "summary": "acknowledged"},
3306                    )
3307                ])
3308            ]),
3309        )
3310
3311        assert result.status == "blocked"
3312        assert result.tool_name == "acknowledge_operator_context"
3313        assert result.result["error"] == "measurement obligation pending"
3314    finally:
3315        db.close()
3316
3317
3318def test_pending_measurement_narrows_available_tools(tmp_path):
3319    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3320    db = AgentDB(tmp_path / "state.db")
3321    try:
3322        job_id = db.create_job("Improve a measurable process", title="measure-tools", kind="generic")
3323        db.update_job_metadata(
3324            job_id,
3325            {
3326                "pending_measurement_obligation": {
3327                    "source_step_no": 12,
3328                    "tool": "shell_exec",
3329                    "metric_candidates": ["2.7 tok/s"],
3330                    "command": "run benchmark",
3331                }
3332            },
3333        )
3334        llm = CapturingLLM(
3335            LLMResponse(tool_calls=[
3336                ToolCall(
3337                    name="record_experiment",
3338                    arguments={
3339                        "title": "measured trial",
3340                        "status": "measured",
3341                        "metric_name": "speed",
3342                        "metric_value": 2.7,
3343                        "metric_unit": "tok/s",
3344                        "next_action": "try the next measured branch",
3345                    },
3346                )
3347            ])
3348        )
3349
3350        run_one_step(job_id, config=config, db=db, llm=llm)
3351
3352        tool_names = {tool["function"]["name"] for tool in llm.tools}
3353        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3354        assert "shell_exec" not in tool_names
3355        assert "web_search" not in tool_names
3356        assert "acknowledge_operator_context" not in tool_names
3357    finally:
3358        db.close()
3359
3360
3361def test_resolution_tools_survive_task_saturation_suppression(tmp_path):
3362    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3363    db = AgentDB(tmp_path / "state.db")
3364    try:
3365        job_id = db.create_job(
3366            "Improve a measurable process",
3367            title="measure-tools-after-saturation",
3368            kind="generic",
3369            metadata={
3370                "pending_measurement_obligation": {
3371                    "source_step_no": 12,
3372                    "tool": "shell_exec",
3373                    "metric_candidates": ["2.7 tok/s"],
3374                    "command": "run benchmark",
3375                }
3376            },
3377        )
3378        run_id = db.start_run(job_id, model="fake")
3379        for step_no in range(2):
3380            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
3381            db.finish_step(
3382                step_id,
3383                status="blocked",
3384                summary="blocked record_tasks; task queue saturated",
3385                output_data={
3386                    "success": False,
3387                    "error": "task queue saturated",
3388                    "task_queue": {"reason": "total task queue is too large", "total_count": 80 + step_no},
3389                },
3390            )
3391        llm = CapturingLLM(
3392            LLMResponse(tool_calls=[
3393                ToolCall(
3394                    name="record_lesson",
3395                    arguments={"lesson": "Measurement is blocked until the current branch is reconciled."},
3396                )
3397            ])
3398        )
3399
3400        run_one_step(job_id, config=config, db=db, llm=llm)
3401
3402        tool_names = {tool["function"]["name"] for tool in llm.tools}
3403        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3404        assert "web_search" not in tool_names
3405    finally:
3406        db.close()
3407
3408
3409def test_pending_evidence_checkpoint_narrows_available_tools(tmp_path):
3410    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3411    db = AgentDB(tmp_path / "state.db")
3412    try:
3413        job_id = db.create_job("Account for checkpointed evidence", title="checkpoint-tools", kind="generic")
3414        db.update_job_metadata(
3415            job_id,
3416            {
3417                "pending_evidence_checkpoint": {
3418                    "artifact_id": "art_checkpoint",
3419                    "title": "Checkpoint",
3420                    "evidence_step_no": 12,
3421                    "blocked_tool": "shell_exec",
3422                    "created_at": "2026-01-01T00:00:00+00:00",
3423                }
3424            },
3425        )
3426        llm = CapturingLLM(
3427            LLMResponse(tool_calls=[
3428                ToolCall(name="record_lesson", arguments={"lesson": "checkpoint accounted for", "category": "memory"})
3429            ])
3430        )
3431
3432        run_one_step(job_id, config=config, db=db, llm=llm)
3433
3434        tool_names = {tool["function"]["name"] for tool in llm.tools}
3435        assert "read_artifact" in tool_names
3436        assert {"record_findings", "record_source", "record_lesson", "record_experiment"}.issubset(tool_names)
3437        assert "record_tasks" not in tool_names
3438        assert "shell_exec" not in tool_names
3439        assert "web_search" not in tool_names
3440        assert "acknowledge_operator_context" not in tool_names
3441    finally:
3442        db.close()
3443
3444
3445def test_acknowledge_operator_context_hidden_without_active_operator_context(tmp_path):
3446    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3447    db = AgentDB(tmp_path / "state.db")
3448    try:
3449        job_id = db.create_job("Run ordinary autonomous work", title="no-operator", kind="generic")
3450        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "working"})]))
3451
3452        run_one_step(job_id, config=config, db=db, llm=llm)
3453
3454        tool_names = {tool["function"]["name"] for tool in llm.tools}
3455        assert "acknowledge_operator_context" not in tool_names
3456    finally:
3457        db.close()
3458
3459
3460def test_acknowledge_operator_context_visible_with_active_operator_context(tmp_path):
3461    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3462    db = AgentDB(tmp_path / "state.db")
3463    try:
3464        job_id = db.create_job("Run with operator steering", title="operator", kind="generic")
3465        db.append_operator_message(job_id, "use the corrected target", source="chat")
3466        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "working"})]))
3467
3468        run_one_step(job_id, config=config, db=db, llm=llm)
3469
3470        tool_names = {tool["function"]["name"] for tool in llm.tools}
3471        assert "acknowledge_operator_context" in tool_names
3472    finally:
3473        db.close()
3474
3475
3476def test_diagnostic_shell_output_does_not_create_measurement_obligation(tmp_path):
3477    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3478    db = AgentDB(tmp_path / "state.db")
3479    try:
3480        job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3481
3482        result = run_one_step(
3483            job_id,
3484            config=config,
3485            db=db,
3486            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "df -h && nproc && free -h"})])]),
3487            registry=DiagnosticShellRegistry(),
3488        )
3489
3490        job = db.get_job(job_id)
3491        assert result.tool_name == "shell_exec"
3492        assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3493    finally:
3494        db.close()
3495
3496
3497def test_source_code_shell_output_does_not_create_measurement_obligation(tmp_path):
3498    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3499    db = AgentDB(tmp_path / "state.db")
3500    try:
3501        job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3502
3503        result = run_one_step(
3504            job_id,
3505            config=config,
3506            db=db,
3507            llm=ScriptedLLM([
3508                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "git show HEAD:nipux_cli/cli.py"})])
3509            ]),
3510            registry=SourceCodeShellRegistry(),
3511        )
3512
3513        job = db.get_job(job_id)
3514        assert result.tool_name == "shell_exec"
3515        assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3516    finally:
3517        db.close()
3518
3519
3520def test_prose_from_timed_command_does_not_create_measurement_obligation(tmp_path):
3521    class ProseShellRegistry:
3522        def openai_tools(self):
3523            return []
3524
3525        def handle(self, name, args, ctx):
3526            del args, ctx
3527            if name == "shell_exec":
3528                return json.dumps({
3529                    "success": True,
3530                    "command": "time cat draft.txt",
3531                    "returncode": 0,
3532                    "stdout": 'This draft says "time". 2 examples are listed. It asks readers to rate a story.',
3533                    "stderr": "",
3534                })
3535            return json.dumps({"success": True})
3536
3537    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3538    db = AgentDB(tmp_path / "state.db")
3539    try:
3540        job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3541
3542        result = run_one_step(
3543            job_id,
3544            config=config,
3545            db=db,
3546            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "time cat draft.txt"})])]),
3547            registry=ProseShellRegistry(),
3548        )
3549
3550        job = db.get_job(job_id)
3551        assert result.tool_name == "shell_exec"
3552        assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3553    finally:
3554        db.close()
3555
3556
3557def test_large_shell_output_must_be_saved_before_more_shell_churn(tmp_path):
3558    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3559    db = AgentDB(tmp_path / "state.db")
3560    try:
3561        job_id = db.create_job("Audit a repository", title="audit", kind="generic")
3562
3563        first = run_one_step(
3564            job_id,
3565            config=config,
3566            db=db,
3567            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -type f"})])]),
3568            registry=LargeShellEvidenceRegistry(),
3569        )
3570        second = run_one_step(
3571            job_id,
3572            config=config,
3573            db=db,
3574            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -name '*.md'"})])]),
3575            registry=LargeShellEvidenceRegistry(),
3576        )
3577
3578        assert first.tool_name == "shell_exec"
3579        assert second.status == "blocked"
3580        assert second.result["error"] == "artifact required before more research"
3581    finally:
3582        db.close()
3583
3584
3585def test_stale_diagnostic_measurement_obligation_is_cleared(tmp_path):
3586    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3587    db = AgentDB(tmp_path / "state.db")
3588    try:
3589        job_id = db.create_job(
3590            "Improve a measurable process",
3591            title="measure",
3592            kind="generic",
3593            metadata={
3594                "pending_measurement_obligation": {
3595                    "source_step_no": 1,
3596                    "command": "df -h && nproc && free -h",
3597                    "metric_candidates": ["CPU COUNT 24", "RAM 93"],
3598                }
3599            },
3600        )
3601
3602        result = run_one_step(
3603            job_id,
3604            config=config,
3605            db=db,
3606            llm=ScriptedLLM([
3607                LLMResponse(tool_calls=[
3608                    ToolCall(
3609                        name="record_lesson",
3610                        arguments={
3611                            "lesson": "The stale output is diagnostic context, not a valid measurement; rerun with a metric.",
3612                            "category": "memory",
3613                        },
3614                    )
3615                ])
3616            ]),
3617        )
3618
3619        job = db.get_job(job_id)
3620        assert result.tool_name == "record_lesson"
3621        assert job["metadata"].get("pending_measurement_obligation") == {}
3622        assert "diagnostic context" in job["metadata"]["last_agent_update"]["message"]
3623    finally:
3624        db.close()
3625
3626
3627def test_measurable_objective_blocks_research_after_budget_but_allows_action(tmp_path):
3628    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3629    db = AgentDB(tmp_path / "state.db")
3630    try:
3631        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3632        for index in range(19):
3633            run_id = db.start_run(job_id)
3634            step_id = db.add_step(
3635                job_id=job_id,
3636                run_id=run_id,
3637                kind="tool",
3638                tool_name="web_search" if index % 2 == 0 else "web_extract",
3639                input_data={"arguments": {"query": f"research branch {index}"}},
3640            )
3641            db.finish_step(step_id, status="completed", output_data={"success": True})
3642            db.finish_run(run_id, "completed")
3643
3644        blocked = run_one_step(
3645            job_id,
3646            config=config,
3647            db=db,
3648            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more research"})])]),
3649            registry=MeasuredShellRegistry(),
3650        )
3651        assert blocked.status == "blocked"
3652        assert blocked.result["error"] == "measured progress required"
3653
3654        action = run_one_step(
3655            job_id,
3656            config=config,
3657            db=db,
3658            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run test"})])]),
3659            registry=MeasuredShellRegistry(),
3660        )
3661        job = db.get_job(job_id)
3662        assert action.status == "completed"
3663        assert action.tool_name == "shell_exec"
3664        assert job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3665    finally:
3666        db.close()
3667
3668
3669def test_measurable_objective_blocks_shell_churn_without_experiment_accounting(tmp_path):
3670    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3671    db = AgentDB(tmp_path / "state.db")
3672    try:
3673        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3674        for index in range(4):
3675            run_id = db.start_run(job_id)
3676            step_id = db.add_step(
3677                job_id=job_id,
3678                run_id=run_id,
3679                kind="tool",
3680                tool_name="shell_exec",
3681                input_data={"arguments": {"command": f"probe {index}"}},
3682            )
3683            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "no metric"})
3684            db.finish_run(run_id, "completed")
3685
3686        blocked = run_one_step(
3687            job_id,
3688            config=config,
3689            db=db,
3690            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "probe again"})])]),
3691            registry=MeasuredShellRegistry(),
3692        )
3693
3694        assert blocked.status == "blocked"
3695        assert blocked.result["error"] == "measured progress required"
3696    finally:
3697        db.close()
3698
3699
3700def test_measured_progress_guard_narrows_available_tools_after_shell_budget(tmp_path):
3701    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3702    db = AgentDB(tmp_path / "state.db")
3703    try:
3704        job_id = db.create_job("Optimize a measurable process", title="measured-tools", kind="generic")
3705        for index in range(4):
3706            run_id = db.start_run(job_id)
3707            step_id = db.add_step(
3708                job_id=job_id,
3709                run_id=run_id,
3710                kind="tool",
3711                tool_name="shell_exec",
3712                input_data={"arguments": {"command": f"probe {index}"}},
3713            )
3714            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "no metric"})
3715            db.finish_run(run_id, "completed")
3716        llm = CapturingLLM(
3717            LLMResponse(tool_calls=[
3718                ToolCall(
3719                    name="record_lesson",
3720                    arguments={"lesson": "measurement blocked after probes", "category": "blocker"},
3721                )
3722            ])
3723        )
3724
3725        run_one_step(job_id, config=config, db=db, llm=llm)
3726
3727        tool_names = {tool["function"]["name"] for tool in llm.tools}
3728        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3729        assert "shell_exec" not in tool_names
3730        assert "write_artifact" not in tool_names
3731        assert "web_search" not in tool_names
3732    finally:
3733        db.close()
3734
3735
3736def test_measured_progress_guard_keeps_shell_available_before_shell_budget(tmp_path):
3737    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3738    db = AgentDB(tmp_path / "state.db")
3739    try:
3740        job_id = db.create_job("Optimize a measurable process", title="measured-tools", kind="generic")
3741        for index in range(19):
3742            run_id = db.start_run(job_id)
3743            step_id = db.add_step(
3744                job_id=job_id,
3745                run_id=run_id,
3746                kind="tool",
3747                tool_name="web_search" if index % 2 == 0 else "web_extract",
3748                input_data={"arguments": {"query": f"research branch {index}"}},
3749            )
3750            db.finish_step(step_id, status="completed", output_data={"success": True})
3751            db.finish_run(run_id, "completed")
3752        llm = CapturingLLM(
3753            LLMResponse(tool_calls=[
3754                ToolCall(
3755                    name="record_lesson",
3756                    arguments={"lesson": "convert research budget into a measured trial", "category": "strategy"},
3757                )
3758            ])
3759        )
3760
3761        run_one_step(job_id, config=config, db=db, llm=llm)
3762
3763        tool_names = {tool["function"]["name"] for tool in llm.tools}
3764        assert "shell_exec" in tool_names
3765        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3766        assert "write_artifact" not in tool_names
3767        assert "web_search" not in tool_names
3768    finally:
3769        db.close()
3770
3771
3772def test_measured_progress_guard_ignores_non_measurement_task_updates(tmp_path):
3773    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3774    db = AgentDB(tmp_path / "state.db")
3775    try:
3776        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3777        for index in range(18):
3778            run_id = db.start_run(job_id)
3779            step_id = db.add_step(
3780                job_id=job_id,
3781                run_id=run_id,
3782                kind="tool",
3783                tool_name="web_search",
3784                input_data={"arguments": {"query": f"research branch {index}"}},
3785            )
3786            db.finish_step(step_id, status="completed", output_data={"success": True})
3787            db.finish_run(run_id, "completed")
3788        run_id = db.start_run(job_id)
3789        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
3790        db.finish_step(
3791            step_id,
3792            status="completed",
3793            output_data={
3794                "success": True,
3795                "tasks": [{"title": "Write notes", "status": "open", "output_contract": "report"}],
3796            },
3797        )
3798        db.finish_run(run_id, "completed")
3799
3800        blocked = run_one_step(
3801            job_id,
3802            config=config,
3803            db=db,
3804            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more notes"})])]),
3805            registry=MeasuredShellRegistry(),
3806        )
3807
3808        assert blocked.status == "blocked"
3809        assert blocked.result["error"] == "measured progress required"
3810    finally:
3811        db.close()
3812
3813
3814def test_measured_progress_guard_accepts_measurement_task_update(tmp_path):
3815    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3816    db = AgentDB(tmp_path / "state.db")
3817    try:
3818        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3819        for index in range(18):
3820            run_id = db.start_run(job_id)
3821            step_id = db.add_step(
3822                job_id=job_id,
3823                run_id=run_id,
3824                kind="tool",
3825                tool_name="web_search",
3826                input_data={"arguments": {"query": f"research branch {index}"}},
3827            )
3828            db.finish_step(step_id, status="completed", output_data={"success": True})
3829            db.finish_run(run_id, "completed")
3830        run_id = db.start_run(job_id)
3831        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
3832        db.finish_step(
3833            step_id,
3834            status="completed",
3835            output_data={
3836                "success": True,
3837                "tasks": [{"title": "Run measured variant", "status": "open", "output_contract": "experiment"}],
3838            },
3839        )
3840        db.finish_run(run_id, "completed")
3841
3842        allowed = run_one_step(
3843            job_id,
3844            config=config,
3845            db=db,
3846            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run measured variant"})])]),
3847            registry=MeasuredShellRegistry(),
3848        )
3849
3850        assert allowed.status == "completed"
3851        assert allowed.tool_name == "shell_exec"
3852    finally:
3853        db.close()
3854
3855
3856def test_measurable_objective_allows_candidate_file_validation_shell_after_budget(tmp_path):
3857    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3858    db = AgentDB(tmp_path / "state.db")
3859    try:
3860        job_id = db.create_job("Optimize a measurable file-backed process", title="measured-file", kind="generic")
3861        db.update_job_metadata(
3862            job_id,
3863            {
3864                "task_queue": [
3865                    {
3866                        "title": "Validate candidate file and benchmark",
3867                        "status": "active",
3868                        "acceptance_criteria": "Exact candidate path is validated before benchmarking.",
3869                        "evidence_needed": "Shell output showing file size for /srv/models/AlphaModel-99-Q4.foo.",
3870                        "output_contract": "experiment",
3871                    }
3872                ]
3873            },
3874        )
3875        for index in range(4):
3876            run_id = db.start_run(job_id)
3877            stdout = "no metric"
3878            if index == 0:
3879                stdout = "/srv/models/AlphaModel-99-Q4.foo\n"
3880            step_id = db.add_step(
3881                job_id=job_id,
3882                run_id=run_id,
3883                kind="tool",
3884                tool_name="shell_exec",
3885                input_data={"arguments": {"command": f"probe {index}"}},
3886            )
3887            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": stdout})
3888            db.finish_run(run_id, "completed")
3889
3890        result = run_one_step(
3891            job_id,
3892            config=config,
3893            db=db,
3894            llm=ScriptedLLM([
3895                LLMResponse(tool_calls=[
3896                    ToolCall(
3897                        name="shell_exec",
3898                        arguments={"command": "ls -lh /srv/models/AlphaModel-99-Q4.foo && file /srv/models/AlphaModel-99-Q4.foo"},
3899                    )
3900                ])
3901            ]),
3902            registry=MeasuredShellRegistry(),
3903        )
3904
3905        assert result.status == "completed"
3906        assert result.tool_name == "shell_exec"
3907    finally:
3908        db.close()
3909
3910
3911def test_prompt_includes_durable_lessons():
3912    job = {
3913        "title": "research",
3914        "kind": "generic",
3915        "objective": "find research",
3916        "metadata": {
3917            "lessons": [{
3918                "category": "source_quality",
3919                "lesson": "Low-evidence pages are background noise, not durable findings.",
3920            }],
3921        },
3922    }
3923
3924    messages = build_messages(job, [])
3925
3926    content = messages[-1]["content"]
3927    assert "Lessons learned:" in content
3928    assert "Low-evidence pages are background noise" in content
3929
3930
3931def test_prompt_suppresses_stale_negative_lessons_when_positive_durable_evidence_exists():
3932    job = {
3933        "title": "research",
3934        "kind": "generic",
3935        "objective": "keep facts current",
3936        "metadata": {
3937            "finding_ledger": [
3938                {
3939                    "name": "Observed local model",
3940                    "reason": "ModelX-99 appears in the local model list with size 17 GB.",
3941                }
3942            ],
3943            "lessons": [
3944                {
3945                    "category": "strategy",
3946                    "lesson": (
3947                        "No ModelX-99 model has been successfully downloaded, so keep the download "
3948                        "branch as the primary blocker before benchmark work."
3949                    ),
3950                }
3951            ],
3952        },
3953    }
3954
3955    content = build_messages(job, [])[-1]["content"]
3956
3957    assert "Potentially stale negative lesson suppressed for ModelX-99" in content
3958    assert "No ModelX-99 model has been successfully downloaded" not in content
3959
3960
3961def test_prompt_keeps_negative_lessons_when_durable_evidence_is_negative():
3962    job = {
3963        "title": "research",
3964        "kind": "generic",
3965        "objective": "keep facts current",
3966        "metadata": {
3967            "finding_ledger": [
3968                {
3969                    "name": "Missing local model",
3970                    "reason": "ls cannot access ModelX-99: no such file or directory.",
3971                }
3972            ],
3973            "lessons": [
3974                {
3975                    "category": "strategy",
3976                    "lesson": "No ModelX-99 file exists in the checked path; use a different observed source.",
3977                }
3978            ],
3979        },
3980    }
3981
3982    content = build_messages(job, [])[-1]["content"]
3983
3984    assert "No ModelX-99 file exists" in content
3985    assert "Potentially stale negative lesson suppressed" not in content
3986
3987
3988def test_prompt_includes_memory_graph_slice():
3989    job = {
3990        "title": "research",
3991        "kind": "generic",
3992        "objective": "keep improving the output",
3993        "metadata": {
3994            "memory_graph": {
3995                "nodes": [
3996                    {
3997                        "key": "validated-checkpoints",
3998                        "title": "Validated checkpoints compound progress",
3999                        "kind": "strategy",
4000                        "status": "active",
4001                        "summary": "Save evidence, validate it, then branch from the gap.",
4002                        "salience": 0.95,
4003                        "tags": ["progress"],
4004                        "evidence_refs": ["art_1"],
4005                    },
4006                    {
4007                        "key": "weak-source",
4008                        "title": "Weak source path",
4009                        "kind": "source",
4010                        "status": "deprecated",
4011                        "summary": "This path produced low-yield repeats.",
4012                        "salience": 0.1,
4013                    },
4014                ],
4015                "edges": [
4016                    {
4017                        "from_key": "validated-checkpoints",
4018                        "to_key": "weak-source",
4019                        "relation": "replaces",
4020                    }
4021                ],
4022            }
4023        },
4024    }
4025
4026    content = build_messages(job, [])[-1]["content"]
4027
4028    assert "Memory graph:" in content
4029    assert "Validated checkpoints compound progress" in content
4030    assert "strategy" in content
4031    assert "replaces -> weak-source" in content
4032    assert "art_1" in content
4033
4034
4035def test_prompt_suppresses_memory_graph_nodes_matching_stale_claim_tokens(tmp_path):
4036    db = AgentDB(tmp_path / "state.db")
4037    try:
4038        job_id = db.create_job("Prefer current durable evidence", title="stale-graph", kind="generic")
4039        db.append_lesson(
4040            job_id,
4041            "Evidence grounding rejected unsupported concrete tokens for record_experiment: E5-2690, v3. Treat matching prior ledger claims as stale.",
4042            category="mistake",
4043        )
4044        db.append_memory_graph_records(
4045            job_id,
4046            nodes=[
4047                {
4048                    "key": "old-hardware",
4049                    "title": "Intel Xeon E5-2690 v3 baseline",
4050                    "kind": "fact",
4051                    "status": "stable",
4052                    "summary": "Old baseline that should not enter the prompt after contradiction.",
4053                },
4054                {
4055                    "key": "current-evidence",
4056                    "title": "Current observed hardware needs verification",
4057                    "kind": "fact",
4058                    "status": "active",
4059                    "summary": "Continue from fresh shell evidence only.",
4060                },
4061            ],
4062        )
4063
4064        job = db.get_job(job_id)
4065        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
4066
4067        assert "Suppressed 1 stale memory node" in content
4068        assert "Current observed hardware needs verification" in content
4069        assert "Intel Xeon E5-2690 v3 baseline" not in content
4070    finally:
4071        db.close()
4072
4073
4074def test_prompt_suppresses_negative_memory_graph_nodes_matching_stale_file_type(tmp_path):
4075    db = AgentDB(tmp_path / "state.db")
4076    try:
4077        job_id = db.create_job("Prefer current file evidence", title="stale-file-type", kind="generic")
4078        db.update_job_metadata(
4079            job_id,
4080            {
4081                "stale_negative_records": [
4082                    {
4083                        "kind": "memory_node",
4084                        "record_id": "old-absence",
4085                        "token": ".foo",
4086                        "evidence": "/srv/models/AlphaModel.foo",
4087                    }
4088                ]
4089            },
4090        )
4091        db.append_memory_graph_records(
4092            job_id,
4093            nodes=[
4094                {
4095                    "key": "download-blocker",
4096                    "title": "Model file download critical blocker",
4097                    "kind": "constraint",
4098                    "status": "active",
4099                    "summary": "FOO model download attempts return 0 files. All downstream work is blocked until a model file exists locally.",
4100                },
4101                {
4102                    "key": "format-skill",
4103                    "title": "FOO format tuning",
4104                    "kind": "skill",
4105                    "status": "active",
4106                    "summary": "Use the FOO runtime flags after a valid file is selected.",
4107                },
4108            ],
4109        )
4110
4111        job = db.get_job(job_id)
4112        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
4113
4114        assert "Suppressed 1 stale memory node" in content
4115        assert "Model file download critical blocker" not in content
4116        assert "FOO format tuning" in content
4117    finally:
4118        db.close()
4119
4120
4121def test_prompt_pushes_memory_graph_consolidation_when_ledgers_exist_without_nodes():
4122    job = {
4123        "title": "research",
4124        "kind": "generic",
4125        "objective": "keep improving the output",
4126        "metadata": {
4127            "lessons": [{"lesson": "Prefer validated checkpoints.", "category": "strategy"}],
4128            "experiment_ledger": [{"title": "Trial", "status": "measured"}],
4129            "task_queue": [{"title": "Next branch", "status": "open"}],
4130        },
4131    }
4132
4133    content = build_messages(job, [])[-1]["content"]
4134
4135    assert "No memory graph yet" in content
4136    assert "Durable ledgers already contain 3 reusable item" in content  # 1 lesson + 1 experiment + 1 task
4137    assert "record_memory_graph" in content
4138
4139
4140def test_prompt_adds_memory_consolidation_guard_when_graph_lags_ledgers():
4141    job = {
4142        "title": "research",
4143        "kind": "generic",
4144        "objective": "keep improving the output",
4145        "metadata": {
4146            "lessons": [
4147                {"lesson": "Use validated checkpoints.", "category": "strategy"},
4148                {"lesson": "Reject low-yield branches.", "category": "strategy"},
4149            ],
4150            "experiment_ledger": [{"title": "Trial", "status": "measured"}],
4151            "finding_ledger": [{"name": "Finding A"}, {"name": "Finding B"}],
4152            "source_ledger": [{"source": "source:a"}],
4153        },
4154    }
4155
4156    content = build_messages(job, [])[-1]["content"]
4157
4158    assert "Memory consolidation guard:" in content
4159    assert "durable_records=6" in content  # 2 lessons + 1 experiment + 2 findings + 1 source
4160    assert "record_memory_graph" in content
4161
4162
4163def test_prompt_adds_research_balance_guard_for_execution_without_sources():
4164    job = {
4165        "title": "workflow builder",
4166        "kind": "generic",
4167        "objective": "build a durable workflow and keep improving it",
4168        "metadata": {
4169            "experiment_ledger": [{"title": "Validation check", "status": "measured"}],
4170        },
4171    }
4172    steps = [
4173        {
4174            "step_no": index,
4175            "kind": "tool",
4176            "tool_name": "shell_exec",
4177            "status": "completed",
4178            "input": {"arguments": {"command": f"echo branch-{index}"}},
4179        }
4180        for index in range(1, 7)
4181    ]
4182
4183    content = build_messages(job, steps)[-1]["content"]
4184
4185    assert "Research balance guard:" in content
4186    assert "execution-heavy" in content
4187    assert "sources=0" in content
4188    assert "record_source" in content
4189
4190
4191def test_prompt_research_balance_guard_clears_when_sources_exist():
4192    job = {
4193        "title": "workflow builder",
4194        "kind": "generic",
4195        "objective": "build a durable workflow and keep improving it",
4196        "metadata": {
4197            "source_ledger": [{"source": "project docs"}],
4198            "experiment_ledger": [{"title": "Validation check", "status": "measured"}],
4199        },
4200    }
4201    steps = [
4202        {
4203            "step_no": index,
4204            "kind": "tool",
4205            "tool_name": "shell_exec",
4206            "status": "completed",
4207            "input": {"arguments": {"command": f"echo branch-{index}"}},
4208        }
4209        for index in range(1, 7)
4210    ]
4211
4212    content = build_messages(job, steps)[-1]["content"]
4213
4214    assert "Recent work is execution-heavy" not in content
4215
4216
4217def _source_yield_metadata(source_count: int = 16, finding_count: int = 1, *, include_memory_graph: bool = True) -> dict:
4218    metadata = {
4219        "source_ledger": [
4220            {
4221                "source": f"https://source.example/{index}",
4222                "source_type": "web_extract",
4223                "usefulness_score": 0.55,
4224                "yield_count": 0,
4225                "last_outcome": "extracted source text for possible use",
4226            }
4227            for index in range(source_count)
4228        ],
4229        "finding_ledger": [
4230            {
4231                "name": f"Finding {index}",
4232                "source_url": f"https://source.example/{index}",
4233            }
4234            for index in range(finding_count)
4235        ],
4236    }
4237    if include_memory_graph:
4238        metadata["memory_graph"] = {
4239            "nodes": [
4240                {"key": f"source-node-{index}", "kind": "source", "title": f"Source set {index}"}
4241                for index in range(4)
4242            ],
4243            "edges": [
4244                {"from": "source-node-0", "to": "source-node-1", "kind": "supports"},
4245                {"from": "source-node-1", "to": "source-node-2", "kind": "supports"},
4246                {"from": "source-node-2", "to": "source-node-3", "kind": "supports"},
4247            ],
4248        }
4249    return metadata
4250
4251
4252def _source_gathering_steps(count: int = 6) -> list[dict]:
4253    return [
4254        {
4255            "step_no": index,
4256            "kind": "tool",
4257            "tool_name": "web_extract" if index % 2 else "web_search",
4258            "status": "completed",
4259            "input": {"arguments": {"query": f"source branch {index}"}},
4260        }
4261        for index in range(1, count + 1)
4262    ]
4263
4264
4265def test_prompt_adds_source_yield_guard_when_sources_are_not_synthesized():
4266    job = {
4267        "title": "source-heavy job",
4268        "kind": "generic",
4269        "objective": "research and produce durable conclusions",
4270        "metadata": _source_yield_metadata(),
4271    }
4272
4273    content = build_messages(job, _source_gathering_steps())[-1]["content"]
4274
4275    assert "Source yield guard:" in content
4276    assert "Many sources have been gathered" in content
4277    assert "sources=16" in content  # default source_count of _source_yield_metadata
4278    assert "findings=1" in content
4279    assert "record_findings" in content
4280
4281
4282def test_prompt_source_yield_guard_clears_when_findings_cover_sources():
4283    job = {
4284        "title": "source-heavy job",
4285        "kind": "generic",
4286        "objective": "research and produce durable conclusions",
4287        "metadata": _source_yield_metadata(finding_count=2),
4288    }
4289
4290    content = build_messages(job, _source_gathering_steps())[-1]["content"]
4291
4292    assert "Source yield guard:" in content  # guard header is still present; only the warning text is expected to clear
4293    assert "Many sources have been gathered" not in content
4294
4295
4296def test_run_one_step_blocks_more_source_gathering_when_source_yield_is_missing(tmp_path):
4297    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4298    db = AgentDB(tmp_path / "state.db")
4299    try:
4300        job_id = db.create_job(
4301            "Research and produce durable conclusions",
4302            title="source-yield",
4303            kind="generic",
4304            metadata=_source_yield_metadata(),
4305        )
4306        run_id = db.start_run(job_id, model="test")
4307        for step in _source_gathering_steps():
4308            step_id = db.add_step(
4309                job_id=job_id,
4310                run_id=run_id,
4311                kind="tool",
4312                tool_name=step["tool_name"],
4313                input_data=step["input"],
4314            )
4315            db.finish_step(step_id, status="completed", output_data={"success": True})
4316        db.finish_run(run_id, "completed")
4317
4318        result = run_one_step(
4319            job_id,
4320            config=config,
4321            db=db,
4322            llm=ScriptedLLM([
4323                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more sources"})])
4324            ]),
4325        )
4326
4327        assert result.status == "blocked"
4328        assert result.result["error"] == "source yield accounting required"
4329        assert result.result["source_yield"]["sources"] == 16
4330    finally:
4331        db.close()
4332
4333
4334def test_source_yield_guard_takes_priority_over_memory_consolidation(tmp_path):
4335    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4336    db = AgentDB(tmp_path / "state.db")
4337    try:
4338        job_id = db.create_job(
4339            "Research and produce durable conclusions",
4340            title="source-yield-priority",
4341            kind="generic",
4342            metadata=_source_yield_metadata(include_memory_graph=False),
4343        )
4344        run_id = db.start_run(job_id, model="test")
4345        for step in _source_gathering_steps():
4346            step_id = db.add_step(
4347                job_id=job_id,
4348                run_id=run_id,
4349                kind="tool",
4350                tool_name=step["tool_name"],
4351                input_data=step["input"],
4352            )
4353            db.finish_step(step_id, status="completed", output_data={"success": True})
4354        db.finish_run(run_id, "completed")
4355
4356        result = run_one_step(
4357            job_id,
4358            config=config,
4359            db=db,
4360            llm=ScriptedLLM([
4361                LLMResponse(tool_calls=[ToolCall(name="web_extract", arguments={"urls": ["https://source.example/new"]})])
4362            ]),
4363        )
4364
4365        assert result.status == "blocked"
4366        assert result.result["error"] == "source yield accounting required"
4367    finally:
4368        db.close()
4369
4370
4371def test_run_one_step_allows_source_yield_accounting(tmp_path):
4372    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4373    db = AgentDB(tmp_path / "state.db")
4374    try:
4375        job_id = db.create_job(
4376            "Research and produce durable conclusions",
4377            title="source-yield",
4378            kind="generic",
4379            metadata=_source_yield_metadata(),
4380        )
4381        run_id = db.start_run(job_id, model="test")
4382        for step in _source_gathering_steps():
4383            step_id = db.add_step(
4384                job_id=job_id,
4385                run_id=run_id,
4386                kind="tool",
4387                tool_name=step["tool_name"],
4388                input_data=step["input"],
4389            )
4390            db.finish_step(step_id, status="completed", output_data={"success": True})
4391        db.finish_run(run_id, "completed")
4392
4393        result = run_one_step(
4394            job_id,
4395            config=config,
4396            db=db,
4397            llm=ScriptedLLM([
4398                LLMResponse(tool_calls=[
4399                    ToolCall(
4400                        name="record_source",
4401                        arguments={
4402                            "source": "https://source.example/0",
4403                            "source_type": "web_extract",
4404                            "yield_count": 1,
4405                            "outcome": "Source produced a durable conclusion for the active branch.",
4406                        },
4407                    )
4408                ])
4409            ]),
4410        )
4411
4412        assert result.status == "completed"
4413        assert result.tool_name == "record_source"
4414    finally:
4415        db.close()
4416
4417
4418def test_run_one_step_blocks_execution_when_research_balance_is_missing(tmp_path):
4419    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4420    db = AgentDB(tmp_path / "state.db")
4421    try:
4422        job_id = db.create_job(
4423            "Build a durable workflow and keep improving it",
4424            title="research-balance",
4425            kind="generic",
4426            metadata={"experiment_ledger": [{"title": "Validation check", "status": "measured"}]},
4427        )
4428        run_id = db.start_run(job_id, model="test")
4429        for index in range(6):
4430            step_id = db.add_step(
4431                job_id=job_id,
4432                run_id=run_id,
4433                kind="tool",
4434                tool_name="shell_exec",
4435                input_data={"arguments": {"command": f"python branch_{index}.py"}},
4436            )
4437            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "ok"})
4438        db.finish_run(run_id, "completed")
4439
4440        result = run_one_step(
4441            job_id,
4442            config=config,
4443            db=db,
4444            llm=ScriptedLLM([
4445                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "python continue_branch.py"})])
4446            ]),
4447        )
4448
4449        assert result.status == "blocked"
4450        assert result.result["error"] == "research balance required"
4451        assert result.result["blocked_tool"] == "shell_exec"
4452        assert "record_source" in result.result["guidance"]
4453    finally:
4454        db.close()
4455
4456
4457def test_run_one_step_blocks_lesson_churn_when_research_balance_is_missing(tmp_path):
4458    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4459    db = AgentDB(tmp_path / "state.db")
4460    try:
4461        job_id = db.create_job(
4462            "Build a durable workflow and keep improving it",
4463            title="research-balance-lessons",
4464            kind="generic",
4465            metadata={"experiment_ledger": [{"title": "Validation check", "status": "measured"}]},
4466        )
4467        run_id = db.start_run(job_id, model="test")
4468        for index in range(6):
4469            step_id = db.add_step(
4470                job_id=job_id,
4471                run_id=run_id,
4472                kind="tool",
4473                tool_name="shell_exec",
4474                input_data={"arguments": {"command": f"python branch_{index}.py"}},
4475            )
4476            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "ok"})
4477        db.finish_run(run_id, "completed")
4478
4479        result = run_one_step(
4480            job_id,
4481            config=config,
4482            db=db,
4483            llm=ScriptedLLM([
4484                LLMResponse(tool_calls=[
4485                    ToolCall(
4486                        name="record_lesson",
4487                        arguments={
4488                            "lesson": "The latest execution branch worked; continue similar attempts.",
4489                            "category": "strategy",
4490                        },
4491                    )
4492                ])
4493            ]),
4494        )
4495
4496        assert result.status == "blocked"
4497        assert result.result["error"] == "research balance required"
4498        assert result.result["blocked_tool"] == "record_lesson"
4499        assert "raw lesson accumulation" in result.result["guidance"]
4500    finally:
4501        db.close()
4502
4503
4504def test_run_one_step_blocks_durable_records_with_unsupported_concrete_claims(tmp_path):
4505    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4506    db = AgentDB(tmp_path / "state.db")
4507    try:
4508        job_id = db.create_job("Optimize a measurable process on observed hardware", title="grounding", kind="generic")
4509        run_id = db.start_run(job_id, model="test")
4510        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4511        db.finish_step(
4512            step_id,
4513            status="completed",
4514            output_data={
4515                "success": True,
4516                "stdout": "GPU: AMD Device 7590\nCPU: AMD Ryzen 9 7900X\nMemory: 93Gi\n",
4517                "stderr": "",
4518            },
4519        )
4520        db.finish_run(run_id, "completed")
4521
4522        result = run_one_step(
4523            job_id,
4524            config=config,
4525            db=db,
4526            llm=ScriptedLLM([
4527                LLMResponse(tool_calls=[
4528                    ToolCall(
4529                        name="record_roadmap",
4530                        arguments={
4531                            "title": "Performance roadmap",
4532                            "status": "active",
4533                            "current_milestone": "Environment",
4534                            "metadata": {
4535                                "hardware": "NVIDIA GTX 970 with CUDA and i5-8400 CPU",
4536                                "claim": "Use CUDA-first optimization.",
4537                            },
4538                            "milestones": [],
4539                        },
4540                    )
4541                ])
4542            ]),
4543        )
4544
4545        assert result.status == "blocked"
4546        assert result.result["error"] == "evidence grounding required"
4547        assert result.result["blocked_tool"] == "record_roadmap"
4548        assert "GTX" in result.result["evidence_grounding"]["unsupported_tokens"]
4549        lessons = db.get_job(job_id)["metadata"]["lessons"]
4550        assert any("GTX" in lesson["lesson"] and "stale" in lesson["lesson"] for lesson in lessons)
4551    finally:
4552        db.close()
4553
4554
4555def test_record_experiment_blocks_unsupported_proper_noun_hardware_claims(tmp_path):
4556    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4557    db = AgentDB(tmp_path / "state.db")
4558    try:
4559        job_id = db.create_job("Record exact observed environment", title="grounding", kind="generic")
4560        run_id = db.start_run(job_id, model="test")
4561        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4562        db.finish_step(
4563            step_id,
4564            status="completed",
4565            output_data={
4566                "success": True,
4567                "stdout": (
4568                    "NO_NVIDIA_GPU\n"
4569                    "GPU: Advanced Micro Devices Device 7590\n"
4570                    "Threads: 24\n"
4571                    "CPU: AMD Ryzen 9 7900X 12-Core Processor\n"
4572                ),
4573            },
4574        )
4575        db.finish_run(run_id, "completed")
4576
4577        blocked = run_one_step(
4578            job_id,
4579            config=config,
4580            db=db,
4581            llm=ScriptedLLM([
4582                LLMResponse(tool_calls=[
4583                    ToolCall(
4584                        name="record_experiment",
4585                        arguments={
4586                            "title": "Environment Baseline - Hardware Runtime Facts",
4587                            "status": "measured",
4588                            "metric_name": "cpu_threads",
4589                            "metric_value": 16,  # conflicts with the observed "Threads: 24"
4590                            "metric_unit": "threads",
4591                            "result": "Environment baseline captured. Hardware: Dual Intel Xeon CPUs, 16 threads total.",
4592                            "next_action": "Continue from exact observed hardware facts.",
4593                        },
4594                    )
4595                ])
4596            ]),
4597        )
4598
4599        assert blocked.status == "blocked"
4600        assert blocked.result["error"] == "evidence grounding required"
4601        assert {"Dual", "Intel", "Xeon"} <= set(blocked.result["evidence_grounding"]["unsupported_tokens"])
4602    finally:
4603        db.close()
4604
4605
4606def test_record_experiment_allows_supported_proper_noun_hardware_claims(tmp_path):
4607    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4608    db = AgentDB(tmp_path / "state.db")
4609    try:
4610        job_id = db.create_job("Record exact observed environment", title="grounding", kind="generic")
4611        run_id = db.start_run(job_id, model="test")
4612        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4613        db.finish_step(
4614            step_id,
4615            status="completed",
4616            output_data={
4617                "success": True,
4618                "stdout": (
4619                    "NO_NVIDIA_GPU\n"
4620                    "GPU: Advanced Micro Devices Device 7590\n"
4621                    "Threads: 24\n"
4622                    "CPU: AMD Ryzen 9 7900X 12-Core Processor\n"
4623                ),
4624            },
4625        )
4626        db.finish_run(run_id, "completed")
4627
4628        result = run_one_step(
4629            job_id,
4630            config=config,
4631            db=db,
4632            llm=ScriptedLLM([
4633                LLMResponse(tool_calls=[
4634                    ToolCall(
4635                        name="record_experiment",
4636                        arguments={
4637                            "title": "Environment Baseline - Hardware Runtime Facts",
4638                            "status": "measured",
4639                            "metric_name": "cpu_threads",
4640                            "metric_value": 24,
4641                            "metric_unit": "threads",
4642                            "result": "Environment baseline captured. Hardware: AMD Ryzen 9 7900X, 24 threads total.",
4643                            "next_action": "Continue from exact observed hardware facts.",
4644                        },
4645                    )
4646                ])
4647            ]),
4648        )
4649
4650        assert result.status == "completed"
4651        experiment = db.get_job(job_id)["metadata"]["experiment_ledger"][0]
4652        assert "AMD Ryzen 9 7900X" in experiment["result"]
4653    finally:
4654        db.close()
4655
4656
4657def test_record_lesson_blocks_negative_claim_that_conflicts_with_positive_evidence(tmp_path):
4658    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4659    db = AgentDB(tmp_path / "state.db")
4660    try:
4661        job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
4662        run_id = db.start_run(job_id, model="test")
4663        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4664        db.finish_step(
4665            step_id,
4666            status="completed",
4667            output_data={
4668                "success": True,
4669                "stdout": (
4670                    "NAME ID SIZE MODIFIED\n"
4671                    "ModelX-99 a50eda8ed977 17 GB 2 weeks ago\n"
4672                    "OtherModel 69492d6584c5 14 GB 2 months ago\n"
4673                ),
4674            },
4675        )
4676        db.finish_run(run_id, "completed")
4677
4678        blocked = run_one_step(
4679            job_id,
4680            config=config,
4681            db=db,
4682            llm=ScriptedLLM([
4683                LLMResponse(tool_calls=[
4684                    ToolCall(
4685                        name="record_lesson",
4686                        arguments={
4687                            "category": "strategy",
4688                            "lesson": (
4689                                "No ModelX-99 model has been successfully downloaded, so keep the download branch "
4690                                "as the primary blocker before any benchmark work."
4691                            ),
4692                        },
4693                    )
4694                ])
4695            ]),
4696        )
4697
4698        assert blocked.status == "blocked"
4699        assert blocked.result["error"] == "evidence grounding required"
4700        grounding = blocked.result["evidence_grounding"]
4701        assert "ModelX-99" in grounding["unsupported_tokens"]
4702        assert grounding["negative_claim_conflicts"][0]["token"] == "ModelX-99"
4703    finally:
4704        db.close()
4705
4706
4707def test_record_lesson_ignores_plain_titlecase_negative_conflict_tokens(tmp_path):
4708    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4709    db = AgentDB(tmp_path / "state.db")
4710    try:
4711        job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
4712        run_id = db.start_run(job_id, model="test")
4713        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4714        db.finish_step(
4715            step_id,
4716            status="completed",
4717            output_data={
4718                "success": True,
4719                "stdout": "/srv/vendor/lmstudio-community/Model.foo\n",
4720            },
4721        )
4722        db.finish_run(run_id, "completed")
4723
4724        result = run_one_step(
4725            job_id,
4726            config=config,
4727            db=db,
4728            llm=ScriptedLLM([
4729                LLMResponse(tool_calls=[
4730                    ToolCall(
4731                        name="record_lesson",
4732                        arguments={
4733                            "category": "strategy",
4734                            "lesson": "No Studio-specific conclusion should be drawn from this branch yet.",
4735                        },
4736                    )
4737                ])
4738            ]),
4739        )
4740
4741        assert result.status == "completed"
4742    finally:
4743        db.close()
4744
4745
4746def test_record_lesson_allows_negative_claim_when_evidence_is_also_negative(tmp_path):
4747    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4748    db = AgentDB(tmp_path / "state.db")
4749    try:
4750        job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
4751        run_id = db.start_run(job_id, model="test")
4752        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4753        db.finish_step(
4754            step_id,
4755            status="completed",
4756            output_data={
4757                "success": True,
4758                "stdout": "ls: cannot access '/tmp/ModelX-99.gguf': No such file or directory\n",
4759            },
4760        )
4761        db.finish_run(run_id, "completed")
4762
4763        result = run_one_step(
4764            job_id,
4765            config=config,
4766            db=db,
4767            llm=ScriptedLLM([
4768                LLMResponse(tool_calls=[
4769                    ToolCall(
4770                        name="record_lesson",
4771                        arguments={
4772                            "category": "strategy",
4773                            "lesson": (
4774                                "No ModelX-99 file exists in the checked path, so the next branch must use a "
4775                                "different observed source or record the missing file as blocked."
4776                            ),
4777                        },
4778                    )
4779                ])
4780            ]),
4781        )
4782
4783        assert result.status == "completed"
4784        lesson = db.get_job(job_id)["metadata"]["lessons"][0]
4785        assert "ModelX-99" in lesson["lesson"]
4786    finally:
4787        db.close()
4788
4789
4790def test_shell_path_recovery_prompt_shows_missing_executable(tmp_path):
4791    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4792    db = AgentDB(tmp_path / "state.db")
4793    try:
4794        job_id = db.create_job("Run a measured tool after validating paths", title="missing-executable", kind="generic")
4795        run_id = db.start_run(job_id, model="test")
4796        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4797        db.finish_step(
4798            step_id,
4799            status="completed",
4800            output_data={
4801                "success": True,  # recorded as successful even though stdout reports "not found"
4802                "command": "/opt/tools/runner --measure",
4803                "stdout": "/bin/sh: 1: /opt/tools/runner: not found\n",
4804                "stderr": "",
4805            },
4806        )
4807        db.finish_run(run_id, "completed")
4808        llm = CapturingLLM(LLMResponse(tool_calls=[
4809            ToolCall(
4810                name="record_lesson",
4811                arguments={
4812                    "category": "strategy",
4813                    "lesson": "The /opt/tools/runner executable was missing, so validate a real executable path before measuring.",
4814                },
4815            )
4816        ]))
4817
4818        result = run_one_step(job_id, config=config, db=db, llm=llm)
4819
4820        assert result.status == "completed"
4821        prompt = llm.messages[-1]["content"]
4822        assert "Shell path recovery" in prompt
4823        assert "/opt/tools/runner" in prompt
4824        assert "Do not treat this output as a successful measurement" in prompt
4825        assert "locate or verify the real executable/file path" in prompt
4826    finally:
4827        db.close()
4828
4829
4830def test_shell_path_recovery_prompt_prefers_observed_candidate_executable(tmp_path):
4831    db = AgentDB(tmp_path / "state.db")
4832    try:
4833        job_id = db.create_job("Build and benchmark a generic project", title="candidate-executable", kind="generic")
4834        run_id = db.start_run(job_id, model="test")
4835        observed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4836        db.finish_step(
4837            observed_step,
4838            status="completed",
4839            output_data={
4840                "success": True,
4841                "command": "ls /tmp/tools/build-tool",
4842                "stdout": "/tmp/tools/build-tool\n---\nbuild-tool\nhelper\n",
4843                "stderr": "",
4844            },
4845        )
4846        failed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4847        db.finish_step(
4848            failed_step,
4849            status="failed",
4850            output_data={
4851                "success": False,
4852                "command": "cd /tmp/project && build-tool ..",
4853                "stdout": "/bin/sh: 1: build-tool: not found\n",
4854                "stderr": "",
4855                "error": "command output indicates missing command despite exit status 0",
4856            },
4857        )
4858        db.finish_run(run_id, "completed")
4859
4860        messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
4861        prompt = messages[-1]["content"]
4862
4863        assert "Shell path recovery" in prompt
4864        assert "Missing commands: build-tool" in prompt
4865        assert "Observed candidate executable for build-tool: /tmp/tools/build-tool" in prompt
4866        assert "try the exact candidate path or add its directory to PATH" in prompt
4867    finally:
4868        db.close()
4869
4870
4871def test_shell_path_recovery_prompt_preserves_partial_success_paths(tmp_path):
4872    db = AgentDB(tmp_path / "state.db")
4873    try:
4874        job_id = db.create_job("Recover from mixed shell output", title="partial-shell-paths", kind="generic")
4875        run_id = db.start_run(job_id, model="test")
4876        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4877        db.finish_step(
4878            step_id,
4879            status="failed",
4880            output_data={
4881                "success": False,
4882                "command": "ls /tmp/bin/build-tool /tmp/bin/compiler; which compiler",
4883                "returncode": 1,
4884                "stdout": (
4885                    "ls: cannot access '/tmp/bin/compiler': No such file or directory\n"
4886                    "lrwxrwxrwx 1 user user 30 Jan 1 00:00 /tmp/bin/build-tool -> /tmp/runtime/build-tool\n"
4887                    "/usr/bin/compiler\n"
4888                ),
4889                "stderr": "",
4890                "error": "command exited with status 1",
4891            },
4892        )
4893        db.finish_run(run_id, "completed")
4894
4895        messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
4896        prompt = messages[-1]["content"]
4897
4898        assert "Shell path recovery" in prompt
4899        assert "Missing paths: /tmp/bin/compiler" in prompt
4900        assert "Observed executable paths in partial shell output" in prompt
4901        assert "/tmp/bin/build-tool" in prompt
4902        assert "/tmp/runtime/build-tool" in prompt
4903        assert "/usr/bin/compiler" in prompt
4904        assert "Observed executable paths in partial shell output: /tmp/bin/compiler" not in prompt
4905    finally:
4906        db.close()
4907
4908
4909def test_shell_exec_blocks_bare_retry_when_candidate_executable_observed(tmp_path):
4910    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4911    db = AgentDB(tmp_path / "state.db")
4912    try:
4913        job_id = db.create_job("Recover with observed executable", title="candidate-retry", kind="generic")
4914        run_id = db.start_run(job_id, model="test")
4915        observed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4916        db.finish_step(
4917            observed_step,
4918            status="completed",
4919            output_data={
4920                "success": True,
4921                "command": "ls /tmp/tools/build-tool",
4922                "stdout": "/tmp/tools/build-tool\n",
4923                "stderr": "",
4924            },
4925        )
4926        failed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4927        db.finish_step(
4928            failed_step,
4929            status="failed",
4930            output_data={
4931                "success": False,
4932                "command": "build-tool --version",
4933                "stdout": "/bin/sh: 1: build-tool: not found\n",
4934                "stderr": "",
4935                "error": "command output indicates missing command despite exit status 0",
4936            },
4937        )
4938        db.finish_run(run_id, "completed")
4939
4940        result = run_one_step(
4941            job_id,
4942            config=config,
4943            db=db,
4944            llm=ScriptedLLM([
4945                LLMResponse(tool_calls=[
4946                    ToolCall(name="shell_exec", arguments={"command": "build-tool --version"})
4947                ])
4948            ]),
4949            registry=SuccessRegistry(),
4950        )
4951
4952        assert result.status == "blocked"
4953        assert result.result["error"] == "observed executable recovery required"
4954        assert result.result["candidate_recovery"]["missing_command"] == "build-tool"
4955        assert result.result["candidate_recovery"]["candidate_executables"] == ["/tmp/tools/build-tool"]
4956    finally:
4957        db.close()
4958
4959
4960def test_permission_failure_prompt_blocks_package_manager_retry(tmp_path):
4961    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4962    db = AgentDB(tmp_path / "state.db")
4963    try:
4964        job_id = db.create_job("Recover from generic build prerequisites", title="permission-recovery", kind="generic")
4965        run_id = db.start_run(job_id, model="test")
4966        step_id = db.add_step(
4967            job_id=job_id,
4968            run_id=run_id,
4969            kind="tool",
4970            tool_name="shell_exec",
4971            input_data={"arguments": {"command": "apt-get install -y build-tool"}},
4972        )
4973        db.finish_step(
4974            step_id,
4975            status="failed",
4976            output_data={
4977                "success": False,
4978                "command": "apt-get install -y build-tool",
4979                "returncode": 0,
4980                "stdout": (
4981                    "E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)\n"
4982                    "E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?\n"
4983                ),
4984                "stderr": "",
4985                "error": "command output indicates authentication or authorization failure despite exit status 0",
4986            },
4987        )
4988        db.finish_run(run_id, "completed")
4989        llm = ScriptedLLM([LLMResponse(tool_calls=[
4990            ToolCall(name="shell_exec", arguments={"command": "apt-get install -y another-tool"})
4991        ])])
4992
4993        result = run_one_step(job_id, config=config, db=db, llm=llm, registry=SuccessRegistry())
4994
4995        assert result.status == "blocked"
4996        assert result.result["error"] == "privileged command recovery required"
4997        assert result.result["privileged_failure"]["step_no"] == 1
4998        assert "non-privileged recovery" in result.result["guidance"]
4999    finally:
5000        db.close()
5001
5002
5003def test_permission_failure_prompt_mentions_non_privileged_recovery(tmp_path):
5004    db = AgentDB(tmp_path / "state.db")
5005    try:
5006        job_id = db.create_job("Recover from generic build prerequisites", title="permission-prompt", kind="generic")
5007        run_id = db.start_run(job_id, model="test")
5008        step_id = db.add_step(
5009            job_id=job_id,
5010            run_id=run_id,
5011            kind="tool",
5012            tool_name="shell_exec",
5013            input_data={"arguments": {"command": "sudo package-manager install build-tool"}},
5014        )
5015        db.finish_step(
5016            step_id,
5017            status="failed",
5018            output_data={
5019                "success": False,
5020                "command": "sudo package-manager install build-tool",
5021                "stdout": "sudo: a password is required\n",
5022                "stderr": "",
5023                "error": "authentication or authorization failure",
5024            },
5025        )
5026        db.finish_run(run_id, "completed")
5027
5028        messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
5029        prompt = messages[-1]["content"]
5030
5031        assert "Shell permission recovery" in prompt
5032        assert "failed because a privileged/package-manager command lacked permission" in prompt
5033        assert "non-privileged alternatives" in prompt
5034        assert "operator credentials" in prompt
5035    finally:
5036        db.close()
5037
5038
5039def test_record_findings_blocks_negative_file_pattern_that_conflicts_with_positive_evidence(tmp_path):
5040    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5041    db = AgentDB(tmp_path / "state.db")
5042    try:
5043        job_id = db.create_job("Keep file discovery evidence exact", title="file-grounding", kind="generic")
5044        run_id = db.start_run(job_id, model="test")
5045        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5046        db.finish_step(
5047            step_id,
5048            status="completed",
5049            output_data={
5050                "success": True,
5051                "stdout": (
5052                    "/srv/data/WidgetModel-99-Q4.foo\n"
5053                    "/tmp/results/report.alpha\n"
5054                    "/var/cache/other-file.foo\n"
5055                ),
5056            },
5057        )
5058        db.finish_run(run_id, "completed")
5059
5060        blocked = run_one_step(
5061            job_id,
5062            config=config,
5063            db=db,
5064            llm=ScriptedLLM([
5065                LLMResponse(tool_calls=[
5066                    ToolCall(
5067                        name="record_findings",
5068                        arguments={
5069                            "findings": [
5070                                {
5071                                    "name": "No .foo files found on filesystem",
5072                                    "category": "environment_baseline",
5073                                    "status": "confirmed",
5074                                    "reason": "Shell search found zero .foo files larger than 1MB anywhere on the system.",
5075                                }
5076                            ]
5077                        },
5078                    )
5079                ])
5080            ]),
5081        )
5082
5083        assert blocked.status == "blocked"
5084        assert blocked.result["error"] == "evidence grounding required"
5085        grounding = blocked.result["evidence_grounding"]
5086        assert ".foo" in grounding["unsupported_tokens"]
5087        assert grounding["negative_claim_conflicts"][0]["token"] == ".foo"
5088    finally:
5089        db.close()
5090
5091
5092def test_file_pattern_grounding_ignores_hidden_path_components():
5093    text = (
5094        "No compiled binary exists yet. Valid data is at "
5095        "/srv/cache/.lmstudio/models/ModelX.gguf and /tmp/.cache/item.bin. "
5096        "No *.foo files were found."
5097    )
5098
5099    tokens = _file_pattern_tokens_for_grounding(text)
5100
5101    assert ".lmstudio" not in tokens
5102    assert ".cache" not in tokens
5103    assert ".foo" in tokens
5104
5105
5106def test_record_experiment_allows_classifying_observed_files_as_non_primary(tmp_path):
5107    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5108    db = AgentDB(tmp_path / "state.db")
5109    try:
5110        job_id = db.create_job("Validate observed files before primary artifact work", title="file-classification", kind="generic")
5111        run_id = db.start_run(job_id, model="test")
5112        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5113        db.finish_step(
5114            step_id,
5115            status="completed",
5116            output_data={
5117                "success": True,
5118                "stdout": (
5119                    "/srv/data/support-alpha-v2.foo\n"
5120                    "/srv/data/support-beta-v2.foo\n"
5121                ),
5122            },
5123        )
5124        db.finish_run(run_id, "completed")
5125
5126        result = run_one_step(
5127            job_id,
5128            config=config,
5129            db=db,
5130            llm=ScriptedLLM([
5131                LLMResponse(tool_calls=[
5132                    ToolCall(
5133                        name="record_experiment",
5134                        arguments={
5135                            "title": "primary artifact scan",
5136                            "status": "measured",
5137                            "metric_name": "primary_artifacts_found",
5138                            "metric_value": 0,
5139                            "metric_unit": "files",
5140                            "config": {
5141                                "files_found": [
5142                                    "/srv/data/support-alpha-v2.foo",
5143                                    "/srv/data/support-beta-v2.foo",
5144                                ],
5145                            },
5146                            "result": (
5147                                "scan found only support files: /srv/data/support-alpha-v2.foo and "
5148                                "/srv/data/support-beta-v2.foo. observed files are not the required "
5149                                "primary artifact, so the primary artifact remains missing."
5150                            ),
5151                            "next_action": "select a different observed source for the primary artifact.",
5152                        },
5153                    )
5154                ])
5155            ]),
5156        )
5157
5158        assert result.status == "completed"
5159        assert result.tool_name == "record_experiment"
5160    finally:
5161        db.close()
5162
5163
5164def test_record_findings_requires_exact_paths_when_file_candidates_exist(tmp_path):
5165    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5166    db = AgentDB(tmp_path / "state.db")
5167    try:
5168        job_id = db.create_job("Keep file candidate evidence exact", title="path-grounding", kind="generic")
5169        run_id = db.start_run(job_id, model="test")
5170        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5171        db.finish_step(
5172            step_id,
5173            status="completed",
5174            output_data={
5175                "success": True,
5176                "stdout": (
5177                    "/srv/models/AlphaModel-Q4.foo\n"
5178                    "/srv/models/BetaModel-Q8.foo\n"
5179                    "/tmp/results/summary.json\n"
5180                ),
5181            },
5182        )
5183        db.finish_run(run_id, "completed")
5184
5185        blocked = run_one_step(
5186            job_id,
5187            config=config,
5188            db=db,
5189            llm=ScriptedLLM([
5190                LLMResponse(tool_calls=[
5191                    ToolCall(
5192                        name="record_findings",
5193                        arguments={
5194                            "findings": [
5195                                {
5196                                    "name": "Model files found on disk",
5197                                    "category": "environment",
5198                                    "status": "new",
5199                                    "reason": "Shell search found candidate files, so the next branch can validate them.",
5200                                }
5201                            ]
5202                        },
5203                    )
5204                ])
5205            ]),
5206        )
5207
5208        assert blocked.status == "blocked"
5209        assert blocked.result["error"] == "evidence grounding required"
5210        grounding = blocked.result["evidence_grounding"]
5211        assert "/srv/models/AlphaModel-Q4.foo" in grounding["missing_candidate_paths"]
5212        assert "exact observed candidate paths" in grounding["guidance"]
5213    finally:
5214        db.close()
5215
5216
5217def test_missing_candidate_paths_are_ranked_before_grounding_guidance(tmp_path):
5218    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5219    db = AgentDB(tmp_path / "state.db")
5220    try:
5221        job_id = db.create_job("Validate OmegaModel file before benchmarking", title="omega benchmark", kind="generic")
5222        run_id = db.start_run(job_id, model="test")
5223        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5224        db.finish_step(
5225            step_id,
5226            status="completed",
5227            output_data={
5228                "success": True,
5229                "stdout": "\n".join(
5230                    [f"/srv/models/ggml-vocab-{index}.foo" for index in range(20)]
5231                    + ["/srv/models/OmegaModel-primary.foo"]
5232                ),
5233            },
5234        )
5235        db.finish_run(run_id, "completed")
5236
5237        blocked = run_one_step(
5238            job_id,
5239            config=config,
5240            db=db,
5241            llm=ScriptedLLM([
5242                LLMResponse(tool_calls=[
5243                    ToolCall(
5244                        name="record_findings",
5245                        arguments={
5246                            "findings": [
5247                                {
5248                                    "name": "Candidate files found",
5249                                    "category": "environment",
5250                                    "status": "new",
5251                                    "reason": "A file search found candidate files to validate.",
5252                                }
5253                            ]
5254                        },
5255                    )
5256                ])
5257            ]),
5258        )
5259
5260        assert blocked.status == "blocked"
5261        grounding = blocked.result["evidence_grounding"]
5262        assert grounding["missing_candidate_paths"][0] == "/srv/models/OmegaModel-primary.foo"
5263    finally:
5264        db.close()
5265
5266
5267def test_record_findings_allows_exact_candidate_path_summary(tmp_path):
5268    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5269    db = AgentDB(tmp_path / "state.db")
5270    try:
5271        job_id = db.create_job("Keep file candidate evidence exact", title="path-grounding", kind="generic")
5272        run_id = db.start_run(job_id, model="test")
5273        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5274        db.finish_step(
5275            step_id,
5276            status="completed",
5277            output_data={
5278                "success": True,
5279                "stdout": "/srv/models/AlphaModel-Q4.foo\n/tmp/results/summary.json\n",
5280            },
5281        )
5282        db.finish_run(run_id, "completed")
5283
5284        result = run_one_step(
5285            job_id,
5286            config=config,
5287            db=db,
5288            llm=ScriptedLLM([
5289                LLMResponse(tool_calls=[
5290                    ToolCall(
5291                        name="record_findings",
5292                        arguments={
5293                            "findings": [
5294                                {
5295                                    "name": "Model file candidate",
5296                                    "category": "environment",
5297                                    "status": "new",
5298                                    "reason": "Candidate path /srv/models/AlphaModel-Q4.foo should be validated next.",
5299                                }
5300                            ]
5301                        },
5302                    )
5303                ])
5304            ]),
5305        )
5306
5307        assert result.status == "completed"
5308    finally:
5309        db.close()
5310
5311
5312def test_evidence_grounding_blocks_positive_claim_for_missing_path(tmp_path):
5313    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5314    db = AgentDB(tmp_path / "state.db")
5315    try:
5316        job_id = db.create_job("Verify a generic executable path", title="path polarity", kind="generic")
5317        run_id = db.start_run(job_id, model="test")
5318        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5319        db.finish_step(
5320            step_id,
5321            status="completed",
5322            output_data={
5323                "success": True,
5324                "command": "ls /tmp/tools/build-tool /usr/bin/make",
5325                "stdout": (
5326                    "ls: cannot access '/tmp/tools/build-tool': No such file or directory\n"
5327                    "/usr/bin/make\n"
5328                    "This shell probe checked candidate executable paths before the build step.\n"
5329                ),
5330                "stderr": "",
5331            },
5332        )
5333        db.finish_run(run_id, "completed")
5334
5335        blocked = run_one_step(
5336            job_id,
5337            config=config,
5338            db=db,
5339            llm=ScriptedLLM([
5340                LLMResponse(tool_calls=[
5341                    ToolCall(
5342                        name="record_experiment",
5343                        arguments={
5344                            "title": "Build tool path verification",
5345                            "status": "measured",
5346                            "metric_name": "build_prerequisites",
5347                            "metric_value": 2,
5348                            "metric_unit": "items",
5349                            "result": "Found build tool at /tmp/tools/build-tool and make at /usr/bin/make. Build prerequisites are verified.",
5350                        },
5351                    )
5352                ])
5353            ]),
5354        )
5355
5356        assert blocked.status == "blocked"
5357        assert blocked.result["error"] == "evidence grounding required"
5358        grounding = blocked.result["evidence_grounding"]
5359        assert "/tmp/tools/build-tool" in grounding["unsupported_tokens"]
5360        assert grounding["negative_path_conflicts"][0]["path"] == "/tmp/tools/build-tool"
5361        assert "claims a path or executable is present" in grounding["guidance"]
5362    finally:
5363        db.close()
5364
5365
5366def test_evidence_grounding_checks_later_positive_path_mentions(tmp_path):
5367    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5368    db = AgentDB(tmp_path / "state.db")
5369    try:
5370        job_id = db.create_job("Verify executable path polarity", title="path later mention", kind="generic")
5371        run_id = db.start_run(job_id, model="test")
5372        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5373        db.finish_step(
5374            step_id,
5375            status="completed",
5376            output_data={
5377                "success": True,
5378                "stdout": (
5379                    "ls: cannot access '/tmp/tools/build-tool': No such file or directory\n"
5380                    "The probe also checked unrelated files and returned partial output for review.\n"
5381                ),
5382                "stderr": "",
5383            },
5384        )
5385        db.finish_run(run_id, "completed")
5386
5387        blocked = run_one_step(
5388            job_id,
5389            config=config,
5390            db=db,
5391            llm=ScriptedLLM([
5392                LLMResponse(tool_calls=[
5393                    ToolCall(
5394                        name="record_lesson",
5395                        arguments={
5396                            "category": "constraint",
5397                            "lesson": (
5398                                "candidate path /tmp/tools/build-tool was examined. "
5399                                "The executable is at /tmp/tools/build-tool and should be used for the next build."
5400                            ),
5401                        },
5402                    )
5403                ])
5404            ]),
5405        )
5406
5407        assert blocked.status == "blocked"
5408        assert blocked.result["error"] == "evidence grounding required"
5409        grounding = blocked.result["evidence_grounding"]
5410        assert grounding["negative_path_conflicts"][0]["path"] == "/tmp/tools/build-tool"
5411    finally:
5412        db.close()
5413
5414
5415def test_record_findings_allows_negative_file_pattern_when_evidence_is_negative(tmp_path):
5416    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5417    db = AgentDB(tmp_path / "state.db")
5418    try:
5419        job_id = db.create_job("Keep file discovery evidence exact", title="file-grounding", kind="generic")
5420        run_id = db.start_run(job_id, model="test")
5421        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5422        db.finish_step(
5423            step_id,
5424            status="completed",
5425            output_data={
5426                "success": True,
5427                "stdout": "find: '/tmp/WidgetModel-99.foo': No such file or directory\n",
5428            },
5429        )
5430        db.finish_run(run_id, "completed")
5431
5432        result = run_one_step(
5433            job_id,
5434            config=config,
5435            db=db,
5436            llm=ScriptedLLM([
5437                LLMResponse(tool_calls=[
5438                    ToolCall(
5439                        name="record_findings",
5440                        arguments={
5441                            "findings": [
5442                                {
5443                                    "name": "No .foo file exists in the checked path",
5444                                    "category": "environment_baseline",
5445                                    "status": "confirmed",
5446                                    "reason": "The shell output says the checked .foo path does not exist.",
5447                                }
5448                            ]
5449                        },
5450                    )
5451                ])
5452            ]),
5453        )
5454
5455        assert result.status == "completed"
5456        findings = db.get_job(job_id)["metadata"]["finding_ledger"]
5457        assert findings[0]["name"] == "No .foo file exists in the checked path"
5458    finally:
5459        db.close()
5460
5461
5462def test_run_one_step_marks_contradicted_negative_finding_stale(tmp_path):
5463    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5464    db = AgentDB(tmp_path / "state.db")
5465    try:
5466        job_id = db.create_job("Keep durable findings aligned with fresh evidence", title="stale-finding", kind="generic")
5467        db.append_finding_record(
5468            job_id,
5469            name="No .foo files found",
5470            category="environment_baseline",
5471            reason="Shell search found zero .foo files anywhere in the checked filesystem.",
5472            status="confirmed",
5473        )
5474        run_id = db.start_run(job_id, model="test")
5475        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5476        db.finish_step(
5477            step_id,
5478            status="completed",
5479            output_data={
5480                "success": True,
5481                "stdout": (
5482                    "discovery results from current filesystem scan:\n"
5483                    "/srv/data/WidgetModel-99-Q4.foo\n"
5484                    "/var/cache/other-file.foo\n"
5485                ),
5486            },
5487        )
5488        db.finish_run(run_id, "completed")
5489
5490        result = run_one_step(
5491            job_id,
5492            config=config,
5493            db=db,
5494            llm=ScriptedLLM([
5495                LLMResponse(tool_calls=[
5496                    ToolCall(
5497                        name="record_lesson",
5498                        arguments={
5499                            "category": "strategy",
5500                            "lesson": "Fresh file-discovery evidence should override older absence claims.",
5501                        },
5502                    )
5503                ])
5504            ]),
5505        )
5506
5507        assert result.status == "completed"
5508        job = db.get_job(job_id)
5509        stale_records = job["metadata"].get("stale_negative_records")
5510        assert isinstance(stale_records, list)
5511        assert stale_records[0]["kind"] == "finding"
5512        assert stale_records[0]["token"] == ".foo"
5513
5514        from nipux_cli.worker_prompt_context import _ledgers_for_prompt
5515
5516        ledgers = _ledgers_for_prompt(job)
5517        assert "Contradicted negative findings suppressed" in ledgers
5518        assert "Suppressed 1 stale finding" in ledgers
5519    finally:
5520        db.close()
5521
5522
5523def test_run_one_step_marks_contradicted_negative_memory_node_stale(tmp_path):
5524    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5525    db = AgentDB(tmp_path / "state.db")
5526    try:
5527        job_id = db.create_job("Keep memory aligned with fresh evidence", title="stale-memory", kind="generic")
5528        db.append_memory_graph_records(
5529            job_id,
5530            nodes=[
5531                {
5532                    "key": "fact-no-local-foo",
5533                    "title": "No local foo files",
5534                    "kind": "fact",
5535                    "status": "active",
5536                    "summary": "Filesystem searches for *.foo files return 0 results.",
5537                },
5538                {
5539                    "key": "current-branch",
5540                    "title": "Current branch",
5541                    "kind": "strategy",
5542                    "status": "active",
5543                    "summary": "Use fresh shell evidence before recording durable claims.",
5544                },
5545            ],
5546        )
5547        run_id = db.start_run(job_id, model="test")
5548        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5549        db.finish_step(
5550            step_id,
5551            status="completed",
5552            output_data={
5553                "success": True,
5554                "stdout": (
5555                    "Fresh filesystem discovery found an exact candidate path with enough surrounding context "
5556                    "to count as evidence: /srv/data/WidgetModel-99-Q4.foo\n"
5557                ),
5558            },
5559        )
5560        db.finish_run(run_id, "completed")
5561
5562        result = run_one_step(
5563            job_id,
5564            config=config,
5565            db=db,
5566            llm=ScriptedLLM([
5567                LLMResponse(tool_calls=[
5568                    ToolCall(
5569                        name="record_lesson",
5570                        arguments={
5571                            "category": "strategy",
5572                            "lesson": "Fresh file evidence overrides stale absence memory.",
5573                        },
5574                    )
5575                ])
5576            ]),
5577        )
5578
5579        assert result.status == "completed"
5580        job = db.get_job(job_id)
5581        stale_records = job["metadata"].get("stale_negative_records")
5582        assert any(record["kind"] == "memory_node" and record["record_id"] == "fact-no-local-foo" for record in stale_records)
5583
5584        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
5585        assert "Suppressed 1 stale memory node" in content
5586        assert "No local foo files" not in content
5587        assert "Current branch" in content
5588    finally:
5589        db.close()
5590
5591
5592def test_record_lesson_allows_generic_strategy_without_concrete_facts(tmp_path):
5593    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5594    db = AgentDB(tmp_path / "state.db")
5595    try:
5596        job_id = db.create_job("Improve a workflow", title="lesson-grounding", kind="generic")
5597        run_id = db.start_run(job_id, model="test")
5598        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5599        db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "branch stalled\n"})
5600        db.finish_run(run_id, "completed")
5601
5602        result = run_one_step(
5603            job_id,
5604            config=config,
5605            db=db,
5606            llm=ScriptedLLM([
5607                LLMResponse(tool_calls=[
5608                    ToolCall(
5609                        name="record_lesson",
5610                        arguments={
5611                            "category": "strategy",
5612                            "lesson": "When a branch stalls, pivot to the next measurable action instead of adding more notes.",
5613                        },
5614                    )
5615                ])
5616            ]),
5617        )
5618
5619        assert result.status == "completed"
5620    finally:
5621        db.close()
5622
5623
5624def test_record_lesson_allows_positive_checkpoint_summary_with_new_concrete_terms(tmp_path):
5625    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5626    db = AgentDB(tmp_path / "state.db")
5627    try:
5628        job_id = db.create_job("Summarize a broad checkpoint", title="lesson-grounding", kind="generic")
5629        run_id = db.start_run(job_id, model="test")
5630        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5631        db.finish_step(
5632            step_id,
5633            status="completed",
5634            output_data={"success": True, "stdout": "checkpoint read and accounting required\n"},
5635        )
5636        db.finish_run(run_id, "completed")
5637
5638        result = run_one_step(
5639            job_id,
5640            config=config,
5641            db=db,
5642            llm=ScriptedLLM([
5643                LLMResponse(tool_calls=[
5644                    ToolCall(
5645                        name="record_lesson",
5646                        arguments={
5647                            "category": "memory",
5648                            "lesson": (
5649                                "Recording checkpoint context says PackageManager-42 and RuntimeProbe-7 should stay "
5650                                "available for the next branch, but no final benchmark decision has been made."
5651                            ),
5652                        },
5653                    )
5654                ])
5655            ]),
5656        )
5657
5658        assert result.status == "completed"
5659    finally:
5660        db.close()
5661
5662
5663def test_record_findings_blocks_single_unsupported_identifier(tmp_path):
5664    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5665    db = AgentDB(tmp_path / "state.db")
5666    try:
5667        job_id = db.create_job("Record only observed identifiers", title="grounding", kind="generic")
5668        run_id = db.start_run(job_id, model="test")
5669        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5670        db.finish_step(
5671            step_id,
5672            status="completed",
5673            output_data={
5674                "success": True,
5675                "stdout": (
5676                    "Observed candidate list from tool output. "
5677                    "The source contains AlphaCandidate and BetaCandidate with ordinary text evidence. "
5678                    "No generated opaque identifiers are present in this evidence."
5679                )
5680                * 12,
5681            },
5682        )
5683        db.finish_run(run_id, "completed")
5684
5685        result = run_one_step(
5686            job_id,
5687            config=config,
5688            db=db,
5689            llm=ScriptedLLM([
5690                LLMResponse(tool_calls=[ToolCall(name="record_findings", arguments={
5691                    "findings": [{
5692                        "name": "WWHHH5 generated candidate",
5693                        "category": "test",
5694                        "reason": "Observed candidate list needs follow-up, but this identifier was not in evidence.",
5695                        "status": "new",
5696                    }]
5697                })])
5698            ]),
5699        )
5700
5701        assert result.status == "blocked"
5702        assert result.result["error"] == "evidence grounding required"
5703        assert result.result["evidence_grounding"]["unsupported_tokens"] == ["WWHHH5"]
5704    finally:
5705        db.close()
5706
5707
5708def test_evidence_grounding_ignores_job_context_labels(tmp_path):
5709    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5710    db = AgentDB(tmp_path / "state.db")
5711    try:
5712        job_id = db.create_job(
5713            "Benchmark AlphaModel throughput",
5714            title="alphamodel throughput fixed",
5715            kind="generic",
5716        )
5717        run_id = db.start_run(job_id, model="test")
5718        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5719        db.finish_step(
5720            step_id,
5721            status="completed",
5722            output_data={
5723                "success": True,
5724                "stdout": (
5725                    "Observed benchmark setup is ready. Runtime exists, candidate file exists, "
5726                    "and the next action is a planned baseline measurement. "
5727                )
5728                * 6,
5729            },
5730        )
5731        db.finish_run(run_id, "completed")
5732
5733        result = run_one_step(
5734            job_id,
5735            config=config,
5736            db=db,
5737            llm=ScriptedLLM([
5738                LLMResponse(tool_calls=[ToolCall(name="record_experiment", arguments={
5739                    "title": "Baseline Throughput - AlphaModel",
5740                    "status": "planned",
5741                    "higher_is_better": True,
5742                    "metadata": {"project": "alphamodel-throughput"},
5743                    "next_action": "Run the baseline measurement and record the observed metric.",
5744                })])
5745            ]),
5746        )
5747
5748        assert result.status == "completed"
5749    finally:
5750        db.close()
5751
5752
5753def test_evidence_grounding_blocks_unsupported_numeric_measurements(tmp_path):
5754    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5755    db = AgentDB(tmp_path / "state.db")
5756    try:
5757        job_id = db.create_job("Validate candidate file size", title="size-grounding", kind="generic")
5758        run_id = db.start_run(job_id, model="test")
5759        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5760        db.finish_step(
5761            step_id,
5762            status="completed",
5763            output_data={
5764                "success": True,
5765                "stdout": "-rw-r--r-- 1 user user 12G May 14 /srv/models/AlphaModel-Q4.foo\n",
5766            },
5767        )
5768        db.finish_run(run_id, "completed")
5769
5770        result = run_one_step(
5771            job_id,
5772            config=config,
5773            db=db,
5774            llm=ScriptedLLM([
5775                LLMResponse(tool_calls=[ToolCall(name="record_findings", arguments={
5776                    "findings": [{
5777                        "name": "Candidate file",
5778                        "category": "environment",
5779                        "location": "/srv/models/AlphaModel-Q4.foo",
5780                        "metadata": {"file_size": "16G"},
5781                        "status": "verified",
5782                    }]
5783                })])
5784            ]),
5785        )
5786
5787        assert result.status == "blocked"
5788        assert result.result["error"] == "evidence grounding required"
5789        assert "16G" in result.result["evidence_grounding"]["unsupported_tokens"]
5790    finally:
5791        db.close()
5792
5793
5794def test_evidence_grounding_ignores_record_schema_keys(tmp_path):
5795    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5796    db = AgentDB(tmp_path / "state.db")
5797    try:
5798        job_id = db.create_job("Record observed setup status", title="grounding", kind="generic")
5799        run_id = db.start_run(job_id, model="test")
5800        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5801        db.finish_step(
5802            step_id,
5803            status="completed",
5804            output_data={"success": True, "stdout": "Python 3 is installed. curl is available. No token file was found."},
5805        )
5806        db.finish_run(run_id, "completed")
5807
5808        result = run_one_step(
5809            job_id,
5810            config=config,
5811            db=db,
5812            llm=ScriptedLLM([
5813                LLMResponse(tool_calls=[
5814                    ToolCall(
5815                        name="record_experiment",
5816                        arguments={
5817                            "title": "Setup status",
5818                            "status": "measured",
5819                            "metric_name": "ready_components",
5820                            "metric_value": 1,
5821                            "config": {"python_3_installed": True, "curl_available": True},
5822                            "result": "Python 3 is installed and curl is available.",
5823                            "next_action": "record remaining setup gaps or proceed to the next validation",
5824                        },
5825                    )
5826                ])
5827            ]),
5828        )
5829
5830        assert result.status == "completed"
5831        experiment = db.get_job(job_id)["metadata"]["experiment_ledger"][0]
5832        assert experiment["config"]["python_3_installed"] is True
5833    finally:
5834        db.close()
5835
5836
5837def test_evidence_grounding_uses_durable_finding_location_and_metadata(tmp_path):
5838    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5839    db = AgentDB(tmp_path / "state.db")
5840    try:
5841        job_id = db.create_job("Record known candidate from durable state", title="durable-grounding", kind="generic")
5842        db.append_finding_record(
5843            job_id,
5844            name="Candidate runtime model",
5845            category="environment",
5846            location="/srv/models/AlphaModel-99-Q4.gguf",
5847            reason="Observed candidate model path is ready for later measurement.",
5848            status="available",
5849            metadata={"quantization": "Q4"},
5850        )
5851
5852        result = run_one_step(
5853            job_id,
5854            config=config,
5855            db=db,
5856            llm=ScriptedLLM([
5857                LLMResponse(tool_calls=[
5858                    ToolCall(
5859                        name="record_experiment",
5860                        arguments={
5861                            "title": "Candidate runtime model readiness",
5862                            "status": "measured",
5863                            "metric_name": "candidate_files",
5864                            "metric_value": 1,
5865                            "config": {"model": "/srv/models/AlphaModel-99-Q4.gguf"},
5866                            "result": "Durable finding shows /srv/models/AlphaModel-99-Q4.gguf is available.",
5867                            "next_action": "measure throughput with the durable candidate model",
5868                        },
5869                    )
5870                ])
5871            ]),
5872        )
5873
5874        assert result.status == "completed"
5875        assert result.tool_name == "record_experiment"
5876    finally:
5877        db.close()
5878
5879
5880def test_evidence_grounding_ignores_json_literals_even_when_stale_tokens_exist(tmp_path):
5881    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5882    db = AgentDB(tmp_path / "state.db")
5883    try:
5884        job_id = db.create_job("Record observed benchmark plan", title="literal-grounding", kind="generic")
5885        db.update_job_metadata(job_id, {"unsupported_claim_tokens": ["true"]})
5886        run_id = db.start_run(job_id, model="test")
5887        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5888        db.finish_step(
5889            step_id,
5890            status="completed",
5891            output_data={
5892                "success": True,
5893                "stdout": "Observed benchmark harness is ready and next action is to measure throughput. " * 4,
5894            },
5895        )
5896        db.finish_run(run_id, "completed")
5897
5898        result = run_one_step(
5899            job_id,
5900            config=config,
5901            db=db,
5902            llm=ScriptedLLM([
5903                LLMResponse(tool_calls=[ToolCall(name="record_experiment", arguments={
5904                    "title": "Baseline benchmark plan",
5905                    "status": "planned",
5906                    "higher_is_better": True,
5907                    "metric_name": "throughput",
5908                    "metric_unit": "tokens/sec",
5909                    "next_action": "Run the benchmark and record the observed metric.",
5910                })])
5911            ]),
5912        )
5913
5914        assert result.status == "completed"
5915    finally:
5916        db.close()
5917
5918
5919def test_evidence_grounding_ignores_planning_and_status_labels(tmp_path):
5920    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5921    db = AgentDB(tmp_path / "state.db")
5922    try:
5923        job_id = db.create_job("Record observed build validation", title="status-grounding", kind="generic")
5924        run_id = db.start_run(job_id, model="test")
5925        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5926        db.finish_step(
5927            step_id,
5928            status="completed",
5929            output_data={
5930                "success": True,
5931                "stdout": (
5932                    "Observed file /srv/models/AlphaModel-Q4.foo exists. "
5933                    "The tool output showed rc=0 and the benchmark branch can continue. "
5934                )
5935                * 4,
5936            },
5937        )
5938        db.finish_run(run_id, "completed")
5939
5940        result = run_one_step(
5941            job_id,
5942            config=config,
5943            db=db,
5944            llm=ScriptedLLM([
5945                LLMResponse(tool_calls=[ToolCall(name="record_roadmap", arguments={
5946                    "title": "Build validation roadmap",
5947                    "scope": "Checking the observed candidate before ongoing benchmark work.",
5948                    "milestones": [
5949                        {"title": "P1 validate observed candidate", "status": "active"},
5950                        {"title": "P2 proceed to benchmark", "status": "planned"},
5951                    ],
5952                })])
5953            ]),
5954        )
5955
5956        assert result.status == "completed"
5957    finally:
5958        db.close()
5959
5960
5961def test_run_one_step_blocks_memory_graph_with_unsupported_claims(tmp_path):
5962    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5963    db = AgentDB(tmp_path / "state.db")
5964    try:
5965        job_id = db.create_job("Consolidate observed facts", title="memory-grounding", kind="generic")
5966        run_id = db.start_run(job_id, model="test")
5967        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5968        db.finish_step(
5969            step_id,
5970            status="completed",
5971            output_data={
5972                "success": True,
5973                "stdout": "GPU: AMD Device 7590\nCPU: AMD Ryzen 9 7900X\nMemory: 93Gi\n",
5974            },
5975        )
5976        db.finish_run(run_id, "completed")
5977
5978        result = run_one_step(
5979            job_id,
5980            config=config,
5981            db=db,
5982            llm=ScriptedLLM([
5983                LLMResponse(tool_calls=[
5984                    ToolCall(
5985                        name="record_memory_graph",
5986                        arguments={
5987                            "nodes": [
5988                                {
5989                                    "key": "hardware",
5990                                    "kind": "fact",
5991                                    "title": "NVIDIA GTX 970 CUDA hardware",
5992                                    "summary": "The machine has NVIDIA GTX 970 CUDA hardware.",
5993                                }
5994                            ]
5995                        },
5996                    )
5997                ])
5998            ]),
5999        )
6000
6001        assert result.status == "blocked"
6002        assert result.result["error"] == "evidence grounding required"
6003        assert result.result["blocked_tool"] == "record_memory_graph"
6004    finally:
6005        db.close()
6006
6007
6008def test_run_one_step_allows_memory_graph_identifier_labels_without_evidence(tmp_path):
6009    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6010    db = AgentDB(tmp_path / "state.db")
6011    try:
6012        job_id = db.create_job("Consolidate abstract graph labels", title="memory-grounding", kind="generic")
6013        run_id = db.start_run(job_id, model="test")
6014        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6015        db.finish_step(
6016            step_id,
6017            status="completed",
6018            output_data={
6019                "success": True,
6020                "stdout": (
6021                    "Current observation: AMD Ryzen 9 7900X host with fresh API discovery evidence. "
6022                    "The next branch is to convert existing source evidence into a download decision."
6023                ),
6024            },
6025        )
6026        db.finish_run(run_id, "completed")
6027
6028        result = run_one_step(
6029            job_id,
6030            config=config,
6031            db=db,
6032            llm=ScriptedLLM([
6033                LLMResponse(tool_calls=[
6034                    ToolCall(
6035                        name="record_memory_graph",
6036                        arguments={
6037                            "edges": [
6038                                {
6039                                    "from_key": "decision-q4-km-primary",
6040                                    "relation": "informs",
6041                                    "to_key": "question-download-q4-km-url",
6042                                },
6043                                {
6044                                    "from_key": "skill-api-download-pattern",
6045                                    "relation": "supports",
6046                                    "to_key": "milestone-direct-url-download",
6047                                },
6048                            ]
6049                        },
6050                    )
6051                ])
6052            ]),
6053        )
6054
6055        assert result.status == "completed"
6056        assert result.tool_name == "record_memory_graph"
6057    finally:
6058        db.close()
6059
6060
6061def test_run_one_step_still_blocks_stale_memory_graph_key_claims(tmp_path):
6062    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6063    db = AgentDB(tmp_path / "state.db")
6064    try:
6065        job_id = db.create_job(
6066            "Do not reintroduce stale graph labels",
6067            title="memory-grounding",
6068            kind="generic",
6069            metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6070        )
6071        run_id = db.start_run(job_id, model="test")
6072        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6073        db.finish_step(
6074            step_id,
6075            status="completed",
6076            output_data={
6077                "success": True,
6078                "stdout": (
6079                    "Current observation: AMD Ryzen 9 7900X host with no legacy CPU marker in fresh evidence. "
6080                    "Durable memory must not reuse unsupported old hardware claims."
6081                ),
6082            },
6083        )
6084        db.finish_run(run_id, "completed")
6085
6086        result = run_one_step(
6087            job_id,
6088            config=config,
6089            db=db,
6090            llm=ScriptedLLM([
6091                LLMResponse(tool_calls=[
6092                    ToolCall(
6093                        name="record_memory_graph",
6094                        arguments={
6095                            "edges": [
6096                                {
6097                                    "from_key": "XeonE5-2690",
6098                                    "relation": "constrains",
6099                                    "to_key": "current-plan",
6100                                }
6101                            ]
6102                        },
6103                    )
6104                ])
6105            ]),
6106        )
6107
6108        assert result.status == "blocked"
6109        assert result.result["error"] == "evidence grounding required"
6110        assert "XeonE5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6111    finally:
6112        db.close()
6113
6114
6115def test_run_one_step_allows_memory_graph_grounded_in_durable_records(tmp_path):
6116    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6117    db = AgentDB(tmp_path / "state.db")
6118    try:
6119        job_id = db.create_job("Consolidate durable facts", title="memory-grounded-ledger", kind="generic")
6120        db.append_finding_record(
6121            job_id,
6122            name="Artifact cache includes Package_A-2.7.1 and backend XYZ123",
6123            category="environment_fact",
6124            reason="A saved checkpoint established Package_A-2.7.1 and backend XYZ123 as available options.",
6125            metadata={"evidence_artifact": "art_env"},
6126        )
6127
6128        result = run_one_step(
6129            job_id,
6130            config=config,
6131            db=db,
6132            llm=ScriptedLLM([
6133                LLMResponse(tool_calls=[
6134                    ToolCall(
6135                        name="record_memory_graph",
6136                        arguments={
6137                            "nodes": [
6138                                {
6139                                    "key": "package-a",
6140                                    "kind": "fact",
6141                                    "title": "Package_A-2.7.1 via backend XYZ123",
6142                                    "summary": "Durable finding says Package_A-2.7.1 is available through backend XYZ123.",
6143                                }
6144                            ]
6145                        },
6146                    )
6147                ])
6148            ]),
6149        )
6150
6151        assert result.status == "completed"
6152        assert result.tool_name == "record_memory_graph"
6153    finally:
6154        db.close()
6155
6156
6157def test_run_one_step_blocks_memory_graph_grounded_only_in_stale_records(tmp_path):
6158    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6159    db = AgentDB(tmp_path / "state.db")
6160    try:
6161        job_id = db.create_job(
6162            "Consolidate durable facts",
6163            title="memory-stale-ledger",
6164            kind="generic",
6165            metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6166        )
6167        db.append_finding_record(
6168            job_id,
6169            name="Artifact cache includes XeonE5-2690",
6170            category="environment_fact",
6171            reason="Older ledger record mentioned XeonE5-2690.",
6172            metadata={"evidence_artifact": "art_old"},
6173        )
6174
6175        result = run_one_step(
6176            job_id,
6177            config=config,
6178            db=db,
6179            llm=ScriptedLLM([
6180                LLMResponse(tool_calls=[
6181                    ToolCall(
6182                        name="record_memory_graph",
6183                        arguments={
6184                            "nodes": [
6185                                {
6186                                    "key": "package-a",
6187                                    "kind": "fact",
6188                                    "title": "XeonE5-2690",
6189                                    "summary": "XeonE5-2690 is still valid.",
6190                                }
6191                            ]
6192                        },
6193                    )
6194                ])
6195            ]),
6196        )
6197
6198        assert result.status == "blocked"
6199        assert result.result["error"] == "evidence grounding required"
6200        assert "XeonE5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6201    finally:
6202        db.close()
6203
6204
6205def test_run_one_step_allows_stale_token_when_fresh_evidence_revalidates_it(tmp_path):
6206    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6207    db = AgentDB(tmp_path / "state.db")
6208    try:
6209        job_id = db.create_job(
6210            "Revalidate durable facts",
6211            title="memory-stale-revalidated",
6212            kind="generic",
6213            metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6214        )
6215        run_id = db.start_run(job_id, model="test")
6216        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6217        db.finish_step(
6218            step_id,
6219            status="completed",
6220            output_data={"success": True, "stdout": "Fresh probe: CPU marker XeonE5-2690 is visible in this environment."},
6221        )
6222        db.finish_run(run_id, "completed")
6223
6224        result = run_one_step(
6225            job_id,
6226            config=config,
6227            db=db,
6228            llm=ScriptedLLM([
6229                LLMResponse(tool_calls=[
6230                    ToolCall(
6231                        name="record_memory_graph",
6232                        arguments={
6233                            "nodes": [
6234                                {
6235                                    "key": "fresh-cpu",
6236                                    "kind": "fact",
6237                                    "title": "XeonE5-2690",
6238                                    "summary": "Fresh shell evidence revalidated XeonE5-2690.",
6239                                }
6240                            ]
6241                        },
6242                    )
6243                ])
6244            ]),
6245        )
6246
6247        assert result.status == "completed"
6248        assert result.tool_name == "record_memory_graph"
6249    finally:
6250        db.close()
6251
6252
6253def test_run_one_step_allows_durable_records_grounded_in_read_artifact(tmp_path):
6254    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6255    db = AgentDB(tmp_path / "state.db")
6256    try:
6257        job_id = db.create_job("Record facts from saved evidence", title="grounded-read", kind="generic")
6258        run_id = db.start_run(job_id, model="test")
6259        step_id = db.add_step(
6260            job_id=job_id,
6261            run_id=run_id,
6262            kind="tool",
6263            tool_name="read_artifact",
6264            input_data={"arguments": {"artifact_id": "art_checkpoint"}},
6265        )
6266        db.finish_step(
6267            step_id,
6268            status="completed",
6269            output_data={
6270                "success": True,
6271                "content": (
6272                    "Environment evidence: CPU Intel Xeon E5-2690 v3, architecture x86_64, "
6273                    "memory 62.8G total, no NVIDIA GPU visible from nvidia-smi. "
6274                    "This content is the source for durable records."
6275                ),
6276            },
6277        )
6278        db.finish_run(run_id, "completed")
6279
6280        result = run_one_step(
6281            job_id,
6282            config=config,
6283            db=db,
6284            llm=ScriptedLLM([
6285                LLMResponse(tool_calls=[
6286                    ToolCall(
6287                        name="record_findings",
6288                        arguments={
6289                            "findings": [
6290                                {
6291                                    "name": "Intel Xeon E5-2690 v3 x86_64 environment",
6292                                    "category": "hardware_fact",
6293                                    "reason": "Saved checkpoint states CPU Intel Xeon E5-2690 v3, x86_64, memory 62.8G total, and no NVIDIA GPU visible.",
6294                                    "evidence_artifact": "art_checkpoint",
6295                                }
6296                            ]
6297                        },
6298                    )
6299                ])
6300            ]),
6301        )
6302
6303        assert result.status == "completed"
6304        findings = db.get_job(job_id)["metadata"]["finding_ledger"]
6305        assert findings[0]["name"] == "Intel Xeon E5-2690 v3 x86_64 environment"
6306    finally:
6307        db.close()
6308
6309
6310def test_run_one_step_scopes_grounding_to_cited_step(tmp_path):
6311    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6312    db = AgentDB(tmp_path / "state.db")
6313    try:
6314        job_id = db.create_job("Record facts from cited evidence", title="cited-grounding", kind="generic")
6315        run_id = db.start_run(job_id, model="test")
6316        old_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6317        db.finish_step(
6318            old_step,
6319            status="completed",
6320            output_data={
6321                "success": True,
6322                "stdout": (
6323                    "Old evidence: Intel Xeon E5-2690 v3 with 62.8G memory. "
6324                    "This is intentionally stale evidence from an earlier step and should not validate step #2."
6325                ),
6326            },
6327        )
6328        new_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6329        db.finish_step(
6330            new_step,
6331            status="completed",
6332            output_data={
6333                "success": True,
6334                "stdout": (
6335                    "Current evidence: AMD Ryzen 9 7900X with 93Gi memory and AMD GPU. "
6336                    "This newer cited step is the only source that should ground claims citing step #2."
6337                ),
6338            },
6339        )
6340        db.finish_run(run_id, "completed")
6341
6342        result = run_one_step(
6343            job_id,
6344            config=config,
6345            db=db,
6346            llm=ScriptedLLM([
6347                LLMResponse(tool_calls=[
6348                    ToolCall(
6349                        name="write_artifact",
6350                        arguments={
6351                            "title": "Cited baseline",
6352                            "summary": "Baseline from step #2.",
6353                            "content": "From step #2: Intel Xeon E5-2690 v3 with 62.8G memory.",
6354                            "artifact_type": "text",
6355                        },
6356                    )
6357                ])
6358            ]),
6359        )
6360
6361        assert result.status == "blocked"
6362        assert result.result["error"] == "evidence grounding required"
6363        assert "E5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6364        assert result.result["evidence_grounding"]["evidence_steps"] == [2]
6365        assert "E5-2690" in db.get_job(job_id)["metadata"]["unsupported_claim_tokens"]
6366    finally:
6367        db.close()
6368
6369
6370def test_cited_step_numbers_ignore_ordinal_hash_labels():
6371    text = (
6372        "llama.cpp Build Attempt #3 should not cite old evidence. "
6373        "Use step #42 and shell_exec_step_1037 if explicit evidence is needed. "
6374        "The older step-2678 reference is also explicit."
6375    )
6376
6377    assert _cited_step_numbers(text) == {42, 1037, 2678}
6378
6379
6380def test_prompt_shows_evidence_grounding_tokens_after_block(tmp_path):
6381    db = AgentDB(tmp_path / "state.db")
6382    try:
6383        job_id = db.create_job("Use only observed evidence", title="grounding-prompt", kind="generic")
6384        run_id = db.start_run(job_id, model="test")
6385        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
6386        db.finish_step(
6387            step_id,
6388            status="blocked",
6389            output_data={
6390                "success": True,
6391                "recoverable": True,
6392                "error": "evidence grounding required",
6393                "evidence_grounding": {"unsupported_tokens": ["NVIDIA", "Xeon", "AVX-512"]},
6394            },
6395        )
6396        job = db.get_job(job_id)
6397        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6398
6399        assert "unsupported=NVIDIA, Xeon, AVX-512" in content
6400        assert "use only tokens present in recent observed evidence" in content
6401    finally:
6402        db.close()
6403
6404
6405def test_prompt_shows_missing_candidate_paths_after_grounding_block(tmp_path):
6406    db = AgentDB(tmp_path / "state.db")
6407    try:
6408        job_id = db.create_job("Optimize benchmark speed with exact file evidence", title="grounding-paths", kind="generic")
6409        run_id = db.start_run(job_id, model="test")
6410        for index in range(18):
6411            shell_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6412            db.finish_step(shell_step, status="completed", output_data={"success": True, "stdout": f"probe {index}"})
6413        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
6414        db.finish_step(
6415            step_id,
6416            status="blocked",
6417            summary="blocked record_experiment; evidence grounding required",
6418            output_data={
6419                "success": True,
6420                "recoverable": True,
6421                "error": "evidence grounding required",
6422                "evidence_grounding": {
6423                    "missing_candidate_paths": [
6424                        "/srv/models/AlphaModel-Q4.foo",
6425                        "/srv/models/BetaModel-Q8.foo",
6426                    ],
6427                    "unsupported_tokens": [
6428                        "/srv/models/AlphaModel-Q4.foo",
6429                        "/srv/models/BetaModel-Q8.foo",
6430                    ],
6431                },
6432            },
6433        )
6434        job = db.get_job(job_id)
6435        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6436
6437        assert "Recent evidence grounding blocked a durable record" in content
6438        assert "This job needs measured progress" not in content
6439        assert "/srv/models/AlphaModel-Q4.foo" in content
6440        assert "rewrite the durable record with exact observed paths" in content
6441    finally:
6442        db.close()
6443
6444
6445def test_prompt_adds_ranked_current_candidates_to_stale_grounding_block(tmp_path):
6446    db = AgentDB(tmp_path / "state.db")
6447    try:
6448        job_id = db.create_job("Benchmark OmegaModel throughput", title="grounding-current-candidates", kind="generic")
6449        db.update_job_metadata(
6450            job_id,
6451            {
6452                "task_queue": [
6453                    {
6454                        "title": "Validate OmegaModel file path",
6455                        "status": "open",
6456                        "contract": "experiment",
6457                        "acceptance_criteria": "Use a validated candidate path.",
6458                        "evidence_needed": "Shell output with file size and benchmark result.",
6459                    }
6460                ]
6461            },
6462        )
6463        run_id = db.start_run(job_id, model="test")
6464        shell_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6465        db.finish_step(
6466            shell_step,
6467            status="completed",
6468            output_data={
6469                "success": True,
6470                "stdout": (
6471                    "/tmp/aux/ggml-vocab-alpha.foo\n"
6472                    "/srv/models/OmegaModel-primary.foo\n"
6473                ),
6474            },
6475        )
6476        block_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
6477        db.finish_step(
6478            block_step,
6479            status="blocked",
6480            summary="blocked record_experiment; evidence grounding required",
6481            output_data={
6482                "success": True,
6483                "recoverable": True,
6484                "error": "evidence grounding required",
6485                "evidence_grounding": {
6486                    "missing_candidate_paths": ["/tmp/aux/ggml-vocab-alpha.foo"],
6487                    "unsupported_tokens": ["/tmp/aux/ggml-vocab-alpha.foo"],
6488                },
6489            },
6490        )
6491
6492        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6493        idx = content.index("Next-action constraint:")
6494        next_constraint = content[idx: idx + 1200]
6495
6496        assert "current ranked candidate paths are available" in next_constraint
6497        ranked_text = next_constraint[next_constraint.index("Candidate paths:"):]
6498        assert ranked_text.index("/srv/models/OmegaModel-primary.foo") < ranked_text.index("/tmp/aux/ggml-vocab-alpha.foo")
6499    finally:
6500        db.close()
6501
6502
6503def test_prompt_does_not_resurface_grounding_block_after_durable_resolution(tmp_path):
6504    db = AgentDB(tmp_path / "state.db")
6505    try:
6506        job_id = db.create_job("Use exact file evidence", title="grounding-resolved", kind="generic")
6507        run_id = db.start_run(job_id, model="test")
6508        block_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
6509        db.finish_step(
6510            block_step,
6511            status="blocked",
6512            summary="blocked record_findings; evidence grounding required",
6513            output_data={
6514                "success": True,
6515                "recoverable": True,
6516                "error": "evidence grounding required",
6517                "evidence_grounding": {
6518                    "missing_candidate_paths": ["/srv/models/AlphaModel-Q4.foo"],
6519                    "unsupported_tokens": ["/srv/models/AlphaModel-Q4.foo"],
6520                },
6521            },
6522        )
6523        resolved_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
6524        db.finish_step(
6525            resolved_step,
6526            status="completed",
6527            output_data={
6528                "success": True,
6529                "findings": [{"name": "Exact path accounted", "reason": "/srv/models/AlphaModel-Q4.foo was validated."}],
6530            },
6531        )
6532        db.finish_run(run_id, "completed")
6533
6534        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6535        next_action = content.split("Next-action constraint:", 1)[1].split("\n\n", 1)[0]
6536
6537        assert "Recent evidence grounding blocked a durable record" not in content
6538        assert "/srv/models/AlphaModel-Q4.foo" not in next_action
6539    finally:
6540        db.close()
6541
6542
6543def test_prompt_suppresses_findings_matching_stale_claim_tokens(tmp_path):
6544    db = AgentDB(tmp_path / "state.db")
6545    try:
6546        job_id = db.create_job("Prefer current durable evidence", title="stale-ledger", kind="generic")
6547        db.append_finding_record(job_id, name="Intel Xeon E5-2690 v3 baseline", category="hardware")
6548        db.append_finding_record(job_id, name="AMD Ryzen 9 7900X baseline", category="hardware")
6549        db.append_lesson(
6550            job_id,
6551            "Evidence grounding rejected unsupported concrete tokens for record_experiment: E5-2690, v3, RAM. Treat matching prior ledger claims as stale.",
6552            category="mistake",
6553        )
6554
6555        job = db.get_job(job_id)
6556        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6557
6558        assert "Unsupported/stale claim tokens to avoid until re-verified: [unsupported-stale-claim]" in content
6559        assert "Suppressed 1 stale finding" in content
6560        assert "AMD Ryzen 9 7900X baseline" in content
6561        assert "Intel Xeon E5-2690 v3 baseline" not in content
6562    finally:
6563        db.close()
6564
6565
6566def test_prompt_prioritizes_validation_for_recent_candidate_file_paths(tmp_path):
6567    db = AgentDB(tmp_path / "state.db")
6568    try:
6569        job_id = db.create_job("Validate a discovered runtime file", title="candidate-file", kind="generic")
6570        db.update_job_metadata(
6571            job_id,
6572            {
6573                "task_queue": [
6574                    {
6575                        "title": "Run baseline benchmark with the discovered file",
6576                        "status": "open",
6577                        "contract": "experiment",
6578                        "acceptance_criteria": "Benchmark command uses a validated file path.",
6579                        "evidence_needed": "Shell output showing file size and benchmark result.",
6580                    }
6581                ]
6582            },
6583        )
6584        run_id = db.start_run(job_id, model="test")
6585        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6586        db.finish_step(
6587            step_id,
6588            status="completed",
6589            output_data={
6590                "success": True,
6591                "stdout": (
6592                    "candidate files:\n"
6593                    "/srv/models/ExampleModel-Q4.foo\n"
6594                    "/srv/models/sidecar.txt\n"
6595                ),
6596            },
6597        )
6598        db.finish_run(run_id, "completed")
6599
6600        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6601
6602        assert "Candidate file discovery:" in content
6603        assert "/srv/models/ExampleModel-Q4.foo" in content
6604        assert "Validate likely candidates with shell_exec" in content
6605        assert "Do not reject a non-empty candidate binary from `file` output alone" in content
6606    finally:
6607        db.close()
6608
6609
6610def test_prompt_deprioritizes_recent_stub_candidate_file_paths(tmp_path):
6611    db = AgentDB(tmp_path / "state.db")
6612    try:
6613        job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6614        db.update_job_metadata(
6615            job_id,
6616            {
6617                "task_queue": [
6618                    {
6619                        "title": "Validate candidate model file before benchmark",
6620                        "status": "open",
6621                        "contract": "experiment",
6622                        "acceptance_criteria": "Benchmark uses a validated model file.",
6623                        "evidence_needed": "Shell output showing file size and parser/header status.",
6624                    }
6625                ]
6626            },
6627        )
6628        run_id = db.start_run(job_id, model="test")
6629        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6630        db.finish_step(
6631            step_id,
6632            status="completed",
6633            output_data={
6634                "success": True,
6635                "stdout": (
6636                    "-rw-r--r-- 1 user user 29 May 15 10:00 /tmp/models/AlphaModel-Q4.foo\n"
6637                    "/tmp/models/AlphaModel-Q4.foo: ASCII text, with no line terminators\n"
6638                    "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n"
6639                ),
6640            },
6641        )
6642        db.finish_run(run_id, "completed")
6643
6644        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6645
6646        idx = content.index("Next-action constraint:")
6647        next_constraint = content[idx: idx + 1400]
6648        assert "/srv/models/AlphaModel-IQ3.foo" in next_constraint
6649        assert "/tmp/models/AlphaModel-Q4.foo" not in next_constraint
6650        assert "Recently invalid or stub-like candidates" in content
6651        assert "/tmp/models/AlphaModel-Q4.foo" in content
6652    finally:
6653        db.close()
6654
6655
6656def test_prompt_isolates_current_execution_focus_for_candidate_validation(tmp_path):
6657    db = AgentDB(tmp_path / "state.db")
6658    try:
6659        job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6660        db.update_job_metadata(
6661            job_id,
6662            {
6663                "task_queue": [
6664                    {
6665                        "title": f"Old branch {index}",
6666                        "status": "open",
6667                        "priority": index,
6668                    }
6669                    for index in range(82)
6670                ] + [
6671                    {
6672                        "title": "Validate AlphaModel candidate file before benchmark",
6673                        "status": "active",
6674                        "priority": 100,
6675                        "contract": "experiment",
6676                        "acceptance_criteria": "Validated candidate file is used in a measurement.",
6677                        "evidence_needed": "Shell output with candidate file size and benchmark result.",
6678                    }
6679                ]
6680            },
6681        )
6682        run_id = db.start_run(job_id, model="test")
6683        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6684        db.finish_step(
6685            step_id,
6686            status="failed",
6687            output_data={
6688                "success": False,
6689                "stdout": "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n",
6690                "stderr": "ls: cannot access '/tmp/models/AlphaModel-Q4.foo': No such file or directory\n",
6691            },
6692        )
6693        db.finish_run(run_id, "failed")
6694
6695        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6696
6697        focus = content[content.index("Current execution focus:"): content.index("Pending measurement obligation:")]
6698        assert "phase=execute_with_validated_candidate" in focus
6699        assert "Use the recently validated candidate path: /srv/models/AlphaModel-IQ3.foo" in focus
6700        assert "backlog=83 tasks" in focus  # 82 open "Old branch" tasks + 1 active task
6701        assert "Treat it as advisory" in focus
6702        next_constraint = content[content.index("Next-action constraint:"):]
6703        assert "/srv/models/AlphaModel-IQ3.foo" in next_constraint
6704        assert "/tmp/models/AlphaModel-Q4.foo" not in next_constraint
6705    finally:
6706        db.close()
6707
6708
6709def test_prompt_moves_from_candidate_validation_to_candidate_use_after_positive_evidence(tmp_path):
6710    db = AgentDB(tmp_path / "state.db")
6711    try:
6712        job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6713        db.update_job_metadata(
6714            job_id,
6715            {
6716                "task_queue": [
6717                    {
6718                        "title": "Run benchmark with validated AlphaModel file",
6719                        "status": "active",
6720                        "priority": 100,
6721                        "contract": "experiment",
6722                        "acceptance_criteria": "Benchmark command uses a validated file path.",
6723                        "evidence_needed": "Shell output showing file size and benchmark result.",
6724                    }
6725                ]
6726            },
6727        )
6728        run_id = db.start_run(job_id, model="test")
6729        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6730        db.finish_step(
6731            step_id,
6732            status="completed",
6733            output_data={
6734                "success": True,
6735                "stdout": "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n",
6736            },
6737        )
6738        db.finish_run(run_id, "completed")
6739
6740        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6741
6742        focus = content[content.index("Current execution focus:"): content.index("Pending measurement obligation:")]
6743        assert "phase=execute_with_validated_candidate" in focus
6744        assert "Use the recently validated candidate path: /srv/models/AlphaModel-IQ3.foo" in focus
6745        next_constraint = content[content.index("Next-action constraint:"):]
6746        assert "Use it in the next bounded action or measurement" in next_constraint
6747        assert "repeating existence checks" in next_constraint
6748    finally:
6749        db.close()
6750
6751
6752def test_prompt_ranks_context_matching_candidate_paths_before_auxiliary_files(tmp_path):
6753    db = AgentDB(tmp_path / "state.db")
6754    try:
6755        job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6756        db.update_job_metadata(
6757            job_id,
6758            {
6759                "task_queue": [
6760                    {
6761                        "title": "Validate candidate file path before benchmark",
6762                        "status": "open",
6763                        "contract": "experiment",
6764                        "acceptance_criteria": "Validated primary file is used in a measurement.",
6765                        "evidence_needed": "Shell output with file size and benchmark result.",
6766                    }
6767                ]
6768            },
6769        )
6770        run_id = db.start_run(job_id, model="test")
6771        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6772        db.finish_step(
6773            step_id,
6774            status="completed",
6775            output_data={
6776                "success": True,
6777                "stdout": (
6778                    "/srv/models/ggml-vocab-alpha.foo\n"
6779                    "/srv/models/sidecar-mmproj-alpha.foo\n"
6780                    "/srv/models/AlphaModel-Q4.foo\n"
6781                ),
6782            },
6783        )
6784        db.finish_run(run_id, "completed")
6785
6786        job = db.get_job(job_id)
6787        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6788        ranked = _rank_candidate_file_paths(
6789            job,
6790            "Validate candidate file path before benchmark",
6791            [
6792                "/srv/models/ggml-vocab-alpha.foo",
6793                "/srv/models/sidecar-mmproj-alpha.foo",
6794                "/srv/models/AlphaModel-Q4.foo",
6795            ],
6796        )
6797
6798        section = content[content.index("Candidate file discovery:"): content.index("Measured progress guard:")]
6799        assert "Candidate paths:" in section
6800        assert ranked[0] == "/srv/models/AlphaModel-Q4.foo"
6801        assert "/srv/models/Alp" in section
6802        assert "This supersedes stale no-candidate/no-file memory" in section
6803        assert "header/signature bytes" in section
6804    finally:
6805        db.close()
6806
6807
6808def test_next_action_prioritizes_candidate_file_validation_over_download_retry(tmp_path):
6809    db = AgentDB(tmp_path / "state.db")
6810    try:
6811        job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6812        db.update_job_metadata(
6813            job_id,
6814            {
6815                "task_queue": [
6816                    {
6817                        "title": "Run baseline benchmark with the discovered file",
6818                        "status": "open",
6819                        "contract": "experiment",
6820                        "acceptance_criteria": "Benchmark command uses a validated file path.",
6821                        "evidence_needed": "Shell output showing file size and benchmark result.",
6822                    }
6823                ],
6824                "experiment_ledger": [
6825                    {
6826                        "title": "Remote download failed",
6827                        "status": "failed",
6828                        "metric_name": "downloaded_files",
6829                        "metric_value": 0,
6830                        "next_action": "Record tasks to explore alternative download methods and remote sources.",
6831                    }
6832                ],
6833            },
6834        )
6835        run_id = db.start_run(job_id, model="test")
6836        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6837        db.finish_step(
6838            step_id,
6839            status="completed",
6840            output_data={
6841                "success": True,
6842                "stdout": "/srv/models/AlphaModel-Q4.foo\n",
6843            },
6844        )
6845        db.finish_run(run_id, "completed")
6846
6847        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6848
6849        idx = content.index("Next-action constraint:")
6850        next_constraint = content[idx: idx + 900]
6851        assert "Concrete candidate file paths are available" in next_constraint
6852        assert "/srv/models/AlphaModel-Q4.foo" in next_constraint
6853        assert "alternative download methods" not in next_constraint
6854    finally:
6855        db.close()
6856
6857
6858def test_prompt_ranks_late_candidate_paths_from_large_shell_listing(tmp_path):
6859    db = AgentDB(tmp_path / "state.db")
6860    try:
6861        job_id = db.create_job("Benchmark OmegaModel throughput", title="omega benchmark", kind="generic")
6862        db.update_job_metadata(
6863            job_id,
6864            {
6865                "task_queue": [
6866                    {
6867                        "title": "Validate OmegaModel candidate file before measurement",
6868                        "status": "open",
6869                        "contract": "experiment",
6870                        "acceptance_criteria": "Validated candidate file is used in a measurement.",
6871                        "evidence_needed": "Shell output with candidate file size and benchmark result.",
6872                    }
6873                ]
6874            },
6875        )
6876        run_id = db.start_run(job_id, model="test")
6877        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6878        db.finish_step(
6879            step_id,
6880            status="completed",
6881            output_data={
6882                "success": True,
6883                "stdout": "\n".join(
6884                    [f"/srv/models/ggml-vocab-{index}.foo" for index in range(30)]
6885                    + ["/srv/models/OmegaModel-primary.foo"]
6886                ),
6887            },
6888        )
6889        db.finish_run(run_id, "completed")
6890
6891        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6892        section = content[content.index("Candidate file discovery:"): content.index("Measured progress guard:")]
6893
6894        assert "/srv/models/Ome" in section
6895    finally:
6896        db.close()
6897
6898
6899def test_prompt_prioritizes_structured_candidate_file_paths(tmp_path):
6900    db = AgentDB(tmp_path / "state.db")
6901    try:
6902        job_id = db.create_job("Validate a discovered remote file", title="candidate-file", kind="generic")
6903        db.update_job_metadata(
6904            job_id,
6905            {
6906                "task_queue": [
6907                    {
6908                        "title": "Download and validate a candidate file",
6909                        "status": "open",
6910                        "contract": "action",
6911                        "acceptance_criteria": "A candidate file path is selected and validated before use.",
6912                        "evidence_needed": "Shell output with size, hash, or validation metadata.",
6913                    }
6914                ]
6915            },
6916        )
6917        run_id = db.start_run(job_id, model="test")
6918        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6919        db.finish_step(
6920            step_id,
6921            status="completed",
6922            output_data={
6923                "success": True,
6924                "stdout": (
6925                    '[{"type":"file","size":123456789,"path":"ExampleModel-Q4.foo"},'
6926                    '{"type":"file","size":42,"path":".gitattributes"}]'
6927                ),
6928            },
6929        )
6930        db.finish_run(run_id, "completed")
6931
6932        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6933
6934        assert "Candidate file discovery:" in content
6935        assert "ExampleModel-Q4.foo" in content
6936        assert "Validate likely candidates with shell_exec" in content
6937    finally:
6938        db.close()
6939
6940
6941def test_prompt_filters_truncated_and_url_like_candidate_file_paths(tmp_path):
6942    db = AgentDB(tmp_path / "state.db")
6943    try:
6944        job_id = db.create_job("Validate concrete local candidates", title="candidate-file", kind="generic")
6945        db.update_job_metadata(
6946            job_id,
6947            {
6948                "task_queue": [
6949                    {
6950                        "title": "Validate remembered file path",
6951                        "status": "open",
6952                        "contract": "action",
6953                        "acceptance_criteria": "A candidate file path is validated before use.",
6954                        "evidence_needed": "Shell output with file size or hash.",
6955                    }
6956                ],
6957                "experiment_ledger": [
6958                    {
6959                        "title": "Prior candidate discovery",
6960                        "result": "Avoid pseudo-paths like //example.com and truncated paths like /tmp/...",
6961                        "next_action": "Validate /opt/models/ConcreteModel-Q4.foo before declaring no usable file.",
6962                    }
6963                ],
6964            },
6965        )
6966
6967        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6968
6969        assert "Candidate file discovery:" in content
6970        assert "/opt/models/ConcreteModel-Q4.foo" in content
6971        assert "//example.com" not in content
6972        assert "/tmp/..." not in content
6973    finally:
6974        db.close()
6975
6976
6977def test_candidate_path_extraction_stops_at_escaped_newline_metadata():
6978    text = (
6979        'output="/srv/models/AlphaModel-Q4.foo\\n-rw-rw-r-- owner size" '
6980        '{"path": "/srv/models/BetaModel-Q8.foo\\n-rw-rw-r--"}'
6981    )
6982
6983    paths = _extract_candidate_file_paths(text)
6984
6985    assert "/srv/models/AlphaModel-Q4.foo" in paths
6986    assert "/srv/models/BetaModel-Q8.foo" in paths
6987    assert all("\\n-rw-rw-r--" not in path for path in paths)
6988
6989
6990def test_candidate_path_extraction_skips_globs_and_truncated_fragments():
6991    text = (
6992        "/srv/models/*.foo\n"
6993        "/srv/models/AlphaModel-Q4.foo\n"
6994        "/srv/models/AlphaModel-Q4\n"
6995        "/srv/models/AlphaModel-v1.2-UnfinishedFrag\n"
6996        "/srv/models/BetaModel-v1.2-Q8.foo\n"
6997    )
6998
6999    paths = _extract_candidate_file_paths(text)
7000
7001    assert "/srv/models/AlphaModel-Q4.foo" in paths
7002    assert "/srv/models/BetaModel-v1.2-Q8.foo" in paths
7003    assert "/srv/models/*.foo" not in paths
7004    assert "/srv/models/AlphaModel-Q4" not in paths
7005    assert "/srv/models/AlphaModel-v1.2-UnfinishedFrag" not in paths
7006
7007
7008def test_prompt_resurfaces_durable_candidate_file_paths(tmp_path):
7009    db = AgentDB(tmp_path / "state.db")
7010    try:
7011        job_id = db.create_job("Validate a remembered file candidate", title="durable-candidate", kind="generic")
7012        db.update_job_metadata(
7013            job_id,
7014            {
7015                "task_queue": [
7016                    {
7017                        "title": "Validate remembered file path",
7018                        "status": "open",
7019                        "contract": "action",
7020                        "acceptance_criteria": "A candidate file path is validated before use.",
7021                        "evidence_needed": "Shell output with file size or hash.",
7022                    }
7023                ],
7024                "experiment_ledger": [
7025                    {
7026                        "title": "Prior candidate discovery",
7027                        "result": "A previous branch listed /opt/models/Remembered-Model-Q4.foo as a candidate.",
7028                        "next_action": "Validate /opt/models/Remembered-Model-Q4.foo before declaring no usable file.",
7029                    }
7030                ],
7031            },
7032        )
7033
7034        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7035
7036        assert "Candidate file discovery:" in content
7037        assert "Durable records mention candidate file paths" in content
7038        assert "/opt/models/Remembered-Model-Q4.foo" in content
7039        assert "Treat durable-record candidates as candidates until revalidated" in content
7040    finally:
7041        db.close()
7042
7043
7044def test_prompt_resurfaces_candidate_paths_from_recent_grounding_block(tmp_path):
7045    db = AgentDB(tmp_path / "state.db")
7046    try:
7047        job_id = db.create_job("Validate a candidate file", title="candidate-file", kind="generic")
7048        db.update_job_metadata(
7049            job_id,
7050            {
7051                "task_queue": [
7052                    {
7053                        "title": "Validate candidate file path",
7054                        "status": "open",
7055                        "contract": "action",
7056                        "acceptance_criteria": "A candidate file path is validated before use.",
7057                        "evidence_needed": "Shell output with file size or hash.",
7058                    }
7059                ]
7060            },
7061        )
7062        run_id = db.start_run(job_id, model="test")
7063        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
7064        db.finish_step(
7065            step_id,
7066            status="blocked",
7067            output_data={
7068                "success": True,
7069                "error": "evidence grounding required",
7070                "evidence_grounding": {
7071                    "missing_candidate_paths": [
7072                        "/srv/models/ExactModel-Q4.foo",
7073                        "/srv/models/*.foo",
7074                        "/srv/models/Fragment-v1.2-Unfinished",
7075                    ]
7076                },
7077            },
7078            summary="blocked record_findings; evidence grounding required",
7079        )
7080        db.finish_run(run_id, "blocked")
7081
7082        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7083
7084        assert "Candidate file discovery:" in content
7085        assert "/srv/models/ExactModel-Q4.foo" in content
7086        assert "/srv/models/*.foo" not in content  # glob pattern, not an exact candidate path
7087        assert "/srv/models/Fragment-v1.2-Unfinished" not in content
7088    finally:
7089        db.close()
7090
7091
7092def test_grounding_uses_recent_missing_candidate_paths_after_raw_evidence_ages(tmp_path):
7093    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7094    db = AgentDB(tmp_path / "state.db")
7095    try:
7096        job_id = db.create_job("Validate a candidate file", title="candidate-file", kind="generic")
7097        db.update_job_metadata(
7098            job_id,
7099            {
7100                "task_queue": [
7101                    {
7102                        "title": "Validate candidate file path",
7103                        "status": "open",
7104                        "contract": "experiment",
7105                        "acceptance_criteria": "A candidate file path is validated before use.",
7106                        "evidence_needed": "Shell output with file size or hash.",
7107                    }
7108                ]
7109            },
7110        )
7111        run_id = db.start_run(job_id, model="test")
7112        for index in range(10):
7113            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
7114            db.finish_step(
7115                step_id,
7116                status="completed",
7117                output_data={"success": True, "lesson": {"lesson": f"filler {index}"}},
7118                summary=f"filler {index}",
7119            )
7120        blocked_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
7121        db.finish_step(
7122            blocked_id,
7123            status="blocked",
7124            output_data={
7125                "success": True,
7126                "error": "evidence grounding required",
7127                "evidence_grounding": {
7128                    "missing_candidate_paths": ["/srv/models/ExactModel-Q4.foo"]
7129                },
7130            },
7131            summary="blocked record_findings; evidence grounding required",
7132        )
7133        db.finish_run(run_id, "blocked")
7134
7135        result = run_one_step(
7136            job_id,
7137            config=config,
7138            db=db,
7139            llm=ScriptedLLM([
7140                LLMResponse(tool_calls=[
7141                    ToolCall(
7142                        name="record_experiment",
7143                        arguments={
7144                            "title": "Candidate file validation",
7145                            "hypothesis": "A candidate model file may be available.",
7146                            "metric_name": "validated_files",
7147                            "metric_value": 0,
7148                            "metric_unit": "files",
7149                            "result": "Candidate files were summarized but not named.",
7150                        },
7151                    )
7152                ])
7153            ]),
7154        )
7155
7156        assert result.status == "blocked"
7157        grounding = result.result["evidence_grounding"]
7158        assert "/srv/models/ExactModel-Q4.foo" in grounding["missing_candidate_paths"]
7159    finally:
7160        db.close()
7161
7162
7163def test_prompt_filters_stale_generated_and_objective_tokens(tmp_path):
7164    db = AgentDB(tmp_path / "state.db")
7165    try:
7166        job_id = db.create_job("Optimize Qwen3.6-27B GGUF throughput", title="qwen job", kind="generic")
7167        db.append_lesson(
7168            job_id,
7169            (
7170                "Evidence grounding rejected unsupported concrete tokens for record_experiment: "
7171                "Qwen3.6-27B-GGUF, JSON, shell_exec_step_1037, timeout_after_300s, E5-2690. "
7172                "Treat matching prior ledger claims as stale."
7173            ),
7174            category="mistake",
7175        )
7176        db.append_finding_record(job_id, name="Qwen3.6-27B-GGUF source", category="source")
7177        db.append_finding_record(job_id, name="Intel Xeon E5-2690 baseline", category="hardware")
7178
7179        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7180
7181        assert "Qwen3.6-27B-GGUF" in content
7182        assert "JSON" not in content
7183        assert "shell_exec_step_1037" not in content
7184        assert "timeout_after_300s" not in content
7185        assert "Unsupported/stale claim tokens to avoid until re-verified: [unsupported-stale-claim]" in content
7186        assert "Intel Xeon E5-2690 baseline" not in content
7187    finally:
7188        db.close()
7189
7190
7191def test_prompt_redacts_stale_tokens_from_recent_state(tmp_path):
7192    db = AgentDB(tmp_path / "state.db")
7193    try:
7194        job_id = db.create_job(
7195            "Prefer current durable evidence",
7196            title="stale-recent-state",
7197            kind="generic",
7198            metadata={"unsupported_claim_tokens": ["E5-2690", "v3"]},
7199        )
7200        run_id = db.start_run(job_id, model="test")
7201        step_id = db.add_step(
7202            job_id=job_id,
7203            run_id=run_id,
7204            kind="tool",
7205            tool_name="record_findings",
7206            input_data={"arguments": {"finding": "Old CPU claim: Intel Xeon E5-2690 v3"}},
7207        )
7208        db.finish_step(
7209            step_id,
7210            status="blocked",
7211            output_data={"success": False, "error": "evidence grounding required"},
7212            summary="blocked record_findings; Intel Xeon E5-2690 v3 unsupported",
7213        )
7214        db.finish_run(run_id, "blocked")
7215
7216        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7217
7218        assert "E5-2690" not in content
7219        assert "[unsupported-stale-claim]" in content
7220    finally:
7221        db.close()
7222
7223
7224def test_prompt_does_not_redact_stale_tokens_inside_exact_paths(tmp_path):
7225    db = AgentDB(tmp_path / "state.db")
7226    try:
7227        job_id = db.create_job(
7228            "Validate exact candidate paths",
7229            title="path-redaction",
7230            kind="generic",
7231            metadata={"unsupported_claim_tokens": ["AlphaModel-99"]},
7232        )
7233        db.update_job_metadata(
7234            job_id,
7235            {
7236                "task_queue": [
7237                    {
7238                        "title": "Validate candidate file path",
7239                        "status": "open",
7240                        "contract": "experiment",
7241                        "acceptance_criteria": "Exact path is validated.",
7242                        "evidence_needed": "Shell output with file size.",
7243                    }
7244                ]
7245            },
7246        )
7247        run_id = db.start_run(job_id, model="test")
7248        step_id = db.add_step(
7249            job_id=job_id,
7250            run_id=run_id,
7251            kind="tool",
7252            tool_name="record_findings",
7253            input_data={"arguments": {"finding": "Old unsupported AlphaModel-99 claim"}},
7254        )
7255        db.finish_step(
7256            step_id,
7257            status="blocked",
7258            output_data={
7259                "success": False,
7260                "error": "evidence grounding required",
7261                "evidence_grounding": {
7262                    "missing_candidate_paths": ["/srv/models/AlphaModel-99-Q4.foo"],
7263                    "unsupported_tokens": ["/srv/models/AlphaModel-99-Q4.foo"],
7264                },
7265            },
7266            summary="blocked record_findings; AlphaModel-99 unsupported",
7267        )
7268        db.finish_run(run_id, "blocked")
7269
7270        content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7271
7272        assert "/srv/models/AlphaModel-99-Q4.foo" in content  # stale token is kept inside the exact path
7273        assert "unsupported [unsupported-stale-claim] claim" in content  # bare token in the finding text is redacted
7274    finally:
7275        db.close()
7276
7277
7278def test_prompt_redacts_older_stale_tokens_from_task_queue(tmp_path):
7279    stale_tail = [f"GPU{i}X" for i in range(60)]  # 60 further tokens listed after the older E5-2690 entry
7280    job = {
7281        "title": "stale task cleanup",
7282        "kind": "generic",
7283        "objective": "use current evidence",
7284        "metadata": {
7285            "unsupported_claim_tokens": ["E5-2690", *stale_tail],
7286            "task_queue": [
7287                {
7288                    "title": "Record old baseline",
7289                    "status": "active",
7290                    "priority": 10,
7291                    "goal": "Record CPU: Dual Intel Xeon E5-2690 v3 from old evidence.",
7292                    "output_contract": "experiment",
7293                }
7294            ],
7295        },
7296    }
7297
7298    content = build_messages(job, [])[-1]["content"]
7299
7300    assert "E5-2690" not in content
7301    assert "[unsupported-stale-claim]" in content
7302
7303
7304def test_run_one_step_requires_accounting_after_auto_checkpoint_read(tmp_path):
7305    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7306    db = AgentDB(tmp_path / "state.db")
7307    try:
7308        job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7309        run_id = db.start_run(job_id, model="test")
7310        checkpoint_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7311        db.finish_step(
7312            checkpoint_step,
7313            status="blocked",
7314            output_data={
7315                "success": True,
7316                "error": "artifact required before more research",
7317                "auto_checkpoint": {
7318                    "artifact_id": "art_checkpoint",
7319                    "path": str(tmp_path / "checkpoint.md"),
7320                    "title": "Auto Evidence Checkpoint after step 1",
7321                    "evidence_step": "step_evidence",
7322                    "blocked_tool": "shell_exec",
7323                },
7324            },
7325        )
7326        read_step = db.add_step(
7327            job_id=job_id,
7328            run_id=run_id,
7329            kind="tool",
7330            tool_name="read_artifact",
7331            input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7332        )
7333        db.finish_step(read_step, status="completed", output_data={"success": True, "content": "evidence"})
7334        db.finish_run(run_id, "completed")
7335
7336        result = run_one_step(
7337            job_id,
7338            config=config,
7339            db=db,
7340            llm=ScriptedLLM([
7341                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7342            ]),
7343        )
7344
7345        assert result.status == "blocked"
7346        assert result.result["error"] == "evidence checkpoint accounting required"
7347        assert result.result["blocked_tool"] == "shell_exec"
7348    finally:
7349        db.close()
7350
7351
7352def test_run_one_step_reads_checkpoint_before_batched_branch_work(tmp_path):
7353    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7354    db = AgentDB(tmp_path / "state.db")
7355    try:
7356        job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7357        db.update_job_metadata(
7358            job_id,
7359            {
7360                "pending_evidence_checkpoint": {
7361                    "artifact_id": "art_checkpoint",
7362                    "title": "Auto Evidence Checkpoint after step 1",
7363                    "evidence_step_no": 1,
7364                    "blocked_tool": "shell_exec",
7365                }
7366            },
7367        )
7368
7369        result = run_one_step(
7370            job_id,
7371            config=config,
7372            db=db,
7373            llm=ScriptedLLM([
7374                LLMResponse(tool_calls=[
7375                    ToolCall(name="shell_exec", arguments={"command": "echo more discovery"}),
7376                    ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"}),
7377                ])
7378            ]),
7379            registry=SuccessRegistry(),
7380        )
7381
7382        tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
7383        assert [step["tool_name"] for step in tool_steps[-2:]] == ["read_artifact", "shell_exec"]  # reverses the batch order
7384        assert result.status == "blocked"
7385        assert result.result["error"] == "evidence checkpoint accounting required"
7386        assert result.result["blocked_tool"] == "shell_exec"
7387        assert result.result["checkpoint_already_read"] is True
7388        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7389        assert pending["read_at"]
7390    finally:
7391        db.close()
7392
7393
7394def test_run_one_step_allows_checkpoint_read_when_deliverable_guard_is_active(tmp_path):
7395    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7396    db = AgentDB(tmp_path / "state.db")
7397    try:
7398        job_id = db.create_job("Write a report from long research", title="report checkpoint", kind="generic")
7399        run_id = db.start_run(job_id, model="test")
7400        for index in range(18):
7401            step_id = db.add_step(
7402                job_id=job_id,
7403                run_id=run_id,
7404                kind="tool",
7405                tool_name="shell_exec",
7406                input_data={"arguments": {"command": f"ls item-{index}"}},
7407            )
7408            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "evidence"})
7409        db.finish_run(run_id, "completed")
7410        db.update_job_metadata(
7411            job_id,
7412            {
7413                "pending_evidence_checkpoint": {
7414                    "artifact_id": "art_checkpoint",
7415                    "title": "Auto Evidence Checkpoint after step 18",
7416                    "evidence_step_no": 18,
7417                    "blocked_tool": "shell_exec",
7418                }
7419            },
7420        )
7421
7422        result = run_one_step(
7423            job_id,
7424            config=config,
7425            db=db,
7426            llm=ScriptedLLM([
7427                LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"})])
7428            ]),
7429            registry=SuccessRegistry(),
7430        )
7431
7432        assert result.status == "completed"
7433        assert result.tool_name == "read_artifact"
7434        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7435        assert pending["read_at"]
7436        assert pending["read_step_no"] == 19  # follows the 18 seeded shell_exec steps
7437    finally:
7438        db.close()
7439
7440
7441def test_run_one_step_accounts_checkpoint_before_batched_branch_work(tmp_path):
7442    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7443    db = AgentDB(tmp_path / "state.db")
7444    try:
7445        job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7446        db.update_job_metadata(
7447            job_id,
7448            {
7449                "pending_evidence_checkpoint": {
7450                    "artifact_id": "art_checkpoint",
7451                    "title": "Auto Evidence Checkpoint after step 1",
7452                    "read_at": "2026-01-01T00:00:00+00:00",
7453                    "evidence_step_no": 1,
7454                    "blocked_tool": "shell_exec",
7455                }
7456            },
7457        )
7458
7459        result = run_one_step(
7460            job_id,
7461            config=config,
7462            db=db,
7463            llm=ScriptedLLM([
7464                LLMResponse(tool_calls=[
7465                    ToolCall(name="shell_exec", arguments={"command": "echo more discovery"}),
7466                    ToolCall(name="record_lesson", arguments={"lesson": "checkpoint accounted", "category": "strategy"}),
7467                ])
7468            ]),
7469            registry=SuccessRegistry(),
7470        )
7471
7472        tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
7473        assert [step["tool_name"] for step in tool_steps[-2:]] == ["record_lesson", "shell_exec"]  # reverses the batch order
7474        assert result.status == "completed"
7475        assert result.tool_name == "shell_exec"
7476        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7477        assert pending["resolved_at"]
7478        assert pending["resolved_by_tool"] == "record_lesson"
7479    finally:
7480        db.close()
7481
7482
7483def test_run_one_step_treats_guard_recovery_as_checkpoint_accounting(tmp_path):
7484    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7485    db = AgentDB(tmp_path / "state.db")
7486    try:
7487        job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7488        run_id = db.start_run(job_id, model="test")
7489        checkpoint_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7490        db.finish_step(
7491            checkpoint_step,
7492            status="blocked",
7493            output_data={
7494                "success": True,
7495                "error": "artifact required before more research",
7496                "auto_checkpoint": {
7497                    "artifact_id": "art_checkpoint",
7498                    "path": str(tmp_path / "checkpoint.md"),
7499                    "title": "Auto Evidence Checkpoint after step 1",
7500                    "evidence_step": "step_evidence",
7501                    "blocked_tool": "shell_exec",
7502                },
7503            },
7504        )
7505        read_step = db.add_step(
7506            job_id=job_id,
7507            run_id=run_id,
7508            kind="tool",
7509            tool_name="read_artifact",
7510            input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7511        )
7512        db.finish_step(read_step, status="completed", output_data={"success": True, "content": "evidence"})
7513        guard_step = db.add_step(
7514            job_id=job_id,
7515            run_id=run_id,
7516            kind="tool",
7517            tool_name="read_artifact",
7518            input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7519        )
7520        db.finish_step(
7521            guard_step,
7522            status="blocked",
7523            output_data={
7524                "success": True,
7525                "recoverable": True,
7526                "error": "evidence checkpoint accounting required",
7527                "pending_evidence_checkpoint": {
7528                    "artifact_id": "art_checkpoint",
7529                    "title": "Auto Evidence Checkpoint after step 1",
7530                    "checkpoint_read": True,
7531                },
7532            },
7533        )
7534        recovery_step = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="guard_recovery")
7535        db.finish_step(
7536            recovery_step,
7537            status="completed",
7538            output_data={
7539                "success": True,
7540                "lesson": {"lesson": "Open a task after repeated guard blocks."},
7541                "task": {"title": "Resolve guard"},
7542            },
7543        )
7544        db.finish_run(run_id, "completed")
7545
7546        result = run_one_step(
7547            job_id,
7548            config=config,
7549            db=db,
7550            llm=ScriptedLLM([
7551                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7552            ]),
7553            registry=SuccessRegistry(),
7554        )
7555
7556        assert result.status == "completed"
7557        assert result.tool_name == "shell_exec"
7558    finally:
7559        db.close()
7560
7561
7562def test_checkpoint_resolution_tool_bypasses_measured_progress_guard(tmp_path):
7563    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7564    db = AgentDB(tmp_path / "state.db")
7565    try:
7566        job_id = db.create_job("Optimize benchmark speed", title="checkpoint-measure", kind="generic")
7567        db.update_job_metadata(
7568            job_id,
7569            {
7570                "pending_evidence_checkpoint": {
7571                    "artifact_id": "art_checkpoint",
7572                    "title": "Auto Evidence Checkpoint after step 1",
7573                    "read_at": "2026-01-01T00:00:00+00:00",
7574                    "evidence_step_no": 1,
7575                    "blocked_tool": "shell_exec",
7576                }
7577            },
7578        )
7579        run_id = db.start_run(job_id, model="fake")
7580        for index in range(18):
7581            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7582            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": f"probe {index}"})
7583        db.finish_run(run_id, "completed")
7584
7585        result = run_one_step(
7586            job_id,
7587            config=config,
7588            db=db,
7589            llm=ScriptedLLM([
7590                LLMResponse(tool_calls=[
7591                    ToolCall(
7592                        name="record_source",
7593                        arguments={
7594                            "source": "file:///tmp/checkpoint",
7595                            "source_type": "checkpoint",
7596                            "outcome": "checkpoint accounted before more benchmark work",
7597                        },
7598                    )
7599                ])
7600            ]),
7601        )
7602
7603        assert result.status == "completed"
7604        assert result.tool_name == "record_source"
7605        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7606        assert pending["resolved_by_tool"] == "record_source"
7607    finally:
7608        db.close()
7609
7610
7611def test_run_one_step_persists_checkpoint_obligation_until_accounted(tmp_path):
7612    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7613    db = AgentDB(tmp_path / "state.db")
7614    try:
7615        job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7616        db.update_job_metadata(
7617            job_id,
7618            {
7619                "pending_evidence_checkpoint": {
7620                    "artifact_id": "art_checkpoint",
7621                    "title": "Auto Evidence Checkpoint after step 1",
7622                    "path": str(tmp_path / "checkpoint.md"),
7623                    "created_at": "2026-01-01T00:00:00+00:00",
7624                    "evidence_step": "step_evidence",
7625                    "evidence_step_no": 1,
7626                    "blocked_tool": "shell_exec",
7627                }
7628            },
7629        )
7630
7631        blocked = run_one_step(
7632            job_id,
7633            config=config,
7634            db=db,
7635            llm=ScriptedLLM([
7636                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7637            ]),
7638        )
7639        assert blocked.status == "blocked"
7640        assert blocked.result["error"] == "evidence checkpoint accounting required"
7641
7642        accounted = run_one_step(
7643            job_id,
7644            config=config,
7645            db=db,
7646            llm=ScriptedLLM([
7647                LLMResponse(tool_calls=[
7648                    ToolCall(
7649                        name="record_lesson",
7650                        arguments={
7651                            "lesson": "The checkpoint contains only diagnostic setup evidence; record it and move to the next concrete branch.",
7652                            "category": "strategy",
7653                        },
7654                    )
7655                ])
7656            ]),
7657        )
7658        assert accounted.status == "completed"
7659        pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7660        assert pending["resolved_at"]
7661        assert pending["resolved_by_tool"] == "record_lesson"
7662    finally:
7663        db.close()
7664
7665
7666def test_run_one_step_blocks_branch_work_when_memory_graph_needs_consolidation(tmp_path):
7667    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7668    db = AgentDB(tmp_path / "state.db")
7669    try:
7670        job_id = db.create_job("Keep improving durable work", title="memory")
7671        db.append_lesson(job_id, "Use validated checkpoints.", category="strategy")
7672        db.append_lesson(job_id, "Reject low-yield branches.", category="strategy")
7673        db.append_finding_record(job_id, name="Finding A")
7674        db.append_finding_record(job_id, name="Finding B")
7675        db.append_source_record(job_id, "source:a")
7676        db.append_experiment_record(job_id, title="Trial", status="measured")
7677        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more research"})])])
7678
7679        result = run_one_step(job_id, config=config, db=db, llm=llm)
7680
7681        assert result.status == "blocked"
7682        assert result.result["error"] == "memory graph consolidation required"
7683        assert result.result["blocked_tool"] == "web_search"
7684    finally:
7685        db.close()
7686
7687
7688def test_run_one_step_allows_memory_graph_consolidation_when_guard_is_active(tmp_path):
7689    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7690    db = AgentDB(tmp_path / "state.db")
7691    try:
7692        job_id = db.create_job("Keep improving durable work", title="memory")
7693        db.append_lesson(job_id, "Use validated checkpoints.", category="strategy")
7694        db.append_lesson(job_id, "Reject low-yield branches.", category="strategy")
7695        db.append_finding_record(job_id, name="Finding A")
7696        db.append_finding_record(job_id, name="Finding B")
7697        db.append_source_record(job_id, "source:a")
7698        db.append_experiment_record(job_id, title="Trial", status="measured")
7699        llm = ScriptedLLM([
7700            LLMResponse(
7701                tool_calls=[
7702                    ToolCall(
7703                        name="record_memory_graph",
7704                        arguments={
7705                            "nodes": [
7706                                {
7707                                    "key": "validated-checkpoints",
7708                                    "kind": "strategy",
7709                                    "title": "Validated checkpoints",
7710                                    "summary": "Use measured checkpoints to decide the next branch.",
7711                                    "salience": 0.9,
7712                                }
7713                            ]
7714                        },
7715                    )
7716                ]
7717            )
7718        ])
7719
7720        result = run_one_step(job_id, config=config, db=db, llm=llm)
7721
7722        assert result.status == "completed"
7723        assert result.tool_name == "record_memory_graph"
7724        graph = db.get_job(job_id)["metadata"]["memory_graph"]
7725        assert graph["nodes"][0]["key"] == "validated-checkpoints"
7726    finally:
7727        db.close()
7728
7729
7730def test_prompt_adds_lesson_consolidation_guard_when_raw_lessons_sprawl():
7731    job = {
7732        "title": "lesson sprawl",
7733        "kind": "generic",
7734        "objective": "keep improving a long-running job",
7735        "metadata": {
7736            "lessons": [
7737                {"lesson": f"Reusable lesson {index}", "category": "strategy"}
7738                for index in range(30)
7739            ],
7740        },
7741    }
7742    steps = [
7743        {
7744            "step_no": index,
7745            "kind": "tool",
7746            "tool_name": "record_lesson",
7747            "status": "completed",
7748            "summary": f"lesson {index}",
7749        }
7750        for index in range(1, 4)
7751    ]
7752
7753    content = build_messages(job, steps)[-1]["content"]
7754
7755    assert "Lesson consolidation guard:" in content
7756    assert "Raw lessons are accumulating faster than consolidated memory" in content
7757    assert "lessons=30" in content
7758    assert "record_memory_graph" in content
7759
7760
7761def test_run_one_step_blocks_more_lessons_when_lesson_sprawl_needs_graph(tmp_path):
7762    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7763    db = AgentDB(tmp_path / "state.db")
7764    try:
7765        job_id = db.create_job("Keep improving durable work", title="lesson-sprawl")
7766        for index in range(30):
7767            db.append_lesson(job_id, f"Reusable lesson {index}", category="strategy")
7768        run_id = db.start_run(job_id, model="test")
7769        for index in range(3):
7770            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
7771            db.finish_step(step_id, status="completed", output_data={"success": True, "lesson": {"lesson": f"recent {index}"}})
7772        db.finish_run(run_id, "completed")
7773
7774        result = run_one_step(
7775            job_id,
7776            config=config,
7777            db=db,
7778            llm=ScriptedLLM([
7779                LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "one more lesson"})])
7780            ]),
7781        )
7782
7783        assert result.status == "blocked"
7784        assert result.result["error"] == "lesson consolidation required"
7785        assert result.result["lesson_consolidation"]["lessons"] == 30
7786        assert result.result["blocked_tool"] == "record_lesson"
7787    finally:
7788        db.close()
7789
7790
7791def test_run_one_step_allows_memory_graph_when_lesson_sprawl_is_active(tmp_path):
7792    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7793    db = AgentDB(tmp_path / "state.db")
7794    try:
7795        job_id = db.create_job("Keep improving durable work", title="lesson-sprawl")
7796        for index in range(30):
7797            db.append_lesson(job_id, f"Reusable lesson {index}", category="strategy")
7798        run_id = db.start_run(job_id, model="test")
7799        for index in range(3):
7800            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
7801            db.finish_step(step_id, status="completed", output_data={"success": True, "lesson": {"lesson": f"recent {index}"}})
7802        db.finish_run(run_id, "completed")
7803
7804        result = run_one_step(
7805            job_id,
7806            config=config,
7807            db=db,
7808            llm=ScriptedLLM([
7809                LLMResponse(
7810                    tool_calls=[
7811                        ToolCall(
7812                            name="record_memory_graph",
7813                            arguments={
7814                                "nodes": [
7815                                    {
7816                                        "key": "lesson-sprawl-strategy",
7817                                        "kind": "strategy",
7818                                        "title": "Consolidate repeated lessons",
7819                                        "summary": "Compress repeated lessons into graph memory before adding more.",
7820                                    }
7821                                ]
7822                            },
7823                        )
7824                    ]
7825                )
7826            ]),
7827        )
7828
7829        assert result.status == "completed"
7830        assert result.tool_name == "record_memory_graph"
7831    finally:
7832        db.close()
7833
7834
7835def test_prompt_includes_activity_stagnation_context():
7836    job = {
7837        "title": "research",
7838        "kind": "generic",
7839        "objective": "keep making durable progress",
7840        "metadata": {
7841            "activity_checkpoint_streak": 3,
7842            "last_checkpoint_counts": {
7843                "findings": 1,
7844                "sources": 2,
7845                "tasks": 4,
7846                "experiments": 0,
7847                "lessons": 1,
7848                "milestones": 0,
7849            },
7850        },
7851    }
7852
7853    content = build_messages(job, [])[-1]["content"]
7854
7855    assert "Activity stagnation" in content
7856    assert "activity_checkpoint_streak=3" in content
7857    assert "Recent checkpoints show activity without durable progress" in content
7858
7859
7860def test_prompt_includes_task_planning_guard_context():
7861    job = {
7862        "title": "research",
7863        "kind": "generic",
7864        "objective": "keep making durable progress",
7865        "metadata": {
7866            "task_planning_checkpoint_streak": 2,
7867            "task_queue": [
7868                {"title": "Plan branch", "status": "open"},
7869                {"title": "Executed branch", "status": "done"},
7870            ],
7871        },
7872    }
7873
7874    content = build_messages(job, [])[-1]["content"]
7875
7876    assert "Task planning guard" in content
7877    assert "task_only_checkpoints=2" in content
7878    assert "Do not create more new open tasks next" in content
7879
7880
7881def test_prompt_includes_durable_yield_pressure():
7882    job = {
7883        "title": "research",
7884        "kind": "generic",
7885        "objective": "keep making durable progress",
7886        "metadata": {},
7887    }
7888    steps = [
7889        {
7890            "step_no": index,
7891            "kind": "tool",
7892            "status": "completed",
7893            "tool_name": "web_search",
7894            "summary": f"search {index}",
7895        }
7896        for index in range(1, 31)
7897    ]
7898
7899    content = build_messages(job, steps)[-1]["content"]
7900
7901    assert "Durable progress yield" in content
7902    assert "No durable progress records after 30 completed actions" in content
7903    assert "record findings/source/experiment/lesson/roadmap progress" in content
7904
7905
7906def test_prompt_includes_finding_source_ledgers_and_reflections():
7907    job = {
7908        "title": "research",
7909        "kind": "generic",
7910        "objective": "find research",
7911        "metadata": {
7912            "finding_ledger": [{"name": "Acme Finding", "category": "example category", "location": "Toronto", "score": 0.8}],
7913            "task_queue": [{"title": "Explore primary sources", "status": "open", "priority": 5, "goal": "Find evidence"}],
7914            "source_ledger": [{"source": "https://example.com", "source_type": "web_source", "usefulness_score": 0.9, "yield_count": 3}],
7915            "reflections": [{"summary": "Primary source map is working", "strategy": "Try archival sources next"}],
7916        },
7917    }
7918
7919    messages = build_messages(job, [])
7920
7921    content = messages[-1]["content"]
7922    assert "Finding ledger: 1 unique candidates." in content
7923    assert "Acme Finding" in content
7924    assert "Explore primary sources" in content
7925    assert "https://example.com" in content
7926    assert "Primary source map is working" in content
7927
7928
7929def test_prompt_includes_experiment_ledger_and_best_result():
7930    job = {
7931        "title": "improve process",
7932        "kind": "generic",
7933        "objective": "make a measurable process better",
7934        "metadata": {
7935            "experiment_ledger": [
7936                {
7937                    "title": "variant a",
7938                    "status": "measured",
7939                    "metric_name": "score",
7940                    "metric_value": 2.0,
7941                    "metric_unit": "units",
7942                    "higher_is_better": True,
7943                    "result": "baseline",
7944                    "best_observed": False,
7945                },
7946                {
7947                    "title": "variant b",
7948                    "status": "measured",
7949                    "metric_name": "score",
7950                    "metric_value": 3.5,
7951                    "metric_unit": "units",
7952                    "higher_is_better": True,
7953                    "result": "better",
7954                    "next_action": "try another independent variant",
7955                    "best_observed": True,
7956                },
7957            ],
7958        },
7959    }
7960
7961    messages = build_messages(job, [])
7962
7963    content = messages[-1]["content"]
7964    assert "Experiment ledger:" in content
7965    assert "Best observed results:" in content
7966    assert "variant b" in content
7967    assert "score=3.5 units" in content
7968    assert "Next-action constraint:" in content
7969    assert "latest measured experiment selected a concrete next action" in content
7970    assert "try another independent variant" in content
7971
7972
7973def _stagnant_experiments():
7974    return [
7975        {
7976            "title": "best variant",
7977            "status": "measured",
7978            "metric_name": "score",
7979            "metric_value": 10.0,
7980            "metric_unit": "units",
7981            "higher_is_better": True,
7982            "best_observed": True,
7983            "next_action": "try a materially different branch",
7984        },
7985        *[
7986            {
7987                "title": f"flat variant {index}",
7988                "status": "measured",
7989                "metric_name": "score",
7990                "metric_value": 8.0 + index * 0.1,
7991                "metric_unit": "units",
7992                "higher_is_better": True,
7993                "best_observed": False,
7994                "next_action": "try another small variant",
7995            }
7996            for index in range(1, 6)
7997        ],
7998    ]
7999
8000
8001def _stagnant_experiment_metadata():
8002    return {
8003        "experiment_ledger": _stagnant_experiments(),
8004        "memory_graph": {
8005            "nodes": [
8006                {"key": "best-variant", "kind": "decision", "title": "Best measured variant"},
8007                {"key": "stagnant-branch", "kind": "strategy", "title": "Stagnant branch should pivot"},
8008            ]
8009        },
8010    }
8011
8012
8013def test_prompt_includes_experiment_stagnation_guard():
8014    job = {
8015        "title": "improve measured process",
8016        "kind": "generic",
8017        "objective": "optimize throughput and keep improving",
8018        "metadata": _stagnant_experiment_metadata(),
8019    }
8020
8021    content = build_messages(job, [])[-1]["content"]
8022
8023    assert "Experiment stagnation guard:" in content
8024    assert "Recent measured trials have not improved" in content
8025    assert "best=10.0" in content
8026    assert "non_improving=5" in content
8027
8028
8029def test_prompt_infers_experiment_stagnation_from_metric_direction():
8030    job = {
8031        "title": "improve measured process",
8032        "kind": "generic",
8033        "objective": "reduce latency and keep improving",
8034        "metadata": {
8035            "experiment_ledger": [
8036                {
8037                    "title": "best latency",
8038                    "status": "measured",
8039                    "metric_name": "latency",
8040                    "metric_value": 1.0,
8041                    "metric_unit": "s",
8042                    "higher_is_better": False,
8043                },
8044                *[
8045                    {
8046                        "title": f"slower variant {index}",
8047                        "status": "measured",
8048                        "metric_name": "latency",
8049                        "metric_value": 1.0 + index * 0.1,
8050                        "metric_unit": "s",
8051                        "higher_is_better": False,
8052                    }
8053                    for index in range(1, 6)
8054                ],
8055            ],
8056            "memory_graph": {
8057                "nodes": [
8058                    {"key": "latency-best", "kind": "decision", "title": "Best latency"},
8059                    {"key": "latency-pivot", "kind": "strategy", "title": "Pivot stagnant latency branch"},
8060                ]
8061            },
8062        },
8063    }
8064
8065    content = build_messages(job, [])[-1]["content"]
8066
8067    assert "Experiment stagnation guard:" in content
8068    assert "best=1.0" in content
8069    assert "latest=1.5" in content
8070    assert "Recent measured trials have not improved" in content
8071
8072
8073def test_prompt_does_not_treat_unmarked_improvements_as_stagnation():
8074    job = {
8075        "title": "improve measured process",
8076        "kind": "generic",
8077        "objective": "increase score and keep improving",
8078        "metadata": {
8079            "experiment_ledger": [
8080                {
8081                    "title": f"better variant {index}",
8082                    "status": "measured",
8083                    "metric_name": "score",
8084                    "metric_value": float(index),
8085                    "metric_unit": "points",
8086                    "higher_is_better": True,
8087                    "best_observed": False,
8088                }
8089                for index in range(1, 7)
8090            ],
8091            "memory_graph": {
8092                "nodes": [
8093                    {"key": "score-progress", "kind": "decision", "title": "Score is improving"},
8094                    {"key": "score-next", "kind": "strategy", "title": "Continue measured branch"},
8095                ]
8096            },
8097        },
8098    }
8099
8100    content = build_messages(job, [])[-1]["content"]
8101
8102    assert "Experiment stagnation guard:" in content
8103    assert "Recent measured trials have not improved" not in content
8104
8105
8106def test_run_one_step_blocks_branch_work_after_experiment_stagnation(tmp_path):
8107    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8108    db = AgentDB(tmp_path / "state.db")
8109    try:
8110        job_id = db.create_job(
8111            "Optimize a measurable process and keep improving",
8112            title="experiment-stagnation",
8113            kind="generic",
8114            metadata=_stagnant_experiment_metadata(),
8115        )
8116        run_id = db.start_run(job_id, model="test")
8117        for index in range(6):
8118            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
8119            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
8120        db.finish_run(run_id, "completed")
8121
8122        result = run_one_step(
8123            job_id,
8124            config=config,
8125            db=db,
8126            llm=ScriptedLLM([
8127                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "python run_next_trial.py"})])
8128            ]),
8129        )
8130
8131        assert result.status == "blocked"
8132        assert result.result["error"] == "experiment stagnation decision required"
8133        assert result.result["blocked_tool"] == "shell_exec"
8134        assert result.result["experiment_stagnation"]["non_improving_count"] == 5
8135    finally:
8136        db.close()
8137
8138
8139def test_run_one_step_allows_branch_decision_after_experiment_stagnation(tmp_path):
8140    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8141    db = AgentDB(tmp_path / "state.db")
8142    try:
8143        job_id = db.create_job(
8144            "Optimize a measurable process and keep improving",
8145            title="experiment-stagnation",
8146            kind="generic",
8147            metadata=_stagnant_experiment_metadata(),
8148        )
8149        run_id = db.start_run(job_id, model="test")
8150        for index in range(6):
8151            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
8152            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
8153        db.finish_run(run_id, "completed")
8154
8155        result = run_one_step(
8156            job_id,
8157            config=config,
8158            db=db,
8159            llm=ScriptedLLM([
8160                LLMResponse(tool_calls=[
8161                    ToolCall(
8162                        name="record_tasks",
8163                        arguments={
8164                            "tasks": [
8165                                {
8166                                    "title": "Pivot away from stagnant measured branch",
8167                                    "status": "open",
8168                                    "output_contract": "decision",
8169                                    "acceptance_criteria": "A materially different branch is selected.",
8170                                }
8171                            ]
8172                        },
8173                    )
8174                ])
8175            ]),
8176        )
8177
8178        assert result.status == "completed"
8179        assert result.tool_name == "record_tasks"
8180    finally:
8181        db.close()
8182
8183
8184def test_run_one_step_allows_blocked_experiment_after_experiment_stagnation(tmp_path):
8185    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8186    db = AgentDB(tmp_path / "state.db")
8187    try:
8188        job_id = db.create_job(
8189            "Optimize a measurable process and keep improving",
8190            title="experiment-stagnation",
8191            kind="generic",
8192            metadata=_stagnant_experiment_metadata(),
8193        )
8194        run_id = db.start_run(job_id, model="test")
8195        for index in range(6):
8196            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
8197            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
8198        db.finish_run(run_id, "completed")
8199
8200        result = run_one_step(
8201            job_id,
8202            config=config,
8203            db=db,
8204            llm=ScriptedLLM([
8205                LLMResponse(tool_calls=[
8206                    ToolCall(
8207                        name="record_experiment",
8208                        arguments={
8209                            "title": "Stagnant branch decision",
8210                            "status": "blocked",
8211                            "metric_name": "score",
8212                            "metric_unit": "units",
8213                            "result": "recent trials did not improve the objective",
8214                            "next_action": "pivot to a materially different branch",
8215                        },
8216                    )
8217                ])
8218            ]),
8219        )
8220
8221        assert result.status == "completed"
8222        assert result.tool_name == "record_experiment"
8223    finally:
8224        db.close()
8225
8226
8227def test_delivery_experiment_next_action_blocks_unrelated_research(tmp_path):
8228    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8229    db = AgentDB(tmp_path / "state.db")
8230    try:
8231        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8232        db.update_job_metadata(job_id, {
8233            "experiment_ledger": [{
8234                "title": "deliverable gap",
8235                "status": "measured",
8236                "metric_name": "coverage",
8237                "metric_value": 0.25,
8238                "metric_unit": "ratio",
8239                "next_action": "merge the measured output into the deliverable file",
8240            }],
8241        })
8242
8243        result = run_one_step(
8244            job_id,
8245            config=config,
8246            db=db,
8247            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
8248            registry=SuccessRegistry(),
8249        )
8250
8251        assert result.status == "blocked"
8252        assert result.result["error"] == "experiment next action pending"
8253        assert "merge the measured output" in result.result["experiment_next_action"]["next_action"]
8254    finally:
8255        db.close()
8256
8257
8258def test_research_experiment_next_action_allows_research(tmp_path):
8259    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8260    db = AgentDB(tmp_path / "state.db")
8261    try:
8262        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8263        db.update_job_metadata(job_id, {
8264            "experiment_ledger": [{
8265                "title": "source gap",
8266                "status": "measured",
8267                "metric_name": "coverage",
8268                "metric_value": 0.25,
8269                "metric_unit": "ratio",
8270                "next_action": "search for additional independent sources",
8271            }],
8272        })
8273
8274        result = run_one_step(
8275            job_id,
8276            config=config,
8277            db=db,
8278            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
8279            registry=SuccessRegistry(),
8280        )
8281
8282        assert result.status == "completed"
8283        assert result.tool_name == "web_search"
8284    finally:
8285        db.close()
8286
8287
8288def test_delivery_experiment_next_action_blocks_read_only_shell(tmp_path):
8289    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8290    db = AgentDB(tmp_path / "state.db")
8291    try:
8292        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8293        db.update_job_metadata(job_id, {
8294            "experiment_ledger": [{
8295                "title": "deliverable gap",
8296                "status": "measured",
8297                "metric_name": "coverage",
8298                "metric_value": 0.25,
8299                "metric_unit": "ratio",
8300                "next_action": "merge the measured output into the deliverable file",
8301            }],
8302        })
8303
8304        result = run_one_step(
8305            job_id,
8306            config=config,
8307            db=db,
8308            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "cat output.txt 2>/dev/null"})])]),
8309            registry=SuccessRegistry(),
8310        )
8311
8312        assert result.status == "blocked"
8313        assert result.result["error"] == "experiment next action pending"
8314    finally:
8315        db.close()
8316
8317
8318def test_delivery_experiment_next_action_allows_bounded_verification_shell(tmp_path):
8319    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8320    db = AgentDB(tmp_path / "state.db")
8321    try:
8322        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
8323        db.update_job_metadata(job_id, {
8324            "experiment_ledger": [{
8325                "title": "runtime gap",
8326                "status": "measured",
8327                "metric_name": "valid_files",
8328                "metric_value": 1,
8329                "metric_unit": "files",
8330                "next_action": "build runner binary then run benchmark with validated file",
8331            }],
8332        })
8333
8334        result = run_one_step(
8335            job_id,
8336            config=config,
8337            db=db,
8338            llm=ScriptedLLM([LLMResponse(tool_calls=[
8339                ToolCall(name="shell_exec", arguments={"command": "ls build/bin/runner 2>/dev/null || command -v runner"})
8340            ])]),
8341            registry=SuccessRegistry(),
8342        )
8343
8344        assert result.status == "completed"
8345        assert result.tool_name == "shell_exec"
8346    finally:
8347        db.close()
8348
8349
8350def test_failed_next_action_requires_accounting_before_more_shell(tmp_path):
8351    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8352    db = AgentDB(tmp_path / "state.db")
8353    try:
8354        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
8355        db.update_job_metadata(job_id, {
8356            "experiment_ledger": [{
8357                "title": "runtime gap",
8358                "status": "measured",
8359                "metric_name": "valid_files",
8360                "metric_value": 1,
8361                "metric_unit": "files",
8362                "next_action": "build runner binary then run benchmark with validated file",
8363            }],
8364        })
8365        run_id = db.start_run(job_id, model="test")
8366        step_id = db.add_step(
8367            job_id=job_id,
8368            run_id=run_id,
8369            kind="tool",
8370            tool_name="shell_exec",
8371            input_data={"arguments": {"command": "cd /tmp/runtime && mkdir -p build && build-tool .."}},
8372        )
8373        db.finish_step(
8374            step_id,
8375            status="completed",
8376            output_data={
8377                "success": True,
8378                "returncode": 0,
8379                "stdout": "/bin/sh: 1: build-tool: not found\n",
8380                "stderr": "",
8381            },
8382        )
8383        db.finish_run(run_id, "completed")
8384
8385        result = run_one_step(
8386            job_id,
8387            config=config,
8388            db=db,
8389            llm=ScriptedLLM([LLMResponse(tool_calls=[
8390                ToolCall(name="shell_exec", arguments={"command": "ls /tmp/runtime/build/bin/runner 2>&1"})
8391            ])]),
8392            registry=SuccessRegistry(),
8393        )
8394
8395        assert result.status == "blocked"
8396        assert result.result["error"] == "action result accounting required"
8397        assert result.result["action_failure"]["step_no"] == 1
8398        assert result.result["action_failure"]["missing_commands"] == ["build-tool"]
8399        assert "build-tool: not found" in result.result["action_failure"]["excerpt"]
8400    finally:
8401        db.close()
8402
8403
8404def test_failed_next_action_prompt_prioritizes_accounting(tmp_path):
8405    db = AgentDB(tmp_path / "state.db")
8406    try:
8407        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
8408        db.update_job_metadata(job_id, {
8409            "experiment_ledger": [{
8410                "title": "runtime gap",
8411                "status": "measured",
8412                "metric_name": "valid_files",
8413                "metric_value": 1,
8414                "metric_unit": "files",
8415                "next_action": "build runner binary then run benchmark with validated file",
8416            }],
8417        })
8418        run_id = db.start_run(job_id, model="test")
8419        step_id = db.add_step(
8420            job_id=job_id,
8421            run_id=run_id,
8422            kind="tool",
8423            tool_name="shell_exec",
8424            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
8425        )
8426        db.finish_step(
8427            step_id,
8428            status="failed",
8429            output_data={
8430                "success": False,
8431                "returncode": 0,
8432                "stdout": "/bin/sh: 1: build-tool: not found\n",
8433                "error": "command output indicates missing command despite exit status 0",
8434            },
8435        )
8436        db.finish_run(run_id, "completed")
8437
8438        messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
8439        prompt = messages[-1]["content"]
8440
8441        assert "latest experiment next action was attempted" in prompt
8442        assert "Missing commands: build-tool" in prompt
8443        assert "record_experiment" in prompt
8444        assert "build-tool: not found" in prompt
8445    finally:
8446        db.close()
8447
8448
8449def test_failed_next_action_narrows_available_tools_to_accounting(tmp_path):
8450    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8451    db = AgentDB(tmp_path / "state.db")
8452    try:
8453        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
8454        db.update_job_metadata(job_id, {
8455            "experiment_ledger": [{
8456                "title": "runtime gap",
8457                "status": "measured",
8458                "metric_name": "valid_files",
8459                "metric_value": 1,
8460                "metric_unit": "files",
8461                "next_action": "build runner binary then run benchmark with validated file",
8462            }],
8463        })
8464        run_id = db.start_run(job_id, model="test")
8465        step_id = db.add_step(
8466            job_id=job_id,
8467            run_id=run_id,
8468            kind="tool",
8469            tool_name="shell_exec",
8470            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
8471        )
8472        db.finish_step(
8473            step_id,
8474            status="completed",
8475            output_data={"success": True, "returncode": 0, "stdout": "/bin/sh: 1: build-tool: not found\n"},
8476        )
8477        db.finish_run(run_id, "completed")
8478        llm = CapturingLLM(
8479            LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "missing build tool"})])
8480        )
8481
8482        run_one_step(job_id, config=config, db=db, llm=llm)
8483
8484        tool_names = {tool["function"]["name"] for tool in llm.tools}
8485        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
8486        assert "shell_exec" not in tool_names
8487        assert "web_search" not in tool_names
8488        assert "write_artifact" not in tool_names
8489    finally:
8490        db.close()
8491
8492
8493def test_accounted_next_action_failure_does_not_keep_blocking(tmp_path):
8494    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8495    db = AgentDB(tmp_path / "state.db")
8496    try:
8497        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
8498        db.update_job_metadata(job_id, {
8499            "experiment_ledger": [{
8500                "title": "runtime gap",
8501                "status": "measured",
8502                "metric_name": "valid_files",
8503                "metric_value": 1,
8504                "metric_unit": "files",
8505                "next_action": "build runner binary then run benchmark with validated file",
8506            }],
8507        })
8508        run_id = db.start_run(job_id, model="test")
8509        failed_step_id = db.add_step(
8510            job_id=job_id,
8511            run_id=run_id,
8512            kind="tool",
8513            tool_name="shell_exec",
8514            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
8515        )
8516        db.finish_step(
8517            failed_step_id,
8518            status="completed",
8519            output_data={"success": True, "returncode": 0, "stdout": "/bin/sh: 1: build-tool: not found\n"},
8520        )
8521        accounted_step_id = db.add_step(
8522            job_id=job_id,
8523            run_id=run_id,
8524            kind="tool",
8525            tool_name="record_experiment",
8526            input_data={"arguments": {"title": "build failed"}},
8527        )
8528        db.finish_step(
8529            accounted_step_id,
8530            status="completed",
8531            output_data={"success": True, "experiment": {"title": "build failed", "status": "failed"}},
8532        )
8533        db.finish_run(run_id, "completed")
8534
8535        result = run_one_step(
8536            job_id,
8537            config=config,
8538            db=db,
8539            llm=ScriptedLLM([LLMResponse(tool_calls=[
8540                ToolCall(name="shell_exec", arguments={"command": "printf updated > /tmp/runtime/recovery-plan.txt"})
8541            ])]),
8542            registry=SuccessRegistry(),
8543        )
8544
8545        assert result.status == "completed"
8546        assert result.tool_name == "shell_exec"
8547    finally:
8548        db.close()
8549
8550
8551def test_delivery_experiment_next_action_allows_write_shell(tmp_path):
8552    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8553    db = AgentDB(tmp_path / "state.db")
8554    try:
8555        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8556        db.update_job_metadata(job_id, {
8557            "experiment_ledger": [{
8558                "title": "deliverable gap",
8559                "status": "measured",
8560                "metric_name": "coverage",
8561                "metric_value": 0.25,
8562                "metric_unit": "ratio",
8563                "next_action": "merge the measured output into the deliverable file",
8564            }],
8565        })
8566
8567        result = run_one_step(
8568            job_id,
8569            config=config,
8570            db=db,
8571            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "printf updated > output.txt"})])]),
8572            registry=SuccessRegistry(),
8573        )
8574
8575        assert result.status == "completed"
8576        assert result.tool_name == "shell_exec"
8577    finally:
8578        db.close()
8579
8580
8581def test_write_file_can_consume_recent_shell_evidence(tmp_path):
8582    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8583    db = AgentDB(tmp_path / "state.db")
8584    try:
8585        job_id = db.create_job("Create a concrete output", title="output", kind="generic")
8586
8587        first = run_one_step(
8588            job_id,
8589            config=config,
8590            db=db,
8591            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -type f"})])]),
8592            registry=LargeShellEvidenceRegistry(),
8593        )
8594        second = run_one_step(
8595            job_id,
8596            config=config,
8597            db=db,
8598            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="write_file", arguments={"path": "out.txt", "content": "done"})])]),
8599            registry=SuccessRegistry(),
8600        )
8601
8602        assert first.tool_name == "shell_exec"
8603        assert second.status == "completed"
8604        assert second.tool_name == "write_file"
8605    finally:
8606        db.close()
8607
8608
8609def test_write_file_creates_validation_obligation_for_code_outputs(tmp_path):
8610    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8611    db = AgentDB(tmp_path / "state.db")
8612    try:
8613        job_id = db.create_job("Create a validated script", title="validate-file", kind="generic")
8614        path = tmp_path / "generated.py"
8615
8616        first = run_one_step(
8617            job_id,
8618            config=config,
8619            db=db,
8620            llm=ScriptedLLM([
8621                LLMResponse(tool_calls=[ToolCall(
8622                    name="write_file",
8623                    arguments={"path": str(path), "content": "print('ok')\n"},
8624                )])
8625            ]),
8626        )
8627        job = db.get_job(job_id)
8628        obligation = job["metadata"]["pending_file_validation_obligation"]
8629
8630        assert first.status == "completed"
8631        assert obligation["path"] == str(path)
8632        assert "py_compile" in obligation["suggested_validation"]
8633    finally:
8634        db.close()
8635
8636
8637def test_file_validation_obligation_blocks_research_until_validated(tmp_path):
8638    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8639    db = AgentDB(tmp_path / "state.db")
8640    try:
8641        job_id = db.create_job("Create a validated script", title="validate-file", kind="generic")
8642        path = tmp_path / "generated.py"
8643        run_one_step(
8644            job_id,
8645            config=config,
8646            db=db,
8647            llm=ScriptedLLM([
8648                LLMResponse(tool_calls=[ToolCall(
8649                    name="write_file",
8650                    arguments={"path": str(path), "content": "print('ok')\n"},
8651                )])
8652            ]),
8653        )
8654
8655        blocked = run_one_step(
8656            job_id,
8657            config=config,
8658            db=db,
8659            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more context"})])]),
8660            registry=SuccessRegistry(),
8661        )
8662        validated = run_one_step(
8663            job_id,
8664            config=config,
8665            db=db,
8666            llm=ScriptedLLM([
8667                LLMResponse(tool_calls=[ToolCall(
8668                    name="shell_exec",
8669                    arguments={"command": f"python3 -m py_compile {path}"},
8670                )])
8671            ]),
8672        )
8673        job = db.get_job(job_id)
8674
8675        assert blocked.status == "blocked"
8676        assert blocked.result["error"] == "file validation pending"
8677        assert validated.status == "completed"
8678        assert job["metadata"].get("pending_file_validation_obligation") == {}
8679        assert job["metadata"]["last_file_validation_obligation"]["resolution_status"] == "validated"
8680    finally:
8681        db.close()
8682
8683
8684def test_delivery_experiment_next_action_allows_internal_artifact_review(tmp_path):
8685    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8686    db = AgentDB(tmp_path / "state.db")
8687    try:
8688        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8689        db.update_job_metadata(job_id, {
8690            "experiment_ledger": [{
8691                "title": "deliverable gap",
8692                "status": "measured",
8693                "metric_name": "coverage",
8694                "metric_value": 0.25,
8695                "metric_unit": "ratio",
8696                "next_action": "merge the measured output into the deliverable file",
8697            }],
8698        })
8699
8700        result = run_one_step(
8701            job_id,
8702            config=config,
8703            db=db,
8704            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="search_artifacts", arguments={"query": "saved evidence"})])]),
8705            registry=SuccessRegistry(),
8706        )
8707
8708        assert result.status == "completed"
8709        assert result.tool_name == "search_artifacts"
8710    finally:
8711        db.close()
8712
8713
8714def test_prompt_marks_recent_anti_bot_browser_source():
8715    job = {"title": "research", "kind": "generic", "objective": "find research"}
8716    steps = [{
8717        "step_no": 8,
8718        "kind": "tool",
8719        "status": "completed",
8720        "tool_name": "browser_navigate",
8721        "summary": "browser_navigate opened Just a moment... <https://clutch.co/example>",
8722        "input": {"arguments": {"url": "https://clutch.co/example"}},
8723        "output": {
8724            "data": {"title": "Just a moment...", "url": "https://clutch.co/example"},
8725            "snapshot": "Performing security verification. Cloudflare security challenge.",
8726        },
8727    }]
8728
8729    messages = build_messages(job, steps)
8730
8731    assert "source_warning=cloudflare anti-bot challenge" in messages[-1]["content"]
8732
8733
8734def test_prompt_marks_recent_captcha_browser_block():
8735    job = {"title": "research", "kind": "generic", "objective": "find research"}
8736    steps = [{
8737        "step_no": 8,
8738        "kind": "tool",
8739        "status": "completed",
8740        "tool_name": "browser_snapshot",
8741        "summary": "browser_snapshot returned 1250 chars",
8742        "input": {"arguments": {"full": True}},
8743        "output": {
8744            "data": {
8745                "origin": "https://source.example/search",
8746                "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
8747            },
8748        },
8749    }]
8750
8751    messages = build_messages(job, steps)
8752
8753    assert "source_warning=captcha/anti-bot block" in messages[-1]["content"]
8754
8755
8756def test_prompt_includes_browser_candidate_names():
8757    job = {"title": "research", "kind": "generic", "objective": "find research"}
8758    steps = [{
8759        "step_no": 9,
8760        "kind": "tool",
8761        "status": "completed",
8762        "tool_name": "browser_snapshot",
8763        "summary": "browser_snapshot returned 2000 chars",
8764        "input": {"arguments": {"full": False}},
8765        "output": {
8766            "data": {
8767                "snapshot": "source page",
8768                "refs": {
8769                    "e1": {"name": "Contact", "role": "link"},
8770                    "e2": {"name": "Drytech Interiors", "role": "link"},
8771                    "e3": {"name": "Flavour Chaser", "role": "link"},
8772                },
8773            },
8774        },
8775    }]
8776
8777    messages = build_messages(job, steps)
8778
8779    assert "Drytech Interiors (@e2)" in messages[-1]["content"]
8780    assert "Flavour Chaser (@e3)" in messages[-1]["content"]
8781    assert "Contact (@e1)" not in messages[-1]["content"]
8782
8783
8784def test_prompt_includes_candidate_names_from_table_cells():
8785    job = {"title": "research", "kind": "generic", "objective": "find research"}
8786    steps = [{
8787        "step_no": 10,
8788        "kind": "tool",
8789        "status": "completed",
8790        "tool_name": "browser_navigate",
8791        "summary": "browser_navigate opened list",
8792        "input": {"arguments": {"url": "https://example.com/list"}},
8793        "output": {
8794            "data": {"title": "list", "url": "https://example.com/list"},
8795            "snapshot": "table",
8796            "refs": {
8797                "e100": {"name": "Organization Name", "role": "cell"},
8798                "e101": {"name": "Services", "role": "cell"},
8799                "e102": {
8800                    "name": "Custom integration, workflow automation, reliability testing, reporting",
8801                    "role": "cell",
8802                },
8803                "e103": {"name": "4.8", "role": "cell"},
8804                "e104": {"name": "Major Tom", "role": "cell"},
8805                "e105": {"name": "Kffein", "role": "cell"},
8806            },
8807        },
8808    }]
8809
8810    messages = build_messages(job, steps)
8811
8812    content = messages[-1]["content"]
8813    assert "Major Tom (@e104)" in content
8814    assert "Kffein (@e105)" in content
8815    assert "Organization Name (@e100)" not in content
8816    assert "Custom integration" not in content
8817    assert "4.8 (@e103)" not in content
8818
8819
8820def test_prompt_includes_recovery_candidates_after_stale_ref():
8821    job = {"title": "research", "kind": "generic", "objective": "find research"}
8822    steps = [{
8823        "step_no": 10,
8824        "kind": "tool",
8825        "status": "failed",
8826        "tool_name": "browser_click",
8827        "summary": "browser_click failed: Unknown ref: e102",
8828        "input": {"arguments": {"ref": "@e102"}},
8829        "error": "Unknown ref: e102",
8830        "output": {
8831            "success": False,
8832            "error": "Unknown ref: e102",
8833            "recovery_guidance": "The ref was stale or missing.",
8834            "recovery_snapshot": {
8835                "data": {
8836                    "refs": {
8837                        "e4": {"name": "Clearset Vac Truck Services", "role": "link"},
8838                    },
8839                },
8840            },
8841        },
8842    }]
8843
8844    messages = build_messages(job, steps)
8845
8846    content = messages[-1]["content"]
8847    assert "Unknown ref: e102" in content
8848    assert "Clearset Vac Truck Services (@e4)" in content
8849
8850
8851def test_run_one_step_blocks_exact_duplicate_tool_call(tmp_path):
8852    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8853    db = AgentDB(tmp_path / "state.db")
8854    call = ToolCall(
8855        name="write_artifact",
8856        arguments={"title": "same", "content": "same content"},
8857    )
8858    try:
8859        job_id = db.create_job("Do not repeat exact tools", title="dedupe")
8860        first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8861        second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8862
8863        assert first.status == "completed"
8864        assert second.status == "blocked"
8865        assert second.result["error"] == "duplicate tool call blocked"
8866        assert second.result["recoverable"] is True
8867        assert "previous_step" in second.result
8868    finally:
8869        db.close()
8870
8871
8872def test_duplicate_artifact_read_guidance_pushes_follow_up_work(tmp_path):
8873    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8874    db = AgentDB(tmp_path / "state.db")
8875    try:
8876        job_id = db.create_job("Use artifact once", title="artifact")
8877        run_id = db.start_run(job_id)
8878        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
8879        artifacts = ArtifactStore(tmp_path, db)
8880        stored = artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Evidence", content="saved")
8881        db.finish_step(step_id, status="completed", output_data={"success": True, "artifact_id": stored.id, "path": str(stored.path)})
8882        db.finish_run(run_id, "completed")
8883        call = ToolCall(name="read_artifact", arguments={"artifact_id": stored.id})
8884
8885        first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8886        second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8887
8888        assert first.status == "completed"
8889        assert second.status == "blocked"
8890        assert "Do not read it again" in second.result["guidance"]
8891    finally:
8892        db.close()
8893
8894
8895def test_fresh_evidence_guard_takes_priority_over_duplicate_read(tmp_path):
8896    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8897    db = AgentDB(tmp_path / "state.db")
8898    try:
8899        job_id = db.create_job("Save fresh evidence before reviewing old artifacts", title="fresh-evidence")
8900        run_id = db.start_run(job_id)
8901        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
8902        artifacts = ArtifactStore(tmp_path, db)
8903        stored = artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Old Evidence", content="saved")
8904        db.finish_step(step_id, status="completed", output_data={"success": True, "artifact_id": stored.id, "path": str(stored.path)})
8905        db.finish_run(run_id, "completed")
8906
8907        read = ToolCall(name="read_artifact", arguments={"artifact_id": stored.id})
8908        first_read = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[read])]))
8909        assert first_read.status == "completed"
8910
8911        shell = ToolCall(name="shell_exec", arguments={"command": "find . -type f"})
8912        evidence = run_one_step(
8913            job_id,
8914            config=config,
8915            db=db,
8916            llm=ScriptedLLM([LLMResponse(tool_calls=[shell])]),
8917            registry=LargeShellEvidenceRegistry(),
8918        )
8919        assert evidence.status == "completed"
8920
8921        blocked = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[read])]))
8922
8923        assert blocked.status == "blocked"
8924        assert blocked.result["error"] == "artifact required before more research"
8925        assert blocked.result["blocked_tool"] == "read_artifact"
8926    finally:
8927        db.close()
8928
8929
8930def test_run_one_step_allows_repeated_browser_snapshot(tmp_path):
8931    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8932    db = AgentDB(tmp_path / "state.db")
8933    try:
8934        job_id = db.create_job("Snapshots are stateful", title="snap")
8935        first = run_one_step(
8936            job_id,
8937            config=config,
8938            db=db,
8939            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])]),
8940            registry=SnapshotRegistry(),
8941        )
8942        second = run_one_step(
8943            job_id,
8944            config=config,
8945            db=db,
8946            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])]),
8947            registry=SnapshotRegistry(),
8948        )
8949
8950        assert first.status == "completed"
8951        assert second.status == "completed"
8952    finally:
8953        db.close()
8954
8955
8956def test_run_one_step_blocks_browser_tools_after_runtime_missing(tmp_path):
8957    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8958    db = AgentDB(tmp_path / "state.db")
8959    try:
8960        job_id = db.create_job("Browser runtime can be unavailable", title="browser-runtime")
8961        run_id = db.start_run(job_id)
8962        step_id = db.add_step(
8963            job_id=job_id,
8964            run_id=run_id,
8965            kind="tool",
8966            tool_name="browser_navigate",
8967            input_data={"arguments": {"url": "https://example.test"}},
8968        )
8969        db.finish_step(
8970            step_id,
8971            status="failed",
8972            output_data={
8973                "success": False,
8974                "error": "Chrome not found. Checked: Playwright browser cache and Puppeteer browser cache.",
8975            },
8976            summary="browser_navigate failed: Chrome not found",
8977        )
8978        db.finish_run(run_id, "failed")
8979
8980        result = run_one_step(
8981            job_id,
8982            config=config,
8983            db=db,
8984            llm=ScriptedLLM([
8985                LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])
8986            ]),
8987            registry=SnapshotRegistry(),
8988        )
8989
8990        assert result.status == "blocked"
8991        assert result.result["error"] == "browser runtime unavailable"
8992        assert result.result["browser_runtime"]["tool"] == "browser_navigate"
8993        assert "Use web_search" in result.result["guidance"]
8994    finally:
8995        db.close()
8996
8997
8998def test_run_one_step_allows_non_browser_work_after_runtime_missing(tmp_path):
8999    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9000    db = AgentDB(tmp_path / "state.db")
9001    try:
9002        job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9003        run_id = db.start_run(job_id)
9004        step_id = db.add_step(
9005            job_id=job_id,
9006            run_id=run_id,
9007            kind="tool",
9008            tool_name="browser_navigate",
9009            input_data={"arguments": {"url": "https://example.test"}},
9010        )
9011        db.finish_step(
9012            step_id,
9013            status="failed",
9014            output_data={"success": False, "error": "Browser executable doesn't exist on this host."},
9015            summary="browser_navigate failed: browser executable missing",
9016        )
9017        db.finish_run(run_id, "failed")
9018
9019        result = run_one_step(
9020            job_id,
9021            config=config,
9022            db=db,
9023            llm=ScriptedLLM([
9024                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "public docs", "limit": 5})])
9025            ]),
9026            registry=SuccessRegistry(),
9027        )
9028
9029        assert result.status == "completed"
9030        assert result.tool_name == "web_search"
9031    finally:
9032        db.close()
9033
9034
9035def test_run_one_step_skips_batched_browser_call_when_runtime_missing_and_fallback_present(tmp_path):
9036    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9037    db = AgentDB(tmp_path / "state.db")
9038    try:
9039        job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9040        run_id = db.start_run(job_id)
9041        step_id = db.add_step(
9042            job_id=job_id,
9043            run_id=run_id,
9044            kind="tool",
9045            tool_name="browser_navigate",
9046            input_data={"arguments": {"url": "https://example.test"}},
9047        )
9048        db.finish_step(
9049            step_id,
9050            status="failed",
9051            output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9052            summary="browser_navigate failed: Chrome not found",
9053        )
9054        db.finish_run(run_id, "failed")
9055
9056        result = run_one_step(
9057            job_id,
9058            config=config,
9059            db=db,
9060            llm=ScriptedLLM([
9061                LLMResponse(tool_calls=[
9062                    ToolCall(name="browser_navigate", arguments={"url": "https://example.test/next"}),
9063                    ToolCall(name="web_search", arguments={"query": "public docs", "limit": 5}),
9064                ])
9065            ]),
9066            registry=SuccessRegistry(),
9067        )
9068
9069        tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
9070        assert result.status == "completed"
9071        assert result.tool_name == "web_search"
9072        assert tool_steps[-1]["tool_name"] == "web_search"
9073        assert all(
9074            step["input"].get("arguments", {}).get("url") != "https://example.test/next"
9075            for step in tool_steps
9076            if step.get("tool_name") == "browser_navigate"
9077        )
9078    finally:
9079        db.close()
9080
9081
9082def test_run_one_step_removes_browser_tools_from_schema_after_runtime_missing(tmp_path):
9083    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9084    db = AgentDB(tmp_path / "state.db")
9085    try:
9086        job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9087        run_id = db.start_run(job_id)
9088        step_id = db.add_step(
9089            job_id=job_id,
9090            run_id=run_id,
9091            kind="tool",
9092            tool_name="browser_navigate",
9093            input_data={"arguments": {"url": "https://example.test"}},
9094        )
9095        db.finish_step(
9096            step_id,
9097            status="failed",
9098            output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9099            summary="browser_navigate failed: Chrome not found",
9100        )
9101        db.finish_run(run_id, "failed")
9102        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "fallback"})]))
9103
9104        run_one_step(
9105            job_id,
9106            config=config,
9107            db=db,
9108            llm=llm,
9109            registry=BrowserAndWebRegistry(),
9110        )
9111
9112        tool_names = [tool["function"]["name"] for tool in llm.tools]
9113        assert tool_names == ["web_search"]
9114    finally:
9115        db.close()
9116
9117
9118def test_run_one_step_removes_browser_tools_after_older_runtime_missing(tmp_path):
9119    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9120    db = AgentDB(tmp_path / "state.db")
9121    try:
9122        job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9123        run_id = db.start_run(job_id)
9124        step_id = db.add_step(
9125            job_id=job_id,
9126            run_id=run_id,
9127            kind="tool",
9128            tool_name="browser_navigate",
9129            input_data={"arguments": {"url": "https://example.test"}},
9130        )
9131        db.finish_step(
9132            step_id,
9133            status="failed",
9134            output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9135            summary="browser_navigate failed: Chrome not found",
9136        )
9137        for index in range(80):
9138            filler_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
9139            db.finish_step(
9140                filler_id,
9141                status="completed",
9142                output_data={"success": True, "query": f"query {index}", "results": []},
9143                summary=f"web_search query {index}",
9144            )
9145        db.finish_run(run_id, "completed")
9146        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "fallback"})]))
9147
9148        run_one_step(
9149            job_id,
9150            config=config,
9151            db=db,
9152            llm=llm,
9153            registry=BrowserAndWebRegistry(),
9154        )
9155
9156        tool_names = [tool["function"]["name"] for tool in llm.tools]
9157        assert tool_names == ["web_search"]
9158    finally:
9159        db.close()
9160
9161
9162def test_run_one_step_allows_repeated_defer_for_monitor_intervals(tmp_path):
9163    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9164    db = AgentDB(tmp_path / "state.db")
9165    call = ToolCall(name="defer_job", arguments={"seconds": 60, "reason": "wait for monitor interval"})
9166    try:
9167        job_id = db.create_job("Check a long-running process later", title="defer")
9168        first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
9169        second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
9170
9171        assert first.status == "completed"
9172        assert second.status == "completed"
9173        assert first.tool_name == "defer_job"
9174        assert second.tool_name == "defer_job"
9175    finally:
9176        db.close()
9177
9178
9179def test_run_one_step_blocks_self_defer_for_next_worker_turn(tmp_path):
9180    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9181    db = AgentDB(tmp_path / "state.db")
9182    try:
9183        job_id = db.create_job("Keep making progress", title="self-defer")
9184
9185        result = run_one_step(
9186            job_id,
9187            config=config,
9188            db=db,
9189            llm=ScriptedLLM([
9190                LLMResponse(tool_calls=[
9191                    ToolCall(
9192                        name="defer_job",
9193                        arguments={
9194                            "seconds": 300,
9195                            "reason": "waiting for tasks to be picked up by next worker turn",
9196                            "next_action": "continue in the next worker step",
9197                        },
9198                    )
9199                ])
9200            ]),
9201        )
9202
9203        assert result.status == "blocked"
9204        assert result.tool_name == "defer_job"
9205        assert result.result["error"] == "self-defer blocked"
9206        assert result.result["self_defer"]["matched"] == "next worker turn"
9207    finally:
9208        db.close()
9209
9210
9211def test_run_one_step_blocks_defer_without_wait_reason(tmp_path):
9212    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9213    db = AgentDB(tmp_path / "state.db")
9214    try:
9215        job_id = db.create_job("Keep making progress", title="self-defer")
9216
9217        result = run_one_step(
9218            job_id,
9219            config=config,
9220            db=db,
9221            llm=ScriptedLLM([
9222                LLMResponse(tool_calls=[
9223                    ToolCall(
9224                        name="defer_job",
9225                        arguments={
9226                            "seconds": 300,
9227                            "next_action": "build the project and run the measurement",
9228                        },
9229                    )
9230                ])
9231            ]),
9232        )
9233
9234        assert result.status == "blocked"
9235        assert result.tool_name == "defer_job"
9236        assert result.result["error"] == "self-defer blocked"
9237        assert result.result["self_defer"]["matched"] == "missing wait reason"
9238    finally:
9239        db.close()
9240
9241
9242def test_run_one_step_blocks_search_after_unpersisted_extract(tmp_path):
9243    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9244    db = AgentDB(tmp_path / "state.db")
9245    try:
9246        job_id = db.create_job("Save extracted evidence before more search", title="guard")
9247        run_id = db.start_run(job_id)
9248        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_extract")
9249        db.finish_step(
9250            step_id,
9251            status="completed",
9252            output_data={"success": True, "pages": [{"url": "https://example.com", "text": "useful evidence"}]},
9253        )
9254        db.finish_run(run_id, "completed")
9255
9256        result = run_one_step(
9257            job_id,
9258            config=config,
9259            db=db,
9260            llm=ScriptedLLM([
9261                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more findings", "limit": 5})])
9262            ]),
9263        )
9264
9265        assert result.status == "blocked"
9266        assert result.result["error"] == "artifact required before more research"
9267        assert result.result["blocked_tool"] == "web_search"
9268        assert "auto_checkpoint" in result.result
9269        artifacts = db.list_artifacts(job_id)
9270        assert artifacts[0]["title"].startswith("Auto Evidence Checkpoint")
9271
9272        next_result = run_one_step(
9273            job_id,
9274            config=config,
9275            db=db,
9276            llm=ScriptedLLM([
9277                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "different findings", "limit": 5})])
9278            ]),
9279            registry=SuccessRegistry(),
9280        )
9281        assert next_result.status == "blocked"
9282        assert next_result.result["error"] == "evidence checkpoint accounting required"
9283    finally:
9284        db.close()
9285
9286
9287def test_prompt_tells_model_to_save_unpersisted_evidence_before_more_research(tmp_path):
9288    db = AgentDB(tmp_path / "state.db")
9289    try:
9290        job_id = db.create_job("Save evidence before searching", title="guard")
9291        run_id = db.start_run(job_id)
9292        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_extract")
9293        db.finish_step(
9294            step_id,
9295            status="completed",
9296            output_data={"success": True, "pages": [{"url": "https://example.com", "text": "useful evidence"}]},
9297        )
9298        job = db.get_job(job_id)
9299        steps = db.list_steps(job_id=job_id)
9300
9301        messages = build_messages(job, steps)
9302
9303        assert "Next-action constraint:" in messages[-1]["content"]
9304        assert "Your next tool call should usually be write_artifact" in messages[-1]["content"]
9305    finally:
9306        db.close()
9307
9308
9309def test_run_one_step_blocks_research_after_unpersisted_browser_snapshot(tmp_path):
9310    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9311    db = AgentDB(tmp_path / "state.db")
9312    try:
9313        job_id = db.create_job("Save browser evidence before more browsing", title="guard")
9314        run_id = db.start_run(job_id)
9315        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
9316        db.finish_step(
9317            step_id,
9318            status="completed",
9319            output_data={
9320                "success": True,
9321                "data": {"origin": "https://example.com"},
9322                "snapshot": "Useful finding evidence. " * 40,
9323            },
9324        )
9325        db.finish_run(run_id, "completed")
9326
9327        result = run_one_step(
9328            job_id,
9329            config=config,
9330            db=db,
9331            llm=ScriptedLLM([
9332                LLMResponse(tool_calls=[ToolCall(name="browser_scroll", arguments={"direction": "down"})])
9333            ]),
9334        )
9335
9336        assert result.status == "blocked"
9337        assert result.result["error"] == "artifact required before more research"
9338        assert result.result["blocked_tool"] == "browser_scroll"
9339        assert "auto_checkpoint" in result.result
9340    finally:
9341        db.close()
9342
9343
9344def test_prompt_tells_model_to_open_new_branch_when_tasks_are_exhausted():
9345    job = {
9346        "title": "research",
9347        "kind": "generic",
9348        "objective": "keep improving",
9349        "metadata": {
9350            "task_queue": [
9351                {"title": "Initial branch", "status": "done", "priority": 5, "result": "Checkpoint saved"},
9352                {"title": "Blocked branch", "status": "blocked", "priority": 4, "result": "Source unavailable"},
9353            ],
9354        },
9355    }
9356
9357    messages = build_messages(job, [])
9358
9359    content = messages[-1]["content"]
9360    assert "All durable task branches are done" in content
9361    assert "use record_tasks to open the next concrete branch" in content
9362
9363
9364def test_prompt_pushes_deliverable_checkpoint_after_long_research():
9365    job = {
9366        "title": "paper",
9367        "kind": "generic",
9368        "objective": "write a complete research paper from evidence",
9369        "metadata": {
9370            "task_queue": [
9371                {
9372                    "title": "Save the first durable draft",
9373                    "status": "open",
9374                    "priority": 8,
9375                    "output_contract": "report",
9376                }
9377            ],
9378        },
9379    }
9380    steps = [
9381        {
9382            "step_no": index + 1,
9383            "status": "completed",
9384            "kind": "tool",
9385            "tool_name": "shell_exec",
9386            "input": {"arguments": {"command": f"cat source_{index}.txt"}},
9387        }
9388        for index in range(18)
9389    ]
9390
9391    content = build_messages(job, steps)[-1]["content"]
9392
9393    assert "Deliverable progress guard:" in content
9394    assert "durable deliverable checkpoint" in content
9395    assert "write_file or write_artifact" in content
9396
9397
9398def test_low_priority_report_task_does_not_block_execution_task_prompt():
9399    job = {
9400        "title": "execution",
9401        "kind": "generic",
9402        "objective": "keep useful work moving",
9403        "metadata": {
9404            "task_queue": [
9405                {
9406                    "title": "Review saved output later",
9407                    "status": "open",
9408                    "priority": 4,
9409                    "output_contract": "report",
9410                },
9411                {
9412                    "title": "Run current experiment",
9413                    "status": "active",
9414                    "priority": 9,
9415                    "output_contract": "experiment",
9416                },
9417            ],
9418        },
9419    }
9420    steps = [
9421        {
9422            "step_no": index + 1,
9423            "status": "completed",
9424            "kind": "tool",
9425            "tool_name": "shell_exec",
9426            "input": {"arguments": {"command": f"probe_{index}"}},
9427        }
9428        for index in range(18)
9429    ]
9430
9431    content = build_messages(job, steps)[-1]["content"]
9432
9433    assert "Deliverable progress guard:\nNone." in content
9434    assert "durable deliverable checkpoint" not in content
9435
9436
9437def test_run_one_step_blocks_more_research_when_deliverable_needs_checkpoint(tmp_path):
9438    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9439    db = AgentDB(tmp_path / "state.db")
9440    try:
9441        job_id = db.create_job(
9442            "Write a complete report from collected evidence",
9443            title="deliverable",
9444            metadata={
9445                "task_queue": [
9446                    {
9447                        "title": "Save the first durable report checkpoint",
9448                        "status": "open",
9449                        "priority": 8,
9450                        "output_contract": "report",
9451                    }
9452                ]
9453            },
9454        )
9455        run_id = db.start_run(job_id, model="fake")
9456        for index in range(15):
9457            step_id = db.add_step(
9458                job_id=job_id,
9459                run_id=run_id,
9460                kind="tool",
9461                tool_name="shell_exec",
9462                input_data={"arguments": {"command": f"cat source_{index}.txt"}},
9463            )
9464            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "note"})
9465        ledger_step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
9466        db.finish_step(ledger_step_id, status="completed", output_data={"success": True})
9467        for index in range(15, 18):
9468            step_id = db.add_step(
9469                job_id=job_id,
9470                run_id=run_id,
9471                kind="tool",
9472                tool_name="shell_exec",
9473                input_data={"arguments": {"command": f"cat source_{index}.txt"}},
9474            )
9475            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "note"})
9476        db.finish_run(run_id, "completed")
9477
9478        result = run_one_step(
9479            job_id,
9480            config=config,
9481            db=db,
9482            llm=ScriptedLLM([
9483                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background sources"})])
9484            ]),
9485        )
9486
9487        assert result.status == "blocked"
9488        assert result.result["error"] == "deliverable checkpoint required"
9489        assert result.result["blocked_tool"] == "web_search"
9490        assert result.result["recoverable"] is True
9491    finally:
9492        db.close()
9493
9494
9495def test_prompt_includes_roadmap_and_validation_constraints():
9496    job = {
9497        "title": "broad work",
9498        "kind": "generic",
9499        "objective": "build a broad durable outcome",
9500        "metadata": {
9501            "roadmap": {
9502                "title": "Broad Roadmap",
9503                "status": "active",
9504                "current_milestone": "Foundation",
9505                "validation_contract": "check observable evidence",
9506                "milestones": [{
9507                    "title": "Foundation",
9508                    "status": "validating",
9509                    "validation_status": "pending",
9510                    "acceptance_criteria": "evidence exists",
9511                    "evidence_needed": "saved output",
9512                    "features": [{"title": "First feature", "status": "done"}],
9513                }],
9514            },
9515        },
9516    }
9517
9518    messages = build_messages(job, [])
9519    content = messages[-1]["content"]
9520
9521    assert "Roadmap:" in content
9522    assert "Broad Roadmap" in content
9523    assert "validation=pending" in content
9524    assert "Use record_milestone_validation" in content
9525
9526
9527def test_prompt_suggests_roadmap_for_broad_jobs_without_one():
9528    job = {
9529        "title": "broad work",
9530        "kind": "generic",
9531        "objective": "research and implement a broad multi phase system with validation and durable output",
9532        "metadata": {},
9533    }
9534
9535    messages = build_messages(job, [])
9536    content = messages[-1]["content"]
9537
9538    assert "No roadmap yet" in content
9539    assert "use record_roadmap" in content
9540
9541
9542def test_run_one_step_blocks_branch_work_when_milestone_needs_validation(tmp_path):
9543    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9544    db = AgentDB(tmp_path / "state.db")
9545    try:
9546        job_id = db.create_job(
9547            "Keep broad work gated by validation",
9548            title="roadmap-gate",
9549            metadata={
9550                "roadmap": {
9551                    "title": "Generic Roadmap",
9552                    "status": "active",
9553                    "milestones": [{
9554                        "title": "Foundation",
9555                        "status": "validating",
9556                        "validation_status": "pending",
9557                        "acceptance_criteria": "evidence exists",
9558                        "evidence_needed": "saved artifact",
9559                    }],
9560                },
9561            },
9562        )
9563
9564        result = run_one_step(
9565            job_id,
9566            config=config,
9567            db=db,
9568            llm=ScriptedLLM([
9569                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "new branch", "limit": 5})])
9570            ]),
9571        )
9572
9573        assert result.status == "blocked"
9574        assert result.result["error"] == "milestone validation required"
9575        assert result.result["blocked_tool"] == "web_search"
9576    finally:
9577        db.close()
9578
9579
9580def test_run_one_step_allows_milestone_validation_when_gate_is_active(tmp_path):
9581    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9582    db = AgentDB(tmp_path / "state.db")
9583    try:
9584        job_id = db.create_job(
9585            "Validate a gated milestone",
9586            title="roadmap-validate",
9587            metadata={
9588                "roadmap": {
9589                    "title": "Generic Roadmap",
9590                    "status": "active",
9591                    "milestones": [{
9592                        "title": "Foundation",
9593                        "status": "validating",
9594                        "validation_status": "pending",
9595                    }],
9596                },
9597            },
9598        )
9599
9600        result = run_one_step(
9601            job_id,
9602            config=config,
9603            db=db,
9604            llm=ScriptedLLM([
9605                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
9606                    "milestone": "Foundation",
9607                    "validation_status": "passed",
9608                    "result": "Acceptance criteria met.",
9609                    "evidence": "artifact",
9610                })])
9611            ]),
9612        )
9613
9614        assert result.status == "completed"
9615        assert result.tool_name == "record_milestone_validation"
9616        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
9617        assert roadmap["milestones"][0]["validation_status"] == "passed"
9618    finally:
9619        db.close()
9620
9621
9622def test_run_one_step_allows_matching_pending_milestone_evidence_action(tmp_path):
9623    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9624    db = AgentDB(tmp_path / "state.db")
9625    try:
9626        job_id = db.create_job(
9627            "Validate a pending milestone",
9628            title="roadmap-pending-shell",
9629            metadata={
9630                "roadmap": {
9631                    "title": "Generic Roadmap",
9632                    "status": "validating",
9633                    "milestones": [{
9634                        "title": "Environment baseline",
9635                        "status": "validating",
9636                        "validation_status": "pending",
9637                        "next_action": "Validate candidate files with a shell probe.",
9638                        "evidence_needed": "Shell output showing candidate file status.",
9639                    }],
9640                },
9641            },
9642        )
9643
9644        result = run_one_step(
9645            job_id,
9646            config=config,
9647            db=db,
9648            llm=ScriptedLLM([
9649                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={
9650                    "command": "printf 'candidate file ok\\n'",
9651                    "timeout_seconds": 5,
9652                })])
9653            ]),
9654        )
9655
9656        assert result.status == "completed"
9657        assert result.tool_name == "shell_exec"
9658    finally:
9659        db.close()
9660
9661
9662def test_run_one_step_allows_matching_pending_milestone_validation_evidence_action(tmp_path):
9663    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9664    db = AgentDB(tmp_path / "state.db")
9665    try:
9666        job_id = db.create_job(
9667            "Validate a pending milestone",
9668            title="roadmap-pending-validation-evidence",
9669            metadata={
9670                "roadmap": {
9671                    "title": "Generic Roadmap",
9672                    "status": "validating",
9673                    "milestones": [{
9674                        "title": "Build tools",
9675                        "status": "validating",
9676                        "validation_status": "pending",
9677                        "validation_evidence": "Need to verify cmake and compiler paths before building.",
9678                    }],
9679                },
9680            },
9681        )
9682
9683        result = run_one_step(
9684            job_id,
9685            config=config,
9686            db=db,
9687            llm=ScriptedLLM([
9688                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={
9689                    "command": "printf 'cmake compiler ok\\n'",
9690                    "timeout_seconds": 5,
9691                })])
9692            ]),
9693        )
9694
9695        assert result.status == "completed"
9696        assert result.tool_name == "shell_exec"
9697    finally:
9698        db.close()
9699
9700
9701def test_run_one_step_blocks_non_matching_pending_milestone_action(tmp_path):
9702    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9703    db = AgentDB(tmp_path / "state.db")
9704    try:
9705        job_id = db.create_job(
9706            "Validate a pending milestone",
9707            title="roadmap-pending-unrelated",
9708            metadata={
9709                "roadmap": {
9710                    "title": "Generic Roadmap",
9711                    "status": "validating",
9712                    "milestones": [{
9713                        "title": "Environment baseline",
9714                        "status": "validating",
9715                        "validation_status": "pending",
9716                        "next_action": "Validate candidate files with a shell probe.",
9717                        "evidence_needed": "Shell output showing candidate file status.",
9718                    }],
9719                },
9720            },
9721        )
9722
9723        result = run_one_step(
9724            job_id,
9725            config=config,
9726            db=db,
9727            llm=ScriptedLLM([
9728                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={
9729                    "query": "unrelated topic",
9730                    "limit": 5,
9731                })])
9732            ]),
9733        )
9734
9735        assert result.status == "blocked"
9736        assert result.result["error"] == "milestone validation required"
9737    finally:
9738        db.close()
9739
9740
9741def test_run_one_step_blocks_wrong_milestone_validation_when_gate_is_active(tmp_path):
9742    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9743    db = AgentDB(tmp_path / "state.db")
9744    try:
9745        job_id = db.create_job(
9746            "Validate the active milestone only",
9747            title="roadmap-wrong-milestone",
9748            metadata={
9749                "roadmap": {
9750                    "title": "Generic Roadmap",
9751                    "status": "validating",
9752                    "milestones": [{
9753                        "title": "Current milestone",
9754                        "status": "validating",
9755                        "validation_status": "pending",
9756                    }],
9757                },
9758            },
9759        )
9760
9761        result = run_one_step(
9762            job_id,
9763            config=config,
9764            db=db,
9765            llm=ScriptedLLM([
9766                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
9767                    "milestone": "Different milestone",
9768                    "validation_status": "passed",
9769                })])
9770            ]),
9771        )
9772
9773        assert result.status == "blocked"
9774        assert result.result["error"] == "current milestone validation required"
9775        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
9776        assert [milestone["title"] for milestone in roadmap["milestones"]] == ["Current milestone"]
9777    finally:
9778        db.close()
9779
9780
9781def test_run_one_step_normalizes_matching_validation_to_active_milestone(tmp_path):
9782    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9783    db = AgentDB(tmp_path / "state.db")
9784    try:
9785        job_id = db.create_job(
9786            "Validate the active milestone from matching evidence",
9787            title="roadmap-normalize-milestone-validation",
9788            metadata={
9789                "roadmap": {
9790                    "title": "Generic Roadmap",
9791                    "status": "validating",
9792                    "milestones": [{
9793                        "title": "Environment baseline evidence: check build tools",
9794                        "status": "validating",
9795                        "validation_status": "pending",
9796                        "validation_evidence": "Need to verify cmake, compiler, and candidate files before building.",
9797                    }],
9798                },
9799            },
9800        )
9801
9802        result = run_one_step(
9803            job_id,
9804            config=config,
9805            db=db,
9806            llm=ScriptedLLM([
9807                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
9808                    "milestone": "Validate candidate files and build environment",
9809                    "validation_status": "blocked",
9810                    "result": "cmake path failed, compiler still needs verification, and candidate file status is unclear.",
9811                    "evidence": "shell output showed missing cmake path and file checks are still needed.",
9812                    "issues": ["cmake path missing", "candidate file status unresolved"],
9813                })])
9814            ]),
9815        )
9816
9817        assert result.status == "completed"
9818        assert result.tool_name == "record_milestone_validation"
9819        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
9820        assert [milestone["title"] for milestone in roadmap["milestones"]] == [
9821            "Environment baseline evidence: check build tools"
9822        ]
9823        milestone = roadmap["milestones"][0]
9824        assert milestone["validation_status"] == "blocked"
9825        assert milestone["metadata"]["normalized_from_milestone"] == "Validate candidate files and build environment"
9826        assert milestone["metadata"]["normalized_to_active_gate"] is True
9827    finally:
9828        db.close()
9829
9830
9831def test_run_one_step_blocks_task_churn_when_roadmap_stalls(tmp_path):
9832    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9833    db = AgentDB(tmp_path / "state.db")
9834    try:
9835        job_id = db.create_job(
9836            "Keep roadmap aligned with broad work",
9837            title="roadmap-stale",
9838            metadata={
9839                "roadmap": {
9840                    "title": "Generic Roadmap",
9841                    "status": "planned",
9842                    "milestones": [{
9843                        "title": "Foundation",
9844                        "status": "planned",
9845                        "validation_status": "not_started",
9846                    }],
9847                },
9848                "task_queue": [{"title": f"Task {index}", "status": "done"} for index in range(8)],
9849            },
9850        )
9851        run_id = db.start_run(job_id, model="fake")
9852        for index in range(2):
9853            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
9854            db.finish_step(step_id, status="completed", summary=f"artifact {index}", output_data={"success": True})
9855
9856        result = run_one_step(
9857            job_id,
9858            config=config,
9859            db=db,
9860            llm=ScriptedLLM([
9861                LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={
9862                    "tasks": [{"title": "More task churn", "status": "open"}]
9863                })])
9864            ]),
9865        )
9866
9867        assert result.status == "blocked"
9868        assert result.result["error"] == "roadmap update required"
9869        assert result.result["blocked_tool"] == "record_tasks"
9870    finally:
9871        db.close()
9872
9873
9874def test_run_one_step_allows_roadmap_update_when_roadmap_stalls(tmp_path):
9875    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9876    db = AgentDB(tmp_path / "state.db")
9877    try:
9878        job_id = db.create_job(
9879            "Update stale roadmap",
9880            title="roadmap-update",
9881            metadata={
9882                "roadmap": {
9883                    "title": "Generic Roadmap",
9884                    "status": "planned",
9885                    "milestones": [{
9886                        "title": "Foundation",
9887                        "status": "planned",
9888                        "validation_status": "not_started",
9889                    }],
9890                },
9891                "task_queue": [{"title": f"Task {index}", "status": "done"} for index in range(8)],
9892            },
9893        )
9894        run_id = db.start_run(job_id, model="fake")
9895        for index in range(2):
9896            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
9897            db.finish_step(step_id, status="completed", summary=f"artifact {index}", output_data={"success": True})
9898
9899        result = run_one_step(
9900            job_id,
9901            config=config,
9902            db=db,
9903            llm=ScriptedLLM([
9904                LLMResponse(tool_calls=[ToolCall(name="record_roadmap", arguments={
9905                    "title": "Generic Roadmap",
9906                    "status": "active",
9907                    "current_milestone": "Foundation",
9908                    "milestones": [{
9909                        "title": "Foundation",
9910                        "status": "active",
9911                        "validation_status": "pending",
9912                        "acceptance_criteria": "evidence reviewed",
9913                    }],
9914                })])
9915            ]),
9916        )
9917
9918        assert result.status == "completed"
9919        assert result.tool_name == "record_roadmap"
9920        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
9921        assert roadmap["status"] == "active"
9922        assert roadmap["milestones"][0]["validation_status"] == "pending"
9923    finally:
9924        db.close()
9925
9926
9927def test_run_one_step_blocks_branch_work_when_tasks_are_exhausted(tmp_path):
9928    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9929    db = AgentDB(tmp_path / "state.db")
9930    try:
9931        job_id = db.create_job(
9932            "Keep improving without looping",
9933            title="exhausted",
9934            metadata={"task_queue": [{"title": "First branch", "status": "done", "priority": 5}]},
9935        )
9936
9937        result = run_one_step(
9938            job_id,
9939            config=config,
9940            db=db,
9941            llm=ScriptedLLM([
9942                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "same broad topic", "limit": 5})])
9943            ]),
9944        )
9945
9946        assert result.status == "blocked"
9947        assert result.result["error"] == "task branch required before more work"
9948        assert result.result["blocked_tool"] == "web_search"
9949        assert result.result["recoverable"] is True
9950    finally:
9951        db.close()
9952
9953
9954def test_run_one_step_allows_record_tasks_when_tasks_are_exhausted(tmp_path):
9955    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9956    db = AgentDB(tmp_path / "state.db")
9957    try:
9958        job_id = db.create_job(
9959            "Keep improving by opening branches",
9960            title="branch",
9961            metadata={"task_queue": [{"title": "First branch", "status": "done", "priority": 5}]},
9962        )
9963
9964        result = run_one_step(
9965            job_id,
9966            config=config,
9967            db=db,
9968            llm=ScriptedLLM([
9969                LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={
9970                    "tasks": [{"title": "Next branch", "status": "open", "priority": 6}]
9971                })])
9972            ]),
9973        )
9974
9975        assert result.status == "completed"
9976        assert result.tool_name == "record_tasks"
9977        job = db.get_job(job_id)
9978        assert any(task["title"] == "Next branch" and task["status"] == "open" for task in job["metadata"]["task_queue"])
9979    finally:
9980        db.close()
9981
9982
9983def test_run_one_step_blocks_new_tasks_when_queue_is_saturated(tmp_path):
9984    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9985    db = AgentDB(tmp_path / "state.db")
9986    try:
9987        job_id = db.create_job(
9988            "Finish existing work",
9989            title="saturated",
9990            kind="generic",
9991            metadata={
9992                "task_queue": [
9993                    {"title": f"Open branch {index}", "status": "open", "priority": index}
9994                    for index in range(40)
9995                ]
9996            },
9997        )
9998
9999        result = run_one_step(
10000            job_id,
10001            config=config,
10002            db=db,
10003            llm=ScriptedLLM([
10004                LLMResponse(tool_calls=[
10005                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Yet another branch", "status": "open"}]})
10006                ])
10007            ]),
10008        )
10009
10010        assert result.status == "blocked"
10011        assert result.result["error"] == "task queue saturated"
10012        assert result.result["task_queue"]["open_count"] == 40
10013        job = db.get_job(job_id)
10014        pressure = job["metadata"]["task_backlog_pressure"]
10015        assert pressure["source"] == "blocked_record_tasks"
10016        assert pressure["open_count"] == 40
10017        assert pressure["reason"] == "too many open tasks"
10018    finally:
10019        db.close()
10020
10021
10022def test_run_one_step_blocks_batch_that_would_saturate_task_queue(tmp_path):
10023    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10024    db = AgentDB(tmp_path / "state.db")
10025    try:
10026        job_id = db.create_job(
10027            "Keep long-running work focused",
10028            title="projected-sprawl",
10029            kind="generic",
10030            metadata={
10031                "task_queue": [
10032                    {"title": f"Existing branch {index}", "status": "done", "priority": index}
10033                    for index in range(74)
10034                ]
10035            },
10036        )
10037
10038        result = run_one_step(
10039            job_id,
10040            config=config,
10041            db=db,
10042            llm=ScriptedLLM([
10043                LLMResponse(tool_calls=[
10044                    ToolCall(
10045                        name="record_tasks",
10046                        arguments={
10047                            "tasks": [
10048                                {"title": f"New branch {index}", "status": "open"}
10049                                for index in range(10)
10050                            ]
10051                        },
10052                    )
10053                ])
10054            ]),
10055        )
10056
10057        assert result.status == "blocked"
10058        assert result.result["error"] == "task queue saturated"
10059        assert result.result["task_queue"]["reason"] == "total task queue is too large"
10060        assert result.result["task_queue"]["projected_total_count"] == 84
10061        job = db.get_job(job_id)
10062        assert len(job["metadata"]["task_queue"]) == 74
10063    finally:
10064        db.close()
10065
10066
10067def test_run_one_step_executes_accounting_before_saturated_record_tasks(tmp_path):
10068    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10069    db = AgentDB(tmp_path / "state.db")
10070    try:
10071        job_id = db.create_job(
10072            "Keep useful recovery state",
10073            title="saturated-batch-order",
10074            kind="generic",
10075            metadata={
10076                "task_queue": [
10077                    {"title": f"Existing branch {index}", "status": "done", "priority": index}
10078                    for index in range(84)
10079                ]
10080            },
10081        )
10082
10083        result = run_one_step(
10084            job_id,
10085            config=config,
10086            db=db,
10087            llm=ScriptedLLM([
10088                LLMResponse(tool_calls=[
10089                    ToolCall(
10090                        name="record_tasks",
10091                        arguments={"tasks": [{"title": "New blocked branch", "status": "open"}]},
10092                    ),
10093                    ToolCall(
10094                        name="record_lesson",
10095                        arguments={"lesson": "Use the existing branch before adding more tasks.", "category": "strategy"},
10096                    ),
10097                ])
10098            ]),
10099        )
10100
10101        tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
10102        assert [step["tool_name"] for step in tool_steps[-2:]] == ["record_lesson", "record_tasks"]
10103        assert result.status == "blocked"
10104        assert result.result["error"] == "task queue saturated"
10105        lessons = db.get_job(job_id)["metadata"].get("lessons") or []
10106        assert any("existing branch" in str(lesson.get("lesson") or "") for lesson in lessons)
10107    finally:
10108        db.close()
10109
10110
10111def test_run_one_step_blocks_batch_that_would_saturate_open_tasks(tmp_path):
10112    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10113    db = AgentDB(tmp_path / "state.db")
10114    try:
10115        job_id = db.create_job(
10116            "Execute current branches before planning more",
10117            title="projected-open-sprawl",
10118            kind="generic",
10119            metadata={
10120                "task_queue": [
10121                    {"title": f"Open branch {index}", "status": "open", "priority": index}
10122                    for index in range(35)
10123                ]
10124            },
10125        )
10126
10127        result = run_one_step(
10128            job_id,
10129            config=config,
10130            db=db,
10131            llm=ScriptedLLM([
10132                LLMResponse(tool_calls=[
10133                    ToolCall(
10134                        name="record_tasks",
10135                        arguments={
10136                            "tasks": [
10137                                {"title": f"New open branch {index}", "status": "open"}
10138                                for index in range(5)
10139                            ]
10140                        },
10141                    )
10142                ])
10143            ]),
10144        )
10145
10146        assert result.status == "blocked"
10147        assert result.result["error"] == "task queue saturated"
10148        assert result.result["task_queue"]["reason"] == "too many open tasks"
10149        assert result.result["task_queue"]["projected_open_count"] == 40
10150        job = db.get_job(job_id)
10151        assert len(job["metadata"]["task_queue"]) == 35
10152    finally:
10153        db.close()
10154
10155
10156def test_run_one_step_ignores_guard_recovery_tasks_for_queue_saturation(tmp_path):
10157    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10158    db = AgentDB(tmp_path / "state.db")
10159    try:
10160        job_id = db.create_job(
10161            "Continue objective work after guard recovery",
10162            title="guard-task-sprawl",
10163            kind="generic",
10164            metadata={
10165                "task_queue": [
10166                    {
10167                        "title": f"Resolve guard: recoverable blocker {index}",
10168                        "status": "open",
10169                        "priority": 9,
10170                        "metadata": {"guard_recovery": {"error": f"recoverable blocker {index}"}},
10171                    }
10172                    for index in range(45)
10173                ]
10174            },
10175        )
10176
10177        result = run_one_step(
10178            job_id,
10179            config=config,
10180            db=db,
10181            llm=ScriptedLLM([
10182                LLMResponse(tool_calls=[
10183                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Run next objective branch", "status": "open"}]})
10184                ])
10185            ]),
10186        )
10187
10188        assert result.status == "completed"
10189        assert result.tool_name == "record_tasks"
10190        job = db.get_job(job_id)
10191        assert any(task["title"] == "Run next objective branch" for task in job["metadata"]["task_queue"])
10192    finally:
10193        db.close()
10194
10195
10196def test_run_one_step_ignores_guard_recovery_tasks_for_total_sprawl(tmp_path):
10197    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10198    db = AgentDB(tmp_path / "state.db")
10199    try:
10200        job_id = db.create_job(
10201            "Continue objective work after many recovered guards",
10202            title="guard-total-sprawl",
10203            kind="generic",
10204            metadata={
10205                "task_queue": [
10206                    {
10207                        "title": f"Resolve guard: recovered blocker {index}",
10208                        "status": "done",
10209                        "priority": 9,
10210                        "metadata": {"guard_recovery": {"error": f"recovered blocker {index}"}},
10211                    }
10212                    for index in range(85)
10213                ]
10214            },
10215        )
10216
10217        result = run_one_step(
10218            job_id,
10219            config=config,
10220            db=db,
10221            llm=ScriptedLLM([
10222                LLMResponse(tool_calls=[
10223                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Fresh objective branch", "status": "open"}]})
10224                ])
10225            ]),
10226        )
10227
10228        assert result.status == "completed"
10229        assert result.tool_name == "record_tasks"
10230        job = db.get_job(job_id)
10231        assert any(task["title"] == "Fresh objective branch" for task in job["metadata"]["task_queue"])
10232    finally:
10233        db.close()
10234
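Note on the queue-saturation tests in this listing: they pin down observable behavior (a projected open count of 40 is rejected, a total of 80 is rejected, guard-recovery tasks are not counted, updates to existing titles pass), but not the guard's implementation. The following is only a minimal sketch of a check that would satisfy those cases. The names `check_task_batch`, `is_guard_recovery`, `MAX_OPEN`, and `MAX_TOTAL` are hypothetical, and both limits are guesses chosen to agree with the asserted numbers.

```python
from typing import Optional

# Hypothetical limits: the listing only shows that a projected open count of 40
# and a total of 80 are rejected; the real thresholds are not visible here.
MAX_OPEN = 40
MAX_TOTAL = 80


def is_guard_recovery(task: dict) -> bool:
    """Tasks created to recover from a guard carry a guard_recovery marker."""
    return "guard_recovery" in (task.get("metadata") or {})


def check_task_batch(queue: list, incoming: list) -> Optional[dict]:
    """Return a rejection payload, or None when the batch may be recorded."""
    # Guard-recovery tasks are excluded from both the open and the total count.
    counted = [task for task in queue if not is_guard_recovery(task)]
    known_titles = {task["title"] for task in queue}
    new_tasks = [task for task in incoming if task["title"] not in known_titles]
    if not new_tasks:
        return None  # pure updates to existing titles are always allowed
    if len(counted) >= MAX_TOTAL:
        return {"reason": "total task queue is too large", "total_count": len(counted)}
    open_now = sum(1 for task in counted if task.get("status") == "open")
    new_open = sum(1 for task in new_tasks if task.get("status", "open") == "open")
    if open_now + new_open >= MAX_OPEN:
        return {"reason": "too many open tasks", "projected_open_count": open_now + new_open}
    return None
```

This sketch matches titles exactly. The listing also contains a case where a slightly reworded title is treated as an update to an existing task, which would need fuzzy matching that is not attempted here.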
10235
10236def test_run_one_step_blocks_read_only_shell_churn(tmp_path):
10237    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10238    db = AgentDB(tmp_path / "state.db")
10239    try:
10240        job_id = db.create_job("Choose from discovered candidates", title="read-only-churn", kind="generic")
10241        for command in [
10242            "find /tmp/work -type f | head",
10243            "ls -lah /tmp/work",
10244            "curl -s https://example.test/api/list | head -100",
10245        ]:
10246            run_id = db.start_run(job_id, model="test")
10247            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
10248            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
10249            db.finish_run(run_id, "completed")
10250
10251        result = run_one_step(
10252            job_id,
10253            config=config,
10254            db=db,
10255            llm=ScriptedLLM([
10256                LLMResponse(tool_calls=[
10257                    ToolCall(name="shell_exec", arguments={"command": "curl -s https://example.test/api/list?page=2"})
10258                ])
10259            ]),
10260        )
10261
10262        assert result.status == "blocked"
10263        assert result.result["error"] == "action decision required"
10264        assert result.result["read_only_shell_churn"]["read_only_shell_count"] == 3
10265    finally:
10266        db.close()
10267
10268
10269def test_run_one_step_allows_action_after_read_only_shell_churn(tmp_path):
10270    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10271    db = AgentDB(tmp_path / "state.db")
10272    try:
10273        job_id = db.create_job("Act after discovered candidates", title="read-only-to-action", kind="generic")
10274        for command in [
10275            "find /tmp/work -type f | head",
10276            "ls -lah /tmp/work",
10277            "curl -s https://example.test/api/list | head -100",
10278        ]:
10279            run_id = db.start_run(job_id, model="test")
10280            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
10281            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
10282            db.finish_run(run_id, "completed")
10283
10284        result = run_one_step(
10285            job_id,
10286            config=config,
10287            db=db,
10288            llm=ScriptedLLM([
10289                LLMResponse(tool_calls=[
10290                    ToolCall(name="shell_exec", arguments={"command": "python run_candidate.py --input candidate-a"})
10291                ])
10292            ]),
10293            registry=SuccessRegistry(),
10294        )
10295
10296        assert result.status == "completed"
10297        assert result.tool_name == "shell_exec"
10298    finally:
10299        db.close()
10300
10301
10302def test_run_one_step_allows_read_only_shell_after_durable_decision(tmp_path):
10303    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10304    db = AgentDB(tmp_path / "state.db")
10305    try:
10306        job_id = db.create_job("Recover from inspection churn", title="read-only-decision", kind="generic")
10307        for command in [
10308            "find /tmp/work -type f | head",
10309            "ls -lah /tmp/work",
10310            "curl -s https://example.test/api/list | head -100",
10311        ]:
10312            run_id = db.start_run(job_id, model="test")
10313            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
10314            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
10315            db.finish_run(run_id, "completed")
10316
10317        run_id = db.start_run(job_id, model="test")
10318        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
10319        db.finish_step(
10320            step_id,
10321            status="completed",
10322            output_data={"success": True, "lesson": {"category": "decision", "lesson": "Use candidate-a and inspect its exact metadata next."}},
10323        )
10324        db.finish_run(run_id, "completed")
10325
10326        result = run_one_step(
10327            job_id,
10328            config=config,
10329            db=db,
10330            llm=ScriptedLLM([
10331                LLMResponse(tool_calls=[
10332                    ToolCall(name="shell_exec", arguments={"command": "ls -lah /tmp/work/candidate-a"})
10333                ])
10334            ]),
10335            registry=SuccessRegistry(),
10336        )
10337
10338        assert result.status == "completed"
10339        assert result.tool_name == "shell_exec"
10340    finally:
10341        db.close()
10342
10343
10344def test_run_one_step_allows_explicit_download_after_read_only_shell_churn(tmp_path):
10345    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10346    db = AgentDB(tmp_path / "state.db")
10347    try:
10348        job_id = db.create_job("Download selected candidate", title="read-only-to-download", kind="generic")
10349        for command in [
10350            "find /tmp/work -type f | head",
10351            "ls -lah /tmp/work",
10352            "curl -s https://example.test/api/list | head -100",
10353        ]:
10354            run_id = db.start_run(job_id, model="test")
10355            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
10356            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
10357            db.finish_run(run_id, "completed")
10358
10359        result = run_one_step(
10360            job_id,
10361            config=config,
10362            db=db,
10363            llm=ScriptedLLM([
10364                LLMResponse(tool_calls=[
10365                    ToolCall(name="shell_exec", arguments={"command": "curl -L -o /tmp/candidate.bin https://example.test/candidate.bin"})
10366                ])
10367            ]),
10368            registry=SuccessRegistry(),
10369        )
10370
10371        assert result.status == "completed"
10372        assert result.tool_name == "shell_exec"
10373    finally:
10374        db.close()
10375
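Note on the read-only shell churn tests above: after three inspection-only commands, a fourth inspection is blocked, while an action (`python run_candidate.py ...`), an explicit download (`curl -L -o ...`), or an inspection that follows a recorded decision is allowed. A rough sketch of how such a classifier could work is shown below. It is an assumption-laden illustration, not the worker's code: `is_read_only`, `trailing_read_only_count`, `needs_decision`, the program allow-list, and the `(tool_name, command)` history shape are all hypothetical.

```python
import shlex

# Hypothetical allow-list of inspection-only programs.
READ_ONLY_PROGRAMS = {"find", "ls", "cat", "head", "tail", "grep", "wc"}
# curl flags that write a file, which makes the call a download rather than a probe.
CURL_OUTPUT_FLAGS = {"-o", "-O", "--output", "--remote-name"}


def is_read_only(command: str) -> bool:
    """Treat a pipeline as read-only only if every stage merely inspects."""
    for stage in command.split("|"):  # naive split; quoted pipes are not handled
        tokens = shlex.split(stage)
        if not tokens:
            return False  # empty stage (e.g. from "||"): be conservative
        program, args = tokens[0], tokens[1:]
        if program == "curl":
            if any(arg in CURL_OUTPUT_FLAGS for arg in args):
                return False
        elif program not in READ_ONLY_PROGRAMS:
            return False
    return True


def trailing_read_only_count(history: list) -> int:
    """Count read-only shell steps since the last step of any other kind."""
    count = 0
    for tool_name, command in reversed(history):
        if tool_name != "shell_exec" or not is_read_only(command):
            break
        count += 1
    return count


def needs_decision(history: list, proposed: str, limit: int = 3) -> bool:
    """Block another inspection once `limit` inspections have piled up."""
    return is_read_only(proposed) and trailing_read_only_count(history) >= limit
```

Combined short flags such as `-Lo` are not recognized by this sketch, and the limit of 3 is taken only from the `read_only_shell_count == 3` assertion in the listing.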
10376
10377def test_run_one_step_blocks_new_tasks_when_queue_sprawls(tmp_path):
10378    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10379    db = AgentDB(tmp_path / "state.db")
10380    try:
10381        job_id = db.create_job(
10382            "Consolidate long-running work",
10383            title="task-sprawl",
10384            kind="generic",
10385            metadata={
10386                "task_queue": [
10387                    {"title": f"Completed branch {index}", "status": "done", "priority": index}
10388                    for index in range(80)
10389                ]
10390            },
10391        )
10392
10393        result = run_one_step(
10394            job_id,
10395            config=config,
10396            db=db,
10397            llm=ScriptedLLM([
10398                LLMResponse(tool_calls=[
10399                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "New branch", "status": "open"}]})
10400                ])
10401            ]),
10402        )
10403
10404        assert result.status == "blocked"
10405        assert result.result["error"] == "task queue saturated"
10406        assert result.result["task_queue"]["reason"] == "total task queue is too large"
10407        assert result.result["task_queue"]["total_count"] == 80
10408    finally:
10409        db.close()
10410
10411
10412def test_recent_task_saturation_keeps_record_tasks_for_existing_updates(tmp_path):
10413    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10414    db = AgentDB(tmp_path / "state.db")
10415    try:
10416        job_id = db.create_job(
10417            "Execute existing work",
10418            title="saturated-tools",
10419            kind="generic",
10420            metadata={
10421                "task_queue": [
10422                    {"title": f"Open branch {index}", "status": "open", "priority": index}
10423                    for index in range(40)
10424                ]
10425            },
10426        )
10427        first = run_one_step(
10428            job_id,
10429            config=config,
10430            db=db,
10431            llm=ScriptedLLM([
10432                LLMResponse(tool_calls=[
10433                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "New branch", "status": "open"}]})
10434                ])
10435            ]),
10436        )
10437        assert first.status == "blocked"
10438        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing work"})]))
10439
10440        run_one_step(job_id, config=config, db=db, llm=llm)
10441
10442        tool_names = {tool["function"]["name"] for tool in llm.tools}
10443        prompt = llm.messages[-1]["content"]
10444        assert "Task queue saturation" in prompt
10445        assert "Do not create new task branches" in prompt
10446        assert "Existing runnable task titles" in prompt
10447        assert "Open branch 0" in prompt
10448        assert "record_tasks only to update existing task titles" in prompt
10449        assert "record_tasks" in tool_names
10450        assert "record_lesson" in tool_names
10451        assert "shell_exec" in tool_names
10452    finally:
10453        db.close()
10454
10455
10456def test_repeated_task_saturation_temporarily_suppresses_record_tasks(tmp_path):
10457    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10458    db = AgentDB(tmp_path / "state.db")
10459    try:
10460        job_id = db.create_job(
10461            "Execute existing work",
10462            title="repeated-saturation",
10463            kind="generic",
10464            metadata={
10465                "task_queue": [
10466                    {"title": f"Open branch {index}", "status": "open", "priority": index}
10467                    for index in range(40)
10468                ]
10469            },
10470        )
10471        for title in ("New branch one", "New branch two"):
10472            blocked = run_one_step(
10473                job_id,
10474                config=config,
10475                db=db,
10476                llm=ScriptedLLM([
10477                    LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={"tasks": [{"title": title, "status": "open"}]})])
10478                ]),
10479            )
10480            assert blocked.status == "blocked"
10481            assert blocked.result["error"] == "task queue saturated"
10482
10483        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing branch"})]))
10484        run_one_step(job_id, config=config, db=db, llm=llm)
10485
10486        tool_names = {tool["function"]["name"] for tool in llm.tools}
10487        assert "record_tasks" not in tool_names
10488        assert "record_lesson" in tool_names
10489        assert "shell_exec" in tool_names
10490    finally:
10491        db.close()
10492
10493
10494def test_chronic_backlog_suppresses_new_task_planning_tool(tmp_path):
10495    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10496    db = AgentDB(tmp_path / "state.db")
10497    try:
10498        job_id = db.create_job(
10499            "Execute existing work",
10500            title="chronic-backlog-tools",
10501            kind="generic",
10502            metadata={
10503                "task_queue": [
10504                    {"title": f"Open branch {index}", "status": "open", "priority": index}
10505                    for index in range(82)
10506                ]
10507            },
10508        )
10509        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing work"})]))
10510
10511        run_one_step(job_id, config=config, db=db, llm=llm)
10512
10513        tool_names = {tool["function"]["name"] for tool in llm.tools}
10514        prompt = llm.messages[-1]["content"]
10515        assert "Current execution focus" in prompt
10516        assert "backlog=82 tasks" in prompt
10517        assert "record_tasks" not in tool_names
10518        assert "record_lesson" in tool_names
10519        assert "shell_exec" in tool_names
10520    finally:
10521        db.close()
10522
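Note on the three tool-visibility tests above: `record_tasks` stays offered after one saturation block, disappears after two, and is also withheld when the backlog reaches 82 open tasks. One way to express that rule is sketched below. The function `visible_tools` and both limits are hypothetical; the listing shows the outcomes but not the thresholds or how the worker tracks recent blocks.

```python
# Hypothetical thresholds; the listing shows suppression after two saturation
# blocks and with an 82-task open backlog, but not the exact limits.
REPEATED_BLOCK_LIMIT = 2
CHRONIC_BACKLOG_LIMIT = 80


def visible_tools(all_tools: list, saturation_blocks: int, open_backlog: int) -> list:
    """Hide the planning tool when the worker keeps planning instead of executing."""
    suppress = (
        saturation_blocks >= REPEATED_BLOCK_LIMIT
        or open_backlog >= CHRONIC_BACKLOG_LIMIT
    )
    if not suppress:
        return list(all_tools)
    return [name for name in all_tools if name != "record_tasks"]
```

Execution-oriented tools such as `record_lesson` and `shell_exec` are never filtered in this sketch, which mirrors the assertions in the listing.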
10523
10524def test_run_one_step_allows_existing_task_update_when_queue_is_saturated(tmp_path):
10525    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10526    db = AgentDB(tmp_path / "state.db")
10527    try:
10528        job_id = db.create_job(
10529            "Finish existing work",
10530            title="saturated",
10531            kind="generic",
10532            metadata={
10533                "task_queue": [
10534                    {"title": f"Open branch {index}", "status": "open", "priority": index}
10535                    for index in range(40)
10536                ]
10537            },
10538        )
10539
10540        result = run_one_step(
10541            job_id,
10542            config=config,
10543            db=db,
10544            llm=ScriptedLLM([
10545                LLMResponse(tool_calls=[
10546                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Open branch 0", "status": "active"}]})
10547                ])
10548            ]),
10549        )
10550
10551        assert result.status == "completed"
10552        assert result.tool_name == "record_tasks"
10553        job = db.get_job(job_id)
10554        assert job["metadata"]["task_queue"][0]["status"] == "active"
10555    finally:
10556        db.close()
10557
10558
10559def test_run_one_step_allows_semantic_task_update_when_queue_is_saturated(tmp_path):
10560    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10561    db = AgentDB(tmp_path / "state.db")
10562    try:
10563        job_id = db.create_job(
10564            "Finish existing work",
10565            title="semantic-saturated",
10566            kind="generic",
10567            metadata={
10568                "task_queue": [
10569                    {
10570                        "title": "Validate model files and run baseline benchmark",
10571                        "status": "open",
10572                        "priority": 5,
10573                    },
10574                    *[
10575                        {"title": f"Completed branch {index}", "status": "done", "priority": index}
10576                        for index in range(81)
10577                    ],
10578                ]
10579            },
10580        )
10581
10582        result = run_one_step(
10583            job_id,
10584            config=config,
10585            db=db,
10586            llm=ScriptedLLM([
10587                LLMResponse(tool_calls=[
10588                    ToolCall(
10589                        name="record_tasks",
10590                        arguments={
10591                            "tasks": [{
10592                                "title": "Validate candidate model files and run baseline benchmark",
10593                                "status": "active",
10594                                "priority": 10,
10595                            }]
10596                        },
10597                    )
10598                ])
10599            ]),
10600        )
10601
10602        assert result.status == "completed"
10603        assert result.tool_name == "record_tasks"
10604        job = db.get_job(job_id)
10605        task = job["metadata"]["task_queue"][0]
10606        assert task["title"] == "Validate model files and run baseline benchmark"
10607        assert task["status"] == "active"
10608        assert task["metadata"]["original_title"] == "Validate candidate model files and run baseline benchmark"
10609    finally:
10610        db.close()
10611
10612
10613def test_run_one_step_auto_records_anti_bot_browser_source(tmp_path):
10614    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10615    db = AgentDB(tmp_path / "state.db")
10616    try:
10617        job_id = db.create_job("Avoid blocked browser pages", title="guard")
10618
10619        result = run_one_step(
10620            job_id,
10621            config=config,
10622            db=db,
10623            llm=ScriptedLLM([
10624                LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": True})])
10625            ]),
10626            registry=AntiBotBrowserRegistry(),
10627        )
10628        job = db.get_job(job_id)
10629        source = job["metadata"]["source_ledger"][0]
10630
10631        assert result.status == "completed"
10632        assert result.result["source_warning"] == "captcha/anti-bot block"
10633        assert source["source"] == "https://source.example/search"
10634        assert source["fail_count"] == 1
10635        assert source["usefulness_score"] == 0.02
10636        assert job["metadata"]["last_lesson"]["category"] == "source_quality"
10637    finally:
10638        db.close()
10639
10640
10641def test_run_one_step_blocks_misleading_artifact_after_anti_bot_snapshot(tmp_path):
10642    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10643    db = AgentDB(tmp_path / "state.db")
10644    try:
10645        job_id = db.create_job("Do not invent findings from blocked pages", title="guard")
10646        run_id = db.start_run(job_id)
10647        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
10648        db.finish_step(
10649            step_id,
10650            status="completed",
10651            output_data={
10652                "success": True,
10653                "data": {
10654                    "origin": "https://source.example/search",
10655                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
10656                },
10657            },
10658            summary="browser_snapshot returned 1250 chars",
10659        )
10660        db.finish_run(run_id, "completed")
10661
10662        result = run_one_step(
10663            job_id,
10664            config=config,
10665            db=db,
10666            llm=ScriptedLLM([
10667                LLMResponse(tool_calls=[ToolCall(
10668                    name="write_artifact",
10669                    arguments={
10670                        "title": "Directory finding source",
10671                        "summary": "Contains result listings for finding extraction",
10672                        "content": "This source contains reusable findings.",
10673                    },
10674                )])
10675            ]),
10676        )
10677        job = db.get_job(job_id)
10678
10679        assert result.status == "blocked"
10680        assert result.result["error"] == "misleading blocked-source artifact blocked"
10681        assert result.result["auto_source_record"]["source"]["source"] == "https://source.example/search"
10682        assert db.list_artifacts(job_id) == []
10683        assert job["metadata"]["source_ledger"][0]["warnings"] == ["captcha/anti-bot block"]
10684    finally:
10685        db.close()
10686
10687
10688def test_run_one_step_allows_blocked_source_artifact_when_acknowledged(tmp_path):
10689    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10690    db = AgentDB(tmp_path / "state.db")
10691    try:
10692        job_id = db.create_job("Save blocked source notes", title="guard")
10693        run_id = db.start_run(job_id)
10694        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
10695        db.finish_step(
10696            step_id,
10697            status="completed",
10698            output_data={
10699                "success": True,
10700                "data": {
10701                    "origin": "https://source.example/search",
10702                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
10703                },
10704            },
10705        )
10706        db.finish_run(run_id, "completed")
10707
10708        result = run_one_step(
10709            job_id,
10710            config=config,
10711            db=db,
10712            llm=ScriptedLLM([
10713                LLMResponse(tool_calls=[ToolCall(
10714                    name="write_artifact",
10715                    arguments={
10716                        "title": "Blocked source note",
10717                        "summary": "Blocked by CAPTCHA; not usable as finding evidence",
10718                        "content": "The page showed a CAPTCHA and no usable evidence was visible.",
10719                    },
10720                )])
10721            ]),
10722        )
10723
10724        assert result.status == "completed"
10725        assert db.list_artifacts(job_id)[0]["title"] == "Blocked source note"
10726    finally:
10727        db.close()
10728
10729
10730def test_run_one_step_blocks_browser_loop_after_anti_bot_snapshot(tmp_path):
10731    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10732    db = AgentDB(tmp_path / "state.db")
10733    try:
10734        job_id = db.create_job("Pivot after blocked browser pages", title="guard")
10735        run_id = db.start_run(job_id)
10736        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
10737        db.finish_step(
10738            step_id,
10739            status="completed",
10740            output_data={
10741                "success": True,
10742                "data": {
10743                    "origin": "https://source.example/search",
10744                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
10745                },
10746            },
10747        )
10748        db.finish_run(run_id, "completed")
10749
10750        result = run_one_step(
10751            job_id,
10752            config=config,
10753            db=db,
10754            llm=ScriptedLLM([
10755                LLMResponse(tool_calls=[ToolCall(name="browser_scroll", arguments={"direction": "down"})])
10756            ]),
10757        )
10758
10759        assert result.status == "blocked"
10760        assert result.result["error"] == "anti-bot source loop blocked"
10761        assert result.result["auto_source_record"]["source"]["fail_count"] == 1
10762    finally:
10763        db.close()
10764
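Note on the anti-bot tests above: a snapshot containing CAPTCHA text marks the source as blocked, an artifact that presents the page as a findings source is rejected, and an artifact that openly says the page was blocked is accepted. The sketch below illustrates that distinction with simple substring checks. The marker lists and the names `looks_blocked` and `artifact_allowed` are hypothetical, chosen from the strings that appear in the tests; the real detection logic is not shown in this listing.

```python
# Hypothetical marker lists, taken from the strings used in the tests.
BLOCK_MARKERS = ("captcha", "you have been blocked")
ACK_MARKERS = ("blocked", "captcha", "not usable")


def looks_blocked(snapshot: str) -> bool:
    """True when the snapshot text looks like a CAPTCHA or anti-bot page."""
    text = snapshot.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def artifact_allowed(last_snapshot: str, title: str, summary: str, content: str) -> bool:
    """After a blocked snapshot, only artifacts that acknowledge the block pass."""
    if not looks_blocked(last_snapshot):
        return True
    text = " ".join((title, summary, content)).lower()
    return any(marker in text for marker in ACK_MARKERS)
```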
10765
10766def test_run_one_step_blocks_known_bad_browser_source_from_ledger(tmp_path):
10767    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10768    db = AgentDB(tmp_path / "state.db")
10769    try:
10770        job_id = db.create_job("Avoid sources already scored as bad", title="guard")
10771        db.append_source_record(
10772            job_id,
10773            "https://blocked.example/search",
10774            source_type="blocked_browser_source",
10775            usefulness_score=0.02,
10776            fail_count_delta=1,
10777            warnings=["captcha/anti-bot block"],
10778            outcome="blocked; pivot",
10779        )
10780
10781        result = run_one_step(
10782            job_id,
10783            config=config,
10784            db=db,
10785            llm=ScriptedLLM([
10786                LLMResponse(tool_calls=[ToolCall(name="browser_navigate", arguments={"url": "https://www.blocked.example/search?page=2"})])
10787            ]),
10788        )
10789        job = db.get_job(job_id)
10790
10791        assert result.status == "blocked"
10792        assert result.result["error"] == "known bad source blocked"
10793        assert result.result["known_bad_source"]["source"] == "https://blocked.example/search"
10794        assert job["metadata"]["last_agent_update"]["category"] == "blocked"
10795    finally:
10796        db.close()
10797
10798
10799def test_run_one_step_blocks_known_bad_extract_source_from_ledger(tmp_path):
10800    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10801    db = AgentDB(tmp_path / "state.db")
10802    try:
10803        job_id = db.create_job("Avoid extracting bad sources", title="guard")
10804        db.append_source_record(
10805            job_id,
10806            "https://lowyield.example/source",
10807            source_type="web_source",
10808            usefulness_score=0.05,
10809            fail_count_delta=2,
10810            outcome="no useful candidates",
10811        )
10812
10813        result = run_one_step(
10814            job_id,
10815            config=config,
10816            db=db,
10817            llm=ScriptedLLM([
10818                LLMResponse(tool_calls=[ToolCall(
10819                    name="web_extract",
10820                    arguments={"urls": ["https://lowyield.example/source?retry=1"]},
10821                )])
10822            ]),
10823        )
10824
10825        assert result.status == "blocked"
10826        assert result.result["error"] == "known bad source blocked"
10827        assert result.result["known_bad_source"]["fail_count"] == 2
10828    finally:
10829        db.close()
10830
10831
10832def test_run_one_step_allows_child_url_when_bad_web_source_is_domain_root(tmp_path):
10833    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10834    db = AgentDB(tmp_path / "state.db")
10835    try:
10836        job_id = db.create_job("Avoid over-broad domain source blocks", title="guard")
10837        db.append_source_record(
10838            job_id,
10839            "https://source.example",
10840            source_type="web_source",
10841            usefulness_score=0.05,
10842            fail_count_delta=1,
10843            outcome="root health check failed",
10844        )
10845
10846        result = run_one_step(
10847            job_id,
10848            config=config,
10849            db=db,
10850            llm=ScriptedLLM([
10851                LLMResponse(tool_calls=[ToolCall(
10852                    name="web_extract",
10853                    arguments={"urls": ["https://source.example/api/public/models"]},
10854                )])
10855            ]),
10856            registry=SuccessRegistry(),
10857        )
10858
10859        assert result.status == "completed"
10860        assert result.tool_name == "web_extract"
10861    finally:
10862        db.close()
10863
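Note on the known-bad-source tests above: a ledger entry for `https://blocked.example/search` blocks `https://www.blocked.example/search?page=2`, an entry for a path blocks the same path with a retry query string, and an entry for a bare domain root does not block child URLs. The sketch below shows one URL comparison that behaves this way, using only `urllib.parse` from the standard library. The names `normalize` and `matches_bad_source` are hypothetical, and the sketch ignores the `usefulness_score` and `fail_count` fields that the real guard presumably also consults.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple:
    """Reduce a URL to (host without www., path without trailing slash)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/")  # query string and fragment are dropped


def matches_bad_source(bad_source: str, candidate: str) -> bool:
    """True when candidate is the bad source itself or lives under its path."""
    bad_host, bad_path = normalize(bad_source)
    host, path = normalize(candidate)
    if host != bad_host:
        return False
    if not bad_path:
        # A root-level record only blocks the root, not the whole host.
        return not path
    return path == bad_path or path.startswith(bad_path + "/")
```

Comparing on a path boundary (`bad_path + "/"`) rather than a raw string prefix keeps `/downloads/private` from also matching an unrelated sibling such as `/downloads/private-mirror`.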
10864
10865def test_run_one_step_records_failed_shell_url_source(tmp_path):
10866    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10867    db = AgentDB(tmp_path / "state.db")
10868    try:
10869        job_id = db.create_job("Avoid broken shell URL sources", title="guard")
10870        url = "https://source.example/api/private/tree/main"
10871
10872        result = run_one_step(
10873            job_id,
10874            config=config,
10875            db=db,
10876            llm=ScriptedLLM([
10877                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": f"curl -s {url}"})])
10878            ]),
10879            registry=FailedUrlShellRegistry(),
10880        )
10881        sources = db.get_job(job_id)["metadata"]["source_ledger"]
10882        source = sources[0]
10883
10884        assert result.status == "failed"
10885        assert source["source"] == url
10886        assert source["source_type"] == "shell_exec"
10887        assert source["fail_count"] == 1
10888        assert source["usefulness_score"] == 0.01
10889        assert source["metadata"]["failure_kind"] == "auth_or_http"
10890    finally:
10891        db.close()
10892
10893
10894def test_run_one_step_records_pathful_failed_shell_urls_not_root_health_checks(tmp_path):
10895    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10896    db = AgentDB(tmp_path / "state.db")
10897    try:
10898        job_id = db.create_job("Avoid poisoning whole hosts from mixed probes", title="guard")
10899        bad_url = "https://source.example/api/private/tree/main"
10900
10901        result = run_one_step(
10902            job_id,
10903            config=config,
10904            db=db,
10905            llm=ScriptedLLM([
10906                LLMResponse(tool_calls=[ToolCall(
10907                    name="shell_exec",
10908                    arguments={"command": f"curl -sI https://source.example && curl -s {bad_url}"},
10909                )])
10910            ]),
10911            registry=FailedUrlShellRegistry(),
10912        )
10913        sources = db.get_job(job_id)["metadata"]["source_ledger"]
10914
10915        assert result.status == "failed"
10916        assert [source["source"] for source in sources] == [bad_url]
10917    finally:
10918        db.close()
10919
10920
10921def test_run_one_step_blocks_known_bad_shell_source_family(tmp_path):
10922    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10923    db = AgentDB(tmp_path / "state.db")
10924    try:
10925        job_id = db.create_job("Pivot from failed source family", title="guard")
10926
10927        first = run_one_step(
10928            job_id,
10929            config=config,
10930            db=db,
10931            llm=ScriptedLLM([
10932                LLMResponse(tool_calls=[ToolCall(
10933                    name="shell_exec",
10934                    arguments={"command": "curl -L https://source.example/downloads/private/model-a.bin"},
10935                )])
10936            ]),
10937            registry=FailedUrlShellRegistry(),
10938        )
10939        blocked = run_one_step(
10940            job_id,
10941            config=config,
10942            db=db,
10943            llm=ScriptedLLM([
10944                LLMResponse(tool_calls=[ToolCall(
10945                    name="shell_exec",
10946                    arguments={"command": "curl -L https://source.example/downloads/private/model-b.bin"},
10947                )])
10948            ]),
10949        )
10950        allowed = run_one_step(
10951            job_id,
10952            config=config,
10953            db=db,
10954            llm=ScriptedLLM([
10955                LLMResponse(tool_calls=[ToolCall(
10956                    name="shell_exec",
10957                    arguments={"command": "curl -L https://source.example/downloads/public/model-b.bin"},
10958                )])
10959            ]),
10960            registry=SuccessRegistry(),
10961        )
10962
10963        assert first.status == "failed"
10964        assert blocked.status == "blocked"
10965        assert blocked.result["error"] == "known bad source blocked"
10966        assert blocked.result["known_bad_source"]["source"] == "https://source.example/downloads/private"
10967        assert allowed.status == "completed"
10968    finally:
10969        db.close()
10970
10971
10972def test_run_one_step_derives_bad_shell_source_family_from_exact_failure(tmp_path):
10973    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
10974    db = AgentDB(tmp_path / "state.db")
10975    try:
10976        job_id = db.create_job("Pivot from exact failed file source", title="guard")
10977        db.append_source_record(
10978            job_id,
10979            "https://source.example/downloads/private/model-a.bin",
10980            source_type="shell_exec",
10981            usefulness_score=0.01,
10982            fail_count_delta=1,
10983            warnings=["auth failure"],
10984            outcome="401 Unauthorized",
10985        )
10986
10987        blocked = run_one_step(
10988            job_id,
10989            config=config,
10990            db=db,
10991            llm=ScriptedLLM([
10992                LLMResponse(tool_calls=[ToolCall(
10993                    name="shell_exec",
10994                    arguments={"command": "curl -L https://source.example/downloads/private/model-b.bin"},
10995                )])
10996            ]),
10997        )
10998
10999        assert blocked.status == "blocked"
11000        assert blocked.result["error"] == "known bad source blocked"
11001        assert blocked.result["known_bad_source"]["source"] == "https://source.example/downloads/private"
11002        assert blocked.result["known_bad_source"]["metadata"]["source_family_from"].endswith("/model-a.bin")
11003    finally:
11004        db.close()
11005
11006
11007def test_run_one_step_does_not_block_entire_host_after_auth_source_families(tmp_path):
11008    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11009    db = AgentDB(tmp_path / "state.db")
11010    try:
11011        job_id = db.create_job("Pivot from repeated host auth failures", title="guard")
11012        for source in (
11013            "https://source.example/private/a/model.bin",
11014            "https://source.example/private/b/model.bin",
11015            "https://source.example/private/c/model.bin",
11016        ):
11017            db.append_source_record(
11018                job_id,
11019                source,
11020                source_type="shell_exec",
11021                usefulness_score=0.01,
11022                fail_count_delta=1,
11023                warnings=["401 unauthorized"],
11024                outcome="HTTP 401 Unauthorized",
11025                metadata={"failure_kind": "auth_or_http"},
11026            )
11027
11028        allowed_same_host = run_one_step(
11029            job_id,
11030            config=config,
11031            db=db,
11032            llm=ScriptedLLM([
11033                LLMResponse(tool_calls=[ToolCall(
11034                    name="shell_exec",
11035                    arguments={"command": "curl -L https://source.example/private/d/model.bin"},
11036                )])
11037            ]),
11038            registry=SuccessRegistry(),
11039        )
11040        allowed = run_one_step(
11041            job_id,
11042            config=config,
11043            db=db,
11044            llm=ScriptedLLM([
11045                LLMResponse(tool_calls=[ToolCall(
11046                    name="shell_exec",
11047                    arguments={"command": "curl -L https://other.example/private/d/model.bin"},
11048                )])
11049            ]),
11050            registry=SuccessRegistry(),
11051        )
11052
11053        assert allowed_same_host.status == "completed"
11054        assert allowed.status == "completed"
11055    finally:
11056        db.close()
11057
11058
11059def test_run_one_step_blocks_known_bad_shell_source_path(tmp_path):
11060    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11061    db = AgentDB(tmp_path / "state.db")
11062    try:
11063        job_id = db.create_job("Pivot from failed shell URL source", title="guard")
11064        db.append_source_record(
11065            job_id,
11066            "https://source.example/api/private/tree/main",
11067            source_type="shell_exec",
11068            usefulness_score=0.01,
11069            fail_count_delta=1,
11070            warnings=["auth failure"],
11071            outcome="401 Unauthorized",
11072        )
11073
11074        blocked = run_one_step(
11075            job_id,
11076            config=config,
11077            db=db,
11078            llm=ScriptedLLM([
11079                LLMResponse(tool_calls=[ToolCall(
11080                    name="shell_exec",
11081                    arguments={"command": "curl -s 'https://source.example/api/private/tree/main?recursive=true'"},
11082                )])
11083            ]),
11084        )
11085        allowed = run_one_step(
11086            job_id,
11087            config=config,
11088            db=db,
11089            llm=ScriptedLLM([
11090                LLMResponse(tool_calls=[ToolCall(
11091                    name="shell_exec",
11092                    arguments={"command": "curl -s 'https://source.example/api/public/models'"},
11093                )])
11094            ]),
11095            registry=SuccessRegistry(),
11096        )
11097
11098        assert blocked.status == "blocked"
11099        assert blocked.result["error"] == "known bad source blocked"
11100        assert blocked.result["known_bad_source"]["source"] == "https://source.example/api/private/tree/main"
11101        assert allowed.status == "completed"
11102    finally:
11103        db.close()
11104
11105
11106def test_run_one_step_allows_mixed_shell_command_with_bad_root_health_check(tmp_path):
11107    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11108    db = AgentDB(tmp_path / "state.db")
11109    try:
11110        job_id = db.create_job("Avoid over-broad shell root source blocks", title="guard")
11111        db.append_source_record(
11112            job_id,
11113            "https://source.example",
11114            source_type="shell_exec",
11115            usefulness_score=0.01,
11116            fail_count_delta=1,
11117            warnings=["root health check failed earlier"],
11118            outcome="HTTP failure",
11119        )
11120
11121        result = run_one_step(
11122            job_id,
11123            config=config,
11124            db=db,
11125            llm=ScriptedLLM([
11126                LLMResponse(tool_calls=[ToolCall(
11127                    name="shell_exec",
11128                    arguments={"command": "curl -sI https://source.example && curl -s https://source.example/api/public/models"},
11129                )])
11130            ]),
11131            registry=SuccessRegistry(),
11132        )
11133
11134        assert result.status == "completed"
11135        assert result.tool_name == "shell_exec"
11136    finally:
11137        db.close()
11138
11139
11140def test_run_one_step_saves_unpersisted_evidence_before_known_bad_source_block(tmp_path):
11141    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11142    db = AgentDB(tmp_path / "state.db")
11143    try:
11144        job_id = db.create_job("Evidence checkpoint still wins", title="guard")
11145        db.append_source_record(
11146            job_id,
11147            "https://blocked.example/search",
11148            source_type="blocked_browser_source",
11149            usefulness_score=0.02,
11150            fail_count_delta=1,
11151            warnings=["captcha/anti-bot block"],
11152            outcome="blocked; pivot",
11153        )
11154        run_id = db.start_run(job_id)
11155        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
11156        db.finish_step(
11157            step_id,
11158            status="completed",
11159            output_data={
11160                "success": True,
11161                "data": {"origin": "https://useful.example"},
11162                "snapshot": "Useful source evidence. " * 80,
11163            },
11164        )
11165        db.finish_run(run_id, "completed")
11166
11167        result = run_one_step(
11168            job_id,
11169            config=config,
11170            db=db,
11171            llm=ScriptedLLM([
11172                LLMResponse(tool_calls=[ToolCall(name="browser_navigate", arguments={"url": "https://blocked.example/search"})])
11173            ]),
11174        )
11175
11176        assert result.status == "blocked"
11177        assert result.result["error"] == "artifact required before more research"
11178        assert "auto_checkpoint" in result.result
11179        assert result.result["auto_checkpoint"]["artifact_id"]
11180    finally:
11181        db.close()
11182
11183
11184def test_run_one_step_blocks_search_streak(tmp_path):
11185    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11186    db = AgentDB(tmp_path / "state.db")
11187    try:
11188        job_id = db.create_job("Do not search forever", title="guard")
11189        for query in ("alpha findings", "beta findings", "gamma findings"):
11190            run_id = db.start_run(job_id)
11191            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search", input_data={"arguments": {"query": query}})
11192            db.finish_step(step_id, status="completed", output_data={"success": True, "query": query, "results": []})
11193            db.finish_run(run_id, "completed")
11194
11195        result = run_one_step(
11196            job_id,
11197            config=config,
11198            db=db,
11199            llm=ScriptedLLM([
11200                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "delta findings", "limit": 5})])
11201            ]),
11202        )
11203
11204        assert result.status == "blocked"
11205        assert result.result["error"] == "search loop blocked"
11206        assert result.result["recent_search_streak"] == 3
11207    finally:
11208        db.close()
11209
11210
11211def test_run_one_step_blocks_similar_search_query(tmp_path):
11212    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11213    db = AgentDB(tmp_path / "state.db")
11214    try:
11215        job_id = db.create_job("Avoid query rewrites", title="guard")
11216        run_id = db.start_run(job_id)
11217        step_id = db.add_step(
11218            job_id=job_id,
11219            run_id=run_id,
11220            kind="tool",
11221            tool_name="web_search",
11222            input_data={"arguments": {"query": "target digital marketing research"}},
11223        )
11224        db.finish_step(step_id, status="completed", output_data={"success": True, "query": "target digital marketing research", "results": []})
11225        db.finish_run(run_id, "completed")
11226
11227        result = run_one_step(
11228            job_id,
11229            config=config,
11230            db=db,
11231            llm=ScriptedLLM([
11232                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "target marketing digital research", "limit": 5})])
11233            ]),
11234        )
11235
11236        assert result.status == "blocked"
11237        assert result.result["error"] == "similar search query blocked"
11238    finally:
11239        db.close()
11240
11241
11242def test_run_one_step_reflects_every_fixed_interval(tmp_path):
11243    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11244    db = AgentDB(tmp_path / "state.db")
11245    try:
11246        job_id = db.create_job("Reflect over work", title="reflect")
11247        for index in range(12):
11248            run_id = db.start_run(job_id)
11249            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
11250            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
11251            db.finish_run(run_id, "completed")
11252
11253        result = run_one_step(
11254            job_id,
11255            config=config,
11256            db=db,
11257            llm=ScriptedLLM([
11258                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "should not be used"})])
11259            ]),
11260        )
11261        job = db.get_job(job_id)
11262
11263        assert result.tool_name == "reflect"
11264        assert result.status == "completed"
11265        assert job["metadata"]["reflections"]
11266        assert job["metadata"]["last_agent_update"]["category"] == "plan"
11267        assert "Lessons learned:" in build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
11268    finally:
11269        db.close()
11270
11271
11272def test_reflection_does_not_repeat_existing_strategy_lesson(tmp_path):
11273    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11274    db = AgentDB(tmp_path / "state.db")
11275    strategy = "Choose the next branch from durable evidence, then record the result as findings, tasks, experiments, sources, or memory."
11276    try:
11277        job_id = db.create_job("Reflect over repeated work", title="reflect")
11278        db.append_lesson(job_id, strategy, category="strategy")
11279        for index in range(12):
11280            run_id = db.start_run(job_id)
11281            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
11282            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
11283            db.finish_run(run_id, "completed")
11284
11285        result = run_one_step(
11286            job_id,
11287            config=config,
11288            db=db,
11289            llm=ScriptedLLM([
11290                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "should not be used"})])
11291            ]),
11292        )
11293        job = db.get_job(job_id)
11294
11295        assert result.tool_name == "reflect"
11296        assert result.result["lesson_recorded"] is False
11297        assert len(job["metadata"]["lessons"]) == 1
11298        assert job["metadata"]["lessons"][0].get("seen_count") is None
11299    finally:
11300        db.close()
11301
11302
11303def test_reflection_strategy_uses_current_operator_state(tmp_path):
11304    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11305    db = AgentDB(tmp_path / "state.db")
11306    try:
11307        job_id = db.create_job(
11308            "Reflect over operator context",
11309            title="reflect",
11310            metadata={
11311                "operator_messages": [
11312                    {
11313                        "id": "op_1",
11314                        "mode": "steer",
11315                        "message": "Use the corrected target before continuing.",
11316                        "created_at": "2026-01-01T00:00:00+00:00",
11317                    }
11318                ]
11319            },
11320        )
11321        for index in range(12):
11322            run_id = db.start_run(job_id)
11323            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
11324            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
11325            db.finish_run(run_id, "completed")
11326
11327        result = run_one_step(
11328            job_id,
11329            config=config,
11330            db=db,
11331            llm=ScriptedLLM([
11332                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "should not run"})])
11333            ]),
11334        )
11335
11336        assert result.tool_name == "reflect"
11337        assert "Incorporate or supersede active operator context" in result.result["reflection"]["strategy"]
11338    finally:
11339        db.close()
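The known-bad-source tests in the listing above pin down observable behavior: a failed file URL blocks sibling files in the same directory, a failed non-file path blocks itself even when a query string is added, and a failed bare host blocks nothing. The sketch below is one way those rules could be expressed. It is not the worker's actual implementation; `source_family` and `is_blocked` are hypothetical names, and the "a dot in the last segment means a file" heuristic is an assumption made for illustration.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def source_family(url: str) -> Optional[str]:
    """Return a hypothetical family prefix for a failed URL, or None for a bare host."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Assumption: a last segment containing a dot is a file, so its directory is the family.
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    if not segments:
        # Root health checks (or files at the root) should not poison the whole host.
        return None
    return f"{parts.scheme}://{parts.netloc}/" + "/".join(segments)


def is_blocked(candidate: str, bad_families: Iterable[str]) -> bool:
    """True when the candidate URL, ignoring its query string, falls under a bad family."""
    parts = urlsplit(candidate)
    base = f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
    return any(base == family or base.startswith(family + "/") for family in bad_families)


failed = "https://source.example/downloads/private/model-a.bin"
families = {source_family(failed)}
print(is_blocked("https://source.example/downloads/private/model-b.bin", families))
print(is_blocked("https://source.example/downloads/public/model-b.bin", families))
```

Matching on a path prefix rather than on the host keeps the block narrow, which is the property the `does_not_block_entire_host` test guards.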
Verification

Test Coverage Map

Test files included in the source index.

tests/nipux_cli/test_artifacts.py

2 tests · 34 lines

test_artifact_store_writes_reads_and_searches, test_artifact_store_rejects_paths_outside_home

tests/nipux_cli/test_browser_web.py

8 tests · 118 lines

test_session_name_is_stable_and_safe, test_long_session_name_is_short_and_hashed, test_strip_html_removes_scripts_and_keeps_text, test_browser_marks_anti_bot_interstitial_as_warning, test_browser_marks_captcha_block_as_warning, test_web_extract_marks_anti_bot_pages_as_warning, test_browser_tool_uses_native_wrapper, ...

tests/nipux_cli/test_cli.py

184 tests · 4970 lines

test_cli_has_operator_commands, test_cli_version_flag, test_main_catches_keyboard_interrupt_without_traceback, test_python_module_entrypoint_uses_cli_main, test_init_openrouter_writes_secret_free_config_and_env_template, test_init_defaults_to_local_endpoint, test_init_openrouter_defaults_to_generic_route, test_shell...

tests/nipux_cli/test_cli_model_preflight.py

5 tests · 86 lines

test_remote_model_preflight_blocks_rejected_auth, test_remote_model_preflight_allows_recovery_monitor_for_quota, test_remote_model_preflight_skips_fake_runs, test_model_preflight_checks_local_endpoints, test_start_does_not_spawn_daemon_when_model_preflight_fails

tests/nipux_cli/test_compression.py

1 test · 101 lines

test_refresh_memory_index_includes_durable_progress_ledgers

tests/nipux_cli/test_config.py

6 tests · 143 lines

test_load_config_defaults_to_local_endpoint, test_load_config_from_yaml, test_load_config_reads_local_env_file, test_load_config_tightens_local_env_permissions, test_default_config_yaml_allows_provider_template_without_secret, test_config_example_matches_default_local_endpoint

tests/nipux_cli/test_daemon.py

30 tests · 560 lines

test_single_instance_lock_rejects_second_holder, test_daemon_lock_status_reports_free_lock, test_lock_metadata_can_be_updated_while_held, test_lock_metadata_update_restores_missing_process_fields, test_daemon_lock_heartbeat_updates_while_worker_turn_runs, test_stop_daemon_recovers_pidless_lock_from_process_list, tes...

tests/nipux_cli/test_dashboard.py

3 tests · 86 lines

test_dashboard_collects_jobs_steps_and_artifacts, test_overview_marks_idle_daemon_as_ready_for_work, test_overview_marks_old_heartbeat_as_busy_for_running_step

tests/nipux_cli/test_db.py

19 tests · 609 lines

test_db_job_run_step_and_artifact_roundtrip, test_create_job_uses_unique_readable_slug_ids, test_step_numbers_increment_across_runs_for_a_job, test_job_token_usage_aggregates_message_usage, test_append_operator_message_roundtrip, test_claim_operator_messages_marks_one_message_at_a_time, test_acknowledge_operator_mes...

tests/nipux_cli/test_digest.py

1 test · 43 lines

test_daily_digest_includes_ledgers_lessons_sources_and_strategy

tests/nipux_cli/test_doctor.py

5 tests · 157 lines

test_doctor_checks_local_runtime_without_model_call, test_doctor_warns_when_remote_model_key_is_missing, test_doctor_reports_openrouter_auth_failure, test_doctor_reports_generation_limit_after_model_listing, test_doctor_reports_nested_provider_generation_error

tests/nipux_cli/test_generic_runtime_audit.py

1 test · 35 lines

test_runtime_code_has_no_task_specific_literals

tests/nipux_cli/test_live_memory_graph_smoke.py

2 tests · 37 lines

test_live_memory_graph_smoke_fails_cleanly_without_key, test_live_memory_graph_smoke_seed_pushes_generic_consolidation

tests/nipux_cli/test_llm.py

5 tests · 151 lines

test_chat_llm_requires_tool_choice_for_worker_actions, test_chat_llm_retries_without_tool_choice_when_provider_rejects_it, test_chat_llm_complete_response_returns_usage, test_chat_llm_disables_provider_sdk_retries, test_openrouter_generation_usage_enriches_cost_and_tokens

tests/nipux_cli/test_measurement.py

2 tests · 32 lines

test_measurement_candidates_extract_markdown_table_unit_columns, test_measurement_candidates_extract_generic_table_metrics

tests/nipux_cli/test_metric_format.py

2 tests · 11 lines

test_format_metric_value_spaces_named_units, test_format_metric_value_keeps_attached_symbol_units

tests/nipux_cli/test_operator_context.py

3 tests · 30 lines

test_conversation_only_operator_messages_do_not_enter_worker_prompt, test_actionable_operator_messages_remain_worker_constraints, test_inactive_prompt_operator_ids_returns_only_conversation_active_messages

tests/nipux_cli/test_planning.py

8 tests · 90 lines

test_initial_task_contracts_are_generic_and_complete, test_initial_roadmap_uses_valid_generic_contracts, test_initial_plan_adapts_to_measurable_objectives, test_initial_plan_adapts_to_deliverable_objectives, test_initial_plan_treats_generated_files_as_deliverables, test_initial_plan_adapts_to_monitoring_objectives, ...

tests/nipux_cli/test_progress.py

7 tests · 214 lines

test_progress_checkpoint_reports_deltas_and_recent_durable_work, test_progress_checkpoint_for_saved_output_is_concise, test_progress_checkpoint_without_delta_is_activity_not_progress, test_progress_checkpoint_counts_existing_record_updates_as_progress, test_progress_checkpoint_ignores_non_substantive_record_touches,...

tests/nipux_cli/test_project_atlas.py

2 tests · 46 lines

test_project_atlas_generator_maps_prompts_tools_and_source_without_self_embedding, test_project_atlas_redacts_secret_assignments_from_rendered_source

tests/nipux_cli/test_provider_errors.py

2 tests · 21 lines

test_provider_action_required_detects_payload_and_status_text, test_provider_rate_limited_detects_transient_rate_text

tests/nipux_cli/test_templates.py

1 test · 15 lines

test_generic_template_pushes_artifacts_and_updates

tests/nipux_cli/test_tools.py

69 tests · 2166 lines

test_static_tool_surface_is_focused, test_tool_registry_validates_required_arguments, test_tool_registry_blocks_truncated_reference_arguments, test_tool_access_config_filters_worker_schema_and_blocks_calls, test_artifact_tools_roundtrip, test_read_artifact_missing_ref_returns_valid_recent_refs, test_defer_job_record...

tests/nipux_cli/test_uninstall.py

7 tests · 136 lines

test_uninstall_plan_includes_runtime_and_legacy_state, test_uninstall_plan_includes_configured_runtime_home, test_uninstall_runtime_removes_state_and_service_files, test_uninstall_runtime_dry_run_keeps_files, test_uninstall_installed_tool_uses_uv_when_available, test_uninstall_installed_tool_falls_back_to_safe_uv_pa...

tests/nipux_cli/test_worker.py

296 tests · 11339 lines

test_system_prompt_is_contract_first_not_research_first, test_run_one_step_executes_scripted_tool_call, test_run_one_step_records_estimated_usage_for_scripted_model, test_run_one_step_blocks_content_only_worker_turn, test_run_one_step_repairs_content_only_worker_turn_with_tool_retry, test_run_one_step_recovers_repea...
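The per-file figures in this map (test count, line count, test names) can be reproduced from source with the standard `ast` module. The sketch below is a minimal illustration, not the atlas generator itself. `summarize_test_source` and `coverage_label` are hypothetical names, and it assumes that a test is any function whose name starts with `test_` and that blank lines count toward the line total.

```python
import ast
from typing import List, Tuple


def summarize_test_source(source: str) -> Tuple[List[str], int]:
    """Return (test function names, total line count) for one test file's source text."""
    tree = ast.parse(source)
    names = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]
    return names, len(source.splitlines())


def coverage_label(test_count: int, line_count: int) -> str:
    """Format the summary line, using the singular form for a single test."""
    noun = "test" if test_count == 1 else "tests"
    return f"{test_count} {noun} · {line_count} lines"


sample = (
    "def test_a():\n"
    "    assert True\n"
    "\n"
    "\n"
    "def helper():\n"
    "    pass\n"
    "\n"
    "\n"
    "def test_b():\n"
    "    assert helper() is None\n"
)
names, lines = summarize_test_source(sample)
print(coverage_label(len(names), lines), names)
```

Parsing with `ast` rather than matching `def test_` with a regular expression avoids counting names that only appear inside strings or comments.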

Audit cues

Review Points

Generated signals for where to inspect next.

Large modules

tests/nipux_cli/test_worker.py (11339 lines), nipux_cli/worker.py (7538 lines), tests/nipux_cli/test_cli.py (4970 lines), nipux_cli/cli.py (3188 lines), nipux_cli/db.py (2752 lines)
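A ranking like the one above can be derived from a mapping of file path to line count. The following sketch assumes the signal is simply the five largest files. The cutoff of five and the function name `largest_modules` are assumptions, and the sample counts are the figures reported elsewhere in this atlas.

```python
from typing import Dict, List, Tuple


def largest_modules(line_counts: Dict[str, int], limit: int = 5) -> List[Tuple[str, int]]:
    """Return the `limit` largest files, biggest first, with ties broken by path."""
    ranked = sorted(line_counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Line counts as reported in this atlas.
counts = {
    "tests/nipux_cli/test_tools.py": 2166,
    "nipux_cli/db.py": 2752,
    "tests/nipux_cli/test_db.py": 609,
    "nipux_cli/cli.py": 3188,
    "tests/nipux_cli/test_cli.py": 4970,
    "nipux_cli/worker.py": 7538,
    "tests/nipux_cli/test_worker.py": 11339,
    "tests/nipux_cli/test_daemon.py": 560,
}
print(", ".join(f"{path} ({lines} lines)" for path, lines in largest_modules(counts)))
```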

Prompt surfaces

25 prompt/instruction-like strings were extracted. Re-inspect the Prompt Surfaces section after any agent-behavior change.

Tool surface

29 tools are exposed to the worker. Review descriptions whenever generic behavior changes.

Symbol map

2158 symbols were parsed. Large modules are candidates for refactoring once behavior stabilizes.