AGENTS.md 32 lines
1# Development Notes
2
3This repo is a focused long-running worker, not a broad assistant distribution.
4
5## Active Surface
6
7- Runtime package: `nipux_cli/`
8- Tests: `tests/nipux_cli/`
9- Entry point: `nipux`
10- State home: `~/.nipux` or `NIPUX_HOME`
11- Planning notes: `plans/nipux-runtime-notes.md`
12
13## Constraints
14
15- Keep the default tool surface small and explicit.
16- Do not reintroduce broad upstream surfaces such as gateways, skills, plugins, web UI, ACP, RL environments, voice, image generation, or arbitrary terminal execution. The chat-first terminal UI is part of Nipux's active product surface; keep it generic, minimal, and backed by persisted worker state.
17- Preserve restartability: every worker step should persist state before and after tool execution.
18- Store exact evidence as artifacts. Summaries should point back to artifacts instead of replacing them.
19- Keep `memory_index` entries compact and artifact-referenced; do not use raw transcript replay as the long-term state strategy.
20- Prefer OpenAI-compatible model serving, configured through `~/.nipux/config.yaml`.
21- Keep runtime behavior domain-neutral. Do not add task-specific or environment-specific guards, keyword lists, examples, prompts, tools, or tests and describe them as generic framework improvements.
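
The restartability constraint above can be sketched as a step wrapper that commits a row before the tool runs and updates it afterward. This is a minimal illustration, not the actual `nipux_cli` code: the `steps` table, its columns, and the helper names `init_db`, `run_step`, and `interrupted_steps` are assumptions made for this example.

```python
import json
import sqlite3
from typing import Any, Callable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table for illustration; the real Nipux schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "id INTEGER PRIMARY KEY, job_id TEXT, tool TEXT, "
        "args_json TEXT, status TEXT, result_json TEXT)"
    )
    conn.commit()


def run_step(
    conn: sqlite3.Connection,
    job_id: str,
    tool_name: str,
    args: dict[str, Any],
    tool: Callable[..., Any],
) -> int:
    # 1. Persist the intent before the tool can cause any side effect.
    cur = conn.execute(
        "INSERT INTO steps (job_id, tool, args_json, status) "
        "VALUES (?, ?, ?, 'started')",
        (job_id, tool_name, json.dumps(args)),
    )
    conn.commit()
    step_id = cur.lastrowid
    # 2. Run the tool; a failure is recorded instead of crashing the worker.
    try:
        result, status = tool(**args), "done"
    except Exception as exc:
        result, status = {"error": str(exc)}, "failed"
    # 3. Persist the outcome after execution.
    conn.execute(
        "UPDATE steps SET status = ?, result_json = ? WHERE id = ?",
        (status, json.dumps(result), step_id),
    )
    conn.commit()
    return step_id


def interrupted_steps(conn: sqlite3.Connection) -> list[int]:
    # Rows still marked 'started' belong to steps that never finished.
    rows = conn.execute(
        "SELECT id FROM steps WHERE status = 'started' ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

After a crash, a restarted worker could call `interrupted_steps` to find steps that began but never recorded an outcome, and decide whether to retry or mark them failed.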
22
23## Validation
24
25Use the focused suite:
26
27```bash
28PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
29uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
30```
31
32Use `nipux daemon --once --fake` for a no-model smoke test. Use `nipux logs JOB_ID --verbose` or `nipux watch JOB_ID --verbose` when inspecting what a background job is actually doing.
README.md 378 lines
1# Nipux CLI
2
3```text
4 _ _ ___ ____ _ ___ __
5| \ | |_ _| _ \| | | \ \/ /
6| \| || || |_) | | | |> <
7| |\ || || __/| |_| /_/\_\
8|_| \_|___|_| \__,_|
9```
10
11Nipux CLI is a small, restartable worker for long-running browser, web research,
12and command-line jobs. It supports any OpenAI-compatible local or remote model
13endpoint. It is maintained for Nipux and built around one practical idea: keep a
14worker moving in bounded steps, save exact evidence, learn from each branch, and
15recover cleanly when a process or model call fails.
16
17- Website: [Nipux.com](https://nipux.com)
18- Source: [github.com/nipuxx/agent-cli](https://github.com/nipuxx/agent-cli)
19- License: [MIT](LICENSE)
20
21## What It Does
22
23Nipux runs jobs that are too long or repetitive for a single chat turn. A job can
24search the web, operate a persistent browser profile, write artifacts, inspect
25local files with bounded shell commands, update source and finding ledgers, and
26continue through a daemon loop until the operator pauses or cancels it.
27
28The default runtime is intentionally narrow:
29
30- one OpenAI-compatible model endpoint chosen during setup
31- one SQLite state store under `~/.nipux`
32- one restartable daemon with a single-instance lock
33- per-job artifact files for exact evidence
34- per-job browser profiles through `agent-browser`
35- compact memory summaries that point back to artifacts
36- visible event history for chat, tools, artifacts, progress, errors, and digests
37- durable ledgers for lessons, sources, findings, tasks, roadmap, and experiments
38
39Nipux does not include a messaging gateway, plugin marketplace, skills manager,
40RL environment, voice stack, image stack, or broad web application. The public
41surface is the `nipux` CLI and the focused `nipux_cli/` Python package.
42
43## Install
44
45Requirements:
46
47- Python 3.11+
48- an OpenAI-compatible chat completions endpoint, local or remote
49- optional browser automation: `npm install -g agent-browser && agent-browser install`
50
51Install and open the full-screen setup wizard with one command:
52
53```bash
54curl -fsSL https://raw.githubusercontent.com/nipuxx/agent-cli/main/scripts/install.sh | bash
55```
56
57The first-run wizard asks for the provider/model, endpoint, API key location,
58and tool access. After the model is verified, Nipux opens the workspace chat
59where you can describe worker jobs in plain language. It stores
60secrets outside the git repo and writes runtime state under `~/.nipux` unless
61`NIPUX_HOME` is set.
62
63Install from a local checkout while developing:
64
65```bash
66git clone https://github.com/nipuxx/agent-cli.git
67cd agent-cli
68uv tool install --editable .
69nipux
70```
71
72Install directly from git once the repository is public:
73
74```bash
75uv tool install git+https://github.com/nipuxx/agent-cli.git
76```
77
78## First Run
79
80Run `nipux`. If this is a fresh profile, the full-screen setup wizard opens
81immediately and locks chat/job creation until the configured model passes a real
82chat request. The wizard writes `config.yaml` and a local `.env` template under
83`~/.nipux` unless `NIPUX_HOME` is set. Real API keys stay in the environment or
84`~/.nipux/.env`, not in the git repo.
85
86```bash
87nipux
88```
89
90After setup, `nipux` opens the workspace chat. Type a plain-English goal to spin
91up a worker, or use `/new OBJECTIVE`. Use `/settings` to edit model, endpoint,
92tool access, runtime, and cost fields from inside the UI.
93
94Manual configuration is still available for scripts or headless environments:
95
96```bash
97nipux init --model local-model --base-url http://localhost:8000/v1 --api-key-env OPENAI_API_KEY
98nipux doctor --check-model
99```
100
101`nipux init` creates `~/.nipux/config.yaml` and `~/.nipux/.env` with private
102file permissions. Later `/api-key` edits keep the secret in `~/.nipux/.env`
103instead of writing it to config.
104
105Update an installed tool or source checkout from anywhere:
106
107```bash
108nipux update
109```
110
111When installed as a `uv tool`, `nipux update` force-refreshes the command from
112the source repository and verifies the installed command afterward. When run
113inside a git checkout, it fast-forwards the checkout. If a daemon is running,
114update restarts it automatically unless `--no-restart` is used. Set
115`NIPUX_UPDATE_SPEC` only when you need to update from a different package source.
116
117Inspect progress from the terminal:
118
119```bash
120nipux status
121nipux activity --follow
122```
123
124On macOS, install launchd autostart:
125
126```bash
127nipux autostart install --poll-seconds 0
128nipux autostart status
129```
130
131On Linux, install a user service:
132
133```bash
134nipux service install
135nipux service status
136```
137
138Fully remove local runtime state when you want a fresh user install:
139
140```bash
141nipux uninstall --yes
142```
143
144This stops the daemon, removes launchd/systemd service files, deletes
145`~/.nipux`, removes legacy `~/.kneepucks` state if it exists, and removes the
146installed `nipux` command with `uv tool uninstall nipux`. Add `--keep-tool` only
147when you intentionally want to keep the command installed.
148
149## Secrets
150
151Nipux never needs an API key in `config.yaml`. The config stores only the name
152of the environment variable to read:
153
154```yaml
155model:
156 name: provider/model
157 base_url: https://provider.example/v1
158 api_key_env: PROVIDER_API_KEY
159 # Optional fallback pricing when the provider does not return cost metadata.
160 input_cost_per_million: null
161 output_cost_per_million: null
162```
163
164Put secrets in your shell, your process manager, or `~/.nipux/.env`:
165
166```bash
167# ~/.nipux/.env
168PROVIDER_API_KEY=<redacted>
169```
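
The indirection described in this section (config holds only the variable name, the environment holds the value) might look like the following sketch. The function names `resolve_api_key` and `key_status` are hypothetical and are not taken from the Nipux source; the sketch also assumes any `.env` file has already been loaded into the environment mapping.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    model_config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Look up the secret named by `api_key_env`; never read it from config."""
    env = os.environ if env is None else env
    name = model_config.get("api_key_env")
    if not name:
        return None
    value = env.get(name, "").strip()
    return value or None


def key_status(
    model_config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Report whether the key is present without printing its value."""
    name = model_config.get("api_key_env") or "(unset)"
    found = resolve_api_key(model_config, env) is not None
    return f"{name}: {'present' if found else 'missing'}"
```

A diagnostic command can print `key_status(...)` safely because the returned string contains only the variable name and a presence flag.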
170
171The repository includes `.env.example` and `config.example.yaml` as templates.
172Do not commit real `.env`, state databases, logs, artifacts, or browser
173profiles. The default `.gitignore` excludes those local runtime files.
174
175## Tool Access
176
177The first-run wizard and config slash commands control which generic tool groups
178the worker can use:
179
180```yaml
181tools:
182 browser: true
183 web: true
184 shell: true
185 files: true
186```
187
188Use `/browser`, `/web`, `/cli-access`, and `/file-access` in the terminal UI to
189change those switches later. Disabled tools are removed from the worker tool
190schema and blocked if an old daemon tries to call them.
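
As a rough sketch of that two-layer behavior, the switches can both filter the schema and act as a call-time check. The `TOOL_GROUPS` mapping below is an assumption for illustration (it lists only a few tools and guesses their groups), and `is_allowed`, `visible_tools`, and `call_tool` are hypothetical helper names.

```python
from typing import Any, Callable, Mapping

# Assumed grouping for illustration; only a few registry tools are listed.
TOOL_GROUPS = {
    "browser_navigate": "browser",
    "web_search": "web",
    "shell_exec": "shell",
    "write_file": "files",
}


def is_allowed(tool: str, switches: Mapping[str, bool]) -> bool:
    # Tools without a group (ledgers, artifacts, ...) are always available here.
    group = TOOL_GROUPS.get(tool)
    return group is None or bool(switches.get(group, False))


def visible_tools(all_tools: list[str], switches: Mapping[str, bool]) -> list[str]:
    # Layer 1: disabled tools never appear in the schema sent to the model.
    return [tool for tool in all_tools if is_allowed(tool, switches)]


def call_tool(
    tool: str,
    switches: Mapping[str, bool],
    registry: Mapping[str, Callable[..., Any]],
    **kwargs: Any,
) -> dict[str, Any]:
    # Layer 2: a stale caller that still knows the tool name is blocked.
    if not is_allowed(tool, switches):
        return {"ok": False, "error": f"tool disabled by config: {tool}"}
    return {"ok": True, "result": registry[tool](**kwargs)}
```

Checking at call time as well as at schema time is what covers the "old daemon" case: a process started before the switch changed may still hold the earlier schema.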
191
192## Local Model Examples
193
194Nipux talks to OpenAI-compatible `/v1/chat/completions` and `/v1/models`
195servers. Use any serving stack that supports the model and tool-calling behavior
196you want.
197
198SGLang example:
199
200```bash
201python -m sglang.launch_server \
202 --model-path "$MODEL_NAME" \
203 --port 8000 \
204 --context-length 262144 \
205 --reasoning-parser auto \
206 --tool-call-parser auto
207```
208
209vLLM example:
210
211```bash
212vllm serve "$MODEL_NAME" \
213 --port 8000 \
214 --max-model-len 262144 \
215 --enable-auto-tool-choice \
216 --tool-call-parser auto
217```
218
219## Operator Workflow
220
221The no-argument CLI opens the focused job directly. Plain text becomes operator
222steering for the next worker step. The terminal UI keeps conversation/output on
223the left and status, jobs, saved outputs, updates, and worker activity on the
224right. Configuration is handled through slash commands such as `/model`,
225`/api-key`, `/base-url`, and `/context`, not a separate settings page.
226
227```text
228nipux > what are you working on?
229nipux > prioritize measured progress over notes
230```
231
232For direct command use:
233
234```bash
235uv run nipux status "nightly research" --full
236uv run nipux history "nightly research"
237uv run nipux events "nightly research" --follow
238uv run nipux activity "nightly research" --follow
239uv run nipux outcomes "nightly research"
240uv run nipux outcomes --all
241uv run nipux findings "nightly research"
242uv run nipux tasks "nightly research"
243uv run nipux roadmap "nightly research"
244uv run nipux experiments "nightly research"
245uv run nipux sources "nightly research"
246uv run nipux memory "nightly research"
247uv run nipux metrics "nightly research"
248uv run nipux usage "nightly research"
249uv run nipux artifacts "nightly research" --paths
250```
251
252Use `nipux health` for daemon truth without opening the dashboard. It reports
253the lock state, heartbeat, recent failures, log paths, autostart state, focused
254job, and latest daemon events.
255
256### Seeing What It Actually Did
257
258Use these views when a job has been running unattended:
259
260- `nipux outcomes JOB` or the **Outcomes** pane: durable work grouped by time,
261 including saved outputs, findings, measurements, decisions, lessons, and file
262 changes.
263- `nipux outcomes --all`: latest durable work and saved outputs for every job,
264 useful when several agents have been running in the background.
265- `nipux activity JOB --follow` or the **Work** pane: the raw live tool stream
266 for debugging what the worker is doing right now.
267- `nipux usage JOB`: model calls, context pressure, output tokens, and cost when
268 the provider returns cost metadata. If the provider does not return cost,
269 configure `/input-cost` and `/output-cost` to estimate it from token counts.
270- `nipux digest JOB` and `nipux daily-digest`: durable summary reports that
271 include progress counts, active operator context, experiments, artifacts, and
272 token/cost usage.
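
The fallback cost estimate mentioned under `nipux usage` is plain arithmetic over token counts and per-million prices. The sketch below assumes the two rates correspond to `input_cost_per_million` and `output_cost_per_million` from the config; the function name `estimate_cost_usd` is hypothetical, and returning `None` when either rate is unset is a choice made for this example.

```python
from typing import Optional


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_million: Optional[float],
    output_cost_per_million: Optional[float],
) -> Optional[float]:
    """Estimate cost from token counts; None when no fallback rate is set."""
    if input_cost_per_million is None or output_cost_per_million is None:
        return None
    total = (
        input_tokens * input_cost_per_million
        + output_tokens * output_cost_per_million
    )
    return total / 1_000_000
```

For example, 200,000 input tokens at $0.50 per million plus 10,000 output tokens at $1.50 per million comes to 0.10 + 0.015 = $0.115.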
273
274## Tool Surface
275
276The worker exposes a deliberately small tool registry:
277
278- `browser_navigate`
279- `browser_snapshot`
280- `browser_click`
281- `browser_type`
282- `browser_scroll`
283- `browser_back`
284- `browser_press`
285- `browser_console`
286- `web_search`
287- `web_extract`
288- `shell_exec`
289- `write_file`
290- `write_artifact`
291- `read_artifact`
292- `search_artifacts`
293- `update_job_state`
294- `defer_job`
295- `report_update`
296- `record_lesson`
297- `acknowledge_operator_context`
298- `record_source`
299- `record_findings`
300- `record_tasks`
301- `record_roadmap`
302- `record_milestone_validation`
303- `record_experiment`
304- `send_digest_email`
305
306`shell_exec` is bounded with timeouts and output capture. Browser sessions use
307per-job profiles under `~/.nipux/browser-profiles/`. Anti-bot, CAPTCHA, login,
308and paywall pages are recorded as visible source-quality warnings; Nipux does
309not bypass protections.
310
311Workers can use `defer_job` for scheduled follow-up, monitor intervals, or long
312external processes that are actually waiting on time to pass. Deferred jobs stay
313runnable but show as waiting until their next check time, so the daemon can keep
314other work moving without burning model calls on repeated polling.
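
One way to picture the deferral behavior is a scheduler that skips jobs whose next check time has not arrived. This is a sketch under assumptions: the `Job` fields, the `"running"` and `"waiting"` labels, and the functions `display_state` and `next_job` are invented for illustration and do not describe the real daemon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    title: str
    status: str = "running"  # stored status is unchanged by deferral
    defer_until: Optional[float] = None  # epoch seconds of the next check


def display_state(job: Job, now: float) -> str:
    # A deferred job keeps its runnable status but is shown as waiting.
    if job.status == "running" and job.defer_until is not None and now < job.defer_until:
        return "waiting"
    return job.status


def next_job(jobs: list[Job], now: float) -> Optional[Job]:
    # Pick the first job that is runnable right now; no model call is spent
    # on jobs that are only waiting for time to pass.
    for job in jobs:
        if display_state(job, now) == "running":
            return job
    return None
```

Because the stored status never changes, a deferred job becomes eligible again as soon as the clock passes `defer_until`, without any extra resume step.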
315
316## Command Reference
317
318```bash
319nipux init [--force] [--openrouter] [--model MODEL] [--base-url URL] [--api-key-env ENV]
320nipux update [--path PATH] [--allow-dirty] [--no-restart]
321nipux uninstall [--yes] [--dry-run] [--keep-legacy] [--keep-tool]
322nipux doctor [--check-model]
323nipux shell [--status]
324nipux create "objective" [--title TITLE] [--kind KIND] [--cadence CADENCE]
325nipux jobs
326nipux ls
327nipux focus [JOB_TITLE]
328nipux rename JOB_TITLE --title NEW_TITLE
329nipux delete JOB_TITLE [--keep-files]
330nipux chat [JOB_TITLE] [--no-history]
331nipux steer [--job JOB_TITLE] MESSAGE
332nipux pause [JOB_TITLE] [note...]
333nipux resume [JOB_TITLE]
334nipux cancel [JOB_TITLE] [note...]
335nipux start [--poll-seconds N]
336nipux stop
337nipux autostart install|status|uninstall [--poll-seconds N]
338nipux service install|status|uninstall [--poll-seconds N]
339nipux browser-dashboard [--port N] [--foreground] [--stop]
340nipux health
341nipux status [JOB_TITLE] [--full] [--json]
342nipux history [JOB_TITLE] [--full] [--json]
343nipux events [JOB_TITLE] [--follow] [--json]
344nipux activity [JOB_TITLE] [--follow] [--verbose]
345nipux updates [JOB_TITLE]
346nipux outcomes [JOB_TITLE] [--all]
347nipux dashboard [JOB_TITLE]
348nipux findings [JOB_TITLE] [--limit N] [--json]
349nipux tasks [JOB_TITLE] [--limit N] [--status STATUS] [--json]
350nipux roadmap [JOB_TITLE] [--limit N] [--json]
351nipux experiments [JOB_TITLE] [--limit N] [--status STATUS] [--json]
352nipux sources [JOB_TITLE] [--limit N] [--json]
353nipux memory [JOB_TITLE]
354nipux metrics [JOB_TITLE]
355nipux usage [JOB_TITLE] [--json]
356nipux artifacts [JOB_TITLE] [--paths]
357nipux artifact QUERY_OR_TITLE [--job JOB_TITLE]
358nipux lessons [JOB_TITLE]
359nipux learn [--job JOB_TITLE] [--category CATEGORY] LESSON
360nipux logs [JOB_TITLE] [--limit N] [--verbose]
361nipux outputs [JOB_TITLE] [--limit N] [--verbose]
362nipux watch JOB_TITLE [--verbose]
363nipux run-one JOB_TITLE [--fake]
364nipux work [JOB_TITLE] [--steps N] [--verbose] [--dashboard]
365nipux run [JOB_TITLE] [--poll-seconds N] [--no-follow]
366nipux daemon [--once] [--fake] [--verbose] [--poll-seconds N]
367nipux digest JOB_TITLE
368nipux daily-digest [--day YYYY-MM-DD]
369```
370
371## Development
372
373```bash
374PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
375uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
376```
377
378The active implementation notes live in `plans/nipux-runtime-notes.md`.
RELEASE_CHECKLIST.md 38 lines
1# Release Checklist
2
3Use this before sharing the repository with outside users.
4
5## Secrets
6
7- No real API keys in git.
8- `config.yaml` examples use `model.api_key_env`, not literal keys.
9- `.env`, `.env.*`, state databases, logs, artifacts, and browser profiles are ignored.
10- `nipux doctor` reports missing remote API-key environment variables without printing key values.
11
12## Install
13
14- `uv tool install --editable .` works from a checkout.
15- `uv run nipux --help` works without installing.
16- `NIPUX_HOME=$(mktemp -d) uv run nipux` opens the first-run terminal UI, not argparse help or an ASCII-only prompt.
17- `nipux init` writes the default Qwen/OpenRouter `~/.nipux/config.yaml` and a blank `~/.nipux/.env` template.
18- `nipux doctor` passes for local runtime checks after initialization.
19- `nipux daemon --once --fake` runs without a model key.
20
21## Runtime
22
23- `nipux start`, `nipux stop`, and `nipux restart` recover stale daemon state.
24- `nipux status`, `nipux activity`, `nipux history`, and `nipux artifacts` expose enough state to debug jobs.
25- Worker prompts stay bounded and do not replay raw transcript history.
26- Operator chat that is only conversational stays in history but does not remain active worker context.
27- Measurable jobs record experiments instead of treating notes as progress.
28- Status, outcomes, and work panes show different layers clearly: jobs and latest outputs, durable progress by hour, and raw tool/console events.
29
30## Validation
31
32```bash
33python -m compileall nipux_cli tests/nipux_cli
34uv run --extra dev python -m pytest tests/nipux_cli -q
35uv run --extra dev ruff check nipux_cli tests/nipux_cli
36rg -n --hidden -S "(sk-[A-Za-z0-9_-]{20,}|OPENROUTER_API_KEY[=].+|OPENAI_API_KEY[=].+|Bearer\\s+[A-Za-z0-9._-]{20,})" . \
37 -g '!uv.lock' -g '!**/__pycache__/**' -g '!*.db' -g '!*.log' -g '!*.pyc'
38```
config.example.yaml 27 lines
1model:
2 name: local-model
3 base_url: http://localhost:8000/v1
4 api_key_env: OPENAI_API_KEY
5 context_length: 262144
6 input_cost_per_million: null
7 output_cost_per_million: null
8runtime:
9 max_step_seconds: 600
10 max_steps_per_run: 1
11 artifact_inline_char_limit: 12000
12 daily_digest_enabled: true
13 daily_digest_time: "08:00"
14 max_job_cost_usd: null
15tools:
16 browser: true
17 web: true
18 shell: true
19 files: true
20email:
21 enabled: false
22 smtp_host: ""
23 smtp_port: 587
24 username: ""
25 password_env: NIPUX_EMAIL_PASSWORD
26 from_addr: ""
27 to_addr: ""
docs/long-running-memory-graph-design.md 62 lines
1# Long-Running Memory Graph Design
2
3Nipux needs long-running workers that keep improving instead of flattening into repeated search, notes, or shallow checkpoints. The backend now treats each job as having a small durable "brain": a job-local memory graph made of connected nodes and links. It is not task-specific and does not require embeddings or a new service to be useful.
4
5## Research Takeaways
6
7- **Complementary learning systems:** human memory separates fast episodic capture from slower semantic consolidation. The hippocampus rapidly stores separated episodes while cortex gradually extracts structure. Nipux mirrors this with recent events/steps as fast episodic traces and `memory_graph` nodes as consolidated reusable knowledge. Source: [O'Reilly and Norman, 2002](https://collaborate.princeton.edu/en/publications/hippocampal-and-neocortical-contributions-to-memory-advances-in-t) and [McClelland et al., 1995](https://colab.ws/articles/10.1037/0033-295x.102.3.419).
8- **Sleep/consolidation:** memory consolidation strengthens relevant traces and reorganizes them into associations that support later inference. Nipux should periodically turn raw work into compact graph nodes and edges instead of replaying full history. Source: [Born and Wilhelm, 2012](https://link.springer.com/article/10.1007/s00426-011-0335-6) and [Diekelmann and Born, 2010](https://www.nature.com/articles/nrn2762).
9- **Reflexion:** agents improve without weight updates by writing verbal reflections into episodic memory after feedback. Nipux already has lessons and reflection; the graph adds structure so reflections can connect to facts, decisions, tasks, and evidence. Source: [Reflexion](https://huggingface.co/papers/2303.11366).
10- **Generative Agents:** believable long-lived agents combine memory stream, retrieval, reflection, and planning. Nipux should keep the event stream, but retrieve distilled context through durable ledgers and graph nodes. Source: [Generative Agents](https://huggingface.co/papers/2304.03442).
11- **MemGPT:** OS-style memory tiers let fixed-context models use long histories by paging between prompt context and archival memory. Nipux's prompt now gets only a ranked slice of graph memory, with `search_memory_graph` for deeper recall. Source: [MemGPT](https://huggingface.co/papers/2310.08560).
12- **Voyager:** long-horizon improvement comes from an automatic curriculum, a growing reusable skill library, and iterative self-verification. Nipux's graph supports this by representing skills, strategies, open questions, decisions, and evidence links as reusable nodes. Source: [Voyager](https://voyager.minedojo.org/).
13- **Agent memory surveys and graph memory work:** recent surveys and systems emphasize memory operations: write, retrieve, update, consolidate, forget/deprecate, and evaluate. Graph memory helps preserve relationships and temporal change better than a flat note list. Sources: [LLM Agent Memory Survey](https://huggingface.co/papers/2404.13501), [AriGraph](https://huggingface.co/papers/2407.04363), [Zep](https://huggingface.co/papers/2501.13956).
14
15## Backend Shape
16
17Each job can now maintain metadata under `memory_graph`:
18
19- `nodes`: connected notes with `kind`, `status`, `summary`, `salience`, `confidence`, `tags`, `parent_key`, `links`, and `evidence_refs`.
20- `edges`: typed links between nodes such as `supports`, `replaces`, `raises`, `blocks`, or `depends_on`.
21- Nodes are generic: `episode`, `fact`, `strategy`, `skill`, `question`, `decision`, `constraint`, `artifact`, `source`, `task`, `experiment`, and `milestone`.
22
23The worker gets a compact `Memory graph` prompt section that ranks active, salient, recent, and procedural nodes. It can call `search_memory_graph` when it needs deeper recall. It can call `record_memory_graph` whenever new work should become reusable knowledge.
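
To make the node shape and the ranked prompt slice concrete, here is a small sketch. The field names come from the list above, but `key`, `updated_at`, the scoring weights, the one-day half-life, and the choice of `strategy` and `skill` as "procedural" kinds are all assumptions for illustration, not the shipped ranking.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    key: str  # assumed identifier
    kind: str  # e.g. "fact", "strategy", "skill", "question"
    summary: str
    status: str = "active"
    salience: float = 0.5
    confidence: float = 0.5
    updated_at: float = 0.0  # assumed timestamp, epoch seconds
    tags: list[str] = field(default_factory=list)
    parent_key: Optional[str] = None
    links: list[str] = field(default_factory=list)
    evidence_refs: list[str] = field(default_factory=list)


PROCEDURAL_KINDS = {"strategy", "skill"}  # assumed meaning of "procedural"


def score(node: Node, now: float, half_life: float = 86_400.0) -> float:
    # Recency decays by half every `half_life` seconds.
    recency = 0.5 ** (max(0.0, now - node.updated_at) / half_life)
    total = node.salience + recency
    if node.status == "active":
        total += 1.0
    if node.kind in PROCEDURAL_KINDS:
        total += 0.5
    return total


def prompt_slice(nodes: list[Node], now: float, limit: int = 5) -> list[Node]:
    # Only the top-ranked nodes go into the prompt; the rest stay searchable.
    return sorted(nodes, key=lambda n: score(n, now), reverse=True)[:limit]
```

With these weights, a fresh active strategy outranks an older deprecated fact even when the fact has higher salience, which matches the intent of keeping reusable, current knowledge in view.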
24
25Operators can inspect the same graph with `nipux memory --graph`, which writes a self-contained clickable HTML artifact. The view uses a local canvas renderer, needs no external network assets, and lets the operator rotate, zoom, search, and click nodes to inspect summaries, evidence refs, tags, and links.
26
27The worker also has a generic consolidation guard: once findings, sources, experiments, lessons, resolved tasks, or roadmap milestones accumulate faster than graph nodes and links, more branch churn is blocked until the worker calls `record_memory_graph` or records why the current branch has no reusable memory value.
28
29## Live Model Smoke
30
31Use `scripts/live_memory_graph_smoke.py` to verify a real OpenAI-compatible model can follow the graph-consolidation contract. The script creates a temporary Nipux home, disables side-effect tools, seeds generic durable job state, and runs a few worker turns. It succeeds only after the model calls `record_memory_graph` and creates at least one node.
32
33Example:
34
35```bash
36OPENROUTER_API_KEY = <redacted>

37```
38
39The key is read from the configured environment variable and is never printed. If no key is present, the script exits before making a network request.
40
41Latest smoke result:
42
43- Model: `qwen/qwen3.6-27b`
44- Provider path: OpenAI-compatible chat completions through OpenRouter
45- Isolation: temporary Nipux home with browser, web, shell, and file tools disabled
46- Result: first worker step called `record_memory_graph`
47- Graph written: 7 nodes and 8 edges
48
49## Why This Should Improve Long Runs
50
51- Raw history stays available in events/artifacts, but the model sees a compact graph slice.
52- Weaker or smaller models get explicit, typed memory instead of relying on an implicit recap.
53- Repeated branches can be deprecated instead of merely summarized.
54- Useful strategies and skills can compound across hundreds or thousands of actions.
55- Open questions remain visible as first-class nodes, making it harder for the worker to drift away from unresolved blockers.
56
57## Next Backend Slices
58
59- Add periodic deterministic consolidation that proposes graph nodes from recent events when the model fails to do it.
60- Tune graph-aware stagnation checks from real runs: if a branch has no new node, edge, validation, experiment, or deliverable after a budget, force consolidation or branch rejection.
61- Add better retrieval scoring using local embeddings when available, while keeping lexical fallback mandatory.
62- Add live UI/status counters for memory graph growth: new nodes, active questions, deprecated paths, and current strategy.
docs/pi-agent-core-port-plan.md 267 lines
1# Pi Agent Core Port Plan
2
3Research date: 2026-04-30
4
5Sources:
6- https://github.com/badlogic/pi-mono
7- https://github.com/badlogic/pi-mono/tree/main/packages/agent
8- https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent
9- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/session.md
10- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/sdk.md
11- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/rpc.md
12
13Pi is MIT licensed, so direct adaptation is allowed if we preserve attribution
14where substantial code is ported. The right move is not to copy the full
15TypeScript app into Nipux. The right move is to port the small generic runtime
16ideas from `packages/agent` and keep Nipux's SQLite daemon, tools, and
17multi-job persistence.
18
19## What Pi Does Better
20
21Pi's core is a stateful agent loop, not a "one next action" prompt wrapper.
22The important files are:
23
24- `packages/agent/src/agent-loop.ts`
25- `packages/agent/src/agent.ts`
26- `packages/agent/src/types.ts`
27- `packages/coding-agent/src/core/session-manager.ts`
28- `packages/coding-agent/src/core/agent-session.ts`
29- `packages/coding-agent/src/core/compaction/*`
30
31Key behaviors to port:
32
331. Evented loop as the runtime contract.
34 Pi emits `agent_start`, `turn_start`, `message_start`, `message_update`,
35 `message_end`, `tool_execution_start`, `tool_execution_update`,
36 `tool_execution_end`, `turn_end`, and `agent_end`. Nipux has events, but
37 currently treats a daemon step as the main unit. We should make these events
38 first-class and derive UI/status from them.
39
402. Real transcript state.
41 Pi keeps `AgentMessage[]` as state and converts it to LLM messages only at
42 the model boundary. Nipux currently rebuilds each prompt from job metadata,
43 memory, recent steps, and ledgers. That works, but it loses the clean
44 distinction between visible transcript, UI-only records, and model context.
45
463. Context transform boundary.
47 Pi uses `transformContext(messages)` before `convertToLlm(messages)`. This is
48 the exact place Nipux should inject durable operator context, compact memory,
49 task contracts, ledgers, and active constraints without polluting raw history.
50
514. Steering and follow-up queues.
52 Pi splits queued user input into:
53 - `steer`: delivered after current tool execution and before the next model turn.
54 - `followUp`: delivered only when the agent would otherwise stop.
55 Nipux already has `steer` and `follow_up` metadata, but delivery is bolted
56 onto single steps. This should move into the agent core.
57
585. Hookable tool preflight and postprocessing.
59 Pi has `beforeToolCall` and `afterToolCall`. Nipux has good generic guards,
60 but they live inside `worker.py`. They should become hooks:
61 - duplicate/repetition guard
62 - artifact obligation guard
63 - measurement obligation guard
64 - source quality guard
65 - experiment accounting after measurable shell output
66
676. Tool batch semantics.
68 Pi can prepare tool calls sequentially, execute safe tools in parallel, and
69 still persist tool-result messages in assistant order. Nipux currently
70 executes only the first tool call. That leaves useful model intent unused.
71
727. Compaction as session structure, not just memory refresh.
73 Pi stores compaction entries in the session tree. Full history remains, but
74 future model context sees a summary plus kept recent messages. Nipux has
75 `memory_index`, but it should add explicit compaction entries tied to the
76 transcript path.
77
788. Continue semantics.
79 Pi has `continue()` for retries after errors or compaction. Nipux currently
80 creates a new run every step. Continue semantics would make recovery cleaner
81 after model errors, context overflow, daemon restart, and queued messages.
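
The steering and follow-up split in item 4 can be sketched as two queues with different delivery points. This is an illustrative Python shape, not a port of Pi's TypeScript: the class name follows the plan's `PendingMessageQueue`, while the method names and the `next_messages` helper are assumptions.

```python
from collections import deque


class PendingMessageQueue:
    """Holds operator input until the loop reaches a delivery boundary."""

    def __init__(self) -> None:
        self._steer: deque[str] = deque()
        self._follow_up: deque[str] = deque()

    def steer(self, message: str) -> None:
        self._steer.append(message)

    def follow_up(self, message: str) -> None:
        self._follow_up.append(message)

    def drain_steering(self) -> list[str]:
        items = list(self._steer)
        self._steer.clear()
        return items

    def drain_follow_up(self) -> list[str]:
        items = list(self._follow_up)
        self._follow_up.clear()
        return items


def next_messages(queue: PendingMessageQueue, agent_would_stop: bool) -> list[str]:
    # Called after tool execution finishes and before the next model turn.
    steering = queue.drain_steering()
    if steering:
        return steering
    # Follow-ups are released only when the agent has nothing left to do.
    if agent_would_stop:
        return queue.drain_follow_up()
    return []
```

Keeping the two queues separate means a follow-up can wait behind active work without being lost, while steering always reaches the model at the next turn boundary.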
82
83## Nipux Mapping
84
85Current Nipux files:
86
87- `nipux_cli/worker.py`: prompt building, step execution, guards, reflection.
88- `nipux_cli/daemon.py`: forever loop, lock, heartbeat, multi-job scheduling.
89- `nipux_cli/db.py`: SQLite state, events, job metadata, ledgers.
90- `nipux_cli/operator_context.py`: durable operator message filtering.
91- `nipux_cli/tools.py`: tool registry and tool execution.
92- `nipux_cli/compression.py`: compact memory refresh.
93- `nipux_cli/cli.py`: chat/TUI/status/output rendering.
94
95Target files:
96
97- Add `nipux_cli/agent_core.py`
98 - Python port of Pi's small `Agent`, `PendingMessageQueue`, event types,
99 tool result types, and loop control.
100 - Keep attribution header because it is directly inspired by Pi's MIT code.
101 - Support non-streaming model responses first, then streaming later.
102
103- Add `nipux_cli/session.py`
104 - Load/save transcript entries for a job.
105 - Build current session context from entries plus compaction records.
106 - Keep SQLite as source of truth instead of JSONL files, but use Pi's entry
107 shape: `message`, `compaction`, `branch_summary`, `custom`,
108 `custom_message`, `model_change`, and `label`.
109
110- Refactor `nipux_cli/worker.py`
111 - Move prompt assembly into `transform_context`.
112 - Move `_blocked_tool_call_result` into `before_tool_call`.
113 - Move measurement/source/artifact side effects into `after_tool_call`.
114 - Replace "only first tool call" execution with core loop tool execution.
115 - Preserve one bounded heartbeat by limiting wall-clock/tool budget per daemon
116 tick, not by discarding the agent loop structure.
117
118- Extend `nipux_cli/db.py`
119 - Add a `session_entries` table:
120 - `id`
121 - `job_id`
122 - `parent_id`
123 - `entry_type`
124 - `created_at`
125 - `payload_json`
126 - Add `job_session_state` metadata for current leaf and compaction stats.
127 - Backfill existing `events`/`steps` into session view lazily.
128
129- Keep `nipux_cli/daemon.py`
130 - Do not replace the daemon. Pi is mostly single-session interactive; Nipux
131 needs multi-job background scheduling.
132 - The daemon should call `AgentSession.continue_or_prompt()` for whichever job
133 is runnable, then keep heartbeating while the agent loop emits events.
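
A possible shape for the `session_entries` table and the "current leaf" walk is sketched below. The column names come from the list above; the column types, the `CURRENT_TIMESTAMP` default, and the helpers `append_entry` and `path_to_leaf` are assumptions made for this example.

```python
import json
import sqlite3
from typing import Any, Optional

# Column names follow the plan; types and defaults are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session_entries (
    id INTEGER PRIMARY KEY,
    job_id TEXT NOT NULL,
    parent_id INTEGER REFERENCES session_entries(id),
    entry_type TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    payload_json TEXT NOT NULL
)
"""


def append_entry(
    conn: sqlite3.Connection,
    job_id: str,
    parent_id: Optional[int],
    entry_type: str,
    payload: dict[str, Any],
) -> int:
    cur = conn.execute(
        "INSERT INTO session_entries (job_id, parent_id, entry_type, payload_json) "
        "VALUES (?, ?, ?, ?)",
        (job_id, parent_id, entry_type, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def path_to_leaf(conn: sqlite3.Connection, leaf_id: int) -> list[dict[str, Any]]:
    """Walk parent links from the leaf to the root, then return root-first."""
    path: list[dict[str, Any]] = []
    current: Optional[int] = leaf_id
    while current is not None:
        row = conn.execute(
            "SELECT id, parent_id, entry_type, payload_json "
            "FROM session_entries WHERE id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        path.append({"id": row[0], "entry_type": row[2], "payload": json.loads(row[3])})
        current = row[1]
    path.reverse()
    return path
```

Because every entry stores its parent, abandoned branches stay in the table and remain queryable, while the model context is built only from the path that ends at the current leaf.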
134
135## Implementation Sequence
136
137### Commit 1: Agent Core Skeleton
138
139Create `agent_core.py` with:
140
141- `AgentMessage`
142- `AgentToolCall`
143- `AgentToolResult`
144- `AgentEvent`
145- `PendingMessageQueue`
146- `AgentState`
147- `Agent`
148
149Support:
150
151- `prompt(messages)`
152- `continue_()`
153- `steer(message)`
154- `follow_up(message)`
155- `abort()`
156- `wait_for_idle()`
157- event subscription
158- sequential tool execution only
159- `before_tool_call`
160- `after_tool_call`
161- `transform_context`
162- `convert_to_llm`
163
164Tests:
165
166- event order matches Pi's documented event order
167- steering is delivered after tool execution
168- follow-up waits until no tool calls remain
169- prompt/continue reject concurrent runs
170- tool errors become tool result messages instead of crashing the loop
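
The hook contract and the last test above (tool errors become tool result messages) could be sketched as a single function. The message dictionary shape, the convention that `before_tool_call` returns a string to block or `None` to allow, and the name `execute_tool_call` are assumptions for illustration rather than Pi's or Nipux's actual interfaces.

```python
from typing import Any, Callable, Mapping, Optional

ToolCall = dict[str, Any]
ToolResult = dict[str, Any]


def execute_tool_call(
    call: ToolCall,
    tools: Mapping[str, Callable[..., Any]],
    before_tool_call: Optional[Callable[[ToolCall], Optional[str]]] = None,
    after_tool_call: Optional[Callable[[ToolCall, ToolResult], None]] = None,
) -> ToolResult:
    name = call["name"]
    # Preflight: a guard may block the call by returning a reason string.
    if before_tool_call is not None:
        reason = before_tool_call(call)
        if reason is not None:
            return {"role": "tool", "name": name, "is_error": True, "content": reason}
    # Execute: any exception (including an unknown tool name) becomes a
    # tool-result message so the loop keeps running.
    try:
        content = tools[name](**call.get("arguments", {}))
        result = {"role": "tool", "name": name, "is_error": False, "content": content}
    except Exception as exc:
        result = {
            "role": "tool",
            "name": name,
            "is_error": True,
            "content": f"{type(exc).__name__}: {exc}",
        }
    # Postprocess: side effects such as experiment accounting hook in here.
    if after_tool_call is not None:
        after_tool_call(call, result)
    return result
```

In this sketch a blocked call skips `after_tool_call`, since no tool output exists to account for; whether blocked calls should also reach the post hook is a design decision the plan leaves open.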
171
172### Commit 2: Session Entries
173
174Add SQLite session entries and a `SessionManager` equivalent.
175
176Tests:
177
178- append messages with parent IDs
179- build context from current leaf
180- compaction summary appears before kept messages
181- full raw history remains queryable
182- branch summaries can be represented even if UI does not expose branching yet
183
184### Commit 3: Worker Integration
185
186Make `run_one_step` use the Pi-style agent loop.
187
188Important constraint:
189
190Nipux should still be generic and background-safe. Do not encode any objective,
191host, model, source, or task domain. The old guards stay generic and move into
192hooks.
193
194Tests:
195
196- model can call multiple tools and all are persisted in order
197- duplicate/measurement/artifact guards block through `before_tool_call`
198- measurable output creates obligations through `after_tool_call`
199- operator steer persists until acknowledged and is injected through the queue
200- follow-up waits behind active branch work
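
The hook placement can be illustrated with a small executor. Everything below is a hedged sketch: `HookedExecutor`, `make_duplicate_guard`, `make_obligation_recorder`, and the `"metric"` key are hypothetical names chosen for the example, not existing Nipux code.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]


@dataclass
class HookedExecutor:
    tools: dict[str, Callable[..., Any]]
    before_tool_call: list[Callable[[ToolCall], Optional[str]]] = field(default_factory=list)
    after_tool_call: list[Callable[[ToolCall, dict[str, Any]], None]] = field(default_factory=list)

    def execute(self, call: ToolCall) -> dict[str, Any]:
        # A before-hook blocks the call by returning a human-readable reason.
        for hook in self.before_tool_call:
            reason = hook(call)
            if reason:
                return {"success": False, "blocked": True, "error": reason}
        result = {"success": True, "output": self.tools[call.name](**call.arguments)}
        # After-hooks observe the result and may record follow-up work.
        for hook in self.after_tool_call:
            hook(call, result)
        return result


def make_duplicate_guard() -> Callable[[ToolCall], Optional[str]]:
    """Generic guard: block a call whose name and arguments were already seen."""
    seen: set[str] = set()

    def guard(call: ToolCall) -> Optional[str]:
        key = json.dumps([call.name, call.arguments], sort_keys=True, default=str)
        if key in seen:
            return f"duplicate call to {call.name} with identical arguments"
        seen.add(key)
        return None

    return guard


def make_obligation_recorder(obligations: list[str]) -> Callable[[ToolCall, dict[str, Any]], None]:
    """Generic recorder: a measurable output creates an obligation to persist it."""

    def record(call: ToolCall, result: dict[str, Any]) -> None:
        output = result.get("output")
        if isinstance(output, dict) and "metric" in output:  # "metric" key is an assumption
            obligations.append(f"record measurement produced by {call.name}")

    return record
```

Neither hook knows anything about a specific objective, host, or source, which is the property the constraint above asks for.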
201
202### Commit 4: Compaction
203
204Replace fixed memory refresh as the main context strategy with session
205compaction:
206
207- estimate context from last usage when available
208- truncate tool results for summarization
209- store compaction entries
210- rebuild model context from compaction plus recent path
211- continue after overflow or threshold compaction
212
213Tests:
214
215- long transcript compacts without losing recent messages
216- compaction failure does not crash daemon
217- queued messages survive compaction
218- context overflow retry uses `continue_()`
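
A rough sketch of the compaction steps listed above, assuming messages are plain dicts with `role` and `content` keys. The helper names, the four-characters-per-token fallback, and the summary marker text are illustrative assumptions, and `summarize` stands in for whatever model call produces the summary.

```python
def estimate_tokens(messages, last_usage=None):
    """Prefer the provider-reported usage; otherwise fall back to a crude heuristic."""
    if last_usage is not None:
        return last_usage
    return sum(len(m["content"]) for m in messages) // 4  # assumed ~4 chars per token


def truncate_tool_result(text, limit=200):
    """Shorten long tool output before it is handed to the summarizer."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} chars truncated]"


def compact(messages, *, keep_recent, summarize):
    """Return a summary entry followed by the most recent messages.

    On summarizer failure the original messages are returned unchanged, so a
    failed compaction never takes the daemon down.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    prepared = [
        {**m, "content": truncate_tool_result(m["content"])} if m["role"] == "tool" else m
        for m in old
    ]
    try:
        summary = summarize(prepared)
    except Exception:
        return list(messages)
    return [{"role": "user", "content": f"[compaction summary]\n{summary}"}, *recent]
```

The summary always precedes the kept messages, matching the ordering the Commit 2 tests expect from context rebuilding.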
219
220### Commit 5: UI/Event Stream Cleanup
221
222Make CLI/TUI read the event stream and session entries, not ad hoc step text.
223
224Tests:
225
226- chat shows user/assistant transcript
227- right pane shows job/daemon/session stats
228- activity shows tool start/update/end events
229- history can show full transcript, compacted transcript, and raw events
230
231## Why This Should Fix The Current Failure Mode
232
233Nipux currently has many good guards, but the model is still treated like a
234stateless planner that gets one tool call per daemon step. That encourages
235research churn because the loop boundary is outside the model's natural
236tool-result feedback cycle.
237
238Pi's design keeps the model inside a coherent turn loop:
239
2401. User/operator/context enters as messages.
2412. Assistant proposes tool calls.
2423. Tools execute and return tool-result messages.
2434. The assistant immediately sees those results.
2445. Steering and follow-up are delivered at well-defined boundaries.
2456. Compaction preserves the useful path instead of stuffing every summary into
246 every future prompt.
247
248Porting that structure should make Nipux feel less like a step counter and more
249like an actual long-running agent runtime.
250
251## What Not To Copy
252
253Do not copy Pi's task-specific extension examples into Nipux core.
254
255Do not make the harness depend on Node, Bun, or the Pi TUI.
256
257Do not encode any SSH, model, inference, lead-finding, browser-source, or local
258machine assumptions. Everything here must stay generic:
259
260- transcript
261- events
262- queues
263- tool hooks
264- compaction
265- session state
266- UI rendering over events
267
nipux_cli/__init__.py 11 lines
1"""Minimal daemon-first Nipux runtime.
2
3This package owns the daemon, state store, model adapter, artifact store, and
4fixed tool surface.
5"""
6
7__all__ = [
8 "__version__",
9]
10
11__version__ = "0.1.0"
nipux_cli/__main__.py 9 lines
1"""Run the Nipux CLI with ``python -m nipux_cli``."""
2
3from __future__ import annotations
4
5from nipux_cli.cli import main
6
7
8if __name__ == "__main__":
9 main()
nipux_cli/artifacts.py 138 lines
1"""Artifact file storage for long-running jobs."""
2
3from __future__ import annotations
4
5import hashlib
6import re
7from dataclasses import dataclass
8from pathlib import Path
9from typing import Any
10
11from nipux_cli.db import AgentDB, new_id, utc_now
12
13_SAFE_NAME_RE = re.compile(r"[^A-Za-z0-9._-]+")
14
15
16def safe_filename(value: str, *, default: str = "artifact") -> str:
17 cleaned = _SAFE_NAME_RE.sub("-", value.strip()).strip(".-")
18 return (cleaned or default)[:96]
19
20
21def sha256_text(text: str) -> str:
22 return hashlib.sha256(text.encode("utf-8")).hexdigest()
23
24
25@dataclass(frozen=True)
26class StoredArtifact:
27 id: str
28 path: Path
29 sha256: str
30 title: str | None = None
31 summary: str | None = None
32
33
34class ArtifactStore:
35 def __init__(self, home: str | Path, db: AgentDB | None = None):
36 self.home = Path(home)
37 self.db = db
38 self.home.mkdir(parents=True, exist_ok=True)
39
40 def job_dir(self, job_id: str) -> Path:
41 path = self.home / "jobs" / job_id / "artifacts"
42 path.mkdir(parents=True, exist_ok=True)
43 return path
44
45 def _assert_inside_home(self, path: Path) -> Path:
46 resolved = path.resolve()
47 root = self.home.resolve()
48 try:
49 resolved.relative_to(root)
50 except ValueError as exc:
51 raise ValueError(f"Refusing to read outside agent home: {path}") from exc
52 return resolved
53
54 def write_text(
55 self,
56 *,
57 job_id: str,
58 content: str,
59 title: str | None = None,
60 summary: str | None = None,
61 artifact_type: str = "text",
62 run_id: str | None = None,
63 step_id: str | None = None,
64 metadata: dict[str, Any] | None = None,
65 ) -> StoredArtifact:
66 suffix = "html" if artifact_type == "html" else "md" if artifact_type in {"digest", "markdown", "text"} else "txt"
67 stem = safe_filename(title or artifact_type)
68 timestamp = utc_now().replace("+00:00", "Z").replace(":", "")
69 filename = f"{timestamp}-{stem}-{new_id('file')}.{suffix}"
70 path = self.job_dir(job_id) / filename
71 path.write_text(content, encoding="utf-8")
72 digest = sha256_text(content)
73 artifact_id = new_id("art")
74 if self.db is not None:
75 artifact_id = self.db.add_artifact(
76 job_id=job_id,
77 run_id=run_id,
78 step_id=step_id,
79 path=path,
80 sha256=digest,
81 artifact_type=artifact_type,
82 title=title,
83 summary=summary,
84 metadata=metadata,
85 )
86 return StoredArtifact(id=artifact_id, path=path, sha256=digest, title=title, summary=summary)
87
88 def read_text(self, artifact_id_or_path: str) -> str:
89 path = Path(artifact_id_or_path)
90 if self.db is not None and not path.exists():
91 path = Path(self.db.get_artifact(artifact_id_or_path)["path"])
92 safe_path = self._assert_inside_home(path)
93 return safe_path.read_text(encoding="utf-8")
94
95 def search_text(self, *, job_id: str, query: str, limit: int = 10) -> list[dict[str, Any]]:
96 if self.db is None:
97 return []
98 query_lower = query.lower().strip()
99 results: list[dict[str, Any]] = []
100 for artifact in self.db.list_artifacts(job_id, limit=250):
101 haystack = " ".join(
102 str(artifact.get(key) or "") for key in ("title", "summary", "type")
103 ).lower()
104 content = ""
105 if query_lower and query_lower not in haystack:
106 try:
107 content = self.read_text(artifact["id"])
108 except OSError:
109 content = ""
110 if query_lower not in content.lower():
111 continue
112 elif not query_lower:
113 try:
114 content = self.read_text(artifact["id"])
115 except OSError:
116 content = ""
117 if not content:
118 try:
119 content = self.read_text(artifact["id"])
120 except OSError:
121 content = ""
122 excerpt = content[:500]
123 if query_lower:
124 idx = content.lower().find(query_lower)
125 if idx >= 0:
126 start = max(0, idx - 160)
127 excerpt = content[start:start + 500]
128 results.append({
129 "id": artifact["id"],
130 "title": artifact.get("title"),
131 "type": artifact.get("type"),
132 "path": artifact.get("path"),
133 "summary": artifact.get("summary"),
134 "excerpt": excerpt,
135 })
136 if len(results) >= limit:
137 break
138 return results
nipux_cli/browser.py 189 lines
1"""Small `agent-browser` wrapper for the Nipux runtime."""
2
3from __future__ import annotations
4
5import json
6import os
7import shutil
8import subprocess
9import tempfile
10import hashlib
11from pathlib import Path
12from typing import Any
13
14from nipux_cli.config import AppConfig
15from nipux_cli.source_quality import anti_bot_reason
16
17
18def _find_agent_browser() -> list[str]:
19 direct = shutil.which("agent-browser")
20 if direct:
21 return [direct]
22 if shutil.which("npx"):
23 return ["npx", "--yes", "agent-browser"]
24 raise FileNotFoundError("agent-browser CLI not found. Install with: npm install -g agent-browser && agent-browser install")
25
26
27def _session_name(task_id: str) -> str:
28 safe = "".join(ch if ch.isalnum() or ch in "-_" else "_" for ch in task_id)
29 if len(safe) <= 32:
30 return f"nipux_{safe}"
31 digest = hashlib.sha1(task_id.encode("utf-8")).hexdigest()[:10]
32 return f"nipux_{safe[:20]}_{digest}"
33
34
35def _profile_dir(config: AppConfig, task_id: str) -> Path:
36 return config.runtime.home / "browser-profiles" / _session_name(task_id)
37
38
39def _socket_dir(task_id: str) -> Path:
40 root = Path(os.environ.get("NIPUX_BROWSER_SOCKET_ROOT") or "/tmp")
41 return root / "nipux-ab" / _session_name(task_id)
42
43
44def run_browser_command(
45 config: AppConfig,
46 *,
47 task_id: str,
48 command: str,
49 args: list[str] | None = None,
50 timeout: int = 60,
51) -> dict[str, Any]:
52 args = args or []
53 profile_dir = _profile_dir(config, task_id)
54 profile_dir.mkdir(parents=True, exist_ok=True)
55 cmd = [
56 *_find_agent_browser(),
57 "--session",
58 _session_name(task_id),
59 "--session-name",
60 _session_name(task_id),
61 "--profile",
62 str(profile_dir),
63 "--json",
64 command,
65 *args,
66 ]
67 socket_dir = _socket_dir(task_id)
68 socket_dir.mkdir(parents=True, exist_ok=True)
69 env = {
70 **os.environ,
71 "AGENT_BROWSER_SOCKET_DIR": str(socket_dir),
72 "AGENT_BROWSER_SESSION_NAME": _session_name(task_id),
73 "AGENT_BROWSER_PROFILE": str(profile_dir),
74 }
75
76 with tempfile.TemporaryDirectory(dir=str(socket_dir)) as tmp:
77 stdout_path = Path(tmp) / "stdout"
78 stderr_path = Path(tmp) / "stderr"
79 with stdout_path.open("w", encoding="utf-8") as stdout, stderr_path.open("w", encoding="utf-8") as stderr:
80 proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=stdout, stderr=stderr, env=env)
81 try:
82 proc.wait(timeout=timeout)
83 except subprocess.TimeoutExpired:
84 proc.kill()
85 proc.wait()
86 return {"success": False, "error": f"browser command timed out after {timeout}s"}
87 stdout_text = stdout_path.read_text(encoding="utf-8").strip()
88 stderr_text = stderr_path.read_text(encoding="utf-8").strip()
89
90 if stdout_text:
91 try:
92 result = json.loads(stdout_text)
93 except json.JSONDecodeError:
94 return {"success": False, "error": f"agent-browser returned non-JSON output: {stdout_text[:1000]}"}
95 if isinstance(result, dict):
96 result.setdefault("browser_session", _session_name(task_id))
97 result.setdefault("browser_profile", str(profile_dir))
98 return result
99 return {"success": True, "data": result, "browser_session": _session_name(task_id), "browser_profile": str(profile_dir)}
100 if proc.returncode != 0:
101 return {
102 "success": False,
103 "error": stderr_text or f"agent-browser exited {proc.returncode}",
104 "browser_session": _session_name(task_id),
105 "browser_profile": str(profile_dir),
106 }
107 return {"success": True, "data": {}, "browser_session": _session_name(task_id), "browser_profile": str(profile_dir)}
108
109
110def navigate(config: AppConfig, *, task_id: str, url: str) -> dict[str, Any]:
111 result = run_browser_command(config, task_id=task_id, command="open", args=[url], timeout=90)
112 if not result.get("success"):
113 return result
114 snapshot = run_browser_command(config, task_id=task_id, command="snapshot", args=["-c"], timeout=30)
115 if snapshot.get("success"):
116 result["snapshot"] = snapshot.get("data", {}).get("snapshot", "")
117 result["refs"] = snapshot.get("data", {}).get("refs", {})
118 return _annotate_source_quality(result)
119
120
121def snapshot(config: AppConfig, *, task_id: str, full: bool = False) -> dict[str, Any]:
122 return _annotate_source_quality(run_browser_command(config, task_id=task_id, command="snapshot", args=[] if full else ["-c"]))
123
124
125def click(config: AppConfig, *, task_id: str, ref: str) -> dict[str, Any]:
126 result = run_browser_command(config, task_id=task_id, command="click", args=[ref if ref.startswith("@") else f"@{ref}"])
127 return _with_recovery_snapshot(config, task_id=task_id, result=result)
128
129
130def fill(config: AppConfig, *, task_id: str, ref: str, text: str) -> dict[str, Any]:
131 result = run_browser_command(config, task_id=task_id, command="fill", args=[ref if ref.startswith("@") else f"@{ref}", text])
132 return _with_recovery_snapshot(config, task_id=task_id, result=result)
133
134
135def scroll(config: AppConfig, *, task_id: str, direction: str) -> dict[str, Any]:
136 return run_browser_command(config, task_id=task_id, command="scroll", args=[direction, "500"])
137
138
139def back(config: AppConfig, *, task_id: str) -> dict[str, Any]:
140 return run_browser_command(config, task_id=task_id, command="back")
141
142
143def press(config: AppConfig, *, task_id: str, key: str) -> dict[str, Any]:
144 return run_browser_command(config, task_id=task_id, command="press", args=[key])
145
146
147def console(config: AppConfig, *, task_id: str, clear: bool = False, expression: str | None = None) -> dict[str, Any]:
148 if expression is not None:
149 return run_browser_command(config, task_id=task_id, command="eval", args=[expression])
150 args = ["--clear"] if clear else []
151 console_result = run_browser_command(config, task_id=task_id, command="console", args=args)
152 errors_result = run_browser_command(config, task_id=task_id, command="errors", args=args)
153 return {
154 "success": bool(console_result.get("success") or errors_result.get("success")),
155 "console": console_result,
156 "errors": errors_result,
157 }
158
159
160def _annotate_source_quality(result: dict[str, Any]) -> dict[str, Any]:
161 data = result.get("data") if isinstance(result.get("data"), dict) else {}
162 reason = anti_bot_reason(
163 str(data.get("title") or ""),
164 str(data.get("url") or data.get("origin") or ""),
165 str(result.get("snapshot") or data.get("snapshot") or ""),
166 )
167 if not reason:
168 return result
169 result["source_warning"] = reason
170 warnings = result.get("warnings") if isinstance(result.get("warnings"), list) else []
171 warnings.append({
172 "type": "anti_bot",
173 "message": reason,
174 "guidance": "This page may require normal human browser verification. Do not bypass protections; continue only with visible browser actions or choose another source if stuck.",
175 })
176 result["warnings"] = warnings
177 return result
178
179
180def _with_recovery_snapshot(config: AppConfig, *, task_id: str, result: dict[str, Any]) -> dict[str, Any]:
181 if result.get("success", True):
182 return result
183 error = str(result.get("error") or "")
184 if "unknown ref" not in error.lower():
185 return result
186 recovery = run_browser_command(config, task_id=task_id, command="snapshot", args=["-c"], timeout=30)
187 result["recovery_guidance"] = "The ref was stale or missing. Use refs from recovery_snapshot before clicking or typing again."
188 result["recovery_snapshot"] = _annotate_source_quality(recovery)
189 return result
nipux_cli/chat_commands.py 286 lines
1"""Slash-command dispatch for focused chat sessions."""
2
3from __future__ import annotations
4
5import argparse
6from dataclasses import dataclass
7from typing import Any, Callable
8
9from nipux_cli.config import DEFAULT_CONTEXT_LENGTH
10from nipux_cli.tui_style import _one_line
11
12
13@dataclass(frozen=True)
14class ChatCommandDeps:
15 db_factory: Callable[[], tuple[Any, Any]]
16 jobs: Callable[[argparse.Namespace], None]
17 history: Callable[[argparse.Namespace], None]
18 events: Callable[[argparse.Namespace], None]
19 logs: Callable[[argparse.Namespace], None]
20 updates: Callable[[argparse.Namespace], None]
21 artifacts: Callable[[argparse.Namespace], None]
22 artifact: Callable[[argparse.Namespace], None]
23 lessons: Callable[[argparse.Namespace], None]
24 findings: Callable[[argparse.Namespace], None]
25 tasks: Callable[[argparse.Namespace], None]
26 roadmap: Callable[[argparse.Namespace], None]
27 experiments: Callable[[argparse.Namespace], None]
28 sources: Callable[[argparse.Namespace], None]
29 memory: Callable[[argparse.Namespace], None]
30 metrics: Callable[[argparse.Namespace], None]
31 activity: Callable[[argparse.Namespace], None]
32 digest: Callable[[argparse.Namespace], None]
33 status: Callable[[argparse.Namespace], None]
34 usage: Callable[[argparse.Namespace], None]
35 handle_setting: Callable[[str, list[str]], bool]
36 doctor: Callable[[argparse.Namespace], None]
37 init: Callable[[argparse.Namespace], None]
38 health: Callable[[argparse.Namespace], None]
39 start: Callable[[argparse.Namespace], bool | None]
40 ensure_job_runnable: Callable[[Any, str], None]
41 run: Callable[[argparse.Namespace], None]
42 restart: Callable[[argparse.Namespace], None]
43 work: Callable[[argparse.Namespace], None]
44 pause: Callable[[argparse.Namespace], None]
45 resume: Callable[[argparse.Namespace], None]
46 cancel: Callable[[argparse.Namespace], None]
47 queue_note: Callable[..., None]
48 create_job: Callable[..., tuple[str, str]]
49 focus: Callable[[argparse.Namespace], None]
50 delete: Callable[[argparse.Namespace], None]
51
52
53def handle_chat_slash_command(job_id: str, command: str, rest: list[str], *, deps: ChatCommandDeps) -> bool:
54 if command in {"jobs", "ls"}:
55 deps.jobs(argparse.Namespace())
56 return True
57 if command == "history":
58 deps.history(
59 argparse.Namespace(
60 job_id=job_id,
61 limit=_optional_int(rest, default=40),
62 chars=220,
63 full=False,
64 json=False,
65 )
66 )
67 return True
68 if command == "events":
69 deps.events(
70 argparse.Namespace(
71 job_id=job_id,
72 limit=_optional_int(rest, default=40),
73 chars=220,
74 full=False,
75 json=False,
76 follow=False,
77 interval=2.0,
78 )
79 )
80 return True
81 if command == "outputs":
82 deps.logs(
83 argparse.Namespace(
84 job_id=[job_id],
85 limit=_optional_int(rest, default=25),
86 verbose=False,
87 chars=260,
88 )
89 )
90 return True
91 if command in {"updates", "outcomes", "outcome"}:
92 all_jobs = bool(rest and rest[0].lower() == "all")
93 deps.updates(argparse.Namespace(job_id=job_id, all=all_jobs, limit=5, chars=180, paths=False))
94 return True
95 if command == "artifacts":
96 deps.artifacts(argparse.Namespace(job_id=job_id, limit=10, chars=220, paths=False))
97 return True
98 if command == "artifact":
99 query = " ".join(rest).strip()
100 if not query:
101 print("usage: /artifact QUERY_OR_ID")
102 return True
103 deps.artifact(argparse.Namespace(artifact_id_or_path=[query], job_id=job_id, chars=12000))
104 return True
105 if command == "lessons":
106 deps.lessons(argparse.Namespace(job_id=job_id, limit=10, chars=220))
107 return True
108 if command == "findings":
109 deps.findings(argparse.Namespace(job_id=job_id, limit=20, chars=220, json=False))
110 return True
111 if command == "tasks":
112 deps.tasks(argparse.Namespace(job_id=job_id, limit=20, chars=220, status=None, json=False))
113 return True
114 if command == "roadmap":
115 deps.roadmap(argparse.Namespace(job_id=job_id, limit=20, features=3, chars=220, json=False))
116 return True
117 if command == "experiments":
118 deps.experiments(argparse.Namespace(job_id=job_id, limit=20, chars=220, status=None, json=False))
119 return True
120 if command == "sources":
121 deps.sources(argparse.Namespace(job_id=job_id, limit=20, chars=220, json=False))
122 return True
123 if command == "memory":
124 deps.memory(
125 argparse.Namespace(
126 job_id=job_id,
127 limit=10,
128 chars=220,
129 json=False,
130 graph=bool(rest and rest[0].lower() in {"graph", "view", "html"}),
131 output=None,
132 )
133 )
134 return True
135 if command == "metrics":
136 deps.metrics(argparse.Namespace(job_id=job_id, chars=220))
137 return True
138 if command == "learn":
139 lesson = " ".join(rest).strip()
140 if not lesson:
141 print("usage: /learn LESSON")
142 return True
143 db, _config = deps.db_factory()
144 try:
145 entry = db.append_lesson(job_id, lesson, category="operator_preference", metadata={"source": "chat"})
146 job = db.get_job(job_id)
147 print(f"learned for {job['title']}: {_one_line(entry['lesson'], 220)}")
148 finally:
149 db.close()
150 return True
151 if command == "activity":
152 deps.activity(
153 argparse.Namespace(job_id=job_id, limit=20, chars=180, follow=False, interval=2.0, verbose=False, paths=False)
154 )
155 return True
156 if command == "digest":
157 deps.digest(argparse.Namespace(job_id=[job_id]))
158 return True
159 if command == "status":
160 deps.status(argparse.Namespace(job_id=job_id, limit=8, chars=180, full=False, json=False))
161 return True
162 if command == "usage":
163 deps.usage(argparse.Namespace(job_id=job_id, json=False))
164 return True
165 if command == "settings":
166 deps.handle_setting("config", [])
167 return True
168 if deps.handle_setting(command, rest):
169 return True
170 if command == "doctor":
171 try:
172 deps.doctor(argparse.Namespace(check_model=True))
173 except SystemExit:
174 pass
175 return True
176 if command == "init":
177 deps.init(
178 argparse.Namespace(
179 path=None,
180 force=False,
181 model=None,
182 base_url=None,
183 api_key_env=None,
184 openrouter=False,
185 context_length=DEFAULT_CONTEXT_LENGTH,
186 )
187 )
188 return True
189 if command == "health":
190 deps.health(argparse.Namespace(limit=8, chars=180))
191 return True
192 if command == "start":
193 deps.start(argparse.Namespace(poll_seconds=0.0, fake=False, quiet=False, log_file=None))
194 return True
195 if command == "run":
196 db, _config = deps.db_factory()
197 try:
198 deps.ensure_job_runnable(db, job_id)
199 finally:
200 db.close()
201 deps.run(
202 argparse.Namespace(
203 job_id=job_id,
204 poll_seconds=0.0,
205 interval=2.0,
206 limit=20,
207 chars=180,
208 verbose=False,
209 paths=False,
210 fake=False,
211 quiet=False,
212 log_file=None,
213 no_follow=True,
214 )
215 )
216 return True
217 if command == "restart":
218 deps.restart(argparse.Namespace(poll_seconds=0.0, wait=5.0, fake=False, quiet=False, log_file=None))
219 return True
220 if command in {"work", "work-verbose"}:
221 deps.work(
222 argparse.Namespace(
223 job_id=job_id,
224 steps=_optional_int(rest, default=1),
225 poll_seconds=0.5,
226 fake=False,
227 verbose=command == "work-verbose",
228 dashboard=False,
229 limit=12,
230 chars=260 if command == "work" else 4000,
231 continue_on_error=False,
232 )
233 )
234 return True
235 if command in {"pause", "stop"}:
236 deps.pause(argparse.Namespace(job_id=job_id, note=rest))
237 return True
238 if command == "resume":
239 deps.resume(argparse.Namespace(job_id=job_id))
240 return True
241 if command == "cancel":
242 deps.cancel(argparse.Namespace(job_id=job_id, note=rest))
243 return True
244 if command == "note":
245 message = " ".join(rest).strip()
246 if not message:
247 print("usage: /note MESSAGE")
248 return True
249 deps.queue_note(job_id, message, mode="note")
250 return True
251 if command == "follow":
252 message = " ".join(rest).strip()
253 if not message:
254 print("usage: /follow MESSAGE")
255 return True
256 deps.queue_note(job_id, message, mode="follow_up")
257 return True
258 if command == "new":
259 objective = " ".join(rest).strip()
260 if not objective:
261 print("usage: /new OBJECTIVE")
262 return True
263 _created_id, title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
264 print(f"created {title}")
265 started = deps.start(argparse.Namespace(poll_seconds=0.0, fake=False, quiet=True, log_file=None))
266 if started is False:
267 print(f"focus set to {title}; worker is waiting for a working model.")
268 else:
269 print(f"focus set to {title}; initial plan accepted and worker started.")
270 return True
271 if command in {"focus", "switch"}:
272 if not " ".join(rest).strip():
273 deps.focus(argparse.Namespace(query=[]))
274 return True
275 deps.focus(argparse.Namespace(query=rest))
276 return True
277 if command == "delete":
278 target = rest if rest else [job_id]
279 deps.delete(argparse.Namespace(job_id=target, keep_files=False))
280 return bool(rest)
281 print(f"unknown chat command: /{command}")
282 return True
283
284
285def _optional_int(values: list[str], *, default: int) -> int:
286 return int(values[0]) if values and values[0].isdigit() else default
nipux_cli/chat_context.py 223 lines
1"""Prompt context builder for the Nipux chat-side controller model."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.db import AgentDB
8from nipux_cli.event_render import event_line
9from nipux_cli.metric_format import format_metric_value
10from nipux_cli.operator_context import active_prompt_operator_entries
11from nipux_cli.tui_event_format import clean_step_summary
12from nipux_cli.tui_outcomes import (
13 SUMMARY_EVENT_TYPES,
14 SUMMARY_TOOL_EVENT_TYPES,
15 hourly_outcome_summary,
16 is_summary_event_candidate,
17 model_update_event_parts,
18 outcome_counts,
19)
20
21
22def build_chat_messages(db: AgentDB, job: dict[str, Any], message: str) -> list[dict[str, str]]:
23 """Build bounded visible-state context for conversational job control."""
24
25 steps = db.list_steps(job_id=job["id"])[-10:]
26 jobs = db.list_jobs()[:12]
27 artifacts = db.list_artifacts(job["id"], limit=5)
28 timeline_events = db.list_timeline_events(job["id"], limit=18)
29 outcome_events = _durable_outcome_events(db, job["id"])
30 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
31 operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
32 agent_updates = metadata.get("agent_updates") if isinstance(metadata.get("agent_updates"), list) else []
33 lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
34 findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
35 sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
36 tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
37 experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
38 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
39
40 step_lines = "\n".join(
41 f"- #{step['step_no']} {step['status']} {step.get('tool_name') or step['kind']}: "
42 f"{clean_step_summary(step.get('summary') or step.get('error') or '')}"
43 for step in steps
44 )
45 artifact_lines = "\n".join(
46 f"- #{index} {artifact.get('title') or artifact['id']}: {artifact.get('summary') or ''} "
47 f"(view with /artifact {index})"
48 for index, artifact in enumerate(artifacts, start=1)
49 )
50 steering_lines = "\n".join(
51 f"- {entry.get('source', 'operator')} {entry.get('mode', 'steer')}: {entry.get('message', '')}"
52 for entry in active_prompt_operator_entries(operator_messages)[-6:]
53 if isinstance(entry, dict)
54 )
55 update_lines = "\n".join(
56 f"- {entry.get('category', 'progress')}: {entry.get('message', '')}"
57 for entry in agent_updates[-5:]
58 if isinstance(entry, dict)
59 )
60 lesson_lines = "\n".join(
61 f"- {entry.get('category', 'memory')}: {entry.get('lesson', '')}"
62 for entry in lessons[-8:]
63 if isinstance(entry, dict)
64 )
65 finding_lines = "\n".join(
66 f"- {entry.get('name')}: {entry.get('category') or ''} {entry.get('location') or ''} score={entry.get('score')}"
67 for entry in findings[-8:]
68 if isinstance(entry, dict)
69 )
70 task_lines = "\n".join(
71 f"- {entry.get('status') or 'open'} p={entry.get('priority') or 0}: {entry.get('title')}"
72 for entry in tasks[-10:]
73 if isinstance(entry, dict)
74 )
75 milestone_lines = _roadmap_lines(roadmap)
76 experiment_lines = "\n".join(_experiment_line(entry) for entry in experiments[-10:] if isinstance(entry, dict))
77 source_lines = "\n".join(
78 f"- {entry.get('source')}: score={entry.get('usefulness_score')} "
79 f"findings={entry.get('yield_count') or 0} outcome={entry.get('last_outcome') or ''}"
80 for entry in sources[-8:]
81 if isinstance(entry, dict)
82 )
83 timeline_lines = "\n".join(event_line(event, chars=700) for event in timeline_events[-12:])
84
85 sections = {
86 "Jobs": _clip_chat_context(_job_list_lines(jobs, focused_job_id=job["id"]), 1_300),
87 "Durable outcomes": _clip_chat_context(_durable_outcome_lines(outcome_events), 1_600),
88 "Recent tool calls": _clip_chat_context(step_lines, 1_800),
89 "Latest artifacts": _clip_chat_context(artifact_lines, 1_200),
90 "Finding ledger": _clip_chat_context(finding_lines, 1_200),
91 "Task queue": _clip_chat_context(task_lines, 1_300),
92 "Roadmap": _clip_chat_context(milestone_lines, 1_200),
93 "Experiment ledger": _clip_chat_context(experiment_lines, 1_300),
94 "Source ledger": _clip_chat_context(source_lines, 1_100),
95 "Lessons learned": _clip_chat_context(lesson_lines, 1_000),
96 "Recent operator steering": _clip_chat_context(steering_lines, 1_200),
97 "Recent agent notes": _clip_chat_context(update_lines, 1_200),
98 "Recent visible timeline": _clip_chat_context(timeline_lines, 1_800),
99 }
100 section_text = "\n\n".join(f"{title}:\n{body or _empty_section_text(title)}" for title, body in sections.items())
101 return [
102 {
103 "role": "system",
104 "content": (
105 "You are Nipux, the chat model that controls a generic long-running agent workspace. "
106 "You know the visible CLI state, focused job, job list, task queue, artifacts, memory, metrics, and recent activity. "
107 "Answer directly from the visible job state. Do not claim hidden chain-of-thought. "
108 "If the operator asks for work to be done, explain the concrete job/control action Nipux will take or how to run it from the Jobs/Status panel. "
109 "If the operator asks where saved work is, explain that artifacts and history are visible from the Jobs/Status panel or direct CLI commands. "
110 "Do not start replies with an introduction. Keep replies concise and useful."
111 ),
112 },
113 {
114 "role": "user",
115 "content": (
116 f"Job title: {job['title']}\n"
117 f"Job status: {job['status']}\n"
118 f"Kind: {job['kind']}\n"
119 f"Objective: {job['objective']}\n\n"
120 f"{section_text}\n\n"
121 f"Operator message:\n{message}"
122 ),
123 },
124 ]
125
126
127def _durable_outcome_events(db: AgentDB, job_id: str) -> list[dict[str, Any]]:
128 durable_events = db.list_events(job_id=job_id, limit=160, event_types=SUMMARY_EVENT_TYPES)
129 tool_events = [
130 event
131 for event in db.list_events(job_id=job_id, limit=80, event_types=SUMMARY_TOOL_EVENT_TYPES)
132 if is_summary_event_candidate(event)
133 ]
134 merged: dict[str, dict[str, Any]] = {}
135 for event in [*durable_events, *tool_events]:
136 event_id = str(event.get("id") or "")
137 key = event_id or f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"
138 merged[key] = event
139 return sorted(merged.values(), key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
140
141
142def _durable_outcome_lines(events: list[dict[str, Any]]) -> str:
143 if not events:
144 return ""
145 counts = outcome_counts(events, include_research=True, include_failures=True)
146 lines = [f"- summary: {hourly_outcome_summary(counts)}"]
147 seen: set[str] = set()
148 for event in reversed(events):
149 parsed = model_update_event_parts(event, width=240, compact=False)
150 if not parsed:
151 continue
152 label, text, _clock = parsed
153 if label in {"DONE", "PLAN", "UPDATE"}:
154 continue
155 key = f"{label}:{text}"
156 if key in seen:
157 continue
158 seen.add(key)
159 lines.append(f"- {label.lower()}: {text}")
160 if len(lines) >= 9:
161 break
162 return "\n".join(lines)
163
164
165def _job_list_lines(jobs: list[dict[str, Any]], *, focused_job_id: str) -> str:
166 lines: list[str] = []
167 for index, entry in enumerate(jobs, start=1):
168 marker = "*" if str(entry.get("id") or "") == focused_job_id else "-"
169 title = entry.get("title") or entry.get("id") or "untitled"
170 objective = " ".join(str(entry.get("objective") or "").split())
171 if len(objective) > 120:
172 objective = objective[:119].rstrip() + "..."
173 lines.append(
174 f"{marker} {index}. {title} status={entry.get('status') or 'unknown'} "
175 f"kind={entry.get('kind') or 'generic'} objective={objective}"
176 )
177 return "\n".join(lines)
178
179
180def _roadmap_lines(roadmap: dict[str, Any]) -> str:
181 if not roadmap:
182 return ""
183 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
184 body = "\n".join(
185 (
186 f"- {entry.get('status') or 'planned'} validation={entry.get('validation_status') or 'not_started'} "
187 f"p={entry.get('priority') or 0}: {entry.get('title')}"
188 )
189 for entry in milestones[-8:]
190 if isinstance(entry, dict)
191 )
192 header = (
193 f"{roadmap.get('status') or 'planned'}: {roadmap.get('title') or 'Roadmap'}"
194 + (f" current={roadmap.get('current_milestone')}" if roadmap.get("current_milestone") else "")
195 )
196 return f"{header}\n{body}".strip()
197
198
199def _experiment_line(entry: dict[str, Any]) -> str:
200 if entry.get("metric_value") is None:
201 return f"- {entry.get('status') or 'planned'}: {entry.get('title')}"
202 metric = format_metric_value(
203 entry.get("metric_name") or "metric",
204 entry.get("metric_value"),
205 entry.get("metric_unit") or "",
206 )
207 return (
208 f"- {entry.get('status') or 'planned'}: {entry.get('title')}"
209 f" {metric}"
210 f"{' best' if entry.get('best_observed') else ''}"
211 )
212
213
214def _empty_section_text(title: str) -> str:
215 return "None." if title.startswith("Recent operator") else "None yet."
216
217
218def _clip_chat_context(value: str, limit: int) -> str:
219 text = str(value or "")
220 if len(text) <= limit:
221 return text
222 marker = f"\n... clipped {len(text) - limit} chars from this visible state section ..."
223 return text[: max(0, limit - len(marker))].rstrip() + marker
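The clipping helper above can be exercised on its own. The sketch below is a standalone copy of the same logic under a hypothetical name, `clip_with_marker`, and is not part of the package. It shows that the marker is budgeted inside `limit`, so the clipped text stays within the limit whenever the limit is longer than the marker. The count printed in the marker is the overflow past `limit`, not the exact number of characters removed.

```python
def clip_with_marker(value: str, limit: int) -> str:
    """Hypothetical standalone copy of the clipping logic, for illustration only."""
    text = str(value or "")
    if len(text) <= limit:
        return text
    marker = f"\n... clipped {len(text) - limit} chars from this visible state section ..."
    # Reserve room for the marker so the result stays within `limit`.
    return text[: max(0, limit - len(marker))].rstrip() + marker


short = clip_with_marker("hello", 200)       # under the limit, returned unchanged
clipped = clip_with_marker("x" * 500, 200)   # over the limit, clipped with a marker
print(len(short), len(clipped))
```

If `limit` is smaller than the marker, the slice is empty and the result is the marker alone, which can exceed `limit`; callers that need a hard cap would have to handle that case separately.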
nipux_cli/chat_controller.py 173 lines
1"""Chat-controller behavior shared by the interactive CLI."""
2
3from __future__ import annotations
4
5from dataclasses import dataclass
6from typing import Any, Callable
7
8from nipux_cli.chat_intent import (
9 chat_control_command,
10 extract_job_objective_from_message,
11 message_requests_immediate_run,
12 message_requests_queued_job,
13)
14
15
16@dataclass(frozen=True)
17class ChatControllerDeps:
18 db_factory: Callable[[], tuple[Any, Any]]
19 reply_fn: Callable[[str, str], Any]
20 create_job: Callable[..., tuple[str, str]]
21 write_shell_state: Callable[[dict[str, Any]], None]
22 start_daemon: Callable[..., Any]
23 capture_command: Callable[[str, str], tuple[bool, str]]
24 compact_command_output: Callable[[str], list[str]]
25 friendly_error_text: Callable[[str], str]
26
27
28def handle_chat_message(
29 job_id: str,
30 line: str,
31 *,
32 deps: ChatControllerDeps,
33 reply_fn: Callable[[str, str], Any] | None = None,
34 quiet: bool = False,
35) -> tuple[bool, str]:
36 reply_callable = reply_fn or deps.reply_fn
37 spawned = maybe_spawn_job_from_chat(job_id, line, deps=deps, quiet=quiet)
38 if spawned:
39 return True, spawned
40 controlled = handle_chat_control_intent(job_id, line, deps=deps, quiet=quiet)
41 if controlled is not None:
42 return controlled
43 queue_chat_note(job_id, line, deps=deps, mode="steer", quiet=quiet)
44 try:
45 reply = reply_callable(job_id, line)
46 except Exception as exc:
47 detail = deps.friendly_error_text(f"{type(exc).__name__}: {exc}")
48 message = f"{detail}; message saved for the worker"
49 if not quiet:
50 print(detail)
51 print("Your message is still saved for the next worker step.")
52 return True, message
53 reply_text, reply_metadata = chat_reply_text_and_metadata(reply)
54 if reply_text.strip():
55 db, _config = deps.db_factory()
56 try:
57 if reply_metadata:
58 db.append_event(
59 job_id,
60 event_type="loop",
61 title="message_end",
62 body=reply_text[:1000],
63 metadata={"source": "chat", "tool_calls": [], **reply_metadata},
64 )
65 db.append_agent_update(job_id, reply_text.strip(), category="chat")
66 finally:
67 db.close()
68 if not quiet:
69 print()
70 print(reply_text.strip())
71 print()
72 return True, ""
73 message = "model returned an empty reply; message is queued"
74 if not quiet:
75 print("model returned an empty reply; your message is still queued.")
76 return True, message
77
78
79def chat_reply_text_and_metadata(reply: Any) -> tuple[str, dict[str, Any]]:
80 content = getattr(reply, "content", None)
81 if content is None:
82 return str(reply), {}
83 metadata: dict[str, Any] = {}
84 usage = getattr(reply, "usage", None)
85 if isinstance(usage, dict) and usage:
86 metadata["usage"] = usage
87 model = getattr(reply, "model", "")
88 if model:
89 metadata["model"] = model
90 response_id = getattr(reply, "response_id", "")
91 if response_id:
92 metadata["response_id"] = response_id
93 return str(content), metadata
94
95
96def handle_chat_control_intent(
97 job_id: str,
98 line: str,
99 *,
100 deps: ChatControllerDeps,
101 quiet: bool = False,
102) -> tuple[bool, str] | None:
103 command = chat_control_command(line)
104 if not command:
105 return None
106 keep_running, output = deps.capture_command(job_id, command)
107 compact = deps.compact_command_output(output)
108 message = " | ".join(compact[-4:]) if compact else f"{command.lstrip('/')} done"
109 if not quiet:
110 print(message)
111 return keep_running, message
112
113
114def maybe_spawn_job_from_chat(
115 job_id: str,
116 message: str,
117 *,
118 deps: ChatControllerDeps,
119 quiet: bool = False,
120) -> str:
121 objective = extract_job_objective_from_message(message)
122 if not objective:
123 return ""
124 created_id, title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
125 deps.write_shell_state({"focus_job_id": created_id})
126 db, _config = deps.db_factory()
127 try:
128 db.append_operator_message(created_id, message, source="chat", mode="steer")
129 run_now = not message_requests_queued_job(message) or message_requests_immediate_run(message)
130 update = "Created this job from chat and drafted its initial plan."
131 if run_now:
132 update += " Starting the daemon so it can begin work."
133 else:
134 update += " Use the right pane to run it."
135 db.append_agent_update(created_id, update, category="chat")
136 db.append_agent_update(
137 job_id,
138 f"Created job '{title}' from your chat request and switched focus to it.",
139 category="chat",
140 )
141 finally:
142 db.close()
144 text = f"Created job: {title}. Focus switched to it."
145 if run_now:
146 started = deps.start_daemon(poll_seconds=0.0, quiet=True)
147 if started is False:
148 text += " Worker is waiting for a working model."
149 else:
150 text += " Started worker."
151 if not quiet:
152 print(text)
153 return text
154
155
156def queue_chat_note(
157 job_id: str,
158 message: str,
159 *,
160 deps: ChatControllerDeps,
161 mode: str = "steer",
162 quiet: bool = False,
163) -> None:
164 db, _config = deps.db_factory()
165 try:
166 entry = db.append_operator_message(job_id, message, source="chat", mode=mode)
167 if not quiet:
168 if entry.get("mode") == "follow_up":
169 print(f"waiting after current branch: {entry['message']}")
170 else:
171 print(f"waiting: {entry['message']}")
172 finally:
173 db.close()
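The dispatch order in `handle_chat_message` above (spawn a job, then try a control command, then queue the note and ask the model) can be sketched with injected stubs. Everything below is a simplified assumption for illustration: `SketchDeps`, `dispatch`, and `failing_reply` are hypothetical names, the callables take fewer arguments than the real `ChatControllerDeps` fields, and the success path returns the reply text instead of writing it to the database.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SketchDeps:
    """Hypothetical, trimmed-down stand-in for ChatControllerDeps."""

    spawn: Callable[[str], str]
    control: Callable[[str], Optional[tuple[bool, str]]]
    queue_note: Callable[[str], None]
    reply: Callable[[str], str]


def dispatch(line: str, deps: SketchDeps) -> tuple[bool, str]:
    spawned = deps.spawn(line)
    if spawned:
        return True, spawned
    controlled = deps.control(line)
    if controlled is not None:
        return controlled
    # Persist the note before calling the model, so a failed reply loses nothing.
    deps.queue_note(line)
    try:
        return True, deps.reply(line)
    except Exception as exc:
        return True, f"{type(exc).__name__}: {exc}; message saved for the worker"


def failing_reply(line: str) -> str:
    raise RuntimeError("model offline")


saved: list[str] = []
deps = SketchDeps(
    spawn=lambda line: "",
    control=lambda line: (True, "status shown") if line == "status" else None,
    queue_note=saved.append,
    reply=failing_reply,
)
control_result = dispatch("status", deps)
chat_result = dispatch("try another source", deps)
print(control_result, chat_result, saved)
```

Because the dependencies are plain callables on a frozen dataclass, a test can swap in stubs like these without a database or a model endpoint. The control message never reaches `queue_note`, while the plain chat line is saved even though the reply raised.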
nipux_cli/chat_frame_runtime.py 571 lines
1"""Terminal chat-frame runtime helpers."""
2
3from __future__ import annotations
4
5import queue
6import select
7import shutil
8import sys
9import termios
10import threading
11import time
12import tty
13from dataclasses import dataclass
14from typing import Any, Callable
16
17from nipux_cli.settings import inline_setting_notice
18from nipux_cli.tui_commands import CHAT_SLASH_COMMANDS, autocomplete_slash, cycle_slash, slash_completion_for_submit
19from nipux_cli.tui_input import (
20 decode_terminal_escape,
21 drain_pending_input,
22 read_escape_sequence,
23 read_terminal_char,
24)
25from nipux_cli.tui_outcomes import CHAT_RIGHT_PAGES
26from nipux_cli.tui_style import _frame_enter_sequence, _frame_exit_sequence, _one_line, _strip_ansi
27
28
29IDLE_REFRESH_SECONDS = 0.75
30ACTIVE_INPUT_REFRESH_SECONDS = 2.0
31THINKING_REFRESH_SECONDS = 0.18
32WORKSPACE_CHAT_ID = "__workspace__"
33THINKING_NOTICE = "__nipux_thinking__"
34THINKING_FRAMES = ("◐ thinking", "◓ thinking", "◑ thinking", "◒ thinking")
35WAITING_NOTICE = "__nipux_waiting__"
36WAITING_FRAMES = ("∙ waiting", "· waiting", "• waiting", "· waiting")
37
38
39@dataclass(frozen=True)
40class ChatFrameDeps:
41 load_snapshot: Callable[[str, int], dict[str, Any]]
42 render_frame: Callable[[dict[str, Any], str, list[str], str, int, str | None, str | None, str], str]
43 handle_chat_message: Callable[[str, str], tuple[bool, str]]
44 capture_chat_command: Callable[[str, str], tuple[bool, str]]
45 write_shell_state: Callable[[dict[str, str]], None]
46 is_plain_chat_line: Callable[[str], bool]
47 page_click: Callable[[int, int, str], str | None]
48
49
50def compact_command_output(output: str) -> list[str]:
51 lines = [" ".join(line.split()) for line in output.splitlines() if line.strip()]
52 compacted: list[str] = []
53 for line in lines:
54 if line.startswith("\033[2J"):
55 continue
56 compacted.append(_one_line(line, 120))
57 return compacted[-8:]
58
59
60def frame_next_job_id(snapshot: dict[str, Any], current_job_id: str, *, direction: int) -> str | None:
61 jobs = snapshot.get("jobs")
62 if not isinstance(jobs, list) or not jobs:
63 return None
64 ids = [str(job.get("id")) for job in jobs if isinstance(job, dict) and job.get("id")]
65 if not ids:
66 return None
67 try:
68 index = ids.index(str(current_job_id))
69 except ValueError:
70 index = 0
71 return ids[(index + direction) % len(ids)]
72
73
74def next_chat_right_view(current: str, direction: int) -> str:
75 keys = [key for key, _label in CHAT_RIGHT_PAGES]
76 try:
77 index = keys.index(current)
78 except ValueError:
79 index = 0
80 return keys[(index + direction) % len(keys)]
81
82
83def frame_refresh_interval(input_buffer: str, *, thinking: bool = False) -> float:
84 if thinking:
85 return THINKING_REFRESH_SECONDS
86 return ACTIVE_INPUT_REFRESH_SECONDS if input_buffer else IDLE_REFRESH_SECONDS
87
88
89def run_chat_frame(job_id: str, *, history_limit: int, deps: ChatFrameDeps) -> None:
90 if job_id != WORKSPACE_CHAT_ID:
91 deps.write_shell_state({"focus_job_id": job_id})
92 buffer = ""
93 notices: list[str] = []
94 right_view = "updates"
95 modal_view: str | None = None
96 selected_control = 0
97 editing_field: str | None = None
98 async_messages: queue.Queue[str] = queue.Queue()
99 snapshot = deps.load_snapshot(job_id, history_limit)
100 job_id = str(snapshot["job_id"])
101 old_attrs = termios.tcgetattr(sys.stdin)
102 print(_frame_enter_sequence(), end="", flush=True)
103 try:
104 stdin_fd = sys.stdin.fileno()
105 tty.setcbreak(stdin_fd)
106 last_snapshot = 0.0
107 needs_render = True
108 last_frame = ""
109 while True:
110 now = time.monotonic()
111 if _drain_async_notices(async_messages, notices):
112 last_snapshot = 0.0
113 needs_render = True
114 if now - last_snapshot >= frame_refresh_interval(buffer, thinking=_has_active_state_notice(notices)):
115 try:
116 snapshot = deps.load_snapshot(job_id, history_limit)
117 job_id = str(snapshot["job_id"])
118 last_snapshot = now
119 needs_render = True
120 except Exception as exc:
121 _append_notice(notices, f"frame refresh failed: {type(exc).__name__}")
122 if needs_render:
123 selected_control = 0
124 last_frame = _safe_render_frame(
125 deps,
126 snapshot=snapshot,
127 buffer=buffer,
128 notices=notices,
129 right_view=right_view,
130 selected_control=selected_control,
131 editing_field=editing_field,
132 modal_view=modal_view,
133 previous_frame=last_frame,
134 )
135 needs_render = False
136 try:
137 readable, _, _ = select.select([stdin_fd], [], [], 0.05)
138 except OSError as exc:
139 _append_notice(notices, f"terminal read failed: {type(exc).__name__}: {_one_line(exc, 90)}")
140 needs_render = True
141 continue
142 if not readable:
143 continue
144 try:
145 char = read_terminal_char(stdin_fd)
146 except OSError as exc:
147 _append_notice(notices, f"terminal input failed: {type(exc).__name__}: {_one_line(exc, 90)}")
148 needs_render = True
149 continue
150 if editing_field is not None:
151 try:
152 buffer, editing_field, should_exit = _handle_edit_input(
153 char,
154 buffer=buffer,
155 editing_field=editing_field,
156 notices=notices,
157 stdin_fd=stdin_fd,
158 )
159 except Exception as exc:
160 buffer = ""
161 editing_field = None
162 _append_notice(notices, f"edit failed: {type(exc).__name__}: {_one_line(exc, 90)}")
163 needs_render = True
164 continue
165 if should_exit:
166 return
167 needs_render = True
168 continue
169 if char in {"\r", "\n"}:
170 buffer, should_submit = slash_completion_for_submit(buffer, CHAT_SLASH_COMMANDS)
171 if not should_submit:
172 needs_render = True
173 continue
174 try:
175 keep_running, snapshot, job_id, notices, right_view, modal_view = _handle_chat_submit(
176 buffer,
177 job_id=job_id,
178 history_limit=history_limit,
179 snapshot=snapshot,
180 notices=notices,
181 right_view=right_view,
182 modal_view=modal_view,
183 deps=deps,
184 async_messages=async_messages,
185 )
186 except Exception as exc:
187 keep_running = True
188 _append_notice(notices, f"submit failed: {type(exc).__name__}: {_one_line(exc, 100)}")
189 buffer = ""
190 needs_render = True
191 if not keep_running:
192 return
193 continue
194 if char in {"\x04"}:
195 return
196 if char == "\x03":
197 buffer = ""
198 _append_notice(notices, "cancelled input")
199 needs_render = True
200 continue
201 if char == "\x15":
202 buffer = ""
203 needs_render = True
204 continue
205 if char in {"\x7f", "\b"}:
206 buffer = buffer[:-1]
207 needs_render = True
208 continue
209 if char == "\t":
210 try:
211 buffer = autocomplete_slash(buffer, CHAT_SLASH_COMMANDS)
212 except Exception as exc:
213 _append_notice(notices, f"autocomplete failed: {type(exc).__name__}: {_one_line(exc, 90)}")
214 needs_render = True
215 continue
216 if char == "\x1b":
217 try:
218 snapshot, job_id, right_view, modal_view, buffer = _handle_chat_escape(
219 stdin_fd,
220 snapshot=snapshot,
221 job_id=job_id,
222 history_limit=history_limit,
223 right_view=right_view,
224 modal_view=modal_view,
225 buffer=buffer,
226 notices=notices,
227 deps=deps,
228 )
229 except Exception as exc:
230 modal_view = None
231 _append_notice(notices, f"navigation failed: {type(exc).__name__}: {_one_line(exc, 90)}")
232 needs_render = True
233 continue
234 if char.isprintable():
235 buffer += char
236 needs_render = True
237 except KeyboardInterrupt:
238 return
239 finally:
240 termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_attrs)
241 print(_frame_exit_sequence(), flush=True)
242
243
244def emit_frame_if_changed(frame: str, previous_frame: str = "") -> str:
245 if frame != previous_frame:
246 if not previous_frame:
247 print("\033[H" + frame, end="", flush=True)
248 else:
249 print(_diff_frame_update(frame, previous_frame), end="", flush=True)
250 return frame
251
252
253def _safe_render_frame(
254 deps: ChatFrameDeps,
255 *,
256 snapshot: dict[str, Any],
257 buffer: str,
258 notices: list[str],
259 right_view: str,
260 selected_control: int,
261 editing_field: str | None,
262 modal_view: str | None,
263 previous_frame: str,
264) -> str:
265 try:
266 return deps.render_frame(
267 snapshot,
268 buffer,
269 _display_notices(notices),
270 right_view,
271 selected_control,
272 editing_field,
273 modal_view,
274 previous_frame,
275 )
276 except Exception as exc:
277 _append_notice(notices, f"render failed: {type(exc).__name__}: {_one_line(exc, 100)}")
278 frame = _fallback_chat_frame(snapshot=snapshot, buffer=buffer, notices=notices)
279 print("\033[H" + frame, end="", flush=True)
280 return frame
281
282
283def _fallback_chat_frame(*, snapshot: dict[str, Any], buffer: str, notices: list[str]) -> str:
284 width, height = shutil.get_terminal_size((100, 30))
285 width = max(60, width)
286 job = snapshot.get("job") if isinstance(snapshot.get("job"), dict) else {}
287 title = str(job.get("title") or snapshot.get("job_id") or "Nipux")
288 lines = [
289 _fit_plain("NIPUX - safe mode", width),
290 _fit_plain("=" * width, width),
291 _fit_plain(f"Job: {title}", width),
292 _fit_plain("A UI render error was caught. You can keep typing; /exit leaves.", width),
293 "",
294 "Recent notices:",
295 ]
296 lines.extend(f"- {_one_line(notice, width - 3)}" for notice in notices[-8:])
297 lines.extend(["", f"> {_one_line(buffer, width - 3)}"])
298 return "\n".join(_fit_plain(line, width) for line in lines[:height])
299
300
301def _diff_frame_update(frame: str, previous_frame: str) -> str:
302 current_lines = frame.splitlines()
303 previous_lines = previous_frame.splitlines()
304 output: list[str] = []
305 max_lines = max(len(current_lines), len(previous_lines))
306 for index in range(max_lines):
307 current = current_lines[index] if index < len(current_lines) else ""
308 previous = previous_lines[index] if index < len(previous_lines) else ""
309 if current == previous:
310 continue
311 output.append(f"\033[{index + 1};1H\033[2K{current}")
312 return "".join(output)
313
314
315def _fit_plain(text: Any, width: int) -> str:
316 content = _strip_ansi(str(text))
317 if len(content) > width:
318 content = _one_line(content, width)
319 return content + " " * max(0, width - len(content))
320
321
322def _append_notice(notices: list[str], message: str, *, limit: int = 12) -> None:
323 notices.append(message)
324 notices[:] = notices[-limit:]
325
326
327def _append_thinking_notice(notices: list[str]) -> None:
328 if not _has_thinking_notice(notices):
329 _append_notice(notices, THINKING_NOTICE)
330
331
332def _append_waiting_notice(notices: list[str]) -> None:
333 if not _has_waiting_notice(notices):
334 _append_notice(notices, WAITING_NOTICE)
335
336
337def _has_thinking_notice(notices: list[str]) -> bool:
338 return any(notice == THINKING_NOTICE or notice.startswith(f"{THINKING_NOTICE}:") for notice in notices)
339
340
341def _has_waiting_notice(notices: list[str]) -> bool:
342 return any(notice == WAITING_NOTICE or notice.startswith(f"{WAITING_NOTICE}:") for notice in notices)
343
344
345def _has_active_state_notice(notices: list[str]) -> bool:
346 return _has_thinking_notice(notices) or _has_waiting_notice(notices)
347
348
349def _clear_thinking_notices(notices: list[str]) -> None:
350 notices[:] = [
351 notice
352 for notice in notices
353 if notice != THINKING_NOTICE and not notice.startswith(f"{THINKING_NOTICE}:")
354 ]
355
356
357def _display_notices(notices: list[str]) -> list[str]:
358 if not notices:
359 return []
360 index = int(time.monotonic() / THINKING_REFRESH_SECONDS)
361 thinking_frame = THINKING_FRAMES[index % len(THINKING_FRAMES)]
362 waiting_frame = WAITING_FRAMES[index % len(WAITING_FRAMES)]
363 rendered = []
364 for notice in notices:
365 if notice == THINKING_NOTICE:
366 rendered.append(f"{THINKING_NOTICE}:{thinking_frame}")
367 elif notice == WAITING_NOTICE:
368 rendered.append(f"{WAITING_NOTICE}:{waiting_frame}")
369 else:
370 rendered.append(notice)
371 return rendered
372
373
374def _handle_edit_input(
375 char: str,
376 *,
377 buffer: str,
378 editing_field: str,
379 notices: list[str],
380 stdin_fd: int,
381) -> tuple[str, str | None, bool]:
382 if char in {"\r", "\n"}:
383 _append_notice(notices, inline_setting_notice(editing_field, buffer))
384 return "", None, False
385 if char in {"\x04"}:
386 return buffer, editing_field, True
387 if char == "\x03":
388 _append_notice(notices, "cancelled edit")
389 return "", None, False
390 if char == "\x15":
391 return "", editing_field, False
392 if char in {"\x7f", "\b"}:
393 return buffer[:-1], editing_field, False
394 if char == "\x1b":
395 key, _payload = decode_terminal_escape(read_escape_sequence(char, fd=stdin_fd))
396 if key == "unknown":
397 _append_notice(notices, "cancelled edit")
398 return "", None, False
399 return buffer, editing_field, False
400 if char.isprintable():
401 return buffer + char, editing_field, False
402 return buffer, editing_field, False
403
404
405def _handle_chat_submit(
406 buffer: str,
407 *,
408 job_id: str,
409 history_limit: int,
410 snapshot: dict[str, Any],
411 notices: list[str],
412 right_view: str,
413 modal_view: str | None,
414 deps: ChatFrameDeps,
415 async_messages: queue.Queue[str] | None = None,
416) -> tuple[bool, dict[str, Any], str, list[str], str, str | None]:
417 line = buffer.strip()
418 if not line:
419 return True, snapshot, job_id, notices, right_view, modal_view
420 if line in {"clear", "/clear"}:
421 notices.clear()
422 return True, snapshot, job_id, notices, right_view, None
423 if line in {"settings", "/settings"}:
424 _append_notice(notices, "opened settings")
425 return True, snapshot, job_id, notices, right_view, "settings"
426 if line in {"jobs", "/jobs", "status", "/status"}:
427 _append_notice(notices, "opened jobs")
428 return True, snapshot, job_id, notices, "status", None
429 if line in {"outcomes", "/outcomes", "updates", "/updates"}:
430 _append_notice(notices, "opened outcomes")
431 return True, snapshot, job_id, notices, "updates", None
432 keep_running = True
433 try:
434 if deps.is_plain_chat_line(line):
435 _append_thinking_notice(notices)
436 _start_chat_message_worker(
437 job_id,
438 line,
439 deps=deps,
440 async_messages=async_messages,
441 )
442 modal_view = None
443 else:
444 _append_notice(notices, f"> {line}")
445 keep_running, output = deps.capture_chat_command(job_id, line)
446 output_lines = compact_command_output(output)
447 if output_lines:
448 output_text = "\n".join(output_lines)
449 if _looks_like_waiting_output(output_text):
450 _append_waiting_notice(notices)
451 else:
452 _append_notice(notices, output_text)
453 if line.startswith(("/model", "/base-url", "/api-key", "/api-key-env", "/context", "/input-cost", "/output-cost", "/timeout", "/home", "/step-limit", "/output-chars", "/daily-digest", "/digest-time", "/config")):
454 modal_view = "settings"
455 else:
456 modal_view = None
457 except Exception as exc:
458 _append_notice(notices, f"message failed: {type(exc).__name__}: {_one_line(exc, 120)}")
459 try:
460 refresh_job_id = _post_submit_snapshot_job_id(line, job_id)
461 snapshot = deps.load_snapshot(refresh_job_id, history_limit)
462 job_id = str(snapshot["job_id"])
463 except Exception as exc:
464 _append_notice(notices, f"refresh failed after message: {type(exc).__name__}: {_one_line(exc, 100)}")
465 return keep_running, snapshot, job_id, notices, right_view, modal_view
466
467
468def _post_submit_snapshot_job_id(line: str, current_job_id: str) -> str:
469 """Return the job id to refresh after a submitted command or message."""
470
471 text = line.strip()
472 if not text.startswith("/"):
473 return current_job_id
474 command = text[1:].split(maxsplit=1)[0].lower()
475 if command in {"new", "focus", "switch"}:
476 return WORKSPACE_CHAT_ID if current_job_id == WORKSPACE_CHAT_ID else ""
477 return current_job_id
478
479
480def _start_chat_message_worker(
481 job_id: str,
482 line: str,
483 *,
484 deps: ChatFrameDeps,
485 async_messages: queue.Queue[str] | None,
486) -> None:
487 def run() -> None:
488 try:
489 deps.handle_chat_message(job_id, line)
490 if async_messages is not None:
491 async_messages.put("__refresh__")
492 except Exception as exc:
493 if async_messages is not None:
494 async_messages.put(f"message failed: {type(exc).__name__}: {_one_line(exc, 120)}")
495
496 thread = threading.Thread(target=run, name="nipux-chat-submit", daemon=True)
497 thread.start()
498
499
500def _drain_async_notices(async_messages: queue.Queue[str], notices: list[str]) -> bool:
501 changed = False
502 while True:
503 try:
504 message = async_messages.get_nowait()
505 except queue.Empty:
506 return changed
507 if message:
508 if message == "__refresh__":
509 _clear_thinking_notices(notices)
510 changed = True
511 continue
512 _clear_thinking_notices(notices)
513 if _looks_like_waiting_output(message):
514 _append_waiting_notice(notices)
515 else:
516 _append_notice(notices, message)
517 changed = True
518
519
520def _looks_like_waiting_output(message: str) -> bool:
521 normalized = " ".join(str(message or "").lower().split())
522 if not normalized:
523 return False
524 return (
525 normalized.startswith("waiting:")
526 or normalized.startswith("waiting for ")
527 or "waiting for model" in normalized
528 or "waiting for the next worker step" in normalized
529 or "message saved for the worker" in normalized
530 )
531
532
533def _handle_chat_escape(
534 stdin_fd: int,
535 *,
536 snapshot: dict[str, Any],
537 job_id: str,
538 history_limit: int,
539 right_view: str,
540 modal_view: str | None,
541 buffer: str,
542 notices: list[str],
543 deps: ChatFrameDeps,
544) -> tuple[dict[str, Any], str, str, str | None, str]:
545 key, payload = decode_terminal_escape(read_escape_sequence("\x1b", fd=stdin_fd))
546 if modal_view:
547 _append_notice(notices, "closed settings")
548 drain_pending_input(stdin_fd)
549 return snapshot, job_id, right_view, None, buffer
550 if key in {"up", "down"} and buffer.startswith("/"):
551 buffer = cycle_slash(buffer, CHAT_SLASH_COMMANDS, direction=-1 if key == "up" else 1)
552 return snapshot, job_id, right_view, modal_view, buffer
553 if key == "right" and not buffer:
554 return snapshot, job_id, next_chat_right_view(right_view, 1), modal_view, buffer
555 if key == "left" and not buffer:
556 return snapshot, job_id, next_chat_right_view(right_view, -1), modal_view, buffer
557 if key in {"up", "down"} and not buffer:
558 next_focus = frame_next_job_id(snapshot, job_id, direction=-1 if key == "up" else 1)
559 if next_focus and next_focus != job_id:
560 job_id = next_focus
561 deps.write_shell_state({"focus_job_id": job_id})
562 snapshot = deps.load_snapshot(job_id, history_limit)
563 title = snapshot["job"].get("title") or job_id
564 _append_notice(notices, f"focus {title}")
565 return snapshot, job_id, right_view, modal_view, buffer
566 if key == "click" and isinstance(payload, tuple):
567 clicked_view = deps.page_click(payload[0], payload[1], right_view)
568 if clicked_view:
569 return snapshot, job_id, clicked_view, modal_view, buffer
570 drain_pending_input(stdin_fd)
571 return snapshot, job_id, right_view, modal_view, buffer
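The incremental redraw in `_diff_frame_update` above compares two frames row by row and emits cursor-positioning sequences only for rows that changed. The sketch below is a standalone copy of that logic under a hypothetical name, `diff_frame_update`, and is not part of the package. It makes the output visible for three cases: no change, one changed row, and a frame that lost a row.

```python
def diff_frame_update(frame: str, previous_frame: str) -> str:
    """Hypothetical standalone copy of the line-diff redraw, for illustration only."""
    current_lines = frame.splitlines()
    previous_lines = previous_frame.splitlines()
    output: list[str] = []
    for index in range(max(len(current_lines), len(previous_lines))):
        current = current_lines[index] if index < len(current_lines) else ""
        previous = previous_lines[index] if index < len(previous_lines) else ""
        if current == previous:
            continue
        # CSI row;1H moves the cursor to column 1 of the row; CSI 2K erases that row.
        output.append(f"\033[{index + 1};1H\033[2K{current}")
    return "".join(output)


unchanged = diff_frame_update("title\nbody", "title\nbody")
one_row = diff_frame_update("title\nbody v2", "title\nbody")
shrunk = diff_frame_update("title", "title\nbody")
print(repr(unchanged), repr(one_row), repr(shrunk))
```

Rows are 1-indexed in the escape sequence, so the second line of the frame maps to row 2. When the new frame is shorter, the leftover row is erased and rewritten as empty, which keeps stale text from lingering on screen.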
nipux_cli/chat_intent.py 349 lines
1"""Natural-language intent parsing for Nipux chat and shell control."""
2
3from __future__ import annotations
4
5import re
6
7
8NATURAL_COMMANDS = {
9 "tell me updates": "updates",
10 "show updates": "updates",
11 "show outcomes": "outcomes",
12 "show all outcomes": "outcomes all",
13 "show all accomplishments": "outcomes all",
14 "show accomplishments": "outcomes",
15 "what have all jobs done": "outcomes all",
16 "what has everything done": "outcomes all",
17 "what did all jobs do": "outcomes all",
18 "what did it accomplish": "outcomes",
19 "what has it done": "outcomes",
20 "what has it done so far": "outcomes",
21 "what have you done": "outcomes",
22 "what have you done so far": "outcomes",
23 "what did it actually do": "outcomes",
24 "what did the model do": "outcomes",
25 "show me what it did": "outcomes",
26 "show history": "history",
27 "what happened": "history",
28 "show events": "events",
29 "what did it find": "updates",
30 "what did you find": "updates",
31 "what has it found": "updates",
32 "findings": "findings",
33 "tasks": "tasks",
34 "roadmap": "roadmap",
35 "show roadmap": "roadmap",
36 "show artifacts": "artifacts",
37 "where are artifacts": "artifacts",
38 "show lessons": "lessons",
39 "what did it learn": "lessons",
40 "show findings": "findings",
41 "show tasks": "tasks",
42 "show experiments": "experiments",
43 "show sources": "sources",
44 "show memory": "memory",
45 "show metrics": "metrics",
46 "show usage": "usage",
47 "show cost": "usage",
48 "show tokens": "usage",
49 "show token usage": "usage",
50 "context usage": "usage",
51 "token usage": "usage",
52 "how much did it cost": "usage",
53 "how many tokens did it use": "usage",
54 "status": "status",
55 "check status": "status",
56 "job status": "status",
57 "what is going on": "status",
58 "whats going on": "status",
59 "what's going on": "status",
60 "what is happening": "status",
61 "whats happening": "status",
62 "what's happening": "status",
63 "what are you doing": "status",
64 "what is it doing": "status",
65 "how is it going": "status",
66 "how are things going": "status",
67 "check up on things": "status",
68 "what is blocking it": "status",
69 "what's blocking it": "status",
70 "why is it stuck": "status",
71 "is it stuck": "status",
72 "is it running": "health",
73 "is the daemon running": "health",
74 "daemon health": "health",
75 "show health": "health",
76 "how do i start a job": "help",
77 "how do i create a job": "help",
78 "how do i make a job": "help",
79 "how do i run a job": "help",
80 "how do i start work": "help",
81 "how do i use this": "help",
82 "what can i do": "help",
83 "show activity": "activity",
84 "show tool calls": "activity",
85 "show worker activity": "activity",
86 "show worker output": "activity",
87 "show raw work": "outputs",
88 "show console output": "outputs",
89 "show logs": "outputs",
90 "show saved files": "artifacts",
91 "what did it save": "artifacts",
92 "what files did it create": "artifacts",
93 "what outputs did it save": "artifacts",
94 "what tasks are open": "tasks",
95 "what is the current task": "tasks",
96 "show measurements": "experiments",
97 "show benchmarks": "experiments",
98 "show milestones": "roadmap",
99 "show plan": "roadmap",
100 "show daemon": "health",
101 "start daemon": "start",
102 "restart daemon": "restart",
103}
104
105
106def natural_command_for(text: str) -> str:
107 return NATURAL_COMMANDS.get(" ".join(text.strip().lower().split()), "")
108
109
110def chat_control_command(line: str) -> str:
111 text = " ".join(line.strip().split())
112 if not text:
113 return ""
114 lowered = text.lower().rstrip("?.!")
115 natural = NATURAL_COMMANDS.get(lowered)
116 if natural:
117 return f"/{natural}"
118 control_phrase = _looks_like_control_phrase(lowered)
119 if control_phrase and _mentions_any(lowered, ("token", "cost", "usage", "context window", "context budget")):
120 return "/usage"
121 if control_phrase and _mentions_any(lowered, ("tool call", "tool calls", "worker activity", "worker output", "right pane")):
122 return "/activity"
123 if control_phrase and _mentions_any(lowered, ("console output", "raw output", "raw run", "raw runs", "log", "logs")):
124 return "/outputs"
125 if control_phrase and _mentions_any(lowered, ("saved file", "saved files", "artifact", "artifacts")):
126 return "/artifacts"
127 if (
128 _mentions_any(lowered, ("what did", "what has", "what have", "show me"))
129 and _mentions_any(lowered, ("made", "created", "saved", "produced", "done", "accomplished"))
130 ):
131 return "/outcomes"
132 if control_phrase and _mentions_any(lowered, ("measurement", "measurements", "experiment", "experiments", "benchmark", "benchmarks")):
133 return "/experiments"
134 if control_phrase and _mentions_any(lowered, ("roadmap", "milestone", "milestones", "plan")):
135 return "/roadmap"
136 if control_phrase and _mentions_any(lowered, ("task", "tasks", "todo", "to do", "queue")):
137 return "/tasks"
138 if control_phrase and _mentions_any(lowered, ("finding", "findings")):
139 return "/findings"
140 if control_phrase and _mentions_any(lowered, ("source", "sources")):
141 return "/sources"
142 if control_phrase and _mentions_any(lowered, ("lesson", "lessons", "learned")):
143 return "/lessons"
144 if control_phrase and _mentions_any(lowered, ("memory", "remembered", "learning state")):
145 return "/memory"
146 if lowered in {"start daemon", "launch daemon"}:
147 return "/start"
148 if lowered in {"restart daemon", "reload daemon"}:
149 return "/restart"
150 if lowered in {"jobs", "show jobs", "list jobs", "switch jobs", "change jobs"}:
151 return "/jobs"
152 if lowered in {"settings", "show settings"}:
153 return "/settings"
154 if lowered in {"model settings", "change model", "edit settings"}:
155 return "/model"
156 if lowered in {
157 "run",
158 "start",
159 "run it",
160 "start it",
161 "run job",
162 "run worker",
163 "start job",
164 "start worker",
165 "start working",
166 "start work",
167 "run this",
168 "run the job",
169 "run this job",
170 "start the job",
171 "start this job",
172 "continue",
173 "continue it",
174 "keep going",
175 "keep working",
176 "resume work",
177 }:
178 return "/run"
179 if lowered in {
180 "pause",
181 "pause it",
182 "pause job",
183 "pause worker",
184 "pause the job",
185 "pause work",
186 "pause this job",
187 "stop",
188 "stop it",
189 "stop job",
190 "stop worker",
191 "stop the job",
192 "stop work",
193 "stop working",
194 "stop this job",
195 "halt",
196 "halt job",
197 "halt the job",
198 }:
199 return "/pause"
200 if lowered in {
201 "resume",
202 "resume it",
203 "resume job",
204 "resume worker",
205 "resume the job",
206 "resume this job",
207 "reopen this job",
208 }:
209 return "/resume"
210 if lowered in {"history", "show history", "timeline", "show timeline"}:
211 return "/history"
212 if lowered in {
213 "all outcomes",
214 "show all outcomes",
215 "show all accomplishments",
216 "what have all jobs done",
217 "what has everything done",
218 "what did all jobs do",
219 }:
220 return "/outcomes all"
221 if lowered in {
222 "outcomes",
223 "show outcomes",
224 "accomplishments",
225 "show accomplishments",
226 "what has it done",
227 "what has it done so far",
228 "what have you done",
229 "what have you done so far",
230 "what did it actually do",
231 "what did the model do",
232 "show me what it did",
233 }:
234 return "/outcomes"
235 if lowered in {"artifacts", "outputs", "saved outputs", "show artifacts", "show outputs"}:
236 return "/artifacts"
237 if lowered in {"memory", "show memory", "learning", "show learning"}:
238 return "/memory"
239 return ""
240
241
242def _mentions_any(text: str, needles: tuple[str, ...]) -> bool:
243 for needle in needles:
244 if " " in needle:
245 if needle in text:
246 return True
247 continue
248 if re.search(rf"\b{re.escape(needle)}\b", text):
249 return True
250 return False
251
252
253def _looks_like_control_phrase(text: str) -> bool:
254 return text.startswith(
255 (
256 "show ",
257 "view ",
258 "open ",
259 "list ",
260 "display ",
261 "give me ",
262 "where ",
263 "what ",
264 "how ",
265 "is ",
266 "are ",
267 "check ",
268 )
269 )
270
271
272def message_requests_immediate_run(message: str) -> bool:
273 lowered = " ".join(message.strip().lower().split())
274 if message_requests_queued_job(message):
275 return False
276 if re.match(r"^(?:please\s+)?(?:start|launch|run|spin\s+off|spin\s+up)\b", lowered):
277 return True
278 return bool(re.search(r"\b(?:and|then)\s+(?:start|launch|run|resume)(?:\s+(?:it|the\s+job|work))?\b", lowered))
279
280
281def message_requests_queued_job(message: str) -> bool:
282 lowered = " ".join(message.strip().lower().split())
283 return bool(
284 re.search(
285 r"\b(?:queue only|plan only|create only|do not start|don't start|do not run|don't run|without starting)\b",
286 lowered,
287 )
288 )
289
290
291def extract_job_objective_from_message(message: str) -> str:
292 text = " ".join(message.strip().split())
293 if not text:
294 return ""
295 lowered = text.lower()
296 patterns = [
297 r"^(?:please\s+)?(?:create|start|spin\s+off|make|launch)\s+(?:a\s+)?(?:new\s+)?job\s+(?:to|for|that|which)?\s*(.+)$",
298 r"^(?:please\s+)?(?:create|start|spin\s+off|spin\s+up|make|launch|run)\s+(?:a\s+|an\s+)?(?:new\s+)?(?:worker|agent|task)\s+(?:to|for|that|which)?\s*(.+)$",
299 r"^(?:please\s+)?(?:send|queue)\s+(?:off\s+)?(?:a\s+)?(?:new\s+)?job\s+(?:to|for|that|which)?\s*(.+)$",
300 r"^(?:please\s+)?(?:new|job)\s+(.+)$",
301 r"^(?:please\s+)?(?:start|run|launch)\s+(?!daemon\b|it\b|this\b|that\b|the\s+job\b|the\s+worker\b|job\b|worker\b|work\b)(.+)$",
302 r"^(?:please\s+)?(?:can\s+you|could\s+you|i\s+need\s+you\s+to|i\s+want\s+you\s+to)\s+(.+)$",
303 ]
304 for pattern in patterns:
305 match = re.match(pattern, text, flags=re.IGNORECASE)
306 if match:
307 objective = match.group(1).strip(" .")
308 return objective if looks_like_job_objective(objective) else ""
309 if looks_like_job_objective(text) and not looks_like_smalltalk(lowered):
310 return text
311 return ""
312
313
314def looks_like_smalltalk(lowered: str) -> bool:
315 return lowered in {"hi", "hello", "hey", "yo", "sup", "thanks", "thank you"} or lowered.endswith("?")
316
317
318def looks_like_job_objective(text: str) -> bool:
319 lowered = text.lower()
320 if len(text.split()) < 3:
321 return False
322 action_words = {
323 "research",
324 "monitor",
325 "optimize",
326 "build",
327 "find",
328 "test",
329 "deploy",
330 "fix",
331 "write",
332 "analyze",
333 "audit",
334 "track",
335 "benchmark",
336 "create",
337 "document",
338 "draft",
339 "generate",
340 "scrape",
341 "produce",
342 "watch",
343 "automate",
344 "summarize",
345 "compare",
346 "investigate",
347 "improve",
348 }
349 return any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in action_words)
nipux_cli/chat_tui.py 277 lines
1"""Chat workspace terminal frame rendering."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.config import load_config
8from nipux_cli.first_run_tui import first_run_themed_lines
9from nipux_cli.settings import edit_target_hint, edit_target_label, edit_target_masks_input
10from nipux_cli.tui_commands import CHAT_SLASH_COMMANDS, slash_suggestion_lines
11from nipux_cli.tui_event_format import clean_step_summary
12from nipux_cli.tui_events import chat_pane_lines
13from nipux_cli.tui_layout import _compose_bar, _top_bar
14from nipux_cli.tui_outcomes import chat_updates_pane_lines
15from nipux_cli.tui_status import (
16 job_display_state,
17 right_pane_lines,
18 worker_label,
19)
20from nipux_cli.tui_style import _accent, _bold, _fit_ansi, _muted, _one_line, _strip_ansi
21
22
23def build_chat_frame(
24 snapshot: dict[str, Any],
25 input_buffer: str,
26 notices: list[str],
27 *,
28 width: int,
29 height: int,
30 right_view: str = "updates",
31 selected_control: int = 0,
32 editing_field: str | None = None,
33 modal_view: str | None = None,
34) -> str:
35 del selected_control
36 if right_view == "work":
37 right_view = "updates"
38 width = max(92, width)
39 height = max(22, height)
40 job = snapshot["job"]
41 right_job = snapshot.get("right_job") if isinstance(snapshot.get("right_job"), dict) else job
42 jobs = snapshot["jobs"]
43 steps = snapshot["steps"]
44 artifacts = snapshot["artifacts"]
45 job_id = str(snapshot["job_id"])
46 right_job_id = str(snapshot.get("right_job_id") or job_id)
47 job_artifacts = snapshot.get("job_artifacts") if isinstance(snapshot.get("job_artifacts"), dict) else {}
48 if artifacts:
49 job_artifacts.setdefault(job_id, artifacts)
50 job_summary_events = snapshot.get("job_summary_events") if isinstance(snapshot.get("job_summary_events"), dict) else {}
51 job_counts = snapshot.get("job_counts") if isinstance(snapshot.get("job_counts"), dict) else {}
52 memory_entries = snapshot["memory_entries"]
53 events = snapshot["events"]
54 summary_events = snapshot.get("summary_events") if isinstance(snapshot.get("summary_events"), list) else events
55 daemon = snapshot["daemon"]
56 model = str(snapshot["model"])
57 base_url = str(snapshot.get("base_url") or "")
58 token_usage = snapshot.get("token_usage") if isinstance(snapshot.get("token_usage"), dict) else {}
59 context_length = int(snapshot.get("context_length") or 0)
60 counts = snapshot.get("counts") if isinstance(snapshot.get("counts"), dict) else {}
61 findings = _metadata_records(right_job, "finding_ledger")
62 sources = _metadata_records(right_job, "source_ledger")
63 tasks = _metadata_records(right_job, "task_queue")
64 experiments = _metadata_records(right_job, "experiment_ledger")
65 lessons = _metadata_records(right_job, "lessons")
66 roadmap = right_job.get("metadata", {}).get("roadmap") if isinstance(right_job.get("metadata"), dict) else {}
67 milestones = roadmap.get("milestones") if isinstance(roadmap, dict) and isinstance(roadmap.get("milestones"), list) else []
68 open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active"})
69 state = job_display_state(right_job, bool(daemon["running"]))
70 worker = worker_label(right_job, bool(daemon["running"]))
71 latest_step = steps[-1] if steps else None
72 right_width = min(max(52, int(width * 0.36)), 72)
73 left_width = width - right_width - 3
74 if left_width < 48:
75 left_width = 48
76 right_width = max(34, width - left_width - 3)
77 latest_text = _step_line(latest_step, chars=right_width - 6) if latest_step else "no worker steps yet"
78 daemon_text = _daemon_state_line(daemon)
79 goal_text = " ".join(str(right_job.get("objective") or "").split())
80 metrics = [
81 ("actions", counts.get("steps", _step_count(steps))),
82 ("outputs", counts.get("artifacts", len(artifacts))),
83 ("findings", len(findings)),
84 ("sources", len(sources)),
85 ("tasks", f"{len(tasks)}/{open_tasks} open"),
86 ("roadmap", len(milestones)),
87 ("experiments", len(experiments)),
88 ("lessons", len(lessons)),
89 ("memory", counts.get("memory", len(memory_entries))),
90 ]
91
92 header = _top_bar(
93 width,
94 state=state,
95 daemon=daemon_text,
96 model=model,
97 token_usage=token_usage,
98 context_length=context_length,
99 base_url=base_url,
100 )
101 if editing_field:
102 hint = edit_target_hint(editing_field)
103 prompt_label = edit_target_label(editing_field)
104 elif not jobs:
105 hint = "Type a goal to create the first worker · / opens commands · /settings configures"
106 prompt_label = "❯"
107 else:
108 hint = "Enter sends · / opens commands · /settings configures · ←→ updates/jobs"
109 prompt_label = "❯"
110 suggestions = [] if editing_field else slash_suggestion_lines(input_buffer, CHAT_SLASH_COMMANDS, width=width)
111 compose_lines = _compose_bar(
112 input_buffer,
113 width=width,
114 hint=hint,
115 suggestions=suggestions,
116 prompt_label=prompt_label,
117 mask_input=edit_target_masks_input(editing_field),
118 )
119 footer_rows = len(compose_lines)
120 body_rows = max(10, height - len(header) - 1 - footer_rows)
121 chat_lines = chat_pane_lines(events, notices, width=left_width, rows=body_rows)
122 if right_view == "updates":
123 right_lines = chat_updates_pane_lines(
124 job=right_job,
125 events=summary_events,
126 width=right_width,
127 rows=body_rows,
128 )
129 right_title = "Model updates"
130 else:
131 right_lines = right_pane_lines(
132 job=right_job,
133 jobs=jobs,
134 job_artifacts=job_artifacts,
135 job_summary_events=job_summary_events,
136 job_counts=job_counts,
137 job_id=right_job_id,
138 daemon_running=bool(daemon["running"]),
139 state=state,
140 worker=worker,
141 daemon_text=daemon_text,
142 model=model,
143 goal_text=goal_text,
144 latest_text=latest_text,
145 metrics=metrics,
146 events=summary_events,
147 token_usage=token_usage,
148 context_length=context_length,
149 width=right_width,
150 rows=body_rows,
151 right_view=right_view,
152 )
153 right_title = "Jobs"
154 lines = [*header, _two_col_title(left_width, right_width, "Chat", right_title)]
155 for index in range(body_rows):
156 left = chat_lines[index] if index < len(chat_lines) else ""
157 right = right_lines[index] if index < len(right_lines) else ""
158 lines.append(_two_col_line(left, right, left_width=left_width, right_width=right_width))
159 lines.extend(compose_lines)
160 if len(lines) > height:
161 keep_top = min(4, len(header) + 1)
162 keep_bottom = footer_rows
163 middle_budget = max(0, height - keep_top - keep_bottom)
164 lines = lines[:keep_top] + lines[-(middle_budget + keep_bottom) : -keep_bottom] + lines[-keep_bottom:]
165 if modal_view == "settings":
166 lines = _overlay_settings_modal(lines[:height], width=width, height=height)
167 return "\n".join(first_run_themed_lines(lines[:height], width=width))
168
169
170def _two_col_title(left_width: int, right_width: int, left: str, right: str) -> str:
171 return _fit_ansi(_bold(left.upper()), left_width) + _muted(" │ ") + _fit_ansi(_bold(right.upper()), right_width)
172
173
174def _two_col_line(left: str, right: str, *, left_width: int, right_width: int) -> str:
175 return _fit_ansi(left, left_width) + _muted(" │ ") + _fit_ansi(right, right_width)
176
177
178def _overlay_settings_modal(lines: list[str], *, width: int, height: int) -> list[str]:
179 config = load_config()
180 key_state = "set" if config.model.api_key else "missing"
181 input_cost = _rate_text(config.model.input_cost_per_million)
182 output_cost = _rate_text(config.model.output_cost_per_million)
183 cost_limit = "none" if config.runtime.max_job_cost_usd is None else f"${config.runtime.max_job_cost_usd:g}"
184 content = [
185 _bold("Model"),
186 _settings_row("id", config.model.model, "/model MODEL"),
187 _settings_row("endpoint", config.model.base_url, "/base-url URL"),
188 _settings_row("key", f"{key_state} in {config.model.api_key_env}", "/api-key KEY"),
189 _settings_row(
190 "limits",
191 f"context {config.model.context_length}, timeout {config.model.request_timeout_seconds:g}s",
192 "/context TOKENS /timeout SECONDS",
193 ),
194 "",
195 _bold("Runtime"),
196 _settings_row("home", str(config.runtime.home), "/home PATH"),
197 _settings_row(
198 "steps",
199 f"tool {config.runtime.max_step_seconds}s, preview {config.runtime.artifact_inline_char_limit} chars",
200 "/step-limit SECONDS /output-chars CHARS",
201 ),
202 _settings_row(
203 "digest",
204 f"{config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
205 "/daily-digest BOOL /digest-time HH:MM",
206 ),
207 "",
208 _bold("Cost"),
209 _settings_row("rates", f"input {input_cost}, output {output_cost}", "/input-cost DOLLARS /output-cost DOLLARS"),
210 _settings_row("limit", cost_limit, "/max-cost DOLLARS"),
211 "",
212 _muted("Edit with slash commands in the composer. Esc closes."),
213 ]
214 box_width = min(max(64, int(width * 0.58)), width - 8)
215 box_height = min(len(content) + 4, height - 6)
216 inner = max(20, box_width - 4)
217 title = f" Settings {_accent('●')} "
218 rule_width = max(2, box_width - len(_strip_ansi(title)) - 2)
219 left_rule = max(1, rule_width // 2)
220 right_rule = max(1, rule_width - left_rule)
221 top = "╭" + "─" * left_rule + title + "─" * right_rule + "╮"
222 box = [top]
223 for item in content[: box_height - 3]:
224 if item:
225 box.append("│ " + _fit_ansi(item, inner) + " │")
226 else:
227 box.append("│ " + " " * inner + " │")
228 while len(box) < box_height - 1:
229 box.append("│ " + " " * inner + " │")
230 box.append("╰" + "─" * (box_width - 2) + "╯")
231 output = [_fit_ansi(line, width) for line in lines]
232 start_y = max(2, (height - len(box)) // 2)
233 start_x = max(0, (width - box_width) // 2)
234 for offset, modal_line in enumerate(box):
235 target = start_y + offset
236 if target >= len(output):
237 break
238 output[target] = _fit_ansi(" " * start_x + modal_line, width)
239 return output
240
241
242def _settings_row(label: str, value: Any, command: str) -> str:
243 value_text = _one_line(value, 42)
244 return f"{_muted(label.ljust(9))} {_bold(value_text)} {_muted(command)}"
245
246
247def _rate_text(value: float | None) -> str:
248 return "provider-reported" if value is None else f"${value:g}/1M"
249
250
251def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
252 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
253 values = metadata.get(key)
254 if not isinstance(values, list):
255 return []
256 return [value for value in values if isinstance(value, dict)]
257
258
259def _step_count(steps: list[dict[str, Any]]) -> int:
260 numbers = [int(step.get("step_no") or 0) for step in steps]
261 return max(numbers, default=0)
262
263
264def _step_line(step: dict[str, Any], *, chars: int = 180) -> str:
265 tool = step.get("tool_name") or step.get("kind") or "-"
266 summary = clean_step_summary(step.get("summary") or step.get("error") or "-")
267 error = " ERROR" if step.get("error") else ""
268 return f"#{step['step_no']:<4} {step['status']:<9} {tool:<18} {_one_line(summary, chars)}{error}"
269
270
271def _daemon_state_line(lock: dict[str, Any]) -> str:
272 metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
273 if lock.get("running"):
274 pid = metadata.get("pid") or "unknown"
275 stale = " stale-runtime" if lock.get("stale") else ""
276 return f"running pid={pid}{stale}"
277 return "ready when work starts"
nipux_cli/cli.py 3188 lines
1"""Thin CLI for the Nipux agent runtime."""
2
3from __future__ import annotations
4
5import argparse
6import json
7import os
8import shlex
9import shutil
10import subprocess
11import sys
12import threading
13import time
14from contextlib import redirect_stdout
15from io import StringIO
16from pathlib import Path
17from typing import Any
18
19from nipux_cli import __version__
20from nipux_cli.artifacts import ArtifactStore
21from nipux_cli.chat_intent import (
22 chat_control_command,
23 extract_job_objective_from_message as _extract_job_objective_from_message,
24 message_requests_immediate_run,
25 message_requests_queued_job,
26 natural_command_for,
27)
28from nipux_cli.cli_state import (
29 configured_focus_job_id as _configured_focus_job_id,
30 default_job_id as _default_job_id,
31 find_job as _find_job,
32 clear_model_setup_verified as _clear_model_setup_verified,
33 mark_model_setup_verified as _mark_model_setup_verified,
34 model_setup_verified as _model_setup_verified,
35 read_shell_state as _read_shell_state,
36 write_shell_state as _write_shell_state,
37)
38from nipux_cli.cli_render import (
39 daemon_event_line as _daemon_event_line,
40 daemon_state_line as _daemon_state_line,
41 important_startup_events as _important_startup_events,
42 job_ref_text as _job_ref_text,
43 json_default as _json_default,
44 next_operator_action as _next_operator_action,
45 note_text as _note_text,
46 print_artifact as _print_artifact,
47 print_event_card as _print_event_card,
48 print_event_details as _print_event_details,
49 print_jobs_panel as _print_jobs_panel,
50 print_metric_grid as _print_metric_grid,
51 print_run as _print_run,
52 print_step as _print_step,
53 print_wrapped as _print_wrapped,
54 public_event as _public_event,
55 rule as _rule,
56 section_title as _section_title,
57 short_path as _short_path,
58 step_line as _step_line,
59 terminal_width as _terminal_width,
60)
61from nipux_cli.chat_context import build_chat_messages as _build_chat_messages
62from nipux_cli.chat_commands import ChatCommandDeps, handle_chat_slash_command as _handle_chat_slash_command
63from nipux_cli.chat_controller import (
64 ChatControllerDeps,
65 chat_reply_text_and_metadata as _controller_reply_text_and_metadata,
66 handle_chat_control_intent as _controller_handle_chat_control_intent,
67 handle_chat_message as _controller_handle_chat_message,
68 maybe_spawn_job_from_chat as _controller_maybe_spawn_job_from_chat,
69 queue_chat_note as _controller_queue_chat_note,
70)
71from nipux_cli.chat_frame_runtime import (
72 ChatFrameDeps,
73 compact_command_output as _compact_command_output,
74 emit_frame_if_changed as _emit_frame_if_changed,
75 run_chat_frame as _run_chat_frame,
76)
77from nipux_cli.chat_tui import build_chat_frame as _build_chat_tui_frame
78from nipux_cli.cli_help import NIPUX_BANNER, print_shell_help as _render_shell_help
79from nipux_cli.config import (
80 DEFAULT_BASE_URL,
81 DEFAULT_API_KEY_ENV,
82 DEFAULT_CONTEXT_LENGTH,
83 DEFAULT_MODEL,
84 DEFAULT_OPENROUTER_API_KEY_ENV,
85 DEFAULT_OPENROUTER_MODEL,
86 default_config_yaml,
87 load_config,
88 write_private_text,
89)
90from nipux_cli.daemon_control import cmd_restart_impl as _cmd_restart_impl
91from nipux_cli.daemon_control import cmd_start_impl as _cmd_start_impl
92from nipux_cli.daemon_control import ensure_remote_model_ready_for_worker as _daemon_ensure_remote_model_ready
93from nipux_cli.daemon_control import provider_preflight_is_recoverable as _daemon_provider_preflight_is_recoverable
94from nipux_cli.daemon_control import recoverable_remote_model_preflight_failures as _daemon_recoverable_remote_model_preflight_failures
95from nipux_cli.daemon_control import remote_model_preflight_failures as _daemon_remote_model_preflight_failures
96from nipux_cli.daemon_control import start_daemon_if_needed_impl as _start_daemon_if_needed_impl
97from nipux_cli.daemon_control import stop_daemon_process_impl as _stop_daemon_process_impl
98from nipux_cli.daemon import Daemon, DaemonAlreadyRunning, daemon_lock_status, read_daemon_events
99from nipux_cli.dashboard import collect_dashboard_state, render_dashboard, render_overview
100from nipux_cli.db import AgentDB, utc_now
101from nipux_cli.digest import render_job_digest, write_daily_digest
102from nipux_cli.doctor import run_doctor
103from nipux_cli.first_run_tui import (
104 build_first_run_frame as _build_first_run_tui_frame,
105 first_run_actions as _first_run_tui_actions,
106 first_run_columns as _first_run_columns,
107)
108from nipux_cli.first_run_controller import (
109 FirstRunFrameDeps,
110 capture_first_run_command as _controller_capture_first_run_command,
111 create_first_run_job as _controller_create_first_run_job,
112 first_run_chat_reply as _controller_first_run_chat_reply,
113 first_token as _controller_first_token,
114 handle_first_run_action as _controller_handle_first_run_action,
115 handle_first_run_frame_line as _controller_handle_first_run_frame_line,
116)
117from nipux_cli.first_run_frame_runtime import (
118 FirstRunRuntimeDeps,
119 clamp_selection as _clamp_first_run_runtime_selection,
120 run_first_run_frame as _run_first_run_frame,
121)
122from nipux_cli.event_render import event_line as _event_line
123from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID, load_frame_snapshot
124from nipux_cli.parser_builder import build_arg_parser
125from nipux_cli.planning import (
126 format_initial_plan,
127 initial_plan_for_objective,
128 initial_roadmap_for_objective,
129 initial_task_contract,
130)
131from nipux_cli.scheduling import job_provider_blocked, operator_resume_metadata
132from nipux_cli.record_commands import (
133 RecordCommandDeps,
134 cmd_experiments_impl,
135 cmd_findings_impl,
136 cmd_memory_impl,
137 cmd_metrics_impl,
138 cmd_roadmap_impl,
139 cmd_sources_impl,
140 cmd_tasks_impl,
141 cmd_usage_impl,
142)
143from nipux_cli.service_install import cmd_autostart, cmd_service
144from nipux_cli.service_install import launch_agent_path as _launch_agent_path
145from nipux_cli.service_install import launch_agent_plist as _service_launch_agent_plist
146from nipux_cli.service_install import systemd_service_text as _service_systemd_service_text
147from nipux_cli.templates import program_for_job
148from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS, slash_suggestion_lines
149from nipux_cli.settings import (
150 config_field_value,
151 save_config_field,
152)
153from nipux_cli.settings_commands import (
154 capture_setting_command as _capture_setting_command,
155 handle_chat_setting_command as _handle_chat_setting_command,
156)
157from nipux_cli.tui_event_format import (
158 clean_step_summary as _clean_step_summary,
159 friendly_error_text as _friendly_error_text,
160 generic_display_text as _generic_display_text,
161)
162from nipux_cli.tui_events import (
163 live_badge as _live_badge,
164 minimal_live_event_line as _minimal_live_event_line,
165)
166from nipux_cli.tui_outcomes import model_update_event_parts as _model_update_event_parts
167from nipux_cli.tui_status import (
168 job_display_state as _job_display_state,
169 worker_label as _worker_label,
170)
171from nipux_cli.tui_style import (
172 _accent,
173 _fancy_ui,
174 _one_line,
175 _status_badge,
176)
177from nipux_cli.uninstall import build_uninstall_plan, uninstall_installed_tool, uninstall_runtime
178from nipux_cli.updater import update_checkout
179from nipux_cli.updates import render_all_updates_report, render_updates_report
180
181_save_config_field = save_config_field
182_config_field_value = config_field_value
183_slash_suggestion_lines = slash_suggestion_lines
184_chat_control_command = chat_control_command
185
186
187def _launch_agent_plist(*, poll_seconds: float, quiet: bool) -> str:
188 return _service_launch_agent_plist(poll_seconds=poll_seconds, quiet=quiet)
189
190
191def _systemd_service_text(*, poll_seconds: float, quiet: bool) -> str:
192 return _service_systemd_service_text(poll_seconds=poll_seconds, quiet=quiet)
193
194
195SHELL_BUILTINS = {"help", "?", "commands", "exit", "quit", ":q", "clear"}
196SHELL_COMMAND_NAMES = {
197 "init",
198 "uninstall",
199 "create",
200 "jobs",
201 "ls",
202 "focus",
203 "rename",
204 "delete",
205 "rm",
206 "chat",
207 "shell",
208 "status",
209 "health",
210 "history",
211 "events",
212 "activity",
213 "feed",
214 "tail",
215 "updates",
216 "findings",
217 "tasks",
218 "roadmap",
219 "experiments",
220 "update",
221 "dashboard",
222 "dash",
223 "start",
224 "stop",
225 "restart",
226 "browser-dashboard",
227 "artifacts",
228 "artifact",
229 "lessons",
230 "learn",
232 "sources",
233 "memory",
234 "metrics",
235 "usage",
236 "logs",
237 "outputs",
238 "output",
239 "watch",
240 "run-one",
241 "run",
242 "work",
243 "steer",
244 "say",
245 "pause",
246 "resume",
247 "cancel",
248 "digest",
249 "daily-digest",
250 "daemon",
251 "doctor",
252 "autostart",
253 "service",
254}
255
256def _db() -> tuple[AgentDB, object]:
257 config = load_config()
258 config.ensure_dirs()
259 return AgentDB(config.runtime.state_db_path), config
260
261
262def _record_command_deps() -> RecordCommandDeps:
263 return RecordCommandDeps(
264 db_factory=_db,
265 resolve_job_id=_resolve_job_id,
266 job_ref_text=_job_ref_text,
267 )
268
269
270def cmd_init(args: argparse.Namespace) -> None:
271 config = load_config()
272 config.ensure_dirs()
273 path = Path(args.path).expanduser() if args.path else config.runtime.home / "config.yaml"
274 if path.exists() and not args.force:
275 print(f"Config already exists: {path}")
276 return
277 path.parent.mkdir(parents=True, exist_ok=True)
278 model = args.model or DEFAULT_MODEL
279 base_url = args.base_url or DEFAULT_BASE_URL
280 api_key_env = args.api_key_env or DEFAULT_API_KEY_ENV
281 if args.openrouter:
282 base_url = args.base_url or "https://openrouter.ai/api/v1"
283 api_key_env = args.api_key_env or DEFAULT_OPENROUTER_API_KEY_ENV
284 model = args.model or DEFAULT_OPENROUTER_MODEL
285 write_private_text(
286 path,
287 default_config_yaml(
288 model=model,
289 base_url=base_url,
290 api_key_env=api_key_env,
291 context_length=getattr(args, "context_length", DEFAULT_CONTEXT_LENGTH),
292 ),
293 )
294 print(f"Wrote {path}")
295 env_path = config.runtime.home / ".env"
296 if not env_path.exists():
297 write_private_text(
298 env_path,
299 f"# Optional local secrets for Nipux. This file stays outside the git repo.\n{api_key_env}=\n",
300 )
301 print(f"Wrote {env_path} (fill {api_key_env}; do not commit secrets)")
302
303
304def cmd_update(args: argparse.Namespace) -> None:
305 config = load_config()
306 config.ensure_dirs()
307 daemon_before = daemon_lock_status(config.runtime.home / "agentd.lock")
308 code, lines = update_checkout(path=args.path, allow_dirty=args.allow_dirty)
309 for line in lines:
310 print(line)
311 if code:
312 raise SystemExit(code)
313 if getattr(args, "no_restart", False):
314 print("Daemon restart skipped by --no-restart.")
315 return
316 daemon_after = daemon_lock_status(config.runtime.home / "agentd.lock")
317 if not daemon_before.get("running") and not daemon_after.get("running"):
318 print("No daemon is running; no restart needed.")
319 return
320 print("Restarting running daemon so it uses the updated code.")
321 try:
322 cmd_restart(
323 argparse.Namespace(
324 poll_seconds=0.0,
325 wait=5.0,
326 fake=False,
327 quiet=True,
328 log_file=None,
329 )
330 )
331 except SystemExit as exc:
332 detail = str(exc) or "restart failed"
333 print(f"Update succeeded, but daemon restart failed: {_one_line(detail, 160)}")
334
335
336def cmd_uninstall(args: argparse.Namespace) -> None:
337 config = load_config()
338 plan = build_uninstall_plan(runtime_home=config.runtime.home, include_legacy=not args.keep_legacy)
339 remove_tool = bool(getattr(args, "remove_tool", False)) or not bool(getattr(args, "keep_tool", False))
340 if not args.yes and not args.dry_run:
341 print("This will stop Nipux and remove local runtime state:")
342 for path in (*plan.service_paths, *plan.paths):
343 print(f" {path.expanduser()}")
344 if remove_tool:
345 print("It will also remove the installed `nipux` command with: uv tool uninstall nipux")
346 try:
347 answer = input("Type 'uninstall' to continue: ").strip().lower()
348 except (EOFError, KeyboardInterrupt):
349 print()
350 print("uninstall aborted")
351 return
352 if answer != "uninstall":
353 print("uninstall aborted")
354 return
355 if not args.dry_run:
356 try:
357 _stop_daemon_process(config, wait=float(args.wait), quiet=True)
358 except (OSError, SystemExit) as exc:
359 print(f"daemon stop skipped: {exc}")
360 for line in uninstall_runtime(
361 runtime_home=config.runtime.home,
362 dry_run=bool(args.dry_run),
363 include_legacy=not args.keep_legacy,
364 ):
365 print(line)
366 if remove_tool:
367 code, lines = uninstall_installed_tool(dry_run=bool(args.dry_run))
368 for line in lines:
369 print(line)
370 if code:
371 print("installed command removal failed; runtime state was still removed")
372 elif not args.dry_run:
373 print("runtime removed. Installed `nipux` command kept by --keep-tool.")
374
375
376def cmd_create(args: argparse.Namespace) -> None:
377 if not _ensure_model_setup_verified_for_workspace():
378 raise SystemExit(1)
379 job_id, title = _create_job(
380 objective=args.objective,
381 title=args.title,
382 kind=args.kind,
383 cadence=args.cadence,
384 )
385 print(f"created {title}")
386
387
388def _ensure_model_setup_verified_for_workspace() -> bool:
389 config = load_config()
390 if _model_setup_verified(config):
391 return True
392 if _workspace_has_model_config(config) and _auto_verify_model_setup(config):
393 return True
394 print("Model setup is not verified.")
395 print("Run `nipux` and finish setup, or run `nipux doctor --check-model` after configuring a provider.")
396 print("Jobs and chat stay locked until the configured model accepts a chat request.")
397 return False
398
399
400def _workspace_has_model_config(config: Any) -> bool:
401 return bool(_read_shell_state().get("setup_completed")) or (config.runtime.home / "config.yaml").exists()
402
403
404def _auto_verify_model_setup(config: Any) -> bool:
405 checks = run_doctor(config=config, check_model=True)
406 ok = all(check.ok for check in checks)
407 if ok:
408 _mark_model_setup_verified(config)
409 return True
410 _clear_model_setup_verified()
411 return False
412
413
414def _create_job(
415 *, objective: str, title: str | None = None, kind: str = "generic", cadence: str | None = None
416) -> tuple[str, str]:
417 db, config = _db()
418 try:
419 title = title or next(iter(objective.strip().splitlines()), "")[:80] or "Untitled job"
420 plan = initial_plan_for_objective(objective)
421 job_id = db.create_job(
422 objective,
423 title=title,
424 kind=kind,
425 cadence=cadence,
426 metadata={"planning": plan},
427 )
428 db.update_job_status(job_id, "queued", metadata_patch={"planning": plan, "planning_status": "auto_accepted"})
429 db.append_agent_update(job_id, format_initial_plan(plan), category="plan", metadata={"planning": plan})
430 db.append_agent_update(job_id, "Plan accepted automatically. I will start working from the planned tasks.", category="plan")
431 db.append_roadmap_record(job_id, **initial_roadmap_for_objective(title=title, objective=objective))
432 for index, task in enumerate(plan["tasks"], start=1):
433 task_contract = initial_task_contract(str(task))
434 db.append_task_record(
435 job_id,
436 title=str(task),
437 status="open",
438 priority=max(0, 10 - index),
439 goal=objective,
440 output_contract=task_contract["output_contract"],
441 acceptance_criteria=task_contract["acceptance_criteria"],
442 evidence_needed=task_contract["evidence_needed"],
443 stall_behavior=task_contract["stall_behavior"],
444 metadata={"phase": "initial_plan"},
445 )
446 program = config.runtime.jobs_dir / job_id / "program.md"
447 program.parent.mkdir(parents=True, exist_ok=True)
448 program.write_text(
449 program_for_job(kind=kind, title=title, objective=objective),
450 encoding="utf-8",
451 )
452 _write_shell_state({"focus_job_id": job_id})
453 return job_id, title
454 finally:
455 db.close()
456
457
458def cmd_jobs(args: argparse.Namespace) -> None:
459 db, _ = _db()
460 try:
461 jobs = db.list_jobs()
462 if not jobs:
463 print('No jobs yet. Create one with: nipux create "objective"')
464 return
465 focused = _configured_focus_job_id(db)
466 daemon_running = daemon_lock_status(load_config().runtime.home / "agentd.lock")["running"]
467 _print_jobs_panel(jobs, focused_job_id=str(focused or ""), daemon_running=bool(daemon_running))
468 finally:
469 db.close()
470
471
472def cmd_focus(args: argparse.Namespace) -> None:
473 db, _ = _db()
474 try:
475 if not args.query:
476 job_id = _default_job_id(db)
477 if not job_id:
478 print("No focused job. Create one first.")
479 return
480 job = db.get_job(job_id)
481 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
482 print(f"focus: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
483 return
484 job = _find_job(db, " ".join(args.query))
485 if not job:
486 print(f"No job matched: {' '.join(args.query)}")
487 return
488 _write_shell_state({"focus_job_id": job["id"]})
489 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
490 print(f"focus set: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
491 finally:
492 db.close()
493
494
495def cmd_rename(args: argparse.Namespace) -> None:
496 db, config = _db()
497 try:
498 job_id = _resolve_job_id(db, args.job_id)
499 if not job_id:
500 ref = _job_ref_text(args.job_id)
501 print(f"No job matched: {ref}" if ref else "No jobs found.")
502 return
503 old = db.get_job(job_id)
504 renamed = db.rename_job(job_id, _job_ref_text(args.title))
505 program = config.runtime.jobs_dir / job_id / "program.md"
506 if program.exists():
507 try:
508 content = program.read_text(encoding="utf-8")
509 lines = content.splitlines()
510 if lines and lines[0].startswith("# "):
511 lines[0] = f"# {renamed['title']}"
512 program.write_text("\n".join(lines) + ("\n" if content.endswith("\n") else ""), encoding="utf-8")
513 except OSError:
514 pass  # best-effort: the job is already renamed in the database
515 _write_shell_state({"focus_job_id": job_id})
516 print(f"renamed {old['title']} -> {renamed['title']}")
517 finally:
518 db.close()
519
520
521def cmd_delete(args: argparse.Namespace) -> None:
522 db, config = _db()
523 try:
524 job_id = _resolve_job_id(db, args.job_id)
525 if not job_id:
526 ref = _job_ref_text(args.job_id)
527 print(f"No job matched: {ref}" if ref else "usage: delete JOB_TITLE")
528 return
529 result = db.delete_job(job_id)
530 job = result["job"]
531 finally:
532 db.close()
533
534 removed_files = 0
535 if not args.keep_files:
536 job_dir = config.runtime.jobs_dir / job_id
537 for path_text in result.get("artifact_paths") or []:
538 path = Path(path_text)
539 try:
540 if path.exists() and job_dir in path.parents:
541 path.unlink()
542 removed_files += 1
543 except OSError:
544 pass  # best-effort cleanup: the job rows are already deleted
545 try:
546 if job_dir.exists():
547 shutil.rmtree(job_dir)
548 except OSError:
549 pass  # best-effort cleanup: leftover files do not block the delete
550 state = _read_shell_state()
551 if state.get("focus_job_id") == job_id:
552 _write_shell_state({"focus_job_id": ""})
553 counts = result.get("counts") or {}
554 file_text = "kept files" if args.keep_files else f"removed files={removed_files}"
555 print(
556 f"deleted {job['title']} | steps={counts.get('steps', 0)} "
557 f"artifacts={counts.get('artifacts', 0)} runs={counts.get('runs', 0)} | {file_text}"
558 )
559
560
561def cmd_chat(args: argparse.Namespace) -> None:
562 if not _ensure_model_setup_verified_for_workspace():
563 return
564 db, _ = _db()
565 try:
566 job_id = _resolve_job_id(db, args.job_id)
567 if not job_id:
568 ref = _job_ref_text(args.job_id)
569 print(f"No job matched: {ref}" if ref else "No jobs found. Create one first.")
570 return
571 _write_shell_state({"focus_job_id": job_id})
572 finally:
573 db.close()
574
575 _enter_chat(job_id, show_history=not args.no_history, history_limit=args.history_limit)
576
577
578def cmd_home(args: argparse.Namespace) -> None:
579 _install_readline_history()
580 config = load_config()
581 if not _model_setup_verified(config) and _workspace_has_model_config(config):
582 _auto_verify_model_setup(config)
583 if not _model_setup_verified(load_config()):
584 _enter_first_run_setup(history_limit=args.history_limit)
585 return
586
587 if _has_saved_jobs():
588 _start_interactive_daemon_if_possible()
589 _enter_workspace_chat(history_limit=args.history_limit)
590
591
592def _enter_first_run_setup(*, history_limit: int = 12) -> None:
593 if _frame_chat_enabled():
594 _enter_first_run_frame(history_limit=history_limit)
595 return
596
597 print("Nipux setup requires an interactive terminal.")
598 print("Run `nipux` in a terminal window to choose model, endpoint, and tool access.")
599
600
601def _enter_empty_workspace(*, history_limit: int = 12) -> None:
602 del history_limit
603 db, _ = _db()
604 try:
605 jobs = db.list_jobs()[:12]
606 finally:
607 db.close()
608 print("NIPUX WORKSPACE")
609 print(_rule("="))
610 if jobs:
611 print("Jobs")
612 for job in jobs:
613 marker = "*" if str(job.get("id") or "") == _read_shell_state().get("focus_job_id") else " "
614 title = _one_line(job.get("title") or job.get("id") or "untitled", 44)
615 print(f"{marker} {title:44} {job.get('status') or 'unknown'}")
616 print()
617 print("Run in a terminal to use the full-screen workspace, or use: nipux chat JOB_TITLE")
618 else:
619 print("No jobs are saved in this profile.")
619 print('Create a job with: nipux create "objective"')
621 print("Settings: nipux init --force | Check setup: nipux doctor --check-model")
622
623
624def _has_saved_jobs() -> bool:
625 db, _ = _db()
626 try:
627 return bool(db.list_jobs()[:1])
628 finally:
629 db.close()
630
631
632def _enter_workspace_chat(*, history_limit: int = 12) -> None:
633 if _frame_chat_enabled():
634 _enter_chat_frame(WORKSPACE_CHAT_ID, history_limit=history_limit)
635 return
636 _enter_empty_workspace(history_limit=history_limit)
637
638
639def _print_first_run_menu() -> None:
640 config = load_config()
641 print("Start")
642 print(f" model {config.model.model}")
643 print(" status ready when work starts")
644 print(f" home {_short_path(config.runtime.home)}")
645 print()
646 print("Commands")
647 print(" 1 doctor verify provider/model")
648 print(" 2 init write config/env template")
649 print(" 3 exit leave")
650 print()
651 print("Finish setup before chat or job creation is available.")
652
653
654def _handle_first_run_menu_line(line: str, *, history_limit: int = 12) -> bool:
655 line = line.strip()
656 if not line:
657 _print_first_run_menu()
658 return True
659 if line.startswith("/"):
660 line = line[1:].strip()
661 lowered = line.lower()
662 if lowered in {"exit", "quit", ":q", "3", "5"}:
663 return False
664 if lowered in {"help", "?", "commands"}:
665 _print_first_run_menu()
666 return True
667 if lowered == "new" or lowered.startswith("new "):
668 print("Finish setup first. Then describe worker jobs in the chat workspace.")
669 return True
670 if lowered in {"jobs", "ls"}:
671 cmd_jobs(argparse.Namespace())
672 return True
673 if lowered in {"1", "doctor"}:
674 try:
675 cmd_doctor(argparse.Namespace(check_model=False))
676 except SystemExit:
677 pass  # doctor may raise SystemExit; keep the first-run menu open
678 return True
679 if lowered in {"2", "init"}:
680 cmd_init(argparse.Namespace(path=None, force=False))
681 return True
682 first = _first_token(line)
683 if first in {"create", "new"}:
684 print("Finish setup first. Then describe worker jobs in the chat workspace.")
685 return True
686 if first in SHELL_COMMAND_NAMES:
687 _run_shell_line(line)
688 return True
689 objective = _extract_job_objective_from_message(line)
690 if objective:
691 print("Finish setup first. Then describe worker jobs in the chat workspace.")
692 return True
693 print(_first_run_chat_reply(line))
694 return True
695
696
697def _prompt_first_run_value(label: str) -> str:
698 try:
699 return input(f"{label} > ").strip()
700 except (EOFError, KeyboardInterrupt):
701 print()
702 return ""
703
704
705def _first_run_create_and_open(objective: str, *, history_limit: int = 12) -> None:
706 if not _ensure_model_setup_verified_for_workspace():
707 return
708 job_id, title = _create_job(objective=objective, title=None, kind="generic", cadence=None)
709 _write_shell_state({"focus_job_id": job_id})
710 print(f"created {title}")
711 _start_interactive_daemon_if_possible()
712 print("Opening workspace.")
713 _enter_workspace_chat(history_limit=history_limit)
714
715
716def _first_token(line: str) -> str:
717 return _controller_first_token(line)
718
719
720def _enter_first_run_frame(*, history_limit: int = 12) -> None:
721 next_job_id = _run_first_run_frame(deps=_first_run_runtime_deps())
722 if next_job_id == WORKSPACE_CHAT_ID:
723 _enter_workspace_chat(history_limit=history_limit)
724 elif next_job_id:
725 _start_interactive_daemon_if_possible()
726 _write_shell_state({"focus_job_id": next_job_id})
727 _enter_workspace_chat(history_limit=history_limit)
728
729
730def _first_run_runtime_deps() -> FirstRunRuntimeDeps:
731 return FirstRunRuntimeDeps(
732 render_frame=lambda buffer, notices, selected, view, editing_field, previous: _render_first_run_frame(
733 buffer,
734 notices,
735 selected=selected,
736 view=view,
737 editing_field=editing_field,
738 previous_frame=previous,
739 ),
740 actions=_first_run_actions,
741 handle_action=_handle_first_run_action,
742 handle_line=_handle_first_run_frame_line,
743 click_action=lambda x, y, view: _first_run_click_action(x, y, view=view),
744 )
745
746
747def _first_run_actions(view: str) -> list[tuple[str, str, str]]:
748 return _first_run_tui_actions(view)
749
750
751def _clamp_first_run_selection(selected: int, view: str) -> int:
752 return _clamp_first_run_runtime_selection(selected, _first_run_actions(view))
753
754
755def _handle_first_run_action(action: str) -> tuple[str, str | list[str] | None]:
756 return _controller_handle_first_run_action(action, deps=_first_run_frame_deps())
757
758
759def _first_run_click_action(x: int, y: int, *, view: str) -> int | str | None:
760 width, height = shutil.get_terminal_size((100, 30))
761 width = max(92, width)
762 actions = _first_run_actions(view)
763 if not actions or y < 10 or y > max(10, height - 4):
764 return None
765 gap = 2
766 card_width = max(18, min(34, (width - (len(actions) - 1) * gap - 4) // len(actions)))
767 total_width = len(actions) * card_width + (len(actions) - 1) * gap
768 start_x = max(1, (width - total_width) // 2 + 1)
769 relative = x - start_x
770 if relative < 0 or relative >= total_width:
771 return None
772 span = card_width + gap
773 index = relative // span
774 within_card = relative % span < card_width
775 if not within_card:
776 return None
777 return index if 0 <= index < len(actions) else None
778
779
780def _chat_page_click(x: int, y: int, *, right_view: str) -> str | None:
781 del right_view
782 width, _height = shutil.get_terminal_size((100, 30))
783 width = max(92, width)
784 right_width = min(max(52, int(width * 0.36)), 72)
785 left_width = max(48, width - right_width - 3)
786 if left_width < 48:  # unreachable: left_width is already clamped to >= 48 above
787 left_width = 48
788 right_width = max(34, width - left_width - 3)
789 right_start = left_width + 4
790 if x < right_start or y > 8:
791 return None
792 relative = max(0, x - right_start)
793 third = max(1, right_width // 3)
794 return ["updates", "status", "work"][min(2, relative // third)]
795
796
797def _handle_first_run_frame_line(line: str) -> tuple[str, str | list[str] | None]:
798 return _controller_handle_first_run_frame_line(line, deps=_first_run_frame_deps())
799
800
801def _first_run_chat_reply(message: str) -> str:
802 return _controller_first_run_chat_reply(message)
803
804
805def _create_first_run_job(objective: str) -> str | list[str]:
806 return _controller_create_first_run_job(objective, deps=_first_run_frame_deps())
807
808
809def _capture_first_run_command(line: str) -> list[str]:
810 return _controller_capture_first_run_command(line, _run_shell_line)
811
812
813def _first_run_frame_deps() -> FirstRunFrameDeps:
814 return FirstRunFrameDeps(
815 capture_command=_capture_first_run_command,
816 capture_setting_command=_capture_setting_command,
817 create_job=_create_job,
818 current_default_job_id=_current_default_job_id,
819 extract_objective=_extract_job_objective_from_message,
820 model_setup_verified=lambda: _model_setup_verified(load_config()),
821 verify_model_setup=_verify_model_setup_from_first_run,
822 shell_command_names=SHELL_COMMAND_NAMES,
823 )
824
825
826def _current_default_job_id() -> str | None:
827 db, _ = _db()
828 try:
829 return _default_job_id(db)
830 finally:
831 db.close()
832
833
834def _render_first_run_frame(
835 input_buffer: str,
836 notices: list[str],
837 *,
838 selected: int = 0,
839 view: str = "start",
840 editing_field: str | None = None,
841 previous_frame: str = "",
842) -> str:
843 width, height = shutil.get_terminal_size((100, 30))
844 frame = _build_first_run_frame(
845 input_buffer,
846 notices,
847 width=width,
848 height=height,
849 selected=selected,
850 view=view,
851 editing_field=editing_field,
852 )
853 return _emit_frame_if_changed(frame, previous_frame)
854
855
856def _build_first_run_frame(
857 input_buffer: str,
858 notices: list[str],
859 *,
860 width: int,
861 height: int,
862 selected: int = 0,
863 view: str = "start",
864 editing_field: str | None = None,
865) -> str:
866 width = max(92, width)
867 height = max(22, height)
868 config = load_config()
869 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
870 jobs: list[dict[str, Any]] = []
871 db, _ = _db()
872 try:
873 jobs = db.list_jobs()
874 finally:
875 db.close()
876 right_width = _first_run_columns(width)[1]
877 return _build_first_run_tui_frame(
878 input_buffer,
879 notices,
880 width=width,
881 height=height,
882 selected=selected,
883 view=view,
884 editing_field=editing_field,
885 config=config,
886 jobs=jobs,
887 daemon_text=_daemon_state_line(daemon),
888 home=_short_path(config.runtime.home, max_width=max(20, right_width - 8)),
889 config_path=_short_path(config.runtime.home / "config.yaml", max_width=max(20, right_width - 8)),
890 )
891
892
893def _enter_chat(job_id: str, *, show_history: bool, history_limit: int = 12) -> None:
894 if not _ensure_model_setup_verified_for_workspace():
895 return
896 _install_readline_history()
897 startup_note = _start_interactive_daemon_if_possible()
898 if _frame_chat_enabled():
899 _enter_chat_frame(job_id, history_limit=history_limit)
900 return
901 db, _ = _db()
902 try:
903 job = db.get_job(job_id)
904 _write_shell_state({"focus_job_id": job_id})
905 finally:
906 db.close()
907
908 if _fancy_ui():
909 print("\033[2J\033[H", end="")
910 print(NIPUX_BANNER)
911 print(_rule("="))
912 print(_shell_summary())
913 print(_rule("="))
914 if show_history:
915 _print_startup_history(job_id, limit=history_limit, chars=180)
916 print()
917 if startup_note:
918 print(_one_line(startup_note, 180))
919 _print_chat_composer(job)
920 live_stop, live_thread = _start_chat_live_feed(job_id)
921 try:
922 while True:
923 db, _ = _db()
924 try:
925 refreshed = _default_job_id(db)
926 if refreshed:
927 job_id = refreshed
928 job = db.get_job(job_id)
929 finally:
930 db.close()
931 try:
932 line = input(_chat_prompt(job))
933 except EOFError:
934 print()
935 return
936 except KeyboardInterrupt:
937 print()
938 continue
939 if not _chat_handle_line(job_id, line):
940 return
941 finally:
942 if live_stop is not None:
943 live_stop.set()
944 if live_thread is not None:
945 live_thread.join(timeout=1.0)
946
947
948def _frame_chat_enabled() -> bool:
949 return (
950 sys.stdin.isatty()
951 and sys.stdout.isatty()
952 and not os.environ.get("NIPUX_APPEND_LIVE")
953 and not os.environ.get("NIPUX_NO_FRAME")
954 )
955
956
957def _enter_chat_frame(job_id: str, *, history_limit: int = 12) -> None:
958 _run_chat_frame(job_id, history_limit=history_limit, deps=_chat_frame_deps())
959
960
961def _chat_frame_deps() -> ChatFrameDeps:
962 return ChatFrameDeps(
963 load_snapshot=lambda job_id, history_limit: _load_frame_snapshot(job_id, history_limit=history_limit),
964 render_frame=lambda snapshot, buffer, notices, right_view, selected, editing_field, modal_view, previous: _render_chat_frame(
965 snapshot,
966 buffer,
967 notices,
968 right_view=right_view,
969 selected_control=selected,
970 editing_field=editing_field,
971 modal_view=modal_view,
972 previous_frame=previous,
973 ),
974 handle_chat_message=lambda job_id, line: _handle_chat_message(job_id, line, quiet=True),
975 capture_chat_command=_capture_chat_command,
976 write_shell_state=_write_shell_state,
977 is_plain_chat_line=_is_plain_chat_line,
978 page_click=lambda x, y, right_view: _chat_page_click(x, y, right_view=right_view),
979 )
980
981
982def _capture_chat_command(job_id: str, line: str) -> tuple[bool, str]:
983 stream = StringIO()
984 with redirect_stdout(stream):
985 if job_id == WORKSPACE_CHAT_ID:
986 raw = line.strip()
987 command = raw[1:].strip() if raw.startswith("/") else (chat_control_command(raw).lstrip("/") or raw)
988 keep_running = _run_workspace_command_line(command) if command else True
989 else:
990 keep_running = _chat_handle_line(job_id, line)
991 return keep_running, stream.getvalue()
992
993
994def _run_workspace_command_line(command: str) -> bool:
995 try:
996 tokens = shlex.split(command)
997 except ValueError as exc:
998 print(f"parse error: {exc}")
999 return True
1000 if tokens and tokens[0] == "help":
1001 _print_workspace_chat_help()
1002 return True
1003 if tokens and _run_workspace_setting_command(tokens[0], tokens[1:]):
1004 return True
1005 if tokens and tokens[0] == "new":
1006 objective = command[len("new") :].strip()
1007 if not objective:
1008 print("usage: /new OBJECTIVE")
1009 return True
1010 operator_line = f"/new {objective}"
1011 _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1012 message = _create_workspace_job_from_chat(operator_line, objective)
1013 _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1014 print(message)
1015 return True
1016 if tokens and tokens[0] == "run":
1017 if len(tokens) > 1 and not tokens[1].startswith("-") and _workspace_command_should_create_worker(command, " ".join(tokens[1:])):
1018 objective = _extract_job_objective_from_message(command)
1019 operator_line = f"/{tokens[0]} {objective}"
1020 _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1021 message = _create_workspace_job_from_chat(operator_line, objective)
1022 _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1023 print(message)
1024 return True
1025 return _run_workspace_run_command(tokens)
1026 if tokens and tokens[0] in {"start", "launch"} and len(tokens) > 1 and not tokens[1].startswith("-"):
1027 target_text = " ".join(tokens[1:])
1028 if _workspace_command_should_create_worker(command, target_text):
1029 objective = _extract_job_objective_from_message(command)
1030 operator_line = f"/{tokens[0]} {objective}"
1031 _append_workspace_chat_event("operator_message", "command", operator_line, {"source": "workspace"})
1032 message = _create_workspace_job_from_chat(operator_line, objective)
1033 _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
1034 print(message)
1035 return True
1036 return _run_workspace_run_command(["run", target_text])
1037 return _run_shell_line(command)
1038
1039
1040def _run_workspace_setting_command(command: str, rest: list[str]) -> bool:
1041 if command == "settings":
1042 command = "config"
1043 if command in {"config", "key", "api-key"} or command in CHAT_SETTING_COMMANDS:
1044 return _handle_chat_setting_command(command, rest)
1045 return False
1046
1047
1048def _workspace_command_should_create_worker(command: str, target_text: str) -> bool:
1049 objective = _extract_job_objective_from_message(command)
1050 if not objective:
1051 return False
1052 db, _config = _db()
1053 try:
1054 return _find_job(db, target_text) is None
1055 finally:
1056 db.close()
1057
1058
1059def _run_workspace_run_command(tokens: list[str]) -> bool:
1060 try:
1061 parsed = build_parser().parse_args(tokens)
1062 except SystemExit as exc:
1063 if exc.code:
1064 print(f"command exited with status {exc.code}")
1065 return True
1066 parsed.no_follow = True
1067 parsed.quiet = True
1068 parsed.func(parsed)
1069 return True
1070
1071
1072def _print_workspace_chat_help() -> None:
1073 print("Create: type a goal, or /new OBJECTIVE.")
1074 print("Run: /run, /pause, /resume. Inspect: /jobs, /outcomes, /artifacts, /activity.")
1075 print("Config: /settings, /model, /base-url, /api-key. Navigate: ←→ pages, ↑↓ jobs.")
1076
1077
1078def _start_worker_from_chat_context(
1079 *,
1080 poll_seconds: float = 0.0,
1081 fake: bool = False,
1082 quiet: bool = True,
1083 log_file: str | None = None,
1084) -> bool:
1085 """Start the daemon from the TUI without dumping preflight internals into chat."""
1086
1087 def report(message: str) -> None:
1088 if not quiet:
1089 print(message)
1090
1091 stream = StringIO()
1092 try:
1093 with redirect_stdout(stream):
1094 _start_daemon_if_needed(poll_seconds=poll_seconds, fake=fake, quiet=True, log_file=log_file)
1095 except SystemExit as exc:
1096 detail = _one_line(str(exc) or "daemon start failed", 120)
1097 report(f"worker not started: {detail}")
1098 return False
1099 except Exception as exc:
1100 detail = _one_line(f"{type(exc).__name__}: {exc}", 120)
1101 report(f"worker not started: {detail}")
1102 return False
1103 output = stream.getvalue()
1104 lowered = output.lower()
1105 if (
1106 "model is not ready" in lowered
1107 or "model setup is not verified" in lowered
1108 or "model_generation:" in lowered
1109 or "model_endpoint:" in lowered
1110 or "model_auth:" in lowered
1111 or "model_config:" in lowered
1112 ):
1113 report("worker not started: model provider is not ready. Use /settings, then /doctor.")
1114 return False
1115 return True
1116
1117
1118def _start_worker_from_chat_namespace(args: argparse.Namespace) -> bool:
1119 return _start_worker_from_chat_context(
1120 poll_seconds=float(getattr(args, "poll_seconds", 0.0) or 0.0),
1121 fake=bool(getattr(args, "fake", False)),
1122 quiet=bool(getattr(args, "quiet", True)),
1123 log_file=getattr(args, "log_file", None),
1124 )
1125
1126
1127def _is_plain_chat_line(line: str) -> bool:
1128 stripped = line.strip()
1129 if not stripped or stripped.startswith("/"):
1130 return False
1131 lowered = stripped.lower()
1132 if lowered in {"help", "jobs", "ls", "clear", "exit", "quit"}:
1133 return False
1134 if chat_control_command(stripped):
1135 return False
1136 try:
1137 first = shlex.split(stripped)[0].lower()
1138 except (IndexError, ValueError):
1139 first = lowered.split(maxsplit=1)[0]
1140 return first not in {"chat", "focus", "switch", "jobs", "ls", "help", "clear", "exit", "quit"}
1141
1142
1143def _load_frame_snapshot(job_id: str, *, history_limit: int = 12) -> dict[str, Any]:
1144 db, config = _db()
1145 try:
1146 return load_frame_snapshot(
1147 db,
1148 config,
1149 job_id,
1150 default_job_id=_default_job_id(db),
1151 history_limit=history_limit,
1152 workspace_events=_workspace_chat_events() if job_id == WORKSPACE_CHAT_ID else None,
1153 )
1154 finally:
1155 db.close()
1156
1157
1158def _render_chat_frame(
1159 snapshot: dict[str, Any],
1160 input_buffer: str,
1161 notices: list[str],
1162 *,
1163 right_view: str = "updates",
1164 selected_control: int = 0,
1165 editing_field: str | None = None,
1166 modal_view: str | None = None,
1167 previous_frame: str = "",
1168) -> str:
1169 width, height = shutil.get_terminal_size((100, 30))
1170 frame = _build_chat_frame(
1171 snapshot,
1172 input_buffer,
1173 notices,
1174 width=width,
1175 height=height,
1176 right_view=right_view,
1177 selected_control=selected_control,
1178 editing_field=editing_field,
1179 modal_view=modal_view,
1180 )
1181 return _emit_frame_if_changed(frame, previous_frame)
1182
1183
1184def _build_chat_frame(
1185 snapshot: dict[str, Any],
1186 input_buffer: str,
1187 notices: list[str],
1188 *,
1189 width: int,
1190 height: int,
1191 right_view: str = "updates",
1192 selected_control: int = 0,
1193 editing_field: str | None = None,
1194 modal_view: str | None = None,
1195) -> str:
1196 return _build_chat_tui_frame(
1197 snapshot,
1198 input_buffer,
1199 notices,
1200 width=width,
1201 height=height,
1202 right_view=right_view,
1203 selected_control=selected_control,
1204 editing_field=editing_field,
1205 modal_view=modal_view,
1206 )
1207
1208
1209def _resolve_job_id(db: AgentDB, requested: Any = None) -> str | None:
1210 requested = _job_ref_text(requested)
1211 if requested:
1212 job = _find_job(db, requested)
1213 return str(job["id"]) if job else None
1214 return _default_job_id(db)
1215
1216
1217def _activate_job_if_planning(db: AgentDB, job_id: str) -> bool:
1218 job = db.get_job(job_id)
1219 if job.get("status") != "planning":
1220 return False
1221 db.update_job_status(job_id, "queued", metadata_patch={"planning_status": "accepted"})
1222 db.append_agent_update(job_id, "Plan accepted. I will start working from the planned tasks.", category="plan")
1223 return True
1224
1225
1226def _ensure_job_runnable(db: AgentDB, job_id: str) -> None:
1227 if _activate_job_if_planning(db, job_id):
1228 return
1229 job = db.get_job(job_id)
1230 status = str(job.get("status") or "")
1231 if status in {"completed", "paused", "cancelled", "failed"} or job_provider_blocked(job):
1232 patch = operator_resume_metadata()
1233 patch["last_note"] = f"reopened from {status} by operator run command"
1234 db.update_job_status(
1235 job_id,
1236 "queued",
1237 metadata_patch=patch,
1238 )
1239 db.append_agent_update(
1240 job_id,
1241 f"Reopened from {status}; continuing as a long-running job.",
1242 category="progress",
1243 metadata={"previous_status": status},
1244 )
1245
1246
1247def cmd_steer(args: argparse.Namespace) -> None:
1248 message = " ".join(args.message).strip()
1249 if not message:
1250 print("No steering message provided.")
1251 return
1252 db, _ = _db()
1253 try:
1254 job_id = _resolve_job_id(db, args.job_id)
1255 if not job_id:
1256 ref = _job_ref_text(args.job_id)
1257 print(f"No job matched: {ref}" if ref else "No jobs found. Create one first, then send steering.")
1258 return
1259 entry = db.append_operator_message(job_id, message, source="operator")
1260 job = db.get_job(job_id)
1261 print(f"waiting for {job['title']}: {entry['message']}")
1262 print("The next worker step will include this in model-visible context.")
1263 finally:
1264 db.close()
1265
1266
1267def cmd_pause(args: argparse.Namespace) -> None:
1268 db, _ = _db()
1269 try:
1270 job_id, note, ref = _resolve_control_job_and_note(db, args)
1271 if not job_id:
1272 print(f"No job matched: {ref}" if ref else "No jobs found.")
1273 return
1274 patch = {"last_note": note} if note else None
1275 db.update_job_status(job_id, "paused", metadata_patch=patch)
1276 job = db.get_job(job_id)
1277 print(f"paused {job['title']}" + (f": {note}" if note else ""))
1278 finally:
1279 db.close()
1280
1281
1282def cmd_resume(args: argparse.Namespace) -> None:
1283 db, _ = _db()
1284 try:
1285 job_id = _resolve_job_id(db, args.job_id)
1286 if not job_id:
1287 ref = _job_ref_text(args.job_id)
1288 print(f"No job matched: {ref}" if ref else "No jobs found.")
1289 return
1290 db.update_job_status(job_id, "queued", metadata_patch=operator_resume_metadata())
1291 job = db.get_job(job_id)
1292 print(f"resumed {job['title']}")
1293 finally:
1294 db.close()
1295
1296
1297def cmd_cancel(args: argparse.Namespace) -> None:
1298 db, _ = _db()
1299 try:
1300 job_id, note, ref = _resolve_control_job_and_note(db, args)
1301 if not job_id:
1302 print(f"No job matched: {ref}" if ref else "No jobs found.")
1303 return
1304 patch = {"last_note": note} if note else None
1305 db.update_job_status(job_id, "cancelled", metadata_patch=patch)
1306 job = db.get_job(job_id)
1307 print(f"cancelled {job['title']}" + (f": {note}" if note else ""))
1308 finally:
1309 db.close()
1310
1311
1312def cmd_status(args: argparse.Namespace) -> None:
1313 db, config = _db()
1314 try:
1315 job_id = _resolve_job_id(db, args.job_id)
1316 if _job_ref_text(args.job_id) and not job_id:
1317 print(f"No job matched: {_job_ref_text(args.job_id)}")
1318 return
1319 state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
1320 if args.json:
1321 print(json.dumps(state, ensure_ascii=False, indent=2, default=_json_default))
1322 return
1323 if args.full:
1324 print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="")
1325 else:
1326 print(render_overview(state, width=_terminal_width()), end="")
1327 finally:
1328 db.close()
1329
1330
1331def cmd_health(args: argparse.Namespace) -> None:
1332 db, config = _db()
1333 try:
1334 config.ensure_dirs()
1335 lock = daemon_lock_status(config.runtime.home / "agentd.lock")
1336 metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
1337 events = read_daemon_events(config, limit=args.limit)
1338 job_id = _default_job_id(db)
1339 print("Nipux Health")
1340 print(_rule("="))
1341 print(f"daemon: {_daemon_state_line(lock)}")
1342 if metadata.get("last_heartbeat"):
1343 print(f"heartbeat: {metadata['last_heartbeat']}")
1344 if metadata.get("last_state"):
1345 print(f"state: {metadata['last_state']}")
1346 if metadata.get("last_status") or metadata.get("last_tool"):
1347 print(f"last step: {metadata.get('last_status') or '?'} {metadata.get('last_tool') or '-'}")
1348 if metadata.get("consecutive_failures"):
1349 print(f"consecutive failures: {metadata['consecutive_failures']}")
1350 if metadata.get("last_error"):
1351 print(
1352 f"last error: {metadata.get('last_error_type') or 'error'}: {_one_line(metadata['last_error'], args.chars)}"
1353 )
1354 print(f"model: {config.model.model}")
1355 print(f"state db: {config.runtime.state_db_path}")
1356 print(f"daemon log: {config.runtime.logs_dir / 'daemon.log'}")
1357 print(f"event log: {config.runtime.logs_dir / 'daemon-events.jsonl'}")
1358 print(f"autostart: {'installed' if _launch_agent_path().exists() else 'not installed'}")
1359 if job_id:
1360 job = db.get_job(job_id)
1361 steps = db.list_steps(job_id=job_id)
1362 artifacts = db.list_artifacts(job_id, limit=1)
1363 print()
1364 print(f"focus: {job['title']}")
1365 state = _job_display_state(job, bool(lock["running"]))
1366 print(
1367 f"state: {state} | worker: {_worker_label(job, bool(lock['running']))} | "
1368 f"steps: {_step_count(steps)} | latest artifacts: {len(artifacts)}"
1369 )
1370 if steps:
1371 print(f"latest: {_step_line(steps[-1], chars=args.chars)}")
1372 else:
1373 print()
1374 print("focus: no jobs")
1375 if events:
1376 print()
1377 print("recent daemon events:")
1378 job_titles = {job["id"]: job["title"] for job in db.list_jobs()}
1379 for event in events[-args.limit :]:
1380 print(f" {_daemon_event_line(event, chars=args.chars, job_titles=job_titles)}")
1381 else:
1382 print()
1383 print("recent daemon events: none")
1384 finally:
1385 db.close()
1386
1387
1388def cmd_history(args: argparse.Namespace) -> None:
1389 db, _ = _db()
1390 try:
1391 job_id = _resolve_job_id(db, args.job_id)
1392 if not job_id:
1393 ref = _job_ref_text(args.job_id)
1394 print(f"No job matched: {ref}" if ref else "No jobs found.")
1395 return
1396 job = db.get_job(job_id)
1397 events = db.list_timeline_events(job_id, limit=args.limit)
1398 if args.json:
1399 print(
1400 json.dumps(
1401 [_public_event(event) for event in events], ensure_ascii=False, indent=2, default=_json_default
1402 )
1403 )
1404 return
1405 print(f"history {job['title']}")
1406 print(_rule("="))
1407 if not events:
1408 print("No visible history yet.")
1409 return
1410 for event in events:
1411 if args.full:
1412 print(_event_line(event, chars=max(args.chars, 1200), full=True))
1413 else:
1414 _print_event_card(event, chars=args.chars)
1415 finally:
1416 db.close()
1417
1418
1419def cmd_events(args: argparse.Namespace) -> None:
1420 db, _ = _db()
1421 seen: set[str] = set()
1422 try:
1423 job_id = _resolve_job_id(db, args.job_id)
1424 if not job_id:
1425 ref = _job_ref_text(args.job_id)
1426 print(f"No job matched: {ref}" if ref else "No jobs found.")
1427 return
1428 job = db.get_job(job_id)
1429 if not args.json:
1430 print(f"events {job['title']}")
1431 print(_rule("="))
1432
1433 def emit() -> None:
1434 events = db.list_timeline_events(job_id, limit=args.limit)
1435 printed = False
1436 for event in events:
1437 event_id = str(event.get("id") or "")
1438 if event_id in seen:
1439 continue
1440 seen.add(event_id)
1441 if args.json:
1442 print(json.dumps(_public_event(event), ensure_ascii=False, default=_json_default), flush=True)
1443 else:
1444 if args.full:
1445 print(_event_line(event, chars=args.chars, full=True), flush=True)
1446 else:
1447 _print_event_card(event, chars=args.chars)
1448 printed = True
1449 if printed and not args.json:
1450 print(_rule("-"), flush=True)
1451
1452 emit()
1453 while args.follow:
1454 time.sleep(args.interval)
1455 emit()
1456 except KeyboardInterrupt:
1457 print("\nevents stopped")
1458 finally:
1459 db.close()
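
The follow loop in `cmd_events` avoids reprinting by remembering event ids across polls. Below is a minimal, self-contained sketch of that pattern, assuming events are plain dicts with an `id` key. `new_events` is a hypothetical helper used only for illustration; it is not part of `nipux_cli`.

```python
def new_events(events: list[dict], seen: set[str]) -> list[dict]:
    """Return events whose id has not been seen yet, recording each id in `seen`."""
    fresh = []
    for event in events:
        event_id = str(event.get("id") or "")
        if event_id in seen:
            continue
        seen.add(event_id)
        fresh.append(event)
    return fresh


# Two overlapping polls: the second returns only the event that is actually new.
seen: set[str] = set()
first = new_events([{"id": "a"}, {"id": "b"}], seen)
second = new_events([{"id": "b"}, {"id": "c"}], seen)
```

Because the `seen` set lives outside the polling function, each poll can re-read the same window of recent events without emitting duplicates.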
1460
1461
1462def cmd_dashboard(args: argparse.Namespace) -> None:
1463 db, config = _db()
1464 try:
1465 while True:
1466 job_id = _resolve_job_id(db, args.job_id)
1467 if _job_ref_text(args.job_id) and not job_id:
1468 print(f"No job matched: {_job_ref_text(args.job_id)}")
1469 return
1470 state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
1471 if args.clear:
1472 print("\033[2J\033[H", end="")
1473 print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="", flush=True)
1474 if not args.follow:
1475 return
1476 time.sleep(args.interval)
1477 except KeyboardInterrupt:
1478 print("\ndashboard stopped")
1479 finally:
1480 db.close()
1481
1482
1483def cmd_artifacts(args: argparse.Namespace) -> None:
1484 db, _ = _db()
1485 try:
1486 job_id = _resolve_job_id(db, args.job_id)
1487 if not job_id:
1488 ref = _job_ref_text(args.job_id)
1489 print(f"No job matched: {ref}" if ref else "No jobs found.")
1490 return
1491 job = db.get_job(job_id)
1492 artifacts = db.list_artifacts(job_id, limit=args.limit)
1493 if not artifacts:
1494 print(f"No saved outputs recorded for {job['title']}.")
1495 return
1496 print(f"saved outputs {job['title']} (newest first)")
1497 print(_rule("-"))
1498 print("Open one with: artifact NUMBER, artifact latest, or artifact TITLE")
1499 for index, artifact in enumerate(artifacts, start=1):
1500 title = artifact.get("title") or artifact["id"]
1501 print(f"{index:>2}. {_one_line(title, 72)}")
1502 meta = f"{artifact['created_at']} | {artifact['type']} | id {artifact['id']}"
1503 print(f" {meta}")
1504 if artifact.get("summary"):
1505 print(f" {_one_line(_generic_display_text(artifact['summary']), args.chars)}")
1506 print(f" view: artifact {index}")
1507 if args.paths:
1508 print(f" path: {artifact['path']}")
1509 finally:
1510 db.close()
1511
1512
1513def cmd_artifact(args: argparse.Namespace) -> None:
1514 db, config = _db()
1515 try:
1516 store = ArtifactStore(config.runtime.home, db=db)
1517 ref = _job_ref_text(args.artifact_id_or_path)
1518 resolved = _resolve_artifact_ref(db, config, ref, job_id=_resolve_job_id(db, getattr(args, "job_id", None)))
1519 if not resolved:
1520 print(f"No artifact matched: {ref}")
1521 return
1522 content = store.read_text(resolved["id"] if resolved.get("id") else resolved["path"])
1523 if resolved.get("title"):
1524 print(f"artifact: {resolved['title']}")
1525 if resolved.get("summary"):
1526 print(f"summary: {resolved['summary']}")
1527 print(_rule("-"))
1528 if args.chars and len(content) > args.chars:
1529 content = content[: args.chars] + f"\n... truncated {len(content) - args.chars} chars\n"
1530 print(content, end="" if content.endswith("\n") else "\n")
1531 finally:
1532 db.close()
1533
1534
1535def cmd_lessons(args: argparse.Namespace) -> None:
1536 db, _ = _db()
1537 try:
1538 job_id = _resolve_job_id(db, args.job_id)
1539 if not job_id:
1540 ref = _job_ref_text(args.job_id)
1541 print(f"No job matched: {ref}" if ref else "No jobs found.")
1542 return
1543 job = db.get_job(job_id)
1544 _print_lessons(job, limit=args.limit, chars=args.chars)
1545 finally:
1546 db.close()
1547
1548
1549def cmd_learn(args: argparse.Namespace) -> None:
1550 lesson = " ".join(args.lesson).strip()
1551 if not lesson:
1552 print("usage: learn [--job JOB_TITLE] [--category CATEGORY] LESSON")
1553 return
1554 db, _ = _db()
1555 try:
1556 job_id = _resolve_job_id(db, args.job_id)
1557 if not job_id:
1558 ref = _job_ref_text(args.job_id)
1559 print(f"No job matched: {ref}" if ref else "No jobs found.")
1560 return
1561 entry = db.append_lesson(
1562 job_id, lesson, category=args.category or "operator_preference", metadata={"source": "operator"}
1563 )
1564 job = db.get_job(job_id)
1565 print(f"learned for {job['title']}: {_one_line(entry['lesson'], args.chars)}")
1566 finally:
1567 db.close()
1568
1569
1570def cmd_findings(args: argparse.Namespace) -> None:
1571 return cmd_findings_impl(args, _record_command_deps())
1572
1573
1574def cmd_tasks(args: argparse.Namespace) -> None:
1575 return cmd_tasks_impl(args, _record_command_deps())
1576
1577
1578def cmd_roadmap(args: argparse.Namespace) -> None:
1579 return cmd_roadmap_impl(args, _record_command_deps())
1580
1581
1582def cmd_experiments(args: argparse.Namespace) -> None:
1583 return cmd_experiments_impl(args, _record_command_deps())
1584
1585
1586def cmd_sources(args: argparse.Namespace) -> None:
1587 return cmd_sources_impl(args, _record_command_deps())
1588
1589
1590def cmd_memory(args: argparse.Namespace) -> None:
1591 return cmd_memory_impl(args, _record_command_deps())
1592
1593
1594def cmd_metrics(args: argparse.Namespace) -> None:
1595 return cmd_metrics_impl(args, _record_command_deps())
1596
1597
1598def cmd_usage(args: argparse.Namespace) -> None:
1599 return cmd_usage_impl(args, _record_command_deps())
1600
1601
1602def _remote_model_preflight_failures(config) -> list[str]:
1603 return _daemon_remote_model_preflight_failures(config, doctor_fn=run_doctor)
1604
1605
1606def _recoverable_remote_model_preflight_failures(config) -> list[str]:
1607 return _daemon_recoverable_remote_model_preflight_failures(config, doctor_fn=run_doctor)
1608
1609
1610def _provider_preflight_is_recoverable(failures: list[str]) -> bool:
1611 return _daemon_provider_preflight_is_recoverable(failures)
1612
1613
1614def _ensure_remote_model_ready_for_worker(config, *, fake: bool) -> bool:
1615 return _daemon_ensure_remote_model_ready(config, fake=fake, doctor_fn=run_doctor)
1616
1617
1618def cmd_start(args: argparse.Namespace) -> None:
1619 return _cmd_start_impl(
1620 args,
1621 ready_fn=lambda config, fake: _ensure_remote_model_ready_for_worker(config, fake=fake),
1622 stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1623 )
1624
1625
1626def _start_daemon_if_needed(
1627 *, poll_seconds: float, fake: bool = False, quiet: bool = False, log_file: str | None = None
1628) -> None:
1629 return _start_daemon_if_needed_impl(
1630 poll_seconds=poll_seconds,
1631 fake=fake,
1632 quiet=quiet,
1633 log_file=log_file,
1634 start_fn=cmd_start,
1635 stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1636 )
1637
1638
1639def _start_interactive_daemon_if_possible() -> str:
1640 """Best-effort daemon start for the full-screen UI without printing over the frame."""
1641
1642 stream = StringIO()
1643 with redirect_stdout(stream):
1644 try:
1645 _start_daemon_if_needed(poll_seconds=0.0, quiet=True)
1646 except SystemExit:
1647 pass
1648 return stream.getvalue()
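
`_start_interactive_daemon_if_possible` captures anything the start path prints so that it cannot draw over the full-screen frame. The sketch below shows the same capture-and-swallow idea in isolation. `run_quietly` and `noisy_start` are hypothetical names chosen for this example and stand in for the real daemon start call.

```python
from contextlib import redirect_stdout
from io import StringIO


def run_quietly(action) -> str:
    """Run `action`, swallow SystemExit, and return whatever it printed to stdout."""
    stream = StringIO()
    with redirect_stdout(stream):
        try:
            action()
        except SystemExit:
            pass
    return stream.getvalue()


def noisy_start() -> None:
    # Stand-in for a command that prints a status line and then exits.
    print("daemon already running")
    raise SystemExit(1)


captured = run_quietly(noisy_start)
```

The caller can then decide where, if anywhere, to show the captured text.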
1649
1650
1651def cmd_restart(args: argparse.Namespace) -> None:
1652 return _cmd_restart_impl(
1653 args,
1654 start_fn=cmd_start,
1655 stop_fn=lambda config, wait, quiet: _stop_daemon_process(config, wait=wait, quiet=quiet),
1656 )
1657
1658
1659def _stop_daemon_process(config, *, wait: float, quiet: bool) -> bool:
1660 return _stop_daemon_process_impl(config, wait=wait, quiet=quiet, pid_alive=_pid_is_alive)
1661
1662
1663def cmd_stop(args: argparse.Namespace) -> None:
1664 requested_job = _job_ref_text(getattr(args, "job_id", None))
1665 if requested_job:
1666 db, _ = _db()
1667 try:
1668 job_id = _resolve_job_id(db, requested_job)
1669 if not job_id:
1670 print(f"No job matched: {requested_job}")
1671 return
1672 db.update_job_status(job_id, "paused", metadata_patch={"last_note": "stopped by operator"})
1673 job = db.get_job(job_id)
1674 print(f"stopped {job['title']} (paused job)")
1675 print("Use resume/run to start it again. Plain 'stop' still stops the daemon.")
1676 return
1677 finally:
1678 db.close()
1679
1680 config = load_config()
1681 _stop_daemon_process(config, wait=args.wait, quiet=False)
1682
1683
1684def cmd_browser_dashboard(args: argparse.Namespace) -> None:
1685 from nipux_cli.browser import _find_agent_browser
1686
1687 config = load_config()
1688 config.ensure_dirs()
1689 if args.stop:
1690 result = subprocess.run([*_find_agent_browser(), "dashboard", "stop"], check=False)
1691 if result.returncode:
1692 raise SystemExit(result.returncode)
1693 print("agent-browser dashboard stopped")
1694 return
1695
1696 command = [*_find_agent_browser(), "dashboard", "start", "--port", str(args.port)]
1697 if args.foreground:
1698 raise SystemExit(subprocess.call(command))
1699
1700 log_path = Path(args.log_file).expanduser() if args.log_file else config.runtime.logs_dir / "browser-dashboard.log"
1701 log_path.parent.mkdir(parents=True, exist_ok=True)
1702 with log_path.open("a", encoding="utf-8") as log_file:
1703 process = subprocess.Popen(
1704 command,
1705 cwd=str(Path.cwd()),
1706 stdout=log_file,
1707 stderr=subprocess.STDOUT,
1708 start_new_session=True,
1709 )
1710 print(f"agent-browser dashboard started pid={process.pid}")
1711 print(f"url: http://127.0.0.1:{args.port}")
1712 print(f"log: {log_path}")
1713
1714
1715def _print_startup_history(job_id: str, *, limit: int, chars: int) -> None:
1716 db, config = _db()
1717 try:
1718 job = db.get_job(job_id)
1719 jobs = db.list_jobs()
1720 steps = db.list_steps(job_id=job_id)
1721 artifacts = db.list_artifacts(job_id, limit=1000)
1722 memory_entries = db.list_memory(job_id)
1723 events = db.list_timeline_events(job_id, limit=limit)
1724 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
1725 finally:
1726 db.close()
1727 print()
1728 _print_session_overview(
1729 job,
1730 steps=steps,
1731 artifacts=artifacts,
1732 memory_entries=memory_entries,
1733 daemon_running=bool(daemon["running"]),
1734 model=config.model.model,
1735 artifacts_dir=config.runtime.jobs_dir / job_id / "artifacts",
1736 jobs=jobs,
1737 chars=chars,
1738 )
1739 print()
1740 print(_section_title("Recent activity", f"{job['title']}"))
1741 if not events:
1742 print(" No visible history yet.")
1743 return
1744 display_events = _important_startup_events(events, limit=min(limit, 8))
1745 artifact_indexes = {str(artifact["id"]): index for index, artifact in enumerate(artifacts, start=1)}
1746 for event in display_events:
1747 _print_event_card(event, chars=min(chars, 140), artifact_indexes=artifact_indexes)
1748 if len(events) > len(display_events):
1749 print(f" ... {len(events) - len(display_events)} older events hidden. Use /history for the full timeline.")
1750
1751
1752def _print_session_overview(
1753 job: dict[str, Any],
1754 *,
1755 steps: list[dict[str, Any]],
1756 artifacts: list[dict[str, Any]],
1757 memory_entries: list[dict[str, Any]],
1758 daemon_running: bool,
1759 model: str,
1760 artifacts_dir: Path,
1761 jobs: list[dict[str, Any]],
1762 chars: int,
1763) -> None:
1764 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1765 findings = _metadata_records(job, "finding_ledger")
1766 sources = _metadata_records(job, "source_ledger")
1767 tasks = _metadata_records(job, "task_queue")
1768 experiments = _metadata_records(job, "experiment_ledger")
1769 lessons = _metadata_records(job, "lessons")
1770 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1771 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1772 open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active"})
1773 state = _job_display_state(job, daemon_running)
1774 worker = _worker_label(job, daemon_running)
1775 print(_section_title("Workspace"))
1776 print(f" model {model}")
1777 print(f" focus {job['title']}")
1778 print(f" state {_status_badge(state)} worker {_status_badge(worker)} kind {job['kind']}")
1779 next_action = _next_operator_action(job, daemon_running)
1780 if next_action:
1781 print(f" next {next_action}")
1782
1783 print()
1784 _print_jobs_panel(jobs, focused_job_id=str(job["id"]), daemon_running=daemon_running)
1785
1786 print()
1787 print(_section_title("Focus"))
1788 _print_wrapped(
1789 " goal ", job.get("objective") or "", width=_terminal_width(), subsequent_indent=" "
1790 )
1791 planning = metadata.get("planning") if isinstance(metadata.get("planning"), dict) else {}
1792 if job.get("status") == "planning" and planning:
1793 print(" plan waiting for your answers or /run")
1794 questions = planning.get("questions") if isinstance(planning.get("questions"), list) else []
1795 for question in questions[:3]:
1796 _print_wrapped(" question ", question, width=_terminal_width(), subsequent_indent=" ")
1797
1798 print()
1799 print(_section_title("Progress"))
1800 _print_metric_grid(
1801 [
1802 ("actions", _step_count(steps)),
1803 ("outputs", len(artifacts)),
1804 ("findings", len(findings)),
1805 ("sources", len(sources)),
1806 ("tasks", f"{len(tasks)} ({open_tasks} open)"),
1807 ("roadmap", len(milestones)),
1808 ("experiments", len(experiments)),
1809 ("lessons", len(lessons)),
1810 ("memory", len(memory_entries)),
1811 ]
1812 )
1813 print(f" output dir {_short_path(artifacts_dir, max_width=min(_terminal_width() - 13, 84))}")
1814
1815
1816def _print_chat_composer(job: dict[str, Any]) -> None:
1817 width = min(_terminal_width(), 96)
1818 if _fancy_ui():
1819 print(_accent("╭─ Message " + "─" * max(0, width - 11)))
1820 print("│ Type normally to chat. Live steps stream above. /jobs switches workspaces. /help shows commands.")
1821 print("╰─" + "─" * max(0, width - 2))
1822 return
1823 print(_section_title("Message"))
1824 print(" Type normally to chat. Live steps stream above. /jobs switches workspaces. /help shows commands.")
1825
1826
1827def _chat_prompt(job: dict[str, Any]) -> str:
1828 return f"{_accent('nipux')} > "
1829
1830
1831def _start_chat_live_feed(job_id: str) -> tuple[threading.Event | None, threading.Thread | None]:
1832 if (
1833 not sys.stdin.isatty()
1834 or not sys.stdout.isatty()
1835 or os.environ.get("NIPUX_NO_LIVE")
1836 or os.environ.get("NIPUX_PLAIN")
1837 ):
1838 return None, None
1839 stop = threading.Event()
1840 thread = threading.Thread(target=_chat_live_feed_loop, args=(job_id, stop), daemon=True)
1841 thread.start()
1842 return stop, thread
1843
1844
def _chat_live_feed_loop(initial_job_id: str, stop: threading.Event) -> None:
    seen_by_job: dict[str, set[str]] = {}
    initialized_jobs: set[str] = set()
    active_job_id = initial_job_id
    # Poll once per second until the stop event is set.
    while not stop.wait(1.0):
        try:
            db, _ = _db()
            try:
                # Follow the currently focused job; fall back to the last known one.
                focused = _default_job_id(db) or active_job_id
                active_job_id = focused
                seen = seen_by_job.setdefault(focused, set())
                events = db.list_events(job_id=focused, limit=40)
                if focused not in initialized_jobs:
                    # First poll for this job: mark existing events as seen so only new ones stream.
                    initialized_jobs.add(focused)
                    seen.update(str(event.get("id") or "") for event in events)
                    continue
                for event in events:
                    event_id = str(event.get("id") or "")
                    if not event_id or event_id in seen:
                        continue
                    seen.add(event_id)
                    line = _minimal_live_event_line(event)
                    if line:
                        _print_live_line(line)
            finally:
                db.close()
        except Exception:
            # The live feed is best-effort; a failed poll must not kill the thread.
            continue
1873
1874
1875def _print_live_line(line: str) -> None:
1876 try:
1877 if _fancy_ui():
1878 print(f"\r\033[K{_live_badge(line)} {line}\n{_chat_prompt({})}", end="", flush=True)
1879 else:
1880 print(f"\n· {line}", flush=True)
1881 except Exception:
1882 return
1883
1884
1885def _resolve_control_job_and_note(db: AgentDB, args: argparse.Namespace) -> tuple[str | None, str, str | None]:
1886 if hasattr(args, "parts"):
1887 parts = [str(part) for part in getattr(args, "parts") or []]
1888 if not parts:
1889 return _default_job_id(db), "", None
1890 for end in range(len(parts), 0, -1):
1891 ref = " ".join(parts[:end])
1892 job = _find_job(db, ref)
1893 if job:
1894 return str(job["id"]), " ".join(parts[end:]).strip(), ref
1895 return None, "", " ".join(parts)
1896 job_ref = _job_ref_text(getattr(args, "job_id", None))
1897 return _resolve_job_id(db, job_ref), _note_text(getattr(args, "note", None)), job_ref
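
When `args.parts` is present, `_resolve_control_job_and_note` tries the longest leading run of words as a job reference first and treats whatever is left over as the note. The sketch below reproduces that longest-prefix split against a plain set of titles. `split_ref_and_note` and the sample titles are assumptions for illustration; the real code looks jobs up through `_find_job`.

```python
from __future__ import annotations


def split_ref_and_note(parts: list[str], titles: set[str]) -> tuple[str | None, str]:
    """Match the longest leading run of words against known titles; the rest is the note."""
    for end in range(len(parts), 0, -1):
        ref = " ".join(parts[:end])
        if ref in titles:
            return ref, " ".join(parts[end:]).strip()
    return None, ""


titles = {"market scan", "market"}
matched = split_ref_and_note(["market", "scan", "skip", "paywalled", "sites"], titles)
unmatched = split_ref_and_note(["unknown", "job"], titles)
```

Trying the longest prefix first means a multi-word title such as `market scan` wins over a shorter title that happens to share its first word.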
1898
1899
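def _pid_is_alive(pid: int) -> bool:
    # Signal 0 runs the existence and permission checks without delivering a signal.
    try:
        os.kill(pid, 0)
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    except OSError:
        return False
    return True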
1906
1907
1908def _step_by_id(db: AgentDB, job_id: str, step_id: str) -> dict[str, Any] | None:
1909 for step in db.list_steps(job_id=job_id):
1910 if step["id"] == step_id:
1911 return step
1912 return None
1913
1914
def _step_count(steps: list[dict[str, Any]]) -> int:
    """Return the highest recorded step number, or 0 when there are no steps."""
    numbers = [int(step.get("step_no") or 0) for step in steps]
    return max(numbers, default=0)
1918
1919
1920def _job_lessons(job: dict[str, Any]) -> list[dict[str, Any]]:
1921 return _metadata_records(job, "lessons")
1922
1923
1924def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
1925 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1926 values = metadata.get(key) if isinstance(metadata.get(key), list) else []
1927 return [entry for entry in values if isinstance(entry, dict)]
1928
1929
1930def _print_lessons(job: dict[str, Any], *, limit: int, chars: int) -> None:
1931 lessons = _job_lessons(job)
1932 print(f"lessons {job['title']}")
1933 print(_rule("="))
1934 if not lessons:
1935 print("none yet")
1936 print("add one with: learn this source is not useful for the current objective")
1937 return
1938 for index, lesson in enumerate(lessons[-limit:], start=max(1, len(lessons) - limit + 1)):
1939 category = lesson.get("category") or "memory"
1940 confidence = lesson.get("confidence")
1941 suffix = f" | confidence {confidence:g}" if isinstance(confidence, (int, float)) else ""
1942 print(f"{index:>2}. {category}{suffix}")
1943 print(f" {_one_line(lesson.get('lesson') or '', chars)}")
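
`_print_lessons` shows only the last `limit` lessons but numbers them by their position in the full list. A small sketch of that numbering follows. `numbered_tail` is a hypothetical helper, and the sketch assumes `limit` is at least 1.

```python
def numbered_tail(items: list[str], limit: int) -> list[tuple[int, str]]:
    """Return the last `limit` items paired with their 1-based position in the full list."""
    start = max(1, len(items) - limit + 1)
    return list(enumerate(items[-limit:], start=start))


tail = numbered_tail(["a", "b", "c", "d", "e"], limit=2)
short = numbered_tail(["a"], limit=5)
```

Note that a `limit` of 0 would not yield an empty result, because the slice `items[-0:]` is the same as `items[0:]` and returns the whole list.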
1944
1945
1946def _resolve_artifact_ref(
1947 db: AgentDB,
1948 config: Any,
1949 query: str | None,
1950 *,
1951 job_id: str | None = None,
1952) -> dict[str, Any] | None:
1953 if not query:
1954 return None
1955 ref = query.strip()
1956 path = Path(ref).expanduser()
1957 if path.exists():
1958 return {"path": str(path), "title": path.name, "summary": ""}
1959
1960 ref_lower = ref.lower()
1961 focused_artifacts = db.list_artifacts(job_id, limit=250) if job_id else []
1962 if focused_artifacts and ref_lower in {"latest", "last", "newest"}:
1963 return focused_artifacts[0]
1964 index_ref = ref_lower[1:] if ref_lower.startswith("#") else ref_lower
1965 if focused_artifacts and index_ref.isdigit():
1966 index = int(index_ref)
1967 if 1 <= index <= len(focused_artifacts):
1968 return focused_artifacts[index - 1]
1969
1970 jobs = db.list_jobs()
1971 ordered_jobs = []
1972 if job_id:
1973 try:
1974 selected = db.get_job(job_id)
1975 ordered_jobs.append(selected)
1976 except KeyError:
1977 pass
1978 ordered_jobs.extend(job for job in jobs if not job_id or job["id"] != job_id)
1979 artifacts: list[dict[str, Any]] = []
1980 for job in ordered_jobs:
1981 artifacts.extend(db.list_artifacts(job["id"], limit=250))
1982
1983 for artifact in artifacts:
1984 if str(artifact["id"]).lower() == ref_lower:
1985 return artifact
1986 for artifact in artifacts:
1987 title = str(artifact.get("title") or "")
1988 if title.lower() == ref_lower:
1989 return artifact
1990 for artifact in artifacts:
1991 haystack = " ".join(str(artifact.get(key) or "") for key in ("title", "summary", "type")).lower()
1992 if ref_lower in haystack:
1993 return artifact
1994
1995 store = ArtifactStore(config.runtime.home, db=db)
1996 search_job_ids = [job_id] if job_id else [str(job["id"]) for job in ordered_jobs]
1997 for candidate_job_id in search_job_ids:
1998 if not candidate_job_id:
1999 continue
2000 for result in store.search_text(job_id=candidate_job_id, query=ref, limit=1):
2001 try:
2002 return db.get_artifact(str(result["id"]))
2003 except KeyError:
2004 continue
2005 return None
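
After the path, `latest`, and index shortcuts, `_resolve_artifact_ref` works through progressively looser matches: exact id, then exact title, then a substring of title, summary, or type. The sketch below isolates those three tiers over a plain list of dicts. `resolve` and the sample artifacts are made up for this example and leave out the database and full-text search fallback.

```python
from __future__ import annotations


def resolve(ref: str, artifacts: list[dict]) -> dict | None:
    """Resolve a reference by exact id, then exact title, then substring match."""
    ref_lower = ref.strip().lower()
    for artifact in artifacts:
        if str(artifact["id"]).lower() == ref_lower:
            return artifact
    for artifact in artifacts:
        if str(artifact.get("title") or "").lower() == ref_lower:
            return artifact
    for artifact in artifacts:
        haystack = " ".join(str(artifact.get(key) or "") for key in ("title", "summary", "type")).lower()
        if ref_lower in haystack:
            return artifact
    return None


artifacts = [
    {"id": "art_1", "title": "Pricing notes", "summary": "report draft", "type": "text"},
    {"id": "art_2", "title": "report", "summary": "", "type": "text"},
]
by_title = resolve("report", artifacts)
by_substring = resolve("draft", artifacts)
by_id = resolve("ART_1", artifacts)
```

Running each tier over the whole list before moving to the next is what lets the exact title `report` win here, even though an earlier artifact mentions the word in its summary.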
2006
2007
2008def cmd_logs(args: argparse.Namespace) -> None:
2009 db, _ = _db()
2010 try:
2011 job_id = _resolve_job_id(db, args.job_id)
2012 if not job_id:
2013 ref = _job_ref_text(args.job_id)
2014 print(f"No job matched: {ref}" if ref else "No jobs found.")
2015 return
2016 job = db.get_job(job_id)
2017 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2018 print(f"{job['title']}\tstate {_job_display_state(job, bool(daemon['running']))}\t{job['kind']}")
2019 print()
2020 print("Runs")
2021 for run in db.list_runs(job_id, limit=args.limit):
2022 error = f"\tERROR {run['error']}" if run.get("error") else ""
2023 print(f"{run['started_at']}\t{run['status']}\t{run['id']}\t{run.get('model') or ''}{error}")
2024 print()
2025 print("Steps")
2026 steps = db.list_steps(job_id=job_id)[-args.limit :]
2027 if not steps:
2028 print("No steps recorded.")
2029 for step in steps:
2030 if args.verbose:
2031 _print_step(step, verbose=True, chars=args.chars)
2032 else:
2033 tool = step.get("tool_name") or "-"
2034 summary = _one_line(_clean_step_summary(step.get("summary") or ""), args.chars)
2035 error = f"\tERROR {step['error']}" if step.get("error") else ""
2036 print(
2037 f"#{step['step_no']}\t{step['started_at']}\t{step['status']}\t{step['kind']}\t{tool}\t{summary}{error}"
2038 )
2039 print()
2040 print("Artifacts")
2041 artifacts = db.list_artifacts(job_id, limit=args.limit)
2042 if not artifacts:
2043 print("No artifacts recorded.")
2044 for artifact in artifacts:
2045 print(
2046 f"{artifact['created_at']}\t{artifact['type']}\t{artifact.get('title') or artifact['id']}\t{artifact['path']}"
2047 )
2048 finally:
2049 db.close()
2050
2051
2052def cmd_activity(args: argparse.Namespace) -> None:
2053 db, _ = _db()
2054 seen_events: set[str] = set()
2055 try:
2056 job_id = _resolve_job_id(db, args.job_id)
2057 if not job_id:
2058 print("No jobs found.")
2059 return
2060 job = db.get_job(job_id)
2061 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2062 print(f"activity {job['title']} | state {_job_display_state(job, bool(daemon['running']))}")
2063 print("tool calls, artifacts, learning, and messages, oldest to newest")
2064 print(_rule("-"))
2065
2066 def emit() -> None:
2067 events = db.list_timeline_events(job_id, limit=args.limit)
2068 printed = False
2069 for event in events:
2070 event_id = str(event.get("id") or "")
2071 if event_id in seen_events:
2072 continue
2073 print(_event_line(event, chars=args.chars, full=args.verbose))
2074 if args.verbose:
2075 _print_event_details(event, chars=args.chars)
2076 if args.paths and event.get("event_type") == "artifact":
2077 metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
2078 if metadata.get("path"):
2079 print(f" path: {metadata['path']}")
2080 seen_events.add(event_id)
2081 printed = True
2082 if printed:
2083 print(_rule("-"))
2084
2085 emit()
2086 while args.follow:
2087 time.sleep(args.interval)
2088 emit()
2089 except KeyboardInterrupt:
2090 print("\nactivity stopped")
2091 finally:
2092 db.close()
2093
2094
def cmd_updates(args: argparse.Namespace) -> None:
    db, config = _db()
    try:
        if getattr(args, "all", False):
            print(
                "\n".join(
                    render_all_updates_report(
                        db,
                        config,
                        limit=args.limit,
                        chars=args.chars,
                        paths=args.paths,
                    )
                )
            )
            return
        job_id = _resolve_job_id(db, args.job_id)
        if not job_id:
            # Name the unmatched reference, as the other job-scoped commands do.
            ref = _job_ref_text(args.job_id)
            print(f"No job matched: {ref}" if ref else "No jobs found.")
            return
        print(
            "\n".join(
                render_updates_report(
                    db,
                    config,
                    job_id,
                    limit=args.limit,
                    chars=args.chars,
                    paths=args.paths,
                )
            )
        )
    finally:
        db.close()
2129
2130
2131def cmd_watch(args: argparse.Namespace) -> None:
2132 db, _ = _db()
2133 seen_runs: set[str] = set()
2134 seen_steps: set[str] = set()
2135 seen_artifacts: set[str] = set()
2136 try:
2137 job_id = _resolve_job_id(db, args.job_id)
2138 if not job_id:
2139 print(f"No job matched: {_job_ref_text(args.job_id)}")
2140 return
2141 job = db.get_job(job_id)
2142 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2143 print(f"watching {job['title']} | state {_job_display_state(job, bool(daemon['running']))} | {job['kind']}")
2144 print(f"objective: {job['objective']}")
2145 print(
2146 "Note: this shows model-visible state, tool calls, outputs, and errors. It does not expose hidden chain-of-thought."
2147 )
2148 print()
2149
2150 def emit_snapshot(*, initial: bool = False) -> None:
2151 nonlocal job
2152 job = db.get_job(job_id)
2153 runs = list(reversed(db.list_runs(job_id, limit=args.limit)))
2154 steps = db.list_steps(job_id=job_id)[-args.limit :]
2155 artifacts = list(reversed(db.list_artifacts(job_id, limit=args.limit)))
2156 printed = False
2157 for run in runs:
2158 if run["id"] in seen_runs:
2159 continue
2160 if not initial:
2161 print()
2162 _print_run(run)
2163 seen_runs.add(run["id"])
2164 printed = True
2165 for step in steps:
2166 if step["id"] in seen_steps:
2167 continue
2168 if not initial and not printed:
2169 print()
2170 _print_step(step, verbose=args.verbose, chars=args.chars)
2171 seen_steps.add(step["id"])
2172 printed = True
2173 for artifact in artifacts:
2174 if artifact["id"] in seen_artifacts:
2175 continue
2176 if not initial and not printed:
2177 print()
2178 _print_artifact(artifact)
2179 seen_artifacts.add(artifact["id"])
2180 printed = True
2181 if printed:
2182 print(f"status: {job['status']}")
2183
2184 emit_snapshot(initial=True)
2185 while args.follow:
2186 time.sleep(args.interval)
2187 emit_snapshot()
2188 except KeyboardInterrupt:
2189 print("\nwatch stopped")
2190 finally:
2191 db.close()
2192
2193
def cmd_run_one(args: argparse.Namespace) -> None:
    from nipux_cli.worker import run_one_step

    db, config = _db()
    try:
        job_id = _resolve_job_id(db, args.job_id)
        if not job_id:
            print(f"No job matched: {_job_ref_text(args.job_id)}")
            return
        if not args.fake and not _model_setup_verified(config):
            _ensure_model_setup_verified_for_workspace()
            return
        _activate_job_if_planning(db, job_id)
        llm = None
        if args.fake:
            from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall

            # Scripted model that emits a single write_artifact call, so no real model is needed.
            llm = ScriptedLLM(
                [
                    LLMResponse(
                        tool_calls=[
                            ToolCall(
                                name="write_artifact",
                                arguments={
                                    "title": "fake-step",
                                    "type": "text",
                                    "summary": "Fake one-step smoke artifact",
                                    "content": "This is a fake bounded worker step.",
                                },
                            )
                        ]
                    )
                ]
            )
        result = run_one_step(job_id, config=config, db=db, llm=llm)
        # Match cmd_work: fall back to _json_default for values json cannot encode natively.
        print(json.dumps(result.__dict__, ensure_ascii=False, indent=2, default=_json_default))
    finally:
        db.close()
2232
2233
2234def cmd_work(args: argparse.Namespace) -> None:
2235 from nipux_cli.worker import run_one_step
2236
2237 db, config = _db()
2238 try:
2239 job_id = _resolve_job_id(db, args.job_id)
2240 if not job_id:
2241 print('No jobs found. Create one with: nipux create "objective"')
2242 return
2243 if not args.fake and not _model_setup_verified(config):
2244 _ensure_model_setup_verified_for_workspace()
2245 return
2246 _activate_job_if_planning(db, job_id)
2247 job = db.get_job(job_id)
2248 print(f"working {job['title']} | state foreground | {job['kind']}")
2249 print(
2250 "Note: this shows model-visible state, tool calls, outputs, and errors. It does not expose hidden chain-of-thought."
2251 )
2252 print()
2253 for index in range(1, args.steps + 1):
2254 llm = None
2255 if args.fake:
2256 from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall
2257
2258 llm = ScriptedLLM(
2259 [
2260 LLMResponse(
2261 tool_calls=[
2262 ToolCall(
2263 name="write_artifact",
2264 arguments={
2265 "title": f"fake-work-step-{index}",
2266 "type": "text",
2267 "summary": "Fake foreground work step",
2268 "content": f"This is fake foreground work step {index}.",
2269 },
2270 )
2271 ]
2272 )
2273 ]
2274 )
2275 print(f"work step {index}/{args.steps}", flush=True)
2276 result = run_one_step(job_id, config=config, db=db, llm=llm)
2277 step = _step_by_id(db, job_id, result.step_id)
2278 if step:
2279 _print_step(step, verbose=args.verbose, chars=args.chars)
2280 else:
2281 print(json.dumps(result.__dict__, ensure_ascii=False, indent=2, default=_json_default))
2282 if args.dashboard:
2283 state = collect_dashboard_state(db, config, job_id=job_id, limit=args.limit)
2284 print()
2285 print(render_dashboard(state, width=_terminal_width(), chars=args.chars), end="")
2286 if result.status == "failed" and not args.continue_on_error:
2287 print("stopped after failed step; pass --continue-on-error to keep going")
2288 return
2289 if index < args.steps and args.poll_seconds > 0:
2290 time.sleep(args.poll_seconds)
2291 finally:
2292 db.close()
2293
2294
def _pause_job_for_recoverable_provider_preflight(
    db: AgentDB,
    config: Any,
    job_id: str,
    *,
    fake: bool,
    failures: list[str] | None = None,
) -> bool:
    if fake:
        return False
    failures = failures if failures is not None else _recoverable_remote_model_preflight_failures(config)
    if not failures:
        return False
    now = utc_now()
    detail = "; ".join(failures)
    job = db.get_job(job_id)
    already_provider_blocked = job_provider_blocked(job)
    if already_provider_blocked and str(job.get("status") or "") == "paused":
        # Already paused for this reason: only refresh the probe bookkeeping.
        db.update_job_metadata(
            job_id,
            {
                "provider_last_probe_at": now,
                "provider_last_probe_detail": detail[:1000],
                "last_note": "Model provider still unavailable; daemon will check again later.",
            },
        )
        return True
    note = "Model provider is unavailable; daemon will monitor and resume this job when calls succeed."
    # Keep the original block timestamp when the job was already marked as provider-blocked.
    # Guard against missing or non-dict metadata, as the other helpers in this module do.
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    blocked_at = str(metadata.get("provider_blocked_at") or now) if already_provider_blocked else now
    db.update_job_status(
        job_id,
        "paused",
        metadata_patch={
            "last_note": note,
            "provider_blocked_at": blocked_at,
            "provider_last_probe_at": now,
            "provider_last_probe_detail": detail[:1000],
        },
    )
    if not already_provider_blocked:
        db.append_agent_update(
            job_id,
            note,
            category="error",
            metadata={"reason": "llm_provider_blocked", "detail": detail[:1000]},
        )
    return True
2343
2344
2345def cmd_run(args: argparse.Namespace) -> None:
2346 config = load_config()
2347 preflight_failures = [] if args.fake or _model_setup_verified(config) else _remote_model_preflight_failures(config)
2348 preflight_recoverable = _provider_preflight_is_recoverable(preflight_failures)
2349 can_prepare_job = not preflight_failures or preflight_recoverable
2350 requested = _job_ref_text(args.job_id)
2351 if requested:
2352 db, _ = _db()
2353 try:
2354 job = _find_job(db, requested)
2355 if not job:
2356 print(f"No job matched: {requested}")
2357 return
2358 args.job_id = job["id"]
2359 _write_shell_state({"focus_job_id": job["id"]})
2360 if can_prepare_job:
2361 already_provider_blocked = preflight_recoverable and job_provider_blocked(job)
2362 if not already_provider_blocked:
2363 _ensure_job_runnable(db, job["id"])
2364 if preflight_recoverable:
2365 _pause_job_for_recoverable_provider_preflight(
2366 db,
2367 config,
2368 job["id"],
2369 fake=bool(args.fake),
2370 failures=preflight_failures,
2371 )
2372 job = db.get_job(job["id"])
2373 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
2374 print(f"focus set: {job['title']} | job {_job_display_state(job, bool(daemon['running']))}")
2375 finally:
2376 db.close()
2377 else:
2378 db, _ = _db()
2379 try:
2380 job_id = _default_job_id(db)
2381 if job_id:
2382 if can_prepare_job:
2383 job = db.get_job(job_id)
2384 already_provider_blocked = preflight_recoverable and job_provider_blocked(job)
2385 if not already_provider_blocked:
2386 _ensure_job_runnable(db, job_id)
2387 if preflight_recoverable:
2388 _pause_job_for_recoverable_provider_preflight(
2389 db,
2390 config,
2391 job_id,
2392 fake=bool(args.fake),
2393 failures=preflight_failures,
2394 )
2395 else:
2396 print("No jobs found. Create one with /new OBJECTIVE.")
2397 return
2398 finally:
2399 db.close()
2400 _start_daemon_if_needed(
2401 poll_seconds=args.poll_seconds,
2402 fake=args.fake,
2403 quiet=args.quiet,
2404 log_file=args.log_file,
2405 )
2406 if args.no_follow:
2407 return
2408 cmd_activity(
2409 argparse.Namespace(
2410 job_id=args.job_id,
2411 limit=args.limit,
2412 chars=args.chars,
2413 follow=True,
2414 interval=args.interval,
2415 verbose=args.verbose,
2416 paths=args.paths,
2417 )
2418 )
2419
2420
2421def cmd_digest(args: argparse.Namespace) -> None:
2422 db, config = _db()
2423 try:
2424 job_id = _resolve_job_id(db, args.job_id)
2425 if not job_id:
2426 print(f"No job matched: {_job_ref_text(args.job_id)}")
2427 return
2428 print(
2429 render_job_digest(
2430 db,
2431 job_id,
2432 model=config.model.model,
2433 base_url=config.model.base_url,
2434 context_length=config.model.context_length,
2435 input_cost_per_million=config.model.input_cost_per_million,
2436 output_cost_per_million=config.model.output_cost_per_million,
2437 ),
2438 end="",
2439 )
2440 finally:
2441 db.close()
2442
2443
2444def cmd_daily_digest(args: argparse.Namespace) -> None:
2445 db, config = _db()
2446 try:
2447 result = write_daily_digest(config, db, day=args.day)
2448 print(json.dumps(result, ensure_ascii=False, indent=2))
2449 finally:
2450 db.close()
2451
2452
2453def cmd_daemon(args: argparse.Namespace) -> None:
2454 config = load_config()
2455 if not _ensure_remote_model_ready_for_worker(config, fake=args.fake):
2456 raise SystemExit(2)
2457 daemon = Daemon.open(config=config)
2458 try:
2459 if args.once:
2460 result = daemon.run_once(fake=args.fake, verbose=args.verbose)
2461 print(json.dumps(result.__dict__ if result else None, ensure_ascii=False, indent=2))
2462 return
2463 daemon.run_forever(fake=args.fake, poll_seconds=args.poll_seconds, quiet=args.quiet, verbose=args.verbose)
2464 except DaemonAlreadyRunning as exc:
2465 raise SystemExit(str(exc)) from exc
2466 finally:
2467 daemon.close()
2468
2469
2470def cmd_doctor(args: argparse.Namespace) -> None:
2471 config = load_config()
2472 checks = run_doctor(config=config, check_model=args.check_model)
2473 for check in checks:
2474 status = "ok" if check.ok else "fail"
2475 print(f"{status}\t{check.name}\t{check.detail}")
2476 ok = all(check.ok for check in checks)
2477 if args.check_model:
2478 if ok:
2479 _mark_model_setup_verified(config)
2480 print("ok\tmodel_setup\tverified for workspace and chat")
2481 else:
2482 _clear_model_setup_verified()
2483 if not ok:
2484 raise SystemExit(1)
2485
2486
2487def _verify_model_setup_from_first_run() -> list[str]:
2488 stream = StringIO()
2489 with redirect_stdout(stream):
2490 try:
2491 cmd_doctor(argparse.Namespace(check_model=True))
2492 except SystemExit as exc:
2493 if exc.code not in (None, 0):
2494 print("Model setup is not ready. Fix the failed check above before creating a job.")
2495 print("Use /base-url URL, /api-key KEY, or /model MODEL here, then run Doctor again.")
2496 print("For a local endpoint, start the local server or change the endpoint.")
2497 lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
2498 return lines[-12:] or ["done"]
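
The function above captures a command's stdout, tolerates `SystemExit`, and returns the last few whitespace-normalized lines. A standalone sketch of that capture pattern follows; `run_check` is a hypothetical stand-in for `cmd_doctor`, and its output lines are made up for illustration:

```python
from contextlib import redirect_stdout
from io import StringIO


def run_check(ok: bool) -> None:
    """Hypothetical command that prints tab-separated results and may exit."""
    print("ok\tconfig\tloaded")
    if not ok:
        print("fail\tmodel\t  endpoint   unreachable")
        raise SystemExit(1)


def capture_tail(ok: bool, *, keep: int = 12) -> list[str]:
    stream = StringIO()
    with redirect_stdout(stream):
        try:
            run_check(ok)
        except SystemExit as exc:
            # A non-zero exit becomes an extra hint line instead of ending the process.
            if exc.code not in (None, 0):
                print("Setup is not ready.")
    # Collapse runs of whitespace (including tabs) and drop blank lines.
    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
    return lines[-keep:] or ["done"]
```

Printing the hint inside the `redirect_stdout` block keeps it in the same captured stream as the command output, so the caller receives one ordered list.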
2499
2500
2501def _chat_handle_line(job_id: str, line: str, *, reply_fn=None) -> bool:
2502 line = line.strip()
2503 if not line:
2504 return True
2505 if line.startswith("chat "):
2506 db, _ = _db()
2507 try:
2508 job = db.get_job(job_id)
2509 print(f"already chatting with {job['title']}; type your message, /run, or /exit")
2510 return True
2511 finally:
2512 db.close()
2513 if line in {"/exit", "/quit", "exit", "quit"}:
2514 return False
2515 if line in {"/help", "help"}:
2516 print("Core workflow:")
2517 print(" /new OBJECTIVE create a job and start work")
2518 print(" /run resume/start the focused job")
2519 print(" /jobs switch or inspect jobs")
2520 print(" /status current job state")
2521 print(" /outcomes durable progress")
2522 print(" /artifacts saved files")
2523 print(" /activity tool calls")
2524 print(" /pause /resume control the focused job")
2525 print()
2526 print("All commands:")
2527 print(" /jobs /focus JOB_TITLE /switch JOB_TITLE /new OBJECTIVE /delete [JOB_TITLE]")
2528 print(" /history /events /activity /outputs /updates /outcomes [all] /status /usage /config /settings /health")
2529 print(" /artifacts /artifact QUERY /findings /tasks /roadmap /experiments /sources /memory /metrics /lessons")
2530 print(" /model MODEL /base-url URL /api-key KEY /api-key-env ENV /context TOKENS")
2531 print(" /input-cost DOLLARS_PER_1M_INPUT_TOKENS /output-cost DOLLARS_PER_1M_OUTPUT_TOKENS")
2532 print(" /browser true|false /web true|false /cli-access true|false /file-access true|false")
2533 print(" /timeout SECONDS /home PATH /step-limit SECONDS /output-chars CHARS /daily-digest BOOL /digest-time HH:MM /doctor")
2534 print(" /run /start /restart /work N /work-verbose N /stop /pause [note] /resume /cancel [note]")
2535 print(" /learn LESSON /note MESSAGE /follow MESSAGE /digest /clear /exit")
2536 print("Plain text gets a model reply and is saved as model-visible steering.")
2537 return True
2538 if line in {"clear", "/clear"}:
2539 print("\033[2J\033[H", end="")
2540 return True
2541 if line == "jobs" or line == "ls" or line.startswith("jobs "):
2542 cmd_jobs(argparse.Namespace())
2543 return True
2544 if line.startswith(("focus ", "switch ")):
2545 parts = shlex.split(line)
2546 cmd_focus(argparse.Namespace(query=parts[1:]))
2547 return True
2548 if line.startswith("/"):
2549 parts = shlex.split(line[1:])
2550 if not parts:
2551 return True
2552 return _handle_chat_slash_command(job_id, parts[0], parts[1:], deps=_chat_command_deps())
2553 if reply_fn is None:
2554 reply_fn = _reply_to_chat
2555 _handle_chat_message(job_id, line, reply_fn=reply_fn)
2556 return True
2557
2558
2559def _handle_chat_message(job_id: str, line: str, *, reply_fn=None, quiet: bool = False) -> tuple[bool, str]:
2560 if not _model_setup_verified(load_config()):
2561 message = (
2562 "Model setup is not verified. Complete setup or run /doctor after configuring a working provider."
2563 )
2564 if not quiet:
2565 print(message)
2566 return True, message
2567 if job_id == WORKSPACE_CHAT_ID:
2568 return _handle_workspace_chat_message(line, quiet=quiet)
2569 return _controller_handle_chat_message(
2570 job_id,
2571 line,
2572 deps=_chat_controller_deps(),
2573 reply_fn=reply_fn,
2574 quiet=quiet,
2575 )
2576
2577
2578def _chat_reply_text_and_metadata(reply: Any) -> tuple[str, dict[str, Any]]:
2579 return _controller_reply_text_and_metadata(reply)
2580
2581
2582def _workspace_chat_events() -> list[dict[str, Any]]:
2583 events = _read_shell_state().get("workspace_chat_events")
2584 if not isinstance(events, list):
2585 return []
2586 return [event for event in events if isinstance(event, dict)][-120:]
2587
2588
2589def _append_workspace_chat_event(event_type: str, title: str, body: str, metadata: dict[str, Any] | None = None) -> None:
2590 events = _workspace_chat_events()
2591 events.append(
2592 {
2593 "id": f"workspace_{len(events) + 1}_{int(time.time() * 1000)}",
2594 "job_id": WORKSPACE_CHAT_ID,
2595 "event_type": event_type,
2596 "created_at": utc_now(),
2597 "title": title,
2598 "body": body,
2599 "metadata": metadata or {},
2600 }
2601 )
2602 _write_shell_state({"workspace_chat_events": events[-120:]})
2603
2604
2605def _handle_workspace_chat_message(line: str, *, quiet: bool = False) -> tuple[bool, str]:
2606 _append_workspace_chat_event("operator_message", "chat", line, {"source": "workspace"})
2607 objective = _extract_job_objective_from_message(line)
2608 if objective:
2609 message = _create_workspace_job_from_chat(line, objective)
2610 _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace"})
2611 if not quiet:
2612 print(message)
2613 return True, message
2614 control_command = chat_control_command(line)
2615 if control_command:
2616 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, control_command)
2617 compact = _compact_command_output(output)
2618 message = " | ".join(compact[-4:]) if compact else f"{control_command.lstrip('/')} done"
2619 _append_workspace_chat_event(
2620 "agent_message",
2621 "chat",
2622 message,
2623 {"source": "workspace", "command": control_command},
2624 )
2625 if not quiet:
2626 print(message)
2627 return keep_running, message
2628 try:
2629 reply = _reply_to_workspace_chat(line)
2630 except Exception as exc:
2631 message = _friendly_error_text(f"{type(exc).__name__}: {exc}")
2632 _append_workspace_chat_event("agent_message", "chat", message, {"source": "workspace", "error": True})
2633 if not quiet:
2634 print(message)
2635 return True, message
2636 reply_text, reply_metadata = _chat_reply_text_and_metadata(reply)
2637 text = reply_text.strip() or "I did not get a usable model reply."
2638 _append_workspace_chat_event("agent_message", "chat", text, {"source": "workspace", **reply_metadata})
2639 if not quiet:
2640 print(text)
2641 return True, text
2642
2643
2644def _create_workspace_job_from_chat(message: str, objective: str) -> str:
2645 refined = _refine_job_objective_for_worker(message=message, objective=objective)
2646 job_id, title = _create_job(objective=refined, title=None, kind="generic", cadence=None)
2647 _write_shell_state({"focus_job_id": job_id})
2648 db, _config = _db()
2649 try:
2650 db.append_operator_message(job_id, message, source="workspace_chat", mode="steer")
2651 db.append_agent_update(
2652 job_id,
2653 "Created from Nipux workspace chat with an expanded long-running objective.",
2654 category="chat",
2655 )
2656 finally:
2657 db.close()
2658 run_now = not message_requests_queued_job(message) or message_requests_immediate_run(message)
2659 text = f"Created worker job: {title}."
2660 if run_now:
2661 if _start_worker_from_chat_context():
2662 text += " Started worker."
2663 else:
2664 text += " Worker is waiting for a working model."
2665 else:
2666 text += " It is queued; tell me to run it when ready."
2667 return text
2668
2669
2670def _refine_job_objective_for_worker(*, message: str, objective: str) -> str:
2671 fallback = _durable_job_objective(objective)
2672 try:
2673 from nipux_cli.llm import OpenAIChatLLM
2674
2675 _db_handle, config = _db()
2676 _db_handle.close()
2677 prompt = [
2678 {
2679 "role": "system",
2680 "content": (
2681 "You rewrite operator requests into strong, generic Nipux worker objectives. "
2682 "Nipux workers are long-running autonomous jobs with browser, web, CLI, file, artifact, "
2683 "memory, roadmap, task, source, finding, and experiment tools. "
2684 "Return only the objective text for the worker. Start with one concise title line, then add "
2685 "clear success criteria, output expectations, constraints, evidence requirements, progress "
2686 "reporting expectations, and instructions to keep improving until no useful progress remains. "
2687 "Do not invent hosts, credentials, accounts, domains, models, or private details that the operator did not provide."
2688 ),
2689 },
2690 {
2691 "role": "user",
2692 "content": f"Operator message:\n{message}\n\nExtracted objective:\n{objective}",
2693 },
2694 ]
2695 refined = OpenAIChatLLM(config.model).complete(messages=prompt).strip()
2696 except Exception:
2697 return fallback
2698 if len(refined) < 20:
2699 return fallback
2700 return refined[:8000]
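
The refinement step above is a "try the model, otherwise use a deterministic fallback" pattern. A minimal sketch with the model call injected as a plain callable; `refine_or_fallback` and the `refine` parameter are hypothetical names, and the thresholds mirror the ones used above:

```python
from typing import Callable


def refine_or_fallback(
    objective: str,
    refine: Callable[[str], str],
    *,
    min_chars: int = 20,
    max_chars: int = 8000,
) -> str:
    # The fallback is computed first so it is available on every failure path.
    fallback = " ".join(str(objective or "").split()) or "Long-running job"
    try:
        refined = refine(objective).strip()
    except Exception:
        return fallback  # provider errors must not block job creation
    if len(refined) < min_chars:
        return fallback  # treat very short replies as unusable
    return refined[:max_chars]
```

Injecting the callable makes the failure paths easy to exercise without a live model endpoint.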
2701
2702
2703def _durable_job_objective(objective: str) -> str:
2704 cleaned = " ".join(str(objective or "").split()).strip() or "Long-running Nipux job"
2705 title = _one_line(cleaned, 96)
2706 return (
2707 f"{title}\n\n"
2708 "Run this as a durable long-running Nipux worker job.\n"
2709 "- Clarify and preserve the operator's actual goal, constraints, and success criteria.\n"
2710 "- Build a roadmap before deep work, then keep the task queue current as evidence changes.\n"
2711 "- Produce concrete outputs as artifacts or files when the work creates something useful.\n"
2712 "- Record findings, sources, lessons, experiments, and measurable results when they apply.\n"
2713 "- Separate activity from progress: report what changed, what was learned, what failed, and what branch is next.\n"
2714 "- Keep improving autonomously until no useful progress remains, the operator pauses the job, or a real blocker needs operator input."
2715 )
2716
2717
2718def _workspace_chat_job_dossier(db: AgentDB, jobs: list[dict[str, Any]], *, limit: int = 8) -> str:
2719 """Compact job context for the left-side workspace chat model."""
2720
2721 if not jobs:
2722 return "No worker jobs yet."
2723 sections: list[str] = []
2724 for index, job in enumerate(jobs[:limit], start=1):
2725 job_id = str(job.get("id") or "")
2726 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2727 counts = _safe_job_counts(db, job_id)
2728 findings = _metadata_records(job, "finding_ledger")
2729 sources = _metadata_records(job, "source_ledger")
2730 tasks = _metadata_records(job, "task_queue")
2731 experiments = _metadata_records(job, "experiment_ledger")
2732 lessons = _metadata_records(job, "lessons")
2733 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
2734 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
2735 open_tasks = sum(1 for task in tasks if str(task.get("status") or "open") in {"open", "active", "blocked"})
2736 artifacts = _safe_list_artifacts(db, job_id, limit=3)
2737 events = _safe_list_events(db, job_id, limit=40)
2738 steps = _safe_list_steps(db, job_id, limit=1)
2739 current_task = _workspace_current_task(tasks)
2740 recent_outcomes = _workspace_recent_outcomes(events, limit=5)
2741 latest_outputs = [
2742 _one_line(str(artifact.get("title") or artifact.get("id") or "saved output"), 160)
2743 for artifact in artifacts[:3]
2744 ]
2745 progress = (
2746 f"actions={counts.get('steps', 0)} outputs={counts.get('artifacts', 0)} "
2747 f"findings={len(findings)} sources={len(sources)} tasks={len(tasks)}/{open_tasks} open "
2748 f"experiments={len(experiments)} lessons={len(lessons)} memory={counts.get('memory', 0)} "
2749 f"roadmap={len(milestones)}"
2750 )
2751 latest_step = _one_line(_step_line(steps[-1]), 180) if steps else "no worker steps yet"
2752 lines = [
2753 f"{index}. {job.get('title') or job_id} | state={job.get('status') or 'unknown'} kind={job.get('kind') or 'generic'}",
2754 f" objective: {_one_line(job.get('objective') or '', 220)}",
2755 f" progress: {progress}",
2756 f" latest step: {latest_step}",
2757 ]
2758 if current_task:
2759 lines.append(f" active task: {current_task}")
2760 if latest_outputs:
2761 lines.append(f" latest outputs: {'; '.join(latest_outputs)}")
2762 if recent_outcomes:
2763 lines.append(f" recent outcomes: {'; '.join(recent_outcomes)}")
2764 sections.append("\n".join(lines))
2765 if len(jobs) > limit:
2766 sections.append(f"... {len(jobs) - limit} more job(s) available.")
2767 return "\n\n".join(sections)
2768
2769
2770def _safe_job_counts(db: AgentDB, job_id: str) -> dict[str, int]:
2771 if not job_id:
2772 return {"steps": 0, "artifacts": 0, "memory": 0, "events": 0}
2773 try:
2774 return db.job_record_counts(job_id)
2775 except Exception:
2776 return {"steps": 0, "artifacts": 0, "memory": 0, "events": 0}
2777
2778
2779def _safe_list_artifacts(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2780 try:
2781 return db.list_artifacts(job_id, limit=limit)
2782 except Exception:
2783 return []
2784
2785
2786def _safe_list_events(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2787 try:
2788 return db.list_timeline_events(job_id, limit=limit)
2789 except Exception:
2790 return []
2791
2792
2793def _safe_list_steps(db: AgentDB, job_id: str, *, limit: int) -> list[dict[str, Any]]:
2794 try:
2795 return db.list_steps(job_id=job_id, limit=limit)
2796 except Exception:
2797 return []
2798
2799
2800def _workspace_current_task(tasks: list[dict[str, Any]]) -> str:
2801 visible = [
2802 task
2803 for task in tasks
2804 if str(task.get("status") or "open") in {"active", "open", "blocked"}
2805 ]
2806 if not visible:
2807 return ""
2808 visible.sort(
2809 key=lambda task: (
2810 {"active": 0, "open": 1, "blocked": 2}.get(str(task.get("status") or "open"), 9),
2811 -int(task.get("priority") or 0),
2812 )
2813 )
2814 task = visible[0]
2815 status = str(task.get("status") or "open")
2816 contract = str(task.get("output_contract") or "")
2817 suffix = f" [{contract}]" if contract else ""
2818 return _one_line(f"{status} {task.get('title') or 'task'}{suffix}", 180)
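
The selection rule above ranks tasks by status (active, then open, then blocked) and breaks ties by higher priority. A small sketch of the same ordering on plain dicts; `pick_current_task` is a hypothetical name, and it returns the task itself rather than a formatted line:

```python
from typing import Any, Optional

STATUS_RANK = {"active": 0, "open": 1, "blocked": 2}


def pick_current_task(tasks: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    # A missing status is treated as "open", matching the default used above.
    visible = [task for task in tasks if str(task.get("status") or "open") in STATUS_RANK]
    if not visible:
        return None
    return min(
        visible,
        key=lambda task: (
            STATUS_RANK[str(task.get("status") or "open")],
            -int(task.get("priority") or 0),  # negate so higher priority sorts first
        ),
    )
```

`min` returns the first minimal element, which matches taking index 0 after a stable sort.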
2819
2820
2821def _workspace_recent_outcomes(events: list[dict[str, Any]], *, limit: int) -> list[str]:
2822 outcomes: list[str] = []
2823 seen: set[str] = set()
2824 for event in reversed(events):
2825 parsed = _model_update_event_parts(event, width=240, compact=True)
2826 if not parsed:
2827 continue
2828 label, text, _clock = parsed
2829 if label == "DONE":
2830 continue
2831 piece = _one_line(f"{label.lower()} {text}", 180)
2832 if piece in seen:
2833 continue
2834 seen.add(piece)
2835 outcomes.append(piece)
2836 if len(outcomes) >= limit:
2837 break
2838 return outcomes
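
The loop above walks events from the end, skips duplicates, and stops at a limit. A reduced sketch of that newest-first dedupe on plain strings, assuming the input list is ordered oldest first; `recent_unique` is a hypothetical name:

```python
def recent_unique(items: list[str], *, limit: int) -> list[str]:
    picked: list[str] = []
    seen: set[str] = set()
    for item in reversed(items):  # newest entries come first
        if not item or item in seen:
            continue
        seen.add(item)
        picked.append(item)
        if len(picked) >= limit:
            break  # stop early once enough distinct entries are collected
    return picked
```

For example, `recent_unique(["a", "b", "a", "c", "b"], limit=2)` returns `["b", "c"]`: the most recent occurrence of each entry wins.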
2839
2840
2841def _reply_to_workspace_chat(message: str) -> Any:
2842 from nipux_cli.llm import OpenAIChatLLM
2843
2844 db, config = _db()
2845 try:
2846 jobs = db.list_jobs()[:12]
2847 job_dossier = _workspace_chat_job_dossier(db, jobs)
2848 workspace_events = _workspace_chat_events()[-12:]
2849 history_lines = [
2850 f"- {event.get('event_type')} {event.get('title')}: {_one_line(event.get('body') or '', 220)}"
2851 for event in workspace_events
2852 ]
2853 messages = [
2854 {
2855 "role": "system",
2856 "content": (
2857 "You are Nipux, the workspace chat model for a generic long-running agent CLI. "
2858 "Your job is to help the operator create, start, inspect, pause, resume, and steer worker jobs. "
2859 "You know the CLI concepts: jobs are long-running workers; artifacts are saved outputs; outcomes summarize durable progress; "
2860 "the updates page shows durable worker outcomes; the jobs page shows state, outputs, tasks, memory, findings, sources, experiments, and cost. "
2861 "Answer job-status questions from the job dossier. Mention concrete outputs, tasks, measurements, sources, blockers, and next branches when present. "
2862 "When the operator asks you to do new work, explain that Nipux will spin up a worker job; the harness will create the job from plain language. "
2863 "Keep replies concise, concrete, and operator-facing. Do not expose hidden chain-of-thought."
2864 ),
2865 },
2866 {
2867 "role": "user",
2868 "content": (
2869 f"Model: {config.model.model}\n"
2870 f"Endpoint: {config.model.base_url}\n"
2871 f"Tools: browser={config.tools.browser}, web={config.tools.web}, CLI={config.tools.shell}, files={config.tools.files}\n\n"
2872 f"Job dossier:\n{job_dossier}\n\n"
2873 f"Recent workspace chat:\n{chr(10).join(history_lines) or 'None yet.'}\n\n"
2874 f"Operator message:\n{message}"
2875 ),
2876 },
2877 ]
2878 finally:
2879 db.close()
2880 return OpenAIChatLLM(config.model).complete_response(messages=messages)
2881
2882
2883def _handle_chat_control_intent(job_id: str, line: str, *, quiet: bool = False) -> tuple[bool, str] | None:
2884 return _controller_handle_chat_control_intent(job_id, line, deps=_chat_controller_deps(), quiet=quiet)
2885
2886
2887def _maybe_spawn_job_from_chat(job_id: str, message: str, *, quiet: bool = False) -> str:
2888 return _controller_maybe_spawn_job_from_chat(job_id, message, deps=_chat_controller_deps(), quiet=quiet)
2889
2890
2891def _queue_chat_note(job_id: str, message: str, *, mode: str = "steer", quiet: bool = False) -> None:
2892 _controller_queue_chat_note(job_id, message, deps=_chat_controller_deps(), mode=mode, quiet=quiet)
2893
2894
2895def _chat_controller_deps() -> ChatControllerDeps:
2896 return ChatControllerDeps(
2897 db_factory=_db,
2898 reply_fn=_reply_to_chat,
2899 create_job=_create_job,
2900 write_shell_state=_write_shell_state,
2901 start_daemon=_start_worker_from_chat_context,
2902 capture_command=_capture_chat_command,
2903 compact_command_output=_compact_command_output,
2904 friendly_error_text=_friendly_error_text,
2905 )
2906
2907
2908def _chat_command_deps() -> ChatCommandDeps:
2909 return ChatCommandDeps(
2910 db_factory=_db,
2911 jobs=cmd_jobs,
2912 history=cmd_history,
2913 events=cmd_events,
2914 logs=cmd_logs,
2915 updates=cmd_updates,
2916 artifacts=cmd_artifacts,
2917 artifact=cmd_artifact,
2918 lessons=cmd_lessons,
2919 findings=cmd_findings,
2920 tasks=cmd_tasks,
2921 roadmap=cmd_roadmap,
2922 experiments=cmd_experiments,
2923 sources=cmd_sources,
2924 memory=cmd_memory,
2925 metrics=cmd_metrics,
2926 activity=cmd_activity,
2927 digest=cmd_digest,
2928 status=cmd_status,
2929 usage=cmd_usage,
2930 handle_setting=_handle_chat_setting_command,
2931 doctor=cmd_doctor,
2932 init=cmd_init,
2933 health=cmd_health,
2934 start=_start_worker_from_chat_namespace,
2935 ensure_job_runnable=_ensure_job_runnable,
2936 run=cmd_run,
2937 restart=cmd_restart,
2938 work=cmd_work,
2939 pause=cmd_pause,
2940 resume=cmd_resume,
2941 cancel=cmd_cancel,
2942 queue_note=_queue_chat_note,
2943 create_job=_create_job,
2944 focus=cmd_focus,
2945 delete=cmd_delete,
2946 )
2947
2948
2949def _reply_to_chat(job_id: str, message: str) -> Any:
2950 from nipux_cli.llm import OpenAIChatLLM
2951
2952 db, config = _db()
2953 try:
2954 job = db.get_job(job_id)
2955 messages = _build_chat_messages(db, job, message)
2956 finally:
2957 db.close()
2958 return OpenAIChatLLM(config.model).complete_response(messages=messages)
2959
2960
2961def cmd_shell(args: argparse.Namespace) -> None:
2962 _install_readline_history()
2963 _print_shell_header()
2964 print()
2965 if args.status:
2966 _print_shell_status(limit=args.limit, chars=args.chars)
2967 while True:
2968 try:
2969 line = input(_shell_prompt())
2970 except EOFError:
2971 print()
2972 return
2973 except KeyboardInterrupt:
2974 print()
2975 continue
2976 if not _run_shell_line(line):
2977 return
2978
2979
2980def _print_shell_header() -> None:
2981 print(NIPUX_BANNER)
2982 print(_rule("="))
2983 print(_shell_summary())
2984 print("Type 'chat' to talk, 'history' or 'artifacts' to inspect output, or plain text to steer.")
2985 print("Trace output is observable state and tool I/O, not hidden chain-of-thought.")
2986 print(_rule("="))
2987
2988
2989def _shell_summary() -> str:
2990 db, config = _db()
2991 try:
2992 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
2993 job_id = _default_job_id(db)
2994 if not job_id:
2995 focus = "no jobs"
2996 else:
2997 job = db.get_job(job_id)
2998 state = _job_display_state(job, bool(daemon["running"]))
2999 focus = f"{job['title']} [job {state} | worker {_worker_label(job, bool(daemon['running']))}]"
3000 daemon_text = "running" if daemon["running"] else "stopped"
3001 return f"daemon: {daemon_text} | model: {config.model.model} | focus: {focus}"
3002 finally:
3003 db.close()
3004
3005
3006def _shell_prompt() -> str:
3007 db, _ = _db()
3008 try:
3009 job_id = _default_job_id(db)
3010 if not job_id:
3011 return "nipux> "
3012 job = db.get_job(job_id)
3013 title = str(job.get("title") or job_id).strip()[:22]
3014 daemon = daemon_lock_status(load_config().runtime.home / "agentd.lock")
3015 worker = _worker_label(job, bool(daemon["running"]))
3016 return f"nipux[{title}:{worker}]> "
3017 except Exception:
3018 return "nipux> "
3019 finally:
3020 db.close()
3021
3022
3023def _install_readline_history() -> None:
3024 try:
3025 import atexit
3026 import readline
3027 except ImportError:
3028 return
3029 config = load_config()
3030 config.ensure_dirs()
3031 history_path = config.runtime.home / "shell_history"
3032 try:
3033 readline.read_history_file(history_path)
3034 except OSError:
3035 pass
3036 atexit.register(readline.write_history_file, history_path)
3037
3038
3039def _print_shell_status(*, limit: int, chars: int) -> None:
3040 db, config = _db()
3041 try:
3042 state = collect_dashboard_state(db, config, limit=limit)
3043 print(render_dashboard(state, width=_terminal_width(), chars=chars), end="")
3044 print()
3045 finally:
3046 db.close()
3047
3048
3049def _print_shell_help() -> None:
3050 _render_shell_help(rule=_rule)
3051
3052
3053def _run_shell_line(line: str) -> bool:
3054 line = line.strip()
3055 if not line:
3056 return True
3057 if line in {"exit", "quit", ":q"}:
3058 return False
3059 if line in {"help", "?", "commands"}:
3060 _print_shell_help()
3061 return True
3062 if line == "clear":
3063 print("\033[2J\033[H", end="")
3064 return True
3065 try:
3066 tokens = shlex.split(line)
3067 except ValueError as exc:
3068 print(f"parse error: {exc}")
3069 return True
3070 if tokens and tokens[0] == "nipux":
3071 tokens = tokens[1:]
3072 if not tokens:
3073 return True
3074 natural = natural_command_for(" ".join(tokens))
3075 if natural:
3076 tokens = [natural]
3077 if tokens[0] == "ls":
3078 tokens[0] = "jobs"
3079 if tokens[0] == "new":
3080 tokens[0] = "create"
3081 if tokens[0] == "focus" and len(tokens) > 1 and tokens[1].lower() in {"on", "more", "only"}:
3082 _steer_default_job(line)
3083 return True
3084 if tokens[0] not in SHELL_COMMAND_NAMES and tokens[0] not in SHELL_BUILTINS:
3085 _steer_default_job(line)
3086 return True
3087 try:
3088 parser = build_parser()
3089 parsed = parser.parse_args(tokens)
3090 if parsed.func is cmd_shell:
3091 print("already in nipux shell")
3092 return True
3093 parsed.func(parsed)
3094 except SystemExit as exc:
3095 code = 0 if exc.code is None else (exc.code if isinstance(exc.code, int) else 1)
3096 if code:
3097 print(f"command exited with status {code}")
3098 return True
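
The shell loop above relies on catching `SystemExit` so that an argparse error or a failing command does not end the interactive session. A self-contained sketch of that dispatch pattern follows; the `demo` parser and its single `echo` subcommand are hypothetical, and unlike the code above this sketch treats an exit code of `None` as success:

```python
import argparse
import shlex


def build_demo_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command", required=True)
    echo = sub.add_parser("echo")
    echo.add_argument("words", nargs="*")
    echo.set_defaults(func=lambda args: print(" ".join(args.words)))
    return parser


def run_line(line: str) -> bool:
    """Run one shell line; return False only when the session should end."""
    line = line.strip()
    if line in {"exit", "quit"}:
        return False
    try:
        tokens = shlex.split(line)
    except ValueError as exc:  # e.g. an unterminated quote
        print(f"parse error: {exc}")
        return True
    if not tokens:
        return True
    try:
        parsed = build_demo_parser().parse_args(tokens)
        parsed.func(parsed)
    except SystemExit as exc:
        # argparse exits with 2 on bad input; None means a clean exit.
        if exc.code is None:
            code = 0
        else:
            code = exc.code if isinstance(exc.code, int) else 1
        if code:
            print(f"command exited with status {code}")
    return True
```

Returning a boolean keeps the read loop trivial: it continues while `run_line` returns `True`.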
3099
3100
3101def _steer_default_job(message: str) -> None:
3102 db, _ = _db()
3103 try:
3104 job_id = _default_job_id(db)
3105 if not job_id:
3106 print('No focused job. Create one first, or run: create "objective"')
3107 return
3108 job = db.get_job(job_id)
3109 entry = db.append_operator_message(job_id, message, source="shell")
3110 print(f"waiting for {job['title']}: {entry['message']}")
3111 print("Waiting for the next worker step.")
3112 finally:
3113 db.close()
3114
3115
3116def build_parser() -> argparse.ArgumentParser:
3117 return build_arg_parser(
3118 handlers={
3119 "init": cmd_init,
3120 "update": cmd_update,
3121 "uninstall": cmd_uninstall,
3122 "create": cmd_create,
3123 "jobs": cmd_jobs,
3124 "focus": cmd_focus,
3125 "rename": cmd_rename,
3126 "delete": cmd_delete,
3127 "chat": cmd_chat,
3128 "shell": cmd_shell,
3129 "steer": cmd_steer,
3130 "pause": cmd_pause,
3131 "resume": cmd_resume,
3132 "cancel": cmd_cancel,
3133 "status": cmd_status,
3134 "health": cmd_health,
3135 "history": cmd_history,
3136 "events": cmd_events,
3137 "dashboard": cmd_dashboard,
3138 "start": cmd_start,
3139 "stop": cmd_stop,
3140 "restart": cmd_restart,
3141 "browser_dashboard": cmd_browser_dashboard,
3142 "autostart": cmd_autostart,
3143 "service": cmd_service,
3144 "artifacts": cmd_artifacts,
3145 "artifact": cmd_artifact,
3146 "lessons": cmd_lessons,
3147 "learn": cmd_learn,
3148 "findings": cmd_findings,
3149 "tasks": cmd_tasks,
3150 "roadmap": cmd_roadmap,
3151 "experiments": cmd_experiments,
3152 "sources": cmd_sources,
3153 "memory": cmd_memory,
3154 "metrics": cmd_metrics,
3155 "usage": cmd_usage,
3156 "logs": cmd_logs,
3157 "activity": cmd_activity,
3158 "updates": cmd_updates,
3159 "watch": cmd_watch,
3160 "run_one": cmd_run_one,
3161 "work": cmd_work,
3162 "run": cmd_run,
3163 "digest": cmd_digest,
3164 "daily_digest": cmd_daily_digest,
3165 "daemon": cmd_daemon,
3166 "doctor": cmd_doctor,
3167 },
3168 version=__version__,
3169 default_context_length=DEFAULT_CONTEXT_LENGTH,
3170 )
3171
3172
3173def main(argv: list[str] | None = None) -> None:
3174 argv = sys.argv[1:] if argv is None else argv
3175 try:
3176 if not argv:
3177 cmd_home(argparse.Namespace(history_limit=12))
3178 return
3179 parser = build_parser()
3180 args = parser.parse_args(argv)
3181 args.func(args)
3182 except KeyboardInterrupt:
3183 print()
3184 return
3185
3186
3187if __name__ == "__main__":
3188 main()
nipux_cli/cli_help.py 92 lines
1"""Help text and static branding for the Nipux command console."""
2
3from __future__ import annotations
4
5from typing import Callable
6
7
8NIPUX_BANNER = r"""
9 _ _ _ ____ _ ___
10| \ | (_)_ __ _ ___ _/ ___| | |_ _|
11| \| | | '_ \| | | \ \/ / | | | | |
12| |\ | | |_) | |_| |> <| |___| |___ | |
13|_| \_|_| .__/ \__,_/_/\_\\____|_____|___|
14 |_|
15""".strip("\n")
16
17
18def print_shell_help(*, rule: Callable[[str], str]) -> None:
19 print(NIPUX_BANNER)
20 print(rule("="))
21 _print_group(
22 "Jobs",
23 (
24 'create "objective" --title TITLE',
25 "ls",
26 "focus [JOB_TITLE]",
27 "rename JOB_TITLE --title NEW_TITLE",
28 "delete JOB_TITLE",
29 "chat [JOB_TITLE]",
30 "steer [--job JOB_TITLE] MESSAGE",
31 "pause [JOB_TITLE] [note...]",
32 "resume [JOB_TITLE]",
33 "cancel [JOB_TITLE] [note...]",
34 ),
35 )
36 _print_group(
37 "Inspect",
38 (
39 "status [JOB_TITLE]",
40 "health",
41 "history [JOB_TITLE]",
42 "events [JOB_TITLE] [--follow] [--json]",
43 "activity [JOB_TITLE] [--follow]",
44 "updates [JOB_TITLE]",
45 "outputs [JOB_TITLE] --verbose",
46 "findings [JOB_TITLE]",
47 "tasks [JOB_TITLE]",
48 "roadmap [JOB_TITLE]",
49 "experiments [JOB_TITLE]",
50 "sources [JOB_TITLE]",
51 "memory [JOB_TITLE]",
52 "metrics [JOB_TITLE]",
53 "usage [JOB_TITLE]",
54 "artifacts [JOB_TITLE]",
55 "artifact QUERY_OR_TITLE",
56 "lessons [JOB_TITLE]",
57 ),
58 )
59 _print_group(
60 "Worker",
61 (
62 "work [JOB_TITLE] --steps N [--verbose]",
63 "run [JOB_TITLE] --poll-seconds N",
64 "start --poll-seconds N",
65 "restart --poll-seconds N",
66 "stop # daemon",
67 "stop [JOB_TITLE] # pause job",
68 ),
69 )
70 _print_group(
71 "System",
72 (
73 "learn [--job JOB_TITLE] LESSON",
74 "digest JOB_TITLE",
75 "daily-digest",
76 "update",
77 "service install|status|uninstall",
78 "autostart install|status|uninstall",
79 "dashboard [JOB_TITLE] --no-follow",
80 "doctor --check-model",
81 "browser-dashboard --port 4848",
82 "help",
83 "exit",
84 ),
85 )
86
87
88def _print_group(title: str, commands: tuple[str, ...]) -> None:
89 print(title)
90 for command in commands:
91 print(f" {command}")
92 print()
nipux_cli/cli_render.py 280 lines
1"""Reusable text renderers for non-frame CLI commands."""
2
3from __future__ import annotations
4
5import json
6import os
7import shutil
8import textwrap
9from pathlib import Path
10from typing import Any
11
12from nipux_cli.event_render import event_display_parts
13from nipux_cli.tui_event_format import clean_step_summary
14from nipux_cli.tui_status import job_display_state, worker_label
15from nipux_cli.tui_style import _accent, _event_badge, _fancy_ui, _muted, _one_line, _status_badge
16
17
18def clip_json(value: Any, limit: int) -> str:
19 text = json.dumps(value, ensure_ascii=False, indent=2, sort_keys=True)
20 if len(text) <= limit:
21 return text
22 return text[:limit] + f"\n... truncated {len(text) - limit} chars"
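
A short usage sketch for `clip_json`: the function is reproduced here so the example runs on its own, and the sample payloads are made up for illustration.

```python
import json
from typing import Any


def clip_json(value: Any, limit: int) -> str:
    text = json.dumps(value, ensure_ascii=False, indent=2, sort_keys=True)
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n... truncated {len(text) - limit} chars"


# Small values pass through unchanged, with keys sorted for stable output.
small = clip_json({"b": 1, "a": 2}, 1000)

# Large values are cut at the character limit and gain a one-line notice.
large = clip_json({"key": "x" * 100}, 20)
```

The cut is character-based, so a truncated result is generally not valid JSON; it is intended for display only.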
23
24
25def print_step(step: dict[str, Any], *, verbose: bool = False, chars: int = 4000) -> None:
26 tool = step.get("tool_name") or "-"
27 summary = _one_line(clean_step_summary(step.get("summary") or ""), chars)
28 error = _one_line(step["error"], chars) if step.get("error") else ""
29 print(f"step #{step['step_no']} {step['started_at']} {step['status']} {step['kind']} {tool}")
30 if summary:
31 print(f" summary: {summary}")
32 if error:
33 print(f" error: {error}")
34 output_data = step.get("output") or {}
35 if not verbose and isinstance(output_data, dict):
36 artifact_id = output_data.get("artifact_id")
37 if artifact_id:
38 print(f" artifact: {artifact_id} (view with: artifact {artifact_id})")
39 lesson = output_data.get("lesson") if isinstance(output_data.get("lesson"), dict) else None
40 if lesson:
41 print(f" lesson: {_one_line(lesson.get('lesson') or '', chars)}")
42 update = output_data.get("update") if isinstance(output_data.get("update"), dict) else None
43 if update:
44 print(f" update: {_one_line(update.get('message') or '', chars)}")
45 source = output_data.get("source") if isinstance(output_data.get("source"), dict) else None
46 if source:
47 print(f" source: {_one_line(source.get('source') or '', chars)} score={source.get('usefulness_score')}")
48 if isinstance(output_data.get("findings"), list):
49 print(f" findings: {output_data.get('added', 0)} new, {output_data.get('updated', 0)} updated")
50 checkpoint = output_data.get("auto_checkpoint") if isinstance(output_data.get("auto_checkpoint"), dict) else None
51 if checkpoint:
52 print(f" auto checkpoint: {checkpoint.get('artifact_id')}")
53 if verbose:
54 input_data = step.get("input") or {}
55 if input_data:
56 print(" input:")
57 print(clip_json(input_data, chars))
58 if output_data:
59 print(" output:")
60 print(clip_json(output_data, chars))
61
62
63def print_artifact(artifact: dict[str, Any]) -> None:
64 title = artifact.get("title") or artifact["id"]
65 print(f"artifact {artifact['created_at']} {artifact['type']} {title}")
66 print(f" {artifact['path']}")
67
68
69def print_run(run: dict[str, Any]) -> None:
70 print(f"run {run['started_at']} {run['status']} {run['id']} {run.get('model') or ''}")
71 if run.get("error"):
72 print(f" error: {run['error']}")
73
74
75def print_wrapped(prefix: str, text: Any, *, width: int, subsequent_indent: str = "") -> None:
76 content = " ".join(str(text).split())
77 if not content:
78 print(prefix.rstrip())
79 return
80 available = max(20, min(width, 96) - len(prefix))
81 wrapped = textwrap.wrap(content, width=available) or [content]
82 print(prefix + wrapped[0])
83 for line in wrapped[1:]:
84 print(subsequent_indent + line)
85
86
87def section_title(title: str, subtitle: str = "") -> str:
88 text = title.upper()
89 if subtitle:
90 text = f"{text} - {_one_line(subtitle, 52)}"
91 width = min(terminal_width(), 96)
92 if len(text) >= width - 2:
93 return text[:width]
94 if _fancy_ui():
95 return _accent(f"╭─ {text} " + "─" * max(0, width - len(text) - 4))
96 return f"{text} " + "-" * max(0, width - len(text) - 1)
97
98
99def print_metric_grid(items: list[tuple[str, Any]]) -> None:
100 width = min(terminal_width(), 96)
101 cell_width = 24 if width >= 80 else 18
102 cells = [f"{label:<12} {value}"[:cell_width].ljust(cell_width) for label, value in items]
103 columns = max(1, width // cell_width)
104 for start in range(0, len(cells), columns):
105 print(" " + " ".join(cells[start : start + columns]).rstrip())
106
107
108def short_path(path: Path | str, *, max_width: int = 80) -> str:
109 text = str(path)
110 home = str(Path.home())
111 if text.startswith(home + os.sep):
112 text = "~" + text[len(home) :]
113 if len(text) <= max_width:
114 return text
115 keep = max(12, max_width - 4)
116 return "..." + text[-keep:]
117
118
119def print_jobs_panel(jobs: list[dict[str, Any]], *, focused_job_id: str, daemon_running: bool) -> None:
120 print(section_title("Jobs"))
121 if not jobs:
122 print(" No jobs yet. Type an objective or use /new OBJECTIVE.")
123 return
124 print(" # job state worker kind")
125 for index, item in enumerate(jobs[:8], start=1):
126 marker = "*" if str(item.get("id")) == focused_job_id else " "
127 state = job_display_state(item, daemon_running)
128 worker = worker_label(item, daemon_running)
129 title = _one_line(item.get("title") or item.get("id") or "job", 27)
130 print(f" {marker}{index:<2} {title:<27} {_status_badge(state):<11} {_status_badge(worker):<11} {item.get('kind') or ''}")
131 if len(jobs) > 8:
132 print(f" ... {len(jobs) - 8} more. Use /jobs for the full list.")
133 print(" switch: /focus JOB_TITLE")
134
135
136def next_operator_action(job: dict[str, Any], daemon_running: bool) -> str:
137 status = str(job.get("status") or "")
138 if status == "planning":
139 return "review the plan, or run when ready"
140 if status == "cancelled":
141 return "resume to reopen this job, or delete it"
142 if status == "paused":
143 return "resume, then run to continue"
144 if status in {"queued", "running"} and not daemon_running:
145 return "run to start background work"
146 if status in {"queued", "running"} and daemon_running:
147 return "daemon is active; live steps will stream here"
148 if status == "completed":
149 return "inspect history or artifacts"
150 if status == "failed":
151 return "resume, then run one worker step to test recovery"
152 return ""
153
154
155def important_startup_events(events: list[dict[str, Any]], *, limit: int) -> list[dict[str, Any]]:
156 if len(events) <= limit:
157 return events
158 important_types = {
159 "operator_message",
160 "agent_message",
161 "artifact",
162 "finding",
163 "task",
164 "experiment",
165 "lesson",
166 "reflection",
167 "error",
168 "compaction",
169 }
170 selected: list[dict[str, Any]] = []
171 for event in reversed(events):
172 if event.get("event_type") in important_types:
173 selected.append(event)
174 if len(selected) >= limit:
175 break
176 if len(selected) < limit:
177 for event in reversed(events):
178 if event not in selected:
179 selected.append(event)
180 if len(selected) >= limit:
181 break
182 selected.sort(key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
183 return selected
184
185
186def print_event_card(event: dict[str, Any], *, chars: int, artifact_indexes: dict[str, int] | None = None) -> None:
187 when, label, detail, access = event_display_parts(event, chars=chars, full=False)
188 artifact_indexes = artifact_indexes or {}
189 artifact_index = artifact_indexes.get(str(event.get("ref_id") or ""))
190 if artifact_index and event.get("event_type") == "artifact":
191 access = f"open: /artifact {artifact_index}"
192 print(f" {_event_badge(label):<8} {_muted(when):<16} {_one_line(detail, chars)}")
193 if access:
194 print(f" {'':<8} {'':<16} {access}")
195
196
197def public_event(event: dict[str, Any]) -> dict[str, Any]:
198 public = dict(event)
199 public.pop("metadata_json", None)
200 return public
201
202
203def print_event_details(event: dict[str, Any], *, chars: int) -> None:
204 metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
205 if not metadata:
206 return
207 compact = {
208 key: value
209 for key, value in metadata.items()
210 if key not in {"input", "output"} and value not in (None, "", [], {})
211 }
212 if compact:
213 print(f" meta: {_one_line(json.dumps(compact, ensure_ascii=False, sort_keys=True, default=str), chars)}")
214 if isinstance(metadata.get("input"), dict):
215 print(f" input: {_one_line(json.dumps(metadata['input'], ensure_ascii=False, sort_keys=True, default=str), chars)}")
216 if isinstance(metadata.get("output"), dict):
217 print(f" output: {_one_line(json.dumps(metadata['output'], ensure_ascii=False, sort_keys=True, default=str), chars)}")
218
219
220def step_line(step: dict[str, Any], *, chars: int = 180) -> str:
221 tool = step.get("tool_name") or step.get("kind") or "-"
222 summary = clean_step_summary(step.get("summary") or step.get("error") or "-")
223 error = " ERROR" if step.get("error") else ""
224 return f"#{step['step_no']:<4} {step['status']:<9} {tool:<18} {_one_line(summary, chars)}{error}"
225
226
227def terminal_width() -> int:
228 return shutil.get_terminal_size((120, 40)).columns
229
230
231def rule(char: str = "-", width: int | None = None) -> str:
232 return char * min(width or terminal_width(), 96)
233
234
235def json_default(value: Any) -> str:
236 return str(value)
237
238
239def daemon_state_line(lock: dict[str, Any]) -> str:
240 metadata = lock.get("metadata") if isinstance(lock.get("metadata"), dict) else {}
241 if lock.get("running"):
242 pid = metadata.get("pid") or "unknown"
243 stale = " stale-runtime" if lock.get("stale") else ""
244 return f"running pid={pid}{stale}"
245 return "ready when work starts"
246
247
248def daemon_event_line(event: dict[str, Any], *, chars: int, job_titles: dict[str, str] | None = None) -> str:
249 at = str(event.get("at") or "?")
250 name = str(event.get("event") or "?")
251 pieces = []
252 job_titles = job_titles or {}
253 for key in ("status", "tool", "job_id", "step_id", "error_type", "detail", "error"):
254 value = event.get(key)
255 if value not in (None, ""):
256 label = key
257 if key == "job_id":
258 value = job_titles.get(str(value), value)
259 pieces.append(f"{label}={value}")
260 suffix = " ".join(pieces)
261 return _one_line(f"{at} {name} {suffix}".strip(), chars)
262
263
264def job_ref_text(value: Any) -> str | None:
265 if value is None:
266 return None
267 if isinstance(value, list):
268 text = " ".join(str(item) for item in value)
269 else:
270 text = str(value)
271 text = " ".join(text.split())
272 return text or None
273
274
275def note_text(value: Any) -> str:
276 if value is None:
277 return ""
278 if isinstance(value, list):
279 return " ".join(str(item) for item in value).strip()
280 return str(value).strip()
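
The selection logic in `important_startup_events` above can be sketched in isolation. This is a minimal, self-contained approximation rather than the module's actual code: `select_startup_events`, the reduced `IMPORTANT_TYPES` set, the `tool_call` event type, and the sample timestamps are all assumptions made for illustration.

```python
from typing import Any

# Illustrative subset of the important event types listed in the source.
IMPORTANT_TYPES = {"operator_message", "artifact", "finding", "error"}


def select_startup_events(events: list[dict[str, Any]], *, limit: int) -> list[dict[str, Any]]:
    """Keep at most `limit` events, preferring recent important ones."""
    if len(events) <= limit:
        return events
    selected: list[dict[str, Any]] = []
    # Pass 1: walk newest-to-oldest and collect important events.
    for event in reversed(events):
        if event.get("event_type") in IMPORTANT_TYPES:
            selected.append(event)
        if len(selected) >= limit:
            break
    # Pass 2: backfill with the newest remaining events if there is room.
    for event in reversed(events):
        if len(selected) >= limit:
            break
        if event not in selected:
            selected.append(event)
    # Restore chronological order for display.
    selected.sort(key=lambda e: (str(e.get("created_at") or ""), str(e.get("id") or "")))
    return selected


# Hypothetical sample data; "tool_call" stands in for any non-important type.
sample = [
    {"id": "1", "created_at": "2024-01-01T00:00:01", "event_type": "tool_call"},
    {"id": "2", "created_at": "2024-01-01T00:00:02", "event_type": "error"},
    {"id": "3", "created_at": "2024-01-01T00:00:03", "event_type": "tool_call"},
    {"id": "4", "created_at": "2024-01-01T00:00:04", "event_type": "tool_call"},
    {"id": "5", "created_at": "2024-01-01T00:00:05", "event_type": "artifact"},
]
print([event["id"] for event in select_startup_events(sample, limit=3)])  # ['2', '4', '5']
```

The older `error` event survives because important events are collected first; the one remaining slot goes to the newest ordinary event.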
nipux_cli/cli_state.py 126 lines
1"""Persistent CLI focus state and job lookup helpers."""
2
3from __future__ import annotations
4
5import hashlib
6import json
7from datetime import datetime, timezone
8from pathlib import Path
9from typing import Any
10
11from nipux_cli.config import AppConfig, load_config
12from nipux_cli.db import AgentDB
13
14
15def default_job_id(db: AgentDB) -> str | None:
16 configured = configured_focus_job_id(db)
17 if configured:
18 return configured
19 jobs = db.list_jobs()
20 for status in ("running", "queued", "planning", "paused", "failed", "completed"):
21 for job in jobs:
22 if job.get("status") == status:
23 return str(job["id"])
24 return str(jobs[0]["id"]) if jobs else None
25
26
27def configured_focus_job_id(db: AgentDB) -> str | None:
28 job_id = read_shell_state().get("focus_job_id")
29 if not isinstance(job_id, str) or not job_id:
30 return None
31 try:
32 db.get_job(job_id)
33 except KeyError:
34 return None
35 return job_id
36
37
38def find_job(db: AgentDB, query: str) -> dict[str, Any] | None:
39 needle = " ".join(query.split()).lower()
40 if not needle:
41 return None
42 jobs = db.list_jobs()
43 for job in jobs:
44 if str(job["id"]).lower() == needle:
45 return job
46 for job in jobs:
47 if str(job.get("title") or "").lower() == needle:
48 return job
49 for job in jobs:
50 if needle in str(job.get("title") or "").lower():
51 return job
52 return None
53
54
55def shell_state_path() -> Path:
56 config = load_config()
57 config.ensure_dirs()
58 return config.runtime.home / "shell_state.json"
59
60
61def read_shell_state() -> dict[str, Any]:
62 path = shell_state_path()
63 if not path.exists():
64 return {}
65 try:
66 parsed = json.loads(path.read_text(encoding="utf-8"))
67 except (OSError, json.JSONDecodeError):
68 return {}
69 return parsed if isinstance(parsed, dict) else {}
70
71
72def write_shell_state(patch: dict[str, Any]) -> None:
73 state = read_shell_state()
74 state.update(patch)
75 shell_state_path().write_text(
76 json.dumps(state, ensure_ascii=False, indent=2, sort_keys=True) + "\n", encoding="utf-8"
77 )
78
79
80def setup_completed() -> bool:
81 return bool(read_shell_state().get("setup_completed"))
82
83
84def mark_setup_completed() -> None:
85 write_shell_state({"setup_completed": True})
86
87
88def model_setup_fingerprint(config: AppConfig | None = None) -> str:
89 config = config or load_config()
90 key_hash = hashlib.sha256(config.model.api_key.encode("utf-8")).hexdigest() if config.model.api_key else ""
91 payload = {
92 "model": config.model.model,
93 "base_url": config.model.base_url,
94 "api_key_env": config.model.api_key_env,
95 "api_key_hash": key_hash,
96 }
97 return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
98
99
100def model_setup_verified(config: AppConfig | None = None) -> bool:
101 state = read_shell_state()
102 marker = state.get("model_setup_verified")
103 if not isinstance(marker, dict) or not marker.get("ok"):
104 return False
105 return marker.get("fingerprint") == model_setup_fingerprint(config)
106
107
108def mark_model_setup_verified(config: AppConfig | None = None) -> None:
109 config = config or load_config()
110 write_shell_state(
111 {
112 "setup_completed": True,
113 "model_setup_verified": {
114 "ok": True,
115 "fingerprint": model_setup_fingerprint(config),
116 "checked_at": datetime.now(timezone.utc).isoformat(),
117 "model": config.model.model,
118 "base_url": config.model.base_url,
119 "api_key_env": config.model.api_key_env,
120 },
121 }
122 )
123
124
125def clear_model_setup_verified() -> None:
126 write_shell_state({"model_setup_verified": {}})
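
The fingerprint check in `model_setup_verified` above invalidates a stored verification marker whenever the model settings or the API key change. The sketch below reproduces that idea without the real `AppConfig`. The helper names `fingerprint` and `is_verified`, the environment variable name `EXAMPLE_API_KEY`, and the key values are assumptions for illustration only.

```python
import hashlib
import json
from typing import Any


def fingerprint(model: str, base_url: str, api_key_env: str, api_key: str = "") -> str:
    """Hash the model settings; the key itself is hashed first, never stored."""
    key_hash = hashlib.sha256(api_key.encode("utf-8")).hexdigest() if api_key else ""
    payload = {
        "model": model,
        "base_url": base_url,
        "api_key_env": api_key_env,
        "api_key_hash": key_hash,
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def is_verified(marker: Any, current_fingerprint: str) -> bool:
    """A marker counts only if it is ok and matches the current settings."""
    if not isinstance(marker, dict) or not marker.get("ok"):
        return False
    return marker.get("fingerprint") == current_fingerprint


# Hypothetical settings: same endpoint, but the key is rotated.
before = fingerprint("local-model", "http://localhost:8000/v1", "EXAMPLE_API_KEY", "old-key")
after = fingerprint("local-model", "http://localhost:8000/v1", "EXAMPLE_API_KEY", "new-key")
marker = {"ok": True, "fingerprint": before}

print(is_verified(marker, before))  # True
print(is_verified(marker, after))   # False
```

Hashing the key before it enters the payload means the persisted state never contains the secret, yet a rotated key still forces re-verification.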
nipux_cli/compression.py 246 lines
1"""Deterministic rolling memory summaries for long-running jobs."""
2
3from __future__ import annotations
4
5from nipux_cli.db import AgentDB
6from nipux_cli.memory_graph import rank_memory_nodes
7from nipux_cli.operator_context import active_prompt_operator_entries
8
9
10def _clip_text(value: object, limit: int) -> str:
11 text = " ".join(str(value or "").split())
12 if len(text) <= limit:
13 return text
14 return text[: max(0, limit - 3)].rstrip() + "..."
15
16
17def refresh_memory_index(db: AgentDB, job_id: str, *, max_steps: int = 8, max_artifacts: int = 8) -> str:
18 """Write a compact, artifact-referenced job memory entry.
19
20 This is deliberately deterministic. A local model can later improve the
21 prose, but the daemon should always have a cheap compaction path that runs
22 after every step and survives model failures.
23 """
24
25 job = db.get_job(job_id)
26 steps = db.list_steps(job_id=job_id)[-max_steps:]
27 artifacts = db.list_artifacts(job_id, limit=max_artifacts)
28 artifact_refs = [artifact["id"] for artifact in artifacts]
29 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
30 operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
31 active_operator = [
32 entry
33 for entry in active_prompt_operator_entries(operator_messages)
34 if str(entry.get("mode") or "steer") in {"steer", "follow_up"}
35 ][-5:]
36 operator_notes = [
37 entry for entry in operator_messages
38 if isinstance(entry, dict)
39 and str(entry.get("mode") or "steer") == "note"
40 ][-3:]
41
42 lines = [
43 f"Job lifecycle status: {job['status']}",
44 f"Objective: {job['objective']}",
45 "",
46 "Active operator context:",
47 ]
48 if not active_operator and not operator_notes:
49 lines.append("- none")
50 for entry in active_operator:
51 lines.append(
52 f"- {entry.get('mode') or 'steer'} {entry.get('event_id') or ''}: "
53 f"{_clip_text(entry.get('message') or '', 300)}"
54 )
55 for entry in operator_notes:
56 lines.append(f"- note {entry.get('event_id') or ''}: {_clip_text(entry.get('message') or '', 300)}")
57
58 lines.extend([
59 "",
60 "Recent steps:",
61 ])
62 if not steps:
63 lines.append("- none")
64 for step in steps:
65 tool = f" tool={step['tool_name']}" if step.get("tool_name") else ""
66 summary = step.get("summary") or step.get("error") or ""
67 lines.append(f"- #{step['step_no']} {step['kind']} {step['status']}{tool}: {_clip_text(summary, 280)}")
68
69 lines.extend(["", "Recent artifacts:"])
70 if not artifacts:
71 lines.append("- none")
72 for artifact in artifacts:
73 title = artifact.get("title") or artifact["id"]
74 summary = artifact.get("summary") or ""
75 lines.append(f"- {artifact['id']} {_clip_text(title, 120)} ({artifact['type']}): {_clip_text(summary, 240)}")
76
77 tasks = _metadata_list(metadata, "task_queue")
78 findings = _metadata_list(metadata, "finding_ledger")
79 sources = _metadata_list(metadata, "source_ledger")
80 experiments = _metadata_list(metadata, "experiment_ledger")
81 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
82 memory_graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
83 memory_nodes = _metadata_list(memory_graph, "nodes")
84 memory_edges = _metadata_list(memory_graph, "edges")
85 pending_measurement = (
86 metadata.get("pending_measurement_obligation")
87 if isinstance(metadata.get("pending_measurement_obligation"), dict)
88 and metadata.get("pending_measurement_obligation")
89 and not metadata.get("pending_measurement_obligation", {}).get("resolved_at")
90 else {}
91 )
92
93 lines.extend(["", "Durable progress ledgers:"])
94 lines.append(
95 "- "
96 + ", ".join(
97 [
98 f"tasks={len(tasks)}",
99 f"findings={len(findings)}",
100 f"sources={len(sources)}",
101 f"experiments={len(experiments)}",
102 f"memory_nodes={len(memory_nodes)}",
103 f"roadmap={'yes' if roadmap else 'no'}",
104 ]
105 )
106 )
107 for node in rank_memory_nodes(memory_nodes, limit=4):
108 lines.append(
109 "- memory "
110 f"{node.get('status') or 'active'} "
111 f"{node.get('kind') or 'fact'} "
112 f"{_clip_text(node.get('title') or node.get('key') or '', 120)}"
113 )
114 if memory_edges:
115 lines.append(f"- memory_links={len(memory_edges)}")
116 for task in _rank_tasks(tasks)[:4]:
117 lines.append(
118 "- task "
119 f"{task.get('status') or 'open'} "
120 f"{_clip_text(task.get('title') or '', 120)} "
121 f"contract={task.get('output_contract') or '?'}"
122 )
123 for experiment in experiments[-3:]:
124 metric = ""
125 if experiment.get("metric_value") not in (None, ""):
126 metric = (
127 f" metric={experiment.get('metric_name') or 'value'}="
128 f"{experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
129 )
130 lines.append(
131 "- experiment "
132 f"{experiment.get('status') or 'planned'} "
133 f"{_clip_text(experiment.get('title') or '', 120)}{metric}"
134 )
135 if pending_measurement:
136 candidates = pending_measurement.get("metric_candidates")
137 candidate_text = "; ".join(str(item) for item in candidates[:3]) if isinstance(candidates, list) else ""
138 lines.append(
139 "- pending_measurement "
140 f"step=#{pending_measurement.get('source_step_no') or '?'} "
141 f"tool={pending_measurement.get('tool') or '?'} "
142 f"{_clip_text(candidate_text or pending_measurement.get('summary') or '', 220)}"
143 )
144 for finding in findings[-3:]:
145 lines.append(f"- finding {_clip_text(finding.get('name') or finding.get('title') or '', 140)}")
146 for source in sources[-3:]:
147 score = source.get("usefulness_score")
148 lines.append(f"- source {_clip_text(source.get('source') or '', 140)} score={score if score is not None else '?'}")
149 if roadmap:
150 lines.append(
151 "- roadmap "
152 f"{roadmap.get('status') or 'planned'} "
153 f"{_clip_text(roadmap.get('title') or 'Roadmap', 140)} "
154 f"current={_clip_text(roadmap.get('current_milestone') or '', 120)}"
155 )
156
157 usage = db.job_token_usage(job_id)
158 if int(usage.get("calls") or 0) > 0:
159 lines.extend(["", "Model usage:"])
160 latest_prompt = _compact_count(usage.get("latest_prompt_tokens"))
161 latest_total = _compact_count(usage.get("latest_total_tokens"))
162 context_length = _first_positive_int(usage.get("latest_context_length"), usage.get("context_length"))
163 context_fraction = _context_fraction(usage, context_length=context_length)
164 lines.append(
165 "- "
166 + ", ".join(
167 [
168 f"calls={usage.get('calls') or 0}",
169 f"total_tokens={_compact_count(usage.get('total_tokens'))}",
170 f"output_tokens={_compact_count(usage.get('completion_tokens'))}",
171 f"latest_context={latest_prompt}",
172 f"latest_total={latest_total}",
173 f"estimated_calls={usage.get('estimated_calls') or 0}",
174 ]
175 )
176 )
177 if context_fraction >= 0.65:
178 lines.append(
179 "- context_pressure "
180 f"latest_context={latest_prompt}"
181 + (f"/{_compact_count(context_length)}" if context_length else "")
182 + f" ({context_fraction:.0%}); prefer compact ledgers, artifacts, and decisions over raw history."
183 )
184
185 return db.upsert_memory(
186 job_id=job_id,
187 key="rolling_state",
188 summary="\n".join(lines).strip(),
189 artifact_refs=artifact_refs,
190 )
191
192
193def _metadata_list(metadata: dict, key: str) -> list[dict]:
194 values = metadata.get(key)
195 if not isinstance(values, list):
196 return []
197 return [value for value in values if isinstance(value, dict)]
198
199
200def _rank_tasks(tasks: list[dict]) -> list[dict]:
201 status_rank = {"active": 0, "open": 1, "blocked": 2, "validating": 3, "done": 4, "skipped": 5}
202 return sorted(
203 tasks,
204 key=lambda task: (
205 status_rank.get(str(task.get("status") or "open"), 9),
206 -int(task.get("priority") or 0),
207 str(task.get("title") or ""),
208 ),
209 )
210
211
212def _compact_count(value: object) -> str:
213 try:
214 number = int(float(value or 0))
215 except (TypeError, ValueError):
216 number = 0
217 if number >= 1_000_000:
218 return f"{number / 1_000_000:.1f}M"
219 if number >= 1_000:
220 return f"{number / 1_000:.1f}K"
221 return str(number)
222
223
224def _context_fraction(usage: dict, *, context_length: int) -> float:
225 raw_fraction = usage.get("latest_context_fraction") or usage.get("context_fraction")
226 try:
227 fraction = float(raw_fraction)
228 except (TypeError, ValueError):
229 fraction = 0.0
230 if fraction > 0:
231 return fraction
232 latest_prompt = _first_positive_int(usage.get("latest_prompt_tokens"), usage.get("prompt_tokens"))
233 if context_length <= 0 or latest_prompt <= 0:
234 return 0.0
235 return latest_prompt / context_length
236
237
238def _first_positive_int(*values: object) -> int:
239 for value in values:
240 try:
241 number = int(float(value or 0))
242 except (TypeError, ValueError):
243 continue
244 if number > 0:
245 return number
246 return 0
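
The usage section of `refresh_memory_index` above relies on two small calculations: compact token formatting and a context fraction compared against the 0.65 threshold. The sketch below restates them as standalone functions. The names `compact_count` and `context_fraction`, the simplified signature, and the sample token counts are assumptions, not the module's API.

```python
def compact_count(value: object) -> str:
    """Format a count as 1.2K / 3.4M; unparseable input becomes 0."""
    try:
        number = int(float(value or 0))
    except (TypeError, ValueError):
        number = 0
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


def context_fraction(latest_prompt_tokens: int, context_length: int, reported: float = 0.0) -> float:
    """Prefer a reported fraction; otherwise derive it from token counts."""
    if reported > 0:
        return reported
    if context_length <= 0 or latest_prompt_tokens <= 0:
        return 0.0
    return latest_prompt_tokens / context_length


# Hypothetical numbers: a 180,000-token prompt against a 262,144-token window.
fraction = context_fraction(180_000, 262_144)
print(compact_count(180_000), compact_count(262_144), f"{fraction:.0%}")  # 180.0K 262.1K 69%
print(fraction >= 0.65)  # True, so a context_pressure line would be emitted
```

One rounding quirk follows from the format string: a value such as 999,999 is below the million threshold, so it renders as `1000.0K` rather than `1.0M`.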
nipux_cli/config.py 271 lines
1"""Configuration for the Nipux long-running agent runtime."""
2
3from __future__ import annotations
4
5import os
6from dataclasses import dataclass, field
7from pathlib import Path
8from typing import Any
9
10import yaml
11
12
13DEFAULT_OPENROUTER_MODEL = "openrouter/auto"
14DEFAULT_OPENROUTER_API_KEY_ENV = <redacted>
15DEFAULT_MODEL = "local-model"
16DEFAULT_BASE_URL = "http://localhost:8000/v1"
17DEFAULT_API_KEY_ENV = <redacted>
18DEFAULT_CONTEXT_LENGTH = 262_144
19DEFAULT_REQUEST_TIMEOUT_SECONDS = 300.0
20
21
22def get_agent_home() -> Path:
23 """Return the Nipux agent home directory."""
24
25 value = os.environ.get("NIPUX_HOME", "").strip()
26 return Path(value).expanduser() if value else Path.home() / ".nipux"
27
28
29def load_env_file(path: str | Path) -> None:
30 """Load KEY=value pairs from a local env file without overriding the shell."""
31
32 env_path = Path(path).expanduser()
33 if not env_path.exists():
34 return
35 ensure_private_file_permissions(env_path)
36 for raw_line in env_path.read_text(encoding="utf-8").splitlines():
37 line = raw_line.strip()
38 if not line or line.startswith("#") or "=" not in line:
39 continue
40 key, value = line.split("=", 1)
41 key = key.strip()
42 value = value.strip().strip("\"'")
43 if key and key not in os.environ:
44 os.environ[key] = value
45
46
47def ensure_private_file_permissions(path: str | Path) -> None:
48 """Best-effort POSIX privacy for local config/secret files."""
49
50 if os.name == "nt":
51 return
52 try:
53 Path(path).chmod(0o600)
54 except OSError:
55 pass
56
57
58def ensure_private_dir_permissions(path: str | Path) -> None:
59 """Best-effort POSIX privacy for the local Nipux state directory."""
60
61 if os.name == "nt":
62 return
63 try:
64 Path(path).chmod(0o700)
65 except OSError:
66 pass
67
68
69def write_private_text(path: str | Path, text: str) -> None:
70 """Write text with private file permissions from creation time."""
71
72 target = Path(path).expanduser()
73 target.parent.mkdir(parents=True, exist_ok=True)
74 flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
75 fd = os.open(target, flags, 0o600)
76 try:
77 with os.fdopen(fd, "w", encoding="utf-8") as handle:
78 fd = -1
79 handle.write(text)
80 finally:
81 if fd >= 0:
82 os.close(fd)
83 ensure_private_file_permissions(target)
84
85
86@dataclass(frozen=True)
87class ModelConfig:
88 model: str = DEFAULT_MODEL
89 base_url: str = DEFAULT_BASE_URL
90 api_key_env: str = DEFAULT_API_KEY_ENV
91 context_length: int = DEFAULT_CONTEXT_LENGTH
92 request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS
93 input_cost_per_million: float | None = None
94 output_cost_per_million: float | None = None
95
96 @property
97 def api_key(self) -> str:
98 return os.environ.get(self.api_key_env, "")
99
100
101@dataclass(frozen=True)
102class RuntimeConfig:
103 home: Path = field(default_factory=get_agent_home)
104 max_step_seconds: int = 600
105 max_steps_per_run: int = 1
106 artifact_inline_char_limit: int = 12_000
107 daily_digest_enabled: bool = True
108 daily_digest_time: str = "08:00"
109 max_job_cost_usd: float | None = None
110
111 @property
112 def state_db_path(self) -> Path:
113 return self.home / "state.db"
114
115 @property
116 def jobs_dir(self) -> Path:
117 return self.home / "jobs"
118
119 @property
120 def logs_dir(self) -> Path:
121 return self.home / "logs"
122
123 @property
124 def digests_dir(self) -> Path:
125 return self.home / "digests"
126
127
128@dataclass(frozen=True)
129class ToolAccessConfig:
130 browser: bool = True
131 web: bool = True
132 shell: bool = True
133 files: bool = True
134
135
136@dataclass(frozen=True)
137class EmailConfig:
138 enabled: bool = False
139 smtp_host: str = ""
140 smtp_port: int = 587
141 username: str = ""
142 password_env: str = "NIPUX_EMAIL_PASSWORD"
143 from_addr: str = ""
144 to_addr: str = ""
145 use_tls: bool = True
146
147 @property
148 def password(self) -> str:
149 return os.environ.get(self.password_env, "")
150
151
152@dataclass(frozen=True)
153class AppConfig:
154 runtime: RuntimeConfig = field(default_factory=RuntimeConfig)
155 model: ModelConfig = field(default_factory=ModelConfig)
156 tools: ToolAccessConfig = field(default_factory=ToolAccessConfig)
157 email: EmailConfig = field(default_factory=EmailConfig)
158
159 def ensure_dirs(self) -> None:
160 for directory in (
161 self.runtime.home,
162 self.runtime.jobs_dir,
163 self.runtime.logs_dir,
164 self.runtime.digests_dir,
165 ):
166 directory.mkdir(parents=True, exist_ok=True)
167 ensure_private_dir_permissions(directory)
168
169
170def _as_dict(value: Any) -> dict[str, Any]:
171 return value if isinstance(value, dict) else {}
172
173
174def _optional_float(value: Any) -> float | None:
175 if value in (None, ""):
176 return None
177 return float(value)
178
179
180def load_config(path: str | Path | None = None) -> AppConfig:
181 """Load config.yaml, falling back to a local OpenAI-compatible endpoint."""
182
183 home = get_agent_home()
184 load_env_file(home / ".env")
185 cfg_path = Path(path).expanduser() if path else home / "config.yaml"
186 raw: dict[str, Any] = {}
187 if cfg_path.exists():
188 loaded = yaml.safe_load(cfg_path.read_text(encoding="utf-8")) or {}
189 raw = _as_dict(loaded)
190
191 runtime_raw = _as_dict(raw.get("runtime"))
192 model_raw = _as_dict(raw.get("model"))
193 tools_raw = _as_dict(raw.get("tools"))
194 email_raw = _as_dict(raw.get("email"))
195
196 runtime_home = Path(runtime_raw.get("home") or home).expanduser()
197 runtime = RuntimeConfig(
198 home=runtime_home,
199 max_step_seconds=int(runtime_raw.get("max_step_seconds", 600)),
200 max_steps_per_run=int(runtime_raw.get("max_steps_per_run", 1)),
201 artifact_inline_char_limit=int(runtime_raw.get("artifact_inline_char_limit", 12_000)),
202 daily_digest_enabled=bool(runtime_raw.get("daily_digest_enabled", True)),
203 daily_digest_time=str(runtime_raw.get("daily_digest_time") or "08:00"),
204 max_job_cost_usd=_optional_float(runtime_raw.get("max_job_cost_usd")),
205 )
206 model = ModelConfig(
207 model=str(model_raw.get("name") or model_raw.get("model") or DEFAULT_MODEL),
208 base_url=str(model_raw.get("base_url") or DEFAULT_BASE_URL).rstrip("/"),
209 api_key_env=str(model_raw.get("api_key_env") or DEFAULT_API_KEY_ENV),
210 context_length=int(model_raw.get("context_length", DEFAULT_CONTEXT_LENGTH)),
211 request_timeout_seconds=float(model_raw.get("request_timeout_seconds", DEFAULT_REQUEST_TIMEOUT_SECONDS)),
212 input_cost_per_million=_optional_float(model_raw.get("input_cost_per_million")),
213 output_cost_per_million=_optional_float(model_raw.get("output_cost_per_million")),
214 )
215 tools = ToolAccessConfig(
216 browser=bool(tools_raw.get("browser", True)),
217 web=bool(tools_raw.get("web", True)),
218 shell=bool(tools_raw.get("shell", True)),
219 files=bool(tools_raw.get("files", True)),
220 )
221 email = EmailConfig(
222 enabled=bool(email_raw.get("enabled", False)),
223 smtp_host=str(email_raw.get("smtp_host") or ""),
224 smtp_port=int(email_raw.get("smtp_port", 587)),
225 username=str(email_raw.get("username") or ""),
226 password_env=str(email_raw.get("password_env") or "NIPUX_EMAIL_PASSWORD"),
227 from_addr=str(email_raw.get("from_addr") or ""),
228 to_addr=str(email_raw.get("to_addr") or ""),
229 use_tls=bool(email_raw.get("use_tls", True)),
230 )
231 return AppConfig(runtime=runtime, model=model, tools=tools, email=email)
232
233
234def default_config_yaml(
235 *,
236 model: str = DEFAULT_MODEL,
237 base_url: str = DEFAULT_BASE_URL,
238 api_key_env: str = DEFAULT_API_KEY_ENV,
239 context_length: int = DEFAULT_CONTEXT_LENGTH,
240) -> str:
241 """Return a starter config file for an OpenAI-compatible model server."""
242
243 return (
244 "model:\n"
245 f" name: {model}\n"
246 f" base_url: {base_url.rstrip('/')}\n"
247 f" api_key_env: {api_key_env}\n"
248 f" context_length: {context_length}\n"
249 " input_cost_per_million: null\n"
250 " output_cost_per_million: null\n"
251 "runtime:\n"
252 " max_step_seconds: 600\n"
253 " max_steps_per_run: 1\n"
254 " artifact_inline_char_limit: 12000\n"
255 " daily_digest_enabled: true\n"
256 " daily_digest_time: \"08:00\"\n"
257 " max_job_cost_usd: null\n"
258 "tools:\n"
259 " browser: true\n"
260 " web: true\n"
261 " shell: true\n"
262 " files: true\n"
263 "email:\n"
264 " enabled: false\n"
265 " smtp_host: \"\"\n"
266 " smtp_port: 587\n"
267 " username: \"\"\n"
268 " password_env: NIPUX_EMAIL_PASSWORD\n"
269 " from_addr: \"\"\n"
270 " to_addr: \"\"\n"
271 )
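
The parsing rules in `load_env_file` above (skip blanks and comments, split on the first `=`, strip surrounding quotes, never override an existing variable) can be exercised without touching the real environment. This is a sketch under assumptions: `apply_env_text` is a hypothetical helper that takes the file text and a plain dict instead of a path and `os.environ`, and the variable names are made up.

```python
def apply_env_text(text: str, environ: dict[str, str]) -> None:
    """Apply KEY=value lines to `environ` without overriding existing keys."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not a KEY=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("\"'")
        if key and key not in environ:
            environ[key] = value


# Hypothetical file content and a pre-existing shell variable.
env = {"EXISTING": "from-shell"}
apply_env_text(
    "# local secrets\n"
    "EXISTING=from-file\n"
    'NEW_KEY="quoted value"\n'
    "URL=http://localhost:8000/v1?a=b\n"
    "not a pair\n",
    env,
)
print(env["EXISTING"])  # from-shell
print(env["NEW_KEY"])   # quoted value
print(env["URL"])       # http://localhost:8000/v1?a=b
```

Two consequences of these rules: a line such as `export FOO=bar` would produce the key `export FOO`, and trailing inline comments stay part of the value.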
nipux_cli/context_pressure.py 254 lines
1"""Context-pressure signals for long-running worker prompts."""
2
3from __future__ import annotations
4
5from datetime import datetime, timezone
6from typing import Any
7
8from nipux_cli.db import AgentDB
9
10
11CONTEXT_PRESSURE_BANDS = (
12 (0.95, "critical"),
13 (0.85, "high"),
14 (0.65, "watch"),
15)
16USAGE_TOKEN_BANDS = (
17 (20_000_000, "critical"),
18 (5_000_000, "high"),
19 (1_000_000, "watch"),
20)
21USAGE_CALL_BANDS = (
22 (2_000, "critical"),
23 (1_000, "high"),
24 (200, "watch"),
25)
26USAGE_COST_BANDS = (
27 (10.0, "critical"),
28 (5.0, "high"),
29 (1.0, "watch"),
30)
31USAGE_BAND_RANK = {"": 0, "watch": 1, "high": 2, "critical": 3}
32
33
34def context_pressure_for_prompt(job: dict[str, Any]) -> str:
35 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
36 pressure = metadata.get("context_pressure") if isinstance(metadata.get("context_pressure"), dict) else {}
37 band = str(pressure.get("band") or "")
38 if band not in {"watch", "high", "critical"}:
39 return "None."
40 prompt_tokens = compact_token_count(pressure.get("prompt_tokens"))
41 context_length = compact_token_count(pressure.get("context_length"))
42 context_text = prompt_tokens
43 if context_length != "0":
44 context_text = f"{context_text}/{context_length}"
45 fraction = _as_float(pressure.get("fraction"))
46 fraction_text = f" ({fraction:.0%})" if fraction else ""
47 return (
48 f"Context pressure is {band}: latest prompt used {context_text}{fraction_text}. "
49 "Keep the next turn compact; prefer durable memory, ledgers, artifact references, and explicit decisions "
50 "over copying raw history."
51 )
52
53
54def usage_pressure_for_prompt(job: dict[str, Any], usage: dict[str, Any] | None) -> str:
55 usage = usage if isinstance(usage, dict) else {}
56 band = _usage_pressure_band(usage)
57 if not band:
58 return "None."
59 calls = _as_int(usage.get("calls"))
60 prompt_tokens = _as_int(usage.get("prompt_tokens"))
61 completion_tokens = _as_int(usage.get("completion_tokens"))
62 total_tokens = _as_int(usage.get("total_tokens")) or prompt_tokens + completion_tokens
63 latest_prompt_tokens = _as_int(usage.get("latest_prompt_tokens"))
64 latest_context_length = _as_int(usage.get("latest_context_length"))
65 durable_records = _durable_usage_signal_count(job)
66 tokens_per_record = total_tokens / max(1, durable_records)
67 latest_context = compact_token_count(latest_prompt_tokens)
68 if latest_context_length:
69 latest_context = f"{latest_context}/{compact_token_count(latest_context_length)}"
70 bits = [
71 f"calls={calls}",
72 f"tokens={compact_token_count(total_tokens)}",
73 f"prompt={compact_token_count(prompt_tokens)}",
74 f"output={compact_token_count(completion_tokens)}",
75 ]
76 if bool(usage.get("has_cost")):
77 bits.append(f"cost=${_as_float(usage.get('cost')):.4f}")
78 if latest_prompt_tokens:
79 bits.append(f"latest_context={latest_context}")
80 lines = [
81 f"Cumulative model usage pressure is {band}: " + " ".join(bits) + ".",
82 (
83 f"Durable progress records={durable_records}; "
84 f"approximately {compact_token_count(int(tokens_per_record))} tokens per durable record."
85 ),
86 (
87 "Next action should be high leverage: execute, measure, validate, consolidate, defer, or mark a branch "
88 "blocked/skipped from concrete evidence. Avoid low-yield retries, broad rereads, or new research unless it "
89 "directly resolves an active contract or unlocks the next experiment."
90 ),
91 ]
92 return "\n".join(lines)
93
94
95def emit_usage_pressure_update(db: AgentDB, job_id: str, usage: dict[str, Any]) -> None:
96 band = _usage_pressure_band(usage)
97 if not band:
98 return
99 job = db.get_job(job_id)
100 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
101 previous = metadata.get("usage_pressure") if isinstance(metadata.get("usage_pressure"), dict) else {}
102 previous_band = str(previous.get("band") or "")
103 total_tokens = _as_int(usage.get("total_tokens")) or _as_int(usage.get("prompt_tokens")) + _as_int(usage.get("completion_tokens"))
104 previous_high_tokens = _as_int(previous.get("high_water_tokens"))
105 should_emit = (
106 previous_band != band
107 or (previous_high_tokens > 0 and total_tokens >= int(previous_high_tokens * 1.5))
108 or (previous_high_tokens <= 0)
109 )
110 pressure = {
111 "band": band,
112 "calls": _as_int(usage.get("calls")),
113 "total_tokens": total_tokens,
114 "prompt_tokens": _as_int(usage.get("prompt_tokens")),
115 "completion_tokens": _as_int(usage.get("completion_tokens")),
116 "cost": _as_float(usage.get("cost")) if bool(usage.get("has_cost")) else None,
117 "has_cost": bool(usage.get("has_cost")),
118 "high_water_tokens": max(total_tokens, previous_high_tokens),
119 "updated_at": datetime.now(timezone.utc).isoformat(),
120 }
121 db.update_job_metadata(job_id, {"usage_pressure": pressure})
122 if not should_emit:
123 return
124 cost_text = ""
125 if pressure["has_cost"]:
126 cost_text = f" cost=${pressure['cost']:.4f}"
127 db.append_agent_update(
128 job_id,
129 (
130 f"Usage pressure {band}: {compact_token_count(total_tokens)} tokens across "
131 f"{pressure['calls']} model calls.{cost_text} Prefer high-leverage actions, measurement, "
132 "consolidation, or explicit blocked/deferred branches over low-yield churn."
133 ),
134 category="update",
135 metadata={"kind": "usage_pressure", "usage_pressure": pressure},
136 )
137
138
139def emit_context_pressure_update(db: AgentDB, job_id: str, usage: dict[str, Any]) -> None:
140 fraction = _as_float(usage.get("context_fraction"))
141 band = _context_pressure_band(fraction)
142 if not band:
143 return
144 job = db.get_job(job_id)
145 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
146 previous = metadata.get("context_pressure") if isinstance(metadata.get("context_pressure"), dict) else {}
147 previous_band = str(previous.get("band") or "")
148 previous_high = _as_float(previous.get("high_water_fraction"))
149 should_emit = previous_band != band or fraction >= previous_high + 0.10
150 prompt_tokens = _as_int(usage.get("prompt_tokens"))
151 context_length = _as_int(usage.get("context_length"))
152 pressure = {
153 "band": band,
154 "fraction": round(fraction, 6),
155 "high_water_fraction": round(max(fraction, previous_high), 6),
156 "prompt_tokens": prompt_tokens,
157 "context_length": context_length,
158 "updated_at": datetime.now(timezone.utc).isoformat(),
159 }
160 db.update_job_metadata(job_id, {"context_pressure": pressure})
161 if not should_emit:
162 return
163 denominator = f"/{compact_token_count(context_length)}" if context_length else ""
164 estimated = ", estimated" if usage.get("estimated") else ""
165 db.append_agent_update(
166 job_id,
167 (
168 f"Context pressure {band}: latest prompt "
169 f"{compact_token_count(prompt_tokens)}{denominator} ({fraction:.0%}{estimated}). "
170 "Prefer compact memory, ledgers, artifact references, and explicit decisions over raw history."
171 ),
172 category="update",
173 metadata={"kind": "context_pressure", "context_pressure": pressure},
174 )
175
176
177def compact_token_count(value: object) -> str:
178 number = _as_int(value)
179 if number >= 1_000_000:
180 return f"{number / 1_000_000:.1f}M"
181 if number >= 1_000:
182 return f"{number / 1_000:.1f}K"
183 return str(number)
184
185
186def _usage_pressure_band(usage: dict[str, Any]) -> str:
187 total_tokens = _as_int(usage.get("total_tokens"))
188 if total_tokens <= 0:
189 total_tokens = _as_int(usage.get("prompt_tokens")) + _as_int(usage.get("completion_tokens"))
190 calls = _as_int(usage.get("calls"))
191 cost = _as_float(usage.get("cost")) if bool(usage.get("has_cost")) else 0.0
192 band = ""
193 for threshold, candidate in USAGE_TOKEN_BANDS:
194 if total_tokens >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
195 band = candidate
196 break
197 for threshold, candidate in USAGE_CALL_BANDS:
198 if calls >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
199 band = candidate
200 break
201 if bool(usage.get("has_cost")):
202 for threshold, candidate in USAGE_COST_BANDS:
203 if cost >= threshold and USAGE_BAND_RANK[candidate] > USAGE_BAND_RANK[band]:
204 band = candidate
205 break
206 return band
207
208
209def _context_pressure_band(fraction: float) -> str:
210 for threshold, band in CONTEXT_PRESSURE_BANDS:
211 if fraction >= threshold:
212 return band
213 return ""
214
215
216def _durable_usage_signal_count(job: dict[str, Any]) -> int:
217 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
218 count = 0
219 for key in ("finding_ledger", "source_ledger", "experiment_ledger", "lessons"):
220 records = metadata.get(key)
221 if isinstance(records, list):
222 count += sum(1 for record in records if isinstance(record, dict))
223 tasks = metadata.get("task_queue")
224 if isinstance(tasks, list):
225 count += sum(
226 1
227 for task in tasks
228 if isinstance(task, dict)
229 and str(task.get("status") or "open").lower() in {"done", "blocked", "skipped"}
230 and (task.get("result") or task.get("evidence_needed") or task.get("acceptance_criteria"))
231 )
232 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
233 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
234 count += sum(
235 1
236 for milestone in milestones
237 if isinstance(milestone, dict)
238 and str(milestone.get("status") or "planned").lower() in {"active", "validating", "done", "blocked", "skipped"}
239 )
240 return count
241
242
243def _as_float(value: Any, default: float = 0.0) -> float:
244 try:
245 return float(value)
246 except (TypeError, ValueError):
247 return default
248
249
250def _as_int(value: Any, default: int = 0) -> int:
251 try:
252 return int(float(value))
253 except (TypeError, ValueError, OverflowError):
254 return default
nipux_cli/daemon.py 695 lines
1"""Daemon runner for restartable background jobs."""
2
3from __future__ import annotations
4
5import contextlib
6import fcntl
7import hashlib
8import json
9import os
10import signal
11import threading
12import time
13from dataclasses import dataclass
14from datetime import datetime, timezone
15from email.utils import parsedate_to_datetime
16from functools import lru_cache
17from pathlib import Path
18from typing import Any, Callable
19
20from nipux_cli.config import AppConfig, load_config
21from nipux_cli.db import AgentDB
22from nipux_cli.digest import write_daily_digest
23from nipux_cli.doctor import run_doctor
24from nipux_cli.provider_errors import provider_rate_limited
25from nipux_cli.scheduling import job_deferred_until, job_is_deferred, job_provider_blocked
26from nipux_cli.shell_tools import cleanup_registered_shell_processes
27
28
29class DaemonAlreadyRunning(RuntimeError):
30 pass
31
32
33RUNTIME_CODE_GLOB = "*.py"
34
35
36def runtime_code_file_names() -> tuple[str, ...]:
37 package_dir = Path(__file__).resolve().parent
38 return tuple(path.name for path in _runtime_code_paths(package_dir))
39
40
41@lru_cache(maxsize=1)
42def current_runtime_fingerprint() -> dict[str, Any]:
43 """Return a stable fingerprint for code that affects daemon behavior."""
44
45 from nipux_cli import __version__
46 from nipux_cli.tools import DEFAULT_REGISTRY
47 from nipux_cli.worker_policy import SYSTEM_PROMPT, WORKER_PROTOCOL_VERSION
48
49 tool_schema = DEFAULT_REGISTRY.openai_tools()
50 tool_schema_hash = hashlib.sha256(json.dumps(tool_schema, sort_keys=True, default=str).encode("utf-8")).hexdigest()
51 prompt_hash = hashlib.sha256(SYSTEM_PROMPT.encode("utf-8")).hexdigest()
52 code_fingerprint = _runtime_code_fingerprint()
53 payload = {
54 "nipux_version": __version__,
55 "worker_protocol": WORKER_PROTOCOL_VERSION,
56 "tool_schema_hash": tool_schema_hash[:16],
57 "prompt_hash": prompt_hash[:16],
58 "code_hash": code_fingerprint["code_hash"],
59 "code_mtime": code_fingerprint["code_mtime"],
60 "tool_count": len(DEFAULT_REGISTRY.names()),
61 }
62 hash_payload = {key: value for key, value in payload.items() if key != "code_mtime"}
63 payload["runtime_hash"] = hashlib.sha256(json.dumps(hash_payload, sort_keys=True).encode("utf-8")).hexdigest()[:16]
64 return payload
65
66
67@lru_cache(maxsize=1)
68def _runtime_code_fingerprint() -> dict[str, Any]:
69 package_dir = Path(__file__).resolve().parent
70 digest = hashlib.sha256()
71 mtimes: list[float] = []
72 for path in _runtime_code_paths(package_dir):
73 name = path.name
74 digest.update(name.encode("utf-8"))
75 data = path.read_bytes()
76 digest.update(hashlib.sha256(data).digest())
77 mtimes.append(path.stat().st_mtime)
78 return {
79 "code_hash": digest.hexdigest()[:16],
80 "code_mtime": max(mtimes) if mtimes else 0,
81 }
82
83
84def _runtime_code_paths(package_dir: Path) -> list[Path]:
85 return sorted(path for path in package_dir.glob(RUNTIME_CODE_GLOB) if path.is_file())
86
87
88RUNTIME_CODE_FILES = runtime_code_file_names()
89PROVIDER_RECOVERY_PROBE_SECONDS = 300.0
90WORK_HEARTBEAT_INTERVAL_SECONDS = 15.0
91
92
93def runtime_stale(metadata: dict[str, Any] | None) -> bool:
94 if not isinstance(metadata, dict):
95 return False
96 recorded = metadata.get("runtime")
97 if not isinstance(recorded, dict):
98 return True
99 return recorded.get("runtime_hash") != current_runtime_fingerprint().get("runtime_hash")
100
101
102def _parse_lock_metadata(raw: str) -> dict[str, Any]:
103 raw = raw.strip()
104 if not raw:
105 return {}
106 try:
107 parsed = json.loads(raw)
108 return parsed if isinstance(parsed, dict) else {"raw": raw}
109 except json.JSONDecodeError:
110 return {"raw": raw}
111
112
113def daemon_lock_status(path: str | Path) -> dict[str, Any]:
114 """Return whether another process currently holds the daemon lock."""
115
116 path = Path(path)
117 path.parent.mkdir(parents=True, exist_ok=True)
118 with path.open("a+", encoding="utf-8") as handle:
119 handle.seek(0)
120 metadata = _parse_lock_metadata(handle.read())
121 try:
122 fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
123 except BlockingIOError:
124 stale = runtime_stale(metadata)
125 return {
126 "running": True,
127 "lock_path": str(path),
128 "metadata": metadata,
129 "stale": stale,
130 "current_runtime": current_runtime_fingerprint(),
131 "detail": "daemon lock is held",
132 }
133 with contextlib.suppress(OSError):
134 fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
135 return {
136 "running": False,
137 "lock_path": str(path),
138 "metadata": metadata,
139 "stale": False,
140 "current_runtime": current_runtime_fingerprint(),
141 "detail": "daemon lock is free",
142 }
143
144
145@contextlib.contextmanager
146def single_instance_lock(path: str | Path):
147 """Hold an exclusive non-blocking daemon lock for this state directory."""
148
149 path = Path(path)
150 path.parent.mkdir(parents=True, exist_ok=True)
151 with path.open("a+", encoding="utf-8") as handle:
152 try:
153 fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
154 except BlockingIOError as exc:
155 raise DaemonAlreadyRunning(f"Another nipux daemon holds {path}") from exc
156 payload = {
157 "pid": os.getpid(),
158 "started_at": datetime.now(timezone.utc).isoformat(),
159 "runtime": current_runtime_fingerprint(),
160 }
161 handle.seek(0)
162 handle.truncate()
163 handle.write(json.dumps(payload, sort_keys=True))
164 handle.flush()
165 try:
166 yield handle
167 finally:
168 fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
169
170
171def update_lock_metadata(handle, **patch: Any) -> None:
172 handle.seek(0)
173 metadata = _parse_lock_metadata(handle.read())
174 metadata.setdefault("pid", os.getpid())
175 metadata.setdefault("started_at", datetime.now(timezone.utc).isoformat())
176 metadata.update(patch)
177 handle.seek(0)
178 handle.truncate()
179 handle.write(json.dumps(metadata, sort_keys=True))
180 handle.flush()
181
182
183@contextlib.contextmanager
184def _work_heartbeat(
185 update_metadata: Callable[..., None],
186 *,
187 interval_seconds: float | None = None,
188 state: str = "working",
189 **metadata: Any,
190):
191 """Keep daemon lock metadata fresh while a worker turn is in progress."""
192
193 raw_interval = WORK_HEARTBEAT_INTERVAL_SECONDS if interval_seconds is None else interval_seconds
194 interval = max(0.01, float(raw_interval))
195 stop = threading.Event()
196
197 def beat() -> None:
198 while not stop.wait(interval):
199 update_metadata(
200 last_heartbeat=datetime.now(timezone.utc).isoformat(),
201 last_state=state,
202 runtime=current_runtime_fingerprint(),
203 **metadata,
204 )
205
206 update_metadata(
207 last_heartbeat=datetime.now(timezone.utc).isoformat(),
208 last_state=state,
209 runtime=current_runtime_fingerprint(),
210 **metadata,
211 )
212 thread = threading.Thread(target=beat, name="nipux-daemon-heartbeat", daemon=True)
213 thread.start()
214 try:
215 yield
216 finally:
217 stop.set()
218 thread.join(timeout=1.0)
219
220
221def append_daemon_event(config: AppConfig, event: str, **fields: Any) -> Path:
222 """Append a small daemon event that the CLI can tail without parsing stdout."""
223
224 config.ensure_dirs()
225 path = config.runtime.logs_dir / "daemon-events.jsonl"
226 payload = {
227 "at": datetime.now(timezone.utc).isoformat(),
228 "event": event,
229 **fields,
230 }
231 with path.open("a", encoding="utf-8") as handle:
232 handle.write(json.dumps(payload, ensure_ascii=False, sort_keys=True, default=str) + "\n")
233 return path
234
235
236def read_daemon_events(config: AppConfig, *, limit: int = 20) -> list[dict[str, Any]]:
237 path = config.runtime.logs_dir / "daemon-events.jsonl"
238 if not path.exists():
239 return []
240 lines = path.read_text(encoding="utf-8", errors="replace").splitlines()[-limit:]
241 events: list[dict[str, Any]] = []
242 for line in lines:
243 try:
244 parsed = json.loads(line)
245 except json.JSONDecodeError:
246 events.append({"event": "unparseable", "raw": line})
247 continue
248 if isinstance(parsed, dict):
249 events.append(parsed)
250 return events
251
252
253def fake_step_llm():
254 from nipux_cli.llm import LLMResponse, ScriptedLLM, ToolCall
255
256 nonce = datetime.now(timezone.utc).isoformat()
257 return ScriptedLLM([
258 LLMResponse(tool_calls=[
259 ToolCall(
260 name="write_artifact",
261 arguments={
262 "title": "daemon-fake-step",
263 "type": "text",
264 "summary": "Fake daemon step",
265 "content": f"This is a fake daemon worker step.\n\nnonce: {nonce}",
266 },
267 )
268 ])
269 ])
270
271
272@dataclass
273class Daemon:
274 config: AppConfig
275 db: AgentDB
276
277 @classmethod
278 def open(cls, config: AppConfig | None = None) -> "Daemon":
279 config = config or load_config()
280 config.ensure_dirs()
281 return cls(config=config, db=AgentDB(config.runtime.state_db_path))
282
283 @property
284 def lock_path(self) -> Path:
285 return self.config.runtime.home / "agentd.lock"
286
287 def close(self) -> None:
288 self.db.close()
289
290 def next_runnable_job(self) -> dict | None:
291 """Return the next runnable job by priority/age.
292
293 UI focus is intentionally not used here. Focus is for the operator's
294 chat view; the daemon should keep all runnable jobs advancing.
295 """
296
297 now = datetime.now(timezone.utc)
298 self._maybe_recover_provider_blocked_jobs(now=now)
299 runnable_jobs = self.db.list_jobs(statuses=["queued", "running"])
300 for job in runnable_jobs:
301 if job_provider_blocked(job):
302 self.db.update_job_status(
303 job["id"],
304 "paused",
305 metadata_patch={
306 "last_note": "Model provider still requires operator action; paused before retrying failed calls.",
307 },
308 )
309 self.db.append_agent_update(
310 job["id"],
311 "Model provider still requires operator action; paused before retrying failed calls.",
312 category="error",
313 metadata={"reason": "llm_provider_blocked"},
314 )
315 continue
316 for job in runnable_jobs:
317 if job_provider_blocked(job):
318 continue
319 if job_is_deferred(job, now=now):
320 continue
321 return job
322 return None
323
324 def _maybe_recover_provider_blocked_jobs(self, *, now: datetime) -> None:
325 paused = [job for job in self.db.list_jobs(statuses=["paused"]) if job_provider_blocked(job)]
326 due = [job for job in paused if _provider_probe_due(job, now=now)]
327 if not due:
328 return
329 ok, detail = _model_generation_ready(self.config)
330 timestamp = now.isoformat()
331 if not ok:
332 for job in due:
333 self.db.update_job_status(
334 job["id"],
335 "paused",
336 metadata_patch={
337 "provider_last_probe_at": timestamp,
338 "provider_last_probe_detail": detail[:1000],
339 "last_note": "Model provider still unavailable; daemon will check again later.",
340 },
341 )
342 append_daemon_event(
343 self.config,
344 "provider_recovery_wait",
345 checked_jobs=len(due),
346 detail=detail[:500],
347 next_probe_seconds=PROVIDER_RECOVERY_PROBE_SECONDS,
348 )
349 return
350 for job in paused:
351 self.db.update_job_status(
352 job["id"],
353 "queued",
354 metadata_patch={
355 "provider_last_probe_at": timestamp,
356 "provider_unblocked_at": timestamp,
357 "last_note": "Model provider recovered; daemon resumed this job.",
358 },
359 )
360 self.db.append_agent_update(
361 job["id"],
362 "Model provider recovered; continuing queued work.",
363 category="progress",
364 metadata={"reason": "llm_provider_recovered"},
365 )
366 append_daemon_event(self.config, "provider_recovered", resumed_jobs=len(paused), detail=detail[:500])
367
368 def idle_sleep_seconds(self, *, poll_seconds: float, now: datetime | None = None) -> float:
369 """Return the next idle sleep, capped by the nearest deferred job wake."""
370
371 fallback = max(5.0, poll_seconds)
372 now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
373 due_times: list[datetime] = []
374 for job in self.db.list_jobs(statuses=["queued", "running"]):
375 due = job_deferred_until(job, now=now)
376 if due is not None:
377 due_times.append(due)
378 if not due_times:
379 return fallback
380 wait_seconds = min((due - now).total_seconds() for due in due_times)
381 return max(0.5, min(fallback, wait_seconds))
382
383 def run_once(self, *, fake: bool = False, verbose: bool = False):
384 from nipux_cli.worker import run_one_step
385
386 job = self.next_runnable_job()
387 if job is None:
388 return None
389 if verbose:
390 print(f"thinking job={job['id']} title={job['title']} kind={job['kind']}", flush=True)
391 print(f"objective: {job['objective']}", flush=True)
392 llm = fake_step_llm() if fake else None
393 return run_one_step(job["id"], config=self.config, db=self.db, llm=llm)
394
395 def send_due_daily_digest(self, *, now: datetime | None = None) -> dict | None:
396 if not self.config.runtime.daily_digest_enabled:
397 return None
398 now = now or datetime.now()
399 if not _is_digest_due(now, self.config.runtime.daily_digest_time):
400 return None
401 day = now.date().isoformat()
402 target = self.config.email.to_addr or "dry-run"
403 if self.db.digest_exists(day=day, target=target):
404 return None
405 return write_daily_digest(self.config, self.db, day=day)
406
407 def run_forever(
408 self,
409 *,
410 fake: bool = False,
411 poll_seconds: float = 30.0,
412 quiet: bool = False,
413 verbose: bool = False,
414 max_iterations: int | None = None,
415 ) -> None:
416 consecutive_failures = 0
417 iterations = 0
418 with single_instance_lock(self.lock_path) as lock_handle:
419 metadata_lock = threading.Lock()
420
421 def locked_update_metadata(**patch: Any) -> None:
422 with metadata_lock:
423 update_lock_metadata(lock_handle, **patch)
424
425 previous_sigterm = signal.getsignal(signal.SIGTERM)
426 signal.signal(signal.SIGTERM, _raise_keyboard_interrupt)
427 cleaned_shell_processes = cleanup_registered_shell_processes(self.config.runtime.home)
428 recovered = self.db.mark_interrupted_running(reason="daemon recovered abandoned running work from a previous process")
429 append_daemon_event(
430 self.config,
431 "daemon_started",
432 pid=os.getpid(),
433 fake=fake,
434 poll_seconds=poll_seconds,
435 recovered_steps=recovered["steps"],
436 recovered_runs=recovered["runs"],
437 cleaned_shell_processes=len(cleaned_shell_processes),
438 runtime=current_runtime_fingerprint(),
439 )
440 if cleaned_shell_processes:
441 append_daemon_event(
442 self.config,
443 "shell_processes_cleaned",
444 count=len(cleaned_shell_processes),
445 processes=cleaned_shell_processes[:12],
446 )
447 if recovered["steps"] or recovered["runs"]:
448 append_daemon_event(self.config, "stale_work_recovered", **recovered)
449 if not quiet:
450 print(f"nipux daemon started; db={self.config.runtime.state_db_path}", flush=True)
451 try:
452 while True:
453 iterations += 1
454 locked_update_metadata(
455 last_heartbeat=datetime.now(timezone.utc).isoformat(),
456 last_state="checking",
457 consecutive_failures=consecutive_failures,
458 runtime=current_runtime_fingerprint(),
459 )
460
461 try:
462 digest = self.send_due_daily_digest()
463 if digest:
464 append_daemon_event(self.config, "daily_digest", **digest)
465 if not quiet:
466 print(f"daily_digest {json.dumps(digest, ensure_ascii=False)}", flush=True)
467 with _work_heartbeat(
468 locked_update_metadata,
469 consecutive_failures=consecutive_failures,
470 ):
471 result = self.run_once(fake=fake, verbose=verbose and not quiet)
472 except Exception as exc:
473 consecutive_failures += 1
474 payload = _exception_payload(exc)
475 locked_update_metadata(
476 last_heartbeat=datetime.now(timezone.utc).isoformat(),
477 last_state="error",
478 last_error=payload["error"],
479 last_error_type=payload["error_type"],
480 consecutive_failures=consecutive_failures,
481 runtime=current_runtime_fingerprint(),
482 )
483 append_daemon_event(self.config, "daemon_error", **payload, consecutive_failures=consecutive_failures)
484 if not quiet:
485 print(
486 f"daemon_error type={payload['error_type']} error={payload['error'][:240]}",
487 flush=True,
488 )
489 _sleep_or_stop(_exception_backoff(exc, poll_seconds, consecutive_failures), max_iterations, iterations)
490 if max_iterations is not None and iterations >= max_iterations:
491 return
492 continue
493
494 if result is None:
495 locked_update_metadata(
496 last_heartbeat=datetime.now(timezone.utc).isoformat(),
497 last_state="idle",
498 runtime=current_runtime_fingerprint(),
499 )
500 idle_sleep = self.idle_sleep_seconds(poll_seconds=poll_seconds)
501 if not quiet:
502 print(f"idle; sleeping {idle_sleep:g}s", flush=True)
503 _sleep_or_stop(idle_sleep, max_iterations, iterations)
504 else:
505 consecutive_failures = consecutive_failures + 1 if result.status == "failed" else 0
506 locked_update_metadata(
507 last_heartbeat=datetime.now(timezone.utc).isoformat(),
508 last_state="step",
509 last_job_id=result.job_id,
510 last_run_id=result.run_id,
511 last_step_id=result.step_id,
512 last_status=result.status,
513 last_tool=result.tool_name,
514 last_error="" if result.status != "failed" else str(result.result.get("error") or ""),
515 last_error_type="" if result.status != "failed" else str(result.result.get("error_type") or ""),
516 consecutive_failures=consecutive_failures,
517 runtime=current_runtime_fingerprint(),
518 )
519 detail = result.result.get("error") or result.result.get("artifact_id") or result.result.get("content", "")
520 append_daemon_event(
521 self.config,
522 "step",
523 job_id=result.job_id,
524 run_id=result.run_id,
525 step_id=result.step_id,
526 status=result.status,
527 tool=result.tool_name,
528 detail=str(detail)[:500],
529 consecutive_failures=consecutive_failures,
530 )
531 if not quiet:
532 print(
533 f"step job={result.job_id} run={result.run_id} step={result.step_id} "
534 f"status={result.status} tool={result.tool_name or '-'} detail={str(detail)[:240]}",
535 flush=True,
536 )
537 if verbose:
538 print(json.dumps(result.result, ensure_ascii=False, indent=2)[:8000], flush=True)
539 sleep_seconds = (
540 _step_failure_backoff(result, poll_seconds, consecutive_failures)
541 if result.status == "failed"
542 else max(0.0, poll_seconds)
543 )
544 _sleep_or_stop(sleep_seconds, max_iterations, iterations)
545 if max_iterations is not None and iterations >= max_iterations:
546 return
547 except KeyboardInterrupt:
548 interrupted = self.db.mark_interrupted_running(reason="daemon stopped during active work")
549 locked_update_metadata(
550 last_heartbeat=datetime.now(timezone.utc).isoformat(),
551 last_state="stopped",
552 consecutive_failures=consecutive_failures,
553 runtime=current_runtime_fingerprint(),
554 )
555 append_daemon_event(self.config, "daemon_stopped", pid=os.getpid(), interrupted_steps=interrupted["steps"], interrupted_runs=interrupted["runs"])
556 if not quiet:
557 print("nipux daemon stopped", flush=True)
558 finally:
559 signal.signal(signal.SIGTERM, previous_sigterm)
560
561
562def _is_digest_due(now: datetime, configured_time: str) -> bool:
563 try:
564 hour_text, minute_text = configured_time.split(":", 1)
565 hour = int(hour_text)
566 minute = int(minute_text)
567 except ValueError:
568 hour, minute = 8, 0
569 return (now.hour, now.minute) >= (hour, minute)
570
571
572def _raise_keyboard_interrupt(signum, frame) -> None:
573 raise KeyboardInterrupt
574
575
576def _exception_payload(exc: Exception) -> dict[str, str]:
577 return {
578 "error": str(exc),
579 "error_type": type(exc).__name__,
580 }
581
582
583def _failure_backoff(poll_seconds: float, consecutive_failures: int) -> float:
584 base = max(1.0, poll_seconds)
585 return min(60.0, base * min(8, max(1, consecutive_failures)))
586
587
588def _step_failure_backoff(result: Any, poll_seconds: float, consecutive_failures: int) -> float:
589 """Return a retry delay for failed worker steps.
590
591 Worker failures are recorded as failed steps rather than escaping as daemon
592 exceptions, so they use the same generic throttling path here.
593 """
594
595 fallback = _failure_backoff(poll_seconds, consecutive_failures)
596 return fallback
597
598
599def _exception_backoff(exc: Exception, poll_seconds: float, consecutive_failures: int) -> float:
600 fallback = _failure_backoff(poll_seconds, consecutive_failures)
601 if not _is_rate_limit_error(exc):
602 return fallback
603 retry_after = _retry_after_seconds(exc)
604 if retry_after is None:
605 return max(fallback, 10.0)
606 return max(fallback, min(300.0, retry_after))
607
608
609def _is_rate_limit_error(exc: Exception) -> bool:
610 status_code = getattr(exc, "status_code", None)
611 if status_code == 429:
612 return True
613 return _is_rate_limit_text(f"{type(exc).__name__} {exc}")
614
615
616def _is_rate_limit_text(text: str) -> bool:
617 return provider_rate_limited(text)
618
619
620def _retry_after_seconds(exc: Exception) -> float | None:
621 headers = _exception_headers(exc)
622 for key, value in headers.items():
623 normalized = key.lower()
624 if normalized in {"retry-after", "x-ratelimit-reset", "x-rate-limit-reset"}:
625 parsed = _parse_retry_after(value)
626 if parsed is not None:
627 return parsed
628 return None
629
630
631def _exception_headers(exc: Exception) -> dict[str, str]:
632 response = getattr(exc, "response", None)
633 headers = getattr(response, "headers", None)
634 if headers:
635 return {str(key): str(value) for key, value in dict(headers).items()}
636 return {}
637
638
639def _parse_retry_after(value: str) -> float | None:
640 text = str(value).strip()
641 if not text:
642 return None
643 with contextlib.suppress(ValueError):
644 number = float(text)
645 if number > 10_000_000_000:
646 number = number / 1000
647 if number > 1_000_000_000:
648 return max(0.0, number - time.time())
649 return max(0.0, number)
650 with contextlib.suppress(ValueError, TypeError, OSError):
651 parsed = parsedate_to_datetime(text)
652 if parsed.tzinfo is None:
653 parsed = parsed.replace(tzinfo=timezone.utc)
654 return max(0.0, parsed.timestamp() - time.time())
655 return None
656
657
658def _provider_probe_due(job: dict[str, Any], *, now: datetime) -> bool:
659 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
660 raw = str(metadata.get("provider_last_probe_at") or "").strip()
661 if not raw:
662 return True
663 with contextlib.suppress(ValueError):
664 previous = datetime.fromisoformat(raw.replace("Z", "+00:00"))
665 if previous.tzinfo is None:
666 previous = previous.replace(tzinfo=timezone.utc)
667 return (now.astimezone(timezone.utc) - previous.astimezone(timezone.utc)).total_seconds() >= PROVIDER_RECOVERY_PROBE_SECONDS
668 return True
669
670
671def _model_generation_ready(config: AppConfig) -> tuple[bool, str]:
672 checks = run_doctor(config=config, check_model=True)
673 failures = [check for check in checks if not check.ok and check.name in {"model_config", "model_auth", "model_endpoint", "model_generation"}]
674 if not failures:
675 return True, "model_generation accepted"
676 detail = "; ".join(f"{check.name}: {check.detail}" for check in failures)
677 return False, detail
678
679
680def _sleep_or_stop(seconds: float, max_iterations: int | None, iterations: int) -> None:
681 if max_iterations is not None and iterations >= max_iterations:
682 return
683 time.sleep(seconds)
684
685
def _focused_job_id(config: AppConfig) -> str | None:
    path = config.runtime.home / "shell_state.json"
    if not path.exists():
        return None
    try:
        parsed = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    job_id = parsed.get("focus_job_id") if isinstance(parsed, dict) else None
    return job_id if isinstance(job_id, str) and job_id else None
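
The probe throttle in `_provider_probe_due` can be exercised in isolation. The following is a minimal standalone sketch of the same comparison, not the module's own code: `probe_due` is a hypothetical name, and the 300-second interval is an assumed placeholder because the real `PROVIDER_RECOVERY_PROBE_SECONDS` value is not shown in this excerpt.

```python
from datetime import datetime, timedelta, timezone

# Assumed interval for illustration; the real PROVIDER_RECOVERY_PROBE_SECONDS is not shown here.
PROBE_SECONDS = 300


def probe_due(last_probe_at: str, *, now: datetime, interval: float = PROBE_SECONDS) -> bool:
    """Return True when there is no usable timestamp or the interval has elapsed."""
    raw = (last_probe_at or "").strip()
    if not raw:
        return True
    try:
        previous = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        # An unparseable timestamp never blocks a probe.
        return True
    if previous.tzinfo is None:
        # Naive timestamps are treated as UTC, as in the function above.
        previous = previous.replace(tzinfo=timezone.utc)
    elapsed = (now.astimezone(timezone.utc) - previous.astimezone(timezone.utc)).total_seconds()
    return elapsed >= interval


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = (now - timedelta(seconds=60)).isoformat()
old = (now - timedelta(seconds=600)).isoformat()
print(probe_due(recent, now=now), probe_due(old, now=now))
```

Passing `now` explicitly, as the original does, keeps the check deterministic and easy to test.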
nipux_cli/daemon_control.py 243 lines
"""Daemon process control helpers used by CLI commands."""

from __future__ import annotations

import argparse
import os
import signal
import subprocess
import sys
import time
from pathlib import Path
from typing import Any, Callable

from nipux_cli.config import AppConfig, load_config
from nipux_cli.cli_state import clear_model_setup_verified, mark_model_setup_verified
from nipux_cli.daemon import daemon_lock_status
from nipux_cli.doctor import run_doctor
from nipux_cli.provider_errors import provider_action_required, provider_rate_limited


ReadyFn = Callable[[Any], bool]
StartFn = Callable[[argparse.Namespace], None]
StopFn = Callable[[Any], bool]
PidAliveFn = Callable[[int], bool]


def remote_model_preflight_failures(config: Any, *, doctor_fn: Callable[..., list[Any]] = run_doctor) -> list[str]:
    """Return "name: detail" strings for every failed model-related doctor check."""
    blocking = {"model_config", "model_auth", "model_endpoint", "model_generation"}
    checks = doctor_fn(config=config, check_model=True)
    return [f"{check.name}: {check.detail}" for check in checks if not check.ok and check.name in blocking]


def _recoverable_provider_preflight(failures: list[str]) -> bool:
    """True only if every failure is a model_generation failure classified as action-required or rate-limited."""
    if not failures:
        return False
    for failure in failures:
        name = failure.split(":", 1)[0].strip()
        if name != "model_generation":
            return False
        if not (provider_action_required(failure) or provider_rate_limited(failure)):
            return False
    return True


def recoverable_remote_model_preflight_failures(
    config: Any,
    *,
    doctor_fn: Callable[..., list[Any]] = run_doctor,
) -> list[str]:
    failures = remote_model_preflight_failures(config, doctor_fn=doctor_fn)
    return failures if _recoverable_provider_preflight(failures) else []


def provider_preflight_is_recoverable(failures: list[str]) -> bool:
    return _recoverable_provider_preflight(failures)


def ensure_remote_model_ready_for_worker(
    config: Any,
    *,
    fake: bool,
    doctor_fn: Callable[..., list[Any]] = run_doctor,
) -> bool:
    if fake:
        return True
    failures = remote_model_preflight_failures(config, doctor_fn=doctor_fn)
    if not failures:
        mark_model_setup_verified(config)
        return True
    if _recoverable_provider_preflight(failures):
        clear_model_setup_verified()
        print("model provider is not ready; starting daemon in recovery monitor mode")
        for failure in failures:
            print(f" wait {failure}")
        print("The daemon will periodically re-check the configured model and resume provider-blocked jobs when it works.")
        return True
    clear_model_setup_verified()
    print("model is not ready; daemon not started")
    for failure in failures:
        print(f" fail {failure}")
    print("Run `nipux doctor --check-model` after fixing the model configuration.")
    return False


def cmd_start_impl(
    args: argparse.Namespace,
    *,
    ready_fn: Callable[[Any, bool], bool],
    stop_fn: Callable[[AppConfig, float, bool], bool],
) -> None:
    config = load_config()
    config.ensure_dirs()
    status = daemon_lock_status(config.runtime.home / "agentd.lock")
    if status["running"]:
        metadata = status.get("metadata") or {}
        if status.get("stale"):
            print(f"nipux daemon stale pid={metadata.get('pid', 'unknown')}; restarting")
            stop_fn(config, 5.0, True)
            time.sleep(0.5)
        else:
            print(f"nipux daemon already running pid={metadata.get('pid', 'unknown')}")
            return
    if not ready_fn(config, bool(args.fake)):
        return
    log_path = Path(args.log_file).expanduser() if args.log_file else config.runtime.logs_dir / "daemon.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    command = [
        sys.executable,
        "-m",
        "nipux_cli.cli",
        "daemon",
        "--poll-seconds",
        str(args.poll_seconds),
    ]
    if args.fake:
        command.append("--fake")
    command.append("--quiet" if args.quiet else "--verbose")
    with log_path.open("a", encoding="utf-8") as log_file:
        process = subprocess.Popen(
            command,
            cwd=str(Path.cwd()),
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
    time.sleep(0.5)
    status = daemon_lock_status(config.runtime.home / "agentd.lock")
    if status["running"]:
        metadata = status.get("metadata") or {}
        print(f"nipux daemon started pid={metadata.get('pid') or process.pid}")
        print(f"log: {log_path}")
        return
    if process.poll() is None:
        print(f"nipux daemon process started pid={process.pid}, waiting for lock")
        print(f"log: {log_path}")
        return
    raise SystemExit(f"nipux daemon exited immediately with code {process.returncode}; see {log_path}")


def start_daemon_if_needed_impl(
    *,
    poll_seconds: float,
    fake: bool,
    quiet: bool,
    log_file: str | None,
    start_fn: StartFn,
    stop_fn: Callable[[AppConfig, float, bool], bool],
) -> None:
    config = load_config()
    config.ensure_dirs()
    status = daemon_lock_status(config.runtime.home / "agentd.lock")
    if status["running"]:
        metadata = status.get("metadata") or {}
        if status.get("stale"):
            print(f"daemon stale pid={metadata.get('pid', 'unknown')}; restarting")
            stop_fn(config, 5.0, True)
            time.sleep(0.5)
            start_fn(argparse.Namespace(poll_seconds=poll_seconds, fake=fake, quiet=quiet, log_file=log_file))
            return
        print(f"daemon already running pid={metadata.get('pid', 'unknown')}")
        return
    start_fn(argparse.Namespace(poll_seconds=poll_seconds, fake=fake, quiet=quiet, log_file=log_file))


def cmd_restart_impl(
    args: argparse.Namespace,
    *,
    start_fn: StartFn,
    stop_fn: Callable[[AppConfig, float, bool], bool],
) -> None:
    config = load_config()
    config.ensure_dirs()
    stopped = stop_fn(config, float(args.wait), False)
    if stopped:
        time.sleep(0.5)
    start_fn(argparse.Namespace(poll_seconds=args.poll_seconds, fake=args.fake, quiet=args.quiet, log_file=args.log_file))


def stop_daemon_process_impl(
    config: AppConfig,
    *,
    wait: float,
    quiet: bool,
    pid_alive: PidAliveFn,
) -> bool:
    status = daemon_lock_status(config.runtime.home / "agentd.lock")
    if not status["running"]:
        if not quiet:
            print("nipux daemon is not running")
        return False
    metadata = status.get("metadata") or {}
    pid = metadata.get("pid")
    if not isinstance(pid, int):
        recovered = _find_single_daemon_process()
        if recovered is None:
            raise SystemExit("daemon is running but lock file has no pid; stop it from the terminal that owns it")
        pid = recovered
        if not quiet:
            print(f"daemon lock had no pid; recovered daemon pid={pid}")
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + wait
    while time.time() < deadline:
        if not pid_alive(pid):
            if not quiet:
                print(f"nipux daemon stopped pid={pid}")
            return True
        time.sleep(0.2)
    if not quiet:
        print(f"sent SIGTERM to nipux daemon pid={pid}; it may still be shutting down")
    return False


def _find_single_daemon_process() -> int | None:
    """Best-effort recovery for older locks that lost pid metadata."""

    try:
        result = subprocess.run(["ps", "-eo", "pid=,args="], capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    candidates: list[int] = []
    current_pid = os.getpid()
    for raw_line in result.stdout.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        pid_text, _, command = line.partition(" ")
        try:
            pid = int(pid_text)
        except ValueError:
            continue
        if pid == current_pid:
            continue
        normalized = " ".join(command.split())
        if "-m nipux_cli.cli daemon" in normalized or " nipux_cli.cli daemon" in normalized:
            candidates.append(pid)
            continue
        parts = normalized.split()
        if parts and Path(parts[0]).name == "nipux" and "daemon" in parts[1:]:
            candidates.append(pid)
    unique = sorted(set(candidates))
    return unique[0] if len(unique) == 1 else None
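
The command-line matching in `_find_single_daemon_process` is pure string handling, so it can be tried without spawning `ps`. The sketch below copies the same matching rules into a hypothetical helper, `match_daemon_pids`, and runs them over made-up `ps -eo pid=,args=` output; the sample paths and pids are invented for illustration.

```python
from pathlib import Path


def match_daemon_pids(ps_output: str, *, current_pid: int) -> list[int]:
    """Collect pids whose command line looks like a nipux daemon invocation."""
    candidates: set[int] = set()
    for raw_line in ps_output.splitlines():
        pid_text, _, command = raw_line.strip().partition(" ")
        try:
            pid = int(pid_text)
        except ValueError:
            continue  # blank or malformed line
        if pid == current_pid:
            continue  # never report the calling process
        normalized = " ".join(command.split())
        if "-m nipux_cli.cli daemon" in normalized or " nipux_cli.cli daemon" in normalized:
            candidates.add(pid)
            continue
        parts = normalized.split()
        if parts and Path(parts[0]).name == "nipux" and "daemon" in parts[1:]:
            candidates.add(pid)
    return sorted(candidates)


# Invented sample output; real `ps` output will differ per system.
SAMPLE = """\
  101 /usr/bin/python3 -m nipux_cli.cli daemon --poll-seconds 5 --verbose
  202 /home/user/.local/bin/nipux daemon --once
  303 vim notes.txt
  404 /usr/bin/python3 -m nipux_cli.cli status
"""

pids = match_daemon_pids(SAMPLE, current_pid=202)
print(pids[0] if len(pids) == 1 else None)
```

As in the original, a pid is only trusted when exactly one candidate remains; with two or more matches the caller gets `None` rather than a guess.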
nipux_cli/dashboard.py 493 lines
"""Operator-facing dashboard state and rendering."""

from __future__ import annotations

from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from textwrap import shorten
from typing import Any

from nipux_cli.config import AppConfig
from nipux_cli.daemon import daemon_lock_status
from nipux_cli.db import AgentDB
from nipux_cli.operator_context import active_prompt_operator_entries
from nipux_cli.scheduling import job_deferred_until
from nipux_cli.tools import DEFAULT_REGISTRY


def collect_dashboard_state(
    db: AgentDB,
    config: AppConfig,
    *,
    job_id: str | None = None,
    limit: int = 12,
) -> dict[str, Any]:
    """Build a serializable snapshot for status and dashboard commands."""

    jobs = db.list_jobs()
    selected = _select_focus_job(db, jobs, job_id)
    job_cards = [_job_card(db, job) for job in jobs]
    focus = _focus_state(db, selected, limit=limit) if selected else None
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
        "runtime": {
            "home": str(config.runtime.home),
            "state_db": str(config.runtime.state_db_path),
            "logs_dir": str(config.runtime.logs_dir),
            "model": config.model.model,
            "base_url": config.model.base_url,
            "tool_count": len(DEFAULT_REGISTRY.names()),
        },
        "jobs": job_cards,
        "focus": focus,
    }


def render_dashboard(state: dict[str, Any], *, width: int = 120, chars: int = 260) -> str:
    """Render a compact terminal dashboard."""

    width = max(72, min(width, 160))
    line = "-" * width
    runtime = state["runtime"]
    daemon = state["daemon"]
    focus = state.get("focus")
    generated_at = _compact_time(state["generated_at"])
    daemon_text = _daemon_text(daemon)
    lines = [
        "Nipux CLI Dashboard".ljust(width - len(generated_at)) + generated_at,
        line,
        f"daemon: {daemon_text}",
        f"model: {runtime['model']} | endpoint: {runtime['base_url']} | tools: {runtime['tool_count']}",
        f"home: {runtime['home']}",
        "trace: model-visible state, tool calls, outputs, artifacts, and errors. Hidden chain-of-thought is not exposed.",
        line,
        "Jobs",
    ]
    jobs = state.get("jobs") or []
    if not jobs:
        lines.append(" no jobs yet")
    else:
        lines.append(" title state kind steps artifacts last action")
        for job in jobs[:12]:
            latest = job.get("latest_step") or {}
            last_action = _one_line(latest.get("summary") or latest.get("error") or "-", 42)
            display_state = _job_state_text(job, bool(daemon.get("running")))
            lines.append(
                f" {_one_line(job['title'], 29):<29} {display_state:<10} {job['kind']:<15} "
                f"{job['step_count']:>5} {job['artifact_count']:>10} {last_action}"
            )
    if focus:
        lines.extend(_render_focus(focus, width=width, chars=chars, daemon_running=bool(daemon.get("running"))))
    return "\n".join(lines).rstrip() + "\n"


def render_overview(state: dict[str, Any], *, width: int = 100) -> str:
    """Render a human-sized status view for the interactive shell."""

    width = max(72, min(width, 120))
    runtime = state["runtime"]
    daemon = state["daemon"]
    focus = state.get("focus")
    jobs = state.get("jobs") or []
    latest_step = ((focus or {}).get("recent_steps") or [{}])[-1] if focus else {}
    lines = [
        "Nipux Status",
        "=" * min(width, 96),
        f"daemon: {_daemon_health_text(daemon, latest_step=latest_step)}",
        f"model: {runtime['model']}",
        f"jobs: {len(jobs)} total | tools: {runtime['tool_count']} | home: {runtime['home']}",
    ]
    if not focus:
        lines.append("focus: no job yet")
        lines.append("")
        lines.append("next: create \"your objective\" --title \"name\"")
        return "\n".join(lines).rstrip() + "\n"

    job = focus["job"]
    counts = focus["counts"]
    artifacts = focus.get("artifacts") or []
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    operator = (metadata.get("last_operator_message") if isinstance(metadata, dict) else None) or {}
    agent_update = (metadata.get("last_agent_update") if isinstance(metadata, dict) else None) or {}
    lesson = (metadata.get("last_lesson") if isinstance(metadata, dict) else None) or {}
    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
    active_operator = _active_operator_messages(metadata)
    pending_measurement = metadata.get("pending_measurement_obligation") if isinstance(metadata.get("pending_measurement_obligation"), dict) else {}
    lines.extend([
        "",
        f"focus: {job['title']}",
        (
            f"state: {_job_state_text(job, bool(daemon.get('running')))} | "
            f"worker: {_worker_text(job, bool(daemon.get('running')))} | kind: {job['kind']} | "
            f"steps: {counts['steps']} | artifacts: {counts['artifacts']} | failures: {counts['failed_steps']}"
        ),
        f"learning: findings={len(findings)} | sources={len(sources)} | tasks={len(tasks)} | experiments={len(experiments)} | lessons={counts.get('lessons', 0)} | reflections={counts.get('reflections', 0)}",
        f"objective: {_one_line(job['objective'], width - 11)}",
    ])
    if active_operator:
        lines.append(f"operator context: {len(active_operator)} active | {_one_line(active_operator[-1].get('message') or '', width - 28)}")
    if pending_measurement:
        lines.append(f"measurement: pending from step #{pending_measurement.get('source_step_no') or '?'}")
    if latest_step:
        tool = latest_step.get("tool_name") or latest_step.get("kind") or "-"
        status = latest_step.get("status") or "-"
        summary = latest_step.get("summary") or latest_step.get("error") or "-"
        lines.append(f"latest: #{latest_step.get('step_no')} {status} {tool}: {_one_line(summary, width - 22)}")
    if artifacts:
        artifact = artifacts[0]
        lines.append(f"latest artifact: {artifact.get('title') or artifact['id']}")
    if operator:
        lines.append(f"last steering: {_one_line(operator.get('message') or '', width - 15)}")
    if agent_update:
        lines.append(f"agent note: {_one_line(agent_update.get('message') or '', width - 12)}")
    if lesson:
        lines.append(f"latest lesson: {_one_line(lesson.get('lesson') or '', width - 16)}")
    lines.extend([
        "",
        "commands: activity | updates | findings | tasks | sources | memory | metrics | work --steps 3 | start | stop",
    ])
    return "\n".join(lines).rstrip() + "\n"


def _select_focus_job(db: AgentDB, jobs: list[dict[str, Any]], job_id: str | None) -> dict[str, Any] | None:
    if job_id:
        return db.get_job(job_id)
    for status in ("running", "queued", "paused", "failed", "completed"):
        for job in jobs:
            if job.get("status") == status:
                return job
    return jobs[0] if jobs else None


def _job_card(db: AgentDB, job: dict[str, Any]) -> dict[str, Any]:
    steps = db.list_steps(job_id=job["id"])
    artifacts = db.list_artifacts(job["id"], limit=500)
    runs = db.list_runs(job["id"], limit=500)
    return {
        "id": job["id"],
        "status": job["status"],
        "kind": job["kind"],
        "title": job["title"],
        "updated_at": job["updated_at"],
        "step_count": _step_count(steps),
        "run_count": len(runs),
        "failed_steps": sum(1 for step in steps if step.get("status") == "failed"),
        "artifact_count": len(artifacts),
        "latest_step": _public_step(steps[-1]) if steps else None,
    }


def _focus_state(db: AgentDB, job: dict[str, Any], *, limit: int) -> dict[str, Any]:
    steps = db.list_steps(job_id=job["id"])
    runs = db.list_runs(job["id"], limit=limit)
    artifacts = db.list_artifacts(job["id"], limit=limit)
    memory = db.list_memory(job["id"])
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
    reflections = metadata.get("reflections") if isinstance(metadata.get("reflections"), list) else []
    active_operator = _active_operator_messages(metadata)
    tool_counts = Counter(step.get("tool_name") or step.get("kind") or "unknown" for step in steps)
    blocked = [step for step in steps if str(step.get("error") or "").endswith("blocked") or "blocked" in str(step.get("summary") or "")]
    return {
        "job": {
            "id": job["id"],
            "title": job["title"],
            "kind": job["kind"],
            "status": job["status"],
            "objective": job["objective"],
            "updated_at": job["updated_at"],
            "metadata": job.get("metadata") or {},
        },
        "counts": {
            "steps": _step_count(steps),
            "runs": len(db.list_runs(job["id"], limit=1000)),
            "artifacts": len(db.list_artifacts(job["id"], limit=1000)),
            "failed_steps": sum(1 for step in steps if step.get("status") == "failed"),
            "blocked_steps": len(blocked),
            "findings": len(findings),
            "sources": len(sources),
            "tasks": len(tasks),
            "experiments": len(experiments),
            "active_operator_messages": len(active_operator),
            "lessons": len(lessons),
            "reflections": len(reflections),
        },
        "tool_counts": dict(tool_counts.most_common(8)),
        "recent_runs": [_public_run(run) for run in runs],
        "recent_steps": [_public_step(step) for step in steps[-limit:]],
        "artifacts": [_public_artifact(artifact) for artifact in artifacts],
        "memory": [
            {
                "key": entry.get("key"),
                "summary": entry.get("summary"),
                "artifact_refs": entry.get("artifact_refs") or [],
                "updated_at": entry.get("updated_at"),
            }
            for entry in memory[:4]
        ],
        "lessons": [
            {
                "at": entry.get("at"),
                "category": entry.get("category") or "memory",
                "lesson": entry.get("lesson") or "",
                "confidence": entry.get("confidence"),
            }
            for entry in lessons[-5:]
            if isinstance(entry, dict)
        ],
        "findings": findings[-8:],
        "tasks": tasks[-12:],
        "experiments": experiments[-12:],
        "active_operator_messages": active_operator[-12:],
        "sources": sources[-8:],
        "reflections": reflections[-4:],
    }


def _render_focus(focus: dict[str, Any], *, width: int, chars: int, daemon_running: bool) -> list[str]:
    job = focus["job"]
    counts = focus["counts"]
    lines = [
        "-" * width,
        f"Focus Job: {job['title']} | state {_job_state_text(job, daemon_running)} | {job['kind']}",
        f"objective: {_one_line(job['objective'], width - 11)}",
        (
            f"counts: steps={counts['steps']} runs={counts['runs']} artifacts={counts['artifacts']} "
            f"failed_steps={counts['failed_steps']} blocked_steps={counts['blocked_steps']}"
        ),
        f"learning: findings={counts.get('findings', 0)} sources={counts.get('sources', 0)} tasks={counts.get('tasks', 0)} experiments={counts.get('experiments', 0)} lessons={counts.get('lessons', 0)} reflections={counts.get('reflections', 0)}",
        f"tool mix: {_tool_mix(focus.get('tool_counts') or {})}",
    ]
    active_operator = focus.get("active_operator_messages") or []
    if active_operator:
        lines.append(f"operator context: {len(active_operator)} active | {_one_line(active_operator[-1].get('message') or '', chars)}")
    pending_measurement = (job.get("metadata") or {}).get("pending_measurement_obligation") if isinstance(job.get("metadata"), dict) else {}
    if isinstance(pending_measurement, dict) and pending_measurement:
        lines.append(f"measurement obligation: pending from step #{pending_measurement.get('source_step_no') or '?'}")
    lines.extend(["", "Recent Steps"])
    recent_steps = focus.get("recent_steps") or []
    if not recent_steps:
        lines.append(" no steps recorded")
    for step in recent_steps:
        error = f" | error={_one_line(step['error'], 70)}" if step.get("error") else ""
        lines.append(
            f" #{step['step_no']:<4} {step['status']:<9} {step.get('tool_name') or step['kind']:<18} "
            f"{_one_line(step.get('summary') or '-', chars)}{error}"
        )
        args = step.get("arguments") or {}
        if args:
            lines.append(f" args: {_one_line(_compact_value(args), chars)}")
    lines.append("")
    lines.append("Artifacts")
    artifacts = focus.get("artifacts") or []
    if not artifacts:
        lines.append(" no artifacts yet")
    for artifact in artifacts[:8]:
        title = artifact.get("title") or artifact["id"]
        lines.append(f" {artifact['created_at']} {artifact['type']} {title}")
        if artifact.get("summary"):
            lines.append(f" {_one_line(artifact['summary'], chars)}")
    lessons = focus.get("lessons") or []
    if lessons:
        lines.append("")
        lines.append("Lessons")
        for lesson in lessons:
            lines.append(f" {lesson.get('category') or 'memory'}: {_one_line(lesson.get('lesson') or '', chars)}")
    findings = focus.get("findings") or []
    if findings:
        lines.append("")
        lines.append("Recent Findings")
        for finding in findings[-5:]:
            lines.append(f" {_one_line(finding.get('name') or 'unknown', 48)} score={finding.get('score')} {finding.get('category') or ''}")
    tasks = focus.get("tasks") or []
    if tasks:
        lines.append("")
        lines.append("Task Queue")
        for task in tasks[-6:]:
            lines.append(f" {task.get('status') or 'open':<7} p={task.get('priority') or 0:<3} {_one_line(task.get('title') or 'untitled', 56)}")
    sources = focus.get("sources") or []
    if sources:
        lines.append("")
        lines.append("Recent Sources")
        for source in sources[-5:]:
            lines.append(f" {_one_line(source.get('source') or 'unknown', 48)} score={source.get('usefulness_score')} findings={source.get('yield_count') or 0}")
    memory = focus.get("memory") or []
    if memory:
        lines.append("")
        lines.append("Compact Memory")
        for entry in memory:
            refs = ", ".join(entry.get("artifact_refs") or [])
            suffix = f" refs={refs}" if refs else ""
            lines.append(f" {entry['key']}: {_one_line(entry.get('summary') or '', chars)}{suffix}")
    return lines


def _public_run(run: dict[str, Any]) -> dict[str, Any]:
    return {
        "id": run["id"],
        "status": run["status"],
        "started_at": run["started_at"],
        "ended_at": run.get("ended_at"),
        "model": run.get("model"),
        "error": run.get("error"),
    }


def _public_step(step: dict[str, Any]) -> dict[str, Any]:
    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
    return {
        "id": step["id"],
        "step_no": step["step_no"],
        "kind": step["kind"],
        "status": step["status"],
        "tool_name": step.get("tool_name"),
        "started_at": step["started_at"],
        "ended_at": step.get("ended_at"),
        "summary": _clean_step_summary(step.get("summary")),
        "error": step.get("error"),
        "arguments": args,
    }


def _public_artifact(artifact: dict[str, Any]) -> dict[str, Any]:
    return {
        "id": artifact["id"],
        "created_at": artifact["created_at"],
        "type": artifact["type"],
        "title": artifact.get("title"),
        "summary": artifact.get("summary"),
        "path": artifact["path"],
    }


def _step_count(steps: list[dict[str, Any]]) -> int:
    numbers = [int(step.get("step_no") or 0) for step in steps]
    return max(numbers, default=0)


def _active_operator_messages(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
    prompt_entries = active_prompt_operator_entries(messages)
    return [
        entry for entry in messages
        if isinstance(entry, dict)
        and entry in prompt_entries
        and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
    ]


def _daemon_text(daemon: dict[str, Any]) -> str:
    metadata = daemon.get("metadata") or {}
    if daemon.get("running"):
        pid = metadata.get("pid") or "unknown"
        started = metadata.get("started_at") or "unknown start"
        stale = " stale-runtime" if daemon.get("stale") else ""
        return f"running pid={pid}{stale} started={started}"
    return "ready when work starts"


def _daemon_health_text(daemon: dict[str, Any], *, latest_step: dict[str, Any] | None = None) -> str:
    if not daemon.get("running"):
        return "ready when work starts"
    metadata = daemon.get("metadata") or {}
    heartbeat = metadata.get("last_heartbeat")
    status = "running"
    if daemon.get("stale"):
        status = "running stale-runtime"
    if heartbeat:
        age = _age_seconds(heartbeat)
        if age is not None:
            status += f" | heartbeat {int(age)}s ago"
            running_step = latest_step or {}
            if age > 120 and running_step.get("status") == "running":
                tool = running_step.get("tool_name") or running_step.get("kind") or "step"
                step_age = _age_seconds(running_step.get("started_at") or "")
                if step_age is not None:
                    status += f" | busy #{running_step.get('step_no')} {tool} for {int(step_age)}s"
                else:
                    status += f" | busy #{running_step.get('step_no')} {tool}"
            elif age > 120:
                status += " (stale)"
    failures = metadata.get("consecutive_failures")
    if failures:
        status += f" | consecutive failures: {failures}"
    tool = metadata.get("last_tool")
    step_status = metadata.get("last_status")
    if tool or step_status:
        status += f" | last: {step_status or '?'} {tool or '-'}"
    if metadata.get("last_error"):
        status += f" | error: {_one_line(metadata.get('last_error'), 48)}"
    return status


def _worker_text(job: dict[str, Any], daemon_running: bool) -> str:
    status = str(job.get("status") or "")
    if status in {"paused", "completed", "cancelled", "failed"}:
        return status
    if job_deferred_until(job):
        return "waiting"
    return "active" if daemon_running and status in {"running", "queued"} else "idle"


def _job_state_text(job: dict[str, Any], daemon_running: bool) -> str:
    status = str(job.get("status") or "")
    if status in {"running", "queued"}:
        if job_deferred_until(job):
            return "waiting"
        return "advancing" if daemon_running else "open"
    return status or "unknown"


def _age_seconds(value: str) -> float | None:
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are UTC. Without this, astimezone() would
        # interpret them as local time and skew the reported age.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return max(0.0, (datetime.now(timezone.utc) - parsed.astimezone(timezone.utc)).total_seconds())


def _compact_time(value: str) -> str:
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return value
    return parsed.astimezone().strftime("%Y-%m-%d %H:%M:%S %Z")


def _one_line(value: Any, width: int) -> str:
    text = " ".join(str(value).split())
    return shorten(text, width=max(8, width), placeholder="...")


def _clean_step_summary(summary: Any) -> str:
    text = " ".join(str(summary or "").split())
    if text.startswith("write_artifact saved ") and " at /" in text:
        return text.split(" at /", 1)[0]
    return text


def _compact_value(value: Any) -> str:
    if isinstance(value, dict):
        parts = [f"{key}={value[key]!r}" for key in sorted(value)]
        return ", ".join(parts)
    return str(value)


def _tool_mix(tool_counts: dict[str, int]) -> str:
    if not tool_counts:
        return "none"
    return ", ".join(f"{name}:{count}" for name, count in tool_counts.items())


def resolve_artifact_path(path: str | Path) -> str:
    return str(Path(path).expanduser())
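
The small text helpers at the end of this module have no dependencies beyond the standard library, so their behavior is easy to check in isolation. The sketch below restates three of them under public names (`one_line`, `clean_step_summary`, `compact_value` are renamed copies for illustration, and the sample strings are invented).

```python
from textwrap import shorten
from typing import Any


def one_line(value: Any, width: int) -> str:
    # Collapse all whitespace runs, then truncate on a word boundary.
    text = " ".join(str(value).split())
    return shorten(text, width=max(8, width), placeholder="...")


def clean_step_summary(summary: Any) -> str:
    # Drop the absolute path suffix from "write_artifact saved X at /path" summaries.
    text = " ".join(str(summary or "").split())
    if text.startswith("write_artifact saved ") and " at /" in text:
        return text.split(" at /", 1)[0]
    return text


def compact_value(value: Any) -> str:
    # Dicts are rendered as sorted key=repr(value) pairs for stable output.
    if isinstance(value, dict):
        return ", ".join(f"{key}={value[key]!r}" for key in sorted(value))
    return str(value)


print(one_line("  hello   world ", 40))
print(one_line("alpha   beta\n gamma delta epsilon", 16))
print(clean_step_summary("write_artifact saved report.md at /tmp/report.md"))
print(compact_value({"b": 2, "a": "x"}))
```

Because `one_line` clamps the width to at least 8, the three-character placeholder always fits and `shorten` does not raise.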
nipux_cli/db.py 2752 lines
"""SQLite state store for the Nipux agent."""

from __future__ import annotations

import json
import random
import re
import sqlite3
import threading
import time
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Iterable, TypeVar

from nipux_cli.metric_format import format_metric_value
from nipux_cli.memory_graph import DEFAULT_NODE_KIND, DEFAULT_NODE_STATUS, NODE_KINDS, NODE_STATUSES

T = TypeVar("T")

SCHEMA_VERSION = 1

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
    version INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    objective TEXT NOT NULL,
    kind TEXT NOT NULL DEFAULT 'generic',
    status TEXT NOT NULL DEFAULT 'queued',
    priority INTEGER NOT NULL DEFAULT 0,
    cadence TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    metadata_json TEXT NOT NULL DEFAULT '{}'
);

CREATE TABLE IF NOT EXISTS job_runs (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES jobs(id),
    status TEXT NOT NULL,
    started_at TEXT NOT NULL,
    ended_at TEXT,
    model TEXT,
    config_hash TEXT,
    score REAL,
    error TEXT
);

CREATE TABLE IF NOT EXISTS steps (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES jobs(id),
    run_id TEXT NOT NULL REFERENCES job_runs(id),
    step_no INTEGER NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL,
    tool_name TEXT,
    started_at TEXT NOT NULL,
    ended_at TEXT,
    summary TEXT,
    input_json TEXT NOT NULL DEFAULT '{}',
    output_json TEXT NOT NULL DEFAULT '{}',
    error TEXT
);

CREATE TABLE IF NOT EXISTS artifacts (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES jobs(id),
    run_id TEXT,
    step_id TEXT,
    type TEXT NOT NULL,
    path TEXT NOT NULL,
    sha256 TEXT NOT NULL,
    title TEXT,
    summary TEXT,
    metadata_json TEXT NOT NULL DEFAULT '{}',
    created_at TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS evidence (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES jobs(id),
    url_or_source TEXT NOT NULL,
    artifact_id TEXT REFERENCES artifacts(id),
    extracted_text_path TEXT,
    summary TEXT,
    score_json TEXT NOT NULL DEFAULT '{}',
    created_at TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS memory_index (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES jobs(id),
    key TEXT NOT NULL,
    summary TEXT NOT NULL,
    artifact_refs_json TEXT NOT NULL DEFAULT '[]',
    updated_at TEXT NOT NULL,
    UNIQUE(job_id, key)
);

CREATE TABLE IF NOT EXISTS digests (
    id TEXT PRIMARY KEY,
    day TEXT NOT NULL,
    target TEXT,
    subject TEXT,
    body_path TEXT,
    sent_at TEXT,
    status TEXT NOT NULL,
    error TEXT
);

CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    job_id TEXT REFERENCES jobs(id),
    event_type TEXT NOT NULL,
    created_at TEXT NOT NULL,
    title TEXT,
    body TEXT,
    ref_table TEXT,
    ref_id TEXT,
    metadata_json TEXT NOT NULL DEFAULT '{}'
);

CREATE INDEX IF NOT EXISTS idx_jobs_status_priority ON jobs(status, priority DESC, updated_at);
CREATE INDEX IF NOT EXISTS idx_runs_job ON job_runs(job_id, started_at DESC);
CREATE INDEX IF NOT EXISTS idx_steps_run ON steps(run_id, step_no);
CREATE INDEX IF NOT EXISTS idx_artifacts_job ON artifacts(job_id, created_at DESC);
CREATE INDEX IF NOT EXISTS idx_events_job_time ON events(job_id, created_at DESC);
CREATE INDEX IF NOT EXISTS idx_events_ref ON events(ref_table, ref_id);
"""


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


def new_id(prefix: str) -> str:
    return f"{prefix}_{uuid.uuid4().hex[:16]}"


def _slugify(value: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return slug[:72].strip("-") or new_id("job")


def _unique_job_id(conn: sqlite3.Connection, seed: str) -> str:
    base = _slugify(seed)
    candidate = base
    suffix = 2
    while conn.execute("SELECT 1 FROM jobs WHERE id = ? LIMIT 1", (candidate,)).fetchone():
        candidate = f"{base[:68]}-{suffix}"
        suffix += 1
    return candidate


159def _json_dumps(value: Any) -> str:
160 return json.dumps(value if value is not None else {}, ensure_ascii=False, sort_keys=True)
161
162
163def _json_loads(value: str | None) -> dict[str, Any]:
164 try:
165 loaded = json.loads(value or "{}")
166 except json.JSONDecodeError:
167 return {}
168 return loaded if isinstance(loaded, dict) else {}
169
170
171def _bounded_float(value: Any, low: float, high: float) -> float:
172 try:
173 number = float(value)
174 except (TypeError, ValueError):
175 return low
176 return min(high, max(low, number))
177
178
179def _merge_string_lists(existing: Any, incoming: Any, *, limit: int) -> list[str]:
180 values: list[str] = []
181 for source in (existing, incoming):
182 if isinstance(source, list):
183 items = source
184 elif isinstance(source, str) and source.strip():
185 items = [source]
186 else:
187 items = []
188 for item in items:
189 text = " ".join(str(item).split())
190 if text and text not in values:
191 values.append(text)
192 return values[-limit:]
193
194
195def _memory_edge_key(edge: dict[str, Any]) -> str:
196 from_key = str(edge.get("from_key") or "")
197 relation = str(edge.get("relation") or "")
198 to_key = str(edge.get("to_key") or "")
199 return f"{from_key}|{relation}|{to_key}" if from_key and relation and to_key else ""
200
201
202def _as_int(value: Any) -> int:
203 try:
204 return int(float(value))
205 except (TypeError, ValueError):
206 return 0
207
208
209def _as_float(value: Any) -> float | None:
210 try:
211 return float(value)
212 except (TypeError, ValueError):
213 return None
214
215
216def _nested_value(value: dict[str, Any], *keys: str) -> Any:
217 current: Any = value
218 for key in keys:
219 if not isinstance(current, dict):
220 return None
221 current = current.get(key)
222 return current
223
224
225def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
226 values = metadata.get(key)
227 if not isinstance(values, list):
228 return []
229 return [value for value in values if isinstance(value, dict)]
230
231
232def _change_fingerprint(entry: dict[str, Any], fields: Iterable[str]) -> str:
233 return _json_dumps({field: entry.get(field) for field in fields})
234
235
236def _norm_key(value: str) -> str:
237 return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")[:120]
238
239
240def _clean_status(value: str, allowed: set[str], default: str) -> str:
241 status = (value.strip().lower() or default).replace(" ", "_")
242 return status if status in allowed else default
243
244
245def _experiment_metric_value(entry: dict[str, Any]) -> float | None:
246 try:
247 value = entry.get("metric_value")
248 if value is None:
249 return None
250 return float(value)
251 except (TypeError, ValueError):
252 return None
253
254
255def _same_metric_group(
256 entry: dict[str, Any],
257 *,
258 metric_name: str,
259 metric_unit: str,
260 higher_is_better: bool,
261) -> bool:
262 return (
263 str(entry.get("metric_name") or "").strip().lower() == metric_name.strip().lower()
264 and str(entry.get("metric_unit") or "").strip().lower() == metric_unit.strip().lower()
265 and bool(entry.get("higher_is_better", True)) == bool(higher_is_better)
266 and _experiment_metric_value(entry) is not None
267 )
268
269
def _best_experiment_for_metric(
    experiments: list[dict[str, Any]],
    *,
    metric_name: str,
    metric_unit: str,
    higher_is_better: bool,
    exclude_key: str = "",
) -> dict[str, Any] | None:
    """Return the best experiment in the given metric group, or None if the group is empty."""
    candidates = [
        experiment
        for experiment in experiments
        if experiment.get("key") != exclude_key
        and _same_metric_group(
            experiment,
            metric_name=metric_name,
            metric_unit=metric_unit,
            higher_is_better=higher_is_better,
        )
    ]
    if not candidates:
        return None
    # _same_metric_group already drops entries without a numeric metric value.
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda item: _experiment_metric_value(item) or 0.0)
292
293
294def _metric_delta(
295 *,
296 metric_value: Any,
297 previous_best: dict[str, Any] | None,
298 higher_is_better: bool,
299) -> float | None:
300 try:
301 current = float(metric_value)
302 except (TypeError, ValueError):
303 return None
304 if previous_best is None:
305 return None
306 previous = _experiment_metric_value(previous_best)
307 if previous is None:
308 return None
309 delta = current - previous if higher_is_better else previous - current
310 return round(delta, 6)
311
312
def _mark_best_experiments(experiments: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Flag the best entry in each metric group and return the most recently updated winner."""
    groups: dict[tuple[str, str, bool], list[dict[str, Any]]] = {}
    for experiment in experiments:
        metric_name = str(experiment.get("metric_name") or "").strip().lower()
        if _experiment_metric_value(experiment) is None or not metric_name:
            experiment["best_observed"] = False
            continue
        key = (
            metric_name,
            str(experiment.get("metric_unit") or "").strip().lower(),
            bool(experiment.get("higher_is_better", True)),
        )
        groups.setdefault(key, []).append(experiment)
    winners: list[dict[str, Any]] = []
    for (_metric_name, _metric_unit, higher_is_better), entries in groups.items():
        pick = max if higher_is_better else min
        winner = pick(entries, key=lambda item: _experiment_metric_value(item) or 0.0)
        for entry in entries:
            entry["best_observed"] = entry is winner
        winners.append(winner)
    if not winners:
        return None
    return max(winners, key=lambda item: str(item.get("updated_at") or item.get("created_at") or ""))
335
336
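def _row_to_dict(row: sqlite3.Row | dict[str, Any] | None) -> dict[str, Any] | None:
    """Copy a row into a dict and decode each ``*_json`` column into a sibling key."""
    if row is None:
        return None
    result = dict(row)
    for key in ("metadata_json", "input_json", "output_json", "score_json", "artifact_refs_json"):
        if key in result:
            # artifact_refs_json stores a JSON list (schema default '[]');
            # the other columns keep the original '{}' fallback.
            empty = "[]" if key == "artifact_refs_json" else "{}"
            try:
                result[key.removesuffix("_json")] = json.loads(result[key] or empty)
            except json.JSONDecodeError:
                result[key.removesuffix("_json")] = json.loads(empty)
    return result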
348
349
350def _insert_event(
351 conn: sqlite3.Connection,
352 *,
353 job_id: str | None,
354 event_type: str,
355 title: str = "",
356 body: str = "",
357 ref_table: str = "",
358 ref_id: str = "",
359 metadata: dict[str, Any] | None = None,
360 created_at: str | None = None,
361) -> dict[str, Any]:
362 event_id = new_id("evt")
363 when = created_at or utc_now()
364 conn.execute(
365 """
366 INSERT INTO events(id, job_id, event_type, created_at, title, body, ref_table, ref_id, metadata_json)
367 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
368 """,
369 (
370 event_id,
371 job_id,
372 event_type.strip().lower() or "event",
373 when,
374 title.strip(),
375 body.strip(),
376 ref_table.strip(),
377 ref_id.strip(),
378 _json_dumps(metadata or {}),
379 ),
380 )
381 return {
382 "id": event_id,
383 "job_id": job_id,
384 "event_type": event_type.strip().lower() or "event",
385 "created_at": when,
386 "title": title.strip(),
387 "body": body.strip(),
388 "ref_table": ref_table.strip(),
389 "ref_id": ref_id.strip(),
390 "metadata": metadata or {},
391 }
392
393
394def _projected_event(
395 *,
396 event_id: str,
397 job_id: str,
398 event_type: str,
399 created_at: str,
400 title: str = "",
401 body: str = "",
402 ref_table: str = "",
403 ref_id: str = "",
404 metadata: dict[str, Any] | None = None,
405) -> dict[str, Any]:
406 return {
407 "id": event_id,
408 "job_id": job_id,
409 "event_type": event_type,
410 "created_at": created_at,
411 "title": title,
412 "body": body,
413 "ref_table": ref_table,
414 "ref_id": ref_id,
415 "metadata": metadata or {},
416 "projected": True,
417 }
418
419
420class AgentDB:
421 """Small SQLite wrapper with WAL and jittered write retries."""
422
423 _WRITE_RETRIES = 12
424
425 def __init__(self, path: str | Path):
426 self.path = Path(path)
427 self.path.parent.mkdir(parents=True, exist_ok=True)
428 self._lock = threading.RLock()
429 self._conn = sqlite3.connect(
430 str(self.path),
431 check_same_thread=False,
432 timeout=1.0,
433 isolation_level=None,
434 )
435 self._conn.row_factory = sqlite3.Row
436 self._conn.execute("PRAGMA journal_mode=WAL")
437 self._conn.execute("PRAGMA foreign_keys=ON")
438 self._init_schema()
439
440 def close(self) -> None:
441 with self._lock:
442 if self._conn is not None:
443 try:
444 self._conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
445 finally:
446 self._conn.close()
447 self._conn = None
448
449 def _init_schema(self) -> None:
450 with self._lock:
451 self._conn.executescript(SCHEMA_SQL)
452 row = self._conn.execute("SELECT version FROM schema_version LIMIT 1").fetchone()
453 if row is None:
454 self._conn.execute("INSERT INTO schema_version(version) VALUES (?)", (SCHEMA_VERSION,))
455 elif int(row["version"]) != SCHEMA_VERSION:
456 raise RuntimeError(f"Unsupported nipux schema version: {row['version']}")
457
    def _write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
        """Run ``fn`` inside a ``BEGIN IMMEDIATE`` transaction, retrying while the database is locked."""
        last_error: Exception | None = None
        for attempt in range(self._WRITE_RETRIES):
            try:
                with self._lock:
                    # Take the write lock up front so contention surfaces here, not mid-transaction.
                    self._conn.execute("BEGIN IMMEDIATE")
                    try:
                        result = fn(self._conn)
                        self._conn.commit()
                        return result
                    except BaseException:
                        self._conn.rollback()
                        raise
            except sqlite3.OperationalError as exc:
                # Only lock contention is retried; any other operational error propagates.
                if "locked" not in str(exc).lower() and "busy" not in str(exc).lower():
                    raise
                last_error = exc
                if attempt < self._WRITE_RETRIES - 1:
                    # Jittered sleep keeps competing writers from retrying in lockstep.
                    time.sleep(random.uniform(0.02, 0.15))
        raise last_error or sqlite3.OperationalError("database is locked")
478
479 def append_event(
480 self,
481 job_id: str | None = None,
482 *,
483 event_type: str,
484 title: str = "",
485 body: str = "",
486 ref_table: str = "",
487 ref_id: str = "",
488 metadata: dict[str, Any] | None = None,
489 created_at: str | None = None,
490 ) -> dict[str, Any]:
491 def op(conn: sqlite3.Connection) -> dict[str, Any]:
492 return _insert_event(
493 conn,
494 job_id=job_id,
495 event_type=event_type,
496 title=title,
497 body=body,
498 ref_table=ref_table,
499 ref_id=ref_id,
500 metadata=metadata,
501 created_at=created_at,
502 )
503
504 return self._write(op)
505
506 def list_events(
507 self,
508 *,
509 job_id: str | None = None,
510 limit: int = 100,
511 event_types: Iterable[str] | None = None,
512 ) -> list[dict[str, Any]]:
513 filters = []
514 params: list[Any] = []
515 if job_id is not None:
516 filters.append("job_id = ?")
517 params.append(job_id)
518 if event_types:
519 values = [str(value).strip().lower() for value in event_types if str(value).strip()]
520 if values:
521 filters.append(f"event_type IN ({','.join('?' for _ in values)})")
522 params.extend(values)
523 where = f"WHERE {' AND '.join(filters)}" if filters else ""
524 rows = self._conn.execute(
525 f"""
526 SELECT * FROM (
527 SELECT * FROM events
528 {where}
529 ORDER BY created_at DESC, id DESC
530 LIMIT ?
531 )
532 ORDER BY created_at ASC, id ASC
533 """,
534 [*params, int(limit)],
535 ).fetchall()
536 return [_row_to_dict(row) for row in rows]
537
538 def list_timeline_events(self, job_id: str, *, limit: int = 100) -> list[dict[str, Any]]:
539 """Return visible job history, combining durable events with old projected state."""
540
541 actual = self.list_events(job_id=job_id, limit=max(limit * 4, 250))
542 actual_ids = {str(event.get("id")) for event in actual}
543 actual_refs = {
544 (str(event.get("ref_table") or ""), str(event.get("ref_id") or ""))
545 for event in actual
546 if event.get("ref_table") and event.get("ref_id")
547 }
548 timeline: list[dict[str, Any]] = list(actual)
549 job = self.get_job(job_id)
550 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
551
552 for index, entry in enumerate(_metadata_list(metadata, "operator_messages")):
553 if entry.get("event_id") in actual_ids:
554 continue
555 timeline.append(_projected_event(
556 event_id=f"projected_operator_{index}",
557 job_id=job_id,
558 event_type="operator_message",
559 created_at=str(entry.get("at") or job.get("updated_at") or job.get("created_at")),
560 title=str(entry.get("source") or "operator"),
561 body=str(entry.get("message") or ""),
562 metadata={
563 "source": entry.get("source") or "operator",
564 "mode": entry.get("mode") or "steer",
565 "claimed_at": entry.get("claimed_at"),
566 "acknowledged_at": entry.get("acknowledged_at"),
567 "superseded_at": entry.get("superseded_at"),
568 },
569 ))
570
571 for index, entry in enumerate(_metadata_list(metadata, "agent_updates")):
572 if entry.get("event_id") in actual_ids:
573 continue
574 timeline.append(_projected_event(
575 event_id=f"projected_agent_{index}",
576 job_id=job_id,
577 event_type="agent_message",
578 created_at=str(entry.get("at") or job.get("updated_at") or job.get("created_at")),
579 title=str(entry.get("category") or "progress"),
580 body=str(entry.get("message") or ""),
581 metadata=entry.get("metadata") if isinstance(entry.get("metadata"), dict) else {},
582 ))
583
584 for index, lesson in enumerate(_metadata_list(metadata, "lessons")):
585 if lesson.get("event_id") in actual_ids:
586 continue
587 timeline.append(_projected_event(
588 event_id=f"projected_lesson_{index}",
589 job_id=job_id,
590 event_type="lesson",
591 created_at=str(lesson.get("at") or lesson.get("last_seen") or job.get("updated_at") or job.get("created_at")),
592 title=str(lesson.get("category") or "memory"),
593 body=str(lesson.get("lesson") or ""),
594 metadata={"confidence": lesson.get("confidence"), **(lesson.get("metadata") if isinstance(lesson.get("metadata"), dict) else {})},
595 ))
596
597 for index, source in enumerate(_metadata_list(metadata, "source_ledger")):
598 if source.get("event_id") in actual_ids:
599 continue
600 timeline.append(_projected_event(
601 event_id=f"projected_source_{index}",
602 job_id=job_id,
603 event_type="source",
604 created_at=str(source.get("last_seen") or source.get("first_seen") or job.get("updated_at") or job.get("created_at")),
605 title=str(source.get("source") or "source"),
606 body=str(source.get("last_outcome") or ""),
607 metadata=source,
608 ))
609
610 for index, finding in enumerate(_metadata_list(metadata, "finding_ledger")):
611 if finding.get("event_id") in actual_ids:
612 continue
613 timeline.append(_projected_event(
614 event_id=f"projected_finding_{index}",
615 job_id=job_id,
616 event_type="finding",
617 created_at=str(finding.get("updated_at") or finding.get("created_at") or job.get("updated_at") or job.get("created_at")),
618 title=str(finding.get("name") or "finding"),
619 body=str(finding.get("reason") or finding.get("category") or ""),
620 metadata=finding,
621 ))
622
623 for index, task in enumerate(_metadata_list(metadata, "task_queue")):
624 if task.get("event_id") in actual_ids:
625 continue
626 timeline.append(_projected_event(
627 event_id=f"projected_task_{index}",
628 job_id=job_id,
629 event_type="task",
630 created_at=str(task.get("updated_at") or task.get("created_at") or job.get("updated_at") or job.get("created_at")),
631 title=str(task.get("title") or "task"),
632 body=str(task.get("result") or task.get("goal") or ""),
633 metadata=task,
634 ))
635
636 for index, experiment in enumerate(_metadata_list(metadata, "experiment_ledger")):
637 if experiment.get("event_id") in actual_ids:
638 continue
639 metric = ""
640 if experiment.get("metric_value") is not None:
641 metric = format_metric_value(
642 experiment.get("metric_name") or "metric",
643 experiment.get("metric_value"),
644 experiment.get("metric_unit") or "",
645 )
646 timeline.append(_projected_event(
647 event_id=f"projected_experiment_{index}",
648 job_id=job_id,
649 event_type="experiment",
650 created_at=str(experiment.get("updated_at") or experiment.get("created_at") or job.get("updated_at") or job.get("created_at")),
651 title=str(experiment.get("title") or "experiment"),
652 body=str(experiment.get("result") or metric or experiment.get("hypothesis") or ""),
653 metadata=experiment,
654 ))
655
656 for index, reflection in enumerate(_metadata_list(metadata, "reflections")):
657 if reflection.get("event_id") in actual_ids:
658 continue
659 timeline.append(_projected_event(
660 event_id=f"projected_reflection_{index}",
661 job_id=job_id,
662 event_type="reflection",
663 created_at=str(reflection.get("at") or job.get("updated_at") or job.get("created_at")),
664 title="reflection",
665 body=str(reflection.get("summary") or reflection.get("strategy") or ""),
666 metadata=reflection.get("metadata") if isinstance(reflection.get("metadata"), dict) else {},
667 ))
668
669 for step in self.list_steps(job_id=job_id):
670 ref = ("steps", str(step["id"]))
671 if ref in actual_refs:
672 continue
673 event_type = "error" if step.get("status") == "failed" or step.get("error") else "tool_result"
674 title = str(step.get("tool_name") or step.get("kind") or "step")
675 body = str(step.get("summary") or step.get("error") or "")
676 timeline.append(_projected_event(
677 event_id=f"projected_step_{step['id']}",
678 job_id=job_id,
679 event_type=event_type,
680 created_at=str(step.get("ended_at") or step.get("started_at")),
681 title=title,
682 body=body,
683 ref_table="steps",
684 ref_id=str(step["id"]),
685 metadata={"step_no": step.get("step_no"), "status": step.get("status"), "kind": step.get("kind")},
686 ))
687
688 for artifact in self.list_artifacts(job_id, limit=10000):
689 ref = ("artifacts", str(artifact["id"]))
690 if ref in actual_refs:
691 continue
692 timeline.append(_projected_event(
693 event_id=f"projected_artifact_{artifact['id']}",
694 job_id=job_id,
695 event_type="artifact",
696 created_at=str(artifact.get("created_at")),
697 title=str(artifact.get("title") or artifact["id"]),
698 body=str(artifact.get("summary") or artifact.get("path") or ""),
699 ref_table="artifacts",
700 ref_id=str(artifact["id"]),
701 metadata={"type": artifact.get("type"), "path": artifact.get("path")},
702 ))
703
704 for memory in self.list_memory(job_id):
705 ref = ("memory_index", str(memory["id"]))
706 if ref in actual_refs:
707 continue
708 timeline.append(_projected_event(
709 event_id=f"projected_memory_{memory['id']}",
710 job_id=job_id,
711 event_type="compaction",
712 created_at=str(memory.get("updated_at")),
713 title=str(memory.get("key") or "compact memory"),
714 body=str(memory.get("summary") or ""),
715 ref_table="memory_index",
716 ref_id=str(memory["id"]),
717 metadata={"artifact_refs": memory.get("artifact_refs") or []},
718 ))
719
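        timeline = [event for event in timeline if event.get("created_at")]
        timeline.sort(key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
        # A plain timeline[-limit:] returns the whole list when limit is 0, because -0 == 0.
        limit = int(limit)
        return timeline[-limit:] if limit > 0 else []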
723
724 def create_job(
725 self,
726 objective: str,
727 *,
728 title: str | None = None,
729 kind: str = "generic",
730 priority: int = 0,
731 cadence: str | None = None,
732 metadata: dict[str, Any] | None = None,
733 ) -> str:
734 now = utc_now()
735 title = title or objective.strip().splitlines()[0][:80] or "Untitled job"
736
737 def op(conn: sqlite3.Connection) -> str:
738 job_id = _unique_job_id(conn, title)
739 conn.execute(
740 """
741 INSERT INTO jobs(id, title, objective, kind, status, priority, cadence, created_at, updated_at, metadata_json)
742 VALUES (?, ?, ?, ?, 'queued', ?, ?, ?, ?, ?)
743 """,
744 (job_id, title, objective, kind, priority, cadence, now, now, _json_dumps(metadata)),
745 )
746 _insert_event(
747 conn,
748 job_id=job_id,
749 event_type="daemon",
750 title="job created",
751 body=objective,
752 metadata={"title": title, "kind": kind, "cadence": cadence},
753 created_at=now,
754 )
755 return job_id
756
757 return self._write(op)
758
759 def get_job(self, job_id: str) -> dict[str, Any]:
760 row = self._conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
761 job = _row_to_dict(row)
762 if job is None:
763 raise KeyError(f"Job not found: {job_id}")
764 return job
765
766 def list_jobs(self, *, statuses: Iterable[str] | None = None) -> list[dict[str, Any]]:
767 if statuses:
768 values = list(statuses)
769 placeholders = ",".join("?" for _ in values)
770 rows = self._conn.execute(
771 f"SELECT * FROM jobs WHERE status IN ({placeholders}) ORDER BY priority DESC, updated_at",
772 values,
773 ).fetchall()
774 else:
775 rows = self._conn.execute("SELECT * FROM jobs ORDER BY updated_at DESC").fetchall()
776 return [_row_to_dict(row) for row in rows]
777
    def update_job_status(self, job_id: str, status: str, *, metadata_patch: dict[str, Any] | None = None) -> None:
        now = utc_now()

        def op(conn: sqlite3.Connection) -> None:
            # Check existence on every call so a missing job raises KeyError,
            # not a foreign-key error from the event insert below.
            row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
            if row is None:
                raise KeyError(f"Job not found: {job_id}")
            if metadata_patch:
                current = json.loads(row["metadata_json"] or "{}")
                current.update(metadata_patch)
                conn.execute(
                    "UPDATE jobs SET status = ?, updated_at = ?, metadata_json = ? WHERE id = ?",
                    (status, now, _json_dumps(current), job_id),
                )
            else:
                conn.execute("UPDATE jobs SET status = ?, updated_at = ? WHERE id = ?", (status, now, job_id))
            _insert_event(
                conn,
                job_id=job_id,
                event_type="daemon",
                title=f"job {status}",
                body=str((metadata_patch or {}).get("last_note") or ""),
                metadata={"status": status, "metadata_patch": metadata_patch or {}},
                created_at=now,
            )

        self._write(op)
808
809 def update_job_metadata(self, job_id: str, metadata_patch: dict[str, Any]) -> None:
810 now = utc_now()
811
812 def op(conn: sqlite3.Connection) -> None:
813 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
814 if row is None:
815 raise KeyError(f"Job not found: {job_id}")
816 current = json.loads(row["metadata_json"] or "{}")
817 current.update(metadata_patch)
818 conn.execute(
819 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
820 (now, _json_dumps(current), job_id),
821 )
822
823 self._write(op)
824
825 def claim_operator_messages(
826 self,
827 job_id: str,
828 *,
829 modes: Iterable[str] = ("steer",),
830 limit: int = 1,
831 ) -> list[dict[str, Any]]:
832 now = utc_now()
833 allowed = {mode.strip().lower().replace("-", "_") for mode in modes}
834
835 def op(conn: sqlite3.Connection) -> list[dict[str, Any]]:
836 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
837 if row is None:
838 raise KeyError(f"Job not found: {job_id}")
839 metadata = json.loads(row["metadata_json"] or "{}")
840 messages = metadata.get("operator_messages")
841 if not isinstance(messages, list):
842 return []
843 claimed: list[dict[str, Any]] = []
844 for entry in messages:
845 if len(claimed) >= limit:
846 break
847 if not isinstance(entry, dict):
848 continue
849 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
850 if mode not in allowed or entry.get("claimed_at"):
851 continue
852 if entry.get("acknowledged_at") or entry.get("superseded_at"):
853 continue
854 entry["claimed_at"] = now
855 entry["delivered_at"] = now
856 claimed.append(dict(entry))
857 if not claimed:
858 return []
859 metadata["operator_messages"] = messages[-200:]
860 metadata["last_claimed_operator_messages"] = claimed
861 conn.execute(
862 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
863 (now, _json_dumps(metadata), job_id),
864 )
865 for entry in claimed:
866 _insert_event(
867 conn,
868 job_id=job_id,
869 event_type="loop",
870 title="steering claimed",
871 body=str(entry.get("message") or ""),
872 metadata={
873 "source": entry.get("source"),
874 "mode": entry.get("mode"),
875 "operator_event_id": entry.get("event_id"),
876 },
877 created_at=now,
878 )
879 return claimed
880
881 return self._write(op)
882
883 def acknowledge_operator_messages(
884 self,
885 job_id: str,
886 *,
887 message_ids: Iterable[str] | None = None,
888 summary: str = "",
889 status: str = "acknowledged",
890 ) -> dict[str, Any]:
891 now = utc_now()
892 wanted = {str(message_id).strip() for message_id in (message_ids or []) if str(message_id).strip()}
893 status = status.strip().lower().replace("-", "_") or "acknowledged"
894 if status not in {"acknowledged", "superseded"}:
895 status = "acknowledged"
896
897 def op(conn: sqlite3.Connection) -> dict[str, Any]:
898 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
899 if row is None:
900 raise KeyError(f"Job not found: {job_id}")
901 metadata = json.loads(row["metadata_json"] or "{}")
902 messages = metadata.get("operator_messages")
903 if not isinstance(messages, list):
904 messages = []
905 acknowledged: list[dict[str, Any]] = []
906 for entry in messages:
907 if not isinstance(entry, dict):
908 continue
909 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
910 if mode not in {"steer", "follow_up"}:
911 continue
912 event_id = str(entry.get("event_id") or "")
913 if wanted and event_id not in wanted:
914 continue
915 if not wanted and not entry.get("claimed_at"):
916 continue
917 if entry.get("acknowledged_at") or entry.get("superseded_at"):
918 continue
919 if status == "superseded":
920 entry["superseded_at"] = now
921 else:
922 entry["acknowledged_at"] = now
923 if summary:
924 entry["acknowledgement_summary"] = summary.strip()
925 acknowledged.append(dict(entry))
926 metadata["operator_messages"] = messages[-200:]
927 metadata["last_operator_context_ack"] = {
928 "at": now,
929 "status": status,
930 "summary": summary.strip(),
931 "message_ids": [entry.get("event_id") for entry in acknowledged if entry.get("event_id")],
932 "count": len(acknowledged),
933 }
934 conn.execute(
935 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
936 (now, _json_dumps(metadata), job_id),
937 )
938 event = _insert_event(
939 conn,
940 job_id=job_id,
941 event_type="operator_context",
942 title=f"operator {status}",
943 body=summary.strip() or f"{len(acknowledged)} operator message(s) {status}",
944 metadata={
945 "status": status,
946 "message_ids": [entry.get("event_id") for entry in acknowledged if entry.get("event_id")],
947 "count": len(acknowledged),
948 },
949 created_at=now,
950 )
951 return {"event": event, "messages": acknowledged, "count": len(acknowledged), "status": status}
952
953 return self._write(op)
954
955 def rename_job(self, job_id: str, title: str) -> dict[str, Any]:
956 now = utc_now()
957 new_title = title.strip()
958 if not new_title:
959 raise ValueError("title is required")
960
961 def op(conn: sqlite3.Connection) -> dict[str, Any]:
962 row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
963 if row is None:
964 raise KeyError(f"Job not found: {job_id}")
965 conn.execute("UPDATE jobs SET title = ?, updated_at = ? WHERE id = ?", (new_title, now, job_id))
966 _insert_event(
967 conn,
968 job_id=job_id,
969 event_type="daemon",
970 title="job renamed",
971 body=f"{row['title']} -> {new_title}",
972 metadata={"old_title": row["title"], "new_title": new_title},
973 created_at=now,
974 )
975 updated = dict(row)
976 updated["title"] = new_title
977 updated["updated_at"] = now
978 return _row_to_dict(updated)
979
980 return self._write(op)
981
982 def delete_job(self, job_id: str) -> dict[str, Any]:
983 def op(conn: sqlite3.Connection) -> dict[str, Any]:
984 row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
985 if row is None:
986 raise KeyError(f"Job not found: {job_id}")
987 artifact_rows = conn.execute("SELECT path FROM artifacts WHERE job_id = ?", (job_id,)).fetchall()
988 artifact_paths = [str(artifact["path"]) for artifact in artifact_rows if artifact["path"]]
989 counts = {
990 "evidence": conn.execute("SELECT COUNT(*) AS n FROM evidence WHERE job_id = ?", (job_id,)).fetchone()["n"],
991 "artifacts": conn.execute("SELECT COUNT(*) AS n FROM artifacts WHERE job_id = ?", (job_id,)).fetchone()["n"],
992 "memory": conn.execute("SELECT COUNT(*) AS n FROM memory_index WHERE job_id = ?", (job_id,)).fetchone()["n"],
993 "steps": conn.execute("SELECT COUNT(*) AS n FROM steps WHERE job_id = ?", (job_id,)).fetchone()["n"],
994 "runs": conn.execute("SELECT COUNT(*) AS n FROM job_runs WHERE job_id = ?", (job_id,)).fetchone()["n"],
995 "events": conn.execute("SELECT COUNT(*) AS n FROM events WHERE job_id = ?", (job_id,)).fetchone()["n"],
996 }
997 conn.execute("DELETE FROM evidence WHERE job_id = ?", (job_id,))
998 conn.execute("DELETE FROM artifacts WHERE job_id = ?", (job_id,))
999 conn.execute("DELETE FROM memory_index WHERE job_id = ?", (job_id,))
1000 conn.execute("DELETE FROM steps WHERE job_id = ?", (job_id,))
1001 conn.execute("DELETE FROM job_runs WHERE job_id = ?", (job_id,))
1002 conn.execute("DELETE FROM events WHERE job_id = ?", (job_id,))
1003 conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
1004 return {
1005 "job": _row_to_dict(row),
1006 "artifact_paths": artifact_paths,
1007 "counts": counts,
1008 }
1009
1010 return self._write(op)
1011
1012 def append_operator_message(
1013 self,
1014 job_id: str,
1015 message: str,
1016 *,
1017 source: str = "operator",
1018 mode: str = "steer",
1019 ) -> dict[str, Any]:
1020 now = utc_now()
1021 text = message.strip()
1022 if not text:
1023 raise ValueError("message is required")
1024 mode = mode.strip().lower().replace("-", "_") or "steer"
1025 if mode not in {"steer", "follow_up", "note"}:
1026 mode = "steer"
1027 entry = {"at": now, "source": source, "mode": mode, "message": text}
1028
1029 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1030 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1031 if row is None:
1032 raise KeyError(f"Job not found: {job_id}")
1033 event = _insert_event(
1034 conn,
1035 job_id=job_id,
1036 event_type="operator_message",
1037 title=source,
1038 body=text,
1039 metadata={"source": source, "mode": mode},
1040 created_at=now,
1041 )
1042 entry["event_id"] = event["id"]
1043 metadata = json.loads(row["metadata_json"] or "{}")
1044 messages = metadata.get("operator_messages")
1045 if not isinstance(messages, list):
1046 messages = []
1047 messages.append(entry)
1048 metadata["operator_messages"] = messages[-200:]
1049 metadata["last_operator_message"] = entry
1050 conn.execute(
1051 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1052 (now, _json_dumps(metadata), job_id),
1053 )
1054 return entry
1055
1056 return self._write(op)
1057
1058 def append_agent_update(
1059 self,
1060 job_id: str,
1061 message: str,
1062 *,
1063 category: str = "progress",
1064 metadata: dict[str, Any] | None = None,
1065 ) -> dict[str, Any]:
1066 now = utc_now()
1067 text = message.strip()
1068 if not text:
1069 raise ValueError("message is required")
1070 entry = {
1071 "at": now,
1072 "category": category.strip() or "progress",
1073 "message": text,
1074 "metadata": metadata or {},
1075 }
1076
1077 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1078 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1079 if row is None:
1080 raise KeyError(f"Job not found: {job_id}")
1081 event = _insert_event(
1082 conn,
1083 job_id=job_id,
1084 event_type="agent_message",
1085 title=entry["category"],
1086 body=text,
1087 metadata=entry["metadata"],
1088 created_at=now,
1089 )
1090 entry["event_id"] = event["id"]
1091 job_metadata = json.loads(row["metadata_json"] or "{}")
1092 updates = job_metadata.get("agent_updates")
1093 if not isinstance(updates, list):
1094 updates = []
1095 updates.append(entry)
1096 job_metadata["agent_updates"] = updates[-100:]
1097 job_metadata["last_agent_update"] = entry
1098 conn.execute(
1099 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1100 (now, _json_dumps(job_metadata), job_id),
1101 )
1102 return entry
1103
1104 return self._write(op)
1105
1106 def append_lesson(
1107 self,
1108 job_id: str,
1109 lesson: str,
1110 *,
1111 category: str = "memory",
1112 confidence: float | None = None,
1113 metadata: dict[str, Any] | None = None,
1114 ) -> dict[str, Any]:
1115 now = utc_now()
1116 text = lesson.strip()
1117 if not text:
1118 raise ValueError("lesson is required")
1119 entry = {
1120 "at": now,
1121 "category": category.strip().lower() or "memory",
1122 "key": _norm_key(f"{category.strip().lower() or 'memory'}:{text}"),
1123 "lesson": text,
1124 "confidence": confidence,
1125 "metadata": metadata or {},
1126 }
1127
1128 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1129 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1130 if row is None:
1131 raise KeyError(f"Job not found: {job_id}")
1132 job_metadata = json.loads(row["metadata_json"] or "{}")
1133 lessons = job_metadata.get("lessons")
1134 if not isinstance(lessons, list):
1135 lessons = []
1136 existing = next(
1137 (
1138 item
1139 for item in lessons
1140 if isinstance(item, dict)
1141 and (item.get("key") or _norm_key(f"{item.get('category', 'memory')}:{item.get('lesson', '')}"))
1142 == entry["key"]
1143 ),
1144 None,
1145 )
1146 if existing is None:
1147 lessons.append(entry)
1148 current = entry
1149 current["created"] = True
1150 current["substantive_update"] = True
1151 event = _insert_event(
1152 conn,
1153 job_id=job_id,
1154 event_type="lesson",
1155 title=current.get("category") or "memory",
1156 body=current.get("lesson") or text,
1157 metadata={
1158 "confidence": current.get("confidence"),
1159 "seen_count": current.get("seen_count") or 1,
1160 **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1161 },
1162 created_at=now,
1163 )
1164 current["event_id"] = event["id"]
1165 else:
1166 existing["last_seen"] = now
1167 existing["seen_count"] = int(existing.get("seen_count") or 1) + 1
1168 if confidence is not None:
1169 existing["confidence"] = confidence
1170 if metadata:
1171 merged = existing.get("metadata") if isinstance(existing.get("metadata"), dict) else {}
1172 merged.update(metadata)
1173 existing["metadata"] = merged
1174 existing["key"] = entry["key"]
1175 current = existing
1176 current["created"] = False
1177 current["substantive_update"] = False
1178 job_metadata["lessons"] = lessons[-200:]
1179 job_metadata["last_lesson"] = current
1180 conn.execute(
1181 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1182 (now, _json_dumps(job_metadata), job_id),
1183 )
1184 return current
1185
1186 return self._write(op)
1187
1188 def append_memory_graph_records(
1189 self,
1190 job_id: str,
1191 *,
1192 nodes: list[dict[str, Any]] | None = None,
1193 edges: list[dict[str, Any]] | None = None,
1194 ) -> dict[str, Any]:
1195 now = utc_now()
1196 node_items = [node for node in (nodes or []) if isinstance(node, dict)]
1197 edge_items = [edge for edge in (edges or []) if isinstance(edge, dict)]
1198 if not node_items and not edge_items:
1199 raise ValueError("nodes or edges are required")
1200
1201 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1202 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1203 if row is None:
1204 raise KeyError(f"Job not found: {job_id}")
1205 job_metadata = json.loads(row["metadata_json"] or "{}")
1206 graph = job_metadata.get("memory_graph") if isinstance(job_metadata.get("memory_graph"), dict) else {}
1207 stored_nodes = _metadata_list(graph, "nodes")
1208 stored_edges = _metadata_list(graph, "edges")
1209 node_by_key = {str(node.get("key") or ""): node for node in stored_nodes if node.get("key")}
1210 added_nodes = 0
1211 updated_nodes = 0
1212 touched_nodes: list[dict[str, Any]] = []
1213
1214 for node in node_items[:50]:
1215 title = str(node.get("title") or node.get("name") or "").strip()
1216 summary = str(node.get("summary") or node.get("body") or "").strip()
1217 if not title and not summary:
1218 continue
1219 key = _norm_key(str(node.get("key") or title or summary[:80]))
1220 current = node_by_key.get(key)
1221 created = current is None
1222 if current is None:
1223 current = {
1224 "key": key,
1225 "title": title or key,
1226 "kind": DEFAULT_NODE_KIND,
1227 "status": DEFAULT_NODE_STATUS,
1228 "summary": "",
1229 "tags": [],
1230 "evidence_refs": [],
1231 "links": [],
1232 "metadata": {},
1233 "created_at": now,
1234 }
1235 stored_nodes.append(current)
1236 node_by_key[key] = current
1237 added_nodes += 1
1238 else:
1239 updated_nodes += 1
1240 if title:
1241 current["title"] = title
1242 if summary:
1243 current["summary"] = summary
1244 kind = str(node.get("kind") or current.get("kind") or DEFAULT_NODE_KIND).strip().lower()
1245 current["kind"] = kind if kind in NODE_KINDS else DEFAULT_NODE_KIND
1246 status = str(node.get("status") or current.get("status") or DEFAULT_NODE_STATUS).strip().lower()
1247 current["status"] = status if status in NODE_STATUSES else DEFAULT_NODE_STATUS
1248 if "salience" in node:
1249 current["salience"] = _bounded_float(node.get("salience"), 0.0, 1.0)
1250 elif "salience" not in current:
1251 current["salience"] = 0.5
1252 if "confidence" in node:
1253 current["confidence"] = _bounded_float(node.get("confidence"), 0.0, 1.0)
1254 elif "confidence" not in current:
1255 current["confidence"] = 0.5
1256 parent_key = str(node.get("parent_key") or node.get("parent") or "").strip()
1257 if parent_key:
1258 current["parent_key"] = _norm_key(parent_key)
1259 current["tags"] = _merge_string_lists(current.get("tags"), node.get("tags"), limit=24)
1260 current["evidence_refs"] = _merge_string_lists(current.get("evidence_refs"), node.get("evidence_refs") or node.get("evidence"), limit=24)
1261 current["links"] = _merge_string_lists(current.get("links"), node.get("links"), limit=50)
1262 if isinstance(node.get("metadata"), dict):
1263 merged = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1264 merged.update(node["metadata"])
1265 current["metadata"] = merged
1266 current["created"] = created
1267 current["updated_at"] = now
1268 current["use_count"] = int(current.get("use_count") or 0)
1269 touched_nodes.append(current)
1270
1271 existing_edge_keys = {
1272 _memory_edge_key(edge)
1273 for edge in stored_edges
1274 if _memory_edge_key(edge)
1275 }
1276 added_edges = 0
1277 touched_edges: list[dict[str, Any]] = []
1278 for edge in edge_items[:100]:
1279 from_key = _norm_key(str(edge.get("from_key") or edge.get("from") or "").strip())
1280 to_key = _norm_key(str(edge.get("to_key") or edge.get("to") or "").strip())
1281 if not from_key or not to_key:
1282 continue
1283 relation = str(edge.get("relation") or "related_to").strip().lower().replace(" ", "_")
1284 relation = re.sub(r"[^a-z0-9_-]+", "_", relation).strip("_") or "related_to"
1285 edge_key = f"{from_key}|{relation}|{to_key}"
1286 if edge_key in existing_edge_keys:
1287 continue
1288 stored = {
1289 "key": edge_key,
1290 "from_key": from_key,
1291 "to_key": to_key,
1292 "relation": relation,
1293 "evidence_refs": _merge_string_lists([], edge.get("evidence_refs") or edge.get("evidence"), limit=24),
1294 "metadata": edge.get("metadata") if isinstance(edge.get("metadata"), dict) else {},
1295 "created_at": now,
1296 "updated_at": now,
1297 }
1298 stored_edges.append(stored)
1299 existing_edge_keys.add(edge_key)
1300 touched_edges.append(stored)
1301 added_edges += 1
1302
1303 graph = {
1304 "nodes": stored_nodes[-1000:],
1305 "edges": stored_edges[-2000:],
1306 "updated_at": now,
1307 }
1308 event = _insert_event(
1309 conn,
1310 job_id=job_id,
1311 event_type="memory_node",
1312 title="memory graph",
1313 body=f"nodes +{added_nodes}/~{updated_nodes}; edges +{added_edges}",
1314 metadata={
1315 "added_nodes": added_nodes,
1316 "updated_nodes": updated_nodes,
1317 "added_edges": added_edges,
1318 "node_keys": [node.get("key") for node in touched_nodes[-20:]],
1319 "edge_keys": [edge.get("key") for edge in touched_edges[-20:]],
1320 },
1321 created_at=now,
1322 )
1323 graph["event_id"] = event["id"]
1324 job_metadata["memory_graph"] = graph
1325 job_metadata["last_memory_graph_record"] = {
1326 "at": now,
1327 "event_id": event["id"],
1328 "added_nodes": added_nodes,
1329 "updated_nodes": updated_nodes,
1330 "added_edges": added_edges,
1331 "nodes": touched_nodes[-20:],
1332 "edges": touched_edges[-20:],
1333 }
1334 conn.execute(
1335 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1336 (now, _json_dumps(job_metadata), job_id),
1337 )
1338 return {
1339 "added_nodes": added_nodes,
1340 "updated_nodes": updated_nodes,
1341 "added_edges": added_edges,
1342 "nodes": touched_nodes,
1343 "edges": touched_edges,
1344 "event_id": event["id"],
1345 }
1346
1347 return self._write(op)
1348
1349 def append_source_record(
1350 self,
1351 job_id: str,
1352 source: str,
1353 *,
1354 source_type: str = "",
1355 usefulness_score: float | None = None,
1356 yield_count: int = 0,
1357 fail_count_delta: int = 0,
1358 warnings: list[str] | None = None,
1359 outcome: str = "",
1360 metadata: dict[str, Any] | None = None,
1361 ) -> dict[str, Any]:
1362 now = utc_now()
1363 text = source.strip()
1364 if not text:
1365 raise ValueError("source is required")
1366 key = _norm_key(text)
1367
1368 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1369 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1370 if row is None:
1371 raise KeyError(f"Job not found: {job_id}")
1372 job_metadata = json.loads(row["metadata_json"] or "{}")
1373 sources = _metadata_list(job_metadata, "source_ledger")
1374 current = next((entry for entry in sources if entry.get("key") == key), None)
1375 created = current is None
1376 change_fields = (
1377 "source",
1378 "source_type",
1379 "usefulness_score",
1380 "fail_count",
1381 "yield_count",
1382 "warnings",
1383 "last_outcome",
1384 "metadata",
1385 )
1386 before = "" if created else _change_fingerprint(current, change_fields)
1387 if current is None:
1388 current = {
1389 "key": key,
1390 "source": text,
1391 "source_type": source_type.strip() or "unknown",
1392 "usefulness_score": 0.0,
1393 "fail_count": 0,
1394 "yield_count": 0,
1395 "warnings": [],
1396 "last_outcome": "",
1397 "metadata": {},
1398 "first_seen": now,
1399 }
1400 sources.append(current)
1401 if source_type:
1402 current["source_type"] = source_type.strip()
1403 if usefulness_score is not None:
1404 current["usefulness_score"] = float(usefulness_score)
1405 if yield_count:
1406 current["yield_count"] = int(current.get("yield_count") or 0) + int(yield_count)
1407 if fail_count_delta:
1408 current["fail_count"] = int(current.get("fail_count") or 0) + int(fail_count_delta)
1409 if warnings:
1410 merged = list(dict.fromkeys([*(current.get("warnings") or []), *[str(warning) for warning in warnings]]))
1411 current["warnings"] = merged[-20:]
1412 if outcome:
1413 current["last_outcome"] = outcome.strip()
1414 if metadata:
1415 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1416 merged_metadata.update(metadata)
1417 current["metadata"] = merged_metadata
1418 current["created"] = created
1419 substantive_update = created or before != _change_fingerprint(current, change_fields)
1420 current["substantive_update"] = substantive_update
1421 if substantive_update:
1422 current["updated_at"] = now
1423 current["last_seen"] = now
1424 if substantive_update:
1425 event = _insert_event(
1426 conn,
1427 job_id=job_id,
1428 event_type="source",
1429 title=current.get("source") or text,
1430 body=current.get("last_outcome") or outcome,
1431 metadata={
1432 "created": created,
1433 "substantive_update": substantive_update,
1434 "source_type": current.get("source_type"),
1435 "usefulness_score": current.get("usefulness_score"),
1436 "yield_count": current.get("yield_count"),
1437 "fail_count": current.get("fail_count"),
1438 "warnings": current.get("warnings") or [],
1439 **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1440 },
1441 created_at=now,
1442 )
1443 current["event_id"] = event["id"]
1444 job_metadata["source_ledger"] = sources[-250:]
1445 job_metadata["last_source_record"] = current
1446 conn.execute(
1447 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1448 (now, _json_dumps(job_metadata), job_id),
1449 )
1450 return current
1451
1452 return self._write(op)
1453
1454 def append_finding_record(
1455 self,
1456 job_id: str,
1457 *,
1458 name: str,
1459 url: str = "",
1460 source_url: str = "",
1461 category: str = "",
1462 location: str = "",
1463 contact: str = "",
1464 reason: str = "",
1465 status: str = "new",
1466 score: float | None = None,
1467 evidence_artifact: str = "",
1468 metadata: dict[str, Any] | None = None,
1469 ) -> dict[str, Any]:
1470 now = utc_now()
1471 name = name.strip()
1472 if not name:
1473 raise ValueError("name is required")
1474 url = url.strip()
1475 source_url = source_url.strip()
1476 key = _norm_key(f"{name}|{url or source_url}")
1477
1478 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1479 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1480 if row is None:
1481 raise KeyError(f"Job not found: {job_id}")
1482 job_metadata = json.loads(row["metadata_json"] or "{}")
1483 findings = _metadata_list(job_metadata, "finding_ledger")
1484 current = next((entry for entry in findings if entry.get("key") == key), None)
1485 created = current is None
1486 change_fields = (
1487 "url",
1488 "source_url",
1489 "category",
1490 "location",
1491 "contact",
1492 "reason",
1493 "status",
1494 "score",
1495 "evidence_artifact",
1496 "metadata",
1497 )
1498 before = "" if created else _change_fingerprint(current, change_fields)
1499 if current is None:
1500 current = {
1501 "key": key,
1502 "name": name,
1503 "url": url,
1504 "source_url": source_url,
1505 "category": category.strip(),
1506 "location": location.strip(),
1507 "contact": contact.strip(),
1508 "reason": reason.strip(),
1509 "status": status.strip() or "new",
1510 "score": score,
1511 "evidence_artifact": evidence_artifact.strip(),
1512 "metadata": metadata or {},
1513 "created_at": now,
1514 }
1515 findings.append(current)
1516 else:
1517 for field, value in {
1518 "url": url,
1519 "source_url": source_url,
1520 "category": category.strip(),
1521 "location": location.strip(),
1522 "contact": contact.strip(),
1523 "reason": reason.strip(),
1524 "status": status.strip(),
1525 "evidence_artifact": evidence_artifact.strip(),
1526 }.items():
1527 if value:
1528 current[field] = value
1529 if score is not None:
1530 current["score"] = score
1531 if metadata:
1532 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1533 merged_metadata.update(metadata)
1534 current["metadata"] = merged_metadata
1535 current["created"] = created
1536 substantive_update = created or before != _change_fingerprint(current, change_fields)
1537 current["substantive_update"] = substantive_update
1538 if substantive_update:
1539 current["updated_at"] = now
1540 if substantive_update:
1541 event = _insert_event(
1542 conn,
1543 job_id=job_id,
1544 event_type="finding",
1545 title=current.get("name") or name,
1546 body=current.get("reason") or current.get("category") or "",
1547 metadata={
1548 "created": created,
1549 "substantive_update": substantive_update,
1550 "score": current.get("score"),
1551 "status": current.get("status"),
1552 "source_url": current.get("source_url"),
1553 "evidence_artifact": current.get("evidence_artifact"),
1554 **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
1555 },
1556 created_at=now,
1557 )
1558 current["event_id"] = event["id"]
1559 job_metadata["finding_ledger"] = findings[-1000:]
1560 job_metadata["last_finding_record"] = current
1561 conn.execute(
1562 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1563 (now, _json_dumps(job_metadata), job_id),
1564 )
1565 return current
1566
1567 return self._write(op)
1568
1569 def append_roadmap_record(
1570 self,
1571 job_id: str,
1572 *,
1573 title: str,
1574 status: str = "planned",
1575 objective: str = "",
1576 scope: str = "",
1577 current_milestone: str = "",
1578 validation_contract: str = "",
1579 milestones: list[dict[str, Any]] | None = None,
1580 metadata: dict[str, Any] | None = None,
1581 ) -> dict[str, Any]:
1582 now = utc_now()
1583 title = title.strip()
1584 if not title:
1585 raise ValueError("title is required")
1586 status = _clean_status(status, {"planned", "active", "validating", "done", "blocked", "paused"}, "planned")
1587 milestone_items = milestones if isinstance(milestones, list) else []
1588
1589 def merge_feature(existing_features: list[dict[str, Any]], feature: dict[str, Any]) -> tuple[dict[str, Any] | None, bool, bool]:
1590 feature_title = str(feature.get("title") or feature.get("name") or "").strip()
1591 if not feature_title:
1592 return None, False, False
1593 feature_key = _norm_key(str(feature.get("key") or feature_title))
1594 feature_title_key = _norm_key(feature_title)
1595 current = next(
1596 (
1597 entry for entry in existing_features
1598 if entry.get("key") == feature_key
1599 or _norm_key(str(entry.get("title") or "")) == feature_title_key
1600 ),
1601 None,
1602 )
1603 created = current is None
1604 change_fields = (
1605 "title",
1606 "status",
1607 "goal",
1608 "output_contract",
1609 "acceptance_criteria",
1610 "evidence_needed",
1611 "result",
1612 "metadata",
1613 )
1614 before = "" if created else _change_fingerprint(current, change_fields)
1615 if current is None:
1616 current = {
1617 "key": feature_key,
1618 "title": feature_title,
1619 "status": _clean_status(str(feature.get("status") or "planned"), {"planned", "active", "done", "blocked", "skipped"}, "planned"),
1620 "goal": str(feature.get("goal") or feature.get("description") or "").strip(),
1621 "output_contract": str(feature.get("output_contract") or feature.get("contract") or "").strip().lower().replace(" ", "_"),
1622 "acceptance_criteria": str(feature.get("acceptance_criteria") or "").strip(),
1623 "evidence_needed": str(feature.get("evidence_needed") or "").strip(),
1624 "result": str(feature.get("result") or feature.get("outcome") or "").strip(),
1625 "metadata": feature.get("metadata") if isinstance(feature.get("metadata"), dict) else {},
1626 "created_at": now,
1627 }
1628 existing_features.append(current)
1629 else:
1630 current["status"] = _clean_status(str(feature.get("status") or current.get("status") or "planned"), {"planned", "active", "done", "blocked", "skipped"}, "planned")
1631 for field, value in {
1632 "title": feature_title,
1633 "goal": str(feature.get("goal") or feature.get("description") or "").strip(),
1634 "output_contract": str(feature.get("output_contract") or feature.get("contract") or "").strip().lower().replace(" ", "_"),
1635 "acceptance_criteria": str(feature.get("acceptance_criteria") or "").strip(),
1636 "evidence_needed": str(feature.get("evidence_needed") or "").strip(),
1637 "result": str(feature.get("result") or feature.get("outcome") or "").strip(),
1638 }.items():
1639 if value:
1640 current[field] = value
1641 if isinstance(feature.get("metadata"), dict):
1642 merged = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1643 merged.update(feature["metadata"])
1644 current["metadata"] = merged
1645 if current.get("output_contract") not in {"research", "artifact", "experiment", "action", "monitor", "decision", "report", "validation"}:
1646 current["output_contract"] = ""
1647 current["created"] = created
1648 changed = created or before != _change_fingerprint(current, change_fields)
1649 current["substantive_update"] = changed
1650 if changed:
1651 current["updated_at"] = now
1652 return current, created, changed
1653
1654 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1655 row = conn.execute("SELECT objective, metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1656 if row is None:
1657 raise KeyError(f"Job not found: {job_id}")
1658 job_metadata = json.loads(row["metadata_json"] or "{}")
1659 roadmap = job_metadata.get("roadmap")
1660 created = not isinstance(roadmap, dict)
1661 roadmap_change_fields = (
1662 "title",
1663 "status",
1664 "objective",
1665 "scope",
1666 "validation_contract",
1667 "current_milestone",
1668 "metadata",
1669 )
1670 roadmap_before = "" if created else _change_fingerprint(roadmap, roadmap_change_fields)
1671 if created:
1672 roadmap = {
1673 "key": _norm_key(title),
1674 "title": title,
1675 "status": status,
1676 "objective": objective.strip() or str(row["objective"] or "").strip(),
1677 "scope": scope.strip(),
1678 "validation_contract": validation_contract.strip(),
1679 "current_milestone": current_milestone.strip(),
1680 "milestones": [],
1681 "metadata": metadata or {},
1682 "created_at": now,
1683 }
1684 else:
1685 roadmap["title"] = title or roadmap.get("title") or "Roadmap"
1686 roadmap["status"] = status
1687 for field, value in {
1688 "objective": objective.strip(),
1689 "scope": scope.strip(),
1690 "validation_contract": validation_contract.strip(),
1691 "current_milestone": current_milestone.strip(),
1692 }.items():
1693 if value:
1694 roadmap[field] = value
1695 if metadata:
1696 merged_metadata = roadmap.get("metadata") if isinstance(roadmap.get("metadata"), dict) else {}
1697 merged_metadata.update(metadata)
1698 roadmap["metadata"] = merged_metadata
1699
1700 stored_milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1701 added_milestones = 0
1702 updated_milestones = 0
1703 added_features = 0
1704 updated_features = 0
1705 touched: list[dict[str, Any]] = []
1706 for milestone in milestone_items[:100]:
1707 if not isinstance(milestone, dict):
1708 continue
1709 milestone_title = str(milestone.get("title") or milestone.get("name") or "").strip()
1710 if not milestone_title:
1711 continue
1712 milestone_key = _norm_key(str(milestone.get("key") or milestone_title))
1713 milestone_title_key = _norm_key(milestone_title)
1714 current = next(
1715 (
1716 entry for entry in stored_milestones
1717 if entry.get("key") == milestone_key
1718 or _norm_key(str(entry.get("title") or "")) == milestone_title_key
1719 ),
1720 None,
1721 )
1722 milestone_created = current is None
1723 milestone_change_fields = (
1724 "title",
1725 "status",
1726 "priority",
1727 "goal",
1728 "acceptance_criteria",
1729 "evidence_needed",
1730 "validation_status",
1731 "validation_result",
1732 "next_action",
1733 "metadata",
1734 )
1735 milestone_before = "" if milestone_created else _change_fingerprint(current, milestone_change_fields)
1736 if current is None:
1737 current = {
1738 "key": milestone_key,
1739 "title": milestone_title,
1740 "status": _clean_status(str(milestone.get("status") or "planned"), {"planned", "active", "validating", "done", "blocked", "skipped"}, "planned"),
1741 "priority": int(milestone.get("priority") or 0),
1742 "goal": str(milestone.get("goal") or milestone.get("description") or "").strip(),
1743 "acceptance_criteria": str(milestone.get("acceptance_criteria") or "").strip(),
1744 "evidence_needed": str(milestone.get("evidence_needed") or "").strip(),
1745 "validation_status": _clean_status(str(milestone.get("validation_status") or "not_started"), {"not_started", "pending", "passed", "failed", "blocked"}, "not_started"),
1746 "validation_result": str(milestone.get("validation_result") or "").strip(),
1747 "next_action": str(milestone.get("next_action") or "").strip(),
1748 "features": [],
1749 "metadata": milestone.get("metadata") if isinstance(milestone.get("metadata"), dict) else {},
1750 "created_at": now,
1751 }
1752 stored_milestones.append(current)
1753 added_milestones += 1
1754 else:
1755 current["status"] = _clean_status(str(milestone.get("status") or current.get("status") or "planned"), {"planned", "active", "validating", "done", "blocked", "skipped"}, "planned")
1756 if "priority" in milestone:
1757 current["priority"] = int(milestone.get("priority") or 0)
1758 for field, value in {
1759 "title": milestone_title,
1760 "goal": str(milestone.get("goal") or milestone.get("description") or "").strip(),
1761 "acceptance_criteria": str(milestone.get("acceptance_criteria") or "").strip(),
1762 "evidence_needed": str(milestone.get("evidence_needed") or "").strip(),
1763 "validation_status": _clean_status(str(milestone.get("validation_status") or ""), {"not_started", "pending", "passed", "failed", "blocked"}, ""),
1764 "validation_result": str(milestone.get("validation_result") or "").strip(),
1765 "next_action": str(milestone.get("next_action") or "").strip(),
1766 }.items():
1767 if value:
1768 current[field] = value
1769 if isinstance(milestone.get("metadata"), dict):
1770 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1771 merged_metadata.update(milestone["metadata"])
1772 current["metadata"] = merged_metadata
1773 feature_items = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1774 features = current.get("features") if isinstance(current.get("features"), list) else []
1775 feature_changed = False
1776 for feature in feature_items[:100]:
1777 if not isinstance(feature, dict):
1778 continue
1779 stored_feature, feature_created, feature_updated = merge_feature(features, feature)
1780 if stored_feature is None:
1781 continue
1782 if feature_created:
1783 added_features += 1
1784 elif feature_updated:
1785 updated_features += 1
1786 feature_changed = feature_changed or feature_updated
1787 current["features"] = features[-500:]
1788 current["created"] = milestone_created
1789 milestone_changed = milestone_created or milestone_before != _change_fingerprint(current, milestone_change_fields)
1790 current["substantive_update"] = milestone_changed or feature_changed
1791 if current["substantive_update"]:
1792 current["updated_at"] = now
1793 if not milestone_created and milestone_changed:
1794 updated_milestones += 1
1795 touched.append(current)
1796
1797 roadmap["milestones"] = stored_milestones[-500:]
1798 roadmap["created"] = created
1799 roadmap_substantive_update = created or roadmap_before != _change_fingerprint(roadmap, roadmap_change_fields)
1800 roadmap["substantive_update"] = roadmap_substantive_update
1801 if roadmap_substantive_update or added_milestones or updated_milestones or added_features or updated_features:
1802 roadmap["updated_at"] = now
1803 roadmap["added_milestones"] = added_milestones
1804 roadmap["updated_milestones"] = updated_milestones
1805 roadmap["added_features"] = added_features
1806 roadmap["updated_features"] = updated_features
1807 roadmap_has_change = bool(
1808 roadmap_substantive_update
1809 or added_milestones
1810 or updated_milestones
1811 or added_features
1812 or updated_features
1813 )
1814 if roadmap_has_change:
1815 event = _insert_event(
1816 conn,
1817 job_id=job_id,
1818 event_type="roadmap",
1819 title=roadmap.get("title") or title,
1820 body=f"{roadmap.get('status')} | milestones +{added_milestones}/~{updated_milestones} | features +{added_features}/~{updated_features}",
1821 metadata={
1822 "created": created,
1823 "substantive_update": roadmap_substantive_update,
1824 "status": roadmap.get("status"),
1825 "current_milestone": roadmap.get("current_milestone"),
1826 "milestone_count": len(roadmap.get("milestones") or []),
1827 "added_milestones": added_milestones,
1828 "updated_milestones": updated_milestones,
1829 "added_features": added_features,
1830 "updated_features": updated_features,
1831 "roadmap_updated": roadmap_substantive_update and not created,
1832 },
1833 created_at=now,
1834 )
1835 roadmap["event_id"] = event["id"]
1836 job_metadata["roadmap"] = roadmap
1837 job_metadata["last_roadmap_record"] = {
1838 "at": now,
1839 "updated_at": roadmap.get("updated_at") or now,
1840 "event_id": roadmap.get("event_id"),
1841 "created": created,
1842 "substantive_update": roadmap_substantive_update,
1843 "title": roadmap.get("title"),
1844 "status": roadmap.get("status"),
1845 "added_milestones": added_milestones,
1846 "updated_milestones": updated_milestones,
1847 "added_features": added_features,
1848 "updated_features": updated_features,
1849 "roadmap_updated": roadmap_substantive_update and not created,
1850 "milestones": touched[-10:],
1851 }
1852 conn.execute(
1853 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1854 (now, _json_dumps(job_metadata), job_id),
1855 )
1856 return roadmap
1857
1858 return self._write(op)
1859
1860 def append_milestone_validation_record(
1861 self,
1862 job_id: str,
1863 *,
1864 milestone: str,
1865 validation_status: str = "pending",
1866 result: str = "",
1867 evidence: str = "",
1868 issues: list[str] | None = None,
1869 next_action: str = "",
1870 metadata: dict[str, Any] | None = None,
1871 ) -> dict[str, Any]:
1872 now = utc_now()
1873 milestone = milestone.strip()
1874 if not milestone:
1875 raise ValueError("milestone is required")
1876 validation_status = _clean_status(validation_status, {"pending", "passed", "failed", "blocked"}, "pending")
1877 issue_values = [str(issue).strip() for issue in (issues or []) if str(issue).strip()]
1878 milestone_key = _norm_key(milestone)
1879
1880 def op(conn: sqlite3.Connection) -> dict[str, Any]:
1881 row = conn.execute("SELECT objective, metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
1882 if row is None:
1883 raise KeyError(f"Job not found: {job_id}")
1884 job_metadata = json.loads(row["metadata_json"] or "{}")
1885 roadmap = job_metadata.get("roadmap")
1886 if not isinstance(roadmap, dict):
1887 roadmap = {
1888 "key": _norm_key(str(row["objective"] or "roadmap")),
1889 "title": "Roadmap",
1890 "status": "active",
1891 "objective": str(row["objective"] or ""),
1892 "scope": "",
1893 "validation_contract": "",
1894 "current_milestone": milestone,
1895 "milestones": [],
1896 "metadata": {},
1897 "created_at": now,
1898 }
1899 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1900 current = next(
1901 (
1902 entry for entry in milestones
1903 if entry.get("key") == milestone_key
1904 or _norm_key(str(entry.get("title") or "")) == milestone_key
1905 ),
1906 None,
1907 )
1908 created = current is None
1909 if current is None:
1910 current = {
1911 "key": milestone_key,
1912 "title": milestone,
1913 "status": "validating" if validation_status == "pending" else ("done" if validation_status == "passed" else "blocked"),
1914 "priority": 0,
1915 "goal": "",
1916 "acceptance_criteria": "",
1917 "evidence_needed": "",
1918 "features": [],
1919 "metadata": {},
1920 "created_at": now,
1921 }
1922 milestones.append(current)
1923 current["validation_status"] = validation_status
1924 current["validation_result"] = result.strip()
1925 current["validation_evidence"] = evidence.strip()
1926 current["validation_issues"] = issue_values
1927 current["next_action"] = next_action.strip()
1928 if validation_status == "passed":
1929 current["status"] = "done"
1930 elif validation_status == "pending":
1931 current["status"] = "validating"
1932 elif validation_status in {"failed", "blocked"}:
1933 current["status"] = "blocked"
1934 if metadata:
1935 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
1936 merged_metadata.update(metadata)
1937 current["metadata"] = merged_metadata
1938 current["updated_at"] = now
1939 current["created"] = created
1940 roadmap["milestones"] = milestones[-500:]
1941 roadmap["status"] = "active" if validation_status in {"failed", "blocked"} else ("validating" if validation_status == "pending" else roadmap.get("status") or "active")
1942 roadmap["current_milestone"] = current.get("title") or milestone
1943 roadmap["updated_at"] = now
1944 event = _insert_event(
1945 conn,
1946 job_id=job_id,
1947 event_type="milestone_validation",
1948 title=current.get("title") or milestone,
1949 body=result.strip() or validation_status,
1950 metadata={
1951 "created": created,
1952 "validation_status": validation_status,
1953 "evidence": evidence.strip(),
1954 "issues": issue_values,
1955 "next_action": next_action.strip(),
1956 **(metadata or {}),
1957 },
1958 created_at=now,
1959 )
1960 current["validation_event_id"] = event["id"]
1961 job_metadata["roadmap"] = roadmap
1962 job_metadata["last_milestone_validation"] = {
1963 "at": now,
1964 "validated_at": now,
1965 "event_id": event["id"],
1966 "milestone": current.get("title"),
1967 "validation_status": validation_status,
1968 "result": result.strip(),
1969 "issues": issue_values,
1970 "next_action": next_action.strip(),
1971 }
1972 conn.execute(
1973 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
1974 (now, _json_dumps(job_metadata), job_id),
1975 )
1976 return current
1977
1978 return self._write(op)
1979
1980 def append_task_record(
1981 self,
1982 job_id: str,
1983 *,
1984 title: str,
1985 status: str = "open",
1986 priority: int = 0,
1987 goal: str = "",
1988 source_hint: str = "",
1989 result: str = "",
1990 parent: str = "",
1991 output_contract: str = "",
1992 acceptance_criteria: str = "",
1993 evidence_needed: str = "",
1994 stall_behavior: str = "",
1995 metadata: dict[str, Any] | None = None,
1996 ) -> dict[str, Any]:
1997 now = utc_now()
1998 title = title.strip()
1999 if not title:
2000 raise ValueError("title is required")
2001 status = (status.strip().lower() or "open").replace(" ", "_")
2002 if status not in {"open", "active", "done", "blocked", "skipped"}:
2003 status = "open"
2004 output_contract = output_contract.strip().lower().replace(" ", "_")
2005 if output_contract not in {"research", "artifact", "experiment", "action", "monitor", "decision", "report"}:
2006 output_contract = ""
2007 key = _norm_key(f"{parent}|{title}")
2008
2009 def op(conn: sqlite3.Connection) -> dict[str, Any]:
2010 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2011 if row is None:
2012 raise KeyError(f"Job not found: {job_id}")
2013 job_metadata = json.loads(row["metadata_json"] or "{}")
2014 tasks = _metadata_list(job_metadata, "task_queue")
2015 current = next(
2016 (
2017 entry
2018 for entry in tasks
2019 if entry.get("key") == key
2020 or (
2021 not entry.get("key")
2022 and _norm_key(f"{entry.get('parent') or ''}|{entry.get('title') or ''}") == key
2023 )
2024 ),
2025 None,
2026 )
2027 created = current is None
2028 change_fields = (
2029 "status",
2030 "priority",
2031 "goal",
2032 "source_hint",
2033 "result",
2034 "parent",
2035 "output_contract",
2036 "acceptance_criteria",
2037 "evidence_needed",
2038 "stall_behavior",
2039 "metadata",
2040 )
2041 before = "" if created else _change_fingerprint(current, change_fields)
2042 if current is None:
2043 current = {
2044 "key": key,
2045 "title": title,
2046 "status": status,
2047 "priority": int(priority),
2048 "goal": goal.strip(),
2049 "source_hint": source_hint.strip(),
2050 "result": result.strip(),
2051 "parent": parent.strip(),
2052 "output_contract": output_contract,
2053 "acceptance_criteria": acceptance_criteria.strip(),
2054 "evidence_needed": evidence_needed.strip(),
2055 "stall_behavior": stall_behavior.strip(),
2056 "metadata": metadata or {},
2057 "created_at": now,
2058 }
2059 tasks.append(current)
2060 else:
2061 current["status"] = status
2062 current["priority"] = int(priority)
2063 for field, value in {
2064 "goal": goal.strip(),
2065 "source_hint": source_hint.strip(),
2066 "result": result.strip(),
2067 "parent": parent.strip(),
2068 "output_contract": output_contract,
2069 "acceptance_criteria": acceptance_criteria.strip(),
2070 "evidence_needed": evidence_needed.strip(),
2071 "stall_behavior": stall_behavior.strip(),
2072 }.items():
2073 if value:
2074 current[field] = value
2075 if metadata:
2076 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
2077 merged_metadata.update(metadata)
2078 current["metadata"] = merged_metadata
2079 current["created"] = created
2080 substantive_update = created or before != _change_fingerprint(current, change_fields)
2081 current["substantive_update"] = substantive_update
2082 if substantive_update:
2083 current["updated_at"] = now
2084 if substantive_update:
2085 event = _insert_event(
2086 conn,
2087 job_id=job_id,
2088 event_type="task",
2089 title=current.get("title") or title,
2090 body=current.get("result") or current.get("goal") or "",
2091 metadata={
2092 "created": created,
2093 "substantive_update": substantive_update,
2094 "status": current.get("status"),
2095 "priority": current.get("priority"),
2096 "parent": current.get("parent"),
2097 "source_hint": current.get("source_hint"),
2098 "output_contract": current.get("output_contract"),
2099 "acceptance_criteria": current.get("acceptance_criteria"),
2100 "evidence_needed": current.get("evidence_needed"),
2101 "stall_behavior": current.get("stall_behavior"),
2102 **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
2103 },
2104 created_at=now,
2105 )
2106 current["event_id"] = event["id"]
2107 job_metadata["task_queue"] = tasks[-500:]
2108 job_metadata["last_task_record"] = current
2109 conn.execute(
2110 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2111 (now, _json_dumps(job_metadata), job_id),
2112 )
2113 return current
2114
2115 return self._write(op)
2116
2117 def append_experiment_record(
2118 self,
2119 job_id: str,
2120 *,
2121 title: str,
2122 hypothesis: str = "",
2123 status: str = "planned",
2124 metric_name: str = "",
2125 metric_value: float | None = None,
2126 metric_unit: str = "",
2127 higher_is_better: bool = True,
2128 baseline_value: float | None = None,
2129 config: dict[str, Any] | None = None,
2130 result: str = "",
2131 evidence_artifact: str = "",
2132 next_action: str = "",
2133 metadata: dict[str, Any] | None = None,
2134 ) -> dict[str, Any]:
2135 now = utc_now()
2136 title = title.strip()
2137 if not title:
2138 raise ValueError("title is required")
2139 status = (status.strip().lower() or "planned").replace(" ", "_")
2140 if status not in {"planned", "running", "measured", "failed", "blocked", "skipped"}:
2141 status = "planned"
2142 config_value = config if isinstance(config, dict) else {}
2143 key = _norm_key(f"{title}|{_json_dumps(config_value)}")
2144
2145 def op(conn: sqlite3.Connection) -> dict[str, Any]:
2146 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2147 if row is None:
2148 raise KeyError(f"Job not found: {job_id}")
2149 job_metadata = json.loads(row["metadata_json"] or "{}")
2150 experiments = _metadata_list(job_metadata, "experiment_ledger")
2151 current = next((entry for entry in experiments if entry.get("key") == key), None)
2152 created = current is None
2153 change_fields = (
2154 "hypothesis",
2155 "status",
2156 "metric_name",
2157 "metric_value",
2158 "metric_unit",
2159 "higher_is_better",
2160 "baseline_value",
2161 "config",
2162 "result",
2163 "evidence_artifact",
2164 "next_action",
2165 "metadata",
2166 "delta_from_previous_best",
2167 "best_observed",
2168 )
2169 before = "" if created else _change_fingerprint(current, change_fields)
2170 previous_best = _best_experiment_for_metric(
2171 experiments,
2172 metric_name=metric_name,
2173 metric_unit=metric_unit,
2174 higher_is_better=higher_is_better,
2175 exclude_key=key,
2176 )
2177 if current is None:
2178 current = {
2179 "key": key,
2180 "title": title,
2181 "hypothesis": hypothesis.strip(),
2182 "status": status,
2183 "metric_name": metric_name.strip(),
2184 "metric_value": metric_value,
2185 "metric_unit": metric_unit.strip(),
2186 "higher_is_better": bool(higher_is_better),
2187 "baseline_value": baseline_value,
2188 "config": config_value,
2189 "result": result.strip(),
2190 "evidence_artifact": evidence_artifact.strip(),
2191 "next_action": next_action.strip(),
2192 "metadata": metadata or {},
2193 "created_at": now,
2194 }
2195 experiments.append(current)
2196 else:
2197 current["status"] = status
2198 for field, value in {
2199 "hypothesis": hypothesis.strip(),
2200 "metric_name": metric_name.strip(),
2201 "metric_unit": metric_unit.strip(),
2202 "result": result.strip(),
2203 "evidence_artifact": evidence_artifact.strip(),
2204 "next_action": next_action.strip(),
2205 }.items():
2206 if value:
2207 current[field] = value
2208 current["higher_is_better"] = bool(higher_is_better)
2209 if metric_value is not None:
2210 current["metric_value"] = metric_value
2211 if baseline_value is not None:
2212 current["baseline_value"] = baseline_value
2213 if config_value:
2214 current["config"] = config_value
2215 if metadata:
2216 merged_metadata = current.get("metadata") if isinstance(current.get("metadata"), dict) else {}
2217 merged_metadata.update(metadata)
2218 current["metadata"] = merged_metadata
2219 current["created"] = created
2220 current["delta_from_previous_best"] = _metric_delta(
2221 metric_value=current.get("metric_value"),
2222 previous_best=previous_best,
2223 higher_is_better=bool(current.get("higher_is_better", True)),
2224 )
2225 best = _mark_best_experiments(experiments)
2226 substantive_update = created or before != _change_fingerprint(current, change_fields)
2227 current["substantive_update"] = substantive_update
2228 if substantive_update:
2229 current["updated_at"] = now
2230 event_body = current.get("result") or ""
2231 if current.get("metric_value") is not None:
2232 event_body = format_metric_value(
2233 current.get("metric_name") or "metric",
2234 current.get("metric_value"),
2235 current.get("metric_unit") or "",
2236 )
2237 if current.get("delta_from_previous_best") is not None:
2238 event_body += f" delta={current.get('delta_from_previous_best')}"
2239 if current.get("result"):
2240 event_body += f" | {current.get('result')}"
2241 if substantive_update:
2242 event = _insert_event(
2243 conn,
2244 job_id=job_id,
2245 event_type="experiment",
2246 title=current.get("title") or title,
2247 body=event_body,
2248 metadata={
2249 "created": created,
2250 "substantive_update": substantive_update,
2251 "status": current.get("status"),
2252 "metric_name": current.get("metric_name"),
2253 "metric_value": current.get("metric_value"),
2254 "metric_unit": current.get("metric_unit"),
2255 "higher_is_better": current.get("higher_is_better"),
2256 "best_observed": current.get("best_observed"),
2257 "delta_from_previous_best": current.get("delta_from_previous_best"),
2258 "evidence_artifact": current.get("evidence_artifact"),
2259 **(current.get("metadata") if isinstance(current.get("metadata"), dict) else {}),
2260 },
2261 created_at=now,
2262 )
2263 current["event_id"] = event["id"]
2264 job_metadata["experiment_ledger"] = experiments[-1000:]
2265 job_metadata["last_experiment_record"] = current
2266 if best:
2267 job_metadata["best_experiment_record"] = best
2268 conn.execute(
2269 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2270 (now, _json_dumps(job_metadata), job_id),
2271 )
2272 return current
2273
2274 return self._write(op)
2275
2276 def append_reflection(
2277 self,
2278 job_id: str,
2279 summary: str,
2280 *,
2281 strategy: str = "",
2282 metadata: dict[str, Any] | None = None,
2283 ) -> dict[str, Any]:
2284 now = utc_now()
2285 text = summary.strip()
2286 if not text:
2287 raise ValueError("summary is required")
2288 entry = {
2289 "at": now,
2290 "summary": text,
2291 "strategy": strategy.strip(),
2292 "metadata": metadata or {},
2293 }
2294
2295 def op(conn: sqlite3.Connection) -> dict[str, Any]:
2296 row = conn.execute("SELECT metadata_json FROM jobs WHERE id = ?", (job_id,)).fetchone()
2297 if row is None:
2298 raise KeyError(f"Job not found: {job_id}")
2299 event = _insert_event(
2300 conn,
2301 job_id=job_id,
2302 event_type="reflection",
2303 title="reflection",
2304 body=text,
2305 metadata={"strategy": strategy.strip(), **(metadata or {})},
2306 created_at=now,
2307 )
2308 entry["event_id"] = event["id"]
2309 job_metadata = json.loads(row["metadata_json"] or "{}")
2310 reflections = _metadata_list(job_metadata, "reflections")
2311 reflections.append(entry)
2312 job_metadata["reflections"] = reflections[-100:]
2313 job_metadata["last_reflection"] = entry
2314 conn.execute(
2315 "UPDATE jobs SET updated_at = ?, metadata_json = ? WHERE id = ?",
2316 (now, _json_dumps(job_metadata), job_id),
2317 )
2318 return entry
2319
2320 return self._write(op)
2321
2322 def start_run(self, job_id: str, *, model: str = "", config_hash: str = "") -> str:
2323 run_id = new_id("run")
2324 now = utc_now()
2325
2326 def op(conn: sqlite3.Connection) -> str:
2327 conn.execute(
2328 """
2329 INSERT INTO job_runs(id, job_id, status, started_at, model, config_hash)
2330 VALUES (?, ?, 'running', ?, ?, ?)
2331 """,
2332 (run_id, job_id, now, model, config_hash),
2333 )
2334 conn.execute("UPDATE jobs SET status = 'running', updated_at = ? WHERE id = ?", (now, job_id))
2335 _insert_event(
2336 conn,
2337 job_id=job_id,
2338 event_type="daemon",
2339 title="run started",
2340 body=f"model={model}" if model else "",
2341 ref_table="job_runs",
2342 ref_id=run_id,
2343 metadata={"model": model, "config_hash": config_hash},
2344 created_at=now,
2345 )
2346 return run_id
2347
2348 return self._write(op)
2349
2350 def finish_run(self, run_id: str, status: str, *, score: float | None = None, error: str | None = None) -> None:
2351 now = utc_now()
2352
2353 def op(conn: sqlite3.Connection) -> None:
2354 conn.execute(
2355 "UPDATE job_runs SET status = ?, ended_at = ?, score = ?, error = ? WHERE id = ?",
2356 (status, now, score, error, run_id),
2357 )
2358
2359 self._write(op)
2360
2361 def mark_interrupted_running(self, *, reason: str = "daemon interrupted active work") -> dict[str, int]:
2362 now = utc_now()
2363 output = {"success": False, "error": reason, "error_type": "Interrupted"}
2364
2365 def op(conn: sqlite3.Connection) -> dict[str, int]:
2366 step_result = conn.execute(
2367 """
2368 UPDATE steps
2369 SET status = 'failed',
2370 ended_at = ?,
2371 summary = COALESCE(summary, ?),
2372 output_json = ?,
2373 error = ?
2374 WHERE status = 'running'
2375 """,
2376 (now, reason, _json_dumps(output), reason),
2377 )
2378 run_result = conn.execute(
2379 """
2380 UPDATE job_runs
2381 SET status = 'failed',
2382 ended_at = ?,
2383 error = ?
2384 WHERE status = 'running'
2385 """,
2386 (now, reason),
2387 )
2388 return {"steps": int(step_result.rowcount or 0), "runs": int(run_result.rowcount or 0)}
2389
2390 return self._write(op)
2391
2392 def next_step_no(self, job_id: str) -> int:
2393 row = self._conn.execute("SELECT COALESCE(MAX(step_no), 0) + 1 AS next_step FROM steps WHERE job_id = ?", (job_id,)).fetchone()
2394 return int(row["next_step"])
2395
2396 def add_step(
2397 self,
2398 *,
2399 job_id: str,
2400 run_id: str,
2401 kind: str,
2402 status: str = "running",
2403 tool_name: str | None = None,
2404 summary: str | None = None,
2405 input_data: dict[str, Any] | None = None,
2406 ) -> str:
2407 step_id = new_id("step")
2408 step_no = self.next_step_no(job_id)
2409 now = utc_now()
2410
2411 def op(conn: sqlite3.Connection) -> str:
2412 conn.execute(
2413 """
2414 INSERT INTO steps(id, job_id, run_id, step_no, kind, status, tool_name, started_at, summary, input_json)
2415 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2416 """,
2417 (step_id, job_id, run_id, step_no, kind, status, tool_name, now, summary, _json_dumps(input_data)),
2418 )
2419 _insert_event(
2420 conn,
2421 job_id=job_id,
2422 event_type="tool_call" if tool_name else kind,
2423 title=tool_name or kind,
2424 body=summary or "",
2425 ref_table="steps",
2426 ref_id=step_id,
2427 metadata={"run_id": run_id, "step_no": step_no, "kind": kind, "status": status, "input": input_data or {}},
2428 created_at=now,
2429 )
2430 return step_id
2431
2432 return self._write(op)
2433
2434 def finish_step(
2435 self,
2436 step_id: str,
2437 *,
2438 status: str,
2439 summary: str | None = None,
2440 output_data: dict[str, Any] | None = None,
2441 error: str | None = None,
2442 ) -> None:
2443 now = utc_now()
2444
2445 def op(conn: sqlite3.Connection) -> None:
2446 row = conn.execute("SELECT job_id, run_id, step_no, kind, tool_name FROM steps WHERE id = ?", (step_id,)).fetchone()
2447 conn.execute(
2448 """
2449 UPDATE steps
2450 SET status = ?, ended_at = ?, summary = COALESCE(?, summary), output_json = ?, error = ?
2451 WHERE id = ?
2452 """,
2453 (status, now, summary, _json_dumps(output_data), error, step_id),
2454 )
2455 if row is not None:
2456 event_type = "error" if status == "failed" or error else "tool_result"
2457 if row["kind"] == "reflection" and not error:
2458 event_type = "reflection"
2459 _insert_event(
2460 conn,
2461 job_id=row["job_id"],
2462 event_type=event_type,
2463 title=row["tool_name"] or row["kind"],
2464 body=summary or error or "",
2465 ref_table="steps",
2466 ref_id=step_id,
2467 metadata={
2468 "run_id": row["run_id"],
2469 "step_no": row["step_no"],
2470 "kind": row["kind"],
2471 "status": status,
2472 "output": output_data or {},
2473 "error": error,
2474 },
2475 created_at=now,
2476 )
2477
2478 self._write(op)
2479
2480 def list_steps(self, *, job_id: str | None = None, run_id: str | None = None, limit: int | None = None) -> list[dict[str, Any]]:
2481 if run_id:
2482 if limit:
2483 rows = self._conn.execute(
2484 """
2485 SELECT * FROM (
2486 SELECT * FROM steps WHERE run_id = ? ORDER BY step_no DESC LIMIT ?
2487 ) ORDER BY step_no
2488 """,
2489 (run_id, int(limit)),
2490 ).fetchall()
2491 else:
2492 rows = self._conn.execute("SELECT * FROM steps WHERE run_id = ? ORDER BY step_no", (run_id,)).fetchall()
2493 elif job_id:
2494 if limit:
2495 rows = self._conn.execute(
2496 """
2497 SELECT * FROM (
2498 SELECT * FROM steps WHERE job_id = ? ORDER BY step_no DESC LIMIT ?
2499 ) ORDER BY step_no
2500 """,
2501 (job_id, int(limit)),
2502 ).fetchall()
2503 else:
2504 rows = self._conn.execute("SELECT * FROM steps WHERE job_id = ? ORDER BY step_no", (job_id,)).fetchall()
2505 else:
2506 if limit:
2507 rows = self._conn.execute(
2508 """
2509 SELECT * FROM (
2510 SELECT * FROM steps ORDER BY started_at DESC LIMIT ?
2511 ) ORDER BY started_at
2512 """,
2513 (int(limit),),
2514 ).fetchall()
2515 else:
2516 rows = self._conn.execute("SELECT * FROM steps ORDER BY started_at").fetchall()
2517 return [_row_to_dict(row) for row in rows]
2518
2519 def job_record_counts(self, job_id: str) -> dict[str, int]:
2520 row = self._conn.execute(
2521 """
2522 SELECT
2523 (SELECT COUNT(*) FROM steps WHERE job_id = ?) AS steps,
2524 (SELECT COUNT(*) FROM artifacts WHERE job_id = ?) AS artifacts,
2525 (SELECT COUNT(*) FROM memory_index WHERE job_id = ?) AS memory,
2526 (SELECT COUNT(*) FROM events WHERE job_id = ?) AS events
2527 """,
2528 (job_id, job_id, job_id, job_id),
2529 ).fetchone()
2530 return {
2531 "steps": int(row["steps"] or 0),
2532 "artifacts": int(row["artifacts"] or 0),
2533 "memory": int(row["memory"] or 0),
2534 "events": int(row["events"] or 0),
2535 }
2536
2537 def job_token_usage(self, job_id: str) -> dict[str, Any]:
2538 rows = self._conn.execute(
2539 """
2540 SELECT created_at, metadata_json
2541 FROM events
2542 WHERE job_id = ? AND event_type = 'loop' AND title = 'message_end'
2543 ORDER BY created_at ASC, id ASC
2544 """,
2545 (job_id,),
2546 ).fetchall()
2547 totals: dict[str, Any] = {
2548 "prompt_tokens": 0,
2549 "completion_tokens": 0,
2550 "total_tokens": 0,
2551 "reasoning_tokens": 0,
2552 "cached_tokens": 0,
2553 "cost": 0.0,
2554 "calls": 0,
2555 "estimated_calls": 0,
2556 "latest_prompt_tokens": 0,
2557 "latest_completion_tokens": 0,
2558 "latest_total_tokens": 0,
2559 "latest_context_length": 0,
2560 "latest_context_fraction": 0.0,
2561 "latest_at": "",
2562 "has_cost": False,
2563 }
2564 for row in rows:
2565 metadata = _json_loads(row["metadata_json"])
2566 usage = metadata.get("usage")
2567 if not isinstance(usage, dict):
2568 continue
2569 prompt = _as_int(usage.get("prompt_tokens"))
2570 completion = _as_int(usage.get("completion_tokens"))
2571 total = _as_int(usage.get("total_tokens")) or prompt + completion
2572 totals["prompt_tokens"] += prompt
2573 totals["completion_tokens"] += completion
2574 totals["total_tokens"] += total
2575 totals["reasoning_tokens"] += _as_int(_nested_value(usage, "completion_tokens_details", "reasoning_tokens"))
2576 totals["cached_tokens"] += _as_int(_nested_value(usage, "prompt_tokens_details", "cached_tokens"))
2577 cost = _as_float(usage.get("cost"))
2578 if cost is not None:
2579 totals["cost"] += cost
2580 totals["has_cost"] = True
2581 totals["calls"] += 1
2582 if bool(usage.get("estimated")):
2583 totals["estimated_calls"] += 1
2584 totals["latest_prompt_tokens"] = prompt
2585 totals["latest_completion_tokens"] = completion
2586 totals["latest_total_tokens"] = total
2587 totals["latest_context_length"] = _as_int(usage.get("context_length"))
2588 totals["latest_context_fraction"] = _as_float(usage.get("context_fraction")) or 0.0
2589 totals["latest_at"] = str(row["created_at"] or "")
2590 return totals
2591
2592 def list_runs(self, job_id: str, *, limit: int = 50) -> list[dict[str, Any]]:
2593 rows = self._conn.execute(
2594 "SELECT * FROM job_runs WHERE job_id = ? ORDER BY started_at DESC LIMIT ?",
2595 (job_id, limit),
2596 ).fetchall()
2597 return [_row_to_dict(row) for row in rows]
2598
2599 def add_artifact(
2600 self,
2601 *,
2602 job_id: str,
2603 path: str | Path,
2604 sha256: str,
2605 artifact_type: str,
2606 run_id: str | None = None,
2607 step_id: str | None = None,
2608 title: str | None = None,
2609 summary: str | None = None,
2610 metadata: dict[str, Any] | None = None,
2611 ) -> str:
2612 artifact_id = new_id("art")
2613 now = utc_now()
2614
2615 def op(conn: sqlite3.Connection) -> str:
2616 conn.execute(
2617 """
2618 INSERT INTO artifacts(id, job_id, run_id, step_id, type, path, sha256, title, summary, metadata_json, created_at)
2619 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2620 """,
2621 (
2622 artifact_id,
2623 job_id,
2624 run_id,
2625 step_id,
2626 artifact_type,
2627 str(path),
2628 sha256,
2629 title,
2630 summary,
2631 _json_dumps(metadata),
2632 now,
2633 ),
2634 )
2635 _insert_event(
2636 conn,
2637 job_id=job_id,
2638 event_type="artifact",
2639 title=title or artifact_id,
2640 body=summary or str(path),
2641 ref_table="artifacts",
2642 ref_id=artifact_id,
2643 metadata={"type": artifact_type, "path": str(path), "sha256": sha256, **(metadata or {})},
2644 created_at=now,
2645 )
2646 return artifact_id
2647
2648 return self._write(op)
2649
2650 def get_artifact(self, artifact_id: str) -> dict[str, Any]:
2651 row = self._conn.execute("SELECT * FROM artifacts WHERE id = ?", (artifact_id,)).fetchone()
2652 artifact = _row_to_dict(row)
2653 if artifact is None:
2654 raise KeyError(f"Artifact not found: {artifact_id}")
2655 return artifact
2656
2657 def list_artifacts(self, job_id: str, *, limit: int = 100) -> list[dict[str, Any]]:
2658 rows = self._conn.execute(
2659 "SELECT * FROM artifacts WHERE job_id = ? ORDER BY created_at DESC LIMIT ?",
2660 (job_id, limit),
2661 ).fetchall()
2662 return [_row_to_dict(row) for row in rows]
2663
2664 def upsert_memory(
2665 self,
2666 *,
2667 job_id: str,
2668 key: str,
2669 summary: str,
2670 artifact_refs: list[str] | None = None,
2671 ) -> str:
2672 memory_id = new_id("mem")
2673 now = utc_now()
2674
2675 def op(conn: sqlite3.Connection) -> str:
2676 conn.execute(
2677 """
2678 INSERT INTO memory_index(id, job_id, key, summary, artifact_refs_json, updated_at)
2679 VALUES (?, ?, ?, ?, ?, ?)
2680 ON CONFLICT(job_id, key) DO UPDATE SET
2681 summary = excluded.summary,
2682 artifact_refs_json = excluded.artifact_refs_json,
2683 updated_at = excluded.updated_at
2684 """,
2685 (memory_id, job_id, key, summary, _json_dumps(artifact_refs or []), now),
2686 )
2687 row = conn.execute("SELECT id FROM memory_index WHERE job_id = ? AND key = ?", (job_id, key)).fetchone()
2688 current_id = str(row["id"])
2689 _insert_event(
2690 conn,
2691 job_id=job_id,
2692 event_type="compaction",
2693 title=key,
2694 body=summary,
2695 ref_table="memory_index",
2696 ref_id=current_id,
2697 metadata={"artifact_refs": artifact_refs or []},
2698 created_at=now,
2699 )
2700 return current_id
2701
2702 return self._write(op)
2703
2704 def list_memory(self, job_id: str) -> list[dict[str, Any]]:
2705 rows = self._conn.execute(
2706 "SELECT * FROM memory_index WHERE job_id = ? ORDER BY updated_at DESC",
2707 (job_id,),
2708 ).fetchall()
2709 return [_row_to_dict(row) for row in rows]
2710
2711 def digest_exists(self, *, day: str, target: str) -> bool:
2712 row = self._conn.execute(
2713 "SELECT 1 FROM digests WHERE day = ? AND target = ? AND status IN ('sent', 'dry_run') LIMIT 1",
2714 (day, target),
2715 ).fetchone()
2716 return row is not None
2717
2718 def record_digest(
2719 self,
2720 *,
2721 day: str,
2722 target: str,
2723 subject: str,
2724 body_path: str | Path,
2725 status: str,
2726 error: str | None = None,
2727 ) -> str:
2728 digest_id = new_id("dig")
2729 sent_at = utc_now() if status in {"sent", "dry_run"} else None
2730
2731 def op(conn: sqlite3.Connection) -> str:
2732 conn.execute(
2733 """
2734 INSERT INTO digests(id, day, target, subject, body_path, sent_at, status, error)
2735 VALUES (?, ?, ?, ?, ?, ?, ?, ?)
2736 """,
2737 (digest_id, day, target, subject, str(body_path), sent_at, status, error),
2738 )
2739 _insert_event(
2740 conn,
2741 job_id=None,
2742 event_type="digest",
2743 title=subject,
2744 body=str(body_path),
2745 ref_table="digests",
2746 ref_id=digest_id,
2747 metadata={"day": day, "target": target, "status": status, "error": error},
2748 created_at=sent_at or utc_now(),
2749 )
2750 return digest_id
2751
2752 return self._write(op)
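
Note on `upsert_memory` above: the method generates a fresh `mem` id on every call, but on a conflict the existing row keeps its original id, which is why the code re-selects the id before writing the event. The following is a minimal sketch of that pattern against an in-memory database, not the repository's actual schema. The trimmed table definition, the `UNIQUE (job_id, key)` constraint (inferred from the `ON CONFLICT(job_id, key)` clause), the `uuid`-based id, and the standalone function are all assumptions made for illustration. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3
import uuid

# Hypothetical, trimmed-down table: only the columns needed to show the upsert.
# The UNIQUE constraint is assumed; ON CONFLICT(job_id, key) requires one.
SCHEMA = """
CREATE TABLE memory_index (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL,
    key TEXT NOT NULL,
    summary TEXT NOT NULL,
    UNIQUE (job_id, key)
)
"""


def upsert_memory(conn: sqlite3.Connection, *, job_id: str, key: str, summary: str) -> str:
    """Insert or update one memory row and return the id that is actually stored."""
    # A new candidate id is generated on every call (stand-in for new_id("mem")).
    candidate_id = f"mem_{uuid.uuid4().hex[:12]}"
    conn.execute(
        """
        INSERT INTO memory_index(id, job_id, key, summary)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(job_id, key) DO UPDATE SET summary = excluded.summary
        """,
        (candidate_id, job_id, key, summary),
    )
    # On conflict the candidate id is discarded, so read back the surviving id.
    row = conn.execute(
        "SELECT id FROM memory_index WHERE job_id = ? AND key = ?",
        (job_id, key),
    ).fetchone()
    return str(row[0])


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first_id = upsert_memory(conn, job_id="job_1", key="plan", summary="draft")
second_id = upsert_memory(conn, job_id="job_1", key="plan", summary="revised")
stored = conn.execute("SELECT COUNT(*), MAX(summary) FROM memory_index").fetchone()
```

Both calls return the same id and the table holds a single row with the latest summary, so any event that references the memory entry stays stable across updates.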
nipux_cli/digest.py 380 lines
1"""Digest rendering and optional email delivery."""
2
3from __future__ import annotations
4
5import smtplib
6from datetime import date
7from email.message import EmailMessage
8from pathlib import Path
9from typing import Any
10
11from nipux_cli.config import AppConfig, EmailConfig
12from nipux_cli.db import AgentDB
13from nipux_cli.operator_context import active_prompt_operator_entries
14from nipux_cli.tui_layout import _format_compact_count, _format_usage_cost
15
16
17def _metadata_list(job: dict, key: str) -> list[dict]:
18 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
19 values = metadata.get(key)
20 return [value for value in values if isinstance(value, dict)] if isinstance(values, list) else []
21
22
23def _active_operator_messages(messages: list[dict]) -> list[dict]:
24 prompt_entries = active_prompt_operator_entries(messages)
25 return [
26 entry for entry in messages
27 if str(entry.get("mode") or "steer") in {"steer", "follow_up"}
28 and entry in prompt_entries
29 ]
30
31
32def _safe_int(value: Any) -> int:
33 try:
34 return int(float(value))
35 except (TypeError, ValueError, OverflowError):
36 return 0
37
38
39def _latest_run_model(db: AgentDB, job_id: str) -> str:
40 runs = db.list_runs(job_id, limit=1)
41 if runs:
42 return str(runs[0].get("model") or "unknown")
43 return "unknown"
44
45
46def _usage_lines(
47 db: AgentDB,
48 job_id: str,
49 *,
50 model: str | None = None,
51 base_url: str = "",
52 context_length: int = 0,
53 input_cost_per_million: float | None = None,
54 output_cost_per_million: float | None = None,
55) -> list[str]:
56 usage = db.job_token_usage(job_id)
57 usage["input_cost_per_million"] = input_cost_per_million
58 usage["output_cost_per_million"] = output_cost_per_million
59 calls = _safe_int(usage.get("calls"))
60 if calls <= 0:
61 return ["- No model usage recorded yet."]
62 model_name = model or _latest_run_model(db, job_id)
63 prompt = _safe_int(usage.get("prompt_tokens"))
64 completion = _safe_int(usage.get("completion_tokens"))
65 total = _safe_int(usage.get("total_tokens")) or prompt + completion
66 latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
67 latest_completion = _safe_int(usage.get("latest_completion_tokens"))
68 context_text = _format_compact_count(latest_prompt)
69 if context_length > 0:
70 context_text = f"{context_text}/{_format_compact_count(context_length)}"
71 cost_text = _format_usage_cost(usage, model=model_name, base_url=base_url)
72 lines = [
73 (
74 f"- {model_name}: {calls} calls, {_format_compact_count(total)} tokens "
75 f"({_format_compact_count(prompt)} prompt, {_format_compact_count(completion)} output), "
76 f"latest ctx={context_text}, latest output={_format_compact_count(latest_completion)}, cost={cost_text}"
77 )
78 ]
79 if _safe_int(usage.get("estimated_calls")):
80 lines.append("- Some token/cost values are estimated because the provider did not return complete usage metadata.")
81 elif not bool(usage.get("has_cost")) and cost_text == "pending":
82 lines.append("- Cost is pending until the provider returns cost metadata or the model is configured as local/free.")
83 return lines
84
85
86def render_job_digest(
87 db: AgentDB,
88 job_id: str,
89 *,
90 model: str | None = None,
91 base_url: str = "",
92 context_length: int = 0,
93 input_cost_per_million: float | None = None,
94 output_cost_per_million: float | None = None,
95) -> str:
96 job = db.get_job(job_id)
97 artifacts = db.list_artifacts(job_id, limit=50)
98 steps = db.list_steps(job_id=job_id)
99 findings = _metadata_list(job, "finding_ledger")
100 sources = _metadata_list(job, "source_ledger")
101 tasks = _metadata_list(job, "task_queue")
102 experiments = _metadata_list(job, "experiment_ledger")
103 lessons = _metadata_list(job, "lessons")
104 reflections = _metadata_list(job, "reflections")
105 operator_messages = _metadata_list(job, "operator_messages")
106 active_operator = _active_operator_messages(operator_messages)
107 lines = [
108 f"# {job['title']}",
109 "",
110 f"Status: {job['status']}",
111 f"Findings: {len(findings)}",
        f"Sources: {len(sources)}",
        f"Tasks: {len(tasks)}",
        f"Experiments: {len(experiments)}",
        f"Lessons: {len(lessons)}",
        "",
        "## Model Usage",
        "",
        *_usage_lines(
            db,
            job_id,
            model=model,
            base_url=base_url,
            context_length=context_length,
            input_cost_per_million=input_cost_per_million,
            output_cost_per_million=output_cost_per_million,
        ),
        "",
        "## Objective",
        "",
        job["objective"],
        "",
        "## Active Operator Context",
        "",
    ]
    if not active_operator:
        lines.append("- none")
    for entry in active_operator[-8:]:
        lines.append(f"- {entry.get('mode') or 'steer'}: {entry.get('message') or ''}")
    lines.extend([
        "",
        "## Recent Steps",
        "",
    ])
    if not steps:
        lines.append("- No steps have run yet.")
    for step in steps[-20:]:
        tool = f" `{step['tool_name']}`" if step.get("tool_name") else ""
        lines.append(f"- #{step['step_no']} {step['kind']}{tool}: {step['status']} - {step.get('summary') or ''}")
    lines.extend(["", "## Best Findings", ""])
    if not findings:
        lines.append("- No findings recorded yet.")
    for finding in sorted(findings, key=lambda item: float(item.get("score") or 0), reverse=True)[:15]:
        details = " | ".join(str(finding.get(key) or "") for key in ("category", "location", "contact") if finding.get(key))
        suffix = f" - {details}" if details else ""
        lines.append(f"- {finding.get('name') or 'unknown'} (score={finding.get('score')}){suffix}")
        if finding.get("reason"):
            lines.append(f" - {finding['reason']}")
    lines.extend(["", "## Source Learning", ""])
    if not sources:
        lines.append("- No sources scored yet.")
    for source in sorted(sources, key=lambda item: float(item.get("usefulness_score") or 0), reverse=True)[:12]:
        lines.append(
            f"- {source.get('source')} score={source.get('usefulness_score')} "
            f"findings={source.get('yield_count') or 0} fails={source.get('fail_count') or 0}: {source.get('last_outcome') or ''}"
        )
    lines.extend(["", "## Task Queue", ""])
    if not tasks:
        lines.append("- No tasks recorded yet.")
    status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
    for task in sorted(tasks, key=lambda item: (status_order.get(str(item.get("status") or "open"), 9), -int(item.get("priority") or 0)))[:15]:
        contract = f" [{task.get('output_contract')}]" if task.get("output_contract") else ""
        lines.append(f"- {task.get('status') or 'open'} p={task.get('priority') or 0}{contract}: {task.get('title') or 'untitled'}")
        for key, label in (("acceptance_criteria", "accept"), ("evidence_needed", "evidence"), ("stall_behavior", "stall")):
            if task.get(key):
                lines.append(f" - {label}: {task[key]}")
        if task.get("result"):
            lines.append(f" - {task['result']}")
    lines.extend(["", "## Experiments", ""])
    if not experiments:
        lines.append("- No experiments recorded yet.")
    measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
    for experiment in sorted(measured or experiments, key=lambda item: (not bool(item.get("best_observed")), str(item.get("updated_at") or item.get("created_at") or "")))[:15]:
        metric = ""
        if experiment.get("metric_value") is not None:
            metric = f" {experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
        best = " best" if experiment.get("best_observed") else ""
        lines.append(f"- {experiment.get('status') or 'planned'}: {experiment.get('title') or 'experiment'}{metric}{best}")
        if experiment.get("result"):
            lines.append(f" - {experiment['result']}")
    lines.extend(["", "## Lessons", ""])
    if not lessons:
        lines.append("- No lessons recorded yet.")
    for lesson in lessons[-12:]:
        lines.append(f"- {lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}")
    if reflections:
        lines.extend(["", "## Current Strategy", ""])
        reflection = reflections[-1]
        lines.append(reflection.get("summary") or "")
        if reflection.get("strategy"):
            lines.append("")
            lines.append(reflection["strategy"])
    lines.extend(["", "## Artifacts", ""])
    if not artifacts:
        lines.append("- No artifacts yet.")
    for artifact in artifacts[:20]:
        title = artifact.get("title") or artifact["id"]
        lines.append(f"- {title} ({artifact['type']}): {artifact['path']}")
    return "\n".join(lines).rstrip() + "\n"


def send_digest_email(config: EmailConfig, *, subject: str, body: str, to_addr: str | None = None) -> dict:
    if not config.enabled:
        return {"sent": False, "dry_run": True, "reason": "email.disabled", "subject": subject, "body": body}
    target = to_addr or config.to_addr
    if not all([config.smtp_host, config.from_addr, target]):
        raise ValueError("Email is enabled but smtp_host/from_addr/to_addr is incomplete")
    message = EmailMessage()
    message["Subject"] = subject
    message["From"] = config.from_addr
    message["To"] = target
    message.set_content(body)
    with smtplib.SMTP(config.smtp_host, config.smtp_port, timeout=30) as smtp:
        if config.use_tls:
            smtp.starttls()
        if config.username:
            smtp.login(config.username, config.password)
        smtp.send_message(message)
    return {"sent": True, "target": target, "subject": subject}


def render_daily_digest(
    db: AgentDB,
    *,
    model: str | None = None,
    base_url: str = "",
    context_length: int = 0,
    input_cost_per_million: float | None = None,
    output_cost_per_million: float | None = None,
) -> str:
    jobs = [job for job in db.list_jobs() if job["status"] not in {"cancelled"}]
    lines = ["# Nipux CLI Daily Digest", ""]
    if not jobs:
        lines.append("No jobs are currently tracked.")
        return "\n".join(lines).rstrip() + "\n"

    for job in jobs:
        artifacts = db.list_artifacts(job["id"], limit=10)
        steps = db.list_steps(job_id=job["id"])[-5:]
        findings = _metadata_list(job, "finding_ledger")
        sources = _metadata_list(job, "source_ledger")
        tasks = _metadata_list(job, "task_queue")
        experiments = _metadata_list(job, "experiment_ledger")
        lessons = _metadata_list(job, "lessons")
        reflections = _metadata_list(job, "reflections")
        operator_messages = _metadata_list(job, "operator_messages")
        active_operator = _active_operator_messages(operator_messages)
        finding_batches = [artifact for artifact in artifacts if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()]
        lines.extend([
            f"## {job['title']}",
            "",
            f"Status: {job['status']}",
            f"Kind: {job['kind']}",
            f"Counts: {len(findings)} findings, {len(sources)} sources, {len(tasks)} tasks, {len(experiments)} experiments, {len(lessons)} lessons, {len(finding_batches)} recent finding artifacts",
            "",
            "Model usage:",
        ])
        lines.extend(
            _usage_lines(
                db,
                job["id"],
                model=model,
                base_url=base_url,
                context_length=context_length,
                input_cost_per_million=input_cost_per_million,
                output_cost_per_million=output_cost_per_million,
            )
        )
        lines.extend([
            "",
            "Recent steps:",
        ])
        if not steps:
            lines.append("- none")
        for step in steps:
            tool = f" `{step['tool_name']}`" if step.get("tool_name") else ""
            lines.append(f"- #{step['step_no']} {step['kind']}{tool}: {step['status']} - {step.get('summary') or ''}")
        lines.extend(["", "Active operator context:"])
        if not active_operator:
            lines.append("- none")
        for entry in active_operator[-5:]:
            lines.append(f"- {entry.get('mode') or 'steer'}: {entry.get('message') or ''}")
        lines.extend(["", "Best findings:"])
        if not findings:
            lines.append("- none")
        for finding in sorted(findings, key=lambda item: float(item.get("score") or 0), reverse=True)[:8]:
            lines.append(f"- {finding.get('name') or 'unknown'} (score={finding.get('score')}) - {finding.get('reason') or finding.get('category') or ''}")
        lines.extend(["", "Task queue:"])
        if not tasks:
            lines.append("- none")
        status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
        for task in sorted(tasks, key=lambda item: (status_order.get(str(item.get("status") or "open"), 9), -int(item.get("priority") or 0)))[:8]:
            contract = f" [{task.get('output_contract')}]" if task.get("output_contract") else ""
            lines.append(f"- {task.get('status') or 'open'} p={task.get('priority') or 0}{contract}: {task.get('title') or 'untitled'}")
        lines.extend(["", "Experiments:"])
        if not experiments:
            lines.append("- none")
        measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
        for experiment in (measured or experiments)[-8:]:
            metric = ""
            if experiment.get("metric_value") is not None:
                metric = f" {experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
            best = " best" if experiment.get("best_observed") else ""
            lines.append(f"- {experiment.get('status') or 'planned'}: {experiment.get('title') or 'experiment'}{metric}{best}")
        lines.extend(["", "Lessons learned:"])
        if not lessons:
            lines.append("- none")
        for lesson in lessons[-8:]:
            lines.append(f"- {lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}")
        lines.extend(["", "Source quality:"])
        if not sources:
            lines.append("- none")
        for source in sorted(sources, key=lambda item: float(item.get("usefulness_score") or 0), reverse=True)[:8]:
            lines.append(f"- {source.get('source')} score={source.get('usefulness_score')} findings={source.get('yield_count') or 0}: {source.get('last_outcome') or ''}")
        if reflections:
            reflection = reflections[-1]
            lines.extend(["", "Current strategy:", f"- {reflection.get('strategy') or reflection.get('summary') or ''}"])
        lines.extend(["", "Next branches:"])
        lines.append("- Continue with high-yield source types, avoid low-yield paths, and save durable findings as artifacts.")
        lines.extend(["", "Recent artifacts:"])
        if not artifacts:
            lines.append("- none")
        for artifact in artifacts:
            title = artifact.get("title") or artifact["id"]
            lines.append(f"- {title} ({artifact['type']}): {artifact['path']}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def write_daily_digest(config: AppConfig, db: AgentDB, *, day: str | None = None) -> dict:
    day = day or date.today().isoformat()
    target = config.email.to_addr or "dry-run"
    subject = f"Nipux CLI daily digest - {day}"
    if db.digest_exists(day=day, target=target):
        return {"sent": False, "skipped": True, "reason": "already_recorded", "day": day, "target": target}

    body = render_daily_digest(
        db,
        model=config.model.model,
        base_url=config.model.base_url,
        context_length=config.model.context_length,
        input_cost_per_million=config.model.input_cost_per_million,
        output_cost_per_million=config.model.output_cost_per_million,
    )
    config.runtime.digests_dir.mkdir(parents=True, exist_ok=True)
    body_path = Path(config.runtime.digests_dir) / f"{day}-daily.md"
    body_path.write_text(body, encoding="utf-8")

    try:
        email_result = send_digest_email(config.email, subject=subject, body=body)
        status = "sent" if email_result.get("sent") else "dry_run"
        digest_id = db.record_digest(day=day, target=target, subject=subject, body_path=body_path, status=status)
        return {
            "digest_id": digest_id,
            "status": status,
            "day": day,
            "target": target,
            "path": str(body_path),
            "email": email_result,
        }
    except Exception as exc:
        digest_id = db.record_digest(
            day=day,
            target=target,
            subject=subject,
            body_path=body_path,
            status="failed",
            error=str(exc),
        )
        return {"digest_id": digest_id, "status": "failed", "day": day, "target": target, "path": str(body_path), "error": str(exc)}
nipux_cli/doctor.py 260 lines
"""Runtime checks for the Nipux agent."""

from __future__ import annotations

import json
import shutil
import urllib.error
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from urllib.parse import urlparse

from nipux_cli.config import AppConfig, load_config
from nipux_cli.db import AgentDB
from nipux_cli.tools import DEFAULT_REGISTRY


@dataclass(frozen=True)
class Check:
    name: str
    ok: bool
    detail: str

    def as_dict(self) -> dict[str, Any]:
        return {"name": self.name, "ok": self.ok, "detail": self.detail}


def _check_writable_dir(path: Path) -> Check:
    try:
        path.mkdir(parents=True, exist_ok=True)
        probe = path / ".write-test"
        probe.write_text("ok", encoding="utf-8")
        probe.unlink()
        return Check("state_dir_writable", True, str(path))
    except OSError as exc:
        return Check("state_dir_writable", False, f"{path}: {exc}")


def _check_db(config: AppConfig) -> Check:
    try:
        db = AgentDB(config.runtime.state_db_path)
        db.close()
        return Check("sqlite", True, str(config.runtime.state_db_path))
    except Exception as exc:
        return Check("sqlite", False, str(exc))


def _check_tool_surface() -> Check:
    names = DEFAULT_REGISTRY.names()
    forbidden = sorted({"terminal", "delegate_task", "skill_manage", "image_generate"} & set(names))
    if forbidden:
        return Check("tool_surface", False, f"forbidden tools exposed: {', '.join(forbidden)}")
    return Check("tool_surface", True, f"{len(names)} tools: {', '.join(names)}")


def _check_model_config(config: AppConfig) -> Check:
    base_url = config.model.base_url
    host = (urlparse(base_url).hostname or "").lower()
    local_hosts = {"", "localhost", "127.0.0.1", "::1", "0.0.0.0"}
    if host in local_hosts or host.endswith(".local"):
        return Check("model_config", True, f"{config.model.model} at {base_url}")
    if config.model.api_key:
        return Check("model_config", True, f"{config.model.model} at {base_url}; key read from {config.model.api_key_env}")
    return Check(
        "model_config",
        False,
        f"{config.model.api_key_env} is not set for remote endpoint {base_url}; put it in the shell or ~/.nipux/.env",
    )


def _check_browser_runtime() -> Check:
    direct = shutil.which("agent-browser")
    if direct:
        return Check("browser_runtime", True, f"agent-browser: {direct}")
    npx = shutil.which("npx")
    if npx:
        return Check("browser_runtime", True, f"agent-browser available through npx fallback: {npx}")
    return Check(
        "browser_runtime",
        False,
        "agent-browser not found and npx is unavailable; install with: npm install -g agent-browser && agent-browser install",
    )


def _check_model_endpoint(config: AppConfig) -> Check:
    if "openrouter.ai" in config.model.base_url and not config.model.api_key:
        return Check("model_endpoint", False, "API key is not set")
    auth = _check_openrouter_auth(config)
    if auth is not None and not auth.ok:
        return auth
    url = config.model.base_url.rstrip("/") + "/models"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {config.model.api_key or 'local-no-key'}"})
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            payload = response.read(512_000).decode("utf-8", errors="replace")
        try:
            data = json.loads(payload)
            count = len(data.get("data", [])) if isinstance(data, dict) else "unknown"
            available = _model_available(data, config.model.model)
            if available is False:
                return Check("model_endpoint", False, f"{config.model.model} not found at {url}; models={count}")
            generation = _check_model_generation(config)
            if not generation.ok:
                return generation
            return Check("model_endpoint", True, f"{url} returned models={count}; {config.model.model} available; generation accepted")
        except json.JSONDecodeError:
            generation = _check_model_generation(config)
            if not generation.ok:
                return generation
            return Check("model_endpoint", True, f"{url} responded; generation accepted")
    except (urllib.error.URLError, TimeoutError, OSError) as exc:
        return Check("model_endpoint", False, f"{url}: {exc}")


def _check_model_generation(config: AppConfig) -> Check:
    url = config.model.base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": config.model.model,
        "messages": [{"role": "user", "content": "Reply with exactly: ok"}],
        "max_tokens": 8,
        "temperature": 0,
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "noop",
                    "description": "No-op model readiness probe.",
                    "parameters": {
                        "type": "object",
                        "properties": {"reason": {"type": "string"}},
                        "required": ["reason"],
                    },
                },
            }
        ],
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {config.model.api_key or 'local-no-key'}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=15) as response:
            body = response.read(64_000).decode("utf-8", errors="replace")
        data = json.loads(body)
        choices = data.get("choices") if isinstance(data, dict) else None
        if isinstance(choices, list) and choices:
            return Check("model_generation", True, f"{url} accepted chat/tool request")
        return Check("model_generation", False, f"{url} returned no choices")
    except urllib.error.HTTPError as exc:
        body = exc.read(2048).decode("utf-8", errors="replace")
        detail = _extract_error_message(body) or str(exc)
        return Check("model_generation", False, f"{url}: {detail}")
    except (json.JSONDecodeError, urllib.error.URLError, TimeoutError, OSError) as exc:
        return Check("model_generation", False, f"{url}: {exc}")


def _check_openrouter_auth(config: AppConfig) -> Check | None:
    if "openrouter.ai" not in config.model.base_url:
        return None
    if not config.model.api_key:
        return Check("model_auth", False, "OpenRouter API key is not set")
    url = "https://openrouter.ai/api/v1/key"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {config.model.api_key}"})
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read(2048)
        return Check("model_auth", True, "OpenRouter API key accepted")
    except urllib.error.HTTPError as exc:
        body = exc.read(512).decode("utf-8", errors="replace")
        detail = _extract_error_message(body) or str(exc)
        return Check("model_auth", False, f"OpenRouter rejected API key: {detail}")
    except (urllib.error.URLError, TimeoutError, OSError) as exc:
        return Check("model_auth", False, f"{url}: {exc}")


def _extract_error_message(body: str) -> str:
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return body.strip()
    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict):
        message = str(error.get("message") or "").strip()
        code = str(error.get("code") or "").strip()
        metadata = error.get("metadata")
        raw = _extract_error_raw(metadata)
        provider = _metadata_value(metadata, "provider_name")
        byok = _metadata_value(metadata, "is_byok")

        primary = message or code
        if raw and raw != primary:
            primary = f"{primary}: {raw}" if primary else raw
        details = []
        if code and code not in primary:
            details.append(f"code={code}")
        if provider:
            details.append(f"provider={provider}")
        if byok not in {"", None}:
            details.append(f"byok={byok}")
        if details:
            primary = f"{primary} ({'; '.join(details)})" if primary else "; ".join(details)
        return primary.strip()
    return ""


def _metadata_value(metadata: Any, key: str) -> str:
    if not isinstance(metadata, dict):
        return ""
    value = metadata.get(key)
    if value is None:
        return ""
    return str(value).strip()


def _extract_error_raw(metadata: Any) -> str:
    if not isinstance(metadata, dict):
        return ""
    raw = metadata.get("raw")
    if raw is None:
        return ""
    if isinstance(raw, dict):
        return _extract_error_message(json.dumps(raw))
    raw_text = str(raw).strip()
    if not raw_text:
        return ""
    try:
        raw_json = json.loads(raw_text)
    except json.JSONDecodeError:
        return raw_text
    if isinstance(raw_json, dict):
        nested = _extract_error_message(json.dumps(raw_json))
        return nested or raw_text
    return raw_text


def _model_available(data: Any, model: str) -> bool | None:
    if not isinstance(data, dict) or not isinstance(data.get("data"), list):
        return None
    ids = {str(item.get("id") or "") for item in data["data"] if isinstance(item, dict)}
    return model in ids


def run_doctor(*, config: AppConfig | None = None, check_model: bool = False) -> list[Check]:
    config = config or load_config()
    checks = [
        _check_writable_dir(config.runtime.home),
        _check_db(config),
        _check_model_config(config),
        _check_tool_surface(),
        _check_browser_runtime(),
    ]
    if check_model:
        checks.append(_check_model_endpoint(config))
    return checks
nipux_cli/event_render.py 118 lines
"""Readable event rendering shared by CLI history and chat context."""

from __future__ import annotations

import shlex
from typing import Any

from nipux_cli.metric_format import format_metric_value
from nipux_cli.tui_event_format import clean_step_summary, generic_display_text
from nipux_cli.tui_style import _one_line


def event_line(event: dict[str, Any], *, chars: int, full: bool = False) -> str:
    when, label, detail, access = event_display_parts(event, chars=chars, full=full)
    suffix = f" | {access}" if access and full else ""
    return f"{when:<16} {label:<8} {_one_line(detail + suffix, chars)}"


def event_display_parts(event: dict[str, Any], *, chars: int, full: bool = False) -> tuple[str, str, str, str]:
    when = compact_time(str(event.get("created_at") or "?"))
    kind = str(event.get("event_type") or "event")
    title = str(event.get("title") or "").strip()
    body = generic_display_text(event.get("body") or "")
    ref_table = str(event.get("ref_table") or "")
    ref_id = str(event.get("ref_id") or "")
    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
    label = event_label(kind, metadata)
    access = ""
    if kind == "tool_result" and metadata.get("status"):
        label = event_label(f"{kind}:{metadata.get('status')}", metadata)
    if kind == "error":
        label = "ERROR"
    if kind.startswith("tool_result") or kind == "error":
        body = clean_step_summary(body)
    if kind == "artifact":
        title = title or ref_id
        if body.startswith("/") or "/.nipux/jobs/" in body or "/jobs/job_" in body:
            body = generic_display_text(metadata.get("summary") or "saved output")
        if title:
            access = f"open: /artifact {shlex.quote(title)}"
    if kind == "operator_message" and metadata.get("mode"):
        title = f"{title or 'operator'} {metadata.get('mode')}"
    if kind == "operator_context":
        body = body or f"{metadata.get('count') or 0} message(s)"
    if kind in {"tool_call", "tool_result", "error"} and metadata.get("step_no"):
        title = f"#{metadata.get('step_no')} {title}".strip()
    if not body and kind == "artifact" and metadata.get("path"):
        body = str(metadata.get("type") or "saved artifact")
    if not body and kind == "finding" and metadata.get("category"):
        body = str(metadata.get("category") or "")
    if not body and kind == "task" and metadata.get("status"):
        body = str(metadata.get("status") or "")
    if not body and kind == "roadmap" and metadata.get("status"):
        body = str(metadata.get("status") or "")
    if not body and kind == "milestone_validation" and metadata.get("validation_status"):
        body = str(metadata.get("validation_status") or "")
    if not body and kind == "experiment":
        metric_value = metadata.get("metric_value")
        if metric_value is not None:
            body = format_metric_value(
                metadata.get("metric_name") or "metric",
                metric_value,
                metadata.get("metric_unit") or "",
            )
    if kind == "compaction":
        body = _one_line(body, min(chars, 140))
    if kind == "daemon" and title == "run started":
        body = body or str(metadata.get("model") or "")
    detail = title if title else kind
    if body:
        detail = f"{detail} - {body}"
    if ref_table and ref_id and full:
        detail = f"{detail} [{ref_table}:{ref_id}]"
    return when, label, detail, access


def event_label(kind: str, metadata: dict[str, Any]) -> str:
    if kind == "operator_message":
        mode = str(metadata.get("mode") or "")
        return "FOLLOW" if mode == "follow_up" else "USER"
    if kind == "operator_context":
        return "ACK"
    if kind == "agent_message":
        return "AGENT"
    if kind == "roadmap":
        return "ROAD"
    if kind == "milestone_validation":
        return "VALID"
    if kind == "tool_call":
        return "TOOL"
    if kind.startswith("tool_result"):
        status = str(metadata.get("status") or "")
        if status == "blocked":
            return "BLOCK"
        if status == "failed" or kind.endswith(":failed"):
            return "ERROR"
        return "DONE"
    labels = {
        "artifact": "OUTPUT",
        "compaction": "MEMORY",
        "daemon": "SYSTEM",
        "digest": "DIGEST",
        "error": "ERROR",
        "experiment": "TEST",
        "finding": "FIND",
        "lesson": "LEARN",
        "reflection": "PLAN",
        "source": "SOURCE",
        "task": "TASK",
    }
    return labels.get(kind, kind.upper()[:8])


def compact_time(value: str) -> str:
    text = value.replace("T", " ")
    if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
        return text[:16]
    return _one_line(text, 16)
nipux_cli/first_run_controller.py 170 lines
"""First-run command decisions for the Nipux TUI."""

from __future__ import annotations

import shlex
from contextlib import redirect_stdout
from dataclasses import dataclass
from io import StringIO
from typing import Callable

from nipux_cli.settings import config_field_value
from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS
from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID


TOGGLE_SETTING_COMMANDS = {
    "tools.browser": "browser",
    "tools.web": "web",
    "tools.shell": "cli-access",
    "tools.files": "file-access",
}


@dataclass(frozen=True)
class FirstRunFrameDeps:
    capture_command: Callable[[str], list[str]]
    capture_setting_command: Callable[[str], list[str]]
    create_job: Callable[..., tuple[str, str]]
    current_default_job_id: Callable[[], str | None]
    extract_objective: Callable[[str], str]
    model_setup_verified: Callable[[], bool]
    verify_model_setup: Callable[[], list[str]]
    shell_command_names: set[str]


def handle_first_run_action(action: str, *, deps: FirstRunFrameDeps) -> tuple[str, str | list[str] | None]:
    if action == "open_workspace" and not deps.model_setup_verified():
        return "notice", "Run Doctor first. The workspace opens only after the configured model accepts a chat request."
    if action == "open_workspace":
        return "open", WORKSPACE_CHAT_ID
    if action.startswith("view:"):
        return "view", action.split(":", 1)[1]
    if action == "preset:local":
        notices = [
            *deps.capture_setting_command("model local-model"),
            *deps.capture_setting_command("base-url http://localhost:8000/v1"),
            *deps.capture_setting_command("api-key-env OPENAI_API_KEY"),
            "Local connector selected. Start your OpenAI-compatible server, then run Doctor.",
        ]
        return "notice", notices
    if action.startswith("toggle:"):
        field = action.split(":", 1)[1]
        command = TOGGLE_SETTING_COMMANDS.get(field)
        if not command:
            return "notice", f"Unknown setup toggle: {field}"
        next_value = "false" if bool(config_field_value(field)) else "true"
        return "notice", deps.capture_setting_command(f"{command} {next_value}")
    if action.startswith("edit:"):
        return "edit", action.split(":", 1)[1]
    if action.startswith("secret:"):
        return "edit", action
    if action == "new":
        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
    if action == "back":
        return "view", "endpoint"
    if action == "jobs":
        return "notice", "Finish setup first. Jobs are available after Doctor verifies the configured model."
    if action == "doctor":
        notices = deps.verify_model_setup()
        if deps.model_setup_verified():
            return "open", WORKSPACE_CHAT_ID
        return "notice", notices
    if action == "init":
        return "notice", deps.capture_command("init")
    if action == "exit":
        return "exit", None
    return "notice", f"Unknown action: {action}"


def handle_first_run_frame_line(line: str, *, deps: FirstRunFrameDeps) -> tuple[str, str | list[str] | None]:
    original = line.strip()
    if original.startswith("/"):
        original = original[1:].strip()
    lowered = original.lower()
    if lowered in {"exit", "quit", ":q", "5"}:
        return "exit", None
    if lowered in {"clear"}:
        return "clear", None
    if lowered in {"help", "?", "commands"}:
        return "notice", [
            "Finish setup before chat or jobs are available.",
            "Enter endpoint, API key, and model id when prompted.",
            "Doctor must verify the configured model before the workspace opens.",
        ]
    if lowered in {"1", "new"}:
        return "notice", "Finish setup first. Then tell Nipux what job to create from the chat workspace."
    if lowered.startswith("new "):
        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
    if lowered in {"2", "jobs", "ls"}:
        return "notice", "Finish setup first. Jobs are available after Doctor verifies the configured model."
    if lowered == "settings":
        return "notice", "Config is changed with slash commands: /model, /api-key, /base-url, /context."
    if lowered in {"back"}:
        return "notice", "Setup is linear during first run. Continue forward, then edit settings later if needed."
    if lowered in {"3", "doctor"}:
        notices = deps.verify_model_setup()
        if deps.model_setup_verified():
            return "open", WORKSPACE_CHAT_ID
        return "notice", notices
    if lowered in {"4", "init"}:
        return "notice", deps.capture_command("init")
    if lowered == "shell":
        return "notice", "The old console is only available as `nipux shell` from your terminal."
    first = first_token(original)
    if first == "shell":
        return "notice", "The old console is only available as `nipux shell` from your terminal."
    if first in {"create", "new"}:
        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
    if first in CHAT_SETTING_COMMANDS or first in {"api-key", "key"}:
        return "notice", deps.capture_setting_command(original)
    if first in deps.shell_command_names:
        before_job_id = deps.current_default_job_id()
        output = deps.capture_command(original)
        after_job_id = deps.current_default_job_id()
        if first == "create" and after_job_id and after_job_id != before_job_id:
            return "open", after_job_id
        return "notice", output
    objective = deps.extract_objective(original)
    if objective:
        return "notice", "Finish setup first. Then describe worker jobs in the chat workspace."
    return "notice", first_run_chat_reply(original)


def first_run_chat_reply(message: str) -> str:
    del message
    return "Setup must be completed before chat is available."


def create_first_run_job(objective: str, *, deps: FirstRunFrameDeps) -> str | list[str]:
    objective = objective.strip()
    if not objective:
        return ["No job created. Type an objective first."]
    if not deps.model_setup_verified():
        return [
            "No job created.",
            "Finish model setup first: choose a connector, set the endpoint/key if needed, then run Doctor.",
            "Doctor must confirm that the configured model accepts a chat request.",
        ]
    job_id, _title = deps.create_job(objective=objective, title=None, kind="generic", cadence=None)
    return job_id


def capture_first_run_command(line: str, run_shell_line: Callable[[str], bool]) -> list[str]:
    stream = StringIO()
    with redirect_stdout(stream):
        try:
            run_shell_line(line)
        except SystemExit as exc:
            if exc.code not in (None, 0):
                print(f"command exited with status {exc.code}")
    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
    return lines[-8:] or ["done"]


def first_token(line: str) -> str:
    try:
        parts = shlex.split(line)
    except ValueError:
        parts = line.split()
    return parts[0].lower() if parts else ""
nipux_cli/first_run_frame_runtime.py 436 lines
"""Terminal runtime for the first-run Nipux workspace."""

from __future__ import annotations

import select
import shutil
import sys
import termios
import time
import tty
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse

from nipux_cli.config import load_config
from nipux_cli.settings import inline_setting_notice
from nipux_cli.tui_commands import FIRST_RUN_SLASH_COMMANDS, autocomplete_slash, cycle_slash, slash_completion_for_submit
from nipux_cli.tui_input import (
    decode_terminal_escape,
    drain_pending_input,
    read_escape_sequence,
    read_terminal_char,
)
from nipux_cli.tui_style import _frame_enter_sequence, _frame_exit_sequence


@dataclass(frozen=True)
class FirstRunRuntimeDeps:
    render_frame: Callable[[str, list[str], int, str, str | None, str], str]
    actions: Callable[[str], list[tuple[str, str, str]]]
    handle_action: Callable[[str], tuple[str, str | list[str] | None]]
    handle_line: Callable[[str], tuple[str, str | list[str] | None]]
    click_action: Callable[[int, int, str], int | str | None]


def run_first_run_frame(*, deps: FirstRunRuntimeDeps) -> str | None:
    buffer = ""
    notices: list[str] = []
    next_job_id: str | None = None
    view = "endpoint"
    selected = 0
    editing_field: str | None = required_first_run_edit_field(view)
    old_attrs = termios.tcgetattr(sys.stdin)
    print(_frame_enter_sequence(), end="", flush=True)
    try:
        stdin_fd = sys.stdin.fileno()
        tty.setcbreak(stdin_fd)
        needs_render = True
        last_render = 0.0
        last_frame = ""
        while next_job_id is None:
            now = time.monotonic()
            if needs_render or now - last_render >= 1.0:
                selected = clamp_selection(selected, deps.actions(view))
                last_frame = _safe_render_frame(
                    deps,
                    buffer=buffer,
                    notices=notices,
                    selected=selected,
                    view=view,
                    editing_field=editing_field,
                    previous_frame=last_frame,
                )
                needs_render = False
                last_render = now
            try:
                readable, _, _ = select.select([stdin_fd], [], [], 0.05)
            except OSError as exc:
                _append_notice(notices, f"terminal read failed: {type(exc).__name__}: {_one_line(exc, 90)}")
                needs_render = True
                continue
            if not readable:
                continue
            try:
                char = read_terminal_char(stdin_fd)
            except OSError as exc:
                _append_notice(notices, f"terminal input failed: {type(exc).__name__}: {_one_line(exc, 90)}")
                needs_render = True
                continue
            if editing_field is not None:
                previous_edit = editing_field
                try:
                    buffer, editing_field, should_exit = _handle_edit_input(
                        char,
                        buffer=buffer,
                        editing_field=editing_field,
                        notices=notices,
                        stdin_fd=stdin_fd,
                    )
                except Exception as exc:
                    buffer = ""
                    editing_field = None
                    _append_notice(notices, f"edit failed: {type(exc).__name__}: {_one_line(exc, 90)}")
                    needs_render = True
                    continue
                if should_exit:
                    return None
                if previous_edit and editing_field is None:
                    next_view = next_first_run_view_after_edit(view)
                    if next_view:
                        view = next_view
                        selected = 0
                        editing_field = required_first_run_edit_field(view)
                needs_render = True
                continue
            if char in {"\r", "\n"}:
                buffer, should_submit = slash_completion_for_submit(buffer, FIRST_RUN_SLASH_COMMANDS)
108 if not should_submit:
109 needs_render = True
110 continue
111 try:
112 action, payload = _submit_first_run_line(buffer, selected=selected, view=view, deps=deps)
113 except Exception as exc:
114 action, payload = "notice", f"input failed: {type(exc).__name__}: {_one_line(exc, 100)}"
115 buffer = ""
116 try:
117 state = _apply_first_run_action(action, payload, view=view, selected=selected, notices=notices)
118 except Exception as exc:
119 _append_notice(notices, f"action failed: {type(exc).__name__}: {_one_line(exc, 100)}")
120 state = (view, selected, None, None, False)
121 view, selected, editing_field, next_job_id, should_exit = state
122 editing_field = editing_field or required_first_run_edit_field(view)
123 if should_exit:
124 return None
125 needs_render = True
126 continue
127 if char in {"\x04"}:
128 return None
129 if char == "\x03":
130 buffer = ""
131 _append_notice(notices, "cancelled input")
132 needs_render = True
133 continue
134 if char == "\x15":
135 buffer = ""
136 needs_render = True
137 continue
138 if char in {"\x7f", "\b"}:
139 buffer = buffer[:-1]
140 needs_render = True
141 continue
142 if char == "\t":
143 try:
144 buffer = autocomplete_slash(buffer, FIRST_RUN_SLASH_COMMANDS)
145 except Exception as exc:
146 _append_notice(notices, f"autocomplete failed: {type(exc).__name__}: {_one_line(exc, 90)}")
147 needs_render = True
148 continue
149 if char == "\x1b":
150 try:
151 view, selected, editing_field, next_job_id, should_exit, buffer = _handle_first_run_escape(
152 stdin_fd,
153 view=view,
154 selected=selected,
155 buffer=buffer,
156 notices=notices,
157 deps=deps,
158 )
159 except Exception as exc:
160 _append_notice(notices, f"navigation failed: {type(exc).__name__}: {_one_line(exc, 90)}")
161 should_exit = False
162 if should_exit:
163 return None
164 editing_field = editing_field or required_first_run_edit_field(view)
165 needs_render = True
166 continue
167 if char.isprintable():
168 buffer += char
169 needs_render = True
170 except KeyboardInterrupt:
171 return None
172 finally:
173 termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_attrs)
174 print(_frame_exit_sequence(), flush=True)
175 return next_job_id
176
177
178def clamp_selection(selected: int, actions: list[tuple[str, str, str]]) -> int:
179 if not actions:
180 return 0
181 return max(0, min(selected, len(actions) - 1))
182
183
184def _safe_render_frame(
185 deps: FirstRunRuntimeDeps,
186 *,
187 buffer: str,
188 notices: list[str],
189 selected: int,
190 view: str,
191 editing_field: str | None,
192 previous_frame: str,
193) -> str:
194 try:
195 return deps.render_frame(buffer, notices, selected, view, editing_field, previous_frame)
196 except Exception as exc:
197 _append_notice(notices, f"render failed: {type(exc).__name__}: {_one_line(exc, 100)}")
198 frame = _fallback_first_run_frame(buffer=buffer, notices=notices, view=view)
199 print("\033[H" + frame, end="", flush=True)
200 return frame
201
202
203def _fallback_first_run_frame(*, buffer: str, notices: list[str], view: str) -> str:
204 width, height = shutil.get_terminal_size((100, 30))
205 width = max(60, width)
206 lines = [
207 _fit_plain("NIPUX - setup safe mode", width),
208 _fit_plain("=" * width, width),
209 _fit_plain(f"Screen: {view}", width),
210 _fit_plain("A UI render error was caught. You can keep typing; /exit leaves.", width),
211 "",
212 "Recent notices:",
213 ]
214 lines.extend(f"- {_one_line(notice, width - 3)}" for notice in notices[-8:])
215 lines.extend(["", f"> {_one_line(buffer, width - 3)}"])
216 return "\n".join(_fit_plain(line, width) for line in lines[:height])
217
218
219def _submit_first_run_line(
220 buffer: str,
221 *,
222 selected: int,
223 view: str,
224 deps: FirstRunRuntimeDeps,
225) -> tuple[str, str | list[str] | None]:
226 line = buffer.strip()
227 if not line:
228 actions = deps.actions(view)
229 if not actions:
230 return "notice", "This setup step requires an explicit value."
231 return deps.handle_action(actions[clamp_selection(selected, actions)][0])
232 if not line.startswith("/"):
233 return "notice", "Complete the active setup field before continuing."
234 return deps.handle_line(line)
235
236
237def _handle_first_run_escape(
238 stdin_fd: int,
239 *,
240 view: str,
241 selected: int,
242 buffer: str,
243 notices: list[str],
244 deps: FirstRunRuntimeDeps,
245) -> tuple[str, int, str | None, str | None, bool, str]:
246 key, payload = decode_terminal_escape(read_escape_sequence("\x1b", fd=stdin_fd))
247 if key in {"up", "down"} and buffer.startswith("/"):
248 buffer = cycle_slash(buffer, FIRST_RUN_SLASH_COMMANDS, direction=-1 if key == "up" else 1)
249 return view, selected, None, None, False, buffer
250 if key == "up":
251 actions = deps.actions(view)
252 if not actions:
253 return view, selected, None, None, False, buffer
254 return view, (selected - 1) % len(actions), None, None, False, buffer
255 if key == "down":
256 actions = deps.actions(view)
257 if not actions:
258 return view, selected, None, None, False, buffer
259 return view, (selected + 1) % len(actions), None, None, False, buffer
260 if key in {"left", "right"}:
261 actions = deps.actions(view)
262 if not actions:
263 return view, selected, None, None, False, buffer
264 delta = 1 if key == "right" else -1
265 return view, (selected + delta) % len(actions), None, None, False, buffer
266 if key == "click" and isinstance(payload, tuple):
267 clicked = deps.click_action(payload[0], payload[1], view)
268 if clicked is not None:
269 if isinstance(clicked, str):
270 action, payload = deps.handle_action(clicked)
271 next_view, next_selected, editing_field, next_job_id, should_exit = _apply_first_run_action(
272 action,
273 payload,
274 view=view,
275 selected=selected,
276 notices=notices,
277 )
278 return next_view, next_selected, editing_field, next_job_id, should_exit, buffer
279 actions = deps.actions(view)
280 if not actions:
281 return view, selected, None, None, False, buffer
282 action, payload = deps.handle_action(actions[clamp_selection(clicked, actions)][0])
283 next_view, next_selected, editing_field, next_job_id, should_exit = _apply_first_run_action(
284 action,
285 payload,
286 view=view,
287 selected=clicked,
288 notices=notices,
289 )
290 return next_view, next_selected, editing_field, next_job_id, should_exit, buffer
291 drain_pending_input(stdin_fd)
292 return view, selected, None, None, False, buffer
293
294
295def directional_first_run_action(actions: list[tuple[str, str, str]], *, direction: int) -> str | None:
296 """Return the setup-screen action for left/right navigation."""
297
298 if direction >= 0:
299 for key, label, _detail in actions:
300 if key.startswith("view:") and label.lower() in {"begin setup", "continue"}:
301 return key
302 return None
303 for key, label, _detail in reversed(actions):
304 if key.startswith("view:") and label.lower() == "back":
305 return key
306 return None
307
308
309def _apply_first_run_action(
310 action: str,
311 payload: str | list[str] | None,
312 *,
313 view: str,
314 selected: int,
315 notices: list[str],
316) -> tuple[str, int, str | None, str | None, bool]:
317 if action == "view":
318 notices.clear()
319 return str(payload or "endpoint"), 0, None, None, False
320 if action == "exit":
321 return view, selected, None, None, True
322 if action == "clear":
323 notices.clear()
324 return view, selected, None, None, False
325 if action == "open":
326 return view, selected, None, str(payload), False
327 if action == "edit":
328 editing_field = str(payload)
329 _append_notice(notices, f"editing {editing_field}; enter saves, escape cancels", limit=10)
330 return view, selected, editing_field, None, False
331 if isinstance(payload, list):
332 for item in payload:
333 if str(item).strip():
334 _append_notice(notices, str(item), limit=10)
335 elif payload:
336 _append_notice(notices, str(payload), limit=10)
337 return view, selected, None, None, False
338
339
340def _handle_edit_input(
341 char: str,
342 *,
343 buffer: str,
344 editing_field: str,
345 notices: list[str],
346 stdin_fd: int,
347) -> tuple[str, str | None, bool]:
348 if char in {"\r", "\n"}:
349 saved, notice = _save_first_run_edit(editing_field, buffer)
350 _append_notice(notices, notice, limit=10)
351 return ("", None, False) if saved else (buffer, editing_field, False)
352 if char in {"\x04"}:
353 return buffer, editing_field, True
354 if char == "\x03":
355 _append_notice(notices, "cancelled edit", limit=10)
356 return "", editing_field, False
357 if char == "\x15":
358 return "", editing_field, False
359 if char in {"\x7f", "\b"}:
360 return buffer[:-1], editing_field, False
361 if char == "\x1b":
362 key, _payload = decode_terminal_escape(read_escape_sequence(char, fd=stdin_fd))
363 if key == "unknown":
364 _append_notice(notices, "cancelled edit", limit=10)
365 return "", editing_field, False
366 return buffer, editing_field, False
367 if char.isprintable():
368 return buffer + char, editing_field, False
369 return buffer, editing_field, False
370
371
372def required_first_run_edit_field(view: str) -> str | None:
373 return {
374 "endpoint": "model.base_url",
375 "api": "secret:model.api_key",
376 "model": "model.name",
377 }.get(view)
378
379
380def next_first_run_view_after_edit(view: str) -> str | None:
381 return {
382 "endpoint": "api",
383 "api": "model",
384 "model": "access",
385 }.get(view)
386
387
388def _save_first_run_edit(field: str, raw_value: str) -> tuple[bool, str]:
389 value = raw_value.strip()
390 if field == "model.base_url":
391 if not value:
392 return False, "Endpoint URL is required."
393 parsed = urlparse(value)
394 if parsed.scheme not in {"http", "https"} or not parsed.netloc:
395 return False, "Endpoint must be a full http:// or https:// URL."
396 if not parsed.path.rstrip("/").endswith("/v1"):
397 return False, "Endpoint must point at an OpenAI-compatible /v1 path."
398 return True, inline_setting_notice(field, value)
399 if field == "model.name":
400 if not value:
401 return False, "Model id is required."
402 return True, inline_setting_notice(field, value)
403 if field == "secret:model.api_key":
404 if not value:
405 return False, "API key is required, or type skip for a local endpoint."
406 if value.lower() in {"skip", "none", "local"}:
407 config = load_config()
408 if not _is_local_endpoint(config.model.base_url):
409 return False, "Only local endpoints can skip the API key."
410 return True, "skipped API key for local endpoint"
411 return True, inline_setting_notice(field, value)
412 return True, inline_setting_notice(field, value)
413
414
415def _is_local_endpoint(value: str) -> bool:
416 host = (urlparse(value).hostname or "").lower()
417 return host in {"localhost", "127.0.0.1", "::1", "0.0.0.0"} or host.endswith(".local")
418
419
420def _append_notice(notices: list[str], message: str, *, limit: int = 10) -> None:
421 notices.append(message)
422 notices[:] = notices[-limit:]
423
424
425def _one_line(value: object, width: int) -> str:
426 text = " ".join(str(value).split())
427 if len(text) <= width:
428 return text
429 return text[: max(0, width - 3)] + "..."
430
431
432def _fit_plain(text: object, width: int) -> str:
433 content = str(text)
434 if len(content) > width:
435 content = _one_line(content, width)
436 return content + " " * max(0, width - len(content))
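
The endpoint checks in `_save_first_run_edit` and the hostname test in `_is_local_endpoint` can be tried in isolation with the sketch below. It assumes only the standard library. The names `check_endpoint`, `is_local_endpoint`, and `LOCAL_HOSTS` are hypothetical, and the `"ok"` message is a placeholder for whatever `inline_setting_notice` returns in the real module.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hostnames treated as local, taken from the set used in _is_local_endpoint.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def check_endpoint(raw_value: str) -> tuple[bool, str]:
    """Apply the three endpoint checks: non-empty, http(s) with a host, path ending in /v1."""
    value = raw_value.strip()
    if not value:
        return False, "Endpoint URL is required."
    parsed = urlparse(value)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        return False, "Endpoint must be a full http:// or https:// URL."
    if not parsed.path.rstrip("/").endswith("/v1"):
        return False, "Endpoint must point at an OpenAI-compatible /v1 path."
    return True, "ok"  # placeholder notice; an assumption, not the real return value


def is_local_endpoint(value: str) -> bool:
    """Classify by parsed hostname, so text elsewhere in the URL cannot match."""
    host = (urlparse(value).hostname or "").lower()
    return host in LOCAL_HOSTS or host.endswith(".local")


if __name__ == "__main__":
    for url in ("http://localhost:8000/v1", "provider.example/v1", "https://provider.example/v2"):
        print(url, "->", check_endpoint(url))
```

Matching on `urlparse(...).hostname` rather than on the raw string means a hosted URL that merely contains the word `localhost` is not treated as local.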
nipux_cli/first_run_tui.py 571 lines
1"""First-run terminal UI rendering for Nipux."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.config import AppConfig
8from nipux_cli.settings import (
9 config_field_value,
10 edit_target_hint,
11 edit_target_label,
12 edit_target_masks_input,
13)
14from nipux_cli.tui_layout import _compose_bar
15from nipux_cli.tui_style import (
16 _accent,
17 _bold,
18 _center_ansi,
19 _fit_ansi,
20 _muted,
21 _one_line,
22 _style,
23 _strip_ansi,
24 _themed_lines,
25)
26
27
28INSTALL_FLOW = [
29 ("endpoint", "Endpoint", "OpenAI-compatible /v1"),
30 ("api", "API key", "secret stored in .env"),
31 ("model", "Model", "choose the model id"),
32 ("access", "Tools", "browser, web, CLI, files"),
33 ("doctor", "Doctor", "check setup"),
34]
35
36
37FIRST_RUN_ACTIONS_BY_VIEW: dict[str, list[tuple[str, str, str]]] = {
38 "endpoint": [],
39 "api": [],
40 "model": [],
41 "access": [
42 ("toggle:tools.browser", "Browser", "automation"),
43 ("toggle:tools.web", "Web", "search/extract"),
44 ("toggle:tools.shell", "CLI", "terminal commands"),
45 ("toggle:tools.files", "Files", "write files"),
46 ("view:doctor", "Continue", "run checks"),
47 ],
48 "doctor": [
49 ("doctor", "Run doctor", "verify setup"),
50 ("open_workspace", "Open chat", "talk to Nipux"),
51 ],
52}
53
54
55def build_first_run_frame(
56 input_buffer: str,
57 notices: list[str],
58 *,
59 width: int,
60 height: int,
61 selected: int = 0,
62 view: str = "endpoint",
63 editing_field: str | None = None,
64 config: AppConfig,
65 daemon_text: str,
66 jobs: list[dict[str, Any]],
67 home: str,
68 config_path: str,
69) -> str:
70 del daemon_text
71 width = max(92, width)
72 height = max(22, height)
73 view = _normalize_first_run_view(view)
74 selected = _clamp_first_run_selection(selected, view)
75 header: list[str] = []
76 if editing_field:
77 hint = _first_run_edit_hint(editing_field, config)
78 prompt_label = _first_run_prompt_label(editing_field)
79 else:
80 hint = _first_run_hint(view)
81 prompt_label = "Setup"
82 suggestions: list[str] = []
83 compose_lines = _compose_bar(
84 input_buffer,
85 width=width,
86 hint=hint,
87 suggestions=suggestions,
88 prompt_label=prompt_label,
89 title="setup",
90 mask_input=edit_target_masks_input(editing_field),
91 )
92 footer_rows = len(compose_lines)
93 body_rows = max(10, height - len(header) - 1 - footer_rows)
94 body_lines = _wizard_body_lines(
95 notices=notices,
96 jobs=jobs,
97 config=config,
98 home=home,
99 config_path=config_path,
100 selected=selected,
101 view=view,
102 width=width,
103 rows=body_rows,
104 )
105 lines = [*header, *body_lines, *compose_lines]
106 return "\n".join(first_run_themed_lines(lines[:height], width=width))
107
108
109def first_run_columns(width: int) -> tuple[int, int]:
110 right_width = min(max(40, int(width * 0.34)), 54)
111 left_width = width - right_width - 3
112 if left_width < 48:
113 left_width = 48
114 right_width = max(36, width - left_width - 3)
115 return left_width, right_width
116
117
118def first_run_actions(view: str) -> list[tuple[str, str, str]]:
119 return FIRST_RUN_ACTIONS_BY_VIEW[_normalize_first_run_view(view)]
120
121
122def first_run_themed_lines(lines: list[str], *, width: int) -> list[str]:
123 return _themed_lines(lines, width=width)
124
125
126def _wizard_body_lines(
127 *,
128 notices: list[str],
129 jobs: list[dict[str, Any]],
130 config: AppConfig,
131 home: str,
132 config_path: str,
133 selected: int,
134 view: str,
135 width: int,
136 rows: int,
137) -> list[str]:
138 if view == "model":
139 lines = _model_page_lines(config=config, selected=selected, width=width)
140 elif view == "endpoint":
141 lines = _endpoint_page_lines(config=config, selected=selected, width=width)
142 elif view == "api":
143 lines = _api_page_lines(config=config, selected=selected, width=width)
144 elif view == "access":
145 lines = _access_page_lines(config=config, selected=selected, width=width)
146 elif view == "doctor":
147 lines = _doctor_page_lines(config=config, selected=selected, width=width)
148 else:
149 lines = _endpoint_page_lines(config=config, selected=selected, width=width)
150 if notices:
151 lines = _append_notice_block(lines, notices, width=width, rows=rows)
152 return _fit_page(lines, width=width, rows=rows)
153
154
155def _model_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
156 return [
157 *_step_header("model", width=width),
158 "",
159 _center_ansi(_muted(_step_count_label("model")), width),
160 _center_ansi(_bold("Enter the model id"), width),
161 _center_ansi(_muted("This exact model powers chat replies and background workers."), width),
162 "",
163 *_panel(
164 "MODEL ID",
165 [
166 _bold(_accent("waiting for input")),
167 _muted(f"current config: {_one_line(config.model.model, max(16, width - 30))}"),
168 _muted("Type the model id below. Blank input is not accepted."),
169 ],
170 width=min(84, width - 8),
171 page_width=width,
172 ),
173 "",
174 _center_ansi(_muted("Press Enter after typing the model. Setup moves forward automatically."), width),
175 ]
176
177
178def _endpoint_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
179 return [
180 *_step_header("endpoint", width=width),
181 "",
182 _center_ansi(_muted(_step_count_label("endpoint")), width),
183 _center_ansi(_bold("Enter the endpoint first"), width),
184 _center_ansi(_muted("Use an OpenAI-compatible /v1 endpoint. Local or hosted both work."), width),
185 "",
186 *_form_panel(
187 "BASE URL",
188 f"waiting for input · current config: {config.model.base_url}",
189 "required",
190 width=min(90, width - 8),
191 page_width=width,
192 ),
193 "",
194 _center_ansi(_muted("Example formats: http://localhost:8000/v1 or https://provider.example/v1"), width),
195 ]
196
197
198def _api_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
199 key_state = "set" if config.model.api_key else "missing"
200 key_color = _style(key_state, "32" if key_state == "set" else "33")
201 return [
202 *_step_header("api", width=width),
203 "",
204 _center_ansi(_muted(_step_count_label("api")), width),
205 _center_ansi(_bold("Enter the API key"), width),
206 _center_ansi(_muted("Hosted endpoints need a key. For a local endpoint, type skip."), width),
207 "",
208 *_panel(
209 "API KEY",
210 [
211 f"{_muted('state')} {key_color}",
212 f"{_muted('env')} {_bold(config.model.api_key_env)}",
213 _muted("Stored in the local Nipux env file, never in repository config."),
214 ],
215 width=min(84, width - 8),
216 page_width=width,
217 ),
218 "",
219 _center_ansi(_muted("Blank input is not accepted; type skip only when the endpoint is local."), width),
220 ]
221
222
223def _access_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
224 rows = [
225 _access_row("browser", config.tools.browser, "persistent browser automation"),
226 _access_row("web", config.tools.web, "web search and page extraction"),
227 _access_row("CLI", config.tools.shell, "bounded terminal commands"),
228 _access_row("files", config.tools.files, "write deliverables into the workspace"),
229 ]
230 return [
231 *_step_header("access", width=width),
232 "",
233 _center_ansi(_muted(_step_count_label("access")), width),
234 _center_ansi(_bold("Choose tool access"), width),
235 _center_ansi(_muted("These switches control the generic tools workers can call for any job."), width),
236 "",
237 *_panel("TOOL ACCESS", rows, width=min(90, width - 8), page_width=width),
238 "",
239 *_action_cards(first_run_actions("access"), selected=selected, config=config, width=width),
240 ]
241
242
243def _doctor_page_lines(*, config: AppConfig, selected: int, width: int) -> list[str]:
244 checks = [
245 ("state directory", "writable under ~/.nipux or NIPUX_HOME"),
246 ("database", "SQLite state store can open"),
247 ("model config", f"{config.model.model} at {config.model.base_url}"),
248 (
249 "tools",
250 f"browser={config.tools.browser} web={config.tools.web} CLI={config.tools.shell} files={config.tools.files}",
251 ),
252 ]
253 rows = [f"{_accent('✓')} {_fit_ansi(name, 18)} {_muted(detail)}" for name, detail in checks]
254 return [
255 *_step_header("doctor", width=width),
256 "",
257 _center_ansi(_muted(_step_count_label("doctor")), width),
258 _center_ansi(_bold("Run checks"), width),
259 _center_ansi(_muted("Doctor calls the configured model before the workspace opens."), width),
260 "",
261 *_panel("DOCTOR", rows, width=min(90, width - 8), page_width=width),
262 "",
263 _center_ansi(_muted("If a check fails, edit with /base-url, /api-key, or /model, then run Doctor again."), width),
264 "",
265 *_action_cards(first_run_actions("doctor"), selected=selected, config=config, width=width),
266 ]
267
268
269def _stepper_lines(view: str, *, config: AppConfig, width: int) -> list[str]:
270 lines: list[str] = []
271 for key, label, _detail in INSTALL_FLOW:
272 marker = _accent("●") if key == view else _muted("○")
273 state = _step_state(key, config=config)
274 lines.append(_fit_ansi(f"{marker} {_fit_ansi(label, 10)} {_muted(state)}", width))
275 return lines
276
277
278def _step_header(view: str, *, width: int) -> list[str]:
279 parts = []
280 for index, (key, label, _detail) in enumerate(INSTALL_FLOW, start=1):
281 marker = _accent("●") if key == view else _muted("○")
282 text = _bold(label) if key == view else _muted(label)
283 parts.append(f"{marker} {index} {text}")
284 return [
285 _center_ansi(" ".join(parts), width),
286 _muted("─" * width),
287 ]
288
289
290def _action_cards(
291 actions: list[tuple[str, str, str]],
292 *,
293 selected: int,
294 config: AppConfig,
295 width: int,
296) -> list[str]:
297 if not actions:
298 return []
299 gap = 2
300 card_width = max(18, min(34, (width - (len(actions) - 1) * gap - 4) // len(actions)))
301 cards = [_action_tile(index, action, selected=selected, config=config, width=card_width) for index, action in enumerate(actions)]
302 rows = _join_many_cards(cards, gap=gap, width=width)
303 return [_center_ansi(row.rstrip(), width) for row in rows]
304
305
306def _action_tile(
307 index: int,
308 action: tuple[str, str, str],
309 *,
310 selected: int,
311 config: AppConfig,
312 width: int,
313) -> list[str]:
314 key, label, detail = action
315 active = index == selected
316 border = _accent if active else _muted
317 marker = _accent("›") if active else _muted(" ")
318 label_text = _bold(label) if active else label
319 value = _action_value(key, detail, config=config)
320 inner = max(8, width - 4)
321 return [
322 border("╭" + "─" * (width - 2) + "╮"),
323 border("│ ") + _fit_ansi(f"{marker} {index + 1}. {label_text}", inner) + border(" │"),
324 border("│ ") + _fit_ansi(_muted(_one_line(value, inner)), inner) + border(" │"),
325 border("╰" + "─" * (width - 2) + "╯"),
326 ]
327
328
329def _panel(title: str, body: list[str], *, width: int, page_width: int | None = None) -> list[str]:
330 width = max(32, width)
331 inner = max(8, width - 4)
332 title_text = f" {title} "
333 lines = [_muted("╭─" + title_text + "─" * max(0, width - len(title_text) - 3) + "╮")]
334 for item in body:
335 lines.append(_muted("│ ") + _fit_ansi(item, inner) + _muted(" │"))
336 lines.append(_muted("╰" + "─" * (width - 2) + "╯"))
337 return [_center_ansi(line, page_width or width) for line in lines]
338
339
340def _form_panel(title: str, value: str, command: str, *, width: int, page_width: int | None = None) -> list[str]:
341 return _panel(
342 title,
343 [
344 _bold(_accent(_one_line(value, max(16, width - 10)))),
345 _muted(f"{command}; type the value in the setup input below"),
346 ],
347 width=width,
348 page_width=page_width,
349 )
350
351
352def _choice_card(title: str, copy: str, value: str, *, active: bool, width: int) -> list[str]:
353 border = _accent if active else _muted
354 marker = _accent("● selected") if active else _muted("○ available")
355 inner = max(8, width - 4)
356 return [
357 border("╭" + "─" * (width - 2) + "╮"),
358 border("│ ") + _fit_ansi(_bold(title), inner) + border(" │"),
359 border("│ ") + _fit_ansi(marker, inner) + border(" │"),
360 border("│ ") + _fit_ansi(_muted(copy), inner) + border(" │"),
361 border("│ ") + _fit_ansi(_accent(value), inner) + border(" │"),
362 border("╰" + "─" * (width - 2) + "╯"),
363 ]
364
365
366def _join_cards(left: list[str], right: list[str], *, width: int) -> list[str]:
367 gap = " "
368 rows = []
369 for index in range(max(len(left), len(right))):
370 left_line = left[index] if index < len(left) else " " * len(_strip_ansi(left[0]))
371 right_line = right[index] if index < len(right) else " " * len(_strip_ansi(right[0]))
372 rows.append(_center_ansi(left_line + gap + right_line, width))
373 return rows
374
375
376def _join_many_cards(cards: list[list[str]], *, gap: int, width: int) -> list[str]:
377 rows: list[str] = []
378 max_rows = max(len(card) for card in cards)
379 gap_text = " " * gap
380 for row_index in range(max_rows):
381 row_parts = []
382 for card in cards:
383 fallback_width = len(_strip_ansi(card[0]))
384 row_parts.append(card[row_index] if row_index < len(card) else " " * fallback_width)
385 rows.append(gap_text.join(row_parts))
386 return [_fit_ansi(row, width) for row in rows]
387
388
389def _append_notice_block(lines: list[str], notices: list[str], *, width: int, rows: int) -> list[str]:
390 budget = max(3, min(6, rows // 4))
391 notice_lines = [_bold("Transcript")]
392 for notice in notices[-budget:]:
393 notice_lines.append(_fit_ansi(_accent("› ") + _one_line(notice, width - 4), width))
394 if len(lines) + len(notice_lines) + 1 <= rows:
395 return [*lines, "", *notice_lines]
396 keep = max(0, rows - len(notice_lines) - 1)
397 return [*lines[:keep], "", *notice_lines]
398
399
400def _fit_page(lines: list[str], *, width: int, rows: int) -> list[str]:
401 fitted = [_fit_ansi(line, width) for line in lines]
402 if len(fitted) < rows:
403 fitted.extend([" " * width for _ in range(rows - len(fitted))])
404 return fitted[:rows]
405
406
407def _action_line(
408 index: int,
409 action: tuple[str, str, str],
410 *,
411 selected: int,
412 config: AppConfig,
413 width: int,
414) -> str:
415 key, label, detail = action
416 marker = _accent("›") if index == selected else _muted(" ")
417 label_text = _bold(label) if index == selected else label
418 value = _action_value(key, detail, config=config)
419 return _fit_ansi(
420 f"{marker} {index + 1}. {_fit_ansi(label_text, 15)} {_muted(_one_line(value, max(8, width - 21)))}",
421 width,
422 )
423
424
425def _screen_value_lines(view: str, *, config: AppConfig, width: int) -> list[str]:
426 if view == "model":
427 return [_large_value("model", config.model.model, width=width)]
428 if view == "endpoint":
429 return [_large_value("endpoint", config.model.base_url, width=width)]
430 if view == "api":
431 key_state = "set" if config.model.api_key else "missing"
432 return [
433 _large_value("key", key_state, width=width),
434 _muted(f"Stored under {config.model.api_key_env} in ~/.nipux/.env."),
435 ]
436 if view == "doctor":
437 return [
438 _large_value("check", "ready to run", width=width),
439 _muted("Doctor verifies runtime checks, then sends a small chat request to the configured model."),
440 ]
441 return []
442
443
444def _large_value(label: str, value: str, *, width: int) -> str:
445 label_text = _muted(f"{label} ")
446 return _fit_ansi(label_text + _bold(_accent(_one_line(value, max(12, width - len(label) - 2)))), width)
447
448
449def _action_value(key: str, detail: str, *, config: AppConfig) -> str:
450 if key.startswith("view:"):
451 return detail
452 if key.startswith("edit:"):
453 field = key.split(":", 1)[1]
454 return str(config_field_value(field, config))
455 if key.startswith("toggle:"):
456 field = key.split(":", 1)[1]
457 return "enabled" if bool(config_field_value(field, config)) else "disabled"
458 if key == "secret:model.api_key":
459 return "stored in .env" if config.model.api_key else f"uses {config.model.api_key_env}"
460 if key == "preset:local":
461 return "http://localhost:8000/v1"
462 return detail
463
464
465def _step_state(key: str, *, config: AppConfig) -> str:
466 if key == "model":
467 return _one_line(config.model.model, 20)
468 if key == "endpoint":
469 return _one_line(config.model.base_url, 20)
470 if key == "api":
471 return "ready" if config.model.api_key or _is_local_endpoint(config.model.base_url) else "missing"
472 if key == "access":
473 enabled = sum(bool(value) for value in (config.tools.browser, config.tools.web, config.tools.shell, config.tools.files))
474 return f"{enabled}/4 enabled"
475 if key == "doctor":
476 return "pending"
477 return ""
478
479
480def _first_run_hint(view: str) -> str:
481 if view == "endpoint":
482 return "Required: type an OpenAI-compatible endpoint URL, then Enter."
483 if view == "api":
484 return "Required: type an API key, or type skip for a local endpoint."
485 if view == "model":
486 return "Required: type the model id accepted by this endpoint."
487 if view == "access":
488 return "Use arrows/clicks to toggle tools, then choose Continue."
489 if view == "doctor":
490 return "Run Doctor, then open the chat workspace."
491 return "Complete setup before the workspace opens."
492
493
494def _first_run_edit_hint(field: str, config: AppConfig) -> str:
495 if field == "model.base_url":
496 return "Endpoint URL required. Enter saves and advances. Blank input is blocked."
497 if field == "secret:model.api_key":
498 return "API key required for hosted endpoints. For local endpoints, type skip."
499 if field == "model.name":
500 return "Model id required. Enter saves and advances. Blank input is blocked."
501 return edit_target_hint(field, config)
502
503
504def _first_run_prompt_label(field: str) -> str:
505 if field == "model.base_url":
506 return "Endpoint"
507 if field == "secret:model.api_key":
508 return "API key"
509 if field == "model.name":
510 return "Model"
511 return edit_target_label(field)
512
513
514def _left_title(view: str) -> str:
515 return _screen_heading(view)
516
517
518def _screen_heading(view: str) -> str:
519 return {
520 "model": "Choose model",
521 "endpoint": "Connect endpoint",
522 "api": "Add API key",
523 "access": "Choose tools",
524 "doctor": "Run checks",
525 }.get(view, "Connect endpoint")
526
527
528def _screen_copy(view: str) -> str:
529 return {
530 "model": "The chat controller and workers use this model unless you change it later.",
531 "endpoint": "Use any OpenAI-compatible /v1 endpoint. This stays generic and provider-neutral.",
532 "api": "Hosted providers need a secret. Local endpoints can continue without one.",
533 "access": "Enable the generic tools this worker can use for any job.",
534 "doctor": "Verify the configured model, then open the main chat workspace.",
535 }.get(view, "Nipux installs through this full-screen setup.")
536
537
538def _install_summary(config: AppConfig, *, width: int) -> str:
539 connector = "local connector" if _is_local_endpoint(config.model.base_url) else "hosted connector"
540 text = f"{connector} · {config.model.model} · {config.model.base_url}"
541 return _muted(_one_line(text, width))
542
543
544def _normalize_first_run_view(view: str) -> str:
545 return view if view in FIRST_RUN_ACTIONS_BY_VIEW else "endpoint"
546
547
548def _step_count_label(view: str) -> str:
549 keys = [key for key, _label, _detail in INSTALL_FLOW]
550 try:
551 index = keys.index(view) + 1
552 except ValueError:
553 index = 1
554 return f"STEP {index} / {len(INSTALL_FLOW)}"
555
556
557def _access_row(name: str, enabled: bool, detail: str) -> str:
558 marker = _accent("on ") if enabled else _muted("off")
559 return f"{_fit_ansi(name, 10)} {marker} {_muted(detail)}"
560
561
562def _is_local_endpoint(value: str) -> bool:
563 lowered = value.lower()
564 return "localhost" in lowered or "127.0.0.1" in lowered or lowered.startswith("http://0.0.0.0")
565
566
567def _clamp_first_run_selection(selected: int, view: str) -> int:
568 actions = first_run_actions(view)
569 if not actions:
570 return 0
571 return max(0, min(selected, len(actions) - 1))
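`_is_local_endpoint` above classifies an endpoint by substring, so a hosted URL that merely contains the text `localhost` in its path or query would count as local. The sketch below is a stricter alternative that compares only the parsed hostname. It is not the behavior of the code above: the name `is_local_endpoint_strict` and the `LOCAL_HOSTS` set are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host list; the original function checks substrings instead.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "::1"}


def is_local_endpoint_strict(value: str) -> bool:
    """Return True only when the URL's hostname is a known loopback name."""
    candidate = value.strip()
    if "://" not in candidate:
        # Without a scheme, urlparse would treat "localhost:8000" as scheme + path.
        candidate = "//" + candidate
    host = (urlparse(candidate).hostname or "").lower()
    return host in LOCAL_HOSTS


local = is_local_endpoint_strict("http://localhost:8000/v1")
hosted = is_local_endpoint_strict("https://api.example.com/v1?via=localhost")
bare = is_local_endpoint_strict("127.0.0.1:11434")
```

Whether the stricter check is preferable depends on how endpoints are entered during setup; the substring version is more forgiving of unusual input.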
nipux_cli/frame_snapshot.py 183 lines
1"""Data loading contract for the interactive Nipux terminal frame."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.config import AppConfig
8from nipux_cli.daemon import daemon_lock_status
9from nipux_cli.db import AgentDB
10from nipux_cli.tui_outcomes import SUMMARY_EVENT_TYPES, SUMMARY_TOOL_EVENT_TYPES, is_summary_event_candidate
11
12
13WORKSPACE_CHAT_ID = "__workspace__"
14
15
16def load_frame_snapshot(
17 db: AgentDB,
18 config: AppConfig,
19 job_id: str,
20 *,
21 default_job_id: str | None = None,
22 history_limit: int = 12,
23 workspace_events: list[dict[str, Any]] | None = None,
24) -> dict[str, Any]:
25 """Return the compact state bundle rendered by the chat TUI."""
26
27 resolved_job_id = job_id or default_job_id
28 if resolved_job_id == WORKSPACE_CHAT_ID:
29 return load_workspace_frame_snapshot(
30 db,
31 config,
32 focused_job_id=default_job_id,
33 history_limit=history_limit,
34 workspace_events=workspace_events or [],
35 )
36 job = db.get_job(resolved_job_id)
37 jobs = db.list_jobs()
38 token_usage = db.job_token_usage(resolved_job_id)
39 token_usage["input_cost_per_million"] = config.model.input_cost_per_million
40 token_usage["output_cost_per_million"] = config.model.output_cost_per_million
41 summary_events = _summary_events(db, resolved_job_id, history_limit=history_limit)
42 return {
43 "job_id": resolved_job_id,
44 "job": job,
45 "jobs": jobs,
46 "steps": db.list_steps(job_id=resolved_job_id, limit=80),
47 "artifacts": db.list_artifacts(resolved_job_id, limit=8),
48 "job_artifacts": {
49 str(item["id"]): db.list_artifacts(str(item["id"]), limit=3)
50 for item in jobs[:6]
51 if item.get("id")
52 },
53 "job_summary_events": {
54 str(item["id"]): _summary_events(db, str(item["id"]), history_limit=3)
55 for item in jobs[:6]
56 if item.get("id")
57 },
58 "job_counts": {
59 str(item["id"]): db.job_record_counts(str(item["id"]))
60 for item in jobs[:6]
61 if item.get("id")
62 },
63 "memory_entries": db.list_memory(resolved_job_id)[:8],
64 "events": db.list_events(job_id=resolved_job_id, limit=max(history_limit * 16, 240)),
65 "summary_events": summary_events,
66 "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
67 "model": config.model.model,
68 "base_url": config.model.base_url,
69 "context_length": config.model.context_length,
70 "token_usage": token_usage,
71 "counts": db.job_record_counts(resolved_job_id),
72 }
73
74
75def load_workspace_frame_snapshot(
76 db: AgentDB,
77 config: AppConfig,
78 *,
79 focused_job_id: str | None = None,
80 history_limit: int = 12,
81 workspace_events: list[dict[str, Any]] | None = None,
82) -> dict[str, Any]:
83 """Return the chat/control frame before any worker job is focused."""
84
85 jobs = db.list_jobs()
86 events = list(workspace_events or [])[-max(history_limit * 8, 80) :]
87 focused_job = _safe_job(db, focused_job_id)
88 focused_id = str(focused_job.get("id") or "") if focused_job else ""
89 token_usage = _workspace_token_usage(config)
90 if focused_id:
91 token_usage = db.job_token_usage(focused_id)
92 token_usage["input_cost_per_million"] = config.model.input_cost_per_million
93 token_usage["output_cost_per_million"] = config.model.output_cost_per_million
94 workspace_job = {
95 "id": WORKSPACE_CHAT_ID,
96 "title": "Nipux",
97 "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
98 "kind": "workspace",
99 "status": "ready",
100 "metadata": {},
101 }
102 right_job = focused_job or workspace_job
103 return {
104 "job_id": WORKSPACE_CHAT_ID,
105 "job": workspace_job,
106 "right_job": right_job,
107 "right_job_id": focused_id or WORKSPACE_CHAT_ID,
108 "jobs": jobs,
109 "steps": db.list_steps(job_id=focused_id, limit=80) if focused_id else [],
110 "artifacts": db.list_artifacts(focused_id, limit=8) if focused_id else [],
111 "job_artifacts": {
112 str(item["id"]): db.list_artifacts(str(item["id"]), limit=3)
113 for item in jobs[:6]
114 if item.get("id")
115 },
116 "job_summary_events": {
117 str(item["id"]): _summary_events(db, str(item["id"]), history_limit=3)
118 for item in jobs[:6]
119 if item.get("id")
120 },
121 "job_counts": {
122 str(item["id"]): db.job_record_counts(str(item["id"]))
123 for item in jobs[:6]
124 if item.get("id")
125 },
126 "memory_entries": db.list_memory(focused_id)[:8] if focused_id else [],
127 "events": events,
128 "right_events": db.list_events(job_id=focused_id, limit=max(history_limit * 16, 240)) if focused_id else [],
129 "summary_events": _summary_events(db, focused_id, history_limit=history_limit) if focused_id else [],
130 "daemon": daemon_lock_status(config.runtime.home / "agentd.lock"),
131 "model": config.model.model,
132 "base_url": config.model.base_url,
133 "context_length": config.model.context_length,
134 "token_usage": token_usage,
135 "counts": db.job_record_counts(focused_id) if focused_id else {"steps": 0, "artifacts": 0, "memory": 0, "events": len(events)},
136 }
137
138
139def _safe_job(db: AgentDB, job_id: str | None) -> dict[str, Any] | None:
140 if not job_id:
141 return None
142 try:
143 return db.get_job(job_id)
144 except Exception:
145 return None
146
147
148def _workspace_token_usage(config: AppConfig) -> dict[str, Any]:
149 return {
150 "prompt_tokens": 0,
151 "completion_tokens": 0,
152 "total_tokens": 0,
153 "cost": 0.0,
154 "calls": 0,
155 "has_cost": False,
156 "input_cost_per_million": config.model.input_cost_per_million,
157 "output_cost_per_million": config.model.output_cost_per_million,
158 }
159
160
161def _summary_events(db: AgentDB, job_id: str, *, history_limit: int) -> list[dict[str, Any]]:
162 durable_events = db.list_events(
163 job_id=job_id,
164 limit=max(history_limit * 24, 360),
165 event_types=SUMMARY_EVENT_TYPES,
166 )
167 tool_events = [
168 event
169 for event in db.list_events(
170 job_id=job_id,
171 limit=max(history_limit * 10, 160),
172 event_types=SUMMARY_TOOL_EVENT_TYPES,
173 )
174 if is_summary_event_candidate(event)
175 ]
176 merged: dict[str, dict[str, Any]] = {}
177 for event in [*durable_events, *tool_events]:
178 event_id = str(event.get("id") or "")
179 if event_id:
180 merged[event_id] = event
181 else:
182 merged[f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"] = event
183 return sorted(merged.values(), key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")))
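The merge step in `_summary_events` can be tried in isolation. This minimal sketch reproduces only the deduplicate-then-sort logic on plain dictionaries; `merge_events` and the sample events are hypothetical, and no database is involved.

```python
from typing import Any


def merge_events(*groups: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Deduplicate events by id (later groups win), then sort by (created_at, id)."""
    merged: dict[str, dict[str, Any]] = {}
    for group in groups:
        for event in group:
            event_id = str(event.get("id") or "")
            if event_id:
                merged[event_id] = event
            else:
                # Events without an id get a synthetic key so they are kept.
                key = f"{event.get('created_at')}-{event.get('event_type')}-{event.get('title')}-{len(merged)}"
                merged[key] = event
    return sorted(
        merged.values(),
        key=lambda event: (str(event.get("created_at") or ""), str(event.get("id") or "")),
    )


durable = [
    {"id": "e2", "created_at": "2024-01-02T00:00:00", "event_type": "finding"},
    {"id": "e1", "created_at": "2024-01-01T00:00:00", "event_type": "artifact"},
]
tools = [
    {"id": "e2", "created_at": "2024-01-02T00:00:00", "event_type": "tool_result"},
    {"created_at": "2024-01-03T00:00:00", "event_type": "tool_result", "title": "no id"},
]
result = merge_events(durable, tools)
ids = [event.get("id") for event in result]
```

Sorting on string timestamps assumes every `created_at` uses the same ISO-style format, so that lexical order matches chronological order.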
nipux_cli/llm.py 288 lines
1"""LLM provider adapters for one bounded worker step."""
2
3from __future__ import annotations
4
5import json
6import urllib.error
7import urllib.parse
8import urllib.request
9from dataclasses import dataclass, field
10from typing import Any, Protocol
11
12from openai import OpenAI
13
14from nipux_cli.config import ModelConfig
15
16
17@dataclass(frozen=True)
18class ToolCall:
19 name: str
20 arguments: dict[str, Any] = field(default_factory=dict)
21 id: str = ""
22
23
24@dataclass(frozen=True)
25class LLMResponse:
26 content: str = ""
27 tool_calls: list[ToolCall] = field(default_factory=list)
28 usage: dict[str, Any] = field(default_factory=dict)
29 model: str = ""
30 response_id: str = ""
31
32
33class LLMResponseError(RuntimeError):
34 """Raised when a provider returns an OpenAI-shaped response without choices."""
35
36 def __init__(self, message: str, *, payload: dict[str, Any] | None = None):
37 super().__init__(message)
38 self.payload = payload or {}
39
40
41class StepLLM(Protocol):
42 def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
43 ...
44
45
46class OpenAIChatLLM:
47 """OpenAI-compatible chat-completions adapter."""
48
49 tool_repair = True
50
51 def __init__(self, config: ModelConfig):
52 self.config = config
53 headers = {}
54 if "openrouter.ai" in config.base_url:
55 headers = {
56 "HTTP-Referer": "https://github.com/nipuxx/agent-cli",
57 "X-Title": "Nipux CLI",
58 }
59 self._openai = OpenAI(
60 api_key=config.api_key or "local-no-key",
61 base_url=config.base_url,
62 timeout=config.request_timeout_seconds,
63 max_retries=0,
64 default_headers=headers or None,
65 )
66
67 def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
68 request: dict[str, Any] = {
69 "model": self.config.model,
70 "messages": messages,
71 "tools": tools,
72 }
73 if tools:
74 request["tool_choice"] = "required"
75 try:
76 response = self._openai.chat.completions.create(**request)
77 except Exception as exc:
78 if "tool_choice" not in request or not _is_unsupported_tool_choice_error(exc):
79 raise
80 fallback_request = dict(request)
81 fallback_request.pop("tool_choice", None)
82 response = self._openai.chat.completions.create(**fallback_request)
83 choices = response.choices or []
84 if not choices:
85 payload = _response_payload(response)
86 error = payload.get("error") if isinstance(payload.get("error"), dict) else {}
87 detail = error.get("message") or payload.get("message") or "provider returned no choices"
88 raise LLMResponseError(str(detail), payload=payload)
89 message = choices[0].message
90 calls: list[ToolCall] = []
91 for call in message.tool_calls or []:
92 raw_args = call.function.arguments or "{}"
93 try:
94 parsed = json.loads(raw_args)
95 except json.JSONDecodeError:
96 parsed = {}
97 calls.append(ToolCall(name=call.function.name, arguments=parsed, id=call.id or ""))
98 content = message.content or ""
99 response_id = _response_id(response)
100 usage = _response_usage(response, messages=messages, content=content, tool_calls=calls)
101 usage = _enrich_openrouter_generation_usage(
102 usage,
103 response_id=response_id,
104 base_url=self.config.base_url,
105 api_key=self.config.api_key,
106 )
107 return LLMResponse(
108 content=content,
109 tool_calls=calls,
110 usage=usage,
111 model=_response_model(response),
112 response_id=response_id,
113 )
114
115 def complete(self, *, messages: list[dict[str, Any]]) -> str:
116 return self.complete_response(messages=messages).content
117
118 def complete_response(self, *, messages: list[dict[str, Any]]) -> LLMResponse:
119 response = self._openai.chat.completions.create(
120 model=self.config.model,
121 messages=messages,
122 )
123 choices = response.choices or []
124 if not choices:
125 payload = _response_payload(response)
126 error = payload.get("error") if isinstance(payload.get("error"), dict) else {}
127 detail = error.get("message") or payload.get("message") or "provider returned no choices"
128 raise LLMResponseError(str(detail), payload=payload)
129 content = choices[0].message.content or ""
130 response_id = _response_id(response)
131 usage = _response_usage(response, messages=messages, content=content, tool_calls=[])
132 usage = _enrich_openrouter_generation_usage(
133 usage,
134 response_id=response_id,
135 base_url=self.config.base_url,
136 api_key=self.config.api_key,
137 )
138 return LLMResponse(
139 content=content,
140 usage=usage,
141 model=_response_model(response),
142 response_id=response_id,
143 )
144
145
146class ScriptedLLM:
147 """Tiny deterministic LLM used by tests and CLI dry runs."""
148
149 def __init__(self, responses: list[LLMResponse]):
150 self.responses = list(responses)
151
152 def next_action(self, *, messages: list[dict[str, Any]], tools: list[dict[str, Any]]) -> LLMResponse:
153 del messages, tools
154 if not self.responses:
155 return LLMResponse(content="No scripted response left.")
156 return self.responses.pop(0)
157
158
159def _response_payload(response: Any) -> dict[str, Any]:
160 if hasattr(response, "model_dump"):
161 dumped = response.model_dump()
162 return dumped if isinstance(dumped, dict) else {"response": dumped}
163 if hasattr(response, "to_dict"):
164 dumped = response.to_dict()
165 return dumped if isinstance(dumped, dict) else {"response": dumped}
166 return {"response": repr(response)}
167
168
169def _response_usage(
170 response: Any,
171 *,
172 messages: list[dict[str, Any]],
173 content: str,
174 tool_calls: list[ToolCall],
175) -> dict[str, Any]:
176 payload = _response_payload(response)
177 usage = payload.get("usage")
178 if isinstance(usage, dict):
179 normalized = dict(usage)
180 normalized["estimated"] = False
181 return normalized
182 usage_obj = getattr(response, "usage", None)
183 if usage_obj is not None:
184 dumped = usage_obj.model_dump() if hasattr(usage_obj, "model_dump") else getattr(usage_obj, "__dict__", {})
185 if isinstance(dumped, dict) and dumped:
186 normalized = dict(dumped)
187 normalized["estimated"] = False
188 return normalized
189 prompt_tokens = _estimate_token_count(json.dumps(messages, ensure_ascii=False, default=str))
190 tool_text = json.dumps([{"name": call.name, "arguments": call.arguments} for call in tool_calls], ensure_ascii=False, default=str)
191 completion_tokens = _estimate_token_count(content + tool_text)
192 return {
193 "prompt_tokens": prompt_tokens,
194 "completion_tokens": completion_tokens,
195 "total_tokens": prompt_tokens + completion_tokens,
196 "estimated": True,
197 }
198
199
200def _enrich_openrouter_generation_usage(
201 usage: dict[str, Any],
202 *,
203 response_id: str,
204 base_url: str,
205 api_key: str,
206) -> dict[str, Any]:
207 if usage.get("cost") is not None or not response_id or not api_key:
208 return usage
209 if "openrouter.ai" not in base_url:
210 return usage
211 parsed = urllib.parse.urlparse(base_url)
212 root = f"{parsed.scheme or 'https'}://{parsed.netloc or 'openrouter.ai'}"
213 url = f"{root}/api/v1/generation?id={urllib.parse.quote(response_id)}"
214 request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
215 try:
216 with urllib.request.urlopen(request, timeout=5) as response:
217 payload = json.loads(response.read().decode("utf-8", errors="replace"))
218 except (OSError, urllib.error.URLError, urllib.error.HTTPError, json.JSONDecodeError):
219 return usage
220 data = payload.get("data") if isinstance(payload, dict) else None
221 if not isinstance(data, dict):
222 return usage
223 enriched = dict(usage)
224 cost = _safe_float(data.get("total_cost") or data.get("cost"))
225 if cost is not None:
226 enriched["cost"] = cost
227 prompt = _safe_int(data.get("native_tokens_prompt") or data.get("tokens_prompt"))
228 completion = _safe_int(data.get("native_tokens_completion") or data.get("tokens_completion"))
229 total = _safe_int(data.get("native_tokens_total") or data.get("tokens_total"))
230 if prompt is not None:
231 enriched["prompt_tokens"] = prompt
232 if completion is not None:
233 enriched["completion_tokens"] = completion
234 if total is not None:
235 enriched["total_tokens"] = total
236 elif prompt is not None or completion is not None:
237 enriched["total_tokens"] = int(enriched.get("prompt_tokens") or 0) + int(enriched.get("completion_tokens") or 0)
238 enriched["estimated"] = bool(enriched.get("estimated")) and cost is None
239 return enriched
240
241
242def _safe_float(value: Any) -> float | None:
243 try:
244 if value in (None, ""):
245 return None
246 return float(value)
247 except (TypeError, ValueError):
248 return None
249
250
251def _safe_int(value: Any) -> int | None:
252 try:
253 if value in (None, ""):
254 return None
255 return int(float(value))
256 except (TypeError, ValueError):
257 return None
258
259
260def _is_unsupported_tool_choice_error(exc: Exception) -> bool:
261 text = f"{type(exc).__name__} {exc}".lower()
262 return "tool_choice" in text and any(
263 marker in text
264 for marker in (
265 "unsupported",
266 "unknown parameter",
267 "unrecognized",
268 "not supported",
269 "invalid_request",
270 "extra inputs are not permitted",
271 )
272 )
273
274
275def _estimate_token_count(text: str) -> int:
276 if not text:
277 return 0
278 return max(1, (len(text) + 3) // 4)
279
280
281def _response_model(response: Any) -> str:
282 payload = _response_payload(response)
283 return str(payload.get("model") or getattr(response, "model", "") or "")
284
285
286def _response_id(response: Any) -> str:
287 payload = _response_payload(response)
288 return str(payload.get("id") or getattr(response, "id", "") or "")
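The `tool_choice` fallback in `OpenAIChatLLM.next_action` and the character-based token estimate can both be exercised without a network call. The sketch below copies that logic into standalone functions. `fake_create` is a hypothetical stand-in for `chat.completions.create`, and `create_with_fallback` is an illustrative name, not part of the module.

```python
from typing import Any, Callable

MARKERS = (
    "unsupported",
    "unknown parameter",
    "unrecognized",
    "not supported",
    "invalid_request",
    "extra inputs are not permitted",
)


def estimate_token_count(text: str) -> int:
    """Rough estimate: about four characters per token, rounded up."""
    if not text:
        return 0
    return max(1, (len(text) + 3) // 4)


def is_unsupported_tool_choice_error(exc: Exception) -> bool:
    text = f"{type(exc).__name__} {exc}".lower()
    return "tool_choice" in text and any(marker in text for marker in MARKERS)


def create_with_fallback(create: Callable[..., Any], request: dict[str, Any]) -> Any:
    """Retry once without tool_choice when the provider rejects that parameter."""
    try:
        return create(**request)
    except Exception as exc:
        if "tool_choice" not in request or not is_unsupported_tool_choice_error(exc):
            raise
        fallback = dict(request)
        fallback.pop("tool_choice", None)
        return create(**fallback)


def fake_create(**kwargs: Any) -> dict[str, Any]:
    # Hypothetical provider that does not accept tool_choice.
    if "tool_choice" in kwargs:
        raise ValueError("Unknown parameter: tool_choice")
    return {"ok": True, "keys": sorted(kwargs)}


request = {"model": "m", "messages": [], "tools": [], "tool_choice": "required"}
result = create_with_fallback(fake_create, request)
```

Because detection relies on matching error text, a provider that words its rejection differently would not trigger the retry.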
nipux_cli/measurement.py 141 lines
1"""Measurement parsing helpers for generic progress accounting."""
2
3from __future__ import annotations
4
5import re
6from typing import Any
7
8
9MEASUREMENT_PATTERN = re.compile(
10 r"(?i)(?:"
11 r"\b\d+(?:\.\d+)?\s*(?:%|(?:ms|s|sec|secs|seconds|msec|us|hz|khz|mhz|ghz|kb/s|mb/s|gb/s|tb/s|"
12 r"it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|tokens/s|tok/s|t/s)\b)"
13 r"|(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time|memory|cpu|gpu|ram)\D{0,40}\d+(?:\.\d+)?"
14 r")"
15)
16MEASUREMENT_INTENT_PATTERN = re.compile(
17 r"(?i)\b(bench(?:mark)?|compare|duration|eval(?:uate)?|experiment|hyperfine|latency|measure|metric|perf|"
18 r"profile|rate|runtime|speed|test|throughput|time|trial)\b"
19)
20DIAGNOSTIC_MEASUREMENT_PATTERN = re.compile(r"(?i)^\s*(?:cpu|gpu|memory|mem|ram)\b")
21ACTION_MEASUREMENT_PATTERN = re.compile(
22 r"(?i)^\s*(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time)\b"
23)
24LABELED_MEASUREMENT_PATTERN = re.compile(
25 r"(?i)^\s*(?:score|rate|speed|throughput|latency|accuracy|loss|error|duration|runtime|time)\s*(?:=|:)\s*[-+]?\d"
26)
27EXPLICIT_RESULT_UNIT_PATTERN = re.compile(
28 r"(?i)\b\d+(?:\.\d+)?\s*(?:%|(?:ms|msec|sec|secs|seconds|it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|"
29 r"tokens/s|tok/s|t/s|kb/s|mb/s|gb/s|tb/s)\b)"
30)
31TABLE_UNIT_PATTERN = re.compile(
32 r"(?i)^(?:%|ms|msec|sec|secs|seconds|it/s|ops/s|req/s|qps|rps|samples/s|items/s|units/s|"
33 r"tokens/s|tok/s|t/s|kb/s|mb/s|gb/s|tb/s)$"
34)
35TABLE_NUMBER_PATTERN = re.compile(r"[-+]?\d+(?:\.\d+)?(?:\s*(?:±|\+/-)\s*[-+]?\d+(?:\.\d+)?)?")
36
37
38def measurement_candidates(output: dict[str, Any], *, command: str = "", limit: int = 8) -> list[str]:
39 text = "\n".join(
40 str(output.get(key) or "")
41 for key in ("stdout", "stderr", "result", "content")
42 if output.get(key) is not None
43 )
44 if not text.strip():
45 return []
46 command_has_measurement_intent = bool(MEASUREMENT_INTENT_PATTERN.search(command))
47 candidates: list[str] = []
48 for candidate in _table_measurement_candidates(text, limit=limit):
49 if candidate not in candidates:
50 candidates.append(candidate)
51 if len(candidates) >= limit:
52 return candidates
53 for match in MEASUREMENT_PATTERN.finditer(text[:20000]):
54 candidate = " ".join(match.group(0).split())
55 if not EXPLICIT_RESULT_UNIT_PATTERN.search(candidate):
56 expanded = " ".join(text[match.start() : min(len(text), match.end() + 32)].split())
57 if EXPLICIT_RESULT_UNIT_PATTERN.search(expanded):
58 candidate = expanded
59 if _candidate_is_diagnostic_only(candidate, command_has_measurement_intent):
60 continue
61 if candidate[:140] not in candidates:
62 candidates.append(candidate[:140])
63 if len(candidates) >= limit:
64 break
65 return candidates
66
67
68def _table_measurement_candidates(text: str, *, limit: int = 8) -> list[str]:
69 candidates: list[str] = []
70 table_lines = [line.strip() for line in str(text or "").splitlines() if line.strip().startswith("|") and "|" in line.strip()[1:]]
71 for index, line in enumerate(table_lines):
72 headers = _split_markdown_table_row(line)
73 if not headers or _is_markdown_separator_row(headers):
74 continue
75 unit_indexes = [idx for idx, header in enumerate(headers) if TABLE_UNIT_PATTERN.search(header.strip())]
76 if not unit_indexes:
77 continue
78 for row_line in table_lines[index + 1 : index + 16]:
79 cells = _split_markdown_table_row(row_line)
80 if not cells or _is_markdown_separator_row(cells):
81 continue
82 for unit_index in unit_indexes:
83 if unit_index >= len(cells):
84 continue
85 value = cells[unit_index].strip()
86 number = TABLE_NUMBER_PATTERN.search(value)
87 if not number:
88 continue
89 unit = headers[unit_index].strip()
90 label = _table_measurement_label(headers, cells, unit_index=unit_index)
91 candidate = f"{label} {number.group(0).strip()} {unit}".strip()
92 if candidate[:140] not in candidates:
93 candidates.append(candidate[:140])
94 if len(candidates) >= limit:
95 return candidates
96 return candidates
97
98
99def _split_markdown_table_row(line: str) -> list[str]:
100 raw = str(line or "").strip()
101 if raw.startswith("|"):
102 raw = raw[1:]
103 if raw.endswith("|"):
104 raw = raw[:-1]
105 return [" ".join(cell.strip().split()) for cell in raw.split("|")]
106
107
108def _is_markdown_separator_row(cells: list[str]) -> bool:
109 return bool(cells) and all(re.fullmatch(r":?-{2,}:?", cell.strip()) for cell in cells if cell.strip())
110
111
112def _table_measurement_label(headers: list[str], cells: list[str], *, unit_index: int) -> str:
113 preferred_headers = {"test", "metric", "name", "case", "benchmark"}
114 for index, header in enumerate(headers):
115 if index >= len(cells) or index == unit_index:
116 continue
117 if header.strip().lower() in preferred_headers and cells[index].strip():
118 return cells[index].strip()
119 for index in range(min(unit_index, len(cells)) - 1, -1, -1):
120 cell = cells[index].strip()
121 if cell and not TABLE_NUMBER_PATTERN.fullmatch(cell):
122 return cell
123 return "measurement"
124
125
126def measurement_candidates_are_diagnostic_only(candidates: list[Any], *, command: str = "") -> bool:
127 command_has_measurement_intent = bool(MEASUREMENT_INTENT_PATTERN.search(command))
128 return all(_candidate_is_diagnostic_only(str(candidate), command_has_measurement_intent) for candidate in candidates)
129
130
131def _candidate_is_diagnostic_only(candidate: str, command_has_measurement_intent: bool) -> bool:
132 has_structured_metric = bool(EXPLICIT_RESULT_UNIT_PATTERN.search(candidate) or LABELED_MEASUREMENT_PATTERN.search(candidate))
133 if command_has_measurement_intent:
134 return not has_structured_metric
135 if DIAGNOSTIC_MEASUREMENT_PATTERN.search(candidate):
136 return True
137 if EXPLICIT_RESULT_UNIT_PATTERN.search(candidate) and not re.search(r"(?i)\b(?:cpu|gpu|ram|mem|memory)\b", candidate):
138 return False
139 if ACTION_MEASUREMENT_PATTERN.search(candidate):
140 return not bool(LABELED_MEASUREMENT_PATTERN.search(candidate))
141 return True
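The table path in `_table_measurement_candidates` is easier to follow on a concrete input. This is a simplified sketch: it treats the first table line as the header row and always uses the first cell as the label, whereas the code above scans every line for unit headers and prefers named label columns. `UNIT_HEADER` is a shortened, hypothetical unit list.

```python
import re

# Shortened unit list for illustration only.
UNIT_HEADER = re.compile(r"(?i)^(?:%|ms|sec|seconds|ops/s|req/s|tokens/s)$")
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:\s*(?:±|\+/-)\s*[-+]?\d+(?:\.\d+)?)?")


def split_row(line: str) -> list[str]:
    """Split a markdown table row into whitespace-normalized cells."""
    raw = line.strip()
    if raw.startswith("|"):
        raw = raw[1:]
    if raw.endswith("|"):
        raw = raw[:-1]
    return [" ".join(cell.split()) for cell in raw.split("|")]


def is_separator(cells: list[str]) -> bool:
    """True for rows such as | --- | :---: | that only delimit the header."""
    return bool(cells) and all(re.fullmatch(r":?-{2,}:?", cell) for cell in cells if cell)


def table_measurements(text: str) -> list[str]:
    lines = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    if not lines:
        return []
    headers = split_row(lines[0])
    unit_columns = [i for i, header in enumerate(headers) if UNIT_HEADER.search(header)]
    found: list[str] = []
    for line in lines[1:]:
        cells = split_row(line)
        if is_separator(cells):
            continue
        for column in unit_columns:
            if column >= len(cells):
                continue
            number = NUMBER.search(cells[column])
            if number:
                found.append(f"{cells[0]} {number.group(0)} {headers[column]}")
    return found


table = """
| test | ms |
| --- | --- |
| parse | 12.5 |
| render | 40 ± 2 |
"""
measurements = table_measurements(table)
```

The unit comes from the column header rather than the cell, which is why a bare number such as `12.5` still yields a labeled measurement.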
nipux_cli/memory_graph.py 302 lines
1"""Job-local memory graph helpers for long-running workers."""
2
3from __future__ import annotations
4
5import re
6from datetime import datetime
7from typing import Any
8
9from nipux_cli.worker_prompt_format import clip_text
10
11
12NODE_KINDS = {
13 "artifact",
14 "constraint",
15 "decision",
16 "episode",
17 "experiment",
18 "fact",
19 "milestone",
20 "question",
21 "skill",
22 "source",
23 "strategy",
24 "task",
25}
26NODE_STATUSES = {"active", "blocked", "deprecated", "open", "resolved", "stable"}
27DEFAULT_NODE_KIND = "fact"
28DEFAULT_NODE_STATUS = "active"
29NEGATIVE_MEMORY_MARKERS = (
30 "0 files",
31 "0 results",
32 "blocked until",
33 "cannot access",
34 "critical blocker",
35 "does not exist",
36 "failed to find",
37 "missing",
38 "must be downloaded",
39 "no ",
40 "no such",
41 "none",
42 "not available",
43 "not detected",
44 "not downloaded",
45 "not found",
46 "not installed",
47 "prevents",
48 "unavailable",
49 "without",
50)
51
52
53def memory_graph_from_job(job: dict[str, Any]) -> dict[str, Any]:
54 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
55 graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
56 nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
57 edges = graph.get("edges") if isinstance(graph.get("edges"), list) else []
58 return {
59 "nodes": [node for node in nodes if isinstance(node, dict)],
60 "edges": [edge for edge in edges if isinstance(edge, dict)],
61 "updated_at": graph.get("updated_at") or "",
62 }
63
64
65def memory_graph_for_prompt(job: dict[str, Any], *, limit: int = 10, stale_tokens: list[str] | None = None) -> str:
66 graph = memory_graph_from_job(job)
67 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
68 stale_tokens = [str(token) for token in stale_tokens or [] if str(token).strip()]
69 stale_node_ids = {
70 str(record.get("record_id") or "")
71 for record in metadata.get("stale_negative_records", [])
72 if isinstance(record, dict) and str(record.get("kind") or "") == "memory_node"
73 } if isinstance(metadata.get("stale_negative_records"), list) else set()
74 stale_nodes = [
75 node
76 for node in graph["nodes"]
77 if _node_contains_stale_token(node, stale_tokens) or _node_has_stale_id(node, stale_node_ids)
78 ]
79 active_nodes = [node for node in graph["nodes"] if node not in stale_nodes]
80 nodes = rank_memory_nodes(active_nodes, limit=limit)
81 durable_count = _durable_signal_count(job)
82 if not nodes:
83 hint = (
84 "No memory graph yet. When a branch produces reusable knowledge, use record_memory_graph "
85 "to save connected episode, fact, strategy, skill, question, decision, or constraint nodes."
86 )
87 if durable_count:
88 hint += (
89 f" Durable ledgers already contain {durable_count} reusable item(s); consolidate the most important "
90 "ones into graph nodes before raw history grows further."
91 )
92 return hint
93 edge_index = _edge_index(graph["edges"])
94 lines = [
95 f"Memory graph: nodes={len(graph['nodes'])} edges={len(graph['edges'])}",
96 ]
97 if stale_nodes:
98 lines.append(
99 f"Suppressed {len(stale_nodes)} stale memory node(s) matching unsupported tokens; "
100 "do not use them as facts until observed again."
101 )
102 if durable_count >= 8 and len(graph["nodes"]) < max(4, durable_count // 4):
103 lines.append(
104 "Consolidation pressure: durable ledgers are growing faster than the memory graph. "
105 "After the next meaningful checkpoint, use record_memory_graph to add or update connected nodes."
106 )
107 for node in nodes:
108 key = str(node.get("key") or "")
109 title = str(node.get("title") or key or "memory")
110 kind = str(node.get("kind") or DEFAULT_NODE_KIND)
111 status = str(node.get("status") or DEFAULT_NODE_STATUS)
112 summary = str(node.get("summary") or "")
113 tags = _clean_string_list(node.get("tags"))[:5]
114 refs = _clean_string_list(node.get("evidence_refs"))[:4]
115 parent = str(node.get("parent_key") or "")
116 line = f"- {status} {kind}: {title}"
117 if parent:
118 line += f" | parent={parent}"
119 if tags:
120 line += f" | tags={', '.join(tags)}"
121 if refs:
122 line += f" | evidence={', '.join(refs)}"
123 if summary:
124 line += f" | {summary}"
125 lines.append(clip_text(line, 620))
126 related = edge_index.get(key, [])[:3]
127 if related:
128 lines.append(" links: " + clip_text("; ".join(related), 420))
129 return "\n".join(lines)
130
131
132def _node_contains_stale_token(node: dict[str, Any], stale_tokens: list[str]) -> bool:
133 if not stale_tokens:
134 return False
135 text_parts = [
136 node.get("key"),
137 node.get("title"),
138 node.get("kind"),
139 node.get("status"),
140 node.get("summary"),
141 " ".join(_clean_string_list(node.get("tags"))),
142 ]
143 text = " ".join(str(part or "") for part in text_parts)
144 text_lower = text.lower()
145 negative_node = _node_has_negative_memory_marker(text_lower)
146 for token in stale_tokens:
147 pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
148 if re.search(pattern, text, flags=re.IGNORECASE):
149 return True
150 if negative_node and token.startswith("."):
151 bare = token[1:].strip()
152 if bare and re.search(r"(?<![A-Za-z0-9])" + re.escape(bare.lower()) + r"(?![A-Za-z0-9])", text_lower):
153 return True
154 return False
155
156
157def _node_has_negative_memory_marker(text_lower: str) -> bool:
158 return any(marker in text_lower for marker in NEGATIVE_MEMORY_MARKERS)
159
160
161def _node_has_stale_id(node: dict[str, Any], stale_node_ids: set[str]) -> bool:
162 if not stale_node_ids:
163 return False
164 for key in ("key", "event_id", "id"):
165 value = str(node.get(key) or "").strip()
166 if value and value in stale_node_ids:
167 return True
168 return False
169
170
171def search_memory_graph(graph: dict[str, Any], query: str, *, limit: int = 10) -> dict[str, Any]:
172 nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
173 edges = graph.get("edges") if isinstance(graph.get("edges"), list) else []
174 ranked = rank_memory_nodes([node for node in nodes if isinstance(node, dict)], query=query, limit=limit)
175 keys = {str(node.get("key") or "") for node in ranked}
176 related_edges = [
177 edge
178 for edge in edges
179 if isinstance(edge, dict)
180 and (str(edge.get("from_key") or "") in keys or str(edge.get("to_key") or "") in keys)
181 ][: max(limit * 2, 10)]
182 return {"nodes": ranked, "edges": related_edges}
183
184
185def rank_memory_nodes(nodes: list[dict[str, Any]], *, query: str = "", limit: int = 10) -> list[dict[str, Any]]:
186 tokens = _tokens(query)
187 ranked = sorted(nodes, key=lambda node: _node_score(node, tokens), reverse=True)
188 if tokens:
189 ranked = [node for node in ranked if _node_score(node, tokens) > 0]
190 return ranked[: max(0, limit)]
191
192
193def _node_score(node: dict[str, Any], query_tokens: set[str]) -> float:
194 haystack = " ".join(
195 str(value or "")
196 for value in [
197 node.get("key"),
198 node.get("title"),
199 node.get("kind"),
200 node.get("status"),
201 node.get("summary"),
202 " ".join(_clean_string_list(node.get("tags"))),
203 ]
204 ).lower()
205 score = 0.0
206 for token in query_tokens:
207 if token in haystack:
208 score += 4.0 if token in str(node.get("title") or "").lower() else 2.0
209 score += _float_between(node.get("salience"), 0.0, 1.0) * 3.0
210 score += _float_between(node.get("confidence"), 0.0, 1.0)
211 status = str(node.get("status") or DEFAULT_NODE_STATUS)
212 if status in {"active", "open"}:
213 score += 1.2
214 elif status == "stable":
215 score += 0.7
216 elif status == "deprecated":
217 score -= 1.5
218 kind = str(node.get("kind") or DEFAULT_NODE_KIND)
219 if kind in {"strategy", "skill", "decision", "constraint", "question"}:
220 score += 0.5
221 score += min(int(node.get("use_count") or 0), 8) * 0.08
222 score += _recency_score(str(node.get("updated_at") or node.get("created_at") or ""))
223 return score
224
225
226def _edge_index(edges: list[dict[str, Any]]) -> dict[str, list[str]]:
227 index: dict[str, list[str]] = {}
228 for edge in edges:
229 from_key = str(edge.get("from_key") or "")
230 to_key = str(edge.get("to_key") or "")
231 if not from_key or not to_key:
232 continue
233 relation = str(edge.get("relation") or "related_to")
234 index.setdefault(from_key, []).append(f"{relation} -> {to_key}")
235 index.setdefault(to_key, []).append(f"{relation} <- {from_key}")
236 return index
237
238
239def _tokens(value: str) -> set[str]:
240 return {token for token in re.findall(r"[a-z0-9][a-z0-9_-]{1,}", value.lower()) if token not in _STOPWORDS}
241
242
243def _float_between(value: Any, low: float, high: float) -> float:
244 try:
245 number = float(value)
246 except (TypeError, ValueError):
247 return low
248 return min(high, max(low, number))
249
250
251def _recency_score(value: str) -> float:
252 if not value:
253 return 0.0
254 try:
255 parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
256 except ValueError:
257 return 0.0
258 age_seconds = max(0.0, (datetime.now(parsed.tzinfo) - parsed).total_seconds())
259 if age_seconds < 3600:
260 return 0.8
261 if age_seconds < 86_400:
262 return 0.4
263 if age_seconds < 604_800:
264 return 0.15
265 return 0.0
266
267
268def _clean_string_list(value: Any) -> list[str]:
269 if not isinstance(value, list):
270 return []
271 return [" ".join(str(item).split()) for item in value if str(item).strip()]
272
273
274def _durable_signal_count(job: dict[str, Any]) -> int:
275 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
276 count = 0
277 for key in (
278 "experiment_ledger",
279 "finding_ledger",
280 "lessons",
281 "source_ledger",
282 "task_queue",
283 ):
284 values = metadata.get(key)
285 if isinstance(values, list):
286 count += sum(1 for value in values if isinstance(value, dict))
287 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
288 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
289 count += sum(1 for milestone in milestones if isinstance(milestone, dict))
290 return count
291
292
293_STOPWORDS = {
294 "and",
295 "for",
296 "from",
297 "into",
298 "that",
299 "the",
300 "this",
301 "with",
302}
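
A minimal, self-contained sketch of the keyword-matching part of `_node_score` above. The query is tokenized with the same regular expression and stopword set, and each matching token adds 4.0 when it appears in the node title or 2.0 when it appears only elsewhere in the searchable text. The names `tokens` and `match_score` are illustrative, the haystack is reduced to three fields, and the salience, confidence, status, kind, use-count, and recency terms are left out.

```python
import re

# Same stopword set as _STOPWORDS in the module above.
STOPWORDS = {"and", "for", "from", "into", "that", "the", "this", "with"}


def tokens(value: str) -> set[str]:
    # Tokens are at least two characters long and may contain "_" or "-".
    found = re.findall(r"[a-z0-9][a-z0-9_-]{1,}", value.lower())
    return {token for token in found if token not in STOPWORDS}


def match_score(node: dict, query_tokens: set[str]) -> float:
    # Illustrative subset of the real haystack: key, title, and summary only.
    title = str(node.get("title") or "").lower()
    haystack = " ".join(
        str(node.get(field) or "") for field in ("key", "title", "summary")
    ).lower()
    score = 0.0
    for token in query_tokens:
        if token in haystack:  # substring match, not whole-word match
            score += 4.0 if token in title else 2.0
    return score


node = {"key": "cache-plan", "title": "Cache layer", "summary": "Use the gpu_v2 pool"}
query = tokens("the cache for gpu_v2")
print(sorted(query), match_score(node, query))
```

Because the match is a substring test, the token `cache` also matches a key such as `cache-plan`; a title hit counts once at the higher weight and does not stack with the lower one.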
nipux_cli/memory_graph_view.py 299 lines
1"""Self-contained HTML view for a job-local memory graph."""
2
3from __future__ import annotations
4
5import json
6from html import escape
7from typing import Any
8
9from nipux_cli.memory_graph import memory_graph_from_job
10
11
12def render_memory_graph_html(job: dict[str, Any]) -> str:
13 """Return a standalone clickable graph page for a job's memory graph."""
14
15 graph = memory_graph_from_job(job)
16 nodes = [_view_node(node) for node in graph["nodes"]]
17 edges = [_view_edge(edge) for edge in graph["edges"]]
18 data = json.dumps(
19 {
20 "job": {
21 "id": str(job.get("id") or ""),
22 "title": str(job.get("title") or ""),
23 "objective": str(job.get("objective") or ""),
24 },
25 "updated_at": graph.get("updated_at") or "",
26 "nodes": nodes,
27 "edges": edges,
28 },
29 ensure_ascii=False,
30 ).replace("</", "<\\/")
31 title = escape(str(job.get("title") or "Nipux memory graph"))
32 return f"""<!doctype html>
33<html lang="en">
34<head>
35<meta charset="utf-8">
36<meta name="viewport" content="width=device-width, initial-scale=1">
37<title>Nipux Memory Graph - {title}</title>
38<style>
39:root {{
40 color-scheme: dark;
41 --bg: #101112;
42 --panel: #17191b;
43 --line: #303336;
44 --text: #eeeeea;
45 --muted: #9a9a94;
46 --accent: #82e6e1;
47 --gold: #e6d06f;
48 --purple: #c99ce8;
49}}
50* {{ box-sizing: border-box; }}
51html, body {{ margin: 0; min-height: 100%; background: var(--bg); color: var(--text); font: 14px/1.45 ui-monospace, SFMono-Regular, Menlo, Consolas, monospace; }}
52body {{ overflow: hidden; }}
53.shell {{ display: grid; grid-template-columns: minmax(0, 1fr) 360px; height: 100vh; }}
54.stage {{ position: relative; min-width: 0; border-right: 1px solid var(--line); }}
55.top {{ position: absolute; top: 22px; left: 26px; right: 26px; z-index: 2; display: flex; align-items: start; justify-content: space-between; gap: 24px; }}
56.eyebrow {{ color: var(--muted); letter-spacing: .22em; text-transform: uppercase; font-size: 12px; }}
57h1 {{ margin: 8px 0 0; font-size: clamp(30px, 4vw, 64px); line-height: .95; letter-spacing: -.04em; }}
58.stats {{ display: flex; gap: 18px; color: var(--muted); white-space: nowrap; }}
59.stats b {{ color: var(--text); font-size: 20px; }}
60canvas {{ display: block; width: 100%; height: 100%; }}
61.help {{ position: absolute; left: 26px; bottom: 22px; color: var(--muted); z-index: 2; }}
62aside {{ min-width: 0; background: var(--panel); padding: 26px; overflow: auto; }}
63.card {{ border: 1px solid var(--line); border-radius: 18px; padding: 18px; margin-top: 18px; background: rgba(255,255,255,.018); }}
64.label {{ color: var(--muted); font-size: 12px; letter-spacing: .18em; text-transform: uppercase; }}
65.node-title {{ margin-top: 10px; font-size: 22px; line-height: 1.12; }}
66.row {{ display: grid; grid-template-columns: 88px minmax(0, 1fr); gap: 12px; margin-top: 12px; }}
67.row span:first-child {{ color: var(--muted); }}
68.pill {{ display: inline-block; margin: 5px 6px 0 0; padding: 3px 8px; border: 1px solid var(--line); border-radius: 999px; color: var(--accent); }}
69.list {{ margin: 8px 0 0; padding-left: 18px; color: var(--muted); }}
70.empty {{ color: var(--muted); margin-top: 12px; }}
71.search {{ width: 100%; margin-top: 18px; padding: 12px 14px; border-radius: 12px; border: 1px solid var(--line); background: #0d0e0f; color: var(--text); font: inherit; outline: none; }}
72.search:focus {{ border-color: var(--accent); }}
73@media (max-width: 900px) {{
74 .shell {{ grid-template-columns: 1fr; grid-template-rows: 62vh 38vh; }}
75 .stage {{ border-right: 0; border-bottom: 1px solid var(--line); }}
76}}
77</style>
78</head>
79<body>
80<main class="shell">
81 <section class="stage">
82 <div class="top">
83 <div>
84 <div class="eyebrow">Nipux memory graph</div>
85 <h1>{title}</h1>
86 </div>
87 <div class="stats">
88 <div><b id="node-count">0</b><br>nodes</div>
89 <div><b id="edge-count">0</b><br>links</div>
90 </div>
91 </div>
92 <canvas id="graph"></canvas>
93 <div class="help">drag to rotate · scroll to zoom · click a node</div>
94 </section>
95 <aside>
96 <div class="label">inspect</div>
97 <input id="search" class="search" placeholder="search nodes">
98 <div id="details" class="card">
99 <div class="label">selected node</div>
100 <div class="empty">Click a node to inspect its summary, evidence, and links.</div>
101 </div>
102 <div id="results" class="card">
103 <div class="label">visible nodes</div>
104 <div class="empty">No nodes yet. The worker can create graph memory with record_memory_graph.</div>
105 </div>
106 </aside>
107</main>
108<script id="graph-data" type="application/json">{data}</script>
109<script>
110const data = JSON.parse(document.getElementById("graph-data").textContent);
111const canvas = document.getElementById("graph");
112const ctx = canvas.getContext("2d");
113const details = document.getElementById("details");
114const results = document.getElementById("results");
115const search = document.getElementById("search");
116document.getElementById("node-count").textContent = data.nodes.length;
117document.getElementById("edge-count").textContent = data.edges.length;
118
119let width = 0, height = 0, zoom = 1, rotX = -0.35, rotY = 0.65, dragging = false, last = [0, 0], selected = null;
120let lastResultsSignature = "";
121const nodeByKey = new Map(data.nodes.map((node, index) => [node.key, {{ ...node, index }}]));
122const nodes = data.nodes.map((node, index) => {{
123 const a = index * 2.399963229728653;
124 const r = 110 + (index % 7) * 24;
125 const z = ((index * 53) % 240) - 120;
126 return {{ ...node, x: Math.cos(a) * r, y: Math.sin(a) * r, z, vx: 0, vy: 0, vz: 0, screen: [0, 0], visible: true }};
127}});
128const nodeLookup = new Map(nodes.map(node => [node.key, node]));
129const edges = data.edges.map(edge => ({{ ...edge, from: nodeLookup.get(edge.from_key), to: nodeLookup.get(edge.to_key) }})).filter(edge => edge.from && edge.to);
130
131function resize() {{
132 const ratio = window.devicePixelRatio || 1;
133 width = canvas.clientWidth;
134 height = canvas.clientHeight;
135 canvas.width = Math.max(1, width * ratio);
136 canvas.height = Math.max(1, height * ratio);
137 ctx.setTransform(ratio, 0, 0, ratio, 0, 0);
138}}
139window.addEventListener("resize", resize);
140resize();
141
142function project(node) {{
143 const cy = Math.cos(rotY), sy = Math.sin(rotY), cx = Math.cos(rotX), sx = Math.sin(rotX);
144 let x = node.x * cy - node.z * sy;
145 let z = node.x * sy + node.z * cy;
146 let y = node.y * cx - z * sx;
147 z = node.y * sx + z * cx;
148 const scale = zoom * 520 / (520 + z);
149 return [width / 2 + x * scale, height / 2 + y * scale, scale, z];
150}}
151
152function color(node) {{
153 if (node.status === "deprecated") return "#7d7d77";
154 if (node.kind === "question") return "#e6d06f";
155 if (node.kind === "skill" || node.kind === "strategy") return "#82e6e1";
156 if (node.kind === "decision" || node.kind === "constraint") return "#c99ce8";
157 return "#eeeeea";
158}}
159
160function draw() {{
161 ctx.clearRect(0, 0, width, height);
162 ctx.fillStyle = "#101112";
163 ctx.fillRect(0, 0, width, height);
164 const q = search.value.trim().toLowerCase();
165 for (const node of nodes) {{
166 const hay = [node.key, node.title, node.kind, node.status, node.summary, ...(node.tags || [])].join(" ").toLowerCase();
167 node.visible = !q || hay.includes(q);
168 }}
169 for (const edge of edges) {{
170 if (!edge.from.visible || !edge.to.visible) continue;
171 const a = project(edge.from), b = project(edge.to);
172 ctx.strokeStyle = "rgba(154,154,148,.26)";
173 ctx.lineWidth = 1;
174 ctx.beginPath();
175 ctx.moveTo(a[0], a[1]);
176 ctx.lineTo(b[0], b[1]);
177 ctx.stroke();
178 }}
179 const sorted = [...nodes].map(node => [node, project(node)]).sort((a, b) => a[1][3] - b[1][3]);
180 for (const [node, p] of sorted) {{
181 node.screen = p;
182 if (!node.visible) continue;
183 const radius = Math.max(5, 9 * p[2]);
184 ctx.fillStyle = color(node);
185 ctx.globalAlpha = selected && selected.key !== node.key ? .55 : 1;
186 ctx.beginPath();
187 ctx.arc(p[0], p[1], radius, 0, Math.PI * 2);
188 ctx.fill();
189 if (selected && selected.key === node.key) {{
190 ctx.strokeStyle = "#ffffff";
191 ctx.lineWidth = 2;
192 ctx.stroke();
193 }}
194 }}
195 ctx.globalAlpha = 1;
196 renderResults(q);
197 requestAnimationFrame(draw);
198}}
199
200function renderResults(query) {{
201 const visible = nodes.filter(node => node.visible).slice(0, 18);
202 const signature = query + "|" + visible.map(node => node.key).join(",");
203 if (signature === lastResultsSignature) return;
204 lastResultsSignature = signature;
205 results.innerHTML = '<div class="label">visible nodes</div>' + (
206 visible.length
207 ? visible.map(node => `<div class="row"><span>${{escapeHtml(node.kind)}}</span><a href="#" data-key="${{escapeHtml(node.key)}}">${{escapeHtml(node.title || node.key)}}</a></div>`).join("")
208 : '<div class="empty">No nodes match the current search.</div>'
209 );
210 for (const link of results.querySelectorAll("a[data-key]")) {{
211 link.addEventListener("click", event => {{
212 event.preventDefault();
213 selectNode(nodeLookup.get(link.dataset.key));
214 }});
215 }}
216}}
217
218function selectNode(node) {{
219 selected = node;
220 if (!node) return;
221 const linked = edges.filter(edge => edge.from.key === node.key || edge.to.key === node.key).slice(0, 12);
222 details.innerHTML = `
223 <div class="label">${{escapeHtml(node.kind)}} · ${{escapeHtml(node.status)}}</div>
224 <div class="node-title">${{escapeHtml(node.title || node.key)}}</div>
225 <div class="row"><span>key</span><div>${{escapeHtml(node.key)}}</div></div>
226 <div class="row"><span>summary</span><div>${{escapeHtml(node.summary || "No summary recorded.")}}</div></div>
227 <div class="row"><span>score</span><div>salience ${{node.salience ?? "n/a"}} · confidence ${{node.confidence ?? "n/a"}}</div></div>
228 <div class="row"><span>tags</span><div>${{(node.tags || []).map(tag => `<span class="pill">${{escapeHtml(tag)}}</span>`).join("") || "none"}}</div></div>
229 <div class="row"><span>evidence</span><ul class="list">${{(node.evidence_refs || []).map(ref => `<li>${{escapeHtml(ref)}}</li>`).join("") || "<li>none</li>"}}</ul></div>
230 <div class="row"><span>links</span><ul class="list">${{linked.map(edge => `<li>${{escapeHtml(edge.from.key === node.key ? edge.relation + " → " + edge.to.key : edge.relation + " ← " + edge.from.key)}}</li>`).join("") || "<li>none</li>"}}</ul></div>
231 `;
232}}
233
234function escapeHtml(value) {{
235 return String(value ?? "").replace(/[&<>"']/g, char => ({{ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }}[char]));
236}}
237
238canvas.addEventListener("mousedown", event => {{ dragging = true; last = [event.clientX, event.clientY]; }});
239window.addEventListener("mouseup", () => dragging = false);
240window.addEventListener("mousemove", event => {{
241 if (!dragging) return;
242 rotY += (event.clientX - last[0]) * 0.006;
243 rotX += (event.clientY - last[1]) * 0.006;
244 last = [event.clientX, event.clientY];
245}});
246canvas.addEventListener("wheel", event => {{
247 event.preventDefault();
248 zoom = Math.max(.35, Math.min(3.2, zoom * (event.deltaY > 0 ? .92 : 1.08)));
249}}, {{ passive: false }});
250canvas.addEventListener("click", event => {{
251 const rect = canvas.getBoundingClientRect();
252 const x = event.clientX - rect.left, y = event.clientY - rect.top;
253 let best = null, bestDistance = 18;
254 for (const node of nodes) {{
255 if (!node.visible) continue;
256 const dx = node.screen[0] - x, dy = node.screen[1] - y;
257 const distance = Math.hypot(dx, dy);
258 if (distance < bestDistance) {{ best = node; bestDistance = distance; }}
259 }}
260 if (best) selectNode(best);
261}});
262search.addEventListener("input", () => {{}});
263if (nodes[0]) selectNode(nodes[0]);
264draw();
265</script>
266</body>
267</html>
268"""
269
270
271def _view_node(node: dict[str, Any]) -> dict[str, Any]:
272 return {
273 "key": str(node.get("key") or node.get("title") or "memory"),
274 "title": str(node.get("title") or node.get("key") or "memory"),
275 "kind": str(node.get("kind") or "fact"),
276 "status": str(node.get("status") or "active"),
277 "summary": str(node.get("summary") or ""),
278 "salience": node.get("salience"),
279 "confidence": node.get("confidence"),
280 "tags": _string_list(node.get("tags")),
281 "evidence_refs": _string_list(node.get("evidence_refs")),
282 "created_at": str(node.get("created_at") or ""),
283 "updated_at": str(node.get("updated_at") or ""),
284 }
285
286
287def _view_edge(edge: dict[str, Any]) -> dict[str, Any]:
288 return {
289 "from_key": str(edge.get("from_key") or ""),
290 "to_key": str(edge.get("to_key") or ""),
291 "relation": str(edge.get("relation") or "related_to"),
292 "summary": str(edge.get("summary") or ""),
293 }
294
295
296def _string_list(value: Any) -> list[str]:
297 if not isinstance(value, list):
298 return []
299 return [" ".join(str(item).split()) for item in value if str(item).strip()]
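
`render_memory_graph_html` embeds the graph as JSON inside a `<script type="application/json">` element and rewrites every `</` as `<\/`, so a node title containing `</script>` cannot close the element early. The sketch below isolates that step; `embed_json_for_script` is a hypothetical helper name, not a function in the module.

```python
import json


def embed_json_for_script(payload: dict) -> str:
    # "\/" is a valid JSON escape for "/", so the parsed value is unchanged,
    # while the literal sequence "</" never appears in the emitted text.
    return json.dumps(payload, ensure_ascii=False).replace("</", "<\\/")


payload = {"title": "</script><script>alert(1)</script>"}
text = embed_json_for_script(payload)
print(text)
print(json.loads(text) == payload)
```

The replacement is applied to the serialized text rather than to the individual values, which keeps the step independent of the payload's shape.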
nipux_cli/metric_format.py 17 lines
1"""Small formatting helpers for measured worker results."""
2
3from __future__ import annotations
4
5from typing import Any
6
7
8def format_metric_value(name: Any, value: Any, unit: Any = "") -> str:
9 """Return a readable metric string such as ``score=0.82`` or ``tokens=4200 tokens``."""
10
11 metric_name = str(name or "metric").strip() or "metric"
12 metric_value = str(value).strip()
13 metric_unit = str(unit or "").strip()
14 if not metric_unit:
15 return f"{metric_name}={metric_value}"
16 separator = "" if metric_unit.startswith(("%", "/", "°")) else " "
17 return f"{metric_name}={metric_value}{separator}{metric_unit}"
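
A short usage sketch for `format_metric_value`. The function body is reproduced from the module above so the snippet runs on its own; the metric names and values are made up for illustration.

```python
from typing import Any


def format_metric_value(name: Any, value: Any, unit: Any = "") -> str:
    # Copy of the helper above, included only to keep the example runnable.
    metric_name = str(name or "metric").strip() or "metric"
    metric_value = str(value).strip()
    metric_unit = str(unit or "").strip()
    if not metric_unit:
        return f"{metric_name}={metric_value}"
    separator = "" if metric_unit.startswith(("%", "/", "°")) else " "
    return f"{metric_name}={metric_value}{separator}{metric_unit}"


print(format_metric_value("score", 0.82))          # score=0.82
print(format_metric_value("latency", 120, "ms"))   # latency=120 ms
print(format_metric_value("accuracy", 91.5, "%"))  # accuracy=91.5%
print(format_metric_value("rate", 5, "/s"))        # rate=5/s
print(format_metric_value("", 3))                  # metric=3
```

Units that begin with `%`, `/`, or `°` attach directly to the value; every other unit is separated by a single space.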
nipux_cli/operator_context.py 83 lines
1"""Generic filtering for operator messages that enter worker context."""
2
3from __future__ import annotations
4
5import re
6from typing import Any
7
8
9CONVERSATION_ONLY_PATTERNS = [
10 re.compile(r"(?i)^\s*(hi|hello|hey|yo|thanks|thank you|ok|okay|cool|nice|great|hello\?)\s*[.!?]*\s*$"),
11 re.compile(r"(?i)^\s*(how('?s| is) it going|what('?s| is) going on|any updates?|status|jobs|ls|help|clear|exit|quit)\s*[.!?]*\s*$"),
12 re.compile(r"(?i)^\s*(how('?s| is) it going)\??\s*(have you got|any)\s+(any\s+)?(improvements?|updates?|results?)\s*(yet)?\s*[.!?]*\s*$"),
13 re.compile(r"(?i)^\s*(run|start|stop|pause|resume|cancel|work|status|jobs|clear|exit|quit|help)\s+\d+\s*$"),
14]
15
16ACTIONABLE_PATTERNS = [
17 re.compile(
18 r"(?i)\b("
19 r"avoid|because|benchmark|change|constraint|correct|do not|don't|dont|fix|focus|instead|instruction|"
20 r"measure|must|need|never|only|prefer|priority|remember|should|target|use|wrong"
21 r"|prioriti[sz]e)\b"
22 ),
23 re.compile(r"[\"'`][^\"'`]{2,}[\"'`]"),
24]
25
26
27def operator_entry_is_active(entry: dict[str, Any]) -> bool:
28 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
29 return (
30 mode in {"steer", "follow_up"}
31 and not entry.get("acknowledged_at")
32 and not entry.get("superseded_at")
33 )
34
35
36def operator_entry_is_prompt_relevant(entry: dict[str, Any]) -> bool:
37 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
38 message = str(entry.get("message") or "").strip()
39 if not message:
40 return False
41 if mode == "note":
42 return not _conversation_only(message)
43 if mode not in {"steer", "follow_up"}:
44 return False
45 if entry.get("acknowledged_at") or entry.get("superseded_at"):
46 return False
47 return _actionable(message)
48
49
50def active_prompt_operator_entries(messages: list[Any]) -> list[dict[str, Any]]:
51 return [
52 entry
53 for entry in messages
54 if isinstance(entry, dict)
55 and operator_entry_is_prompt_relevant(entry)
56 ]
57
58
59def inactive_prompt_operator_ids(messages: list[Any]) -> list[str]:
60 ids: list[str] = []
61 for entry in messages:
62 if not isinstance(entry, dict):
63 continue
64 if not operator_entry_is_active(entry):
65 continue
66 if operator_entry_is_prompt_relevant(entry):
67 continue
68 event_id = str(entry.get("event_id") or "")
69 if event_id:
70 ids.append(event_id)
71 return ids
72
73
74def _conversation_only(message: str) -> bool:
75 text = " ".join(message.split())
76 return any(pattern.search(text) for pattern in CONVERSATION_ONLY_PATTERNS)
77
78
79def _actionable(message: str) -> bool:
80 text = " ".join(message.split())
81 if _conversation_only(text):
82 return False
83 return any(pattern.search(text) for pattern in ACTIONABLE_PATTERNS)
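
The filtering rules above treat the two kinds of operator message differently: a `note` is kept unless it is pure chatter, while a `steer` or `follow_up` must be unacknowledged, not superseded, and contain an actionable signal. The sketch below mirrors that control flow with deliberately shortened patterns; `CHATTER`, `ACTIONABLE`, and `is_prompt_relevant` are illustrative names, and the real module uses longer pattern lists plus a quoted-text pattern.

```python
import re

# Shortened stand-ins for CONVERSATION_ONLY_PATTERNS and ACTIONABLE_PATTERNS.
CHATTER = re.compile(r"(?i)^\s*(hi|hello|thanks|ok|status|help)\s*[.!?]*\s*$")
ACTIONABLE = re.compile(r"(?i)\b(avoid|fix|focus|instead|must|never|only|prefer|should|use)\b")


def is_prompt_relevant(entry: dict) -> bool:
    mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
    message = " ".join(str(entry.get("message") or "").split())
    if not message:
        return False
    chatter = bool(CHATTER.search(message))
    if mode == "note":
        return not chatter  # notes only need to be more than chatter
    if mode not in {"steer", "follow_up"}:
        return False
    if entry.get("acknowledged_at") or entry.get("superseded_at"):
        return False  # already handled or replaced by a later message
    return not chatter and bool(ACTIONABLE.search(message))


examples = [
    {"message": "thanks!"},
    {"message": "Focus on the benchmark instead"},
    {"mode": "note", "message": "looks interesting"},
    {"mode": "steer", "message": "looks interesting"},
]
for entry in examples:
    print(entry, is_prompt_relevant(entry))
```

The same text can therefore be kept as a note and dropped as a steer, which is the asymmetry `operator_entry_is_prompt_relevant` encodes.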
nipux_cli/parser_builder.py 356 lines
1"""Argparse construction for Nipux CLI commands."""
2
3from __future__ import annotations
4
5import argparse
6from collections.abc import Callable, Mapping
7
8
9CommandHandler = Callable[[argparse.Namespace], None]
10CommandHandlers = Mapping[str, CommandHandler]
11
12
13def _handler(handlers: CommandHandlers, name: str) -> CommandHandler:
14 return handlers[name]
15
16
17def build_arg_parser(
18 *,
19 handlers: CommandHandlers,
20 version: str,
21 default_context_length: int,
22) -> argparse.ArgumentParser:
23 parser = argparse.ArgumentParser(prog="nipux")
24 parser.add_argument("--version", action="version", version=f"nipux {version}")
25 sub = parser.add_subparsers(dest="command", required=True)
26
27 init = sub.add_parser("init")
28 init.add_argument("--path")
29 init.add_argument("--force", action="store_true")
30 init.add_argument("--openrouter", action="store_true", help="Write an OpenRouter config that reads OPENROUTER_API_KEY")
31 init.add_argument("--model", help="Model name to write into config.yaml")
32 init.add_argument("--base-url", help="OpenAI-compatible API base URL")
33 init.add_argument("--api-key-env", help="Environment variable that stores the API key")
34 init.add_argument("--context-length", type=int, default=default_context_length)
35 init.set_defaults(func=_handler(handlers, "init"))
36
37 update = sub.add_parser("update")
38 update.add_argument("--path", help="Git checkout to update. Defaults to the current Nipux install.")
39 update.add_argument("--allow-dirty", action="store_true", help="Attempt git pull even when local changes exist")
40 update.add_argument("--no-restart", action="store_true", help="Do not restart a running daemon after updating")
41 update.set_defaults(func=_handler(handlers, "update"))
42
43 uninstall = sub.add_parser("uninstall")
44 uninstall.add_argument("--yes", action="store_true", help="Confirm removal without an interactive prompt")
45 uninstall.add_argument("--dry-run", action="store_true", help="Show what would be removed")
46 uninstall.add_argument("--keep-legacy", action="store_true", help="Keep legacy ~/.kneepucks state if present")
47 uninstall.add_argument("--keep-tool", action="store_true", help="Keep the installed nipux command")
48 uninstall.add_argument("--remove-tool", action="store_true", help=argparse.SUPPRESS)
49 uninstall.add_argument("--wait", type=float, default=5.0, help="Seconds to wait for daemon shutdown")
50 uninstall.set_defaults(func=_handler(handlers, "uninstall"))
51
52 create = sub.add_parser("create", aliases=["new"])
53 create.add_argument("objective")
54 create.add_argument("--title")
55 create.add_argument("--kind", default="generic")
56 create.add_argument("--cadence")
57 create.set_defaults(func=_handler(handlers, "create"))
58
59 jobs = sub.add_parser("jobs")
60 jobs.set_defaults(func=_handler(handlers, "jobs"))
61
62 ls_cmd = sub.add_parser("ls")
63 ls_cmd.set_defaults(func=_handler(handlers, "jobs"))
64
65 focus = sub.add_parser("focus")
66 focus.add_argument("query", nargs="*")
67 focus.set_defaults(func=_handler(handlers, "focus"))
68
69 rename = sub.add_parser("rename")
70 rename.add_argument("job_id", nargs="*")
71 rename.add_argument("--title", nargs="+", required=True)
72 rename.set_defaults(func=_handler(handlers, "rename"))
73
74 delete = sub.add_parser("delete", aliases=["rm"])
75 delete.add_argument("job_id", nargs="*")
76 delete.add_argument("--keep-files", action="store_true")
77 delete.set_defaults(func=_handler(handlers, "delete"))
78
79 chat = sub.add_parser("chat")
80 chat.add_argument("job_id", nargs="*")
81 chat.add_argument("--history-limit", type=int, default=12)
82 chat.add_argument("--no-history", action="store_true")
83 chat.set_defaults(func=_handler(handlers, "chat"))
84
85 shell = sub.add_parser("shell")
86 shell.add_argument("--status", action="store_true", help="Render the full dashboard when the shell opens")
87 shell.add_argument("--no-status", action="store_true", help=argparse.SUPPRESS)
88 shell.add_argument("--limit", type=int, default=8)
89 shell.add_argument("--chars", type=int, default=180)
90 shell.set_defaults(func=_handler(handlers, "shell"))
91
92 steer = sub.add_parser("steer", aliases=["say"])
93 steer.add_argument("--job", dest="job_id")
94 steer.add_argument("message", nargs="+")
95 steer.set_defaults(func=_handler(handlers, "steer"))
96
97 pause = sub.add_parser("pause")
98 pause.add_argument("parts", nargs="*", help="Optional job title/id followed by an optional note")
99 pause.set_defaults(func=_handler(handlers, "pause"))
100
101 resume = sub.add_parser("resume")
102 resume.add_argument("job_id", nargs="*")
103 resume.set_defaults(func=_handler(handlers, "resume"))
104
105 cancel = sub.add_parser("cancel")
106 cancel.add_argument("parts", nargs="*", help="Optional job title/id followed by an optional note")
107 cancel.set_defaults(func=_handler(handlers, "cancel"))
108
109 status = sub.add_parser("status")
110 status.add_argument("job_id", nargs="*")
111 status.add_argument("--limit", type=int, default=8)
112 status.add_argument("--chars", type=int, default=180)
113 status.add_argument("--full", action="store_true", help="Render the full dashboard")
114 status.add_argument("--json", action="store_true")
115 status.set_defaults(func=_handler(handlers, "status"))
116
117 health = sub.add_parser("health")
118 health.add_argument("--limit", type=int, default=8)
119 health.add_argument("--chars", type=int, default=180)
120 health.set_defaults(func=_handler(handlers, "health"))
121
122 history = sub.add_parser("history")
123 history.add_argument("job_id", nargs="*")
124 history.add_argument("--limit", type=int, default=80)
125 history.add_argument("--chars", type=int, default=260)
126 history.add_argument("--full", action="store_true")
127 history.add_argument("--json", action="store_true")
128 history.set_defaults(func=_handler(handlers, "history"))
129
130 events = sub.add_parser("events")
131 events.add_argument("job_id", nargs="*")
132 events.add_argument("--limit", type=int, default=80)
133 events.add_argument("--chars", type=int, default=260)
134 events.add_argument("--full", action="store_true")
135 events.add_argument("--follow", action="store_true")
136 events.add_argument("--interval", type=float, default=2.0)
137 events.add_argument("--json", action="store_true")
138 events.set_defaults(func=_handler(handlers, "events"))
139
140 dashboard = sub.add_parser("dashboard", aliases=["dash"])
141 dashboard.add_argument("job_id", nargs="*")
142 dashboard.add_argument("--interval", type=float, default=2.0)
143 dashboard.add_argument("--limit", type=int, default=12)
144 dashboard.add_argument("--chars", type=int, default=260)
145 dashboard.add_argument("--no-follow", dest="follow", action="store_false")
146 dashboard.add_argument("--no-clear", dest="clear", action="store_false")
147 dashboard.set_defaults(func=_handler(handlers, "dashboard"), follow=True, clear=True)
148
149 start = sub.add_parser("start")
150 start.add_argument("--poll-seconds", type=float, default=0.0)
151 start.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
152 start.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
153 start.add_argument("--log-file")
154 start.set_defaults(func=_handler(handlers, "start"))
155
156 stop = sub.add_parser("stop")
157 stop.add_argument("job_id", nargs="*", help="Optional job title/id to pause instead of stopping the daemon")
158 stop.add_argument("--wait", type=float, default=5.0)
159 stop.set_defaults(func=_handler(handlers, "stop"))
160
161 restart = sub.add_parser("restart")
162 restart.add_argument("--poll-seconds", type=float, default=0.0)
163 restart.add_argument("--wait", type=float, default=5.0)
164 restart.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
165 restart.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
166 restart.add_argument("--log-file")
167 restart.set_defaults(func=_handler(handlers, "restart"))
168
169 browser_dashboard = sub.add_parser("browser-dashboard")
170 browser_dashboard.add_argument("--port", type=int, default=4848)
171 browser_dashboard.add_argument("--foreground", action="store_true")
172 browser_dashboard.add_argument("--stop", action="store_true")
173 browser_dashboard.add_argument("--log-file")
174 browser_dashboard.set_defaults(func=_handler(handlers, "browser_dashboard"))
175
176 autostart = sub.add_parser("autostart")
177 autostart.add_argument("action", choices=["install", "status", "uninstall"])
178 autostart.add_argument("--poll-seconds", type=float, default=5.0)
179 autostart.add_argument("--quiet", action="store_true")
180 autostart.set_defaults(func=_handler(handlers, "autostart"))
181
182 service = sub.add_parser("service")
183 service.add_argument("action", choices=["install", "status", "uninstall"])
184 service.add_argument("--poll-seconds", type=float, default=0.0)
185 service.add_argument("--quiet", action="store_true")
186 service.set_defaults(func=_handler(handlers, "service"))
187
188 artifacts = sub.add_parser("artifacts")
189 artifacts.add_argument("job_id", nargs="*")
190 artifacts.add_argument("--limit", type=int, default=25)
191 artifacts.add_argument("--chars", type=int, default=220)
192 artifacts.add_argument("--paths", action="store_true", help="Show full artifact paths")
193 artifacts.set_defaults(func=_handler(handlers, "artifacts"))
194
195 artifact = sub.add_parser("artifact")
196 artifact.add_argument("artifact_id_or_path", nargs="+")
197 artifact.add_argument("--job", dest="job_id")
198 artifact.add_argument("--chars", type=int, default=12000)
199 artifact.set_defaults(func=_handler(handlers, "artifact"))
200
201 lessons = sub.add_parser("lessons")
202 lessons.add_argument("job_id", nargs="*")
203 lessons.add_argument("--limit", type=int, default=25)
204 lessons.add_argument("--chars", type=int, default=220)
205 lessons.set_defaults(func=_handler(handlers, "lessons"))
206
207 learn = sub.add_parser("learn")
208 learn.add_argument("--job", dest="job_id")
209 learn.add_argument("--category", default="operator_preference")
210 learn.add_argument("--chars", type=int, default=220)
211 learn.add_argument("lesson", nargs="+")
212 learn.set_defaults(func=_handler(handlers, "learn"))
213
214 findings = sub.add_parser("findings")
215 findings.add_argument("job_id", nargs="*")
216 findings.add_argument("--limit", type=int, default=25)
217 findings.add_argument("--chars", type=int, default=220)
218 findings.add_argument("--json", action="store_true")
219 findings.set_defaults(func=_handler(handlers, "findings"))
220
221 tasks = sub.add_parser("tasks")
222 tasks.add_argument("job_id", nargs="*")
223 tasks.add_argument("--limit", type=int, default=25)
224 tasks.add_argument("--chars", type=int, default=220)
225 tasks.add_argument("--status", nargs="+")
226 tasks.add_argument("--json", action="store_true")
227 tasks.set_defaults(func=_handler(handlers, "tasks"))
228
229 roadmap = sub.add_parser("roadmap")
230 roadmap.add_argument("job_id", nargs="*")
231 roadmap.add_argument("--limit", type=int, default=25)
232 roadmap.add_argument("--features", type=int, default=3)
233 roadmap.add_argument("--chars", type=int, default=220)
234 roadmap.add_argument("--json", action="store_true")
235 roadmap.set_defaults(func=_handler(handlers, "roadmap"))
236
237 experiments = sub.add_parser("experiments")
238 experiments.add_argument("job_id", nargs="*")
239 experiments.add_argument("--limit", type=int, default=25)
240 experiments.add_argument("--chars", type=int, default=220)
241 experiments.add_argument("--status", nargs="+")
242 experiments.add_argument("--json", action="store_true")
243 experiments.set_defaults(func=_handler(handlers, "experiments"))
244
    sources = sub.add_parser("sources")
    sources.add_argument("job_id", nargs="*")
    sources.add_argument("--limit", type=int, default=25)
    sources.add_argument("--chars", type=int, default=220)
    sources.add_argument("--json", action="store_true")
    sources.set_defaults(func=_handler(handlers, "sources"))

    memory = sub.add_parser("memory")
    memory.add_argument("job_id", nargs="*")
    memory.add_argument("--limit", type=int, default=10)
    memory.add_argument("--chars", type=int, default=260)
    memory.add_argument("--json", action="store_true", help="Print memory graph JSON")
    memory.add_argument("--graph", action="store_true", help="Write a clickable HTML memory graph")
    memory.add_argument("--output", help="Path for --graph HTML output")
    memory.set_defaults(func=_handler(handlers, "memory"))

    metrics = sub.add_parser("metrics")
    metrics.add_argument("job_id", nargs="*")
    metrics.add_argument("--chars", type=int, default=220)
    metrics.set_defaults(func=_handler(handlers, "metrics"))

    usage = sub.add_parser("usage")
    usage.add_argument("job_id", nargs="*")
    usage.add_argument("--json", action="store_true")
    usage.set_defaults(func=_handler(handlers, "usage"))

    logs = sub.add_parser("logs", aliases=["outputs", "output"])
    logs.add_argument("job_id", nargs="*")
    logs.add_argument("--limit", type=int, default=25)
    logs.add_argument("--verbose", action="store_true")
    logs.add_argument("--chars", type=int, default=4000)
    logs.set_defaults(func=_handler(handlers, "logs"))

    activity = sub.add_parser("activity", aliases=["feed", "tail"])
    activity.add_argument("job_id", nargs="*")
    activity.add_argument("--limit", type=int, default=20)
    activity.add_argument("--chars", type=int, default=180)
    activity.add_argument("--follow", action="store_true")
    activity.add_argument("--interval", type=float, default=2.0)
    activity.add_argument("--verbose", action="store_true")
    activity.add_argument("--paths", action="store_true", help="Show full artifact paths")
    activity.set_defaults(func=_handler(handlers, "activity"))

    updates = sub.add_parser("updates", aliases=["outcomes", "outcome"])
    updates.add_argument("job_id", nargs="*")
    updates.add_argument("--all", action="store_true", help="Show durable outcome summaries for every job")
    updates.add_argument("--limit", type=int, default=5)
    updates.add_argument("--chars", type=int, default=180)
    updates.add_argument("--paths", action="store_true", help="Show full artifact paths")
    updates.set_defaults(func=_handler(handlers, "updates"))

    watch = sub.add_parser("watch")
    watch.add_argument("job_id", nargs="+")
    watch.add_argument("--interval", type=float, default=2.0)
    watch.add_argument("--limit", type=int, default=20)
    watch.add_argument("--verbose", action="store_true")
    watch.add_argument("--chars", type=int, default=4000)
    watch.add_argument("--no-follow", dest="follow", action="store_false")
    watch.set_defaults(func=_handler(handlers, "watch"), follow=True)

    run_one = sub.add_parser("run-one")
    run_one.add_argument("job_id", nargs="+")
    run_one.add_argument("--fake", action="store_true", help="Use a deterministic fake model response")
    run_one.set_defaults(func=_handler(handlers, "run_one"))

    work = sub.add_parser("work")
    work.add_argument("job_id", nargs="*")
    work.add_argument("--steps", type=int, default=5)
    work.add_argument("--poll-seconds", type=float, default=0.5)
    work.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
    work.add_argument("--verbose", action="store_true", help="Print step inputs and outputs")
    work.add_argument("--dashboard", action="store_true", help="Render a dashboard snapshot after each step")
    work.add_argument("--limit", type=int, default=12)
    work.add_argument("--chars", type=int, default=4000)
    work.add_argument("--continue-on-error", action="store_true")
    work.set_defaults(func=_handler(handlers, "work"))

    run = sub.add_parser("run")
    run.add_argument("job_id", nargs="*")
    run.add_argument("--poll-seconds", type=float, default=0.0)
    run.add_argument("--interval", type=float, default=2.0)
    run.add_argument("--limit", type=int, default=20)
    run.add_argument("--chars", type=int, default=180)
    run.add_argument("--verbose", action="store_true")
    run.add_argument("--paths", action="store_true")
    run.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
    run.add_argument("--quiet", action="store_true", help="Write fewer daemon log lines")
    run.add_argument("--log-file")
    run.add_argument("--no-follow", action="store_true", help="Start daemon and return without tailing activity")
    run.set_defaults(func=_handler(handlers, "run"))

    digest = sub.add_parser("digest")
    digest.add_argument("job_id", nargs="+")
    digest.set_defaults(func=_handler(handlers, "digest"))

    daily_digest = sub.add_parser("daily-digest")
    daily_digest.add_argument("--day", help="YYYY-MM-DD. Defaults to today.")
    daily_digest.set_defaults(func=_handler(handlers, "daily_digest"))

    daemon = sub.add_parser("daemon")
    daemon.add_argument("--once", action="store_true", help="Run at most one job step and exit")
    daemon.add_argument("--fake", action="store_true", help="Use deterministic fake model responses")
    daemon.add_argument("--poll-seconds", type=float, default=0.0)
    daemon.add_argument("--quiet", action="store_true", help="Do not print foreground progress lines")
    daemon.add_argument("--verbose", action="store_true", help="Print model-visible job state and step results")
    daemon.set_defaults(func=_handler(handlers, "daemon"))

    doctor = sub.add_parser("doctor")
    doctor.add_argument("--check-model", action="store_true", help="Also call the local model /models endpoint")
    doctor.set_defaults(func=_handler(handlers, "doctor"))

    return parser
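The parser above leans on two argparse behaviors that are easy to misread: `watch` defines `--no-follow` with `dest="follow"` and `action="store_false"`, so following is on unless the flag is passed, and `logs` registers aliases that resolve to the same subparser. The sketch below reproduces only those two patterns in a standalone parser. The `demo` program name, the `command` dest, and the `build_demo_parser` helper are assumptions made for illustration and are not part of `nipux_cli`.

```python
import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real parser; only two subcommands are modeled.
    parser = argparse.ArgumentParser(prog="demo")
    sub = parser.add_subparsers(dest="command")

    watch = sub.add_parser("watch")
    watch.add_argument("job_id", nargs="+")  # one or more words are required
    watch.add_argument("--no-follow", dest="follow", action="store_false")
    watch.set_defaults(follow=True)

    logs = sub.add_parser("logs", aliases=["outputs", "output"])
    logs.add_argument("job_id", nargs="*")  # zero or more words
    logs.add_argument("--limit", type=int, default=25)
    return parser


demo_parser = build_demo_parser()
watching = demo_parser.parse_args(["watch", "nightly", "research"])
detached = demo_parser.parse_args(["watch", "nightly", "--no-follow"])
aliased = demo_parser.parse_args(["outputs", "--limit", "5"])
```

Here `watching.follow` stays `True`, `detached.follow` becomes `False`, and the aliased call parses with an empty `job_id` list, which is how a handler can tell that no job was named.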
nipux_cli/planning.py 384 lines
"""Generic initial planning primitives for long-running jobs."""

from __future__ import annotations

import re
from typing import Any


_PROFILE_TERMS: dict[str, set[str]] = {
    "measured": {
        "accelerate",
        "benchmark",
        "compare",
        "decrease",
        "faster",
        "improve",
        "increase",
        "latency",
        "measure",
        "metric",
        "optimize",
        "performance",
        "reduce",
        "score",
        "speed",
        "test",
        "throughput",
    },
    "deliverable": {
        "article",
        "artifact",
        "checklist",
        "create",
        "deck",
        "doc",
        "document",
        "draft",
        "file",
        "generate",
        "guide",
        "manual",
        "memo",
        "outline",
        "paper",
        "produce",
        "presentation",
        "report",
        "spec",
        "template",
        "write",
    },
    "monitor": {
        "alert",
        "check",
        "observe",
        "periodic",
        "reporting",
        "track",
        "watch",
        "monitor",
    },
    "implementation": {
        "automate",
        "build",
        "change",
        "code",
        "debug",
        "deploy",
        "fix",
        "implement",
        "install",
        "repair",
        "run",
        "setup",
    },
    "research": {
        "analyze",
        "compare",
        "explore",
        "find",
        "investigate",
        "learn",
        "map",
        "research",
        "review",
        "summarize",
        "survey",
    },
}
_PROFILE_PRIORITY = {
    "measured": 0,
    "monitor": 1,
    "implementation": 2,
    "deliverable": 3,
    "research": 4,
}


def objective_profiles(objective: str) -> list[str]:
    """Infer generic work profiles from an objective without binding to a domain."""

    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", objective.lower()))
    scores: list[tuple[int, str]] = []
    for profile, terms in _PROFILE_TERMS.items():
        score = len(tokens & terms)
        if score:
            scores.append((score, profile))
    if not scores:
        return ["general"]
    scores.sort(key=lambda item: (-item[0], _PROFILE_PRIORITY.get(item[1], 99), item[1]))
    profiles = [profile for _score, profile in scores[:2]]
    return profiles or ["general"]


def initial_plan_for_objective(objective: str) -> dict[str, Any]:
    objective_text = " ".join(objective.split())
    profiles = objective_profiles(objective_text)
    tasks = _initial_tasks_for_profiles(profiles)
    questions = _initial_questions_for_profiles(profiles)
    return {
        "status": "needs_operator_review",
        "summary": _initial_summary_for_profiles(profiles),
        "profile": profiles[0],
        "profiles": profiles,
        "tasks": tasks,
        "questions": questions,
        "objective": objective_text,
    }


def initial_task_contract(task_title: str) -> dict[str, str]:
    lowered = task_title.lower()
    if any(word in lowered for word in ("baseline", "benchmark", "compare", "experiment", "measure", "metric", "test")):
        return {
            "output_contract": "experiment",
            "acceptance_criteria": "A baseline, result, comparison, or explicit blocked measurement is recorded.",
            "evidence_needed": "Experiment record with metric, environment or inputs, result direction, and next action.",
            "stall_behavior": "Record why measurement is blocked and create the smallest follow-up task that can obtain it.",
        }
    if any(
        word in lowered
        for word in (
            "article",
            "checklist",
            "draft",
            "deliverable",
            "document",
            "file",
            "generate",
            "guide",
            "manual",
            "paper",
            "produce",
            "report",
            "template",
            "write",
        )
    ):
        return {
            "output_contract": "report",
            "acceptance_criteria": "A durable draft, report, or deliverable section is saved with its evidence status.",
            "evidence_needed": "Saved output plus cited evidence, assumptions, gaps, or review notes.",
            "stall_behavior": "Save the partial output, record the gap, and create the next evidence or revision task.",
        }
    if any(word in lowered for word in ("validate", "review", "decide", "criteria", "constraint", "success")):
        return {
            "output_contract": "decision",
            "acceptance_criteria": "The decision, validation result, or success criteria are explicit.",
            "evidence_needed": "Operator context, durable notes, milestone validation, or task/roadmap updates.",
            "stall_behavior": "Ask for the missing constraint or record a reversible assumption and continue.",
        }
    if any(word in lowered for word in ("monitor", "watch", "track", "check", "observe")):
        return {
            "output_contract": "monitor",
            "acceptance_criteria": "A check cadence, signal, state change, or next observation time is recorded.",
            "evidence_needed": "Monitor result, status update, deferred follow-up, or recorded blocker.",
            "stall_behavior": "Defer the job until the next useful check or pivot to a diagnostic task.",
        }
    if any(word in lowered for word in ("act", "apply", "build", "change", "deploy", "fix", "implement", "install", "run")):
        return {
            "output_contract": "action",
            "acceptance_criteria": "The action produces an observable durable change or a clear blocker.",
            "evidence_needed": "Tool result plus file, artifact, ledger, task, roadmap, or experiment update.",
            "stall_behavior": "Record the blocker and open a smaller follow-up action.",
        }
    if "clarify" in lowered or "criteria" in lowered or "constraint" in lowered:
        return {
            "output_contract": "decision",
            "acceptance_criteria": "Success criteria, constraints, and first branches are explicit.",
            "evidence_needed": "Operator context, durable notes, or an updated roadmap/task queue.",
            "stall_behavior": "Ask for the missing constraint or record a decision with the best current assumption.",
        }
    if "map" in lowered or "research" in lowered or "branch" in lowered or "explore" in lowered or "source" in lowered:
        return {
            "output_contract": "research",
            "acceptance_criteria": "At least one viable branch is selected and low-value branches are avoided.",
            "evidence_needed": "Source notes, branch rationale, source ledger entries, or saved research output.",
            "stall_behavior": "Record a low-yield lesson and pivot to a different branch.",
        }
    if "collect" in lowered or "evidence" in lowered or "save" in lowered or "output" in lowered:
        return {
            "output_contract": "artifact",
            "acceptance_criteria": "A durable output is saved and linked to the task or ledger.",
            "evidence_needed": "Artifact, file output, finding record, source record, or experiment record.",
            "stall_behavior": "Record what evidence is missing and create the next evidence-producing task.",
        }
    if "reflect" in lowered or "memory" in lowered or "continue" in lowered:
        return {
            "output_contract": "monitor",
            "acceptance_criteria": "Progress is evaluated from durable deltas and the next branch is chosen.",
            "evidence_needed": "Reflection, lesson, task update, roadmap validation, or experiment comparison.",
            "stall_behavior": "Record a blocker or pivot when no durable delta was produced.",
        }
    return {
        "output_contract": "action",
        "acceptance_criteria": "The task produces an observable durable change.",
        "evidence_needed": "Tool result plus artifact, ledger, task, roadmap, or experiment update.",
        "stall_behavior": "Record a blocker and open a smaller follow-up task.",
    }


def initial_roadmap_for_objective(*, title: str, objective: str) -> dict[str, Any]:
    profiles = objective_profiles(objective)
    execute_contract = _primary_execution_contract(profiles)
    return {
        "title": title,
        "status": "planned",
        "objective": objective,
        "scope": (
            "Initial roadmap generated from the objective and inferred generic work profile "
            f"({', '.join(profiles)}). Refine this as evidence and operator context arrive."
        ),
        "current_milestone": "Clarify and frame the work",
        "validation_contract": (
            "Each milestone needs observable evidence that its acceptance criteria were met, "
            "or a recorded blocker plus follow-up tasks."
        ),
        "milestones": [
            {
                "title": "Clarify and frame the work",
                "status": "planned",
                "priority": 10,
                "goal": "Turn the objective into concrete success criteria and constraints.",
                "acceptance_criteria": "Success criteria and first branches are explicit.",
                "evidence_needed": "Operator context, planning notes, or a recorded task queue.",
                "features": [{"title": "Capture success criteria", "status": "planned", "output_contract": "decision"}],
            },
            {
                "title": "Execute first durable branches",
                "status": "planned",
                "priority": 8,
                "goal": "Produce artifacts, findings, actions, or measurements that advance the objective.",
                "acceptance_criteria": "At least one branch produces durable evidence.",
                "evidence_needed": "Saved outputs, ledger updates, action results, or experiment records.",
                "features": [
                    {
                        "title": "Run the first evidence-producing branch",
                        "status": "planned",
                        "output_contract": execute_contract,
                    }
                ],
            },
            {
                "title": "Validate and continue",
                "status": "planned",
                "priority": 6,
                "goal": "Check results against acceptance criteria and create follow-up work.",
                "acceptance_criteria": "Validation is passed, failed, or blocked with a next action.",
                "evidence_needed": "record_milestone_validation entry and follow-up tasks if needed.",
                "features": [{"title": "Validate the checkpoint", "status": "planned", "output_contract": "decision"}],
            },
        ],
        "metadata": {"phase": "initial_plan"},
    }


def _initial_summary_for_profiles(profiles: list[str]) -> str:
    primary = profiles[0] if profiles else "general"
    if primary == "measured":
        return "I will start by defining the measurable baseline, then iterate on branches that can prove improvement."
    if primary == "deliverable":
        return "I will start by framing the deliverable, collecting evidence, and saving drafts that can be improved."
    if primary == "monitor":
        return "I will start by defining the watched signals, first check, cadence, and durable update format."
    if primary == "implementation":
        return "I will start by inspecting the current state, planning a small action, and validating the result."
    if primary == "research":
        return "I will start by mapping source branches, collecting evidence, and saving concise findings."
    return "I will turn this objective into a durable long-running job before starting tool work."


def _initial_tasks_for_profiles(profiles: list[str]) -> list[str]:
    tasks: list[str] = ["Clarify success criteria, constraints, and review/report cadence."]
    primary = profiles[0] if profiles else "general"
    if primary == "measured":
        tasks.extend(
            [
                "Record the baseline metric and measurement method.",
                "Run the first measurable branch and record an experiment.",
                "Compare the result with the best known baseline and choose the next branch.",
            ]
        )
    elif primary == "deliverable":
        tasks.extend(
            [
                "Map the outline, audience, evidence gaps, and acceptance criteria.",
                "Collect evidence for the first section or deliverable unit.",
                "Save a durable draft or report checkpoint.",
                "Review and revise the latest draft against evidence gaps and acceptance criteria.",
            ]
        )
    elif primary == "monitor":
        tasks.extend(
            [
                "Define watched signals, check cadence, and alert conditions.",
                "Run the first status check and save the observation.",
                "Defer or continue based on the next useful check time.",
            ]
        )
    elif primary == "implementation":
        tasks.extend(
            [
                "Inspect current state and identify the smallest safe action.",
                "Apply one change or execute one action with observable output.",
                "Validate the result and record any follow-up branch.",
            ]
        )
    else:
        tasks.extend(
            [
                "Map the first research or execution branches.",
                "Collect evidence and save outputs as files.",
                "Reflect on what worked, update memory, and continue with the next branch.",
            ]
        )
    return tasks


def _initial_questions_for_profiles(profiles: list[str]) -> list[str]:
    questions = [
        "What result would make this job successful?",
        "Are there constraints, risks, or approaches I should avoid?",
    ]
    primary = profiles[0] if profiles else "general"
    if primary == "measured":
        questions.insert(1, "What metric should be treated as the primary measure of progress?")
    elif primary == "deliverable":
        questions.insert(1, "Who is the audience, and what quality bar should the deliverable meet?")
    elif primary == "monitor":
        questions.insert(1, "How often should I check, and what change should trigger a report?")
    elif primary == "implementation":
        questions.insert(1, "Which environment or files are in scope, and what requires approval?")
    else:
        questions.insert(1, "Which sources, artifacts, or signals should I trust most?")
    questions.append("Should this run aggressively in the background or wait for review between branches?")
    return questions


def _primary_execution_contract(profiles: list[str]) -> str:
    if "measured" in profiles:
        return "experiment"
    if "deliverable" in profiles:
        return "report"
    if "monitor" in profiles:
        return "monitor"
    if "implementation" in profiles:
        return "action"
    if "research" in profiles:
        return "research"
    return "artifact"


def format_initial_plan(plan: dict[str, Any]) -> str:
    tasks = plan.get("tasks") if isinstance(plan.get("tasks"), list) else []
    questions = plan.get("questions") if isinstance(plan.get("questions"), list) else []
    lines = [str(plan.get("summary") or "Initial plan created.")]
    if tasks:
        lines.append("Plan:")
        lines.extend(f"- {task}" for task in tasks)
    if questions:
        lines.append("Questions:")
        lines.extend(f"- {question}" for question in questions)
    lines.append("Reply with answers, or use the right-side Run control when this plan is good enough to start.")
    return "\n".join(lines)
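`objective_profiles` scores each profile by how many of its terms appear among the objective's tokens, then sorts by score (descending), by `_PROFILE_PRIORITY`, and finally by name, keeping at most two profiles. The sketch below mirrors that ordering with deliberately trimmed term sets so the tie-break is easy to follow. `TERMS`, `PRIORITY`, and `infer_profiles` are illustrative names and reduced data, not the module's own tables.

```python
import re

# Trimmed, illustrative term sets; the real module carries many more terms.
TERMS = {
    "measured": {"benchmark", "latency", "reduce"},
    "deliverable": {"report", "write"},
    "monitor": {"track", "watch"},
    "implementation": {"build", "deploy", "fix"},
}
PRIORITY = {"measured": 0, "monitor": 1, "implementation": 2, "deliverable": 3}


def infer_profiles(objective: str, limit: int = 2) -> list[str]:
    # Tokens are lowercase words of two or more characters, as in the regex above.
    tokens = set(re.findall(r"[a-z][a-z0-9_-]+", objective.lower()))
    scored = [(len(tokens & terms), name) for name, terms in TERMS.items()]
    scored = [item for item in scored if item[0]]
    if not scored:
        return ["general"]
    # Highest score first; equal scores fall back to priority, then to the name.
    scored.sort(key=lambda item: (-item[0], PRIORITY.get(item[1], 99), item[1]))
    return [name for _score, name in scored[:limit]]


by_score = infer_profiles("Benchmark and reduce API latency, then write a report")
by_priority = infer_profiles("Watch the build")
fallback = infer_profiles("Hello there")
```

With these inputs `by_score` is `["measured", "deliverable"]` (three matches beat two), `by_priority` is `["monitor", "implementation"]` (a one-to-one tie resolved by priority), and `fallback` is `["general"]`.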
nipux_cli/progress.py 213 lines
"""Generic progress summaries for long-running jobs."""

from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProgressCheckpoint:
    message: str
    category: str
    counts: dict[str, int]
    deltas: dict[str, int]
    updates: dict[str, int]
    resolutions: dict[str, int]
    recent: str


LEDGER_KEYS = ("findings", "sources", "tasks", "experiments", "lessons", "milestones")


def build_progress_checkpoint(
    metadata: dict[str, Any],
    *,
    previous_counts: dict[str, Any] | None = None,
    step_no: int,
    tool_name: str | None,
    artifact_id: str = "",
    is_finding_output: bool = False,
) -> ProgressCheckpoint:
    """Create the operator-facing checkpoint text from durable ledger deltas."""
    counts = ledger_counts(metadata)
    previous = previous_counts or {}
    deltas = {key: counts[key] - _as_int(previous.get(key)) for key in LEDGER_KEYS}
    updates = ledger_update_counts(metadata, since=str(metadata.get("last_checkpoint_at") or ""))
    resolutions = ledger_resolution_counts(metadata, since=str(metadata.get("last_checkpoint_at") or ""))
    recent = recent_progress_bits(metadata)
    if is_finding_output:
        message = (
            f"Saved output {artifact_id}; ledgers now have {counts['findings']} findings, "
            f"{counts['sources']} sources, {counts['tasks']} tasks, and {counts['experiments']} experiments."
        )
        category = "finding"
    else:
        changed_parts = [_count_phrase(value, key, prefix="+") for key, value in deltas.items() if value > 0]
        changed_parts.extend(
            _count_phrase(value, key, prefix="~", suffix="updated") for key, value in updates.items() if value > 0
        )
        changed_parts.extend(
            _count_phrase(value, key, suffix="resolved") for key, value in resolutions.items() if value > 0
        )
        changed = ", ".join(changed_parts)
        made_progress = bool(changed)
        if not changed:
            changed = "no new durable ledger entries"
        message = (
            f"Checkpoint step #{step_no}: {changed}. Totals: {counts['findings']} findings, "
            f"{counts['sources']} sources, {counts['tasks']} tasks, {counts['experiments']} experiments, "
            f"{counts['lessons']} lessons."
        )
        category = "progress" if made_progress else "activity"
    if recent:
        message = f"{message} Recent: {recent}."
    return ProgressCheckpoint(
        message=message,
        category=category,
        counts=counts,
        deltas=deltas,
        updates=updates,
        resolutions=resolutions,
        recent=recent,
    )


def ledger_counts(metadata: dict[str, Any]) -> dict[str, int]:
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    return {
        "findings": len(_metadata_list(metadata, "finding_ledger")),
        "sources": len(_metadata_list(metadata, "source_ledger")),
        "tasks": len(_metadata_list(metadata, "task_queue")),
        "experiments": len(_metadata_list(metadata, "experiment_ledger")),
        "lessons": len(_metadata_list(metadata, "lessons")),
        "milestones": len(milestones),
    }


def ledger_update_counts(metadata: dict[str, Any], *, since: str = "") -> dict[str, int]:
    """Count durable ledger updates that do not increase ledger size."""
    counts = {key: 0 for key in LEDGER_KEYS}
    record_map = {
        "findings": "last_finding_record",
        "sources": "last_source_record",
        "tasks": "last_task_record",
        "experiments": "last_experiment_record",
    }
    for key, metadata_key in record_map.items():
        record = metadata.get(metadata_key)
        if _updated_existing_record(record, since=since):
            counts[key] += 1
    roadmap = metadata.get("last_roadmap_record")
    if isinstance(roadmap, dict) and _record_after_checkpoint(roadmap, since=since):
        updated = _as_int(roadmap.get("updated_milestones")) + _as_int(roadmap.get("updated_features"))
        if roadmap.get("roadmap_updated"):
            updated += 1
        added = _as_int(roadmap.get("added_milestones")) + _as_int(roadmap.get("added_features"))
        if updated > 0 and added <= 0:
            counts["milestones"] += 1
    validation = metadata.get("last_milestone_validation")
    if isinstance(validation, dict) and _record_after_checkpoint(validation, since=since):
        counts["milestones"] += 1
    return counts


def ledger_resolution_counts(metadata: dict[str, Any], *, since: str = "") -> dict[str, int]:
    """Count durable branch resolutions so task updates do not look like empty churn."""
    counts = {key: 0 for key in LEDGER_KEYS}
    task = metadata.get("last_task_record")
    if _updated_existing_record(task, since=since):
        status = str(task.get("status") or "").lower() if isinstance(task, dict) else ""
        if status in {"done", "blocked", "skipped"} and (task.get("result") or task.get("evidence_needed")):
            counts["tasks"] += 1
    experiment = metadata.get("last_experiment_record")
    if _updated_existing_record(experiment, since=since):
        status = str(experiment.get("status") or "").lower() if isinstance(experiment, dict) else ""
        if status in {"measured", "failed", "blocked", "skipped"} or experiment.get("metric_value") is not None:
            counts["experiments"] += 1
    validation = metadata.get("last_milestone_validation")
    if isinstance(validation, dict) and _record_after_checkpoint(validation, since=since):
        counts["milestones"] += 1
    return counts


def recent_progress_bits(metadata: dict[str, Any]) -> str:
    bits: list[str] = []
    findings = _metadata_list(metadata, "finding_ledger")
    if findings:
        finding = findings[-1]
        bits.append(f"finding={_clip_text(str(finding.get('name') or finding.get('title') or 'finding'), 80)}")
    active_tasks = [
        task
        for task in _metadata_list(metadata, "task_queue")
        if str(task.get("status") or "open").lower() in {"active", "open", "blocked"}
    ]
    if active_tasks:
        task = sorted(active_tasks, key=lambda entry: -_as_int(entry.get("priority")))[0]
        bits.append(f"task={_clip_text(str(task.get('title') or 'task'), 80)}")
    measured = [
        experiment
        for experiment in _metadata_list(metadata, "experiment_ledger")
        if experiment.get("metric_value") is not None
    ]
    if measured:
        experiment = measured[-1]
        metric = f"{experiment.get('metric_name') or 'metric'}={experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
        bits.append(f"measurement={_clip_text(metric, 80)}")
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    active_milestones = [
        milestone
        for milestone in milestones
        if isinstance(milestone, dict)
        and str(milestone.get("status") or "planned").lower() in {"active", "validating", "blocked"}
    ]
    if active_milestones:
        bits.append(f"milestone={_clip_text(str(active_milestones[-1].get('title') or 'milestone'), 80)}")
    return "; ".join(bits)


def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
    value = metadata.get(key)
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


def _clip_text(value: str, limit: int) -> str:
    text = " ".join(value.split())
    if len(text) <= limit:
        return text
    return text[: max(0, limit - 1)].rstrip() + "..."


def _updated_existing_record(record: Any, *, since: str) -> bool:
    return (
        isinstance(record, dict)
        and record.get("created") is False
        and record.get("substantive_update") is not False
        and _record_after_checkpoint(record, since=since)
    )


def _record_after_checkpoint(record: dict[str, Any], *, since: str) -> bool:
    if not since:
        return True
    updated_at = str(record.get("updated_at") or record.get("validated_at") or record.get("last_seen") or record.get("at") or "")
    return bool(updated_at and updated_at > since)


def _count_phrase(value: int, key: str, *, prefix: str = "", suffix: str = "") -> str:
    label = key[:-1] if value == 1 and key.endswith("s") else key
    bits = [f"{prefix}{value} {label}"]
    if suffix:
        bits.append(suffix)
    return " ".join(bits)


def _as_int(value: Any) -> int:
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0
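The checkpoint message is built from per-ledger deltas: current counts minus the counts stored at the previous checkpoint, with only positive deltas reported and singular labels used for a count of one. The sketch below isolates that arithmetic and phrasing. `summarize_deltas`, the shortened `KEYS` tuple, and the sample counts are assumptions made for illustration; the real function also folds in update and resolution counts.

```python
KEYS = ("findings", "sources", "tasks")  # shortened, illustrative key list


def count_phrase(value: int, key: str, *, prefix: str = "", suffix: str = "") -> str:
    # Drop the trailing "s" for a count of exactly one ("+1 finding").
    label = key[:-1] if value == 1 and key.endswith("s") else key
    phrase = f"{prefix}{value} {label}"
    return f"{phrase} {suffix}" if suffix else phrase


def summarize_deltas(previous: dict[str, int], current: dict[str, int]) -> str:
    deltas = {key: current.get(key, 0) - previous.get(key, 0) for key in KEYS}
    parts = [count_phrase(value, key, prefix="+") for key, value in deltas.items() if value > 0]
    return ", ".join(parts) or "no new durable ledger entries"


before = {"findings": 2, "sources": 5, "tasks": 3}
after = {"findings": 3, "sources": 7, "tasks": 3}
grew = summarize_deltas(before, after)
idle = summarize_deltas(after, after)
```

Here `grew` reads `+1 finding, +2 sources` and `idle` falls back to `no new durable ledger entries`, the case the original function categorizes as `activity` rather than `progress`.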
nipux_cli/provider_errors.py 64 lines
"""Generic model-provider error classification."""

from __future__ import annotations

import json
from typing import Any


PROVIDER_ACTION_MARKERS = (
    "authenticationerror",
    "permissiondeniederror",
    "authentication failed",
    "permission denied",
    "invalid api key",
    "incorrect api key",
    "user not found",
    "key limit exceeded",
    "insufficient_quota",
    "insufficient quota",
    "insufficient credits",
    "billing",
    "payment required",
    "credit limit",
    "quota exceeded",
    "401",
    "403",
)

RATE_LIMIT_MARKERS = (
    "429",
    "rate limit",
    "ratelimit",
    "too many requests",
    "temporarily over capacity",
)

PROVIDER_ACTION_REQUIRED_NOTE = (
    "Model provider requires operator action: authentication, permission, billing, or quota is blocking calls. "
    "Paused this job so the daemon does not repeat failing model requests. Update credentials/model access, then resume."
)


def provider_error_text(error: Any) -> str:
    if isinstance(error, str):
        return error.lower()
    parts = [type(error).__name__, str(error)]
    payload = getattr(error, "payload", None)
    if isinstance(payload, dict) and payload:
        parts.append(json.dumps(payload, ensure_ascii=False, default=str))
    return " ".join(parts).lower()


def provider_action_required(text_or_error: Any) -> bool:
    text = provider_error_text(text_or_error)
    return any(marker in text for marker in PROVIDER_ACTION_MARKERS)


def provider_action_required_note(text_or_error: Any) -> str:
    return PROVIDER_ACTION_REQUIRED_NOTE if provider_action_required(text_or_error) else ""


def provider_rate_limited(text_or_error: Any) -> bool:
    text = provider_error_text(text_or_error)
    return any(marker in text for marker in RATE_LIMIT_MARKERS)
nipux_cli/record_commands.py 542 lines
1"""Read-only CLI commands for job records, ledgers, memory, and usage."""
2
3from __future__ import annotations
4
5import json
6from dataclasses import dataclass
7from pathlib import Path
8from typing import Any, Callable
9
10from nipux_cli.artifacts import ArtifactStore, sha256_text
11from nipux_cli.cli_render import job_ref_text as default_job_ref_text
12from nipux_cli.cli_render import json_default, rule
13from nipux_cli.daemon import daemon_lock_status
14from nipux_cli.memory_graph import memory_graph_from_job
15from nipux_cli.memory_graph_view import render_memory_graph_html
16from nipux_cli.tui_status import active_operator_messages, worker_label
17from nipux_cli.tui_style import _one_line
18from nipux_cli.usage import format_usage_report
19
20
21@dataclass(frozen=True)
22class RecordCommandDeps:
23 db_factory: Callable[[], tuple[Any, Any]]
24 resolve_job_id: Callable[[Any, Any], str | None]
25 job_ref_text: Callable[[Any], str] = default_job_ref_text
26
27
28def cmd_findings_impl(args: Any, deps: RecordCommandDeps) -> None:
29 db, _ = deps.db_factory()
30 try:
31 job_id = _resolve_or_print(db, args, deps)
32 if not job_id:
33 return
34 job = db.get_job(job_id)
35 findings = _metadata_records(job, "finding_ledger")
36 if args.json:
37 print(json.dumps(findings, ensure_ascii=False, indent=2, default=json_default))
38 return
39 print(f"findings {job['title']} | {len(findings)} unique")
40 print(rule("="))
41 if not findings:
42 print("none yet")
43 return
44 ranked = sorted(findings, key=lambda finding: float(finding.get("score") or 0), reverse=True)
45 for index, finding in enumerate(ranked[: args.limit], start=1):
46 score = finding.get("score")
47 score_text = f" score={score:g}" if isinstance(score, (int, float)) else ""
48 print(f"{index:>2}. {_one_line(finding.get('name') or 'unknown', 54)}{score_text}")
49 details = " | ".join(
50 value
51 for value in [
52 str(finding.get("location") or "").strip(),
53 str(finding.get("category") or "").strip(),
54 str(finding.get("status") or "").strip(),
55 ]
56 if value
57 )
58 if details:
59 print(f" {details}")
60 if finding.get("url") or finding.get("source_url"):
61 print(f" {finding.get('url') or finding.get('source_url')}")
62 if finding.get("reason"):
63 print(f" {_one_line(finding['reason'], args.chars)}")
64 finally:
65 db.close()
66
67
68def cmd_tasks_impl(args: Any, deps: RecordCommandDeps) -> None:
69 db, _ = deps.db_factory()
70 try:
71 job_id = _resolve_or_print(db, args, deps)
72 if not job_id:
73 return
74 job = db.get_job(job_id)
75 tasks = _metadata_records(job, "task_queue")
76 if args.status:
77 wanted = {status.strip().lower() for status in args.status}
78 tasks = [task for task in tasks if str(task.get("status") or "open").lower() in wanted]
79 if args.json:
80 print(json.dumps(tasks, ensure_ascii=False, indent=2, default=json_default))
81 return
82 status_order = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
83 ranked = sorted(
84 tasks,
85 key=lambda task: (
86 status_order.get(str(task.get("status") or "open"), 9),
87 -int(task.get("priority") or 0),
88 str(task.get("title") or ""),
89 ),
90 )
91 print(f"tasks {job['title']} | {len(ranked)} tracked")
92 print(rule("="))
93 if not ranked:
94 print("none yet")
95 return
96 for index, task in enumerate(ranked[: args.limit], start=1):
97 status = str(task.get("status") or "open")
98 priority = int(task.get("priority") or 0)
99 print(f"{index:>2}. {status:<7} p={priority:<3} {_one_line(task.get('title') or 'untitled', 54)}")
100 details = " | ".join(
101 value
102 for value in [
103 f"contract={task.get('output_contract')}" if task.get("output_contract") else "",
104 f"accept={task.get('acceptance_criteria')}" if task.get("acceptance_criteria") else "",
105 f"evidence={task.get('evidence_needed')}" if task.get("evidence_needed") else "",
106 f"stall={task.get('stall_behavior')}" if task.get("stall_behavior") else "",
107 str(task.get("goal") or "").strip(),
108 str(task.get("source_hint") or "").strip(),
109 str(task.get("result") or "").strip(),
110 ]
111 if value
112 )
113 if details:
114 print(f" {_one_line(details, args.chars)}")
115 finally:
116 db.close()
117
118
119def cmd_roadmap_impl(args: Any, deps: RecordCommandDeps) -> None:
120 db, _ = deps.db_factory()
121 try:
122 job_id = _resolve_or_print(db, args, deps)
123 if not job_id:
124 return
125 job = db.get_job(job_id)
126 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
127 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
128 if args.json:
129 print(json.dumps(roadmap, ensure_ascii=False, indent=2, default=json_default))
130 return
131 print(f"roadmap {job['title']}")
132 print(rule("="))
133 if not roadmap:
134 print("none yet")
135 print("the worker can create one with record_roadmap when broad work needs milestones")
136 return
137 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
138 print(f"title: {roadmap.get('title') or 'Roadmap'}")
139 print(f"status: {roadmap.get('status') or 'planned'} | milestones: {len(milestones)}")
140 if roadmap.get("current_milestone"):
141 print(f"current: {_one_line(roadmap.get('current_milestone') or '', args.chars)}")
142 if roadmap.get("scope"):
143 print(f"scope: {_one_line(roadmap.get('scope') or '', args.chars)}")
144 if roadmap.get("validation_contract"):
145 print(f"validation: {_one_line(roadmap.get('validation_contract') or '', args.chars)}")
146 _print_milestones(milestones, limit=args.limit, features=args.features, chars=args.chars)
147 finally:
148 db.close()
149
150
151def cmd_experiments_impl(args: Any, deps: RecordCommandDeps) -> None:
152 db, _ = deps.db_factory()
153 try:
154 job_id = _resolve_or_print(db, args, deps)
155 if not job_id:
156 return
157 job = db.get_job(job_id)
158 experiments = _metadata_records(job, "experiment_ledger")
159 if args.status:
160 wanted = {status.strip().lower() for status in args.status}
161 experiments = [
162 experiment for experiment in experiments if str(experiment.get("status") or "planned").lower() in wanted
163 ]
164 if args.json:
165 print(json.dumps(experiments, ensure_ascii=False, indent=2, default=json_default))
166 return
167 status_order = {"running": 0, "planned": 1, "measured": 2, "blocked": 3, "failed": 4, "skipped": 5}
168 ranked = sorted(
169 experiments,
170 key=lambda experiment: (
171 not bool(experiment.get("best_observed")),
172 status_order.get(str(experiment.get("status") or "planned"), 9),
173 str(experiment.get("updated_at") or experiment.get("created_at") or ""),
174 ),
175 )
176 print(f"experiments {job['title']} | {len(ranked)} tracked")
177 print(rule("="))
178 if not ranked:
179 print("none yet")
180 return
181 for index, experiment in enumerate(ranked[: args.limit], start=1):
182 status = str(experiment.get("status") or "planned")
183 best = " *best*" if experiment.get("best_observed") else ""
184 metric = ""
185 if experiment.get("metric_value") is not None:
186 metric = (
187 f" {experiment.get('metric_name') or 'metric'}="
188 f"{experiment.get('metric_value')}{experiment.get('metric_unit') or ''}"
189 )
190 print(f"{index:>2}. {status:<8} {_one_line(experiment.get('title') or 'experiment', 54)}{metric}{best}")
191 details = " | ".join(
192 value
193 for value in [
194 str(experiment.get("result") or "").strip(),
195 f"next: {experiment.get('next_action')}" if experiment.get("next_action") else "",
196 f"delta: {experiment.get('delta_from_previous_best')}"
197 if experiment.get("delta_from_previous_best") is not None
198 else "",
199 ]
200 if value
201 )
202 if details:
203 print(f" {_one_line(details, args.chars)}")
204 finally:
205 db.close()
206
207
208def cmd_sources_impl(args: Any, deps: RecordCommandDeps) -> None:
209 db, _ = deps.db_factory()
210 try:
211 job_id = _resolve_or_print(db, args, deps)
212 if not job_id:
213 return
214 job = db.get_job(job_id)
215 sources = _metadata_records(job, "source_ledger")
216 if args.json:
217 print(json.dumps(sources, ensure_ascii=False, indent=2, default=json_default))
218 return
219 ranked = sorted(
220 sources,
221 key=lambda source: (float(source.get("usefulness_score") or 0), int(source.get("yield_count") or 0)),
222 reverse=True,
223 )
224 print(f"sources {job['title']} | {len(sources)} scored")
225 print(rule("="))
226 if not ranked:
227 print("none yet")
228 return
229 for index, source in enumerate(ranked[: args.limit], start=1):
230 score = float(source.get("usefulness_score") or 0)
231 print(
232 f"{index:>2}. {_one_line(source.get('source') or 'unknown', 58)} "
233 f"score={score:g} findings={source.get('yield_count') or 0} fails={source.get('fail_count') or 0}"
234 )
235 detail = " | ".join(
236 value
237 for value in [
238 str(source.get("source_type") or "").strip(),
239 str(source.get("last_outcome") or "").strip(),
240 ]
241 if value
242 )
243 if detail:
244 print(f" {_one_line(detail, args.chars)}")
245 warnings = source.get("warnings") if isinstance(source.get("warnings"), list) else []
246 if warnings:
247 print(f" warnings: {_one_line(', '.join(str(item) for item in warnings[-3:]), args.chars)}")
248 finally:
249 db.close()
250
251
252def cmd_memory_impl(args: Any, deps: RecordCommandDeps) -> None:
253 db, config = deps.db_factory()
254 try:
255 job_id = _resolve_or_print(db, args, deps)
256 if not job_id:
257 return
258 job = db.get_job(job_id)
259 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
260 lessons = _metadata_records(job, "lessons")
261 reflections = _metadata_records(job, "reflections")
262 compact = db.list_memory(job_id)
263 active_operator = active_operator_messages(metadata)
264 graph = memory_graph_from_job(job)
265 if bool(getattr(args, "json", False)):
266 print(json.dumps(graph, ensure_ascii=False, indent=2, default=json_default))
267 return
268 if bool(getattr(args, "graph", False)):
269 _write_memory_graph_view(db=db, config=config, job=job, graph=graph, output=getattr(args, "output", None))
270 return
271 pending_measurement = (
272 metadata.get("pending_measurement_obligation")
273 if isinstance(metadata.get("pending_measurement_obligation"), dict)
274 else {}
275 )
276 print(f"memory {job['title']}")
277 print(rule("="))
278 print(
279 f"lessons={len(lessons)} reflections={len(reflections)} compact_entries={len(compact)} "
280 f"graph_nodes={len(graph['nodes'])} graph_edges={len(graph['edges'])}"
281 )
282 _print_memory_sections(
283 active_operator=active_operator,
284 pending_measurement=pending_measurement,
285 graph=graph,
286 reflections=reflections,
287 lessons=lessons,
288 compact=compact,
289 limit=args.limit,
290 chars=args.chars,
291 )
292 finally:
293 db.close()
294
295
296def _write_memory_graph_view(
297 *,
298 db: Any,
299 config: Any,
300 job: dict[str, Any],
301 graph: dict[str, Any],
302 output: str | None,
303) -> None:
304 html = render_memory_graph_html(job)
305 summary = f"Clickable memory graph with {len(graph['nodes'])} nodes and {len(graph['edges'])} links."
306 metadata = {
307 "memory_graph": True,
308 "node_count": len(graph["nodes"]),
309 "edge_count": len(graph["edges"]),
310 }
311 if output:
312 path = Path(output).expanduser()
313 path.parent.mkdir(parents=True, exist_ok=True)
314 path.write_text(html, encoding="utf-8")
315 artifact_id = db.add_artifact(
316 job_id=str(job["id"]),
317 path=path,
318 sha256=sha256_text(html),
319 artifact_type="html",
320 title="Memory Graph",
321 summary=summary,
322 metadata=metadata,
323 )
324 print(f"memory graph written: {path}")
325 print(f"artifact: {artifact_id}")
326 return
327 store = ArtifactStore(config.runtime.home, db)
328 stored = store.write_text(
329 job_id=str(job["id"]),
330 content=html,
331 title="Memory Graph",
332 summary=summary,
333 artifact_type="html",
334 metadata=metadata,
335 )
336 print(f"memory graph written: {stored.path}")
337 print(f"artifact: {stored.id}")
338
339
340def cmd_metrics_impl(args: Any, deps: RecordCommandDeps) -> None:
341 db, config = deps.db_factory()
342 try:
343 job_id = _resolve_or_print(db, args, deps)
344 if not job_id:
345 return
346 job = db.get_job(job_id)
347 steps = db.list_steps(job_id=job_id)
348 artifacts = db.list_artifacts(job_id, limit=1000)
349 findings = _metadata_records(job, "finding_ledger")
350 sources = _metadata_records(job, "source_ledger")
351 tasks = _metadata_records(job, "task_queue")
352 experiments = _metadata_records(job, "experiment_ledger")
353 lessons = _metadata_records(job, "lessons")
354 reflections = _metadata_records(job, "reflections")
355 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
356 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
357 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
358 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
359 finding_batches = [
360 artifact
361 for artifact in artifacts
362 if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()
363 ]
364 blocked = [step for step in steps if step.get("status") == "blocked"]
365 failed = [step for step in steps if step.get("status") == "failed"]
366 print(f"metrics {job['title']}")
367 print(rule("="))
368 print(f"daemon: {'running' if daemon['running'] else 'stopped'} | worker: {worker_label(job, bool(daemon['running']))}")
369 print(f"steps: {_step_count(steps)} | failed: {len(failed)} | blocked/recovered: {len(blocked)}")
370 print(f"artifacts: {len(artifacts)} | finding_batches: {len(finding_batches)}")
371 print(
372 f"findings: {len(findings)} | sources: {len(sources)} | tasks: {len(tasks)} | "
373 f"milestones: {len(milestones)} | experiments: {len(experiments)} | "
374 f"lessons: {len(lessons)} | reflections: {len(reflections)}"
375 )
376 _print_best_records(sources=sources, findings=findings, experiments=experiments, chars=args.chars)
377 finally:
378 db.close()
379
380
381def cmd_usage_impl(args: Any, deps: RecordCommandDeps) -> None:
382 db, config = deps.db_factory()
383 try:
384 job_id = _resolve_or_print(db, args, deps)
385 if not job_id:
386 return
387 job = db.get_job(job_id)
388 usage = db.job_token_usage(job_id)
389 usage["input_cost_per_million"] = config.model.input_cost_per_million
390 usage["output_cost_per_million"] = config.model.output_cost_per_million
391 usage["max_job_cost_usd"] = config.runtime.max_job_cost_usd
392 if args.json:
393 print(json.dumps(usage, ensure_ascii=False, indent=2, sort_keys=True))
394 return
395 lines = format_usage_report(
396 title=str(job.get("title") or job_id),
397 usage=usage,
398 context_length=int(config.model.context_length or 0),
399 model=str(config.model.model),
400 base_url=str(config.model.base_url),
401 )
402 print("\n".join(lines))
403 finally:
404 db.close()
405
406
407def _resolve_or_print(db: Any, args: Any, deps: RecordCommandDeps) -> str | None:
408 job_id = deps.resolve_job_id(db, args.job_id)
409 if job_id:
410 return job_id
411 ref = deps.job_ref_text(args.job_id)
412 print(f"No job matched: {ref}" if ref else "No jobs found.")
413 return None
414
415
416def _metadata_records(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
417 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
418 values = metadata.get(key)
419 if not isinstance(values, list):
420 return []
421 return [value for value in values if isinstance(value, dict)]
422
423
424def _print_milestones(milestones: list[Any], *, limit: int, features: int, chars: int) -> None:
425 if not milestones:
426 return
427 print()
428 status_order = {"active": 0, "validating": 1, "planned": 2, "blocked": 3, "done": 4, "skipped": 5}
429 ranked = sorted(
430 [milestone for milestone in milestones if isinstance(milestone, dict)],
431 key=lambda milestone: (
432 status_order.get(str(milestone.get("status") or "planned"), 9),
433 -int(milestone.get("priority") or 0),
434 str(milestone.get("title") or ""),
435 ),
436 )
437 for index, milestone in enumerate(ranked[:limit], start=1):
438 status = str(milestone.get("status") or "planned")
439 validation = str(milestone.get("validation_status") or "not_started")
440 milestone_features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
441 open_features = sum(
442 1 for feature in milestone_features
443 if isinstance(feature, dict) and str(feature.get("status") or "planned") in {"planned", "active"}
444 )
445 print(
446 f"{index:>2}. {status:<10} validation={validation:<11} "
447 f"p={int(milestone.get('priority') or 0):<3} {_one_line(milestone.get('title') or 'milestone', 54)}"
448 )
449 details = " | ".join(
450 value
451 for value in [
452 f"features={len(milestone_features)}/{open_features} open" if milestone_features else "",
453 f"accept={milestone.get('acceptance_criteria')}" if milestone.get("acceptance_criteria") else "",
454 f"evidence={milestone.get('evidence_needed')}" if milestone.get("evidence_needed") else "",
455 f"result={milestone.get('validation_result')}" if milestone.get("validation_result") else "",
456 f"next={milestone.get('next_action')}" if milestone.get("next_action") else "",
457 ]
458 if value
459 )
460 if details:
461 print(f" {_one_line(details, chars)}")
462 for feature in milestone_features[: min(3, features)]:
463 if isinstance(feature, dict):
464 print(f" - {str(feature.get('status') or 'planned'):<7} {_one_line(feature.get('title') or 'feature', max(30, chars - 16))}")
465
466
467def _print_memory_sections(
468 *,
469 active_operator: list[dict[str, Any]],
470 pending_measurement: dict[str, Any],
471 graph: dict[str, Any],
472 reflections: list[dict[str, Any]],
473 lessons: list[dict[str, Any]],
474 compact: list[dict[str, Any]],
475 limit: int,
476 chars: int,
477) -> None:
478 if active_operator:
479 print()
480 print("active operator context:")
481 for entry in active_operator[-min(limit, 8) :]:
482 marker = entry.get("event_id") or "operator"
483 print(f" {marker}: {_one_line(entry.get('message') or '', chars)}")
484 if pending_measurement:
485 print()
486 print(f"pending measurement: step #{pending_measurement.get('source_step_no') or '?'}")
487 candidates = pending_measurement.get("metric_candidates") if isinstance(pending_measurement.get("metric_candidates"), list) else []
488 if candidates:
489 print(f" candidates: {_one_line(', '.join(str(item) for item in candidates[:5]), chars)}")
490 nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
491 if nodes:
492 print()
493 print("memory graph:")
494 for node in nodes[: min(limit, 8)]:
495 print(
496 f" {node.get('kind') or 'fact'}:{node.get('status') or 'active'} "
497 f"{_one_line(node.get('title') or node.get('key') or 'memory', chars)}"
498 )
499 print(" view: memory --graph")
500 if reflections:
501 print()
502 print("latest reflection:")
503 reflection = reflections[-1]
504 print(f" {_one_line(reflection.get('summary') or '', chars)}")
505 if reflection.get("strategy"):
506 print(f" strategy: {_one_line(reflection['strategy'], chars)}")
507 if lessons:
508 print()
509 print("latest lessons:")
510 for lesson in lessons[-min(limit, 8) :]:
511 print(f" {lesson.get('category') or 'memory'}: {_one_line(lesson.get('lesson') or '', chars)}")
512 if compact:
513 print()
514 print("compact memory:")
515 for entry in compact[: min(limit, 3)]:
516 print(f" {entry.get('key')}: {_one_line(entry.get('summary') or '', chars)}")
517
518
519def _print_best_records(
520 *,
521 sources: list[dict[str, Any]],
522 findings: list[dict[str, Any]],
523 experiments: list[dict[str, Any]],
524 chars: int,
525) -> None:
526 if sources:
527 best = max(sources, key=lambda source: float(source.get("usefulness_score") or 0))
528 print(f"best source: {_one_line(best.get('source') or '', chars)} score={best.get('usefulness_score')}")
529 if findings:
530 best_finding = max(findings, key=lambda finding: float(finding.get("score") or 0))
531 print(f"best finding: {_one_line(best_finding.get('name') or '', chars)} score={best_finding.get('score')}")
532 measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
533 best_experiments = [experiment for experiment in measured if experiment.get("best_observed")]
534 if best_experiments:
535 best_experiment = best_experiments[-1]
536 metric = f"{best_experiment.get('metric_name') or 'metric'}={best_experiment.get('metric_value')}{best_experiment.get('metric_unit') or ''}"
537 print(f"best experiment: {_one_line(best_experiment.get('title') or '', chars)} {metric}")
538
539
540def _step_count(steps: list[dict[str, Any]]) -> int:
541 numbers = [int(step.get("step_no") or 0) for step in steps]
542 return max(numbers, default=0)
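
The ordering used by `cmd_tasks_impl` above can be read as a three-part sort key: status rank first, then descending priority, then title. The following is a minimal standalone sketch of that rule, not the module's own code. `rank_tasks` and the sample records are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from typing import Any

# Same rank table as the task listing; unknown statuses sort last (rank 9).
STATUS_ORDER = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}


def rank_tasks(tasks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: sort by status rank, then high priority first, then title."""
    return sorted(
        tasks,
        key=lambda task: (
            STATUS_ORDER.get(str(task.get("status") or "open"), 9),
            -int(task.get("priority") or 0),
            str(task.get("title") or ""),
        ),
    )


# Illustrative records only; a missing status is treated as "open".
sample_tasks = [
    {"title": "write report", "status": "done", "priority": 9},
    {"title": "b scan", "status": "open", "priority": 1},
    {"title": "a scan", "status": "open", "priority": 1},
    {"title": "fetch", "priority": 5},
    {"title": "triage", "status": "active"},
    {"title": "odd", "status": "paused", "priority": 100},
]

ranked_titles = [task["title"] for task in rank_tasks(sample_tasks)]
```

Because the status rank comes first in the tuple, a high priority never lifts a task past one with a better status rank: the `paused` record sorts last despite `priority=100`.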
nipux_cli/scheduling.py 76 lines
1"""Shared scheduling helpers for deferred long-running work."""
2
3from __future__ import annotations
4
5from datetime import datetime, timezone
6from typing import Any
7
8
9def job_deferred_until(job: dict[str, Any], *, now: datetime | None = None) -> datetime | None:
10 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
11 raw_until = str(metadata.get("defer_until") or "").strip()
12 if not raw_until:
13 return None
14 try:
15 until = datetime.fromisoformat(raw_until.replace("Z", "+00:00"))
16 except ValueError:
17 return None
18 if until.tzinfo is None:
19 until = until.replace(tzinfo=timezone.utc)
20 until = until.astimezone(timezone.utc)
21 now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
22 return until if until > now else None
23
24
25def job_is_deferred(job: dict[str, Any], *, now: datetime | None = None) -> bool:
26 return job_deferred_until(job, now=now) is not None
27
28
29def job_provider_blocked(job: dict[str, Any]) -> bool:
30 """Return true when provider calls need operator action before retrying."""
31
32 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
33 blocked_raw = str(metadata.get("provider_blocked_at") or "").strip()
34 if not blocked_raw:
35 return False
36 unblocked_raw = str(metadata.get("provider_unblocked_at") or "").strip()
37 if not unblocked_raw:
38 return True
39 blocked_at = _metadata_time(blocked_raw)
40 unblocked_at = _metadata_time(unblocked_raw)
41 if blocked_at is None or unblocked_at is None:
42 return False
43 return blocked_at > unblocked_at
44
45
46def provider_retry_metadata() -> dict[str, str]:
47 """Metadata patch used when the operator explicitly retries provider work."""
48
49 return {
50 "provider_blocked_at": "",
51 "provider_unblocked_at": datetime.now(timezone.utc).isoformat(),
52 }
53
54
55def operator_resume_metadata() -> dict[str, str]:
56 """Metadata patch used when the operator explicitly makes a job runnable."""
57
58 patch = provider_retry_metadata()
59 patch.update(
60 {
61 "defer_until": "",
62 "defer_reason": "",
63 "defer_next_action": "",
64 }
65 )
66 return patch
67
68
69def _metadata_time(value: str) -> datetime | None:
70 try:
71 parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
72 except ValueError:
73 return None
74 if parsed.tzinfo is None:
75 parsed = parsed.replace(tzinfo=timezone.utc)
76 return parsed.astimezone(timezone.utc)
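
The deferral check in `job_deferred_until` depends on three parsing rules: a trailing `Z` is accepted, naive timestamps are treated as UTC, and unparsable values count as "not deferred". The sketch below restates those rules in a self-contained form so they can be tried without the package installed. `parse_utc` and `is_deferred` are hypothetical stand-ins, not functions from `nipux_cli`.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def parse_utc(value: str) -> datetime | None:
    """Parse an ISO timestamp, accepting a trailing Z and assuming UTC for naive values."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_deferred(metadata: dict[str, Any], now: datetime) -> bool:
    """A job counts as deferred only while defer_until is a valid time in the future."""
    until = parse_utc(str(metadata.get("defer_until") or "").strip())
    return until is not None and until > now


# A fixed clock keeps the example deterministic.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)

future_z = {"defer_until": "2025-01-01T13:00:00Z"}        # one hour ahead, Z suffix
past_naive = {"defer_until": "2025-01-01T11:00:00"}       # naive value, read as UTC
offset = {"defer_until": "2025-01-01T14:30:00+02:00"}     # 12:30 UTC, still ahead
garbage = {"defer_until": "tomorrow"}                     # unparsable, so not deferred
```

Passing `now` explicitly, as the module's functions also allow, makes the behavior straightforward to test without patching the system clock.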
nipux_cli/service_install.py 179 lines
1"""OS service installation helpers for the Nipux daemon."""
2
3from __future__ import annotations
4
5import os
6import shlex
7import shutil
8import subprocess
9import sys
10from argparse import Namespace
11from pathlib import Path
12
13from nipux_cli.config import load_config
14
15
16def launch_agent_path() -> Path:
17 return Path.home() / "Library" / "LaunchAgents" / "com.nipux.agent.plist"
18
19
20def launch_agent_plist(*, poll_seconds: float, quiet: bool) -> str:
21 config = load_config()
22 config.ensure_dirs()
23 command = [
24 sys.executable,
25 "-m",
26 "nipux_cli.cli",
27 "daemon",
28 "--poll-seconds",
29 str(poll_seconds),
30 ]
31 command.append("--quiet" if quiet else "--verbose")
32 args_xml = "\n".join(f" <string>{xml_escape(part)}</string>" for part in command)
33 log_path = config.runtime.logs_dir / "launchd-daemon.log"
34 return f"""<?xml version="1.0" encoding="UTF-8"?>
35<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
36<plist version="1.0">
37 <dict>
38 <key>Label</key>
39 <string>com.nipux.agent</string>
40 <key>ProgramArguments</key>
41 <array>
42{args_xml}
43 </array>
44 <key>EnvironmentVariables</key>
45 <dict>
46 <key>NIPUX_HOME</key>
47 <string>{xml_escape(str(config.runtime.home))}</string>
48 </dict>
49 <key>RunAtLoad</key>
50 <true/>
51 <key>KeepAlive</key>
52 <true/>
53 <key>StandardOutPath</key>
54 <string>{xml_escape(str(log_path))}</string>
55 <key>StandardErrorPath</key>
56 <string>{xml_escape(str(log_path))}</string>
57 <key>WorkingDirectory</key>
58 <string>{xml_escape(str(Path.cwd()))}</string>
59 </dict>
60</plist>
61"""
62
63
64def systemd_service_path() -> Path:
65 return Path.home() / ".config" / "systemd" / "user" / "nipux.service"
66
67
68def systemd_service_text(*, poll_seconds: float, quiet: bool) -> str:
69 config = load_config()
70 config.ensure_dirs()
71 command = [
72 sys.executable,
73 "-m",
74 "nipux_cli.cli",
75 "daemon",
76 "--poll-seconds",
77 str(poll_seconds),
78 ]
79 command.append("--quiet" if quiet else "--verbose")
80 return "\n".join(
81 [
82 "[Unit]",
83 "Description=Nipux 24/7 autonomous worker",
84 "After=network-online.target",
85 "Wants=network-online.target",
86 "",
87 "[Service]",
88 "Type=simple",
89 f"WorkingDirectory={Path.cwd()}",
90 f"Environment=NIPUX_HOME={config.runtime.home}",
91 f"ExecStart={' '.join(shlex.quote(part) for part in command)}",
92 "Restart=always",
93 "RestartSec=3",
94 "",
95 "[Install]",
96 "WantedBy=default.target",
97 "",
98 ]
99 )
100
101
102def cmd_autostart(args: Namespace) -> None:
103 path = launch_agent_path()
104 label = "gui/" + str(os.getuid()) + "/com.nipux.agent"
105 if args.action == "status":
106 status = "installed" if path.exists() else "not installed"
107 print(f"autostart: {status}")
108 print(f"plist: {path}")
109 if path.exists():
110 result = subprocess.run(
111 ["launchctl", "print", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
112 )
113 print("launchd: loaded" if result.returncode == 0 else "launchd: not loaded")
114 return
115 if args.action == "install":
116 path.parent.mkdir(parents=True, exist_ok=True)
117 path.write_text(launch_agent_plist(poll_seconds=args.poll_seconds, quiet=args.quiet), encoding="utf-8")
118 subprocess.run(
119 ["launchctl", "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
120 )
121 result = subprocess.run(["launchctl", "bootstrap", "gui/" + str(os.getuid()), str(path)], check=False)
122 if result.returncode:
123 raise SystemExit(result.returncode)
124 subprocess.run(["launchctl", "enable", label], check=False)
125 print(f"autostart installed: {path}")
126 print("daemon will start at login and launchd will keep it alive")
127 return
128 if args.action == "uninstall":
129 subprocess.run(
130 ["launchctl", "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
131 )
132 if path.exists():
133 path.unlink()
134 print("autostart uninstalled")
135 return
136 raise SystemExit(f"unknown autostart action: {args.action}")
137
138
139def cmd_service(args: Namespace) -> None:
140 path = systemd_service_path()
141 systemctl = shutil.which("systemctl")
142 user_cmd = [systemctl, "--user"] if systemctl else None
143 if args.action == "status":
144 print(f"service: {'installed' if path.exists() else 'not installed'}")
145 print(f"unit: {path}")
146 if user_cmd:
147 result = subprocess.run(
148 [*user_cmd, "is-active", "nipux.service"], check=False, capture_output=True, text=True
149 )
150 print(f"systemd: {result.stdout.strip() or result.stderr.strip() or 'unknown'}")
151 else:
152 print("systemd: unavailable on this machine")
153 return
154 if args.action == "install":
155 path.parent.mkdir(parents=True, exist_ok=True)
156 path.write_text(systemd_service_text(poll_seconds=args.poll_seconds, quiet=args.quiet), encoding="utf-8")
157 print(f"service file written: {path}")
158 if user_cmd:
159 subprocess.run([*user_cmd, "daemon-reload"], check=False)
160 subprocess.run([*user_cmd, "enable", "--now", "nipux.service"], check=False)
161 print("systemd user service enabled and started")
162 else:
163 print(
164 "systemd not found; copy this service to a Linux server or run: systemctl --user enable --now nipux.service"
165 )
166 return
167 if args.action == "uninstall":
168 if user_cmd:
169 subprocess.run([*user_cmd, "disable", "--now", "nipux.service"], check=False)
170 subprocess.run([*user_cmd, "daemon-reload"], check=False)
171 if path.exists():
172 path.unlink()
173 print("service uninstalled")
174 return
175 raise SystemExit(f"unknown service action: {args.action}")
176
177
178def xml_escape(value: str) -> str:
179 return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
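
The plist template interpolates paths and arguments into XML, so `&`, `<`, and `>` have to become entities, and the ampersand has to be replaced first. A minimal standalone sketch of that escaping follows, compared against the standard library's `xml.sax.saxutils.escape`. `xml_escape_wrong_order` and the sample path are hypothetical and exist only to show what goes wrong when the order is reversed.

```python
from xml.sax.saxutils import escape


def xml_escape(value: str) -> str:
    # Ampersand first, so the entities produced for < and > are not escaped again.
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def xml_escape_wrong_order(value: str) -> str:
    # Hypothetical counter-example: escaping & last re-escapes the new entities.
    return value.replace("<", "&lt;").replace(">", "&gt;").replace("&", "&amp;")


# Illustrative path containing all three special characters.
sample = "/Users/a&b/<nipux>"

correct = xml_escape(sample)
double_escaped = xml_escape_wrong_order(sample)
stdlib_result = escape(sample)
```

For text nodes, the hand-rolled replacement and the standard-library helper agree on these three characters, so either could back the template.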
nipux_cli/settings.py 153 lines
1"""Inline config editing helpers for Nipux slash commands."""
2
3from __future__ import annotations
4
5import os
6from pathlib import Path
7from typing import Any
8
9import yaml
10
11from nipux_cli.config import default_config_yaml, get_agent_home, load_config, write_private_text
12from nipux_cli.cli_state import clear_model_setup_verified
13from nipux_cli.tui_commands import SETTINGS_FIELD_TYPES
14
15
16def config_field_value(field: str, config: Any | None = None) -> Any:
17 config = load_config() if config is None else config
18 values = {
19 "model.name": config.model.model,
20 "model.base_url": config.model.base_url,
21 "model.api_key_env": config.model.api_key_env,
22 "model.context_length": config.model.context_length,
23 "model.request_timeout_seconds": config.model.request_timeout_seconds,
24 "model.input_cost_per_million": config.model.input_cost_per_million,
25 "model.output_cost_per_million": config.model.output_cost_per_million,
26 "runtime.home": str(config.runtime.home),
27 "runtime.max_step_seconds": config.runtime.max_step_seconds,
28 "runtime.artifact_inline_char_limit": config.runtime.artifact_inline_char_limit,
29 "runtime.daily_digest_enabled": config.runtime.daily_digest_enabled,
30 "runtime.daily_digest_time": config.runtime.daily_digest_time,
31 "runtime.max_job_cost_usd": config.runtime.max_job_cost_usd,
32 "tools.browser": config.tools.browser,
33 "tools.web": config.tools.web,
34 "tools.shell": config.tools.shell,
35 "tools.files": config.tools.files,
36 }
37 return values.get(field, "")
38
39
40def save_config_field(field: str, raw_value: str) -> Any:
41 value = _coerce_config_value(field, raw_value)
42 data = _load_config_yaml()
43 section, key = field.split(".", 1)
44 target = data.setdefault(section, {})
45 if not isinstance(target, dict):
46 target = {}
47 data[section] = target
48 target[key] = value
49 _save_config_yaml(data)
50 return value
51
52
53def inline_setting_notice(field: str, raw_value: str) -> str:
54 value = raw_value.strip()
55 if not value:
56 return f"kept {field}"
57 if field == "secret:model.api_key":
58 config = load_config()
59 name = config.model.api_key_env
60 _save_env_secret(name, value)
61 clear_model_setup_verified()
62 return f"saved {name} in {_short_path(get_agent_home() / '.env')}"
63 try:
64 saved = save_config_field(field, value)
65 except ValueError as exc:
66 return f"{field}: {exc}"
67 clear_model_setup_verified()
68 return f"saved {field} = {saved}"
69
70
71def edit_target_label(field: str) -> str:
72 if field == "secret:model.api_key":
73 return "API key"
74 return field
75
76
77def edit_target_hint(field: str, config: Any | None = None) -> str:
78 config = config or load_config()
79 if field == "secret:model.api_key":
80 state = "set" if config.model.api_key else "missing"
81 return f"Editing API key ({state}). Enter saves, Esc cancels. Input is hidden."
82 current = config_field_value(field, config)
83 return f"Editing {field}. Current: {current}. Enter saves, Esc cancels, empty keeps current."
84
85
86def edit_target_masks_input(field: str | None) -> bool:
87 return field == "secret:model.api_key"
88
89
90def _config_path() -> Path:
91 return get_agent_home() / "config.yaml"
92
93
94def _load_config_yaml() -> dict[str, Any]:
95 path = _config_path()
96 if not path.exists():
97 loaded = yaml.safe_load(default_config_yaml()) or {}
98 return loaded if isinstance(loaded, dict) else {}
99 loaded = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
100 return loaded if isinstance(loaded, dict) else {}
101
102
103def _save_config_yaml(data: dict[str, Any]) -> None:
104 path = _config_path()
105 write_private_text(path, yaml.safe_dump(data, sort_keys=False))
106
107
108def _save_env_secret(name: str, value: str) -> None:
109 env_path = get_agent_home() / ".env"
110 env_path.parent.mkdir(parents=True, exist_ok=True)
111 existing: dict[str, str] = {}
112 if env_path.exists():
113 for raw in env_path.read_text(encoding="utf-8", errors="ignore").splitlines():
114 if "=" not in raw or raw.strip().startswith("#"):
115 continue
116 key, current = raw.split("=", 1)
117 if key.strip():
118 existing[key.strip()] = current.strip()
119 existing[name] = value
120 write_private_text(env_path, "\n".join(f"{key}={current}" for key, current in existing.items()) + "\n")
121 os.environ[name] = value
122
123
124def _coerce_config_value(field: str, raw_value: str) -> Any:
125 kind = SETTINGS_FIELD_TYPES.get(field, "str")
126 value = raw_value.strip()
127 if field == "runtime.max_job_cost_usd" and value.lower() in {"0", "none", "off", "false", "null"}:
128 return None
129 if kind == "int":
130 return int(value)
131 if kind == "float":
132 return float(value)
133 if kind == "bool":
134 lowered = value.lower()
135 if lowered in {"1", "true", "yes", "on"}:
136 return True
137 if lowered in {"0", "false", "no", "off"}:
138 return False
139 raise ValueError("use true or false")
140 if kind == "path":
141 return str(Path(value).expanduser())
142 return value
143
144
145def _short_path(path: Path | str, *, max_width: int = 80) -> str:
146 text = str(path)
147 home = str(Path.home())
148 if text.startswith(home + os.sep):
149 text = "~" + text[len(home) :]
150 if len(text) <= max_width:
151 return text
152 keep = max(12, max_width - 4)
153 return "..." + text[-keep:]
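
The `.env` update in `_save_env_secret` amounts to parsing `KEY=value` lines, overwriting one key, and re-rendering the file. The sketch below reproduces that merge on plain strings so the behavior is visible without touching the filesystem. `merge_env_text` is a hypothetical helper, not part of the module.

```python
from __future__ import annotations


def merge_env_text(existing_text: str, name: str, value: str) -> str:
    """Hypothetical string-only version of the merge: returns the new file content."""
    entries: dict[str, str] = {}
    for raw in existing_text.splitlines():
        # Lines without "=" and comment lines are skipped, so they are not carried over.
        if "=" not in raw or raw.strip().startswith("#"):
            continue
        key, current = raw.split("=", 1)
        if key.strip():
            entries[key.strip()] = current.strip()
    entries[name] = value  # overwrite in place, or append if the key is new
    return "\n".join(f"{key}={current}" for key, current in entries.items()) + "\n"


# Illustrative content only; the key names are made up.
before = "# provider keys\nEXAMPLE_API_KEY=old\nOTHER = keep me\n"

updated = merge_env_text(before, "EXAMPLE_API_KEY", "new")
appended = merge_env_text(before, "THIRD_KEY", "x=y")
```

Two consequences are worth knowing: comments and blank lines in the existing file are dropped on rewrite, and an updated key keeps its original position because dictionaries preserve insertion order.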
nipux_cli/settings_commands.py 84 lines
1"""Slash-command handlers for inline Nipux configuration."""
2
3from __future__ import annotations
4
5import shlex
6from contextlib import redirect_stdout
7from io import StringIO
8
9from nipux_cli.config import load_config
10from nipux_cli.settings import config_field_value, inline_setting_notice
11from nipux_cli.tui_commands import CHAT_SETTING_COMMANDS
12
13
14def handle_chat_setting_command(command: str, rest: list[str]) -> bool:
15 if command == "config":
16 print("\n".join(config_summary_lines()))
17 return True
18 if command in {"key", "api-key"}:
19 if not rest:
20 config = load_config()
21 state = "set" if config.model.api_key else "missing"
22 print(f"API key is {state} via {config.model.api_key_env}. Use /api-key KEY to save a new one.")
23 return True
24 print(inline_setting_notice("secret:model.api_key", " ".join(rest)))
25 return True
26 if command not in CHAT_SETTING_COMMANDS:
27 return False
28 field, placeholder = CHAT_SETTING_COMMANDS[command]
29 if not rest:
30 current = config_field_value(field)
31 print(f"{field} = {current}")
32 print(f"usage: /{command} {placeholder}")
33 return True
34 print(inline_setting_notice(field, " ".join(rest)))
35 return True
36
37
38def config_summary_lines() -> list[str]:
39 config = load_config()
40 key_state = "set" if config.model.api_key else "missing"
41 input_cost = _rate_text(config.model.input_cost_per_million)
42 output_cost = _rate_text(config.model.output_cost_per_million)
43 return [
44 "config",
45 f"model: {config.model.model}",
46 f"endpoint: {config.model.base_url}",
47 f"key: {key_state} ({config.model.api_key_env})",
48 f"context: {config.model.context_length}",
49 f"request timeout: {config.model.request_timeout_seconds}s",
50 f"cost rates: input {input_cost} / output {output_cost} per 1M tokens",
51 (
52 "tools: "
53 f"browser {config.tools.browser}, web {config.tools.web}, "
54 f"CLI {config.tools.shell}, files {config.tools.files}"
55 ),
56 f"home: {config.runtime.home}",
57 f"step timeout: {config.runtime.max_step_seconds}s",
58 f"output preview: {config.runtime.artifact_inline_char_limit} chars",
59 f"job cost limit: {_cost_limit_text(config.runtime.max_job_cost_usd)}",
60 f"daily digest: {config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
61 ]
62
63
64def _rate_text(value: float | None) -> str:
65 return "provider-reported" if value is None else f"${value:g}"
66
67
68def _cost_limit_text(value: float | None) -> str:
69 return "none" if value is None else f"${value:g}"
70
71
72def capture_setting_command(line: str) -> list[str]:
73 try:
74 parts = shlex.split(line[1:] if line.startswith("/") else line)
75 except ValueError as exc:
76 return [f"parse error: {exc}"]
77 if not parts:
78 return []
79 stream = StringIO()
80 with redirect_stdout(stream):
81 if not handle_chat_setting_command(parts[0], parts[1:]):
82 print(f"unknown config command: /{parts[0]}")
83 lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
84 return lines[-12:] or ["done"]
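
`capture_setting_command` turns a print-based handler into a function that returns lines, which is convenient for a TUI that renders its own output. A minimal sketch of that pattern follows; `fake_handler` is a hypothetical stand-in for `handle_chat_setting_command` and only recognizes one command.

```python
import shlex
from contextlib import redirect_stdout
from io import StringIO


def fake_handler(command: str, rest: list) -> bool:
    """Hypothetical handler: prints a result and reports whether it handled the command."""
    if command != "model":
        return False
    print(f"model.model   =   {' '.join(rest) or '(unset)'}")
    return True


def capture(line: str) -> list:
    """Parse a slash command, run the handler, and return its printed lines."""
    try:
        parts = shlex.split(line[1:] if line.startswith("/") else line)
    except ValueError as exc:  # e.g. an unterminated quote
        return [f"parse error: {exc}"]
    if not parts:
        return []
    stream = StringIO()
    with redirect_stdout(stream):
        if not fake_handler(parts[0], parts[1:]):
            print(f"unknown config command: /{parts[0]}")
    # Collapse runs of whitespace and drop blank lines before returning.
    lines = [" ".join(item.split()) for item in stream.getvalue().splitlines() if item.strip()]
    return lines[-12:] or ["done"]


if __name__ == "__main__":
    print(capture("/model qwen"))
    print(capture("/nope"))
```

Because `redirect_stdout` swaps the process-wide `sys.stdout`, this approach is best kept to single-threaded command handling.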
nipux_cli/shell_tools.py 348 lines
1"""Shell and workspace file tools for Nipux workers."""
2
3from __future__ import annotations
4
5import contextlib
6import json
7import os
8import re
9import signal
10import subprocess
11import time
12from datetime import datetime, timezone
13from pathlib import Path
14from typing import Any
15
16
17def write_file(args: dict[str, Any], ctx: Any) -> str:
18 del ctx
19 raw_path = str(args.get("path") or "").strip()
20 if not raw_path:
21 return _json({"success": False, "error": "path is required"})
22 if "content" not in args:
23 return _json({"success": False, "error": "content is required"})
24 mode = str(args.get("mode") or "overwrite").strip().lower()
25 if mode not in {"overwrite", "append"}:
26 return _json({"success": False, "error": f"invalid mode: {mode}"})
27 path = Path(raw_path).expanduser()
28 if not path.is_absolute():
29 path = Path.cwd() / path
30 if path.exists() and path.is_dir():
31 return _json({"success": False, "error": f"path is a directory: {path}"})
32 create_parents = bool(args.get("create_parents", True))
33 if create_parents:
34 path.parent.mkdir(parents=True, exist_ok=True)
35 content = str(args.get("content") or "")
36 write_mode = "a" if mode == "append" else "w"
37 with path.open(write_mode, encoding="utf-8") as fh:
38 fh.write(content)
39 return _json({
40 "success": True,
41 "path": str(path),
42 "mode": mode,
43 "bytes": path.stat().st_size,
44 })
45
46
47def shell_exec(args: dict[str, Any], ctx: Any) -> str:
48 command = str(args.get("command") or "").strip()
49 if not command:
50 return _json({"success": False, "error": "command is required"})
51 cwd_raw = str(args.get("cwd") or "").strip()
52 cwd = cwd_raw or None
53 if cwd and not Path(cwd).expanduser().is_dir():
54 return _json({"success": False, "error": f"cwd does not exist or is not a directory: {cwd}"})
55 timeout_raw = args.get("timeout_seconds")
56 timeout = float(timeout_raw) if isinstance(timeout_raw, (int, float)) else 60.0
57 timeout = max(1.0, min(timeout, 900.0))
58 max_chars_raw = args.get("max_output_chars")
59 max_chars = int(max_chars_raw) if isinstance(max_chars_raw, (int, float)) else 12000
60 max_chars = max(1000, min(max_chars, 50000))
61 shell = "/bin/zsh" if Path("/bin/zsh").exists() else None
62 env = dict(os.environ)
63 env["NIPUX_JOB_ID"] = ctx.job_id
64 if ctx.run_id:
65 env["NIPUX_RUN_ID"] = ctx.run_id
66 started = time.monotonic()
67 process: subprocess.Popen[str] | None = None
68 try:
69 process = subprocess.Popen(
70 command,
71 shell=True,
72 executable=shell,
73 cwd=str(Path(cwd).expanduser()) if cwd else None,
74 env=env,
75 stdout=subprocess.PIPE,
76 stderr=subprocess.PIPE,
77 text=True,
78 start_new_session=True,
79 )
80 _register_shell_process(ctx, process, command=command, cwd=cwd or os.getcwd(), timeout_seconds=timeout)
81 stdout, stderr = process.communicate(timeout=timeout)
82 except subprocess.TimeoutExpired:
83 assert process is not None
84 _terminate_process_group(process)
85 try:
86 stdout, stderr = process.communicate(timeout=2)
87 except subprocess.TimeoutExpired:
88 _kill_process_group(process)
89 stdout, stderr = process.communicate()
90 return _json({
91 "success": False,
92 "error": f"command timed out after {timeout:.1f}s",
93 "timed_out": True,
94 "command": command,
95 "cwd": cwd or os.getcwd(),
96 "timeout_seconds": timeout,
97 "duration_seconds": round(time.monotonic() - started, 3),
98 "returncode": None,
99 "stdout": _truncate_output(stdout, max_chars),
100 "stderr": _truncate_output(stderr, max_chars),
101 })
102 except BaseException:
103 if process is not None and process.poll() is None:
104 _terminate_process_group(process)
105 try:
106 process.wait(timeout=2)
107 except subprocess.TimeoutExpired:
108 _kill_process_group(process)
109 raise
110 finally:
111 if process is not None:
112 _unregister_shell_process(ctx, process.pid)
113 error = _shell_error(process.returncode, stdout, stderr, command=command)
114 return _json({
115 "success": process.returncode == 0 and not error,
116 "error": error,
117 "command": command,
118 "cwd": cwd or os.getcwd(),
119 "duration_seconds": round(time.monotonic() - started, 3),
120 "returncode": process.returncode,
121 "stdout": _truncate_output(stdout, max_chars),
122 "stderr": _truncate_output(stderr, max_chars),
123 })
124
125
126def cleanup_registered_shell_processes(home: str | Path) -> list[dict[str, Any]]:
127 path = _shell_process_registry_path(home)
128 records = _read_shell_process_registry(path)
129 if not records:
130 return []
131 cleaned: list[dict[str, Any]] = []
132 survivors: list[dict[str, Any]] = []
133 for record in records:
134 pid = _as_int(record.get("pid"))
135 if pid <= 0:
136 continue
137 if not _pid_exists(pid):
138 continue
139 try:
140 os.killpg(pid, signal.SIGTERM)
141 except ProcessLookupError:
142 continue
143 except PermissionError:
144 survivors.append(record)
145 continue
146 time.sleep(0.05)
147 if _pid_exists(pid):
148 with contextlib.suppress(ProcessLookupError, PermissionError):
149 os.killpg(pid, signal.SIGKILL)
150 record = dict(record)
151 record["cleaned_at"] = datetime.now(timezone.utc).isoformat()
152 cleaned.append(record)
153 _write_shell_process_registry(path, survivors)
154 return cleaned
155
156
157def _shell_error(returncode: int | None, stdout: str, stderr: str, *, command: str = "") -> str:
158 if returncode == 0:
159 return _shell_success_anomaly(stdout, stderr, command=command)
160 combined = "\n".join(part.strip() for part in (stderr, stdout) if part and part.strip())
161 lowered = combined.lower()
162 if "sudo:" in lowered and ("password" in lowered or "terminal is required" in lowered):
163 return "command requires interactive sudo/password; configure non-interactive privileges or choose a non-sudo path"
164 if "permission denied" in lowered:
165 return "command failed with permission denied"
166 missing_probe = _missing_executable_probe(command, combined)
167 if missing_probe:
168 return f"command probe found no executable: {missing_probe}"
169 excerpt = " ".join(combined.split())[:500] if combined else "no output"
170 return f"command exited with status {returncode}: {excerpt}"
171
172
173def _shell_success_anomaly(stdout: str, stderr: str, *, command: str = "") -> str:
174 combined = "\n".join(part.strip() for part in (stderr, stdout) if part and part.strip())
175 if not combined:
176 empty_probe = _empty_observation_probe(command)
177 if empty_probe:
178 return f"command probe produced no output despite exit status 0: {empty_probe}"
179 return ""
180 lowered = combined.lower()
181 auth_markers = (
182 "401 unauthorized",
183 "403 forbidden",
184 "authentication failed",
185 "username/password authentication failed",
186 "invalid username or password",
187 "permission denied",
188 )
189 if _shell_sudo_password_anomaly(lowered):
190 return "command output indicates interactive sudo/password requirement despite exit status 0"
191 if any(marker in lowered for marker in auth_markers):
192 excerpt = " ".join(combined.split())[:500]
193 return f"command output indicates authentication or authorization failure despite exit status 0: {excerpt}"
194 missing_probe = _missing_executable_probe(command, combined)
195 if missing_probe:
196 return f"command probe found no executable despite exit status 0: {missing_probe}"
197 command_missing_match = _shell_missing_command_anomaly(combined)
198 if command_missing_match:
199 excerpt = " ".join(combined.split())[:500]
200 return f"command output indicates missing command despite exit status 0: {excerpt}"
201 build_error_match = _shell_build_error_anomaly(combined)
202 if build_error_match:
203 excerpt = " ".join(combined.split())[:500]
204 return f"command output indicates build/tool failure despite exit status 0: {excerpt}"
205 http_error_match = _shell_http_error_anomaly(lowered)
206 if http_error_match:
207 excerpt = " ".join(combined.split())[:500]
208 return f"command output indicates HTTP failure despite exit status 0: {excerpt}"
209 return ""
210
211
212def _missing_executable_probe(command: str, combined_output: str) -> str:
213 text = str(command or "").strip()
214 match = re.match(r"^(?:which|command\s+-v)\s+([A-Za-z0-9_.+-]+)(?:\s|$)", text)
215 if not match:
216 return ""
217 if not combined_output.strip() or "not found" in combined_output.lower():
218 return match.group(1)
219 return ""
220
221
222def _empty_observation_probe(command: str) -> str:
223 text = str(command or "").strip()
224 if re.match(r"^(?:which|command\s+-v)\s+([A-Za-z0-9_.+-]+)(?:\s|$)", text):
225 return "probe found no executable: executable lookup returned no path"
226 if re.match(r"^(?:find|ls|stat|file)\b", text):
227 return "read-only filesystem probe returned no observation"
228 return ""
229
230
231def _shell_missing_command_anomaly(text: str) -> bool:
232 return bool(
233 re.search(
234 r"(?im)(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?:(?:/|~)[^\s:'\"]+|[A-Za-z0-9_.+-]+):\s*(?:command not found|not found)\s*$",
235 text,
236 )
237 )
238
239
240def _shell_sudo_password_anomaly(text: str) -> bool:
241 return "sudo:" in text and ("password" in text or "terminal is required" in text)
242
243
244def _shell_build_error_anomaly(text: str) -> bool:
245 lowered = text.lower()
246 if "no rule to make target" in lowered:
247 return True
248 if "***" in text and "stop." in lowered:
249 return True
250 return bool(re.search(r"(?im)^\s*(?:make(?:\[\d+\])?:\s*)?\*\*\* .*\bstop\.\s*$", text))
251
252
253def _shell_http_error_anomaly(text: str) -> bool:
254 return any(f" {code} " in f" {text} " for code in ("400", "401", "403", "404", "429", "500", "502", "503", "504")) and any(
255 marker in text for marker in ("http", "error", "unauthorized", "forbidden", "not found", "too many requests")
256 )
257
258
259def _terminate_process_group(process: subprocess.Popen[str]) -> None:
260 try:
261 os.killpg(process.pid, signal.SIGTERM)
262 except ProcessLookupError:
263 return
264
265
266def _kill_process_group(process: subprocess.Popen[str]) -> None:
267 try:
268 os.killpg(process.pid, signal.SIGKILL)
269 except ProcessLookupError:
270 return
271
272
273def _register_shell_process(ctx: Any, process: subprocess.Popen[str], *, command: str, cwd: str, timeout_seconds: float) -> None:
274 path = _shell_process_registry_path(ctx.config.runtime.home)
275 path.parent.mkdir(parents=True, exist_ok=True)
276 record = {
277 "pid": process.pid,
278 "pgid": process.pid,
279 "job_id": getattr(ctx, "job_id", ""),
280 "run_id": getattr(ctx, "run_id", "") or "",
281 "step_id": getattr(ctx, "step_id", "") or "",
282 "command": command[:1000],
283 "cwd": cwd,
284 "timeout_seconds": timeout_seconds,
285 "started_at": datetime.now(timezone.utc).isoformat(),
286 }
287 with path.open("a", encoding="utf-8") as handle:
288 handle.write(json.dumps(record, ensure_ascii=False) + "\n")
289
290
291def _unregister_shell_process(ctx: Any, pid: int) -> None:
292 path = _shell_process_registry_path(ctx.config.runtime.home)
293 records = [record for record in _read_shell_process_registry(path) if _as_int(record.get("pid")) != pid]
294 _write_shell_process_registry(path, records)
295
296
297def _shell_process_registry_path(home: str | Path) -> Path:
298 return Path(home).expanduser() / "runtime" / "shell_processes.jsonl"
299
300
301def _read_shell_process_registry(path: Path) -> list[dict[str, Any]]:
302 if not path.exists():
303 return []
304 records: list[dict[str, Any]] = []
305 for line in path.read_text(encoding="utf-8").splitlines():
306 try:
307 record = json.loads(line)
308 except json.JSONDecodeError:
309 continue
310 if isinstance(record, dict):
311 records.append(record)
312 return records
313
314
315def _write_shell_process_registry(path: Path, records: list[dict[str, Any]]) -> None:
316 path.parent.mkdir(parents=True, exist_ok=True)
317 if not records:
318 with contextlib.suppress(FileNotFoundError):
319 path.unlink()
320 return
321 path.write_text("".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records), encoding="utf-8")
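
The process registry above is a JSON Lines file: one record is appended per started process, and the file is rewritten (or removed) when records are dropped. The following sketch shows the same read, append, and rewrite cycle in isolation. The function names and the sample path are illustrative only, and `Path.unlink(missing_ok=True)` assumes Python 3.8 or newer.

```python
import json
import tempfile
from pathlib import Path


def read_records(path: Path) -> list:
    """Return every line that parses to a JSON object; skip anything else."""
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate blank or partially written lines
        if isinstance(record, dict):
            records.append(record)
    return records


def append_record(path: Path, record: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def rewrite_records(path: Path, records: list) -> None:
    """Replace the file with the surviving records, or delete it when none remain."""
    if not records:
        path.unlink(missing_ok=True)
        return
    path.write_text("".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = Path(tmp) / "runtime" / "processes.jsonl"
        append_record(registry, {"pid": 101})
        append_record(registry, {"pid": 102})
        rewrite_records(registry, [r for r in read_records(registry) if r["pid"] != 101])
        print(read_records(registry))
```

Skipping unparseable lines means a crash in the middle of an append costs at most one record rather than the whole registry.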
322
323
324def _pid_exists(pid: int) -> bool:
325 try:
326 os.kill(pid, 0)
327 return True
328 except OSError as exc:
329 return isinstance(exc, PermissionError)  # EPERM: the process exists but belongs to another user
330
331
332def _as_int(value: Any) -> int:
333 try:
334 return int(value)
335 except (TypeError, ValueError):
336 return 0
337
338
339def _truncate_output(value: Any, max_chars: int) -> str:
340 text = value.decode("utf-8", errors="replace") if isinstance(value, bytes) else str(value or "")
341 if len(text) <= max_chars:
342 return text
343 omitted = len(text) - max_chars
344 return text[:max_chars] + f"\n... truncated {omitted} chars ..."
345
346
347def _json(value: Any) -> str:
348 return json.dumps(value, ensure_ascii=False)
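
The timeout handling in `shell_exec` relies on starting the command in its own session so that the whole process group can be signalled. Below is a stripped-down sketch of that technique, assuming a POSIX system; `run_with_deadline` and `_signal_group` are hypothetical names, and the registry bookkeeping and output truncation are omitted.

```python
import os
import signal
import subprocess


def _signal_group(pgid: int, sig: int) -> None:
    """Send a signal to a process group, ignoring groups that are already gone."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_deadline(command: str, timeout: float) -> dict:
    """Run a shell command; on timeout send SIGTERM, then SIGKILL, to its group."""
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # the child leads a new group, so pgid == pid
    )
    timed_out = False
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(process.pid, signal.SIGTERM)
        try:
            stdout, stderr = process.communicate(timeout=2)
        except subprocess.TimeoutExpired:
            _signal_group(process.pid, signal.SIGKILL)
            stdout, stderr = process.communicate()
    return {
        "timed_out": timed_out,
        "returncode": process.returncode,
        "stdout": stdout,
        "stderr": stderr,
    }


if __name__ == "__main__":
    print(run_with_deadline("echo hi", timeout=5))
    print(run_with_deadline("sleep 30", timeout=0.5)["timed_out"])
```

Signalling the group rather than the single PID matters when the command spawns children of its own, since a plain `process.kill()` would leave them running.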
nipux_cli/source_quality.py 32 lines
1"""Source quality checks for web and browser tools."""
2
3from __future__ import annotations
4
5ANTI_BOT_MARKERS = (
6 "performing security verification",
7 "cloudflare security challenge",
8 "verifies you are not a bot",
9 "verify you are not a bot",
10 "enable javascript and cookies",
11 "checking your browser before accessing",
12 "just a moment...",
13 "you have been blocked",
14 "browsing and clicking at a speed much faster than expected",
15 "there is a robot on the same network",
16)
17
18
19def anti_bot_reason(*parts: str) -> str | None:
20 """Return a short reason if text looks like an anti-bot interstitial."""
21
22 text = " ".join(part for part in parts if part).lower()
23 if not text:
24 return None
25 for marker in ANTI_BOT_MARKERS:
26 if marker in text:
27 if "cloudflare" in text:
28 return "cloudflare anti-bot challenge"
29 if "captcha" in text or "you have been blocked" in text:
30 return "captcha/anti-bot block"
31 return "anti-bot challenge"
32 return None
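
A short sketch of how the classification in `anti_bot_reason` behaves, using a three-entry subset of the marker tuple. The name `classify_interstitial` is made up for this example. Note that the word "captcha" only refines the label once a marker has matched; it does not trigger detection on its own.

```python
from __future__ import annotations

# Subset of ANTI_BOT_MARKERS, kept short for illustration.
MARKERS = ("just a moment...", "you have been blocked", "verify you are not a bot")


def classify_interstitial(*parts: str) -> str | None:
    """Return a short reason when the joined text contains a known marker."""
    text = " ".join(part for part in parts if part).lower()
    if not any(marker in text for marker in MARKERS):
        return None
    if "cloudflare" in text:
        return "cloudflare anti-bot challenge"
    if "captcha" in text or "you have been blocked" in text:
        return "captcha/anti-bot block"
    return "anti-bot challenge"


if __name__ == "__main__":
    print(classify_interstitial("Just a moment...", "Cloudflare Ray ID: 123"))
    print(classify_interstitial("Quarterly report", "a study of captcha usability"))
```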
nipux_cli/task_match.py 102 lines
1"""Task title matching helpers for long-running job queues."""
2
3from __future__ import annotations
4
5import re
6from typing import Any
7
8TASK_MATCH_STOPWORDS = {
9 "a",
10 "an",
11 "and",
12 "as",
13 "at",
14 "by",
15 "for",
16 "from",
17 "in",
18 "into",
19 "of",
20 "on",
21 "or",
22 "the",
23 "then",
24 "to",
25 "via",
26 "with",
27}
28
29
30def task_key(parent: str, title: str) -> str:
31 return re.sub(r"[^a-z0-9]+", "-", f"{parent}|{title}".lower()).strip("-")[:120]
32
33
34def find_semantic_task_match(
35 *,
36 title: str,
37 parent: str,
38 tasks: list[dict[str, Any]],
39 statuses: set[str] | None = None,
40 min_score: float = 0.55,
41) -> dict[str, Any] | None:
42 incoming_title = str(title or "").strip()
43 if not incoming_title:
44 return None
45 incoming_parent = str(parent or "").strip()
46 incoming_key = task_key(incoming_parent, incoming_title)
47 incoming_tokens = _task_tokens(incoming_title)
48 if len(incoming_tokens) < 2:
49 return None
50 allowed_statuses = statuses or {"active", "open", "blocked"}
51 best: dict[str, Any] | None = None
52 best_score = 0.0
53 for task in tasks:
54 if not isinstance(task, dict):
55 continue
56 candidate_title = str(task.get("title") or "").strip()
57 if not candidate_title:
58 continue
59 candidate_parent = str(task.get("parent") or "").strip()
60 if incoming_parent and candidate_parent and incoming_parent != candidate_parent:
61 continue
62 candidate_key = str(task.get("key") or task_key(candidate_parent, candidate_title))
63 if candidate_key == incoming_key:
64 return None
65 status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
66 if status not in allowed_statuses:
67 continue
68 candidate_tokens = _task_tokens(candidate_title)
69 if len(candidate_tokens) < 2:
70 continue
71 score, overlap = _task_similarity(incoming_tokens, candidate_tokens)
72 if overlap < 2 or score < min_score or score <= best_score:
73 continue
74 best_score = score
75 best = {
76 "task": task,
77 "key": candidate_key,
78 "title": candidate_title,
79 "parent": candidate_parent,
80 "status": status,
81 "score": round(score, 3),
82 "overlap": overlap,
83 }
84 return best
85
86
87def _task_tokens(text: str) -> set[str]:
88 tokens = {
89 token
90 for token in re.findall(r"[a-z0-9]+", str(text or "").lower())
91 if len(token) > 1 and token not in TASK_MATCH_STOPWORDS
92 }
93 return tokens
94
95
96def _task_similarity(left: set[str], right: set[str]) -> tuple[float, int]:
97 overlap = len(left & right)
98 if overlap <= 0:
99 return 0.0, 0
100 jaccard = overlap / max(1, len(left | right))
101 containment = overlap / max(1, min(len(left), len(right)))
102 return max(jaccard, containment), overlap
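
As a worked example of the scoring above: the score is the larger of the Jaccard index and the containment ratio (overlap divided by the smaller token set), so a short title fully contained in a longer one scores 1.0 even when the Jaccard index is lower. The sketch re-implements the two helpers under shorter, illustrative names; the task titles are invented.

```python
import re

STOPWORDS = {
    "a", "an", "and", "as", "at", "by", "for", "from", "in", "into",
    "of", "on", "or", "the", "then", "to", "via", "with",
}


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, minus stopwords and single characters."""
    return {
        token
        for token in re.findall(r"[a-z0-9]+", text.lower())
        if len(token) > 1 and token not in STOPWORDS
    }


def similarity(left: set, right: set) -> tuple:
    """Return (max(jaccard, containment), overlap)."""
    overlap = len(left & right)
    if overlap <= 0:
        return 0.0, 0
    jaccard = overlap / max(1, len(left | right))
    containment = overlap / max(1, min(len(left), len(right)))
    return max(jaccard, containment), overlap


if __name__ == "__main__":
    long_title = tokens("Collect pricing data from vendor sites")  # 5 tokens
    short_title = tokens("Collect vendor pricing")                 # 3 tokens
    # jaccard = 3/5, containment = 3/3, so the score is 1.0 with overlap 3.
    print(similarity(long_title, short_title))
```

With the default `min_score` of 0.55, the Jaccard value of 0.6 alone would barely pass here; containment is what makes subset titles match reliably.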
nipux_cli/templates.py 67 lines
1"""Program templates for generic long-running jobs."""
2
3from __future__ import annotations
4
5
6def program_for_job(*, kind: str, title: str, objective: str) -> str:
7 kind = (kind or "generic").strip().lower()
8 body = _TEMPLATES.get(kind, _generic_template)
9 return body(title=title, objective=objective).strip() + "\n"
10
11
12def _generic_template(*, title: str, objective: str) -> str:
13 return f"""# {title}
14
15## Objective
16
17{objective}
18
19## Operating Rules
20
21- Work forever in bounded, resumable steps until the operator explicitly cancels or pauses the job.
22- Treat useful results as checkpoints, not endings: save the result, create the next branch, and continue.
23- Save important observations as artifacts.
24- Use report_update for short progress notes or blocked-state notes.
25- Use record_lesson when a source, mistake, operator preference, or strategy should affect future steps.
26- Use record_source and record_findings when those tools are available so the job improves its ledgers over time.
27- Use record_roadmap for broad work that needs milestones, feature groups, validation contracts, and roadmap-level checkpoints.
28- Use record_milestone_validation to validate milestones from evidence and create follow-up tasks when validation fails or blocks.
29- Use record_tasks to split broad objectives into durable branches with output contracts, acceptance criteria, required evidence, and stall behavior.
30- Use record_experiment whenever a branch produces measured results, comparisons, benchmarks, scores, or optimization data.
31- Use acknowledge_operator_context after incorporating or superseding active operator steering.
32- Use browser and web tools first. Do not assume memory is exact unless it points to an artifact.
33- Prefer quantity of attempts over one giant plan.
34"""
35
36
37def _research_paper_template(*, title: str, objective: str) -> str:
38 return f"""# {title}
39
40## Objective
41
42{objective}
43
44## Research Rules
45
46- Save exact source URLs and extracted text snippets as artifacts.
47- Keep a rolling citation map with claims, evidence, and open questions.
48- Separate facts from hypotheses.
49- Produce drafts only after evidence artifacts exist.
50- Use report_update for brief progress, gap, or blocked-source notes.
51- Use record_roadmap for the paper outline, evidence milestones, draft milestones, and validation checkpoints.
52- Use record_milestone_validation when a section or draft milestone has enough evidence to judge.
53- Use record_tasks to track source clusters, sections, and unresolved evidence gaps with output contracts and acceptance criteria.
54- Use acknowledge_operator_context after incorporating or superseding active operator steering.
55
56## Step Loop
57
581. Search for one source cluster.
592. Extract and save relevant evidence.
603. Update the citation/evidence map.
614. Write or improve one section when enough evidence exists.
62"""
63
64
65_TEMPLATES = {
66 "research_paper": _research_paper_template,
67}
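
The dispatch in `program_for_job` is a dictionary lookup with a generic fallback. A minimal sketch, with heavily shortened template bodies and hypothetical function names, shows how the kind is normalized and how unknown kinds are handled:

```python
def generic(*, title: str, objective: str) -> str:
    return f"# {title}\n\n## Objective\n\n{objective}\n\n## Operating Rules\n\n- (rules elided)\n"


def research_paper(*, title: str, objective: str) -> str:
    return f"# {title}\n\n## Objective\n\n{objective}\n\n## Research Rules\n\n- (rules elided)\n"


TEMPLATES = {"research_paper": research_paper}


def program_for(kind, title: str, objective: str) -> str:
    """Pick a template by normalized kind; unknown or empty kinds use the generic one."""
    key = (kind or "generic").strip().lower()
    body = TEMPLATES.get(key, generic)
    # Normalize trailing whitespace so the program always ends with one newline.
    return body(title=title, objective=objective).strip() + "\n"


if __name__ == "__main__":
    print(program_for(" Research_Paper ", "Survey", "Summarize the field."))
```

Falling back to the generic template, rather than raising, means a job created with an unrecognized kind still gets a usable program.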
nipux_cli/tools.py 2229 lines
1"""Static tool registry for the Nipux agent."""
2
3from __future__ import annotations
4
5import json
6import re
7import time
8from dataclasses import dataclass
9from datetime import datetime, timedelta, timezone
10from typing import Any, Callable
11
12from nipux_cli.artifacts import ArtifactStore
13from nipux_cli.config import AppConfig
14from nipux_cli.db import AgentDB
15from nipux_cli.metric_format import format_metric_value
16from nipux_cli.digest import send_digest_email
17from nipux_cli.memory_graph import memory_graph_from_job, search_memory_graph
18from nipux_cli.planning import initial_task_contract
19from nipux_cli.shell_tools import shell_exec as _shell_exec
20from nipux_cli.shell_tools import write_file as _write_file
21from nipux_cli.task_match import find_semantic_task_match, task_key
22
23
24@dataclass(frozen=True)
25class ToolContext:
26 config: AppConfig
27 db: AgentDB
28 artifacts: ArtifactStore
29 job_id: str
30 run_id: str | None = None
31 step_id: str | None = None
32 task_id: str | None = None
33
34
35Handler = Callable[[dict[str, Any], ToolContext], str]
36
37EVIDENCE_OUTPUT_TERMS = {
38 "audit",
39 "checkpoint",
40 "evidence",
41 "extract",
42 "extracted",
43 "notes",
44 "source",
45 "sources",
46}
47DELIVERABLE_OUTPUT_TERMS = {
48 "compiled",
49 "deliverable",
50 "draft",
51 "final",
52 "revision",
53 "updated",
54}
55
56
57@dataclass(frozen=True)
58class ToolSpec:
59 name: str
60 description: str
61 parameters: dict[str, Any]
62 handler: Handler
63
64 def as_openai_tool(self) -> dict[str, Any]:
65 return {
66 "type": "function",
67 "function": {
68 "name": self.name,
69 "description": self.description,
70 "parameters": self.parameters,
71 },
72 }
73
74
75def _missing_argument(value: Any) -> bool:
76 if value is None:
77 return True
78 if isinstance(value, str):
79 stripped = value.strip()
80 if not stripped:
81 return True
82 lowered = stripped.lower()
83 if lowered in {"...", "…", "<...>", "{...}", "{{...}}", "placeholder", "todo", "tbd"}:
84 return True
85 if re.fullmatch(r"[.<{\[(\s]*\.{3,}[\s>}\])]*", stripped):
86 return True
87 return False
88 if isinstance(value, (list, dict, tuple, set)):
89 return not value
90 return False
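
To make the placeholder rules in `_missing_argument` concrete, the sketch below repeats the same checks under a hypothetical name, `is_missing`, and lists a few inputs on either side of the line. The set name `PLACEHOLDER_WORDS` is also an assumption introduced for readability.

```python
import re

PLACEHOLDER_WORDS = {"...", "…", "<...>", "{...}", "{{...}}", "placeholder", "todo", "tbd"}

# Optional opening brackets or spaces, three or more dots, optional closers.
ELLIPSIS_ONLY = re.compile(r"[.<{\[(\s]*\.{3,}[\s>}\])]*")


def is_missing(value) -> bool:
    """True for None, blank strings, placeholder strings, and empty containers."""
    if value is None:
        return True
    if isinstance(value, str):
        stripped = value.strip()
        if not stripped:
            return True
        if stripped.lower() in PLACEHOLDER_WORDS:
            return True
        return bool(ELLIPSIS_ONLY.fullmatch(stripped))
    if isinstance(value, (list, dict, tuple, set)):
        return not value
    # Numbers and booleans, including 0 and False, count as provided.
    return False


if __name__ == "__main__":
    for sample in [None, "  ", "TODO", "[ ... ]", "see notes...", 0, []]:
        print(repr(sample), is_missing(sample))
```

The ellipsis pattern uses `fullmatch`, so ordinary prose that merely ends in dots is not treated as missing.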
91
92
93REFERENCE_LIKE_FIELD_PATTERN = re.compile(r"(?i)(?:^|_)(?:artifact|id|path|ref|source|url)(?:$|_)")
94
95
96def _placeholder_argument(value: Any) -> bool:
97 if _missing_argument(value):
98 return True
99 if not isinstance(value, str):
100 return False
101 stripped = value.strip().strip("'\"")
102 if not stripped:
103 return True
104 if re.search(r"\s", stripped):
105 return False
106 return bool(re.search(r"(?:\.{3,}|…)$", stripped))
107
108
109def _schema_placeholder_arguments(schema: dict[str, Any], value: Any, *, path: str = "") -> list[str]:
110 schema_type = schema.get("type")
111 placeholders: list[str] = []
112 if schema_type == "object" and isinstance(value, dict):
113 properties = schema.get("properties") if isinstance(schema.get("properties"), dict) else {}
114 for name, child_schema in properties.items():
115 if name not in value or not isinstance(child_schema, dict):
116 continue
117 child_path = f"{path}.{name}" if path else str(name)
118 if REFERENCE_LIKE_FIELD_PATTERN.search(str(name)) and _placeholder_argument(value.get(name)):
119 placeholders.append(child_path)
120 continue
121 placeholders.extend(_schema_placeholder_arguments(child_schema, value.get(name), path=child_path))
122 elif schema_type == "array" and isinstance(value, list):
123 item_schema = schema.get("items") if isinstance(schema.get("items"), dict) else {}
124 for index, item in enumerate(value[:50]):
125 placeholders.extend(_schema_placeholder_arguments(item_schema, item, path=f"{path}[{index}]"))
126 return placeholders
127
128
129def _schema_missing_arguments(schema: dict[str, Any], value: Any, *, path: str = "") -> list[str]:
130 schema_type = schema.get("type")
131 missing: list[str] = []
132 if schema_type == "object" and isinstance(value, dict):
133 properties = schema.get("properties") if isinstance(schema.get("properties"), dict) else {}
134 for required in schema.get("required") or []:
135 name = str(required)
136 if _missing_argument(value.get(name)):
137 missing.append(f"{path}.{name}" if path else name)
138 for name, child_schema in properties.items():
139 if name in value and isinstance(child_schema, dict) and not _missing_argument(value.get(name)):
140 child_path = f"{path}.{name}" if path else str(name)
141 missing.extend(_schema_missing_arguments(child_schema, value.get(name), path=child_path))
142 elif schema_type == "array" and isinstance(value, list):
143 item_schema = schema.get("items") if isinstance(schema.get("items"), dict) else {}
144 for index, item in enumerate(value[:50]):
145 missing.extend(_schema_missing_arguments(item_schema, item, path=f"{path}[{index}]"))
146 return missing
147
148
149REQUIRED_ARGUMENT_ALIASES: dict[str, dict[str, tuple[str, ...]]] = {
150 "record_experiment": {
151 "title": ("title", "name", "metric_name", "hypothesis", "result", "outcome"),
152 },
153}
154
155REQUIRED_ARGUMENT_GROUPS: dict[str, tuple[tuple[str, tuple[str, ...]], ...]] = {
156 "read_artifact": (("artifact reference", ("artifact_id", "path", "title", "ref")),),
157 "record_memory_graph": (("nodes or edges", ("nodes", "edges")),),
158}
159
160
161def _json(value: Any) -> str:
162 return json.dumps(value, ensure_ascii=False)
163
164
165def _write_artifact(args: dict[str, Any], ctx: ToolContext) -> str:
166 content = str(args.get("content") or "")
167 if not content:
168 return _json({"success": False, "error": "content is required"})
169 stored = ctx.artifacts.write_text(
170 job_id=ctx.job_id,
171 run_id=ctx.run_id,
172 step_id=ctx.step_id,
173 content=content,
174 title=args.get("title"),
175 summary=args.get("summary"),
176 artifact_type=str(args.get("type") or "text"),
177 metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else None,
178 )
179 return _json({
180 "success": True,
181 "artifact_id": stored.id,
182 "path": str(stored.path),
183 "sha256": stored.sha256,
184 })
185
186
187def _read_artifact(args: dict[str, Any], ctx: ToolContext) -> str:
188 artifact_ref = str(args.get("artifact_id") or args.get("path") or args.get("title") or args.get("ref") or "")
189 if not artifact_ref:
190 return _json({"success": False, "error": "artifact_id is required"})
191 resolved = _resolve_artifact_ref(ctx, artifact_ref)
192 if not resolved:
193 recent = _recent_artifact_refs(ctx)
194 return _json({
195 "success": False,
196 "recoverable": True,
197 "error": f"artifact not found: {artifact_ref}",
198 "guidance": (
199 "The requested artifact reference does not exist. Use one of the recent_artifacts refs, "
200 "call search_artifacts with a concrete query, or continue from already observed evidence."
201 ),
202 "recent_artifacts": recent,
203 })
204 try:
205 content = ctx.artifacts.read_text(resolved["id"])
206 except (OSError, KeyError, ValueError) as exc:
207 return _json({"success": False, "artifact_id": resolved["id"], "error": str(exc)})
208 return _json({"success": True, "artifact_id": resolved["id"], "title": resolved.get("title"), "path": resolved.get("path"), "content": content})
209
210
211def _recent_artifact_refs(ctx: ToolContext, limit: int = 8) -> list[dict[str, str]]:
212 artifacts = ctx.db.list_artifacts(ctx.job_id, limit=limit)
213 refs: list[dict[str, str]] = []
214 for index, artifact in enumerate(artifacts, start=1):
215 refs.append({
216 "number": str(index),
217 "id": str(artifact.get("id") or ""),
218 "title": str(artifact.get("title") or ""),
219 "path": str(artifact.get("path") or ""),
220 })
221 return refs
222
223
224def _resolve_artifact_ref(ctx: ToolContext, artifact_ref: str) -> dict[str, Any] | None:
225 ref = artifact_ref.strip().strip("'\"")
226 if not ref:
227 return None
228 artifacts = ctx.db.list_artifacts(ctx.job_id, limit=250)
229 for artifact in artifacts:
230 if ref == artifact.get("id") or ref == str(artifact.get("path") or ""):
231 return artifact
232 if ref.isdigit():
233 index = int(ref) - 1
234 if 0 <= index < len(artifacts):
235 return artifacts[index]
236 lowered = ref.lower()
237 for artifact in artifacts:
238 title = str(artifact.get("title") or "").lower()
239 if lowered == title:
240 return artifact
241 for artifact in artifacts:
242 haystack = " ".join(str(artifact.get(key) or "") for key in ("title", "summary", "path")).lower()
243 if lowered and lowered in haystack:
244 return artifact
245 return None
246
247
248def _search_artifacts(args: dict[str, Any], ctx: ToolContext) -> str:
249 query = str(args.get("query") or "")
250 limit = int(args.get("limit") or 10)
251 return _json({"success": True, "results": ctx.artifacts.search_text(job_id=ctx.job_id, query=query, limit=limit)})
252
253
254def _update_job_state(args: dict[str, Any], ctx: ToolContext) -> str:
255 status = str(args.get("status") or "").strip().lower()
256 if status in {"paused", "cancelled", "completed", "failed"}:
257 note = str(args.get("note") or "")
258 follow_up_task = None
259 if status == "completed":
260 follow_up_task = _append_completion_audit_task(
261 ctx,
262 source="update_job_state",
263 requested_status=status,
264 claimed_message=note,
265 )
266 metadata = {"requested_status": status, "kept_running": True}
267 if follow_up_task is not None:
268 metadata["follow_up_task"] = follow_up_task.get("key")
269 ctx.db.append_agent_update(
270 ctx.job_id,
271 f"Worker requested {status}; job remains running. {note}".strip(),
272 category="progress" if status == "completed" else "blocked",
273 metadata=metadata,
274 )
275 result = {
276 "success": True,
277 "job_id": ctx.job_id,
278 "status": "running",
279 "requested_status": status,
280 "kept_running": True,
281 "guidance": (
282 "Jobs are perpetual by default. Do not mark the job complete or failed. "
283 "Save the current result, create follow-up tasks, report a checkpoint, and continue."
284 ),
285 }
286 if follow_up_task is not None:
287 result["follow_up_task"] = follow_up_task
288 return _json(result)
289 if status not in {"queued", "running"}:
290 return _json({"success": False, "error": f"invalid status: {status}"})
291 note = str(args.get("note") or "")
292 patch = {"last_note": note} if note else None
293 ctx.db.update_job_status(ctx.job_id, status, metadata_patch=patch)
294 return _json({"success": True, "job_id": ctx.job_id, "status": status})
295
296
def _defer_job(args: dict[str, Any], ctx: ToolContext) -> str:
    until = _defer_until(args)
    reason = str(args.get("reason") or "").strip()
    next_action = str(args.get("next_action") or "").strip()
    patch = {
        "defer_until": until.isoformat(),
        "defer_reason": reason,
        "defer_next_action": next_action,
    }
    job = ctx.db.get_job(ctx.job_id)
    status = str(job.get("status") or "queued")
    if status not in {"queued", "running"}:
        status = "queued"
    ctx.db.update_job_status(ctx.job_id, status, metadata_patch=patch)
    message = f"Deferred until {until.isoformat()}"
    if reason:
        message += f": {reason}"
    if next_action:
        message += f" Next: {next_action}"
    ctx.db.append_agent_update(
        ctx.job_id,
        message,
        category="progress",
        metadata={"defer_until": until.isoformat(), "reason": reason, "next_action": next_action},
    )
    return _json({
        "success": True,
        "job_id": ctx.job_id,
        "status": status,
        "defer_until": until.isoformat(),
        "reason": reason,
        "next_action": next_action,
    })


def _defer_until(args: dict[str, Any]) -> datetime:
    """Resolve the deferral deadline as an aware UTC datetime from `until` or a delay in seconds."""
    raw_until = str(args.get("until") or "").strip()
    if raw_until:
        try:
            parsed = datetime.fromisoformat(raw_until.replace("Z", "+00:00"))
        except ValueError:
            # An unparseable timestamp falls back to the current time.
            parsed = datetime.now(timezone.utc)
        if parsed.tzinfo is None:
            # Naive timestamps are interpreted as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    seconds = args.get("seconds", args.get("delay_seconds", 300))
    try:
        delay = max(1.0, float(seconds))
    except (TypeError, ValueError):
        delay = 300.0
    return datetime.now(timezone.utc) + timedelta(seconds=delay)


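The `until` branch of `_defer_until` can be exercised in isolation. The sketch below is a standalone restatement rather than the module's code: `parse_until` is a hypothetical name, and it assumes, as `_defer_until` does, that naive timestamps mean UTC. The `Z` replacement matters on Python versions before 3.11, where `datetime.fromisoformat` rejects a trailing `Z`.

```python
from datetime import datetime, timezone


def parse_until(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware UTC datetime (hypothetical helper)."""
    parsed = datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption carried over from _defer_until: naive values mean UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# Three spellings of the same instant.
zulu = parse_until("2030-01-02T03:04:05Z")
naive = parse_until("2030-01-02T03:04:05")
offset = parse_until("2030-01-02T05:04:05+02:00")
print(zulu.isoformat(), zulu == naive == offset)
```

Normalizing to UTC before persisting keeps the stored `defer_until` strings comparable regardless of the offset the caller supplied.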
def _report_update(args: dict[str, Any], ctx: ToolContext) -> str:
    message = str(args.get("message") or args.get("summary") or "").strip()
    if not message:
        return _json({"success": False, "error": "message is required"})
    category = str(args.get("category") or "progress").strip().lower()
    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
    normalized_message = _perpetual_checkpoint_message(message)
    if normalized_message != message:
        metadata = {**metadata, "original_message": message, "rewritten_completion_claim": True}
        message = normalized_message
        follow_up_task = _append_completion_audit_task(
            ctx,
            source="report_update",
            requested_status="completed",
            claimed_message=str(metadata.get("original_message") or ""),
        )
        metadata["follow_up_task"] = follow_up_task.get("key")
    entry = ctx.db.append_agent_update(ctx.job_id, message, category=category, metadata=metadata)
    return _json({"success": True, "job_id": ctx.job_id, "update": entry})


def _append_completion_audit_task(
    ctx: ToolContext,
    *,
    source: str,
    requested_status: str,
    claimed_message: str = "",
) -> dict[str, Any]:
    return ctx.db.append_task_record(
        ctx.job_id,
        title="Audit latest checkpoint against objective",
        status="open",
        priority=7,
        goal=(
            "Before treating the latest checkpoint as sufficient, compare the objective and operator context "
            "against concrete artifacts, files, findings, measurements, validations, and task results."
        ),
        output_contract="decision",
        acceptance_criteria=(
            "A prompt-to-artifact checklist maps explicit requirements to evidence, identifies uncovered gaps, "
            "and opens or continues the next branch from those gaps."
        ),
        evidence_needed=(
            "Objective text, active operator context, latest durable outputs, recent tool/test results, "
            "task queue state, roadmap validations, and measured results when applicable."
        ),
        stall_behavior=(
            "If evidence is missing, mark the checkpoint incomplete, record the gap, and create the smallest "
            "follow-up task instead of claiming completion."
        ),
        metadata={
            "source": source,
            "requested_status": requested_status,
            "completion_audit_required": True,
            "claimed_message": claimed_message[:1000],
        },
    )


def _perpetual_checkpoint_message(message: str) -> str:
    """Keep worker reports checkpoint-oriented without hiding the underlying audit trail."""

    text = " ".join(str(message or "").split())
    if not text:
        return ""
    leading_claim = re.compile(
        r"(?i)^\s*(?:the\s+)?(?:job|objective|run|work)\s+"
        r"(?:is\s+|was\s+)?(?:complete|completed|done|finished)\b[.!:,\-\s]*"
    )
    if leading_claim.search(text):
        rest = leading_claim.sub("", text, count=1).strip()
        if rest:
            return f"Checkpoint reported; continuing work. {rest}"
        return "Checkpoint reported; continuing work."
    whole_job_claim = re.compile(
        r"(?i)\b(?:completed|finished|done\s+with)\s+(?:the\s+)?(?:job|objective|run|work)\b"
    )
    if whole_job_claim.search(text):
        return "Checkpoint reported; continuing work. " + whole_job_claim.sub("reached a checkpoint for the work", text, count=1)
    return text


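To see which reports the leading-claim pattern rewrites, the sketch below applies the same regular expression to a few made-up messages. `strip_leading_claim` and the sample strings are hypothetical and exist only for illustration; the pattern itself is copied from `_perpetual_checkpoint_message`.

```python
import re

# Same pattern as leading_claim in _perpetual_checkpoint_message.
LEADING_CLAIM = re.compile(
    r"(?i)^\s*(?:the\s+)?(?:job|objective|run|work)\s+"
    r"(?:is\s+|was\s+)?(?:complete|completed|done|finished)\b[.!:,\-\s]*"
)


def strip_leading_claim(text: str) -> str:
    """Drop a leading whole-job completion claim and keep the remainder (hypothetical helper)."""
    collapsed = " ".join(text.split())  # collapse whitespace like the original
    return LEADING_CLAIM.sub("", collapsed, count=1).strip()


samples = [
    "The job is complete. Saved report.md as an artifact.",
    "Work done: 12 sources reviewed.",
    "Completed step 3 of the crawl.",  # step-level claim, not a whole-job claim
]
for sample in samples:
    print(repr(strip_leading_claim(sample)))
```

The third sample passes through unchanged because the pattern only fires when the subject is the job, objective, run, or work as a whole.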
def _record_lesson(args: dict[str, Any], ctx: ToolContext) -> str:
    lesson = str(args.get("lesson") or args.get("memory") or "").strip()
    if not lesson:
        return _json({"success": False, "error": "lesson is required"})
    category = str(args.get("category") or "memory").strip().lower()
    confidence_arg = args.get("confidence")
    confidence = float(confidence_arg) if isinstance(confidence_arg, (int, float)) else None
    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
    pending_measurement = _pending_measurement(ctx)
    measurement_resolution_category = category in {"constraint", "mistake", "strategy", "memory"}
    if pending_measurement and measurement_resolution_category and not _lesson_explains_measurement_obligation(lesson, metadata):
        return _json({
            "success": False,
            "error": "measurement explanation required",
            "message": (
                "A pending measurement must be resolved with record_experiment, a follow-up task, "
                "or a lesson that explicitly explains why the output is not a valid measurement."
            ),
            "pending_measurement_obligation": pending_measurement,
        })
    entry = ctx.db.append_lesson(ctx.job_id, lesson, category=category, confidence=confidence, metadata=metadata)
    if pending_measurement and measurement_resolution_category:
        _resolve_measurement_obligation(
            ctx,
            status="explained",
            reason=lesson,
            via_tool="record_lesson",
        )
    return _json({"success": True, "job_id": ctx.job_id, "lesson": entry})


def _lesson_explains_measurement_obligation(lesson: str, metadata: dict[str, Any]) -> bool:
    parts = [lesson]
    for value in metadata.values():
        if isinstance(value, (str, int, float, bool)):
            parts.append(str(value))
    text = " ".join(parts).lower()
    measurement_terms = (
        "measure",
        "measured",
        "measurement",
        "metric",
        "experiment",
        "trial",
        "benchmark",
        "result",
        "obligation",
    )
    accounting_terms = (
        "invalid",
        "valid",
        "not valid",
        "diagnostic",
        "missing",
        "no metric",
        "without metric",
        "blocked",
        "failed",
        "failure",
        "timeout",
        "permission",
        "auth",
        "quota",
        "rate limit",
        "unavailable",
        "unable",
        "cannot",
        "can't",
        "could not",
        "stale",
        "incomplete",
        "not comparable",
        "not enough",
        "rerun",
        "re-run",
        "retry",
    )
    return any(term in text for term in measurement_terms) and any(term in text for term in accounting_terms)


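Because `_lesson_explains_measurement_obligation` uses plain substring tests, short terms can match inside longer, unrelated words. The sketch below illustrates that behavior with abbreviated term lists; `explains` and the sample sentences are hypothetical, and the shortened tuples are an assumption made to keep the example small.

```python
# Abbreviated subsets of the term tuples used above (assumption: shortened for illustration).
MEASUREMENT_TERMS = ("measure", "metric", "experiment", "trial", "benchmark", "result", "obligation")
ACCOUNTING_TERMS = ("invalid", "missing", "blocked", "failed", "timeout", "auth", "retry")


def explains(text: str) -> bool:
    """Substring check in the style of _lesson_explains_measurement_obligation."""
    lowered = text.lower()
    return any(term in lowered for term in MEASUREMENT_TERMS) and any(
        term in lowered for term in ACCOUNTING_TERMS
    )


# Intended case: names a measurement and accounts for why it is unusable.
intended = explains("Benchmark output had the metric missing, so it is not a usable measurement.")
# Incidental case: 'result' matches inside 'results' and 'auth' inside 'author'.
incidental = explains("The author published strong results.")
# Unrelated text matches neither list.
unrelated = explains("Refactored the parser module.")
print(intended, incidental, unrelated)
```

Whether such incidental matches matter depends on how strictly the obligation is meant to be enforced; the check reads as a permissive gate rather than a classifier.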
def _record_memory_graph(args: dict[str, Any], ctx: ToolContext) -> str:
    nodes = args.get("nodes") if isinstance(args.get("nodes"), list) else []
    edges = args.get("edges") if isinstance(args.get("edges"), list) else []
    if not nodes and not edges:
        return _json({"success": False, "error": "nodes or edges are required"})
    record = ctx.db.append_memory_graph_records(ctx.job_id, nodes=nodes, edges=edges)
    ctx.db.append_agent_update(
        ctx.job_id,
        (
            "Memory graph updated: "
            f"{record.get('added_nodes')} new nodes, {record.get('updated_nodes')} updated, "
            f"{record.get('added_edges')} new links."
        ),
        category="progress",
        metadata={
            "memory_graph_event_id": record.get("event_id"),
            "added_nodes": record.get("added_nodes"),
            "updated_nodes": record.get("updated_nodes"),
            "added_edges": record.get("added_edges"),
        },
    )
    return _json({"success": True, "job_id": ctx.job_id, **record})


def _search_memory_graph(args: dict[str, Any], ctx: ToolContext) -> str:
    query = str(args.get("query") or "")
    limit = int(args.get("limit") or 10)
    job = ctx.db.get_job(ctx.job_id)
    graph = memory_graph_from_job(job)
    results = search_memory_graph(graph, query=query, limit=limit)
    return _json({"success": True, "job_id": ctx.job_id, "query": query, **results})


def _acknowledge_operator_context(args: dict[str, Any], ctx: ToolContext) -> str:
    raw_ids = args.get("message_ids")
    message_ids = [str(item) for item in raw_ids] if isinstance(raw_ids, list) else []
    summary = str(args.get("summary") or args.get("reason") or "").strip()
    status = str(args.get("status") or "acknowledged").strip().lower()
    pending = _acknowledgeable_operator_messages(ctx.db.get_job(ctx.job_id), message_ids=message_ids)
    if not pending:
        return _json({
            "success": False,
            "recoverable": True,
            "error": "no active operator context to acknowledge",
            "message_ids": message_ids,
            "guidance": "Use acknowledge_operator_context only after incorporating claimed operator steering. Use report_update, record_lesson, record_tasks, or record_experiment for ordinary progress.",
        })
    result = ctx.db.acknowledge_operator_messages(
        ctx.job_id,
        message_ids=message_ids,
        summary=summary,
        status=status,
    )
    message = summary or f"Operator context {result.get('status')}."
    ctx.db.append_agent_update(
        ctx.job_id,
        message,
        category="progress",
        metadata={
            "operator_context_status": result.get("status"),
            "operator_message_count": result.get("count"),
            "operator_message_ids": [
                entry.get("event_id")
                for entry in result.get("messages", [])
                if isinstance(entry, dict) and entry.get("event_id")
            ],
        },
    )
    return _json({"success": True, "job_id": ctx.job_id, **result})


def _acknowledgeable_operator_messages(job: dict[str, Any], *, message_ids: list[str]) -> list[dict[str, Any]]:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
    wanted = {str(message_id).strip() for message_id in message_ids if str(message_id).strip()}
    pending = []
    for entry in messages:
        if not isinstance(entry, dict):
            continue
        mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
        if mode not in {"steer", "follow_up"}:
            continue
        event_id = str(entry.get("event_id") or "")
        if wanted and event_id not in wanted:
            continue
        if not wanted and not entry.get("claimed_at"):
            continue
        if entry.get("acknowledged_at") or entry.get("superseded_at"):
            continue
        pending.append(entry)
    return pending


def _record_source(args: dict[str, Any], ctx: ToolContext) -> str:
    source = str(args.get("source") or args.get("url") or args.get("domain") or "").strip()
    if not source:
        return _json({"success": False, "error": "source is required"})
    warnings_raw = args.get("warnings")
    warnings = [str(item) for item in warnings_raw] if isinstance(warnings_raw, list) else []
    score_arg = args.get("usefulness_score")
    usefulness_score = float(score_arg) if isinstance(score_arg, (int, float)) else None
    yield_count = int(args.get("yield_count") or 0)
    fail_count_delta = int(args.get("fail_count_delta") or 0)
    metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
    source_type = str(args.get("source_type") or "")
    outcome = str(args.get("outcome") or "")
    if not _source_has_assessment(
        source_type=source_type,
        usefulness_score=usefulness_score,
        yield_count=yield_count,
        fail_count_delta=fail_count_delta,
        warnings=warnings,
        outcome=outcome,
        metadata=metadata,
    ):
        return _json({
            "success": False,
            "error": "source assessment is required",
            "guidance": (
                "record_source must say why the source matters: include usefulness_score, "
                "yield_count, fail_count_delta, warnings, outcome, or evidence metadata."
            ),
        })
    entry = ctx.db.append_source_record(
        ctx.job_id,
        source,
        source_type=source_type,
        usefulness_score=usefulness_score,
        yield_count=yield_count,
        fail_count_delta=fail_count_delta,
        warnings=warnings,
        outcome=outcome,
        metadata=metadata,
    )
    return _json({"success": True, "job_id": ctx.job_id, "source": entry})


def _source_has_assessment(
    *,
    source_type: str,
    usefulness_score: float | None,
    yield_count: int,
    fail_count_delta: int,
    warnings: list[str],
    outcome: str,
    metadata: dict[str, Any],
) -> bool:
    return bool(
        usefulness_score is not None
        or yield_count
        or fail_count_delta
        or warnings
        or outcome.strip()
        or any(str(value).strip() for value in metadata.values())
    )


def _record_findings(args: dict[str, Any], ctx: ToolContext) -> str:
    raw_findings = args.get("findings")
    if isinstance(raw_findings, list):
        findings = [item for item in raw_findings if isinstance(item, dict)]
    else:
        findings = [args]
    if not findings:
        return _json({"success": False, "error": "findings are required"})
    evidence_artifact = str(args.get("evidence_artifact") or args.get("artifact_id") or "")
    stored = []
    added = 0
    updated = 0
    unchanged = 0
    rejected = []
    source_yields: dict[str, int] = {}
    for finding in findings[:50]:
        name = str(finding.get("name") or finding.get("title") or "").strip()
        if not name:
            continue
        source_url = str(finding.get("source_url") or finding.get("source") or args.get("source_url") or args.get("source") or "")
        reason = str(finding.get("reason") or finding.get("rationale") or "")
        finding_evidence_artifact = str(finding.get("evidence_artifact") or evidence_artifact)
        metadata = finding.get("metadata") if isinstance(finding.get("metadata"), dict) else {}
        if not _finding_has_evidence(
            url=str(finding.get("url") or ""),
            source_url=source_url,
            reason=reason,
            evidence_artifact=finding_evidence_artifact,
            metadata=metadata,
        ):
            rejected.append({"name": name, "reason": "missing_evidence"})
            continue
        score_arg = finding.get("score")
        score = float(score_arg) if isinstance(score_arg, (int, float)) else None
        entry = ctx.db.append_finding_record(
            ctx.job_id,
            name=name,
            url=str(finding.get("url") or ""),
            source_url=source_url,
            category=str(finding.get("category") or finding.get("type") or ""),
            location=str(finding.get("location") or ""),
            contact=str(finding.get("contact") or ""),
            reason=reason,
            status=str(finding.get("status") or "new"),
            score=score,
            evidence_artifact=finding_evidence_artifact,
            metadata=metadata,
        )
        if entry.get("created"):
            added += 1
            if source_url:
                source_yields[source_url] = source_yields.get(source_url, 0) + 1
        elif entry.get("substantive_update"):
            updated += 1
        else:
            unchanged += 1
        stored.append(entry)
    if not stored:
        return _json({
            "success": False,
            "error": "no valid finding with name/title and evidence was provided",
            "rejected": rejected,
            "guidance": (
                "Each finding must include an evidence anchor such as source_url/url, reason/rationale, "
                "evidence_artifact, or evidence metadata. Use record_tasks or record_source for unevidenced candidates."
            ),
        })
    source_records = []
    for source_url, count in source_yields.items():
        score = round(min(1.0, 0.55 + min(count, 10) * 0.04), 2)
        source_records.append(
            ctx.db.append_source_record(
                ctx.job_id,
                source_url,
                source_type=str(args.get("source_type") or "finding_source"),
                usefulness_score=score,
                yield_count=count,
                outcome=f"record_findings yielded {count} new candidate(s)",
                metadata={"auto_from_record_findings": True, "evidence_artifact": evidence_artifact},
            )
        )
    if added or updated or source_records:
        message = f"Finding ledger updated: {added} new, {updated} changed. Source ledger updated: {len(source_records)}."
        if unchanged:
            message += f" {unchanged} unchanged."
        ctx.db.append_agent_update(
            ctx.job_id,
            message,
            category="finding",
            metadata={
                "added": added,
                "updated": updated,
                "unchanged": unchanged,
                "rejected": len(rejected),
                "sources_updated": len(source_records),
            },
        )
    return _json({
        "success": True,
        "job_id": ctx.job_id,
        "added": added,
        "updated": updated,
        "unchanged": unchanged,
        "rejected": rejected,
        "sources_updated": len(source_records),
        "sources": source_records,
        "findings": stored,
    })


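The automatic source score in `_record_findings` is a small linear formula over the number of newly created findings. The sketch below restates it on its own so the range is easy to inspect; `auto_source_score` is a hypothetical name, and the expression is copied from the loop over `source_yields`.

```python
def auto_source_score(new_findings: int) -> float:
    """Usefulness score for a source, using the expression from _record_findings."""
    return round(min(1.0, 0.55 + min(new_findings, 10) * 0.04), 2)


# Each new finding adds 0.04 on top of a 0.55 base, counted up to 10 findings.
scores = {count: auto_source_score(count) for count in (1, 5, 10, 25)}
print(scores)
```

Since the count is capped at 10, the score tops out at 0.95, so the `min(1.0, ...)` guard does not come into play with these constants.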
def _finding_has_evidence(
    *,
    url: str,
    source_url: str,
    reason: str,
    evidence_artifact: str,
    metadata: dict[str, Any],
) -> bool:
    if url.strip() or source_url.strip() or reason.strip() or evidence_artifact.strip():
        return True
    evidence_keys = {
        "artifact_id",
        "evidence_artifact",
        "experiment_key",
        "file_path",
        "output_path",
        "source_id",
        "source_url",
        "step_id",
    }
    return any(str(metadata.get(key) or "").strip() for key in evidence_keys)


def _record_tasks(args: dict[str, Any], ctx: ToolContext) -> str:
    raw_tasks = args.get("tasks")
    if isinstance(raw_tasks, list):
        tasks = [item for item in raw_tasks if isinstance(item, dict)]
    else:
        tasks = [args]
    if not tasks:
        return _json({"success": False, "error": "tasks are required"})

    pending_measurement = _pending_measurement(ctx)
    task_queue_pressure = _task_queue_pressure_active(ctx)
    prepared_tasks: list[dict[str, Any]] = []
    for task in tasks[:50]:
        title = str(task.get("title") or task.get("name") or "").strip()
        if not title:
            continue
        parent = str(task.get("parent") or "")
        metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
        if task_queue_pressure:
            match = _semantic_task_match_under_pressure(ctx, title=title, parent=parent)
            if match:
                metadata = dict(metadata)
                metadata.setdefault("original_title", title)
                metadata.setdefault(
                    "matched_existing_task",
                    {
                        "key": match.get("key"),
                        "title": match.get("title"),
                        "score": match.get("score"),
                        "overlap": match.get("overlap"),
                    },
                )
                title = str(match.get("title") or title)
                parent = str(match.get("parent") or parent)
        output_contract = str(
            task.get("output_contract")
            or task.get("contract")
            or metadata.get("output_contract")
            or metadata.get("contract")
            or ""
        )
        acceptance_criteria = str(task.get("acceptance_criteria") or "")
        evidence_needed = str(task.get("evidence_needed") or "")
        stall_behavior = str(task.get("stall_behavior") or "")
        output_contract, acceptance_criteria, evidence_needed, stall_behavior, metadata = _complete_task_contract(
            title=title,
            output_contract=output_contract,
            acceptance_criteria=acceptance_criteria,
            evidence_needed=evidence_needed,
            stall_behavior=stall_behavior,
            metadata=metadata,
        )
        goal = str(task.get("goal") or task.get("description") or "")
        source_hint = str(task.get("source_hint") or task.get("source") or "")
        result_text = str(task.get("result") or task.get("outcome") or "")
        priority_arg = task.get("priority")
        priority = int(priority_arg) if isinstance(priority_arg, (int, float)) else 0
        status = str(task.get("status") or "open")
        if pending_measurement and not _task_targets_measurement_obligation(
            title=title,
            goal=goal,
            source_hint=source_hint,
            result=result_text,
            output_contract=output_contract,
            acceptance_criteria=acceptance_criteria,
            evidence_needed=evidence_needed,
            stall_behavior=stall_behavior,
            metadata=metadata,
        ) and not _task_would_be_unchanged(
            ctx,
            title=title,
            status=status,
            priority=priority,
            goal=goal,
            source_hint=source_hint,
            result=result_text,
            parent=parent,
            output_contract=output_contract,
            acceptance_criteria=acceptance_criteria,
            evidence_needed=evidence_needed,
            stall_behavior=stall_behavior,
            metadata=metadata,
        ):
            return _json({
                "success": False,
                "error": "measurement task required",
                "message": (
                    "A pending measurement can only be deferred by a task that explicitly obtains, "
                    "repairs, validates, or accounts for that measurement."
                ),
                "rejected_task": title,
                "pending_measurement_obligation": pending_measurement,
            })
        prepared_tasks.append({
            "task": task,
            "title": title,
            "goal": goal,
            "source_hint": source_hint,
            "result_text": result_text,
            "parent": parent,
            "priority": priority,
            "status": status,
            "metadata": metadata,
            "output_contract": output_contract,
            "acceptance_criteria": acceptance_criteria,
            "evidence_needed": evidence_needed,
            "stall_behavior": stall_behavior,
        })

    stored = []
    added = 0
    updated = 0
    unchanged = 0
    for prepared in prepared_tasks:
        task = prepared["task"]
        title = prepared["title"]
        status = prepared["status"]
        metadata = prepared["metadata"]
        output_contract = prepared["output_contract"]
        acceptance_criteria = prepared["acceptance_criteria"]
        evidence_needed = prepared["evidence_needed"]
        stall_behavior = prepared["stall_behavior"]
        result_text = prepared["result_text"]
        status, metadata = _validated_task_status(
            ctx,
            status=status,
            output_contract=output_contract,
            result=result_text,
            metadata=metadata,
        )
        entry = ctx.db.append_task_record(
            ctx.job_id,
            title=title,
            status=status,
            priority=prepared["priority"],
            goal=prepared["goal"],
            source_hint=prepared["source_hint"],
            result=result_text,
            parent=prepared["parent"],
            output_contract=output_contract,
            acceptance_criteria=acceptance_criteria,
            evidence_needed=evidence_needed,
            stall_behavior=stall_behavior,
            metadata=metadata,
        )
        if entry.get("created"):
            added += 1
        elif entry.get("substantive_update"):
            updated += 1
        else:
            unchanged += 1
        stored.append(entry)
    if not stored:
        return _json({"success": False, "error": "no valid task with title/name was provided"})

    if added or updated:
        message = f"Task queue updated: {added} new, {updated} changed."
        if unchanged:
            message += f" {unchanged} unchanged."
        ctx.db.append_agent_update(
            ctx.job_id,
            message,
            category="plan",
            metadata={"added": added, "updated": updated, "unchanged": unchanged},
        )
    if (added or updated) and pending_measurement:
        _resolve_measurement_obligation(
            ctx,
            status="deferred",
            reason="Created or updated task branch to obtain or handle the pending measurement.",
            via_tool="record_tasks",
        )
    return _json({"success": True, "job_id": ctx.job_id, "added": added, "updated": updated, "unchanged": unchanged, "tasks": stored})


def _task_targets_measurement_obligation(
    *,
    title: str,
    goal: str,
    source_hint: str,
    result: str,
    output_contract: str,
    acceptance_criteria: str,
    evidence_needed: str,
    stall_behavior: str,
    metadata: dict[str, Any],
) -> bool:
    text = " ".join([
        title,
        goal,
        source_hint,
        result,
        output_contract,
        acceptance_criteria,
        evidence_needed,
        stall_behavior,
        _metadata_scalar_text(metadata),
    ]).lower()
    contract = output_contract.strip().lower()
    if contract in {"experiment", "monitor"} and _text_mentions_measurement(text):
        return True
    return _text_mentions_measurement(text) and _text_mentions_measurement_accounting(text)


def _task_would_be_unchanged(
    ctx: ToolContext,
    *,
    title: str,
    status: str,
    priority: int,
    goal: str,
    source_hint: str,
    result: str,
    parent: str,
    output_contract: str,
    acceptance_criteria: str,
    evidence_needed: str,
    stall_behavior: str,
    metadata: dict[str, Any],
) -> bool:
    try:
        job = ctx.db.get_job(ctx.job_id)
    except KeyError:
        return False
    job_metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    tasks = job_metadata.get("task_queue") if isinstance(job_metadata.get("task_queue"), list) else []
    key = task_key(parent, title)
    current = next(
        (
            entry
            for entry in tasks
            if isinstance(entry, dict)
            and (
                entry.get("key") == key
                or (not entry.get("key") and task_key(str(entry.get("parent") or ""), str(entry.get("title") or "")) == key)
            )
        ),
        None,
    )
    if not current:
        return False
    fields = (
        "status",
        "priority",
        "goal",
        "source_hint",
        "result",
        "parent",
        "output_contract",
        "acceptance_criteria",
        "evidence_needed",
        "stall_behavior",
        "metadata",
    )
    before = _task_change_fingerprint(current, fields)
    after = dict(current)
    cleaned_status = (status.strip().lower() or "open").replace(" ", "_")
    after["status"] = cleaned_status if cleaned_status in {"open", "active", "done", "blocked", "skipped"} else "open"
    after["priority"] = int(priority)
    for field, value in {
        "goal": goal.strip(),
        "source_hint": source_hint.strip(),
        "result": result.strip(),
        "parent": parent.strip(),
        "output_contract": output_contract.strip().lower().replace(" ", "_"),
        "acceptance_criteria": acceptance_criteria.strip(),
        "evidence_needed": evidence_needed.strip(),
        "stall_behavior": stall_behavior.strip(),
    }.items():
        if value:
            after[field] = value
    if metadata:
        merged_metadata = after.get("metadata") if isinstance(after.get("metadata"), dict) else {}
        merged_metadata = dict(merged_metadata)
        merged_metadata.update(metadata)
        after["metadata"] = merged_metadata
    return before == _task_change_fingerprint(after, fields)


def _task_change_fingerprint(entry: dict[str, Any], fields: tuple[str, ...]) -> str:
    return json.dumps({field: entry.get(field) for field in fields}, sort_keys=True, separators=(",", ":"))


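`_task_change_fingerprint` relies on canonical JSON so that two task entries compare equal whenever the selected fields match, regardless of key order or unrelated fields. The following sketch shows that property on made-up task dictionaries; `fingerprint`, `FIELDS`, and the sample entries are hypothetical.

```python
import json


def fingerprint(entry: dict, fields: tuple[str, ...]) -> str:
    """Canonical JSON over selected fields, as in _task_change_fingerprint."""
    return json.dumps({field: entry.get(field) for field in fields}, sort_keys=True, separators=(",", ":"))


FIELDS = ("status", "priority", "metadata")

stored = {"title": "Collect logs", "status": "open", "priority": 3, "metadata": {"b": 1, "a": 2}}
# Same tracked values, different key order, and a different untracked title.
reordered = {"metadata": {"a": 2, "b": 1}, "priority": 3, "status": "open", "title": "renamed"}
# One tracked field differs.
changed = {**stored, "priority": 4}

print(fingerprint(stored, FIELDS))
print(fingerprint(stored, FIELDS) == fingerprint(reordered, FIELDS))
print(fingerprint(stored, FIELDS) == fingerprint(changed, FIELDS))
```

`sort_keys=True` also sorts nested dictionaries, which is why the reordered `metadata` still produces an identical string.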
def _task_queue_pressure_active(ctx: ToolContext) -> bool:
    """Return True when the queue holds more than 80 non-guard tasks or at least 40 open/active ones."""
    try:
        job = ctx.db.get_job(ctx.job_id)
    except KeyError:
        return False
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
    objective_tasks = [task for task in tasks if not _task_is_guard_recovery(task)]
    open_tasks = [
        task
        for task in objective_tasks
        if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
    ]
    return len(objective_tasks) > 80 or len(open_tasks) >= 40


def _semantic_task_match_under_pressure(ctx: ToolContext, *, title: str, parent: str) -> dict[str, Any] | None:
    try:
        job = ctx.db.get_job(ctx.job_id)
    except KeyError:
        return None
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
    return find_semantic_task_match(
        title=title,
        parent=parent,
        tasks=[task for task in tasks if not _task_is_guard_recovery(task)],
    )


def _task_is_guard_recovery(task: dict[str, Any]) -> bool:
    metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
    return bool(metadata.get("guard_recovery")) or str(task.get("title") or "").strip().lower().startswith("resolve guard:")


def _text_mentions_measurement(text: str) -> bool:
    terms = (
        "measure",
        "measured",
        "measurement",
        "metric",
        "experiment",
        "trial",
        "benchmark",
        "result",
        "obligation",
    )
    return any(term in text for term in terms)


def _text_mentions_measurement_accounting(text: str) -> bool:
    terms = (
        "account",
        "accounting",
        "obtain",
        "repair",
        "validate",
        "valid",
        "invalid",
        "diagnostic",
        "missing",
        "no metric",
        "without metric",
        "blocked",
        "failed",
        "failure",
        "timeout",
        "permission",
        "auth",
        "quota",
        "rate limit",
        "unavailable",
        "unable",
        "cannot",
        "can't",
        "could not",
        "stale",
        "incomplete",
        "not comparable",
        "not enough",
        "rerun",
        "re-run",
        "retry",
        "next measured",
    )
    return any(term in text for term in terms)


def _metadata_scalar_text(metadata: dict[str, Any]) -> str:
    parts = []
    for value in metadata.values():
        if isinstance(value, (str, int, float, bool)):
            parts.append(str(value))
    return " ".join(parts)


def _complete_task_contract(
    *,
    title: str,
    output_contract: str,
    acceptance_criteria: str,
    evidence_needed: str,
    stall_behavior: str,
    metadata: dict[str, Any],
) -> tuple[str, str, str, str, dict[str, Any]]:
    defaults = initial_task_contract(title)
    inferred = []
    if not output_contract.strip():
        output_contract = defaults["output_contract"]
        inferred.append("output_contract")
    if not acceptance_criteria.strip():
        acceptance_criteria = defaults["acceptance_criteria"]
        inferred.append("acceptance_criteria")
    if not evidence_needed.strip():
        evidence_needed = defaults["evidence_needed"]
        inferred.append("evidence_needed")
    if not stall_behavior.strip():
        stall_behavior = defaults["stall_behavior"]
        inferred.append("stall_behavior")
    if inferred:
        metadata = dict(metadata)
        existing = metadata.get("contract_inferred_fields")
        existing_fields = [str(item) for item in existing] if isinstance(existing, list) else []
        metadata["contract_inferred_fields"] = sorted(set(existing_fields + inferred))
    return output_contract, acceptance_criteria, evidence_needed, stall_behavior, metadata


def _validated_task_status(
    ctx: ToolContext,
    *,
    status: str,
    output_contract: str,
    result: str,
    metadata: dict[str, Any],
) -> tuple[str, dict[str, Any]]:
    """Downgrade a claimed "done" status to "active" when completion evidence is missing."""
    normalized_status = status.strip().lower().replace(" ", "_") or "open"
    contract = output_contract.strip().lower().replace(" ", "_")
    if normalized_status == "done" and not result.strip() and not _task_metadata_has_completion_evidence(metadata, contract=contract):
        updated = dict(metadata)
        updated["completion_validation"] = "missing_result_evidence"
        return "active", updated
    if normalized_status != "done":
        return status, metadata
    if contract in {"artifact", "report"}:
        if _recent_deliverable_evidence(ctx):
            return status, metadata
        updated = dict(metadata)
        updated["completion_validation"] = "missing_recent_deliverable_evidence"
        if result:
            updated["claimed_result"] = result
        return "active", updated
    if _task_contract_completion_has_evidence(ctx, contract=contract, metadata=metadata):
        return status, metadata
    updated = dict(metadata)
    updated["completion_validation"] = f"missing_{contract}_evidence" if contract else "missing_contract_evidence"
    if result:
        updated["claimed_result"] = result
    return "active", updated


1245def _task_contract_completion_has_evidence(ctx: ToolContext, *, contract: str, metadata: dict[str, Any]) -> bool:
1246 if not contract or contract == "decision":
1247 return True
1248 if _task_metadata_has_completion_evidence(metadata, contract=contract):
1249 return True
1250 recent_evidence_tools = {
1251 "research": {"record_source", "record_findings"},
1252 "experiment": {"record_experiment"},
1253 "action": {"shell_exec", "write_file", "write_artifact"},
1254 "monitor": {"defer_job"},
1255 "validation": {"record_milestone_validation", "shell_exec"},
1256 }
1257 tools = recent_evidence_tools.get(contract)
1258 if not tools:
1259 return True
1260 for step in reversed(ctx.db.list_steps(job_id=ctx.job_id, limit=12)):
1261 if step.get("id") == ctx.step_id or step.get("status") != "completed":
1262 continue
1263 tool_name = str(step.get("tool_name") or "")
1264 if contract == "action" and tool_name == "shell_exec":
1265 input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
1266 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
1267 if _shell_command_counts_as_action_evidence(str(args.get("command") or "")):
1268 return True
1269 continue
1270 if tool_name in tools:
1271 return True
1272 return False
1273
1274
1275def _task_metadata_has_completion_evidence(metadata: dict[str, Any], *, contract: str = "") -> bool:
1276 evidence_keys = {
1277 "artifact_id",
1278 "evidence_artifact",
1279 "experiment_key",
1280 "file_path",
1281 "output_path",
1282 "validation_event_id",
1283 }
1284 if contract == "research":
1285 evidence_keys.update({"finding_key", "source_key", "source_url", "finding_id", "source_id"})
1286 elif contract == "experiment":
1287 evidence_keys.update({"metric_name", "metric_value", "measurement_id"})
1288 elif contract == "action":
1289 evidence_keys.update({"step_id", "command", "action_id"})
1290 elif contract == "monitor":
1291 evidence_keys.update({"defer_until", "monitor_id", "check_at"})
1292 return any(str(metadata.get(key) or "").strip() for key in evidence_keys)
1293
1294
1295def _recent_deliverable_evidence(ctx: ToolContext, *, limit: int = 12) -> bool:
1296 for step in reversed(ctx.db.list_steps(job_id=ctx.job_id, limit=limit)):
1297 if step.get("id") == ctx.step_id:
1298 continue
1299 if step.get("status") != "completed":
1300 continue
1301 tool_name = str(step.get("tool_name") or "")
1302 input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
1303 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
1304 if tool_name == "write_artifact" and _artifact_args_look_like_deliverable(args):
1305 return True
1306 if tool_name == "write_file":
1307 return True
1308 if tool_name == "shell_exec" and _shell_command_looks_like_write(str(args.get("command") or "")):
1309 return True
1310 return False
1311
1312
1313def _artifact_args_look_like_deliverable(args: dict[str, Any]) -> bool:
1314 text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "type")).lower()
1315 if not text.strip():
1316 return False
1317 evidence_like = any(term in text for term in EVIDENCE_OUTPUT_TERMS)
1318 deliverable_like = any(term in text for term in DELIVERABLE_OUTPUT_TERMS)
1319 return deliverable_like and not evidence_like
1320
1321
1322def _shell_command_looks_like_write(command: str) -> bool:
1323 text = command.strip()
1324 if not text:
1325 return False
1326 write_patterns = [
1327 r"(?<!\d)>>?\s*[^&]",
1328 r"\b1>>?\s*[^&]",
1329 r"\btee\b",
1330 r"\bcat\s+>\b",
1331 r"\bpython[0-9.]*\b.*\bwrite_text\b",
1332 r"\bpython[0-9.]*\b.*\bopen\([^)]*,\s*['\"]w",
1333 r"\bsed\s+-i\b",
1334 ]
1335 return any(re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL) for pattern in write_patterns)
1336
1337
1338def _shell_command_counts_as_action_evidence(command: str) -> bool:
1339 text = command.strip()
1340 if not text:
1341 return False
1342 if _shell_command_looks_like_write(text):
1343 return True
1344 read_only = re.compile(
1345 r"(?is)^\s*(?:"
1346 r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
1347 r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
1348 r")"
1349 )
1350 if read_only.search(text):
1351 return False
1352 if re.match(r"(?is)^curl\b", text):
1353 mutating_flags = (
1354 r"(?:^|\s)-X\s*(?:POST|PUT|PATCH|DELETE)\b|--request\s+(?:POST|PUT|PATCH|DELETE)\b|"  # not \b-X: there is no word boundary between a space and "-"
1355 r"(?:^|\s)(?:-d|--data|--form|-F|-T|--upload-file)\b"
1356 )
1357 return bool(re.search(mutating_flags, text))
1358 return True
1359
1360
1361def _record_roadmap(args: dict[str, Any], ctx: ToolContext) -> str:
1362 title = str(args.get("title") or args.get("name") or "").strip()
1363 if not title:
1364 return _json({"success": False, "error": "title is required"})
1365 milestones_arg = args.get("milestones")
1366 milestones = [item for item in milestones_arg if isinstance(item, dict)] if isinstance(milestones_arg, list) else []
1367 roadmap = ctx.db.append_roadmap_record(
1368 ctx.job_id,
1369 title=title,
1370 status=str(args.get("status") or "planned"),
1371 objective=str(args.get("objective") or ""),
1372 scope=str(args.get("scope") or ""),
1373 current_milestone=str(args.get("current_milestone") or ""),
1374 validation_contract=str(args.get("validation_contract") or ""),
1375 milestones=milestones,
1376 metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else {},
1377 )
1378 ctx.db.append_agent_update(
1379 ctx.job_id,
1380 (
1381 f"Roadmap updated: {roadmap.get('status')} with "
1382 f"{len(roadmap.get('milestones') or [])} milestones."
1383 ),
1384 category="plan",
1385 metadata={
1386 "roadmap_title": roadmap.get("title"),
1387 "roadmap_status": roadmap.get("status"),
1388 "milestone_count": len(roadmap.get("milestones") or []),
1389 "current_milestone": roadmap.get("current_milestone"),
1390 },
1391 )
1392 return _json({"success": True, "job_id": ctx.job_id, "roadmap": roadmap})
1393
1394
1395def _record_milestone_validation(args: dict[str, Any], ctx: ToolContext) -> str:
1396 milestone = str(args.get("milestone") or args.get("milestone_title") or "").strip()
1397 if not milestone:
1398 return _json({"success": False, "error": "milestone is required"})
1399 raw_issues = args.get("issues")
1400 issues = [str(item) for item in raw_issues if str(item).strip()] if isinstance(raw_issues, list) else []
1401 follow_up_items = args.get("follow_up_tasks") if isinstance(args.get("follow_up_tasks"), list) else []
1402 validation_status = str(args.get("validation_status") or args.get("status") or "pending").strip().lower().replace(" ", "_")
1403 if validation_status not in {"pending", "passed", "failed", "blocked"}:
1404 validation_status = "pending"
1405 result_text = str(args.get("result") or args.get("summary") or "").strip()
1406 evidence_text = str(args.get("evidence") or args.get("evidence_artifact") or "").strip()
1407 next_action = str(args.get("next_action") or "").strip()
1408 metadata = args.get("metadata") if isinstance(args.get("metadata"), dict) else {}
1409 if validation_status == "passed" and not _validation_has_positive_evidence(result_text, evidence_text, metadata):
1410 return _json({
1411 "success": False,
1412 "error": "passed milestone validation requires evidence or result",
1413 "guidance": (
1414 "Use validation_status=passed only after concrete evidence or a validation result proves the milestone. "
1415 "Use pending, failed, or blocked when validation is incomplete or missing evidence."
1416 ),
1417 })
1418 if validation_status in {"failed", "blocked"} and not (
1419 result_text or evidence_text or issues or follow_up_items or next_action
1420 ):
1421 return _json({
1422 "success": False,
1423 "error": f"{validation_status} milestone validation requires a gap, issue, evidence, next_action, or follow-up task",
1424 "guidance": (
1425 "Failed or blocked validation must say what is missing or what should happen next, "
1426 "so the worker can continue from a concrete gap instead of logging a vague checkpoint."
1427 ),
1428 })
1429 validation = ctx.db.append_milestone_validation_record(
1430 ctx.job_id,
1431 milestone=milestone,
1432 validation_status=validation_status,
1433 result=result_text,
1434 evidence=evidence_text,
1435 issues=issues,
1436 next_action=next_action,
1437 metadata=metadata,
1438 )
1439 follow_up_tasks = []
1440 for task in follow_up_items[:25]:
1441 if not isinstance(task, dict):
1442 continue
1443 title = str(task.get("title") or task.get("name") or "").strip()
1444 if not title:
1445 continue
1446 priority_arg = task.get("priority")
1447 priority = int(priority_arg) if isinstance(priority_arg, (int, float)) else 0
1448 follow_up_tasks.append(ctx.db.append_task_record(
1449 ctx.job_id,
1450 title=title,
1451 status=str(task.get("status") or "open"),
1452 priority=priority,
1453 goal=str(task.get("goal") or task.get("description") or ""),
1454 source_hint=str(task.get("source_hint") or task.get("source") or ""),
1455 result=str(task.get("result") or task.get("outcome") or ""),
1456 parent=str(task.get("parent") or milestone),
1457 output_contract=str(
1458 task.get("output_contract")
1459 or task.get("contract")
1460 or (task.get("metadata") if isinstance(task.get("metadata"), dict) else {}).get("output_contract")
1461 or (task.get("metadata") if isinstance(task.get("metadata"), dict) else {}).get("contract")
1462 or "action"
1463 ),
1464 acceptance_criteria=str(task.get("acceptance_criteria") or ""),
1465 evidence_needed=str(task.get("evidence_needed") or ""),
1466 stall_behavior=str(task.get("stall_behavior") or ""),
1467 metadata=task.get("metadata") if isinstance(task.get("metadata"), dict) else {"source": "milestone_validation"},
1468 ))
1469 ctx.db.append_agent_update(
1470 ctx.job_id,
1471 (
1472 f"Milestone validation {validation.get('validation_status')}: "
1473 f"{validation.get('title') or milestone}; follow-up tasks {len(follow_up_tasks)}."
1474 ),
1475 category="plan",
1476 metadata={
1477 "milestone": validation.get("title") or milestone,
1478 "validation_status": validation.get("validation_status"),
1479 "follow_up_tasks": len(follow_up_tasks),
1480 },
1481 )
1482 return _json({
1483 "success": True,
1484 "job_id": ctx.job_id,
1485 "validation": validation,
1486 "follow_up_tasks": follow_up_tasks,
1487 })
1488
1489
1490def _validation_has_positive_evidence(result: str, evidence: str, metadata: dict[str, Any]) -> bool:
1491 if result.strip() or evidence.strip():
1492 return True
1493 evidence_keys = {
1494 "artifact_id",
1495 "evidence_artifact",
1496 "experiment_key",
1497 "file_path",
1498 "output_path",
1499 "source_url",
1500 "finding_key",
1501 "validation_event_id",
1502 }
1503 return any(str(metadata.get(key) or "").strip() for key in evidence_keys)
1504
1505
1506def _record_experiment(args: dict[str, Any], ctx: ToolContext) -> str:
1507 title = str(
1508 args.get("title")
1509 or args.get("name")
1510 or args.get("metric_name")
1511 or args.get("hypothesis")
1512 or args.get("result")
1513 or args.get("outcome")
1514 or "Experiment checkpoint"
1515 ).strip()
1516 metric_name = str(args.get("metric_name") or "").strip()
1517 metric_value = _optional_float(args.get("metric_value"))
1518 baseline_value = _optional_float(args.get("baseline_value"))
1519 status = str(args.get("status") or "planned").strip().lower() or "planned"
1520 next_action = str(args.get("next_action") or "").strip()
1521 if status == "measured" and (not metric_name or metric_value is None):
1522 return _json({
1523 "success": False,
1524 "error": "measured experiments require metric_name and numeric metric_value",
1525 "guidance": (
1526 "Use status=measured only for a real measurement with a metric name and numeric value. "
1527 "Use status=failed, blocked, skipped, running, or planned when the trial did not produce a valid metric."
1528 ),
1529 })
1530 if status in {"measured", "failed", "blocked", "skipped"} and not next_action:
1531 return _json({
1532 "success": False,
1533 "error": "next_action is required for measured, failed, blocked, or skipped experiments",
1534 "guidance": (
1535 "Experiment records that close out a trial must leave a concrete next action, "
1536 "such as the next experiment, action branch, monitor branch, pivot, or blocked condition."
1537 ),
1538 })
1539 if status in {"failed", "blocked", "skipped"} and not _experiment_has_closed_trial_context(args):
1540 return _json({
1541 "success": False,
1542 "error": f"{status} experiments require result, evidence, config, or metadata",
1543 "guidance": (
1544 "Closed non-measured trials must record what happened or what was attempted. "
1545 "Include result/outcome, evidence_artifact, config, or metadata with the blocker/context."
1546 ),
1547 })
1548 record = ctx.db.append_experiment_record(
1549 ctx.job_id,
1550 title=title,
1551 hypothesis=str(args.get("hypothesis") or ""),
1552 status=status,
1553 metric_name=metric_name,
1554 metric_value=metric_value,
1555 metric_unit=str(args.get("metric_unit") or ""),
1556 higher_is_better=bool(args.get("higher_is_better", True)),
1557 baseline_value=baseline_value,
1558 config=args.get("config") if isinstance(args.get("config"), dict) else {},
1559 result=str(args.get("result") or args.get("outcome") or ""),
1560 evidence_artifact=str(args.get("evidence_artifact") or args.get("artifact_id") or ""),
1561 next_action=next_action,
1562 metadata=args.get("metadata") if isinstance(args.get("metadata"), dict) else {},
1563 )
1564 metric = ""
1565 if record.get("metric_value") is not None:
1566 metric = " " + format_metric_value(
1567 record.get("metric_name") or "metric",
1568 record.get("metric_value"),
1569 record.get("metric_unit") or "",
1570 )
1571 best = " best" if record.get("best_observed") else ""
1572 ctx.db.append_agent_update(
1573 ctx.job_id,
1574 f"Experiment {record.get('status')}: {record.get('title')}{metric}{best}.",
1575 category="progress",
1576 metadata={
1577 "experiment_key": record.get("key"),
1578 "metric_name": record.get("metric_name"),
1579 "metric_value": record.get("metric_value"),
1580 "best_observed": record.get("best_observed"),
1581 "delta_from_previous_best": record.get("delta_from_previous_best"),
1582 },
1583 )
1584 if record.get("metric_value") is not None or str(record.get("status") or "") in {"measured", "failed", "blocked", "skipped"}:
1585 _resolve_measurement_obligation(
1586 ctx,
1587 status="recorded",
1588 reason=f"Recorded experiment {record.get('title')}.",
1589 via_tool="record_experiment",
1590 experiment_key=str(record.get("key") or ""),
1591 )
1592 return _json({"success": True, "job_id": ctx.job_id, "experiment": record})
1593
1594
1595def _experiment_has_closed_trial_context(args: dict[str, Any]) -> bool:
1596 if str(args.get("result") or args.get("outcome") or "").strip():
1597 return True
1598 if str(args.get("evidence_artifact") or args.get("artifact_id") or "").strip():
1599 return True
1600 config = args.get("config")
1601 if isinstance(config, dict) and any(str(value).strip() for value in config.values()):
1602 return True
1603 metadata = args.get("metadata")
1604 if isinstance(metadata, dict) and any(str(value).strip() for value in metadata.values()):
1605 return True
1606 return False
1607
1608
1609def _optional_float(value: Any) -> float | None:
1610 if isinstance(value, bool) or value is None:
1611 return None
1612 if isinstance(value, (int, float)):
1613 return float(value)
1614 if isinstance(value, str):
1615 text = value.strip().replace(",", "")
1616 if not text:
1617 return None
1618 try:
1619 return float(text)
1620 except ValueError:
1621 return None
1622 return None
1623
1624
1625def _pending_measurement(ctx: ToolContext) -> dict[str, Any] | None:
1626 try:
1627 job = ctx.db.get_job(ctx.job_id)
1628 except KeyError:
1629 return None
1630 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1631 obligation = metadata.get("pending_measurement_obligation")
1632 if isinstance(obligation, dict) and not obligation.get("resolved_at"):
1633 return obligation
1634 return None
1635
1636
1637def _resolve_measurement_obligation(
1638 ctx: ToolContext,
1639 *,
1640 status: str,
1641 reason: str,
1642 via_tool: str,
1643 experiment_key: str = "",
1644) -> None:
1645 obligation = _pending_measurement(ctx)
1646 if not obligation:
1647 return
1648 resolved = dict(obligation)
1649 resolved.update({
1650 "resolved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
1651 "resolution_status": status,
1652 "resolution_reason": reason[:1000],
1653 "resolution_tool": via_tool,
1654 })
1655 if experiment_key:
1656 resolved["experiment_key"] = experiment_key
1657 ctx.db.update_job_metadata(
1658 ctx.job_id,
1659 {
1660 "pending_measurement_obligation": {},
1661 "last_measurement_obligation": resolved,
1662 },
1663 )
1664 ctx.db.append_agent_update(
1665 ctx.job_id,
1666 f"Measurement obligation {status}: {reason[:220]}",
1667 category="progress" if status == "recorded" else "blocked",
1668 metadata={"measurement_obligation": resolved},
1669 )
1670
1671
1672def _send_digest_email(args: dict[str, Any], ctx: ToolContext) -> str:
1673 subject = str(args.get("subject") or "Agent digest")
1674 body = str(args.get("body") or "")
1675 if not body:
1676 return _json({"success": False, "error": "body is required"})
1677 result = send_digest_email(ctx.config.email, subject=subject, body=body, to_addr=args.get("to_addr"))
1678 stored = ctx.artifacts.write_text(
1679 job_id=ctx.job_id,
1680 run_id=ctx.run_id,
1681 step_id=ctx.step_id,
1682 content=body,
1683 title=subject,
1684 summary="Digest email body",
1685 artifact_type="digest",
1686 metadata={"email": result},
1687 )
1688 return _json({"success": True, "email": result, "artifact_id": stored.id, "path": str(stored.path)})
1689
1690
1691def _browser_call(name: str, args: dict[str, Any], ctx: ToolContext) -> str:
1692 from nipux_cli import browser
1693
1694 task_id = ctx.task_id or ctx.job_id
1695 if name == "browser_navigate":
1696 return _json(browser.navigate(ctx.config, task_id=task_id, url=str(args.get("url") or "")))
1697 if name == "browser_snapshot":
1698 return _json(browser.snapshot(ctx.config, task_id=task_id, full=bool(args.get("full", False))))
1699 if name == "browser_click":
1700 return _json(browser.click(ctx.config, task_id=task_id, ref=str(args.get("ref") or "")))
1701 if name == "browser_type":
1702 return _json(browser.fill(ctx.config, task_id=task_id, ref=str(args.get("ref") or ""), text=str(args.get("text") or "")))
1703 if name == "browser_scroll":
1704 return _json(browser.scroll(ctx.config, task_id=task_id, direction=str(args.get("direction") or "down")))
1705 if name == "browser_back":
1706 return _json(browser.back(ctx.config, task_id=task_id))
1707 if name == "browser_press":
1708 return _json(browser.press(ctx.config, task_id=task_id, key=str(args.get("key") or "")))
1709 if name == "browser_console":
1710 return _json(browser.console(ctx.config, task_id=task_id, clear=bool(args.get("clear", False)), expression=args.get("expression")))
1711 raise KeyError(name)
1712
1713
1714def _web_call(name: str, args: dict[str, Any], ctx: ToolContext) -> str:
1715 del ctx
1716 from nipux_cli.web import web_extract, web_search
1717
1718 if name == "web_search":
1719 return _json(web_search(str(args.get("query") or ""), limit=int(args.get("limit") or 5)))
1720 if name == "web_extract":
1721 urls = args.get("urls") if isinstance(args.get("urls"), list) else []
1722 return _json(web_extract(urls[:5]))
1723 raise KeyError(name)
1724
1725
1726def _browser_handler(name: str) -> Handler:
1727 return lambda args, ctx: _browser_call(name, args, ctx)
1728
1729
1730def _web_handler(name: str) -> Handler:
1731 return lambda args, ctx: _web_call(name, args, ctx)
1732
1733
1734BROWSER_SCHEMAS: list[ToolSpec] = [
1735 ToolSpec("browser_navigate", "Navigate to a URL and return a compact browser snapshot.", {
1736 "type": "object",
1737 "properties": {"url": {"type": "string"}},
1738 "required": ["url"],
1739 }, _browser_handler("browser_navigate")),
1740 ToolSpec("browser_snapshot", "Refresh the current page accessibility snapshot.", {
1741 "type": "object",
1742 "properties": {"full": {"type": "boolean", "default": False}},
1743 "required": [],
1744 }, _browser_handler("browser_snapshot")),
1745 ToolSpec("browser_click", "Click an element by snapshot ref, for example @e5.", {
1746 "type": "object",
1747 "properties": {"ref": {"type": "string"}},
1748 "required": ["ref"],
1749 }, _browser_handler("browser_click")),
1750 ToolSpec("browser_type", "Fill an input by snapshot ref.", {
1751 "type": "object",
1752 "properties": {"ref": {"type": "string"}, "text": {"type": "string"}},
1753 "required": ["ref", "text"],
1754 }, _browser_handler("browser_type")),
1755 ToolSpec("browser_scroll", "Scroll the current page up or down.", {
1756 "type": "object",
1757 "properties": {"direction": {"type": "string", "enum": ["up", "down"]}},
1758 "required": ["direction"],
1759 }, _browser_handler("browser_scroll")),
1760 ToolSpec("browser_back", "Navigate back in browser history.", {"type": "object", "properties": {}, "required": []}, _browser_handler("browser_back")),
1761 ToolSpec("browser_press", "Press a keyboard key in the browser.", {
1762 "type": "object",
1763 "properties": {"key": {"type": "string"}},
1764 "required": ["key"],
1765 }, _browser_handler("browser_press")),
1766 ToolSpec("browser_console", "Read console errors or evaluate JavaScript in the current page.", {
1767 "type": "object",
1768 "properties": {"clear": {"type": "boolean", "default": False}, "expression": {"type": "string"}},
1769 "required": [],
1770 }, _browser_handler("browser_console")),
1771]
1772
1773
1774SUPPORT_SCHEMAS: list[ToolSpec] = [
1775 ToolSpec("web_search", "Search the web for candidate sources.", {
1776 "type": "object",
1777 "properties": {"query": {"type": "string"}, "limit": {"type": "integer", "default": 5}},
1778 "required": ["query"],
1779 }, _web_handler("web_search")),
1780 ToolSpec("web_extract", "Extract markdown text from up to five URLs.", {
1781 "type": "object",
1782 "properties": {"urls": {"type": "array", "items": {"type": "string"}, "maxItems": 5}},
1783 "required": ["urls"],
1784 }, _web_handler("web_extract")),
1785 ToolSpec("shell_exec", "Run a local shell command for CLI work. Use small read-only probes first. For long downloads, builds, training, crawls, or benchmarks, set a meaningful timeout, prefer resumable commands, and record or defer monitoring instead of repeatedly restarting short timed-out commands. Do not run destructive or high-risk cyber commands.", {
1786 "type": "object",
1787 "properties": {
1788 "command": {"type": "string"},
1789 "cwd": {"type": "string"},
1790 "timeout_seconds": {"type": "number", "default": 60},
1791 "max_output_chars": {"type": "integer", "default": 12000},
1792 },
1793 "required": ["command"],
1794 }, _shell_exec),
1795 ToolSpec("write_file", "Create, overwrite, or append a concrete workspace/local file for deliverables, code, documents, configs, or other file outputs.", {
1796 "type": "object",
1797 "properties": {
1798 "path": {"type": "string"},
1799 "content": {"type": "string"},
1800 "mode": {"type": "string", "enum": ["overwrite", "append"], "default": "overwrite"},
1801 "create_parents": {"type": "boolean", "default": True},
1802 },
1803 "required": ["path", "content"],
1804 }, _write_file),
1805 ToolSpec("write_artifact", "Persist important findings, evidence, reports, or checkpoints to the job artifact store.", {
1806 "type": "object",
1807 "properties": {
1808 "title": {"type": "string"},
1809 "type": {"type": "string", "default": "text"},
1810 "summary": {"type": "string"},
1811 "content": {"type": "string"},
1812 "metadata": {"type": "object"},
1813 },
1814 "required": ["content"],
1815 }, _write_artifact),
1816 ToolSpec("read_artifact", "Read a saved artifact by artifact_id, visible number, exact saved path, or title.", {
1817 "type": "object",
1818 "properties": {
1819 "artifact_id": {"type": "string", "description": "Artifact id, visible number, saved path, or title."},
1820 "path": {"type": "string"},
1821 "title": {"type": "string"},
1822 "ref": {"type": "string"},
1823 },
1824 "required": [],
1825 }, _read_artifact),
1826 ToolSpec("search_artifacts", "Search stored artifacts for exact evidence from prior steps.", {
1827 "type": "object",
1828 "properties": {"query": {"type": "string"}, "limit": {"type": "integer", "default": 10}},
1829 "required": ["query"],
1830 }, _search_artifacts),
1831 ToolSpec("update_job_state", "Keep the current job runnable. Completion, failure, pausing, and cancellation are operator-only; workers should report checkpoints and continue.", {
1832 "type": "object",
1833 "properties": {
1834 "status": {"type": "string", "enum": ["queued", "running"]},
1835 "note": {"type": "string"},
1836 },
1837 "required": ["status"],
1838 }, _update_job_state),
1839 ToolSpec("defer_job", "Wait before the next worker turn for this job. Use for long external processes, monitor/check-later tasks, or scheduled follow-up without completing or pausing the job.", {
1840 "type": "object",
1841 "properties": {
1842 "seconds": {"type": "number", "description": "Delay in seconds before this job is runnable again.", "default": 300},
1843 "until": {"type": "string", "description": "Optional ISO timestamp to resume after."},
1844 "reason": {"type": "string"},
1845 "next_action": {"type": "string"},
1846 },
1847 "required": [],
1848 }, _defer_job),
1849 ToolSpec("report_update", "Leave a short operator-readable progress note. Do not use this instead of write_artifact for durable evidence.", {
1850 "type": "object",
1851 "properties": {
1852 "message": {"type": "string"},
1853 "category": {"type": "string", "enum": ["progress", "finding", "blocked", "plan"], "default": "progress"},
1854 "metadata": {"type": "object"},
1855 },
1856 "required": ["message"],
1857 }, _report_update),
1858 ToolSpec("record_lesson", "Save durable learning for this job: bad source patterns, success criteria, strategy changes, mistakes to avoid, or operator preferences.", {
1859 "type": "object",
1860 "properties": {
1861 "lesson": {"type": "string"},
1862 "category": {
1863 "type": "string",
1864 "enum": [
1865 "source_quality",
1866 "task_profile",
1867 "strategy",
1868 "mistake",
1869 "constraint",
1870 "operator_preference",
1871 "memory",
1872 ],
1873 "default": "memory",
1874 },
1875 "confidence": {"type": "number"},
1876 "metadata": {"type": "object"},
1877 },
1878 "required": ["lesson"],
1879 }, _record_lesson),
1880 ToolSpec("record_memory_graph", "Create or update the job's connected memory graph: reusable episodes, facts, strategies, skills, questions, decisions, constraints, and links between them. Use this to build a durable brain for long-running work instead of relying on raw history.", {
1881 "type": "object",
1882 "properties": {
1883 "nodes": {
1884 "type": "array",
1885 "maxItems": 50,
1886 "items": {
1887 "type": "object",
1888 "properties": {
1889 "key": {"type": "string"},
1890 "title": {"type": "string"},
1891 "kind": {
1892 "type": "string",
1893 "enum": [
1894 "artifact",
1895 "constraint",
1896 "decision",
1897 "episode",
1898 "experiment",
1899 "fact",
1900 "milestone",
1901 "question",
1902 "skill",
1903 "source",
1904 "strategy",
1905 "task",
1906 ],
1907 },
1908 "status": {
1909 "type": "string",
1910 "enum": ["active", "blocked", "deprecated", "open", "resolved", "stable"],
1911 "default": "active",
1912 },
1913 "summary": {"type": "string"},
1914 "salience": {"type": "number"},
1915 "confidence": {"type": "number"},
1916 "tags": {"type": "array", "items": {"type": "string"}},
1917 "parent_key": {"type": "string"},
1918 "links": {"type": "array", "items": {"type": "string"}},
1919 "evidence_refs": {"type": "array", "items": {"type": "string"}},
1920 "metadata": {"type": "object"},
1921 },
1922 "required": ["title"],
1923 },
1924 },
1925 "edges": {
1926 "type": "array",
1927 "maxItems": 100,
1928 "items": {
1929 "type": "object",
1930 "properties": {
1931 "from_key": {"type": "string"},
1932 "to_key": {"type": "string"},
1933 "relation": {"type": "string"},
1934 "evidence_refs": {"type": "array", "items": {"type": "string"}},
1935 "metadata": {"type": "object"},
1936 },
1937 "required": ["from_key", "to_key", "relation"],
1938 },
1939 },
1940 },
1941 "required": [],
1942 }, _record_memory_graph),
1943 ToolSpec("search_memory_graph", "Search the job's connected memory graph for reusable facts, decisions, strategies, skills, questions, constraints, and related links.", {
1944 "type": "object",
1945 "properties": {
1946 "query": {"type": "string"},
1947 "limit": {"type": "integer", "default": 10},
1948 },
1949 "required": ["query"],
1950 }, _search_memory_graph),
1951 ToolSpec("acknowledge_operator_context", "Acknowledge that durable operator steering has been incorporated or superseded. Use this after acting on a chat correction so it can leave the active context while remaining in history.", {
1952 "type": "object",
1953 "properties": {
1954 "message_ids": {"type": "array", "items": {"type": "string"}},
1955 "summary": {"type": "string"},
1956 "status": {"type": "string", "enum": ["acknowledged", "superseded"], "default": "acknowledged"},
1957 },
1958 "required": ["summary"],
1959 }, _acknowledge_operator_context),
1960 ToolSpec("record_source", "Update the source ledger with source quality, finding yield, failures, warnings, and last outcome.", {
1961 "type": "object",
1962 "properties": {
1963 "source": {"type": "string"},
1964 "source_type": {"type": "string"},
1965 "usefulness_score": {"type": "number"},
1966 "yield_count": {"type": "integer", "default": 0},
1967 "fail_count_delta": {"type": "integer", "default": 0},
1968 "warnings": {"type": "array", "items": {"type": "string"}},
1969 "outcome": {"type": "string"},
1970 "metadata": {"type": "object"},
1971 },
1972 "required": ["source"],
1973 }, _record_source),
1974 ToolSpec("record_findings", "Update the finding ledger with evidence-backed useful results. Each finding needs an evidence anchor such as source_url/url, reason, evidence_artifact, or evidence metadata.", {
1975 "type": "object",
1976 "properties": {
1977 "evidence_artifact": {"type": "string"},
1978 "findings": {
1979 "type": "array",
1980 "maxItems": 50,
1981 "items": {
1982 "type": "object",
1983 "properties": {
1984 "name": {"type": "string"},
1985 "url": {"type": "string"},
1986 "source_url": {"type": "string"},
1987 "category": {"type": "string"},
1988 "location": {"type": "string"},
1989 "contact": {"type": "string"},
1990 "reason": {"type": "string"},
1991 "status": {"type": "string"},
1992 "score": {"type": "number"},
1993 "evidence_artifact": {"type": "string"},
1994 "metadata": {"type": "object"},
1995 },
1996 "required": ["name"],
1997 },
1998 },
1999 },
2000 "required": ["findings"],
2001 }, _record_findings),
2002 ToolSpec("record_tasks", "Create or update a durable queue of objective-neutral work branches. Use this to split long jobs into next actions, mark blocked branches, and keep the agent from cycling on one path. Missing task contract fields are filled with generic defaults from the task title. When the queue is saturated, near-duplicate task titles are folded into the matching existing task instead of creating another branch.", {
2003 "type": "object",
2004 "properties": {
2005 "tasks": {
2006 "type": "array",
2007 "maxItems": 50,
2008 "items": {
2009 "type": "object",
2010 "properties": {
2011 "title": {"type": "string"},
2012 "status": {"type": "string", "enum": ["open", "active", "done", "blocked", "skipped"], "default": "open"},
2013 "priority": {"type": "integer", "default": 0},
2014 "goal": {"type": "string"},
2015 "source_hint": {"type": "string"},
2016 "result": {"type": "string"},
2017 "parent": {"type": "string"},
2018 "output_contract": {
2019 "type": "string",
2020 "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report"],
2021 },
2022 "acceptance_criteria": {"type": "string"},
2023 "evidence_needed": {"type": "string"},
2024 "stall_behavior": {"type": "string"},
2025 "metadata": {"type": "object"},
2026 },
2027 "required": ["title"],
2028 },
2029 },
2030 },
2031 "required": ["tasks"],
2032 }, _record_tasks),
2033 ToolSpec("record_roadmap", "Create or update a generic roadmap for broad work: milestones, features, success criteria, validation contract, scope, and current roadmap state. Use this before or during long-running work when task lists need higher-level structure.", {
2034 "type": "object",
2035 "properties": {
2036 "title": {"type": "string"},
2037 "status": {"type": "string", "enum": ["planned", "active", "validating", "done", "blocked", "paused"], "default": "planned"},
2038 "objective": {"type": "string"},
2039 "scope": {"type": "string"},
2040 "current_milestone": {"type": "string"},
2041 "validation_contract": {"type": "string"},
2042 "milestones": {
2043 "type": "array",
2044 "maxItems": 100,
2045 "items": {
2046 "type": "object",
2047 "properties": {
2048 "key": {"type": "string"},
2049 "title": {"type": "string"},
2050 "status": {"type": "string", "enum": ["planned", "active", "validating", "done", "blocked", "skipped"], "default": "planned"},
2051 "priority": {"type": "integer", "default": 0},
2052 "goal": {"type": "string"},
2053 "acceptance_criteria": {"type": "string"},
2054 "evidence_needed": {"type": "string"},
2055 "validation_status": {"type": "string", "enum": ["not_started", "pending", "passed", "failed", "blocked"], "default": "not_started"},
2056 "validation_result": {"type": "string"},
2057 "next_action": {"type": "string"},
2058 "features": {
2059 "type": "array",
2060 "maxItems": 100,
2061 "items": {
2062 "type": "object",
2063 "properties": {
2064 "key": {"type": "string"},
2065 "title": {"type": "string"},
2066 "status": {"type": "string", "enum": ["planned", "active", "done", "blocked", "skipped"], "default": "planned"},
2067 "goal": {"type": "string"},
2068 "output_contract": {"type": "string", "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report", "validation"]},
2069 "acceptance_criteria": {"type": "string"},
2070 "evidence_needed": {"type": "string"},
2071 "result": {"type": "string"},
2072 "metadata": {"type": "object"},
2073 },
2074 "required": ["title"],
2075 },
2076 },
2077 "metadata": {"type": "object"},
2078 },
2079 "required": ["title"],
2080 },
2081 },
2082 "metadata": {"type": "object"},
2083 },
2084 "required": ["title"],
2085 }, _record_roadmap),
2086 ToolSpec("record_milestone_validation", "Record validation for a roadmap milestone and optionally create follow-up tasks for gaps. Use fresh evidence, acceptance criteria, and clear pass/fail/blocker reasons.", {
2087 "type": "object",
2088 "properties": {
2089 "milestone": {"type": "string"},
2090 "validation_status": {"type": "string", "enum": ["pending", "passed", "failed", "blocked"], "default": "pending"},
2091 "result": {"type": "string"},
2092 "evidence": {"type": "string"},
2093 "issues": {"type": "array", "items": {"type": "string"}},
2094 "next_action": {"type": "string"},
2095 "follow_up_tasks": {
2096 "type": "array",
2097 "maxItems": 25,
2098 "items": {
2099 "type": "object",
2100 "properties": {
2101 "title": {"type": "string"},
2102 "status": {"type": "string", "enum": ["open", "active", "done", "blocked", "skipped"], "default": "open"},
2103 "priority": {"type": "integer", "default": 0},
2104 "goal": {"type": "string"},
2105 "source_hint": {"type": "string"},
2106 "result": {"type": "string"},
2107 "parent": {"type": "string"},
2108 "output_contract": {"type": "string", "enum": ["research", "artifact", "experiment", "action", "monitor", "decision", "report"]},
2109 "acceptance_criteria": {"type": "string"},
2110 "evidence_needed": {"type": "string"},
2111 "stall_behavior": {"type": "string"},
2112 "metadata": {"type": "object"},
2113 },
2114 "required": ["title"],
2115 },
2116 },
2117 "metadata": {"type": "object"},
2118 },
2119 "required": ["milestone", "validation_status"],
2120 }, _record_milestone_validation),
2121 ToolSpec("record_experiment", "Track a measurable trial, benchmark, comparison, hypothesis test, or optimization attempt. Use this after any command or source produces a concrete result so future steps compare against the best observed result instead of treating notes as progress. Closed trials must include next_action so long-running work can continue from the result.", {
2122 "type": "object",
2123 "properties": {
2124 "title": {"type": "string"},
2125 "hypothesis": {"type": "string"},
2126 "status": {"type": "string", "enum": ["planned", "running", "measured", "failed", "blocked", "skipped"], "default": "planned"},
2127 "metric_name": {"type": "string"},
2128 "metric_value": {"type": "number"},
2129 "metric_unit": {"type": "string"},
2130 "higher_is_better": {"type": "boolean", "default": True},
2131 "baseline_value": {"type": "number"},
2132 "config": {"type": "object"},
2133 "result": {"type": "string"},
2134 "evidence_artifact": {"type": "string"},
2135 "next_action": {"type": "string", "description": "Concrete next experiment, action, monitor branch, pivot, or blocked condition. Required when status is measured, failed, blocked, or skipped."},
2136 "metadata": {"type": "object"},
2137 },
2138 "required": ["title"],
2139 }, _record_experiment),
2140 ToolSpec("send_digest_email", "Send or dry-run a digest email and save the body as an artifact.", {
2141 "type": "object",
2142 "properties": {"subject": {"type": "string"}, "body": {"type": "string"}, "to_addr": {"type": "string"}},
2143 "required": ["body"],
2144 }, _send_digest_email),
2145]
2146
2147
2148APPROVED_TOOL_NAMES = tuple(spec.name for spec in [*BROWSER_SCHEMAS, *SUPPORT_SCHEMAS])
2149
2150
2151class ToolRegistry:
2152 def __init__(self, specs: list[ToolSpec] | None = None):
2153 self._specs = {spec.name: spec for spec in (specs or [*BROWSER_SCHEMAS, *SUPPORT_SCHEMAS])}
2154
2155 def names(self) -> list[str]:
2156 return sorted(self._specs)
2157
2158 def openai_tools(self, config: AppConfig | None = None) -> list[dict[str, Any]]:
2159 return [self._specs[name].as_openai_tool() for name in self.names() if _tool_enabled(name, config)]
2160
2161 def validate_arguments(self, name: str, args: dict[str, Any], config: AppConfig | None = None) -> dict[str, Any] | None:
2162 if name not in self._specs or not _tool_enabled(name, config):
2163 return None
2164 args = args if isinstance(args, dict) else {}
2165 spec = self._specs[name]
2166 missing: list[str] = []
2167 aliases = REQUIRED_ARGUMENT_ALIASES.get(name, {})
2168 for required in spec.parameters.get("required") or []:
2169 candidates = aliases.get(str(required), (str(required),))
2170 if all(_missing_argument(args.get(candidate)) for candidate in candidates):
2171 missing.append(" or ".join(candidates))
2172 for label, fields in REQUIRED_ARGUMENT_GROUPS.get(name, ()):
2173 if all(_missing_argument(args.get(field)) for field in fields):
2174 missing.append(label)
2175 nested_schema = dict(spec.parameters)
2176 nested_schema["required"] = []
2177 missing.extend(item for item in _schema_missing_arguments(nested_schema, args) if item not in missing)
2178 placeholders = [] if missing else list(_schema_placeholder_arguments(nested_schema, args))
2179 if not missing and not placeholders:
2180 return None
2181 concrete_fields = [*missing, *placeholders]
2182 return {
2183 "success": True,
2184 "recoverable": True,
2185 "error": "missing required tool arguments" if missing else "placeholder tool arguments",
2186 "missing_arguments": missing,
2187 "placeholder_arguments": placeholders,
2188 "blocked_tool": name,
2189 "guidance": (
2190 f"Retry {name} with concrete values for: {', '.join(concrete_fields)}. "
2191 "Do not call a tool with placeholder or empty arguments."
2192 ),
2193 }
2194
2195 def handle(self, name: str, args: dict[str, Any], ctx: ToolContext) -> str:
2196 if name not in self._specs:
2197 return _json({"success": False, "error": f"unknown tool: {name}"})
2198 if not _tool_enabled(name, ctx.config):
2199 group = _tool_access_group(name) or "tool"
2200 return _json({
2201 "success": False,
2202 "error": f"{name} is disabled by tool access config",
2203 "tool_access": group,
2204 })
2205 return self._specs[name].handler(args, ctx)
2206
2207
2208def _tool_enabled(name: str, config: AppConfig | None) -> bool:
2209 if config is None:
2210 return True
2211 group = _tool_access_group(name)
2212 if group is None:
2213 return True
2214 return bool(getattr(config.tools, group))
2215
2216
2217def _tool_access_group(name: str) -> str | None:
2218 if name.startswith("browser_"):
2219 return "browser"
2220 if name.startswith("web_"):
2221 return "web"
2222 if name == "shell_exec":
2223 return "shell"
2224 if name == "write_file":
2225 return "files"
2226 return None
2227
2228
2229DEFAULT_REGISTRY = ToolRegistry()
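
The access gating and required-argument check above can be sketched in isolation. This is a minimal illustration, not the registry itself: `ToolsConfig`, `access_group`, `enabled`, and `missing_required` are hypothetical names, and the "missing" rule (None, empty string, empty list, empty dict) is an assumption standing in for `_missing_argument`, whose body is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToolsConfig:
    """Hypothetical stand-in for the `config.tools` flags read by `_tool_enabled`."""

    browser: bool = True
    web: bool = True
    shell: bool = False
    files: bool = False


def access_group(name: str) -> str | None:
    # Same prefix/name mapping as `_tool_access_group`.
    if name.startswith("browser_"):
        return "browser"
    if name.startswith("web_"):
        return "web"
    if name == "shell_exec":
        return "shell"
    if name == "write_file":
        return "files"
    return None


def enabled(name: str, tools: ToolsConfig | None) -> bool:
    # Tools outside every access group (for example the record_* tools) stay enabled.
    if tools is None:
        return True
    group = access_group(name)
    return True if group is None else bool(getattr(tools, group))


def missing_required(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    # Assumption: a required value counts as missing when it is None or empty.
    return [key for key in schema.get("required") or [] if args.get(key) in (None, "", [], {})]


tools = ToolsConfig()
print(enabled("shell_exec", tools))
print(missing_required({"required": ["findings"]}, {"findings": []}))
```

With the flags above, `shell_exec` is reported as disabled while ungrouped tools such as `record_tasks` remain available, which matches the behaviour of `_tool_enabled` when a group flag is false.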
nipux_cli/tui_commands.py 296 lines
1"""Slash command metadata and command-palette helpers for the TUI."""
2
3from __future__ import annotations
4
5from nipux_cli.tui_style import _accent, _bold, _fit_ansi, _muted
6
7
8FIRST_RUN_SLASH_COMMANDS = [
9 ("/model", "set model"),
10 ("/base-url", "set endpoint"),
11 ("/api-key", "save key"),
12 ("/api-key-env", "key env var"),
13 ("/config", "runtime config"),
14 ("/context", "token budget"),
15 ("/input-cost", "input $/1M"),
16 ("/timeout", "request timeout"),
17 ("/browser", "browser on/off"),
18 ("/web", "web on/off"),
19 ("/cli-access", "CLI on/off"),
20 ("/file-access", "files on/off"),
21 ("/home", "state directory"),
22 ("/step-limit", "worker timeout"),
23 ("/output-chars", "output preview size"),
24 ("/output-cost", "output $/1M"),
25 ("/max-cost", "job cost limit"),
26 ("/daily-digest", "daily digest on/off"),
27 ("/digest-time", "digest time"),
28 ("/doctor", "check setup"),
29 ("/init", "write config"),
30 ("/help", "show commands"),
31 ("/clear", "clear notices"),
32 ("/exit", "quit"),
33]
34
35CHAT_SLASH_COMMANDS = [
36 ("/new", "create and start a worker"),
37 ("/run", "resume focused work"),
38 ("/jobs", "switch or inspect jobs"),
39 ("/settings", "configure provider/tools"),
40 ("/status", "focused job state"),
41 ("/help", "core commands"),
42 ("/outcomes", "durable work"),
43 ("/artifacts", "saved files"),
44 ("/activity", "tool calls"),
45 ("/work", "one step"),
46 ("/work-verbose", "verbose step"),
47 ("/focus", "set focus"),
48 ("/switch", "set focus"),
49 ("/history", "timeline"),
50 ("/events", "event feed"),
51 ("/updates", "durable work"),
52 ("/outputs", "raw runs"),
53 ("/artifact", "open output"),
54 ("/findings", "finding ledger"),
55 ("/tasks", "task queue"),
56 ("/roadmap", "milestones"),
57 ("/experiments", "measurements"),
58 ("/sources", "source ledger"),
59 ("/memory", "learning"),
60 ("/metrics", "counts"),
61 ("/lessons", "lessons"),
62 ("/usage", "tokens/cost"),
63 ("/config", "runtime config"),
64 ("/health", "daemon health"),
65 ("/start", "start daemon"),
66 ("/restart", "restart daemon"),
67 ("/model", "set model"),
68 ("/base-url", "set endpoint"),
69 ("/api-key", "save key"),
70 ("/api-key-env", "key env var"),
71 ("/context", "token budget"),
72 ("/input-cost", "input $/1M"),
73 ("/timeout", "request timeout"),
74 ("/browser", "browser on/off"),
75 ("/web", "web on/off"),
76 ("/cli-access", "CLI on/off"),
77 ("/file-access", "files on/off"),
78 ("/home", "state directory"),
79 ("/step-limit", "worker timeout"),
80 ("/output-chars", "output preview size"),
81 ("/output-cost", "output $/1M"),
82 ("/daily-digest", "daily digest on/off"),
83 ("/digest-time", "digest time"),
84 ("/doctor", "check setup"),
85 ("/init", "write config"),
86 ("/pause", "pause job"),
87 ("/resume", "resume job"),
88 ("/stop", "pause job"),
89 ("/cancel", "cancel job"),
90 ("/delete", "delete job"),
91 ("/learn", "save lesson"),
92 ("/note", "save note"),
93 ("/follow", "queue follow-up"),
94 ("/digest", "digest"),
95 ("/clear", "clear notices"),
96 ("/exit", "quit"),
97]
98
99SETTINGS_FIELD_TYPES = {
100 "model.name": "str",
101 "model.base_url": "str",
102 "model.api_key_env": "str",
103 "model.context_length": "int",
104 "model.request_timeout_seconds": "float",
105 "model.input_cost_per_million": "float",
106 "model.output_cost_per_million": "float",
107 "runtime.home": "path",
108 "runtime.max_step_seconds": "int",
109 "runtime.artifact_inline_char_limit": "int",
110 "runtime.daily_digest_enabled": "bool",
111 "runtime.daily_digest_time": "str",
112 "runtime.max_job_cost_usd": "float",
113 "tools.browser": "bool",
114 "tools.web": "bool",
115 "tools.shell": "bool",
116 "tools.files": "bool",
117}
118
119CHAT_SETTING_COMMANDS = {
120 "model": ("model.name", "MODEL"),
121 "base-url": ("model.base_url", "URL"),
122 "api-key-env": ("model.api_key_env", "ENV_NAME"),
123 "context": ("model.context_length", "TOKENS"),
124 "input-cost": ("model.input_cost_per_million", "DOLLARS_PER_1M_INPUT_TOKENS"),
125 "output-cost": ("model.output_cost_per_million", "DOLLARS_PER_1M_OUTPUT_TOKENS"),
126 "max-cost": ("runtime.max_job_cost_usd", "DOLLARS_OR_0"),
127 "timeout": ("model.request_timeout_seconds", "SECONDS"),
128 "browser": ("tools.browser", "true|false"),
129 "web": ("tools.web", "true|false"),
130 "cli-access": ("tools.shell", "true|false"),
131 "file-access": ("tools.files", "true|false"),
132 "home": ("runtime.home", "PATH"),
133 "step-limit": ("runtime.max_step_seconds", "SECONDS"),
134 "output-chars": ("runtime.artifact_inline_char_limit", "CHARS"),
135 "daily-digest": ("runtime.daily_digest_enabled", "true|false"),
136 "digest-time": ("runtime.daily_digest_time", "HH:MM"),
137}
138
139SLASH_ARGUMENT_HINTS = {
140 "new": "OBJECTIVE",
141 "focus": "JOB_TITLE",
142 "switch": "JOB_TITLE",
143 "delete": "JOB_TITLE",
144 "history": "LIMIT",
145 "events": "LIMIT",
146 "outputs": "LIMIT",
147 "outcomes": "all",
148 "updates": "all",
149 "artifact": "QUERY_OR_ID",
150 "work": "N",
151 "work-verbose": "N",
152 "learn": "LESSON",
153 "note": "MESSAGE",
154 "follow": "MESSAGE",
155 **{command: placeholder for command, (_field, placeholder) in CHAT_SETTING_COMMANDS.items()},
156 "api-key": "KEY",
157 "key": "KEY",
158}
159
160REQUIRED_SLASH_ARGUMENTS = {"new", "artifact", "learn", "note", "follow"}
161
162
163def slash_suggestion_lines(
164 input_buffer: str,
165 commands: list[tuple[str, str]],
166 *,
167 width: int,
168 limit: int = 5,
169) -> list[str]:
170 if not input_buffer.startswith("/"):
171 return []
172 parts = input_buffer[1:].split(maxsplit=1)
173 token = parts[0].lower() if parts else ""
174 if " " in input_buffer[1:]:
175 hint = SLASH_ARGUMENT_HINTS.get(token)
176 description = next((desc for cmd, desc in commands if cmd == f"/{token}"), "")
177 if not hint:
178 return []
179 body = f"{_accent('/' + token)} {_muted(hint)}"
180 if description:
181 body += f" {_muted(description)}"
182 return [
183 _muted("╭─ command " + "─" * max(0, width - 11)),
184 _fit_ansi(_muted("│ ") + body, width),
185 _fit_ansi(_muted(_slash_argument_footer(parts, hint)), width),
186 ]
187 command_names = [cmd for cmd, _desc in commands]
188 selected_command = f"/{token}" if token else ""
189 exact_selection = selected_command in command_names and input_buffer.rstrip() == selected_command
190 if exact_selection:
191 all_matches = commands
192 else:
193 all_matches = [(cmd, desc) for cmd, desc in commands if cmd[1:].startswith(token)]
194 if not all_matches and token:
195 all_matches = [(cmd, desc) for cmd, desc in commands if token in cmd[1:]]
196 if exact_selection:
197 selected_index = next((index for index, (cmd, _desc) in enumerate(all_matches) if cmd == selected_command), 0)
198 start = max(0, min(selected_index - max(0, limit // 2), max(0, len(all_matches) - limit)))
199 matches = all_matches[start : start + limit]
200 else:
201 matches = all_matches[:limit]
202 if not matches:
203 return [
204 _muted("╭─ commands " + "─" * max(0, width - 12)),
205 _fit_ansi(_muted("│ no matches"), width),
206 _muted("╰" + "─" * max(0, width - 1)),
207 ]
208 cmd_width = min(14, max(len(cmd) for cmd, _ in matches) + 2)
209 lines = [_fit_ansi(_muted("╭─ commands " + "─" * max(0, width - 12)), width)]
210 for index, (cmd, desc) in enumerate(matches):
211 active = cmd == selected_command if exact_selection else index == 0
212 marker = _accent("›") if active else _muted(" ")
213 hint = SLASH_ARGUMENT_HINTS.get(cmd[1:])
214 command_text = cmd if not hint else f"{cmd} {hint}"
215 command_width = cmd_width + (len(hint) + 1 if hint else 0)
216 name = _bold(_accent(command_text)) if active else _accent(command_text)
217 body = f"{_muted('│')} {marker} {_fit_ansi(name, command_width)} {_muted(desc)}"
218 lines.append(_fit_ansi(body, width))
219 hidden = max(0, len(all_matches) - len(matches))
220 if hidden:
221 lines.append(_fit_ansi(_muted("╰─ type to filter · enter selects first match"), width))
222 else:
223 lines.append(_fit_ansi(_muted("╰─ enter selects · tab fills · ↑↓ moves"), width))
224 return lines
225
226
227def autocomplete_slash(input_buffer: str, commands: list[tuple[str, str]]) -> str:
228 if not input_buffer.startswith("/") or " " in input_buffer[1:]:
229 return input_buffer
230 matches = _slash_command_matches(input_buffer, commands)
231 if not matches:
232 return input_buffer
233 return matches[0] + " "
234
235
236def slash_completion_for_submit(input_buffer: str, commands: list[tuple[str, str]]) -> tuple[str, bool]:
237 """Return the buffer to use and whether Enter should submit it now."""
238
239 if not input_buffer.startswith("/"):
240 return input_buffer, True
241 if " " in input_buffer[1:]:
242 command = input_buffer[1:].split(maxsplit=1)[0].lower()
243 if command in REQUIRED_SLASH_ARGUMENTS and not _slash_argument_text(input_buffer).strip():
244 return input_buffer, False
245 return input_buffer, True
246 current = input_buffer.rstrip()
247 if not current:
248 return input_buffer, True
249 command_names = {cmd for cmd, _desc in commands}
250 token = current[1:].lower()
251 exact = current in command_names
252 if exact and token not in REQUIRED_SLASH_ARGUMENTS:
253 return input_buffer, True
254 matches = _slash_command_matches(input_buffer, commands)
255 if not matches:
256 return input_buffer, True
257 selected = current if exact else matches[0]
258 if selected[1:] not in REQUIRED_SLASH_ARGUMENTS:
259 return selected, True
260 suffix = " "
261 completed = selected + suffix
262 return completed, completed == input_buffer
263
264
265def _slash_argument_text(input_buffer: str) -> str:
266 parts = input_buffer[1:].split(maxsplit=1)
267 return parts[1] if len(parts) > 1 else ""
268
269
270def _slash_argument_footer(parts: list[str], hint: str) -> str:
271 if len(parts) == 1:
272 return f"╰─ type {hint}, then enter"
273 return "╰─ enter sends"
274
275
276def cycle_slash(input_buffer: str, commands: list[tuple[str, str]], *, direction: int) -> str:
277 if not input_buffer.startswith("/") or " " in input_buffer[1:]:
278 return input_buffer
279 current = input_buffer.rstrip()
280 command_names = [cmd for cmd, _desc in commands]
281 matches = command_names if current in command_names else _slash_command_matches(input_buffer, commands)
282 if not matches:
283 return input_buffer
284 try:
285 index = matches.index(current)
286 except ValueError:
287 return matches[0] if direction >= 0 else matches[-1]
288 return matches[(index + direction) % len(matches)]
289
290
291def _slash_command_matches(input_buffer: str, commands: list[tuple[str, str]]) -> list[str]:
292 token = input_buffer.strip()[1:].lower()
293 matches = [cmd for cmd, _desc in commands if cmd[1:].startswith(token)]
294 if not matches:
295 matches = [cmd for cmd, _desc in commands if token in cmd[1:]]
296 return matches
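
The matching and cycling rules above (prefix matches first, substring matches only as a fallback, wrap-around cycling) can be tried on a small command list. This is a simplified sketch under stated assumptions: `COMMANDS`, `matches`, and `cycle` are illustrative names, and the command list is a shortened sample rather than the real palette.

```python
from __future__ import annotations

# Shortened sample of (command, description) pairs, for illustration only.
COMMANDS = [
    ("/model", "set model"),
    ("/base-url", "set endpoint"),
    ("/config", "runtime config"),
    ("/context", "token budget"),
]


def matches(buffer: str, commands: list[tuple[str, str]]) -> list[str]:
    # Prefix matches win; substring matches are used only when no prefix matches.
    token = buffer.strip()[1:].lower()
    found = [cmd for cmd, _desc in commands if cmd[1:].startswith(token)]
    if not found:
        found = [cmd for cmd, _desc in commands if token in cmd[1:]]
    return found


def cycle(buffer: str, commands: list[tuple[str, str]], direction: int) -> str:
    # An exact command cycles through the whole list; a partial token cycles its matches.
    if not buffer.startswith("/") or " " in buffer[1:]:
        return buffer
    current = buffer.rstrip()
    names = [cmd for cmd, _desc in commands]
    pool = names if current in names else matches(buffer, commands)
    if not pool:
        return buffer
    if current not in pool:
        return pool[0] if direction >= 0 else pool[-1]
    return pool[(pool.index(current) + direction) % len(pool)]


print(matches("/co", COMMANDS))       # ['/config', '/context']
print(matches("/url", COMMANDS))      # ['/base-url']
print(cycle("/context", COMMANDS, 1)) # /model
```

The substring fallback is what lets a token such as `/url` still reach `/base-url`, while the modulo step makes the arrow keys wrap from the last command back to the first.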
nipux_cli/tui_event_format.py 262 lines
1"""Shared event formatting helpers for Nipux terminal renderers."""
2
3from __future__ import annotations
4
5import os
6import re
7import shlex
8from pathlib import Path
9from typing import Any
10
11from nipux_cli.metric_format import format_metric_value
12from nipux_cli.tui_style import _one_line
13
14
15def event_tool_args(metadata: dict[str, Any]) -> dict[str, Any]:
16 input_data = metadata.get("input") if isinstance(metadata.get("input"), dict) else {}
17 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
18 return args
19
20
21def shell_write_target(command: str) -> str:
22 if not command.strip():
23 return ""
24 redirect = re.search(r"(?:^|\s)1?>>?\s*([^\s;&|>]+)", command)
25 if redirect:
26 target = redirect.group(1).strip("'\"")
27 if target and not target.startswith("&"):
28 return target
29 try:
30 parts = shlex.split(command)
31 except ValueError:
32 parts = command.split()
33 for index, part in enumerate(parts):
34 if part != "tee":
35 continue
36 for candidate in parts[index + 1 :]:
37 if candidate.startswith("-"):
38 continue
39 return candidate
40 return ""
41
42
43def event_title_body(title: str, body: str, *, fallback: str) -> str:
44 if title and body and title not in body:
45 return f"{title} - {body}"
46 return title or body or fallback
47
48
49def experiment_metric_text(metadata: dict[str, Any]) -> str:
50 value = metadata.get("metric_value")
51 if value in (None, ""):
52 return ""
53 name = metadata.get("metric_name") or "metric"
54 unit = metadata.get("metric_unit") or ""
55 direction = metadata.get("result_direction") or metadata.get("decision") or ""
56 return " ".join(part for part in [format_metric_value(name, value, unit), str(direction)] if part)
57
58
59def event_clock(event: dict[str, Any]) -> str:
60 compact = _compact_time(str(event.get("created_at") or ""))
61 if len(compact) >= 16 and compact[10:11] == " ":
62 return compact[11:16]
63 return "" if compact == "?" else _one_line(compact, 5)
64
65
66def event_hour(event: dict[str, Any]) -> str:
67 compact = _compact_time(str(event.get("created_at") or ""))
68 if len(compact) >= 13 and compact[10:11] == " ":
69 return f"{compact[:13]}:00"
70 if len(compact) >= 2:
71 return compact
72 return "recent"
73
74
75def friendly_error_text(text: str) -> str:
76 lowered = text.lower()
77 if "key limit exceeded" in lowered:
78 return "Provider key limit exceeded. Update the key limit or switch models."
79 if "authenticationerror" in lowered or "user not found" in lowered or "401" in lowered:
80 return "Model authentication failed. Update the API key with /api-key, then try again."
81 if "permissiondeniederror" in lowered or "403" in lowered:
82 return "Provider permission denied. Check model access or key limits."
83 if (
84 "apiconnectionerror" in lowered
85 or "connection error" in lowered
86 or "connection refused" in lowered
87 or "failed to establish a new connection" in lowered
88 ):
89 return "Model endpoint is unreachable. Check /base-url or start the configured model server, then run /doctor."
90 if "timeout" in lowered or "timed out" in lowered:
91 return "Model request timed out. Check the endpoint/model or adjust /timeout, then run /doctor."
92 return _one_line(clean_step_summary(text), 220)
93
94
95def brief_reflection_text(text: str) -> str:
96 clean = clean_step_summary(text)
97 match = re.search(r"Reflection through step #?([0-9]+):\s*(.*?)(?:\. Best |\.\s*$|$)", clean)
98 if match:
99 counts = match.group(2)
100 counts = counts.replace(", 0 active operator messages", "")
101 counts = counts.replace(", 0 recent finding artifacts", "")
102 return _one_line(f"reflected #{match.group(1)}: {counts}", 140)
103 return _one_line(clean, 140)
104
105
106def generic_display_text(value: Any) -> str:
107 return " ".join(str(value).split())
108
109
110def clean_step_summary(summary: Any) -> str:
111 text = " ".join(str(summary).split())
112 if text.startswith("write_artifact saved ") and " at /" in text:
113 return text.split(" at /", 1)[0]
114 return text
115
116
117def chat_message_paragraphs(value: Any) -> list[str]:
118 text = str(value).replace("\r\n", "\n").replace("\r", "\n")
119 text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)
120 text = re.sub(r"`([^`]+)`", r"\1", text)
121 text = re.sub(r"(?<!^)\s(?=(?:[0-9]+\.|[-*])\s+)", "\n", text)
122 paragraphs: list[str] = []
123 for raw in text.splitlines():
124 line = " ".join(raw.strip().split())
125 if line:
126 paragraphs.append(line)
127 return paragraphs or [""]
128
129
130def chat_agent_message_text(title: str, body: str) -> str:
131 lowered = title.lower()
132 if lowered == "chat":
133 return body
134 if lowered in {"plan", "planning"}:
135 plan_body = body.split("Questions:", 1)[0]
136 tasks = len(re.findall(r"(?:^|\s)- ", plan_body))
137 if tasks:
138 return f"Plan drafted with {tasks} items. Reply with changes or start work from the controls."
139 return "Plan drafted. Reply with changes or start work from the controls."
140 if lowered in {"progress", "update", "report"}:
141 return _one_line(clean_step_summary(body), 220)
142 return ""
143
144
145def tool_live_summary(tool: str, metadata: dict[str, Any], body: str) -> str:
146 args = event_tool_args(metadata)
147 clean_body = clean_step_summary(body)
148 if tool == "web_search":
149 query = str(args.get("query") or _regex_group(r"query='([^']+)'", clean_body) or "")
150 return f"search {query}" if query else "search web"
151 if tool == "web_extract":
152 urls = args.get("urls") if isinstance(args.get("urls"), list) else []
153 count = len(urls)
154 fetched = _regex_group(r"fetched ([0-9]+/[0-9]+ pages)", clean_body)
155 return f"extract {fetched or (str(count) + ' pages' if count else 'pages')}"
156 if tool == "shell_exec":
157 command = str(args.get("command") or _regex_group(r"cmd='([^']+)'", clean_body) or "")
158 prefix = f"shell {_short_command(command)}" if command else "shell command"
159 rc = metadata.get("output", {}).get("returncode") if isinstance(metadata.get("output"), dict) else None
160 return f"{prefix} rc={rc}" if rc is not None else prefix
161 if tool == "browser_navigate":
162 url = str(args.get("url") or _regex_group(r"<([^>]+)>", clean_body) or "")
163 return f"open {_short_url(url)}" if url else "open page"
164 if tool == "browser_snapshot":
165 return "snapshot page"
166 if tool == "browser_click":
167 ref = str(args.get("ref") or "")
168 return f"click {ref}" if ref else "click page"
169 if tool == "browser_scroll":
170 return f"scroll {args.get('direction') or 'page'}"
171 if tool == "write_artifact":
172 return "save output"
173 if tool == "write_file":
174 args_path = str(args.get("path") or "")
175 output = metadata.get("output") if isinstance(metadata.get("output"), dict) else {}
176 path = str(output.get("path") or args_path)
177 return f"update {short_path(path, max_width=36)}" if path else "update file"
178 if tool == "defer_job":
179 seconds = args.get("seconds") or args.get("delay_seconds")
180 until = args.get("until")
181 if until:
182 return f"wait until {until}"
183 return f"wait {seconds}s" if seconds else "wait before next check"
184 if tool == "record_lesson":
185 return "learn memory"
186 if tool == "record_memory_graph":
187 return "map memory"
188 if tool == "search_memory_graph":
189 return "search memory"
190 if tool == "record_source":
191 return "score source"
192 if tool == "record_findings":
193 return "record findings"
194 if tool == "record_tasks":
195 return "update tasks"
196 if tool == "record_roadmap":
197 return "update roadmap"
198 if tool == "record_milestone_validation":
199 return "validate roadmap"
200 if tool == "record_experiment":
201 return "record experiment"
202 if tool == "acknowledge_operator_context":
203 return "ack operator"
204 if tool == "report_update":
205 return "report update"
206 if tool == "read_artifact":
207 return "read output"
208 if tool == "search_artifacts":
209 return "search outputs"
210 return tool or clean_body or "step"
211
212
213def short_path(path: Path | str, *, max_width: int = 80) -> str:
214 text = str(path)
215 home = str(Path.home())
216 if text.startswith(home + os.sep):
217 text = "~" + text[len(home) :]
218 if len(text) <= max_width:
219 return text
220 keep = max(12, max_width - 4)
221 return "..." + text[-keep:]
222
223
224def _regex_group(pattern: str, text: str) -> str:
225 match = re.search(pattern, text)
226 return match.group(1) if match else ""
227
228
229def _short_url(url: str) -> str:
230 if not url:
231 return ""
232 stripped = url.replace("https://", "").replace("http://", "")
233 return stripped.split("/", 1)[0] or stripped
234
235
236def _short_command(command: str) -> str:
237 if not command:
238 return ""
239 try:
240 parts = shlex.split(command)
241 except ValueError:
242 parts = command.split()
243 if not parts:
244 return ""
245 if parts[0] == "ssh":
246 host = next((part for part in parts[1:] if not part.startswith("-") and "=" not in part), "")
247 remote = " ".join(parts[parts.index(host) + 1 :]) if host in parts else ""
248 if remote:
249 remote_parts = remote.split()
250 remote_head = remote_parts[0] if remote_parts else "remote"
251 return f"ssh {host} {remote_head}"
252 return f"ssh {host}".strip()
253 if parts[0] in {"python", "python3", "uv", "npm", "pnpm", "yarn", "node"} and len(parts) > 1:
254 return " ".join(parts[:3])
255 return " ".join(parts[:2])
256
257
258def _compact_time(value: str) -> str:
259 text = value.replace("T", " ")
260 if len(text) >= 16 and text[4:5] == "-" and text[13:14] == ":":
261 return text[:16]
262 return _one_line(text, 16)
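
The helpers above read tool arguments from `metadata["input"]["arguments"]` and results from `metadata["output"]`. The sketch below shows that event shape with a reduced summarizer. It is an illustration only: `tool_args` and `summarize` are hypothetical names, the sample metadata is made up, and the two-word command shortening is a simplification of `_short_command`.

```python
from __future__ import annotations

import shlex
from typing import Any


def tool_args(metadata: dict[str, Any]) -> dict[str, Any]:
    # Defensive lookup in the style of `event_tool_args`: non-dict values become {}.
    input_data = metadata.get("input")
    if not isinstance(input_data, dict):
        return {}
    args = input_data.get("arguments")
    return args if isinstance(args, dict) else {}


def summarize(tool: str, metadata: dict[str, Any]) -> str:
    args = tool_args(metadata)
    if tool == "web_search":
        query = str(args.get("query") or "")
        return f"search {query}" if query else "search web"
    if tool == "shell_exec":
        command = str(args.get("command") or "")
        try:
            parts = shlex.split(command)
        except ValueError:
            # Unbalanced quotes: fall back to whitespace splitting.
            parts = command.split()
        head = " ".join(parts[:2])
        output = metadata.get("output")
        rc = output.get("returncode") if isinstance(output, dict) else None
        prefix = f"shell {head}" if head else "shell command"
        return f"{prefix} rc={rc}" if rc is not None else prefix
    return tool or "step"


# Made-up event metadata in the shape the formatters expect.
event_metadata = {
    "input": {"arguments": {"command": "ls -la /tmp"}},
    "output": {"returncode": 0},
}
print(summarize("shell_exec", event_metadata))  # shell ls -la rc=0
```

Malformed metadata (for example a string where a dict is expected) falls through to the generic labels instead of raising, which keeps the live feed rendering even when an event is incomplete.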
nipux_cli/tui_events.py 371 lines
1"""Compact event rendering helpers for the Nipux terminal UI."""
2
3from __future__ import annotations
4
5import re
6import textwrap
7from typing import Any
8
9from nipux_cli.tui_event_format import (
10 brief_reflection_text,
11 chat_message_paragraphs,
12 event_clock,
13 event_title_body,
14 friendly_error_text,
15 generic_display_text,
16 tool_live_summary,
17)
18from nipux_cli.tui_style import (
19 _accent,
20 _bold,
21 _center_ansi,
22 _fit_ansi,
23 _muted,
24 _one_line,
25 _style,
26)
27
28THINKING_NOTICE_PREFIX = "__nipux_thinking__:"
WAITING_NOTICE_PREFIX = "__nipux_waiting__:"

LOW_VALUE_CHAT_NOTICES = (
    "sent; waiting for model",
    "sent, waiting for model",
    "waiting for model",
    "waiting for the next worker step",
)

NIPUX_HERO = [
    "███╗ ██╗██╗██████╗ ██╗ ██╗██╗ ██╗",
    "████╗ ██║██║██╔══██╗██║ ██║╚██╗██╔╝",
    "██╔██╗ ██║██║██████╔╝██║ ██║ ╚███╔╝ ",
    "██║╚██╗██║██║██╔═══╝ ██║ ██║ ██╔██╗ ",
    "██║ ╚████║██║██║ ╚██████╔╝██╔╝ ██╗",
    "╚═╝ ╚═══╝╚═╝╚═╝ ╚═════╝ ╚═╝ ╚═╝",
]

LOW_SIGNAL_FRAME_TOOLS = {
    "acknowledge_operator_context",
    "read_artifact",
    "record_experiment",
    "record_findings",
    "record_lesson",
    "record_milestone_validation",
    "record_roadmap",
    "record_source",
    "record_tasks",
    "reflect",
    "report_update",
    "search_artifacts",
    "update_job_state",
    "write_artifact",
}

GENERIC_CHAT_NOTICE_PREFIXES = (
    "opened ",
    "focus set",
    "paused ",
    "resumed ",
    "cancelled ",
    "deleted ",
)


def chat_event_parts(event: dict[str, Any]) -> tuple[str, str, str] | None:
    kind = str(event.get("event_type") or "")
    title = str(event.get("title") or "").strip()
    body = str(event.get("body") or "")
    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
    clock = event_clock(event)
    if kind == "operator_message":
        return "YOU", body, clock
    if kind == "agent_message" and title == "chat":
        if metadata.get("error"):
            body = friendly_error_text(body)
        if _is_low_value_chat_notice(_normalized_chat_body(body)) or _is_waiting_notice(_normalized_chat_body(body)):
            return None
        return "AGENT", body, clock
    return None


def append_chat_output(lines: list[str], label: str, body: Any, *, clock: str, width: int) -> None:
    label_text = _chat_label(label)
    meta = f"{label_text} {_muted(clock)}" if clock else label_text
    lines.append(_fit_ansi(meta, width))
    prefix = " "
    available = max(18, width - len(prefix))
    for paragraph in chat_message_paragraphs(body):
        wrapped = textwrap.wrap(paragraph, width=available) or [""]
        for part in wrapped:
            lines.append(_fit_ansi(prefix + part, width))
    if len(lines) == 1:
        lines.append(_fit_ansi(prefix, width))


def chat_pane_lines(events: list[dict[str, Any]], notices: list[str], *, width: int, rows: int) -> list[str]:
    items: list[tuple[str, str, str]] = []
    seen_chat_bodies: set[tuple[str, str]] = set()
    seen_bodies: set[str] = set()
    for event in events:
        rendered = chat_event_parts(event)
        if not rendered:
            continue
        label, body, clock = rendered
        items.append((label, body, clock))
        normalized = _normalized_chat_body(body)
        seen_chat_bodies.add((label, normalized))
        seen_bodies.add(normalized)
    for notice in notices:
        if notice.startswith(THINKING_NOTICE_PREFIX):
            body = notice.removeprefix(THINKING_NOTICE_PREFIX).strip() or "thinking"
            items.append(("THINKING", body, ""))
        elif notice.startswith(WAITING_NOTICE_PREFIX):
            body = notice.removeprefix(WAITING_NOTICE_PREFIX).strip() or "waiting"
            items.append(("WAITING", body, ""))
        elif notice.startswith("> "):
            body = notice[2:]
            normalized = _normalized_chat_body(body)
            if normalized and normalized not in seen_bodies and ("YOU", normalized) not in seen_chat_bodies:
                items.append(("YOU", body, ""))
                seen_bodies.add(normalized)
        else:
            body = notice
            normalized = _normalized_chat_body(body)
            if _is_low_value_chat_notice(normalized) or _is_waiting_notice(normalized) or _is_generic_chat_notice(normalized):
                continue
            if normalized and normalized not in seen_bodies and ("AGENT", normalized) not in seen_chat_bodies:
                items.append(("NIPUX", body, ""))
                seen_bodies.add(normalized)
    if not items:
        return chat_empty_state_lines(width=width, rows=rows)
    rendered_items = [_chat_item_lines(label, body, clock=clock, width=width) for label, body, clock in items[-max(4, rows) :]]
    output_rows = _flatten_chat_blocks(rendered_items)
    if len(output_rows) <= rows:
        return output_rows
    if rows <= 1:
        return output_rows[-rows:]
    newest = rendered_items[-1]
    if len(newest) >= rows:
        if rows <= 3:
            visible = newest[: rows - 1]
            hidden = len(newest) - len(visible)
            marker = _fit_ansi(_muted(f"... {hidden} more lines in /history."), width)
            return [*visible, marker]
        head = max(1, min(4, rows // 3))
        tail = max(1, rows - head - 1)
        hidden = max(0, len(newest) - head - tail)
        marker = _fit_ansi(_muted(f"... {hidden} middle lines hidden; /history shows all."), width)
        return [*newest[:head], marker, *newest[-tail:]]
    visible_blocks: list[list[str]] = [newest]
    used = len(newest)
    hidden_lines = 0
    for block in reversed(rendered_items[:-1]):
        if used + len(block) + 1 <= rows:
            visible_blocks.insert(0, block)
            used += len(block)
        else:
            hidden_lines += len(block)
    marker = _fit_ansi(_muted(f"... {hidden_lines} older chat lines hidden; /history shows all."), width)
    return [marker, *_flatten_chat_blocks(visible_blocks)][:rows]


def _chat_item_lines(label: str, body: Any, *, clock: str, width: int) -> list[str]:
    if label == "THINKING":
        return [_fit_ansi(f"{_style('AGENT', '36;1')} {_accent(str(body))}", width)]
    if label == "WAITING":
        return [_fit_ansi(f"{_style('AGENT', '36;1')} {_muted(str(body))}", width)]
    lines: list[str] = []
    append_chat_output(lines, label, body, clock=clock, width=width)
    return lines


def _flatten_chat_blocks(blocks: list[list[str]]) -> list[str]:
    rows: list[str] = []
    for block in blocks:
        if rows:
            rows.append("")
        rows.extend(block)
    return rows


def _chat_label(label: str) -> str:
    if label == "YOU":
        return _style("you", "35;1")
    if label == "AGENT":
        return _style("nipux", "36;1")
    if label == "THINKING":
        return _style("thinking", "36")
    if label == "WAITING":
        return _style("waiting", "36")
    return _muted(label.lower())


def _normalized_chat_body(value: Any) -> str:
    return " ".join(str(value or "").split())


def _is_low_value_chat_notice(normalized: str) -> bool:
    lowered = normalized.lower()
    return any(phrase in lowered for phrase in LOW_VALUE_CHAT_NOTICES)


def _is_waiting_notice(normalized: str) -> bool:
    lowered = normalized.lower()
    return (
        lowered.startswith("waiting:")
        or lowered.startswith("waiting for ")
        or "message saved for the worker" in lowered
    )


def _is_generic_chat_notice(normalized: str) -> bool:
    lowered = normalized.lower()
    return any(lowered.startswith(prefix) for prefix in GENERIC_CHAT_NOTICE_PREFIXES)


def chat_empty_state_lines(*, width: int, rows: int) -> list[str]:
    content = [
        _center_ansi(_bold(_accent("NIPUX")), width),
        "",
        *_centered_wrapped_hint("Type a goal in plain English to start a worker.", width=width),
        *_centered_wrapped_hint("Or use /new OBJECTIVE.", width=width),
        *_centered_wrapped_hint("/settings configures provider/tools.", width=width),
    ]
    top_pad = max(0, (rows - len(content)) // 2)
    return ([""] * top_pad + content)[:rows]


def _centered_wrapped_hint(text: str, *, width: int) -> list[str]:
    available = max(24, min(68, width - 4))
    return [_center_ansi(_muted(part), width) for part in textwrap.wrap(text, width=available) or [text]]


def worker_activity_lines(events: list[dict[str, Any]], *, width: int, limit: int) -> list[str]:
    items: list[dict[str, Any]] = []
    for event in events:
        line = minimal_live_event_line(event, chars=max(16, width - 12))
        if not line:
            continue
        if items and items[-1].get("key") == line:
            items[-1]["count"] = int(items[-1].get("count") or 1) + 1
            continue
        items.append({"line": line, "count": 1, "key": line})
    rendered = []
    for item in items[-limit:]:
        line = str(item["line"])
        count = int(item.get("count") or 1)
        rendered.append(f"{live_badge(line)} {_one_line(live_display_text(line, count=count), max(16, width - 9))}")
    return rendered


def minimal_live_event_line(event: dict[str, Any], *, chars: int = 92) -> str:
    kind = str(event.get("event_type") or "")
    title = str(event.get("title") or "").strip()
    body = generic_display_text(event.get("body") or "")
    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
    status = str(metadata.get("status") or "")
    if kind == "operator_message":
        return ""
    if kind == "agent_message" and title == "chat":
        return ""
    if kind == "operator_context":
        return _one_line(f"operator {title or body}", chars)
    if kind == "tool_call":
        if title in LOW_SIGNAL_FRAME_TOOLS:
            return ""
        return _one_line("start " + tool_live_summary(title, metadata, body), chars)
    if kind == "tool_result":
        if title in LOW_SIGNAL_FRAME_TOOLS and status == "completed":
            return ""
        if title == "llm" and status == "failed":
            return ""
        prefix = "blocked" if status == "blocked" else "failed" if status == "failed" else "done"
        detail = friendly_error_text(body or title) if status in {"blocked", "failed"} else tool_live_summary(title, metadata, body)
        return _one_line(f"{prefix} {detail}", chars)
    if kind == "error":
        detail = friendly_error_text(body or title or "error")
        return _one_line(f"error {detail}", chars)
    if kind == "artifact":
        return _one_line(f"saved {title or body or 'output'}", chars)
    if kind == "finding":
        return _one_line(f"finding {title or body}", chars)
    if kind == "source":
        return _one_line(f"source {title or body}", chars)
    if kind == "task":
        return _one_line(f"task {title or body}", chars)
    if kind == "roadmap":
        return _one_line(f"roadmap {title or body}", chars)
    if kind == "milestone_validation":
        validation = str(metadata.get("validation_status") or metadata.get("status") or "")
        return _one_line(f"validate {validation} {title or body}".strip(), chars)
    if kind == "experiment":
        return _one_line(f"experiment {title or body}", chars)
    if kind == "lesson":
        detail = event_title_body(title, body, fallback="lesson")
        return _one_line(f"learned {detail}", chars)
    if kind == "reflection":
        return _one_line(f"reflect {brief_reflection_text(body or title)}", chars)
    if kind == "agent_message":
        if title not in {"chat", "progress", "update", "report"}:
            return ""
        return _one_line(f"update {body or title}", chars)
    if kind in {"daemon", "loop"}:
        return ""
    return ""


def live_badge(text: str) -> str:
    badge_text = re.sub(r"^x[0-9]+\s+", "", text)
    if badge_text.startswith("error") or badge_text.startswith("failed"):
        return _style("FAIL", "31")
    if badge_text.startswith("blocked"):
        return _style("BLOCK", "33")
    if badge_text.startswith("start"):
        return _style("run ", "36")
    if badge_text.startswith("done"):
        return _style("done", "32")
    if badge_text.startswith("saved"):
        return _style("save", "32")
    if badge_text.startswith("finding"):
        return _style("find", "32")
    if badge_text.startswith("source"):
        return _style("src ", "36")
    if badge_text.startswith("experiment"):
        return _style("test", "33")
    if badge_text.startswith("task"):
        return _style("task", "33")
    if badge_text.startswith("learned"):
        return _style("mem ", "36")
    if badge_text.startswith("reflect"):
        return _style("plan", "35")
    if badge_text.startswith("update"):
        return _style("note", "35")
    return _style("info", "2")


def live_display_text(text: str, *, count: int = 1) -> str:
    if count > 1 and (
        text.startswith("error")
        or text.startswith("failed")
        or text.startswith("blocked")
    ):
        return f"x{count} {text}"
    base = text
    for prefix in (
        "start ",
        "done ",
        "saved ",
        "finding ",
        "source ",
        "experiment ",
        "task ",
        "learned ",
        "reflect ",
        "update ",
    ):
        if text.startswith(prefix):
            base = text[len(prefix) :]
            break
    if count > 1:
        return f"{base} x{count}"
    return base
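
The activity feed above merges consecutive identical lines and shows a repeat count, with failure lines leading with the count and other lines dropping their verb prefix. The following is a minimal standalone sketch of that idea, not the module's own code: `collapse_repeats` and `display_text` are hypothetical names, and the prefix list is shortened for illustration.

```python
from __future__ import annotations


def collapse_repeats(lines: list[str]) -> list[tuple[str, int]]:
    """Merge runs of consecutive identical lines into (line, count) pairs."""
    merged: list[tuple[str, int]] = []
    for line in lines:
        if merged and merged[-1][0] == line:
            # Same text as the previous entry: bump its count instead of appending.
            merged[-1] = (line, merged[-1][1] + 1)
            continue
        merged.append((line, 1))
    return merged


def display_text(line: str, count: int = 1) -> str:
    """Format one collapsed line for display (hypothetical helper)."""
    # Repeated failures keep their prefix and put the count first.
    if count > 1 and line.startswith(("error", "failed", "blocked")):
        return f"x{count} {line}"
    base = line
    # Shortened prefix list, assumed for this sketch only.
    for prefix in ("start ", "done ", "saved "):
        if line.startswith(prefix):
            base = line[len(prefix):]
            break
    return f"{base} x{count}" if count > 1 else base


if __name__ == "__main__":
    feed = ["start fetch", "start fetch", "failed fetch", "failed fetch", "done fetch"]
    for text, repeat in collapse_repeats(feed):
        print(display_text(text, repeat))
```

Only adjacent duplicates are merged, so an alternating pattern such as start, done, start stays as three separate entries.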
nipux_cli/tui_input.py 80 lines
"""Terminal input helpers for full-screen Nipux frames."""

from __future__ import annotations

import os
import re
import select
import sys
import time


def read_terminal_char(fd: int) -> str:
    data = os.read(fd, 1)
    return data.decode("latin1", errors="ignore")


def read_escape_sequence(first: str, *, fd: int | None = None) -> str:
    fd = sys.stdin.fileno() if fd is None else fd
    sequence = first
    deadline = time.monotonic() + 0.12
    while len(sequence) < 96:
        timeout = max(0.0, min(0.04, deadline - time.monotonic()))
        if timeout <= 0:
            break
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            break
        sequence += read_terminal_char(fd)
        if terminal_escape_complete(sequence):
            break
    return sequence


def terminal_escape_complete(sequence: str) -> bool:
    if sequence in {"\x1b[A", "\x1b[B", "\x1b[C", "\x1b[D", "\x1bOA", "\x1bOB", "\x1bOC", "\x1bOD"}:
        return True
    if re.match(r"^\x1b\[[0-9;?]*[ABCD]$", sequence):
        return True
    if re.match(r"^\x1b\[<\d+;\d+;\d+[mM]$", sequence):
        return True
    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
        return True
    return False


def decode_terminal_escape(sequence: str) -> tuple[str, tuple[int, int] | None]:
    arrows = {
        "\x1b[A": "up",
        "\x1b[B": "down",
        "\x1b[C": "right",
        "\x1b[D": "left",
        "\x1bOA": "up",
        "\x1bOB": "down",
        "\x1bOC": "right",
        "\x1bOD": "left",
    }
    if sequence in arrows:
        return arrows[sequence], None
    csi_arrow = re.match(r"^\x1b\[[0-9;?]*([ABCD])$", sequence)
    if csi_arrow:
        return {"A": "up", "B": "down", "C": "right", "D": "left"}[csi_arrow.group(1)], None
    match = re.match(r"^\x1b\[<(\d+);(\d+);(\d+)([mM])$", sequence)
    if match and match.group(4) == "M":
        button = int(match.group(1))
        if button == 0:
            return "click", (int(match.group(2)), int(match.group(3)))
    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
        button = ord(sequence[3]) - 32
        if button == 0:
            return "click", (ord(sequence[4]) - 32, ord(sequence[5]) - 32)
    return "unknown", None


def drain_pending_input(fd: int | None = None) -> None:
    fd = sys.stdin.fileno() if fd is None else fd
    while True:
        readable, _, _ = select.select([fd], [], [], 0)
        if not readable:
            return
        os.read(fd, 1)
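
The decoder above accepts two mouse report encodings: the SGR form (`ESC [ < button ; column ; row M`) and the legacy X10 form (`ESC [ M` followed by three bytes, each offset by 32). As a rough illustration of how a left-button press maps to coordinates in both, here is a standalone sketch. `decode_left_click` is a hypothetical helper written for this example and is not part of `nipux_cli`.

```python
from __future__ import annotations

import re

# SGR mouse report: ESC [ < button ; column ; row, ending in M (press) or m (release).
SGR_MOUSE = re.compile(r"^\x1b\[<(\d+);(\d+);(\d+)([mM])$")


def decode_left_click(sequence: str) -> tuple[int, int] | None:
    """Return (column, row) for a left-button press, or None for anything else."""
    match = SGR_MOUSE.match(sequence)
    if match:
        # Only a press ("M") of button 0 counts as a click; releases are ignored.
        if match.group(4) == "M" and int(match.group(1)) == 0:
            return int(match.group(2)), int(match.group(3))
        return None
    # Legacy X10 report: button, column, and row are single characters offset by 32.
    if sequence.startswith("\x1b[M") and len(sequence) >= 6:
        if ord(sequence[3]) - 32 == 0:
            return ord(sequence[4]) - 32, ord(sequence[5]) - 32
    return None


if __name__ == "__main__":
    print(decode_left_click("\x1b[<0;12;5M"))
    print(decode_left_click("\x1b[M" + chr(32) + chr(32 + 12) + chr(32 + 5)))
```

Both calls in the usage lines describe the same press at column 12, row 5, once in each encoding.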
nipux_cli/tui_layout.py 234 lines
"""Reusable terminal layout primitives for Nipux frames."""

from __future__ import annotations

from typing import Any

from nipux_cli.tui_style import (
    _accent,
    _bold,
    _fit_ansi,
    _muted,
    _one_line,
    _strip_ansi,
    _style,
)


def _top_bar(
    width: int,
    *,
    state: str,
    daemon: str,
    model: str,
    token_usage: dict[str, Any] | None = None,
    context_length: int = 0,
    base_url: str = "",
) -> list[str]:
    del state, daemon
    title = _style("NIPUX", "38;5;123;1")
    usage_text = _token_usage_topline(token_usage or {}, context_length=context_length, model=model, base_url=base_url)
    model_line = f"{_muted('model')} {_style(_one_line(model, max(16, width // 3)), '36')}"
    if width >= 118:
        compact_model = f"{_muted('model')} {_style(_one_line(model, max(14, width // 5)), '36')}"
        return [
            _edge_line(title, f"{compact_model} {usage_text}", width=width),
            _muted("━" * width),
        ]
    first = _edge_line(title, model_line, width=width)
    second = _edge_line("", usage_text, width=width)
    return [
        first,
        second,
        _muted("━" * width),
    ]


def _two_col_title(left_width: int, right_width: int, left: str, right: str) -> str:
    left_title = _style(left.upper(), "38;5;252;1")
    right_title = _style(right.upper(), "38;5;252;1")
    return _fit_ansi(left_title, left_width) + _muted(" │ ") + _fit_ansi(right_title, right_width)


def _two_col_line(left: str, right: str, *, left_width: int, right_width: int) -> str:
    return _fit_ansi(left, left_width) + _muted(" │ ") + _fit_ansi(right, right_width)


def _edge_line(left: str, right: str, *, width: int) -> str:
    right_len = len(_strip_ansi(right))
    left_width = max(0, width - right_len - 2)
    left_text = _fit_ansi(left, left_width)
    gap = max(1, width - len(_strip_ansi(left_text)) - right_len)
    return _fit_ansi(left_text + " " * gap + right, width)


def _triple_line(left: str, center: str, right: str, *, width: int) -> str:
    right_len = len(_strip_ansi(right))
    center_len = len(_strip_ansi(center))
    left_len = len(_strip_ansi(left))
    center_start = max(left_len + 2, (width - center_len) // 2)
    right_start = max(center_start + center_len + 1, width - right_len)
    if right_start >= width:
        return _edge_line(center, right, width=width)
    parts = [
        left,
        " " * max(1, center_start - left_len),
        center,
        " " * max(1, right_start - center_start - center_len),
        right,
    ]
    return _fit_ansi("".join(parts), width)


def _compose_bar(
    input_buffer: str,
    *,
    width: int,
    hint: str | None = None,
    suggestions: list[str] | None = None,
    prompt_label: str = "❯",
    title: str = "message",
    mask_input: bool = False,
) -> list[str]:
    if mask_input:
        visible_input = "•" * min(len(input_buffer), max(8, width - 8))
    else:
        visible_input = input_buffer[-max(8, width - 8) :]
    hint = _muted(hint or "Enter send · / commands · arrows navigate")
    label = _accent(prompt_label) if prompt_label == "❯" else _muted(prompt_label)
    prompt = f"{label} {visible_input}{_accent('▌')}"
    lines = []
    if suggestions:
        lines.extend(suggestions)
    title = f" {title.strip()} "
    lines.extend([
        _muted("╭─" + title + "─" * max(0, width - len(title) - 2)),
        _fit_ansi(_muted("│ ") + prompt, width),
        _fit_ansi(_muted("╰─ ") + hint, width),
    ])
    return lines


def _metric_strip(items: list[tuple[str, Any]], *, width: int) -> str:
    parts = [f"{_muted(label)} {_bold(value)}" for label, value in items]
    text = " ".join(parts)
    if len(_strip_ansi(text)) <= width:
        return text
    compact = [f"{label}:{value}" for label, value in items]
    return _one_line(" ".join(compact), width)


def _pill(label: str, value: Any) -> str:
    value_text = str(value)
    color = "36"
    lowered = value_text.lower()
    if any(term in lowered for term in ("running", "active", "advancing", "ok")):
        color = "32"
    elif any(term in lowered for term in ("paused", "idle", "queued", "planning")):
        color = "33"
    elif any(term in lowered for term in ("failed", "cancelled", "error", "stopped")):
        color = "31"
    return f"{_muted(label)} {_style(value_text, color)}"


def _token_usage_topline(
    usage: dict[str, Any],
    *,
    context_length: int,
    model: str,
    base_url: str,
) -> str:
    calls = _safe_int(usage.get("calls"))
    if calls <= 0:
        return (
            f"{_muted('ctx')} {_style('0', '36')} "
            f"{_muted('out')} {_style('0', '36')} "
            f"{_muted('tok')} {_style('0', '36')} "
            f"{_muted('cost')} {_style('$0.00', '36')}"
        )
    latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
    completion = _safe_int(usage.get("completion_tokens"))
    total = _safe_int(usage.get("total_tokens")) or latest_prompt + completion
    ctx_text = _format_compact_count(latest_prompt)
    if context_length > 0:
        ctx_text = f"{ctx_text}/{_format_compact_count(context_length)}"
    cost_text = _format_usage_cost(usage, model=model, base_url=base_url)
    return (
        f"{_muted('ctx')} {_style(ctx_text, '36')} "
        f"{_muted('out')} {_style(_format_compact_count(completion), '36')} "
        f"{_muted('tok')} {_style(_format_compact_count(total), '36')} "
        f"{_muted('cost')} {_style(cost_text, '36')}"
    )


def _model_cost_is_zero(*, model: str, base_url: str) -> bool:
    lowered_model = model.lower()
    lowered_url = base_url.lower()
    return (
        lowered_model.endswith(":free")
        or lowered_model in {"local-model", "fake", "test"}
        or "localhost" in lowered_url
        or "127.0.0.1" in lowered_url
    )


def _format_usage_cost(usage: dict[str, Any], *, model: str, base_url: str) -> str:
    if bool(usage.get("has_cost")):
        return f"${_safe_float(usage.get('cost')):.4f}"
    if _model_cost_is_zero(model=model, base_url=base_url):
        return "$0.00"
    input_rate = _safe_optional_float(usage.get("input_cost_per_million"))
    output_rate = _safe_optional_float(usage.get("output_cost_per_million"))
    if input_rate is not None and output_rate is not None:
        prompt = _safe_int(usage.get("prompt_tokens"))
        completion = _safe_int(usage.get("completion_tokens"))
        if prompt > 0 or completion > 0:
            estimated = (prompt / 1_000_000 * input_rate) + (completion / 1_000_000 * output_rate)
            return f"~${estimated:.4f}"
    if _safe_int(usage.get("estimated_calls")):
        return "pending"
    return "pending"


def _format_compact_count(value: Any) -> str:
    number = _safe_int(value)
    if number >= 1_000_000_000:
        return f"{number / 1_000_000_000:.1f}B"
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


def _safe_int(value: Any) -> int:
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return 0


def _safe_float(value: Any) -> float:
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def _safe_optional_float(value: Any) -> float | None:
    if value in (None, ""):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _status_dot(state: str) -> str:
    if state in {"advancing", "running", "active"}:
        return _style("●", "32")
    if state in {"paused", "queued", "planning", "idle"}:
        return _style("●", "33")
    if state in {"failed", "cancelled"}:
        return _style("●", "31")
    return _style("●", "36")
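
The top bar above abbreviates token counts with K/M/B suffixes and, when no billed cost is reported, estimates one from per-million-token rates. The arithmetic can be checked in isolation with the sketch below. `compact_count` and `estimated_cost` are hypothetical standalone helpers, and the rates in the usage lines are made-up numbers, not real pricing.

```python
from __future__ import annotations


def compact_count(number: int) -> str:
    """Abbreviate a count to one decimal with a K, M, or B suffix."""
    if number >= 1_000_000_000:
        return f"{number / 1_000_000_000:.1f}B"
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


def estimated_cost(prompt_tokens: int, completion_tokens: int, input_rate: float, output_rate: float) -> str:
    """Estimate cost from per-million-token rates; the '~' marks it as an estimate."""
    estimated = (prompt_tokens / 1_000_000 * input_rate) + (completion_tokens / 1_000_000 * output_rate)
    return f"~${estimated:.4f}"


if __name__ == "__main__":
    print(compact_count(1_500))
    # Assumed example rates: 3.0 per million input tokens, 15.0 per million output tokens.
    print(estimated_cost(200_000, 50_000, 3.0, 15.0))
```

With those assumed rates, 200,000 prompt tokens contribute 0.60 and 50,000 completion tokens contribute 0.75, for an estimate of 1.35.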
nipux_cli/tui_outcomes.py 508 lines
"""Durable outcome summaries for the Nipux terminal UI."""

from __future__ import annotations

import textwrap
from typing import Any

from nipux_cli.tui_event_format import (
    brief_reflection_text,
    chat_agent_message_text,
    event_clock,
    event_hour,
    event_title_body,
    event_tool_args,
    experiment_metric_text,
    generic_display_text,
    shell_write_target,
    short_path,
    tool_live_summary,
)
from nipux_cli.tui_style import _bold, _event_badge, _fit_ansi, _muted, _one_line, _page_indicator, _strip_ansi


CHAT_RIGHT_PAGES = [("updates", "Updates"), ("status", "Jobs")]

DURABLE_OUTCOME_LABELS = {
    "SAVE",
    "FIND",
    "SOURCE",
    "TEST",
    "TASK",
    "ROAD",
    "VALID",
    "LEARN",
    "FILE",
}

SUMMARY_COUNT_LABELS = DURABLE_OUTCOME_LABELS | {"DONE", "FAIL"}
PRIMARY_OUTCOME_LABELS = DURABLE_OUTCOME_LABELS | {"FAIL"}
OUTCOME_LABEL_ORDER = [
    "SAVE",
    "FIND",
    "TEST",
    "FILE",
    "TASK",
    "ROAD",
    "VALID",
    "SOURCE",
    "LEARN",
    "PLAN",
    "UPDATE",
    "FAIL",
    "DONE",
]

OUTCOME_SUMMARY_NAMES = {
    "SAVE": "outputs",
    "FIND": "findings",
    "SOURCE": "sources",
    "TEST": "measurements",
    "TASK": "tasks",
    "ROAD": "roadmap",
    "VALID": "validations",
    "LEARN": "lessons",
    "PLAN": "plans",
    "UPDATE": "updates",
    "FAIL": "blocks",
    "FILE": "files",
    "DONE": "research",
}

SUMMARY_EVENT_TYPES = (
    "agent_message",
    "artifact",
    "error",
    "experiment",
    "finding",
    "lesson",
    "milestone_validation",
    "reflection",
    "roadmap",
    "source",
    "task",
)

SUMMARY_TOOL_EVENT_TYPES = ("tool_result",)


def model_update_event_parts(event: dict[str, Any], *, width: int, compact: bool = True) -> tuple[str, str, str] | None:
    kind = str(event.get("event_type") or "")
    title = generic_display_text(event.get("title") or "")
    body = generic_display_text(event.get("body") or "")
    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
    status = str(metadata.get("status") or "")
    clock = event_clock(event)
    chars = max(24, width - 16)
    if kind == "error":
        return "FAIL", _outcome_text(event_title_body(title, body, fallback="error"), chars=chars, compact=compact), clock
    if kind == "artifact":
        detail = title or body or str(metadata.get("summary") or "") or "saved output"
        return "SAVE", _outcome_text(detail, chars=chars, compact=compact), clock
    if kind == "finding":
        return "FIND", _outcome_text(event_title_body(title, body, fallback="finding"), chars=chars, compact=compact), clock
    if kind == "source":
        return "SOURCE", _outcome_text(event_title_body(title, body, fallback="source"), chars=chars, compact=compact), clock
    if kind == "experiment":
        metric = experiment_metric_text(metadata)
        detail = event_title_body(title, body, fallback="measurement")
        if metric and metric not in detail:
            detail = f"{detail} - {metric}"
        return "TEST", _outcome_text(detail, chars=chars, compact=compact), clock
    if kind == "task":
        task_status = str(metadata.get("status") or "")
        detail = event_title_body(title, body, fallback="task")
        prefix = f"{task_status} " if task_status else ""
        return "TASK", _outcome_text(prefix + detail, chars=chars, compact=compact), clock
    if kind == "roadmap":
        return "ROAD", _outcome_text(event_title_body(title, body, fallback="roadmap"), chars=chars, compact=compact), clock
    if kind == "milestone_validation":
        validation = str(metadata.get("validation_status") or metadata.get("status") or "")
        detail = event_title_body(title, body, fallback="milestone")
        return "VALID", _outcome_text(f"{validation} {detail}".strip(), chars=chars, compact=compact), clock
    if kind == "lesson":
        return "LEARN", _outcome_text(event_title_body(title, body, fallback="lesson"), chars=chars, compact=compact), clock
    if kind == "reflection":
        return "PLAN", _outcome_text(brief_reflection_text(body or title), chars=chars, compact=compact), clock
    if kind == "agent_message" and title.lower() in {"error", "blocked"}:
        detail = body or chat_agent_message_text(title, body) or event_title_body(title, body, fallback="error")
        return "FAIL", _outcome_text(detail, chars=chars, compact=compact), clock
    if kind == "agent_message" and title.lower() in {"progress", "update", "report", "plan", "planning"}:
        durable_progress = _durable_progress_event_parts(metadata, body=body, chars=chars, compact=compact, clock=clock)
        if durable_progress:
            return durable_progress
        detail = chat_agent_message_text(title, body) or event_title_body(title, body, fallback="update")
        return "UPDATE", _outcome_text(detail, chars=chars, compact=compact), clock
    if kind == "tool_result" and status == "completed":
        tool = title
        if tool in {"web_search", "web_extract"}:
            return "DONE", _outcome_text(tool_live_summary(tool, metadata, body), chars=chars, compact=compact), clock
        if tool == "shell_exec":
            command = str(event_tool_args(metadata).get("command") or "")
            target = shell_write_target(command)
            if target:
                return "FILE", _outcome_text(f"updated {short_path(target, max_width=chars - 8)} via shell", chars=chars, compact=compact), clock
        if tool == "write_file":
            output = metadata.get("output") if isinstance(metadata.get("output"), dict) else {}
            path = str(output.get("path") or event_tool_args(metadata).get("path") or "")
            return "FILE", _outcome_text(f"updated {short_path(path, max_width=chars - 8)}", chars=chars, compact=compact), clock
    return None


def is_summary_event_candidate(event: dict[str, Any]) -> bool:
    kind = str(event.get("event_type") or "")
    if kind in SUMMARY_EVENT_TYPES:
        return True
    if kind != "tool_result":
        return False
    metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
    if str(metadata.get("status") or "") != "completed":
        return False
    title = str(event.get("title") or "")
    if title == "write_file":
        return True
    if title == "shell_exec":
        command = str(event_tool_args(metadata).get("command") or "")
        return bool(shell_write_target(command))
    return False


def latest_durable_outcome_line(events: list[dict[str, Any]], *, width: int) -> str:
    fallback: tuple[str, str, str] | None = None
    for event in reversed(events):
        parsed = model_update_event_parts(event, width=width)
        if not parsed:
            continue
        label, text, _clock = parsed
        if label == "DONE":
            fallback = fallback or parsed
            continue
        if label not in PRIMARY_OUTCOME_LABELS:
            continue
        prefix = f"{_muted('Outcome')} {_event_badge(label)} "
        return _fit_ansi(prefix + _one_line(text, max(12, width - len(_strip_ansi(prefix)))), width)
    if fallback:
        label, text, _clock = fallback
        prefix = f"{_muted('Outcome')} {_event_badge(label)} "
        return _fit_ansi(prefix + _one_line(text, max(12, width - len(_strip_ansi(prefix)))), width)
    return ""


def latest_hour_outcome_summary_line(events: list[dict[str, Any]], *, width: int) -> str:
    """Return a single compact count summary for the newest visible activity hour."""

    buckets: dict[str, dict[str, int]] = {}
    order: list[str] = []
    for event in events:
        parsed = model_update_event_parts(event, width=max(width, 180), compact=False)
        if not parsed:
            continue
        label, _text, _clock = parsed
        if label not in SUMMARY_COUNT_LABELS:
            continue
        hour = event_hour(event)
        if hour not in buckets:
            buckets[hour] = {}
            order.append(hour)
        buckets[hour][label] = int(buckets[hour].get(label) or 0) + 1
    if not order:
        return ""
    summary = hourly_outcome_summary(buckets[order[-1]])
    if not summary:
        return ""
    prefix = f"{_muted('Latest hour')} "
    return _fit_ansi(prefix + _bold(_one_line(summary, max(12, width - len(_strip_ansi(prefix))))), width)


def visible_outcome_summary_line(events: list[dict[str, Any]], *, width: int) -> str:
    """Return a stable summary of the durable outcomes available to the pane."""

    counts = outcome_counts(events, include_research=False, include_failures=True)
    summary = hourly_outcome_summary(counts)
    if not summary:
        return ""
    prefix = f"{_muted('Visible')} "
    return _fit_ansi(prefix + _bold(_one_line(summary, max(12, width - len(_strip_ansi(prefix))))), width)


def job_outcome_summary(events: list[dict[str, Any]], *, width: int) -> str:
    """Return a short per-job durable outcome mix for compact job cards."""

    counts = outcome_counts(events, include_research=False, include_failures=False)
    summary = hourly_outcome_summary(counts)
    if not summary:
        return ""
    return _one_line(summary, width)


def outcome_counts(
    events: list[dict[str, Any]],
    *,
    include_research: bool,
    include_failures: bool,
) -> dict[str, int]:
    counts: dict[str, int] = {}
    for event in events:
        parsed = model_update_event_parts(event, width=220, compact=False)
        if not parsed:
            continue
        label, _text, _clock = parsed
        if label == "DONE" and not include_research:
            continue
        if label == "FAIL" and not include_failures:
            continue
        if label not in SUMMARY_COUNT_LABELS:
            continue
        counts[label] = int(counts.get(label) or 0) + 1
    return counts


def recent_model_update_lines(
    events: list[dict[str, Any]],
    *,
    width: int,
    limit: int,
    include_research: bool = False,
    wrap: bool = True,
) -> list[str]:
    """Render recent durable worker outcomes for the compact status pane."""
    if limit <= 0:
        return []
    lines: list[str] = []
    items: list[dict[str, Any]] = []
    index_by_key: dict[tuple[str, str], int] = {}
    for event in reversed(events):
        parsed = model_update_event_parts(event, width=max(width, 180), compact=False)
        if not parsed:
            continue
        label, text, clock = parsed
        if label == "DONE" and not include_research:
            continue
        if label not in PRIMARY_OUTCOME_LABELS and not (include_research and label == "DONE"):
            continue
        key = (label, text)
        if key in index_by_key:
            items[index_by_key[key]]["count"] = int(items[index_by_key[key]].get("count") or 1) + 1
            continue
        index_by_key[key] = len(items)
        items.append({"label": label, "text": text, "clock": clock, "count": 1})
        if len(items) >= max(limit * 2, limit + 8):
            break
    for item in items:
        label = str(item["label"])
        text = str(item["text"])
        clock = str(item["clock"])
        count = int(item.get("count") or 1)
        prefix = f"{_muted(clock)} {_event_badge(label)} " if clock else f"{_event_badge(label)} "
        prefix_width = len(_strip_ansi(prefix))
        available = max(12, width - prefix_width - 2)
        if count > 1:
            text = f"{text} x{count}"
        if not wrap:
            lines.append(_fit_ansi(prefix + _one_line(text, available), width))
            if len(lines) >= limit:
                return lines
            continue
        wrapped = textwrap.wrap(text, width=available) or [""]
        lines.append(_fit_ansi(prefix + wrapped[0], width))
        if len(lines) >= limit:
            return lines
        continuation_prefix = " " * prefix_width
        for part in wrapped[1:]:
            lines.append(_fit_ansi(continuation_prefix + part, width))
            if len(lines) >= limit:
                return lines
        if len(lines) >= limit:
            return lines
    return lines


def chat_updates_pane_lines(
    *,
    job: dict[str, Any],
    events: list[dict[str, Any]],
    width: int,
    rows: int,
) -> list[str]:
    lines = [
        f"{_muted('Page')} {_page_indicator('updates', CHAT_RIGHT_PAGES)}",
        f"{_muted('Focus')} {_bold(_one_line(job.get('title') or 'untitled', width - 8))}",
    ]
    counts = outcome_counts(events, include_research=False, include_failures=True)
    summary = hourly_outcome_summary(counts)
    if summary:
        lines.extend([*_wrapped_label_line("Visible", summary, width=width), ""])
    update_lines = recent_model_update_lines(events, width=width, limit=max(4, rows - len(lines)), wrap=False)
    if update_lines:
        lines.extend(update_lines)
    else:
        lines.extend(["", _muted("No model updates yet.")])
    return [_fit_ansi(line, width) for line in lines[:rows]]


def _wrapped_label_line(label: str, text: str, *, width: int) -> list[str]:
    prefix = f"{_muted(label)} "
    prefix_width = len(_strip_ansi(prefix))
    available = max(12, width - prefix_width)
    wrapped = textwrap.wrap(text, width=available) or [""]
    lines = [_fit_ansi(prefix + _bold(wrapped[0]), width)]
    continuation = " " * prefix_width
    for part in wrapped[1:]:
        lines.append(_fit_ansi(continuation + _bold(part), width))
    return lines


def hourly_update_lines(events: list[dict[str, Any]], *, width: int, limit: int) -> list[str]:
    if limit <= 0:
        return []
    buckets: dict[str, dict[str, Any]] = {}
    order: list[str] = []
    for event in events:
        parsed = model_update_event_parts(event, width=max(width, 220), compact=False)
        if not parsed:
            continue
        label, text, clock = parsed
        if label not in SUMMARY_COUNT_LABELS:
            continue
        hour = event_hour(event)
        if hour not in buckets:
            buckets[hour] = {"counts": {}, "items": [], "clock": clock}
            order.append(hour)
        bucket = buckets[hour]
        counts = bucket["counts"]
        counts[label] = int(counts.get(label) or 0) + 1
        item = (label, text)
        if item not in bucket["items"]:
            bucket["items"].append(item)
    rendered: list[str] = []
    # Each visible hour needs a header and at least a couple durable outcomes.
    # Showing too many buckets makes the pane churn and can trim off the hour
    # label, which is harder to scan during long-running jobs.
331 counts = outcome_counts(events, include_research=False, include_failures=True)
332 summary = hourly_outcome_summary(counts)
333 if summary:
334 lines.extend([*_wrapped_label_line("Visible", summary, width=width), ""])
335 update_lines = recent_model_update_lines(events, width=width, limit=max(4, rows - len(lines)), wrap=False)
336 if update_lines:
337 lines.extend(update_lines)
338 else:
339 lines.extend(["", _muted("No model updates yet.")])
340 return [_fit_ansi(line, width) for line in lines[:rows]]
341
342
343def _wrapped_label_line(label: str, text: str, *, width: int) -> list[str]:
344 prefix = f"{_muted(label)} "
345 prefix_width = len(_strip_ansi(prefix))
346 available = max(12, width - prefix_width)
347 wrapped = textwrap.wrap(text, width=available) or [""]
348 lines = [_fit_ansi(prefix + _bold(wrapped[0]), width)]
349 continuation = " " * prefix_width
350 for part in wrapped[1:]:
351 lines.append(_fit_ansi(continuation + _bold(part), width))
352 return lines
353
354
355def hourly_update_lines(events: list[dict[str, Any]], *, width: int, limit: int) -> list[str]:
356 if limit <= 0:
357 return []
358 buckets: dict[str, dict[str, Any]] = {}
359 order: list[str] = []
360 for event in events:
361 parsed = model_update_event_parts(event, width=max(width, 220), compact=False)
362 if not parsed:
363 continue
364 label, text, clock = parsed
365 if label not in SUMMARY_COUNT_LABELS:
366 continue
367 hour = event_hour(event)
368 if hour not in buckets:
369 buckets[hour] = {"counts": {}, "items": [], "clock": clock}
370 order.append(hour)
371 bucket = buckets[hour]
372 counts = bucket["counts"]
373 counts[label] = int(counts.get(label) or 0) + 1
374 item = (label, text)
375 if item not in bucket["items"]:
376 bucket["items"].append(item)
377 rendered: list[str] = []
378 # Each visible hour needs a header row and at least a couple of durable outcomes.
379 # Showing too many buckets makes the pane churn and can trim off the hour
380 # label, which makes long-running jobs harder to scan.
381 max_visible_hours = max(1, min(len(order), max(1, limit // 4)))
382 recent_hours = order[-max_visible_hours:]
383 available_items = max(1, limit - len(recent_hours))
384 per_bucket = max(1, min(6, available_items // max(1, len(recent_hours))))
385 for hour in recent_hours:
386 bucket = buckets[hour]
387 counts = bucket["counts"]
388 summary = hourly_outcome_summary(counts)
389 rendered.append(_fit_ansi(f"{_muted(hour)} {_bold(summary or 'activity')}", width))
390 primary_items = [item for item in bucket["items"] if item[0] in PRIMARY_OUTCOME_LABELS]
391 visible_items = primary_items or bucket["items"]
392 for label, text in visible_items[-per_bucket:]:
393 prefix = f" {_event_badge(label)} "
394 available = max(16, width - len(_strip_ansi(prefix)))
395 parts = textwrap.wrap(text, width=available) or [""]
396 rendered.append(_fit_ansi(prefix + parts[0], width))
397 for part in parts[1:]:
398 rendered.append(_fit_ansi(" " * len(_strip_ansi(prefix)) + part, width))
399 if len(rendered) >= limit:
400 return rendered[:limit]
401 if len(rendered) >= limit:
402 return rendered[:limit]
403 return rendered[:limit]
404
405
406def hourly_outcome_summary(counts: dict[str, Any]) -> str:
407 pieces: list[str] = []
408 ordered = [label for label in OUTCOME_LABEL_ORDER if label in counts]
409 ordered.extend(sorted(label for label in counts if label not in set(OUTCOME_LABEL_ORDER)))
410 for label in ordered:
411 count = int(counts.get(label) or 0)
412 if count <= 0:
413 continue
414 name = OUTCOME_SUMMARY_NAMES.get(label, label.lower())
415 pieces.append(f"{count} {name}")
416 return " ".join(pieces)
417
418
419def _durable_progress_event_parts(
420 metadata: dict[str, Any],
421 *,
422 body: str,
423 chars: int,
424 compact: bool,
425 clock: str,
426) -> tuple[str, str, str] | None:
427 deltas = _count_map(metadata.get("deltas"))
428 updates = _count_map(metadata.get("updates"))
429 resolutions = _count_map(metadata.get("resolutions"))
430 totals = {
431 key: int(deltas.get(key) or 0) + int(updates.get(key) or 0) + int(resolutions.get(key) or 0)
432 for key in set(deltas) | set(updates) | set(resolutions)
433 }
434 if not any(value > 0 for value in totals.values()):
435 return None
436 key = _dominant_progress_key(totals, resolutions=resolutions)
437 if not key:
438 return None
439 label = _progress_label_for_key(key, resolution=bool(resolutions.get(key)))
440 pieces: list[str] = []
441 for record_key in ("findings", "experiments", "sources", "tasks", "milestones", "lessons"):
442 if deltas.get(record_key):
443 pieces.append(_progress_count_phrase(int(deltas[record_key]), record_key, prefix="+"))
444 if updates.get(record_key):
445 pieces.append(_progress_count_phrase(int(updates[record_key]), record_key, prefix="~", suffix="updated"))
446 if resolutions.get(record_key):
447 pieces.append(_progress_count_phrase(int(resolutions[record_key]), record_key, suffix="resolved"))
448 detail = ", ".join(pieces)
449 if body:
450 detail = f"{detail} - {generic_display_text(body)}"
451 return label, _outcome_text(detail, chars=chars, compact=compact), clock
452
453
454def _count_map(value: Any) -> dict[str, int]:
455 if not isinstance(value, dict):
456 return {}
457 result: dict[str, int] = {}
458 for key, raw_count in value.items():
459 try:
460 count = int(raw_count)
461 except (TypeError, ValueError):
462 continue
463 if count > 0:
464 result[str(key)] = count
465 return result
466
467
468def _dominant_progress_key(totals: dict[str, int], *, resolutions: dict[str, int]) -> str:
469 for key in ("experiments", "milestones", "findings", "sources", "tasks", "lessons"):
470 if resolutions.get(key):
471 return key
472 for key in ("experiments", "findings", "milestones", "sources", "tasks", "lessons"):
473 if totals.get(key):
474 return key
475 return ""
476
477
478def _progress_label_for_key(key: str, *, resolution: bool) -> str:
479 if key == "findings":
480 return "FIND"
481 if key == "sources":
482 return "SOURCE"
483 if key == "experiments":
484 return "TEST"
485 if key == "tasks":
486 return "TASK"
487 if key == "milestones":
488 return "VALID" if resolution else "ROAD"
489 if key == "lessons":
490 return "LEARN"
491 return "UPDATE"
492
493
494def _progress_count_phrase(value: int, key: str, *, prefix: str = "", suffix: str = "") -> str:
495 label = OUTCOME_SUMMARY_NAMES.get(_progress_label_for_key(key, resolution=False), key)
496 if value == 1 and label.endswith("s"):
497 label = label[:-1]
498 parts = [f"{prefix}{value} {label}"]
499 if suffix:
500 parts.append(suffix)
501 return " ".join(parts)
502
503
504def _outcome_text(text: str, *, chars: int, compact: bool) -> str:
505 clean = generic_display_text(text)
506 if compact:
507 return _one_line(clean, chars)
508 return _one_line(clean, 900)
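
The row budget that `hourly_update_lines` computes above can be exercised on its own. What follows is a minimal sketch, not part of the module: `hour_budget` is a hypothetical helper name, and it only reproduces the arithmetic that decides how many hour buckets are shown and how many items each bucket may list.

```python
def hour_budget(hour_count: int, limit: int) -> tuple[int, int]:
    """Return (hours_shown, items_per_hour) for a pane with `limit` rows.

    Hypothetical helper that mirrors the arithmetic in hourly_update_lines.
    """
    # Allow at most one visible hour per four rows, and never fewer than one.
    max_visible_hours = max(1, min(hour_count, max(1, limit // 4)))
    # Slicing the recorded hours can never yield more hours than exist.
    hours_shown = min(max_visible_hours, hour_count)
    # Every shown hour spends one row on its header line.
    available_items = max(1, limit - hours_shown)
    # Cap each bucket at six items so one busy hour cannot fill the pane.
    per_bucket = max(1, min(6, available_items // max(1, hours_shown)))
    return hours_shown, per_bucket


if __name__ == "__main__":
    print(hour_budget(10, 12))
    print(hour_budget(1, 20))
```

With ten recorded hours and a 12-row pane, the sketch shows three hours with three items each. A single hour in a 20-row pane is still capped at six items.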
nipux_cli/tui_status.py 540 lines
1"""Status and work-pane renderers for the Nipux terminal UI."""
2
3from __future__ import annotations
4
5import textwrap
6from typing import Any
7
8from nipux_cli.config import AppConfig
9from nipux_cli.operator_context import active_prompt_operator_entries
10from nipux_cli.scheduling import job_deferred_until, job_provider_blocked
11from nipux_cli.tui_event_format import experiment_metric_text
12from nipux_cli.tui_events import (
13 worker_activity_lines,
14)
15from nipux_cli.tui_outcomes import (
16 CHAT_RIGHT_PAGES,
17 job_outcome_summary,
18 latest_durable_outcome_line,
19 latest_hour_outcome_summary_line,
20 model_update_event_parts,
21 recent_model_update_lines,
22)
23from nipux_cli.tui_layout import _format_compact_count, _metric_strip
24from nipux_cli.tui_style import (
25 _accent,
26 _bold,
27 _event_badge,
28 _fit_ansi,
29 _muted,
30 _one_line,
31 _page_indicator,
32 _status_badge,
33)
34
35
36def worker_label(job: dict[str, Any], daemon_running: bool) -> str:
37 status = str(job.get("status") or "")
38 if job_provider_blocked(job):
39 return "provider wait"
40 if status == "planning":
41 return "waiting"
42 if status in {"paused", "completed", "cancelled", "failed"}:
43 return status
44 if job_deferred_until(job):
45 return "waiting"
46 return "active" if daemon_running and status in {"running", "queued"} else "idle"
47
48
49def job_display_state(job: dict[str, Any], daemon_running: bool) -> str:
50 status = str(job.get("status") or "")
51 if job_provider_blocked(job):
52 return "provider wait"
53 if status in {"running", "queued"}:
54 if job_deferred_until(job):
55 return "waiting"
56 return "advancing" if daemon_running else "open"
57 return status or "unknown"
58
59
60def active_operator_messages(metadata: dict[str, Any]) -> list[dict[str, Any]]:
61 messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
62 return [
63 entry
64 for entry in messages
65 if isinstance(entry, dict)
66 and entry in active_prompt_operator_entries(messages)
67 and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
68 ]
69
70
71def right_pane_lines(
72 *,
73 job: dict[str, Any],
74 jobs: list[dict[str, Any]],
75 job_artifacts: dict[str, list[dict[str, Any]]],
76 job_summary_events: dict[str, list[dict[str, Any]]],
77 job_counts: dict[str, dict[str, Any]],
78 job_id: str,
79 daemon_running: bool,
80 state: str,
81 worker: str,
82 daemon_text: str,
83 model: str,
84 goal_text: str,
85 latest_text: str,
86 metrics: list[tuple[str, Any]],
87 events: list[dict[str, Any]],
88 token_usage: dict[str, Any],
89 context_length: int,
90 width: int,
91 rows: int,
92 right_view: str = "status",
93) -> list[str]:
94 del model, latest_text, daemon_text
95 if _is_workspace_placeholder(job) and not jobs:
96 return _empty_workspace_status_lines(right_view=right_view, width=width, rows=rows)
97 info_lines = _chat_workspace_lines(
98 right_view=right_view,
99 job=job,
100 state=state,
101 worker=worker,
102 goal_text=goal_text,
103 token_usage=token_usage,
104 context_length=context_length,
105 width=width,
106 )
107 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
108 active_operator = active_operator_messages(metadata)
109 pending_measurement = (
110 metadata.get("pending_measurement_obligation")
111 if isinstance(metadata.get("pending_measurement_obligation"), dict)
112 else {}
113 )
114 if active_operator:
115 info_lines.append(f"{_muted('Operator')} {len(active_operator)} active")
116 info_lines.append(f"{_muted('Context')} {_one_line(active_operator[-1].get('message') or '', width - 8)}")
117 if pending_measurement:
118 info_lines.append(f"{_muted('Measure')} pending step #{pending_measurement.get('source_step_no') or '?'}")
119 if job_provider_blocked(job):
120 info_lines.append(_fit_ansi(f"{_muted('Provider')} action needed before retrying model calls", width))
121 defer_line = _defer_status_line(job, width=width)
122 if defer_line:
123 info_lines.append(defer_line)
124 spacious = rows >= 18
125 if spacious:
126 info_lines.append("")
127 info_lines.append(_bold("Now"))
128 latest_hour = latest_hour_outcome_summary_line(events, width=width) if spacious else ""
129 if latest_hour:
130 info_lines.append(latest_hour)
131 latest_outcome = latest_durable_outcome_line(events, width=width)
132 if latest_outcome:
133 info_lines.append(latest_outcome)
134 if spacious and not latest_hour and not latest_outcome:
135 info_lines.append(_muted("No durable outcome yet."))
136 if spacious:
137 info_lines.append("")
138 info_lines.append(_bold("Progress"))
139 info_lines.extend(_metrics_grid_lines(metrics, width=width))
140 yield_line = _yield_line(metrics, width=width)
141 if yield_line:
142 info_lines.append(yield_line)
143 else:
144 info_lines.append(_metric_strip(metrics[:5], width=width))
145 info_lines.append("")
146 info_lines.append(_bold("Jobs"))
147 info_lines.extend(
148 frame_jobs_lines(
149 jobs[:5],
150 focused_job_id=job_id,
151 daemon_running=daemon_running,
152 width=width,
153 job_artifacts=job_artifacts,
154 job_summary_events=job_summary_events,
155 job_counts=job_counts,
156 show_outputs=True,
157 )
158 )
159 info_lines.append("")
160 info_lines.append(_bold("Recent outcomes"))
161 outcome_lines = recent_model_update_lines(events, width=width, limit=max(3, rows - len(info_lines)))
162 if outcome_lines:
163 info_lines.extend(outcome_lines)
164 else:
165 current_outputs = job_artifacts.get(job_id) or []
166 if current_outputs:
167 for artifact in current_outputs[:4]:
168 title = _one_line(str(artifact.get("title") or artifact.get("id") or "output"), max(10, width - 8))
169 info_lines.append(_fit_ansi(f"{_event_badge('SAVE')} {title}", width))
170 else:
171 info_lines.append(_muted("No durable outcomes yet."))
172 return info_lines[:rows]
173
174
175def _is_workspace_placeholder(job: dict[str, Any]) -> bool:
176 return str(job.get("kind") or "") == "workspace"
177
178
179def _empty_workspace_status_lines(*, right_view: str, width: int, rows: int) -> list[str]:
180 lines = [
181 f"{_muted('Page')} {_page_indicator(right_view, CHAT_RIGHT_PAGES)}",
182 _bold("No workers yet"),
183 _muted("Type a goal in chat to start one."),
184 "",
185 f"{_muted('Start')} {_bold('plain English goal')} or {_bold('/new OBJECTIVE')}",
186 f"{_muted('Setup')} {_bold('/settings')}",
187 f"{_muted('Check')} {_bold('/doctor')}",
188 ]
189 return [_fit_ansi(line, width) for line in lines[:rows]]
190
191
192def chat_work_pane_lines(
193 *,
194 job: dict[str, Any],
195 events: list[dict[str, Any]],
196 tasks: list[dict[str, Any]],
197 experiments: list[dict[str, Any]],
198 width: int,
199 rows: int,
200) -> list[str]:
201 lines = [
202 f"{_muted('Page')} {_page_indicator('work', CHAT_RIGHT_PAGES)}",
203 f"{_muted('Focus')} {_bold(_one_line(job.get('title') or 'untitled', width - 8))}",
204 ]
205 done_lines = recent_model_update_lines(events, width=width, limit=max(2, rows // 4))
206 if done_lines:
207 lines.extend(["", _bold("Done")])
208 lines.extend(done_lines)
209 lines.extend([
210 "",
211 _bold("Tool / console"),
212 ])
213 tool_budget = max(3, min(max(4, rows // 3), rows - len(lines) - 5))
214 tool_lines = worker_activity_lines(events, width=width, limit=tool_budget)
215 if tool_lines:
216 lines.extend(tool_lines)
217 else:
218 lines.append(_muted("No recent tool calls."))
219 remaining = max(0, rows - len(lines))
220 if remaining > 4:
221 lines.append("")
222 lines.append(_bold("Tasks"))
223 for task in _rank_visible_tasks(tasks)[: max(1, remaining // 2)]:
224 status = str(task.get("status") or "open")
225 title = _one_line(str(task.get("title") or "task"), max(10, width - 15))
226 lines.append(_fit_ansi(f"{_status_badge(status)} {title}", width))
227 remaining = max(0, rows - len(lines))
228 if remaining > 3 and experiments:
229 lines.append("")
230 lines.append(_bold("Measurements"))
231 for experiment in experiments[-max(1, remaining - 2) :]:
232 metric = experiment_metric_text(experiment)
233 title = _one_line(str(experiment.get("title") or "experiment"), max(10, width - 16))
234 suffix = f" {_muted(metric)}" if metric else ""
235 lines.append(_fit_ansi(f"{_event_badge('TEST')} {title}{suffix}", width))
236 return [_fit_ansi(line, width) for line in lines[:rows]]
237
238
239def chat_settings_pane_lines(
240 *,
241 config: AppConfig,
242 width: int,
243 rows: int,
244) -> list[str]:
245 key_state = "set" if config.model.api_key else "missing"
246 input_cost = _rate_text(config.model.input_cost_per_million)
247 output_cost = _rate_text(config.model.output_cost_per_million)
248 lines = [
249 f"{_muted('Page')} {_page_indicator('settings', CHAT_RIGHT_PAGES)}",
250 _bold("Model"),
251 _setting_line("id", config.model.model, command="/model MODEL", width=width),
252 _setting_line("endpoint", config.model.base_url, command="/base-url URL", width=width),
253 _setting_line("key", f"{key_state} via {config.model.api_key_env}", command="/api-key KEY", width=width),
254 _setting_line("context", str(config.model.context_length), command="/context TOKENS", width=width),
255 "",
256 _bold("Runtime"),
257 _setting_line("home", str(config.runtime.home), command="/home PATH", width=width),
258 _setting_line("step", f"{config.runtime.max_step_seconds}s", command="/step-limit SECONDS", width=width),
259 _setting_line("preview", f"{config.runtime.artifact_inline_char_limit} chars", command="/output-chars CHARS", width=width),
260 "",
261 _bold("Cost"),
262 _setting_line("input", input_cost, command="/input-cost DOLLARS", width=width),
263 _setting_line("output", output_cost, command="/output-cost DOLLARS", width=width),
264 "",
265 _bold("Digest"),
266 _setting_line(
267 "daily",
268 f"{config.runtime.daily_digest_enabled} at {config.runtime.daily_digest_time}",
269 command="/daily-digest true|false",
270 width=width,
271 ),
272 _muted("Type a command in the composer to edit."),
273 ]
274 return [_fit_ansi(line, width) for line in lines[:rows]]
275
276
277def frame_jobs_lines(
278 jobs: list[dict[str, Any]],
279 *,
280 focused_job_id: str,
281 daemon_running: bool,
282 width: int,
283 job_artifacts: dict[str, list[dict[str, Any]]] | None = None,
284 job_summary_events: dict[str, list[dict[str, Any]]] | None = None,
285 job_counts: dict[str, dict[str, Any]] | None = None,
286 show_outputs: bool = False,
287) -> list[str]:
288 rendered = []
289 for index, item in enumerate(jobs[:5], start=1):
290 item_id = str(item.get("id") or "")
291 marker = _accent("●") if item_id == focused_job_id else _muted("○")
292 title_width = max(14, min(30, width - 34))
293 title = _one_line(str(item.get("title") or item.get("id") or "job"), title_width)
294 state = _status_badge(job_display_state(item, daemon_running))
295 worker = _status_badge(worker_label(item, daemon_running))
296 kind = _one_line(item.get("kind") or "", max(0, width - title_width - 33))
297 rendered.append(
298 _fit_ansi(
299 f"{marker} {index:<2} {_fit_ansi(title, title_width)} "
300 f"{_fit_ansi(state, 10)} {_fit_ansi(worker, 10)} {kind}",
301 width,
302 )
303 )
304 if show_outputs:
305 rendered.extend(_job_compact_work_lines(
306 outputs=(job_artifacts or {}).get(item_id) or [],
307 counts=(job_counts or {}).get(item_id) or {},
308 events=(job_summary_events or {}).get(item_id) or [],
309 width=width,
310 focused=item_id == focused_job_id,
311 ))
312 return rendered
313
314
315def _job_compact_work_lines(
316 *,
317 outputs: list[dict[str, Any]],
318 counts: dict[str, Any],
319 events: list[dict[str, Any]],
320 width: int,
321 focused: bool = False,
322) -> list[str]:
323 lines: list[str] = []
324 summary = job_outcome_summary(events, width=max(12, width - 13))
325 if summary:
326 lines.append(_fit_ansi(f" {_muted('work')} {_bold(summary)}", width))
327 if outputs:
328 latest = outputs[0]
329 output_total = int(counts.get("artifacts") or len(outputs))
330 output_count = f"{output_total} outputs" if output_total != 1 else "1 output"
331 title_budget = max(12, width - 13 - len(output_count))
332 output_title = _one_line(str(latest.get("title") or latest.get("id") or "saved output"), title_budget)
333 lines.append(_fit_ansi(f" {_muted('made')} {_bold(output_count)} · {output_title}", width))
334 if focused and len(outputs) > 1 and width >= 42:
335 second = outputs[1]
336 second_title = _one_line(str(second.get("title") or second.get("id") or "saved output"), max(12, width - 10))
337 lines.append(_fit_ansi(f" {_muted('also')} {second_title}", width))
338 for outcome in _job_recent_non_output_pieces(
339 events,
340 width=max(12, width - 10),
341 skip_save=bool(outputs),
342 limit=2 if focused else 1,
343 ):
344 lines.append(_fit_ansi(f" {_muted('did')} {outcome}", width))
345 return lines
346
347
348def _job_recent_non_output_pieces(
349 events: list[dict[str, Any]],
350 *,
351 width: int,
352 skip_save: bool,
353 limit: int,
354) -> list[str]:
355 pieces: list[str] = []
356 seen: set[str] = set()
357 for event in reversed(events):
358 parsed = model_update_event_parts(event, width=max(width, 120))
359 if not parsed:
360 continue
361 label, text, _clock = parsed
362 if label == "DONE":
363 continue
364 if skip_save and label == "SAVE":
365 continue
366 prefix = _compact_outcome_label(label)
367 piece = f"{_muted(prefix)} {_one_line(text, max(12, width - len(prefix) - 1))}"
368 dedupe_key = _one_line(f"{prefix} {text}", 120)
369 if dedupe_key in seen:
370 continue
371 seen.add(dedupe_key)
372 pieces.append(piece)
373 if len(pieces) >= max(1, limit):
374 break
375 return pieces
376
377
378def _compact_outcome_label(label: str) -> str:
379 return {
380 "FIND": "find",
381 "SOURCE": "src",
382 "TEST": "test",
383 "TASK": "task",
384 "ROAD": "road",
385 "VALID": "valid",
386 "LEARN": "learn",
387 "FILE": "file",
388 "SAVE": "out",
389 "FAIL": "fail",
390 "PLAN": "plan",
391 "UPDATE": "note",
392 }.get(label, label.lower())
393
394
395def _defer_status_line(job: dict[str, Any], *, width: int) -> str:
396 until = job_deferred_until(job)
397 if not until:
398 return ""
399 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
400 reason = str(metadata.get("defer_reason") or metadata.get("defer_next_action") or "").strip()
401 time_text = until.astimezone().strftime("%b %d %H:%M")
402 detail = f"next check {time_text}"
403 if reason:
404 detail += f" - {reason}"
405 return _fit_ansi(f"{_muted('Wait')} {_one_line(detail, max(12, width - 7))}", width)
406
407
408def _rank_visible_tasks(tasks: list[dict[str, Any]]) -> list[dict[str, Any]]:
409 status_order = {"active": 0, "open": 1, "blocked": 2, "validating": 3, "done": 4, "skipped": 5}
410 return sorted(
411 [task for task in tasks if isinstance(task, dict)],
412 key=lambda task: (
413 status_order.get(str(task.get("status") or "open"), 9),
414 -int(task.get("priority") or 0),
415 str(task.get("title") or ""),
416 ),
417 )
418
419
420def _chat_workspace_lines(
421 *,
422 right_view: str,
423 job: dict[str, Any],
424 state: str,
425 worker: str,
426 goal_text: str,
427 token_usage: dict[str, Any],
428 context_length: int,
429 width: int,
430) -> list[str]:
431 goal_lines = textwrap.wrap(goal_text, width=max(20, width - 8))[:2] or [""]
432 while len(goal_lines) < 2:
433 goal_lines.append("")
434 title = _one_line(str(job.get("title") or "untitled"), max(10, width))
435 lines = [
436 f"{_muted('Page')} {_page_indicator(right_view, CHAT_RIGHT_PAGES)}",
437 _bold(title),
438 f"{_muted('State')} {_status_badge(state)} {_muted('worker')} {_status_badge(worker)}",
439 f"{_muted('Goal')} {goal_lines[0]}",
440 f"{_muted(' ')}{goal_lines[1]}",
441 ]
442 task_line = _current_task_line(job, width=width)
443 if task_line:
444 lines.append(task_line)
445 context_line = _context_pressure_line(token_usage, context_length=context_length, width=width)
446 if context_line:
447 lines.append(context_line)
448 return lines
449
450
451def _current_task_line(job: dict[str, Any], *, width: int) -> str:
452 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
453 tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
454 visible = [
455 task
456 for task in tasks
457 if isinstance(task, dict)
458 and str(task.get("status") or "open") in {"active", "open", "blocked"}
459 ]
460 if not visible:
461 return ""
462 ranked = _rank_visible_tasks(visible)
463 task = ranked[0]
464 status = str(task.get("status") or "open")
465 title = _one_line(str(task.get("title") or "task"), max(12, width - 16))
466 return _fit_ansi(f"{_muted('Task')} {_status_badge(status)} {title}", width)
467
468
469def _context_pressure_line(usage: dict[str, Any], *, context_length: int, width: int) -> str:
470 latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
471 context_limit = _safe_int(usage.get("latest_context_length")) or context_length
472 if latest_prompt <= 0 or context_limit <= 0:
473 return ""
474 fraction = latest_prompt / max(1, context_limit)
475 if fraction < 0.65:
476 return ""
477 label = "high" if fraction >= 0.85 else "watch"
478 detail = (
479 f"{_format_compact_count(latest_prompt)}/{_format_compact_count(context_limit)} "
480 f"{fraction:.0%} {label}"
481 )
482 return _fit_ansi(f"{_muted('Context')} {_one_line(detail, max(12, width - 8))}", width)
483
484
485def _metrics_grid_lines(metrics: list[tuple[str, Any]], *, width: int) -> list[str]:
486 wanted = ["actions", "outputs", "findings", "sources", "tasks", "experiments", "memory"]
487 lookup = {label: value for label, value in metrics}
488 items = [(label, lookup[label]) for label in wanted if label in lookup]
489 if width < 40:
490 return [_metric_strip(items, width=width)]
491 lines: list[str] = []
492 col_width = max(16, (width - 2) // 2)
493 for index in range(0, len(items), 2):
494 left = _metric_cell(items[index], width=col_width)
495 right = _metric_cell(items[index + 1], width=col_width) if index + 1 < len(items) else ""
496 lines.append(_fit_ansi(left + " " + right, width))
497 return lines
498
499
500def _metric_cell(item: tuple[str, Any], *, width: int) -> str:
501 label, value = item
502 return _fit_ansi(f"{_muted(label)} {_bold(value)}", width)
503
504
505def _yield_line(metrics: list[tuple[str, Any]], *, width: int) -> str:
506 lookup = {label: value for label, value in metrics}
507 actions = _safe_int(lookup.get("actions"))
508 if actions < 20:
509 return ""
510 outputs = _safe_int(lookup.get("outputs"))
511 findings = _safe_int(lookup.get("findings"))
512 sources = _safe_int(lookup.get("sources"))
513 experiments = _safe_int(lookup.get("experiments"))
514 durable = outputs + findings + sources + experiments
515 if durable <= 0:
516 return _fit_ansi(f"{_muted('Yield')} {_status_badge('blocked')} no durable outcomes after {actions} actions", width)
517 actions_per = actions / durable
518 if actions_per < 8:
519 return ""
520 label = "watch" if actions_per >= 25 else "ok"
521 detail = f"{actions_per:.1f} actions/outcome"
522 return _fit_ansi(f"{_muted('Yield')} {_status_badge(label)} {detail}", width)
523
524
525def _setting_line(label: str, value: str, *, command: str, width: int) -> str:
526 left = f"{_muted(label)} {_bold(_one_line(value, max(8, width - 24)))}"
527 if width < 46:
528 return _fit_ansi(left, width)
529 return _fit_ansi(left + " " + _muted(command), width)
530
531
532def _rate_text(value: float | None) -> str:
533 return "provider-reported" if value is None else f"${value:g}/1M"
534
535
536def _safe_int(value: Any) -> int:
537 try:
538 return int(float(value))
539 except (TypeError, ValueError):
540 return 0
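
The thresholds used by `_context_pressure_line` above can be checked without the rendering helpers. This is a minimal sketch under stated assumptions: `pressure_label` is a hypothetical name that does not exist in the module, and it returns only the label text rather than a formatted pane line.

```python
def pressure_label(prompt_tokens: int, context_limit: int) -> str:
    """Hypothetical helper: classify prompt size against the context window."""
    if prompt_tokens <= 0 or context_limit <= 0:
        # Without both numbers there is nothing meaningful to report.
        return ""
    fraction = prompt_tokens / context_limit
    if fraction < 0.65:
        # Below the watch threshold the status pane stays quiet.
        return ""
    return "high" if fraction >= 0.85 else "watch"


if __name__ == "__main__":
    for tokens in (1_000, 70_000, 90_000):
        print(tokens, repr(pressure_label(tokens, 100_000)))
```

Because anything under 65% returns early, only two labels can ever be shown: `watch` from 65% up to 85% and `high` from 85% upward.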
nipux_cli/tui_style.py 153 lines
1"""Small terminal styling helpers shared by the CLI frame renderers."""
2
3from __future__ import annotations
4
5import os
6import re
7import sys
8from typing import Any
9
10
11def _fancy_ui() -> bool:
12 return (
13 sys.stdout.isatty()
14 and os.environ.get("NO_COLOR") is None
15 and os.environ.get("NIPUX_PLAIN") is None
16 and os.environ.get("TERM", "") not in {"", "dumb"}
17 )
18
19
20def _style(text: Any, code: str) -> str:
21 value = str(text)
22 if not _fancy_ui():
23 return value
24 return f"\033[{code}m{value}\033[0m"
25
26
27def _accent(text: Any) -> str:
28 return _style(text, "38;5;123")
29
30
31def _muted(text: Any) -> str:
32 return _style(text, "38;5;248")
33
34
35def _bold(text: Any) -> str:
36 return _style(text, "1")
37
38
39def _one_line(value: Any, width: int) -> str:
40 text = " ".join(str(value).split())
41 if len(text) <= width:
42 return text
43 return text[: max(0, width - 3)] + "..."[: max(0, width)]
44
45
46def _strip_ansi(text: str) -> str:
47 return re.sub(r"\x1b\[[0-9;]*m", "", text)
48
49
50def _fit_ansi(text: Any, width: int) -> str:
51 width = max(0, int(width))
52 content = str(text)
53 visible = _strip_ansi(content)
54 if len(visible) > width:
55 content = _one_line(visible, width)
56 visible = content
57 return content + " " * max(0, width - len(visible))
58
59
60def _center_ansi(text: str, width: int) -> str:
61 text_width = len(_strip_ansi(text))
62 if text_width >= width:
63 return _fit_ansi(text, width)
64 left_pad = max(0, (width - text_width) // 2)
65 return _fit_ansi(" " * left_pad + text, width)
66
67
68def _themed_lines(lines: list[str], *, width: int) -> list[str]:
69 if not _fancy_ui():
70 return [_fit_ansi(line, width) for line in lines]
71 bg = "\033[48;5;234m\033[38;5;252m"
72 reset = "\033[0m"
73 return [bg + _fit_ansi(line, width).replace(reset, reset + bg) + reset for line in lines]
74
75
76def _frame_enter_sequence() -> str:
77 theme = "\033[48;5;234m\033[38;5;252m" if _fancy_ui() else ""
78 return f"\033[?1049h{theme}\033[2J\033[H\033[?25l\033[?1000h\033[?1002h\033[?1006h"
79
80
81def _frame_exit_sequence() -> str:
82 return "\033[?1006l\033[?1002l\033[?1000l\033[?25h\033[0m\033[?1049l"
83
84
85def _page_indicator(active: str, pages: list[tuple[str, str]]) -> str:
86 parts: list[str] = []
87 for key, label in pages:
88 if key == active:
89 parts.append(f"{_accent('●')} {_bold(label)}")
90 else:
91 parts.append(f"{_muted('○')} {_muted(label)}")
92 return " ".join(parts)
93
94
95def _status_badge(value: Any) -> str:
96 text = str(value)
97 color = {
98 "active": "32",
99 "advancing": "32",
100 "running": "32",
101 "queued": "33",
102 "planning": "35",
103 "waiting": "35",
104 "open": "33",
105 "idle": "33",
106 "paused": "33",
107 "cancelled": "31",
108 "failed": "31",
109 "completed": "36",
110 "ok": "32",
111 "watch": "33",
112 "ready": "32",
113 "switch": "36",
114 "missing": "31",
115 "check": "33",
116 "next": "35",
117 "recommended": "36",
118 }.get(text, "37")
119 return _style(text, color)
120
121
122def _event_badge(label: str) -> str:
123 padded = f"{label:<8}"
124 color = {
125 "AGENT": "36",
126 "USER": "35",
127 "FOLLOW": "35",
128 "YOU": "35",
129 "NIPUX": "36",
130 "RUN": "34",
131 "TOOL": "34",
132 "DONE": "32",
133 "FILE": "32",
134 "SAVE": "32",
135 "OUTPUT": "32",
136 "FIND": "32",
137 "SOURCE": "36",
138 "TASK": "33",
139 "ROAD": "35",
140 "VALID": "35",
141 "TEST": "33",
142 "UPDATE": "36",
143 "ACK": "36",
144 "FAIL": "31",
145 "LEARN": "36",
146 "PLAN": "36",
147 "DIGEST": "36",
148 "MEMORY": "36",
149 "SYSTEM": "2",
150 "BLOCK": "33",
151 "ERROR": "31",
152 }.get(label, "37")
153 return _style(padded, color)
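
The styling helpers above measure width on the text with ANSI sequences removed, so colored strings pad to the same visible width as plain ones. The sketch below illustrates that idea for widths of three or more columns. `strip_ansi` and `fit_visible` are hypothetical stand-ins written for this example, not the module's own functions.

```python
import re

# Matches SGR sequences such as "\x1b[1m" or "\x1b[38;5;123m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove color/style sequences so len() reflects visible width."""
    return ANSI_SGR.sub("", text)


def fit_visible(text: str, width: int) -> str:
    """Pad or truncate `text` to `width` visible columns."""
    width = max(0, width)
    visible = strip_ansi(text)
    if len(visible) > width:
        # Truncation works on the unstyled text, so styling is dropped.
        collapsed = " ".join(visible.split())
        if len(collapsed) > width:
            collapsed = collapsed[: max(0, width - 3)] + "..."
        return collapsed + " " * max(0, width - len(collapsed))
    # Short text keeps its styling and is padded by its visible length.
    return text + " " * (width - len(visible))


if __name__ == "__main__":
    print(repr(fit_visible("\x1b[1mhi\x1b[0m", 5)))
    print(repr(fit_visible("\x1b[1mhello world\x1b[0m", 8)))
```

One consequence of this design is that an over-long styled string comes back unstyled. That keeps truncation simple, because no escape sequence can be cut in half.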
nipux_cli/uninstall.py 217 lines
1"""Uninstall helpers for local Nipux runtime state."""
2
3from __future__ import annotations
4
5import os
6import shutil
7import subprocess
8from dataclasses import dataclass
9from pathlib import Path
10from typing import Callable, Sequence
11
12from nipux_cli.config import get_agent_home
13from nipux_cli.service_install import launch_agent_path, systemd_service_path
14
15
16Runner = Callable[..., subprocess.CompletedProcess[str]]
17CommandRunner = Callable[[Sequence[str]], subprocess.CompletedProcess[str]]
18
19
20@dataclass(frozen=True)
21class UninstallPlan:
22 paths: tuple[Path, ...]
23 service_paths: tuple[Path, ...]
24
25
26def build_uninstall_plan(*, runtime_home: Path | None = None, include_legacy: bool = True) -> UninstallPlan:
27 """Return all local runtime paths that a full uninstall should remove."""
28
29 homes = [runtime_home.expanduser() if runtime_home else get_agent_home(), get_agent_home(), Path.home() / ".nipux"]
30 if include_legacy:
31 homes.append(Path.home() / ".kneepucks")
32 paths = tuple(_dedupe_paths(homes))
33 service_paths = tuple(_dedupe_paths([launch_agent_path(), systemd_service_path()]))
34 return UninstallPlan(paths=paths, service_paths=service_paths)
35
36
37def uninstall_runtime(
38 *,
39 runtime_home: Path | None = None,
40 dry_run: bool = False,
41 include_legacy: bool = True,
42 runner: Runner = subprocess.run,
43) -> list[str]:
44 """Remove local Nipux state, logs, service files, and legacy state dirs."""
45
46 plan = build_uninstall_plan(runtime_home=runtime_home, include_legacy=include_legacy)
47 lines: list[str] = []
48 lines.extend(_disable_services(dry_run=dry_run, runner=runner))
49 for path in (*plan.service_paths, *plan.paths):
50 target = path.expanduser()
51 _assert_safe_delete_target(target)
52 if dry_run:
53 lines.append(f"would remove {target}")
54 continue
55 if target.is_dir() and not target.is_symlink():
56 shutil.rmtree(target)
57 lines.append(f"removed {target}")
58 elif target.exists() or target.is_symlink():
59 target.unlink()
60 lines.append(f"removed {target}")
61 else:
62 lines.append(f"not found {target}")
63 return lines
64
65
66def uninstall_installed_tool(
67 *,
68 dry_run: bool = False,
69 runner: CommandRunner | None = None,
70) -> tuple[int, list[str]]:
71 """Remove the installed `nipux` command from common uv-tool locations."""
72
73 uv = shutil.which("uv")
74 run = runner or _run_command
75 lines: list[str] = []
76 if dry_run:
77 lines.append("would run uv tool uninstall nipux")
78 for path in installed_tool_paths():
79 lines.append(f"would remove installed command path {path}")
80 return 0, lines
81 if uv:
82 result = run([uv, "tool", "uninstall", "nipux"])
83 lines.extend(_process_lines(result))
84 if result.returncode == 0:
85 if not lines:
86 lines.append("removed installed nipux command")
87 return 0, lines
88 lines.append("uv tool uninstall failed; checking safe local tool paths")
89 else:
90 lines.append("uv not found; checking safe local tool paths")
91
92 removed = False
93 errors = 0
94 for path in installed_tool_paths():
95 try:
96 if path.is_dir() and not path.is_symlink():
97 shutil.rmtree(path)
98 lines.append(f"removed {path}")
99 removed = True
100 elif path.exists() or path.is_symlink():
101 path.unlink()
102 lines.append(f"removed {path}")
103 removed = True
104 except OSError as exc:
105 lines.append(f"failed to remove {path}: {exc}")
106 errors += 1
107 if removed and not errors:
108 return 0, lines
109 if not removed:
110 lines.append("installed nipux command not found")
111 return (1 if errors else 0), lines
112
113
114def installed_tool_paths() -> tuple[Path, ...]:
115 """Return safe user-level paths for uv-tool Nipux installs."""
116
117 home = Path.home().expanduser().resolve(strict=False)
118 candidates = [
119 home / ".local" / "bin" / "nipux",
120 home / ".local" / "share" / "uv" / "tools" / "nipux",
121 ]
122 current = shutil.which("nipux")
123 if current:
124 candidates.append(Path(current))
125 safe: list[Path] = []
126 for path in _dedupe_paths(candidates):
127 expanded = path.expanduser()
128 if _is_safe_installed_tool_path(expanded, home=home):
129 safe.append(expanded)
130 return tuple(safe)
131
132
133def _disable_services(*, dry_run: bool, runner: Runner) -> list[str]:
134 lines: list[str] = []
135 launch_path = launch_agent_path()
136 label = "gui/" + str(os.getuid()) + "/com.nipux.agent"
137 launchctl = shutil.which("launchctl")
138 if dry_run:
139 lines.append(f"would unload launchd {label}")
140 elif launchctl:
141 runner([launchctl, "bootout", label], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
142 lines.append(f"unloaded launchd {label}")
143 else:
144 lines.append("launchd unavailable")
145
146 systemctl = shutil.which("systemctl")
147 service_path = systemd_service_path()
148 if systemctl and service_path.exists():
149 if dry_run:
150 lines.append("would disable systemd user service nipux.service")
151 else:
152 runner(
153 [systemctl, "--user", "disable", "--now", "nipux.service"],
154 check=False,
155 stdout=subprocess.DEVNULL,
156 stderr=subprocess.DEVNULL,
157 )
158 runner([systemctl, "--user", "daemon-reload"], check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
159 lines.append("disabled systemd user service nipux.service")
160 elif service_path.exists():
161 lines.append("systemd unavailable; removing service file only")
162
163 if not launch_path.exists() and not service_path.exists():
164 lines.append("no installed service files found")
165 return lines
166
167
168def _run_command(command: Sequence[str]) -> subprocess.CompletedProcess[str]:
169 return subprocess.run(
170 list(command),
171 text=True,
172 stdout=subprocess.PIPE,
173 stderr=subprocess.STDOUT,
174 check=False,
175 )
176
177
178def _process_lines(process: subprocess.CompletedProcess[str]) -> list[str]:
179 output = process.stdout if isinstance(process.stdout, str) else ""
180 stderr = process.stderr if isinstance(process.stderr, str) else ""
181 return [line.rstrip() for line in f"{output}\n{stderr}".splitlines() if line.strip()]
182
183
184def _dedupe_paths(paths: list[Path]) -> list[Path]:
185 seen: set[str] = set()
186 result: list[Path] = []
187 for path in paths:
188 key = str(path.expanduser())
189 if key in seen:
190 continue
191 seen.add(key)
192 result.append(path)
193 return result
194
195
196def _assert_safe_delete_target(path: Path) -> None:
197 resolved = path.expanduser().resolve(strict=False)
198 home = Path.home().resolve(strict=False)
199 forbidden = {Path("/").resolve(strict=False), home}
200 if resolved in forbidden:
201 raise ValueError(f"refusing to remove unsafe path: {path}")
202 if len(resolved.parts) < 3:
203 raise ValueError(f"refusing to remove broad path: {path}")
204
205
206def _is_safe_installed_tool_path(path: Path, *, home: Path) -> bool:
207 expanded = path.expanduser()
208 resolved = expanded.resolve(strict=False)
209 user_bin = home / ".local" / "bin" / "nipux"
210 uv_tool_root = home / ".local" / "share" / "uv" / "tools" / "nipux"
211 return (
212 expanded == user_bin
213 or resolved == user_bin
214 or expanded == uv_tool_root
215 or resolved == uv_tool_root
216 or uv_tool_root in resolved.parents
217 )
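The delete guard in `_assert_safe_delete_target` can be exercised in isolation. The following is a minimal sketch, not the module's own code: `is_safe_delete_target` and its `home` parameter are hypothetical names introduced here so the check can be tried against made-up paths instead of the real home directory.

```python
from pathlib import Path


def is_safe_delete_target(path: Path, *, home: Path) -> bool:
    """Return True only for paths narrow enough to remove recursively.

    Hypothetical helper: mirrors the two refusals in the guard above
    (filesystem root or home itself, and paths with fewer than three parts).
    """
    resolved = path.expanduser().resolve(strict=False)
    home_resolved = home.expanduser().resolve(strict=False)
    root = Path(resolved.anchor)
    if resolved in {root, home_resolved}:
        return False
    # Fewer than three parts means a top-level directory such as /usr.
    return len(resolved.parts) >= 3


if __name__ == "__main__":
    example_home = Path("/nonexistent-home/example")  # assumed, illustrative only
    for candidate in (Path("/"), example_home, example_home / ".nipux"):
        print(candidate, is_safe_delete_target(candidate, home=example_home))
```

Returning a boolean rather than raising keeps the sketch easy to test; the module itself raises `ValueError` so that an unsafe target aborts the uninstall.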
nipux_cli/updater.py 179 lines
1"""Self-update helpers for source checkouts and installed tools."""
2
3from __future__ import annotations
4
5import os
6import shutil
7import subprocess
8from collections.abc import Callable, Sequence
9from pathlib import Path
10
11
12GitRunner = Callable[[Sequence[str], Path], subprocess.CompletedProcess[str]]
13CommandRunner = Callable[[Sequence[str]], subprocess.CompletedProcess[str]]
14
15DEFAULT_UPDATE_REPO = "https://github.com/nipuxx/agent-cli.git"
16DEFAULT_UPDATE_REF = "main"
17
18
19def find_checkout_root(start: str | Path | None = None) -> Path | None:
20 """Return the nearest enclosing git checkout for the Nipux install."""
21
22 current = Path(start).expanduser().resolve() if start else Path(__file__).resolve()
23 if current.is_file():
24 current = current.parent
25 for candidate in (current, *current.parents):
26 if (candidate / ".git").exists():
27 return candidate
28 return None
29
30
31def update_checkout(
32 *,
33 path: str | Path | None = None,
34 allow_dirty: bool = False,
35 runner: GitRunner | None = None,
36 command_runner: CommandRunner | None = None,
37) -> tuple[int, list[str]]:
38 """Update the current Nipux install and return output lines.
39
40 Source checkouts are fast-forwarded with git. Installed tools are refreshed
41 from the configured source repository so `nipux update` works from anywhere.
42 """
43
44 root = Path(path).expanduser().resolve() if path else find_checkout_root()
45 if not root or not (root / ".git").exists():
46 prefix = []
47 if path:
48 prefix.append(f"{_short_path(root)} is not a source checkout; updating the installed Nipux tool instead.")
49 code, lines = _update_uv_tool_install(runner=command_runner)
50 return code, [*prefix, *lines]
51 run = runner or _run_git
52 top_level = run(["git", "rev-parse", "--show-toplevel"], root)
53 if top_level.returncode != 0:
54 return top_level.returncode, ["Cannot update: git could not identify the checkout.", *_process_lines(top_level)]
55 checkout = Path(top_level.stdout.strip() or root).expanduser().resolve()
56 before = _git_text(run(["git", "rev-parse", "--short", "HEAD"], checkout), fallback="unknown")
57 branch = _git_text(run(["git", "branch", "--show-current"], checkout), fallback="detached")
58 dirty = run(["git", "status", "--porcelain"], checkout)
59 if dirty.returncode != 0:
60 return dirty.returncode, ["Cannot update: git status failed.", *_process_lines(dirty)]
61 if dirty.stdout.strip() and not allow_dirty:
62 return (
63 1,
64 [
65 f"Cannot update: local changes exist in {_short_path(checkout)}.",
66 "Commit or stash them first, then run `nipux update` again.",
67 ],
68 )
69 lines = [f"Updating Nipux in {_short_path(checkout)}", f"Current: {branch} @ {before}"]
70 pulled = run(["git", "pull", "--ff-only"], checkout)
71 lines.extend(_process_lines(pulled))
72 if pulled.returncode != 0:
73 return pulled.returncode, ["Update failed.", *lines]
74 after = _git_text(run(["git", "rev-parse", "--short", "HEAD"], checkout), fallback=before)
75 if after == before:
76 lines.append("Nipux is already up to date.")
77 else:
78 lines.append(f"Updated Nipux: {before} -> {after}.")
79 lines.append("Update complete.")
80 return 0, lines
81
82
83def _update_uv_tool_install(*, runner: CommandRunner | None = None) -> tuple[int, list[str]]:
84 uv = shutil.which("uv")
85 if not uv:
86 return (
87 1,
88 [
89 "Cannot update automatically because `uv` was not found.",
90 "Install uv, then run `nipux update` again.",
91 ],
92 )
93 run = runner or _run_command
94 spec = _uv_tool_update_spec()
95 lines = [
96 "Updating installed Nipux command.",
97 f"Source: {spec}",
98 ]
99 current = shutil.which("nipux")
100 if current:
101 lines.append(f"Command: {current}")
102 updated = run([uv, "tool", "install", "--force", "--upgrade", "--reinstall", "--refresh", spec])
103 lines.extend(_process_lines(updated))
104 if updated.returncode != 0:
105 return updated.returncode, ["Update failed.", *lines]
106 lines.append("Nipux command refreshed from source.")
107 verified = _verify_updated_command(runner=run)
108 if verified:
109 lines.append(verified)
110 lines.append("Update complete.")
111 return 0, lines
112
113
114def _verify_updated_command(*, runner: CommandRunner) -> str:
115 nipux = shutil.which("nipux")
116 if not nipux:
117 return ""
118 checked = runner([nipux, "--version"])
119 version_line = " ".join(_process_lines(checked)).strip()
120 if checked.returncode != 0 or not version_line:
121 return ""
122 return f"Verified: {version_line}"
123
124
125def _uv_tool_update_spec() -> str:
126 """Return the direct source uv should use for installed-tool updates."""
127
128 explicit = os.environ.get("NIPUX_UPDATE_SPEC", "").strip()
129 if explicit:
130 return explicit
131 repo = os.environ.get("NIPUX_REPO_URL", DEFAULT_UPDATE_REPO).strip() or DEFAULT_UPDATE_REPO
132 ref = os.environ.get("NIPUX_REF", DEFAULT_UPDATE_REF).strip() or DEFAULT_UPDATE_REF
133 if repo.startswith(("git+", "http://", "https://", "ssh://", "file://")):
134 prefix = repo if repo.startswith("git+") else f"git+{repo}"
135 return f"{prefix}@{ref}"
136 return f"git+{repo}@{ref}"
137
138
139def _run_git(command: Sequence[str], cwd: Path) -> subprocess.CompletedProcess[str]:
140 return subprocess.run(
141 list(command),
142 cwd=cwd,
143 text=True,
144 stdout=subprocess.PIPE,
145 stderr=subprocess.STDOUT,
146 check=False,
147 )
148
149
150def _run_command(command: Sequence[str]) -> subprocess.CompletedProcess[str]:
151 return subprocess.run(
152 list(command),
153 text=True,
154 stdout=subprocess.PIPE,
155 stderr=subprocess.STDOUT,
156 check=False,
157 )
158
159
160def _process_lines(process: subprocess.CompletedProcess[str]) -> list[str]:
161 output = process.stdout if isinstance(process.stdout, str) else ""
162 return [line.rstrip() for line in output.splitlines() if line.strip()]
163
164
165def _git_text(process: subprocess.CompletedProcess[str], *, fallback: str) -> str:
166 if process.returncode != 0:
167 return fallback
168 value = process.stdout.strip() if isinstance(process.stdout, str) else ""
169 return value or fallback
170
171
172def _short_path(path: Path | str, *, max_width: int = 96) -> str:
173 text = str(path)
174 home = str(Path.home())
175 if text.startswith(home + os.sep):
176 text = "~" + text[len(home) :]
177 if len(text) <= max_width:
178 return text
179 return "..." + text[-max(12, max_width - 4) :]
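The way `_uv_tool_update_spec` resolves its source can be sketched as a pure function. This is an illustrative version under stated assumptions: `build_update_spec` and its `env` parameter are hypothetical, added so the precedence of `NIPUX_UPDATE_SPEC`, `NIPUX_REPO_URL`, and `NIPUX_REF` can be checked without touching the real environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_REPO = "https://github.com/nipuxx/agent-cli.git"
DEFAULT_REF = "main"


def build_update_spec(env: Mapping[str, str] | None = None) -> str:
    """Return the source spec for an installed-tool update (hypothetical helper)."""
    source = os.environ if env is None else env
    # An explicit spec wins over everything else.
    explicit = source.get("NIPUX_UPDATE_SPEC", "").strip()
    if explicit:
        return explicit
    repo = source.get("NIPUX_REPO_URL", "").strip() or DEFAULT_REPO
    ref = source.get("NIPUX_REF", "").strip() or DEFAULT_REF
    # Add the git+ prefix once; never double it.
    prefix = repo if repo.startswith("git+") else f"git+{repo}"
    return f"{prefix}@{ref}"


if __name__ == "__main__":
    print(build_update_spec({}))
    print(build_update_spec({"NIPUX_REF": "dev"}))
```

Both branches of the original function produce `git+<repo>@<ref>` for a repo without the `git+` prefix, so the sketch folds them into a single prefix check.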
nipux_cli/updates.py 132 lines
1"""Readable durable progress reports for jobs."""
2
3from __future__ import annotations
4
5import shlex
6from typing import Any
7
8from nipux_cli.config import AppConfig
9from nipux_cli.daemon import daemon_lock_status
10from nipux_cli.db import AgentDB
11from nipux_cli.tui_outcomes import hourly_update_lines, recent_model_update_lines
12from nipux_cli.tui_status import job_display_state
13from nipux_cli.tui_style import _one_line
14
15
16def render_updates_report(
17 db: AgentDB,
18 config: AppConfig,
19 job_id: str,
20 *,
21 limit: int = 5,
22 chars: int = 180,
23 paths: bool = False,
24) -> list[str]:
25 job = db.get_job(job_id)
26 artifacts = db.list_artifacts(job_id, limit=limit)
27 events = db.list_timeline_events(job_id, limit=max(250, limit * 80))
28 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
29 operator_messages = _metadata_list(metadata, "operator_messages")
30 agent_updates = _metadata_list(metadata, "agent_updates")
31 lessons = _metadata_list(metadata, "lessons")
32 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
33 lines = [
34 f"updates {job['title']} | state {job_display_state(job, bool(daemon['running']))}",
35 "=" * 80,
36 ]
37 if operator_messages:
38 latest = operator_messages[-1]
39 lines.append(f"last steering: {_one_line(latest.get('message') or '', chars)}")
40 lines.append("outcomes by hour:")
41 outcome_lines = hourly_update_lines(events, width=max(72, chars), limit=max(8, limit * 4))
42 if outcome_lines:
43 lines.extend(f" {line}" for line in outcome_lines)
44 else:
45 lines.append(" none yet")
46 if agent_updates:
47 lines.extend(["", "latest agent notes:"])
48 for update in agent_updates[-max(1, min(limit, 5)) :]:
49 category = update.get("category") or "progress"
50 lines.append(f" {category}: {_one_line(update.get('message') or '', chars)}")
51 if lessons:
52 lines.extend(["", "latest lessons:"])
53 for lesson in lessons[-max(1, min(limit, 5)) :]:
54 category = lesson.get("category") or "memory"
55 lines.append(f" {category}: {_one_line(lesson.get('lesson') or '', chars)}")
56 lines.extend(["", "latest saved outputs:"])
57 if not artifacts:
58 lines.append(" none yet")
59 for artifact in artifacts:
60 title = artifact.get("title") or artifact["id"]
61 summary = f" - {_one_line(artifact['summary'], chars)}" if artifact.get("summary") else ""
62 lines.append(f" {artifact['created_at']} {title}{summary}")
63 lines.append(f" view: artifact {shlex.quote(title)}")
64 if paths:
65 lines.append(f" {artifact['path']}")
66 lines.extend(["", "raw tool stream: activity"])
67 return lines
68
69
70def render_all_updates_report(
71 db: AgentDB,
72 config: AppConfig,
73 *,
74 limit: int = 5,
75 chars: int = 180,
76 paths: bool = False,
77) -> list[str]:
78 jobs = db.list_jobs()
79 daemon = daemon_lock_status(config.runtime.home / "agentd.lock")
80 lines = [
81 f"outcomes all jobs | {len(jobs)} tracked",
82 "=" * 80,
83 ]
84 if not jobs:
85 lines.append("No jobs yet.")
86 return lines
87 for job in jobs[: max(1, limit)]:
88 job_id = str(job["id"])
89 counts = db.job_record_counts(job_id)
90 state = job_display_state(job, bool(daemon["running"]))
91 lines.append("")
92 lines.append(f"{job['title']} | {state}")
93 lines.append(
94 " "
95 + " ".join(
96 [
97 f"actions={counts.get('steps', 0)}",
98 f"outputs={counts.get('artifacts', 0)}",
99 f"findings={_metadata_count(job, 'finding_ledger')}",
100 f"tasks={_metadata_count(job, 'task_queue')}",
101 f"experiments={_metadata_count(job, 'experiment_ledger')}",
102 ]
103 )
104 )
105 events = db.list_events(job_id=job_id, limit=max(200, limit * 60))
106 outcome_lines = recent_model_update_lines(events, width=max(72, chars), limit=max(2, min(4, limit)))
107 if outcome_lines:
108 lines.extend(f" {line}" for line in outcome_lines)
109 else:
110 lines.append(" no durable outcomes yet")
111 artifacts = db.list_artifacts(job_id, limit=2)
112 for artifact in artifacts:
113 title = artifact.get("title") or artifact["id"]
114 summary = f" - {_one_line(artifact['summary'], chars)}" if artifact.get("summary") else ""
115 lines.append(f" output: {_one_line(title, chars)}{summary}")
116 if paths:
117 lines.append(f" {artifact['path']}")
118 if len(jobs) > max(1, limit):
119 lines.append("")
120 lines.append(f"... {len(jobs) - max(1, limit)} more jobs hidden. Increase --limit to show more.")
121 return lines
122
123
124def _metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
125 values = metadata.get(key)
126 return [value for value in values if isinstance(value, dict)] if isinstance(values, list) else []
127
128
129def _metadata_count(job: dict[str, Any], key: str) -> int:
130 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
131 values = metadata.get(key)
132 return len(values) if isinstance(values, list) else 0
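The two metadata helpers above treat malformed job metadata slightly differently, which a short sketch makes visible. The names `metadata_list` and `metadata_count` are standalone copies written for illustration, and the sample job dictionary is invented.

```python
from typing import Any


def metadata_list(metadata: dict[str, Any], key: str) -> list[dict[str, Any]]:
    """Return only the dict entries stored under key, or [] if it is not a list."""
    values = metadata.get(key)
    if not isinstance(values, list):
        return []
    return [value for value in values if isinstance(value, dict)]


def metadata_count(job: dict[str, Any], key: str) -> int:
    """Count every entry stored under key, including non-dict entries."""
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    values = metadata.get(key)
    return len(values) if isinstance(values, list) else 0


if __name__ == "__main__":
    # Invented sample: one valid lesson, two stray values, one malformed queue.
    job = {"metadata": {"lessons": [{"lesson": "a"}, "stray", 3], "task_queue": "oops"}}
    print(metadata_list(job["metadata"], "lessons"))
    print(metadata_count(job, "lessons"), metadata_count(job, "task_queue"))
```

Note that the list helper drops non-dict entries while the count helper does not, so a count shown in the all-jobs report can exceed the number of entries the per-job report renders.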
nipux_cli/usage.py 78 lines
1"""Formatting helpers for model token and cost usage."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.tui_layout import _format_compact_count, _format_usage_cost
8
9
10def format_usage_report(
11 *,
12 title: str,
13 usage: dict[str, Any],
14 context_length: int,
15 model: str,
16 base_url: str,
17) -> list[str]:
18 calls = _safe_int(usage.get("calls"))
19 prompt = _safe_int(usage.get("prompt_tokens"))
20 completion = _safe_int(usage.get("completion_tokens"))
21 total = _safe_int(usage.get("total_tokens")) or prompt + completion
22 latest_prompt = _safe_int(usage.get("latest_prompt_tokens"))
23 latest_completion = _safe_int(usage.get("latest_completion_tokens"))
24 latest_total = _safe_int(usage.get("latest_total_tokens")) or latest_prompt + latest_completion
25 estimated = _safe_int(usage.get("estimated_calls"))
26 cached = _safe_int(usage.get("cached_tokens"))
27 reasoning = _safe_int(usage.get("reasoning_tokens"))
28 cost = _format_usage_cost(usage, model=model, base_url=base_url)
29 cost_limit = _safe_optional_float(usage.get("max_job_cost_usd"))
30 context_text = _format_compact_count(latest_prompt)
31 if context_length > 0:
32 context_text = f"{context_text}/{_format_compact_count(context_length)}"
33 lines = [
34 f"usage {title}",
35 "=" * 80,
36 f"model: {model}",
37 f"calls: {calls} | estimated: {estimated}",
38 f"tokens: total={_format_compact_count(total)} prompt={_format_compact_count(prompt)} output={_format_compact_count(completion)}",
39 f"latest: ctx={context_text} output={_format_compact_count(latest_completion)} total={_format_compact_count(latest_total)}",
40 f"details: cached={_format_compact_count(cached)} reasoning={_format_compact_count(reasoning)} cost={cost}",
41 ]
42 if cost_limit is not None and cost_limit > 0:
43 current_cost = _safe_float(usage.get("cost"))
44 remaining = max(0.0, cost_limit - current_cost)
45 lines.append(f"limit: max job cost=${cost_limit:g} remaining=${remaining:.4f}")
46 if calls <= 0:
47 lines.append("no model usage has been recorded for this job yet")
48 elif estimated:
49 lines.append("some usage is estimated because the provider did not return complete token/cost metadata")
50 elif not bool(usage.get("has_cost")):
51 lines.append(
52 "cost is pending unless the provider returns cost metadata, configured token rates are set, "
53 "or the model is local/free"
54 )
55 return lines
56
57
58def _safe_int(value: Any) -> int:
59 try:
60 return int(float(value))
61 except (TypeError, ValueError, OverflowError):
62 return 0
63
64
65def _safe_float(value: Any) -> float:
66 try:
67 return float(value)
68 except (TypeError, ValueError):
69 return 0.0
70
71
72def _safe_optional_float(value: Any) -> float | None:
73 if value in (None, ""):
74 return None
75 try:
76 return float(value)
77 except (TypeError, ValueError):
78 return None
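The coercion helpers and the remaining-budget arithmetic in `format_usage_report` can be sketched on their own. This is a hedged illustration rather than the module's code: `safe_int` and `remaining_budget` are hypothetical names, and the sketch additionally catches `OverflowError`, which `int(float("inf"))` raises.

```python
from typing import Any


def safe_int(value: Any) -> int:
    """Coerce provider usage values to int, falling back to 0 on bad input."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        # TypeError: None or a container; ValueError: text or NaN;
        # OverflowError: infinity.
        return 0


def remaining_budget(cost_limit: float, current_cost: float) -> float:
    """Return the unspent part of a job cost limit, never below zero."""
    return max(0.0, cost_limit - current_cost)


if __name__ == "__main__":
    for raw in ("12.7", None, "abc", float("inf")):
        print(repr(raw), safe_int(raw))
    print(remaining_budget(5.0, 1.25))
```

Going through `float` first lets string values such as `"12.7"` truncate to an integer instead of failing, at the cost of needing the extra exception for infinite values.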
nipux_cli/web.py 121 lines
1"""Small web search/extract helpers without external web tool dependencies."""
2
3from __future__ import annotations
4
5import html
6import re
7import urllib.parse
8import urllib.request
9from html.parser import HTMLParser
10from typing import Any
11
12from nipux_cli.source_quality import anti_bot_reason
13
14
15class _TextExtractor(HTMLParser):
16 def __init__(self):
17 super().__init__()
18 self.parts: list[str] = []
19 self.skip_depth = 0
20
21 def handle_starttag(self, tag: str, attrs):
22 del attrs
23 if tag in {"script", "style", "noscript"}:
24 self.skip_depth += 1
25 if tag in {"p", "br", "div", "section", "article", "li", "h1", "h2", "h3"}:
26 self.parts.append("\n")
27
28 def handle_endtag(self, tag: str):
29 if tag in {"script", "style", "noscript"} and self.skip_depth:
30 self.skip_depth -= 1
31 if tag in {"p", "div", "section", "article", "li"}:
32 self.parts.append("\n")
33
34 def handle_data(self, data: str):
35 if not self.skip_depth:
36 text = data.strip()
37 if text:
38 self.parts.append(text)
39
40 def text(self) -> str:
41 raw = " ".join(self.parts)
42 raw = re.sub(r"[ \t\r\f\v]+", " ", raw)
43 raw = re.sub(r"\n\s+", "\n", raw)
44 raw = re.sub(r"\n{3,}", "\n\n", raw)
45 return html.unescape(raw).strip()
46
47
48def _request(url: str, *, timeout: int = 20, max_bytes: int = 2_000_000) -> tuple[str, str]:
49 request = urllib.request.Request(
50 url,
51 headers={
52 "User-Agent": "Mozilla/5.0 (compatible; nipux/0.1; +https://github.com/nipuxx/agent-cli)",
53 "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.5",
54 },
55 )
56 with urllib.request.urlopen(request, timeout=timeout) as response:
57 content_type = response.headers.get("content-type", "")
58 body = response.read(max_bytes + 1)
59 if len(body) > max_bytes:
60 raise ValueError(f"response exceeded {max_bytes} bytes")
61 charset = "utf-8"
62 match = re.search(r"charset=([\w.-]+)", content_type, re.I)
63 if match:
64 charset = match.group(1)
65 return body.decode(charset, errors="replace"), content_type
66
67
68def _strip_html(markup: str) -> str:
69 parser = _TextExtractor()
70 parser.feed(markup)
71 return parser.text()
72
73
74def _duckduckgo_link(raw: str) -> str:
75 parsed = urllib.parse.urlparse(html.unescape(raw))
76 query = urllib.parse.parse_qs(parsed.query)
77 if "uddg" in query and query["uddg"]:
78 return query["uddg"][0]
79 return html.unescape(raw)
80
81
82def web_search(query: str, *, limit: int = 5) -> dict[str, Any]:
83 url = "https://duckduckgo.com/html/?" + urllib.parse.urlencode({"q": query})
84 markup, _ = _request(url)
85 pattern = re.compile(
86 r'<a[^>]+class="result__a"[^>]+href="(?P<href>[^"]+)"[^>]*>(?P<title>.*?)</a>',
87 re.I | re.S,
88 )
89 results = []
90 for match in pattern.finditer(markup):
91 title = re.sub(r"<[^>]+>", "", match.group("title"))
92 results.append({"title": html.unescape(title).strip(), "url": _duckduckgo_link(match.group("href"))})
93 if len(results) >= limit:
94 break
95 return {"success": True, "query": query, "results": results}
96
97
98def web_extract(urls: list[str], *, limit_chars: int = 12_000) -> dict[str, Any]:
99 pages = []
100 for url in urls[:5]:
101 try:
102 body, content_type = _request(url)
103 text = body if "text/plain" in content_type else _strip_html(body)
104 reason = anti_bot_reason(url, text[:2000])
105 page = {
106 "url": url,
107 "content_type": content_type,
108 "text": text[:limit_chars],
109 "truncated": len(text) > limit_chars,
110 }
111 if reason:
112 page["source_warning"] = reason
113 page["warnings"] = [{
114 "type": "anti_bot",
115 "message": reason,
116 "guidance": "This page may require normal human browser verification. Do not bypass protections.",
117 }]
118 pages.append(page)
119 except Exception as exc:
120 pages.append({"url": url, "error": str(exc)})
121 return {"success": True, "pages": pages}
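The redirect unwrapping done by `_duckduckgo_link` relies only on the standard library, so it can be sketched and checked without a network call. `unwrap_redirect` is a hypothetical name, and the sample hrefs are invented for illustration.

```python
import html
import urllib.parse


def unwrap_redirect(raw: str) -> str:
    """Return the target URL from a redirect href carrying a uddg parameter."""
    # Search result markup escapes ampersands, so unescape before parsing.
    cleaned = html.unescape(raw)
    query = urllib.parse.parse_qs(urllib.parse.urlparse(cleaned).query)
    targets = query.get("uddg")
    # parse_qs already percent-decodes values; fall back to the cleaned href.
    return targets[0] if targets else cleaned


if __name__ == "__main__":
    wrapped = "//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fa&amp;rut=abc"
    print(unwrap_redirect(wrapped))
    print(unwrap_redirect("https://example.com/plain"))
```

`parse_qs` omits parameters with blank values by default, so a missing or empty `uddg` falls through to the unescaped original href.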
nipux_cli/worker.py 7538 lines
1"""Bounded worker loop for one restartable agent step."""
2
3from __future__ import annotations
4
5import json
6import re
7import shlex
8import signal
9import threading
10import time
11from dataclasses import dataclass
12from datetime import datetime, timezone
13from pathlib import Path
14from typing import Any
15from urllib.parse import urlparse
16
17from nipux_cli.artifacts import ArtifactStore
18from nipux_cli.config import AppConfig, load_config
19from nipux_cli.compression import refresh_memory_index
20from nipux_cli.context_pressure import (
21 context_pressure_for_prompt,
22 emit_context_pressure_update,
23 emit_usage_pressure_update,
24 usage_pressure_for_prompt,
25)
26from nipux_cli.db import AgentDB
27from nipux_cli.llm import LLMResponse, LLMResponseError, OpenAIChatLLM, StepLLM, ToolCall
28from nipux_cli.measurement import measurement_candidates, measurement_candidates_are_diagnostic_only
29from nipux_cli.memory_graph import memory_graph_from_job
30from nipux_cli.metric_format import format_metric_value
31from nipux_cli.operator_context import (
32 inactive_prompt_operator_ids,
33)
34from nipux_cli.progress import build_progress_checkpoint
35from nipux_cli.provider_errors import provider_action_required_note
36from nipux_cli.source_quality import anti_bot_reason
37from nipux_cli.task_match import find_semantic_task_match, task_key
38from nipux_cli.tools import DEFAULT_REGISTRY, ToolContext, ToolRegistry
39from nipux_cli.worker_policy import (
40 ACTIVITY_STAGNATION_BLOCKED_TOOLS,
41 ACTIVITY_STAGNATION_CHECKPOINTS,
42 ANTI_BOT_ACK_TERMS,
43 ARTIFACT_ACCOUNTING_BLOCKED_TOOLS,
44 ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS,
45 BRANCH_WORK_TOOLS,
46 CHURN_TOOLS,
47 DELIVERABLE_ARTIFACT_TERMS,
48 DELIVERABLE_PROGRESS_BLOCKED_TOOLS,
49 DELIVERABLE_RESEARCH_BUDGET_STEPS,
50 EVIDENCE_ARTIFACT_TERMS,
51 EXPERIMENT_DELIVERY_ACTION_TERMS,
52 EXPERIMENT_INFORMATION_ACTION_TERMS,
53 EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS,
54 FILE_VALIDATION_BLOCKED_TOOLS,
55 FILE_VALIDATION_RESOLUTION_TOOLS,
56 LEDGER_PROGRESS_TOOLS,
57 MAX_WORKER_PROMPT_CHARS,
58 MEASURABLE_ACTION_BUDGET_STEPS,
59 MEASURABLE_PROGRESS_PATTERN,
60 MEASURABLE_RESEARCH_BLOCKED_TOOLS,
61 MEASURABLE_RESEARCH_BUDGET_STEPS,
62 MEASUREMENT_BLOCKED_TOOLS,
63 MEASUREMENT_RESOLUTION_TOOLS,
64 MEMORY_CONSOLIDATION_BLOCKED_TOOLS,
65 MEMORY_ENTRY_PROMPT_CHARS,
66 MEMORY_PROMPT_CHARS,
67 MILESTONE_VALIDATION_BLOCKED_TOOLS,
68 PROGRAM_PROMPT_CHARS,
69 QUERY_STOPWORDS,
70 READ_ONLY_SHELL_COMMAND_PATTERN,
71 RECENT_STATE_PROMPT_CHARS,
72 RECENT_STATE_STEPS,
73 RECOVERABLE_GUARD_ERRORS,
74 REFLECTION_INTERVAL_STEPS,
75 RESEARCH_BALANCE_BLOCKED_TOOLS,
76 ROADMAP_STALENESS_BLOCKED_TOOLS,
77 SOURCE_YIELD_BLOCKED_TOOLS,
78 SYSTEM_PROMPT,
79 TASK_DELIVERABLE_ACTION_TERMS,
80 TASK_PLANNING_STAGNATION_CHECKPOINTS,
81 TASK_QUEUE_SATURATION_OPEN_TASKS,
82 TASK_QUEUE_TOTAL_SOFT_LIMIT,
83 TEXT_TOKEN_STOPWORDS,
84)
85from nipux_cli.worker_prompt_context import (
86 _as_float,
87 _as_int,
88 _experiments_for_prompt,
89 _ledgers_for_prompt,
90 _lessons_for_prompt,
91 _memory_graph_for_prompt,
92 _memory_entries_for_prompt,
93 _metadata_list,
94 _operator_messages_for_prompt,
95 _outcomes_for_prompt,
96 _render_worker_prompt,
97 _roadmap_for_prompt,
98 _tasks_for_prompt,
99 _timeline_for_prompt,
100)
101from nipux_cli.worker_prompt_format import (
102 clip_text as _clip_text,
103 format_step_for_prompt as _format_step_for_prompt,
104 observation_for_prompt as _observation_for_prompt,
105)
106from nipux_cli.worker_tool_summary import summarize_tool_result as _summarize_tool_result
107from nipux_cli.worker_usage import turn_usage_metadata
108
109
110__all__ = ["MAX_WORKER_PROMPT_CHARS", "_render_worker_prompt", "build_messages", "run_one_step"]
111
112
113LESSON_SPRAWL_MIN_LESSONS = 30
114LESSON_SPRAWL_RECENT_LESSONS = 3
115EXPERIMENT_STAGNATION_MIN_TRIALS = 6
116EXPERIMENT_STAGNATION_NON_IMPROVING = 4
117SOURCE_YIELD_MIN_SOURCES = 12
118SOURCE_YIELD_MIN_RECENT_GATHERING = 5
119
120
121@dataclass(frozen=True)
122class StepExecution:
123 job_id: str
124 run_id: str
125 step_id: str
126 tool_name: str | None
127 status: str
128 result: dict[str, Any]
129
130
131EXPERIMENT_NEXT_ACTION_VERIFY_SHELL_PATTERN = re.compile(
132 r"(?is)^\s*(?:command\s+-v\b|which\b|type\b|test\b|ls\b|find\b|stat\b|file\b)"
133)
134EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS = {
135 "action",
136 "after",
137 "before",
138 "from",
139 "into",
140 "next",
141 "real",
142 "then",
143 "using",
144 "with",
145}
146MILESTONE_MATCH_STOPWORDS = {
147 "acceptance",
148 "blocked",
149 "criteria",
150 "current",
151 "done",
152 "evidence",
153 "failed",
154 "issue",
155 "issues",
156 "milestone",
157 "needed",
158 "pending",
159 "passed",
160 "result",
161 "roadmap",
162 "status",
163 "title",
164 "validating",
165 "validation",
166 "validate",
167}
168
169
170def build_messages(
171 job: dict[str, Any],
172 recent_steps: list[dict[str, Any]],
173 memory_entries: list[dict[str, Any]] | None = None,
174 program_text: str = "",
175 timeline_events: list[dict[str, Any]] | None = None,
176 active_operator_messages: list[dict[str, Any]] | None = None,
177 include_unclaimed_operator_messages: bool = True,
178 token_usage: dict[str, Any] | None = None,
179) -> list[dict[str, Any]]:
180 step_lines = []
181 for step in recent_steps[-RECENT_STATE_STEPS:]:
182 step_lines.append(_clip_text(_format_step_for_prompt(step), 720))
183 state = _clip_text("\n".join(step_lines), RECENT_STATE_PROMPT_CHARS) if step_lines else "No prior steps."
184 memory_lines = []
185 for entry in _memory_entries_for_prompt(memory_entries or []):
186 refs = ", ".join((entry.get("artifact_refs") or [])[:8])
187 suffix = f"\nArtifact refs: {refs}" if refs else ""
188 memory_lines.append(
189 _clip_text(f"### {entry.get('key') or 'memory'}\n{entry.get('summary') or ''}{suffix}", MEMORY_ENTRY_PROMPT_CHARS)
190 )
191 memory_text = _clip_text("\n\n".join(memory_lines), MEMORY_PROMPT_CHARS) if memory_lines else "No compact memory yet."
192 program = _clip_text(program_text.strip(), PROGRAM_PROMPT_CHARS) if program_text else "No program.md saved yet."
193 operator_messages = _operator_messages_for_prompt(
194 job,
195 active_messages=active_operator_messages or [],
196 include_unclaimed=include_unclaimed_operator_messages,
197 )
198 current_execution_focus = _current_execution_focus_for_prompt(job, recent_steps)
199 measurement_obligation = _measurement_obligation_for_prompt(job)
200 recent_measurement_evidence = _recent_measurement_evidence_for_prompt(job, recent_steps)
201 file_validation_obligation = _file_validation_obligation_for_prompt(job)
202 candidate_file_discovery = _candidate_file_discovery_for_prompt(job, recent_steps)
203 shell_path_recovery = _shell_path_recovery_for_prompt(recent_steps)
204 shell_permission_recovery = _shell_permission_recovery_for_prompt(recent_steps)
205 measured_progress_guard = _measured_progress_guard_for_prompt(job, recent_steps)
206 experiment_stagnation_guard = _experiment_stagnation_guard_for_prompt(job, recent_steps)
207 research_balance_guard = _research_balance_guard_for_prompt(job, recent_steps)
208 source_yield_guard = _source_yield_guard_for_prompt(job, recent_steps)
209 deliverable_progress_guard = _deliverable_progress_guard_for_prompt(job, recent_steps)
210 progress_accounting_guard = _progress_accounting_for_prompt(recent_steps)
211 evidence_checkpoint_guard = _evidence_checkpoint_accounting_for_prompt(job, recent_steps)
212 activity_stagnation = _activity_stagnation_for_prompt(job)
213 task_planning_guard = _task_planning_guard_for_prompt(job)
214 task_queue_saturation = _task_queue_saturation_for_prompt(job, recent_steps)
215 memory_consolidation_guard = _memory_consolidation_guard_for_prompt(job, recent_steps)
216 lesson_consolidation_guard = _lesson_consolidation_guard_for_prompt(job, recent_steps)
217 durable_yield = _durable_yield_for_prompt(job, recent_steps)
218 context_pressure = context_pressure_for_prompt(job)
219 usage_pressure = usage_pressure_for_prompt(job, token_usage)
220 lessons = _lessons_for_prompt(job)
221 memory_graph = _memory_graph_for_prompt(job)
222 roadmap = _roadmap_for_prompt(job)
223 tasks = _tasks_for_prompt(job)
224 ledgers = _ledgers_for_prompt(job)
225 experiments = _experiments_for_prompt(job)
226 reflections = _reflections_for_prompt(job)
227 timeline = _timeline_for_prompt(timeline_events or [])
228 outcomes = _outcomes_for_prompt(timeline_events or [])
229 next_constraint = _next_action_constraint(job, recent_steps)
230 content = _render_worker_prompt(
231 job,
232 sections=[
233 (
234 "Workspace",
235 "\n".join([
236 "- shell_exec runs on the machine hosting this Nipux worker, in the current worker directory unless the command changes it",
237 "- saved artifacts are separate Nipux outputs; read_artifact is only for those saved outputs",
238 "- use shell_exec for workspace/project files unless the file is a saved artifact",
239 ]),
240 ),
241 ("Operator context", operator_messages),
242 ("Current execution focus", current_execution_focus),
243 ("Pending measurement obligation", measurement_obligation),
244 ("Recent measurement evidence", recent_measurement_evidence),
245 ("Pending file validation obligation", file_validation_obligation),
246 ("Candidate file discovery", candidate_file_discovery),
247 ("Shell path recovery", shell_path_recovery),
248 ("Shell permission recovery", shell_permission_recovery),
249 ("Measured progress guard", measured_progress_guard),
250 ("Experiment stagnation guard", experiment_stagnation_guard),
251 ("Research balance guard", research_balance_guard),
252 ("Source yield guard", source_yield_guard),
253 ("Deliverable progress guard", deliverable_progress_guard),
254 ("Progress accounting guard", progress_accounting_guard),
255 ("Evidence checkpoint accounting guard", evidence_checkpoint_guard),
256 ("Activity stagnation", activity_stagnation),
257 ("Task planning guard", task_planning_guard),
258 ("Task queue saturation", task_queue_saturation),
259 ("Memory consolidation guard", memory_consolidation_guard),
260 ("Lesson consolidation guard", lesson_consolidation_guard),
261 ("Durable progress yield", durable_yield),
262 ("Context pressure", context_pressure),
263 ("Usage pressure", usage_pressure),
264 ("Program", program),
265 ("Lessons learned", lessons),
266 ("Memory graph", memory_graph),
267 ("Roadmap", roadmap),
268 ("Task queue", tasks),
269 ("Durable outcomes", outcomes),
270 ("Ledgers", ledgers),
271 ("Experiment ledger", experiments),
272 ("Reflections", reflections),
273 ("Compact memory", memory_text),
274 ("Recent visible timeline", timeline),
275 ("Recent state", state),
276 ("Next-action constraint", next_constraint),
277 ],
278 )
279 return [
280 {"role": "system", "content": SYSTEM_PROMPT},
281 {"role": "user", "content": content},
282 ]
283
284
285def _acknowledge_non_prompt_operator_context(db: AgentDB, job_id: str) -> int:
286 job = db.get_job(job_id)
287 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
288 messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
289 message_ids = inactive_prompt_operator_ids(messages)
290 if not message_ids:
291 return 0
292 result = db.acknowledge_operator_messages(
293 job_id,
294 message_ids=message_ids,
295 summary="conversation-only message retained in history, not used as worker constraint",
296 )
297 return int(result.get("count") or 0)
298
299
300def _measured_progress_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
301 context = _measured_progress_guard_context(job, recent_steps)
302 if not context:
303 return "None."
304 if _as_int(context.get("shell_actions_since_last_experiment")) >= _as_int(context.get("shell_action_budget")):
305 candidate_context = _candidate_file_discovery_context(job, recent_steps)
306 shell_guidance = "Do not call shell_exec or do more research next."
307 if candidate_context:
308 shell_guidance = (
309 "Do not run broad shell_exec commands or do more research next. A single bounded shell_exec is allowed only "
310 "when it validates one exact candidate path already listed in Candidate file discovery."
311 )
312 return (
313 "This objective or active task is measurably framed, and the shell/action budget since the last experiment "
314 f"is exhausted. completed_since_last_experiment={context.get('completed_since_last_experiment')} "
315 f"shell_actions={context.get('shell_actions_since_last_experiment')} shell_budget={context.get('shell_action_budget')} "
316 f"reason={context.get('reason')}. {shell_guidance} Use record_experiment "
317 "for a known result, record_tasks to create a missing experiment/monitor branch, or record_lesson if the "
318 "branch is blocked or the recent outputs were not valid measurements."
319 )
320 return (
321 "This objective or active task is measurably framed, but recent work has not produced "
322 f"new experiment records. completed_since_last_experiment={context.get('completed_since_last_experiment')} "
323 f"research_budget={context.get('research_budget')} shell_actions={context.get('shell_actions_since_last_experiment')} "
324 f"shell_budget={context.get('shell_action_budget')} reason={context.get('reason')}. "
325 "Next useful actions: run a small measuring action, call record_experiment for a known result, "
326 "or use record_tasks to create an experiment/action/monitor task with acceptance criteria and evidence."
327 )
328
329
330def _deliverable_progress_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
331 context = _deliverable_progress_guard_context(job, recent_steps)
332 if not context:
333 return "None."
334 return (
335 "This objective or active task expects a durable deliverable, but recent branch work has not produced a "
336 "draft/report/file checkpoint. "
337 f"completed_since_last_deliverable={context.get('completed_since_last_deliverable')} "
338 f"research_budget={context.get('research_budget')} reason={context.get('reason')}. "
339 "Next useful actions: write_file or write_artifact for a partial deliverable, record_tasks for a smaller "
340 "deliverable branch, record_roadmap/record_milestone_validation for validation, or record_lesson if the "
341 "deliverable is blocked."
342 )
343
344
345def _experiment_stagnation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
346 context = _experiment_stagnation_context(job, recent_steps)
347 if not context:
348 return "None."
349 return (
350 "Recent measured trials have not improved the best observed result. "
351 f"metric={context.get('metric_name')} unit={context.get('metric_unit')} "
352 f"best={context.get('best_value')} latest={context.get('latest_value')} "
353 f"non_improving={context.get('non_improving_count')} recent_trials={context.get('recent_trials')}. "
354 "Before more experiments, shell execution, research, or output churn, record a decision: reject or block the "
355 "stale branch, pivot to a materially different branch, update the roadmap/task queue, or explain why the "
356 "stagnant measurements are still useful."
357 )
358
359
360def _measurement_obligation_for_prompt(job: dict[str, Any]) -> str:
361 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
362 obligation = metadata.get("pending_measurement_obligation")
363 if not isinstance(obligation, dict) or not obligation or obligation.get("resolved_at"):
364 return "None."
365 candidates = obligation.get("metric_candidates") if isinstance(obligation.get("metric_candidates"), list) else []
366 lines = [
367 f"source_step=#{obligation.get('source_step_no') or '?'} tool={obligation.get('tool') or ''}",
368 f"summary={obligation.get('summary') or ''}",
369 ]
370 command = str(obligation.get("command") or "")
371 if command:
372 lines.append(f"command={_clip_text(command, 360)}")
373 if candidates:
374 lines.append("metric_candidates=" + "; ".join(str(item) for item in candidates[:6]))
375 lines.append(
376 "Before more research or artifact churn, call record_experiment with the measured result, "
377 "record_lesson explaining why it is not a valid measurement, or record_tasks to create the missing measurement branch."
378 )
379 return "\n".join(lines)
380
381
382def _recent_measurement_evidence_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]], *, window: int = 140) -> str:
383 if _pending_measurement_obligation(job):
384 return "Covered by the pending measurement obligation."
385 latest_experiment_step_no = max(
386 (
387 _as_int(step.get("step_no"))
388 for step in recent_steps
389 if step.get("tool_name") == "record_experiment" and step.get("status") == "completed"
390 ),
391 default=0,
392 )
393 lines: list[str] = []
394 for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
395 if step.get("tool_name") != "shell_exec":
396 continue
397 output = step.get("output") if isinstance(step.get("output"), dict) else {}
398 if not output:
399 continue
400 command = _step_command(step) or str(output.get("command") or "")
401 candidates = measurement_candidates(output, command=command, limit=4)
402 if not candidates or measurement_candidates_are_diagnostic_only(candidates, command=command):
403 continue
404 step_no = _as_int(step.get("step_no"))
405 relation = "after last experiment" if step_no > latest_experiment_step_no else "before last experiment"
406 prefix = f"- step #{step_no or '?'} {step.get('status') or ''} ({relation})"
407 detail = "; ".join(str(candidate) for candidate in candidates[:3])
408 command_detail = f" command={_clip_text(command, 180)}" if command else ""
409 lines.append(_clip_text(f"{prefix}: {detail}.{command_detail}", 520))
410 if len(lines) >= 6:
411 break
412 if not lines:
413 return "None."
414 return "\n".join([
415 "Recent shell output contains measurable-looking values. Reconcile valid values with record_experiment; "
416 "if a value is invalid, record why before treating the branch as complete.",
417 *reversed(lines),
418 ])
419
420
421def _file_validation_obligation_for_prompt(job: dict[str, Any]) -> str:
422 obligation = _pending_file_validation_obligation(job)
423 if not obligation:
424 return "None."
425 lines = [
426 f"path={obligation.get('path') or ''}",
427 f"source_step=#{obligation.get('source_step_no') or '?'}",
428 f"reason={obligation.get('reason') or 'recent file output needs validation'}",
429 ]
430 suggested = str(obligation.get("suggested_validation") or "").strip()
431 if suggested:
432 lines.append(f"suggested_validation={suggested}")
433 lines.append(
434 "Before more research/output churn, validate the file with shell_exec, "
435 "corroborating any `file` output with header/signature bytes, checksum/size, or a parser/loader when the expected "
436 "format matters, "
437 "or use record_tasks/record_lesson/record_experiment to explain the blocked or deferred validation."
438 )
439 return "\n".join(lines)
440
441
442def _current_execution_focus_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
443 focus = _current_execution_focus_context(job, recent_steps)
444 if not focus:
445 return (
446 "No isolated focus yet. Follow the next-action constraint, choose one bounded branch, "
447 "and account for the result before expanding the backlog."
448 )
449 lines = [
450 f"phase={focus['phase']}",
451 "focus=" + _clip_text(str(focus.get("focus") or ""), 620),
452 "next=" + _clip_text(str(focus.get("next") or ""), 620),
453 ]
454 evidence = str(focus.get("evidence") or "").strip()
455 if evidence:
456 lines.append("evidence=" + _clip_text(evidence, 520))
457 task = str(focus.get("task") or "").strip()
458 if task:
459 lines.append("task=" + _clip_text(task, 520))
460 backlog = focus.get("backlog") if isinstance(focus.get("backlog"), dict) else {}
461 if backlog:
462 lines.append(
463 "backlog="
464 f"{backlog.get('total')} tasks, {backlog.get('open')} runnable/open. "
465 "Treat it as advisory until this focus is resolved; do not add new branches unless directly closing, "
466 "merging, or relabeling existing work."
467 )
468 lines.append(
469 "Boundary: do not switch to unrelated search, task creation, or stale branches unless this focus is blocked "
470 "by fresh tool evidence. If it is blocked, record the blocker and the next concrete recovery action."
471 )
472 return "\n".join(lines)
473
474
475def _current_execution_focus_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
476 backlog = _task_backlog_pressure_context(job)
477 measurement_obligation = _pending_measurement_obligation(job)
478 if measurement_obligation:
479 return {
480 "phase": "account_measurement",
481 "focus": f"Resolve pending measurement from step #{measurement_obligation.get('source_step_no') or '?'}",
482 "next": "Use record_experiment for a valid measured result, or record_lesson/record_tasks if the result is invalid or missing.",
483 "evidence": str(measurement_obligation.get("summary") or ""),
484 "backlog": backlog,
485 }
486 file_validation = _pending_file_validation_obligation(job)
487 if file_validation:
488 return {
489 "phase": "validate_file",
490 "focus": f"Validate recently written file {file_validation.get('path') or ''}",
491 "next": str(file_validation.get("suggested_validation") or "Run one bounded validation command, then account for the result."),
492 "evidence": str(file_validation.get("reason") or ""),
493 "backlog": backlog,
494 }
495 checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
496 if checkpoint:
497 if checkpoint.get("checkpoint_read"):
498 next_action = "Use a durable ledger tool to account for the already-read evidence checkpoint."
499 else:
500 next_action = "Read the specific checkpoint artifact or account for it from existing evidence."
501 return {
502 "phase": "account_checkpoint",
503 "focus": f"Resolve auto-saved evidence checkpoint {checkpoint.get('artifact_id') or checkpoint.get('title') or ''}",
504 "next": next_action,
505 "evidence": str(checkpoint.get("summary") or checkpoint.get("title") or ""),
506 "backlog": backlog,
507 }
508 candidate_files = _candidate_file_discovery_context(job, recent_steps)
509 if candidate_files:
510 paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
511 invalid_paths = {str(path) for path in candidate_files.get("invalid_paths") or []}
512 primary_path = next((str(path) for path in paths if str(path) not in invalid_paths), str(paths[0]) if paths else "")
513 validated = _candidate_file_recently_validated(primary_path, recent_steps)
514 if validated:
515 return {
516 "phase": "execute_with_validated_candidate",
517 "focus": f"Use the recently validated candidate path: {primary_path}",
518 "next": (
519 "Run the next bounded action or measurement that uses this validated path; do not repeat "
520 "existence checks unless a new command requires a different property."
521 ),
522 "evidence": validated,
523 "task": str(candidate_files.get("task_text") or ""),
524 "backlog": backlog,
525 }
526 return {
527 "phase": "execute_candidate_validation",
528 "focus": f"Validate the highest-confidence candidate path: {primary_path}",
529 "next": "Run one bounded shell/file validation against the primary path, then record the measurement, finding, lesson, or blocker.",
530 "evidence": f"Ranked candidates: {'; '.join(str(path) for path in paths[:4])}",
531 "task": str(candidate_files.get("task_text") or ""),
532 "backlog": backlog,
533 }
534 grounding_block = _latest_evidence_grounding_block(recent_steps)
535 if grounding_block:
536 return {
537 "phase": "repair_record",
538 "focus": "Rewrite or replace the blocked durable record using only observed evidence.",
539 "next": "Use exact observed tokens/paths from recent evidence, or record why the attempted claim is invalid.",
540 "evidence": str(grounding_block.get("error") or grounding_block.get("summary") or ""),
541 "backlog": backlog,
542 }
543 experiment_next_action = _latest_experiment_next_action_context(job)
544 if experiment_next_action:
545 return {
546 "phase": "execute_measured_next_action",
547 "focus": "Continue from the latest measured experiment decision.",
548 "next": str(experiment_next_action.get("next_action") or ""),
549 "evidence": str(experiment_next_action.get("title") or experiment_next_action.get("summary") or ""),
550 "backlog": backlog,
551 }
552 milestone_validation = _milestone_validation_needed(job)
553 if milestone_validation:
554 return {
555 "phase": "validate_milestone",
556 "focus": f"Validate milestone: {milestone_validation.get('title') or 'current milestone'}",
557 "next": "Use record_milestone_validation with pass/fail/blocker status from observed evidence.",
558 "evidence": str(milestone_validation.get("evidence_needed") or milestone_validation.get("acceptance_criteria") or ""),
559 "backlog": backlog,
560 }
561 task = _primary_execution_task(job)
562 if task and backlog:
563 return {
564 "phase": "execute_task",
565 "focus": str(task.get("title") or "current task"),
566 "next": str(task.get("next_action") or task.get("goal") or task.get("acceptance_criteria") or "Take one bounded action for this task."),
567 "evidence": str(task.get("evidence_needed") or task.get("result") or ""),
568 "task": str(task),
569 "backlog": backlog,
570 }
571 return None
572
573
574def _task_backlog_pressure_context(job: dict[str, Any]) -> dict[str, int] | None:
575 tasks = [task for task in _metadata_list(job, "task_queue") if isinstance(task, dict)]
576 if not tasks:
577 return None
578 runnable_statuses = {"active", "open", "waiting", "blocked"}
579 open_count = sum(1 for task in tasks if str(task.get("status") or "open").lower() in runnable_statuses)
580 total_count = len(tasks)
581 if total_count < TASK_QUEUE_TOTAL_SOFT_LIMIT and open_count < TASK_QUEUE_SATURATION_OPEN_TASKS:
582 return None
583 return {"total": total_count, "open": open_count}
584
585
586def _primary_execution_task(job: dict[str, Any]) -> dict[str, Any] | None:
587 tasks = [task for task in _metadata_list(job, "task_queue") if isinstance(task, dict)]
588 status_rank = {"active": 0, "open": 1, "waiting": 2, "blocked": 3}  # lower rank sorts first; ties break on higher priority, then title
589 runnable = [task for task in tasks if str(task.get("status") or "open").lower() in status_rank]
590 if not runnable:
591 return None
592 return sorted(
593 runnable,
594 key=lambda task: (
595 status_rank.get(str(task.get("status") or "open").lower(), 9),
596 -_as_int(task.get("priority")),
597 str(task.get("title") or ""),
598 ),
599 )[0]
600
601
602def _candidate_file_discovery_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
603 context = _candidate_file_discovery_context(job, recent_steps)
604 if not context:
605 return "None."
606 paths = context["paths"]
607 source_text = context["source_text"]
608 lines = [
609 f"{source_text} while open work depends on file/path validation.",
610 "Candidate paths:",
611 ]
612 for path in paths[:8]:
613 lines.append(f"- {path}")
614 primary_path = str(paths[0]) if paths else ""
615 validation = _candidate_file_recently_validated(primary_path, recent_steps)
616 if validation:
617 lines.append(
618 "Highest-ranked candidate has recent positive validation evidence. "
619 f"Use it for the next action instead of repeating existence checks: {_clip_text(validation, 420)}"
620 )
621 invalid_paths = context.get("invalid_paths") if isinstance(context.get("invalid_paths"), list) else []
622 if invalid_paths:
623 lines.append(
624 "Recently invalid or stub-like candidates: "
625 + "; ".join(str(path) for path in invalid_paths[:5])
626 + ". Prefer higher-confidence candidates before retrying these."
627 )
628 lines.append(
629 "Validate likely candidates with shell_exec before recording a no-file/no-progress claim or searching for alternatives. "
630 "Do not reject a non-empty candidate binary from `file` output alone; corroborate with header/signature bytes, "
631 "checksum/size, or a parser/loader for the expected format, or record uncertainty. "
632 "Treat durable-record candidates as candidates until revalidated. This supersedes stale no-candidate/no-file memory "
633 "until validation proves those candidates are irrelevant."
634 )
635 lines.append(f"Relevant open work: {_clip_text(context['task_text'], 500)}")
636 return "\n".join(lines)
637
638
639def _shell_path_recovery_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
640 context = _shell_path_recovery_context(recent_steps)
641 if not context:
642 return "None."
643 paths = context.get("missing_paths") if isinstance(context.get("missing_paths"), list) else []
644 commands = context.get("missing_commands") if isinstance(context.get("missing_commands"), list) else []
645 candidate_executables = (
646 context.get("candidate_executables") if isinstance(context.get("candidate_executables"), dict) else {}
647 )
648 observed_executables = (
649 context.get("observed_executables") if isinstance(context.get("observed_executables"), list) else []
650 )
651 lines = [
652 f"Recent shell step #{context.get('step_no') or '?'} reported a missing command or path.",
653 ]
654 if commands:
655 lines.append("Missing commands: " + ", ".join(str(command) for command in commands[:6]))
656 if candidate_executables:
657 for command, command_paths in list(candidate_executables.items())[:6]:
658 if not isinstance(command_paths, list) or not command_paths:
659 continue
660 lines.append(
661 f"Observed candidate executable for {command}: "
662 + ", ".join(str(path) for path in command_paths[:4])
663 )
664 lines.append("Recovery priority: try the exact candidate path or add its directory to PATH before package-manager/install retries.")
665 if paths:
666 lines.append("Missing paths: " + ", ".join(str(path) for path in paths[:6]))
667 if observed_executables:
668 lines.append("Observed executable paths in partial shell output: " + ", ".join(str(path) for path in observed_executables[:8]))
669 if not commands and not paths:
670 lines.append("The missing command or path could not be parsed from the shell output.")
671 command = str(context.get("command") or "")
672 if command:
673 lines.append(f"Failed command: {_clip_text(command, 420)}")
674 excerpt = str(context.get("excerpt") or "")
675 if excerpt:
676 lines.append(f"Observed output: {_clip_text(excerpt, 360)}")
677 lines.append(
678 "Do not treat this output as a successful measurement or deliverable. Next, locate or verify the real "
679 "executable/file path with a bounded shell probe such as command -v, find, ls, or an equivalent platform "
680 "tool. Retry using only a validated path, or record the branch as blocked/skipped with the observed reason."
681 )
682 return "\n".join(lines)
683
684
685def _shell_path_recovery_context(recent_steps: list[dict[str, Any]], *, window: int = 16) -> dict[str, Any] | None:
686 for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
687 if step.get("tool_name") != "shell_exec":
688 continue
689 output = step.get("output") if isinstance(step.get("output"), dict) else {}
690 text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
691 if not text.strip():
692 continue
693 missing_paths = _missing_paths_from_shell_output(text)
694 if not missing_paths and not _shell_output_has_missing_command(text):
695 continue
696 commands = _missing_commands_from_shell_output(text)
697 return {
698 "step_no": step.get("step_no"),
699 "command": _step_command(step) or str(output.get("command") or ""),
700 "missing_commands": commands,
701 "candidate_executables": _candidate_executable_paths_for_missing_commands(recent_steps, commands),
702 "observed_executables": _observed_executable_paths_from_recent_shell(
703 recent_steps,
704 exclude_paths=missing_paths,
705 ),
706 "missing_paths": missing_paths,
707 "excerpt": text.strip(),
708 }
709 return None
710
711
712def _shell_permission_recovery_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
713 context = _recent_privileged_shell_failure_context(recent_steps)
714 if not context:
715 return "None."
716 lines = [
717 f"Recent shell step #{context.get('step_no') or '?'} failed because a privileged/package-manager command lacked permission.",
718 ]
719 command = str(context.get("command") or "")
720 if command:
721 lines.append(f"Failed command: {_clip_text(command, 420)}")
722 excerpt = str(context.get("excerpt") or "")
723 if excerpt:
724 lines.append(f"Observed output: {_clip_text(excerpt, 360)}")
725 lines.append("Recovery priority: try non-privileged alternatives first; record when operator credentials are required.")
726 lines.append(
727 "Do not retry the same privileged/package-manager path. Prefer observed executables, user-writable installs, "
728 "existing project files, or other non-privileged alternatives; otherwise record the branch as blocked, skipped, "
729 "or needing operator credentials."
730 )
731 return "\n".join(lines)
732
733
734def _shell_step_failure_text(step: dict[str, Any]) -> str:
735 output = step.get("output") if isinstance(step.get("output"), dict) else {}
736 return "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
737
738
739def _shell_output_has_missing_command(text: str) -> bool:
740 lowered = text.lower()  # markers below also cover missing-path errors, not only missing commands
741 return any(marker in lowered for marker in ("command not found", ": not found", "no such file or directory"))
742
743
744def _missing_paths_from_shell_output(text: str) -> list[str]:
745 patterns = [
746 r"(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?P<path>/[^\s:'\"]+):\s*(?:not found|No such file or directory|command not found)",
747 r"(?:cannot access|cannot stat|can't stat|stat: cannot statx?) ['\"](?P<quoted>[^'\"]+)['\"]:\s*No such file or directory",
748 r"(?:^|\n)(?P<plain>/[^\s:'\"]+):\s*No such file or directory",
749 ]
750 paths: list[str] = []
751 seen: set[str] = set()
752 for pattern in patterns:
753 for match in re.finditer(pattern, text, flags=re.IGNORECASE):
754 path = str(match.groupdict().get("path") or match.groupdict().get("quoted") or match.groupdict().get("plain") or "").strip()
755 if not path or path in seen:
756 continue
757 seen.add(path)
758 paths.append(path)
759 if len(paths) >= 12:
760 return paths
761 return paths
762
763
764def _missing_commands_from_shell_output(text: str) -> list[str]:
765 patterns = [
766 r"(?:^|\n)(?:/bin/sh:\s*\d+:\s*)?(?P<cmd>[A-Za-z0-9_.+-]+):\s*(?:not found|command not found)",
767 r"(?:^|\n)(?:sh|bash|zsh):\s*(?P<shell_cmd>[A-Za-z0-9_.+-]+):\s*command not found",
768 r"command not found:\s*(?P<suffix_cmd>[A-Za-z0-9_.+-]+)",
769 ]
770 commands: list[str] = []
771 seen: set[str] = set()
772 for pattern in patterns:
773 for match in re.finditer(pattern, text, flags=re.IGNORECASE):
774 command = str(
775 match.groupdict().get("cmd")
776 or match.groupdict().get("shell_cmd")
777 or match.groupdict().get("suffix_cmd")
778 or ""
779 ).strip()
780 if not command or "/" in command or command in seen:
781 continue
782 seen.add(command)
783 commands.append(command)
784 if len(commands) >= 12:
785 return commands
786 return commands
787
788
789def _candidate_executable_paths_for_missing_commands(
790 recent_steps: list[dict[str, Any]], missing_commands: list[str], *, window: int = 20, max_paths_per_command: int = 6
791) -> dict[str, list[str]]:
792 command_names = {str(command or "").strip().lower() for command in missing_commands}
793 command_names = {command for command in command_names if command}
794 if not command_names:
795 return {}
796 matches: dict[str, list[str]] = {command: [] for command in command_names}
797 seen: set[tuple[str, str]] = set()
798 for path in _observed_executable_paths_from_recent_shell(recent_steps, command_names=command_names, window=window):
799 name = Path(path).name.lower()
800 if name not in command_names:
801 continue
802 key = (name, path.lower())
803 if key in seen or len(matches.get(name, [])) >= max_paths_per_command:
804 continue
805 seen.add(key)
806 matches.setdefault(name, []).append(path)
807 return {command: paths for command, paths in matches.items() if paths}
808
809
810def _observed_executable_paths_from_recent_shell(
811 recent_steps: list[dict[str, Any]],
812 *,
813 command_names: set[str] | None = None,
814 exclude_paths: list[str] | None = None,
815 window: int = 20,
816 max_paths: int = 12,
817) -> list[str]:
818 excluded = {str(path or "").lower() for path in (exclude_paths or []) if path}
819 paths: list[str] = []
820 seen: set[str] = set()
821 for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
822 if step.get("tool_name") != "shell_exec":
823 continue
824 output = step.get("output") if isinstance(step.get("output"), dict) else {}
825 text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
826 for line in text.splitlines():
827 if _shell_line_reports_missing_candidate(line):
828 continue
829 for path in _extract_candidate_executable_paths(line, command_names):
830 key = path.lower()
831 if key in excluded or key in seen:
832 continue
833 seen.add(key)
834 paths.append(path)
835 if len(paths) >= max_paths:
836 return paths
837 return paths
838
839
840def _shell_line_reports_missing_candidate(line: str) -> bool:
841 lowered = str(line or "").lower()
842 return any(
843 marker in lowered
844 for marker in (
845 "not found",
846 "no such file or directory",
847 "cannot access",
848 "cannot stat",
849 "can't stat",
850 "missing",
851 )
852 )
853
854
855def _extract_candidate_executable_paths(text: str, command_names: set[str] | None = None) -> list[str]:
856 commands = {command.lower() for command in (command_names or set()) if command}
857 paths: list[str] = []
858 seen: set[str] = set()
859 for match in re.finditer(r"(?<![A-Za-z0-9])(?:~|/)[^\s'\"<>|;&]{2,}", text or ""):
860 raw = _clean_candidate_file_path(match.group(0))
861 if not _looks_like_candidate_executable_path(raw):
862 continue
863 name = Path(raw).name.lower()
864 if commands and name not in commands:
865 continue
866 key = raw.lower()
867 if key in seen:
868 continue
869 seen.add(key)
870 paths.append(raw)
871 return paths
872
873
874def _looks_like_candidate_executable_path(value: str) -> bool:
875 raw = str(value or "").strip()
876 if not raw or len(raw) > 500:
877 return False
878 if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
879 return False
880 if not raw.startswith(("/", "~")):
881 return False
882 name = Path(raw).name
883 if not name or name.startswith("."):
884 return False
885 if any(char in name for char in ("$", "{", "}", "`")):
886 return False
887 return True
888
889
890PACKAGE_MANAGER_WRITE_COMMAND_PATTERN = re.compile(
891 r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo\s+|doas\s+|pkexec\s+)?"
892 r"(?:apt-get|apt|dnf|yum|apk|pacman|zypper|brew|port)\s+"
893 r"(?:install|upgrade|update|remove|erase|add|sync|build-dep)\b"
894)
895PRIVILEGED_COMMAND_PATTERN = re.compile(r"(?is)(?:^|[;&|]{1,2}\s*)(?:sudo|doas|pkexec)\b")
896
897
898def _shell_command_looks_privileged_or_package_manager(command: str) -> bool:
899 text = str(command or "").strip()
900 if not text:
901 return False
902 return bool(PRIVILEGED_COMMAND_PATTERN.search(text) or PACKAGE_MANAGER_WRITE_COMMAND_PATTERN.search(text))
903
904
905def _shell_output_has_permission_failure(text: str) -> bool:
906 lowered = str(text or "").lower()
907 return any(
908 marker in lowered
909 for marker in (
910 "permission denied",
911 "not permitted",
912 "operation not permitted",
913 "authentication",
914 "authorization",
915 "are you root",
916 "sudo:",
917 "password is required",
918 "unable to acquire the dpkg frontend lock",
919 "could not open lock file",
920 )
921 )
922
923
def _recent_privileged_shell_failure_context(recent_steps: list[dict[str, Any]], *, window: int = 12) -> dict[str, Any] | None:
    accounting_tools = {"record_experiment", "record_tasks", "record_lesson", "record_roadmap", "record_milestone_validation"}
    latest_accounting_step = max(
        (
            _as_int(step.get("step_no"))
            for step in recent_steps[-window:]
            if step.get("status") == "completed" and step.get("tool_name") in accounting_tools
        ),
        default=0,
    )
    for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
        step_no = _as_int(step.get("step_no"))
        if latest_accounting_step and step_no <= latest_accounting_step:
            continue
        if step.get("tool_name") != "shell_exec":
            continue
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        command = _step_command(step) or str(output.get("command") or "")
        text = _shell_step_failure_text(step)
        if not _shell_output_has_permission_failure(text):
            continue
        if not _shell_command_looks_privileged_or_package_manager(command):
            continue
        return {
            "step_no": step.get("step_no"),
            "command": command,
            "excerpt": text.strip(),
        }
    return None


def _observed_candidate_recovery_required_context(recent_steps: list[dict[str, Any]], args: dict[str, Any]) -> dict[str, Any] | None:
    command = str(args.get("command") or "")
    if not command.strip():
        return None
    context = _shell_path_recovery_context(recent_steps)
    if not context:
        return None
    candidate_executables = (
        context.get("candidate_executables") if isinstance(context.get("candidate_executables"), dict) else {}
    )
    if not candidate_executables:
        return None
    for missing_command, paths in candidate_executables.items():
        if not isinstance(paths, list) or not paths:
            continue
        missing_name = str(missing_command or "").strip()
        if not missing_name:
            continue
        if not _shell_command_invokes_bare_executable(command, missing_name):
            continue
        if _shell_command_mentions_candidate_path(command, paths):
            continue
        return {
            "step_no": context.get("step_no"),
            "missing_command": missing_name,
            "candidate_executables": paths[:6],
            "blocked_command": command,
        }
    return None


def _shell_command_invokes_bare_executable(command: str, executable_name: str) -> bool:
    name = str(executable_name or "").strip()
    if not name:
        return False
    return bool(re.search(rf"(?<![A-Za-z0-9_./-]){re.escape(name)}(?![A-Za-z0-9_.-])", command))


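# Illustrative sketch, not part of the module: the lookaround pattern used by
# _shell_command_invokes_bare_executable can be exercised on its own to see which
# commands count as a "bare" invocation. The name invokes_bare_executable and the
# sample commands are assumptions made for this example only.

```python
import re


def invokes_bare_executable(command: str, executable_name: str) -> bool:
    """Return True when executable_name appears as a bare word, not inside a path."""
    name = executable_name.strip()
    if not name:
        return False
    # The lookbehind rejects a preceding path or identifier character; the
    # lookahead rejects a trailing identifier character, dot, or hyphen.
    pattern = rf"(?<![A-Za-z0-9_./-]){re.escape(name)}(?![A-Za-z0-9_.-])"
    return bool(re.search(pattern, command))


print(invokes_bare_executable("cd build && cmake ..", "cmake"))      # bare word
print(invokes_bare_executable("/opt/bin/cmake --version", "cmake"))  # path-qualified
print(invokes_bare_executable("cmake-gui", "cmake"))                 # longer name
```

# Only the first call reports a bare invocation; the path-qualified form and the
# longer executable name are both rejected by the lookarounds.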
def _shell_command_mentions_candidate_path(command: str, candidate_paths: list[Any]) -> bool:
    text = str(command or "")
    for path_value in candidate_paths:
        path = str(path_value or "").strip()
        if not path:
            continue
        if path in text:
            return True
        parent = str(Path(path).parent)
        if parent and parent not in {".", "/"} and parent in text:
            return True
    return False


def _candidate_file_discovery_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
    task_text = _open_file_dependent_task_text(job)
    if not task_text:
        return None
    recent_paths = [
        *_candidate_file_paths_from_recent_shell(recent_steps),
        *_candidate_file_paths_from_recent_grounding_blocks(recent_steps),
    ]
    durable_paths = _candidate_file_paths_from_durable_records(job)
    paths: list[str] = []
    seen: set[str] = set()
    for path in [*recent_paths, *durable_paths]:
        key = path.lower()
        if key in seen:
            continue
        seen.add(key)
        paths.append(path)
    if not paths:
        return None
    source_text = "Recent shell output or durable records listed candidate file paths"
    if recent_paths and not durable_paths:
        source_text = "Recent shell output listed candidate file paths"
    elif durable_paths and not recent_paths:
        source_text = "Durable records mention candidate file paths"
    return {
        "paths": _rank_candidate_file_paths(job, task_text, paths, recent_steps=recent_steps),
        "invalid_paths": _invalid_candidate_file_paths(paths, recent_steps),
        "source_text": source_text,
        "task_text": task_text,
    }


def _shell_exec_targets_candidate_file(job: dict[str, Any], recent_steps: list[dict[str, Any]], args: dict[str, Any]) -> bool:
    command = str(args.get("command") or "")
    if not command.strip():
        return False
    context = _candidate_file_discovery_context(job, recent_steps)
    if not context:
        return False
    command_text = command.replace("\\ ", " ")
    return any(path and path in command_text for path in context.get("paths", [])[:12])


def _rank_candidate_file_paths(
    job: dict[str, Any],
    task_text: str,
    paths: list[str],
    *,
    recent_steps: list[dict[str, Any]] | None = None,
) -> list[str]:
    context_tokens = _candidate_context_tokens(job, task_text)
    indexed = list(enumerate(paths))
    ranked = sorted(
        indexed,
        key=lambda item: _candidate_file_path_score(
            item[1],
            context_tokens,
            item[0],
            recent_steps=recent_steps,
        ),
        reverse=True,
    )
    return [path for _, path in ranked]


def _candidate_context_tokens(job: dict[str, Any], task_text: str) -> set[str]:
    text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")) + " " + task_text
    tokens = set()
    for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9._-]{2,}", text.lower()):
        cleaned = token.strip("._-")
        if not cleaned or cleaned in QUERY_STOPWORDS or cleaned in TEXT_TOKEN_STOPWORDS:
            continue
        tokens.add(cleaned)
        for part in re.split(r"[._-]+", cleaned):
            if len(part) >= 3 and part not in QUERY_STOPWORDS and part not in TEXT_TOKEN_STOPWORDS:
                tokens.add(part)
    return tokens


def _candidate_file_path_score(
    path: str,
    context_tokens: set[str],
    original_index: int,
    *,
    recent_steps: list[dict[str, Any]] | None = None,
) -> float:
    lowered_path = path.lower()
    name = Path(path).name.lower()
    stem = Path(name).stem.lower()
    path_tokens = set()
    for token in re.findall(r"[a-z0-9][a-z0-9._-]{1,}", lowered_path):
        path_tokens.add(token.strip("._-"))
        path_tokens.update(part for part in re.split(r"[._-]+", token) if len(part) >= 2)
    score = 0.0
    matches = context_tokens & {token for token in path_tokens if token}
    score += len(matches) * 8.0
    if any(token and token in stem for token in context_tokens):
        score += 6.0
    if "/" in path:
        score += min(path.count("/"), 8) * 0.15
    auxiliary_markers = (
        "vocab",
        "tokenizer",
        "tokeniser",
        "mmproj",
        "adapter",
        "config",
        "readme",
        "license",
        "metadata",
        "sample",
        "example",
        "stub",
    )
    if any(marker in name for marker in auxiliary_markers):
        score -= 18.0
    if name.startswith("."):
        score -= 20.0
    suffix = Path(name).suffix.lower()
    if suffix:
        score += 1.0
    score += _candidate_file_observation_score(path, recent_steps or [])
    score -= original_index * 0.01
    return score


def _invalid_candidate_file_paths(paths: list[str], recent_steps: list[dict[str, Any]]) -> list[str]:
    invalid: list[str] = []
    for path in paths:
        if _candidate_file_observation_score(path, recent_steps) <= -30:
            invalid.append(path)
    return invalid


def _candidate_file_observation_score(path: str, recent_steps: list[dict[str, Any]], *, window: int = 12) -> float:
    if not path:
        return 0.0
    path_key = path.lower()
    score = 0.0
    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
        if step.get("tool_name") != "shell_exec":
            continue
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
        for line in text.splitlines():
            lowered = line.lower()
            if path_key not in lowered:
                continue
            if _shell_line_reports_missing_candidate(line):
                score -= 70.0
            if any(marker in lowered for marker in ("ascii text", "html document", "json data", "with no line terminators")):
                score -= 45.0
            score += _candidate_file_size_score_from_line(line)
    return score


def _candidate_file_recently_validated(path: str, recent_steps: list[dict[str, Any]], *, window: int = 12) -> str:
    if not path or _candidate_file_observation_score(path, recent_steps, window=window) < 30:
        return ""
    path_key = path.lower()
    evidence_lines: list[str] = []
    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
        if step.get("tool_name") != "shell_exec":
            continue
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
        for line in text.splitlines():
            if path_key not in line.lower():
                continue
            if _shell_line_reports_missing_candidate(line):
                continue
            evidence_lines.append(" ".join(line.split()))
            if len(evidence_lines) >= 3:
                return " | ".join(evidence_lines)
    # Return whatever evidence was collected, even when fewer than three lines matched,
    # instead of discarding it in favor of the generic fallback message.
    if evidence_lines:
        return " | ".join(evidence_lines)
    return "recent shell evidence showed a non-trivial candidate file size or positive file metadata"


def _candidate_file_size_score_from_line(line: str) -> float:
    lowered = str(line or "").lower()
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:t|tb|tib|g|gb|gib)\b", lowered):
        return 55.0
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:m|mb|mib)\b", lowered):
        return 18.0
    integers = [int(match) for match in re.findall(r"(?<![\w.])\d{1,15}(?![\w.])", lowered)]
    if any(value >= 1_000_000_000 for value in integers):
        return 55.0
    if any(value >= 1_000_000 for value in integers):
        return 18.0
    return 0.0


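# Illustrative sketch, not part of the module: a standalone copy of the size
# heuristic above, applied to a few shell-listing lines. The name
# size_score_from_line and the sample lines are assumptions for this example;
# the patterns and weights are copied from _candidate_file_size_score_from_line.

```python
import re


def size_score_from_line(line: str) -> float:
    lowered = line.lower()
    # Human-readable sizes in the terabyte/gigabyte range score highest.
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:t|tb|tib|g|gb|gib)\b", lowered):
        return 55.0
    # Megabyte-range sizes get a smaller boost.
    if re.search(r"\b\d+(?:\.\d+)?\s*(?:m|mb|mib)\b", lowered):
        return 18.0
    # Otherwise fall back to standalone integers, read as raw byte counts.
    integers = [int(m) for m in re.findall(r"(?<![\w.])\d{1,15}(?![\w.])", lowered)]
    if any(value >= 1_000_000_000 for value in integers):
        return 55.0
    if any(value >= 1_000_000 for value in integers):
        return 18.0
    return 0.0


print(size_score_from_line("4.2G /data/weights.bin"))
print(size_score_from_line("-rw-r--r-- 1 user user 1500000 /data/weights.bin"))
print(size_score_from_line("12 /tmp/notes.txt"))
```

# The first line matches the gigabyte pattern, the second falls through to the
# raw byte-count branch, and the third carries no size signal at all.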
def _open_file_dependent_task_text(job: dict[str, Any]) -> str:
    tasks = _metadata_list(job, "task_queue")
    parts: list[str] = []
    for task in tasks:
        if not isinstance(task, dict):
            continue
        status = str(task.get("status") or "open").lower()
        if status not in {"open", "active", "waiting", "blocked"}:
            continue
        text = " ".join(
            str(task.get(key) or "")
            for key in ("title", "description", "acceptance_criteria", "evidence_needed", "stall_behavior", "contract")
        )
        lowered = text.lower()
        if any(term in lowered for term in ("file", "path", "download", "artifact", "validate", "benchmark", "script", "config")):
            parts.append(" ".join(text.split()))
        if len(parts) >= 4:
            break
    return " | ".join(parts)


def _candidate_file_paths_from_recent_shell(
    recent_steps: list[dict[str, Any]], *, window: int = 8, max_paths: int = 80
) -> list[str]:
    paths: list[str] = []
    seen: set[str] = set()
    for step in _completed_or_failed_recent_steps(recent_steps)[-window:]:
        if step.get("tool_name") != "shell_exec":
            continue
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr", "error"))
        for path in _extract_candidate_file_paths(text):
            key = path.lower()
            if key in seen:
                continue
            seen.add(key)
            paths.append(path)
            if len(paths) >= max_paths:
                return paths
    return paths


def _candidate_file_paths_from_recent_grounding_blocks(
    recent_steps: list[dict[str, Any]], *, window: int = 8, max_paths: int = 80
) -> list[str]:
    paths: list[str] = []
    seen: set[str] = set()
    for step in recent_steps[-window:]:
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
        candidates = grounding.get("missing_candidate_paths")
        if not isinstance(candidates, list):
            continue
        for candidate in candidates:
            path = _clean_candidate_file_path(str(candidate or ""))
            if not _looks_like_exact_candidate_file_path(path):
                continue
            key = path.lower()
            if key in seen:
                continue
            seen.add(key)
            paths.append(path)
            if len(paths) >= max_paths:
                return paths
    return paths


def _candidate_file_paths_from_durable_records(
    job: dict[str, Any], *, max_records: int = 80, max_paths: int = 80
) -> list[str]:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    paths: list[str] = []
    seen: set[str] = set()
    record_groups = [
        _metadata_list(job, "experiment_ledger"),
        _metadata_list(job, "finding_ledger"),
        _metadata_list(job, "lessons"),
        _metadata_list(job, "source_ledger"),
        _metadata_list(job, "task_queue"),
    ]
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    record_groups.append([item for item in milestones if isinstance(item, dict)])
    checked = 0
    for records in record_groups:
        for record in reversed(records[-max_records:]):
            if not isinstance(record, dict):
                continue
            checked += 1
            try:
                text = json.dumps(record, ensure_ascii=False, sort_keys=True)
            except (TypeError, ValueError):
                text = str(record)
            for path in _extract_candidate_file_paths(text):
                key = path.lower()
                if key in seen:
                    continue
                seen.add(key)
                paths.append(path)
                if len(paths) >= max_paths:
                    return paths
            if checked >= max_records * len(record_groups):
                return paths
    return paths


def _extract_candidate_file_paths(text: str) -> list[str]:
    paths: list[str] = []
    for match in re.finditer(r"(?<![A-Za-z0-9])(?:~|/)[^\s'\"<>|;&]{2,}", text or ""):
        raw = _clean_candidate_file_path(match.group(0))
        if not _looks_like_exact_candidate_file_path(raw):
            continue
        paths.append(raw)
    for match in re.finditer(r'"path"\s*:\s*"([^"]+\.[A-Za-z0-9][A-Za-z0-9_-]{1,12})"', text or ""):
        raw = _clean_candidate_file_path(match.group(1))
        if not _looks_like_exact_candidate_file_path(raw, allow_relative=True):
            continue
        paths.append(raw)
    return paths


def _looks_like_exact_candidate_file_path(value: str, *, allow_relative: bool = False) -> bool:
    raw = str(value or "").strip()
    if not raw or len(raw) > 500:
        return False
    if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
        return False
    if not allow_relative and not raw.startswith(("/", "~")):
        return False
    name = Path(raw).name
    if not name or name.startswith("."):
        return False
    suffix = Path(name).suffix
    if not suffix or not re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix) or not any(ch.isalpha() for ch in suffix):
        return False
    return True


def _clean_candidate_file_path(value: str) -> str:
    raw = str(value or "").strip().rstrip(".,:;)")
    for separator in ("\\n", "\\r", "\\t", "\n", "\r", "\t"):
        raw = raw.split(separator, 1)[0]
    return raw.strip().rstrip(".,:;)")


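# Illustrative sketch, not part of the module: standalone copies of the cleaning
# and shape checks above, to show which strings are accepted as exact candidate
# file paths. The names clean_candidate_path and looks_like_exact_file_path and
# all sample paths are assumptions for this example.

```python
import re
from pathlib import Path


def clean_candidate_path(value: str) -> str:
    # Drop trailing punctuation, then cut at the first real or escaped line/tab break.
    raw = value.strip().rstrip(".,:;)")
    for separator in ("\\n", "\\r", "\\t", "\n", "\r", "\t"):
        raw = raw.split(separator, 1)[0]
    return raw.strip().rstrip(".,:;)")


def looks_like_exact_file_path(value: str, *, allow_relative: bool = False) -> bool:
    raw = value.strip()
    if not raw or len(raw) > 500:
        return False
    # URLs, elided paths, and globs are not exact paths.
    if "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
        return False
    if not allow_relative and not raw.startswith(("/", "~")):
        return False
    name = Path(raw).name
    if not name or name.startswith("."):
        return False
    suffix = Path(name).suffix
    # The suffix needs at least two characters after the dot and at least one letter.
    return bool(re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix)) and any(ch.isalpha() for ch in suffix)


print(clean_candidate_path("/tmp/out/report.json)."))
print(looks_like_exact_file_path("/tmp/out/report.json"))
print(looks_like_exact_file_path("https://example.com/a.txt"))
print(looks_like_exact_file_path("/tmp/out"))
print(looks_like_exact_file_path("data/report.csv", allow_relative=True))
```

# One consequence of the suffix pattern as written: single-character extensions
# such as ".c" are rejected, because the pattern requires two or more characters
# after the dot.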
def _progress_accounting_for_prompt(recent_steps: list[dict[str, Any]]) -> str:
    context = _artifact_accounting_context(recent_steps)
    if not context:
        return "None."
    return (
        "Recent saved outputs need accounting before more output/research. "
        f"artifact_count={context.get('artifact_count')} since_step={context.get('since_step')} "
        f"artifact_titles={'; '.join(str(title) for title in context.get('artifact_titles', [])[:4])}. "
        "Next use record_tasks or record_roadmap to mark progress/reopen branches, "
        "record_findings or record_source for reusable evidence, record_experiment for measured results, "
        "record_milestone_validation for milestone checks, or record_lesson if these outputs are not useful."
    )


def _activity_stagnation_for_prompt(job: dict[str, Any]) -> str:
    context = _activity_stagnation_context(job)
    if not context:
        return "None."
    return (
        "Recent checkpoints have reported activity without durable progress. "
        f"activity_checkpoint_streak={context.get('streak')} threshold={context.get('threshold')} "
        f"last_counts={context.get('counts')}. "
        "Next classify the branch with record_findings, record_source, record_experiment, record_tasks, "
        "record_roadmap, record_milestone_validation, or record_lesson. If the branch is low-yield, mark it "
        "blocked/skipped and pivot before doing more read-only work or saving more outputs."
    )


def _research_balance_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _research_balance_context(job, recent_steps)
    if not context:
        return "None."
    return (
        "Recent work is execution-heavy but has little source-backed research recorded. "
        f"completed_window={context.get('completed_window')} execution_actions={context.get('execution_actions')} "
        f"research_actions={context.get('research_actions')} sources={context.get('sources')} findings={context.get('findings')} "
        f"experiments={context.get('experiments')} files={context.get('files')}. "
        "Before another deep execution/testing loop, use available research, browser, source, documentation, or local-inspection tools "
        "to gather evidence and record it with record_source, record_findings, record_lesson, record_tasks, or an artifact. "
        "If external research is not relevant or tools are unavailable, explicitly record why and what evidence substitutes for it."
    )


def _source_yield_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _source_yield_context(job, recent_steps)
    if not context:
        return "None."
    return (
        "Many sources have been gathered without enough durable synthesis. "
        f"sources={context.get('sources')} findings={context.get('findings')} "
        f"yielded_sources={context.get('yielded_sources')} recent_gathering={context.get('recent_gathering')} "
        f"recent_source_titles={'; '.join(str(title) for title in context.get('recent_source_titles', [])[:4])}. "
        "Before more search, extraction, browsing, shell work, file/output writing, or report chatter, distill the "
        "source set into record_findings with evidence, update record_source with yield/fail outcomes, or update "
        "tasks/roadmap/lessons to reject or pivot the low-yield source branch."
    )


def _task_planning_guard_for_prompt(job: dict[str, Any]) -> str:
    context = _task_planning_stagnation_context(job)
    if not context:
        return "None."
    return (
        "Recent checkpoints only added or updated tasks without durable evidence, measurements, validations, "
        f"or lessons. task_only_checkpoints={context.get('task_only_checkpoints')} "
        f"open_tasks={context.get('open_tasks')} total_tasks={context.get('total_tasks')}. "
        "Do not create more new open tasks next. Execute, measure, validate, write a checkpoint, mark existing "
        "tasks done/blocked/skipped, or record a lesson from the branch."
    )


def _task_queue_saturation_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _recent_task_queue_saturation_context(recent_steps)
    persistent_pressure = False
    if not context:
        metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
        pressure = metadata.get("task_backlog_pressure") if isinstance(metadata.get("task_backlog_pressure"), dict) else {}
        current_pressure = _current_task_backlog_pressure_context(job)
        if not pressure and not current_pressure:
            return "None."
        if current_pressure:
            guard_recovery = pressure.get("guard_recovery") if isinstance(pressure.get("guard_recovery"), dict) else {}
            task_queue = guard_recovery.get("task_queue") if isinstance(guard_recovery.get("task_queue"), dict) else {}
            context = {
                "step_no": pressure.get("latest_step_no") or guard_recovery.get("latest_step_no") or "current",
                "source": pressure.get("source") or ("guard_recovery" if guard_recovery else "current_queue"),
                "reason": pressure.get("reason") or task_queue.get("reason") or current_pressure.get("reason"),
                "open_count": current_pressure.get("open_count"),
                "total_count": current_pressure.get("total_count"),
                "open_titles": current_pressure.get("open_titles") or [],
            }
        else:
            return "None."
        persistent_pressure = True
    counts = []
    if context.get("open_count") is not None:
        counts.append(f"open_tasks={context.get('open_count')}")
    if context.get("total_count") is not None:
        counts.append(f"total_tasks={context.get('total_count')}")
    count_text = " ".join(counts) or "queue is saturated"
    open_titles = [str(title).strip() for title in context.get("open_titles") or [] if str(title).strip()]
    # Carry the separator as a trailing space so the prompt never contains a doubled space,
    # whether or not any titles are listed.
    title_text = f"Existing runnable task titles: {json.dumps(open_titles[:8], ensure_ascii=False)}. " if open_titles else ""
    if context.get("source") == "blocked_record_tasks":
        source_label = "record_tasks block"
    elif context.get("source") == "current_queue":
        source_label = "current queue"
    else:
        source_label = "guard recovery"
    opening = (
        f"Task backlog pressure remains active from {source_label} #{context.get('step_no')}: "
        if persistent_pressure
        else f"Task queue saturation was just hit at step #{context.get('step_no')}: "
    )
    return (
        opening
        + f"{context.get('reason') or 'task queue saturated'} ({count_text}). "
        f"{title_text}"
        "Do not create new task branches. Either execute an existing high-priority branch, "
        "or use record_tasks only to update existing task titles to active, done, blocked, or skipped "
        "with concise result/evidence. If you have a near-duplicate task, update the closest existing "
        "task instead of inventing a fresh title. Consolidate branch sprawl into roadmap/milestones when useful. "
        "If this repeats, record_tasks is temporarily withheld so the worker must use a non-planning action."
    )


def _current_task_backlog_pressure_context(job: dict[str, Any]) -> dict[str, Any] | None:
    tasks = _metadata_list(job, "task_queue")
    # Skip malformed (non-dict) entries so the .get() calls below cannot raise.
    objective_tasks = [task for task in tasks if isinstance(task, dict) and not _is_guard_recovery_task(task)]
    open_tasks = [
        task
        for task in objective_tasks
        if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
    ]
    if len(objective_tasks) <= TASK_QUEUE_TOTAL_SOFT_LIMIT and len(open_tasks) < TASK_QUEUE_SATURATION_OPEN_TASKS:
        return None
    reason = "total task queue is too large" if len(objective_tasks) > TASK_QUEUE_TOTAL_SOFT_LIMIT else "too many open tasks"
    return {
        "reason": reason,
        "open_count": len(open_tasks),
        "total_count": len(objective_tasks),
        "open_titles": [
            str(task.get("title") or "").strip()
            for task in open_tasks[:8]
            if str(task.get("title") or "").strip()
        ],
    }


def _memory_consolidation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _memory_graph_consolidation_context(job, recent_steps)
    if not context:
        return "None."
    return (
        "Durable job memory is growing faster than the connected memory graph. "
        f"durable_records={context.get('durable_records')} graph_nodes={context.get('graph_nodes')} "
        f"graph_edges={context.get('graph_edges')} reason={context.get('reason')}. "
        "Before more branch work, use record_memory_graph to consolidate the most reusable facts, strategies, "
        "decisions, questions, skills, constraints, episodes, and evidence links. If there is truly nothing "
        "reusable, record a lesson explaining why this branch should not become graph memory."
    )


def _lesson_consolidation_guard_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _lesson_sprawl_context(job, recent_steps)
    if not context:
        return "None."
    return (
        "Raw lessons are accumulating faster than consolidated memory. "
        f"lessons={context.get('lessons')} recent_lessons={context.get('recent_lessons')} "
        f"graph_nodes={context.get('graph_nodes')} reason={context.get('reason')}. "
        "Do not add another raw lesson next. Consolidate the reusable strategy, mistake, constraint, decision, "
        "or question into record_memory_graph, or update existing tasks/roadmap state if the lesson only describes "
        "branch status."
    )


def _memory_graph_consolidation_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
    if any(step.get("tool_name") == "record_memory_graph" and step.get("status") == "completed" for step in recent_steps[-8:]):
        return None
    graph = memory_graph_from_job(job)
    node_count = len(graph["nodes"])
    edge_count = len(graph["edges"])
    durable_records = _durable_memory_signal_count(job)
    if durable_records < 6:
        return None
    reason = ""
    if node_count == 0:
        reason = "durable ledgers exist but no graph nodes have been consolidated"
    elif durable_records >= 12 and node_count * 5 < durable_records:
        reason = "graph is sparse relative to reusable durable records"
    elif node_count >= 3 and edge_count == 0 and durable_records >= 10:
        reason = "graph nodes exist but have no links"
    if not reason:
        return None
    return {
        "durable_records": durable_records,
        "graph_nodes": node_count,
        "graph_edges": edge_count,
        "reason": reason,
    }


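# Illustrative sketch, not part of the module: the threshold rules in
# _memory_graph_consolidation_context, isolated from job state so each branch
# can be checked with plain numbers. The name consolidation_reason and the
# sample counts are assumptions for this example; the recent record_memory_graph
# short-circuit is deliberately left out.

```python
from typing import Optional


def consolidation_reason(durable_records: int, node_count: int, edge_count: int) -> Optional[str]:
    # Fewer than six durable records never triggers the guard.
    if durable_records < 6:
        return None
    if node_count == 0:
        return "durable ledgers exist but no graph nodes have been consolidated"
    # Sparse: fewer than one graph node per five durable records.
    if durable_records >= 12 and node_count * 5 < durable_records:
        return "graph is sparse relative to reusable durable records"
    if node_count >= 3 and edge_count == 0 and durable_records >= 10:
        return "graph nodes exist but have no links"
    return None


for sample in [(5, 0, 0), (6, 0, 0), (12, 2, 1), (10, 3, 0), (11, 3, 2)]:
    print(sample, "->", consolidation_reason(*sample))
```

# The branches are checked in order, so an empty graph is always reported as
# "no graph nodes" before the sparseness or missing-link rules are considered.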
def _lesson_sprawl_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
    memory_context = _memory_graph_consolidation_context(job, recent_steps)
    if not memory_context:
        return None
    lessons = _metadata_list(job, "lessons")
    lesson_count = len(lessons)
    if lesson_count < LESSON_SPRAWL_MIN_LESSONS:
        return None
    recent_lessons = [
        step
        for step in recent_steps[-12:]
        if step.get("tool_name") == "record_lesson" and str(step.get("status") or "").lower() == "completed"
    ]
    if len(recent_lessons) < LESSON_SPRAWL_RECENT_LESSONS and lesson_count < LESSON_SPRAWL_MIN_LESSONS * 2:
        return None
    return {
        "lessons": lesson_count,
        "recent_lessons": len(recent_lessons),
        "graph_nodes": memory_context.get("graph_nodes"),
        "graph_edges": memory_context.get("graph_edges"),
        "durable_records": memory_context.get("durable_records"),
        "reason": "raw lesson backlog needs graph consolidation",
    }


def _durable_memory_signal_count(job: dict[str, Any]) -> int:
    count = (
        len(_metadata_list(job, "finding_ledger"))
        + len(_metadata_list(job, "source_ledger"))
        + len(_metadata_list(job, "experiment_ledger"))
        + len(_metadata_list(job, "lessons"))
    )
    tasks = _metadata_list(job, "task_queue")
    # Count only resolved tasks that carry a result or contract; skip malformed (non-dict) entries.
    count += sum(
        1
        for task in tasks
        if isinstance(task, dict)
        and str(task.get("status") or "open").lower() in {"done", "blocked", "skipped"}
        and (task.get("result") or task.get("evidence_needed") or task.get("acceptance_criteria"))
    )
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    count += sum(
        1
        for milestone in milestones
        if isinstance(milestone, dict)
        and str(milestone.get("status") or "planned").lower() in {"active", "validating", "done", "blocked", "skipped"}
    )
    return count


def _durable_yield_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    completed = [step for step in recent_steps if step.get("status") == "completed"]
    if len(completed) < 20:
        return "None."
    durable_tools = LEDGER_PROGRESS_TOOLS | {"write_artifact", "write_file"}
    durable_indexes = [
        index
        for index, step in enumerate(completed)
        if step.get("tool_name") in durable_tools
    ]
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    durable_records = (
        len(_metadata_list(job, "finding_ledger"))
        + len(_metadata_list(job, "source_ledger"))
        + len(_metadata_list(job, "experiment_ledger"))
        + len(_metadata_list(job, "lessons"))
    )
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    durable_records += len(milestones)
    if not durable_indexes and durable_records <= 0:
        return (
            f"No durable progress records after {len(completed)} completed actions. "
            "Next action should save an output, record findings/source/experiment/lesson/roadmap progress, "
            "or mark the branch blocked/skipped before more read-only work."
        )
    last_durable_index = durable_indexes[-1] if durable_indexes else -1
    actions_since = len(completed) - last_durable_index - 1
    durable_steps = len(durable_indexes)
    actions_per_durable = len(completed) / max(1, durable_steps + durable_records)
    if actions_since < 25 and actions_per_durable < 30:
        return "None."
    return (
        f"Durable yield is low: completed_actions={len(completed)} durable_steps={durable_steps} "
        f"durable_records={durable_records} actions_since_last_durable={actions_since} "
        f"actions_per_durable~{actions_per_durable:.1f}. "
        "Prefer a concrete checkpoint next: write/save output, record measured or reusable evidence, validate a milestone, "
        "or reject/pivot the branch with a lesson."
    )


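# Illustrative sketch, not part of the module: the two ratios that decide whether
# _durable_yield_for_prompt reports low yield, worked through with plain numbers.
# The name durable_yield_stats and the sample inputs are assumptions for this
# example; the 20-action minimum and the "no durable progress at all" branch are
# left out.

```python
def durable_yield_stats(total_completed: int, durable_indexes: list[int], durable_records: int) -> dict:
    # Index of the most recent durable step, or -1 when there is none.
    last_durable_index = durable_indexes[-1] if durable_indexes else -1
    actions_since = total_completed - last_durable_index - 1
    actions_per_durable = total_completed / max(1, len(durable_indexes) + durable_records)
    # The guard stays quiet only when both ratios are under their thresholds.
    low_yield = not (actions_since < 25 and actions_per_durable < 30)
    return {
        "actions_since": actions_since,
        "actions_per_durable": actions_per_durable,
        "low_yield": low_yield,
    }


# 40 completed actions, durable steps at indexes 5 and 10, 3 ledger records:
# 29 actions have passed since the last durable step, so the guard fires.
print(durable_yield_stats(40, [5, 10], 3))
# Same totals, but the last durable step is recent (index 30): the guard stays quiet.
print(durable_yield_stats(40, [5, 30], 3))
```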
def _reflections_for_prompt(job: dict[str, Any]) -> str:
    reflections = _metadata_list(job, "reflections")
    if not reflections:
        return "No reflection checkpoints yet."
    lines = []
    for reflection in reflections[-2:]:
        strategy = f" strategy={reflection.get('strategy')}" if reflection.get("strategy") else ""
        lines.append("- " + _clip_text(f"{reflection.get('summary')}{strategy}", 520))
    return "\n".join(lines)


1648def _next_action_constraint(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
1649 measurement_obligation = _pending_measurement_obligation(job)
1650 if measurement_obligation:
1651 return (
1652 "A pending measurement obligation is active from "
1653 f"step #{measurement_obligation.get('source_step_no') or '?'}. "
1654 "Resolve it with record_experiment, record_lesson explaining why it is invalid, "
1655 "or record_tasks creating the missing measurement branch before more research/artifact churn."
1656 )
1657 file_validation = _pending_file_validation_obligation(job)
1658 if file_validation:
1659 return (
1660 "A recently written file needs validation before more branch work. "
1661 f"File: {_clip_text(str(file_validation.get('path') or ''), 260)}. "
1662 f"Suggested validation: {_clip_text(str(file_validation.get('suggested_validation') or ''), 360)}. "
1663 "Use shell_exec to validate it, or record_tasks/record_lesson/record_experiment if validation is blocked or deferred."
1664 )
1665 artifact_accounting = _artifact_accounting_context(recent_steps)
1666 if artifact_accounting:
1667 return (
1668 "Recent saved outputs need durable accounting. Before more artifact writing, reading, research, browsing, "
1669 "or shell work, use record_tasks, record_roadmap, record_milestone_validation, record_findings, record_source, record_experiment, or record_lesson "
1670 "to explain what changed and what branch is next."
1671 )
1672 checkpoint_accounting = _auto_checkpoint_accounting_context(job, recent_steps)
1673 if checkpoint_accounting:
1674 if not checkpoint_accounting.get("checkpoint_read"):
1675 return (
1676 "An auto-saved evidence checkpoint is pending. Read that specific checkpoint artifact, or use a durable "
1677 "ledger tool to account for the checkpoint from existing evidence before more branch work."
1678 )
1679 return (
1680 "An already-read evidence checkpoint is pending durable accounting. Use record_findings, record_source, "
1681 "record_experiment, record_tasks, record_roadmap, record_milestone_validation, or record_lesson before "
1682 "more shell, search, file, report, or artifact work."
1683 )
1684 grounding_block = _latest_evidence_grounding_block(recent_steps)
1685 if grounding_block:
1686 raw_missing_paths = grounding_block.get("missing_candidate_paths") if isinstance(grounding_block.get("missing_candidate_paths"), list) else []
1687 missing_paths = [
1688 path
1689 for path in (_clean_candidate_file_path(str(item or "")) for item in raw_missing_paths)
1690 if _looks_like_exact_candidate_file_path(path)
1691 ]
1692 path_text = "; ".join(str(path) for path in missing_paths[:6])
1693 detail = f" Missing exact paths: {path_text}." if path_text else ""
1694 candidate_files = _candidate_file_discovery_context(job, recent_steps)
1695 if candidate_files:
1696 paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
1697 current_path_text = "; ".join(str(path) for path in paths[:4])
1698 if current_path_text:
1699 return (
1700 "Recent durable-record grounding failed, but current ranked candidate paths are available. "
1701 "Treat the failed record as rejected for now and validate the highest-confidence candidate next "
1702 "instead of repairing stale wording. "
1703 f"Candidate paths: {_clip_text(current_path_text, 520)}."
1704 )
1705 return (
1706 "Recent evidence grounding blocked a durable record. Next, rewrite the record using only observed evidence, "
1707 "include the exact observed paths/tokens when claiming candidates or files, or explicitly record why they "
1708 f"are irrelevant/invalid.{detail}"
1709 )
1710 action_failure = _experiment_next_action_failure_context(job, recent_steps)
1711 if action_failure:
1712 return (
1713 "The latest experiment next action was attempted, but the observed shell output reports a missing "
1714 f"command/path/prerequisite at step #{action_failure.get('step_no') or '?'}. "
1715 f"Observed output: {_clip_text(str(action_failure.get('excerpt') or ''), 260)}. "
1716 "Next, account for this attempted action with record_experiment, record_tasks, or record_lesson: "
1717 "mark the branch failed/blocked or create the concrete recovery branch. Do not run more read-only probes "
1718 "until the failed action is durable."
1719 )
1720 measured_guard = _measured_progress_guard_context(job, recent_steps)
1721 if measured_guard:
1722 return (
1723 "This job needs measured progress, not more research-only activity. "
1724 "Do one of: run a small measuring command/action, call record_experiment for a known measurement, "
1725 "record_tasks with an experiment/action/monitor contract, or record_lesson if measurement is blocked."
1726 )
1727 activity_stagnation = _activity_stagnation_context(job)
1728 if activity_stagnation:
1729 return (
1730 "Recent checkpoints show activity without durable progress. "
1731 "Use a ledger or planning tool to classify what changed, reject the low-yield branch, or open a better branch "
1732 "before more read-only work or output churn."
1733 )
1734 task_planning_guard = _task_planning_stagnation_context(job)
1735 if task_planning_guard:
1736 return (
1737 "Recent progress is only task planning. Do not create more new open tasks next. Execute an existing task, "
1738 "record evidence/measurements/validation, write a checkpoint, mark tasks done/blocked/skipped, or record "
1739 "a lesson before expanding the queue again."
1740 )
1741 task_queue_saturation = _recent_task_queue_saturation_context(recent_steps)
1742 if task_queue_saturation:
1743 return (
1744 "The durable task queue is saturated. Do not create new task branches. Execute a current task, "
1745 "or use record_tasks only to update existing task titles to active/done/blocked/skipped with evidence."
1746 )
1747 memory_consolidation = _memory_graph_consolidation_context(job, recent_steps)
1748 if memory_consolidation:
1749 return (
1750 "Consolidate durable progress into the job memory graph before more branch work. "
1751 "Use record_memory_graph for connected reusable knowledge, or record_lesson if the recent branch has no "
1752 "reusable memory value."
1753 )
1754 deliverable_guard = _deliverable_progress_guard_context(job, recent_steps)
1755 if deliverable_guard:
1756 return (
1757 "This job needs a durable deliverable checkpoint, not more background collection. "
1758 "Use write_file or write_artifact to save a partial draft/report/file, or use record_tasks, "
1759 "record_roadmap, record_milestone_validation, or record_lesson to explain the specific blocker "
1760 "and the next deliverable branch."
1761 )
1762 research_balance = _research_balance_context(job, recent_steps)
1763 if research_balance:
1764 return (
1765 "Balance execution with research before the next deep action loop. "
1766 "Gather source-backed evidence with available web/browser/documentation/local-inspection tools and record it, "
1767 "or record why research is not applicable and what evidence replaces it."
1768 )
1769 candidate_files = _candidate_file_discovery_context(job, recent_steps)
1770 if candidate_files:
1771 paths = candidate_files.get("paths") if isinstance(candidate_files.get("paths"), list) else []
1772 path_text = "; ".join(str(path) for path in paths[:4])
1773 primary_path = str(paths[0]) if paths else ""
1774 validation = _candidate_file_recently_validated(primary_path, recent_steps)
1775 if validation:
1776 return (
1777 "A highest-confidence candidate file path has recent positive validation evidence. "
1778 "Use it in the next bounded action or measurement instead of repeating existence checks. "
1779 f"Candidate path: {_clip_text(primary_path, 420)}. Evidence: {_clip_text(validation, 420)}."
1780 )
1781 return (
1782 "Concrete candidate file paths are available while file/path-dependent work is open. "
1783 "Validate likely candidates next with shell_exec before retrying downloads, searching for alternatives, "
1784 f"or recording no-file/no-progress claims. Candidate paths: {_clip_text(path_text, 520)}."
1785 )
1786 experiment_next_action = _latest_experiment_next_action_context(job)
1787 if experiment_next_action:
1788 return (
1789 "The latest measured experiment selected a concrete next action. "
1790 f"Next action: {_clip_text(experiment_next_action.get('next_action') or '', 520)}. "
1791 "Act on it with the appropriate tool, or use record_tasks/record_lesson if it is invalid or blocked. "
1792 "Do not bury it under more checkpoints or unrelated research."
1793 )
1794 milestone_validation = _milestone_validation_needed(job)
1795 if milestone_validation:
1796 return (
1797 f"Roadmap milestone '{milestone_validation.get('title')}' is ready for validation or is marked validating. "
1798 "Use record_milestone_validation with evidence and pass/fail/blocker status, then create follow-up tasks for gaps."
1799 )
1800 roadmap_staleness = _roadmap_staleness_context(job, recent_steps)
1801 if roadmap_staleness:
1802 return (
1803 "The roadmap has not advanced despite durable task/artifact activity. "
1804 "Use record_roadmap to mark the current milestone active/done/blocked, or record_milestone_validation "
1805 "if acceptance criteria can be judged from existing evidence, before more branch work."
1806 )
1807 if _roadmap_missing_for_broad_job(job):
1808 return (
1809 "The objective is broad enough to benefit from roadmap control. Use record_roadmap to define compact milestones, "
1810 "features, acceptance criteria, and validation checkpoints before expanding the task queue further."
1811 )
1812 evidence_step = _unpersisted_evidence_step(recent_steps)
1813 if evidence_step:
1814 return (
1815 f"You have unsaved evidence from step #{evidence_step['step_no']} "
1816 f"({evidence_step.get('tool_name') or evidence_step['kind']}). "
1817 "Your next tool call should usually be write_artifact. If this evidence taught a durable rule, record_lesson after saving it."
1818 )
1819 if _task_queue_exhausted(job):
1820 return (
1821 "All durable task branches are done, skipped, or blocked. Before more research or execution, "
1822 "use record_tasks to open the next concrete branch, or report_update if the operator needs a checkpoint."
1823 )
1824 for step in reversed(recent_steps[-5:]):
1825 if step.get("status") == "failed" and step.get("tool_name") == "read_artifact":
1826 output = step.get("output") if isinstance(step.get("output"), dict) else {}
1827 if "artifact not found" in str(output.get("error") or step.get("summary") or "").lower():
1828 return (
1829 "The last artifact read used a reference that does not exist. Do not invent or retry artifact ids. "
1830 "Use a valid recent artifact ref, call search_artifacts with a concrete query, or continue from "
1831 "already observed evidence with a durable record."
1832 )
1833 error = str(step.get("error") or "")
1834 if error == "artifact required before more research":
1835 return "The last blocked action needs write_artifact, not another search or browser action."
1836 if error == "task branch required before more work":
1837 return "Create or reopen a task branch with record_tasks before doing more research or execution."
1838 if error in {"duplicate tool call blocked", "similar search query blocked", "search loop blocked"}:
1839 output = step.get("output") if isinstance(step.get("output"), dict) else {}
1840 blocked_tool = str(output.get("blocked_tool") or "")
1841 if blocked_tool == "read_artifact":
1842 return "Do not read the same artifact again. Use its content to choose a concrete next action: inspect a specific item, record findings/tasks, or write a report artifact."
1843 if blocked_tool == "shell_exec":
1844 return "Do not rerun the same shell discovery command. Use the prior output to inspect a specific file/item, save it, or update findings/tasks."
1845 return "Change source, extract an existing result, save an artifact, or record a lesson about the failed strategy."
1846 return "No special constraint beyond taking one bounded useful action."
1847
1848
1849def _latest_evidence_grounding_block(recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
1850 resolution_after_block = False
1851 for step in reversed(recent_steps[-8:]):
1852 if (
1853 step.get("status") == "completed"
1854 and step.get("tool_name") in EVIDENCE_GROUNDING_RESOLUTION_TOOLS
1855 ):
1856 resolution_after_block = True
1857 continue
1858 if step.get("status") != "blocked":
1859 continue
1860 output = step.get("output") if isinstance(step.get("output"), dict) else {}
1861 if output.get("error") != "evidence grounding required":
1862 continue
1863 if resolution_after_block:
1864 return None
1865 grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
1866 return grounding or {"unsupported_tokens": []}
1867 return None
1868
1869
1870def _milestone_validation_needed(job: dict[str, Any]) -> dict[str, Any] | None:
1871 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
1872 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
1873 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
1874 for milestone in milestones:
1875 if not isinstance(milestone, dict):
1876 continue
1877 status = str(milestone.get("status") or "planned")
1878 validation_status = str(milestone.get("validation_status") or "not_started")
1879 if status == "validating" or validation_status == "pending":
1880 return milestone
1881 features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1882 if status == "active" and features and all(
1883 isinstance(feature, dict) and str(feature.get("status") or "planned") in {"done", "skipped"}
1884 for feature in features
1885 ):
1886 return milestone
1887 return None
1888
1889
1890def _tool_call_matches_pending_milestone_need(tool_name: str, args: dict[str, Any], milestone: dict[str, Any]) -> bool:
1891 if str(milestone.get("validation_status") or "").strip().lower() != "pending":
1892 return False
1893 if tool_name not in BRANCH_WORK_TOOLS:
1894 return False
1895 return _text_matches_pending_milestone_need(_json_value_text(args), milestone)
1896
1897
1898def _text_matches_pending_milestone_need(text: str, milestone: dict[str, Any]) -> bool:
1899 parts = [
1900 str(milestone.get("title") or ""),
1901 str(milestone.get("next_action") or ""),
1902 str(milestone.get("acceptance_criteria") or ""),
1903 str(milestone.get("evidence_needed") or ""),
1904 str(milestone.get("validation_evidence") or ""),
1905 str(milestone.get("validation_result") or ""),
1906 " ".join(str(item) for item in milestone.get("validation_issues") or [] if item),
1907 ]
1908 features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
1909 for feature in features:
1910 if not isinstance(feature, dict):
1911 continue
1912 parts.extend([
1913 str(feature.get("title") or ""),
1914 str(feature.get("goal") or ""),
1915 str(feature.get("acceptance_criteria") or ""),
1916 str(feature.get("evidence_needed") or ""),
1917 ])
1918 need_tokens = _substantive_next_action_tokens(" ".join(parts)) - MILESTONE_MATCH_STOPWORDS
1919 if not need_tokens:
1920 return False
1921 call_tokens = _substantive_next_action_tokens(text) - MILESTONE_MATCH_STOPWORDS
1922 if not call_tokens:
1923 return False
1924 return bool(need_tokens & call_tokens)
1925
1926
1927def _milestone_validation_call_matches_current(args: dict[str, Any], milestone: dict[str, Any]) -> bool:
1928 requested = _norm_task_key("", str(args.get("milestone") or args.get("title") or ""))
1929 if not requested:
1930 return False
1931 candidates = [
1932 _norm_task_key("", str(milestone.get("title") or "")),
1933 _norm_task_key("", str(milestone.get("key") or "")),
1934 ]
1935 for candidate in candidates:
1936 if not candidate:
1937 continue
1938 if requested == candidate or requested in candidate or candidate in requested:
1939 return True
1940 return False
1941
1942
1943def _normalize_milestone_validation_args_for_active_gate(
1944 tool_name: str,
1945 args: dict[str, Any],
1946 job: dict[str, Any],
1947) -> dict[str, Any]:
1948 if tool_name != "record_milestone_validation":
1949 return args
1950 milestone = _milestone_validation_needed(job)
1951 if not milestone or _milestone_validation_call_matches_current(args, milestone):
1952 return args
1953 if not _text_matches_pending_milestone_need(_json_value_text(args), milestone):
1954 return args
1955 normalized = dict(args)
1956 normalized["milestone"] = str(milestone.get("title") or args.get("milestone") or "")
1957 metadata = normalized.get("metadata") if isinstance(normalized.get("metadata"), dict) else {}
1958 normalized["metadata"] = {
1959 **metadata,
1960 "normalized_from_milestone": str(args.get("milestone") or ""),
1961 "normalized_to_active_gate": True,
1962 }
1963 return normalized
1964
1965
1966def _latest_experiment_next_action_context(job: dict[str, Any]) -> dict[str, Any] | None:
1967 experiments = _metadata_list(job, "experiment_ledger")
1968 for experiment in reversed(experiments):
1969 if not isinstance(experiment, dict):
1970 continue
1971 status = str(experiment.get("status") or "").strip().lower()
1972 next_action = str(experiment.get("next_action") or "").strip()
1973 if not next_action:
1974 continue
1975 if status in {"measured", "failed", "blocked"} or experiment.get("metric_value") is not None:
1976 return {
1977 "title": experiment.get("title"),
1978 "status": status,
1979 "metric_name": experiment.get("metric_name"),
1980 "metric_value": experiment.get("metric_value"),
1981 "next_action": next_action,
1982 }
1983 return None
1984
1985
1986def _experiment_next_action_requires_delivery(context: dict[str, Any] | None) -> bool:
1987 if not context:
1988 return False
1989 next_action = str(context.get("next_action") or "").lower()
1990 if not next_action:
1991 return False
1992 tokens = set(re.findall(r"[a-z][a-z0-9_-]+", next_action))
1993 if not tokens & EXPERIMENT_DELIVERY_ACTION_TERMS:
1994 return False
1995 return not bool(tokens & EXPERIMENT_INFORMATION_ACTION_TERMS)
1996
1997
1998def _experiment_next_action_failure_context(job: dict[str, Any], recent_steps: list[dict[str, Any]], *, window: int = 8) -> dict[str, Any] | None:
1999 context = _latest_experiment_next_action_context(job)
2000 if not _experiment_next_action_requires_delivery(context):
2001 return None
2002 latest_experiment_step_no = max(
2003 (
2004 _as_int(step.get("step_no"))
2005 for step in recent_steps
2006 if step.get("tool_name") == "record_experiment" and step.get("status") == "completed"
2007 ),
2008 default=0,
2009 )
2010 next_action = str(context.get("next_action") or "") if context else ""
2011 for step in reversed(_completed_or_failed_recent_steps(recent_steps)[-window:]):
2012 if step.get("tool_name") != "shell_exec":
2013 continue
2014 if latest_experiment_step_no and _as_int(step.get("step_no")) <= latest_experiment_step_no:
2015 continue
2016 text = _shell_step_failure_text(step)
2017 if not text.strip() or not _shell_output_has_missing_command(text):
2018 continue
2019 command = _step_command(step)
2020 if not _shell_command_matches_next_action(command, next_action):
2021 continue
2022 return {
2023 "step_no": step.get("step_no"),
2024 "command": command,
2025 "excerpt": text.strip(),
2026 "missing_commands": _missing_commands_from_shell_output(text),
2027 "missing_paths": _missing_paths_from_shell_output(text),
2028 "experiment_next_action": context,
2029 }
2030 return None
2031
2032
2033def _shell_command_looks_like_write(command: str) -> bool:
2034 text = command.strip()
2035 if not text:
2036 return False
2037 if re.match(r"(?is)^curl\b", text):
2038 download_flags = (
2039 r"(?:^|\s)(?:-o\s*\S+|-O\b|--output(?:=|\s+)\S+|--remote-name\b|--output-dir(?:=|\s+)\S+)"
2040 )
2041 if re.search(download_flags, text):
2042 return True
2043 if re.match(r"(?is)^(?:wget|aria2c)\b", text):
2044 return True
2045 write_patterns = [
2046 r"(?<!\d)>>?\s*[^&]",
2047 r"\b1>>?\s*[^&]",
2048 r"\btee\b",
2049 r"\bcat\s+>\b",
2050 r"\bpython[0-9.]*\b.*\bwrite_text\b",
2051 r"\bpython[0-9.]*\b.*\bopen\([^)]*,\s*['\"]w",
2052 r"\bsed\s+-i\b",
2053 ]
2054 return any(re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL) for pattern in write_patterns)
2055
2056
2057def _shell_command_looks_read_only(command: str) -> bool:
2058 text = command.strip()
2059 if not text:
2060 return False
2061 if _shell_command_looks_like_write(text):
2062 return False
2063 if READ_ONLY_SHELL_COMMAND_PATTERN.search(text):
2064 return True
2065 if re.match(r"(?is)^curl\b", text):
2066 mutating_flags = r"(?:^|\s)-X\s*(?:POST|PUT|PATCH|DELETE)\b|--request(?:=|\s+)(?:POST|PUT|PATCH|DELETE)\b|(?:^|\s)(?:-d|--data|--form|-F|-T|--upload-file)\b"
2067 return not bool(re.search(mutating_flags, text))
2068 return False
2069
2070
2071def _shell_command_supports_experiment_next_action(command: str, context: dict[str, Any] | None) -> bool:
2072 if not context:
2073 return False
2074 text = command.strip()
2075 if not text or not EXPERIMENT_NEXT_ACTION_VERIFY_SHELL_PATTERN.search(text):
2076 return False
2077 next_action = str(context.get("next_action") or "")
2078 if not next_action.strip():
2079 return False
2080 action_tokens = _substantive_next_action_tokens(next_action)
2081 if not action_tokens:
2082 return False
2083 command_tokens = _substantive_next_action_tokens(text)
2084 return bool(action_tokens & command_tokens)
2085
2086
2087def _shell_command_matches_next_action(command: str, next_action: str) -> bool:
2088 if not command.strip() or not next_action.strip():
2089 return False
2090 action_tokens = _substantive_next_action_tokens(next_action)
2091 command_tokens = _substantive_next_action_tokens(command)
2092 return bool(action_tokens & command_tokens)
2093
2094
2095def _substantive_next_action_tokens(text: str) -> set[str]:
2096 tokens = set()
2097 for token in re.findall(r"[a-z0-9][a-z0-9_.-]{2,}", text.lower()):
2098 token = token.strip("._-")
2099 if len(token) < 3:
2100 continue
2101 if token in TEXT_TOKEN_STOPWORDS or token in EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS:
2102 continue
2103 tokens.add(token)
2104 for part in re.split(r"[._/-]+", token):
2105 if len(part) >= 3 and part not in TEXT_TOKEN_STOPWORDS and part not in EXPERIMENT_NEXT_ACTION_VERIFY_STOPWORDS:
2106 tokens.add(part)
2107 return tokens
2108
2109
2110def _roadmap_staleness_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
2111 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2112 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
2113 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
2114 if not milestones:
2115 return None
2116 if any(step.get("tool_name") in {"record_roadmap", "record_milestone_validation"} for step in recent_steps):
2117 return None
2118 if any(
2119 isinstance(milestone, dict)
2120 and (
2121 str(milestone.get("status") or "planned") != "planned"
2122 or str(milestone.get("validation_status") or "not_started") != "not_started"
2123 )
2124 for milestone in milestones
2125 ):
2126 return None
2127 tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
2128 completed_artifacts = [
2129 step for step in recent_steps
2130 if step.get("status") == "completed" and step.get("tool_name") == "write_artifact"
2131 ]
2132 task_updates = [
2133 step for step in recent_steps
2134 if step.get("status") == "completed" and step.get("tool_name") == "record_tasks"
2135 ]
2136 if len(completed_artifacts) < 2 and len(task_updates) < 2 and len(tasks) < 8:
2137 return None
2138 return {
2139 "title": roadmap.get("title") or "Roadmap",
2140 "status": roadmap.get("status") or "planned",
2141 "milestone_count": len(milestones),
2142 "task_count": len(tasks),
2143 "artifact_count": len(completed_artifacts),
2144 "task_update_count": len(task_updates),
2145 }
2146
2147
2148def _roadmap_missing_for_broad_job(job: dict[str, Any]) -> bool:
2149 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2150 if isinstance(metadata.get("roadmap"), dict):
2151 return False
2152 objective = str(job.get("objective") or "")
2153 tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
2154 if len(tasks) >= 6:
2155 return True
2156 words = re.findall(r"[A-Za-z0-9_]+", objective)
2157 broad_terms = {"build", "create", "develop", "implement", "research", "improve", "optimize", "migrate", "write", "analyze"}
2158 return len(words) >= 14 and any(term in objective.lower() for term in broad_terms)
2159
2160
2161def _task_queue_exhausted(job: dict[str, Any]) -> bool:
2162 tasks = _metadata_list(job, "task_queue")
2163 if not tasks:
2164 return False
2165 runnable = {"open", "active"}
2166 return not any(str(task.get("status") or "open").strip().lower() in runnable for task in tasks)
2167
2168
2169def _task_queue_saturation_context(job: dict[str, Any], args: dict[str, Any]) -> dict[str, Any] | None:
2170 tasks = _metadata_list(job, "task_queue")
2171 objective_tasks = [task for task in tasks if not _is_guard_recovery_task(task)]
2172 open_tasks = [task for task in objective_tasks if str(task.get("status") or "open").strip().lower() in {"open", "active"}]
2173 incoming = args.get("tasks") if isinstance(args.get("tasks"), list) else []
2174 if not incoming:
2175 return None
2176 existing_keys = {
2177 _norm_task_key(str(task.get("parent") or ""), str(task.get("title") or ""))
2178 for task in tasks
2179 }
2180 semantic_matches = []
2181 new_open_titles = []
2182 new_titles = []
2183 for task in incoming:
2184 if not isinstance(task, dict):
2185 continue
2186 status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
2187 title = str(task.get("title") or task.get("name") or "").strip()
2188 parent = str(task.get("parent") or "")
2189 key = _norm_task_key(parent, title)
2190 matched_existing = key in existing_keys
2191 semantic_match = None
2192 if not matched_existing and (len(objective_tasks) > TASK_QUEUE_TOTAL_SOFT_LIMIT or len(open_tasks) >= TASK_QUEUE_SATURATION_OPEN_TASKS):
2193 semantic_match = find_semantic_task_match(
2194 title=title,
2195 parent=parent,
2196 tasks=[existing for existing in tasks if not _is_guard_recovery_task(existing)],
2197 )
2198 matched_existing = bool(semantic_match)
2199 if semantic_match:
2200 semantic_matches.append({
2201 "title": title,
2202 "matched_title": semantic_match.get("title"),
2203 "score": semantic_match.get("score"),
2204 })
2205 if not matched_existing:
2206 new_titles.append(title)
2207 if status in {"open", "active"} and not matched_existing:
2208 new_open_titles.append(title)
2209 projected_total = len(objective_tasks) + len(new_titles)
2210 projected_open = len(open_tasks) + len(new_open_titles)
2211 if projected_total > TASK_QUEUE_TOTAL_SOFT_LIMIT and new_titles:
2212 return {
2213 "reason": "total task queue is too large",
2214 "total_count": len(objective_tasks),
2215 "projected_total_count": projected_total,
2216 "total_threshold": TASK_QUEUE_TOTAL_SOFT_LIMIT,
2217 "open_count": len(open_tasks),
2218 "open_titles": [
2219 str(task.get("title") or "").strip()
2220 for task in open_tasks[:8]
2221 if str(task.get("title") or "").strip()
2222 ],
2223 "new_count": len(new_titles),
2224 "new_titles": new_titles[:8],
2225 "semantic_matches": semantic_matches[:8],
2226 "recovery_task_count": len(tasks) - len(objective_tasks),
2227 }
2228 if projected_open < TASK_QUEUE_SATURATION_OPEN_TASKS:
2229 return None
2230 if not new_open_titles:
2231 return None
2232 return {
2233 "reason": "too many open tasks",
2234 "open_count": len(open_tasks),
2235 "projected_open_count": projected_open,
2236 "open_threshold": TASK_QUEUE_SATURATION_OPEN_TASKS,
2237 "total_count": len(objective_tasks),
2238 "open_titles": [
2239 str(task.get("title") or "").strip()
2240 for task in open_tasks[:8]
2241 if str(task.get("title") or "").strip()
2242 ],
2243 "new_open_count": len(new_open_titles),
2244 "new_open_titles": new_open_titles[:8],
2245 "semantic_matches": semantic_matches[:8],
2246 "recovery_task_count": len(tasks) - len(objective_tasks),
2247 }
2248
2249
2250def _recent_task_queue_saturation_context(recent_steps: list[dict[str, Any]], *, window: int = 6) -> dict[str, Any] | None:
2251 for step in reversed(recent_steps[-window:]):
2252 if step.get("tool_name") != "record_tasks" or step.get("status") != "blocked":
2253 continue
2254 output = step.get("output") if isinstance(step.get("output"), dict) else {}
2255 if output.get("error") != "task queue saturated":
2256 continue
2257 task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
2258 return {
2259 "step_no": step.get("step_no"),
2260 "reason": task_queue.get("reason") or "task queue saturated",
2261 "open_count": task_queue.get("open_count"),
2262 "total_count": task_queue.get("total_count"),
2263 "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
2264 }
2265 return None
2266
2267
2268def _record_task_backlog_pressure(
2269 *,
2270 db: AgentDB,
2271 job_id: str,
2272 step_no: int | str | None,
2273 task_queue: dict[str, Any],
2274 source: str,
2275) -> None:
2276 if not isinstance(task_queue, dict) or not task_queue:
2277 return
2278 pressure = {
2279 "detected_at": datetime.now(timezone.utc).isoformat(),
2280 "source": source,
2281 "latest_step_no": step_no,
2282 "reason": task_queue.get("reason") or "task queue saturated",
2283 "open_count": task_queue.get("open_count"),
2284 "total_count": task_queue.get("total_count"),
2285 "projected_open_count": task_queue.get("projected_open_count"),
2286 "projected_total_count": task_queue.get("projected_total_count"),
2287 "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
2288 }
2289 db.update_job_metadata(job_id, {"task_backlog_pressure": pressure})
2290 db.append_agent_update(
2291 job_id,
2292 (
2293 "Task backlog pressure is active; next worker turns should execute, complete, block, skip, "
2294 "or consolidate existing tasks instead of adding new branches."
2295 ),
2296 category="blocked",
2297 metadata={"task_backlog_pressure": pressure},
2298 )
2299
2300
2301def _clear_stale_task_backlog_pressure(db: AgentDB, job_id: str, job: dict[str, Any]) -> bool:
2302 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2303 pressure = metadata.get("task_backlog_pressure")
2304 if not isinstance(pressure, dict) or not pressure:
2305 return False
2306 if _current_task_backlog_pressure_context(job):
2307 return False
2308 cleared = dict(pressure)
2309 cleared["resolved_at"] = datetime.now(timezone.utc).isoformat()
2310 db.update_job_metadata(job_id, {"task_backlog_pressure": {}})
2311 db.append_agent_update(
2312 job_id,
2313 "Task backlog pressure cleared; the active task queue is back under saturation limits.",
2314 category="progress",
2315 metadata={"cleared_task_backlog_pressure": cleared},
2316 )
2317 return True
2318
2319
2320def _repeated_task_queue_saturation_context(recent_steps: list[dict[str, Any]], *, window: int = 8, threshold: int = 2) -> dict[str, Any] | None:
2321 matches = []
2322 for step in recent_steps[-window:]:
2323 if step.get("tool_name") != "record_tasks" or step.get("status") != "blocked":
2324 continue
2325 output = step.get("output") if isinstance(step.get("output"), dict) else {}
2326 if output.get("error") == "task queue saturated":
2327 matches.append(step)
2328 if len(matches) < threshold:
2329 return None
2330 latest = matches[-1]
2331 output = latest.get("output") if isinstance(latest.get("output"), dict) else {}
2332 task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
2333 return {
2334 "count": len(matches),
2335 "latest_step_no": latest.get("step_no"),
2336 "reason": task_queue.get("reason") or "task queue saturated",
2337 }
2338
2339
2340def _task_planning_stagnation_context(job: dict[str, Any]) -> dict[str, Any] | None:
2341 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
2342 streak = _as_int(metadata.get("task_planning_checkpoint_streak"))
2343 if streak < TASK_PLANNING_STAGNATION_CHECKPOINTS:
2344 return None
2345 tasks = _metadata_list(job, "task_queue")
2346 open_tasks = [
2347 task
2348 for task in tasks
2349 if str(task.get("status") or "open").strip().lower().replace(" ", "_") in {"open", "active"}
2350 ]
2351 return {
2352 "task_only_checkpoints": streak,
2353 "threshold": TASK_PLANNING_STAGNATION_CHECKPOINTS,
2354 "total_tasks": len(tasks),
2355 "open_tasks": len(open_tasks),
2356 }
2357
2358
2359def _is_guard_recovery_task(task: dict[str, Any]) -> bool:
2360 metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
2361 return bool(metadata.get("guard_recovery")) or str(task.get("title") or "").strip().lower().startswith("resolve guard:")
2362
2363
2364def _record_tasks_adds_new_open_work(args: dict[str, Any], job: dict[str, Any]) -> bool:
2365 incoming = args.get("tasks") if isinstance(args.get("tasks"), list) else []
2366 if not incoming:
2367 incoming = [args]
2368 tasks = _metadata_list(job, "task_queue")
2369 existing_keys = {
2370 _norm_task_key(str(task.get("parent") or ""), str(task.get("title") or ""))
2371 for task in tasks
2372 }
2373 for task in incoming:
2374 if not isinstance(task, dict):
2375 continue
2376 title = str(task.get("title") or task.get("name") or "").strip()
2377 if not title:
2378 continue
2379 status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
2380 key = _norm_task_key(str(task.get("parent") or ""), title)
2381 if status in {"open", "active"} and key not in existing_keys:
2382 return True
2383 return False
2384
2385
2386def _norm_task_key(parent: str, title: str) -> str:
2387 return task_key(parent, title)
2388
2389
2390def _parse_tool_result(raw: str) -> dict[str, Any]:
2391 try:
2392 parsed = json.loads(raw)
2393 return parsed if isinstance(parsed, dict) else {"result": parsed}
2394 except json.JSONDecodeError:
2395 return {"result": raw}
2396
2397
2398def _load_program_text(config: AppConfig, job_id: str) -> str:
2399 path = config.runtime.jobs_dir / job_id / "program.md"
2400 if not path.exists():
2401 return ""
2402 return path.read_text(encoding="utf-8", errors="replace")
2403
2404
2405def _browser_warning_context(output: dict[str, Any]) -> dict[str, str] | None:
2406 data = output.get("data") if isinstance(output.get("data"), dict) else {}
2407 title = str(data.get("title") or "")
2408 url = str(data.get("url") or data.get("origin") or output.get("url") or "")
2409 snapshot = str(output.get("snapshot") or data.get("snapshot") or output.get("data") or "")
2410 reason = anti_bot_reason(title, url, snapshot)
2411 if not reason:
2412 return None
2413 return {"reason": reason, "url": url, "title": title}
2414
2415
2416def _recent_anti_bot_context(recent_steps: list[dict[str, Any]], *, window: int = 8) -> dict[str, Any] | None:
2417 for step in reversed(recent_steps[-window:]):
2418 if step.get("status") != "completed" or step.get("tool_name") not in {"browser_navigate", "browser_snapshot"}:
2419 continue
2420 output = step.get("output") if isinstance(step.get("output"), dict) else {}
2421 warning = _browser_warning_context(output)
2422 if warning:
2423 return {**warning, "step_id": step.get("id"), "step_no": step.get("step_no")}
2424 return None
2425
2426
2427def _artifact_args_acknowledge_block(args: dict[str, Any]) -> bool:
2428 text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "content")).lower()
2429 return any(term in text for term in ANTI_BOT_ACK_TERMS)
2430
2431
2432def _same_source_url(left: str, right: str) -> bool:
2433 if not left or not right:
2434 return False
2435 return left.split("#", 1)[0].rstrip("/") == right.split("#", 1)[0].rstrip("/")
2436
2437
2438def _normalized_source_url(value: str) -> str:
2439 value = str(value or "").strip()
2440 if not value:
2441 return ""
2442 if "://" not in value:
2443 return f"https://{value}"
2444 return value
2445
2446
2447def _source_host(value: str) -> str:
2448 parsed = urlparse(_normalized_source_url(value))
2449 return parsed.netloc.lower().removeprefix("www.")
2450
2451
2452def _source_matches(left: str, right: str) -> bool:
2453 if _same_source_url(left, right):
2454 return True
2455 left_host, left_path = _source_path_key(left)
2456 right_host, right_path = _source_path_key(right)
2457 if not left_host or left_host != right_host:
2458 return False
2459 if right_path in {"", "/"} or left_path in {"", "/"}:
2460 return False
2461 return left_path == right_path or left_path.startswith(right_path + "/") or right_path.startswith(left_path + "/")
2462
2463
2464def _source_path_key(value: str) -> tuple[str, str]:
2465 parsed = urlparse(_normalized_source_url(value))
2466 host = parsed.netloc.lower().removeprefix("www.")
2467 path = (parsed.path or "/").rstrip("/") or "/"
2468 return host, path
2469
2470
2471def _shell_source_matches(left: str, right: str) -> bool:
2472 if _same_source_url(left, right):
2473 return True
2474 left_host, left_path = _source_path_key(left)
2475 right_host, right_path = _source_path_key(right)
2476 if not left_host or left_host != right_host:
2477 return False
2478 if right_path in {"", "/"} or left_path in {"", "/"}:
2479 return False
2480 return left_path == right_path or left_path.startswith(right_path + "/") or right_path.startswith(left_path + "/")
2481
2482
2483def _urls_from_text(text: str) -> list[str]:
2484 urls: list[str] = []
2485 seen: set[str] = set()
2486 for match in re.finditer(r"https?://[^\s'\"<>)}\]]+", str(text or "")):
2487 url = match.group(0).rstrip(".,;:")
2488 key = url.lower()
2489 if key in seen:
2490 continue
2491 seen.add(key)
2492 urls.append(url)
2493 return urls
2494
2495
2496def _source_url_has_path(value: str) -> bool:
2497 _, path = _source_path_key(value)
2498 return path not in {"", "/"}
2499
2500
2501def _shell_guard_urls(text: str) -> list[str]:
2502 urls = _urls_from_text(text)
2503 if len(urls) <= 1:
2504 return urls
2505 path_urls = [url for url in urls if _source_url_has_path(url)]
2506 return path_urls or urls
2507
2508
2509SHELL_PLACEHOLDER_URL_HOSTS = {
2510 "domain",
2511 "endpoint",
2512 "example",
2513 "file",
2514 "host",
2515 "input",
2516 "output",
2517 "path",
2518 "placeholder",
2519 "source",
2520 "target",
2521 "url",
2522 "uri",
2523}
2524
2525SHELL_PLACEHOLDER_FIELD_NAMES = (
2526 "command",
2527 "domain",
2528 "endpoint",
2529 "file",
2530 "host",
2531 "input",
2532 "output",
2533 "path",
2534 "source",
2535 "target",
2536 "url",
2537 "uri",
2538)
2539
2540
2541def _shell_placeholder_context(command: str) -> dict[str, Any] | None:
2542 command = str(command or "").strip()
2543 if not command:
2544 return None
2545 if "```" in command:
2546 return {
2547 "kind": "markdown_code_fence",
2548 "value": "```",
2549 "reason": "command contains markdown code fences instead of executable shell only",
2550 }
2551 if re.search(r"(?m)^\s*-{3,}\s+\S", command) or re.search(r"(?m)^\s*\d+\.\s+```", command):
2552 return {
2553 "kind": "markdown_prose",
2554 "value": "markdown prose",
2555 "reason": "command contains copied markdown prose instead of executable shell only",
2556 }
2557 for url in _urls_from_text(command):
2558 parsed = urlparse(url)
2559 host = (parsed.hostname or "").lower()
2560 if host in SHELL_PLACEHOLDER_URL_HOSTS:
2561 return {
2562 "kind": "placeholder_url",
2563 "value": url,
2564 "reason": "URL host looks like an unresolved placeholder field",
2565 }
2566 fields = "|".join(re.escape(name) for name in SHELL_PLACEHOLDER_FIELD_NAMES)
2567 placeholder_patterns = [
2568 rf"<\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*>",
2569 rf"\{{\{{\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*\}}\}}",
2570 rf"\{{\s*(?:{fields})(?:[-_ ][A-Za-z0-9]+)?\s*\}}",
2571 r"</?\s*(?:parameter|arguments?|tool_call|function_call)\b[^>]*>",
2572 r"\b(?:YOUR|REPLACE|TODO|INSERT)_[A-Z0-9_]{3,}\b",
2573 ]
2574 for pattern in placeholder_patterns:
2575 match = re.search(pattern, command, flags=re.IGNORECASE)
2576 if match:
2577 return {
2578 "kind": "placeholder_token",
2579 "value": match.group(0),
2580 "reason": "command contains an unresolved placeholder token",
2581 }
2582 return None
2583
2584
2585def _shell_syntax_preflight_context(command: str) -> dict[str, Any] | None:
2586 command = str(command or "").strip()
2587 if not command:
2588 return None
2589 try:
2590 shlex.split(command, posix=True)
2591 except ValueError as exc:
2592 return {
2593 "kind": "shell_syntax",
2594 "value": str(exc),
2595 "reason": "command is not parseable shell syntax; usually an unmatched quote, escape, or partial pasted command",
2596 }
2597 return None
2598
2599
2600def _source_failure_family_url(value: str) -> str:
2601 parsed = urlparse(_normalized_source_url(value))
2602 if not parsed.scheme or not parsed.netloc:
2603 return ""
2604 segments = [segment for segment in (parsed.path or "").split("/") if segment]
2605 if len(segments) < 2:
2606 return ""
2607 last = segments[-1]
2608 looks_file_like = "." in last
2609 family_segments = segments[:-1] if looks_file_like else segments
2610 if len(family_segments) < 2:
2611 return ""
2612 return f"{parsed.scheme}://{parsed.netloc}/{'/'.join(family_segments)}"
2613
2614
2615def _known_bad_sources(job: dict[str, Any]) -> list[dict[str, Any]]:
2616 bad_sources = []
2617 for source in _metadata_list(job, "source_ledger"):
2618 if (
2619 _as_float(source.get("usefulness_score")) < 0.2
2620 and _as_int(source.get("yield_count")) <= 0
2621 and (_as_int(source.get("fail_count")) > 0 or source.get("warnings"))
2622 ):
2623 bad_sources.append(source)
2624 return bad_sources
2625
2626
2627def _known_bad_source_for_call(name: str, args: dict[str, Any], job: dict[str, Any]) -> dict[str, Any] | None:
2628 if name not in {"browser_navigate", "web_extract", "shell_exec"}:
2629 return None
2630 bad_sources = _known_bad_sources(job)
2631 if not bad_sources:
2632 return None
2633 urls: list[str] = []
2634 if name == "browser_navigate":
2635 urls = [str(args.get("url") or "")]
2636 elif isinstance(args.get("urls"), list):
2637 urls = [str(url) for url in args["urls"]]
2638 elif name == "shell_exec":
2639 urls = _shell_guard_urls(str(args.get("command") or ""))
2640 for url in [url for url in urls if url.strip()]:
2641 for source in bad_sources:
2642 source_value = str(source.get("source") or "")
2643 if not source_value:
2644 continue
2645 matches = _shell_source_matches(url, source_value) if name == "shell_exec" else _source_matches(url, source_value)
2646 if matches:
2647 return source
2648 if name == "shell_exec":
2649 source_family = _source_failure_family_url(source_value)
2650 if source_family and _shell_source_matches(url, source_family):
2651 metadata = source.get("metadata") if isinstance(source.get("metadata"), dict) else {}
2652 return {
2653 **source,
2654 "source": source_family,
2655 "source_type": "shell_exec_family",
2656 "metadata": {**metadata, "source_family": True, "source_family_from": source_value},
2657 }
2658 return None
2659
2660
2661def _tool_signature(name: str, args: dict[str, Any]) -> str:
2662 return f"{name}:{json.dumps(args, ensure_ascii=False, sort_keys=True)}"
2663
2664
2665def _duplicate_recent_tool_call(
2666 name: str,
2667 args: dict[str, Any],
2668 recent_steps: list[dict[str, Any]],
2669 *,
2670 window: int = 24,
2671) -> dict[str, Any] | None:
2672 if name in {"browser_snapshot", "defer_job"}:
2673 return None
2674 signature = _tool_signature(name, args)
2675 for step in reversed(recent_steps[-window:]):
2676 if step.get("status") != "completed" or step.get("tool_name") != name:
2677 continue
2678 input_data = step.get("input") or {}
2679 previous_args = input_data.get("arguments") if isinstance(input_data, dict) else None
2680 if isinstance(previous_args, dict) and _tool_signature(name, previous_args) == signature:
2681 return step
2682 return None
2683
2684
2685def _completed_recent_steps(recent_steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
2686 return [step for step in recent_steps if step.get("status") == "completed"]
2687
2688
2689def _completed_or_failed_recent_steps(recent_steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
2690 return [step for step in recent_steps if step.get("status") in {"completed", "failed"}]
2691
2692
2693BROWSER_RUNTIME_UNAVAILABLE_TERMS = (
2694 "browser runtime unavailable",
2695 "browser not found",
2696 "browser executable",
2697 "chrome not found",
2698 "could not find chrome",
2699 "chromium executable",
2700 "executable doesn't exist",
2701 "playwright browser cache",
2702 "puppeteer browser cache",
2703)
2704
2705
2706SELF_DEFER_TERMS = (
2707 "next worker turn",
2708 "next worker step",
2709 "picked up by next worker",
2710 "picked up by the next worker",
2711 "picked up by next turn",
2712 "picked up by the next turn",
2713)
2714
2715
2716def _is_browser_tool(name: str | None) -> bool:
2717 return bool(str(name or "").startswith("browser_"))
2718
2719
2720def _browser_runtime_unavailable_context(
2721 recent_steps: list[dict[str, Any]],
2722 *,
2723 window: int = 512,
2724) -> dict[str, Any] | None:
2725 latest_browser_success_no = max(
2726 (
2727 int(step.get("step_no") or 0)
2728 for step in recent_steps[-window:]
2729 if _is_browser_tool(step.get("tool_name")) and step.get("status") == "completed"
2730 ),
2731 default=0,
2732 )
2733 for step in reversed(recent_steps[-window:]):
2734 if not _is_browser_tool(step.get("tool_name")):
2735 continue
2736 step_no = int(step.get("step_no") or 0)
2737 if step_no <= latest_browser_success_no:
2738 continue
2739 if step.get("status") not in {"failed", "blocked"}:
2740 continue
2741 output = step.get("output") if isinstance(step.get("output"), dict) else {}
2742 text = " ".join(
2743 str(part or "")
2744 for part in (
2745 step.get("summary"),
2746 step.get("error"),
2747 output.get("error"),
2748 output.get("summary"),
2749 output.get("stderr"),
2750 output.get("stdout"),
2751 )
2752 ).lower()
2753 if any(term in text for term in BROWSER_RUNTIME_UNAVAILABLE_TERMS):
2754 error = str(output.get("error") or step.get("error") or step.get("summary") or "")
2755 return {
2756 "step_no": step.get("step_no"),
2757 "tool": step.get("tool_name"),
2758 "status": step.get("status"),
2759 "error": _clip_text(error, 500),
2760 }
2761 return None
2762
2763
2764def _self_defer_context(args: dict[str, Any]) -> dict[str, Any] | None:
2765 reason = str(args.get("reason") or "")
2766 next_action = str(args.get("next_action") or "")
2767 text = f"{reason} {next_action}".lower()
2768 matched = next((term for term in SELF_DEFER_TERMS if term in text), "")
2769 if not matched and next_action.strip() and not reason.strip():
2770 matched = "missing wait reason"
2771 if not matched:
2772 return None
2773 return {
2774 "matched": matched,
2775 "reason": reason,
2776 "next_action": next_action,
2777 }
2778
2779
2780EVIDENCE_GROUNDED_TOOLS = {
2781 "record_experiment",
2782 "record_findings",
2783 "record_lesson",
2784 "record_memory_graph",
2785 "record_roadmap",
2786 "report_update",
2787 "write_artifact",
2788}
2789NARRATIVE_EVIDENCE_GROUNDED_TOOLS = {
2790 "record_findings",
2791 "record_lesson",
2792 "record_memory_graph",
2793 "record_roadmap",
2794 "report_update",
2795 "write_artifact",
2796}
2797EVIDENCE_GROUNDING_RESOLUTION_TOOLS = {
2798 "record_experiment",
2799 "record_findings",
2800 "record_lesson",
2801 "record_memory_graph",
2802 "record_milestone_validation",
2803 "record_roadmap",
2804 "record_source",
2805 "record_tasks",
2806 "report_update",
2807 "write_artifact",
2808}
2809EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS = {
2810 "record_experiment",
2811 "record_findings",
2812 "record_lesson",
2813 "record_milestone_validation",
2814 "record_roadmap",
2815 "record_source",
2816 "record_tasks",
2817}
2818EVIDENCE_CHECKPOINT_ACCOUNTING_TOOLS = EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS | {"guard_recovery"}
2819EVIDENCE_CHECKPOINT_PROMPT_TOOLS = {
2820 "record_experiment",
2821 "record_findings",
2822 "record_lesson",
2823 "record_source",
2824}
2825EVIDENCE_TOKEN_IGNORE = {
2826 "acceptance",
2827 "action",
2828 "actions",
2829 "active",
2830 "agent",
2831 "artifact",
2832 "api",
2833 "baseline",
2834 "branch",
2835 "branches",
2836 "candidate",
2837 "candidates",
2838 "cdn",
2839 "checkpoint",
2840 "compare",
2841 "complete",
2842 "constraint",
2843 "criteria",
2844 "current",
2845 "data",
2846 "deliverable",
2847 "direct",
2848 "done",
2849 "download",
2850 "downloadable",
2851 "downloaded",
2852 "downloading",
2853 "downloads",
2854 "discovered",
2855 "discovery",
2856 "environment",
2857 "existing",
2858 "evidence",
2859 "experiment",
2860 "experiments",
2861 "feature",
2862 "features",
2863 "file",
2864 "files",
2865 "format",
2866 "finding",
2867 "findings",
2868 "file-level",
2869 "html",
2870 "http",
2871 "https",
2872 "inspect",
2873 "inspection",
2874 "investigate",
2875 "investigation",
2876 "json",
2877 "goal",
2878 "gguf",
2879 "hardware",
2880 "improve",
2881 "located",
2882 "memory",
2883 "metric",
2884 "milestone",
2885 "milestones",
2886 "model",
2887 "next",
2888 "observation",
2889 "observations",
2890 "open",
2891 "oid",
2892 "output",
2893 "outputs",
2894 "plan",
2895 "planned",
2896 "pending",
2897 "priority",
2898 "progress",
2899 "parse",
2900 "parsed",
2901 "parsing",
2902 "record",
2903 "report",
2904 "rest",
2905 "research",
2906 "result",
2907 "roadmap",
2908 "runtime",
2909 "search",
2910 "server",
2911 "source",
2912 "sources",
2913 "status",
2914 "sha",
2915 "sha256",
2916 "step",
2917 "steps",
2918 "task",
2919 "tasks",
2920 "test",
2921 "throughput",
2922 "tool",
2923 "tools",
2924 "false",
2925 "none",
2926 "null",
2927 "true",
2928 "url",
2929 "usable",
2930 "unvalidated",
2931 "valid",
2932 "validity",
2933 "validate",
2934 "validated",
2935 "validating",
2936 "validation",
2937 "worker",
2938 "xml",
2939 "yaml",
2940 "yml",
2941 "confirmed",
2942 "consider",
2943 "checking",
2944 "ongoing",
2945 "proceed",
2946 "proceeding",
2947}
2948EVIDENCE_TOKEN_IGNORE.update({f"p{index}" for index in range(10)})
2949STALE_CLAIM_TOKEN_IGNORE = {
2950 "api",
2951 "ascii",
2952 "blocked",
2953 "broken",
2954 "cdn",
2955 "cli",
2956 "critical",
2957 "cpu",
2958 "cuda",
2959 "discovered",
2960 "ggml",
2961 "gguf",
2962 "gpu",
2963 "hf_token",
2964 "html",
2965 "http",
2966 "https",
2967 "incomplete",
2968 "json",
2969 "lfs",
2970 "not_found",
2971 "oid",
2972 "onnx",
2973 "planned",
2974 "python",
2975 "python3",
2976 "ram",
2977 "rest",
2978 "severe",
2979 "sha",
2980 "sha256",
2981 "vram",
2982 "xet",
2983 "xml",
2984 "yaml",
2985 "yml",
2986}
2987NEGATIVE_EXISTENCE_MARKERS = (
2988 "0 files",
2989 "0 results",
2990 "cannot access",
2991 "does not exist",
2992 "failed to find",
2993 "has not been",
2994 "is not installed",
2995 "missing",
2996 "no ",
2997 "no such",
2998 "none",
2999 "not available",
3000 "not detected",
3001 "not downloaded",
3002 "not found",
3003 "not installed",
3004 "unavailable",
3005 "was not",
3006 "without",
3007)
3008NEGATIVE_ROLE_CLASSIFICATION_MARKERS = (
3009 "not a primary",
3010 "not a required",
3011 "not a target",
3012 "not an expected",
3013 "not suitable as",
3014 "not suitable for",
3015 "not the expected",
3016 "not the needed",
3017 "not the primary",
3018 "not the required",
3019 "not the target",
3020 "not usable as",
3021 "not usable for",
3022 "only support",
3023 "support file",
3024 "support files",
3025)
3026EVIDENCE_NEGATIVE_LINE_MARKERS = (
3027 "0 files",
3028 "0 results",
3029 "cannot access",
3030 "denied",
3031 "does not exist",
3032 "error",
3033 "failed",
3034 "failure",
3035 "has not been",
3036 "missing",
3037 "no such",
3038 "not available",
3039 "not detected",
3040 "not downloaded",
3041 "not found",
3042 "not installed",
3043 "permission",
3044 "timeout",
3045 "unavailable",
3046 "was not",
3047)
3048
3049
3050def _stale_claim_tokens_from_unsupported(tokens: list[str], *, reference_text: str = "") -> list[str]:
3051 stale_tokens: list[str] = []
3052 seen: set[str] = set()
3053 reference_norm = _normalize_claim_text(reference_text)
3054 for token in tokens:
3055 cleaned = str(token or "").strip()
3056 if not cleaned:
3057 continue
3058 key = cleaned.lower()
3059 if key in seen or key in STALE_CLAIM_TOKEN_IGNORE or key in EVIDENCE_TOKEN_IGNORE:
3060 continue
3061 if reference_norm and _normalize_claim_text(cleaned) in reference_norm:
3062 continue
3063 if _looks_like_generated_or_file_token(cleaned):
3064 continue
3065 if len(cleaned) < 4:
3066 continue
3067 distinctive = any(ch.isalpha() for ch in cleaned) and any(ch.isdigit() for ch in cleaned)
3068 distinctive = distinctive or (cleaned.isupper() and len(cleaned) >= 4)
3069 if not distinctive:
3070 continue
3071 seen.add(key)
3072 stale_tokens.append(cleaned)
3073 return stale_tokens
3074
3075
3076def _looks_like_generated_or_file_token(token: str) -> bool:
3077 lowered = token.lower()
3078 if lowered.startswith((
3079 "art_",
3080 "step_",
3081 "step-",
3082 "shell_",
3083 "shell-",
3084 "web_",
3085 "web-",
3086 "episode-",
3087 "fact-",
3088 "source-",
3089 "quality-",
3090 "constraint-",
3091 "baseline-",
3092 "question-",
3093 "verified_",
3094 "verified-",
3095 "timeout_",
3096 "timeout-",
3097 )):
3098 return True
3099 if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
3100 return True
3101 if lowered.startswith(("python-", "pip", "pip3")):
3102 return True
3103 if "_" in lowered and any(ch.isdigit() for ch in lowered) and any(ch.isalpha() for ch in lowered):
3104 return True
3105 return False
3106
3107
3108def _normalize_claim_text(text: str) -> str:
3109 return re.sub(r"[^a-z0-9]+", "", str(text or "").lower())
3110
3111
3112def _evidence_grounding_context(
3113 job: dict[str, Any],
3114 recent_steps: list[dict[str, Any]],
3115 *,
3116 tool_name: str,
3117 args: dict[str, Any],
3118 window: int = 8,
3119) -> dict[str, Any] | None:
3120 if tool_name not in EVIDENCE_GROUNDED_TOOLS:
3121 return None
3122 full_proposed_text = _json_text(args)
3123 proposed_text = _evidence_grounding_proposed_text(tool_name, args)
3124 if len(full_proposed_text.strip()) < 80:
3125 return None
3126 cited_steps = _cited_step_numbers(full_proposed_text)
3127 evidence_text = _recent_evidence_text(job, recent_steps, window=window, step_numbers=cited_steps or None)
3128 fresh_evidence_text = _recent_evidence_text(
3129 job,
3130 recent_steps,
3131 window=window,
3132 step_numbers=cited_steps or None,
3133 include_durable=False,
3134 include_job_context=False,
3135 )
3136 recent_grounding_paths = _candidate_file_paths_from_recent_grounding_blocks(recent_steps, window=window)
3137 if len(evidence_text.strip()) < 80 and not recent_grounding_paths:
3138 return None
3139 job_reference_text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind"))
3140 proposed_tokens = [
3141 token
3142 for token in _concrete_evidence_tokens_for_grounding(tool_name, proposed_text)
3143 if not _grounding_token_in_reference_text(token, job_reference_text)
3144 ]
3145 positive_path_conflicts = _positive_path_claim_conflicts_for_grounding(
3146 tool_name=tool_name,
3147 proposed_text=proposed_text,
3148 full_proposed_text=full_proposed_text,
3149 fresh_evidence_text=fresh_evidence_text,
3150 )
3151 if positive_path_conflicts:
3152 conflict_paths = [item["path"] for item in positive_path_conflicts]
3153 return {
3154 "unsupported_tokens": conflict_paths[:12],
3155 "negative_path_conflicts": positive_path_conflicts[:6],
3156 "evidence_steps": [
3157 step.get("step_no")
3158 for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3159 ],
3160 "cited_steps": sorted(cited_steps),
3161 "guidance": (
3162 "The proposed durable record claims a path or executable is present/available, but recent shell "
3163 "evidence says that same path is missing or inaccessible. Inspect again, record it as missing, "
3164 "or cite a newer positive check before saving the claim."
3165 ),
3166 }
3167 negative_conflicts = _negative_claim_conflicts_for_grounding(
3168 tool_name=tool_name,
3169 proposed_text=proposed_text,
3170 fresh_evidence_text=fresh_evidence_text,
3171 tokens=proposed_tokens,
3172 )
3173 if negative_conflicts:
3174 conflict_tokens = [item["token"] for item in negative_conflicts]
3175 return {
3176 "unsupported_tokens": conflict_tokens[:12],
3177 "negative_claim_conflicts": negative_conflicts[:6],
3178 "evidence_steps": [
3179 step.get("step_no")
3180 for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3181 ],
3182 "cited_steps": sorted(cited_steps),
3183 "guidance": (
3184 "The proposed durable record negates a concrete item or file pattern that appears in recent positive evidence. "
3185 "Inspect the evidence again or record uncertainty instead of saving a conflicting claim."
3186 ),
3187 }
3188 missing_paths = _missing_candidate_paths_for_grounding(
3189 job=job,
3190 recent_steps=recent_steps,
3191 recent_grounding_paths=recent_grounding_paths,
3192 tool_name=tool_name,
3193 proposed_text=proposed_text,
3194 full_proposed_text=full_proposed_text,
3195 fresh_evidence_text=fresh_evidence_text,
3196 )
3197 if missing_paths:
3198 return {
3199 "unsupported_tokens": missing_paths[:8],
3200 "missing_candidate_paths": missing_paths[:8],
3201 "evidence_steps": [
3202 step.get("step_no")
3203 for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3204 ],
3205 "cited_steps": sorted(cited_steps),
3206 "guidance": (
3207 "Recent evidence contains concrete file/path candidates, but the durable record only summarized them. "
3208 "Record the exact observed candidate paths, or explicitly state why those paths are not relevant."
3209 ),
3210 }
3211 stale_tokens = _active_stale_claim_token_set(job)
3212 proposed_stale_tokens = [
3213 token
3214 for token in _concrete_evidence_tokens_for_grounding(tool_name, full_proposed_text)
3215 if not _grounding_token_in_reference_text(token, job_reference_text)
3216 if token.lower() in stale_tokens
3217 ]
3218 if tool_name == "record_lesson" and not proposed_stale_tokens:
3219 return None
3220 unsupported_threshold = 1 if cited_steps or proposed_stale_tokens else 3
3221 candidate_tokens = proposed_stale_tokens if tool_name == "record_lesson" else proposed_tokens + proposed_stale_tokens
3222 candidate_high_risk = [token for token in candidate_tokens if _high_risk_evidence_token(token)]
3223 if len(candidate_tokens) < unsupported_threshold and not candidate_high_risk:
3224 return None
3225 evidence_lower = evidence_text.lower()
3226 fresh_evidence_lower = fresh_evidence_text.lower()
3227 unsupported = []
3228 for token in candidate_tokens:
3229 lowered = token.lower()
3230 if lowered in fresh_evidence_lower:
3231 continue
3232 if lowered in evidence_lower and lowered not in stale_tokens:
3233 continue
3234 unsupported.append(token)
3235 unique = []
3236 seen = set()
3237 for token in unsupported:
3238 key = token.lower()
3239 if key in seen:
3240 continue
3241 seen.add(key)
3242 unique.append(token)
3243 high_risk_unique = [token for token in unique if _high_risk_evidence_token(token)]
3244 if len(unique) < unsupported_threshold and not high_risk_unique:
3245 return None
3246 return {
3247 "unsupported_tokens": (high_risk_unique or unique)[:12],
3248 "evidence_steps": [
3249 step.get("step_no")
3250 for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=cited_steps or None)
3251 ],
3252 "cited_steps": sorted(cited_steps),
3253 "guidance": (
3254 "The proposed durable record contains concrete tokens that are not present in recent evidence. "
3255 "Use exact observed evidence, inspect the source again, or record uncertainty instead of writing unsupported claims."
3256 ),
3257 }
3258
3259
3260def _concrete_evidence_tokens_for_grounding(tool_name: str, text: str) -> list[str]:
3261 tokens = _concrete_evidence_tokens(text)
3262 if tool_name not in NARRATIVE_EVIDENCE_GROUNDED_TOOLS:
3263 return tokens
3264 return [token for token in tokens if _high_risk_evidence_token(token)]
3265
3266
3267def _grounding_token_in_reference_text(token: str, reference_text: str) -> bool:
3268 normalized_token = _normalize_claim_text(token)
3269 if not normalized_token:
3270 return False
3271 return normalized_token in _normalize_claim_text(reference_text)
3272
3273
3274def _missing_candidate_paths_for_grounding(
3275 *,
3276 job: dict[str, Any],
3277 recent_steps: list[dict[str, Any]],
3278 recent_grounding_paths: list[str] | None = None,
3279 tool_name: str,
3280 proposed_text: str,
3281 full_proposed_text: str,
3282 fresh_evidence_text: str,
3283) -> list[str]:
3284 if tool_name not in {"record_findings", "record_experiment", "write_artifact", "report_update"}:
3285 return []
3286 proposed_lower = f"{proposed_text}\n{full_proposed_text}".lower()
3287 if not any(term in proposed_lower for term in ("file", "files", "path", "paths", "candidate", "found", "discovered")):
3288 return []
3289 positive_evidence_text = "\n".join(
3290 line
3291 for line in str(fresh_evidence_text or "").splitlines()
3292 if not _evidence_line_is_negative(line.lower())
3293 )
3294 evidence_paths = [
3295 *_extract_candidate_file_paths(positive_evidence_text),
3296 *(recent_grounding_paths or _candidate_file_paths_from_recent_grounding_blocks(recent_steps)),
3297 ]
3298 if not evidence_paths:
3299 return []
3300 if any(_path_mentioned_in_text(path, proposed_lower) for path in evidence_paths):
3301 return []
3302 distinctive_paths: list[str] = []
3303 seen: set[str] = set()
3304 for path in _rank_candidate_file_paths(job, full_proposed_text, evidence_paths):
3305 key = path.lower()
3306 if key in seen:
3307 continue
3308 seen.add(key)
3309 distinctive_paths.append(path)
3310 if len(distinctive_paths) >= 8:
3311 break
3312 return distinctive_paths
3313
3314
3315POSITIVE_PATH_CLAIM_MARKERS = (
3316 "available",
3317 "exists",
3318 "found",
3319 "is at",
3320 "located",
3321 "present",
3322 "ready",
3323 "succeed",
3324 "usable",
3325 "valid",
3326 "verified",
3327)
3328
3329
3330def _positive_path_claim_conflicts_for_grounding(
3331 *,
3332 tool_name: str,
3333 proposed_text: str,
3334 full_proposed_text: str,
3335 fresh_evidence_text: str,
3336) -> list[dict[str, str]]:
3337 if tool_name not in {"record_findings", "record_experiment", "record_source", "record_lesson", "write_artifact", "report_update"}:
3338 return []
3339 proposed_combined = f"{proposed_text}\n{full_proposed_text}"
3340 proposed_lower = proposed_combined.lower()
3341 conflicts: list[dict[str, str]] = []
3342 seen: set[str] = set()
3343 for line in str(fresh_evidence_text or "").splitlines():
3344 line_lower = line.lower()
3345 if not _evidence_line_is_negative(line_lower):
3346 continue
3347 paths = [
3348 *_extract_candidate_file_paths(line),
3349 *_extract_candidate_executable_paths(line),
3350 ]
3351 for path in paths:
3352 path = str(path or "").strip()
3353 if not path:
3354 continue
3355 key = path.lower()
3356 if key in seen:
3357 continue
3358 if key not in proposed_lower:
3359 continue
3360 if not _path_near_positive_claim(proposed_combined, path):
3361 continue
3362 seen.add(key)
3363 conflicts.append({
3364 "path": path,
3365 "evidence": _clip_text(line.strip(), 220),
3366 "claim": _clip_text(_excerpt_around(proposed_combined, path, window=96), 220),
3367 })
3368 if len(conflicts) >= 8:
3369 return conflicts
3370 return conflicts
3371
3372
3373def _path_near_positive_claim(text: str, path: str, *, window: int = 96) -> bool:
3374 for excerpt in _excerpts_around_all(text, path, window=window):
3375 excerpt_lower = excerpt.lower()
3376 if _evidence_line_is_negative(excerpt_lower):
3377 continue
3378 if any(marker in excerpt_lower for marker in POSITIVE_PATH_CLAIM_MARKERS):
3379 return True
3380 return False
3381
3382
3383def _excerpt_around(text: str, needle: str, *, window: int = 80) -> str:
3384 excerpts = _excerpts_around_all(text, needle, window=window, max_matches=1)
3385 return excerpts[0] if excerpts else ""
3386
3387
3388def _excerpts_around_all(text: str, needle: str, *, window: int = 80, max_matches: int = 8) -> list[str]:
3389 source = str(text or "")
3390 needle_text = str(needle or "")
3391 if not source or not needle_text:
3392 return []
3393 source_lower = source.lower()
3394 needle_lower = needle_text.lower()
3395 excerpts: list[str] = []
3396 index = 0
3397 while len(excerpts) < max_matches:
3398 found = source_lower.find(needle_lower, index)
3399 if found < 0:
3400 break
3401 start = max(0, found - window)
3402 end = min(len(source), found + len(needle_text) + window)
3403 excerpts.append(source[start:end])
3404 index = found + max(1, len(needle_text))
3405 return excerpts
3406
3407
3408def _path_mentioned_in_text(path: str, text_lower: str) -> bool:
3409 path_lower = path.lower()
3410 if path_lower in text_lower:
3411 return True
3412 name = Path(path).name.lower()
3413 return bool(name and name in text_lower)
3414
3415
def _refresh_contradicted_negative_claims(
    db: AgentDB,
    job_id: str,
    job: dict[str, Any],
    recent_steps: list[dict[str, Any]],
) -> int:
    fresh_evidence_text = _recent_evidence_text(
        job,
        recent_steps,
        window=8,
        include_durable=False,
        include_job_context=False,
    )
    if len(fresh_evidence_text.strip()) < 80:
        return 0
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    existing = metadata.get("stale_negative_records") if isinstance(metadata.get("stale_negative_records"), list) else []
    seen = {
        (
            str(item.get("kind") or ""),
            str(item.get("record_id") or ""),
            str(item.get("token") or "").lower(),
        )
        for item in existing
        if isinstance(item, dict)
    }
    now = datetime.now(timezone.utc).isoformat()
    new_records: list[dict[str, Any]] = []
    for kind, records in (
        ("finding", _metadata_list(job, "finding_ledger")[-80:]),
        ("lesson", _metadata_list(job, "lessons")[-80:]),
        (
            "memory_node",
            (
                metadata.get("memory_graph", {}).get("nodes", [])
                if isinstance(metadata.get("memory_graph"), dict)
                and isinstance(metadata.get("memory_graph", {}).get("nodes"), list)
                else []
            ),
        ),
    ):
        for record in records:
            if not isinstance(record, dict):
                continue
            record_text = _negative_record_text(kind, record)
            if not record_text:
                continue
            conflicts = _negative_claim_conflicts_for_grounding(
                tool_name="record_findings",
                proposed_text=record_text,
                fresh_evidence_text=fresh_evidence_text,
                tokens=_concrete_evidence_tokens(record_text),
            )
            if not conflicts:
                continue
            record_id = _negative_record_id(kind, record)
            for conflict in conflicts[:4]:
                token = str(conflict.get("token") or "")
                key = (kind, record_id, token.lower())
                if key in seen:
                    continue
                seen.add(key)
                new_records.append({
                    "kind": kind,
                    "record_id": record_id,
                    "title": _negative_record_title(kind, record),
                    "token": token,
                    "evidence": conflict.get("evidence") or "",
                    "observed_at": now,
                })
    if not new_records:
        return 0
    db.update_job_metadata(job_id, {"stale_negative_records": (existing + new_records)[-120:]})
    db.append_agent_update(
        job_id,
        f"Suppressed {len(new_records)} contradicted negative durable claim(s) after fresh evidence.",
        category="memory",
        metadata={"stale_negative_records": new_records[:12]},
    )
    return len(new_records)


def _negative_record_text(kind: str, record: dict[str, Any]) -> str:
    if kind == "lesson":
        return str(record.get("lesson") or "")
    if kind == "memory_node":
        return " ".join(
            str(record.get(key) or "")
            for key in ("key", "title", "kind", "status", "summary")
        )
    return " ".join(
        str(record.get(key) or "")
        for key in ("name", "category", "reason", "status", "source_url", "url")
    )


def _negative_record_id(kind: str, record: dict[str, Any]) -> str:
    for key in ("key", "event_id", "id"):
        value = str(record.get(key) or "").strip()
        if value:
            return value
    return _normalize_claim_text(f"{kind}:{_negative_record_title(kind, record)}")[:120]


def _negative_record_title(kind: str, record: dict[str, Any]) -> str:
    if kind == "lesson":
        return _clip_text(str(record.get("lesson") or "lesson"), 120)
    return str(record.get("name") or record.get("title") or "finding")


def _negative_claim_conflicts_for_grounding(
    *,
    tool_name: str,
    proposed_text: str,
    fresh_evidence_text: str,
    tokens: list[str],
) -> list[dict[str, str]]:
    if tool_name not in EVIDENCE_GROUNDED_TOOLS:
        return []
    proposed_lower = proposed_text.lower()
    if not any(marker in proposed_lower for marker in NEGATIVE_EXISTENCE_MARKERS):
        return []
    evidence_lines = [line.strip() for line in fresh_evidence_text.splitlines() if line.strip()]
    if not evidence_lines:
        return []
    candidates = tokens + _file_pattern_tokens_for_grounding(proposed_text)
    conflicts: list[dict[str, str]] = []
    seen: set[str] = set()
    for token in candidates:
        key = token.lower()
        if key in seen:
            continue
        seen.add(key)
        if not token.startswith(".") and "/" not in token and not _high_risk_evidence_token(token):
            continue
        if not _token_near_negative_claim(proposed_text, token):
            continue
        positive_line = _positive_evidence_line_for_token(evidence_lines, token)
        if not positive_line:
            continue
        conflicts.append({"token": token, "evidence": _clip_text(positive_line, 220)})
    return conflicts


def _file_pattern_tokens_for_grounding(text: str) -> list[str]:
    tokens: list[str] = []
    seen: set[str] = set()
    for match in re.finditer(r"(?<![A-Za-z0-9])(?:\*\.)?\.?([A-Za-z0-9][A-Za-z0-9_-]{1,12})(?![A-Za-z0-9_-])", text or ""):
        raw = match.group(0).strip("'\"`")
        if not raw:
            continue
        if not raw.startswith((".", "*.")):
            continue
        if "." not in raw and not raw.startswith("*."):
            continue
        if raw.startswith(".") and not raw.startswith("*."):
            previous_char = text[match.start() - 1] if match.start() > 0 else ""
            next_char = text[match.end()] if match.end() < len(text) else ""
            if previous_char == "/" or next_char == "/":
                continue
        ext = "." + match.group(1).lower().lstrip(".")
        if ext in {".app", ".co", ".com", ".dev", ".edu", ".gov", ".io", ".net", ".org", ".www", ".http", ".https"}:
            continue
        if ext in seen:
            continue
        seen.add(ext)
        tokens.append(ext)
    return tokens


def _token_near_negative_claim(text: str, token: str, *, window: int = 64) -> bool:
    text_lower = text.lower()
    token_lower = token.lower()
    start = 0
    while True:
        index = text_lower.find(token_lower, start)
        if index < 0:
            return False
        nearby = text_lower[max(0, index - window): index + len(token_lower) + window]
        if any(marker in nearby for marker in NEGATIVE_EXISTENCE_MARKERS):
            if _nearby_negative_is_role_classification(nearby):
                start = index + len(token_lower)
                continue
            if _nearby_negative_is_positive_validation(nearby):
                start = index + len(token_lower)
                continue
            return True
        start = index + len(token_lower)


def _nearby_negative_is_role_classification(text: str) -> bool:
    return any(marker in text for marker in NEGATIVE_ROLE_CLASSIFICATION_MARKERS)


def _nearby_negative_is_positive_validation(text: str) -> bool:
    return bool(re.search(r"\bnot\s+(?:a|an|the)?\s*(?:[\w.-]+\s+){0,5}(?:stub|placeholder|empty file)\b", text))


def _positive_evidence_line_for_token(lines: list[str], token: str) -> str:
    token_lower = token.lower()
    for line in lines:
        line_lower = line.lower()
        if token_lower not in line_lower:
            continue
        if _evidence_line_is_negative(line_lower):
            continue
        return line
    return ""


def _evidence_line_is_negative(line_lower: str) -> bool:
    if any(marker in line_lower for marker in EVIDENCE_NEGATIVE_LINE_MARKERS):
        return True
    return line_lower.startswith("no ") or " no " in line_lower or line_lower.startswith("zero ") or " zero " in line_lower


def _evidence_grounding_proposed_text(tool_name: str, args: dict[str, Any]) -> str:
    if tool_name == "record_experiment":
        return "\n".join(
            _json_value_text(args.get(key))
            for key in (
                "action",
                "baseline",
                "command",
                "config",
                "decision",
                "environment",
                "evidence",
                "evidence_artifact",
                "metric_name",
                "metric_unit",
                "metric_value",
                "result",
                "status",
            )
            if args.get(key) is not None
        )
    if tool_name != "record_memory_graph":
        return _json_value_text(args)
    parts: list[str] = []
    nodes = args.get("nodes") if isinstance(args.get("nodes"), list) else []
    for node in nodes:
        if not isinstance(node, dict):
            continue
        for key in ("title", "summary", "tags", "metadata"):
            value = node.get(key)
            if value:
                parts.append(_json_text(value))
    edges = args.get("edges") if isinstance(args.get("edges"), list) else []
    for edge in edges:
        if not isinstance(edge, dict):
            continue
        for key in ("evidence_refs", "metadata"):
            value = edge.get(key)
            if value:
                parts.append(_json_text(value))
    return "\n".join(parts)


def _json_text(value: Any) -> str:
    try:
        return json.dumps(value, ensure_ascii=False, sort_keys=True)
    except TypeError:
        return str(value)


def _json_value_text(value: Any) -> str:
    if isinstance(value, dict):
        return "\n".join(_json_value_text(item) for item in value.values())
    if isinstance(value, list):
        return "\n".join(_json_value_text(item) for item in value)
    return str(value or "")


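# Illustrative sketch only, not part of the module: a standalone copy of the
# recursive value flattening in _json_value_text. The name flatten_values and
# the sample arguments are hypothetical. It shows that dictionary keys are
# dropped and that falsy leaves such as 0 or None flatten to empty strings.

```python
from typing import Any


def flatten_values(value: Any) -> str:
    """Hypothetical standalone version: join leaf values, one per line."""
    if isinstance(value, dict):
        return "\n".join(flatten_values(item) for item in value.values())
    if isinstance(value, list):
        return "\n".join(flatten_values(item) for item in value)
    # `value or ""` turns every falsy leaf (None, 0, False, "") into "".
    return str(value or "")


args = {"name": "cache", "tags": ["fast", "local"], "count": 0, "note": None}
flat = flatten_values(args)
print(repr(flat))  # 'cache\nfast\nlocal\n\n'
```

# A metric value of exactly 0 therefore contributes no text, which may matter
# when the flattened text is later scanned for numeric tokens.
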
def _cited_step_numbers(text: str) -> set[int]:
    numbers = set()
    patterns = [
        r"(?i)\bsteps?\s*(?:#|-)?\s*(\d+)\b",
        r"(?i)\bstep[_-](\d+)\b",
        r"(?i)\bshell_exec[_\s-]*step[_\s#-]*(\d+)\b",
        r"(?i)\btool[_\s-]*step[_\s#-]*(\d+)\b",
    ]
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            raw = match.group(1)
            try:
                value = int(raw)
            except (TypeError, ValueError):
                continue
            if value > 0:
                numbers.add(value)
    return numbers


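# Illustrative sketch only, not part of the module: the same four patterns as
# _cited_step_numbers applied to a made-up sentence. The name cited_steps and
# the sample text are hypothetical. It shows which citation spellings are
# recognized and that step 0 is discarded.

```python
import re

PATTERNS = [
    r"(?i)\bsteps?\s*(?:#|-)?\s*(\d+)\b",
    r"(?i)\bstep[_-](\d+)\b",
    r"(?i)\bshell_exec[_\s-]*step[_\s#-]*(\d+)\b",
    r"(?i)\btool[_\s-]*step[_\s#-]*(\d+)\b",
]


def cited_steps(text: str) -> set[int]:
    """Hypothetical standalone version: collect positive step numbers."""
    numbers: set[int] = set()
    for pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            value = int(match.group(1))
            if value > 0:
                numbers.add(value)
    return numbers


sample = "See step 12 and Steps #3; shell_exec_step_5 confirmed it, step 0 is ignored."
found = cited_steps(sample)
print(sorted(found))  # [3, 5, 12]
```

# The underscore in "shell_exec_step_5" is a word character, so the plain
# "step" patterns find no word boundary there; only the shell_exec pattern
# picks up that citation.
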
def _evidence_steps_for_grounding(
    recent_steps: list[dict[str, Any]],
    *,
    window: int,
    step_numbers: set[int] | None = None,
) -> list[dict[str, Any]]:
    completed = _completed_recent_steps(recent_steps)
    if step_numbers:
        steps = [step for step in completed if int(step.get("step_no") or 0) in step_numbers]
    else:
        steps = completed[-window:]
    evidence_steps = [
        step
        for step in steps
        if step.get("tool_name") in {"browser_snapshot", "shell_exec", "web_extract", "web_search", "read_artifact"}
    ]
    if evidence_steps or not step_numbers:
        return evidence_steps
    return [
        step
        for step in completed[-window:]
        if step.get("tool_name") in {"browser_snapshot", "shell_exec", "web_extract", "web_search", "read_artifact"}
    ]


def _recent_evidence_text(
    job: dict[str, Any],
    recent_steps: list[dict[str, Any]],
    *,
    window: int,
    step_numbers: set[int] | None = None,
    include_durable: bool = True,
    include_job_context: bool = True,
) -> str:
    parts: list[str] = []
    if include_job_context:
        parts.extend([str(job.get("title") or ""), str(job.get("objective") or ""), str(job.get("kind") or "")])
    durable_text = _durable_records_for_grounding(job) if include_durable else ""
    if include_durable and durable_text:
        parts.append(durable_text)
    for step in _evidence_steps_for_grounding(recent_steps, window=window, step_numbers=step_numbers):
        parts.append(str(step.get("summary") or ""))
        input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
        if input_data:
            parts.append(_json_text(input_data))
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        if not output:
            continue
        for key in ("stdout", "stderr", "text", "content", "excerpt", "query", "command"):
            if output.get(key):
                parts.append(str(output.get(key)))
        pages = output.get("pages") if isinstance(output.get("pages"), list) else []
        for page in pages[:6]:
            if isinstance(page, dict):
                parts.append(_json_text({key: page.get(key) for key in ("url", "title", "text", "error", "source_warning")}))
        results = output.get("results") if isinstance(output.get("results"), list) else []
        for item in results[:8]:
            if isinstance(item, dict):
                parts.append(_json_text({key: item.get(key) for key in ("url", "title", "snippet")}))
    return "\n".join(parts)


def _active_stale_claim_token_set(job: dict[str, Any]) -> set[str]:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    raw_tokens = metadata.get("unsupported_claim_tokens") if isinstance(metadata.get("unsupported_claim_tokens"), list) else []
    filtered = _stale_claim_tokens_from_unsupported(
        [str(token) for token in raw_tokens],
        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
    )
    return {str(token).strip().lower() for token in filtered if str(token).strip()}


def _durable_records_for_grounding(job: dict[str, Any]) -> str:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    parts: list[str] = []
    for finding in _metadata_list(job, "finding_ledger")[-20:]:
        parts.append(_json_text({
            "finding": finding.get("name") or finding.get("title"),
            "category": finding.get("category"),
            "reason": finding.get("reason") or finding.get("summary"),
            "location": finding.get("location"),
            "status": finding.get("status"),
            "evidence_artifact": finding.get("evidence_artifact"),
            "url": finding.get("url"),
            "metadata": finding.get("metadata") if isinstance(finding.get("metadata"), dict) else {},
        }))
    for experiment in _metadata_list(job, "experiment_ledger")[-12:]:
        parts.append(_json_text({
            "experiment": experiment.get("title") or experiment.get("name"),
            "hypothesis": experiment.get("hypothesis"),
            "status": experiment.get("status"),
            "metric_name": experiment.get("metric_name"),
            "metric_value": experiment.get("metric_value"),
            "metric_unit": experiment.get("metric_unit"),
            "result": experiment.get("result"),
            "next_action": experiment.get("next_action"),
            "config": experiment.get("config") if isinstance(experiment.get("config"), dict) else {},
        }))
    for source in _metadata_list(job, "source_ledger")[-12:]:
        parts.append(_json_text({
            "source": source.get("source") or source.get("url"),
            "source_type": source.get("source_type"),
            "outcome": source.get("outcome"),
            "score": source.get("score"),
        }))
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    if roadmap:
        parts.append(_json_text({
            "roadmap": roadmap.get("title"),
            "objective": roadmap.get("objective"),
            "current_milestone": roadmap.get("current_milestone"),
            "validation_contract": roadmap.get("validation_contract"),
        }))
    graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
    nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
    for node in [item for item in nodes if isinstance(item, dict)][-20:]:
        parts.append(_json_text({
            "memory_node": node.get("key"),
            "kind": node.get("kind"),
            "title": node.get("title"),
            "summary": node.get("summary"),
        }))
    return "\n".join(parts)


def _concrete_evidence_tokens(text: str) -> list[str]:
    text = text.replace("\\n", "\n").replace("\\t", "\t")
    tokens: list[str] = []
    seen_numeric: set[str] = set()
    for raw in re.findall(
        r"(?i)\b\d+(?:\.\d+)?\s*(?:[KMGTPE]i?B|[KMGTPE]|bytes?|tok/s|t/s|tokens/sec|tokens/s|ms|sec|secs|seconds?|minutes?|hours?|%)\b",
        text,
    ):
        token = re.sub(r"\s+", "", raw.strip())
        key = token.lower()
        if key in seen_numeric:
            continue
        seen_numeric.add(key)
        tokens.append(token)
    for raw in re.findall(r"\b[A-Za-z][A-Za-z0-9_.+-]{1,}\b", text):
        token = raw.strip("._+-")
        if not token:
            continue
        lowered = token.lower()
        if lowered in EVIDENCE_TOKEN_IGNORE:
            continue
        if re.match(r"^[a-z]\d+$", token):
            continue
        if _looks_like_generated_evidence_token(token):
            continue
        if lowered.startswith("art_"):
            continue
        if lowered.startswith("step_"):
            continue
        if lowered.endswith("_output") or lowered.endswith("_stdout") or lowered.endswith("_stderr"):
            continue
        if token.isupper() and len(token) >= 3:
            tokens.append(token)
            continue
        if any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token):
            tokens.append(token)
            continue
        if token[:1].isupper() and token[1:].islower() and len(token) >= 4:
            tokens.append(token)
            continue
    return tokens


def _high_risk_evidence_token(token: str) -> bool:
    lowered = token.lower()
    if not token or lowered in EVIDENCE_TOKEN_IGNORE:
        return False
    if _looks_like_generated_evidence_token(token):
        return False
    if lowered.startswith(("art_", "step_")):
        return False
    if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
        return True
    if any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token):
        return True
    if token.isupper() and len(token) >= 3:
        return True
    return False


def _looks_like_generated_evidence_token(token: str) -> bool:
    lowered = token.lower().strip()
    if re.match(
        r"^(?:art|step|shell|web|episode|fact|source|quality|constraint|baseline|question|verified|timeout)[_-]\d+[a-z]*$",
        lowered,
    ):
        return True
    return bool(re.match(r"^(?:shell|web|browser|tool)[a-z0-9_-]*[_-]step[_-]?\d+[a-z]*$", lowered))


def _step_has_evidence(step: dict[str, Any]) -> bool:
    tool_name = step.get("tool_name")
    output = step.get("output") if isinstance(step.get("output"), dict) else {}
    if tool_name == "web_extract":
        pages = output.get("pages") if isinstance(output.get("pages"), list) else []
        for page in pages:
            # Skip malformed entries so a non-dict page cannot raise AttributeError.
            if not isinstance(page, dict):
                continue
            if page.get("error"):
                continue
            if str(page.get("text") or "").strip():
                return True
    if tool_name in {"browser_navigate", "browser_snapshot"}:
        data = output.get("data") if isinstance(output.get("data"), dict) else {}
        snapshot = str(output.get("snapshot") or data.get("snapshot") or "")
        if anti_bot_reason(str(data.get("title") or ""), str(data.get("url") or data.get("origin") or ""), snapshot):
            return False
        return len(snapshot.strip()) >= 500
    if tool_name == "shell_exec":
        text = "\n".join(str(output.get(key) or "") for key in ("stdout", "stderr"))
        return len(text.strip()) >= 1000
    return False


def _unpersisted_evidence_step(recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
    for step in reversed(recent_steps):
        if step.get("status") not in {"completed", "blocked"}:
            continue
        output = step.get("output") if isinstance(step.get("output"), dict) else {}
        if step.get("tool_name") == "write_artifact":
            return None
        if isinstance(output.get("auto_checkpoint"), dict):
            return None
        if step.get("status") == "completed" and _step_has_evidence(step):
            return step
    return None


def _evidence_checkpoint_accounting_for_prompt(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> str:
    context = _auto_checkpoint_accounting_context(job, recent_steps)
    if not context:
        return "None."
    read_text = "already read" if context.get("checkpoint_read") else "not read yet"
    next_action = (
        "Next use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
        "record_milestone_validation, or record_lesson to account for it. Do not read the checkpoint again. "
        if context.get("checkpoint_read")
        else "Next either read that checkpoint artifact, or use record_findings, record_source, record_experiment, "
        "record_tasks, record_roadmap, record_milestone_validation, or record_lesson to account for it. "
    )
    return (
        "An auto-saved evidence checkpoint is waiting for durable accounting. "
        f"artifact={context.get('artifact_id') or '?'} title={context.get('title') or ''} "
        f"evidence_step={context.get('evidence_step_no') or context.get('evidence_step') or '?'} "
        f"blocked_tool={context.get('blocked_tool') or ''} status={read_text}. "
        f"{next_action}"
        "Do not continue shell, search, file, artifact, report, or other branch work until this is resolved."
    )


def _pending_evidence_checkpoint(job: dict[str, Any]) -> dict[str, Any] | None:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    checkpoint = metadata.get("pending_evidence_checkpoint")
    if isinstance(checkpoint, dict) and checkpoint and not checkpoint.get("resolved_at"):
        return checkpoint
    return None


def _step_created_auto_checkpoint(step: dict[str, Any]) -> dict[str, Any] | None:
    output = step.get("output") if isinstance(step.get("output"), dict) else {}
    checkpoint = output.get("auto_checkpoint")
    if not isinstance(checkpoint, dict):
        return None
    if not checkpoint.get("artifact_id"):
        return None
    # Only auto-persisted checkpoints have a stored path. Guard-context payloads use a
    # different key so they cannot reset the read/accounting state.
    if not checkpoint.get("path"):
        return None
    return checkpoint


def _auto_checkpoint_accounting_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
    pending = _pending_evidence_checkpoint(job)
    if pending:
        return {
            "artifact_id": str(pending.get("artifact_id") or ""),
            "title": str(pending.get("title") or ""),
            "checkpoint_step_no": pending.get("checkpoint_step_no"),
            "evidence_step": pending.get("evidence_step"),
            "evidence_step_no": pending.get("evidence_step_no"),
            "blocked_tool": pending.get("blocked_tool"),
            "checkpoint_read": bool(pending.get("read_at")),
            "read_at": pending.get("read_at"),
            "created_at": pending.get("created_at"),
            "source": "job_metadata",
        }
    checkpoint_step = None
    checkpoint = None
    for step in reversed(recent_steps):
        created = _step_created_auto_checkpoint(step)
        if created:
            checkpoint_step = step
            checkpoint = created
            break
    if not checkpoint_step or not checkpoint:
        return None
    checkpoint_step_no = int(checkpoint_step.get("step_no") or 0)
    tail = [step for step in recent_steps if int(step.get("step_no") or 0) > checkpoint_step_no]
    if any(step.get("tool_name") in EVIDENCE_CHECKPOINT_ACCOUNTING_TOOLS for step in tail if step.get("status") == "completed"):
        return None
    artifact_id = str(checkpoint.get("artifact_id") or "")
    artifact_title = str(checkpoint.get("title") or "")
    checkpoint_read = any(
        step.get("tool_name") == "read_artifact"
        and step.get("status") == "completed"
        and _read_artifact_args_match_checkpoint(step, artifact_id=artifact_id, artifact_title=artifact_title)
        for step in tail
    )
    return {
        "artifact_id": artifact_id,
        "title": artifact_title,
        "checkpoint_step_no": checkpoint_step.get("step_no"),
        "evidence_step": checkpoint.get("evidence_step"),
        "blocked_tool": checkpoint.get("blocked_tool"),
        "checkpoint_read": checkpoint_read,
    }


def _evidence_checkpoint_blocks_tool(name: str, args: dict[str, Any], context: dict[str, Any] | None) -> bool:
    if not context:
        return False
    if name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS or name == "acknowledge_operator_context":
        return False
    if (
        name == "read_artifact"
        and not context.get("checkpoint_read")
        and _read_artifact_call_matches_checkpoint(
            args,
            artifact_id=str(context.get("artifact_id") or ""),
            artifact_title=str(context.get("title") or ""),
        )
    ):
        return False
    return True


def _evidence_checkpoint_block_guidance(context: dict[str, Any]) -> str:
    tools = (
        "record_findings, record_source, record_experiment, record_tasks, "
        "record_roadmap, record_milestone_validation, or record_lesson"
    )
    if context.get("checkpoint_read"):
        return (
            "The auto-saved evidence checkpoint has already been read. Do not read it again. "
            f"Use {tools} to account for what the checkpoint proved, rejected, changed, or blocked "
            "before more shell, search, file, report, artifact, or branch work."
        )
    return (
        "An auto-saved evidence checkpoint is waiting to be converted into durable progress. "
        f"Read that checkpoint artifact once, or use {tools} to account for it before more shell, "
        "search, file, report, artifact, or other branch work."
    )


def _read_artifact_args_match_checkpoint(step: dict[str, Any], *, artifact_id: str, artifact_title: str) -> bool:
    input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
    args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
    return _read_artifact_call_matches_checkpoint(args, artifact_id=artifact_id, artifact_title=artifact_title)


def _read_artifact_call_matches_checkpoint(args: dict[str, Any], *, artifact_id: str, artifact_title: str) -> bool:
    values = [str(args.get(key) or "").strip() for key in ("artifact_id", "id", "title", "query")]
    values = [value for value in values if value]
    if artifact_id and artifact_id in values:
        return True
    return bool(artifact_title and any(value == artifact_title for value in values))


def _recent_search_streak(recent_steps: list[dict[str, Any]]) -> int:
    return _recent_tool_streak(recent_steps, "web_search")


def _pending_measurement_obligation(job: dict[str, Any]) -> dict[str, Any] | None:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    obligation = metadata.get("pending_measurement_obligation")
    if isinstance(obligation, dict) and obligation and not obligation.get("resolved_at"):
        return obligation
    return None


CODELIKE_FILE_SUFFIXES = {
    ".bash",
    ".cfg",
    ".cjs",
    ".conf",
    ".cpp",
    ".css",
    ".go",
    ".h",
    ".hpp",
    ".ini",
    ".java",
    ".js",
    ".json",
    ".jsx",
    ".lua",
    ".mjs",
    ".php",
    ".pl",
    ".py",
    ".rb",
    ".rs",
    ".sh",
    ".sql",
    ".toml",
    ".ts",
    ".tsx",
    ".yaml",
    ".yml",
    ".zsh",
}


def _pending_file_validation_obligation(job: dict[str, Any]) -> dict[str, Any] | None:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    obligation = metadata.get("pending_file_validation_obligation")
    if isinstance(obligation, dict) and obligation and not obligation.get("resolved_at"):
        return obligation
    return None


def _file_output_needs_validation(path: str, content: str) -> bool:
    """Return True for code-like suffixes, shebang scripts, and well-known build files."""
    suffix = Path(path).suffix.lower()
    if suffix in CODELIKE_FILE_SUFFIXES:
        return True
    first_line = content.lstrip().splitlines()[0] if content.strip() else ""
    if first_line.startswith("#!"):
        return True
    lowered = Path(path).name.lower()
    return lowered in {"dockerfile", "makefile", "justfile", "procfile"}


def _suggested_file_validation(path: str) -> str:
    suffix = Path(path).suffix.lower()
    quoted = shlex_quote(path)
    if suffix == ".py":
        return f"python3 -m py_compile {quoted}"
    if suffix in {".sh", ".bash", ".zsh"}:
        return f"bash -n {quoted}"
    if suffix == ".json":
        return f"python3 -m json.tool {quoted}"
    if suffix in {".yaml", ".yml"}:
        return f"python3 - <<'PY'\nimport pathlib, yaml\nyaml.safe_load(pathlib.Path({path!r}).read_text())\nPY"
    return f"run the narrowest available syntax check, test, or dry-run for {quoted}"


def shlex_quote(value: str) -> str:
    """Wrap value in single quotes for a POSIX shell, escaping embedded single quotes."""
    return "'" + str(value).replace("'", "'\"'\"'") + "'"


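# Illustrative sketch only, not part of the module: a standalone copy of the
# quoting rule in shlex_quote, checked against the standard library. The name
# always_quote and the sample path are hypothetical. It shows that the quoted
# form survives a round trip through shlex.split, and that the helper differs
# from shlex.quote only by quoting strings that are already shell-safe.

```python
import shlex


def always_quote(value: str) -> str:
    """Hypothetical standalone version: single-quote every value."""
    # Close the quote, emit a double-quoted single quote, then reopen.
    return "'" + str(value).replace("'", "'\"'\"'") + "'"


path = "notes/it's here.txt"
quoted = always_quote(path)

# The shell-style parser recovers the original path as one argument.
assert shlex.split(quoted) == [path]

# For strings that need quoting, the result matches shlex.quote.
assert quoted == shlex.quote(path)

# shlex.quote leaves safe strings bare; this helper still wraps them.
assert shlex.quote("plain.txt") == "plain.txt"
assert always_quote("plain.txt") == "'plain.txt'"
```
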
def _clear_invalid_measurement_obligation(db: AgentDB, job_id: str) -> bool:
    job = db.get_job(job_id)
    obligation = _pending_measurement_obligation(job)
    if not obligation:
        return False
    candidates = obligation.get("metric_candidates") if isinstance(obligation.get("metric_candidates"), list) else []
    if not candidates:
        return False
    command = str(obligation.get("command") or "")
    if not measurement_candidates_are_diagnostic_only(candidates, command=command):
        return False
    db.update_job_metadata(job_id, {"pending_measurement_obligation": {}})
    db.append_agent_update(
        job_id,
        "Cleared measurement obligation because the output was diagnostic context, not a trial result.",
        category="progress",
        metadata={"cleared_measurement_obligation": obligation},
    )
    return True


def _progress_churn_context(recent_steps: list[dict[str, Any]], *, window: int = 10) -> dict[str, Any] | None:
    completed = [step for step in recent_steps if step.get("status") == "completed"]
    tail = completed[-window:]
    if len(tail) < 8:
        return None
    if any(step.get("tool_name") in LEDGER_PROGRESS_TOOLS for step in tail):
        return None
    churn_count = sum(1 for step in tail if step.get("tool_name") in CHURN_TOOLS)
    if churn_count < 7:
        return None
    return {
        "window": len(tail),
        "churn_count": churn_count,
        "since_step": tail[0].get("step_no"),
        "tools": [step.get("tool_name") or step.get("kind") for step in tail],
    }


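# Illustrative sketch only, not part of the module: the churn check from
# _progress_churn_context run on made-up step records. LEDGER and CHURN below
# are hypothetical stand-ins; the real contents of LEDGER_PROGRESS_TOOLS and
# CHURN_TOOLS are defined elsewhere and are not shown in this section.

```python
from typing import Any, Optional

LEDGER = {"record_findings", "record_source"}          # assumed for illustration
CHURN = {"shell_exec", "web_search", "read_artifact"}  # assumed for illustration


def churn_context(steps: list[dict[str, Any]], window: int = 10) -> Optional[dict[str, Any]]:
    """Hypothetical standalone version of the churn detector."""
    completed = [s for s in steps if s.get("status") == "completed"]
    tail = completed[-window:]
    if len(tail) < 8:
        return None  # Too little history to call it churn.
    if any(s.get("tool_name") in LEDGER for s in tail):
        return None  # A ledger write in the window counts as progress.
    churn_count = sum(1 for s in tail if s.get("tool_name") in CHURN)
    if churn_count < 7:
        return None
    return {"window": len(tail), "churn_count": churn_count, "since_step": tail[0].get("step_no")}


steps = [{"step_no": n, "status": "completed", "tool_name": "shell_exec"} for n in range(1, 10)]
stuck = churn_context(steps)
print(stuck)  # {'window': 9, 'churn_count': 9, 'since_step': 1}

# One ledger write inside the window clears the signal.
steps.append({"step_no": 10, "status": "completed", "tool_name": "record_findings"})
cleared = churn_context(steps)
print(cleared)  # None
```
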
def _activity_stagnation_context(job: dict[str, Any]) -> dict[str, Any] | None:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    streak = _as_int(metadata.get("activity_checkpoint_streak"))
    if streak < ACTIVITY_STAGNATION_CHECKPOINTS:
        return None
    counts = metadata.get("last_checkpoint_counts") if isinstance(metadata.get("last_checkpoint_counts"), dict) else {}
    return {
        "streak": streak,
        "threshold": ACTIVITY_STAGNATION_CHECKPOINTS,
        "counts": {key: _as_int(counts.get(key)) for key in ("findings", "sources", "tasks", "experiments", "lessons", "milestones")},
    }


def _research_balance_context(
    job: dict[str, Any],
    recent_steps: list[dict[str, Any]],
    *,
    window: int = 28,
    min_execution_actions: int = 5,
) -> dict[str, Any] | None:
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    sources = len(_metadata_list(job, "source_ledger"))
    findings = len(_metadata_list(job, "finding_ledger"))
    experiments = len(_metadata_list(job, "experiment_ledger"))
    if sources > 0 or findings > 0:
        return None
    if metadata.get("pending_measurement_obligation"):
        return None
    completed = [step for step in recent_steps if step.get("status") == "completed"]
    if not completed:
        return None
    tail = completed[-window:]
    execution_tools = {"shell_exec", "write_file", "record_experiment", "write_artifact"}
    research_tools = {
        "web_search",
        "web_extract",
        "browser_navigate",
        "browser_snapshot",
        "browser_click",
        "record_source",
        "record_findings",
    }
    execution_actions = [step for step in tail if step.get("tool_name") in execution_tools]
    research_actions = [step for step in tail if step.get("tool_name") in research_tools]
    file_actions = [step for step in tail if step.get("tool_name") in {"write_file", "shell_exec"}]
    tasks = _metadata_list(job, "task_queue")
    active_research_tasks = [
        task
        for task in tasks
        if str(task.get("status") or "open") in {"open", "active", "blocked"}
        and str(task.get("output_contract") or "") == "research"
    ]
    has_research_intent = bool(active_research_tasks) or any(
        "research" in str(job.get(key) or "").lower()
        for key in ("title", "objective", "kind")
    )
    if len(execution_actions) < min_execution_actions and not (has_research_intent and len(file_actions) >= 3):
        return None
    if research_actions and not (has_research_intent and len(execution_actions) >= min_execution_actions * 2):
        return None
    if experiments <= 0 and len(execution_actions) < min_execution_actions + 2:
        return None
    return {
        "completed_window": len(tail),
        "execution_actions": len(execution_actions),
        "research_actions": len(research_actions),
        "sources": sources,
        "findings": findings,
        "experiments": experiments,
        "files": len(file_actions),
    }


4276def _source_yield_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
4277 sources = _metadata_list(job, "source_ledger")
4278 source_count = len(sources)
4279 if source_count < SOURCE_YIELD_MIN_SOURCES:
4280 return None
4281 findings = _metadata_list(job, "finding_ledger")
4282 yielded_sources = [
4283 source
4284 for source in sources
4285 if _as_int(source.get("yield_count")) > 0
4286 or _as_float(source.get("usefulness_score")) >= 0.8
4287 ]
4288 required_yield = max(2, source_count // 8)
4289 if len(findings) + len(yielded_sources) >= required_yield:
4290 return None
4291 completed = [step for step in recent_steps if step.get("status") == "completed"]
4292 last_synthesis_no = 0
4293 for step in completed:
4294 if step.get("tool_name") in {
4295 "record_findings",
4296 "record_source",
4297 "record_tasks",
4298 "record_roadmap",
4299 "record_milestone_validation",
4300 "record_lesson",
4301 }:
4302 last_synthesis_no = max(last_synthesis_no, _as_int(step.get("step_no")))
4303 gathering_after_synthesis = [
4304 step
4305 for step in completed
4306 if _as_int(step.get("step_no")) > last_synthesis_no
4307 and step.get("tool_name") in {
4308 "web_search",
4309 "web_extract",
4310 "browser_navigate",
4311 "browser_snapshot",
4312 "browser_click",
4313 "browser_scroll",
4314 }
4315 ]
4316 recent_gathering = gathering_after_synthesis[-24:]
4317 if len(recent_gathering) < SOURCE_YIELD_MIN_RECENT_GATHERING:
4318 return None
4319 recent_source_titles = [
4320 str(source.get("source") or source.get("title") or "").strip()
4321 for source in sources[-8:]
4322 if str(source.get("source") or source.get("title") or "").strip()
4323 ]
4324 return {
4325 "sources": source_count,
4326 "findings": len(findings),
4327 "yielded_sources": len(yielded_sources),
4328 "required_yield": required_yield,
4329 "recent_gathering": len(recent_gathering),
4330 "since_step": recent_gathering[0].get("step_no") if recent_gathering else None,
4331 "recent_source_titles": recent_source_titles,
4332 }
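The yield threshold used by `_source_yield_context` grows with the size of the source ledger: `max(2, source_count // 8)`. Below is a minimal sketch of that arithmetic, assuming plain integer counts. `required_yield` and `yield_shortfall` are hypothetical helper names used only for illustration; they are not part of the module.

```python
def required_yield(source_count: int) -> int:
    """Mirror the threshold expression: at least 2, otherwise one per 8 sources."""
    return max(2, source_count // 8)


def yield_shortfall(source_count: int, findings: int, yielded_sources: int) -> int:
    """Hypothetical helper: how many more findings or yielding sources are missing."""
    return max(0, required_yield(source_count) - (findings + yielded_sources))


# Small ledgers share the floor of 2; larger ledgers scale linearly.
for count in (8, 16, 40):
    print(count, required_yield(count))
```

A shortfall of zero corresponds to the early `return None` in the function above, where findings plus yielding sources already meet the threshold.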
4333
4334
4335def _artifact_accounting_context(
4336 recent_steps: list[dict[str, Any]],
4337 *,
4338 threshold: int = 3,
4339 window: int = 12,
4340) -> dict[str, Any] | None:
4341 completed = [step for step in recent_steps if step.get("status") == "completed"]
4342 tail: list[dict[str, Any]] = []
4343 for step in reversed(completed[-window:]):
4344 if step.get("tool_name") in LEDGER_PROGRESS_TOOLS:
4345 break
4346 tail.append(step)
4347 tail.reverse()
4348 artifact_steps = [step for step in tail if step.get("tool_name") == "write_artifact"]
4349 if len(artifact_steps) < threshold:
4350 return None
4351 titles = []
4352 for step in artifact_steps[-5:]:
4353 input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4354 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4355 title = str(args.get("title") or step.get("summary") or f"step #{step.get('step_no')}")
4356 titles.append(_clip_text(title, 120))
4357 return {
4358 "artifact_count": len(artifact_steps),
4359 "since_step": tail[0].get("step_no") if tail else None,
4360 "artifact_steps": [step.get("step_no") for step in artifact_steps],
4361 "artifact_titles": titles,
4362 "tools": [step.get("tool_name") or step.get("kind") for step in tail],
4363 }
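The backward scan in `_artifact_accounting_context` keeps only the steps recorded after the most recent ledger update inside the window. A minimal sketch of that scan follows, assuming steps are plain dicts. `ASSUMED_LEDGER_TOOLS` is a stand-in for `LEDGER_PROGRESS_TOOLS`, whose real contents are defined elsewhere, and `tail_since_ledger_update` is a hypothetical name.

```python
from typing import Any

# Assumption: illustrative stand-in for LEDGER_PROGRESS_TOOLS (real set not shown here).
ASSUMED_LEDGER_TOOLS = {"record_findings", "record_tasks"}


def tail_since_ledger_update(steps: list[dict[str, Any]], window: int = 12) -> list[dict[str, Any]]:
    """Walk the window backwards and stop at the first ledger-progress step."""
    tail: list[dict[str, Any]] = []
    for step in reversed(steps[-window:]):
        if step.get("tool_name") in ASSUMED_LEDGER_TOOLS:
            break
        tail.append(step)
    tail.reverse()  # restore chronological order
    return tail


steps = [
    {"step_no": 1, "tool_name": "write_artifact"},
    {"step_no": 2, "tool_name": "record_findings"},
    {"step_no": 3, "tool_name": "write_artifact"},
    {"step_no": 4, "tool_name": "web_search"},
    {"step_no": 5, "tool_name": "write_artifact"},
]
tail = tail_since_ledger_update(steps)
artifact_steps = [s["step_no"] for s in tail if s["tool_name"] == "write_artifact"]
```

In this sample only two artifacts follow the last ledger update, so the default threshold of 3 would not be reached.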
4364
4365
4366def _job_requires_measured_progress(job: dict[str, Any]) -> bool:
4367 text_parts = [
4368 str(job.get("title") or ""),
4369 str(job.get("objective") or ""),
4370 str(job.get("kind") or ""),
4371 ]
4372 tasks = _metadata_list(job, "task_queue")
4373 for task in tasks:
4374 status = str(task.get("status") or "open")
4375 if status in {"done", "skipped"}:
4376 continue
4377 contract = str(task.get("output_contract") or "")
4378 if contract in {"experiment", "monitor"}:
4379 return True
4380 if contract == "action" and _task_text_requires_measurement(task):
4381 return True
4382 text_parts.extend(
4383 str(task.get(key) or "")
4384 for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "stall_behavior")
4385 )
4386 return any(MEASURABLE_PROGRESS_PATTERN.search(part) for part in text_parts if part)
4387
4388
4389def _task_text_requires_measurement(task: dict[str, Any]) -> bool:
4390 return any(
4391 MEASURABLE_PROGRESS_PATTERN.search(str(task.get(key) or ""))
4392 for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "stall_behavior")
4393 )
4394
4395
4396def _job_requires_deliverable_progress(job: dict[str, Any]) -> bool:
4397 tasks = _metadata_list(job, "task_queue")
4398 report_tasks: list[dict[str, Any]] = []
4399 competing_execution_tasks: list[dict[str, Any]] = []
4400 for task in tasks:
4401 status = str(task.get("status") or "open").strip().lower()
4402 if status in {"done", "skipped"}:
4403 continue
4404 contract = str(task.get("output_contract") or "").strip().lower()
4405 if contract == "report":
4406 report_tasks.append(task)
4407 elif contract in {"action", "experiment", "monitor"}:
4408 competing_execution_tasks.append(task)
4409 if report_tasks:
4410 active_report = any(str(task.get("status") or "open").strip().lower() == "active" for task in report_tasks)
4411 active_competing = any(
4412 str(task.get("status") or "open").strip().lower() == "active"
4413 for task in competing_execution_tasks
4414 )
4415 max_report_priority = max(_as_int(task.get("priority")) for task in report_tasks)
4416 higher_priority_competing = any(
4417 _as_int(task.get("priority")) >= max_report_priority
4418 for task in competing_execution_tasks
4419 )
4420 if active_report or (not active_competing and not higher_priority_competing):
4421 return True
4422 text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")).lower()
4423 tokens = set(re.findall(r"[a-z][a-z0-9_-]+", text))
4424 objective_terms = DELIVERABLE_ARTIFACT_TERMS - {"compiled", "final", "revision", "section", "updated"}
4425 return bool(tokens & objective_terms)
4426
4427
4428def _step_is_deliverable_checkpoint(step: dict[str, Any]) -> bool:
4429 tool = step.get("tool_name")
4430 if tool == "write_file":
4431 return True
4432 if tool != "write_artifact":
4433 return False
4434 input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4435 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4436 text = " ".join(
4437 str(value or "")
4438 for value in (
4439 args.get("title"),
4440 args.get("summary"),
4441 args.get("artifact_type"),
4442 step.get("summary"),
4443 )
4444 ).lower()
4445 tokens = set(re.findall(r"[a-z][a-z0-9_-]+", text))
4446 if tokens & EVIDENCE_ARTIFACT_TERMS:
4447 return False
4448 return bool(tokens & DELIVERABLE_ARTIFACT_TERMS)
4449
4450
4451def _deliverable_progress_guard_context(
4452 job: dict[str, Any],
4453 recent_steps: list[dict[str, Any]],
4454 *,
4455 budget: int = DELIVERABLE_RESEARCH_BUDGET_STEPS,
4456) -> dict[str, Any] | None:
4457 if not _job_requires_deliverable_progress(job):
4458 return None
4459 completed = [step for step in recent_steps if step.get("status") == "completed"]
4460 if not completed:
4461 return None
4462 last_checkpoint_index = -1
4463 for index, step in enumerate(completed):
4464 if _step_is_deliverable_checkpoint(step):
4465 last_checkpoint_index = index
4466 tail = completed[last_checkpoint_index + 1 :]
4467 branch_activity = [
4468 step
4469 for step in tail
4470 if step.get("tool_name") in BRANCH_WORK_TOOLS
4471 or (
4472 step.get("tool_name") == "shell_exec"
4473 and _shell_command_looks_read_only(_step_command(step))
4474 )
4475 ]
4476 if len(branch_activity) < budget:
4477 return None
4478 deliverable_accounting_tools = {"record_tasks", "record_roadmap", "record_milestone_validation", "record_lesson"}
4479 if any(step.get("tool_name") in deliverable_accounting_tools for step in tail[-6:]):
4480 return None
4481 return {
4482 "reason": "no deliverable checkpoint yet" if last_checkpoint_index < 0 else "no recent deliverable checkpoint",
4483 "research_budget": budget,
4484 "completed_since_last_deliverable": len(tail),
4485 "branch_activity": len(branch_activity),
4486 "since_step": branch_activity[0].get("step_no") if branch_activity else None,
4487 "tools": [step.get("tool_name") or step.get("kind") for step in branch_activity[-10:]],
4488 }
4489
4490
4491def _step_command(step: dict[str, Any]) -> str:
4492 input_data = step.get("input") if isinstance(step.get("input"), dict) else {}
4493 args = input_data.get("arguments") if isinstance(input_data.get("arguments"), dict) else {}
4494 return str(args.get("command") or "")
4495
4496
4497def _read_only_shell_churn_context(recent_steps: list[dict[str, Any]], *, window: int = 10, threshold: int = 3) -> dict[str, Any] | None:
4498 completed = [step for step in recent_steps if step.get("status") == "completed"]
4499 if not completed:
4500 return None
4501 tail = completed[-window:]
4502 read_only_shell = [
4503 step
4504 for step in tail
4505 if step.get("tool_name") == "shell_exec" and _shell_command_looks_read_only(_step_command(step))
4506 ]
4507 if len(read_only_shell) < threshold:
4508 return None
4509 action_steps = [
4510 step
4511 for step in tail
4512 if step.get("tool_name") in {"write_file", "write_artifact", "defer_job"}
4513 or step.get("tool_name") in {
4514 "record_experiment",
4515 "record_findings",
4516 "record_lesson",
4517 "record_milestone_validation",
4518 "record_roadmap",
4519 "record_source",
4520 "record_tasks",
4521 "report_update",
4522 }
4523 or (step.get("tool_name") == "shell_exec" and not _shell_command_looks_read_only(_step_command(step)))
4524 ]
4525 if action_steps:
4526 return None
4527 return {
4528 "read_only_shell_count": len(read_only_shell),
4529 "threshold": threshold,
4530 "window": len(tail),
4531 "since_step": read_only_shell[0].get("step_no"),
4532 "commands": [_clip_text(_step_command(step), 140) for step in read_only_shell[-5:]],
4533 }
4534
4535
4536def _experiment_metric_group_key(experiment: dict[str, Any]) -> tuple[str, str, bool] | None:
4537 metric_name = str(experiment.get("metric_name") or "").strip().lower()
4538 if not metric_name:
4539 return None
4540 if experiment.get("metric_value") is None:
4541 return None
4542 return (
4543 metric_name,
4544 str(experiment.get("metric_unit") or "").strip().lower(),
4545 bool(experiment.get("higher_is_better", True)),
4546 )
4547
4548
4549def _experiment_metric_number(experiment: dict[str, Any]) -> float | None:
4550 try:
4551 return float(experiment.get("metric_value"))
4552 except (TypeError, ValueError):
4553 return None
4554
4555
4556def _experiment_value_improves(*, value: float, best_value: float, higher_is_better: bool) -> bool:
4557 return value > best_value if higher_is_better else value < best_value
4558
4559
4560def _experiment_stagnation_context(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> dict[str, Any] | None:
4561 if not _job_requires_measured_progress(job):
4562 return None
4563 experiments = [
4564 experiment
4565 for experiment in _metadata_list(job, "experiment_ledger")
4566 if str(experiment.get("status") or "").lower() == "measured"
4567 and _experiment_metric_group_key(experiment) is not None
4568 ]
4569 if len(experiments) < EXPERIMENT_STAGNATION_MIN_TRIALS:
4570 return None
4571 latest = experiments[-1]
4572 key = _experiment_metric_group_key(latest)
4573 if key is None:
4574 return None
4575 group = [experiment for experiment in experiments if _experiment_metric_group_key(experiment) == key]
4576 if len(group) < EXPERIMENT_STAGNATION_MIN_TRIALS:
4577 return None
4578 higher_is_better = bool(latest.get("higher_is_better", True))
4579 best_index = 0
4580 best_value = _experiment_metric_number(group[0])
4581 for index, experiment in enumerate(group[1:], start=1):
4582 value = _experiment_metric_number(experiment)
4583 if value is None:
4584 continue
4585 if best_value is None or _experiment_value_improves(
4586 value=value,
4587 best_value=best_value,
4588 higher_is_better=higher_is_better,
4589 ):
4590 best_index = index
4591 best_value = value
4592 if best_value is None:
4593 return None
4594 non_improving = group[best_index + 1:]
4595 if len(non_improving) < EXPERIMENT_STAGNATION_NON_IMPROVING:
4596 return None
4597 last_experiment_step_no = 0
4598 for step in recent_steps:
4599 if step.get("tool_name") == "record_experiment" and str(step.get("status") or "").lower() == "completed":
4600 last_experiment_step_no = max(last_experiment_step_no, _as_int(step.get("step_no")))
4601 if last_experiment_step_no > 0:
4602 decision_tools = {"record_lesson", "record_tasks", "record_roadmap", "record_milestone_validation"}
4603 if any(
4604 _as_int(step.get("step_no")) > last_experiment_step_no
4605 and str(step.get("status") or "").lower() == "completed"
4606 and step.get("tool_name") in decision_tools
4607 for step in recent_steps
4608 ):
4609 return None
4610 best = group[best_index]
4611 return {
4612 "metric_name": latest.get("metric_name"),
4613 "metric_unit": latest.get("metric_unit"),
4614 "higher_is_better": higher_is_better,
4615 "best_title": best.get("title"),
4616 "best_value": best.get("metric_value"),
4617 "latest_title": latest.get("title"),
4618 "latest_value": latest.get("metric_value"),
4619 "non_improving_count": len(non_improving),
4620 "recent_trials": len(group),
4621 "recent_titles": [str(experiment.get("title") or "") for experiment in non_improving[-5:]],
4622 }
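The core of `_experiment_stagnation_context` is locating the best trial in a metric group and counting the trials recorded after it. The sketch below isolates that step, assuming every value is already numeric (the function above additionally skips values that fail float conversion). `non_improving_after_best` is a hypothetical name.

```python
def non_improving_after_best(values: list[float], higher_is_better: bool = True) -> int:
    """Count trials recorded after the best value; ties do not move the best index."""
    if not values:
        return 0
    best_index = 0
    best_value = values[0]
    for index, value in enumerate(values[1:], start=1):
        improves = value > best_value if higher_is_better else value < best_value
        if improves:
            best_index = index
            best_value = value
    return len(values) - best_index - 1


accuracy_trials = [0.70, 0.82, 0.81, 0.82, 0.79]
latency_trials = [12.0, 9.5, 9.5, 10.1]
print(non_improving_after_best(accuracy_trials))
print(non_improving_after_best(latency_trials, higher_is_better=False))
```

Because the comparison is strict, a later trial that merely equals the best still counts as non-improving.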
4623
4624
4625def _measured_progress_guard_context(
4626 job: dict[str, Any],
4627 recent_steps: list[dict[str, Any]],
4628 *,
4629 budget: int = MEASURABLE_RESEARCH_BUDGET_STEPS,
4630) -> dict[str, Any] | None:
4631 if not _job_requires_measured_progress(job):
4632 return None
4633 if _pending_measurement_obligation(job):
4634 return None
4635 completed = [step for step in recent_steps if step.get("status") == "completed"]
4636 if not completed:
4637 return None
4638 last_experiment_index = -1
4639 for index, step in enumerate(completed):
4640 if step.get("tool_name") == "record_experiment":
4641 last_experiment_index = index
4642 tail = completed[last_experiment_index + 1 :]
4643 branch_activity = [step for step in tail if step.get("tool_name") in BRANCH_WORK_TOOLS | {"write_artifact"}]
4644 shell_actions = [step for step in tail if step.get("tool_name") == "shell_exec"]
4645 if len(branch_activity) < budget and len(shell_actions) < MEASURABLE_ACTION_BUDGET_STEPS:
4646 return None
4647 if any(_step_accounts_for_measured_progress_guard(step) for step in tail[-6:]):
4648 return None
4649 experiments = _metadata_list(job, "experiment_ledger")
4650 reason = "no experiment records yet" if not experiments else "no recent experiment update"
4651 return {
4652 "reason": reason,
4653 "research_budget": budget,
4654 "shell_action_budget": MEASURABLE_ACTION_BUDGET_STEPS,
4655 "completed_since_last_experiment": len(tail),
4656 "branch_activity": len(branch_activity),
4657 "shell_actions_since_last_experiment": len(shell_actions),
4658 "since_step": branch_activity[0].get("step_no") if branch_activity else None,
4659 "tools": [step.get("tool_name") or step.get("kind") for step in branch_activity[-10:]],
4660 }
4661
4662
4663def _step_accounts_for_measured_progress_guard(step: dict[str, Any]) -> bool:
4664 tool_name = step.get("tool_name")
4665 if tool_name == "record_lesson":
4666 return True
4667 if tool_name != "record_tasks":
4668 return False
4669 output = step.get("output") if isinstance(step.get("output"), dict) else {}
4670 tasks = output.get("tasks") if isinstance(output.get("tasks"), list) else []
4671 for task in tasks:
4672 if not isinstance(task, dict):
4673 continue
4674 status = str(task.get("status") or "open").strip().lower().replace(" ", "_")
4675 if status in {"done", "skipped"}:
4676 continue
4677 contract = str(task.get("output_contract") or "").strip().lower().replace(" ", "_")
4678 if contract in {"experiment", "monitor"}:
4679 return True
4680 if contract == "action" and _task_text_requires_measurement(task):
4681 return True
4682 return False
4683
4684
4685def _maybe_create_measurement_obligation(
4686 *,
4687 db: AgentDB,
4688 job_id: str,
4689 step: dict[str, Any] | None,
4690 tool_name: str,
4691 args: dict[str, Any],
4692 result: dict[str, Any],
4693) -> None:
4694 if tool_name != "shell_exec":
4695 return
4696 command = str(args.get("command") or result.get("command") or "")
4697 candidates = measurement_candidates(result, command=command)
4698 if not candidates:
4699 return
4700 metadata = db.get_job(job_id).get("metadata")
4701 if isinstance(metadata, dict):
4702 existing = metadata.get("pending_measurement_obligation")
4703 if isinstance(existing, dict) and existing and not existing.get("resolved_at"):
4704 return
4705 obligation = {
4706 "created_at": datetime.now(timezone.utc).isoformat(),
4707 "source_step_id": step.get("id") if step else "",
4708 "source_step_no": step.get("step_no") if step else None,
4709 "tool": tool_name,
4710 "summary": "Tool output contains measurable-looking results that need experiment accounting.",
4711 "metric_candidates": candidates,
4712 "command": command[:1000],
4713 }
4714 db.update_job_metadata(job_id, {"pending_measurement_obligation": obligation})
4715 db.append_agent_update(
4716 job_id,
4717 f"Measured output needs accounting: {', '.join(candidates[:3])}.",
4718 category="blocked",
4719 metadata={"pending_measurement_obligation": obligation},
4720 )
4721
4722
4723def _maybe_create_file_validation_obligation(
4724 *,
4725 db: AgentDB,
4726 job_id: str,
4727 step: dict[str, Any] | None,
4728 args: dict[str, Any],
4729 result: dict[str, Any],
4730) -> None:
4731 path = str(result.get("path") or args.get("path") or "").strip()
4732 content = str(args.get("content") or "")
4733 if not path or not _file_output_needs_validation(path, content):
4734 return
4735 metadata = db.get_job(job_id).get("metadata")
4736 if isinstance(metadata, dict):
4737 existing = metadata.get("pending_file_validation_obligation")
4738 if isinstance(existing, dict) and existing and not existing.get("resolved_at"):
4739 return
4740 obligation = {
4741 "created_at": datetime.now(timezone.utc).isoformat(),
4742 "source_step_id": step.get("id") if step else "",
4743 "source_step_no": step.get("step_no") if step else None,
4744 "tool": "write_file",
4745 "path": path,
4746 "reason": "code/config/script-like file was written and needs validation before more branch work",
4747 "suggested_validation": _suggested_file_validation(path),
4748 }
4749 db.update_job_metadata(job_id, {"pending_file_validation_obligation": obligation})
4750 db.append_agent_update(
4751 job_id,
4752 f"File output needs validation: {path}",
4753 category="blocked",
4754 metadata={"pending_file_validation_obligation": obligation},
4755 )
4756
4757
4758def _command_references_path(command: str, path: str) -> bool:
4759 if not command or not path:
4760 return False
4761 path_obj = Path(path)
4762 needles = {str(path_obj), path_obj.name}
4763 try:
4764 needles.add(str(path_obj.expanduser().resolve()))
4765 except OSError:
4766 pass
4767 return any(needle and needle in command for needle in needles)
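`_command_references_path` matches by substring, and one of its needles is the bare file name, so a command that mentions a same-named file in a different directory also counts as a reference. The simplified sketch below illustrates that behavior; it omits the resolved-absolute-path needle, and `command_mentions_path` is a hypothetical name.

```python
from pathlib import Path


def command_mentions_path(command: str, path: str) -> bool:
    """Substring check against the path as given and against its bare file name."""
    if not command or not path:
        return False
    path_obj = Path(path)
    needles = {str(path_obj), path_obj.name}
    return any(needle and needle in command for needle in needles)


print(command_mentions_path("python -m py_compile scripts/run.py", "scripts/run.py"))
print(command_mentions_path("python other/run.py", "scripts/run.py"))  # file-name match only
print(command_mentions_path("ls -la", "scripts/run.py"))
```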
4768
4769
4770def _resolve_file_validation_obligation(
4771 db: AgentDB,
4772 job_id: str,
4773 *,
4774 status: str,
4775 reason: str,
4776 via_tool: str,
4777 result: dict[str, Any] | None = None,
4778) -> None:
4779 job = db.get_job(job_id)
4780 obligation = _pending_file_validation_obligation(job)
4781 if not obligation:
4782 return
4783 resolved = dict(obligation)
4784 resolved.update({
4785 "resolved_at": datetime.now(timezone.utc).isoformat(),
4786 "resolution_status": status,
4787 "resolution_reason": reason[:1000],
4788 "resolution_tool": via_tool,
4789 })
4790 if result:
4791 resolved["validation_result"] = {
4792 key: result.get(key)
4793 for key in ("success", "returncode", "error", "summary")
4794 if key in result
4795 }
4796 db.update_job_metadata(
4797 job_id,
4798 {
4799 "pending_file_validation_obligation": {},
4800 "last_file_validation_obligation": resolved,
4801 },
4802 )
4803 db.append_agent_update(
4804 job_id,
4805 f"File validation {status}: {reason[:220]}",
4806 category="progress" if status == "validated" else "blocked",
4807 metadata={"file_validation_obligation": resolved},
4808 )
4809
4810
4811def _maybe_resolve_file_validation_obligation(
4812 *,
4813 db: AgentDB,
4814 job_id: str,
4815 tool_name: str,
4816 args: dict[str, Any],
4817 result: dict[str, Any],
4818 ok: bool,
4819) -> None:
4820 obligation = _pending_file_validation_obligation(db.get_job(job_id))
4821 if not obligation:
4822 return
4823 if tool_name == "shell_exec":
4824 command = str(args.get("command") or result.get("command") or "")
4825 path = str(obligation.get("path") or "")
4826 if not _command_references_path(command, path):
4827 return
4828 status = "validated" if ok else "failed"
4829 reason = "Validation command completed." if ok else f"Validation command failed: {result.get('error') or 'non-zero result'}"
4830 _resolve_file_validation_obligation(db, job_id, status=status, reason=reason, via_tool=tool_name, result=result)
4831 return
4832 if ok and tool_name in {"record_lesson", "record_tasks", "record_experiment", "record_milestone_validation"}:
4833 _resolve_file_validation_obligation(
4834 db,
4835 job_id,
4836 status="deferred",
4837 reason=f"Validation was handled or deferred via {tool_name}.",
4838 via_tool=tool_name,
4839 result=result,
4840 )
4841
4842
4843def _step_by_id(db: AgentDB, job_id: str, step_id: str) -> dict[str, Any] | None:
4844 for step in db.list_steps(job_id=job_id):
4845 if str(step.get("id") or "") == step_id:
4846 return step
4847 return None
4848
4849
4850def _search_query(args: dict[str, Any]) -> str:
4851 return str(args.get("query") or "").strip()
4852
4853
4854def _query_tokens(query: str) -> set[str]:
4855 return {
4856 token
4857 for token in re.findall(r"[a-z0-9]+", query.lower())
4858 if len(token) > 2 and token not in QUERY_STOPWORDS
4859 }
4860
4861
4862def _text_tokens(value: str) -> set[str]:
4863 return {
4864 token
4865 for token in re.findall(r"[a-z0-9]+", str(value or "").lower())
4866 if len(token) > 2 and token not in TEXT_TOKEN_STOPWORDS
4867 }
4868
4869
4870def _similar_recent_search(
4871 args: dict[str, Any],
4872 recent_steps: list[dict[str, Any]],
4873 *,
4874 window: int = 12,
4875) -> dict[str, Any] | None:
4876 return _similar_recent_query_tool("web_search", args, recent_steps, window=window)
4877
4878
4879def _similar_recent_query_tool(
4880 tool_name: str,
4881 args: dict[str, Any],
4882 recent_steps: list[dict[str, Any]],
4883 *,
4884 window: int = 12,
4885) -> dict[str, Any] | None:
4886 query = _search_query(args)
4887 tokens = _query_tokens(query)
4888 if len(tokens) < 2:
4889 return None
4890 for step in reversed(_completed_recent_steps(recent_steps)[-window:]):
4891 if step.get("tool_name") != tool_name:
4892 continue
4893 input_data = step.get("input") or {}
4894 previous_args = input_data.get("arguments") if isinstance(input_data, dict) else None
4895 if not isinstance(previous_args, dict):
4896 continue
4897 previous_query = _search_query(previous_args)
4898 previous_tokens = _query_tokens(previous_query)
4899 if len(previous_tokens) < 2:
4900 continue
4901 overlap = len(tokens & previous_tokens) / max(len(tokens), len(previous_tokens))
4902 if overlap >= 0.72:
4903 return step
4904 return None
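The similarity test in `_similar_recent_query_tool` divides the shared-token count by the size of the larger token set and compares the result with 0.72. A minimal sketch of that ratio follows. `ASSUMED_STOPWORDS` is a small stand-in for `QUERY_STOPWORDS` (defined elsewhere), and returning `0.0` for very short queries is a simplification of this sketch, not the module's behavior.

```python
import re

# Assumption: illustrative stand-in for QUERY_STOPWORDS (real set not shown here).
ASSUMED_STOPWORDS = {"the", "and", "for", "with"}


def query_tokens(query: str) -> set[str]:
    """Lowercase alphanumeric tokens longer than two characters, minus stopwords."""
    return {
        token
        for token in re.findall(r"[a-z0-9]+", query.lower())
        if len(token) > 2 and token not in ASSUMED_STOPWORDS
    }


def overlap_ratio(query: str, previous_query: str) -> float:
    """Shared tokens divided by the size of the larger token set."""
    tokens, previous = query_tokens(query), query_tokens(previous_query)
    if len(tokens) < 2 or len(previous) < 2:
        return 0.0  # sketch-only convention for queries too short to compare
    return len(tokens & previous) / max(len(tokens), len(previous))


base = "python asyncio timeout handling"
print(overlap_ratio(base, "python asyncio timeout errors"))
print(overlap_ratio(base, "python asyncio cancellation semantics"))
```

With four-token queries, three shared tokens give 0.75 and would be treated as a near-duplicate, while two shared tokens give 0.5 and would not.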
4905
4906
4907def _recent_tool_streak(recent_steps: list[dict[str, Any]], tool_name: str) -> int:
4908 streak = 0
4909 for step in reversed(_completed_recent_steps(recent_steps)):
4910 current_tool = step.get("tool_name")
4911 if current_tool == tool_name:
4912 streak += 1
4913 continue
4914 if current_tool:
4915 break
4916 return streak
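`_recent_tool_streak` counts matching steps from the end of the history, stops at the first step that names a different tool, and passes over steps with no tool name without ending the streak. The sketch below reproduces that loop on a plain list of tool names; `trailing_streak` is a hypothetical name, and the completed-step filtering done by `_completed_recent_steps` is assumed to have happened already.

```python
from typing import Optional


def trailing_streak(tool_names: list[Optional[str]], tool_name: str) -> int:
    """Count trailing uses of tool_name; entries without a tool are skipped."""
    streak = 0
    for current in reversed(tool_names):
        if current == tool_name:
            streak += 1
            continue
        if current:  # a different tool ends the streak
            break
    return streak


history = ["web_search", "shell_exec", None, "shell_exec", "shell_exec"]
print(trailing_streak(history, "shell_exec"))
print(trailing_streak(history, "web_search"))
```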
4917
4918
4919def _repeated_guard_block_context(
4920 recent_steps: list[dict[str, Any]],
4921 *,
4922 threshold: int = 3,
4923 window: int = 12,
4924) -> dict[str, Any] | None:
4925 recoveries = [
4926 step
4927 for step in recent_steps
4928 if step.get("tool_name") == "guard_recovery" and step.get("status") == "completed"
4929 ]
4930 last_recovery = max(
4931 recoveries,
4932 key=lambda step: int(step.get("step_no") or 0),
4933 default=None,
4934 )
4935 last_recovery_no = int(last_recovery.get("step_no") or 0) if last_recovery else 0
4936 last_recovery_error = ""
4937 if last_recovery:
4938 recovery_output = last_recovery.get("output") if isinstance(last_recovery.get("output"), dict) else {}
4939 recovery_context = recovery_output.get("guard_recovery") if isinstance(recovery_output.get("guard_recovery"), dict) else {}
4940 last_recovery_error = str(recovery_context.get("error") or "")
4941 operational_steps = [
4942 step
4943 for step in recent_steps
4944 if int(step.get("step_no") or 0) > last_recovery_no
4945 if step.get("kind") in {"tool", "recovery", "assistant"} and step.get("tool_name") != "guard_recovery"
4946 ]
4947 tail = operational_steps[-window:]
4948 latest_blocked = next((step for step in reversed(tail) if step.get("status") == "blocked"), None)
4949 if not latest_blocked:
4950 return None
4951 output = latest_blocked.get("output") if isinstance(latest_blocked.get("output"), dict) else {}
4952 error = str(output.get("error") or latest_blocked.get("error") or "")
4953 if error not in RECOVERABLE_GUARD_ERRORS:
4954 return None
4955 count = 0
4956 blocked_tools = []
4957 first_step_no = None
4958 for step in tail:
4959 step_output = step.get("output") if isinstance(step.get("output"), dict) else {}
4960 step_error = str(step_output.get("error") or step.get("error") or "")
4961 if step.get("status") == "blocked" and step_error == error:
4962 count += 1
4963 first_step_no = first_step_no or step.get("step_no")
4964 blocked_tools.append(str(step.get("tool_name") or step.get("kind") or "tool"))
4965 effective_threshold = 1 if _already_read_checkpoint_accounting_block(latest_blocked) else threshold
4966 if count < effective_threshold:
4967 return None
4968 progress_after_recovery = any(
4969 step.get("status") == "completed"
4970 and step.get("tool_name") != "guard_recovery"
4971 for step in operational_steps
4972 )
4973 if last_recovery_error == error and not progress_after_recovery:
4974 return None
4975 context = {
4976 "error": error,
4977 "count": count,
4978 "first_step_no": first_step_no,
4979 "latest_step_no": latest_blocked.get("step_no"),
4980 "blocked_tools": blocked_tools[-8:],
4981 }
4982 if error == "task queue saturated":
4983 task_queue = output.get("task_queue") if isinstance(output.get("task_queue"), dict) else {}
4984 context["task_queue"] = {
4985 "reason": task_queue.get("reason") or "task queue saturated",
4986 "open_count": task_queue.get("open_count"),
4987 "total_count": task_queue.get("total_count"),
4988 "open_titles": task_queue.get("open_titles") if isinstance(task_queue.get("open_titles"), list) else [],
4989 }
4990 return context
4991
4992
4993def _already_read_checkpoint_accounting_block(step: dict[str, Any]) -> bool:
4994 output = step.get("output") if isinstance(step.get("output"), dict) else {}
4995 checkpoint = output.get("pending_evidence_checkpoint") if isinstance(output.get("pending_evidence_checkpoint"), dict) else {}
4996 return (
4997 output.get("error") == "evidence checkpoint accounting required"
4998 and (bool(output.get("checkpoint_already_read")) or bool(checkpoint.get("checkpoint_read")))
4999 )
5000
5001
5002def _step_error_text(step: dict[str, Any]) -> str:
5003 output = step.get("output") if isinstance(step.get("output"), dict) else {}
5004 parts = [
5005 output.get("error"),
5006 output.get("error_type"),
5007 output.get("detail"),
5008 output.get("message"),
5009 step.get("error"),
5010 step.get("summary"),
5011 ]
5012 return " ".join(str(part) for part in parts if part)
5013
5014
5015def _blocked_tool_call_result(
5016 name: str,
5017 args: dict[str, Any],
5018 recent_steps: list[dict[str, Any]],
5019 job: dict[str, Any],
5020) -> tuple[dict[str, Any], str] | None:
5021 if name == "defer_job":
5022 self_defer = _self_defer_context(args)
5023 if self_defer:
5024 result = {
5025 "success": False,
5026 "error": "self-defer blocked",
5027 "blocked_tool": name,
5028 "blocked_arguments": args,
5029 "self_defer": self_defer,
5030 "guidance": (
5031 "Do not defer merely for a future worker turn to pick up ordinary work. Use defer_job only when "
5032 "waiting for a real external process, scheduled monitor interval, long-running command, "
5033 "or other time-based condition. Otherwise execute, measure, record a task/experiment/lesson, or "
5034 "mark the branch blocked now."
5035 ),
5036 }
5037 return result, "blocked defer_job; self-defer is not progress"
5038
5039 if name == "record_tasks":
5040 saturated = _task_queue_saturation_context(job, args)
5041 if saturated:
5042 result = {
5043 "success": False,
5044 "error": "task queue saturated",
5045 "blocked_tool": name,
5046 "blocked_arguments": args,
5047 "task_queue": saturated,
5048 "guidance": (
5049 "The durable task queue already has many branches. Do not create more branch sprawl. "
5050 "Choose an existing high-priority task and execute it, update existing tasks to active, "
5051 "done, blocked, or skipped, or consolidate the queue into roadmap/milestone state."
5052 ),
5053 }
5054 return result, f"blocked record_tasks; {saturated['reason']}"
5055 task_planning_stagnation = _task_planning_stagnation_context(job)
5056 if task_planning_stagnation and _record_tasks_adds_new_open_work(args, job):
5057 result = {
5058 "success": False,
5059 "error": "task execution required",
5060 "blocked_tool": name,
5061 "blocked_arguments": args,
5062 "task_planning": task_planning_stagnation,
5063 "guidance": (
5064 "Recent checkpoints only expanded the task queue. Do not add more new open tasks yet. "
5065 "Execute or validate an existing branch, save a durable checkpoint, record findings/source/"
5066 "experiment evidence, mark existing tasks done/blocked/skipped, or record a lesson."
5067 ),
5068 }
5069 return result, "blocked record_tasks; task-only planning needs execution"
5070
5071 current_milestone_validation = _milestone_validation_needed(job)
5072 if (
5073 name == "record_milestone_validation"
5074 and current_milestone_validation
5075 and not _milestone_validation_call_matches_current(args, current_milestone_validation)
5076 ):
5077 result = {
5078 "success": False,
5079 "error": "current milestone validation required",
5080 "blocked_tool": name,
5081 "blocked_arguments": args,
5082 "milestone": {
5083 "title": current_milestone_validation.get("title"),
5084 "status": current_milestone_validation.get("status"),
5085 "validation_status": current_milestone_validation.get("validation_status"),
5086 "acceptance_criteria": current_milestone_validation.get("acceptance_criteria"),
5087 "evidence_needed": current_milestone_validation.get("evidence_needed"),
5088 },
5089 "guidance": (
5090 "A milestone validation gate is already active. Validate that current milestone by name, "
5091 "or update the roadmap to make a different milestone current before validating another one."
5092 ),
5093 }
5094 return result, "blocked record_milestone_validation; current milestone validation required"
5095
5096 auto_checkpoint_accounting = _auto_checkpoint_accounting_context(job, recent_steps)
5097 checkpoint_read_call = bool(
5098 auto_checkpoint_accounting
5099 and name == "read_artifact"
5100 and not auto_checkpoint_accounting.get("checkpoint_read")
5101 and _read_artifact_call_matches_checkpoint(
5102 args,
5103 artifact_id=str(auto_checkpoint_accounting.get("artifact_id") or ""),
5104 artifact_title=str(auto_checkpoint_accounting.get("title") or ""),
5105 )
5106 )
5107 if _evidence_checkpoint_blocks_tool(name, args, auto_checkpoint_accounting):
5108 checkpoint_already_read = bool(auto_checkpoint_accounting and auto_checkpoint_accounting.get("checkpoint_read"))
5109 result = {
5110 "success": False,
5111 "error": "evidence checkpoint accounting required",
5112 "blocked_tool": name,
5113 "blocked_arguments": args,
5114 "pending_evidence_checkpoint": auto_checkpoint_accounting,
5115 "checkpoint_already_read": checkpoint_already_read,
5116 "required_next_action": "durable_checkpoint_accounting" if checkpoint_already_read else "read_or_account_checkpoint",
5117 "allowed_resolution_tools": sorted(EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS),
5118 "guidance": _evidence_checkpoint_block_guidance(auto_checkpoint_accounting or {}),
5119 }
5120 return result, f"blocked {name}; evidence checkpoint accounting required"
5121 checkpoint_resolution_call = bool(auto_checkpoint_accounting and name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS)
5122
5123 if name == "shell_exec":
5124 placeholder = _shell_placeholder_context(str(args.get("command") or ""))
5125 if placeholder:
5126 result = {
5127 "success": False,
5128 "error": "unresolved placeholder in shell command",
5129 "blocked_tool": name,
5130 "blocked_arguments": args,
5131 "placeholder": placeholder,
5132 "guidance": (
5133 "Do not execute shell commands that still contain placeholder URLs, paths, hosts, or template "
5134 "tokens. Resolve the concrete value from evidence, ask the operator if it is genuinely unknown, "
5135 "or record a blocked task/source before continuing."
5136 ),
5137 }
5138 return result, "blocked shell_exec; unresolved placeholder in command"
5139 syntax_error = _shell_syntax_preflight_context(str(args.get("command") or ""))
5140 if syntax_error:
5141 result = {
5142 "success": False,
5143 "recoverable": True,
5144 "error": "malformed shell command",
5145 "blocked_tool": name,
5146 "blocked_arguments": args,
5147 "syntax": syntax_error,
5148 "guidance": (
5149 "Do not execute partial or malformed shell. Rebuild the command from exact observed paths, "
5150 "or use a simpler bounded probe before retrying."
5151 ),
5152 }
5153 return result, "blocked shell_exec; malformed command syntax"
5154 candidate_recovery = _observed_candidate_recovery_required_context(recent_steps, args)
5155 if candidate_recovery:
5156 result = {
5157 "success": False,
5158 "error": "observed executable recovery required",
5159 "blocked_tool": name,
5160 "blocked_arguments": args,
5161 "candidate_recovery": candidate_recovery,
5162 "guidance": (
5163 "A recent shell step reported this command as missing, and later evidence showed candidate "
5164 "executable paths. Retry with an exact observed executable path, add its directory to PATH, "
5165 "or record why that observed candidate is invalid before running the bare command again."
5166 ),
5167 }
5168 return result, "blocked shell_exec; observed executable recovery required"
5169 privileged_failure = _recent_privileged_shell_failure_context(recent_steps)
5170 if privileged_failure and _shell_command_looks_privileged_or_package_manager(str(args.get("command") or "")):
5171 result = {
5172 "success": False,
5173 "error": "privileged command recovery required",
5174 "blocked_tool": name,
5175 "blocked_arguments": args,
5176 "privileged_failure": privileged_failure,
5177 "guidance": (
5178 "A recent privileged/package-manager shell command failed due to a permission or authorization error. "
5179 "Do not retry that class of command until the failure is accounted for. Use observed executable "
5180 "paths, user-writable installs, existing project files, or record_tasks/record_lesson/"
5181 "record_experiment to mark the branch blocked or choose a non-privileged recovery."
5182 ),
5183 }
5184 return result, "blocked shell_exec; privileged command recovery required"
5185
5186 unpersisted_evidence = _unpersisted_evidence_step(recent_steps)
5187 if unpersisted_evidence and name in BRANCH_WORK_TOOLS:
5188 result = {
5189 "success": False,
5190 "error": "artifact required before more research",
5191 "blocked_tool": name,
5192 "blocked_arguments": args,
5193 "previous_step": unpersisted_evidence["id"],
5194 "guidance": (
5195 "Fresh browser, extracted, or shell evidence is waiting. Save or account for that evidence with "
5196 "write_artifact, record_findings, record_source, record_experiment, record_tasks, "
5197 "record_roadmap, record_milestone_validation, or record_lesson before doing more search, "
5198 "browsing, shell work, or artifact review."
5199 ),
5200 }
5201 return result, f"blocked {name}; write_artifact required after evidence step #{unpersisted_evidence['step_no']}"
5202
5203 duplicate_step = _duplicate_recent_tool_call(name, args, recent_steps)
5204 if duplicate_step:
5205 guidance = "Use a different query, extract one of the prior result URLs, open a result in the browser, or write an artifact."
5206 if name == "read_artifact":
5207 guidance = (
5208 "This artifact was already read. Do not read it again; use its content to inspect a concrete item, "
5209 "record findings/tasks, or write a report artifact."
5210 )
5211 elif name == "shell_exec":
5212 guidance = (
5213 "This shell command was already run. Do not rerun discovery; use the previous output to inspect a "
5214 "specific file/item, write an artifact, or update findings/tasks."
5215 )
5216 result = {
5217 "success": False,
5218 "recoverable": name == "read_artifact",
5219 "error": "duplicate tool call blocked",
5220 "blocked_tool": name,
5221 "blocked_arguments": args,
5222 "previous_step": duplicate_step["id"],
5223 "guidance": guidance,
5224 }
5225 return result, f"blocked duplicate {name}; previous step #{duplicate_step['step_no']}"
5226
5227 if checkpoint_read_call:
5228 return None
5229
5230 browser_runtime_unavailable = _browser_runtime_unavailable_context(recent_steps)
5231 if browser_runtime_unavailable and _is_browser_tool(name):
5232 result = {
5233 "success": False,
5234 "error": "browser runtime unavailable",
5235 "blocked_tool": name,
5236 "blocked_arguments": args,
5237 "browser_runtime": browser_runtime_unavailable,
5238 "guidance": (
5239 "Browser automation is unavailable on this host. Do not retry browser tools until the runtime is "
5240 "installed or configured. Use web_search, web_extract, shell_exec, source/ledger tools, or record "
5241 "a blocked task/source and continue through a non-browser branch."
5242 ),
5243 }
5244 return result, f"blocked {name}; browser runtime unavailable"
5245
5246 measurement_obligation = _pending_measurement_obligation(job)
5247 if (
5248 measurement_obligation
5249 and not checkpoint_resolution_call
5250 and name in MEASUREMENT_BLOCKED_TOOLS
5251 and name not in MEASUREMENT_RESOLUTION_TOOLS
5252 ):
5253 result = {
5254 "success": False,
5255 "error": "measurement obligation pending",
5256 "blocked_tool": name,
5257 "blocked_arguments": args,
5258 "pending_measurement_obligation": measurement_obligation,
5259 "guidance": (
5260 "A recent action produced measurable output. Record it with record_experiment, "
5261 "explain why it is invalid with record_lesson, or create the missing measurement branch with record_tasks "
5262 "before doing more research, artifact writing, or finding/source updates."
5263 ),
5264 }
5265 return result, f"blocked {name}; record_experiment required after measured output"
5266
5267 file_validation_obligation = _pending_file_validation_obligation(job)
5268 if (
5269 file_validation_obligation
5270 and not checkpoint_resolution_call
5271 and name in FILE_VALIDATION_BLOCKED_TOOLS
5272 and name not in FILE_VALIDATION_RESOLUTION_TOOLS
5273 ):
5274 result = {
5275 "success": False,
5276 "error": "file validation pending",
5277 "blocked_tool": name,
5278 "blocked_arguments": args,
5279 "pending_file_validation_obligation": file_validation_obligation,
5280 "guidance": (
5281 "A recent file output needs validation before more research/output churn. "
5282 "Use shell_exec to run a syntax check, dry-run, test, or other narrow validation for the file, "
5283 "or use record_tasks/record_lesson/record_experiment if validation is blocked or deferred."
5284 ),
5285 }
5286 return result, f"blocked {name}; file validation required after write_file"
5287
5288 early_anti_bot_context = _recent_anti_bot_context(recent_steps)
5289 if early_anti_bot_context and name == "write_artifact" and not _artifact_args_acknowledge_block(args):
5290 result = {
5291 "success": False,
5292 "error": "misleading blocked-source artifact blocked",
5293 "blocked_tool": name,
5294 "blocked_arguments": args,
5295 "anti_bot_source": early_anti_bot_context,
5296 "guidance": "The latest browser evidence is an anti-bot/CAPTCHA block. Write only a blocked-source note or pivot.",
5297 }
5298 return result, f"blocked misleading write_artifact; anti-bot source at step #{early_anti_bot_context.get('step_no')}"
5299
5300 evidence_grounding = _evidence_grounding_context(job, recent_steps, tool_name=name, args=args)
5301 if evidence_grounding:
5302 result = {
5303 "success": False,
5304 "error": "evidence grounding required",
5305 "blocked_tool": name,
5306 "blocked_arguments": args,
5307 "evidence_grounding": evidence_grounding,
5308 "guidance": evidence_grounding["guidance"],
5309 }
5310 return result, f"blocked {name}; evidence grounding required"
5311
5312 measured_progress_guard = _measured_progress_guard_context(job, recent_steps)
5313 experiment_stagnation = _experiment_stagnation_context(job, recent_steps)
5314 deliverable_progress_guard = _deliverable_progress_guard_context(job, recent_steps)
5315 source_yield = _source_yield_context(job, recent_steps)
5316 progress_churn = _progress_churn_context(recent_steps)
5317 artifact_accounting = _artifact_accounting_context(recent_steps)
5318 activity_stagnation = _activity_stagnation_context(job)
5319 memory_consolidation = _memory_graph_consolidation_context(job, recent_steps)
5320 shell_read_only = name == "shell_exec" and _shell_command_looks_read_only(str(args.get("command") or ""))
5321 if (
5322 artifact_accounting
5323 and name in ARTIFACT_ACCOUNTING_BLOCKED_TOOLS
5324 and name not in ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS
5325 ):
5326 result = {
5327 "success": False,
5328 "error": "progress accounting required",
5329 "blocked_tool": name,
5330 "blocked_arguments": args,
5331 "artifact_accounting": artifact_accounting,
5332 "guidance": (
5333 "Recent saved outputs have not been reflected in durable progress state. "
5334 "Use record_tasks or record_roadmap to mark completed/open branches, "
5335 "record_milestone_validation for milestone checks, record_findings or record_source "
5336 "for reusable evidence, record_experiment for measurements, or record_lesson "
5337 "if the outputs were low-value before continuing."
5338 ),
5339 }
5340 return result, f"blocked {name}; progress accounting required after saved outputs"
5341
5342 if progress_churn and not measured_progress_guard and name in CHURN_TOOLS:
5343 result = {
5344 "success": False,
5345 "error": "progress ledger update required",
5346 "blocked_tool": name,
5347 "blocked_arguments": args,
5348 "progress_churn": progress_churn,
5349 "guidance": (
5350 "Recent activity has not changed findings, experiments, tasks, lessons, or sources. "
5351 "Use a ledger tool to record progress, reject the branch, or create a pivot task before continuing."
5352 ),
5353 }
5354 return result, f"blocked {name}; progress ledger update required"
5355
5356 read_only_shell_churn = _read_only_shell_churn_context(recent_steps)
5357 if read_only_shell_churn and shell_read_only:
5358 result = {
5359 "success": False,
5360 "error": "action decision required",
5361 "blocked_tool": name,
5362 "blocked_arguments": args,
5363 "read_only_shell_churn": read_only_shell_churn,
5364 "guidance": (
5365 "Recent shell work only inspected or listed state. Stop re-probing the same branch. "
5366 "Run the next concrete action, write/persist the candidate decision, record an experiment/monitor task, "
5367 "or record why the branch is blocked before another read-only shell command."
5368 ),
5369 }
5370 return result, f"blocked {name}; action decision required"
5371
5372 if activity_stagnation and name in ACTIVITY_STAGNATION_BLOCKED_TOOLS:
5373 result = {
5374 "success": False,
5375 "error": "durable progress required",
5376 "blocked_tool": name,
5377 "blocked_arguments": args,
5378 "activity_stagnation": activity_stagnation,
5379 "guidance": (
5380 "Several checkpoints have produced no durable ledger delta. "
5381 "Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
5382 "record_milestone_validation, or record_lesson to classify the branch, mark it blocked/skipped, "
5383 "or open a better branch before more research, shell, file, report, or artifact work."
5384 ),
5385 }
5386 return result, f"blocked {name}; durable progress required after activity-only checkpoints"
5387
5388 if source_yield and name in SOURCE_YIELD_BLOCKED_TOOLS:
5389 result = {
5390 "success": False,
5391 "error": "source yield accounting required",
5392 "blocked_tool": name,
5393 "blocked_arguments": args,
5394 "source_yield": source_yield,
5395 "guidance": (
5396 "The job has gathered many sources but too few durable findings or source-yield outcomes. "
5397 "Before more search, extraction, browsing, shell execution, file/output work, or report chatter, "
5398 "use record_findings to save source-backed facts/candidates, record_source to mark source yield "
5399 "or low-yield outcomes, or update tasks/roadmap/lessons to pivot from the source branch."
5400 ),
5401 }
5402 return result, f"blocked {name}; source yield accounting required"
5403
5404 if memory_consolidation and name in MEMORY_CONSOLIDATION_BLOCKED_TOOLS:
5405 result = {
5406 "success": False,
5407 "error": "memory graph consolidation required",
5408 "blocked_tool": name,
5409 "blocked_arguments": args,
5410 "memory_consolidation": memory_consolidation,
5411 "guidance": (
5412 "The job has enough reusable durable records that raw ledgers should be consolidated into connected "
5413 "memory. Use record_memory_graph to add/update nodes and links before more branch work, or record_lesson "
5414 "if there is no reusable memory to preserve."
5415 ),
5416 }
5417 return result, f"blocked {name}; memory graph consolidation required"
5418
5419 record_experiment_closes_branch = (
5420 name == "record_experiment"
5421 and str(args.get("status") or "").strip().lower().replace(" ", "_") in {"failed", "blocked", "skipped"}
5422 )
5423 if (
5424 experiment_stagnation
5425 and not record_experiment_closes_branch
5426 and (
5427 name in BRANCH_WORK_TOOLS
5428 or name in {"record_experiment", "write_artifact", "write_file", "report_update"}
5429 )
5430 ):
5431 result = {
5432 "success": False,
5433 "error": "experiment stagnation decision required",
5434 "blocked_tool": name,
5435 "blocked_arguments": args,
5436 "experiment_stagnation": experiment_stagnation,
5437 "guidance": (
5438 "Recent measured trials have not improved the best observed result. Before more experiments, "
5439 "execution, research, file/output work, or report chatter, make a durable decision: use "
5440 "record_tasks, record_roadmap, record_milestone_validation, record_lesson, or a blocked/skipped/"
5441 "failed record_experiment to reject, block, or pivot the stagnant branch."
5442 ),
5443 }
5444 return result, f"blocked {name}; experiment stagnation decision required"
5445
5446 lesson_sprawl = _lesson_sprawl_context(job, recent_steps)
5447 if lesson_sprawl and name == "record_lesson":
5448 result = {
5449 "success": False,
5450 "error": "lesson consolidation required",
5451 "blocked_tool": name,
5452 "blocked_arguments": args,
5453 "lesson_consolidation": lesson_sprawl,
5454 "guidance": (
5455 "This job already has many raw lessons and the connected memory graph is behind. "
5456 "Do not add another raw lesson. Use record_memory_graph to consolidate reusable strategy, mistake, "
5457 "constraint, decision, question, skill, or episode nodes with evidence links, or update existing "
5458 "tasks/roadmap/milestone state if this is only branch status."
5459 ),
5460 }
5461 return result, "blocked record_lesson; lesson consolidation required"
5462
5463 if deliverable_progress_guard and (name in DELIVERABLE_PROGRESS_BLOCKED_TOOLS or shell_read_only):
5464 result = {
5465 "success": False,
5466 "error": "deliverable checkpoint required",
5467 "blocked_tool": name,
5468 "blocked_arguments": args,
5469 "deliverable_progress_guard": deliverable_progress_guard,
5470 "guidance": (
5471 "This job is deliverable-framed and has done enough background work without a draft/report/file "
5472 "checkpoint. Save a partial deliverable with write_file or write_artifact, or record_tasks, "
5473 "record_roadmap, record_milestone_validation, or record_lesson if the deliverable is blocked."
5474 ),
5475 }
5476 return result, f"blocked {name}; deliverable checkpoint required"
5477
5478 research_balance = _research_balance_context(job, recent_steps)
5479 if research_balance and name in RESEARCH_BALANCE_BLOCKED_TOOLS:
5480 result = {
5481 "success": False,
5482 "error": "research balance required",
5483 "blocked_tool": name,
5484 "blocked_arguments": args,
5485 "research_balance": research_balance,
5486 "guidance": (
5487 "Recent work is execution-heavy but has no durable sources or findings. "
5488 "Use web/browser/documentation/local-inspection tools and record_source or record_findings "
5489 "before continuing execution, artifact review, raw lesson accumulation, report updates, or file churn."
5490 ),
5491 }
5492 return result, f"blocked {name}; research balance required"
5493
5494 roadmap_staleness = _roadmap_staleness_context(job, recent_steps)
5495 if roadmap_staleness and not checkpoint_resolution_call and name in ROADMAP_STALENESS_BLOCKED_TOOLS:
5496 result = {
5497 "success": False,
5498 "error": "roadmap update required",
5499 "blocked_tool": name,
5500 "blocked_arguments": args,
5501 "roadmap_staleness": roadmap_staleness,
5502 "guidance": (
5503 "The roadmap has not advanced despite durable task/artifact activity. "
5504 "Use record_roadmap to mark milestone progress, record_milestone_validation "
5505 "to judge an evidence-backed checkpoint, or record_lesson if the roadmap is wrong."
5506 ),
5507 }
5508 return result, f"blocked {name}; roadmap update required"
5509
5510 milestone_validation = _milestone_validation_needed(job)
5511 milestone_validation_action = milestone_validation and _tool_call_matches_pending_milestone_need(
5512 name,
5513 args,
5514 milestone_validation,
5515 )
5516 if (
5517 milestone_validation
5518 and not milestone_validation_action
5519 and not checkpoint_resolution_call
5520 and name in MILESTONE_VALIDATION_BLOCKED_TOOLS
5521 ):
5522 result = {
5523 "success": False,
5524 "error": "milestone validation required",
5525 "blocked_tool": name,
5526 "blocked_arguments": args,
5527 "milestone": {
5528 "title": milestone_validation.get("title"),
5529 "status": milestone_validation.get("status"),
5530 "validation_status": milestone_validation.get("validation_status"),
5531 "acceptance_criteria": milestone_validation.get("acceptance_criteria"),
5532 "evidence_needed": milestone_validation.get("evidence_needed"),
5533 },
5534 "guidance": (
5535 "The current milestone is ready for validation. Use record_milestone_validation "
5536 "with evidence and pass/fail/blocker status, read an existing artifact if needed, "
5537 "or create follow-up tasks for validation gaps before starting more branch work."
5538 ),
5539 }
5540 return result, f"blocked {name}; milestone validation required"
5541
5542 anti_bot_context = _recent_anti_bot_context(recent_steps)
5543 if anti_bot_context:
5544 blocked_browser_followups = {"browser_click", "browser_console", "browser_press", "browser_scroll", "browser_snapshot", "browser_type"}
5545 if name in blocked_browser_followups:
5546 result = {
5547 "success": False,
5548 "error": "anti-bot source loop blocked",
5549 "blocked_tool": name,
5550 "blocked_arguments": args,
5551 "anti_bot_source": anti_bot_context,
5552 "guidance": "This page is blocked by anti-bot/CAPTCHA. Record the source as blocked and pivot to a different public source.",
5553 }
5554 return result, f"blocked {name}; anti-bot source at step #{anti_bot_context.get('step_no')}"
5555 if name == "browser_navigate" and _same_source_url(str(args.get("url") or ""), str(anti_bot_context.get("url") or "")):
5556 result = {
5557 "success": False,
5558 "error": "anti-bot source loop blocked",
5559 "blocked_tool": name,
5560 "blocked_arguments": args,
5561 "anti_bot_source": anti_bot_context,
5562 "guidance": "Do not reopen the same blocked source. Pivot to another source.",
5563 }
5564 return result, f"blocked {name}; repeated blocked source from step #{anti_bot_context.get('step_no')}"
5565 if name == "web_extract":
5566 urls = args.get("urls") if isinstance(args.get("urls"), list) else []
5567 if any(_same_source_url(str(url), str(anti_bot_context.get("url") or "")) for url in urls):
5568 result = {
5569 "success": False,
5570 "error": "anti-bot source loop blocked",
5571 "blocked_tool": name,
5572 "blocked_arguments": args,
5573 "anti_bot_source": anti_bot_context,
5574 "guidance": "Do not extract the same blocked source. Record it as low-yield and pivot.",
5575 }
5576 return result, f"blocked {name}; blocked source from step #{anti_bot_context.get('step_no')}"
5577 if name == "write_artifact" and not _artifact_args_acknowledge_block(args):
5578 result = {
5579 "success": False,
5580 "error": "misleading blocked-source artifact blocked",
5581 "blocked_tool": name,
5582 "blocked_arguments": args,
5583 "anti_bot_source": anti_bot_context,
5584 "guidance": "The latest browser evidence is an anti-bot/CAPTCHA block. Write only a blocked-source note or pivot.",
5585 }
5586 return result, f"blocked misleading write_artifact; anti-bot source at step #{anti_bot_context.get('step_no')}"
5587
5588 experiment_next_action = _latest_experiment_next_action_context(job)
5589 action_failure = _experiment_next_action_failure_context(job, recent_steps)
5590 if (
5591 action_failure
5592 and name not in {"record_experiment", "record_tasks", "record_lesson", "record_milestone_validation"}
5593 ):
5594 result = {
5595 "success": False,
5596 "error": "action result accounting required",
5597 "blocked_tool": name,
5598 "blocked_arguments": args,
5599 "action_failure": action_failure,
5600 "guidance": (
5601 "The latest experiment next action was attempted and the observed output reports a missing command, "
5602 "path, or prerequisite. Before more work, use record_experiment, record_tasks, or record_lesson to "
5603 "account for the failed/blocked action and choose a concrete recovery branch."
5604 ),
5605 }
5606 return result, f"blocked {name}; action result accounting required"
5607 if (
5608 _experiment_next_action_requires_delivery(experiment_next_action)
5609 and (
5610 name in EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS
5611 or (
5612 name == "shell_exec"
5613 and _shell_command_looks_read_only(str(args.get("command") or ""))
5614 and not _shell_command_supports_experiment_next_action(str(args.get("command") or ""), experiment_next_action)
5615 )
5616 )
5617 ):
5618 result = {
5619 "success": False,
5620 "error": "experiment next action pending",
5621 "blocked_tool": name,
5622 "blocked_arguments": args,
5623 "experiment_next_action": experiment_next_action,
5624 "guidance": (
5625 "The latest measured experiment selected a delivery/action next step. "
5626 "Act on that next action with an execution or ledger tool, or use record_experiment/record_tasks/record_lesson "
5627 "to explain why it is invalid or blocked before doing more research or artifact review."
5628 ),
5629 }
5630 return result, f"blocked {name}; experiment next action pending"
5631
5632 shell_budget_exhausted = (
5633 name == "shell_exec"
5634 and _as_int(measured_progress_guard.get("shell_actions_since_last_experiment")) >= MEASURABLE_ACTION_BUDGET_STEPS
5635 ) if measured_progress_guard else False
5636 candidate_validation_shell = (
5637 name == "shell_exec" and _shell_exec_targets_candidate_file(job, recent_steps, args)
5638 )
5639 if (
5640 measured_progress_guard
5641 and not checkpoint_resolution_call
5642 and (name in MEASURABLE_RESEARCH_BLOCKED_TOOLS or (shell_budget_exhausted and not candidate_validation_shell))
5643 ):
5644 result = {
5645 "success": False,
5646 "error": "measured progress required",
5647 "blocked_tool": name,
5648 "blocked_arguments": args,
5649 "measured_progress_guard": measured_progress_guard,
5650 "guidance": (
5651 "This job is measurement-framed and has exhausted its research budget without new experiment records. "
5652 "If the shell/action budget is exhausted, do not call shell_exec again; call record_experiment for a "
5653 "known measurement, record_tasks with an experiment/action/monitor contract, or record_lesson if "
5654 "measurement is blocked."
5655 ),
5656 }
5657 return result, f"blocked {name}; measured progress required"
5658
5659 if name in BRANCH_WORK_TOOLS and _task_queue_exhausted(job):
5660 result = {
5661 "success": False,
5662 "error": "task branch required before more work",
5663 "blocked_tool": name,
5664 "blocked_arguments": args,
5665 "guidance": (
5666 "The durable task queue has no open or active branch. Use record_tasks to open the next concrete "
5667 "branch before doing more research or execution, or report_update if the operator needs a checkpoint."
5668 ),
5669 }
5670 return result, f"blocked {name}; no open task branch"
5671
5672 known_bad_source = _known_bad_source_for_call(name, args, job)
5673 if known_bad_source:
5674 result = {
5675 "success": False,
5676 "error": "known bad source blocked",
5677 "blocked_tool": name,
5678 "blocked_arguments": args,
5679 "known_bad_source": known_bad_source,
5680 "guidance": (
5681 "The source ledger marks this source as blocked or low-yield for this job. "
5682 "Choose a different source, or record a fresh operator reason before retrying it."
5683 ),
5684 }
5685 return result, f"blocked {name}; known bad source {known_bad_source.get('source')}"
5686
5687 if name == "web_search":
5688 similar_step = _similar_recent_search(args, recent_steps)
5689 if similar_step:
5690 result = {
5691 "success": False,
5692 "error": "similar search query blocked",
5693 "blocked_tool": name,
5694 "blocked_arguments": args,
5695 "previous_step": similar_step["id"],
5696 "guidance": "Use an existing result URL, extract a page, or search a clearly different topic/location/source.",
5697 }
5698 return result, f"blocked similar web_search; previous step #{similar_step['step_no']}"
5699 streak = _recent_search_streak(recent_steps)
5700 if streak >= 3:
5701 result = {
5702 "success": False,
5703 "error": "search loop blocked",
5704 "blocked_tool": name,
5705 "blocked_arguments": args,
5706 "recent_search_streak": streak,
5707 "guidance": "Stop searching. Extract or open one of the prior results, then write an artifact.",
5708 }
5709 return result, f"blocked web_search after {streak} consecutive searches"
5710
5711 if name == "search_artifacts":
5712 similar_step = _similar_recent_query_tool("search_artifacts", args, recent_steps)
5713 if similar_step:
5714 result = {
5715 "success": False,
5716 "error": "similar artifact search blocked",
5717 "blocked_tool": name,
5718 "blocked_arguments": args,
5719 "previous_step": similar_step["id"],
5720 "guidance": (
5721 "Use a returned artifact, record what the prior artifact searches proved, "
5722 "or create the next concrete task instead of searching saved outputs again."
5723 ),
5724 }
5725 return result, f"blocked similar search_artifacts; previous step #{similar_step['step_no']}"
5726 streak = _recent_tool_streak(recent_steps, "search_artifacts")
5727 if streak >= 3:
5728 result = {
5729 "success": False,
5730 "error": "artifact search loop blocked",
5731 "blocked_tool": name,
5732 "blocked_arguments": args,
5733 "recent_artifact_search_streak": streak,
5734 "guidance": (
5735 "Stop searching saved outputs. Read a specific returned artifact, update tasks/findings/lessons, "
5736 "or write the next report artifact from already-read evidence."
5737 ),
5738 }
5739 return result, f"blocked search_artifacts after {streak} consecutive artifact searches"
5740
5741 return None
5742
5743
5744def _error_result(exc: Exception) -> dict[str, Any]:
5745 result: dict[str, Any] = {
5746 "success": False,
5747 "error": str(exc),
5748 "error_type": type(exc).__name__,
5749 }
5750 if isinstance(exc, LLMResponseError) and exc.payload:
5751 result["provider_payload"] = exc.payload
5752 return result
5753
5754
5755def _hard_llm_provider_failure_note(exc: Exception) -> str:
5756 return provider_action_required_note(exc)
5757
5758
5759def _max_step_no(steps: list[dict[str, Any]]) -> int:
5760 return max((int(step.get("step_no") or 0) for step in steps), default=0)
5761
5762
5763def _should_reflect(job: dict[str, Any], recent_steps: list[dict[str, Any]]) -> bool:
5764 if not recent_steps:
5765 return False
5766 if recent_steps[-1].get("kind") == "reflection":
5767 return False
5768 step_no = _max_step_no(recent_steps)
5769 if step_no == 0 or step_no % REFLECTION_INTERVAL_STEPS != 0:
5770 return False
5771 reflections = _metadata_list(job, "reflections")
5772 if not reflections:
5773 return True
5774 last_reflected = 0
5775 metadata = reflections[-1].get("metadata") if isinstance(reflections[-1].get("metadata"), dict) else {}
5776 if isinstance(metadata.get("through_step"), int):
5777 last_reflected = metadata["through_step"]
5778 return step_no > last_reflected
5779
5780
5781def _lesson_already_recorded(job: dict[str, Any], lesson: str, *, category: str) -> bool:
5782 text = " ".join(str(lesson or "").split())
5783 wanted_category = str(category or "memory").strip().lower() or "memory"
5784 return any(
5785 str(entry.get("category") or "memory").strip().lower() == wanted_category
5786 and " ".join(str(entry.get("lesson") or "").split()) == text
5787 for entry in _metadata_list(job, "lessons")
5788 )
5789
5790
5791def _reflection_strategy(
5792 *,
5793 failures: list[dict[str, Any]],
5794 findings: list[Any],
5795 sources: list[Any],
5796 tasks: list[Any],
5797 measured_experiments: list[dict[str, Any]],
5798 pending_measurement: bool,
5799 validating_milestones: list[dict[str, Any]],
5800 active_operator_messages: list[dict[str, Any]],
5801) -> str:
5802 if pending_measurement:
5803 return "Resolve the pending measurement obligation before expanding research, outputs, or branch work."
5804 if active_operator_messages:
5805 return "Incorporate or supersede active operator context before choosing new autonomous branches."
5806 if validating_milestones:
5807 return "Validate the current roadmap milestone from evidence before adding more milestone scope."
5808 if measured_experiments:
5809 return "Continue from the best measured result; reject or pivot branches that do not improve the active metric."
5810 yielded_sources = [
5811 source
5812 for source in sources
5813 if isinstance(source, dict)
5814 and (_as_int(source.get("yield_count")) > 0 or _as_float(source.get("usefulness_score")) >= 0.8)
5815 ]
5816 if len(sources) >= SOURCE_YIELD_MIN_SOURCES and len(findings) + len(yielded_sources) < max(2, len(sources) // 8):
5817 return "Distill gathered sources into durable findings or source yield decisions before collecting more sources."
5818 if failures:
5819 return "Classify blocked or failed steps into durable task, source, experiment, or lesson outcomes before retrying."
5820 open_tasks = [
5821 task
5822 for task in tasks
5823 if isinstance(task, dict)
5824 and str(task.get("status") or "open").lower() in {"open", "active", "blocked"}
5825 ]
5826 if open_tasks:
5827 return "Execute or resolve the highest-priority open task before creating more task branches."
5828 return "Choose the next branch from durable evidence, then record the result as findings, tasks, experiments, sources, or memory."
5829
5830
5831def _claim_operator_queue(db: AgentDB, job_id: str) -> list[dict[str, Any]]:
5832 steering = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
5833 if steering:
5834 return steering
5835 return db.claim_operator_messages(job_id, modes=("follow_up",), limit=1)
5836
5837
5838def _emit_loop_start(db: AgentDB, job_id: str, run_id: str) -> None:
5839 db.append_event(
5840 job_id,
5841 event_type="loop",
5842 title="agent_start",
5843 ref_table="job_runs",
5844 ref_id=run_id,
5845 metadata={"run_id": run_id},
5846 )
5847 db.append_event(
5848 job_id,
5849 event_type="loop",
5850 title="turn_start",
5851 ref_table="job_runs",
5852 ref_id=run_id,
5853 metadata={"run_id": run_id},
5854 )
5855
5856
def _emit_assistant_message_event(
    db: AgentDB,
    job_id: str,
    run_id: str,
    response: LLMResponse,
    *,
    messages: list[dict[str, Any]],
    context_length: int,
    duration_seconds: float | None = None,
) -> dict[str, Any]:
    if response.tool_calls:
        body = ", ".join(call.name for call in response.tool_calls)
        metadata = {"run_id": run_id, "tool_calls": [call.name for call in response.tool_calls]}
    else:
        body = response.content[:1000]
        metadata = {"run_id": run_id, "tool_calls": []}
    metadata["usage"] = turn_usage_metadata(response, messages=messages, context_length=context_length)
    if duration_seconds is not None:
        metadata["duration_seconds"] = round(max(0.0, float(duration_seconds)), 3)
    if response.model:
        metadata["model"] = response.model
    if response.response_id:
        metadata["response_id"] = response.response_id
    db.append_event(
        job_id,
        event_type="loop",
        title="message_end",
        body=body,
        ref_table="job_runs",
        ref_id=run_id,
        metadata=metadata,
    )
    return metadata["usage"]


def _emit_loop_end(
    db: AgentDB,
    job_id: str,
    run_id: str,
    *,
    status: str,
    step_id: str | None = None,
    tool_name: str | None = None,
    detail: str = "",
) -> None:
    metadata = {"run_id": run_id, "status": status, "step_id": step_id or "", "tool": tool_name or ""}
    db.append_event(
        job_id,
        event_type="loop",
        title="turn_end",
        body=detail[:1000],
        ref_table="job_runs",
        ref_id=run_id,
        metadata=metadata,
    )
    db.append_event(
        job_id,
        event_type="loop",
        title="agent_end",
        body=status,
        ref_table="job_runs",
        ref_id=run_id,
        metadata=metadata,
    )


def _run_reflection_step(
    job: dict[str, Any],
    recent_steps: list[dict[str, Any]],
    *,
    db: AgentDB,
    job_id: str,
    run_id: str,
) -> StepExecution:
    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="reflection", tool_name="reflect")
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    findings = metadata.get("finding_ledger") if isinstance(metadata.get("finding_ledger"), list) else []
    sources = metadata.get("source_ledger") if isinstance(metadata.get("source_ledger"), list) else []
    tasks = metadata.get("task_queue") if isinstance(metadata.get("task_queue"), list) else []
    experiments = metadata.get("experiment_ledger") if isinstance(metadata.get("experiment_ledger"), list) else []
    lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
    roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
    milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
    validating_milestones = [
        milestone for milestone in milestones
        if isinstance(milestone, dict)
        and (
            str(milestone.get("status") or "planned") == "validating"
            or str(milestone.get("validation_status") or "not_started") == "pending"
        )
    ]
    operator_messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
    active_operator_messages = [
        entry for entry in operator_messages
        if isinstance(entry, dict)
        and str(entry.get("mode") or "steer") in {"steer", "follow_up"}
        and not entry.get("acknowledged_at")
        and not entry.get("superseded_at")
    ]
    pending_measurement = _pending_measurement_obligation(job)
    artifacts = db.list_artifacts(job_id, limit=12)
    failures = [step for step in recent_steps[-REFLECTION_INTERVAL_STEPS:] if step.get("status") == "failed" or step.get("status") == "blocked"]
    step_no = _max_step_no(recent_steps)
    finding_batches = [artifact for artifact in artifacts if "finding" in str(artifact.get("title") or artifact.get("summary") or "").lower()]
    best_sources = sorted(
        [
            source for source in sources
            if isinstance(source, dict)
            and (
                _as_int(source.get("yield_count")) > 0
                or _as_float(source.get("usefulness_score")) >= 0.2
            )
            and _as_int(source.get("fail_count")) <= max(0, _as_int(source.get("yield_count")))
        ],
        key=lambda source: (_as_float(source.get("usefulness_score")), _as_int(source.get("yield_count"))),
        reverse=True,
    )[:3]
    source_text = ", ".join(str(source.get("source") or "") for source in best_sources) or "no high-yield source yet"
    measured_experiments = [experiment for experiment in experiments if isinstance(experiment, dict) and experiment.get("metric_value") is not None]
    best_experiments = [experiment for experiment in measured_experiments if experiment.get("best_observed")]
    best_experiment_text = "no measured experiment yet"
    if best_experiments:
        best_experiment_text = "; ".join(
            f"{experiment.get('title')} " + format_metric_value(
                experiment.get("metric_name") or "metric",
                experiment.get("metric_value"),
                experiment.get("metric_unit") or "",
            )
            for experiment in best_experiments[-3:]
        )
    summary = (
        f"Reflection through step #{step_no}: {len(findings)} findings, {len(sources)} sources, "
        f"{len(tasks)} tasks, {len(experiments)} experiments, {len(milestones)} roadmap milestones, "
        f"{len(lessons)} lessons, "
        f"{len(active_operator_messages)} active operator messages, "
        f"{len(finding_batches)} recent finding artifacts, {len(failures)} recent blocked/failed steps. "
        f"Best source direction: {source_text}. Best measured result: {best_experiment_text}."
        + (f" Roadmap '{roadmap.get('title')}' has {len(validating_milestones)} milestone(s) needing validation." if roadmap else "")
        + (" Pending measurement obligation needs resolution." if pending_measurement else "")
    )
    strategy = _reflection_strategy(
        failures=failures,
        findings=findings,
        sources=sources,
        tasks=tasks,
        measured_experiments=measured_experiments,
        pending_measurement=bool(pending_measurement),
        validating_milestones=validating_milestones,
        active_operator_messages=active_operator_messages,
    )
    reflection = db.append_reflection(
        job_id,
        summary,
        strategy=strategy,
        metadata={
            "through_step": step_no,
            "finding_count": len(findings),
            "source_count": len(sources),
            "task_count": len(tasks),
            "experiment_count": len(experiments),
            "roadmap_milestone_count": len(milestones),
            "roadmap_validation_needed_count": len(validating_milestones),
            "measured_experiment_count": len(measured_experiments),
            "active_operator_message_count": len(active_operator_messages),
            "pending_measurement_obligation": bool(pending_measurement),
        },
    )
    lesson = None
    if not _lesson_already_recorded(job, strategy, category="strategy"):
        lesson = db.append_lesson(
            job_id,
            strategy,
            category="strategy",
            confidence=0.75,
            metadata={"source": "reflection", "through_step": step_no},
        )
    db.append_agent_update(job_id, summary, category="plan", metadata={"reflection": reflection})
    result = {"success": True, "reflection": reflection, "lesson_recorded": bool(lesson)}
    db.finish_step(step_id, status="completed", summary=summary, output_data=result)
    db.finish_run(run_id, "completed")
    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="reflect", detail=summary)
    refresh_memory_index(db, job_id)
    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="reflect", status="completed", result=result)


def _run_guard_recovery_step(
    context: dict[str, Any],
    *,
    db: AgentDB,
    job_id: str,
    run_id: str,
) -> StepExecution:
    error = str(context.get("error") or "recoverable guard")
    checkpoint_accounting = error == "evidence checkpoint accounting required"
    task_queue_saturated = error == "task queue saturated"
    task_goal = "Convert the repeated guard block into durable progress before retrying the blocked action."
    acceptance = (
        "Use record_tasks, record_findings, record_source, record_experiment, or record_lesson to state what "
        "changed, what branch is rejected, or what concrete branch should run next."
    )
    stall_behavior = "If the same guard appears again, pivot to a different branch or record the branch as blocked."
    if checkpoint_accounting:
        task_goal = (
            "Account for the already-read evidence checkpoint as durable progress, a rejected branch, "
            "or a blocked branch before continuing."
        )
        acceptance = (
            "Use record_findings, record_source, record_experiment, record_tasks, record_roadmap, "
            "record_milestone_validation, or record_lesson to state exactly what the checkpoint proved, "
            "invalidated, changed, or failed to provide. Do not read the same checkpoint again."
        )
        stall_behavior = (
            "If the checkpoint cannot produce durable progress, record a lesson or task that names the blocker "
            "and choose a different branch."
        )
    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="guard_recovery")
    if task_queue_saturated:
        task_queue = context.get("task_queue") if isinstance(context.get("task_queue"), dict) else {}
        lesson = db.append_lesson(
            job_id,
            (
                f"Repeated task queue saturation occurred {context.get('count')} times. "
                "Do not open guard-recovery tasks for saturation; consolidate, complete, block, or skip existing branches "
                "before adding new work."
            ),
            category="strategy",
            confidence=0.85,
            metadata={"guard_recovery": context},
        )
        db.update_job_metadata(
            job_id,
            {
                "task_backlog_pressure": {
                    "detected_at": datetime.now(timezone.utc).isoformat(),
                    "guard_recovery": context,
                    "reason": task_queue.get("reason") or "task queue saturated",
                    "open_count": task_queue.get("open_count"),
                    "total_count": task_queue.get("total_count"),
                }
            },
        )
        message = (
            f"Guard recovery recorded task queue saturation from step #{context.get('first_step_no')} "
            f"to #{context.get('latest_step_no')}; no new task was opened."
        )
        update = db.append_agent_update(
            job_id,
            message,
            category="blocked",
            metadata={"guard_recovery": context, "lesson_key": lesson.get("key"), "task_queue_saturation": True},
        )
        result = {
            "success": True,
            "guard_recovery": context,
            "lesson": lesson,
            "update": update,
            "task_opened": False,
        }
        db.finish_step(step_id, status="completed", summary=message, output_data=result)
        finished_step = _step_by_id(db, job_id, step_id)
        _resolve_evidence_checkpoint(
            db=db,
            job_id=job_id,
            tool_name="guard_recovery",
            step=finished_step,
        )
        db.finish_run(run_id, "completed")
        _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="guard_recovery", detail=message)
        refresh_memory_index(db, job_id)
        return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="guard_recovery", status="completed", result=result)

    lesson = db.append_lesson(
        job_id,
        (
            f"Repeated guard block '{error}' occurred {context.get('count')} times. "
            + (
                "The checkpoint has already been read; do not reread it. Account for the evidence with a durable "
                "record or reject/block that branch before continuing."
                if checkpoint_accounting
                else "Do not retry the same blocked tool pattern; update durable progress state, create a new branch, "
                "or explicitly reject the branch before continuing."
            )
        ),
        category="strategy",
        confidence=0.75,
        metadata={"guard_recovery": context},
    )
    task = db.append_task_record(
        job_id,
        title=f"Resolve guard: {error}",
        status="open",
        priority=9,
        goal=task_goal,
        output_contract="decision",
        acceptance_criteria=acceptance,
        evidence_needed=f"Recent blocked tools: {', '.join(context.get('blocked_tools') or [])}",
        stall_behavior=stall_behavior,
        metadata={"guard_recovery": context, "resolves_evidence_checkpoint": checkpoint_accounting},
    )
    message = (
        f"Guard recovery opened a task after repeated '{error}' blocks "
        f"from step #{context.get('first_step_no')} to #{context.get('latest_step_no')}."
    )
    update = db.append_agent_update(
        job_id,
        message,
        category="blocked",
        metadata={"guard_recovery": context, "task_key": task.get("key"), "lesson_key": lesson.get("key")},
    )
    result = {
        "success": True,
        "guard_recovery": context,
        "lesson": lesson,
        "task": task,
        "update": update,
    }
    db.finish_step(step_id, status="completed", summary=message, output_data=result)
    finished_step = _step_by_id(db, job_id, step_id)
    _resolve_evidence_checkpoint(
        db=db,
        job_id=job_id,
        tool_name="guard_recovery",
        step=finished_step,
    )
    db.finish_run(run_id, "completed")
    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="guard_recovery", detail=message)
    refresh_memory_index(db, job_id)
    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="guard_recovery", status="completed", result=result)


def _usage_budget_limit_context(config: AppConfig, usage: dict[str, Any]) -> dict[str, Any] | None:
    limit = config.runtime.max_job_cost_usd
    if limit is None or limit <= 0 or not bool(usage.get("has_cost")):
        return None
    cost = _as_float(usage.get("cost"))
    if cost < float(limit):
        return None
    return {
        "limit": float(limit),
        "cost": cost,
        "calls": _as_int(usage.get("calls")),
        "total_tokens": _as_int(usage.get("total_tokens")),
        "prompt_tokens": _as_int(usage.get("prompt_tokens")),
        "completion_tokens": _as_int(usage.get("completion_tokens")),
    }


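The budget gate in `_usage_budget_limit_context` has three early exits: no positive limit configured, no cost data reported, or cost still below the limit. A minimal sketch of that decision follows. It passes the limit as a plain number instead of an `AppConfig`, and `budget_limit_context` is a hypothetical name, not part of the module.

```python
from typing import Any, Optional


def budget_limit_context(limit: Optional[float], usage: dict[str, Any]) -> Optional[dict[str, Any]]:
    # No limit, a non-positive limit, or missing cost data: never trigger.
    if limit is None or limit <= 0 or not usage.get("has_cost"):
        return None
    cost = float(usage.get("cost") or 0.0)
    # Strictly below the limit keeps running; reaching it exactly triggers.
    if cost < float(limit):
        return None
    return {"limit": float(limit), "cost": cost, "calls": int(usage.get("calls") or 0)}


under = budget_limit_context(1.0, {"has_cost": True, "cost": 0.5, "calls": 3})
at_limit = budget_limit_context(1.0, {"has_cost": True, "cost": 1.0, "calls": 7})
no_cost_data = budget_limit_context(1.0, {"has_cost": False, "cost": 9.0})
unlimited = budget_limit_context(None, {"has_cost": True, "cost": 9.0})
```

Because the comparison is `cost < limit`, a job whose cost equals the limit is already treated as over budget. Providers that report no cost are never paused by this gate.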
def _run_usage_budget_limit_step(
    context: dict[str, Any],
    *,
    db: AgentDB,
    job_id: str,
    run_id: str,
) -> StepExecution:
    limit = float(context.get("limit") or 0.0)
    cost = float(context.get("cost") or 0.0)
    message = (
        f"Paused job: configured model cost limit ${limit:g} reached "
        f"(current cost ${cost:.4f}, {context.get('calls')} model calls, "
        f"{_compact_usage_tokens(context.get('total_tokens'))} tokens). "
        "Raise the limit, switch model/provider, or resume after deciding the budget is acceptable."
    )
    metadata = {
        "reason": "usage_budget_limit",
        "usage_budget_limit": context,
        "last_note": message,
        "usage_budget_blocked_at": datetime.now(timezone.utc).isoformat(),
    }
    db.update_job_status(job_id, "paused", metadata_patch=metadata)
    step_id = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="budget_limit")
    result = {"success": True, "job_id": job_id, "paused": True, **context}
    db.append_agent_update(
        job_id,
        message,
        category="blocked",
        metadata={"reason": "usage_budget_limit", "usage_budget_limit": context},
    )
    db.finish_step(step_id, status="completed", summary=message, output_data=result)
    db.finish_run(run_id, "completed")
    _emit_loop_end(db, job_id, run_id, status="completed", step_id=step_id, tool_name="budget_limit", detail=message)
    refresh_memory_index(db, job_id)
    return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name="budget_limit", status="completed", result=result)


def _compact_usage_tokens(value: object) -> str:
    number = _as_int(value)
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


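The thresholds in `_compact_usage_tokens` can be checked in isolation. The sketch below is a standalone copy under one assumption: `_as_int` is not shown in this part of the module, so a tolerant `int()` coercion that falls back to 0 stands in for it. `compact_tokens` is a hypothetical name.

```python
def compact_tokens(value: object) -> str:
    """Render a token count as a short label with K or M suffixes."""
    try:
        number = int(value)  # assumed stand-in for the module's _as_int helper
    except (TypeError, ValueError):
        number = 0
    if number >= 1_000_000:
        return f"{number / 1_000_000:.1f}M"
    if number >= 1_000:
        return f"{number / 1_000:.1f}K"
    return str(number)


small = compact_tokens(999)           # "999"
thousands = compact_tokens(1_500)     # "1.5K"
millions = compact_tokens(2_000_000)  # "2.0M"
edge = compact_tokens(999_999)        # "1000.0K", rounding happens after the branch is chosen
```

The last case shows a cosmetic edge: values just under one million stay in the `K` branch and round up to `1000.0K`, because the one-decimal format is applied after the threshold check.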
def _evidence_checkpoint_content(evidence_step: dict[str, Any]) -> str:
    output = evidence_step.get("output") if isinstance(evidence_step.get("output"), dict) else {}
    input_data = evidence_step.get("input") if isinstance(evidence_step.get("input"), dict) else {}
    observation = _observation_for_prompt(evidence_step.get("tool_name"), output)
    return "\n\n".join([
        "# Auto Evidence Checkpoint",
        f"Source step: #{evidence_step.get('step_no')} {evidence_step.get('tool_name') or evidence_step.get('kind')}",
        f"Summary: {evidence_step.get('summary') or ''}",
        f"Arguments:\n```json\n{json.dumps(input_data.get('arguments') or {}, ensure_ascii=False, indent=2)[:3000]}\n```",
        f"Observed:\n{observation or 'No compact observation available.'}",
        f"Raw output excerpt:\n```json\n{json.dumps(output, ensure_ascii=False, indent=2)[:9000]}\n```",
    ])


def _auto_persist_evidence(
    *,
    db: AgentDB,
    artifacts: ArtifactStore,
    job_id: str,
    run_id: str,
    step_id: str,
    blocked_tool: str,
    evidence_step: dict[str, Any],
) -> dict[str, Any]:
    stored = artifacts.write_text(
        job_id=job_id,
        run_id=run_id,
        step_id=step_id,
        title=f"Auto Evidence Checkpoint after step {evidence_step.get('step_no')}",
        summary=f"Auto-saved evidence before allowing more research; blocked tool was {blocked_tool}.",
        content=_evidence_checkpoint_content(evidence_step),
        artifact_type="text",
        metadata={"auto_checkpoint": True, "evidence_step": evidence_step.get("id"), "blocked_tool": blocked_tool},
    )
    lesson = db.append_lesson(
        job_id,
        (
            f"Evidence from step #{evidence_step.get('step_no')} must be persisted before more research; "
            f"auto-saved checkpoint {stored.id} after blocked {blocked_tool}."
        ),
        category="mistake",
        confidence=0.8,
        metadata={"artifact_id": stored.id, "blocked_tool": blocked_tool},
    )
    db.append_agent_update(
        job_id,
        f"Auto-saved evidence checkpoint {stored.id} after the model tried {blocked_tool} before persisting evidence.",
        category="blocked",
        metadata={"artifact_id": stored.id, "blocked_tool": blocked_tool},
    )
    db.update_job_metadata(
        job_id,
        {
            "pending_evidence_checkpoint": {
                "artifact_id": stored.id,
                "title": stored.title or f"Auto Evidence Checkpoint after step {evidence_step.get('step_no')}",
                "path": str(stored.path),
                "created_at": datetime.now(timezone.utc).isoformat(),
                "checkpoint_step_id": step_id,
                "evidence_step": evidence_step.get("id"),
                "evidence_step_no": evidence_step.get("step_no"),
                "evidence_tool": evidence_step.get("tool_name") or evidence_step.get("kind"),
                "blocked_tool": blocked_tool,
            }
        },
    )
    return {"artifact_id": stored.id, "path": str(stored.path), "lesson": lesson}


def _auto_record_grounding_block_lesson(*, db: AgentDB, job_id: str, result: dict[str, Any]) -> None:
    if result.get("error") != "evidence grounding required":
        return
    grounding = result.get("evidence_grounding") if isinstance(result.get("evidence_grounding"), dict) else {}
    unsupported = grounding.get("unsupported_tokens") if isinstance(grounding.get("unsupported_tokens"), list) else []
    unsupported = [str(token) for token in unsupported if str(token).strip()]
    if not unsupported:
        return
    cited_steps = grounding.get("cited_steps") if isinstance(grounding.get("cited_steps"), list) else []
    blocked_tool = str(result.get("blocked_tool") or "")
    fingerprint = "|".join([blocked_tool, ",".join(unsupported[:8]), ",".join(str(step) for step in cited_steps[:8])])
    job = db.get_job(job_id)
    metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
    seen = metadata.get("grounding_block_fingerprints") if isinstance(metadata.get("grounding_block_fingerprints"), list) else []
    if fingerprint in seen:
        return
    db.append_lesson(
        job_id,
        (
            f"Evidence grounding rejected unsupported concrete tokens for {blocked_tool or 'a durable record'}: "
            f"{', '.join(unsupported[:8])}. Treat matching prior ledger, artifact, or memory claims as stale until "
            "they are re-verified from the cited evidence."
        ),
        category="mistake",
        confidence=0.9,
        metadata={"evidence_grounding": grounding, "blocked_tool": blocked_tool},
    )
    metadata_patch: dict[str, Any] = {"grounding_block_fingerprints": (seen + [fingerprint])[-100:]}
    stale_tokens = _stale_claim_tokens_from_unsupported(
        unsupported,
        reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
    )
    if stale_tokens:
        existing_tokens = [
            str(token)
            for token in metadata.get("unsupported_claim_tokens", [])
            if str(token).strip()
        ] if isinstance(metadata.get("unsupported_claim_tokens"), list) else []
        combined: list[str] = []
        combined_seen: set[str] = set()
        for token in existing_tokens + stale_tokens:
            key = token.lower()
            if key in combined_seen:
                continue
            combined_seen.add(key)
            combined.append(token)
        metadata_patch["unsupported_claim_tokens"] = combined[-80:]
    db.update_job_metadata(job_id, metadata_patch)


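The token merge at the end of `_auto_record_grounding_block_lesson` is an order-preserving, case-insensitive dedupe followed by a tail cap. The sketch below isolates that step. `merge_tokens` is a hypothetical helper name and the `cap` default simply mirrors the `[-80:]` slice above.

```python
def merge_tokens(existing: list[str], new: list[str], cap: int = 80) -> list[str]:
    """Append new tokens, drop case-insensitive repeats, keep the newest `cap` entries."""
    combined: list[str] = []
    seen: set[str] = set()
    for token in existing + new:
        key = token.lower()
        if key in seen:
            continue  # the first spelling encountered wins
        seen.add(key)
        combined.append(token)
    return combined[-cap:]


merged = merge_tokens(["Alpha", "beta"], ["ALPHA", "gamma"])
capped = merge_tokens(["a", "b", "c"], ["d"], cap=2)
```

Here `merged` keeps the original spelling `Alpha` and appends only `gamma`. `capped` shows that when the list overflows, the oldest entries are the ones dropped.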
def _mark_evidence_checkpoint_read(
    *,
    db: AgentDB,
    job_id: str,
    tool_name: str,
    args: dict[str, Any],
    step: dict[str, Any] | None,
) -> None:
    if tool_name != "read_artifact":
        return
    job = db.get_job(job_id)
    pending = _pending_evidence_checkpoint(job)
    if not pending or pending.get("read_at"):
        return
    if not _read_artifact_call_matches_checkpoint(
        args,
        artifact_id=str(pending.get("artifact_id") or ""),
        artifact_title=str(pending.get("title") or ""),
    ):
        return
    updated = dict(pending)
    updated["read_at"] = datetime.now(timezone.utc).isoformat()
    if step:
        updated["read_step_id"] = step.get("id")
        updated["read_step_no"] = step.get("step_no")
    db.update_job_metadata(job_id, {"pending_evidence_checkpoint": updated})
    db.append_agent_update(
        job_id,
        f"Read evidence checkpoint {pending.get('artifact_id')}; durable accounting is required next.",
        category="blocked",
        metadata={"pending_evidence_checkpoint": updated},
    )


def _resolve_evidence_checkpoint(
    *,
    db: AgentDB,
    job_id: str,
    tool_name: str,
    step: dict[str, Any] | None,
) -> None:
    if tool_name not in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS and tool_name != "guard_recovery":
        return
    job = db.get_job(job_id)
    pending = _pending_evidence_checkpoint(job)
    if not pending:
        return
    updated = dict(pending)
    updated["resolved_at"] = datetime.now(timezone.utc).isoformat()
    updated["resolved_by_tool"] = tool_name
    if step:
        updated["resolved_by_step_id"] = step.get("id")
        updated["resolved_by_step_no"] = step.get("step_no")
    db.update_job_metadata(job_id, {"pending_evidence_checkpoint": updated})
    db.append_agent_update(
        job_id,
        f"Evidence checkpoint {pending.get('artifact_id')} accounted for with {tool_name}.",
        category="progress",
        metadata={"pending_evidence_checkpoint": updated},
    )


def _auto_record_blocked_source(
    *,
    db: AgentDB,
    job_id: str,
    context: dict[str, Any],
    blocked_tool: str,
) -> dict[str, Any]:
    source = str(context.get("url") or context.get("title") or "unknown blocked browser source")
    reason = str(context.get("reason") or "anti-bot challenge")
    record = db.append_source_record(
        job_id,
        source,
        source_type="blocked_browser_source",
        usefulness_score=0.02,
        fail_count_delta=1,
        warnings=[reason],
        outcome=f"blocked by {reason}; pivot to an alternate source for the current objective",
        metadata={"blocked_tool": blocked_tool, "source_step": context.get("step_id")},
    )
    lesson = None
    if int(record.get("fail_count") or 0) <= 2:
        lesson = db.append_lesson(
            job_id,
            "Blocked, CAPTCHA, login, paywall, or anti-bot pages are not usable evidence for any long-running task; record the source outcome and pivot instead of repeating browser actions.",
            category="source_quality",
            confidence=0.9,
            metadata={"source": source, "blocked_tool": blocked_tool},
        )
    db.append_agent_update(
        job_id,
        f"Blocked source guard: current source is {reason}; pivoting away instead of looping.",
        category="blocked",
        metadata={"source": source, "blocked_tool": blocked_tool, "reason": reason},
    )
    return {"source": record, "lesson": lesson}


def _auto_record_tool_source_quality(
    *,
    db: AgentDB,
    job_id: str,
    tool_name: str | None,
    result: dict[str, Any],
) -> None:
    if tool_name == "web_search":
        query = str(result.get("query") or "").strip()
        results = result.get("results") if isinstance(result.get("results"), list) else []
        for item in results[:8]:
            if not isinstance(item, dict):
                continue
            url = str(item.get("url") or "").strip()
            if not url:
                continue
            title = str(item.get("title") or "").strip()
            db.append_source_record(
                job_id,
                url,
                source_type="web_search",
                usefulness_score=0.35,
                yield_count=0,
                outcome=f"search result for {query or 'query'}: {title[:160]}",
                metadata={"auto_from_tool": "web_search", "query": query, "title": title},
            )
        return
    if tool_name == "web_extract":
        pages = result.get("pages") if isinstance(result.get("pages"), list) else []
        for page in pages[:12]:
            if not isinstance(page, dict):
                continue
            url = str(page.get("url") or "").strip()
            if not url:
                continue
            text = str(page.get("text") or "")
            error = str(page.get("error") or "")
            if error:
                db.append_source_record(
                    job_id,
                    url,
                    source_type="web_extract",
                    usefulness_score=0.1,
                    fail_count_delta=1,
                    warnings=[error[:180]],
                    outcome=f"extract failed: {error[:180]}",
                    metadata={"auto_from_tool": "web_extract"},
                )
                continue
            score = 0.35
            if len(text.strip()) >= 500:
                score = 0.55
            if len(text.strip()) >= 3000:
                score = 0.7
            db.append_source_record(
                job_id,
                url,
                source_type="web_extract",
                usefulness_score=score,
                yield_count=0,
                outcome=f"extracted {len(text.strip())} chars for possible use",
                metadata={"auto_from_tool": "web_extract"},
            )
        return
    if tool_name in {"browser_navigate", "browser_snapshot"}:
        context = _browser_warning_context(result)
        if not context:
            return
        result["source_warning"] = context["reason"]
        result["source_url"] = context.get("url") or ""
        _auto_record_blocked_source(db=db, job_id=job_id, context=context, blocked_tool=tool_name or "browser")


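The `web_extract` branch above assigns a provisional usefulness score from the amount of extracted text alone. A minimal sketch of that tiering follows. `extract_score` is a hypothetical name, and the numeric tiers are copied from the branch above rather than from any documented scoring policy.

```python
def extract_score(text: str, error: str = "") -> float:
    """Provisional usefulness score for one extracted page."""
    if error:
        return 0.1  # failed extraction
    length = len(text.strip())
    if length >= 3000:
        return 0.7
    if length >= 500:
        return 0.55
    return 0.35  # short or empty page, same as a bare search hit


failed = extract_score("", error="timeout")
short = extract_score("x" * 499)
medium = extract_score("x" * 500)
long_page = extract_score("  " + "x" * 3000 + "  ")
```

Length is measured after stripping surrounding whitespace, so padding does not push a page into a higher tier. None of these tiers reach the `0.8` threshold used by the yielded-source filter earlier in the module, so an auto-recorded page counts as yielded there only once its `yield_count` is positive or its score is raised.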
def _auto_record_failed_shell_sources(
    *,
    db: AgentDB,
    job_id: str,
    args: dict[str, Any],
    result: dict[str, Any],
) -> None:
    error_text = " ".join(str(result.get(key) or "") for key in ("error", "stderr", "stdout"))
    lowered = error_text.lower()
    if not any(
        marker in lowered
        for marker in (
            "authentication",
            "authorization",
            "unauthorized",
            "forbidden",
            "http failure",
            "http 401",
            "http 403",
            "401 unauthorized",
            "403 forbidden",
        )
    ):
        return
    recorded: set[str] = set()
    for url in _shell_guard_urls(str(args.get("command") or ""))[:3]:
        candidates = [url]
        family_url = _source_failure_family_url(url)
        if family_url and not _same_source_url(family_url, url):
            candidates.append(family_url)
        for candidate in candidates:
            if candidate.lower() in recorded:
                continue
            recorded.add(candidate.lower())
            is_family = candidate != url
            warning = (
                "shell command reported authentication/authorization or HTTP failure for this source family"
                if is_family
                else "shell command reported authentication/authorization or HTTP failure"
            )
            outcome = (
                f"Source family blocked after failed child URL {url}: {_clip_text(str(result.get('error') or error_text), 420)}"
                if is_family
                else _clip_text(str(result.get("error") or error_text), 500)
            )
            metadata = {"auto_from_tool": "shell_exec", "failure_kind": "auth_or_http"}
            if is_family:
                metadata.update({"source_family": True, "failed_child_url": url})
            db.append_source_record(
                job_id,
                candidate,
                source_type="shell_exec_family" if is_family else "shell_exec",
                usefulness_score=0.01,
                fail_count_delta=1,
                warnings=[warning],
                outcome=outcome,
                metadata=metadata,
            )


def _auto_reconcile_artifact_tasks(
    *,
    db: AgentDB,
    job_id: str,
    args: dict[str, Any],
    result: dict[str, Any],
) -> list[dict[str, Any]]:
    artifact_id = str(result.get("artifact_id") or "")
    if not artifact_id:
        return []
    artifact_title = str(args.get("title") or "")
    artifact_summary = str(args.get("summary") or "")
    artifact_content = str(args.get("content") or "")
    artifact_text = " ".join([artifact_title, artifact_summary, artifact_content[:4000]])
    artifact_tokens = _text_tokens(artifact_text)
    if len(artifact_tokens) < 2:
        return []
    job = db.get_job(job_id)
    reconciled = []
    for task in _metadata_list(job, "task_queue"):
        status = str(task.get("status") or "open").strip().lower()
        if status not in {"open", "active"}:
            continue
        contract = str(task.get("output_contract") or "").strip().lower()
        if contract in {"experiment", "action", "monitor"}:
            continue
        task_text = " ".join(
            str(task.get(key) or "")
            for key in ("title", "goal", "acceptance_criteria", "evidence_needed", "source_hint")
        )
        if not _artifact_can_reconcile_task(
            contract=contract,
            task_text=task_text,
            artifact_title=artifact_title,
            artifact_summary=artifact_summary,
        ):
            continue
        task_tokens = _text_tokens(task_text)
        if len(task_tokens) < 2:
            continue
        overlap = task_tokens & artifact_tokens
        needed = max(2, min(4, (len(task_tokens) + 1) // 2))
        if len(overlap) < needed:
            continue
        updated = db.append_task_record(
            job_id,
            title=str(task.get("title") or ""),
            status="done",
            priority=_as_int(task.get("priority")),
            goal=str(task.get("goal") or ""),
            source_hint=str(task.get("source_hint") or ""),
            result=f"Saved output {artifact_id}: {_clip_text(artifact_title or artifact_summary, 180)}",
            parent=str(task.get("parent") or ""),
            output_contract=contract,
            acceptance_criteria=str(task.get("acceptance_criteria") or ""),
            evidence_needed=str(task.get("evidence_needed") or ""),
            stall_behavior=str(task.get("stall_behavior") or ""),
            metadata={
                **(task.get("metadata") if isinstance(task.get("metadata"), dict) else {}),
                "auto_reconciled_from_artifact": artifact_id,
                "matched_tokens": sorted(overlap)[:12],
            },
        )
        reconciled.append(updated)
    if reconciled:
        titles = ", ".join(str(task.get("title") or "") for task in reconciled[:4])
        db.append_agent_update(
            job_id,
            f"Task progress reconciled from saved output {artifact_id}: {titles}.",
            category="plan",
            metadata={"artifact_id": artifact_id, "task_count": len(reconciled)},
        )
    return reconciled


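The overlap rule in `_auto_reconcile_artifact_tasks` requires roughly half of a task's tokens to appear in the artifact, clamped to between 2 and 4 matches. The sketch below isolates that arithmetic. It works on ready-made token sets because `_text_tokens` is not shown here, it leaves out the status, contract, and `_artifact_can_reconcile_task` checks, and `needed_overlap` and `tokens_reconcile` are hypothetical names.

```python
def needed_overlap(task_token_count: int) -> int:
    """Half the task tokens (rounded up), clamped to the range 2..4."""
    return max(2, min(4, (task_token_count + 1) // 2))


def tokens_reconcile(task_tokens: set[str], artifact_tokens: set[str]) -> bool:
    # Too few tokens on either side is treated as no match.
    if len(task_tokens) < 2 or len(artifact_tokens) < 2:
        return False
    return len(task_tokens & artifact_tokens) >= needed_overlap(len(task_tokens))


thresholds = [needed_overlap(n) for n in (2, 3, 5, 7, 20)]  # [2, 2, 3, 4, 4]

task = {"pricing", "comparison", "report", "vendors", "latency"}
strong = tokens_reconcile(task, {"pricing", "comparison", "vendors", "draft"})
weak = tokens_reconcile(task, {"pricing", "draft", "notes"})
```

The upper clamp means a long task description never needs more than four shared tokens, which keeps verbose tasks reconcilable. The lower clamp keeps a single shared word from closing a task.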
def _auto_open_revision_task_for_deliverable(
    *,
    db: AgentDB,
    job_id: str,
    args: dict[str, Any],
    result: dict[str, Any],
) -> dict[str, Any] | None:
    artifact_id = str(result.get("artifact_id") or "")
    if not artifact_id:
        return None
    artifact_title = str(args.get("title") or "")
    artifact_summary = str(args.get("summary") or "")
    if not _artifact_can_reconcile_task(
        contract="report",
        task_text="review revise draft report deliverable",
        artifact_title=artifact_title,
        artifact_summary=artifact_summary,
    ):
        return None
    job = db.get_job(job_id)
    for task in _metadata_list(job, "task_queue"):
        if str(task.get("status") or "open").strip().lower() not in {"open", "active"}:
            continue
        metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
        if metadata.get("revision_source_artifact_id") == artifact_id:
            return None
        if metadata.get("source") == "auto_revision_loop":
            db.append_task_record(
                job_id,
                title=str(task.get("title") or ""),
                status="skipped",
                priority=_as_int(task.get("priority")),
                goal=str(task.get("goal") or ""),
                source_hint=str(task.get("source_hint") or ""),
                result=f"Superseded by newer saved output {artifact_id}.",
                parent=str(task.get("parent") or ""),
                output_contract=str(task.get("output_contract") or ""),
                acceptance_criteria=str(task.get("acceptance_criteria") or ""),
                evidence_needed=str(task.get("evidence_needed") or ""),
                stall_behavior=str(task.get("stall_behavior") or ""),
                metadata={**metadata, "superseded_by_artifact_id": artifact_id},
            )
    task = db.append_task_record(
        job_id,
        title=f"Review and revise saved output {artifact_id}",
        status="open",
        priority=4,
        goal="Use the latest saved deliverable as a baseline, check it against evidence and acceptance criteria, then improve it.",
        source_hint=artifact_id,
        output_contract="report",
        acceptance_criteria="The saved output is reviewed and either revised, validated, or given concrete follow-up gaps.",
        evidence_needed="Saved output, relevant evidence artifacts or files, and explicit gap/revision notes.",
        stall_behavior="If no useful revision is possible, record why and open the next evidence, validation, or monitoring branch.",
6729 metadata={
6730 "source": "auto_revision_loop",
6731 "revision_source_artifact_id": artifact_id,
6732 "source_title": artifact_title,
6733 },
6734 )
6735 db.append_agent_update(
6736 job_id,
6737 f"Opened revision branch for saved output {artifact_id}: {_clip_text(artifact_title or artifact_summary, 160)}.",
6738 category="plan",
6739 metadata={"artifact_id": artifact_id, "task_key": task.get("key"), "source": "auto_revision_loop"},
6740 )
6741 return task
6742
6743
6744def _artifact_can_reconcile_task(
6745 *,
6746 contract: str,
6747 task_text: str,
6748 artifact_title: str,
6749 artifact_summary: str,
6750) -> bool:
6751 contract = contract.strip().lower()
6752 if contract in {"experiment", "action", "monitor"}:
6753 return False
6754 if contract == "research":
6755 return True
6756 artifact_text = f"{artifact_title} {artifact_summary}".lower()
6757 task_lower = task_text.lower()
6758 evidence_like = any(term in artifact_text for term in EVIDENCE_ARTIFACT_TERMS)
6759 deliverable_like = any(term in artifact_text for term in DELIVERABLE_ARTIFACT_TERMS)
6760 task_needs_deliverable_action = any(term in task_lower for term in TASK_DELIVERABLE_ACTION_TERMS)
6761 if evidence_like:
6762 return False
6763 if task_needs_deliverable_action and not deliverable_like:
6764 return False
6765 return True
6766
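The gate in `_artifact_can_reconcile_task` can be read as a short decision table: some contracts never reconcile from a saved output, `research` always does, and the remaining contracts depend on what the artifact text looks like. A minimal sketch of that order of checks follows. The three term tuples are hypothetical placeholders for illustration only; the module's real `EVIDENCE_ARTIFACT_TERMS`, `DELIVERABLE_ARTIFACT_TERMS`, and `TASK_DELIVERABLE_ACTION_TERMS` are not shown in this section.

```python
# Hypothetical term lists, for illustration only.
EVIDENCE_TERMS = ("evidence", "snapshot")
DELIVERABLE_TERMS = ("report", "draft")
DELIVERABLE_ACTION_TERMS = ("write", "revise")


def can_reconcile(contract: str, task_text: str, artifact_text: str) -> bool:
    contract = contract.strip().lower()
    if contract in {"experiment", "action", "monitor"}:
        return False  # never closed by a saved output alone
    if contract == "research":
        return True
    artifact_text = artifact_text.lower()
    if any(term in artifact_text for term in EVIDENCE_TERMS):
        return False  # evidence-like artifacts do not close a task
    needs_deliverable = any(term in task_text.lower() for term in DELIVERABLE_ACTION_TERMS)
    is_deliverable = any(term in artifact_text for term in DELIVERABLE_TERMS)
    # A task that asks for a deliverable only reconciles against a deliverable-like artifact.
    return not (needs_deliverable and not is_deliverable)
```

Note that the checks use substring matching, as in the function above, so a short term can match inside a longer word.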
6767
6768def _auto_checkpoint_update(
6769 *,
6770 db: AgentDB,
6771 job_id: str,
6772 step_no: int,
6773 tool_name: str | None,
6774 args: dict[str, Any],
6775 result: dict[str, Any],
6776) -> None:
6777 title_text = " ".join(str(args.get(key) or "") for key in ("title", "summary", "type")).lower()
6778 is_finding_batch = tool_name == "write_artifact" and "finding" in title_text
6779 if not is_finding_batch and step_no % 10 != 0:
6780 return
6781 job = db.get_job(job_id)
6782 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
6783 previous = metadata.get("last_checkpoint_counts") if isinstance(metadata.get("last_checkpoint_counts"), dict) else {}
6784 checkpoint = build_progress_checkpoint(
6785 metadata,
6786 previous_counts=previous,
6787 step_no=step_no,
6788 tool_name=tool_name,
6789 artifact_id=str(result.get("artifact_id") or ""),
6790 is_finding_output=is_finding_batch,
6791 )
6792 db.append_agent_update(
6793 job_id,
6794 checkpoint.message,
6795 category=checkpoint.category,
6796 metadata={
6797 "step_no": step_no,
6798 "tool": tool_name,
6799 "deltas": checkpoint.deltas,
6800 "updates": checkpoint.updates,
6801 "resolutions": checkpoint.resolutions,
6802 },
6803 )
6804 streak = _as_int(metadata.get("activity_checkpoint_streak"))
6805 streak = streak + 1 if checkpoint.category == "activity" else 0
6806 task_durable_change = checkpoint.deltas.get("tasks", 0) + checkpoint.updates.get("tasks", 0)
6807 non_task_durable_change = any(
6808 checkpoint.deltas.get(key, 0) > 0
6809 or checkpoint.updates.get(key, 0) > 0
6810 or checkpoint.resolutions.get(key, 0) > 0
6811 for key in ("findings", "sources", "experiments", "lessons", "milestones")
6812 )
6813 task_resolution = checkpoint.resolutions.get("tasks", 0) > 0
6814 task_only_progress = task_durable_change > 0 and not non_task_durable_change and not task_resolution
6815 task_planning_streak = _as_int(metadata.get("task_planning_checkpoint_streak"))
6816 task_planning_streak = task_planning_streak + 1 if task_only_progress else 0
6817 db.update_job_metadata(
6818 job_id,
6819 {
6820 "last_checkpoint_counts": checkpoint.counts,
6821 "last_checkpoint_at": datetime.now(timezone.utc).isoformat(),
6822 "activity_checkpoint_streak": streak,
6823 "task_planning_checkpoint_streak": task_planning_streak,
6824 },
6825 )
6826
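The two streak counters updated at the end of `_auto_checkpoint_update` can be expressed as a pure function, which makes the reset conditions easier to see. This is a sketch under stated assumptions: `next_streaks` is a hypothetical helper, and the category value `"progress"` used in the usage note is illustrative, since only `"activity"` appears in the code above.

```python
DURABLE_KEYS = ("findings", "sources", "experiments", "lessons", "milestones")


def next_streaks(category, deltas, updates, resolutions, activity_streak, planning_streak):
    """Return (activity_streak, task_planning_streak) after one checkpoint."""
    activity_streak = activity_streak + 1 if category == "activity" else 0

    task_change = deltas.get("tasks", 0) + updates.get("tasks", 0)
    non_task_change = any(
        deltas.get(key, 0) > 0 or updates.get(key, 0) > 0 or resolutions.get(key, 0) > 0
        for key in DURABLE_KEYS
    )
    task_resolved = resolutions.get("tasks", 0) > 0
    # "Task-only" progress: the task ledger moved, but nothing else did and no task was resolved.
    task_only = task_change > 0 and not non_task_change and not task_resolved
    planning_streak = planning_streak + 1 if task_only else 0
    return activity_streak, planning_streak
```

For example, `next_streaks("activity", {"tasks": 2}, {}, {}, 3, 1)` extends both streaks, while `next_streaks("progress", {"tasks": 1, "findings": 1}, {}, {}, 3, 1)` resets both.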
6827
6828def _execute_tool_call(
6829 call: Any,
6830 *,
6831 job: dict[str, Any],
6832 recent_steps: list[dict[str, Any]],
6833 config: AppConfig,
6834 db: AgentDB,
6835 artifacts: ArtifactStore,
6836 registry: ToolRegistry,
6837 job_id: str,
6838 run_id: str,
6839) -> tuple[StepExecution, bool, str, str | None]:
6840 args = _normalize_milestone_validation_args_for_active_gate(call.name, call.arguments, job)
6841 input_data = {"tool_call_id": call.id, "arguments": args}
6842 if args != call.arguments:
6843 input_data["original_arguments"] = call.arguments
6844 step_id = db.add_step(
6845 job_id=job_id,
6846 run_id=run_id,
6847 kind="tool",
6848 tool_name=call.name,
6849 input_data=input_data,
6850 )
6851 validate_arguments = getattr(registry, "validate_arguments", None)
6852 argument_block = validate_arguments(call.name, args, config) if callable(validate_arguments) else None
6853 if argument_block:
6854 concrete_fields = [*(argument_block.get("missing_arguments") or []), *(argument_block.get("placeholder_arguments") or [])]
6855 reason = "missing required arguments" if argument_block.get("missing_arguments") else str(argument_block.get("error") or "invalid tool arguments")
6856 summary = f"blocked {call.name}; {reason}: {', '.join(concrete_fields)}"
6857 db.finish_step(
6858 step_id,
6859 status="blocked",
6860 summary=summary,
6861 output_data=argument_block,
6862 error=None,
6863 )
6864 db.append_agent_update(
6865 job_id,
6866 summary,
6867 category="blocked",
6868 metadata={
6869 "reason": "tool_arguments_missing",
6870 "tool": call.name,
6871 "missing_arguments": argument_block.get("missing_arguments") or [],
6872 "placeholder_arguments": argument_block.get("placeholder_arguments") or [],
6873 },
6874 )
6875 return (
6876 StepExecution(
6877 job_id=job_id,
6878 run_id=run_id,
6879 step_id=step_id,
6880 tool_name=call.name,
6881 status="blocked",
6882 result=argument_block,
6883 ),
6884 True,
6885 summary,
6886 None,
6887 )
6888 blocked = _blocked_tool_call_result(call.name, args, recent_steps, job)
6889 if blocked:
6890 result, summary = blocked
6891 result = {**result, "success": True, "recoverable": True}
6892 evidence_checkpoint = None
6893 if result.get("error") == "artifact required before more research":
6894 evidence_step = next(
6895 (step for step in recent_steps if step.get("id") == result.get("previous_step")),
6896 None,
6897 )
6898 if evidence_step:
6899 evidence_checkpoint = _auto_persist_evidence(
6900 db=db,
6901 artifacts=artifacts,
6902 job_id=job_id,
6903 run_id=run_id,
6904 step_id=step_id,
6905 blocked_tool=call.name,
6906 evidence_step=evidence_step,
6907 )
6908 result["auto_checkpoint"] = evidence_checkpoint
6909 summary = f"blocked {call.name}; auto-saved evidence checkpoint {evidence_checkpoint['artifact_id']}"
6910 anti_bot_source = result.get("anti_bot_source") if isinstance(result.get("anti_bot_source"), dict) else None
6911 if anti_bot_source:
6912 result["auto_source_record"] = _auto_record_blocked_source(
6913 db=db,
6914 job_id=job_id,
6915 context=anti_bot_source,
6916 blocked_tool=call.name,
6917 )
6918 known_bad_source = result.get("known_bad_source") if isinstance(result.get("known_bad_source"), dict) else None
6919 if known_bad_source:
6920 db.append_agent_update(
6921 job_id,
6922 f"Source ledger blocked retry of {known_bad_source.get('source')}; choosing a different route next.",
6923 category="blocked",
6924 metadata={"source": known_bad_source, "blocked_tool": call.name},
6925 )
6926 if result.get("error") == "task queue saturated":
6927 step = _step_by_id(db, job_id, step_id)
6928 task_queue = result.get("task_queue") if isinstance(result.get("task_queue"), dict) else {}
6929 _record_task_backlog_pressure(
6930 db=db,
6931 job_id=job_id,
6932 step_no=(step or {}).get("step_no"),
6933 task_queue=task_queue,
6934 source="blocked_record_tasks",
6935 )
6936 _auto_record_grounding_block_lesson(db=db, job_id=job_id, result=result)
6937 db.finish_step(
6938 step_id,
6939 status="blocked",
6940 summary=summary,
6941 output_data=result,
6942 error=None,
6943 )
6944 return (
6945 StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status="blocked", result=result),
6946 True,
6947 summary,
6948 None,
6949 )
6950
6951 ctx = ToolContext(
6952 config=config,
6953 db=db,
6954 artifacts=artifacts,
6955 job_id=job_id,
6956 run_id=run_id,
6957 step_id=step_id,
6958 task_id=job_id,
6959 )
6960 try:
6961 raw_result = registry.handle(call.name, args, ctx)
6962 result = _parse_tool_result(raw_result)
6963 ok = bool(result.get("success", True)) and not result.get("error")
6964 status = "completed" if ok else "blocked" if result.get("recoverable") is True else "failed"
6965 if ok:
6966 _auto_record_tool_source_quality(db=db, job_id=job_id, tool_name=call.name, result=result)
6967 elif call.name == "shell_exec":
6968 _auto_record_failed_shell_sources(db=db, job_id=job_id, args=args, result=result)
6969 summary = _summarize_tool_result(call.name, args, result, ok=ok)
6970 db.finish_step(step_id, status=status, summary=summary, output_data=result, error=result.get("error"))
6971 if call.name == "shell_exec":
6972 _maybe_resolve_file_validation_obligation(
6973 db=db,
6974 job_id=job_id,
6975 tool_name=call.name,
6976 args=args,
6977 result=result,
6978 ok=ok,
6979 )
6980 finished_step = _step_by_id(db, job_id, step_id)
6981 _maybe_create_measurement_obligation(
6982 db=db,
6983 job_id=job_id,
6984 step=finished_step,
6985 tool_name=call.name,
6986 args=args,
6987 result=result,
6988 )
6989 if ok:
6990 finished_step = _step_by_id(db, job_id, step_id)
6991 _mark_evidence_checkpoint_read(
6992 db=db,
6993 job_id=job_id,
6994 tool_name=call.name,
6995 args=args,
6996 step=finished_step,
6997 )
6998 _resolve_evidence_checkpoint(
6999 db=db,
7000 job_id=job_id,
7001 tool_name=call.name,
7002 step=finished_step,
7003 )
7004 if call.name == "write_file":
7005 _maybe_create_file_validation_obligation(
7006 db=db,
7007 job_id=job_id,
7008 step=finished_step,
7009 args=args,
7010 result=result,
7011 )
7012 elif call.name in {"record_lesson", "record_tasks", "record_experiment", "record_milestone_validation"}:
7013 _maybe_resolve_file_validation_obligation(
7014 db=db,
7015 job_id=job_id,
7016 tool_name=call.name,
7017 args=args,
7018 result=result,
7019 ok=ok,
7020 )
7021 _auto_checkpoint_update(
7022 db=db,
7023 job_id=job_id,
7024 step_no=(finished_step or db.list_steps(job_id=job_id)[-1])["step_no"],
7025 tool_name=call.name,
7026 args=args,
7027 result=result,
7028 )
7029 if call.name == "write_artifact":
7030 reconciled_tasks = _auto_reconcile_artifact_tasks(
7031 db=db,
7032 job_id=job_id,
7033 args=args,
7034 result=result,
7035 )
7036 if reconciled_tasks:
7037 result["auto_reconciled_tasks"] = [
7038 {"title": task.get("title"), "status": task.get("status")}
7039 for task in reconciled_tasks[:8]
7040 ]
7041 revision_task = _auto_open_revision_task_for_deliverable(
7042 db=db,
7043 job_id=job_id,
7044 args=args,
7045 result=result,
7046 )
7047 if revision_task:
7048 result["auto_revision_task"] = {
7049 "title": revision_task.get("title"),
7050 "status": revision_task.get("status"),
7051 "key": revision_task.get("key"),
7052 }
7053 return (
7054 StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status=status, result=result),
7055 status != "completed",
7056 summary,
7057 result.get("error") if status == "failed" else None,
7058 )
7059 except Exception as exc:
7060 result = _error_result(exc)
7061 db.finish_step(step_id, status="failed", summary=f"{call.name} raised", output_data=result, error=str(exc))
7062 return (
7063 StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=call.name, status="failed", result=result),
7064 True,
7065 str(exc),
7066 str(exc),
7067 )
7068
7069
7070def _is_continuable_recoverable_input_block(execution: StepExecution) -> bool:
7071 result = execution.result if isinstance(execution.result, dict) else {}
7072 error = str(result.get("error") or "")
7073 if execution.status != "blocked" or result.get("recoverable") is not True:
7074 return False
7075 if error in {"missing required tool arguments", "placeholder tool arguments"}:
7076 return bool(result.get("missing_arguments") or result.get("placeholder_arguments"))
7077 if error == "malformed shell command" and execution.tool_name == "shell_exec":
7078 return True
7079 if error == "duplicate tool call blocked" and execution.tool_name == "read_artifact":
7080 return True
7081 return error.startswith("artifact not found:") or error == "no active operator context to acknowledge"
7082
7083
7084def _ordered_tool_calls_for_execution(
7085 tool_calls: list[ToolCall],
7086 *,
7087 job: dict[str, Any],
7088 recent_steps: list[dict[str, Any]],
7089) -> list[ToolCall]:
7090 """Run guard-unblocking calls before branch work when a model batches both."""
7091
7092 if len(tool_calls) < 2:
7093 return tool_calls
7094 if _browser_runtime_unavailable_context(recent_steps) and any(not _is_browser_tool(call.name) for call in tool_calls):
7095 tool_calls = [call for call in tool_calls if not _is_browser_tool(call.name)]
7096 if len(tool_calls) < 2:
7097 return tool_calls
7098 checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
7099 saturated_record_tasks = any(
7100 call.name == "record_tasks" and _task_queue_saturation_context(job, call.arguments)
7101 for call in tool_calls
7102 )
7103 if not checkpoint and not saturated_record_tasks:
7104 return tool_calls
7105
7106 artifact_id = str(checkpoint.get("artifact_id") or "") if checkpoint else ""
7107 artifact_title = str(checkpoint.get("title") or "") if checkpoint else ""
7108 checkpoint_read = bool(checkpoint and checkpoint.get("checkpoint_read"))
7109 accounting_tools = {
7110 "record_experiment",
7111 "record_findings",
7112 "record_lesson",
7113 "record_memory_graph",
7114 "record_milestone_validation",
7115 "record_roadmap",
7116 "record_source",
7117 "report_update",
7118 "write_artifact",
7119 }
7120
7121 def priority(call: ToolCall) -> int:
7122 if checkpoint:
7123 if call.name in EVIDENCE_CHECKPOINT_RESOLUTION_TOOLS:
7124 return 0
7125 if (
7126 not checkpoint_read
7127 and call.name == "read_artifact"
7128 and _read_artifact_call_matches_checkpoint(
7129 call.arguments,
7130 artifact_id=artifact_id,
7131 artifact_title=artifact_title,
7132 )
7133 ):
7134 return 0
7135 if saturated_record_tasks:
7136 if call.name == "record_tasks" and _task_queue_saturation_context(job, call.arguments):
7137 return 2
7138 if call.name in accounting_tools:
7139 return 0
7140 return 1
7141
7142 ordered = sorted(enumerate(tool_calls), key=lambda item: (priority(item[1]), item[0]))
7143 return [call for _, call in ordered]
7144
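The reordering in `_ordered_tool_calls_for_execution` relies on sorting by the pair `(priority, original index)`, which moves urgent calls forward while preserving the model's order inside each priority tier. A minimal sketch of that pattern, where `Call` and `order_calls` are hypothetical stand-ins and the priority function is reduced to two tiers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for ToolCall; only the name matters here."""
    name: str


def order_calls(calls, urgent):
    def priority(call):
        return 0 if call.name in urgent else 1

    # The index tie-breaker keeps the original order within each priority tier.
    ordered = sorted(enumerate(calls), key=lambda item: (priority(item[1]), item[0]))
    return [call for _, call in ordered]


batch = [Call("shell_exec"), Call("record_lesson"), Call("read_artifact"), Call("write_artifact")]
print([call.name for call in order_calls(batch, {"record_lesson", "write_artifact"})])
```

Python's `sorted` is already stable, so the explicit index is not strictly required; including it makes the tie-breaking visible in the key itself.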
7145
7146def _registry_tools(registry: ToolRegistry, config: AppConfig) -> list[dict[str, Any]]:
7147 try:
7148 return registry.openai_tools(config=config)
7149 except TypeError:
7150 return registry.openai_tools()
7151
7152
7153def _registry_tools_for_step(
7154 registry: ToolRegistry,
7155 config: AppConfig,
7156 recent_steps: list[dict[str, Any]],
7157 *,
7158 job: dict[str, Any] | None = None,
7159) -> list[dict[str, Any]]:
7160 tools = _registry_tools(registry, config)
7161 resolution_tools = _active_obligation_tool_names(job, recent_steps) if job else None
7162 if resolution_tools:
7163 tools = [tool for tool in tools if _openai_tool_name(tool) in resolution_tools]
7164 suppressed_tools = _suppressed_tool_names(job, recent_steps)
7165 if resolution_tools:
7166 suppressed_tools -= resolution_tools
7167 if suppressed_tools:
7168 tools = [tool for tool in tools if _openai_tool_name(tool) not in suppressed_tools]
7169 if not _browser_runtime_unavailable_context(recent_steps):
7170 return tools
7171 return [tool for tool in tools if not _is_browser_tool(_openai_tool_name(tool))]
7172
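The allow-then-suppress filtering in `_registry_tools_for_step` can be sketched on plain dictionaries. The example below assumes OpenAI-style tool entries of the form `{"type": "function", "function": {"name": ...}}` and leaves out the browser-availability filter; `tool_name` and `filter_tools` are hypothetical names.

```python
from typing import Any, Optional


def tool_name(tool: dict) -> str:
    """Read the name from a nested "function" entry, falling back to a top-level "name"."""
    function = tool.get("function")
    if isinstance(function, dict):
        return str(function.get("name") or "")
    return str(tool.get("name") or "")


def filter_tools(tools: list, allowed: Optional[set], suppressed: set) -> list:
    if allowed:
        # An active obligation narrows the surface to its resolution tools,
        # and those tools are exempt from suppression.
        tools = [tool for tool in tools if tool_name(tool) in allowed]
        suppressed = suppressed - allowed
    return [tool for tool in tools if tool_name(tool) not in suppressed]
```

The set difference is the important step: without it, a tool that is both required by an obligation and suppressed by backlog pressure would disappear from the tool list.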
7173
7174def _active_obligation_tool_names(job: dict[str, Any] | None, recent_steps: list[dict[str, Any]]) -> set[str] | None:
7175 if not job:
7176 return None
7177 allowed: set[str] = set()
7178 checkpoint = _auto_checkpoint_accounting_context(job, recent_steps)
7179 if checkpoint:
7180 if not checkpoint.get("checkpoint_read"):
7181 allowed.add("read_artifact")
7182 allowed.update(EVIDENCE_CHECKPOINT_PROMPT_TOOLS)
7183 if _pending_measurement_obligation(job):
7184 allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7185 if _experiment_next_action_failure_context(job, recent_steps):
7186 allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7187 measured_progress = _measured_progress_guard_context(job, recent_steps)
7188 if measured_progress:
7189 allowed.update(MEASUREMENT_RESOLUTION_TOOLS)
7190 if _as_int(measured_progress.get("shell_actions_since_last_experiment")) < MEASURABLE_ACTION_BUDGET_STEPS:
7191 allowed.add("shell_exec")
7192 if _pending_file_validation_obligation(job):
7193 allowed.update(FILE_VALIDATION_RESOLUTION_TOOLS)
7194 return allowed or None
7195
7196
7197def _suppressed_tool_names(job: dict[str, Any] | None, recent_steps: list[dict[str, Any]]) -> set[str]:
7198 if not job:
7199 return set()
7200 suppressed: set[str] = set()
7201 if _repeated_task_queue_saturation_context(recent_steps):
7202 suppressed.add("record_tasks")
7203 elif (
7204 (backlog := _task_backlog_pressure_context(job))
7205 and _as_int(backlog.get("total")) > TASK_QUEUE_TOTAL_SOFT_LIMIT
7206 and not _pending_measurement_obligation(job)
7207 and not _pending_file_validation_obligation(job)
7208 and not _auto_checkpoint_accounting_context(job, recent_steps)
7209 and not _task_queue_exhausted(job)
7210 ):
7211 suppressed.add("record_tasks")
7212 if not _has_acknowledgeable_operator_context(job):
7213 suppressed.add("acknowledge_operator_context")
7214 return suppressed
7215
7216
7217def _has_acknowledgeable_operator_context(job: dict[str, Any]) -> bool:
7218 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
7219 messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
7220 for entry in messages:
7221 if not isinstance(entry, dict):
7222 continue
7223 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
7224 if mode not in {"steer", "follow_up"}:
7225 continue
7226 if not entry.get("claimed_at"):
7227 continue
7228 if entry.get("acknowledged_at") or entry.get("superseded_at"):
7229 continue
7230 return True
7231 return False
7232
7233
7234def _openai_tool_name(tool: dict[str, Any]) -> str:
7235 function = tool.get("function") if isinstance(tool, dict) else None
7236 if isinstance(function, dict):
7237 return str(function.get("name") or "")
7238 return str(tool.get("name") or "") if isinstance(tool, dict) else ""
7239
7240
7241def _call_next_action_with_timeout(
7242 llm: StepLLM,
7243 *,
7244 messages: list[dict[str, Any]],
7245 tools: list[dict[str, Any]],
7246 timeout_seconds: float,
7247) -> LLMResponse:
7248 timeout = max(0.0, float(timeout_seconds or 0.0))
7249 if timeout <= 0 or threading.current_thread() is not threading.main_thread():
7250 return llm.next_action(messages=messages, tools=tools)
7251
7252 previous_handler = signal.getsignal(signal.SIGALRM)
7253 previous_timer = signal.getitimer(signal.ITIMER_REAL)
7254 started = time.monotonic()
7255
7256 def _raise_timeout(_signum: int, _frame: Any) -> None:
7257 raise TimeoutError(f"model call timed out after {timeout:g}s")
7258
7259 signal.signal(signal.SIGALRM, _raise_timeout)
7260 signal.setitimer(signal.ITIMER_REAL, timeout)
7261 try:
7262 return llm.next_action(messages=messages, tools=tools)
7263 finally:
7264 signal.setitimer(signal.ITIMER_REAL, 0)
7265 signal.signal(signal.SIGALRM, previous_handler)
7266 if previous_timer[0] > 0:
7267 elapsed = max(0.0, time.monotonic() - started)
7268 remaining = max(0.001, previous_timer[0] - elapsed)
7269 signal.setitimer(signal.ITIMER_REAL, remaining, previous_timer[1])
7270
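The timeout wrapper above uses the Unix interval timer: install a `SIGALRM` handler that raises, arm `ITIMER_REAL`, and always disarm and restore in `finally`. A reduced sketch of the same pattern follows. `call_with_alarm` is a hypothetical name, the `hasattr` guard for platforms without `SIGALRM` is an addition not present in the function above, and this version does not re-arm a previously running timer the way the original does.

```python
import signal
import threading


def call_with_alarm(func, timeout: float):
    """Run func(), raising TimeoutError after `timeout` seconds (Unix, main thread only)."""
    usable = hasattr(signal, "SIGALRM") and threading.current_thread() is threading.main_thread()
    if timeout <= 0 or not usable:
        return func()  # no alarm available: run without a deadline

    def _raise_timeout(_signum, _frame):
        raise TimeoutError(f"call timed out after {timeout:g}s")

    previous_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm before restoring the handler
        signal.signal(signal.SIGALRM, previous_handler)
```

Signal handlers can only be installed from the main thread, which is why both this sketch and the original fall back to an unguarded call elsewhere.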
7271
7272def _tool_repair_messages(messages: list[dict[str, Any]], response: LLMResponse) -> list[dict[str, Any]]:
7273 content = str(response.content or "").strip()
7274 if len(content) > 2000:
7275 content = content[:2000] + " ..."
7276 repair_prompt = (
7277 "Your previous worker response did not call a tool. This worker must advance by calling exactly "
7278 "one available tool now. Do not answer in prose. Choose one bounded action that fits the current "
7279 "state, such as executing existing work, recording a measurement, updating an existing task, "
7280 "saving an evidence-backed output, recording a lesson/finding/source, or deferring only for a real wait."
7281 )
7282 repaired = list(messages)
7283 if content:
7284 repaired.append({"role": "assistant", "content": content})
7285 repaired.append({"role": "user", "content": repair_prompt})
7286 return repaired
7287
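The repair step above builds a new message list without mutating the caller's history: the clipped prose reply is replayed as an assistant turn, followed by a user turn that demands a tool call. A minimal sketch of that construction, with `with_repair_prompt` as a hypothetical name and the prompt text passed in by the caller:

```python
def with_repair_prompt(messages, content, repair_prompt, limit=2000):
    content = str(content or "").strip()
    if len(content) > limit:
        content = content[:limit] + " ..."  # clip long prose so the retry stays small
    repaired = list(messages)  # shallow copy; the original list is left untouched
    if content:
        repaired.append({"role": "assistant", "content": content})
    repaired.append({"role": "user", "content": repair_prompt})
    return repaired
```

An empty reply adds only the user turn, so the retry never contains an empty assistant message.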
7288
7289def run_one_step(
7290 job_id: str,
7291 *,
7292 config: AppConfig | None = None,
7293 db: AgentDB | None = None,
7294 llm: StepLLM | None = None,
7295 registry: ToolRegistry = DEFAULT_REGISTRY,
7296) -> StepExecution:
7297 config = config or load_config()
7298 config.ensure_dirs()
7299 owns_db = db is None
7300 db = db or AgentDB(config.runtime.state_db_path)
7301 try:
7302 artifacts = ArtifactStore(config.runtime.home, db=db)
7303 job = db.get_job(job_id)
7304 if _acknowledge_non_prompt_operator_context(db, job_id):
7305 job = db.get_job(job_id)
7306 if _clear_invalid_measurement_obligation(db, job_id):
7307 job = db.get_job(job_id)
7308 if _clear_stale_task_backlog_pressure(db, job_id, job):
7309 job = db.get_job(job_id)
7310 run_id = db.start_run(job_id, model=config.model.model)
7311 _emit_loop_start(db, job_id, run_id)
7312 recent_steps = db.list_steps(job_id=job_id)
7313 if _refresh_contradicted_negative_claims(db, job_id, job, recent_steps):
7314 job = db.get_job(job_id)
7315 model_config = config.model
7316 if _should_reflect(job, recent_steps):
7317 return _run_reflection_step(job, recent_steps, db=db, job_id=job_id, run_id=run_id)
7318 guard_recovery = _repeated_guard_block_context(recent_steps)
7319 if guard_recovery:
7320 return _run_guard_recovery_step(guard_recovery, db=db, job_id=job_id, run_id=run_id)
7321 active_operator_messages = _claim_operator_queue(db, job_id)
7322 if active_operator_messages:
7323 job = db.get_job(job_id)
7324 usage = db.job_token_usage(job_id)
7325 usage_budget_limit = _usage_budget_limit_context(config, usage)
7326 if usage_budget_limit:
7327 return _run_usage_budget_limit_step(
7328 usage_budget_limit,
7329 db=db,
7330 job_id=job_id,
7331 run_id=run_id,
7332 )
7333 messages = build_messages(
7334 job,
7335 recent_steps,
7336 memory_entries=db.list_memory(job_id),
7337 program_text=_load_program_text(config, job_id),
7338 timeline_events=db.list_timeline_events(job_id, limit=30),
7339 active_operator_messages=active_operator_messages,
7340 include_unclaimed_operator_messages=True,
7341 token_usage=usage,
7342 )
7343 llm = llm or OpenAIChatLLM(model_config)
7344 llm_started = time.monotonic()
7345 try:
7346 response: LLMResponse = _call_next_action_with_timeout(
7347 llm,
7348 messages=messages,
7349 tools=_registry_tools_for_step(registry, config, recent_steps, job=job),
7350 timeout_seconds=model_config.request_timeout_seconds,
7351 )
7352 except Exception as exc:
7353 llm_duration_seconds = round(max(0.0, time.monotonic() - llm_started), 3)
7354 step_id = db.add_step(
7355 job_id=job_id,
7356 run_id=run_id,
7357 kind="llm",
7358 status="failed",
7359 summary=f"model call failed: {type(exc).__name__}",
7360 input_data={
7361 "model": config.model.model,
7362 "duration_seconds": llm_duration_seconds,
7363 "request_timeout_seconds": model_config.request_timeout_seconds,
7364 },
7365 )
7366 result = _error_result(exc)
7367 result["duration_seconds"] = llm_duration_seconds
7368 hard_failure_note = _hard_llm_provider_failure_note(exc)
7369 if hard_failure_note:
7370 result["provider_action_required"] = True
7371 result["pause_reason"] = "llm_provider_blocked"
7372 db.update_job_status(
7373 job_id,
7374 "paused",
7375 metadata_patch={
7376 "last_note": hard_failure_note,
7377 "provider_blocked_at": datetime.now(timezone.utc).isoformat(),
7378 },
7379 )
7380 db.append_agent_update(
7381 job_id,
7382 hard_failure_note,
7383 category="error",
7384 metadata={"reason": "llm_provider_blocked", "error_type": type(exc).__name__},
7385 )
7386 db.finish_step(step_id, status="failed", output_data=result, error=str(exc))
7387 db.finish_run(run_id, "failed", error=str(exc))
7388 _emit_loop_end(db, job_id, run_id, status="failed", step_id=step_id, detail=str(exc))
7389 refresh_memory_index(db, job_id)
7390 return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=None, status="failed", result=result)
7391
7392 llm_duration_seconds = round(max(0.0, time.monotonic() - llm_started), 3)
7393 job = db.get_job(job_id)
7394 usage = _emit_assistant_message_event(
7395 db,
7396 job_id,
7397 run_id,
7398 response,
7399 messages=messages,
7400 context_length=config.model.context_length,
7401 duration_seconds=llm_duration_seconds,
7402 )
7403 emit_context_pressure_update(db, job_id, usage)
7404 emit_usage_pressure_update(db, job_id, db.job_token_usage(job_id))
7405
7406 tool_repair_attempted = False
7407 tool_repair_error: dict[str, Any] | None = None
7408 original_content = response.content
7409 if not response.tool_calls and getattr(llm, "tool_repair", False):
7410 tool_repair_attempted = True
7411 repair_messages = _tool_repair_messages(messages, response)
7412 repair_started = time.monotonic()
7413 try:
7414 repair_response = _call_next_action_with_timeout(
7415 llm,
7416 messages=repair_messages,
7417 tools=_registry_tools_for_step(registry, config, recent_steps, job=job),
7418 timeout_seconds=model_config.request_timeout_seconds,
7419 )
7420 except Exception as exc:
7421 tool_repair_error = _error_result(exc)
7422 tool_repair_error["duration_seconds"] = round(max(0.0, time.monotonic() - repair_started), 3)
7423 else:
7424 repair_duration_seconds = round(max(0.0, time.monotonic() - repair_started), 3)
7425 repair_usage = _emit_assistant_message_event(
7426 db,
7427 job_id,
7428 run_id,
7429 repair_response,
7430 messages=repair_messages,
7431 context_length=config.model.context_length,
7432 duration_seconds=repair_duration_seconds,
7433 )
7434 emit_context_pressure_update(db, job_id, repair_usage)
7435 emit_usage_pressure_update(db, job_id, db.job_token_usage(job_id))
7436 if repair_response.tool_calls:
7437 response = repair_response
7438
7439 if response.tool_calls:
7440 executions: list[StepExecution] = []
7441 details: list[str] = []
7442 run_error: str | None = None
7443 ordered_tool_calls = _ordered_tool_calls_for_execution(
7444 response.tool_calls,
7445 job=db.get_job(job_id),
7446 recent_steps=db.list_steps(job_id=job_id),
7447 )
7448 for index, call in enumerate(ordered_tool_calls):
7449 current_job = db.get_job(job_id)
7450 current_recent_steps = db.list_steps(job_id=job_id)
7451 execution, stop_batch, detail, error = _execute_tool_call(
7452 call,
7453 job=current_job,
7454 recent_steps=current_recent_steps,
7455 config=config,
7456 db=db,
7457 artifacts=artifacts,
7458 registry=registry,
7459 job_id=job_id,
7460 run_id=run_id,
7461 )
7462 executions.append(execution)
7463 details.append(detail)
7464 if error:
7465 run_error = error
7466 if stop_batch:
7467 if index < len(ordered_tool_calls) - 1 and _is_continuable_recoverable_input_block(execution):
7468 details.append(f"continued after recoverable {call.name} input block")
7469 continue
7470 break
7471
7472 final_execution = executions[-1]
7473 run_status = "failed" if any(item.status == "failed" for item in executions) else "completed"
7474 db.finish_run(run_id, run_status, error=run_error)
7475 detail = f"executed {len(executions)}/{len(response.tool_calls)} tool calls"
7476 if details:
7477 detail = f"{detail}; last: {details[-1]}"
7478 _emit_loop_end(
7479 db,
7480 job_id,
7481 run_id,
7482 status=final_execution.status,
7483 step_id=final_execution.step_id,
7484 tool_name=final_execution.tool_name,
7485 detail=detail,
7486 )
7487 refresh_memory_index(db, job_id)
7488 return final_execution
7489
7490 step_id = db.add_step(
7491 job_id=job_id,
7492 run_id=run_id,
7493 kind="assistant",
7494 status="blocked",
7495 summary="worker returned content without a tool call",
7496 input_data={},
7497 )
7498 result = {
7499 "success": False,
7500 "recoverable": True,
7501 "error": "worker tool call required",
7502 "content": response.content,
7503 "original_content": original_content,
7504 "tool_repair_attempted": tool_repair_attempted,
7505 "tool_repair_error": tool_repair_error,
7506 "next": (
7507 "Worker turns must use a tool call. Continue by choosing one bounded action such as "
7508 "record_tasks, report_update, write_artifact, write_file, shell_exec, record_findings, "
7509 "record_source, record_experiment, record_lesson, or defer_job."
7510 ),
7511 }
7512 db.append_agent_update(
7513 job_id,
7514 "Worker returned a message without a tool call; continuing with a tool-action recovery constraint.",
7515 category="blocked",
7516 metadata={"reason": "worker_tool_call_required", "step_id": step_id},
7517 )
7518 db.finish_step(
7519 step_id,
7520 status="blocked",
7521 summary="blocked assistant-only worker turn; tool call required",
7522 output_data=result,
7523 error="worker tool call required",
7524 )
7525 db.finish_run(run_id, "blocked", error="worker tool call required")
7526 _emit_loop_end(
7527 db,
7528 job_id,
7529 run_id,
7530 status="blocked",
7531 step_id=step_id,
7532 detail="worker tool call required",
7533 )
7534 refresh_memory_index(db, job_id)
7535 return StepExecution(job_id=job_id, run_id=run_id, step_id=step_id, tool_name=None, status="blocked", result=result)
7536 finally:
7537 if owns_db:
7538 db.close()
nipux_cli/worker_policy.py 489 lines
1"""Static worker prompt and loop policy constants."""
2
3from __future__ import annotations
4
5import re
6
7
8REFLECTION_INTERVAL_STEPS = 12
9WORKER_PROTOCOL_VERSION = "2026-05-01-contract-first-v1"
10
11SYSTEM_PROMPT = """You are a long-running local work agent.
12
13Operate as a bounded worker, not a chat assistant. Choose one useful next step,
14call one of the available tools, and persist important evidence as artifacts.
15Do not claim the whole job is complete. A strong result is only a checkpoint:
16save it, report it, add the next tasks, and continue improving or broadening.
17
18Use a contract-first durable cycle. Read the objective, operator context,
19roadmap, active task, and recent evidence; choose the next action that satisfies
20the active output contract; produce or measure concrete evidence; update the
21right ledger; report the checkpoint; then open or continue the next branch.
22Research is only one possible contract. For action, experiment, monitor, report,
23or file-deliverable work, prefer execution, measurement, validation, or writing
24over more background collection. Keep moving forever until the operator pauses
25or cancels the job.
26The worker must not mark jobs completed or failed; use record_tasks,
27record_lesson, report_update, and artifacts to describe checkpoints, blockers,
28and next branches while the job stays runnable.
29
30Avoid loops. Do not repeat the same search query or the same exact tool call.
31If search results already exist, move forward by extracting source pages,
32opening a useful site in the browser, or saving a finding/evidence artifact.
33If a page has already been extracted and contains useful evidence, save that
34evidence with write_artifact before doing more searching or browsing.
35Only click or type browser refs from the most recent successful browser snapshot
36or navigation result. If a click/type fails with an unknown ref, use the fresh
37recovery snapshot or call browser_snapshot before retrying.
38If a source shows Cloudflare, login, paywall, or anti-bot verification, keep it
39visible in the trace. Do not bypass protections. Continue with normal visible
40browser actions when possible, persist what you have, or use alternate public
41sources if stuck.
42If a tool returns a list of actionable candidates such as files, packages,
43configurations, commands, sources, venues, records, branches, or options, do not
44keep re-listing the same candidate set with small formatting changes. Persist the
45candidate list once, choose the best candidate for the active contract, and move
46to execution, measurement, validation, or an explicit blocked decision.
47If a probe discovers a local/runtime candidate that might satisfy the active
48contract, promote that candidate immediately: record the fact, validate it with
49the smallest relevant action, and measure it before continuing external
50acquisition or research retries. Do not let an available local candidate fall
51out of context while pursuing lower-confidence external sources.
52If repeated external acquisition attempts fail with authentication, permission,
53quota, missing credentials, or unavailable resources, mark that branch blocked
54or low-yield and pivot to another source, local candidate, monitor/defer branch,
55or operator-visible credential requirement instead of retrying small variants.
56If a browser page says blocked, CAPTCHA, bot check, login required, paywall, or
57anti-bot, treat that page as a failed/low-yield source for the current job. Do
58not write an artifact that claims usable evidence exists unless the evidence is
59actually visible. Record the source outcome or pivot to another public source.
60Use report_update for short operator-readable progress notes when you need to
61say what you found or why you are blocked. Do not use report_update instead of
62write_artifact when you have durable evidence, findings, or report content to save.
63Use write_file when the objective requires a concrete file deliverable, source
64file, document, config, dataset, or other workspace output. If a measured
65experiment says the next action is to write, merge, update, compile, or insert
66content, prefer write_file or an execution command that actually changes the
67target over more read-only inspection.
68Use defer_job when the next useful step is to wait for an external process,
69scheduled check, long-running command, or monitor interval. Do not
70simulate waiting with repeated searches, reports, or shell probes.
71Use record_lesson when you learn something that should change future behavior:
72bad source patterns, task-specific success criteria, repeated mistakes, operator
73preferences, or a better strategy. Keep lessons short and reusable.
74Use record_memory_graph when work produces reusable connected knowledge: an
75episode worth remembering, a stable fact, a strategy, a reusable skill, an open
76question, a decision, or a constraint. Link nodes to their evidence and to each
77other. Treat this as the job's durable brain: recent events are fast episodic
78memory, while stable graph nodes are consolidated knowledge that should guide
79future branches without replaying raw history.
80Durable memory is not automatically true forever. If newer evidence contradicts
81an older memory-graph fact, constraint, strategy, or finding, update the older
82record as deprecated/resolved/stale and link the newer evidence before acting.
83Prefer fresh measured or directly observed evidence over stale summaries.
84Use record_source when a source is high-yield, low-yield, blocked, repetitive,
85or otherwise useful to score for future behavior.
86Use record_findings after finding durable candidates, facts, opportunities,
87experiments, files, bugs, sources, or other reusable outputs. Dedupe against the
88finding ledger and artifacts before saving.
89Use record_tasks to maintain a durable queue of objective-neutral branches:
90open work, active branch, blocked branch, completed branch, and skipped branch.
91Each task should include an output_contract (research, artifact, experiment,
92action, monitor, decision, or report), acceptance criteria, evidence needed,
93and stall behavior so progress is judged by evidence, not activity volume.
94Before marking a task or milestone done, audit the claim against the objective:
95list the requirement, the concrete artifact/file/finding/measurement/validation
96that proves it, and any remaining gap. If the audit is incomplete, keep the
97branch active or blocked and create the next smallest follow-up task.
98When the job is broad or starts looping, split it into tasks and move to the
99highest-priority open task rather than staying on one source or tactic forever.
100Use record_roadmap for broad, multi-phase, or ambiguous objectives that need a
101higher-level orchestration plan. A roadmap is generic: milestones group related
102features or work units; each milestone has acceptance criteria, evidence needed,
103and a validation contract. Use record_milestone_validation at milestone checkpoints
104to pass, fail, block, or create follow-up tasks from validation gaps. Keep the
105roadmap compact and update it from durable evidence, not from activity count.
106Use record_experiment for measurable trials, benchmarks, comparisons,
107optimization attempts, or hypothesis tests. A saved note, source, or artifact is
108not enough progress for a measurable objective: record the exact configuration,
109metric, result, whether higher or lower is better, and the next experiment. Keep
110improving against the best observed result instead of declaring victory after a
111single measurement.
112Use shell_exec for command-line work, repository inspection, diagnostics,
113benchmarks, repeatable experiments, and other command execution that the
114objective requires. Prefer small read-only probes before changing anything, use
115explicit timeouts, and save important command output with write_artifact before
116continuing. Do not run destructive or high-risk cyber commands.
117For long downloads, builds, training runs, crawls, benchmarks, or other slow
118actions, treat the action as a monitored branch: choose a timeout that can make
119meaningful progress, use resumable commands when available, record partial
120progress as an experiment/task/checkpoint, and use defer_job when the next useful
121step is to wait and check again. Do not repeatedly restart the same long action
122with short timeouts without recording what changed and how the next attempt will
123resume or differ.
124If a probe shows a partial output, incomplete file, running process, cache entry,
125checkpoint, or other unfinished artifact from an action branch, stop re-listing
126the same state. Either resume/continue the action with a resumable command,
127record a monitor/defer step for the still-running work, or record the branch as
128blocked with the concrete missing condition and next action.
129read_artifact only reads saved Nipux artifacts. Use shell_exec for repository,
130workspace, project, or filesystem files that are not saved artifacts.
131write_file writes workspace/local files directly; write_artifact writes Nipux's
132separate saved-output store. Use the right one for the operator-facing result.
133Operator messages are durable context from the human operator. Messages marked
134steer are active constraints until acknowledged or superseded. Messages marked
135follow_up are lower-priority queued work; keep them in the task queue and act on
136them after the current active branch has a durable checkpoint. Messages marked
137note are durable preferences. Use acknowledge_operator_context only after you
138have incorporated or intentionally superseded a steer/follow_up message.
139"""
140
141INFORMATION_GATHERING_TOOLS = {
142 "browser_back",
143 "browser_click",
144 "browser_console",
145 "browser_navigate",
146 "browser_press",
147 "browser_scroll",
148 "browser_snapshot",
149 "browser_type",
150 "web_extract",
151 "web_search",
152}
153
154ARTIFACT_REVIEW_TOOLS = {"read_artifact", "search_artifacts"}
155MEMORY_REVIEW_TOOLS = {"search_memory_graph"}
156BRANCH_WORK_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {"shell_exec"}
157LEDGER_PROGRESS_TOOLS = {
158 "guard_recovery",
159 "record_findings",
160 "record_memory_graph",
161 "record_source",
162 "record_tasks",
163 "record_roadmap",
164 "record_milestone_validation",
165 "record_experiment",
166 "record_lesson",
167}
168MEASUREMENT_RESOLUTION_TOOLS = {"record_experiment", "record_lesson", "record_tasks", "record_milestone_validation"}
169FILE_VALIDATION_RESOLUTION_TOOLS = {
170 "shell_exec",
171 "record_experiment",
172 "record_lesson",
173 "record_tasks",
174 "record_milestone_validation",
175 "record_memory_graph",
176 "acknowledge_operator_context",
177}
178ARTIFACT_ACCOUNTING_RESOLUTION_TOOLS = LEDGER_PROGRESS_TOOLS | {"acknowledge_operator_context"}
179ARTIFACT_ACCOUNTING_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
180 "shell_exec",
181 "write_file",
182 "write_artifact",
183 "read_artifact",
184 "search_artifacts",
185 "report_update",
186}
187MEASUREMENT_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
188 "shell_exec",
189 "write_file",
190 "write_artifact",
191 "record_findings",
192 "record_memory_graph",
193 "record_source",
194 "acknowledge_operator_context",
195 "report_update",
196}
197FILE_VALIDATION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
198 "write_file",
199 "write_artifact",
200 "record_findings",
201 "record_source",
202 "report_update",
203}
204MILESTONE_VALIDATION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
205 "shell_exec",
206 "write_file",
207 "write_artifact",
208 "record_findings",
209 "record_source",
210 "record_experiment",
211 "report_update",
212}
213ROADMAP_STALENESS_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
214 "shell_exec",
215 "write_file",
216 "write_artifact",
217 "record_findings",
218 "record_source",
219 "record_tasks",
220 "record_experiment",
221 "report_update",
222}
223CHURN_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {"shell_exec"}
224MEMORY_CONSOLIDATION_BLOCKED_TOOLS = CHURN_TOOLS | {"write_artifact", "write_file", "report_update"}
225ACTIVITY_STAGNATION_BLOCKED_TOOLS = CHURN_TOOLS | {"write_artifact", "write_file", "report_update"}
226DELIVERABLE_PROGRESS_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | {"report_update"}
227RESEARCH_BALANCE_BLOCKED_TOOLS = ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
228 "shell_exec",
229 "write_file",
230 "write_artifact",
231 "record_lesson",
232 "report_update",
233}
234SOURCE_YIELD_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | ARTIFACT_REVIEW_TOOLS | MEMORY_REVIEW_TOOLS | {
235 "shell_exec",
236 "write_file",
237 "write_artifact",
238 "report_update",
239}
240MEASURABLE_RESEARCH_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {
241 "write_artifact",
242 "record_findings",
243 "record_source",
244 "report_update",
245}
246MEASURABLE_PROGRESS_PATTERN = re.compile(
247 r"(?i)\b("
248 r"benchmark|baseline|compare|comparison|experiment|improv(?:e|ing|ement)|increase|latency|"
249 r"measure|metric|minimi[sz]e|maximi[sz]e|optim(?:ize|ise|ization|isation)|performance|"
250 r"rate|reduce|score|speed|throughput|tune|tuning"
251 r")\b"
252)
253RECOVERABLE_GUARD_ERRORS = {
254 "artifact search loop blocked",
255 "browser runtime unavailable",
256 "deliverable checkpoint required",
257 "durable progress required",
258 "evidence checkpoint accounting required",
259 "evidence grounding required",
260 "duplicate tool call blocked",
261 "experiment stagnation decision required",
262 "experiment next action pending",
263 "known bad source blocked",
264 "lesson consolidation required",
265 "memory graph consolidation required",
266 "measurement obligation pending",
267 "measured progress required",
268 "progress accounting required",
269 "progress ledger update required",
270 "action decision required",
271 "similar artifact search blocked",
272 "similar search query blocked",
273 "source yield accounting required",
274 "task execution required",
275 "task branch required before more work",
276 "task queue saturated",
277 "usage pressure recovery required",
278 "worker tool call required",
279}
280MEASURABLE_RESEARCH_BUDGET_STEPS = 18
281MEASURABLE_ACTION_BUDGET_STEPS = 4
282DELIVERABLE_RESEARCH_BUDGET_STEPS = 18
283ACTIVITY_STAGNATION_CHECKPOINTS = 3
284TASK_QUEUE_SATURATION_OPEN_TASKS = 40
285TASK_QUEUE_TOTAL_SOFT_LIMIT = 80
286TASK_PLANNING_STAGNATION_CHECKPOINTS = 2
287PROGRAM_PROMPT_CHARS = 2000
288MEMORY_ENTRY_PROMPT_CHARS = 700
289MEMORY_PROMPT_CHARS = 1800
290RECENT_STATE_STEPS = 5
291RECENT_STATE_PROMPT_CHARS = 3000
292TIMELINE_PROMPT_EVENTS = 8
293SECTION_ITEM_CHARS = 420
294MAX_WORKER_PROMPT_CHARS = 18_000
295TIMELINE_PROMPT_EVENT_TYPES = {
296 "agent_message",
297 "artifact",
298 "error",
299 "experiment",
300 "finding",
301 "lesson",
302 "memory_node",
303 "milestone_validation",
304 "reflection",
305 "roadmap",
306 "source",
307 "task",
308}
309TIMELINE_PROMPT_AGENT_TITLES = {"blocked", "error", "plan", "progress", "report", "update"}
310TIMELINE_PROMPT_TOOL_STATUSES = {"blocked", "failed"}
311PROMPT_SECTION_BUDGETS = {
312 "Workspace": 520,
313 "Operator context": 2_200,
314 "Current execution focus": 1_600,
315 "Pending measurement obligation": 1_100,
316 "Candidate file discovery": 2_000,
317 "Measured progress guard": 1_000,
318 "Experiment stagnation guard": 1_000,
319 "Source yield guard": 1_000,
320 "Deliverable progress guard": 1_000,
321 "Progress accounting guard": 900,
322 "Activity stagnation": 900,
323 "Task planning guard": 900,
324 "Memory consolidation guard": 900,
325 "Lesson consolidation guard": 900,
326 "Durable progress yield": 900,
327 "Program": 1_400,
328 "Usage pressure": 900,
329 "Lessons learned": 1_100,
330 "Memory graph": 1_800,
331 "Roadmap": 2_000,
332 "Task queue": 2_400,
333 "Durable outcomes": 1_200,
334 "Ledgers": 2_400,
335 "Experiment ledger": 2_200,
336 "Reflections": 900,
337 "Compact memory": 1_100,
338 "Recent visible timeline": 1_000,
339 "Recent state": 1_800,
340 "Next-action constraint": 1_100,
341}
342
343QUERY_STOPWORDS = {
344 "and",
345 "are",
346 "does",
347 "for",
348 "from",
349 "how",
350 "offer",
351 "product",
352 "service",
353 "services",
354 "the",
355 "they",
356 "what",
357 "with",
358}
359TEXT_TOKEN_STOPWORDS = {
360 "and",
361 "are",
362 "for",
363 "from",
364 "into",
365 "that",
366 "the",
367 "this",
368 "with",
369}
370
371EVIDENCE_ARTIFACT_TERMS = {
372 "checkpoint",
373 "evidence",
374 "extract",
375 "extracted",
376 "notes",
377 "source",
378 "sources",
379}
380DELIVERABLE_ARTIFACT_TERMS = {
381 "article",
382 "checklist",
383 "compiled",
384 "deck",
385 "deliverable",
386 "doc",
387 "document",
388 "draft",
389 "final",
390 "guide",
391 "manual",
392 "memo",
393 "outline",
394 "paper",
395 "presentation",
396 "report",
397 "revision",
398 "section",
399 "spec",
400 "template",
401 "updated",
402 "writeup",
403}
404TASK_DELIVERABLE_ACTION_TERMS = {
405 "add",
406 "append",
407 "compile",
408 "create",
409 "edit",
410 "insert",
411 "polish",
412 "rewrite",
413 "update",
414 "write",
415}
416EXPERIMENT_DELIVERY_ACTION_TERMS = {
417 "append",
418 "apply",
419 "build",
420 "compile",
421 "create",
422 "edit",
423 "finish",
424 "fix",
425 "generate",
426 "implement",
427 "insert",
428 "merge",
429 "patch",
430 "produce",
431 "publish",
432 "replace",
433 "rewrite",
434 "save",
435 "update",
436 "write",
437}
438EXPERIMENT_INFORMATION_ACTION_TERMS = {
439 "audit",
440 "collect",
441 "extract",
442 "find",
443 "gather",
444 "inspect",
445 "read",
446 "research",
447 "review",
448 "search",
449 "source",
450 "survey",
451}
452EXPERIMENT_NEXT_ACTION_BLOCKED_TOOLS = INFORMATION_GATHERING_TOOLS | {"report_update"}
453READ_ONLY_SHELL_COMMAND_PATTERN = re.compile(
454 r"(?is)^\s*(?:"
455 r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
456 r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
457 r")"
458)
459
460BROWSER_REF_IGNORE_NAMES = {
461 "about us",
462 "back to top",
463 "careers",
464 "click here",
465 "clutch rating",
466 "organization name",
467 "contact",
468 "contact us",
469 "go",
470 "headquarters",
471 "help",
472 "latest links",
473 "learn more",
474 "privacy",
475 "read more",
476 "readmore",
477 "services",
478 "submit",
479 "top hits",
480}
481
482ANTI_BOT_ACK_TERMS = (
483 "anti-bot",
484 "blocked",
485 "bot check",
486 "captcha",
487 "not usable",
488 "verification",
489)
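The two compiled patterns in `worker_policy.py` are easiest to understand by probing them directly. The sketch below copies both regexes verbatim from the constants above and runs them against a few sample strings. The helper names `looks_read_only` and `looks_measurable` and all sample inputs are hypothetical, chosen for illustration; how the worker loop actually consumes these patterns is not shown in this file.

```python
import re

# Copied verbatim from worker_policy.py above.
READ_ONLY_SHELL_COMMAND_PATTERN = re.compile(
    r"(?is)^\s*(?:"
    r"awk\b|cat\b|df\b|du\b|echo\b|find\b|git\s+(?:diff|grep|log|ls-files|show|status)\b|"
    r"grep\b|head\b|ls\b|pwd\b|rg\b|sed\s+-n\b|stat\b|tail\b|tree\b|wc\b"
    r")"
)
MEASURABLE_PROGRESS_PATTERN = re.compile(
    r"(?i)\b("
    r"benchmark|baseline|compare|comparison|experiment|improv(?:e|ing|ement)|increase|latency|"
    r"measure|metric|minimi[sz]e|maximi[sz]e|optim(?:ize|ise|ization|isation)|performance|"
    r"rate|reduce|score|speed|throughput|tune|tuning"
    r")\b"
)


def looks_read_only(command: str) -> bool:
    """Hypothetical helper: True when the command starts with a listed read-only verb."""
    return READ_ONLY_SHELL_COMMAND_PATTERN.match(command) is not None


def looks_measurable(objective: str) -> bool:
    """Hypothetical helper: True when the text contains a whole-word measurement term."""
    return MEASURABLE_PROGRESS_PATTERN.search(objective) is not None


if __name__ == "__main__":
    for command in ("git status --short", "  ls -la", "sed -i s/a/b/ f.txt", "rm -rf build"):
        print(f"{command!r}: read_only={looks_read_only(command)}")
    for objective in ("Reduce p95 latency of the API", "Write a summary of the meeting"):
        print(f"{objective!r}: measurable={looks_measurable(objective)}")
```

Two properties follow from the regexes themselves. The shell pattern is anchored only at the start of the command, so a string such as `cat notes.txt > copy.txt` still matches even though the redirect writes a file. The progress pattern requires whole words, so `accelerate` does not match through its embedded `rate`.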
nipux_cli/worker_prompt_context.py 921 lines
1"""Prompt-context renderers for the Nipux worker loop."""
2
3from __future__ import annotations
4
5import re
6from typing import Any
7
8from nipux_cli.memory_graph import memory_graph_for_prompt
9from nipux_cli.metric_format import format_metric_value
10from nipux_cli.operator_context import active_prompt_operator_entries, operator_entry_is_prompt_relevant
11from nipux_cli.tui_outcomes import hourly_outcome_summary, model_update_event_parts, outcome_counts
12from nipux_cli.worker_policy import (
13 MAX_WORKER_PROMPT_CHARS,
14 PROMPT_SECTION_BUDGETS,
15 SECTION_ITEM_CHARS,
16 TIMELINE_PROMPT_AGENT_TITLES,
17 TIMELINE_PROMPT_EVENT_TYPES,
18 TIMELINE_PROMPT_EVENTS,
19 TIMELINE_PROMPT_TOOL_STATUSES,
20)
21from nipux_cli.worker_prompt_format import clip_text as _clip_text
22
23
24NEGATIVE_EXISTENCE_MARKERS = (
25 "cannot access",
26 "does not exist",
27 "failed to find",
28 "has not been",
29 "missing",
30 "no ",
31 "no such",
32 "none",
33 "not available",
34 "not detected",
35 "not downloaded",
36 "not found",
37 "not installed",
38 "unavailable",
39 "was not",
40 "without",
41)
42NEGATIVE_EVIDENCE_LINE_MARKERS = (
43 "cannot access",
44 "denied",
45 "does not exist",
46 "error",
47 "failed",
48 "failure",
49 "has not been",
50 "missing",
51 "no such",
52 "not available",
53 "not detected",
54 "not downloaded",
55 "not found",
56 "not installed",
57 "permission",
58 "timeout",
59 "unavailable",
60 "was not",
61)
62
63
64def _memory_entries_for_prompt(memory_entries: list[dict[str, Any]], *, limit: int = 2) -> list[dict[str, Any]]:
65 entries = [entry for entry in memory_entries if isinstance(entry, dict)]
66 rolling = next((entry for entry in entries if entry.get("key") == "rolling_state"), None)
67 selected: list[dict[str, Any]] = []
68 if rolling:
69 selected.append(rolling)
70 for entry in entries:
71 if len(selected) >= limit:
72 break
73 if rolling is not None and entry is rolling:
74 continue
75 selected.append(entry)
76 return selected[:limit]
77
78
79def _render_worker_prompt(job: dict[str, Any], *, sections: list[tuple[str, str]]) -> str:
80 objective = _clip_text(job.get("objective") or "", 2_000)
81 header = f"Job: {job['title']}\nKind: {job['kind']}\nObjective:\n{objective}"
82 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
83 stale_tokens = _stale_claim_tokens_for_prompt(
84 metadata,
85 reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
86 )
87 instruction = (
88 "Take exactly one bounded next action. If recent state contains search results, do not search the same query again. "
89 "If recent state contains extracted page evidence, write an artifact before doing more search or browsing."
90 )
91 scale = 1.0
92 while True:
93 parts = [header]
94 for title, body in sections:
95 base_budget = PROMPT_SECTION_BUDGETS.get(title, SECTION_ITEM_CHARS)
96 budget = max(260, int(base_budget * scale))
97 safe_body = _redact_stale_tokens_for_prompt(body, stale_tokens)
98 parts.append(f"{title}:\n{_clip_text(safe_body, budget)}")
99 parts.append(instruction)
100 content = "\n\n".join(parts)
101 if len(content) <= MAX_WORKER_PROMPT_CHARS or scale <= 0.45:
102 break
103 scale -= 0.12
104 if len(content) <= MAX_WORKER_PROMPT_CHARS:
105 return content
106 suffix_sections: list[str] = []
107 for title, body in sections:
108 if title == "Operator context":
109 suffix_sections.append(f"Operator context:\n{_clip_text(_redact_stale_tokens_for_prompt(body, stale_tokens), 900)}")
110 elif title == "Next-action constraint":
111 suffix_sections.append(f"Next-action constraint:\n{_clip_text(_redact_stale_tokens_for_prompt(body, stale_tokens), 900)}")
112 suffix = "\n\n".join(suffix_sections + [instruction])
113 marker = "\n\n...[middle context clipped; operator context and next action repeated below]...\n"
114 head_budget = max(0, MAX_WORKER_PROMPT_CHARS - len(suffix) - len(marker))
115 return _clip_text(content, head_budget) + marker + suffix
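The shrink loop in `_render_worker_prompt` can be studied in isolation. The following is a minimal sketch of that loop only, assuming plain truncation for clipping: `clip` is a hypothetical stand-in for `clip_text` (whose real behavior lives in `worker_prompt_format` and is not shown here), and `fit_sections` is an illustrative name. The defaults mirror the constants above (`SECTION_ITEM_CHARS = 420`, `MAX_WORKER_PROMPT_CHARS = 18_000`, a floor of 260 characters, a step of 0.12, and a minimum scale of 0.45). The header, the instruction line, stale-token redaction, and the final clipped-middle fallback are omitted.

```python
def clip(text: str, limit: int) -> str:
    """Hypothetical stand-in for clip_text: plain truncation, no ellipsis marker."""
    return text if len(text) <= limit else text[:limit]


def fit_sections(
    sections: list[tuple[str, str]],
    budgets: dict[str, int],
    *,
    default_budget: int = 420,
    max_chars: int = 18_000,
    floor: int = 260,
    step: float = 0.12,
    min_scale: float = 0.45,
) -> tuple[str, float]:
    """Render sections, shrinking every per-section budget by a shared scale until the text fits."""
    scale = 1.0
    while True:
        parts = []
        for title, body in sections:
            # Each section gets its own budget, scaled down but never below the floor.
            budget = max(floor, int(budgets.get(title, default_budget) * scale))
            parts.append(f"{title}:\n{clip(body, budget)}")
        content = "\n\n".join(parts)
        # Stop when the prompt fits or when the scale has bottomed out.
        if len(content) <= max_chars or scale <= min_scale:
            return content, scale
        scale -= step


if __name__ == "__main__":
    sections = [("Task queue", "t" * 5000), ("Roadmap", "r" * 5000)]
    budgets = {"Task queue": 2400, "Roadmap": 2000}
    content, scale = fit_sections(sections, budgets, max_chars=3000)
    print(len(content), round(scale, 2))
```

Because the loop exits once the scale drops to 0.45 or below, the result can still exceed `max_chars`. In the original function that case falls through to the branch that clips the body and repeats the operator context and next-action constraint after a marker.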
116
117
118def _redact_stale_tokens_for_prompt(text: str, stale_tokens: list[str]) -> str:
119 redacted = str(text or "")
120 for token in sorted((str(token) for token in stale_tokens if str(token).strip()), key=len, reverse=True):
121 pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
122 redacted = re.sub(
123 pattern,
124 lambda match: match.group(0) if _match_inside_path_like_span(match.string, match.start(), match.end()) else "[unsupported-stale-claim]",
125 redacted,
126 flags=re.IGNORECASE,
127 )
128 return redacted
129
130
131def _match_inside_path_like_span(text: str, start: int, end: int) -> bool:
132 left = start
133 while left > 0 and not text[left - 1].isspace() and text[left - 1] not in "'\"`<>|;":
134 left -= 1
135 right = end
136 while right < len(text) and not text[right].isspace() and text[right] not in "'\"`<>|;":
137 right += 1
138 span = text[left:right]
139 return "/" in span or "\\" in span or span.startswith(("~", "."))
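The path exemption in the two functions above is easier to see with a concrete input. The sketch below is a condensed restatement of `_redact_stale_tokens_for_prompt` and `_match_inside_path_like_span` with the same regex and boundary characters; the public-style names, the token `foo-7b`, and the sample sentences are hypothetical.

```python
import re

PLACEHOLDER = "[unsupported-stale-claim]"
SPAN_BOUNDARY_CHARS = "'\"`<>|;"


def inside_path_like_span(text: str, start: int, end: int) -> bool:
    """Grow the match to the surrounding non-space run and check whether it looks like a path."""
    left = start
    while left > 0 and not text[left - 1].isspace() and text[left - 1] not in SPAN_BOUNDARY_CHARS:
        left -= 1
    right = end
    while right < len(text) and not text[right].isspace() and text[right] not in SPAN_BOUNDARY_CHARS:
        right += 1
    span = text[left:right]
    return "/" in span or "\\" in span or span.startswith(("~", "."))


def redact_stale_tokens(text: str, stale_tokens: list[str]) -> str:
    """Replace standalone stale tokens with a placeholder, leaving path-like occurrences intact."""
    redacted = str(text or "")
    # Longest tokens first, so a longer token is handled before any shorter token it contains.
    for token in sorted((str(t) for t in stale_tokens if str(t).strip()), key=len, reverse=True):
        pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
        redacted = re.sub(
            pattern,
            lambda m: m.group(0) if inside_path_like_span(m.string, m.start(), m.end()) else PLACEHOLDER,
            redacted,
            flags=re.IGNORECASE,
        )
    return redacted


if __name__ == "__main__":
    sample = "Model foo-7b is ready; weights live at /opt/models/foo-7b/weights.bin"
    print(redact_stale_tokens(sample, ["foo-7b"]))
```

With this input the standalone mention is replaced while the occurrence inside the filesystem path survives, so a path the worker may still need to pass to a command is not rewritten.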
140
141
142def _operator_messages_for_prompt(
143 job: dict[str, Any],
144 *,
145 active_messages: list[dict[str, Any]] | None = None,
146 include_unclaimed: bool = True,
147) -> str:
148 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
149 messages = metadata.get("operator_messages") if isinstance(metadata.get("operator_messages"), list) else []
150 lines = []
151 active_messages = active_prompt_operator_entries(active_messages or [])
152 active_ids = {active.get("event_id") for active in active_messages if isinstance(active, dict)}
153 if active_messages:
154 lines.append("Newly delivered operator messages for this turn:")
155 for entry in active_messages:
156 line = _operator_message_line(entry)
157 if line:
158 lines.append(line)
159 active_context = [
160 entry
161 for entry in messages
162 if isinstance(entry, dict)
163 and operator_entry_is_prompt_relevant(entry)
164 and _operator_message_visible_in_prompt(entry, include_unclaimed=include_unclaimed)
165 and entry.get("event_id") not in active_ids
166 ]
167 if active_context:
168 if lines:
169 lines.append("Still-active durable operator context:")
170 for entry in active_context[-6:]:
171 line = _operator_message_line(entry)
172 if line:
173 lines.append(line)
174 return "\n".join(lines) if lines else "No active operator context."
175
176
177def _operator_message_line(entry: dict[str, Any]) -> str:
178 if not isinstance(entry, dict):
179 return ""
180 at = str(entry.get("at") or "")
181 source = str(entry.get("source") or "operator")
182 mode = str(entry.get("mode") or "steer")
183 event_id = str(entry.get("event_id") or "")
184 message = " ".join(str(entry.get("message") or "").split())
185 if message:
186 states = []
187 if entry.get("claimed_at"):
188 states.append("delivered")
189 if entry.get("acknowledged_at"):
190 states.append("acknowledged")
191 if entry.get("superseded_at"):
192 states.append("superseded")
193 state_text = f" ({', '.join(states)})" if states else ""
194 id_text = f" id={event_id}" if event_id else ""
195 return f"-{id_text} {at} {source} {mode}{state_text}: {_clip_text(message, 420)}"
196 return ""
197
198
199def _lessons_for_prompt(job: dict[str, Any]) -> str:
200 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
201 lessons = metadata.get("lessons") if isinstance(metadata.get("lessons"), list) else []
202 if not lessons:
203 return "No durable lessons yet."
204 reference_text = " ".join(str(job.get(key) or "") for key in ("title", "objective", "kind"))
205 positive_lines = _positive_durable_lines_for_lesson_conflicts(metadata)
206 stale_lesson_ids = _stale_negative_record_ids(metadata, kind="lesson")
207 lines = []
208 for entry in lessons[-5:]:
209 if not isinstance(entry, dict):
210 continue
211 category = str(entry.get("category") or "memory")
212 raw_lesson = str(entry.get("lesson") or "")
213 record_id = _record_id_for_staleness(entry)
214 conflicting_tokens = _negative_lesson_conflict_tokens(raw_lesson, positive_lines)
215 if conflicting_tokens:
216 lesson = (
217 "Potentially stale negative lesson suppressed for "
218 + ", ".join(conflicting_tokens[:6])
219 + ". Re-verify against fresh evidence before using this claim."
220 )
221 elif record_id in stale_lesson_ids:
222 lesson = "Potentially stale negative lesson suppressed after fresh contradictory evidence. Re-verify before using this claim."
223 else:
224 lesson = _lesson_prompt_text(raw_lesson, reference_text=reference_text)
225 if lesson:
226 lines.append(f"- {category}: {_clip_text(lesson, SECTION_ITEM_CHARS)}")
227 return "\n".join(lines) if lines else "No durable lessons yet."
228
229
230def _lesson_prompt_text(lesson: str, *, reference_text: str = "") -> str:
231 lesson = " ".join(str(lesson or "").split())
232 if "unsupported concrete tokens" not in lesson.lower():
233 return lesson
234 reference_norm = _normalize_claim_text(reference_text)
235 stale_tokens = []
236 seen = set()
237 for token in _unsupported_tokens_from_lesson(lesson):
238 cleaned = " ".join(str(token or "").split())
239 if not cleaned:
240 continue
241 key = cleaned.lower()
242 if reference_norm and _normalize_claim_text(cleaned) in reference_norm:
243 continue
244 if key in seen or not _stale_token_is_distinctive(cleaned):
245 continue
246 seen.add(key)
247 stale_tokens.append(cleaned)
248 if stale_tokens:
249 return (
250 "Evidence grounding rejected unsupported durable-record claims: "
251 + ", ".join(stale_tokens[:8])
252 + ". Re-verify them from fresh evidence before using them."
253 )
254 return "Evidence grounding rejected an unsupported durable record. Re-verify from fresh evidence before using it."
255
256
257def _positive_durable_lines_for_lesson_conflicts(metadata: dict[str, Any]) -> list[str]:
258 lines: list[str] = []
259 for key in ("finding_ledger", "experiment_ledger", "source_ledger"):
260 records = metadata.get(key) if isinstance(metadata.get(key), list) else []
261 for record in records[-30:]:
262 if isinstance(record, dict):
263 text = _dict_scalar_text(record)
264 if text:
265 lines.append(text)
266 graph = metadata.get("memory_graph") if isinstance(metadata.get("memory_graph"), dict) else {}
267 nodes = graph.get("nodes") if isinstance(graph.get("nodes"), list) else []
268 for node in nodes[-30:]:
269 if isinstance(node, dict):
270 text = _dict_scalar_text(node)
271 if text:
272 lines.append(text)
273 return lines
274
275
276def _negative_lesson_conflict_tokens(lesson: str, positive_lines: list[str]) -> list[str]:
277 lesson = " ".join(str(lesson or "").split())
278 lesson_lower = lesson.lower()
279 if not positive_lines or not any(marker in lesson_lower for marker in NEGATIVE_EXISTENCE_MARKERS):
280 return []
281 tokens = _distinctive_claim_tokens(lesson)
282 conflicts: list[str] = []
283 seen: set[str] = set()
284 for token in tokens:
285 key = token.lower()
286 if key in seen:
287 continue
288 seen.add(key)
289 if not _token_near_negative_marker(lesson, token):
290 continue
291 if _positive_line_contains_token(positive_lines, token):
292 conflicts.append(token)
293 return conflicts
294
295
296def _dict_scalar_text(record: dict[str, Any]) -> str:
297 parts: list[str] = []
298 for key, value in record.items():
299 if key in {"created_at", "updated_at", "event_id", "id"}:
300 continue
301 if isinstance(value, (str, int, float, bool)):
302 parts.append(str(value))
303 elif isinstance(value, dict):
304 parts.append(_dict_scalar_text(value))
305 return " ".join(part for part in parts if part)
306
307
308def _distinctive_claim_tokens(text: str) -> list[str]:
309 tokens: list[str] = []
310 for raw in re.findall(r"\b[A-Za-z][A-Za-z0-9_.+-]{1,}\b", text):
311 token = raw.strip("._+-")
312 if token and _stale_token_is_distinctive(token):
313 tokens.append(token)
314 return tokens
315
316
317def _token_near_negative_marker(text: str, token: str, *, window: int = 140) -> bool:
318 text_lower = text.lower()
319 token_lower = token.lower()
320 start = 0
321 while True:
322 index = text_lower.find(token_lower, start)
323 if index < 0:
324 return False
325 nearby = text_lower[max(0, index - window): index + len(token_lower) + window]
326 if any(marker in nearby for marker in NEGATIVE_EXISTENCE_MARKERS):
327 return True
328 start = index + len(token_lower)
329
330
331def _positive_line_contains_token(lines: list[str], token: str) -> bool:
332 token_lower = token.lower()
333 for line in lines:
334 line_lower = line.lower()
335 if token_lower not in line_lower:
336 continue
337 if any(marker in line_lower for marker in NEGATIVE_EVIDENCE_LINE_MARKERS):
338 continue
339 return True
340 return False
341
342
343def _memory_graph_for_prompt(job: dict[str, Any]) -> str:
344 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
345 stale_tokens = _stale_claim_tokens_for_prompt(
346 metadata,
347 reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
348 )
349 return memory_graph_for_prompt(job, limit=10, stale_tokens=stale_tokens)
350
351
352def _roadmap_for_prompt(job: dict[str, Any]) -> str:
353 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
354 roadmap = metadata.get("roadmap") if isinstance(metadata.get("roadmap"), dict) else {}
355 if not roadmap:
356 return (
357 "No roadmap yet. If the objective is broad, multi-phase, or needs validation checkpoints, "
358 "use record_roadmap to define compact milestones, features, acceptance criteria, and validation evidence."
359 )
360 milestones = roadmap.get("milestones") if isinstance(roadmap.get("milestones"), list) else []
361 status_counts: dict[str, int] = {}
362 validation_counts: dict[str, int] = {}
363 for milestone in milestones:
364 if not isinstance(milestone, dict):
365 continue
366 status = str(milestone.get("status") or "planned")
367 validation_status = str(milestone.get("validation_status") or "not_started")
368 status_counts[status] = status_counts.get(status, 0) + 1
369 validation_counts[validation_status] = validation_counts.get(validation_status, 0) + 1
370 lines = [
371 _clip_text(
372 f"{roadmap.get('status') or 'planned'}: {roadmap.get('title') or 'Roadmap'}"
373 + (f" | current={roadmap.get('current_milestone')}" if roadmap.get("current_milestone") else ""),
374 520,
375 ),
376 "Milestone counts: " + (", ".join(f"{key}={value}" for key, value in sorted(status_counts.items())) or "none"),
377 "Validation counts: " + (", ".join(f"{key}={value}" for key, value in sorted(validation_counts.items())) or "none"),
378 ]
379 if roadmap.get("scope"):
380 lines.append("Scope: " + _clip_text(str(roadmap.get("scope") or ""), 420))
381 if roadmap.get("validation_contract"):
382 lines.append("Validation contract: " + _clip_text(str(roadmap.get("validation_contract") or ""), 520))
383 selected = [
384 milestone for milestone in milestones
385 if isinstance(milestone, dict)
386 and str(milestone.get("status") or "planned") in {"active", "validating", "planned", "blocked"}
387 ][:6]
388 if not selected:
389 selected = [milestone for milestone in milestones if isinstance(milestone, dict)][-4:]
390 for milestone in selected[:6]:
391 features = milestone.get("features") if isinstance(milestone.get("features"), list) else []
392 open_features = sum(1 for feature in features if isinstance(feature, dict) and str(feature.get("status") or "planned") in {"planned", "active"})
393 detail = " | ".join(
394 bit
395 for bit in [
396 str(milestone.get("status") or "planned"),
397 f"validation={milestone.get('validation_status') or 'not_started'}",
398 f"p={milestone.get('priority') or 0}",
399 str(milestone.get("title") or "milestone"),
400 f"features={len(features)}/{open_features} open" if features else "",
401 ]
402 if bit
403 )
404 if milestone.get("acceptance_criteria"):
405 detail += f" | accept={milestone.get('acceptance_criteria')}"
406 if milestone.get("evidence_needed"):
407 detail += f" | evidence={milestone.get('evidence_needed')}"
408 if milestone.get("validation_result"):
409 detail += f" | validation_result={milestone.get('validation_result')}"
410 if milestone.get("next_action"):
411 detail += f" | next={milestone.get('next_action')}"
412 lines.append("- " + _clip_text(detail, 620))
413 return "\n".join(lines)
414
415
416def _tasks_for_prompt(job: dict[str, Any]) -> str:
417 tasks = _metadata_list(job, "task_queue")
418 if not tasks:
419 return (
420 "No durable task queue yet. If the objective is broad, use record_tasks "
421 "to create a few concrete open branches with output contracts and acceptance criteria before continuing."
422 )
423 status_rank = {"active": 0, "open": 1, "blocked": 2, "done": 3, "skipped": 4}
424 ranked = sorted(
425 tasks,
426 key=lambda task: (status_rank.get(str(task.get("status") or "open"), 9), -_as_int(task.get("priority"))),
427 )
428 counts: dict[str, int] = {}
429 for task in tasks:
430 status = str(task.get("status") or "open")
431 counts[status] = counts.get(status, 0) + 1
432 lines = ["Task counts: " + ", ".join(f"{key}={value}" for key, value in sorted(counts.items()))]
433 selected = [task for task in ranked if str(task.get("status") or "open") in {"active", "open"}][:6]
434 if len(selected) < 6:
435 selected.extend([task for task in ranked if str(task.get("status") or "open") == "blocked"][: 6 - len(selected)])
436 if len(selected) < 6:
437 selected.extend([task for task in ranked if task not in selected][: 6 - len(selected)])
438 for task in selected[:6]:
439 output_contract = _task_output_contract(task)
440 bits = [
441 str(task.get("status") or "open"),
442 f"priority={task.get('priority') or 0}",
443 str(task.get("title") or "untitled"),
444 ]
445 if output_contract:
446 bits.append(f"contract={output_contract}")
447 detail = " | ".join(bit for bit in bits if bit)
448 if task.get("goal"):
449 detail += f" | goal={task.get('goal')}"
450 if task.get("acceptance_criteria"):
451 detail += f" | accept={task.get('acceptance_criteria')}"
452 if task.get("evidence_needed"):
453 detail += f" | evidence={task.get('evidence_needed')}"
454 if task.get("stall_behavior"):
455 detail += f" | stall={task.get('stall_behavior')}"
456 if task.get("source_hint"):
457 detail += f" | source_hint={task.get('source_hint')}"
458 if task.get("result"):
459 detail += f" | result={task.get('result')}"
460 lines.append("- " + _clip_text(detail, 520))
461 return "\n".join(lines)
462
463
464def _task_output_contract(task: dict[str, Any]) -> str:
465 metadata = task.get("metadata") if isinstance(task.get("metadata"), dict) else {}
466 return str(task.get("output_contract") or task.get("contract") or metadata.get("output_contract") or metadata.get("contract") or "")
467
468
469def _timeline_for_prompt(events: list[dict[str, Any]]) -> str:
470 if not events:
471 return "No timeline events yet."
472 selected: list[tuple[str, str, str]] = []
473 counts: dict[str, int] = {}
474 for event in events:
475 rendered = _timeline_event_for_prompt(event)
476 if not rendered:
477 continue
478 at, event_type, detail = rendered
479 counts[event_type] = counts.get(event_type, 0) + 1
480 selected.append((at, event_type, detail))
481 if not selected:
482 return "No high-signal timeline events yet. Recent state covers raw tool activity."
483 summary = ", ".join(f"{key}={value}" for key, value in sorted(counts.items()))
484 lines = [f"High-signal timeline counts: {summary}"]
485 for at, event_type, detail in selected[-TIMELINE_PROMPT_EVENTS:]:
486 prefix = f"- {at} {event_type}: " if at else f"- {event_type}: "
487 lines.append(prefix + _clip_text(detail, SECTION_ITEM_CHARS))
488 return "\n".join(lines)
489
490
491def _outcomes_for_prompt(events: list[dict[str, Any]]) -> str:
492 """Summarize durable outputs so the worker sees progress, not just activity."""
493
494 if not events:
495 return "No durable outcomes visible in recent timeline."
496 counts = outcome_counts(events, include_research=False, include_failures=True)
497 summary = hourly_outcome_summary(counts)
498 lines = [f"Outcome counts: {summary or 'none'}."]
499 seen: set[str] = set()
500 for event in reversed(events):
501 parsed = model_update_event_parts(event, width=240, compact=False)
502 if not parsed:
503 continue
504 label, text, _clock = parsed
505 if label in {"DONE", "PLAN", "UPDATE"}:
506 continue
507 key = f"{label}:{text}"
508 if key in seen:
509 continue
510 seen.add(key)
511 lines.append(f"- {label.lower()}: {_clip_text(text, 360)}")
512 if len(lines) >= 8:
513 break
514 if len(lines) == 1:
515 lines.append("No durable output/finding/measurement records are visible; prioritize creating or accounting for one.")
516 return "\n".join(lines)
517
518
519def _ledgers_for_prompt(job: dict[str, Any]) -> str:
520 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
521 findings = _metadata_list(job, "finding_ledger")
522 sources = _metadata_list(job, "source_ledger")
523 stale_tokens = _stale_claim_tokens_for_prompt(
524 metadata,
525 reference_text=" ".join(str(job.get(key) or "") for key in ("title", "objective", "kind")),
526 )
527 stale_record_ids = _stale_negative_record_ids(metadata, kind="finding")
528 stale_findings = [
529 finding
530 for finding in findings
531 if _record_contains_stale_token(finding, stale_tokens)
532 or _record_id_for_staleness(finding) in stale_record_ids
533 ]
534 active_findings = [finding for finding in findings if finding not in stale_findings]
535 lines = [
536 f"Finding ledger: {len(findings)} unique candidates.",
537 f"Source ledger: {len(sources)} scored sources.",
538 ]
539 if stale_tokens:
540 lines.append(
541 "Unsupported/stale claim tokens to avoid until re-verified: "
542 + ", ".join(stale_tokens[:12])
543 )
544 if active_findings:
545 lines.append("Recent findings:")
546 for finding in active_findings[-5:]:
547 bits = [
548 str(finding.get("name") or "unknown"),
549 str(finding.get("category") or "").strip(),
550 str(finding.get("location") or "").strip(),
551 f"score={finding.get('score')}" if finding.get("score") is not None else "",
552 ]
553 lines.append("- " + _clip_text(" | ".join(bit for bit in bits if bit), 360))
554 if stale_findings:
555 lines.append(
556 f"Suppressed {len(stale_findings)} stale finding(s) matching unsupported tokens; "
557 "do not use them as facts until observed again."
558 )
559 stale_negative_records = _stale_negative_records_for_prompt(metadata, kind="finding")
560 if stale_negative_records:
561 lines.append("Contradicted negative findings suppressed:")
562 for record in stale_negative_records[-4:]:
563 lines.append(
564 "- "
565 + _clip_text(
566 f"{record.get('title') or 'finding'} token={record.get('token') or ''} evidence={record.get('evidence') or ''}",
567 360,
568 )
569 )
570 if sources:
571 usable_sources = [
572 source
573 for source in sources
574 if _as_float(source.get("usefulness_score")) >= 0.2
575 or _as_int(source.get("yield_count")) > 0
576 ]
577 low_quality_sources = [
578 source
579 for source in sources
580 if _as_float(source.get("usefulness_score")) < 0.2
581 and _as_int(source.get("yield_count")) <= 0
582 and (_as_int(source.get("fail_count")) > 0 or source.get("warnings"))
583 ]
584 ranked = sorted(
585 usable_sources,
586 key=lambda item: (_as_float(item.get("usefulness_score")), _as_int(item.get("yield_count"))),
587 reverse=True,
588 )
589 if ranked:
590 lines.append("High-yield/current sources:")
591 for source in ranked[:4]:
592 lines.append(
593 "- "
594 + _clip_text(
595 f"{source.get('source')} type={source.get('source_type') or 'unknown'} "
596 f"score={source.get('usefulness_score')} findings={source.get('yield_count') or 0} "
597 f"fails={source.get('fail_count') or 0} outcome={source.get('last_outcome') or ''}",
598 420,
599 )
600 )
601 if low_quality_sources:
602 lines.append("Low-yield/blocked source patterns to avoid:")
603 for source in low_quality_sources[-3:]:
604 lines.append(
605 "- "
606 + _clip_text(
607 f"{source.get('source')} type={source.get('source_type') or 'unknown'} "
608 f"score={source.get('usefulness_score')} fails={source.get('fail_count') or 0} "
609 f"warnings={', '.join(source.get('warnings') or [])} outcome={source.get('last_outcome') or ''}",
610 420,
611 )
612 )
613 return "\n".join(lines)
614
615
616def _stale_claim_tokens_for_prompt(metadata: dict[str, Any], *, reference_text: str = "") -> list[str]:
617 raw_tokens = metadata.get("unsupported_claim_tokens")
618 tokens: list[str] = []
619 seen: set[str] = set()
620 candidates: list[Any] = []
621 reference_norm = _normalize_claim_text(reference_text)
622 if isinstance(raw_tokens, list):
623 candidates.extend(raw_tokens)
624 stale_records = metadata.get("stale_negative_records")
625 if isinstance(stale_records, list):
626 for record in stale_records:
627 if isinstance(record, dict):
628 candidates.append(record.get("token"))
629 lessons = metadata.get("lessons")
630 if isinstance(lessons, list):
631 for lesson in lessons[-25:]:
632 if not isinstance(lesson, dict):
633 continue
634 candidates.extend(_unsupported_tokens_from_lesson(str(lesson.get("lesson") or "")))
635 for raw in candidates:
636 token = " ".join(str(raw or "").split())
637 if not token:
638 continue
639 if reference_norm and _normalize_claim_text(token) in reference_norm:
640 continue
641 if not _stale_token_is_distinctive(token):
642 continue
643 key = token.lower()
644 if key in seen:
645 continue
646 seen.add(key)
647 tokens.append(token)
648 return tokens[-100:]
649
650
651def _unsupported_tokens_from_lesson(lesson: str) -> list[str]:
652 marker = "unsupported concrete tokens"
653 if marker not in lesson.lower():
654 return []
655 match = re.search(r"unsupported concrete tokens for .*?:\s*(.*?)(?:\.\s+Treat matching|$)", lesson, flags=re.IGNORECASE)
656 if not match:
657 match = re.search(r"unsupported concrete tokens for .*?:\s*(.*?)(?:\.|$)", lesson, flags=re.IGNORECASE)
658 if not match:
659 return []
660 return [part.strip() for part in match.group(1).split(",") if part.strip()]
661
662
663def _stale_token_is_distinctive(token: str) -> bool:
664 lowered = token.lower()
665 if lowered.startswith(".") and re.match(r"^\.[a-z0-9][a-z0-9_-]{1,12}$", lowered):
666 return lowered not in {".app", ".co", ".com", ".dev", ".edu", ".gov", ".io", ".net", ".org", ".www"}
667 if lowered in {
668 "api",
669 "ascii",
670 "blocked",
671 "broken",
672 "candidate",
673 "candidates",
674 "cdn",
675 "cli",
676 "critical",
677 "cpu",
678 "cuda",
679 "discovered",
680 "discovery",
681 "ggml",
682 "gguf",
683 "gpu",
684 "hf_token",
685 "html",
686 "http",
687 "https",
688 "incomplete",
689 "json",
690 "lfs",
691 "not_found",
692 "oid",
693 "onnx",
694 "planned",
695 "python",
696 "python3",
697 "ram",
698 "rest",
699 "severe",
700 "sha",
701 "sha256",
702 "search",
703 "usable",
704 "unvalidated",
705 "valid",
706 "validity",
707 "validate",
708 "validated",
709 "vram",
710 "xml",
711 "xet",
712 "yaml",
713 "yml",
714 }:
715 return False
716 if lowered.startswith((
717 "art_",
718 "step_",
719 "step-",
720 "shell_",
721 "shell-",
722 "web_",
723 "web-",
724 "episode-",
725 "fact-",
726 "source-",
727 "quality-",
728 "constraint-",
729 "baseline-",
730 "question-",
731 "verified_",
732 "verified-",
733 "timeout_",
734 "timeout-",
735 )):
736 return False
737 if lowered.endswith((".md", ".py", ".json", ".yaml", ".yml", ".gguf", ".txt", ".log")):
738 return False
739 if lowered.startswith(("python-", "pip", "pip3")):
740 return False
741 if len(token) < 4:
742 return False
743 return (any(ch.isalpha() for ch in token) and any(ch.isdigit() for ch in token)) or token.isupper()
744
745
746def _normalize_claim_text(text: str) -> str:
747 return re.sub(r"[^a-z0-9]+", "", str(text or "").lower())
748
749
750def _record_contains_stale_token(record: dict[str, Any], stale_tokens: list[str]) -> bool:
751 if not stale_tokens:
752 return False
753 text = " ".join(
754 str(record.get(key) or "")
755 for key in ("name", "category", "location", "contact", "reason", "source_url", "url")
756 )
757 metadata = record.get("metadata") if isinstance(record.get("metadata"), dict) else {}
758 text += " " + " ".join(str(value) for value in metadata.values() if isinstance(value, (str, int, float)))
759 for token in stale_tokens:
760 pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
761 if re.search(pattern, text, flags=re.IGNORECASE):
762 return True
763 return False
764
765
766def _stale_negative_records_for_prompt(metadata: dict[str, Any], *, kind: str) -> list[dict[str, Any]]:
767 records = metadata.get("stale_negative_records")
768 if not isinstance(records, list):
769 return []
770 return [
771 record
772 for record in records
773 if isinstance(record, dict) and str(record.get("kind") or "") == kind
774 ]
775
776
777def _stale_negative_record_ids(metadata: dict[str, Any], *, kind: str) -> set[str]:
778 ids: set[str] = set()
779 for record in _stale_negative_records_for_prompt(metadata, kind=kind):
780 record_id = str(record.get("record_id") or "").strip()
781 if record_id:
782 ids.add(record_id)
783 return ids
784
785
786def _record_id_for_staleness(record: dict[str, Any]) -> str:
787 for key in ("key", "event_id", "id"):
788 value = str(record.get(key) or "").strip()
789 if value:
790 return value
791 return _normalize_claim_text(str(record.get("name") or record.get("title") or ""))[:120]
792
793
794def _experiments_for_prompt(job: dict[str, Any]) -> str:
795 experiments = _metadata_list(job, "experiment_ledger")
796 if not experiments:
797 return (
798 "No experiments tracked yet. If this objective involves improving, "
799 "comparing, benchmarking, reducing, increasing, or otherwise measuring something, "
800 "turn candidate ideas into record_experiment entries with exact config, metric, result, and next action."
801 )
802 measured = [experiment for experiment in experiments if experiment.get("metric_value") is not None]
803 best = [
804 experiment
805 for experiment in measured
806 if bool(experiment.get("best_observed"))
807 ]
808 status_counts: dict[str, int] = {}
809 for experiment in experiments:
810 status = str(experiment.get("status") or "planned")
811 status_counts[status] = status_counts.get(status, 0) + 1
812 lines = [
813 f"Experiment counts: {', '.join(f'{key}={value}' for key, value in sorted(status_counts.items()))}.",
814 f"Measured results: {len(measured)}.",
815 ]
816 if best:
817 lines.append("Best observed results:")
818 for experiment in best[-3:]:
819 metric = format_metric_value(
820 experiment.get("metric_name") or "metric",
821 experiment.get("metric_value"),
822 experiment.get("metric_unit") or "",
823 )
824 lines.append(
825 "- "
826 + _clip_text(" | ".join(
827 bit
828 for bit in [
829 str(experiment.get("title") or "experiment"),
830 metric,
831 f"result={experiment.get('result')}" if experiment.get("result") else "",
832 f"next={experiment.get('next_action')}" if experiment.get("next_action") else "",
833 ]
834 if bit
835 ), 520)
836 )
837 recent = experiments[-4:]
838 if recent:
839 lines.append("Recent experiments:")
840 for experiment in recent:
841 metric = ""
842 if experiment.get("metric_value") is not None:
843 metric = format_metric_value(
844 experiment.get("metric_name") or "metric",
845 experiment.get("metric_value"),
846 experiment.get("metric_unit") or "",
847 )
848 delta = ""
849 if experiment.get("delta_from_previous_best") is not None:
850 delta = f"delta={experiment.get('delta_from_previous_best')}"
851 lines.append(
852 "- "
853 + _clip_text(" | ".join(
854 bit
855 for bit in [
856 str(experiment.get("status") or "planned"),
857 str(experiment.get("title") or "experiment"),
858 metric,
859 delta,
860 f"next={experiment.get('next_action')}" if experiment.get("next_action") else "",
861 ]
862 if bit
863 ), 520)
864 )
865 return "\n".join(lines)
866
867
868def _operator_message_visible_in_prompt(entry: dict[str, Any], *, include_unclaimed: bool) -> bool:
869 mode = str(entry.get("mode") or "steer").strip().lower().replace("-", "_")
870 if entry.get("claimed_at") or mode == "note":
871 return True
872 return include_unclaimed and mode == "steer"
873
874
875def _metadata_list(job: dict[str, Any], key: str) -> list[dict[str, Any]]:
876 metadata = job.get("metadata") if isinstance(job.get("metadata"), dict) else {}
877 values = metadata.get(key)
878 if not isinstance(values, list):
879 return []
880 return [value for value in values if isinstance(value, dict)]
881
882
883def _timeline_event_for_prompt(event: dict[str, Any]) -> tuple[str, str, str] | None:
884 event_type = str(event.get("event_type") or "event")
885 if event_type == "operator_message":
886 return None
887 title = " ".join(str(event.get("title") or "").split())
888 body = " ".join(str(event.get("body") or "").split())
889 metadata = event.get("metadata") if isinstance(event.get("metadata"), dict) else {}
890 title_lower = title.lower()
891 if event_type == "tool_result":
892 status = str(metadata.get("status") or "").lower()
893 if status not in TIMELINE_PROMPT_TOOL_STATUSES:
894 return None
895 elif event_type == "agent_message":
896 if title_lower not in TIMELINE_PROMPT_AGENT_TITLES:
897 return None
898 elif event_type not in TIMELINE_PROMPT_EVENT_TYPES:
899 return None
900 at = str(event.get("created_at") or "")
901 detail = title if title else event_type
902 if body:
903 detail = f"{detail}: {body}"
904 if event_type == "tool_result":
905 status = str(metadata.get("status") or "").lower()
906 detail = f"{status} {detail}".strip()
907 return at, event_type, detail
908
909
910def _as_float(value: Any, default: float = 0.0) -> float:
911 try:
912 return float(value)
913 except (TypeError, ValueError):
914 return default
915
916
917def _as_int(value: Any, default: int = 0) -> int:
918 try:
919 return int(value)
920 except (TypeError, ValueError):
921 return default
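
The stale-claim filtering in the file above matches tokens with alphanumeric lookarounds instead of a plain substring test, so a token only counts when it is not glued to a neighbouring letter or digit. A minimal standalone sketch of that matching rule follows; `contains_token` is a hypothetical helper name used for illustration, not a function in this repo, and the sample strings are invented.

```python
import re


def contains_token(text: str, token: str) -> bool:
    """Return True when token occurs in text with no alphanumeric neighbour."""
    # Same lookaround shape as _record_contains_stale_token: reject matches
    # where a letter or digit sits directly before or after the token.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Standalone token, matched case-insensitively.
print(contains_token("model Q4K is listed", "q4k"))
# Token glued to a trailing digit, so it is not treated as the same claim.
print(contains_token("model Q4K2 is listed", "Q4K"))
# Punctuation such as "/" and "." counts as a boundary.
print(contains_token("path /x/Q4K.bin", "Q4K"))
```

Compared with `token in text`, this avoids suppressing a record just because a longer identifier happens to contain the stale token as a prefix or suffix.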
nipux_cli/worker_prompt_format.py 223 lines
1"""Prompt-facing summaries for worker history and tool observations."""
2
3from __future__ import annotations
4
5import json
6import re
7from pathlib import Path
8from typing import Any
9
10from nipux_cli.metric_format import format_metric_value
11from nipux_cli.source_quality import anti_bot_reason
12from nipux_cli.worker_policy import BROWSER_REF_IGNORE_NAMES
13
14
15def compact(value: Any, limit: int = 500) -> str:
16 text = json.dumps(value, ensure_ascii=False, sort_keys=True) if not isinstance(value, str) else value
17 text = " ".join(text.split())
18 return text if len(text) <= limit else text[:limit] + "..."
19
20
21def clip_text(value: Any, limit: int) -> str:
22 text = " ".join(str(value or "").split())
23 if len(text) <= limit:
24 return text
25 return text[: max(0, limit - 3)].rstrip() + "..."
26
27
28def format_step_for_prompt(step: dict[str, Any]) -> str:
29 tool = f" tool={step['tool_name']}" if step.get("tool_name") else ""
30 summary = step.get("summary") or step.get("error") or ""
31 pieces = [f"- #{step['step_no']} {step['kind']} {step['status']}{tool}: {summary}"]
32 input_data = step.get("input") or {}
33 args = input_data.get("arguments") if isinstance(input_data, dict) else None
34 if args:
35 pieces.append(f" args: {compact(args, 320)}")
36 output = step.get("output") or {}
37 observation = observation_for_prompt(step.get("tool_name"), output)
38 if observation:
39 pieces.append(f" observed: {observation}")
40 return "\n".join(pieces)
41
42
43def observation_for_prompt(tool_name: str | None, output: dict[str, Any]) -> str:
44 if not output:
45 return ""
46 if output.get("error"):
47 if tool_name in {"browser_click", "browser_type"}:
48 recovery = output.get("recovery_snapshot") if isinstance(output.get("recovery_snapshot"), dict) else {}
49 candidates = browser_candidates_for_prompt(recovery)
50 suffix = f"; recovery_candidates={candidates}" if candidates else ""
51 return clip_text(f"error={output.get('error')}; guidance={output.get('recovery_guidance', '')}{suffix}", 700)
52 evidence_grounding = output.get("evidence_grounding") if isinstance(output.get("evidence_grounding"), dict) else {}
53 if evidence_grounding:
54 missing_paths = evidence_grounding.get("missing_candidate_paths")
55 if isinstance(missing_paths, list) and missing_paths:
56 missing_paths = [path for path in (clean_prompt_candidate_path(item) for item in missing_paths) if path]
57 return clip_text(
58 "error=evidence grounding required; missing_exact_paths="
59 + ", ".join(str(path) for path in missing_paths[:8])
60 + "; rewrite the durable record with exact observed paths or state why they are irrelevant",
61 900,
62 )
63 unsupported = evidence_grounding.get("unsupported_tokens")
64 if isinstance(unsupported, list) and unsupported:
65 return clip_text(
66 "error=evidence grounding required; unsupported="
67 + ", ".join(str(token) for token in unsupported[:10])
68 + "; use only tokens present in recent observed evidence or record uncertainty",
69 700,
70 )
71 recent_artifacts = output.get("recent_artifacts") if isinstance(output.get("recent_artifacts"), list) else []
72 if tool_name == "read_artifact" and recent_artifacts:
73 refs = []
74 for artifact in recent_artifacts[:6]:
75 if not isinstance(artifact, dict):
76 continue
77 number = str(artifact.get("number") or "").strip()
78 artifact_id = str(artifact.get("id") or "").strip()
79 title = str(artifact.get("title") or "").strip()
80 label = artifact_id or number
81 if label:
82 refs.append(f"{label}={title}" if title else label)
83 suffix = f"; valid_recent_artifacts={'; '.join(refs)}" if refs else ""
84 return clip_text(f"error={output.get('error')}; guidance={output.get('guidance') or ''}{suffix}", 900)
85 return clip_text(f"error={output.get('error')}; guidance={output.get('guidance') or ''}", 700)
86 if tool_name == "web_search":
87 results = output.get("results") if isinstance(output.get("results"), list) else []
88 titles = []
89 for result in [item for item in results[:5] if isinstance(item, dict)]:
90 title = result.get("title") or "untitled"
91 url = result.get("url") or ""
92 titles.append(f"{title} <{url}>")
93 return clip_text(f"query={output.get('query')!r}; results={'; '.join(titles)}", 650)
94 if tool_name == "web_extract":
95 pages = output.get("pages") if isinstance(output.get("pages"), list) else []
96 parts = []
97 for page in [item for item in pages[:3] if isinstance(item, dict)]:
98 if page.get("error"):
99 parts.append(f"{page.get('url')}: ERROR {page.get('error')}")
100 else:
101 text = str(page.get("text") or "")
102 parts.append(f"{page.get('url')}: {clip_text(text, 160)}")
103 return clip_text("; ".join(parts), 650)
104 if tool_name == "shell_exec":
105 stdout = str(output.get("stdout") or "")
106 stderr = str(output.get("stderr") or "")
107 excerpt = stdout.strip() or stderr.strip()
108 return (
109 f"command={output.get('command')!r}; rc={output.get('returncode')}; "
110 f"duration={output.get('duration_seconds')}s; output={clip_text(excerpt, 360)}"
111 )[:650]
112 if tool_name == "write_artifact":
113 return f"saved artifact={output.get('artifact_id')} path={output.get('path')}"
114 if tool_name == "report_update":
115 update = output.get("update") if isinstance(output.get("update"), dict) else {}
116 return clip_text(f"agent_update={update.get('message') or ''}", 420)
117 if tool_name == "record_lesson":
118 lesson = output.get("lesson") if isinstance(output.get("lesson"), dict) else {}
119 return clip_text(f"lesson={lesson.get('category') or 'memory'}: {lesson.get('lesson') or ''}", 420)
120 if tool_name == "record_memory_graph":
121 return (
122 f"memory_graph added_nodes={output.get('added_nodes')} "
123 f"updated_nodes={output.get('updated_nodes')} added_edges={output.get('added_edges')}"
124 )[:520]
125 if tool_name == "search_memory_graph":
126 nodes = output.get("nodes") if isinstance(output.get("nodes"), list) else []
127 titles = [str(node.get("title") or node.get("key") or "memory") for node in nodes[:5] if isinstance(node, dict)]
128 return clip_text(f"memory_query={output.get('query')!r}; nodes={'; '.join(titles)}", 520)
129 if tool_name == "record_source":
130 source = output.get("source") if isinstance(output.get("source"), dict) else {}
131 return (
132 f"source={source.get('source')} score={source.get('usefulness_score')} "
133 f"findings={source.get('yield_count')} fails={source.get('fail_count')} outcome={source.get('last_outcome')}"
134 )[:420]
135 if tool_name == "record_findings":
136 return f"finding ledger updated added={output.get('added')} updated={output.get('updated')}"[:700]
137 if tool_name == "record_experiment":
138 experiment = output.get("experiment") if isinstance(output.get("experiment"), dict) else {}
139 metric = ""
140 if experiment.get("metric_value") is not None:
141 metric = format_metric_value(
142 experiment.get("metric_name") or "metric",
143 experiment.get("metric_value"),
144 experiment.get("metric_unit") or "",
145 )
146 delta = f" delta={experiment.get('delta_from_previous_best')}" if experiment.get("delta_from_previous_best") is not None else ""
147 best = " best_observed" if experiment.get("best_observed") else ""
148 return clip_text(f"experiment={experiment.get('title')} status={experiment.get('status')} {metric}{delta}{best}", 520)
149 if tool_name == "acknowledge_operator_context":
150 return f"operator_context {output.get('status')} count={output.get('count')}"[:700]
151 if tool_name == "browser_navigate":
152 data = output.get("data") if isinstance(output.get("data"), dict) else {}
153 title = data.get("title") or ""
154 url = data.get("url") or ""
155 snapshot = str(output.get("snapshot") or "")
156 warning = anti_bot_reason(title, url, snapshot)
157 suffix = f"; source_warning={warning}" if warning else ""
158 candidates = browser_candidates_for_prompt(output)
159 candidate_suffix = f"; candidates={candidates}" if candidates else ""
160 return clip_text(f"opened {title} <{url}>; snapshot_chars={len(snapshot)}{suffix}{candidate_suffix}", 700)
161 if tool_name == "browser_snapshot":
162 data = output.get("data") if isinstance(output.get("data"), dict) else {}
163 snapshot = str(output.get("snapshot") or data.get("snapshot") or output.get("data") or "")
164 warning = anti_bot_reason(snapshot)
165 suffix = f"; source_warning={warning}" if warning else ""
166 candidates = browser_candidates_for_prompt(output)
167 candidate_suffix = f"; candidates={candidates}" if candidates else ""
168 return clip_text(f"snapshot_chars={len(snapshot)}{suffix}{candidate_suffix}", 700)
169 return compact(output, 700)
170
171
172def clean_prompt_candidate_path(value: Any) -> str:
173 raw = str(value or "").strip().rstrip(".,:;)")
174 if not raw or "://" in raw or raw.startswith("//") or "..." in raw or "…" in raw or "*" in raw:
175 return ""
176 name = Path(raw).name
177 suffix = Path(name).suffix
178 if not name or name.startswith(".") or not suffix:
179 return ""
180 if not re.match(r"^\.[A-Za-z0-9][A-Za-z0-9_]{1,12}$", suffix) or not any(ch.isalpha() for ch in suffix):
181 return ""
182 return raw
183
184
185def browser_candidates_for_prompt(output: dict[str, Any], *, limit: int = 18) -> str:
186 refs = output.get("refs") if isinstance(output.get("refs"), dict) else None
187 if refs is None:
188 data = output.get("data") if isinstance(output.get("data"), dict) else {}
189 refs = data.get("refs") if isinstance(data.get("refs"), dict) else {}
190 candidates = []
191 seen = set()
192 for ref, item in refs.items():
193 if not isinstance(item, dict):
194 continue
195 role = str(item.get("role") or "")
196 if role not in {"link", "heading", "cell"}:
197 continue
198 name = " ".join(str(item.get("name") or "").split())
199 key = name.lower().strip()
200 if not name or key in BROWSER_REF_IGNORE_NAMES:
201 continue
202 if len(name) < 3 or len(name) > 90 or key in seen:
203 continue
204 if role == "cell" and (_looks_like_metric_cell(name) or _looks_like_service_description(name)):
205 continue
206 seen.add(key)
207 candidates.append(f"{name} (@{ref})")
208 if len(candidates) >= limit:
209 break
210 return "; ".join(candidates)
211
212
213def _looks_like_metric_cell(name: str) -> bool:
214 text = name.strip()
215 return bool(re.fullmatch(r"(?:n/?a|na|[-+]?\d+(?:\.\d+)?(?:/5)?|[$€£]?\d[\d,]*(?:\.\d+)?%?)", text, re.I))
216
217
218def _looks_like_service_description(name: str) -> bool:
219 text = name.lower()
220 if "," in text and len(text.split()) >= 6:
221 return True
222 service_terms = ("custom ecommerce", "ux/ui", "payment integration", "mobile responsiveness", "headless commerce")
223 return any(term in text for term in service_terms) and len(text.split()) >= 5
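
The two truncation helpers at the top of `worker_prompt_format.py` treat the limit differently: `compact` cuts at `limit` and then appends the ellipsis, while `clip_text` reserves three characters so the result stays within `limit`. The sketch below copies both helpers so it runs on its own; the sample input is invented for illustration.

```python
import json
from typing import Any


def compact(value: Any, limit: int = 500) -> str:
    # Non-string values are serialized with sorted keys for stable output.
    text = json.dumps(value, ensure_ascii=False, sort_keys=True) if not isinstance(value, str) else value
    text = " ".join(text.split())
    # The ellipsis is appended after the cut, so the result can be limit + 3 long.
    return text if len(text) <= limit else text[:limit] + "..."


def clip_text(value: Any, limit: int) -> str:
    text = " ".join(str(value or "").split())
    if len(text) <= limit:
        return text
    # Three characters are reserved for the ellipsis, keeping len(result) <= limit.
    return text[: max(0, limit - 3)].rstrip() + "..."


sample = "alpha   beta\ngamma delta"
print(repr(compact(sample, 10)))
print(repr(clip_text(sample, 10)))
print(compact({"b": 1, "a": 2}, 50))
```

When a caller needs a hard character budget, `clip_text` is the safer of the two; `compact` is a soft limit that may exceed it by the length of the ellipsis.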
nipux_cli/worker_tool_summary.py 99 lines
1"""Compact result summaries for worker tool executions."""
2
3from __future__ import annotations
4
5from typing import Any
6
7from nipux_cli.metric_format import format_metric_value
8
9
10def summarize_tool_result(name: str, args: dict[str, Any], result: dict[str, Any], *, ok: bool) -> str:
11 if not ok:
12 return f"{name} failed: {result.get('error') or 'unknown error'}"
13 if name == "web_search":
14 results = result.get("results") if isinstance(result.get("results"), list) else []
15 top = "; ".join(str(item.get("title") or "untitled") for item in results[:3] if isinstance(item, dict))
16 return f"web_search query={args.get('query')!r} returned {len(results)} results: {top}"
17 if name == "web_extract":
18 pages = result.get("pages") if isinstance(result.get("pages"), list) else []
19 ok_pages = [page for page in pages if isinstance(page, dict) and not page.get("error")]
20 return f"web_extract fetched {len(ok_pages)}/{len(pages)} pages"
21 if name == "shell_exec":
22 command = str(result.get("command") or args.get("command") or "")
23 return (
24 f"shell_exec rc={result.get('returncode')} "
25 f"duration={result.get('duration_seconds')}s cmd={command!r}"
26 )
27 if name == "write_artifact":
28 return f"write_artifact saved {result.get('artifact_id')} at {result.get('path')}"
29 if name == "write_file":
30 return f"write_file {result.get('mode') or 'overwrite'} {result.get('path')} bytes={result.get('bytes')}"
31 if name == "defer_job":
32 return f"defer_job until {result.get('defer_until')}"
33 if name == "report_update":
34 update = result.get("update") if isinstance(result.get("update"), dict) else {}
35 return f"report_update saved: {str(update.get('message') or '')[:160]}"
36 if name == "record_lesson":
37 lesson = result.get("lesson") if isinstance(result.get("lesson"), dict) else {}
38 category = lesson.get("category") or "memory"
39 text = str(lesson.get("lesson") or "")[:160]
40 return f"record_lesson saved {category}: {text}"
41 if name == "record_memory_graph":
42 return (
43 f"record_memory_graph updated: {result.get('added_nodes', 0)} new nodes, "
44 f"{result.get('updated_nodes', 0)} updated, {result.get('added_edges', 0)} links"
45 )
46 if name == "search_memory_graph":
47 nodes = result.get("nodes") if isinstance(result.get("nodes"), list) else []
48 return f"search_memory_graph returned {len(nodes)} nodes for {args.get('query')!r}"
49 if name == "record_source":
50 source = result.get("source") if isinstance(result.get("source"), dict) else {}
51 return f"record_source updated {source.get('source')} score={source.get('usefulness_score')} yield={source.get('yield_count')}"
52 if name == "record_findings":
53 return (
54 f"record_findings updated ledger: {result.get('added', 0)} new, "
55 f"{result.get('updated', 0)} updated, {result.get('sources_updated', 0)} sources"
56 )
57 if name == "record_tasks":
58 return f"record_tasks updated queue: {result.get('added', 0)} new, {result.get('updated', 0)} updated"
59 if name == "record_roadmap":
60 roadmap = result.get("roadmap") if isinstance(result.get("roadmap"), dict) else {}
61 return (
62 f"record_roadmap {roadmap.get('status')}: {roadmap.get('title')} "
63 f"milestones={len(roadmap.get('milestones') or [])}"
64 )
65 if name == "record_milestone_validation":
66 validation = result.get("validation") if isinstance(result.get("validation"), dict) else {}
67 return (
68 f"record_milestone_validation {validation.get('validation_status')}: "
69 f"{validation.get('title')} followups={len(result.get('follow_up_tasks') or [])}"
70 )
71 if name == "record_experiment":
72 experiment = result.get("experiment") if isinstance(result.get("experiment"), dict) else {}
73 metric = ""
74 if experiment.get("metric_value") is not None:
75 metric = " " + format_metric_value(
76 experiment.get("metric_name") or "metric",
77 experiment.get("metric_value"),
78 experiment.get("metric_unit") or "",
79 )
80 best = " best" if experiment.get("best_observed") else ""
81 return f"record_experiment {experiment.get('status')}: {experiment.get('title')}{metric}{best}"
82 if name == "acknowledge_operator_context":
83 return f"acknowledge_operator_context {result.get('status')} count={result.get('count', 0)}"
84 if name == "browser_navigate":
85 data = result.get("data") if isinstance(result.get("data"), dict) else {}
86 title = data.get("title") or ""
87 url = data.get("url") or ""
88 warning = f" | warning={result.get('source_warning')}" if result.get("source_warning") else ""
89 return f"browser_navigate opened {title} <{url}>{warning}"
90 if name == "browser_snapshot":
91 snapshot = str(result.get("snapshot") or result.get("data") or "")
92 warning = f" | warning={result.get('source_warning')}" if result.get("source_warning") else ""
93 return f"browser_snapshot returned {len(snapshot)} chars{warning}"
94 if name == "read_artifact":
95 return f"read_artifact read {result.get('artifact_id')}"
96 if name == "search_artifacts":
97 results = result.get("results") if isinstance(result.get("results"), list) else []
98 return f"search_artifacts returned {len(results)} results for {args.get('query')!r}"
99 return f"{name} completed"
nipux_cli/worker_usage.py 48 lines
1"""Usage accounting for worker model turns."""
2
3from __future__ import annotations
4
5import json
6from typing import Any
7
8from nipux_cli.llm import LLMResponse
9
10
11def turn_usage_metadata(
12 response: LLMResponse,
13 *,
14 messages: list[dict[str, Any]],
15 context_length: int,
16) -> dict[str, Any]:
17 prompt_text = json.dumps(messages, ensure_ascii=False, default=str)
18 completion_text = response.content + json.dumps(
19 [{"name": call.name, "arguments": call.arguments} for call in response.tool_calls],
20 ensure_ascii=False,
21 default=str,
22 )
23 usage = dict(response.usage) if isinstance(response.usage, dict) else {}
24 prompt_tokens = _as_int(usage.get("prompt_tokens")) or estimate_token_count(prompt_text)
25 completion_tokens = _as_int(usage.get("completion_tokens")) or estimate_token_count(completion_text)
26 usage.setdefault("prompt_tokens", prompt_tokens)
27 usage.setdefault("completion_tokens", completion_tokens)
28 usage.setdefault("total_tokens", prompt_tokens + completion_tokens)
29 usage.setdefault("estimated", not bool(response.usage))
30 usage["prompt_chars"] = len(prompt_text)
31 usage["completion_chars"] = len(completion_text)
32 if context_length > 0:
33 usage["context_length"] = context_length
34 usage["context_fraction"] = round(prompt_tokens / max(1, context_length), 6)
35 return usage
36
37
38def estimate_token_count(text: str) -> int:
39 if not text:
40 return 0
41 return max(1, (len(text) + 3) // 4)
42
43
44def _as_int(value: Any) -> int:
45 try:
46 return int(float(value))
47 except (TypeError, ValueError, OverflowError):  # int(float("inf")) raises OverflowError
48 return 0
plans/nipux-runtime-notes.md 50 lines
1# Nipux Runtime Notes
2
3Nipux is a narrow, restartable worker for long-running browser, web research,
4and command-line jobs. The active implementation is intentionally small and
5centered on `nipux_cli/`, `tests/nipux_cli/`, and the `nipux` console script.
6
7## Runtime Shape
8
9- Package: `nipux_cli/`
10- CLI entry point: `nipux`
11- State home: `~/.nipux` or `NIPUX_HOME`
12- Config file: `~/.nipux/config.yaml`
13- Database: SQLite with WAL
14- Artifacts: per-job files under the configured state home
15- Browser profiles: per-job `agent-browser` profiles
16- Model API: OpenAI-compatible chat completions endpoint
17
18## Design Constraints
19
20- Keep every worker step bounded and restartable.
21- Persist useful evidence as artifacts before summarizing it.
22- Keep summaries compact and point back to artifacts.
23- Maintain source, finding, task, experiment, and lesson ledgers.
24- Keep jobs runnable until the operator pauses or cancels them.
25- Keep the tool registry explicit and small.
26- Keep runtime behavior domain-neutral.
27
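The restartability and evidence constraints above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Nipux schema or worker code: the `steps` table, its columns, and the functions `open_db`, `run_step`, and `interrupted_steps` are hypothetical names, and only the standard-library `sqlite3` module is used.

```python
import json
import sqlite3
from typing import Any, Callable


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database and create a hypothetical `steps` table."""
    conn = sqlite3.connect(path)
    # WAL mode lets readers (for example a UI) keep reading while the worker writes.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "id INTEGER PRIMARY KEY, tool TEXT, args TEXT, status TEXT, result TEXT)"
    )
    conn.commit()
    return conn


def run_step(
    conn: sqlite3.Connection,
    tool: str,
    args: dict[str, Any],
    handler: Callable[[dict[str, Any]], Any],
) -> int:
    """Run one bounded step, persisting state before and after the tool call."""
    # Persist intent first, so a crash mid-tool leaves a visible 'running' row.
    cur = conn.execute(
        "INSERT INTO steps (tool, args, status) VALUES (?, ?, 'running')",
        (tool, json.dumps(args)),
    )
    conn.commit()
    step_id = cur.lastrowid
    try:
        result: Any = handler(args)
        status = "completed"
    except Exception as exc:  # a failed tool is still a recorded outcome
        result = {"error": str(exc)}
        status = "failed"
    # Persist the outcome after execution.
    conn.execute(
        "UPDATE steps SET status = ?, result = ? WHERE id = ?",
        (status, json.dumps(result, default=str), step_id),
    )
    conn.commit()
    return step_id


def interrupted_steps(conn: sqlite3.Connection) -> list[int]:
    """Return ids of steps that started but never recorded an outcome."""
    rows = conn.execute("SELECT id FROM steps WHERE status = 'running' ORDER BY id")
    return [row[0] for row in rows]
```

On restart, a worker built this way could call `interrupted_steps` to find steps that were cut off and decide whether to retry or mark them failed; how Nipux itself handles that case is not shown here.
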
28## Active Tools
29
30- Browser: `browser_navigate`, `browser_snapshot`, `browser_click`,
31 `browser_type`, `browser_scroll`, `browser_back`, `browser_press`,
32 `browser_console`
33- Web: `web_search`, `web_extract`
34- Local command work: `shell_exec`
35- Artifacts: `write_artifact`, `read_artifact`, `search_artifacts`
36- Job state and visibility: `update_job_state`, `report_update`,
37 `send_digest_email`
38- Learning ledgers: `record_lesson`, `record_source`, `record_findings`,
39 `record_tasks`, `record_experiment`
40
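One way to keep a tool surface explicit is a fixed name-to-handler table that rejects anything not listed. The sketch below is illustrative only: `ToolEntry`, `build_registry`, and `dispatch` are hypothetical names with assumed fields, not the definitions in `nipux_cli/tools.py`, and the handler in the usage note is a stand-in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical registry entry: a name, a description, and a handler."""

    name: str
    description: str
    handler: Callable[[dict[str, Any]], dict[str, Any]]


def build_registry(entries: list[ToolEntry]) -> dict[str, ToolEntry]:
    """Build a name-indexed registry, refusing duplicate tool names."""
    registry: dict[str, ToolEntry] = {}
    for entry in entries:
        if entry.name in registry:
            raise ValueError(f"duplicate tool name: {entry.name}")
        registry[entry.name] = entry
    return registry


def dispatch(registry: dict[str, ToolEntry], name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Execute a registered tool; unknown names fail instead of falling through."""
    entry = registry.get(name)
    if entry is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": entry.handler(args)}
    except Exception as exc:  # report handler failures as data, not crashes
        return {"ok": False, "error": str(exc)}
```

For example, `dispatch(build_registry([ToolEntry("echo", "Return the args.", lambda a: a)]), "echo", {"q": 1})` yields an `ok` result, while any name outside the table yields an `unknown tool` error.
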
41## Validation
42
43```bash
44PYTEST_ADDOPTS='' uv run --extra dev python -m pytest -q
45uv run --extra dev ruff check --isolated nipux_cli tests/nipux_cli
46uv run nipux doctor
47```
48
49Use `uv run nipux daemon --once --fake` for a deterministic no-model smoke
50test after CLI or daemon changes.
pyproject.toml 54 lines
1[build-system]
2requires = ["setuptools>=61.0"]
3build-backend = "setuptools.build_meta"
4
5[project]
6name = "nipux"
7version = "0.1.0"
8description = "A restartable CLI worker for long-running agent jobs"
9readme = "README.md"
10requires-python = ">=3.11"
11authors = [{ name = "Nipux" }]
12license = "MIT"
13keywords = ["agent", "cli", "automation", "daemon", "openai-compatible"]
14classifiers = [
15 "Development Status :: 3 - Alpha",
16 "Environment :: Console",
17 "Intended Audience :: Developers",
18 "Programming Language :: Python :: 3",
19 "Programming Language :: Python :: 3.11",
20 "Programming Language :: Python :: 3.12",
21 "Topic :: Software Development :: Libraries :: Application Frameworks",
22]
23dependencies = [
24 "openai>=2.21.0,<3",
25 "pyyaml>=6.0.2,<7",
26]
27
28[project.urls]
29Homepage = "https://nipux.com"
30Source = "https://github.com/nipuxx/agent-cli"
31Issues = "https://github.com/nipuxx/agent-cli/issues"
32
33[project.optional-dependencies]
34dev = ["pytest>=9.0.2,<10", "ruff"]
35
36[dependency-groups]
37dev = ["pytest>=9.0.2,<10", "ruff"]
38
39[project.scripts]
40nipux = "nipux_cli.cli:main"
41
42[tool.setuptools.packages.find]
43include = ["nipux_cli", "nipux_cli.*"]
44
45[tool.pytest.ini_options]
46testpaths = ["tests/nipux_cli"]
47addopts = "-q"
48
49[tool.ruff]
50line-length = 120
51target-version = "py311"
52
53[tool.uv]
54exclude-newer = "7 days"
scripts/generate_project_atlas.py 619 lines
1#!/usr/bin/env python3
2"""Generate docs/project-atlas.html from the tracked Nipux source tree."""
3
4from __future__ import annotations
5
6import ast
7import html
8import re
9import subprocess
10from dataclasses import dataclass
11from pathlib import Path
12from typing import Any
13
14
15ROOT = Path(__file__).resolve().parents[1]
16OUT = ROOT / "docs" / "project-atlas.html"
17SOURCE_SUFFIXES = {".py", ".md", ".toml", ".yaml", ".yml"}
18EXCLUDED = {
19 "docs/project-atlas.html",
20 "uv.lock",
21}
22SENSITIVE_ASSIGNMENT_RE = re.compile(
23 r"^(\s*)([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)(\s*)=(.*)$"
24)
25
26
27@dataclass
28class SourceFile:
29 path: str
30 text: str
31 lines: list[str]
32 tree: ast.AST | None
33 error: str = ""
34
35
36@dataclass
37class Symbol:
38 path: str
39 kind: str
40 name: str
41 line: int
42 end_line: int
43 doc: str
44 calls: list[str]
45
46
47@dataclass
48class Prompt:
49 path: str
50 name: str
51 line: int
52 text: str
53 context: str
54
55
56def main() -> None:
57 files = load_source_files()
58 symbols = extract_symbols(files)
59 prompts = extract_prompts(files)
60 tools = extract_tools(files)
61 tables = extract_tables(files)
62 commit = git(["rev-parse", "--short", "HEAD"]) or "working-tree"
63 html_text = render(files, symbols, prompts, tools, tables, commit=commit)
64 OUT.parent.mkdir(parents=True, exist_ok=True)
65 OUT.write_text(html_text, encoding="utf-8")
66 print(f"wrote {OUT.relative_to(ROOT)} ({len(html_text):,} chars)")
67
68
69def load_source_files() -> list[SourceFile]:
70 paths = tracked_paths()
71 files: list[SourceFile] = []
72 for path in paths:
73 if path in EXCLUDED:
74 continue
75 full = ROOT / path
76 if full.suffix not in SOURCE_SUFFIXES or not full.is_file():
77 continue
78 text = full.read_text(encoding="utf-8", errors="replace")
79 tree = None
80 error = ""
81 if full.suffix == ".py":
82 try:
83 tree = ast.parse(text, filename=path)
84 except SyntaxError as exc:
85 error = str(exc)
86 files.append(SourceFile(path=path, text=text, lines=text.splitlines(), tree=tree, error=error))
87 return files
88
89
90def tracked_paths() -> list[str]:
91 output = git(["ls-files"])
92 if not output:
93 return []
94 return sorted(line.strip() for line in output.splitlines() if line.strip())
95
96
97def git(args: list[str]) -> str:
98 try:
99 result = subprocess.run(["git", *args], cwd=ROOT, check=False, capture_output=True, text=True)
100 except OSError:
101 return ""
102 return result.stdout.strip() if result.returncode == 0 else ""
103
104
105def extract_symbols(files: list[SourceFile]) -> list[Symbol]:
106 symbols: list[Symbol] = []
107 for source in files:
108 if source.tree is None:
109 continue
110 for node in ast.walk(source.tree):
111 if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
112 calls = sorted(call_names(node))[:16]
113 symbols.append(
114 Symbol(
115 path=source.path,
116 kind="class" if isinstance(node, ast.ClassDef) else "function",
117 name=node.name,
118 line=getattr(node, "lineno", 0),
119 end_line=getattr(node, "end_lineno", getattr(node, "lineno", 0)),
120 doc=ast.get_docstring(node) or "",
121 calls=calls,
122 )
123 )
124 return sorted(symbols, key=lambda item: (item.path, item.line, item.name))
125
126
127def call_names(node: ast.AST) -> set[str]:
128 names: set[str] = set()
129 for child in ast.walk(node):
130 if not isinstance(child, ast.Call):
131 continue
132 name = dotted_name(child.func)
133 if name:
134 names.add(name)
135 return names
136
137
138def dotted_name(node: ast.AST) -> str:
139 if isinstance(node, ast.Name):
140 return node.id
141 if isinstance(node, ast.Attribute):
142 base = dotted_name(node.value)
143 return f"{base}.{node.attr}" if base else node.attr
144 return ""
145
146
147def extract_prompts(files: list[SourceFile]) -> list[Prompt]:
148 prompts: list[Prompt] = []
149 for source in files:
150 if source.tree is None:
151 continue
152 for node in ast.walk(source.tree):
153 if isinstance(node, (ast.Assign, ast.AnnAssign)):
154 names = assignment_names(node)
155 value = getattr(node, "value", None)
156 text = literal_string(value)
157 if not text:
158 continue
159 if any(is_prompt_name(name) for name in names) or is_prompt_text(text):
160 prompts.append(
161 Prompt(
162 path=source.path,
163 name=", ".join(names) or "string",
164 line=getattr(node, "lineno", 0),
165 text=text,
166 context="assignment",
167 )
168 )
169 elif isinstance(node, ast.Constant) and isinstance(node.value, str) and is_prompt_text(node.value):
170 prompts.append(
171 Prompt(
172 path=source.path,
173 name="inline string",
174 line=getattr(node, "lineno", 0),
175 text=node.value,
176 context="inline",
177 )
178 )
179 deduped: dict[tuple[str, int, str], Prompt] = {}
180 for prompt in prompts:
181 key = (prompt.path, prompt.line, prompt.text[:120])
182 existing = deduped.get(key)
183 if existing is None or (existing.context == "inline" and prompt.context != "inline"):
184 deduped[key] = prompt
185 return sorted(deduped.values(), key=lambda item: (item.path, item.line))
186
187
188def assignment_names(node: ast.Assign | ast.AnnAssign) -> list[str]:
189 targets = node.targets if isinstance(node, ast.Assign) else [node.target]
190 names: list[str] = []
191 for target in targets:
192 if isinstance(target, ast.Name):
193 names.append(target.id)
194 elif isinstance(target, ast.Attribute):
195 names.append(target.attr)
196 return names
197
198
199def literal_string(node: ast.AST | None) -> str:
200 if isinstance(node, ast.Constant) and isinstance(node.value, str):
201 return node.value
202 if isinstance(node, ast.JoinedStr):
203 return "".join(part.value for part in node.values if isinstance(part, ast.Constant) and isinstance(part.value, str))
204 return ""
205
206
207def is_prompt_name(name: str) -> bool:
208 upper = name.upper()
209 return any(term in upper for term in ("PROMPT", "SYSTEM", "INSTRUCTION", "GUIDANCE")) and not upper.endswith("_PATH")
210
211
212def is_prompt_text(text: str) -> bool:
213 clean = " ".join(text.split())
214 if len(clean) < 240:
215 return False
216 lowered = clean.lower()
217 return (
218 "you are" in lowered
219 or ("do not" in lowered and "use " in lowered)
220 or ("operator" in lowered and "context" in lowered and "prompt" in lowered)
221 or "next-action" in lowered
222 )
223
224
225def extract_tools(files: list[SourceFile]) -> list[dict[str, str]]:
226 tools_text = next((source.text for source in files if source.path == "nipux_cli/tools.py"), "")
227 tools: list[dict[str, str]] = []
228 pattern = re.compile(r"ToolSpec\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]", re.S)
229 for match in pattern.finditer(tools_text):
230 line = tools_text[: match.start()].count("\n") + 1
231 tools.append({"name": match.group(1), "description": " ".join(match.group(2).split()), "line": str(line)})
232 return tools
233
234
235def extract_tables(files: list[SourceFile]) -> list[dict[str, Any]]:
236 db_text = next((source.text for source in files if source.path == "nipux_cli/db.py"), "")
237 tables: list[dict[str, Any]] = []
238 for match in re.finditer(r"CREATE TABLE IF NOT EXISTS\s+([a-zA-Z_][a-zA-Z0-9_]*)\s*\((.*?)\)", db_text, re.S):
239 raw_columns = [line.strip().rstrip(",") for line in match.group(2).splitlines()]
240 columns = [line for line in raw_columns if line and not line.upper().startswith(("FOREIGN", "UNIQUE", "PRIMARY KEY"))]
241 tables.append({"name": match.group(1), "columns": columns, "line": db_text[: match.start()].count("\n") + 1})
242 return tables
243
244
245def render(
246 files: list[SourceFile],
247 symbols: list[Symbol],
248 prompts: list[Prompt],
249 tools: list[dict[str, str]],
250 tables: list[dict[str, Any]],
251 *,
252 commit: str,
253) -> str:
254 python_files = [source for source in files if source.path.endswith(".py")]
255 total_lines = sum(len(source.lines) for source in files)
256 file_cards = "\n".join(render_file_card(source, symbols) for source in files)
257 source_browser = "\n".join(render_source_file(source) for source in files)
258 symbol_cards = "\n".join(render_symbol(symbol) for symbol in symbols)
259 prompt_cards = "\n".join(render_prompt(prompt) for prompt in prompts[:80])
260 tool_rows = "\n".join(
261 f"<tr><td><code>{esc(tool['name'])}</code></td><td>{esc(tool['description'])}</td><td>{tool['line']}</td></tr>"
262 for tool in tools
263 )
264 table_cards = "\n".join(render_table(table) for table in tables)
265 risk_cards = render_review_points(files, symbols, prompts, tools)
266 return f"""<!doctype html>
267<html lang="en">
268<head>
269<meta charset="utf-8">
270<meta name="viewport" content="width=device-width, initial-scale=1">
271<title>Nipux Project Atlas</title>
272<style>
273:root {{
274 color-scheme: dark;
275 --bg: #080909; --panel: #101112; --panel-2: #151717; --text: #ecebe6;
276 --muted: #9b9b96; --faint: #5f615e; --line: #303332; --accent: #9ad6d1;
277 --accent-2: #d8d06d; --warn: #ee9b66; --bad: #e36d78; --green: #9fca7f;
278 --mono: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace;
279 --sans: Inter, ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
280}}
281* {{ box-sizing: border-box; }}
282body {{ margin: 0; background: radial-gradient(circle at 70% 0%, rgba(154,214,209,.10), transparent 36%), var(--bg); color: var(--text); font: 15px/1.55 var(--sans); }}
283a {{ color: var(--accent); text-decoration: none; }}
284a:hover {{ text-decoration: underline; }}
285.shell {{ display: grid; grid-template-columns: 292px minmax(0, 1fr); min-height: 100vh; }}
286.sidebar {{ position: sticky; top: 0; height: 100vh; overflow: auto; border-right: 1px solid var(--line); padding: 24px 22px; background: rgba(8,9,9,.94); }}
287.logo {{ font: 750 22px var(--mono); letter-spacing: .08em; color: var(--accent); }}
288.subtitle {{ color: var(--muted); margin: 6px 0 22px; }}
289.search {{ width: 100%; background: #050606; border: 1px solid var(--line); color: var(--text); border-radius: 8px; padding: 11px 12px; font: 14px var(--mono); outline: none; }}
290.search:focus {{ border-color: var(--accent); box-shadow: 0 0 0 3px rgba(154,214,209,.12); }}
291.nav {{ margin: 22px 0; display: grid; gap: 7px; }}
292.nav a {{ color: var(--muted); padding: 6px 0; font: 13px var(--mono); text-transform: uppercase; letter-spacing: .12em; }}
293.stats {{ margin-top: 24px; display: grid; gap: 10px; }}
294.stat {{ border: 1px solid var(--line); background: var(--panel); border-radius: 10px; padding: 12px; }}
295.stat b {{ display: block; font: 650 24px/1 var(--mono); }}
296.stat span {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .12em; }}
297main {{ min-width: 0; padding: 34px 38px 80px; }}
298.hero {{ border-bottom: 1px solid var(--line); padding-bottom: 28px; margin-bottom: 28px; }}
299.eyebrow {{ color: var(--accent); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .22em; }}
300h1 {{ font-size: clamp(38px, 6vw, 86px); line-height: .9; margin: 14px 0 18px; letter-spacing: -.04em; }}
301h2 {{ margin: 0; font-size: 28px; letter-spacing: -.02em; }}
302h3 {{ margin: 0 0 6px; font-size: 16px; }}
303.lede {{ max-width: 980px; color: #c7c6bf; font-size: 19px; }}
304.section {{ margin: 44px 0; scroll-margin-top: 20px; }}
305.section > header {{ display: flex; align-items: end; justify-content: space-between; gap: 20px; border-bottom: 1px solid var(--line); margin-bottom: 18px; padding-bottom: 10px; }}
306.kicker {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .16em; }}
307.grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 14px; }}
308.card, .file-card, .prompt, .tool, .db-card, .symbol {{ border: 1px solid var(--line); background: linear-gradient(180deg, rgba(255,255,255,.035), rgba(255,255,255,.012)); border-radius: 12px; padding: 16px; overflow: hidden; }}
309.file-card header, .prompt header, .tool header {{ display: flex; justify-content: space-between; gap: 12px; align-items: baseline; border-bottom: 1px solid rgba(255,255,255,.06); margin: -2px 0 10px; padding-bottom: 8px; }}
310.muted, .file-card header span, .prompt header span, .tool header span {{ color: var(--muted); }}
311.meta {{ display: flex; flex-wrap: wrap; gap: 8px; margin: 10px 0; }}
312.meta span, .pill {{ border: 1px solid var(--line); border-radius: 999px; padding: 2px 8px; color: var(--muted); font: 12px var(--mono); }}
313.arch {{ display: grid; grid-template-columns: repeat(3, minmax(210px, 1fr)); gap: 12px; }}
314.node {{ text-align: left; min-height: 126px; cursor: pointer; border: 1px solid var(--line); color: var(--text); background: var(--panel); border-radius: 14px; padding: 15px; font: inherit; transition: .15s ease; }}
315.node:hover {{ transform: translateY(-2px); border-color: var(--accent); background: var(--panel-2); }}
316.node strong {{ display: block; font-size: 17px; }}
317.node span {{ display: block; color: var(--accent); font: 12px var(--mono); margin: 4px 0 8px; }}
318.node em {{ display: block; color: var(--muted); font-style: normal; }}
319.flow {{ counter-reset: flow; display: grid; gap: 10px; padding: 0; list-style: none; }}
320.flow li {{ counter-increment: flow; border-left: 2px solid var(--accent); background: var(--panel); padding: 12px 14px 12px 48px; position: relative; border-radius: 8px; }}
321.flow li:before {{ content: counter(flow); position: absolute; left: 14px; top: 12px; color: var(--accent-2); font: 700 16px var(--mono); }}
322.flow li span {{ display: block; color: var(--muted); }}
323details {{ margin-top: 10px; }}
324summary {{ cursor: pointer; color: var(--accent); font: 13px var(--mono); }}
325table {{ width: 100%; border-collapse: collapse; margin-top: 10px; font-size: 13px; }}
326th, td {{ text-align: left; border-bottom: 1px solid rgba(255,255,255,.07); padding: 7px 8px; vertical-align: top; }}
327th {{ color: var(--muted); font: 12px var(--mono); text-transform: uppercase; letter-spacing: .08em; }}
328code, pre {{ font-family: var(--mono); }}
329pre {{ max-height: 540px; overflow: auto; background: #050606; border: 1px solid var(--line); border-radius: 10px; padding: 14px; color: #dad8cf; white-space: pre-wrap; }}
330.mini-grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 12px; }}
331.symbol p {{ margin: 8px 0; }}
332.calls {{ color: var(--muted); }}
333.warning {{ border-color: rgba(238,155,102,.45); background: rgba(238,155,102,.08); }}
334.hidden {{ display: none !important; }}
335.source-list {{ display: grid; gap: 10px; }}
336.source-file {{ border: 1px solid var(--line); border-radius: 10px; background: var(--panel); padding: 0 12px 10px; }}
337.source-file summary {{ padding: 12px 0; color: var(--text); }}
338.source-file summary span {{ color: var(--muted); margin-left: 8px; }}
339.source-code {{ max-height: 620px; font-size: 12px; line-height: 1.45; white-space: pre; }}
340.src-line {{ display: grid; grid-template-columns: 52px minmax(0, 1fr); min-height: 17px; }}
341.src-line b {{ color: var(--faint); user-select: none; font-weight: 500; }}
342.src-line code {{ color: #d6d3ca; }}
343@media (max-width: 980px) {{ .shell {{ grid-template-columns: 1fr; }} .sidebar {{ position: relative; height: auto; }} main {{ padding: 24px 18px 60px; }} .arch {{ grid-template-columns: 1fr; }} }}
344</style>
345</head>
346<body>
347<div class="shell">
348 <aside class="sidebar">
349 <div class="logo">NIPUX ATLAS</div>
350 <div class="subtitle">Generated from tracked source on {esc(commit)}.</div>
351 <input id="search" class="search" placeholder="filter files, prompts, tools..." autocomplete="off">
352 <nav class="nav">
353 <a href="#architecture">Architecture</a>
354 <a href="#runtime-flow">Runtime Flow</a>
355 <a href="#prompts">Prompt Surfaces</a>
356 <a href="#tools">Tool Registry</a>
357 <a href="#database">Database</a>
358 <a href="#files">Files</a>
359 <a href="#symbols">Symbols</a>
360 <a href="#source-browser">Source Browser</a>
361 <a href="#tests">Tests</a>
362 <a href="#risks">Review Points</a>
363 </nav>
364 <div class="stats">
365 <div class="stat"><b>{len(files)}</b><span>tracked files</span></div>
366 <div class="stat"><b>{total_lines:,}</b><span>tracked lines mapped</span></div>
367 <div class="stat"><b>{len(symbols)}</b><span>python symbols</span></div>
368 <div class="stat"><b>{len(tools)}</b><span>runtime tools</span></div>
369 </div>
370 </aside>
371 <main>
372 <section class="hero">
373 <div class="eyebrow">Backend map / prompt audit / source index</div>
374 <h1>Nipux Project Atlas</h1>
375 <p class="lede">A self-contained visual map of the current backend: entrypoints, daemon loop, worker prompt assembly, durable memory, tools, SQLite schema, UI control plane, tests, and every tracked source file with parsed functions/classes and line references.</p>
376 </section>
377
378 <section id="architecture" class="section">
379 <header><div><div class="kicker">Mind map</div><h2>Architecture</h2></div><span class="muted">Click a node to jump into related detail.</span></header>
380 <div class="arch">{architecture_nodes()}</div>
381 </section>
382
383 <section id="runtime-flow" class="section">
384 <header><div><div class="kicker">Lifecycle</div><h2>Runtime Flow</h2></div><span class="muted">What happens from terminal input to durable progress.</span></header>
385 <ol class="flow">{runtime_flow()}</ol>
386 </section>
387
388 <section id="prompts" class="section">
389 <header><div><div class="kicker">Exact text</div><h2>Prompt Surfaces</h2></div><span class="muted">System/program prompts and instruction-like strings extracted from source.</span></header>
390 <div class="grid">{prompt_cards}</div>
391 </section>
392
393 <section id="tools" class="section">
394 <header><div><div class="kicker">Tools</div><h2>Tool Registry</h2></div><span class="muted">Static ToolSpec definitions from nipux_cli/tools.py.</span></header>
395 <table><thead><tr><th>Name</th><th>Description</th><th>Line</th></tr></thead><tbody>{tool_rows}</tbody></table>
396 </section>
397
398 <section id="database" class="section">
399 <header><div><div class="kicker">Persistence</div><h2>SQLite Tables</h2></div><span class="muted">CREATE TABLE blocks found in nipux_cli/db.py.</span></header>
400 <div class="grid">{table_cards}</div>
401 </section>
402
403 <section id="files" class="section">
404 <header><div><div class="kicker">Source index</div><h2>Important Files</h2></div><span class="muted">{len(python_files)} Python modules plus docs/config files.</span></header>
405 <div class="grid">{file_cards}</div>
406 </section>
407
408 <section id="symbols" class="section">
409 <header><div><div class="kicker">Functions and classes</div><h2>Symbol Map</h2></div><span class="muted">Parsed with Python AST.</span></header>
410 <div class="mini-grid">{symbol_cards}</div>
411 </section>
412
413 <section id="source-browser" class="section">
414 <header><div><div class="kicker">Line-by-line</div><h2>Source Browser</h2></div><span class="muted">Collapsed raw tracked source so the backend can be inspected directly in this page.</span></header>
415 <div class="source-list">{source_browser}</div>
416 </section>
417
418 <section id="tests" class="section">
419 <header><div><div class="kicker">Verification</div><h2>Test Coverage Map</h2></div><span class="muted">Test files included in the source index.</span></header>
420 <div class="grid">{test_cards(files)}</div>
421 </section>
422
423 <section id="risks" class="section">
424 <header><div><div class="kicker">Audit cues</div><h2>Review Points</h2></div><span class="muted">Generated signals for where to inspect next.</span></header>
425 <div class="grid">{risk_cards}</div>
426 </section>
427 </main>
428</div>
429<script>
430const search = document.getElementById('search');
431search?.addEventListener('input', () => {{
432 const term = search.value.toLowerCase().trim();
433 document.querySelectorAll('.searchable').forEach((node) => {{
434 const hay = (node.getAttribute('data-search') || node.textContent || '').toLowerCase();
435 node.classList.toggle('hidden', term && !hay.includes(term));
436 }});
437}});
438document.querySelectorAll('.node[data-target]').forEach((node) => {{
439 node.addEventListener('click', () => {{
440 const target = document.getElementById(node.getAttribute('data-target'));
441 if (target) target.scrollIntoView({{ behavior: 'smooth', block: 'start' }});
442 }});
443}});
444</script>
445</body>
446</html>
447"""
448
449
450def architecture_nodes() -> str:
451 nodes = [
452 ("cli-tui", "CLI / TUI", "nipux_cli/cli.py", "Chat-first terminal UI, first-run menu, slash commands, job switching, event panes."),
453 ("sqlite-state", "SQLite state", "nipux_cli/db.py", "Jobs, runs, steps, artifacts, events, ledgers, usage, and memory index."),
454 ("daemon", "Daemon", "nipux_cli/daemon.py", "Single-instance forever loop, stale runtime fingerprint, heartbeat, work scheduling."),
455 ("worker-loop", "Worker loop", "nipux_cli/worker.py", "Builds prompts, chooses one tool step, guards loops, records durable progress."),
456 ("llm-adapter", "LLM adapter", "nipux_cli/llm.py", "OpenAI-compatible chat calls, usage/cost tracking, tool call parsing."),
457 ("tool-registry", "Tool registry", "nipux_cli/tools.py", "Browser, web, shell, artifact, ledger, task, experiment, digest tools."),
458 ("browser-web", "Browser/web", "nipux_cli/browser.py / web.py", "Visible browsing, snapshots, search/extract helpers, anti-bot source scoring."),
459 ("artifacts-files", "Artifacts/files", "nipux_cli/artifacts.py", "Saved outputs and concrete workspace file writing."),
460 ("memory", "Memory", "compression.py / operator_context.py", "Compact rolling memory and durable operator context."),
461 ]
462 return "".join(
463 f"<button id='{esc(anchor)}' class='node searchable' data-target='source-browser' data-search='{esc(title + ' ' + path + ' ' + desc)}'>"
464 f"<strong>{esc(title)}</strong><span>{esc(path)}</span><em>{esc(desc)}</em></button>"
465 for anchor, title, path, desc in nodes
466 )
467
468
469def runtime_flow() -> str:
470 steps = [
471 ("Startup", "pyproject entrypoint calls nipux_cli.cli:main. With no args, the chat/TUI opens on the focused job or first-run workspace."),
472 ("Operator input", "Plain chat is stored as visible events and, when relevant, durable operator context for future worker prompts."),
473 ("Daemon scheduling", "The daemon claims runnable jobs, keeps a lock/heartbeat, starts runs, and calls one bounded worker step repeatedly."),
474 ("Prompt assembly", "worker.build_messages layers system prompt, program template, operator context, roadmaps, tasks, ledgers, experiments, memory, timeline, and recent steps."),
475 ("Tool call", "The LLM selects one OpenAI-style tool. The registry executes it with ToolContext and stores input/output in steps/events."),
476 ("Progress accounting", "Guards require artifacts, findings, tasks, experiments, or milestone validation when evidence or measurements appear."),
477 ("Persistence", "Artifacts go to the job output directory. SQLite stores steps, events, ledgers, runtime state, and usage/cost metadata."),
478 ("UI refresh", "The TUI reads timeline/events and compact job metrics, splitting chat from worker activity and status."),
479 ]
480 return "".join(f"<li><strong>{esc(title)}</strong><span>{esc(body)}</span></li>" for title, body in steps)
481
482
483def render_file_card(source: SourceFile, symbols: list[Symbol]) -> str:
484 local_symbols = [symbol for symbol in symbols if symbol.path == source.path]
485 top_names = ", ".join(symbol.name for symbol in local_symbols[:10]) or "none"
486 doc = module_doc(source) or source.error or "No module docstring."
487 imports = ", ".join(module_imports(source)[:12]) or "none"
488 return f"""<article class="file-card searchable" data-search="{esc(source.path + ' ' + doc + ' ' + top_names)}">
489<header><h3>{esc(source.path)}</h3><span>{len(source.lines)} lines</span></header>
490<p>{esc(short(doc, 260))}</p>
491<div class="meta"><span>{len(local_symbols)} symbols</span><span>{source.text.count('TODO')} TODOs</span></div>
492<details><summary>Imports and top symbols</summary><p><strong>Imports:</strong> {esc(imports)}</p><p><strong>Symbols:</strong> {esc(top_names)}</p></details>
493</article>"""
494
495
496def module_doc(source: SourceFile) -> str:
497 if source.tree is None:
498 for line in source.lines:
499 stripped = line.strip()
500 if stripped and not stripped.startswith("#"):
501 return stripped
502 return ""
503 return ast.get_docstring(source.tree) or ""
504
505
506def module_imports(source: SourceFile) -> list[str]:
507 if source.tree is None:
508 return []
509 names: list[str] = []
510 for node in source.tree.body:
511 if isinstance(node, ast.Import):
512 names.extend(alias.name for alias in node.names)
513 elif isinstance(node, ast.ImportFrom):
514 module = "." * node.level + (node.module or "")
515 names.append(module)
516 return names
517
518
519def render_source_file(source: SourceFile) -> str:
520 rendered_lines = [redact_source_line(line) for line in source.lines]
521 search_text = "\n".join(rendered_lines)
522 code = "\n".join(
523 f"<span class='src-line'><b>{index:>4}</b><code>{esc(line)}</code></span>"
524 for index, line in enumerate(rendered_lines, start=1)
525 )
526 return f"""<details class="source-file searchable" data-search="{esc(source.path + ' ' + search_text[:4000])}">
527<summary>{esc(source.path)} <span>{len(source.lines)} lines</span></summary>
528<pre class="source-code">{code}</pre>
529</details>"""
530
531
532def redact_source_line(line: str) -> str:
533 match = SENSITIVE_ASSIGNMENT_RE.match(line)
534 if not match:
535 return line
536 indent, name, _space, _value = match.groups()
537 return f"{indent}{name} = <redacted>"
538
539
540def render_symbol(symbol: Symbol) -> str:
541 doc = short(symbol.doc or "No docstring.", 180)
542 calls = ", ".join(symbol.calls) or "none"
543 return f"""<article class="symbol searchable" data-search="{esc(symbol.path + ' ' + symbol.name + ' ' + doc + ' ' + calls)}">
544<h3>{esc(symbol.name)}</h3>
545<p><span class="pill">{esc(symbol.kind)}</span> <span class="pill">{esc(symbol.path)}:{symbol.line}</span></p>
546<p>{esc(doc)}</p>
547<p class="calls"><strong>Calls:</strong> {esc(short(calls, 280))}</p>
548</article>"""
549
550
551def render_prompt(prompt: Prompt) -> str:
552 title = prompt.name or "prompt"
553 return f"""<article class="prompt searchable" data-search="{esc(prompt.path + ' ' + title + ' ' + prompt.text)}">
554<header><h3>{esc(title)}</h3><span>{esc(prompt.path)}:{prompt.line}</span></header>
555<p class="muted">Context: <code>{esc(prompt.context)}</code> · {len(prompt.text):,} chars</p>
556<pre><code>{esc(prompt.text)}</code></pre>
557</article>"""
558
559
560def render_table(table: dict[str, Any]) -> str:
561 columns = "".join(f"<li><code>{esc(column)}</code></li>" for column in table["columns"][:40])
562 return f"""<article class="db-card searchable" data-search="{esc(table['name'] + ' ' + ' '.join(table['columns']))}">
563<h3>{esc(table['name'])}</h3>
564<p class="muted">nipux_cli/db.py:{table['line']}</p>
565<ul>{columns}</ul>
566</article>"""
567
568
569def test_cards(files: list[SourceFile]) -> str:
570 tests = [source for source in files if source.path.startswith("tests/")]
571 cards = []
572 for source in tests:
573 names = []
574 if source.tree:
575 names = [node.name for node in ast.walk(source.tree) if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_")]
576 cards.append(
577 f"""<article class="test-card searchable" data-search="{esc(source.path + ' ' + ' '.join(names))}">
578<h3>{esc(source.path)}</h3><p><strong>{len(names)}</strong> tests · {len(source.lines)} lines</p>
579<p class="muted">{esc(short(', '.join(names), 320))}</p></article>"""
580 )
581 return "\n".join(cards)
582
583
584def render_review_points(
585 files: list[SourceFile],
586 symbols: list[Symbol],
587 prompts: list[Prompt],
588 tools: list[dict[str, str]],
589) -> str:
590 largest = sorted(files, key=lambda source: len(source.lines), reverse=True)[:5]
591 large_text = ", ".join(f"{source.path} ({len(source.lines)} lines)" for source in largest)
592 prompt_text = f"{len(prompts)} prompt/instruction-like strings were extracted. Inspect this section after any agent-behavior change."
593 tool_text = f"{len(tools)} tools are exposed to the worker. Review descriptions whenever generic behavior changes."
594 symbol_text = f"{len(symbols)} symbols were parsed. Large modules are candidates for refactoring once behavior stabilizes."
595 cards = [
596 ("Large modules", large_text),
597 ("Prompt surfaces", prompt_text),
598 ("Tool surface", tool_text),
599 ("Symbol map", symbol_text),
600 ]
601 return "\n".join(
602 f"<article class='card warning searchable' data-search='{esc(title + ' ' + body)}'><h3>{esc(title)}</h3><p>{esc(body)}</p></article>"
603 for title, body in cards
604 )
605
606
607def short(text: str, limit: int) -> str:
608 clean = " ".join(str(text).split())
609 if len(clean) <= limit:
610 return clean
611 return clean[: max(0, limit - 3)] + "..."
612
613
614def esc(value: Any) -> str:
615 return html.escape(str(value), quote=True)
616
617
618if __name__ == "__main__":
619 main()
scripts/live_memory_graph_smoke.py 226 lines
1#!/usr/bin/env python3
2"""Run an opt-in real-model smoke test for memory-graph tool calling.
3
4This script is intentionally outside the normal Nipux runtime path. It creates
5an isolated temporary Nipux home, seeds generic durable job state, and verifies
6that a configured OpenAI-compatible model can consolidate that state with the
7`record_memory_graph` tool.
8"""
9
10from __future__ import annotations
11
12import argparse
13import json
14import os
15import shutil
16import sys
17import tempfile
18from pathlib import Path
19from typing import Any
20
21from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig, ToolAccessConfig
22from nipux_cli.db import AgentDB
23from nipux_cli.memory_graph import memory_graph_from_job
24from nipux_cli.worker import run_one_step
25
26
27DEFAULT_MODEL = "qwen/qwen3.6-27b"
28DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
29DEFAULT_API_KEY_ENV = <redacted>
30
31
32def main() -> int:
33 parser = argparse.ArgumentParser(description=__doc__)
34 parser.add_argument("--model", default=DEFAULT_MODEL, help=f"OpenAI-compatible model name. Default: {DEFAULT_MODEL}")
35 parser.add_argument("--base-url", default=DEFAULT_BASE_URL, help=f"Provider base URL. Default: {DEFAULT_BASE_URL}")
36 parser.add_argument("--api-key-env", default=DEFAULT_API_KEY_ENV, help=f"API key env var. Default: {DEFAULT_API_KEY_ENV}")
37 parser.add_argument("--context-length", type=int, default=262_144)
38 parser.add_argument("--steps", type=int, default=3, help="Maximum worker turns to try.")
39 parser.add_argument("--keep-home", action="store_true", help="Keep the temporary Nipux home for inspection.")
40 parser.add_argument("--json", action="store_true", help="Print a machine-readable result.")
41 args = parser.parse_args()
42
43 api_key = os.environ.get(args.api_key_env, "")
44 if not api_key:
45 return _finish(
46 {
47 "success": False,
48 "error": f"{args.api_key_env} is not set",
49 "action": f"Export {args.api_key_env} before running this live smoke test. The key is never printed.",
50 },
51 json_output=args.json,
52 )
53
54 home = Path(tempfile.mkdtemp(prefix="nipux-memory-graph-live-"))
55 try:
56 config = AppConfig(
57 runtime=RuntimeConfig(home=home, max_steps_per_run=1),
58 model=ModelConfig(
59 model=args.model,
60 base_url=args.base_url.rstrip("/"),
61 api_key_env=args.api_key_env,
62 context_length=args.context_length,
63 request_timeout_seconds=180,
64 ),
65 tools=ToolAccessConfig(browser=False, web=False, shell=False, files=False),
66 )
67 config.ensure_dirs()
68 db = AgentDB(config.runtime.state_db_path)
69 try:
70 job_id = db.create_job(
71 "Consolidate generic durable job knowledge into an inspectable memory graph.",
72 title="memory graph live smoke",
73 metadata=_seed_metadata(),
74 )
75 db.update_job_status(job_id, "running")
76 executions = []
77 for _ in range(max(1, args.steps)):
78 execution = run_one_step(job_id, config=config, db=db)
79 executions.append(_execution_summary(execution))
80 job = db.get_job(job_id)
81 graph = memory_graph_from_job(job)
82 if graph["nodes"]:
83 return _finish(
84 {
85 "success": True,
86 "home": str(home),
87 "model": args.model,
88 "base_url": args.base_url,
89 "job_id": job_id,
90 "node_count": len(graph["nodes"]),
91 "edge_count": len(graph["edges"]),
92 "executions": executions,
93 },
94 json_output=args.json,
95 )
96 job = db.get_job(job_id)
97 graph = memory_graph_from_job(job)
98 return _finish(
99 {
100 "success": False,
101 "home": str(home),
102 "model": args.model,
103 "base_url": args.base_url,
104 "job_id": job_id,
105 "node_count": len(graph["nodes"]),
106 "edge_count": len(graph["edges"]),
107 "executions": executions,
108 "error": "model did not create memory graph nodes within the step budget",
109 },
110 json_output=args.json,
111 )
112 finally:
113 db.close()
114 finally:
115 if args.keep_home:
116 print(f"kept temporary Nipux home: {home}", file=sys.stderr)
117 else:
118 shutil.rmtree(home, ignore_errors=True)
119
120
121def _seed_metadata() -> dict[str, Any]:
122 return {
123 "finding_ledger": [
124 {
125 "name": "Durable outputs need reusable summaries",
126 "category": "process",
127 "reason": "Saved outputs are easier to reuse when connected to decisions and tasks.",
128 "score": 0.82,
129 },
130 {
131 "name": "Repeated branch work needs explicit rejection criteria",
132 "category": "process",
133 "reason": "A branch should either improve evidence, produce a deliverable, or be deprecated.",
134 "score": 0.78,
135 },
136 ],
137 "source_ledger": [
138 {
139 "source": "internal://recent-events",
140 "source_type": "job_history",
141 "usefulness_score": 0.8,
142 "yield_count": 2,
143 "last_outcome": "Recent events exposed reusable process knowledge.",
144 },
145 {
146 "source": "internal://saved-outputs",
147 "source_type": "artifact_index",
148 "usefulness_score": 0.7,
149 "yield_count": 1,
150 "last_outcome": "Saved outputs provide evidence refs for future graph nodes.",
151 },
152 ],
153 "lessons": [
154 {
155 "category": "strategy",
156 "lesson": "Prefer measured or validated progress over activity counts.",
157 "confidence": 0.86,
158 },
159 {
160 "category": "memory",
161 "lesson": "Consolidate stable findings into linked graph nodes before context grows.",
162 "confidence": 0.9,
163 },
164 ],
165 "task_queue": [
166 {
167 "title": "Create a connected memory graph from durable signals",
168 "status": "open",
169 "output_contract": "decision",
170 "acceptance_criteria": "At least one reusable node connected to evidence or strategy.",
171 }
172 ],
173 "roadmap": {
174 "title": "Long-running job memory",
175 "status": "active",
176 "milestones": [
177 {
178 "title": "Consolidate reusable knowledge",
179 "status": "open",
180 "validation_contract": "Future turns can retrieve the key decisions without replaying raw history.",
181 }
182 ],
183 },
184 }
185
186
187def _execution_summary(execution: Any) -> dict[str, Any]:
188 result = execution.result if isinstance(execution.result, dict) else {}
189 return {
190 "status": execution.status,
191 "tool": execution.tool_name,
192 "step_id": execution.step_id,
193 "success": result.get("success"),
194 "error": result.get("error"),
195 "added_nodes": result.get("added_nodes"),
196 "added_edges": result.get("added_edges"),
197 }
198
199
200def _finish(payload: dict[str, Any], *, json_output: bool) -> int:
201 if json_output:
202 print(json.dumps(payload, indent=2, sort_keys=True))
203 else:
204 print(_human_summary(payload))
205 return 0 if payload.get("success") else 1
206
207
208def _human_summary(payload: dict[str, Any]) -> str:
209 lines = [f"success: {bool(payload.get('success'))}"]
210 for key in ("model", "base_url", "home", "job_id", "node_count", "edge_count", "error", "action"):
211 if payload.get(key) is not None:
212 lines.append(f"{key}: {payload[key]}")
213 executions = payload.get("executions")
214 if isinstance(executions, list) and executions:
215 lines.append("executions:")
216 for item in executions:
217 lines.append(
218 " - "
219 f"status={item.get('status')} tool={item.get('tool')} "
220 f"success={item.get('success')} error={item.get('error')}"
221 )
222 return "\n".join(lines)
223
224
225if __name__ == "__main__":
226 raise SystemExit(main())
scripts/render_nipux_ascii_video.py 523 lines
1#!/usr/bin/env python3
2"""Render a Nipux ASCII-art CLI intro as an MP4.
3
4The renderer is dependency-light on purpose: it draws a small embedded
5bitmap font into raw RGB frames and pipes those frames directly to ffmpeg.
6"""
7
8from __future__ import annotations
9
10import argparse
11import math
12import random
13import shutil
14import subprocess
15from dataclasses import dataclass
16from pathlib import Path
17
18
19WIDTH = 1440
20HEIGHT = 900
21FPS = 30
22DURATION = 8.0
23COLS = 96
24ROWS = 34
25CELL_W = 13
26CELL_H = 22
27ORIGIN_X = (WIDTH - COLS * CELL_W) // 2
28ORIGIN_Y = (HEIGHT - ROWS * CELL_H) // 2
29SCALE = 2
30
31BG = (2, 5, 5)
32BG_SCAN = (1, 3, 4)
33PANEL = (4, 10, 8)
34PANEL_EDGE = (10, 56, 37)
35PANEL_GLOW = (4, 30, 24)
36DIM = (33, 92, 62)
37MID = (72, 180, 112)
38MAIN = (126, 255, 164)
39HOT = (104, 238, 255)
40AMBER = (255, 184, 70)
41MAGENTA = (255, 95, 154)
42WHITE = (220, 255, 238)
43
44LOGO = [
45 r"## ## #### ###### ## ## ## ##",
46 r"### ## ## ## ## ## ## ## ## ",
47 r"#### ## ## ## ## ## ## ### ",
48 r"## #### ## ###### ## ## ### ",
49 r"## ### ## ## ## ## ## ## ",
50 r"## ## #### ## ##### ## ##",
51]
52
53GLITCH_CHARS = ".:-=+*#%@/\\|<>[]{}01"
54RAIN_CHARS = ".:-+*#01/\\<>"
55
56
57GLYPHS: dict[str, tuple[str, ...]] = {
58 " ": ("00000", "00000", "00000", "00000", "00000", "00000", "00000"),
59 "A": ("01110", "10001", "10001", "11111", "10001", "10001", "10001"),
60 "B": ("11110", "10001", "10001", "11110", "10001", "10001", "11110"),
61 "C": ("01111", "10000", "10000", "10000", "10000", "10000", "01111"),
62 "D": ("11110", "10001", "10001", "10001", "10001", "10001", "11110"),
63 "E": ("11111", "10000", "10000", "11110", "10000", "10000", "11111"),
64 "F": ("11111", "10000", "10000", "11110", "10000", "10000", "10000"),
65 "G": ("01111", "10000", "10000", "10011", "10001", "10001", "01110"),
66 "H": ("10001", "10001", "10001", "11111", "10001", "10001", "10001"),
67 "I": ("01110", "00100", "00100", "00100", "00100", "00100", "01110"),
68 "J": ("00111", "00010", "00010", "00010", "10010", "10010", "01100"),
69 "K": ("10001", "10010", "10100", "11000", "10100", "10010", "10001"),
70 "L": ("10000", "10000", "10000", "10000", "10000", "10000", "11111"),
71 "M": ("10001", "11011", "10101", "10101", "10001", "10001", "10001"),
72 "N": ("10001", "11001", "10101", "10011", "10001", "10001", "10001"),
73 "O": ("01110", "10001", "10001", "10001", "10001", "10001", "01110"),
74 "P": ("11110", "10001", "10001", "11110", "10000", "10000", "10000"),
75 "Q": ("01110", "10001", "10001", "10001", "10101", "10010", "01101"),
76 "R": ("11110", "10001", "10001", "11110", "10100", "10010", "10001"),
77 "S": ("01111", "10000", "10000", "01110", "00001", "00001", "11110"),
78 "T": ("11111", "00100", "00100", "00100", "00100", "00100", "00100"),
79 "U": ("10001", "10001", "10001", "10001", "10001", "10001", "01110"),
80 "V": ("10001", "10001", "10001", "10001", "10001", "01010", "00100"),
81 "W": ("10001", "10001", "10001", "10101", "10101", "10101", "01010"),
82 "X": ("10001", "10001", "01010", "00100", "01010", "10001", "10001"),
83 "Y": ("10001", "10001", "01010", "00100", "00100", "00100", "00100"),
84 "Z": ("11111", "00001", "00010", "00100", "01000", "10000", "11111"),
85 "a": ("00000", "00000", "01110", "00001", "01111", "10001", "01111"),
86 "b": ("10000", "10000", "10110", "11001", "10001", "10001", "11110"),
87 "c": ("00000", "00000", "01110", "10000", "10000", "10001", "01110"),
88 "d": ("00001", "00001", "01101", "10011", "10001", "10001", "01111"),
89 "e": ("00000", "00000", "01110", "10001", "11111", "10000", "01110"),
90 "f": ("00110", "01001", "01000", "11100", "01000", "01000", "01000"),
91 "g": ("00000", "01111", "10001", "10001", "01111", "00001", "01110"),
92 "h": ("10000", "10000", "10110", "11001", "10001", "10001", "10001"),
93 "i": ("00100", "00000", "01100", "00100", "00100", "00100", "01110"),
94 "j": ("00010", "00000", "00110", "00010", "00010", "10010", "01100"),
95 "k": ("10000", "10000", "10010", "10100", "11000", "10100", "10010"),
96 "l": ("01100", "00100", "00100", "00100", "00100", "00100", "01110"),
97 "m": ("00000", "00000", "11010", "10101", "10101", "10101", "10101"),
98 "n": ("00000", "00000", "10110", "11001", "10001", "10001", "10001"),
99 "o": ("00000", "00000", "01110", "10001", "10001", "10001", "01110"),
100 "p": ("00000", "00000", "11110", "10001", "11110", "10000", "10000"),
101 "q": ("00000", "00000", "01101", "10011", "01111", "00001", "00001"),
102 "r": ("00000", "00000", "10110", "11001", "10000", "10000", "10000"),
103 "s": ("00000", "00000", "01111", "10000", "01110", "00001", "11110"),
104 "t": ("01000", "01000", "11100", "01000", "01000", "01001", "00110"),
105 "u": ("00000", "00000", "10001", "10001", "10001", "10011", "01101"),
106 "v": ("00000", "00000", "10001", "10001", "10001", "01010", "00100"),
107 "w": ("00000", "00000", "10001", "10001", "10101", "10101", "01010"),
108 "x": ("00000", "00000", "10001", "01010", "00100", "01010", "10001"),
109 "y": ("00000", "00000", "10001", "10001", "01111", "00001", "01110"),
110 "z": ("00000", "00000", "11111", "00010", "00100", "01000", "11111"),
111 "0": ("01110", "10001", "10011", "10101", "11001", "10001", "01110"),
112 "1": ("00100", "01100", "00100", "00100", "00100", "00100", "01110"),
113 "2": ("01110", "10001", "00001", "00010", "00100", "01000", "11111"),
114 "3": ("11110", "00001", "00001", "01110", "00001", "00001", "11110"),
115 "4": ("00010", "00110", "01010", "10010", "11111", "00010", "00010"),
116 "5": ("11111", "10000", "10000", "11110", "00001", "00001", "11110"),
117 "6": ("01110", "10000", "10000", "11110", "10001", "10001", "01110"),
118 "7": ("11111", "00001", "00010", "00100", "01000", "01000", "01000"),
119 "8": ("01110", "10001", "10001", "01110", "10001", "10001", "01110"),
120 "9": ("01110", "10001", "10001", "01111", "00001", "00001", "01110"),
121 ".": ("00000", "00000", "00000", "00000", "00000", "01100", "01100"),
122 ",": ("00000", "00000", "00000", "00000", "00000", "01100", "01000"),
123 ":": ("00000", "01100", "01100", "00000", "01100", "01100", "00000"),
124 ";": ("00000", "01100", "01100", "00000", "01100", "01000", "10000"),
125 "!": ("00100", "00100", "00100", "00100", "00100", "00000", "00100"),
126 "?": ("01110", "10001", "00001", "00010", "00100", "00000", "00100"),
127 "'": ("00100", "00100", "01000", "00000", "00000", "00000", "00000"),
128 '"': ("01010", "01010", "01010", "00000", "00000", "00000", "00000"),
129 "-": ("00000", "00000", "00000", "11111", "00000", "00000", "00000"),
130 "_": ("00000", "00000", "00000", "00000", "00000", "00000", "11111"),
131 "+": ("00000", "00100", "00100", "11111", "00100", "00100", "00000"),
132 "=": ("00000", "00000", "11111", "00000", "11111", "00000", "00000"),
133 "*": ("00000", "10101", "01110", "11111", "01110", "10101", "00000"),
134 "#": ("01010", "11111", "01010", "01010", "11111", "01010", "01010"),
135 "@": ("01110", "10001", "10111", "10101", "10111", "10000", "01110"),
136 "%": ("11001", "11010", "00100", "01000", "10110", "00110", "00000"),
137 "$": ("00100", "01111", "10100", "01110", "00101", "11110", "00100"),
138 "&": ("01100", "10010", "10100", "01000", "10101", "10010", "01101"),
139 "/": ("00001", "00010", "00100", "01000", "10000", "00000", "00000"),
140 "\\": ("10000", "01000", "00100", "00010", "00001", "00000", "00000"),
141 "|": ("00100", "00100", "00100", "00100", "00100", "00100", "00100"),
142 "<": ("00010", "00100", "01000", "10000", "01000", "00100", "00010"),
143 ">": ("01000", "00100", "00010", "00001", "00010", "00100", "01000"),
144 "(": ("00010", "00100", "01000", "01000", "01000", "00100", "00010"),
145 ")": ("01000", "00100", "00010", "00010", "00010", "00100", "01000"),
146 "[": ("01110", "01000", "01000", "01000", "01000", "01000", "01110"),
147 "]": ("01110", "00010", "00010", "00010", "00010", "00010", "01110"),
148 "{": ("00010", "00100", "00100", "01000", "00100", "00100", "00010"),
149 "}": ("01000", "00100", "00100", "00010", "00100", "00100", "01000"),
150 "~": ("00000", "00000", "01001", "10110", "00000", "00000", "00000"),
151 "^": ("00100", "01010", "10001", "00000", "00000", "00000", "00000"),
152}
153
154
155@dataclass(frozen=True)
156class Cell:
157 char: str = " "
158 color: tuple[int, int, int] = DIM
159
160
161class TextGrid:
162 def __init__(self) -> None:
163 self.cells = [[Cell() for _ in range(COLS)] for _ in range(ROWS)]
164
165 def set(self, x: int, y: int, char: str, color: tuple[int, int, int]) -> None:
166 if 0 <= x < COLS and 0 <= y < ROWS and char:
167 self.cells[y][x] = Cell(char[0], color)
168
169 def put(self, x: int, y: int, text: str, color: tuple[int, int, int]) -> None:
170 for offset, char in enumerate(text):
171 self.set(x + offset, y, char, color)
172
173 def center(self, y: int, text: str, color: tuple[int, int, int]) -> None:
174 self.put((COLS - len(text)) // 2, y, text, color)
175
176 def box(self, title: str, color: tuple[int, int, int]) -> None:
177 top = "+" + "-" * (COLS - 2) + "+"
178 bottom = "+" + "-" * (COLS - 2) + "+"
179 self.put(0, 0, top, color)
180 self.put(0, ROWS - 1, bottom, color)
181 for y in range(1, ROWS - 1):
182 self.set(0, y, "|", color)
183 self.set(COLS - 1, y, "|", color)
184 label = f" {title} "
185 self.put(3, 0, label[: COLS - 8], color)
186
187
188def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
189 return max(low, min(high, value))
190
191
192def ease(value: float) -> float:
193 value = clamp(value)
194 return value * value * (3.0 - 2.0 * value)
195
196
197def mix(a: tuple[int, int, int], b: tuple[int, int, int], amount: float) -> tuple[int, int, int]:
198 amount = clamp(amount)
199 return tuple(int(a[i] + (b[i] - a[i]) * amount) for i in range(3))
200
201
202def logo_origin() -> tuple[int, int]:
203 max_width = max(len(line) for line in LOGO)
204 return (COLS - max_width) // 2, 8
205
206
207def put_logo(grid: TextGrid, frame: int, reveal: float, stable: bool = False) -> None:
208 left, top = logo_origin()
209 for y, line in enumerate(LOGO):
210 for x, char in enumerate(line):
211 if char == " ":
212 continue
213 rng = random.Random(frame * 1009 + x * 97 + y * 53)
214 if stable or rng.random() < reveal:
215 shown = char
216 color = HOT if stable or rng.random() > 0.18 else AMBER
217 if stable and rng.random() < 0.015:
218 shown = rng.choice("*+#")
219 color = WHITE
220 elif not stable and rng.random() > reveal + 0.18:
221 shown = rng.choice(GLITCH_CHARS)
222 color = MAGENTA
223 grid.set(left + x, top + y, shown, color)
224 elif rng.random() < 0.08 + 0.22 * reveal:
225 grid.set(left + x, top + y, rng.choice(GLITCH_CHARS), mix(DIM, HOT, 0.35))
226
227
228def put_collapsing_logo(grid: TextGrid, frame: int, progress: float) -> None:
229 left, top = logo_origin()
230 cursor_y = 24
231 glyph_index = 0
232 for y, line in enumerate(LOGO):
233 for x, char in enumerate(line):
234 if char == " ":
235 continue
236 rng = random.Random(frame * 1493 + x * 31 + y * 41)
237 target_x = 8 + (glyph_index % 30)
238 target_y = cursor_y + (glyph_index // 30) % 2
239 wobble = math.sin(frame * 0.35 + glyph_index * 0.9) * (1.0 - progress) * 2.0
240 px = round((left + x) * (1.0 - progress) + target_x * progress + wobble)
241 py = round((top + y) * (1.0 - progress) + target_y * progress)
242 shown = char if progress < 0.68 else rng.choice("nipux$>_-/")
243 color = mix(HOT, MAIN, progress)
244 if rng.random() < 0.08:
245 shown = rng.choice(GLITCH_CHARS)
246 color = MAGENTA
247 grid.set(px, py, shown, color)
248 glyph_index += 1
249
250
251def put_progress_bar(grid: TextGrid, x: int, y: int, width: int, progress: float) -> None:
252 progress = clamp(progress)
253 filled = int(round(width * progress))
254 bar = "[" + "#" * filled + "-" * (width - filled) + "]"
255 grid.put(x, y, bar, MID)
256 grid.put(x + 1, y, "#" * filled, HOT if progress > 0.88 else MAIN)
257 grid.put(x + width + 4, y, f"{int(progress * 100):03d}%", AMBER if progress < 1.0 else HOT)
258
259
260def put_rain(grid: TextGrid, frame: int, intensity: float) -> None:
261 if intensity <= 0:
262 return
263 for x in range(2, COLS - 2):
264 rng = random.Random(9001 + x * 113)
265 stream_speed = 1 + rng.randint(0, 2)
266 head = (frame * stream_speed + rng.randint(0, ROWS * 3)) % (ROWS + 16) - 8
267 for trail in range(5):
268 y = head - trail
269 if 2 <= y < ROWS - 2 and random.Random(frame * 313 + x * 17 + trail).random() < intensity:
270 char = random.Random(frame * 997 + x * 19 + trail * 11).choice(RAIN_CHARS)
271 fade = max(0.18, 1.0 - trail * 0.18)
272 color = mix((8, 28, 20), MID, fade * intensity)
273 grid.set(x, y, char, color)
274
275
276def put_boot_lines(grid: TextGrid, frame: int, t: float) -> None:
277 lines = [
278 "$ nipux video --ascii --into-cli",
279 "[scan] terminal cells online",
280 "[map ] routing logo glyphs",
281 "[sync] prompt target locked",
282 ]
283 start_y = 22
284 for i, line in enumerate(lines):
285 reveal = int(clamp((t - 0.12 - i * 0.22) / 0.32) * len(line))
286 color = MAIN if i == 0 else MID
287 grid.put(7, start_y + i, line[:reveal], color)
288 put_progress_bar(grid, 7, 28, 38, clamp(t / 1.12))
289
290
291def put_cli(grid: TextGrid, frame: int, t: float) -> None:
292 command = "$ nipux enter --render ascii"
293 start = 4.78
294 typed = int(clamp((t - start) / 1.08) * len(command))
295 grid.put(7, 24, command[:typed], MAIN)
296 cursor_x = 7 + typed
297 if frame // 8 % 2 == 0 and typed < len(command):
298 grid.set(cursor_x, 24, "_", HOT)
299
300 if t > 5.9:
301 grid.put(7, 26, "[ok] word packed into cli prompt", MID)
302 if t > 6.26:
303 grid.put(7, 27, "[ok] ascii signal clean", MID)
304 if t > 6.58:
305 put_progress_bar(grid, 7, 29, 42, clamp((t - 6.55) / 0.62))
306 if t > 7.18:
307 final = "nipux> "
308 grid.put(7, 31, final, HOT)
309 if frame // 10 % 2 == 0:
310 grid.set(7 + len(final), 31, "_", HOT)
311
312
313def build_grid(frame: int, total_frames: int) -> TextGrid:
314 t = frame / FPS
315 grid = TextGrid()
316 grid.box("nipux ascii cli capture", DIM)
317 grid.put(COLS - 25, 0, " render:rawrgb->mp4 ", DIM)
318 grid.put(4, 2, "MODE ASCII/CLI", MID)
319 grid.put(COLS - 23, 2, f"FRAME {frame:03d}/{total_frames - 1:03d}", DIM)
320
321 rain_intensity = 0.36
322 if t > 5.0:
323 rain_intensity *= 0.35
324 put_rain(grid, frame, rain_intensity)
325
326 if t < 1.18:
327 put_boot_lines(grid, frame, t)
328 elif t < 2.75:
329 put_boot_lines(grid, frame, 1.18)
330 reveal = ease((t - 1.18) / 1.42)
331 put_logo(grid, frame, reveal)
332 grid.center(16, "glyphs are snapping into nipux", DIM)
333 elif t < 3.72:
334 put_logo(grid, frame, 1.0, stable=True)
335 grid.center(16, "nipux", WHITE if frame // 7 % 2 == 0 else HOT)
336 grid.center(18, "pressing the word into a command line", DIM)
337 elif t < 5.08:
338 progress = ease((t - 3.72) / 1.36)
339 put_collapsing_logo(grid, frame, progress)
340 grid.put(7, 24, "$ ", MAIN)
341 if progress > 0.45:
342 partial = "nipux"[: int((progress - 0.45) / 0.55 * 5)]
343 grid.put(9, 24, partial, HOT)
344 grid.center(18, "collapsing ascii mass -> cli input", AMBER)
345 else:
346 put_cli(grid, frame, t)
347
348 if t > 4.8:
349 grid.put(COLS - 27, 30, "STATUS: PROMPT CONTROL", DIM)
350 elif t > 2.0:
351 grid.put(COLS - 24, 30, "STATUS: GLYPH LOCK", DIM)
352 else:
353 grid.put(COLS - 23, 30, "STATUS: BOOT RAIL", DIM)
354
355 return grid
356
357
358def draw_rect(buf: bytearray, x: int, y: int, w: int, h: int, color: tuple[int, int, int]) -> None:
359 x0 = max(0, x)
360 y0 = max(0, y)
361 x1 = min(WIDTH, x + w)
362 y1 = min(HEIGHT, y + h)
363 if x0 >= x1 or y0 >= y1:
364 return
365 row = bytes(color) * (x1 - x0)
366 for py in range(y0, y1):
367 start = (py * WIDTH + x0) * 3
368 buf[start : start + len(row)] = row
369
370
371def build_base_frame() -> bytearray:
372 buf = bytearray(WIDTH * HEIGHT * 3)
373 for y in range(HEIGHT):
374 color = BG_SCAN if y % 4 == 0 else BG
375 row = bytes(color) * WIDTH
376 start = y * WIDTH * 3
377 buf[start : start + len(row)] = row
378
379 panel_x = ORIGIN_X - 28
380 panel_y = ORIGIN_Y - 28
381 panel_w = COLS * CELL_W + 56
382 panel_h = ROWS * CELL_H + 56
383 draw_rect(buf, panel_x - 8, panel_y - 8, panel_w + 16, panel_h + 16, PANEL_GLOW)
384 draw_rect(buf, panel_x, panel_y, panel_w, panel_h, PANEL)
385 draw_rect(buf, panel_x, panel_y, panel_w, 2, PANEL_EDGE)
386 draw_rect(buf, panel_x, panel_y + panel_h - 2, panel_w, 2, PANEL_EDGE)
387 draw_rect(buf, panel_x, panel_y, 2, panel_h, PANEL_EDGE)
388 draw_rect(buf, panel_x + panel_w - 2, panel_y, 2, panel_h, PANEL_EDGE)
389
390 for y in range(panel_y + 36, panel_y + panel_h - 20, 44):
391 draw_rect(buf, panel_x + 18, y, panel_w - 36, 1, (5, 24, 18))
392 return buf
393
394
395BASE_FRAME = build_base_frame()
396
397
398def glyph_for(char: str) -> tuple[str, ...]:
399 return GLYPHS.get(char, GLYPHS.get(char.upper(), GLYPHS["?"]))
400
401
402GLYPHS.setdefault("?", ("01110", "10001", "00010", "00100", "00100", "00000", "00100"))
403
404
405def draw_glyph(buf: bytearray, char: str, x: int, y: int, color: tuple[int, int, int], glow: bool) -> None:
406 glyph = glyph_for(char)
407 if glyph is GLYPHS[" "]:
408 return
409 if glow:
410 glow_color = tuple(max(0, int(c * 0.16)) for c in color)
411 for row_i, row in enumerate(glyph):
412 for col_i, bit in enumerate(row):
413 if bit != "1":
414 continue
415 px = x + col_i * SCALE
416 py = y + row_i * SCALE
417 if glow:
418 draw_rect(buf, px - 1, py - 1, SCALE + 2, SCALE + 2, glow_color)
419 draw_rect(buf, px, py, SCALE, SCALE, color)
420
421
422def render_frame(frame: int, total_frames: int) -> bytes:
423 grid = build_grid(frame, total_frames)
424 buf = bytearray(BASE_FRAME)
425 jitter = 1 if frame % 37 == 0 else 0
426 for y, row in enumerate(grid.cells):
427 for x, cell in enumerate(row):
428 if cell.char == " ":
429 continue
430 px = ORIGIN_X + x * CELL_W + 1 + jitter
431 py = ORIGIN_Y + y * CELL_H + 4
432 bright = sum(cell.color) > 420
433 draw_glyph(buf, cell.char, px, py, cell.color, bright)
434
435 # A light CRT sweep, sparse enough to keep text readable.
436 sweep_y = int((frame * 9) % HEIGHT)
437 draw_rect(buf, 0, sweep_y, WIDTH, 2, (4, 18, 16))
438 return bytes(buf)
439
440
441def render_video(output: Path, poster: Path | None) -> None:
442 ffmpeg = shutil.which("ffmpeg")
443 if not ffmpeg:
444 raise SystemExit("ffmpeg was not found on PATH")
445
446 output.parent.mkdir(parents=True, exist_ok=True)
447 total_frames = int(FPS * DURATION)
448 cmd = [
449 ffmpeg,
450 "-hide_banner",
451 "-loglevel",
452 "error",
453 "-y",
454 "-f",
455 "rawvideo",
456 "-pix_fmt",
457 "rgb24",
458 "-s",
459 f"{WIDTH}x{HEIGHT}",
460 "-r",
461 str(FPS),
462 "-i",
463 "-",
464 "-an",
465 "-c:v",
466 "libx264",
467 "-preset",
468 "medium",
469 "-crf",
470 "18",
471 "-pix_fmt",
472 "yuv420p",
473 "-movflags",
474 "+faststart",
475 str(output),
476 ]
477 process = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
478 assert process.stdin is not None
479 for frame in range(total_frames):
480 process.stdin.write(render_frame(frame, total_frames))
481 if frame % FPS == 0:
482 print(f"rendered {frame // FPS:02d}s/{int(DURATION):02d}s")
483 process.stdin.close()
484 stderr = process.stderr.read().decode("utf-8", errors="replace") if process.stderr else ""
485 return_code = process.wait()
486 if return_code != 0:
487 raise SystemExit(f"ffmpeg failed with exit code {return_code}\n{stderr}")
488
489 if poster:
490 poster.parent.mkdir(parents=True, exist_ok=True)
491 poster_cmd = [
492 ffmpeg,
493 "-hide_banner",
494 "-loglevel",
495 "error",
496 "-y",
497 "-ss",
498 "00:00:03.05",
499 "-i",
500 str(output),
501 "-frames:v",
502 "1",
503 str(poster),
504 ]
505 subprocess.run(poster_cmd, check=True)
506
507
508def parse_args() -> argparse.Namespace:
509 parser = argparse.ArgumentParser(description="Render the Nipux ASCII CLI intro video.")
510 parser.add_argument("--output", type=Path, default=Path("docs/nipux_ascii_cli.mp4"))
511 parser.add_argument("--poster", type=Path, default=Path("docs/nipux_ascii_cli_poster.png"))
512 return parser.parse_args()
513
514
515def main() -> None:
516 args = parse_args()
517 render_video(args.output, args.poster)
518 print(f"video: {args.output}")
519 print(f"poster: {args.poster}")
520
521
522if __name__ == "__main__":
523 main()
tests/nipux_cli/test_artifacts.py 34 lines
1import pytest
2
3from nipux_cli.artifacts import ArtifactStore
4from nipux_cli.db import AgentDB
5
6
7def test_artifact_store_writes_reads_and_searches(tmp_path):
8 db = AgentDB(tmp_path / "state.db")
9 try:
10 job_id = db.create_job("Collect findings")
11 store = ArtifactStore(tmp_path, db=db)
12
13 stored = store.write_text(
14 job_id=job_id,
15 title="Findings list",
16 summary="contains acme finding",
17 content="Acme Corp\ncontact: founder@example.com\n",
18 )
19
20 assert store.read_text(stored.id).startswith("Acme Corp")
21 results = store.search_text(job_id=job_id, query="founder", limit=5)
22 assert results[0]["id"] == stored.id
23 assert "founder@example.com" in results[0]["excerpt"]
24 finally:
25 db.close()
26
27
28def test_artifact_store_rejects_paths_outside_home(tmp_path):
29 store = ArtifactStore(tmp_path)
30 outside = tmp_path.parent / "outside.txt"
31 outside.write_text("nope", encoding="utf-8")
32
33 with pytest.raises(ValueError):
34 store.read_text(str(outside))
tests/nipux_cli/test_browser_web.py 118 lines
1import json
2
3from nipux_cli.browser import _annotate_source_quality, _session_name, _socket_dir
4from nipux_cli.tools import DEFAULT_REGISTRY, ToolContext
5from nipux_cli.web import _strip_html
6from nipux_cli.artifacts import ArtifactStore
7from nipux_cli.config import AppConfig, RuntimeConfig
8from nipux_cli.db import AgentDB
9
10
11def test_session_name_is_stable_and_safe():
12 assert _session_name("job_abc/def") == "nipux_job_abc_def"
13
14
15def test_long_session_name_is_short_and_hashed():
16 task_id = "research-a-very-long-objective-title-that-needs-a-short-browser-session-name"
17 name = _session_name(task_id)
18 socket_dir = _socket_dir(task_id)
19
20 assert name.startswith("nipux_research-a-very-long")
21 assert len(name) <= 37
22 assert len(str(socket_dir)) < 80
23
24
25def test_strip_html_removes_scripts_and_keeps_text():
26 text = _strip_html("<html><script>bad()</script><h1>Hello</h1><p>World</p></html>")
27 assert "bad" not in text
28 assert "Hello" in text
29 assert "World" in text
30
31
32def test_browser_marks_anti_bot_interstitial_as_warning():
33 result = _annotate_source_quality({
34 "success": True,
35 "data": {"title": "Just a moment...", "url": "https://clutch.co/example"},
36 "snapshot": "Performing security verification. Cloudflare security challenge.",
37 })
38
39 assert result["success"] is True
40 assert "error" not in result
41 assert result["source_warning"] == "cloudflare anti-bot challenge"
42 assert result["warnings"][0]["type"] == "anti_bot"
43
44
45def test_browser_marks_captcha_block_as_warning():
46 result = _annotate_source_quality({
47 "success": True,
48 "data": {"title": "Source search", "url": "https://source.example/search"},
49 "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
50 })
51
52 assert result["source_warning"] == "captcha/anti-bot block"
53 assert result["warnings"][0]["type"] == "anti_bot"
54
55
56def test_web_extract_marks_anti_bot_pages_as_warning(monkeypatch):
57 from nipux_cli import web
58
59 def fake_request(url):
60 del url
61 return "<h1>Performing security verification</h1><p>Cloudflare security challenge</p>", "text/html"
62
63 monkeypatch.setattr(web, "_request", fake_request)
64 result = web.web_extract(["https://clutch.co/example"])
65
66 page = result["pages"][0]
67 assert "error" not in page
68 assert page["source_warning"] == "cloudflare anti-bot challenge"
69 assert page["warnings"][0]["type"] == "anti_bot"
70 assert "Cloudflare security challenge" in page["text"]
71
72
73def test_browser_tool_uses_native_wrapper(monkeypatch, tmp_path):
74 from nipux_cli import browser
75
76 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
77 db = AgentDB(tmp_path / "state.db")
78 try:
79 job_id = db.create_job("Browse")
80 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id)
81
82 def fake_navigate(cfg, *, task_id, url):
83 return {"success": True, "task_id": task_id, "url": url}
84
85 monkeypatch.setattr(browser, "navigate", fake_navigate)
86 result = json.loads(DEFAULT_REGISTRY.handle("browser_navigate", {"url": "https://example.com"}, ctx))
87
88 assert result == {"success": True, "task_id": job_id, "url": "https://example.com"}
89 finally:
90 db.close()
91
92
93def test_browser_click_adds_recovery_snapshot_for_stale_ref(monkeypatch, tmp_path):
94 from nipux_cli import browser
95
96 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
97 calls = []
98
99 def fake_command(cfg, *, task_id, command, args=None, timeout=60):
100 del cfg, task_id, args, timeout
101 calls.append(command)
102 if command == "click":
103 return {"success": False, "error": "Unknown ref: e102"}
104 return {
105 "success": True,
106 "data": {
107 "snapshot": "Directory",
108 "refs": {"e1": {"role": "link", "name": "New Result"}},
109 },
110 }
111
112 monkeypatch.setattr(browser, "run_browser_command", fake_command)
113 result = browser.click(config, task_id="job_abc", ref="@e102")
114
115 assert calls == ["click", "snapshot"]
116 assert result["success"] is False
117 assert result["recovery_snapshot"]["data"]["refs"]["e1"]["name"] == "New Result"
118 assert "stale" in result["recovery_guidance"]
tests/nipux_cli/test_cli.py 4970 lines
1import json
2import queue
3import subprocess
4import time
5from pathlib import Path
6
7from nipux_cli.artifacts import ArtifactStore
8from nipux_cli import __version__
9from nipux_cli.chat_frame_runtime import ChatFrameDeps as _ChatFrameDeps
10from nipux_cli.chat_frame_runtime import THINKING_NOTICE as _THINKING_NOTICE
11from nipux_cli.chat_frame_runtime import WAITING_NOTICE as _WAITING_NOTICE
12from nipux_cli.chat_frame_runtime import _display_notices as _display_chat_notices
13from nipux_cli.chat_frame_runtime import _drain_async_notices as _drain_chat_async_notices
14from nipux_cli.chat_frame_runtime import _handle_edit_input as _handle_chat_edit_input
15from nipux_cli.chat_frame_runtime import _handle_chat_submit
16from nipux_cli.chat_frame_runtime import _safe_render_frame as _safe_chat_render_frame
17from nipux_cli.chat_frame_runtime import frame_next_job_id as _frame_next_job_id
18from nipux_cli.chat_frame_runtime import frame_refresh_interval as _frame_refresh_interval
19from nipux_cli.cli import (
20 _build_first_run_frame,
21 _build_chat_frame,
22 _build_chat_messages,
23 _chat_handle_line,
24 _chat_control_command,
25 _capture_chat_command,
26 _config_field_value,
27 _emit_frame_if_changed,
28 _first_run_click_action,
29 _handle_first_run_action,
30 _handle_first_run_menu_line,
31 _handle_first_run_frame_line,
32 _handle_chat_message,
33 _handle_workspace_chat_message,
34 _is_plain_chat_line,
35 _launch_agent_plist,
36 _load_frame_snapshot,
37 _minimal_live_event_line,
38 _print_shell_help,
39 _run_shell_line,
40 _save_config_field,
41 _slash_suggestion_lines,
42 _systemd_service_text,
43 _verify_model_setup_from_first_run,
44 _workspace_chat_job_dossier,
45 build_parser,
46 main,
47)
48from nipux_cli.config import load_config
49from nipux_cli.cli_state import mark_model_setup_verified as _mark_model_setup_verified
50from nipux_cli.cli_state import model_setup_verified as _model_setup_verified
51from nipux_cli.cli_state import read_shell_state as _read_shell_state
52from nipux_cli.cli_state import write_shell_state as _write_shell_state
53from nipux_cli.daemon import append_daemon_event
54from nipux_cli.db import AgentDB
55from nipux_cli.doctor import Check
56from nipux_cli.llm import LLMResponse
57from nipux_cli.settings import inline_setting_notice as _inline_setting_notice
58from nipux_cli.first_run_frame_runtime import FirstRunRuntimeDeps as _FirstRunRuntimeDeps
59from nipux_cli.first_run_frame_runtime import _handle_edit_input as _handle_first_run_edit_input
60from nipux_cli.first_run_frame_runtime import _safe_render_frame as _safe_first_run_render_frame
61from nipux_cli.first_run_frame_runtime import _submit_first_run_line
62from nipux_cli.first_run_frame_runtime import directional_first_run_action as _directional_first_run_action
63from nipux_cli.frame_snapshot import WORKSPACE_CHAT_ID
64from nipux_cli.tui_commands import (
65 CHAT_SLASH_COMMANDS,
66 FIRST_RUN_SLASH_COMMANDS,
67 autocomplete_slash as _autocomplete_slash,
68 cycle_slash as _cycle_slash,
69 slash_completion_for_submit as _slash_completion_for_submit,
70)
71from nipux_cli.tui_events import chat_pane_lines
72from nipux_cli.tui_input import decode_terminal_escape as _decode_terminal_escape
73from nipux_cli.tui_outcomes import hourly_update_lines, recent_model_update_lines
74from nipux_cli.updater import update_checkout as _update_checkout
75
76
77def _mode(path):
78 return path.stat().st_mode & 0o777
79
80
81def _mark_test_model_ready() -> None:
82 load_config().ensure_dirs()
83 _mark_model_setup_verified(load_config())
84
85
86def test_cli_has_operator_commands():
87 parser = build_parser()
88
89 assert parser.parse_args(["shell", "--status"]).func.__name__ == "cmd_shell"
90 assert parser.parse_args(["status", "--full"]).func.__name__ == "cmd_status"
91 assert parser.parse_args(["health"]).func.__name__ == "cmd_health"
92 assert parser.parse_args(["history"]).func.__name__ == "cmd_history"
93 assert parser.parse_args(["events", "--follow"]).func.__name__ == "cmd_events"
94 assert parser.parse_args(["activity", "--follow"]).func.__name__ == "cmd_activity"
95 assert parser.parse_args(["feed"]).func.__name__ == "cmd_activity"
96 assert parser.parse_args(["update"]).func.__name__ == "cmd_update"
97 assert parser.parse_args(["update", "--no-restart"]).no_restart is True
98 assert parser.parse_args(["uninstall", "--dry-run"]).func.__name__ == "cmd_uninstall"
99 assert parser.parse_args(["uninstall", "--keep-tool"]).keep_tool is True
100 assert parser.parse_args(["new", "Research topic"]).func.__name__ == "cmd_create"
101 assert parser.parse_args(["updates"]).func.__name__ == "cmd_updates"
102 assert parser.parse_args(["outcomes"]).func.__name__ == "cmd_updates"
103 assert parser.parse_args(["outcomes", "--all"]).all is True
104 assert parser.parse_args(["steer", "focus", "sources"]).func.__name__ == "cmd_steer"
105 assert parser.parse_args(["say", "focus", "sources"]).func.__name__ == "cmd_steer"
106 assert parser.parse_args(["pause"]).func.__name__ == "cmd_pause"
107 assert parser.parse_args(["resume"]).func.__name__ == "cmd_resume"
108 assert parser.parse_args(["resume", "research", "finder"]).job_id == ["research", "finder"]
109 assert parser.parse_args(["cancel"]).func.__name__ == "cmd_cancel"
110 assert parser.parse_args(["dashboard", "--no-follow"]).func.__name__ == "cmd_dashboard"
111 assert parser.parse_args(["dash", "--no-follow"]).func.__name__ == "cmd_dashboard"
112 assert parser.parse_args(["focus", "research"]).func.__name__ == "cmd_focus"
113 assert parser.parse_args(["rename", "research", "--title", "new research"]).func.__name__ == "cmd_rename"
114 assert parser.parse_args(["delete", "research"]).func.__name__ == "cmd_delete"
115 assert parser.parse_args(["rm", "research"]).func.__name__ == "cmd_delete"
116 assert parser.parse_args(["chat", "research", "finder"]).func.__name__ == "cmd_chat"
117 assert parser.parse_args(["start", "--poll-seconds", "1"]).func.__name__ == "cmd_start"
118 assert parser.parse_args(["stop"]).func.__name__ == "cmd_stop"
119 assert parser.parse_args(["restart"]).func.__name__ == "cmd_restart"
120 assert parser.parse_args(["stop", "research", "finder"]).func.__name__ == "cmd_stop"
121 assert parser.parse_args(["stop", "research", "finder"]).job_id == ["research", "finder"]
122 assert parser.parse_args(["ls"]).func.__name__ == "cmd_jobs"
123 assert parser.parse_args(["autostart", "status"]).func.__name__ == "cmd_autostart"
124 assert parser.parse_args(["browser-dashboard", "--port", "4848"]).func.__name__ == "cmd_browser_dashboard"
125 assert parser.parse_args(["artifacts"]).func.__name__ == "cmd_artifacts"
126 assert parser.parse_args(["artifact", "art_123"]).func.__name__ == "cmd_artifact"
127 assert parser.parse_args(["artifact", "Findings", "Batch"]).func.__name__ == "cmd_artifact"
128 assert parser.parse_args(["lessons"]).func.__name__ == "cmd_lessons"
129 assert parser.parse_args(["learn", "low-evidence", "pages", "are", "bad"]).func.__name__ == "cmd_learn"
130 assert parser.parse_args(["findings"]).func.__name__ == "cmd_findings"
131 assert parser.parse_args(["tasks"]).func.__name__ == "cmd_tasks"
132 assert parser.parse_args(["roadmap"]).func.__name__ == "cmd_roadmap"
133 assert parser.parse_args(["experiments"]).func.__name__ == "cmd_experiments"
134 assert parser.parse_args(["sources"]).func.__name__ == "cmd_sources"
135 assert parser.parse_args(["memory"]).func.__name__ == "cmd_memory"
136 assert parser.parse_args(["memory", "--graph"]).graph is True
137 assert parser.parse_args(["metrics"]).func.__name__ == "cmd_metrics"
138 assert parser.parse_args(["usage"]).func.__name__ == "cmd_usage"
139 assert parser.parse_args(["outputs", "research", "finder"]).func.__name__ == "cmd_logs"
140 assert parser.parse_args(["outputs"]).func.__name__ == "cmd_logs"
141 assert parser.parse_args(["service", "status"]).func.__name__ == "cmd_service"
142 assert parser.parse_args(["work", "--steps", "2", "--fake"]).func.__name__ == "cmd_work"
143 assert parser.parse_args(["run", "--no-follow"]).func.__name__ == "cmd_run"
144
145
146def test_cli_version_flag(capsys):
147 try:
148 main(["--version"])
149 except SystemExit as exc:
150 assert exc.code == 0
151
152 assert f"nipux {__version__}" in capsys.readouterr().out
153
154
155def test_main_catches_keyboard_interrupt_without_traceback(monkeypatch, capsys):
156 def interrupt(_args):
157 raise KeyboardInterrupt
158
159 monkeypatch.setattr("nipux_cli.cli.cmd_home", interrupt)
160
161 main([])
162
163 assert capsys.readouterr().err == ""
164
165
166def test_python_module_entrypoint_uses_cli_main():
167 import nipux_cli.__main__ as module_entrypoint
168
169 assert module_entrypoint.main is main
170
171
172def test_init_openrouter_writes_secret_free_config_and_env_template(monkeypatch, tmp_path, capsys):
173 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
174
175 main(["init", "--openrouter", "--model", "provider/model"])
176
177 out = capsys.readouterr().out
178 config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
179 env_text = (tmp_path / ".env").read_text(encoding="utf-8")
180 assert "Wrote" in out
181 assert "name: provider/model" in config_text
182 assert "base_url: https://openrouter.ai/api/v1" in config_text
183 assert "api_key_env: OPENROUTER_API_KEY" in config_text
184 assert "sk-" not in config_text
185 assert env_text.strip().endswith("OPENROUTER_API_KEY" + "=")
186 assert "sk-" not in env_text
187 assert _mode(tmp_path / "config.yaml") == 0o600
188 assert _mode(tmp_path / ".env") == 0o600
189
190
191def test_init_defaults_to_local_endpoint(monkeypatch, tmp_path):
192 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
193
194 main(["init"])
195
196 config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
197 env_text = (tmp_path / ".env").read_text(encoding="utf-8")
198 assert "name: local-model" in config_text
199 assert "base_url: http://localhost:8000/v1" in config_text
200 assert "api_key_env: OPENAI_API_KEY" in config_text
201 assert env_text.strip().endswith("OPENAI_API_KEY" + "=")
202
203
204def test_init_openrouter_defaults_to_generic_route(monkeypatch, tmp_path):
205 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
206
207 main(["init", "--openrouter"])
208
209 config_text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
210 assert "name: openrouter/auto" in config_text
211 assert "base_url: https://openrouter.ai/api/v1" in config_text
212 assert "api_key_env: OPENROUTER_API_KEY" in config_text
213
214
215def test_shell_freeform_text_adds_operator_message(monkeypatch, tmp_path, capsys):
216 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
217 db = AgentDB(tmp_path / "state.db")
218 try:
219 job_id = db.create_job("Research topic", title="research")
220 finally:
221 db.close()
222
223 assert _run_shell_line("focus on real evidence sources, not irrelevant sources") is True
224
225 out = capsys.readouterr().out
226 db = AgentDB(tmp_path / "state.db")
227 try:
228 job = db.get_job(job_id)
229 assert "waiting for research" in out
230 assert (
231 job["metadata"]["operator_messages"][-1]["message"]
232 == "focus on real evidence sources, not irrelevant sources"
233 )
234 finally:
235 db.close()
236
237
238def test_main_no_args_enters_chat_first_home(monkeypatch, tmp_path, capsys):
239 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
240 _mark_test_model_ready()
241 db = AgentDB(tmp_path / "state.db")
242 try:
243 job_id = db.create_job("Research topic", title="research")
244 db.append_operator_message(job_id, "remember this visible note", source="test")
245 db.append_agent_update(job_id, "visible agent update", category="chat")
246 finally:
247 db.close()
248
249 def eof_input(_prompt):
250 raise EOFError
251
252 monkeypatch.setattr("builtins.input", eof_input)
253
254 main([])
255
256 out = capsys.readouterr().out
257 assert "WORKSPACE" in out
258 assert "Jobs" in out
259 assert "research" in out
260
261
262def test_main_no_args_with_no_jobs_requires_setup_frame(monkeypatch, tmp_path, capsys):
263 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
264
265 def eof_input(_prompt):
266 raise EOFError
267
268 monkeypatch.setattr("builtins.input", eof_input)
269
270 main([])
271
272 out = capsys.readouterr().out
273 assert "Nipux setup requires an interactive terminal." in out
274 assert "choose model, endpoint, and tool access" in out
275 assert "first job" not in out
276 assert "new create a long-running job" not in out
277 assert "doctor check local setup" not in out
278 assert "_ _" not in out
279 assert "nipux menu >" not in out
280
281
282def test_main_no_args_with_old_setup_marker_still_requires_model_verification(monkeypatch, tmp_path, capsys):
283 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
284 load_config().ensure_dirs()
285 from nipux_cli.cli_state import write_shell_state
286
287 write_shell_state({"setup_completed": True})
288
289 main([])
290
291 out = capsys.readouterr().out
292 assert "Nipux setup requires an interactive terminal." in out
293 assert "No jobs are saved in this profile." not in out
294
295
296def test_main_no_args_after_setup_complete_does_not_reopen_setup(monkeypatch, tmp_path, capsys):
297 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
298 _mark_test_model_ready()
299
300 main([])
301
302 out = capsys.readouterr().out
303 assert "No jobs are saved in this profile." in out
304 assert "Nipux setup requires" not in out
305 assert "Begin setup" not in out
306
307
308def test_main_no_args_autoverifies_existing_model_config(monkeypatch, tmp_path, capsys):
309 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
310 monkeypatch.setenv("TEST_MODEL_KEY", "working-key")
311 tmp_path.mkdir(parents=True, exist_ok=True)
312 (tmp_path / "config.yaml").write_text(
313 "model:\n"
314 " name: provider/model\n"
315 " base_url: https://provider.example/v1\n"
316 " api_key_env: TEST_MODEL_KEY\n",
317 encoding="utf-8",
318 )
319
320 def fake_doctor(*, config, check_model):
321 assert check_model is True
322 assert config.model.model == "provider/model"
323 return [
324 Check("state_dir_writable", True, "ok"),
325 Check("sqlite", True, "ok"),
326 Check("model_config", True, "ok"),
327 Check("model_endpoint", True, "ok"),
328 Check("model_generation", True, "ok"),
329 ]
330
331 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
332
333 main([])
334
335 out = capsys.readouterr().out
336 assert "No jobs are saved in this profile." in out
337 assert "Model setup is not verified." not in out
338 assert _model_setup_verified(load_config())
339
340
341def test_main_no_args_enters_setup_when_existing_model_config_fails(monkeypatch, tmp_path, capsys):
342 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
343 tmp_path.mkdir(parents=True, exist_ok=True)
344 (tmp_path / "config.yaml").write_text(
345 "model:\n"
346 " name: provider/model\n"
347 " base_url: https://provider.example/v1\n"
348 " api_key_env: TEST_MODEL_KEY\n",
349 encoding="utf-8",
350 )
351
352 def fake_doctor(*, config, check_model):
353 assert check_model is True
354 return [Check("model_generation", False, "provider rejected request")]
355
356 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
357
358 main([])
359
360 out = capsys.readouterr().out
361 assert "Nipux setup requires an interactive terminal." in out
362 assert "No jobs are saved in this profile." not in out
363 assert not _model_setup_verified(load_config())
364
365
366def test_main_no_args_keeps_workspace_locked_after_completed_setup_if_provider_fails(monkeypatch, tmp_path, capsys):
367 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
368 tmp_path.mkdir(parents=True, exist_ok=True)
369 (tmp_path / "config.yaml").write_text(
370 "model:\n"
371 " name: provider/model\n"
372 " base_url: https://provider.example/v1\n"
373 " api_key_env: TEST_MODEL_KEY\n",
374 encoding="utf-8",
375 )
376 _write_shell_state({"setup_completed": True})
377
378 def fake_doctor(*, config, check_model):
379 assert check_model is True
380 return [Check("model_generation", False, "provider rejected request")]
381
382 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
383
384 main([])
385
386 out = capsys.readouterr().out
387 assert "Nipux setup requires an interactive terminal." in out
388 assert "No jobs are saved in this profile." not in out
389 assert not _model_setup_verified(load_config())
390
391
392def test_first_run_refuses_job_before_model_is_verified(monkeypatch, tmp_path):
393 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
394
395 result = _handle_first_run_frame_line("new Build a durable workflow")
396
397 assert result[0] == "notice"
398 assert "Finish setup first" in result[1]
399 create_result = _handle_first_run_frame_line('create "Build a durable workflow"')
400 assert create_result[0] == "notice"
401 assert "Finish setup first" in create_result[1]
402 jobs_result = _handle_first_run_frame_line("jobs")
403 assert jobs_result[0] == "notice"
404 assert "Jobs are available after Doctor verifies" in jobs_result[1]
405 db = AgentDB(tmp_path / "state.db")
406 try:
407 assert db.list_jobs() == []
408 finally:
409 db.close()
410
411
412def test_doctor_check_model_marks_model_setup_verified(monkeypatch, tmp_path, capsys):
413 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
414
415 def fake_doctor(*, config, check_model):
416 assert check_model is True
417 return [
418 Check("state_dir_writable", True, "ok"),
419 Check("sqlite", True, "ok"),
420 Check("model_config", True, "ok"),
421 Check("model_endpoint", True, "ok"),
422 Check("model_generation", True, "ok"),
423 ]
424
425 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
426 args = build_parser().parse_args(["doctor", "--check-model"])
427 args.func(args)
428
429 out = capsys.readouterr().out
430 assert "model_setup\tverified" in out
431 assert _model_setup_verified(load_config())
432
433
434def test_first_run_doctor_failure_shows_inline_fix_commands(monkeypatch, tmp_path):
435 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
436
437 def fake_doctor(*, config, check_model):
438 assert check_model is True
439 return [Check("model_endpoint", False, "connection refused")]
440
441 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
442
443 lines = _verify_model_setup_from_first_run()
444 rendered = "\n".join(lines)
445
446 assert "Model setup is not ready" in rendered
447 assert "/base-url URL" in rendered
448 assert "/api-key KEY" in rendered
449 assert "/model MODEL" in rendered
450 assert "local server" in rendered
451
452
453def test_setting_change_clears_model_setup_verification(monkeypatch, tmp_path):
454 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
455 _mark_test_model_ready()
456
457 assert _model_setup_verified(load_config())
458 _inline_setting_notice("model.name", "provider/other-model")
459
460 assert not _model_setup_verified(load_config())
461
462
463def test_first_run_menu_blocks_job_creation_until_workspace_chat(monkeypatch, tmp_path, capsys):
464 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
465 _mark_test_model_ready()
466
467 assert _handle_first_run_menu_line("new Build a durable workflow") is True
468
469 out = capsys.readouterr().out
470 db = AgentDB(tmp_path / "state.db")
471 try:
472 assert db.list_jobs() == []
473 assert "Finish setup first" in out
474 finally:
475 db.close()
476
477
478def test_first_run_plain_greeting_does_not_create_job(monkeypatch, tmp_path, capsys):
479 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
480
481 assert _handle_first_run_menu_line("Hello") is True
482
483 out = capsys.readouterr().out
484 db = AgentDB(tmp_path / "state.db")
485 try:
486 assert db.list_jobs() == []
487 finally:
488 db.close()
489 assert "Setup must be completed" in out
490
491
492def test_first_run_frame_uses_full_screen_ui_not_banner(monkeypatch, tmp_path):
493 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
494
495 frame = _build_first_run_frame("", [], width=100, height=24)
496 lines = frame.splitlines()
497
498 assert "workspace" not in lines[0].lower()
499 assert "Endpoint" in lines[0]
500 assert "Enter the endpoint first" in frame
501 assert "Begin setup" not in frame
502 assert "Long-running work, installed in-session." not in frame
503 assert "Required: type an OpenAI-compatible endpoint URL" in frame
504 assert "controls on the right" not in frame
505 assert "Control" not in frame
506 assert "SETUP" not in frame
507 assert "│ SETUP" not in frame
508 assert "daemon stopped" not in frame
509 assert "FIRST RUN" not in frame
510 assert "nipux menu >" not in frame
511 assert "/shell" not in frame
512
513
514def test_first_run_frame_hides_command_popup_during_setup(monkeypatch, tmp_path):
515 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
516
517 frame = _build_first_run_frame("/", [], width=100, height=26)
518
519 assert "commands" not in frame
520 assert "/new" not in frame
521 assert "/jobs" not in frame
522 assert "/model" not in frame
523 assert "/settings" not in frame
524 assert "/shell" not in frame
525 assert "Enter the endpoint first" in frame
526
527
528def test_first_run_frame_walks_setup_screens(monkeypatch, tmp_path):
529 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
530
531 model = _build_first_run_frame("", [], width=100, height=26, view="model", selected=0)
532 endpoint = _build_first_run_frame("", [], width=100, height=26, view="endpoint", selected=0)
533 api = _build_first_run_frame("", [], width=100, height=26, view="api", selected=0)
534 access = _build_first_run_frame("", [], width=100, height=26, view="access", selected=0)
535 doctor = _build_first_run_frame("", [], width=100, height=28, view="doctor", selected=0)
536 invalid = _build_first_run_frame("", [], width=100, height=26, view="settings", selected=1)
537
538 assert "Enter the model id" in model
539 assert "Blank input is not accepted" in model
540 assert "Enter the endpoint first" in endpoint
541 assert "BASE URL" in endpoint
542 assert "Enter the API key" in api
543 assert "type skip" in api
544 assert "Choose tool access" in access
545 assert "Browser" in access
546 assert "CLI" in access
547 assert "Run checks" in doctor
548 assert "/base-url" in doctor
549 assert "/api-key" in doctor
550 assert "/model" in doctor
551 assert "Enter the model id" not in endpoint
552 assert "Enter the endpoint first" not in api
553 assert "Enter the endpoint first" in invalid
554 assert "/shell" not in model
555
556
557def test_first_run_frame_does_not_use_command_palette_for_setup(monkeypatch, tmp_path):
558 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
559
560 frame = _build_first_run_frame("/model", [], width=100, height=26)
561
562 assert "/model" in frame
563 assert "set model" not in frame
564 assert "Settings" not in frame
565 assert "Enter the endpoint first" in frame
566
567
568def test_settings_editor_persists_model_config(monkeypatch, tmp_path):
569 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
570
571 assert _save_config_field("model.name", "demo/model") == "demo/model"
572 assert _save_config_field("model.context_length", "4096") == 4096
573 assert _save_config_field("runtime.daily_digest_enabled", "false") is False
574
575 assert _config_field_value("model.name") == "demo/model"
576 assert _config_field_value("model.context_length") == 4096
577 assert _config_field_value("runtime.daily_digest_enabled") is False
578 text = (tmp_path / "config.yaml").read_text(encoding="utf-8")
579 assert "demo/model" in text
580 assert _inline_setting_notice("model.name", "") == "kept model.name"
581
582
583def test_slash_autocomplete_filters_commands():
584 assert _autocomplete_slash("/do", FIRST_RUN_SLASH_COMMANDS) == "/doctor "
585 assert _autocomplete_slash("/mo", FIRST_RUN_SLASH_COMMANDS) == "/model "
586 assert _autocomplete_slash("/sta", CHAT_SLASH_COMMANDS) == "/status "
587 assert _autocomplete_slash("/rest", CHAT_SLASH_COMMANDS) == "/restart "
588 assert _autocomplete_slash("/step", CHAT_SLASH_COMMANDS) == "/step-limit "
589 assert _autocomplete_slash("/out", FIRST_RUN_SLASH_COMMANDS) == "/output-chars "
590 assert _cycle_slash("/", CHAT_SLASH_COMMANDS, direction=1) == "/new"
591 assert _cycle_slash("/", CHAT_SLASH_COMMANDS, direction=-1) == "/exit"
592 assert _cycle_slash("/work ", CHAT_SLASH_COMMANDS, direction=1) == "/work "
593 assert _cycle_slash("/run", CHAT_SLASH_COMMANDS, direction=1) == "/jobs"
594 assert _cycle_slash("/", FIRST_RUN_SLASH_COMMANDS, direction=1) == "/model"
595 assert _cycle_slash("/", FIRST_RUN_SLASH_COMMANDS, direction=-1) == "/exit"
596 assert _cycle_slash("/model", FIRST_RUN_SLASH_COMMANDS, direction=1) == "/base-url"
597 assert _cycle_slash("/model", FIRST_RUN_SLASH_COMMANDS, direction=-1) == "/exit"
598 assert _cycle_slash("/out", CHAT_SLASH_COMMANDS, direction=1) == "/outcomes"
599 assert _cycle_slash("/out", CHAT_SLASH_COMMANDS, direction=-1) == "/output-cost"
600 assert _slash_completion_for_submit("/", CHAT_SLASH_COMMANDS) == ("/new ", False)
601 assert _slash_completion_for_submit("/", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
602 assert _slash_completion_for_submit("/mo", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
603 assert _slash_completion_for_submit("/model", FIRST_RUN_SLASH_COMMANDS) == ("/model", True)
604 assert _slash_completion_for_submit("/new ", CHAT_SLASH_COMMANDS) == ("/new ", False)
605 assert _slash_completion_for_submit("/new research agents", CHAT_SLASH_COMMANDS) == ("/new research agents", True)
606 assert _slash_completion_for_submit("/model ", CHAT_SLASH_COMMANDS) == ("/model ", True)
607 assert _slash_completion_for_submit("/j", CHAT_SLASH_COMMANDS) == ("/jobs", True)
608 assert _slash_completion_for_submit("/set", CHAT_SLASH_COMMANDS) == ("/settings", True)
609 assert _slash_completion_for_submit("/model", CHAT_SLASH_COMMANDS) == ("/model", True)
610 assert _slash_completion_for_submit("/mo", CHAT_SLASH_COMMANDS) == ("/model", True)
611 assert _slash_completion_for_submit("/run", CHAT_SLASH_COMMANDS) == ("/run", True)
612 assert _slash_completion_for_submit("/settings", CHAT_SLASH_COMMANDS) == ("/settings", True)
613 assert _slash_completion_for_submit("/model demo/model", CHAT_SLASH_COMMANDS) == ("/model demo/model", True)
614 assert _autocomplete_slash("plain text", CHAT_SLASH_COMMANDS) == "plain text"
615 lines = _slash_suggestion_lines("/art", CHAT_SLASH_COMMANDS, width=80)
616 text = "\n".join(lines)
617 assert "/artifacts" in text
618 assert "/artifact" in text
619 assert "/run" not in text
620 hint_text = "\n".join(_slash_suggestion_lines("/model ", CHAT_SLASH_COMMANDS, width=80))
621 assert "/model" in hint_text
622 assert "MODEL" in hint_text
623 partial_hint_text = "\n".join(_slash_suggestion_lines("/mo", CHAT_SLASH_COMMANDS, width=80))
624 assert "/model MODEL" in partial_hint_text
625 assert "↑↓ moves" in partial_hint_text
626 full_palette_text = "\n".join(_slash_suggestion_lines("/", CHAT_SLASH_COMMANDS, width=80, limit=5))
627 assert "enter selects" in full_palette_text
628 assert "/new OBJECTIVE" in full_palette_text
629 assert "/run" in full_palette_text
630 assert "/settings" in full_palette_text
631 assert "type OBJECTIVE" in "\n".join(_slash_suggestion_lines("/new ", CHAT_SLASH_COMMANDS, width=80))
632 assert "/shell" not in "\n".join(_slash_suggestion_lines("/", CHAT_SLASH_COMMANDS, width=80, limit=20))
633 assert "/restart" in "\n".join(_slash_suggestion_lines("/re", CHAT_SLASH_COMMANDS, width=80, limit=20))
634
635
636def test_terminal_escape_decodes_arrows_and_mouse_click():
637 assert _decode_terminal_escape("\x1b[A") == ("up", None)
638 assert _decode_terminal_escape("\x1b[B") == ("down", None)
639 assert _decode_terminal_escape("\x1b[C") == ("right", None)
640 assert _decode_terminal_escape("\x1b[D") == ("left", None)
641 assert _decode_terminal_escape("\x1bOB") == ("down", None)  # SS3 form (application cursor keys)
642 assert _decode_terminal_escape("\x1b[1;2B") == ("down", None)  # modifier form (Shift+Down)
643 assert _decode_terminal_escape("\x1b[<0;88;12M") == ("click", (88, 12))  # SGR mouse press: button;x;y
644 assert _decode_terminal_escape("\x1b[M !!") == ("click", (1, 1))  # X10 mouse: each byte is value + 32
645
646
647def test_first_run_click_maps_right_pane_actions(monkeypatch):
648 monkeypatch.setattr("shutil.get_terminal_size", lambda fallback=(100, 30): (100, 30))
649
650 assert _first_run_click_action(5, 15, view="endpoint") is None
651 assert _first_run_click_action(5, 15, view="access") == 0
652 assert _first_run_click_action(25, 15, view="access") == 1
653 assert _first_run_click_action(1, 4, view="access") is None
654 assert _first_run_click_action(5, 1, view="model") is None
655
656
657def test_first_run_arrow_navigation_changes_setup_screens():
658 assert _directional_first_run_action(
659 [
660 ("view:model", "Begin setup", "walk through setup"),
661 ("doctor", "Doctor", "check"),
662 ],
663 direction=1,
664 ) == "view:model"
665 assert _directional_first_run_action(
666 [
667 ("edit:model.name", "Edit model", "set model"),
668 ("view:connector", "Continue", "choose connector"),
669 ("view:start", "Back", "intro"),
670 ],
671 direction=1,
672 ) == "view:connector"
673 assert _directional_first_run_action(
674 [
675 ("edit:model.name", "Edit model", "set model"),
676 ("view:connector", "Continue", "choose connector"),
677 ("view:start", "Back", "intro"),
678 ],
679 direction=-1,
680 ) == "view:start"
681
682
683def test_frame_next_job_cycles_jobs():
684 snapshot = {"jobs": [{"id": "one"}, {"id": "two"}, {"id": "three"}]}
685
686 assert _frame_next_job_id(snapshot, "one", direction=1) == "two"
687 assert _frame_next_job_id(snapshot, "one", direction=-1) == "three"
688 assert _frame_next_job_id(snapshot, "missing", direction=1) == "two"
689
690
691def test_frame_refresh_slows_background_updates_while_typing():
692 assert _frame_refresh_interval("") < _frame_refresh_interval("drafting a message")
693
694
695def test_first_run_empty_submit_without_actions_does_not_crash():
696 deps = _FirstRunRuntimeDeps(
697 render_frame=lambda _buffer, _notices, _selected, _view, _editing, _previous: "",
698 actions=lambda _view: [],
699 handle_action=lambda _action: ("notice", "unused"),
700 handle_line=lambda _line: ("notice", "unused"),
701 click_action=lambda _x, _y, _view: None,
702 )
703
704 assert _submit_first_run_line("", selected=0, view="empty", deps=deps) == (
705 "notice",
706 "This setup step requires an explicit value.",
707 )
708
709
710def test_first_run_required_edit_cancel_and_clear_stay_on_same_field():
711 notices: list[str] = []
712
713 buffer, editing_field, should_exit = _handle_first_run_edit_input(
714 "\x15",  # Ctrl-U: clear the edit buffer
715 buffer="not-a-url",
716 editing_field="model.base_url",
717 notices=notices,
718 stdin_fd=0,
719 )
720
721 assert buffer == ""
722 assert editing_field == "model.base_url"
723 assert should_exit is False
724
725 buffer, editing_field, should_exit = _handle_first_run_edit_input(
726 "\x03",  # Ctrl-C: cancel the edit
727 buffer="partial",
728 editing_field="model.base_url",
729 notices=notices,
730 stdin_fd=0,
731 )
732
733 assert buffer == ""
734 assert editing_field == "model.base_url"
735 assert should_exit is False
736 assert "cancelled edit" in "\n".join(notices)
737
738
739def test_chat_settings_edit_supports_ctrl_u_clear():
740 buffer, editing_field, should_exit = _handle_chat_edit_input(
741 "\x15",  # Ctrl-U
742 buffer="wrong-model",
743 editing_field="model.name",
744 notices=[],
745 stdin_fd=0,
746 )
747
748 assert buffer == ""
749 assert editing_field == "model.name"
750 assert should_exit is False
751
752
753def test_first_run_render_failure_uses_safe_mode(capsys):
754 deps = _FirstRunRuntimeDeps(
755 render_frame=lambda *_args: (_ for _ in ()).throw(RuntimeError("bad frame")),  # generator .throw() lets a lambda raise
756 actions=lambda _view: [],
757 handle_action=lambda _action: ("notice", "unused"),
758 handle_line=lambda _line: ("notice", "unused"),
759 click_action=lambda _x, _y, _view: None,
760 )
761 notices: list[str] = []
762
763 frame = _safe_first_run_render_frame(
764 deps,
765 buffer="hello",
766 notices=notices,
767 selected=0,
768 view="start",
769 editing_field=None,
770 previous_frame="",
771 )
772
773 assert "safe mode" in frame
774 assert "render failed" in "\n".join(notices)
775 assert "bad frame" in capsys.readouterr().out
776
777
778def test_chat_submit_failure_stays_in_frame():
779 snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
780 async_messages: queue.Queue[str] = queue.Queue()
781
782 deps = _ChatFrameDeps(
783 load_snapshot=lambda _job_id, _history_limit: snapshot,
784 render_frame=lambda *_args: "",
785 handle_chat_message=lambda _job_id, _line: (_ for _ in ()).throw(RuntimeError("model blew up")),
786 capture_chat_command=lambda _job_id, _line: (True, ""),
787 write_shell_state=lambda _state: None,
788 is_plain_chat_line=lambda _line: True,
789 page_click=lambda _x, _y, _right_view: None,
790 )
791
792 keep_running, _snapshot, job_id, notices, right_view, modal = _handle_chat_submit(
793 "hello",
794 job_id="job_demo",
795 history_limit=12,
796 snapshot=snapshot,
797 notices=[],
798 right_view="status",
799 modal_view=None,
800 deps=deps,
801 async_messages=async_messages,
802 )
803
804 assert keep_running is True
805 assert job_id == "job_demo"
806 assert right_view == "status"
807 assert modal is None
808 assert _THINKING_NOTICE in notices
809 assert "> hello" not in "\n".join(notices)
810 queued = async_messages.get(timeout=1)
811 assert "message failed" in queued
812 assert "model blew up" in queued
813 async_messages.put(queued)
814 assert _drain_chat_async_notices(async_messages, notices) is True
815 assert _THINKING_NOTICE not in notices
816 assert "message failed" in "\n".join(notices)
817
818
819def test_chat_submit_plain_message_returns_without_waiting_for_model():
820 snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
821 async_messages: queue.Queue[str] = queue.Queue()
822
823 def slow_chat(_job_id, _line):
824 time.sleep(0.3)
825 return True, "done later"
826
827 deps = _ChatFrameDeps(
828 load_snapshot=lambda _job_id, _history_limit: snapshot,
829 render_frame=lambda *_args: "",
830 handle_chat_message=slow_chat,
831 capture_chat_command=lambda _job_id, _line: (True, ""),
832 write_shell_state=lambda _state: None,
833 is_plain_chat_line=lambda _line: True,
834 page_click=lambda _x, _y, _right_view: None,
835 )
836
837 started = time.monotonic()
838 keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
839 "hello",
840 job_id="job_demo",
841 history_limit=12,
842 snapshot=snapshot,
843 notices=[],
844 right_view="status",
845 modal_view=None,
846 deps=deps,
847 async_messages=async_messages,
848 )
849
850 assert keep_running is True
851 assert time.monotonic() - started < 0.1
852 assert _THINKING_NOTICE in notices
853 assert "> hello" not in "\n".join(notices)
854 assert async_messages.get(timeout=1) == "__refresh__"
855
856
857def test_chat_submit_plain_message_renders_thinking_notice_without_echoing_message():
858 snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
859 async_messages: queue.Queue[str] = queue.Queue()
860
861 deps = _ChatFrameDeps(
862 load_snapshot=lambda _job_id, _history_limit: snapshot,
863 render_frame=lambda *_args: "",
864 handle_chat_message=lambda _job_id, _line: (True, "done"),
865 capture_chat_command=lambda _job_id, _line: (True, ""),
866 write_shell_state=lambda _state: None,
867 is_plain_chat_line=lambda _line: True,
868 page_click=lambda _x, _y, _right_view: None,
869 )
870
871 _keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
872 "Hello",
873 job_id="job_demo",
874 history_limit=12,
875 snapshot=snapshot,
876 notices=[],
877 right_view="status",
878 modal_view=None,
879 deps=deps,
880 async_messages=async_messages,
881 )
882
883 rendered_notices = _display_chat_notices(notices)
884 lines = chat_pane_lines([], rendered_notices, width=60, rows=4)
885 rendered = "\n".join(lines)
886 assert "thinking" in rendered
887 assert "Hello" not in rendered
888 assert "waiting for model" not in rendered
889
890
891def test_chat_submit_waiting_command_output_becomes_animation():
892 snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
893
894 deps = _ChatFrameDeps(
895 load_snapshot=lambda _job_id, _history_limit: snapshot,
896 render_frame=lambda *_args: "",
897 handle_chat_message=lambda _job_id, _line: (True, ""),
898 capture_chat_command=lambda _job_id, _line: (
899 True,
900 "waiting for demo: what has it done so far?\nWaiting for the next worker step.",
901 ),
902 write_shell_state=lambda _state: None,
903 is_plain_chat_line=lambda _line: False,
904 page_click=lambda _x, _y, _right_view: None,
905 )
906
907 _keep_running, _snapshot, _job_id, notices, _right_view, _modal = _handle_chat_submit(
908 "/pause",
909 job_id="job_demo",
910 history_limit=12,
911 snapshot=snapshot,
912 notices=[],
913 right_view="status",
914 modal_view=None,
915 deps=deps,
916 )
917
918 lines = chat_pane_lines([], _display_chat_notices(notices), width=80, rows=6)
919 rendered = "\n".join(lines)
920 assert "waiting" in rendered
921 assert "waiting for demo" not in rendered
922 assert "Waiting for the next worker step" not in rendered
923 assert "NIPUX" not in rendered
924
925
926def test_chat_submit_new_refreshes_focused_job_from_shell_state():
927 old_snapshot = {"job_id": "old", "job": {"id": "old", "title": "old"}, "jobs": []}
928 new_snapshot = {"job_id": "new", "job": {"id": "new", "title": "new"}, "jobs": []}
929 loaded: list[str] = []
930
931 def load_snapshot(job_id, _history_limit):
932 loaded.append(job_id)
933 return new_snapshot if job_id == "" else old_snapshot
934
935 deps = _ChatFrameDeps(
936 load_snapshot=load_snapshot,
937 render_frame=lambda *_args: "",
938 handle_chat_message=lambda _job_id, _line: (True, ""),
939 capture_chat_command=lambda _job_id, _line: (True, "created new\nfocus set to new"),
940 write_shell_state=lambda _state: None,
941 is_plain_chat_line=lambda _line: False,
942 page_click=lambda _x, _y, _right_view: None,
943 )
944
945 keep_running, _snapshot, job_id, notices, _right_view, _modal = _handle_chat_submit(
946 "/new Build a durable workflow",
947 job_id="old",
948 history_limit=12,
949 snapshot=old_snapshot,
950 notices=[],
951 right_view="status",
952 modal_view=None,
953 deps=deps,
954 )
955
956 assert keep_running is True
957 assert job_id == "new"
958 assert loaded == [""]
959 assert "focus set to new" in "\n".join(notices)
960
961
962def test_workspace_chat_submit_new_keeps_workspace_chat_left_pane():
963 old_snapshot = {"job_id": WORKSPACE_CHAT_ID, "job": {"id": WORKSPACE_CHAT_ID, "title": "Nipux"}, "jobs": []}
964 workspace_snapshot = {
965 "job_id": WORKSPACE_CHAT_ID,
966 "job": {"id": WORKSPACE_CHAT_ID, "title": "Nipux"},
967 "right_job": {"id": "new", "title": "new worker"},
968 "jobs": [],
969 }
970 loaded: list[str] = []
971
972 def load_snapshot(job_id, _history_limit):
973 loaded.append(job_id)
974 return workspace_snapshot
975
976 deps = _ChatFrameDeps(
977 load_snapshot=load_snapshot,
978 render_frame=lambda *_args: "",
979 handle_chat_message=lambda _job_id, _line: (True, ""),
980 capture_chat_command=lambda _job_id, _line: (True, "created new\nfocus set to new"),
981 write_shell_state=lambda _state: None,
982 is_plain_chat_line=lambda _line: False,
983 page_click=lambda _x, _y, _right_view: None,
984 )
985
986 keep_running, snapshot, job_id, notices, _right_view, _modal = _handle_chat_submit(
987 "/new Build a durable workflow",
988 job_id=WORKSPACE_CHAT_ID,
989 history_limit=12,
990 snapshot=old_snapshot,
991 notices=[],
992 right_view="updates",
993 modal_view=None,
994 deps=deps,
995 )
996
997 assert keep_running is True
998 assert job_id == WORKSPACE_CHAT_ID
999 assert snapshot["right_job"]["title"] == "new worker"
1000 assert loaded == [WORKSPACE_CHAT_ID]
1001 assert "focus set to new" in "\n".join(notices)
1002
1003
1004def test_chat_render_failure_uses_safe_mode(capsys):
1005 snapshot = {"job_id": "job_demo", "job": {"id": "job_demo", "title": "demo"}, "jobs": []}
1006 deps = _ChatFrameDeps(
1007 load_snapshot=lambda _job_id, _history_limit: snapshot,
1008 render_frame=lambda *_args: (_ for _ in ()).throw(RuntimeError("bad chat frame")),
1009 handle_chat_message=lambda _job_id, _line: (True, ""),
1010 capture_chat_command=lambda _job_id, _line: (True, ""),
1011 write_shell_state=lambda _state: None,
1012 is_plain_chat_line=lambda _line: True,
1013 page_click=lambda _x, _y, _right_view: None,
1014 )
1015 notices: list[str] = []
1016
1017 frame = _safe_chat_render_frame(
1018 deps,
1019 snapshot=snapshot,
1020 buffer="hello",
1021 notices=notices,
1022 right_view="status",
1023 selected_control=0,
1024 editing_field=None,
1025 modal_view=None,
1026 previous_frame="",
1027 )
1028
1029 assert "safe mode" in frame
1030 assert "render failed" in "\n".join(notices)
1031 assert "bad chat frame" in capsys.readouterr().out
1032
1033
1034def test_chat_help_has_config_slash_commands_without_settings_page(monkeypatch, tmp_path, capsys):
1035 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1036 db = AgentDB(tmp_path / "state.db")
1037 try:
1038 job_id = db.create_job("Research topic", title="research")
1039 finally:
1040 db.close()
1041
1042 assert _chat_handle_line(job_id, "/help") is True
1043
1044 out = capsys.readouterr().out
1045 assert "Core workflow:" in out
1046 assert "/new OBJECTIVE create a job and start work" in out
1047 assert "/run resume/start the focused job" in out
1048 assert "/activity tool calls" in out
1049 assert "/settings" in out
1050 assert "/usage" in out
1051 assert "/config" in out
1052 assert "/outcomes" in out
1053 assert "/model MODEL" in out
1054 assert "/api-key KEY" in out
1055 assert "/timeout SECONDS" in out
1056 assert "/browser true|false" in out
1057 assert "/cli-access true|false" in out
1058 assert "/home PATH" in out
1059 assert "/digest-time HH:MM" in out
1060 assert "/shell" not in out
1061
1062
1063def test_chat_slash_palette_matches_public_chat_commands():
1064 palette = {command for command, _description in CHAT_SLASH_COMMANDS}
1065 assert len(palette) == len(CHAT_SLASH_COMMANDS)
1066 advertised = {
1067 "/jobs",
1068 "/focus",
1069 "/switch",
1070 "/new",
1071 "/delete",
1072 "/history",
1073 "/events",
1074 "/activity",
1075 "/outputs",
1076 "/updates",
1077 "/outcomes",
1078 "/status",
1079 "/usage",
1080 "/config",
1081 "/settings",
1082 "/health",
1083 "/help",
1084 "/artifacts",
1085 "/artifact",
1086 "/findings",
1087 "/tasks",
1088 "/roadmap",
1089 "/experiments",
1090 "/sources",
1091 "/memory",
1092 "/metrics",
1093 "/lessons",
1094 "/model",
1095 "/base-url",
1096 "/api-key",
1097 "/api-key-env",
1098 "/context",
1099 "/input-cost",
1100 "/output-cost",
1101 "/timeout",
1102 "/browser",
1103 "/web",
1104 "/cli-access",
1105 "/file-access",
1106 "/home",
1107 "/step-limit",
1108 "/output-chars",
1109 "/daily-digest",
1110 "/digest-time",
1111 "/doctor",
1112 "/run",
1113 "/start",
1114 "/restart",
1115 "/work",
1116 "/work-verbose",
1117 "/stop",
1118 "/pause",
1119 "/resume",
1120 "/cancel",
1121 "/learn",
1122 "/note",
1123 "/follow",
1124 "/digest",
1125 "/clear",
1126 "/exit",
1127 }
1128
1129 assert advertised <= palette
1130 assert "/shell" not in palette
1131
1132
1133def test_workspace_help_is_minimal_and_actionable(monkeypatch, tmp_path):
1134 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1135 _mark_test_model_ready()
1136
1137 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/help")
1138
1139 assert keep_running is True
1140 assert "Create: type a goal" in output
1141 assert "/new OBJECTIVE" in output
1142 assert "/settings" in output
1143 assert "Navigate:" in output
1144 assert "Common" not in output
1145 assert "/shell" not in output
1146
1147
1148def test_first_run_slash_palette_matches_setup_commands():
1149 palette = {command for command, _description in FIRST_RUN_SLASH_COMMANDS}
1150 assert len(palette) == len(FIRST_RUN_SLASH_COMMANDS)
1151
1152 advertised = {
1153 "/model",
1154 "/base-url",
1155 "/api-key",
1156 "/api-key-env",
1157 "/config",
1158 "/context",
1159 "/input-cost",
1160 "/output-cost",
1161 "/timeout",
1162 "/browser",
1163 "/web",
1164 "/cli-access",
1165 "/file-access",
1166 "/home",
1167 "/step-limit",
1168 "/output-chars",
1169 "/daily-digest",
1170 "/digest-time",
1171 "/doctor",
1172 "/init",
1173 "/help",
1174 "/clear",
1175 "/exit",
1176 }
1177
1178 assert advertised <= palette
1179 assert "/new" not in palette
1180 assert "/jobs" not in palette
1181 assert "/shell" not in palette
1182 assert "/settings" not in palette
1183
1184
1185def test_chat_settings_slash_commands_persist_config(monkeypatch, tmp_path, capsys):
1186 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1187 monkeypatch.setenv("NIPUX_TEST_KEY", "")
1188 db = AgentDB(tmp_path / "state.db")
1189 try:
1190 job_id = db.create_job("Research topic", title="research")
1191 finally:
1192 db.close()
1193
1194 assert _chat_handle_line(job_id, "/model provider/model") is True
1195 assert _chat_handle_line(job_id, "/base-url https://example.com/v1") is True
1196 assert _chat_handle_line(job_id, "/context 8192") is True
1197 assert _chat_handle_line(job_id, "/input-cost 0.10") is True
1198 assert _chat_handle_line(job_id, "/output-cost 0.20") is True
1199 assert _chat_handle_line(job_id, "/max-cost 15") is True
1200 assert _chat_handle_line(job_id, "/timeout 45") is True
1201 assert _chat_handle_line(job_id, "/browser false") is True
1202 assert _chat_handle_line(job_id, "/web false") is True
1203 assert _chat_handle_line(job_id, "/cli-access false") is True
1204 assert _chat_handle_line(job_id, "/file-access false") is True
1205 assert _chat_handle_line(job_id, "/step-limit 90") is True
1206 assert _chat_handle_line(job_id, "/output-chars 4096") is True
1207 assert _chat_handle_line(job_id, "/daily-digest false") is True
1208 assert _chat_handle_line(job_id, "/digest-time 08:30") is True
1209 assert _chat_handle_line(job_id, "/api-key-env NIPUX_TEST_KEY") is True
1210 assert _chat_handle_line(job_id, "/api-key sk-test-value") is True
1211 out = capsys.readouterr().out
1212
1213 assert "saved model.name = provider/model" in out
1214 assert "saved model.base_url = https://example.com/v1" in out
1215 assert "saved model.context_length = 8192" in out
1216 assert "saved model.input_cost_per_million = 0.1" in out
1217 assert "saved model.output_cost_per_million = 0.2" in out
1218 assert "saved runtime.max_job_cost_usd = 15.0" in out
1219 assert "saved model.request_timeout_seconds = 45.0" in out
1220 assert "saved tools.browser = False" in out
1221 assert "saved tools.web = False" in out
1222 assert "saved tools.shell = False" in out
1223 assert "saved tools.files = False" in out
1224 assert "saved runtime.max_step_seconds = 90" in out
1225 assert "saved runtime.artifact_inline_char_limit = 4096" in out
1226 assert "saved runtime.daily_digest_enabled = False" in out
1227 assert "saved runtime.daily_digest_time = 08:30" in out
1228 assert "saved model.api_key_env = NIPUX_TEST_KEY" in out
1229 assert "saved NIPUX_TEST_KEY" in out
1230 assert "sk-test-value" not in out
1231 assert _mode(tmp_path / "config.yaml") == 0o600
1232 assert _mode(tmp_path / ".env") == 0o600
1233 assert _config_field_value("model.name") == "provider/model"
1234 assert _config_field_value("model.base_url") == "https://example.com/v1"
1235 assert _config_field_value("model.context_length") == 8192
1236 assert _config_field_value("model.input_cost_per_million") == 0.1
1237 assert _config_field_value("model.output_cost_per_million") == 0.2
1238 assert _config_field_value("runtime.max_job_cost_usd") == 15.0
1239 assert _config_field_value("model.request_timeout_seconds") == 45.0
1240 assert _config_field_value("tools.browser") is False
1241 assert _config_field_value("tools.web") is False
1242 assert _config_field_value("tools.shell") is False
1243 assert _config_field_value("tools.files") is False
1244 assert _config_field_value("runtime.max_step_seconds") == 90
1245 assert _config_field_value("runtime.artifact_inline_char_limit") == 4096
1246 assert _config_field_value("runtime.daily_digest_enabled") is False
1247 assert _config_field_value("runtime.daily_digest_time") == "08:30"
1248 assert "NIPUX_TEST_KEY=sk-test-value" in (tmp_path / ".env").read_text(encoding="utf-8")
1249
1250
1251def test_chat_init_slash_command_does_not_crash(monkeypatch, tmp_path, capsys):
1252 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1253 db = AgentDB(tmp_path / "state.db")
1254 try:
1255 job_id = db.create_job("Research topic", title="research")
1256 finally:
1257 db.close()
1258
1259 assert _chat_handle_line(job_id, "/init") is True
1260
1261 out = capsys.readouterr().out
1262 assert "Wrote" in out
1263 assert (tmp_path / "config.yaml").exists()
1264 assert (tmp_path / ".env").exists()
1265
1266
1267def test_chat_config_slash_command_summarizes_runtime_without_secret(monkeypatch, tmp_path, capsys):
1268 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1269 monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1270 db = AgentDB(tmp_path / "state.db")
1271 try:
1272 job_id = db.create_job("Research topic", title="research")
1273 finally:
1274 db.close()
1275 (tmp_path / "config.yaml").write_text(
1276 """
1277model:
1278 name: provider/model
1279 base_url: https://example.com/v1
1280 api_key_env: NIPUX_TEST_KEY
1281 context_length: 8192
1282 request_timeout_seconds: 45
1283 input_cost_per_million: 0.1
1284 output_cost_per_million: 0.2
1285runtime:
1286 max_step_seconds: 90
1287 max_job_cost_usd: 15
1288 artifact_inline_char_limit: 4096
1289 daily_digest_enabled: false
1290 daily_digest_time: "08:30"
1291""",
1292 encoding="utf-8",
1293 )
1294
1295 assert _chat_handle_line(job_id, "/config") is True
1296
1297 out = capsys.readouterr().out
1298 assert "config" in out
1299 assert "model: provider/model" in out
1300 assert "endpoint: https://example.com/v1" in out
1301 assert "key: set (NIPUX_TEST_KEY)" in out
1302 assert "context: 8192" in out
1303 assert "cost rates: input $0.1 / output $0.2 per 1M tokens" in out
1304 assert "job cost limit: $15" in out
1305 assert "sk-test-value" not in out
1306
1307
1308def test_chat_usage_slash_command_reports_tokens(monkeypatch, tmp_path, capsys):
1309 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1310 db = AgentDB(tmp_path / "state.db")
1311 try:
1312 job_id = db.create_job("Research topic", title="research")
1313 db.append_event(
1314 job_id,
1315 event_type="loop",
1316 title="message_end",
1317 metadata={
1318 "usage": {
1319 "prompt_tokens": 1000,
1320 "completion_tokens": 250,
1321 "total_tokens": 1250,
1322 "cost": 0.0042,
1323 }
1324 },
1325 )
1326 finally:
1327 db.close()
1328
1329 assert _chat_handle_line(job_id, "/usage") is True
1330
1331 out = capsys.readouterr().out
1332 assert "usage research" in out
1333 assert "tokens: total=1.2K prompt=1.0K output=250" in out
1334 assert "cost=$0.0042" in out
1335
1336
1337def test_chat_usage_estimates_cost_from_configured_rates(monkeypatch, tmp_path, capsys):
1338 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1339 (tmp_path / "config.yaml").write_text(
1340 """
1341model:
1342 name: provider/model
1343 base_url: https://example.com/v1
1344 input_cost_per_million: 1.0
1345 output_cost_per_million: 2.0
1346""",
1347 encoding="utf-8",
1348 )
1349 db = AgentDB(tmp_path / "state.db")
1350 try:
1351 job_id = db.create_job("Research topic", title="research")
1352 db.append_event(
1353 job_id,
1354 event_type="loop",
1355 title="message_end",
1356 metadata={
1357 "usage": {
1358 "prompt_tokens": 1000,
1359 "completion_tokens": 500,
1360 "total_tokens": 1500,
1361 "estimated": True,
1362 }
1363 },
1364 )
1365 finally:
1366 db.close()
1367
1368 assert _chat_handle_line(job_id, "/usage") is True
1369
1370 out = capsys.readouterr().out
1371 assert "cost=~$0.0020" in out
1372
1373
1374def test_chat_usage_shows_configured_job_cost_limit(monkeypatch, tmp_path, capsys):
1375 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1376 (tmp_path / "config.yaml").write_text(
1377 """
1378model:
1379 name: provider/model
1380 base_url: https://example.com/v1
1381runtime:
1382 max_job_cost_usd: 5
1383""",
1384 encoding="utf-8",
1385 )
1386 db = AgentDB(tmp_path / "state.db")
1387 try:
1388 job_id = db.create_job("Research topic", title="research")
1389 db.append_event(
1390 job_id,
1391 event_type="loop",
1392 title="message_end",
1393 metadata={"usage": {"prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500, "cost": 1.25}},
1394 )
1395 finally:
1396 db.close()
1397
1398 assert _chat_handle_line(job_id, "/usage") is True
1399
1400 out = capsys.readouterr().out
1401 assert "limit: max job cost=$5 remaining=$3.7500" in out
1402
1403
1404def test_first_run_settings_slash_commands_persist_config(monkeypatch, tmp_path):
1405 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1406
1407 action, payload = _handle_first_run_frame_line("/model provider/model")
1408
1409 assert action == "notice"
1410 assert isinstance(payload, list)
1411 assert any("saved model.name = provider/model" in line for line in payload)
1412 assert _config_field_value("model.name") == "provider/model"
1413
1414
1415def test_first_run_local_connector_action_sets_generic_local_endpoint(monkeypatch, tmp_path):
1416 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1417
1418 action, payload = _handle_first_run_action("preset:local")
1419
1420 assert action == "notice"
1421 assert isinstance(payload, list)
1422 assert any("saved model.name = local-model" in line for line in payload)
1423 assert any("saved model.base_url = http://localhost:8000/v1" in line for line in payload)
1424 assert any("then run Doctor" in line for line in payload)
1425 assert _config_field_value("model.name") == "local-model"
1426 assert _config_field_value("model.base_url") == "http://localhost:8000/v1"
1427
1428
1429def test_first_run_access_action_toggles_generic_tools(monkeypatch, tmp_path):
1430 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1431
1432 action, payload = _handle_first_run_action("toggle:tools.shell")
1433
1434 assert action == "notice"
1435 assert isinstance(payload, list)
1436 assert any("saved tools.shell = False" in line for line in payload)
1437 assert _config_field_value("tools.shell") is False
1438
1439
1440def test_first_run_doctor_success_opens_workspace_chat(monkeypatch, tmp_path):
1441 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1442
1443 def fake_verify():
1444 _mark_test_model_ready()
1445 return ["ok model_setup verified"]
1446
1447 monkeypatch.setattr("nipux_cli.cli._verify_model_setup_from_first_run", fake_verify)
1448
1449 action, payload = _handle_first_run_action("doctor")
1450
1451 assert action == "open"
1452 assert payload == WORKSPACE_CHAT_ID
1453
1454
1455def test_first_run_open_workspace_action_requires_verified_model(monkeypatch, tmp_path):
1456 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1457
1458 action, payload = _handle_first_run_action("open_workspace")
1459
1460 assert action == "notice"
1461 assert "Run Doctor first" in str(payload)
1462
1463
1464def test_first_run_open_workspace_action_opens_after_verified_model(monkeypatch, tmp_path):
1465 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1466 _mark_test_model_ready()
1467
1468 action, payload = _handle_first_run_action("open_workspace")
1469
1470 assert action == "open"
1471 assert payload == WORKSPACE_CHAT_ID
1472
1473
1474def test_workspace_frame_snapshot_exists_without_jobs(monkeypatch, tmp_path):
1475 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1476 _mark_test_model_ready()
1477
1478 snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1479
1480 assert snapshot["job_id"] == WORKSPACE_CHAT_ID
1481 assert snapshot["job"]["kind"] == "workspace"
1482 assert snapshot["jobs"] == []
1483
1484 frame = _build_chat_frame(snapshot, "", [], width=118, height=28)
1485 assert "Type a goal in plain English to start a worker" in frame
1486 assert "Type a goal to create the first worker" in frame
1487 assert "Enter sends" not in frame
1488
1489
1490def test_workspace_frame_right_pane_tracks_focused_worker(monkeypatch, tmp_path):
1491 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1492 _mark_test_model_ready()
1493 db = AgentDB(tmp_path / "state.db")
1494 try:
1495 job_id = db.create_job("Research browser automation libraries", title="browser research")
1496 db.add_artifact(
1497 job_id=job_id,
1498 path=tmp_path / "comparison.md",
1499 sha256="abc123",
1500 artifact_type="text",
1501 title="Browser Automation Comparison Draft",
1502 summary="Checkpoint: saved comparison draft.",
1503 )
1504 _write_shell_state({"focus_job_id": job_id})
1505 finally:
1506 db.close()
1507
1508 snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1509 frame = _build_chat_frame(snapshot, "", [], width=128, height=28, right_view="updates")
1510
1511 assert snapshot["job"]["kind"] == "workspace"
1512 assert snapshot["right_job"]["title"] == "browser research"
1513 assert "browser research" in frame
1514 assert "Browser Automation" in frame
1515 assert "Comparison Draft" in frame
1516
1517
1518def test_workspace_slash_new_creates_and_focuses_job(monkeypatch, tmp_path):
1519 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1520 _mark_test_model_ready()
1521 started = {}
1522
1523 def fake_start(**kwargs):
1524 started.update(kwargs)
1525
1526 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1527
1528 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new Build a durable workflow")
1529
1530 assert keep_running is True
1531 assert "Created worker job: Build a durable workflow" in output
1532 assert "Started worker" in output
1533 assert started["poll_seconds"] == 0.0
1534 assert started["quiet"] is True
1535 db = AgentDB(tmp_path / "state.db")
1536 try:
1537 jobs = db.list_jobs()
1538 assert len(jobs) == 1
1539 assert jobs[0]["title"] == "Build a durable workflow"
1540 assert _read_shell_state().get("focus_job_id") == jobs[0]["id"]
1541 finally:
1542 db.close()
1543
1544
1545def test_workspace_chat_job_dossier_includes_progress_outputs_and_outcomes(tmp_path):
1546 db = AgentDB(tmp_path / "state.db")
1547 try:
1548 job_id = db.create_job("Research and validate a workflow", title="workflow research")
1549 run_id = db.start_run(job_id, model="fake")
1550 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
1551 db.finish_step(step_id, status="completed", output_data={"artifact_id": "art_demo"})
1552 db.add_artifact(
1553 job_id=job_id,
1554 path=tmp_path / "workflow.md",
1555 sha256="abc123",
1556 artifact_type="text",
1557 title="Workflow Evidence",
1558 summary="saved research notes",
1559 )
1560 db.append_task_record(job_id, title="Compare source-backed options", status="active", output_contract="research")
1561 db.append_source_record(job_id, "https://example.com/source", source_type="web")
1562 db.append_finding_record(job_id, name="Useful research finding", source_url="https://example.com/source")
1563 db.append_experiment_record(
1564 job_id,
1565 title="Validation check",
1566 status="measured",
1567 metric_name="pass_rate",
1568 metric_value=0.9,
1569 )
1570 job = db.get_job(job_id)
1571 dossier = _workspace_chat_job_dossier(db, [job])
1572 finally:
1573 db.close()
1574
1575 assert "workflow research" in dossier
1576 assert "outputs=1" in dossier
1577 assert "findings=1" in dossier
1578 assert "sources=1" in dossier
1579 assert "experiments=1" in dossier
1580 assert "active task: active Compare source-backed options [research]" in dossier
1581 assert "latest outputs: Workflow Evidence" in dossier
1582 assert "recent outcomes:" in dossier
1583
1584
1585def test_workspace_run_with_objective_creates_worker_when_no_job_matches(monkeypatch, tmp_path):
1586 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1587 _mark_test_model_ready()
1588 started = {}
1589
1590 def fake_start(**kwargs):
1591 started.update(kwargs)
1592
1593 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1594 monkeypatch.setattr(
1595 "nipux_cli.cli._refine_job_objective_for_worker",
1596 lambda *, message, objective: objective,
1597 )
1598
1599 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/run research browser automation libraries")
1600
1601 assert keep_running is True
1602 assert "Created worker job" in output
1603 assert started["quiet"] is True
1604 db = AgentDB(tmp_path / "state.db")
1605 try:
1606 jobs = db.list_jobs()
1607 assert len(jobs) == 1
1608 assert jobs[0]["title"] == "research browser automation libraries"
1609 finally:
1610 db.close()
1611
1612
1613def test_workspace_run_with_existing_job_does_not_create_duplicate(monkeypatch, tmp_path):
1614 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1615 _mark_test_model_ready()
1616 db = AgentDB(tmp_path / "state.db")
1617 try:
1618 job_id = db.create_job("research browser automation libraries", title="research browser automation libraries")
1619 finally:
1620 db.close()
1621 started = {}
1622
1623 def fake_start(**kwargs):
1624 started.update(kwargs)
1625
1626 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1627
1628 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/run research browser automation libraries")
1629
1630 assert keep_running is True
1631 assert "Created worker job" not in output
1632 assert "focus set" in output
1633 assert started["quiet"] is True
1634 db = AgentDB(tmp_path / "state.db")
1635 try:
1636 assert len(db.list_jobs()) == 1
1637 assert _read_shell_state().get("focus_job_id") == job_id
1638 finally:
1639 db.close()
1640
1641
1642def test_workspace_start_with_existing_job_runs_without_parser_error(monkeypatch, tmp_path):
1643 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1644 _mark_test_model_ready()
1645 db = AgentDB(tmp_path / "state.db")
1646 try:
1647 job_id = db.create_job("research browser automation libraries", title="research browser automation libraries")
1648 finally:
1649 db.close()
1650 started = {}
1651
1652 def fake_start(**kwargs):
1653 started.update(kwargs)
1654
1655 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1656
1657 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/start research browser automation libraries")
1658
1659 assert keep_running is True
1660 assert "command exited" not in output
1661 assert "Created worker job" not in output
1662 assert "focus set" in output
1663 assert started["quiet"] is True
1664 db = AgentDB(tmp_path / "state.db")
1665 try:
1666 assert len(db.list_jobs()) == 1
1667 assert _read_shell_state().get("focus_job_id") == job_id
1668 finally:
1669 db.close()
1670
1671
1672def test_workspace_slash_new_without_objective_is_minimal(monkeypatch, tmp_path):
1673 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1674 _mark_test_model_ready()
1675
1676 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new")
1677
1678 assert keep_running is True
1679 assert output.strip() == "usage: /new OBJECTIVE"
1680 assert "for example" not in output.lower()
1681
1682
1683def test_workspace_slash_new_hides_model_preflight_noise(monkeypatch, tmp_path):
1684 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1685 _mark_test_model_ready()
1686
1687 def fake_start(**_kwargs):
1688 print("model is not ready; daemon not started")
1689 print(" fail model_endpoint: http://localhost:8000/v1/models: connection refused")
1690 print("Run `nipux doctor --check-model` after fixing the model configuration.")
1691
1692 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1693
1694 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/new Build a durable workflow")
1695
1696 assert keep_running is True
1697 assert "Created worker job: Build a durable workflow" in output
1698 assert "Worker is waiting for a working model" in output
1699 assert "model_endpoint" not in output
1700 assert "connection refused" not in output
1701
1702
1703def test_workspace_settings_slash_commands_persist_config(monkeypatch, tmp_path):
1704 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1705 _mark_test_model_ready()
1706
1707 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/model provider/model")
1708
1709 assert keep_running is True
1710 assert "saved model.name = provider/model" in output
1711 assert _config_field_value("model.name") == "provider/model"
1712
1713 db = AgentDB(tmp_path / "state.db")
1714 try:
1715 assert db.list_jobs() == []
1716 finally:
1717 db.close()
1718
1719
1720def test_workspace_settings_slash_command_summarizes_config(monkeypatch, tmp_path):
1721 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1722 monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1723 _mark_test_model_ready()
1724 (tmp_path / "config.yaml").write_text(
1725 """
1726model:
1727 name: provider/model
1728 base_url: https://example.com/v1
1729 api_key_env: NIPUX_TEST_KEY
1730 context_length: 8192
1731 request_timeout_seconds: 45
1732""",
1733 encoding="utf-8",
1734 )
1735
1736 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "/settings")
1737
1738 assert keep_running is True
1739 assert "config" in output
1740 assert "model: provider/model" in output
1741 assert "endpoint: https://example.com/v1" in output
1742 assert "key: set (NIPUX_TEST_KEY)" in output
1743 assert "sk-test-value" not in output
1744
1745
1746def test_workspace_natural_control_phrase_uses_mapped_command(monkeypatch, tmp_path):
1747 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1748 _mark_test_model_ready()
1749
1750 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "change model")
1751
1752 assert keep_running is True
1753 assert "model.name =" in output
1754 assert "usage: /model MODEL" in output
1755
1756 db = AgentDB(tmp_path / "state.db")
1757 try:
1758 assert db.list_jobs() == []
1759 finally:
1760 db.close()
1761
1762
1763def test_workspace_natural_settings_phrase_opens_settings_summary(monkeypatch, tmp_path):
1764 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1765 monkeypatch.setenv("NIPUX_TEST_KEY", "sk-test-value")
1766 _mark_test_model_ready()
1767 (tmp_path / "config.yaml").write_text(
1768 """
1769model:
1770 name: provider/model
1771 base_url: https://example.com/v1
1772 api_key_env: NIPUX_TEST_KEY
1773""",
1774 encoding="utf-8",
1775 )
1776
1777 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "settings")
1778
1779 assert keep_running is True
1780 assert "config" in output
1781 assert "model: provider/model" in output
1782 assert "key: set (NIPUX_TEST_KEY)" in output
1783 assert "usage: /model MODEL" not in output
1784 assert "sk-test-value" not in output
1785
1786
1787def test_workspace_how_to_start_job_question_uses_local_help(monkeypatch, tmp_path):
1788 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1789 _mark_test_model_ready()
1790
1791 def fail_model(_line):
1792 raise AssertionError("model should not be called for local help")
1793
1794 monkeypatch.setattr("nipux_cli.cli._reply_to_workspace_chat", fail_model)
1795
1796 keep_running, output = _capture_chat_command(WORKSPACE_CHAT_ID, "how do I start a job?")
1797
1798 assert keep_running is True
1799 assert "Create: type a goal" in output
1800 assert "/new OBJECTIVE" in output
1801
1802
1803def test_workspace_chat_connection_error_is_operator_friendly(monkeypatch, tmp_path):
1804 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1805 _mark_test_model_ready()
1806
1807 def raise_connection(_line):
1808 raise RuntimeError("APIConnectionError: Connection error.")
1809
1810 monkeypatch.setattr("nipux_cli.cli._reply_to_workspace_chat", raise_connection)
1811
1812 ok, message = _handle_workspace_chat_message("hello", quiet=True)
1813
1814 assert ok is True
1815 assert message == (
1816 "Model endpoint is unreachable. Check /base-url or start the configured model server, then run /doctor."
1817 )
1818 snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
1819 bodies = "\n".join(str(event.get("body") or "") for event in snapshot["events"])
1820 assert "APIConnectionError" not in bodies
1821 assert "Model endpoint is unreachable" in bodies
1822
1823
1824def test_chat_start_reports_model_provider_not_ready(monkeypatch, tmp_path, capsys):
1825 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1826 _mark_test_model_ready()
1827 db = AgentDB(tmp_path / "state.db")
1828 try:
1829 job_id = db.create_job("Research topic", title="research")
1830 finally:
1831 db.close()
1832
1833 def fake_start(**_kwargs):
1834 print("model is not ready; daemon not started")
1835 print(" fail model_endpoint: http://localhost:8000/v1/models: connection refused")
1836
1837 monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
1838
1839 assert _chat_handle_line(job_id, "/start") is True
1840
1841 out = capsys.readouterr().out
1842 assert "worker not started: model provider is not ready. Use /settings, then /doctor." in out
1843 assert "model_endpoint" not in out
1844 assert "connection refused" not in out
1845
1846
1847def test_chat_doctor_checks_configured_model(monkeypatch, tmp_path, capsys):
1848 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1849 _mark_test_model_ready()
1850 db = AgentDB(tmp_path / "state.db")
1851 try:
1852 job_id = db.create_job("Research topic", title="research")
1853 finally:
1854 db.close()
1855
1856 seen = {}
1857
1858 def fake_doctor(*, config, check_model):
1859 seen["check_model"] = check_model
1860 return [Check("model_generation", True, "ok")]
1861
1862 monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)
1863
1864 assert _chat_handle_line(job_id, "/doctor") is True
1865
1866 assert seen["check_model"] is True
1867 assert "model_generation" in capsys.readouterr().out
1868
1869
1870def test_workspace_chat_control_phrase_runs_job_command(monkeypatch, tmp_path):
1871 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1872 _mark_test_model_ready()
1873 db = AgentDB(tmp_path / "state.db")
1874 try:
1875 job_id = db.create_job("Research topic", title="research")
1876 finally:
1877 db.close()
1878
1879 keep_running, message = _handle_workspace_chat_message("stop the job", quiet=True)
1880
1881 assert keep_running is True
1882 assert "paused research" in message
1883 db = AgentDB(tmp_path / "state.db")
1884 try:
1885 job = db.get_job(job_id)
1886 assert job["status"] == "paused"
1887 events = _read_shell_state().get("workspace_chat_events") or []
1888 assert any(
1889 event["event_type"] == "agent_message"
1890 and event["metadata"].get("command") == "/pause"
1891 and "paused research" in event["body"]
1892 for event in events
1893 )
1894 finally:
1895 db.close()
1896
1897
1898def test_shell_ls_alias_lists_jobs_instead_of_steering(monkeypatch, tmp_path, capsys):
1899 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1900 db = AgentDB(tmp_path / "state.db")
1901 try:
1902 db.create_job("Research topic", title="research")
1903 finally:
1904 db.close()
1905
1906 assert _run_shell_line("ls") is True
1907
1908 out = capsys.readouterr().out
1909 assert "research" in out
1910 assert "queued for" not in out
1911
1912
1913def test_roadmap_command_renders_roadmap(monkeypatch, tmp_path, capsys):
1914 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1915 db = AgentDB(tmp_path / "state.db")
1916 try:
1917 job_id = db.create_job("Broad work", title="broad")
1918 db.append_roadmap_record(
1919 job_id,
1920 title="Broad Roadmap",
1921 status="active",
1922 current_milestone="Foundation",
1923 milestones=[{
1924 "title": "Foundation",
1925 "status": "validating",
1926 "validation_status": "pending",
1927 "features": [{"title": "First feature", "status": "done"}],
1928 }],
1929 )
1930 finally:
1931 db.close()
1932
1933 main(["roadmap", "broad"])
1934
1935 out = capsys.readouterr().out
1936 assert "roadmap broad" in out
1937 assert "Broad Roadmap" in out
1938 assert "Foundation" in out
1939 assert "validation=pending" in out
1940
1941
1942def test_shell_focus_controls_default_steering_job(monkeypatch, tmp_path, capsys):
1943 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1944 db = AgentDB(tmp_path / "state.db")
1945 try:
1946 first = db.create_job("Research topic", title="first research")
1947 second = db.create_job("Find investors", title="investor search")
1948 finally:
1949 db.close()
1950
1951 assert _run_shell_line("focus investor") is True
1952 assert _run_shell_line("prioritize Toronto findings") is True
1953
1954 out = capsys.readouterr().out
1955 db = AgentDB(tmp_path / "state.db")
1956 try:
1957 first_job = db.get_job(first)
1958 second_job = db.get_job(second)
1959 assert "focus set:" in out
1960 assert first_job["metadata"].get("operator_messages") is None
1961 assert second_job["metadata"]["operator_messages"][-1]["message"] == "prioritize Toronto findings"
1962 finally:
1963 db.close()
1964
1965
1966def test_shell_rename_updates_job_title_and_program(monkeypatch, tmp_path, capsys):
1967 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1968 db = AgentDB(tmp_path / "state.db")
1969 try:
1970 job_id = db.create_job("Research topic", title="old title")
1971 program = tmp_path / "jobs" / job_id / "program.md"
1972 program.parent.mkdir(parents=True, exist_ok=True)
1973 program.write_text("# old title\n\nBody\n", encoding="utf-8")
1974 finally:
1975 db.close()
1976
1977 assert _run_shell_line("rename old title --title new title") is True
1978
1979 out = capsys.readouterr().out
1980 db = AgentDB(tmp_path / "state.db")
1981 try:
1982 job = db.get_job(job_id)
1983 assert "renamed old title -> new title" in out
1984 assert job["title"] == "new title"
1985 assert program.read_text(encoding="utf-8").startswith("# new title\n")
1986 finally:
1987 db.close()
1988
1989
1990def test_shell_delete_removes_job_and_artifact_dir(monkeypatch, tmp_path, capsys):
1991 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
1992 db = AgentDB(tmp_path / "state.db")
1993 try:
1994 job_id = db.create_job("Research topic", title="delete me")
1995 run_id = db.start_run(job_id, model="fake")
1996 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
1997 store = ArtifactStore(tmp_path, db=db)
1998 stored = store.write_text(
1999 job_id=job_id,
2000 run_id=run_id,
2001 step_id=step_id,
2002 title="Artifact",
2003 summary="saved",
2004 content="content",
2005 )
2006 artifact_path = stored.path
2007 finally:
2008 db.close()
2009
2010 assert artifact_path.exists()
2011 assert _run_shell_line("delete delete me") is True
2012
2013 out = capsys.readouterr().out
2014 db = AgentDB(tmp_path / "state.db")
2015 try:
2016 assert "deleted delete me" in out
2017 try:
2018 db.get_job(job_id)
2019 except KeyError:
2020 pass
2021 else:
2022 raise AssertionError("job still exists after shell delete")
2023 assert not artifact_path.exists()
2024 assert not (tmp_path / "jobs" / job_id).exists()
2025 finally:
2026 db.close()
2027
2028
2029def test_shell_help_has_no_examples_or_control_run_sections(capsys):
2030 _print_shell_help()
2031
2032 out = capsys.readouterr().out
2033 assert "Examples:" not in out
2034 assert "\nControl\n" not in out
2035 assert "\nRun\n" not in out
2036 assert "delete JOB_TITLE" in out
2037 assert "usage [JOB_TITLE]" in out
2038 assert "update" in out
2039 assert "Jobs" in out
2040 assert "Worker" in out
2041
2042
2043def test_update_checkout_falls_back_to_tool_install_for_non_git_path(monkeypatch, tmp_path):
2044 monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
2045
2046 def runner(command):
2047 assert command == [
2048 "/usr/bin/uv",
2049 "tool",
2050 "install",
2051 "--force",
2052 "--upgrade",
2053 "--reinstall",
2054 "--refresh",
2055 "git+https://github.com/nipuxx/agent-cli.git@main",
2056 ]
2057 return subprocess.CompletedProcess(command, 0, stdout="Installed nipux\n")
2058
2059 code, lines = _update_checkout(path=tmp_path, command_runner=runner)
2060
2061 assert code == 0
2062 rendered = "\n".join(lines)
2063 assert "is not a source checkout; updating the installed Nipux tool instead" in rendered
2064 assert "not a git checkout" not in rendered
2065 assert "Update complete" in rendered
2066
2067
2068def test_update_checkout_upgrades_uv_tool_when_installed_package(monkeypatch):
2069 monkeypatch.setattr("nipux_cli.updater.find_checkout_root", lambda: None)
2070 monkeypatch.setattr("shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
2071 calls: list[tuple[str, ...]] = []
2072
2073 def runner(command):
2074 calls.append(tuple(command))
2075 return subprocess.CompletedProcess(command, 0, stdout="Resolved 1 package\nInstalled nipux\n")
2076
2077 code, lines = _update_checkout(command_runner=runner)
2078
2079 assert code == 0
2080 assert calls == [
2081 (
2082 "/usr/bin/uv",
2083 "tool",
2084 "install",
2085 "--force",
2086 "--upgrade",
2087 "--reinstall",
2088 "--refresh",
2089 "git+https://github.com/nipuxx/agent-cli.git@main",
2090 )
2091 ]
2092 rendered = "\n".join(lines)
2093 assert "not a git checkout" not in rendered
2094 assert "Updating installed Nipux command" in rendered
2095 assert "Nipux command refreshed from source" in rendered
2096 assert "Update complete" in rendered
2097
2098
2099def test_update_checkout_fast_forwards_git_checkout(tmp_path):
2100 repo = tmp_path / "repo"
2101 repo.mkdir()
2102 (repo / ".git").mkdir()
2103 calls: list[tuple[str, ...]] = []
2104 rev_calls = 0
2105
2106 def runner(command, cwd):
2107 nonlocal rev_calls
2108 assert cwd == repo
2109 calls.append(tuple(command))
2110 if command == ["git", "rev-parse", "--show-toplevel"]:
2111 return subprocess.CompletedProcess(command, 0, stdout=str(repo) + "\n")
2112 if command == ["git", "rev-parse", "--short", "HEAD"]:
2113 rev_calls += 1
2114 return subprocess.CompletedProcess(command, 0, stdout=("aaa111\n" if rev_calls == 1 else "bbb222\n"))
2115 if command == ["git", "branch", "--show-current"]:
2116 return subprocess.CompletedProcess(command, 0, stdout="main\n")
2117 if command == ["git", "status", "--porcelain"]:
2118 return subprocess.CompletedProcess(command, 0, stdout="")
2119 if command == ["git", "pull", "--ff-only"]:
2120 return subprocess.CompletedProcess(command, 0, stdout="Fast-forward\n")
2121 raise AssertionError(f"unexpected command: {command}")
2122
2123 code, lines = _update_checkout(path=repo, runner=runner)
2124
2125 assert code == 0
2126 assert ("git", "pull", "--ff-only") in calls
2127 rendered = "\n".join(lines)
2128 assert "Fast-forward" in rendered
2129 assert "aaa111 -> bbb222" in rendered
2130
2131
2132def test_update_checkout_verifies_installed_command(monkeypatch):
2133 monkeypatch.setattr("nipux_cli.updater.find_checkout_root", lambda: None)
2134
2135 def which(name):
2136 if name == "uv":
2137 return "/usr/bin/uv"
2138 if name == "nipux":
2139 return "/Users/me/.local/bin/nipux"
2140 return None
2141
2142 monkeypatch.setattr("shutil.which", which)
2143 calls: list[tuple[str, ...]] = []
2144
2145 def runner(command):
2146 calls.append(tuple(command))
2147 if command == ["/Users/me/.local/bin/nipux", "--version"]:
2148 return subprocess.CompletedProcess(command, 0, stdout="nipux 0.1.0\n")
2149 return subprocess.CompletedProcess(command, 0, stdout="Installed nipux\n")
2150
2151 code, lines = _update_checkout(command_runner=runner)
2152
2153 assert code == 0
2154 assert calls[-1] == ("/Users/me/.local/bin/nipux", "--version")
2155 assert "Verified: nipux 0.1.0" in "\n".join(lines)
2156
2157
2158def test_update_command_reports_no_restart_when_daemon_is_stopped(monkeypatch, tmp_path, capsys):
2159 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2160 monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2161 monkeypatch.setattr(
2162 "nipux_cli.cli.daemon_lock_status",
2163 lambda _path: {"running": False, "metadata": {}},
2164 )
2165
2166 args = build_parser().parse_args(["update"])
2167 args.func(args)
2168
2169 out = capsys.readouterr().out
2170 assert "Update complete." in out
2171 assert "No daemon is running; no restart needed." in out
2172
2173
2174def test_update_command_restarts_running_daemon(monkeypatch, tmp_path, capsys):
2175 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2176 monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2177 monkeypatch.setattr(
2178 "nipux_cli.cli.daemon_lock_status",
2179 lambda _path: {"running": True, "metadata": {"pid": 123}},
2180 )
2181 restarted = {}
2182
2183 def fake_restart(args):
2184 restarted["wait"] = args.wait
2185 restarted["quiet"] = args.quiet
2186 print("restart ok")
2187
2188 monkeypatch.setattr("nipux_cli.cli.cmd_restart", fake_restart)
2189
2190 args = build_parser().parse_args(["update"])
2191 args.func(args)
2192
2193 out = capsys.readouterr().out
2194 assert "Restarting running daemon" in out
2195 assert "restart ok" in out
2196 assert restarted == {"wait": 5.0, "quiet": True}
2197
2198
2199def test_update_command_no_restart_flag_skips_running_daemon(monkeypatch, tmp_path, capsys):
2200 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2201 monkeypatch.setattr("nipux_cli.cli.update_checkout", lambda **_kwargs: (0, ["Update complete."]))
2202 monkeypatch.setattr(
2203 "nipux_cli.cli.daemon_lock_status",
2204 lambda _path: {"running": True, "metadata": {"pid": 123}},
2205 )
2206 monkeypatch.setattr("nipux_cli.cli.cmd_restart", lambda _args: (_ for _ in ()).throw(AssertionError("restart")))  # generator .throw() lets the lambda raise if a restart is attempted
2207
2208 args = build_parser().parse_args(["update", "--no-restart"])
2209 args.func(args)
2210
2211 assert "Daemon restart skipped by --no-restart." in capsys.readouterr().out
2212
2213
2214def test_uninstall_dry_run_removes_installed_tool_by_default(monkeypatch, tmp_path, capsys):
2215 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2216
2217 args = build_parser().parse_args(["uninstall", "--dry-run"])
2218 args.func(args)
2219
2220 out = capsys.readouterr().out
2221 assert "would remove" in out
2222 assert "would run uv tool uninstall nipux" in out
2223 assert "runtime removed" not in out
2224
2225
2226def test_uninstall_keep_tool_skips_tool_removal(monkeypatch, tmp_path, capsys):
2227 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2228
2229 args = build_parser().parse_args(["uninstall", "--dry-run", "--keep-tool"])
2230 args.func(args)
2231
2232 out = capsys.readouterr().out
2233 assert "would remove" in out
2234 assert "uv tool uninstall nipux" not in out
2235
2236
2237def test_uninstall_runtime_skips_missing_systemd_service_without_runner_noise(monkeypatch, tmp_path):
2238 from nipux_cli.uninstall import uninstall_runtime
2239
2240 monkeypatch.setenv("HOME", str(tmp_path))
2241 runtime = tmp_path / ".nipux"
2242 runtime.mkdir()
2243 calls = []
2244
2245 def which(name):
2246 if name == "systemctl":
2247 return "/bin/systemctl"
2248 return None
2249
2250 def runner(command, **_kwargs):
2251 calls.append(command)
2252 return subprocess.CompletedProcess(command, 0, stdout="", stderr="")
2253
2254 monkeypatch.setattr("nipux_cli.uninstall.shutil.which", which)
2255
2256 lines = uninstall_runtime(runtime_home=runtime, dry_run=False, runner=runner)
2257
2258 assert calls == []
2259 rendered = "\n".join(lines)
2260 assert "disabled systemd" not in rendered
2261 assert "no installed service files found" in rendered
2262 assert f"removed {runtime}" in rendered
2263
2264
2265def test_chat_clear_does_not_queue_operator_message(monkeypatch, tmp_path, capsys):
2266 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2267 db = AgentDB(tmp_path / "state.db")
2268 try:
2269 job_id = db.create_job("Research topic", title="research")
2270 finally:
2271 db.close()
2272
2273 assert _chat_handle_line(job_id, "clear") is True
2274
2275 out = capsys.readouterr().out
2276 db = AgentDB(tmp_path / "state.db")
2277 try:
2278 job = db.get_job(job_id)
2279 assert "\033[2J\033[H" in out  # ESC[2J + ESC[H: clear the screen and home the cursor
2280 assert job["metadata"].get("operator_messages") is None
2281 finally:
2282 db.close()
2283
2284
2285def test_minimal_live_event_line_summarizes_tool_steps():
2286 line = _minimal_live_event_line(
2287 {
2288 "event_type": "tool_call",
2289 "title": "shell_exec",
2290 "body": "",
2291 "metadata": {"input": {"arguments": {"command": "ssh server nvidia-smi"}}},
2292 }
2293 )
2294
2295 assert line == "start shell ssh server nvidia-smi"
2296
2297
2298def test_chat_frame_is_bounded_and_has_composer():
2299 snapshot = {
2300 "job_id": "job_demo",
2301 "job": {
2302 "id": "job_demo",
2303 "title": "demo job",
2304 "objective": "keep a generic long-running job visible",
2305 "status": "running",
2306 "kind": "generic",
2307 "metadata": {"task_queue": [{"status": "active", "title": "Draft next deliverable", "priority": 7}]},
2308 },
2309 "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2310 "steps": [
2311 {
2312 "step_no": 3,
2313 "status": "completed",
2314 "kind": "tool",
2315 "tool_name": "web_search",
2316 "summary": "web_search returned sources",
2317 }
2318 ],
2319 "artifacts": [{"id": "art_demo"}],
2320 "memory_entries": [{}],
2321 "events": [
2322 {
2323 "event_type": "agent_message",
2324 "title": "plan",
2325 "body": "I will plan this.\nPlan:\n- one\n- two\nQuestions:\n- answer?",
2326 "metadata": {},
2327 },
2328 {
2329 "event_type": "task",
2330 "title": "internal task",
2331 "body": "internal task body",
2332 "metadata": {},
2333 },
2334 {
2335 "event_type": "tool_result",
2336 "title": "web_search",
2337 "body": "web_search query='demo' returned 1 results",
2338 "metadata": {"status": "completed", "input": {"arguments": {"query": "demo"}}},
2339 }
2340 ],
2341 "daemon": {"running": True, "metadata": {"pid": 123}},
2342 "model": "model/demo",
2343 "base_url": "https://openrouter.ai/api/v1",
2344 "context_length": 8192,
2345 "token_usage": {
2346 "calls": 2,
2347 "latest_prompt_tokens": 4096,
2348 "completion_tokens": 1234,
2349 "total_tokens": 5330,
2350 "cost": 0.0123,
2351 "has_cost": True,
2352 },
2353 }
2354
2355 frame = _build_chat_frame(snapshot, "hello", [], width=100, height=22)
2356 wide_frame = _build_chat_frame(snapshot, "", [], width=140, height=22)
2357
2358 assert len(frame.splitlines()) <= 22
2359 assert "NIPUX" in frame
2360 assert "CHAT" in frame
2361 assert "MODEL UPDATES" in frame
2362 assert "NAV" not in frame
2363 assert "Visible" in frame
2364 assert "#3" not in frame
2365 assert "Jobs" in frame
2366 assert "Recent outcomes" not in frame
2367 assert "ctx" in frame
2368 assert "4.1K/8.2K" in frame
2369 assert "out" in frame
2370 assert "1.2K" in frame
2371 assert "tok" in frame
2372 assert "5.3K" in frame
2373 assert "$0.0123" in frame
2374 assert "NIPUX" in wide_frame
2375 assert "model model/demo ctx 4.1K/8.2K" in wide_frame
2376 assert "daemon running" not in wide_frame
2377 assert wide_frame.splitlines()[1].startswith("━")
2378 assert "Enter sends" in frame
2379 assert "❯ hello" in frame
2380 jobs = _build_chat_frame(snapshot, "", [], width=100, height=26, right_view="status")
2381 assert "Draft next deli" in jobs
2382
2383 legacy_work = _build_chat_frame(snapshot, "", [], width=100, height=24, right_view="work")
2384 assert "MODEL UPDATES" in legacy_work
2385 assert "Tool / console" not in legacy_work
2386
2387 updates = _build_chat_frame(snapshot, "", [], width=100, height=24, right_view="updates")
2388 assert "MODEL UPDATES" in updates
2389 assert "Visible" in updates
2390
2391 settings = _build_chat_frame(snapshot, "", [], width=120, height=30, modal_view="settings")
2392 assert "Settings" in settings
2393 assert "/model MODEL" in settings
2394 assert "/api-key KEY" in settings
2395 assert "/base-url URL" in settings
2396 assert "Esc closes" in settings
2397 assert "NAV" not in settings
2398
2399 secret = _build_chat_frame(
2400 snapshot,
2401 "secret-value",
2402 [],
2403 width=100,
2404 height=24,
2405 editing_field="secret:model.api_key",
2406 )
2407 assert "Editing API key" in secret
2408 assert "secret-value" not in secret
2409 assert "••••" in secret
2410
2411
2412def test_chat_frame_separates_chat_from_worker_activity():
2413 snapshot = {
2414 "job_id": "job_demo",
2415 "job": {
2416 "id": "job_demo",
2417 "title": "demo job",
2418 "objective": "keep chat separate",
2419 "status": "running",
2420 "kind": "generic",
2421 "metadata": {},
2422 },
2423 "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2424 "steps": [],
2425 "artifacts": [],
2426 "memory_entries": [],
2427 "events": [
2428 {"event_type": "operator_message", "body": "start a benchmark job", "metadata": {}},
2429 {"event_type": "agent_message", "title": "chat", "body": "I created the job and started it.", "metadata": {}},
2430 {"event_type": "tool_call", "title": "shell_exec", "body": "", "metadata": {"input": {"arguments": {"command": "python bench.py"}}}},
2431 {"event_type": "tool_result", "title": "shell_exec", "body": "shell_exec rc=0", "metadata": {"status": "completed", "input": {"arguments": {"command": "python bench.py"}}}},
2432 ],
2433 "daemon": {"running": True, "metadata": {"pid": 123}},
2434 "model": "model/demo",
2435 }
2436
2437 frame = _build_chat_frame(snapshot, "", [], width=130, height=24, right_view="updates")
2438 chat_side = frame.split(" │ ", 1)[0]
2439
2440 assert "start a benchmark job" in frame
2441 assert "I created the job" in frame
2442 assert "Tool / console" not in frame
2443 assert "python bench.py" not in frame
2444 assert "python bench.py" not in chat_side
2445
2446
2447def test_chat_frame_empty_state_is_minimal_and_actionable():
2448 snapshot = {
2449 "job_id": "job_demo",
2450 "job": {
2451 "id": "job_demo",
2452 "title": "demo job",
2453 "objective": "keep chat visible",
2454 "status": "running",
2455 "kind": "generic",
2456 "metadata": {},
2457 },
2458 "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2459 "steps": [],
2460 "artifacts": [],
2461 "memory_entries": [],
2462 "events": [],
2463 "daemon": {"running": True, "metadata": {"pid": 123}},
2464 "model": "model/demo",
2465 }
2466
2467 frame = _build_chat_frame(snapshot, "", [], width=120, height=28)
2468
2469 assert "NIPUX" in frame
2470 assert "plain English" in frame
2471 assert "/new OBJECTIVE" in frame
2472 assert "/settings" in frame
2473 assert "███" not in frame
2474 assert "No chat yet." not in frame
2475 assert "star..." not in frame
2476
2477
2478def test_frame_emit_skips_unchanged_render(capsys):
2479 first = _emit_frame_if_changed("line one\nline two")
2480 second = _emit_frame_if_changed("frame", first)
2481 third = _emit_frame_if_changed("frame\nline three", second)
2482
2483 out = capsys.readouterr().out
2484 assert first == "line one\nline two"
2485 assert second == "frame"
2486 assert third == "frame\nline three"
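2487 assert out.count("\033[H") == 1  # cursor-home (CSI H) appears exactly once
2488 assert "\033[1;1H\033[2Kframe" in out  # row 1 is repositioned, erased (CSI 2K), and rewritten
2489 assert "\033[2K" in out
2490 assert "\033[J" not in out  # no erase-to-end-of-screen (CSI J)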
2491
2492
2493def test_chat_frame_does_not_cap_long_agent_messages():
2494 long_reply = (
2495 "**Completed Work:** "
2496 "1. Test suite analysis finished. "
2497 "2. Code analysis findings documented. "
2498 "3. Market readiness gaps identified. "
2499 "4. Packaging risks summarized. "
2500 "5. Daemon reliability checked. "
2501 "6. UI ergonomics reviewed. "
2502 "7. Final recommendation included."
2503 )
2504 snapshot = {
2505 "job_id": "job_demo",
2506 "job": {
2507 "id": "job_demo",
2508 "title": "demo job",
2509 "objective": "keep chat readable",
2510 "status": "running",
2511 "kind": "generic",
2512 "metadata": {},
2513 },
2514 "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2515 "steps": [],
2516 "artifacts": [],
2517 "memory_entries": [],
2518 "events": [
2519 {"event_type": "operator_message", "body": "what have you done so far", "metadata": {}},
2520 {"event_type": "agent_message", "title": "chat", "body": long_reply, "metadata": {}},
2521 ],
2522 "daemon": {"running": True, "metadata": {"pid": 123}},
2523 "model": "model/demo",
2524 }
2525
2526 frame = _build_chat_frame(snapshot, "", [], width=118, height=32)
2527
2528 assert "Completed Work:" in frame
2529 assert "Final recommendation included" in frame
2530 assert "…" not in frame
2531
2532
2533def test_plain_chat_control_intents_map_to_commands():
2534 assert _chat_control_command("how is it going?") == "/status"
2535 assert _chat_control_command("what is blocking it?") == "/status"
2536 assert _chat_control_command("check status") == "/status"
2537 assert _chat_control_command("what's happening?") == "/status"
2538 assert _chat_control_command("start working") == "/run"
2539 assert _chat_control_command("run it") == "/run"
2540 assert _chat_control_command("start worker") == "/run"
2541 assert _chat_control_command("pause this job") == "/pause"
2542 assert _chat_control_command("pause the job") == "/pause"
2543 assert _chat_control_command("pause it") == "/pause"
2544 assert _chat_control_command("stop the job") == "/pause"
2545 assert _chat_control_command("stop worker") == "/pause"
2546 assert _chat_control_command("resume the job") == "/resume"
2547 assert _chat_control_command("resume it") == "/resume"
2548 assert _chat_control_command("show jobs") == "/jobs"
2549 assert _chat_control_command("change model") == "/model"
2550 assert _chat_control_command("settings") == "/settings"
2551 assert _chat_control_command("show settings") == "/settings"
2552 assert _chat_control_command("how do I start a job?") == "/help"
2553 assert _chat_control_command("how much did it cost") == "/usage"
2554 assert _chat_control_command("what has it done") == "/outcomes"
2555 assert _chat_control_command("what have you done so far") == "/outcomes"
2556 assert _chat_control_command("what did the model do") == "/outcomes"
2557 assert _chat_control_command("what have all jobs done") == "/outcomes all"
2558 assert _chat_control_command("what files did it create") == "/artifacts"
2559 assert _chat_control_command("show me the saved files") == "/artifacts"
2560 assert _chat_control_command("what tool calls did it run") == "/activity"
2561 assert _chat_control_command("show console output") == "/outputs"
2562 assert _chat_control_command("what tasks are open") == "/tasks"
2563 assert _chat_control_command("show the current plan") == "/roadmap"
2564 assert _chat_control_command("show benchmarks") == "/experiments"
2565 assert _chat_control_command("how many tokens did it use") == "/usage"
2566 assert _chat_control_command("restart daemon") == "/restart"
2567 assert _chat_control_command("prefer artifact-backed findings") == ""
2568
2569
2570def test_plain_chat_classifier_keeps_natural_controls_out_of_model_path():
2571 assert _is_plain_chat_line("hello there") is True
2572 assert _is_plain_chat_line("stop the job") is False
2573 assert _is_plain_chat_line("what has it done") is False
2574 assert _is_plain_chat_line("show me the saved files") is False
2575
2576
2577def test_plain_chat_control_intent_does_not_queue_operator_context(monkeypatch, tmp_path):
2578 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2579 _mark_test_model_ready()
2580 db = AgentDB(tmp_path / "state.db")
2581 try:
2582 job_id = db.create_job("Research topic", title="research")
2583 finally:
2584 db.close()
2585
2586 captured = {}
2587
2588 def fake_capture(job_id_arg, command):
2589 captured["job_id"] = job_id_arg
2590 captured["command"] = command
2591 return True, "status output\n"
2592
2593 monkeypatch.setattr("nipux_cli.cli._capture_chat_command", fake_capture)
2594
2595 keep_running, message = _handle_chat_message(job_id, "how is it going?", quiet=True)
2596
2597 assert keep_running is True
2598 assert message == "status output"
2599 assert captured == {"job_id": job_id, "command": "/status"}
2600 db = AgentDB(tmp_path / "state.db")
2601 try:
2602 job = db.get_job(job_id)
2603 assert job["metadata"].get("operator_messages") is None
2604 finally:
2605 db.close()
2606
2607
2608def test_plain_chat_reply_usage_is_recorded(monkeypatch, tmp_path):
2609 monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
2610 _mark_test_model_ready()
2611 db = AgentDB(tmp_path / "state.db")
2612 try:
2613 job_id = db.create_job("Research topic", title="research")
2614 finally:
2615 db.close()
2616
2617 keep_running, message = _handle_chat_message(
2618 job_id,
2619 "hello",
2620 quiet=True,
2621 reply_fn=lambda _job_id, _line: LLMResponse(
2622 content="reply",
2623 usage={"prompt_tokens": 120, "completion_tokens": 20, "total_tokens": 140, "cost": 0.001},
2624 model="provider/model",
2625 response_id="gen_chat",
2626 ),
2627 )
2628
2629 db = AgentDB(tmp_path / "state.db")
2630 try:
2631 usage = db.job_token_usage(job_id)
2632 events = db.list_events(job_id=job_id, event_types=["loop"], limit=5)
2633 finally:
2634 db.close()
2635 assert keep_running is True
2636 assert message == ""
2637 assert usage["calls"] == 1
2638 assert usage["prompt_tokens"] == 120
2639 assert usage["completion_tokens"] == 20
2640 assert usage["cost"] == 0.001
2641 assert events[-1]["metadata"]["source"] == "chat"
2642 assert events[-1]["metadata"]["response_id"] == "gen_chat"
2643
2644
2645def test_chat_frame_surfaces_actual_work_events():
2646 snapshot = {
2647 "job_id": "job_demo",
2648 "job": {
2649 "id": "job_demo",
2650 "title": "demo job",
2651 "objective": "produce visible work",
2652 "status": "running",
2653 "kind": "generic",
2654 "metadata": {
2655 "task_queue": [{"status": "open"}],
2656 "roadmap": {"milestones": [{"title": "Draft", "status": "active"}]},
2657 },
2658 },
2659 "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
2660 "steps": [],
2661 "artifacts": [{"id": "art_demo"}],
2662 "memory_entries": [{}],
2663 "events": [
2664 {"event_type": "operator_message", "body": "please keep improving", "metadata": {"mode": "steer"}},
2665 {"event_type": "tool_call", "title": "web_search", "body": "", "metadata": {"input": {"arguments": {"query": "agent harness distillation"}}}},
2666 {"event_type": "tool_result", "title": "web_search", "body": "web_search query='agent harness distillation' returned 5 results", "metadata": {"status": "completed", "input": {"arguments": {"query": "agent harness distillation"}}}},
2667 {"event_type": "artifact", "title": "Research Paper Draft", "body": "", "metadata": {"summary": "saved first complete draft"}},
2668 {"event_type": "finding", "title": "Distillation finding", "body": "tool traces improve student behavior", "metadata": {}},
2669 {"event_type": "task", "title": "Compare methods", "body": "", "metadata": {"status": "open"}},
2670 {"event_type": "roadmap", "title": "Paper roadmap", "body": "", "metadata": {"status": "active"}},
2671 {"event_type": "milestone_validation", "title": "Draft", "body": "", "metadata": {"validation_status": "passed"}},
2672 {"event_type": "experiment", "title": "Citation coverage check", "body": "", "metadata": {"metric_name": "sources", "metric_value": 18, "metric_unit": ""}},
2673 {"event_type": "lesson", "title": "strategy", "body": "prefer measured updates", "metadata": {}},
2674 {"event_type": "reflection", "title": "reflection", "body": "Reflection through step #10: next branch is evaluation.", "metadata": {}},
2675 ],
2676 "daemon": {"running": True, "metadata": {"pid": 123}},
2677 "model": "model/demo",
2678 "counts": {"steps": 10, "artifacts": 1, "memory": 1},
2679 }
2680
2681 updates = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="updates")
2682 status = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="status")
2683 frame = updates + "\n" + status
2684
2685 assert "Research Paper Draft" in frame
2686 assert "Distillation finding" in frame
2687 assert "Compare methods" in frame
2688 assert "Paper roadmap" in frame
2689 assert "passed Draft" in frame
2690 assert "Citation coverage check" in frame
2691 assert "LEARN" in frame
2692 assert "strategy" in frame
2693
2694
2695def test_chat_frame_has_model_updates_page():
2696 snapshot = {
2697 "job_id": "job_demo",
2698 "job": {
2699 "id": "job_demo",
2700 "title": "paper job",
2701 "objective": "write a paper",
2702 "status": "running",
2703 "kind": "generic",
2704 "metadata": {},
2705 },
2706 "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
2707 "steps": [],
2708 "artifacts": [],
2709 "memory_entries": [],
2710 "events": [
2711 {"event_type": "tool_result", "title": "web_search", "body": "web_search query='distillation agents' returned 5 results", "metadata": {"status": "completed", "input": {"arguments": {"query": "distillation agents"}}}},
2712 {"event_type": "artifact", "title": "Literature Review Draft", "body": "saved draft", "metadata": {}},
2713 {"event_type": "finding", "title": "Trajectory distillation", "body": "teacher traces improve tool use", "metadata": {}},
2714 {"event_type": "experiment", "title": "Citation density check", "body": "", "metadata": {"metric_name": "citations", "metric_value": 12, "metric_unit": "count"}},
2715 {"event_type": "tool_result", "title": "write_file", "body": "write_file overwrite /tmp/paper.md", "metadata": {"status": "completed", "input": {"arguments": {"path": "/tmp/paper.md"}}, "output": {"path": "/tmp/paper.md"}}},
2716 {"event_type": "tool_result", "title": "shell_exec", "body": "shell_exec rc=0", "metadata": {"status": "completed", "input": {"arguments": {"command": "printf draft | tee /tmp/outline.md"}}}},
2717 ],
2718 "daemon": {"running": True, "metadata": {"pid": 123}},
2719 "model": "model/demo",
2720 }
2721
2722 frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="updates")
2723
2724 assert "MODEL UPDATES" in frame
2725 assert "Page" in frame
2726 assert "1 outputs" in frame
2727 assert "measurements" in frame
2728 assert "Literature Review Draft" in frame
2729 assert "Trajectory distillation" in frame
2730 assert "Citation density check" in frame
2731 assert "paper.md" in frame
2732 assert "outline.md" in frame
2733
2734
2735def test_workspace_status_page_does_not_render_fake_worker_when_no_jobs():
2736 snapshot = {
2737 "job_id": WORKSPACE_CHAT_ID,
2738 "job": {
2739 "id": WORKSPACE_CHAT_ID,
2740 "title": "Nipux",
2741 "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
2742 "status": "ready",
2743 "kind": "workspace",
2744 "metadata": {},
2745 },
2746 "right_job": {
2747 "id": WORKSPACE_CHAT_ID,
2748 "title": "Nipux",
2749 "objective": "Chat with Nipux to create, start, inspect, and steer long-running worker jobs.",
2750 "status": "ready",
2751 "kind": "workspace",
2752 "metadata": {},
2753 },
2754 "right_job_id": WORKSPACE_CHAT_ID,
2755 "jobs": [],
2756 "steps": [],
2757 "artifacts": [],
2758 "job_artifacts": {},
2759 "job_summary_events": {},
2760 "job_counts": {},
2761 "memory_entries": [],
2762 "events": [],
2763 "right_events": [],
2764 "summary_events": [],
2765 "daemon": {"running": False, "metadata": {}},
2766 "model": "model/demo",
2767 "base_url": "http://127.0.0.1:8000/v1",
2768 "context_length": 0,
2769 "token_usage": {},
2770 "counts": {"steps": 0, "artifacts": 0, "memory": 0},
2771 }
2772
2773 frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="status")
2774
2775 assert "No workers yet" in frame
2776 assert "plain English goal" in frame
2777 assert "/new OBJECTIVE" in frame
2778 assert "Chat with Nipux to create" not in frame
2779 assert "actions:0" not in frame
2780
2781
2782def test_status_job_cards_show_durable_work_mix():
2783 events = [
2784 {"event_type": "artifact", "title": "Paper draft", "body": "", "metadata": {}},
2785 {"event_type": "finding", "title": "Method taxonomy", "body": "", "metadata": {}},
2786 {
2787 "event_type": "experiment",
2788 "title": "Citation coverage check",
2789 "body": "",
2790 "metadata": {"metric_name": "citations", "metric_value": 12, "metric_unit": "count"},
2791 },
2792 ]
2793 snapshot = {
2794 "job_id": "job_demo",
2795 "job": {
2796 "id": "job_demo",
2797 "title": "paper job",
2798 "objective": "write a paper",
2799 "status": "running",
2800 "kind": "generic",
2801 "metadata": {},
2802 },
2803 "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
2804 "steps": [],
2805 "artifacts": [{"id": "art_1", "title": "Paper draft"}],
2806 "job_artifacts": {"job_demo": [{"id": "art_1", "title": "Paper draft"}]},
2807 "job_summary_events": {"job_demo": events},
2808 "job_counts": {"job_demo": {"artifacts": 1}},
2809 "memory_entries": [],
2810 "events": events,
2811 "summary_events": events,
2812 "daemon": {"running": True, "metadata": {"pid": 123}},
2813 "model": "model/demo",
2814 }
2815
2816 frame = _build_chat_frame(snapshot, "", [], width=150, height=34, right_view="status")
2817
2818 assert "work 1 outputs 1 findings 1 measurements" in frame
2819 assert "made 1 output" in frame
2820 assert "Paper draft" in frame
2821
2822
2823def test_recent_outcome_lines_wrap_long_updates():
2824 lines = recent_model_update_lines(
2825 [
2826 {
2827 "event_type": "finding",
2828 "title": "Trajectory distillation improves agentic tool selection when teacher traces include failures and recovery actions",
2829 "body": "",
2830 "metadata": {},
2831 "created_at": "2026-05-01T12:34:00+00:00",
2832 }
2833 ],
2834 width=62,
2835 limit=4,
2836 )
2837
2838 rendered = "\n".join(lines)
2839 assert len(lines) >= 2
2840 assert "Trajectory distillation improves" in rendered
2841 assert "teacher traces include" in rendered
2842 assert "failures" in rendered
2843
2844
2845def test_recent_outcome_lines_do_not_pretruncate_actual_work():
2846 events = [
2847 {
2848 "event_type": "artifact",
2849 "title": (
2850 "Research paper draft rewritten with a new methods section, expanded evaluation table, "
2851 "and integrated citations from teacher trajectory distillation, agent workflow distillation, "
2852 "and self-improvement harness papers"
2853 ),
2854 "body": "",
2855 "metadata": {},
2856 "created_at": "2026-05-01T12:34:00+00:00",
2857 }
2858 ]
2859
2860 rendered = "\n".join(recent_model_update_lines(events, width=72, limit=6))
2861
2862 assert "methods" in rendered
2863 assert "section" in rendered
2864 assert "integrated" in rendered
2865 assert "citations" in rendered
2866 assert "self-improvement harness" in rendered
2867 assert "papers" in rendered
2868 assert "..." not in rendered
2869
2870
2871def test_chat_updates_page_keeps_updates_to_one_line_each():
2872 long_task = (
2873 "open Publish a concise progress update and keep working on the next useful branch. "
2874 "This task title is deliberately long enough to wrap several times if the compact pane does not constrain it."
2875 )
2876 events = [
2877 {
2878 "event_type": "task",
2879 "title": "Publish progress",
2880 "body": long_task,
2881 "metadata": {"status": "open"},
2882 "created_at": "2026-04-25T20:00:00+00:00",
2883 }
2884 ]
2885
2886 lines = recent_model_update_lines(events, width=56, limit=4, wrap=False)
2887
2888 assert len(lines) == 1
2889 assert "Publish progress" in lines[0]
2890
2891
2892def test_chat_pane_marks_hidden_overflow():
2893 events = [
2894 {
2895 "event_type": "agent_message",
2896 "title": "chat",
2897 "body": " ".join(f"word{i}" for i in range(80)),
2898 "metadata": {},
2899 "created_at": "2026-04-25T12:00:00Z",
2900 }
2901 ]
2902
2903 lines = chat_pane_lines(events, [], width=48, rows=4)
2904
2905 assert "nipux" in lines[0]
2906 assert "middle lines hidden" in "\n".join(lines)
2907 assert "word" in lines[-1]
2908 assert len(lines) == 4
2909
2910
2911def test_chat_pane_groups_multiline_command_output_under_one_label():
2912 lines = chat_pane_lines(
2913 [],
2914 [
2915 "> /help",
2916 "Create: type a goal, or /new OBJECTIVE.\n"
2917 "Run: /run, /pause, /resume. Inspect: /jobs, /outcomes, /artifacts, /activity.\n"
2918 "Config: /settings, /model, /base-url, /api-key. Navigate: ←→ pages, ↑↓ jobs.",
2919 ],
2920 width=78,
2921 rows=12,
2922 )
2923
2924 rendered = "\n".join(lines)
2925 assert rendered.count("nipux") == 1
2926 assert "Create: type a goal" in rendered
2927 assert "Inspect: /jobs" in rendered
2928 assert "Config: /settings" in rendered
2929
2930
2931def test_chat_pane_suppresses_transient_duplicates_after_events_arrive():
2932 events = [
2933 {
2934 "event_type": "operator_message",
2935 "title": "chat",
2936 "body": "Hello",
2937 "metadata": {},
2938 "created_at": "2026-04-25T20:00:00+00:00",
2939 },
2940 {
2941 "event_type": "agent_message",
2942 "title": "chat",
2943 "body": "Hello! I can help with worker jobs.",
2944 "metadata": {},
2945 "created_at": "2026-04-25T20:00:01+00:00",
2946 },
2947 ]
2948
2949 lines = chat_pane_lines(
2950 events,
2951 ["> Hello", "sent; waiting for model", "Hello! I can help with worker jobs."],
2952 width=80,
2953 rows=12,
2954 )
2955
2956 rendered = "\n".join(lines)
2957 assert rendered.count("Hello") == 2
2958 assert "waiting for model" not in rendered
2959
2960
2961def test_chat_pane_hides_persisted_legacy_waiting_notice():
2962 lines = chat_pane_lines(
2963 [
2964 {
2965 "event_type": "operator_message",
2966 "title": "chat",
2967 "body": "Hello",
2968 "metadata": {},
2969 "created_at": "2026-04-25T20:00:00+00:00",
2970 },
2971 {
2972 "event_type": "agent_message",
2973 "title": "chat",
2974 "body": "sent; waiting for model",
2975 "metadata": {},
2976 "created_at": "2026-04-25T20:00:01+00:00",
2977 },
2978 {
2979 "event_type": "agent_message",
2980 "title": "chat",
2981 "body": "Hello! I can help with worker jobs.",
2982 "metadata": {},
2983 "created_at": "2026-04-25T20:00:02+00:00",
2984 },
2985 ],
2986 [],
2987 width=80,
2988 rows=12,
2989 )
2990
2991 rendered = "\n".join(lines)
2992 assert "waiting for model" not in rendered
2993 assert "Hello! I can help with worker jobs." in rendered
2994
2995
2996def test_chat_pane_renders_waiting_notice_as_animation_only():
2997 rendered_notices = _display_chat_notices([_WAITING_NOTICE])
2998 lines = chat_pane_lines([], rendered_notices, width=64, rows=4)
2999
3000 rendered = "\n".join(lines)
3001 assert "AGENT" in rendered
3002 assert "waiting" in rendered
3003 assert "Waiting for the next worker step" not in rendered
3004 assert "waiting for model" not in rendered
3005
3006
3007def test_chat_pane_hides_persisted_worker_waiting_text():
3008 lines = chat_pane_lines(
3009 [
3010 {
3011 "event_type": "operator_message",
3012 "title": "chat",
3013 "body": "what has it done so far?",
3014 "metadata": {},
3015 "created_at": "2026-04-25T20:00:00+00:00",
3016 },
3017 {
3018 "event_type": "agent_message",
3019 "title": "chat",
3020 "body": "waiting for demo job: what has it done so far?",
3021 "metadata": {},
3022 "created_at": "2026-04-25T20:00:01+00:00",
3023 },
3024 {
3025 "event_type": "agent_message",
3026 "title": "chat",
3027 "body": "Waiting for the next worker step.",
3028 "metadata": {},
3029 "created_at": "2026-04-25T20:00:02+00:00",
3030 },
3031 ],
3032 [],
3033 width=80,
3034 rows=12,
3035 )
3036
3037 rendered = "\n".join(lines)
3038 assert "what has it done so far?" in rendered
3039 assert "waiting for demo job" not in rendered
3040 assert "Waiting for the next worker step" not in rendered
3041 assert "NIPUX" not in rendered
3042
3043
3044def test_chat_pane_renders_stored_provider_errors_as_actions():
3045 lines = chat_pane_lines(
3046 [
3047 {
3048 "event_type": "agent_message",
3049 "title": "chat",
3050 "body": "APIConnectionError: Connection error.",
3051 "metadata": {"error": True},
3052 "created_at": "2026-04-25T20:00:00+00:00",
3053 }
3054 ],
3055 [],
3056 width=80,
3057 rows=6,
3058 )
3059
3060 rendered = "\n".join(lines)
3061 assert "APIConnectionError" not in rendered
3062 assert "Model endpoint is unreachable" in rendered
3063 assert "/doctor" in rendered
3064
3065
3066def test_chat_updates_page_uses_deeper_summary_events():
3067 snapshot = {
3068 "job_id": "job_demo",
3069 "job": {
3070 "id": "job_demo",
3071 "title": "paper job",
3072 "objective": "write a paper",
3073 "status": "running",
3074 "kind": "generic",
3075 "metadata": {},
3076 },
3077 "jobs": [{"id": "job_demo", "title": "paper job", "status": "running", "kind": "generic", "metadata": {}}],
3078 "steps": [],
3079 "artifacts": [],
3080 "memory_entries": [],
3081 "events": [
3082 {"event_type": "tool_call", "title": "web_search", "body": "", "metadata": {}},
3083 ],
3084 "summary_events": [
3085 {"event_type": "artifact", "title": "Full Paper Draft", "body": "saved draft", "metadata": {}},
3086 {"event_type": "finding", "title": "Distillation method map", "body": "", "metadata": {}},
3087 ],
3088 "daemon": {"running": True, "metadata": {"pid": 123}},
3089 "model": "model/demo",
3090 }
3091
3092 frame = _build_chat_frame(snapshot, "", [], width=132, height=26, right_view="updates")
3093
3094 assert "Full Paper Draft" in frame
3095 assert "Distillation method map" in frame
3096
3097
3098def test_hourly_outcomes_prioritize_durable_work_over_research_noise():
3099 events = [
3100 {
3101 "event_type": "tool_result",
3102 "title": "web_search",
3103 "body": "web_search query='generic harness patterns' returned 5 results",
3104 "metadata": {"status": "completed", "input": {"arguments": {"query": "generic harness patterns"}}},
3105 "created_at": "2026-05-01T12:05:00+00:00",
3106 },
3107 {
3108 "event_type": "tool_result",
3109 "title": "web_extract",
3110 "body": "web_extract fetched 3/3 pages",
3111 "metadata": {"status": "completed"},
3112 "created_at": "2026-05-01T12:08:00+00:00",
3113 },
3114 {
3115 "event_type": "artifact",
3116 "title": "Harness Architecture Notes",
3117 "body": "saved design notes",
3118 "metadata": {},
3119 "created_at": "2026-05-01T12:20:00+00:00",
3120 },
3121 {
3122 "event_type": "experiment",
3123 "title": "Context budget check",
3124 "body": "",
3125 "metadata": {"metric_name": "prompt_tokens", "metric_value": 4200, "metric_unit": "tokens"},
3126 "created_at": "2026-05-01T12:30:00+00:00",
3127 },
3128 ]
3129
3130 rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3131
3132 assert "2 research" in rendered
3133 assert "1 outputs" in rendered
3134 assert "1 measurements" in rendered
3135 assert "Harness Architecture Notes" in rendered
3136 assert "Context budget check" in rendered
3137 assert "generic harness patterns" not in rendered
3138
3139
3140def test_status_recent_outcomes_hide_research_noise():
3141 events = [
3142 {
3143 "event_type": "tool_result",
3144 "title": "web_search",
3145 "body": "web_search query='generic harness patterns' returned 5 results",
3146 "metadata": {"status": "completed", "input": {"arguments": {"query": "generic harness patterns"}}},
3147 "created_at": "2026-05-01T12:05:00+00:00",
3148 },
3149 {
3150 "event_type": "artifact",
3151 "title": "Harness Architecture Notes",
3152 "body": "saved design notes",
3153 "metadata": {},
3154 "created_at": "2026-05-01T12:20:00+00:00",
3155 },
3156 ]
3157
3158 rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3159
3160 assert "Harness Architecture Notes" in rendered
3161 assert "generic harness patterns" not in rendered
3162
3163
3164def test_status_recent_outcomes_hide_plan_update_noise():
3165 events = [
3166 {
3167 "event_type": "reflection",
3168 "title": "reflection",
3169 "body": "summarized current counts",
3170 "metadata": {},
3171 "created_at": "2026-05-01T12:05:00+00:00",
3172 },
3173 {
3174 "event_type": "agent_message",
3175 "title": "progress",
3176 "body": "Checkpoint at step #100.",
3177 "metadata": {},
3178 "created_at": "2026-05-01T12:08:00+00:00",
3179 },
3180 {
3181 "event_type": "finding",
3182 "title": "Teacher trace distillation pattern",
3183 "body": "",
3184 "metadata": {},
3185 "created_at": "2026-05-01T12:20:00+00:00",
3186 },
3187 ]
3188
3189 rendered = "\n".join(recent_model_update_lines(events, width=96, limit=5))
3190
3191 assert "Teacher trace distillation pattern" in rendered
3192 assert "Checkpoint at step" not in rendered
3193 assert "summarized current counts" not in rendered
3194
3195
3196def test_status_recent_outcomes_show_durable_checkpoint_updates():
3197 events = [
3198 {
3199 "event_type": "agent_message",
3200 "title": "progress",
3201 "body": "Checkpoint step #90: ~1 task updated, 1 task resolved.",
3202 "metadata": {
3203 "updates": {"tasks": 1},
3204 "resolutions": {"tasks": 1},
3205 "deltas": {"findings": 0},
3206 },
3207 "created_at": "2026-05-01T12:08:00+00:00",
3208 }
3209 ]
3210
3211 rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3212
3213 assert "TASK" in rendered
3214 assert "~1 task updated" in rendered
3215 assert "1 task resolved" in rendered
3216 assert "Checkpoint step #90" in rendered
3217
3218
3219def test_status_recent_outcomes_compact_repeated_updates():
3220 events = [
3221 {
3222 "event_type": "agent_message",
3223 "title": "error",
3224 "body": "Model provider requires operator action.",
3225 "metadata": {},
3226 "created_at": f"2026-05-01T12:0{index}:00+00:00",
3227 }
3228 for index in range(3)
3229 ]
3230
3231 rendered = "\n".join(recent_model_update_lines(events, width=96, limit=4))
3232
3233 assert rendered.count("Model provider requires operator action") == 1
3234 assert "x3" in rendered
3235
3236
3237def test_hourly_outcomes_hide_plan_update_noise():
3238 events = [
3239 {
3240 "event_type": "reflection",
3241 "title": "reflection",
3242 "body": "summarized current counts",
3243 "metadata": {},
3244 "created_at": "2026-05-01T12:05:00+00:00",
3245 },
3246 {
3247 "event_type": "agent_message",
3248 "title": "progress",
3249 "body": "Checkpoint at step #100.",
3250 "metadata": {},
3251 "created_at": "2026-05-01T12:08:00+00:00",
3252 },
3253 {
3254 "event_type": "artifact",
3255 "title": "Saved research draft",
3256 "body": "",
3257 "metadata": {},
3258 "created_at": "2026-05-01T12:20:00+00:00",
3259 },
3260 ]
3261
3262 rendered = "\n".join(hourly_update_lines(events, width=96, limit=6))
3263
3264 assert "Saved research draft" in rendered
3265 assert "Checkpoint at step" not in rendered
3266 assert "summarized current counts" not in rendered
3267
3268
3269def test_hourly_outcomes_count_durable_checkpoint_updates():
3270 events = [
3271 {
3272 "event_type": "agent_message",
3273 "title": "progress",
3274 "body": "Checkpoint step #110: ~1 experiment updated, 1 experiment resolved.",
3275 "metadata": {
3276 "updates": {"experiments": 1},
3277 "resolutions": {"experiments": 1},
3278 },
3279 "created_at": "2026-05-01T12:08:00+00:00",
3280 }
3281 ]
3282
3283 rendered = "\n".join(hourly_update_lines(events, width=96, limit=6))
3284
3285 assert "1 measurements" in rendered
3286 assert "~1 measurement updated" in rendered
3287 assert "1 measurement resolved" in rendered
3288
3289
3290def test_hourly_outcome_summary_uses_progress_order():
3291 events = [
3292 {
3293 "event_type": "source",
3294 "title": "source scored",
3295 "body": "",
3296 "metadata": {},
3297 "created_at": "2026-05-01T12:01:00+00:00",
3298 },
3299 {
3300 "event_type": "artifact",
3301 "title": "draft saved",
3302 "body": "",
3303 "metadata": {},
3304 "created_at": "2026-05-01T12:02:00+00:00",
3305 },
3306 {
3307 "event_type": "experiment",
3308 "title": "metric checked",
3309 "body": "",
3310 "metadata": {"metric_name": "score", "metric_value": 1, "metric_unit": "point"},
3311 "created_at": "2026-05-01T12:03:00+00:00",
3312 },
3313 ]
3314
3315 rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3316
3317 assert "1 outputs 1 measurements 1 sources" in rendered
3318
3319
3320def test_hourly_outcomes_wrap_long_durable_updates_without_pre_truncation():
3321 events = [
3322 {
3323 "event_type": "finding",
3324 "title": (
3325 "Distillation survey breakthrough: teacher trajectories should include failed tool calls, "
3326 "operator corrections, recovery steps, and measured validation so the student learns the "
3327 "whole harness loop instead of only final answers"
3328 ),
3329 "body": "",
3330 "metadata": {},
3331 "created_at": "2026-05-01T12:05:00+00:00",
3332 },
3333 ]
3334
3335 rendered = "\n".join(hourly_update_lines(events, width=82, limit=6))
3336
3337 assert "operator corrections" in rendered
3338 assert "measured" in rendered
3339 assert "validation" in rendered
3340 assert "only" in rendered
3341 assert "final answers" in rendered
3342 assert "..." not in rendered
3343
3344
3345def test_hourly_outcomes_limit_visible_hours_without_losing_headers():
3346 events = []
3347 for hour in range(8):
3348 events.extend(
3349 [
3350 {
3351 "event_type": "artifact",
3352 "title": f"Draft saved hour {hour}",
3353 "body": "",
3354 "metadata": {},
3355 "created_at": f"2026-05-01T{hour:02d}:05:00+00:00",
3356 },
3357 {
3358 "event_type": "finding",
3359 "title": f"Finding hour {hour}",
3360 "body": "",
3361 "metadata": {},
3362 "created_at": f"2026-05-01T{hour:02d}:20:00+00:00",
3363 },
3364 ]
3365 )
3366
3367 rendered = "\n".join(hourly_update_lines(events, width=96, limit=8))
3368
3369 assert "2026-05-01 06:00" in rendered
3370 assert "2026-05-01 07:00" in rendered
3371 assert "Draft saved hour 7" in rendered
3372 assert "Finding hour 7" in rendered
3373 assert "Draft saved hour 0" not in rendered
3374
3375
def test_chat_updates_page_includes_agent_error_updates():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "provider job",
            "objective": "keep provider state visible",
            "status": "paused",
            "kind": "generic",
            "metadata": {},
        },
        "jobs": [{"id": "job_demo", "title": "provider job", "status": "paused", "kind": "generic", "metadata": {}}],
        "steps": [],
        "artifacts": [],
        "memory_entries": [],
        "events": [],
        "summary_events": [
            {
                "event_type": "agent_message",
                "title": "error",
                "body": "Model provider requires operator action.",
                "metadata": {"reason": "llm_provider_blocked"},
            },
        ],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
    }

    updates = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="updates")
    status = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="status")

    assert "Model provider requires" in updates
    assert "operator action" in updates
    assert "Outcome" in status
    assert "Model provider re" in status


def test_chat_status_marks_provider_blocked_jobs_before_daemon_retry():
    job = {
        "id": "job_demo",
        "title": "provider job",
        "objective": "keep provider state visible",
        "status": "running",
        "kind": "generic",
        "metadata": {"provider_blocked_at": "2026-05-01T00:00:00+00:00"},
    }
    snapshot = {
        "job_id": "job_demo",
        "job": job,
        "jobs": [job],
        "steps": [],
        "artifacts": [],
        "memory_entries": [],
        "events": [],
        "summary_events": [],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
    }

    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")

    assert "provider wait" in frame
    assert "Provider" in frame
    assert "action needed" in frame
    assert "advancing" not in frame


def test_chat_status_page_surfaces_context_pressure():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "context job",
            "objective": "keep context pressure visible",
            "status": "running",
            "kind": "generic",
            "metadata": {},
        },
        "jobs": [{"id": "job_demo", "title": "context job", "status": "running", "kind": "generic", "metadata": {}}],
        "steps": [],
        "artifacts": [],
        "memory_entries": [],
        "events": [],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
        "context_length": 8192,
        "token_usage": {"calls": 3, "latest_prompt_tokens": 7000, "total_tokens": 9000, "completion_tokens": 2000},
    }

    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")

    assert "Context" in frame
    assert "7.0K/8.2K" in frame
    assert "85%" in frame
    assert "high" in frame


def test_chat_status_page_surfaces_low_durable_yield():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "yield job",
            "objective": "keep durable progress visible",
            "status": "running",
            "kind": "generic",
            "metadata": {},
        },
        "jobs": [{"id": "job_demo", "title": "yield job", "status": "running", "kind": "generic", "metadata": {}}],
        "steps": [],
        "artifacts": [{"id": "art_demo", "title": "Only Saved Output"}],
        "memory_entries": [],
        "events": [],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
        "counts": {"steps": 120, "artifacts": 1, "memory": 0},
    }

    frame = _build_chat_frame(snapshot, "", [], width=132, height=30, right_view="status")

    assert "Yield" in frame
    assert "watch" in frame
    assert "120.0 actions/outcome" in frame


def test_chat_status_page_shows_job_outputs():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "demo job",
            "objective": "show created outputs per job",
            "status": "running",
            "kind": "generic",
            "metadata": {},
        },
        "jobs": [
            {"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}},
            {"id": "job_other", "title": "other job", "status": "queued", "kind": "generic", "metadata": {}},
        ],
        "steps": [],
        "artifacts": [
            {"id": "art_demo", "title": "Primary Saved Draft"},
            {"id": "art_second", "title": "Secondary Saved Note"},
        ],
        "job_artifacts": {
            "job_demo": [
                {"id": "art_demo", "title": "Primary Saved Draft"},
                {"id": "art_second", "title": "Secondary Saved Note"},
            ],
            "job_other": [{"id": "art_other", "title": "Other Job Deliverable"}],
        },
        "job_counts": {
            "job_demo": {"artifacts": 2},
            "job_other": {"artifacts": 4},
        },
        "job_summary_events": {
            "job_demo": [
                {"event_type": "artifact", "title": "Primary Saved Draft", "body": "", "metadata": {}},
                {"event_type": "experiment", "title": "Primary quality check", "body": "", "metadata": {"metric_name": "score", "metric_value": 8}},
            ],
            "job_other": [
                {"event_type": "finding", "title": "Other job durable finding", "body": "", "metadata": {}},
            ],
        },
        "memory_entries": [],
        "events": [],
        "summary_events": [
            {"event_type": "finding", "title": "Latest durable milestone", "body": "", "metadata": {}},
        ],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
        "counts": {"steps": 0, "artifacts": 1, "memory": 0},
    }

    frame = _build_chat_frame(snapshot, "", [], width=132, height=34, right_view="status")

    assert "Jobs" in frame
    assert "Latest hour" in frame
    assert "1 findings" in frame
    assert "Outcome" in frame
    assert "Latest durable milestone" in frame
    assert "2 outputs" in frame
    assert "Primary Saved Draft" in frame
    assert "Secondary Saved Note" in frame
    assert "Primary quality check" in frame
    assert "4 outputs" in frame
    assert "Other Job Deliverable" in frame
    assert "Other job durable finding" in frame


def test_frame_snapshot_keeps_summary_events_durable(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("keep frame refresh focused", title="focused")
        for index in range(30):
            db.append_event(
                job_id=job_id,
                event_type="tool_result",
                title="web_search",
                body=f"search noise {index}",
                metadata={"status": "completed"},
            )
        db.append_event(
            job_id=job_id,
            event_type="tool_result",
            title="write_file",
            body="write_file overwrite /tmp/paper.md",
            metadata={"status": "completed", "input": {"arguments": {"path": "/tmp/paper.md"}}, "output": {"path": "/tmp/paper.md"}},
        )
        db.append_event(
            job_id=job_id,
            event_type="tool_result",
            title="shell_exec",
            body="shell_exec rc=0",
            metadata={"status": "completed", "input": {"arguments": {"command": "printf draft | tee /tmp/outline.md"}}},
        )
        db.append_event(job_id=job_id, event_type="artifact", title="Durable Paper Draft", body="", metadata={})
        db.append_event(job_id=job_id, event_type="finding", title="Actual finding", body="", metadata={})
    finally:
        db.close()

    snapshot = _load_frame_snapshot(job_id, history_limit=4)
    summary_text = "\n".join(str(event.get("title") or event.get("body") or "") for event in snapshot["summary_events"])

    assert "Durable Paper Draft" in summary_text
    assert "Actual finding" in summary_text
    assert "write_file" in summary_text
    assert "shell_exec" in summary_text
    assert "web_search" not in summary_text
    assert "search noise" not in summary_text


def test_frame_snapshot_respects_explicit_job_over_saved_focus(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        focused_id = db.create_job("saved focus", title="saved focus")
        requested_id = db.create_job("requested focus", title="requested focus")
        (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": focused_id}), encoding="utf-8")
    finally:
        db.close()

    snapshot = _load_frame_snapshot(requested_id, history_limit=4)

    assert snapshot["job_id"] == requested_id
    assert snapshot["job"]["title"] == "requested focus"


def test_chat_status_page_marks_deferred_jobs_waiting():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "deferred job",
            "objective": "check a long-running process later",
            "status": "running",
            "kind": "generic",
            "metadata": {"defer_until": "2999-01-01T00:00:00+00:00", "defer_reason": "external process running"},
        },
        "jobs": [
            {
                "id": "job_demo",
                "title": "deferred job",
                "status": "running",
                "kind": "generic",
                "metadata": {"defer_until": "2999-01-01T00:00:00+00:00", "defer_reason": "external process running"},
            }
        ],
        "steps": [],
        "artifacts": [],
        "job_artifacts": {},
        "memory_entries": [],
        "events": [],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
    }

    frame = _build_chat_frame(snapshot, "", [], width=132, height=28, right_view="status")

    assert "waiting" in frame
    assert "Wait" in frame
    assert "next check" in frame
    assert "external" in frame
    assert "active" not in frame


def test_chat_frame_collapses_repeated_failures_and_hides_memory_noise():
    repeated_error = {
        "event_type": "error",
        "title": "llm",
        "body": "Error code: 403 - {'error': {'message': 'Key limit exceeded (total limit).'}}",
        "metadata": {},
    }
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "demo job",
            "objective": "stay readable",
            "status": "running",
            "kind": "generic",
            "metadata": {"task_queue": []},
        },
        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
        "steps": [],
        "artifacts": [],
        "memory_entries": [{}],
        "events": [
            repeated_error,
            {
                "event_type": "compaction",
                "title": "rolling_state",
                "body": "very long compact memory " * 80,
                "metadata": {},
            },
            repeated_error,
            repeated_error,
        ],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
        "counts": {"steps": 3, "artifacts": 0, "memory": 1},
    }

    frame = _build_chat_frame(snapshot, "", [], width=120, height=24, right_view="updates")

    assert "3 blocks" in frame
    assert "FAIL" in frame
    assert "very long compact memory" not in frame


def test_work_pane_uses_badges_without_duplicate_action_verbs():
    snapshot = {
        "job_id": "job_demo",
        "job": {
            "id": "job_demo",
            "title": "demo job",
            "objective": "stay readable",
            "status": "running",
            "kind": "generic",
            "metadata": {"task_queue": []},
        },
        "jobs": [{"id": "job_demo", "title": "demo job", "status": "running", "kind": "generic", "metadata": {}}],
        "steps": [],
        "artifacts": [],
        "memory_entries": [],
        "events": [
            {"event_type": "artifact", "title": "Demo Output", "body": "", "metadata": {}},
            {"event_type": "finding", "title": "Demo Finding", "body": "", "metadata": {}},
            {"event_type": "experiment", "title": "Demo Measurement", "body": "", "metadata": {}},
        ],
        "daemon": {"running": True, "metadata": {"pid": 123}},
        "model": "model/demo",
    }

    frame = _build_chat_frame(snapshot, "", [], width=120, height=24, right_view="updates")

    assert "Demo Output" in frame
    assert "Demo Finding" in frame
    assert "TEST" in frame
    assert "Demo Measurement" in frame
    assert "save saved" not in frame
    assert "find finding" not in frame
    assert "test experiment" not in frame


def test_run_reopens_completed_focused_job(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    parser = build_parser()
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving", title="perpetual")
        db.update_job_status(job_id, "completed")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    args = parser.parse_args(["run", "perpetual", "--no-follow"])

    args.func(args)

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "focus set: perpetual" in out
        assert job["status"] == "queued"
        assert job["metadata"]["last_note"] == "reopened from completed by operator run command"
        assert started["poll_seconds"] == 0.0
    finally:
        db.close()


def test_run_delegates_unverified_provider_state_to_daemon_start(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    parser = build_parser()
    db = AgentDB(tmp_path / "state.db")
    try:
        db.create_job("Keep checking provider recovery", title="provider recovery")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)
        print("model provider is not ready; starting daemon in recovery monitor mode")

    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: [])
    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    args = parser.parse_args(["run", "provider recovery", "--no-follow"])

    args.func(args)

    out = capsys.readouterr().out
    assert "Model setup is not verified." not in out
    assert "recovery monitor mode" in out
    assert started["poll_seconds"] == 0.0


def test_run_marks_job_waiting_when_provider_recovery_is_needed(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    parser = build_parser()
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
    finally:
        db.close()

    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_generation: key limit exceeded"])

    def fake_start(**_kwargs):
        print("model provider is not ready; starting daemon in recovery monitor mode")

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    args = parser.parse_args(["run", "provider recovery", "--no-follow"])

    args.func(args)

    out = capsys.readouterr().out
    assert "recovery monitor mode" in out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert job["status"] == "paused"
        assert job["metadata"]["provider_blocked_at"]
        assert "monitor and resume" in job["metadata"]["last_note"]
    finally:
        db.close()


def test_run_does_not_reopen_already_provider_blocked_job(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    parser = build_parser()
    db = AgentDB(tmp_path / "state.db")
    blocked_at = "2026-05-01T00:00:00+00:00"
    try:
        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
        db.update_job_status(
            job_id,
            "paused",
            metadata_patch={
                "provider_blocked_at": blocked_at,
                "last_note": "Model provider is unavailable; daemon will monitor and resume this job when calls succeed.",
            },
        )
        event_count = len(db.list_events(job_id=job_id, limit=20))
    finally:
        db.close()

    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_generation: key limit exceeded"])

    def fake_start(**_kwargs):
        print("model provider is not ready; starting daemon in recovery monitor mode")

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    args = parser.parse_args(["run", "provider recovery", "--no-follow"])

    args.func(args)

    out = capsys.readouterr().out
    assert "recovery monitor mode" in out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert job["status"] == "paused"
        assert job["metadata"]["provider_blocked_at"] == blocked_at
        assert "still unavailable" in job["metadata"]["last_note"]
        events = db.list_events(job_id=job_id, limit=20)
        assert len(events) == event_count
        assert all("Reopened from" not in str(event.get("body") or "") for event in events)
    finally:
        db.close()


def test_run_does_not_reopen_job_when_provider_preflight_is_hard_failure(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    parser = build_parser()
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep checking provider recovery", title="provider recovery")
        db.update_job_status(job_id, "paused", metadata_patch={"last_note": "operator paused"})
    finally:
        db.close()

    monkeypatch.setattr("nipux_cli.cli._remote_model_preflight_failures", lambda _config: ["model_auth: user not found"])
    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", lambda **_kwargs: None)
    args = parser.parse_args(["run", "provider recovery", "--no-follow"])

    args.func(args)

    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert job["status"] == "paused"
        assert job["metadata"]["last_note"] == "operator paused"
    finally:
        db.close()


def test_create_sets_new_job_as_shell_focus(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    parser = build_parser()
    args = parser.parse_args(["create", "Research new topic", "--title", "new research", "--kind", "generic"])

    args.func(args)
    created = capsys.readouterr().out.strip()
    assert _run_shell_line("focus") is True

    out = capsys.readouterr().out
    assert created == "created new research"
    assert "new research" in out
    assert (tmp_path / "jobs" / "new-research" / "program.md").exists()
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job("new-research")
        assert job["status"] == "queued"
        assert job["metadata"]["planning_status"] == "auto_accepted"
        assert job["metadata"]["planning"]["questions"]
        tasks = job["metadata"]["task_queue"]
        assert tasks
        assert all(task["output_contract"] for task in tasks)
        assert all(task["acceptance_criteria"] for task in tasks)
        assert all(task["evidence_needed"] for task in tasks)
        assert all(task["stall_behavior"] for task in tasks)
    finally:
        db.close()


def test_commands_accept_unquoted_job_titles_in_shell(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert _run_shell_line("status nightly research") is True

    out = capsys.readouterr().out
    assert "focus: nightly research" in out
    assert "state: open" in out
    assert "job_" not in out


def test_shell_stop_job_title_pauses_job_instead_of_stopping_daemon(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.update_job_status(job_id, "running")
    finally:
        db.close()

    assert _run_shell_line("stop nightly research") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "stopped nightly research" in out
        assert job["status"] == "paused"
        assert job["metadata"]["last_note"] == "stopped by operator"
    finally:
        db.close()


def test_resume_clears_provider_block_before_retry(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.update_job_status(
            job_id,
            "paused",
            metadata_patch={
                "provider_blocked_at": "2026-05-01T00:00:00+00:00",
                "defer_until": "2999-01-01T00:00:00+00:00",
                "defer_reason": "waiting for a monitor interval",
                "defer_next_action": "check later",
            },
        )
    finally:
        db.close()

    main(["resume", "nightly research"])

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "resumed nightly research" in out
        assert job["status"] == "queued"
        assert job["metadata"]["provider_blocked_at"] == ""
        assert job["metadata"]["provider_unblocked_at"]
        assert job["metadata"]["defer_until"] == ""
        assert job["metadata"]["defer_reason"] == ""
        assert job["metadata"]["defer_next_action"] == ""
    finally:
        db.close()


def test_shell_cancel_prefers_multiword_job_title_over_note(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.update_job_status(job_id, "running")
    finally:
        db.close()

    assert _run_shell_line("cancel nightly research") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "cancelled nightly research" in out
        assert ": finder" not in out
        assert job["status"] == "cancelled"
        assert "last_note" not in job["metadata"]
    finally:
        db.close()


def test_shell_pause_splits_note_after_longest_matching_job_title(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.update_job_status(job_id, "running")
    finally:
        db.close()

    assert _run_shell_line("pause nightly research checking costs") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "paused nightly research: checking costs" in out
        assert job["status"] == "paused"
        assert job["metadata"]["last_note"] == "checking costs"
    finally:
        db.close()


def test_chat_handle_line_adds_operator_message(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert (
        _chat_handle_line(
            job_id, "prefer artifact-backed findings", reply_fn=lambda _job_id, _message: "Okay, I will focus there."
        )
        is True
    )

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "waiting:" in out
        assert "Okay, I will focus there." in out
        assert job["metadata"]["operator_messages"][-1]["source"] == "chat"
        assert job["metadata"]["operator_messages"][-1]["message"] == "prefer artifact-backed findings"
        assert job["metadata"]["last_agent_update"]["category"] == "chat"
    finally:
        db.close()


def test_chat_can_spawn_new_job_from_plain_message(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        original_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert (
        _chat_handle_line(
            original_id,
            "create a job to monitor nightly benchmarks and report regressions",
            reply_fn=lambda _job_id, _message: "should not call model",
        )
        is True
    )

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        jobs = db.list_jobs()
        assert len(jobs) == 2
        created = [job for job in jobs if job["id"] != original_id][0]
        assert "monitor nightly benchmarks" in created["objective"]
        assert created["status"] == "queued"
        assert created["metadata"]["planning_status"] == "auto_accepted"
        assert "should not call model" not in out
        assert "Created job" in out
        assert "Started worker" in out
        assert started["poll_seconds"] == 0.0
        assert started["quiet"] is True
    finally:
        db.close()


def test_workspace_chat_can_create_refined_worker_job(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    monkeypatch.setattr(
        "nipux_cli.cli._refine_job_objective_for_worker",
        lambda *, message, objective: f"{objective}\n\nRefined durable objective with success criteria and artifacts.",
    )

    ok, message = _handle_chat_message(
        WORKSPACE_CHAT_ID,
        "create a job to research browser automation libraries",
        quiet=True,
    )

    assert ok is True
    assert "Created worker job" in message
    assert started["quiet"] is True
    db = AgentDB(tmp_path / "state.db")
    try:
        jobs = db.list_jobs()
        assert len(jobs) == 1
        assert "Refined durable objective" in jobs[0]["objective"]
        snapshot = _load_frame_snapshot(WORKSPACE_CHAT_ID, history_limit=4)
        bodies = "\n".join(str(event.get("body") or "") for event in snapshot["events"])
        assert "create a job" in bodies
        assert "Created worker job" in bodies
    finally:
        db.close()


def test_workspace_chat_start_objective_creates_worker_without_model_reply(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    monkeypatch.setattr(
        "nipux_cli.cli._refine_job_objective_for_worker",
        lambda *, message, objective: objective,
    )

    ok, message = _handle_workspace_chat_message("start research browser automation libraries", quiet=True)

    assert ok is True
    assert "Created worker job" in message
    assert started["quiet"] is True
    db = AgentDB(tmp_path / "state.db")
    try:
        jobs = db.list_jobs()
        assert len(jobs) == 1
        assert jobs[0]["title"] == "research browser automation libraries"
        assert _read_shell_state().get("focus_job_id") == jobs[0]["id"]
    finally:
        db.close()


def test_workspace_chat_accepts_natural_worker_and_task_phrasing(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    started = {}

    def fake_start(**kwargs):
        started.setdefault("calls", []).append(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)
    monkeypatch.setattr(
        "nipux_cli.cli._refine_job_objective_for_worker",
        lambda *, message, objective: objective,
    )

    ok, worker_message = _handle_workspace_chat_message(
        "spin up a worker to monitor docs and report changes",
        quiet=True,
    )
    ok2, task_message = _handle_workspace_chat_message(
        "run a task to audit onboarding and write a report",
        quiet=True,
    )

    assert ok is True
    assert ok2 is True
    assert "Created worker job" in worker_message
    assert "Created worker job" in task_message
    assert len(started["calls"]) == 2
    db = AgentDB(tmp_path / "state.db")
    try:
        titles = [job["title"] for job in db.list_jobs()]
        assert "monitor docs and report changes" in titles
        assert "audit onboarding and write a report" in titles
    finally:
        db.close()


def test_chat_can_queue_new_job_without_starting(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        original_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert (
        _chat_handle_line(
            original_id,
            "create only a job to monitor nightly benchmarks and report regressions",
            reply_fn=lambda _job_id, _message: "should not call model",
        )
        is True
    )

    out = capsys.readouterr().out
    assert "Created job" in out
    assert "Started worker" not in out
    assert started == {}


def test_chat_can_spawn_generic_deliverable_job_from_plain_message(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        original_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert (
        _chat_handle_line(
            original_id,
            "generate a polished launch checklist for this repository",
            reply_fn=lambda _job_id, _message: "should not call model",
        )
        is True
    )

    db = AgentDB(tmp_path / "state.db")
    try:
        jobs = db.list_jobs()
        assert len(jobs) == 2
        created = [job for job in jobs if job["id"] != original_id][0]
        assert "launch checklist" in created["objective"]
        assert started["quiet"] is True
    finally:
        db.close()


def test_chat_start_job_message_starts_daemon(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        original_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert (
        _chat_handle_line(
            original_id,
            "start a job to monitor nightly benchmarks and report regressions",
            reply_fn=lambda _job_id, _message: "should not call model",
        )
        is True
    )

    out = capsys.readouterr().out
    assert started["poll_seconds"] == 0.0
    assert started["quiet"] is True
    assert "Started worker" in out


def test_chat_create_job_and_run_it_starts_daemon(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    db = AgentDB(tmp_path / "state.db")
    try:
        original_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert (
        _chat_handle_line(
            original_id,
            "create a job to monitor nightly benchmarks and then run it",
            reply_fn=lambda _job_id, _message: "should not call model",
        )
        is True
    )

    out = capsys.readouterr().out
    assert started["poll_seconds"] == 0.0
    assert started["quiet"] is True
    assert "Started worker" in out


def test_chat_jobs_command_lists_jobs_instead_of_steering(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert _chat_handle_line(job_id, "/jobs", reply_fn=lambda _job_id, _message: "should not run") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "nightly research" in out
        assert "should not run" not in out
        assert job["metadata"].get("operator_messages") is None
    finally:
        db.close()


def test_chat_command_inside_chat_is_not_queued(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert (
        _chat_handle_line(job_id, 'chat "nightly research"', reply_fn=lambda _job_id, _message: "should not run")
        is True
    )

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "already chatting with nightly research" in out
        assert "should not run" not in out
        assert job["metadata"].get("operator_messages") is None
    finally:
        db.close()


def test_chat_run_accepts_initial_plan_before_starting(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    parser = build_parser()
    args = parser.parse_args(["create", "Research new topic", "--title", "new research", "--kind", "generic"])
    args.func(args)
    job_id = "new-research"
    captured = {}

    def fake_run(run_args):
        captured["job_id"] = run_args.job_id

    monkeypatch.setattr("nipux_cli.cli.cmd_run", fake_run)

    assert _chat_handle_line(job_id, "/run") is True

    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert job["status"] == "queued"
        assert captured["job_id"] == job_id
    finally:
        db.close()


def test_run_without_jobs_does_not_start_empty_daemon(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    _mark_test_model_ready()
    started = {}

    def fake_start(**kwargs):
        started.update(kwargs)

    monkeypatch.setattr("nipux_cli.cli._start_daemon_if_needed", fake_start)

    assert _run_shell_line("run") is True

    out = capsys.readouterr().out
    assert "No jobs found. Create one with /new OBJECTIVE." in out
    assert started == {}


def test_build_chat_messages_includes_recent_job_state(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
        db.create_job("Monitor another branch", title="other branch", kind="generic")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
        db.finish_step(step_id, status="completed", summary="web_search returned useful sources")
        job = db.get_job(job_id)

        messages = _build_chat_messages(db, job, "what is going on?")

        content = messages[-1]["content"]
        assert "Job title: nightly research" in content
        assert "Jobs:" in content
        assert "* 1. nightly research" in content
        assert "- 2. other branch" in content
        assert "web_search returned useful sources" in content
        assert "what is going on?" in content
    finally:
        db.close()


def test_build_chat_messages_includes_durable_outcome_summary(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
        db.append_event(job_id=job_id, event_type="artifact", title="First draft", body="saved report", metadata={})
        db.append_event(job_id=job_id, event_type="finding", title="Evidence map", body="", metadata={})
        db.append_event(
            job_id=job_id,
            event_type="experiment",
            title="Citation coverage",
            body="",
            metadata={"metric_name": "citations", "metric_value": 12, "metric_unit": "count"},
        )
        job = db.get_job(job_id)

        messages = _build_chat_messages(db, job, "what has it actually done?")

        content = messages[-1]["content"]
        assert "Durable outcomes:" in content
        assert "summary: 1 outputs 1 findings 1 measurements" in content
        assert "save: First draft" in content
        assert "find: Evidence map" in content
        assert "test: Citation coverage" in content
    finally:
        db.close()


def test_build_chat_messages_does_not_include_local_machine_context(monkeypatch, tmp_path):
    monkeypatch.setenv("HOME", str(tmp_path))
    ssh_dir = tmp_path / ".ssh"
    ssh_dir.mkdir()
    (ssh_dir / "config").write_text("Host private-box\n HostName 10.9.8.7\n User private\n", encoding="utf-8")
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
        job = db.get_job(job_id)

        messages = _build_chat_messages(db, job, "what is going on?")

        content = messages[-1]["content"]
        assert "Local CLI context" not in content
        assert "private-box" not in content
        assert "10.9.8.7" not in content
    finally:
        db.close()


def test_build_chat_messages_points_to_artifact_and_lessons(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
        ArtifactStore(tmp_path, db=db).write_text(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            title="Findings Batch",
            summary="15 reusable findings",
            content="Acme",
        )
        db.append_lesson(job_id, "Prefer actual evidence sources over low-evidence pages.", category="strategy")
        job = db.get_job(job_id)

        messages = _build_chat_messages(db, job, "where are the findings?")

        content = messages[-1]["content"]
        assert "/artifact 1" in content
        assert "Prefer actual evidence sources over low-evidence pages" in content
    finally:
        db.close()


def test_build_chat_messages_clip_large_visible_state(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research", kind="generic")
        for index in range(30):
            db.append_event(
                job_id=job_id,
                event_type="finding",
                title=f"large finding {index}",
                body="evidence " * 400,
                metadata={},
            )
        job = db.get_job(job_id)

        messages = _build_chat_messages(db, job, "keep this exact operator question visible")

        content = messages[-1]["content"]
        assert len(content) < 14_000
        assert "clipped" in content
        assert "keep this exact operator question visible" in content
    finally:
        db.close()


def test_artifact_command_resolves_title_query(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
        ArtifactStore(tmp_path, db=db).write_text(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            title="Findings Batch",
            summary="saved findings",
            content="Acme Corp\n",
        )
    finally:
        db.close()

    parser = build_parser()
    args = parser.parse_args(["artifact", "Findings", "Batch"])
    args.func(args)

    out = capsys.readouterr().out
    assert "artifact: Findings Batch" in out
    assert "Acme Corp" in out


def test_artifacts_command_prints_compact_view_command(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
        ArtifactStore(tmp_path, db=db).write_text(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            title="Findings Batch",
            summary="saved findings",
            content="Acme Corp\n",
        )
    finally:
        db.close()

    parser = build_parser()
    args = parser.parse_args(["artifacts"])
    args.func(args)

    out = capsys.readouterr().out
    assert "saved outputs nightly research" in out
    assert "view: artifact 1" in out
    assert "/jobs/" not in out


def test_artifact_command_opens_recent_output_by_number(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
        ArtifactStore(tmp_path, db=db).write_text(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            title="Findings Batch",
            summary="saved findings",
            content="Acme Corp\n",
        )
    finally:
        db.close()

    parser = build_parser()
    args = parser.parse_args(["artifact", "1"])
    args.func(args)

    out = capsys.readouterr().out
    assert "artifact: Findings Batch" in out
    assert "Acme Corp" in out


def test_chat_work_defaults_to_compact_output(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()
    captured = {}

    def fake_work(args):
        captured["verbose"] = args.verbose
        captured["chars"] = args.chars

    monkeypatch.setattr("nipux_cli.cli.cmd_work", fake_work)

    assert _chat_handle_line(job_id, "/work") is True

    assert captured == {"verbose": False, "chars": 260}


def test_chat_learn_adds_lesson(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert _chat_handle_line(job_id, "/learn low-evidence pages are not research findings") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        assert "learned for nightly research" in out
        assert job["metadata"]["last_lesson"]["lesson"] == "low-evidence pages are not research findings"
    finally:
        db.close()


def test_chat_follow_queues_follow_up_message(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
    finally:
        db.close()

    assert _chat_handle_line(job_id, "/follow after this branch, check another source") is True

    out = capsys.readouterr().out
    db = AgentDB(tmp_path / "state.db")
    try:
        job = db.get_job(job_id)
        message = job["metadata"]["operator_messages"][-1]
        assert "waiting after current branch" in out
        assert message["mode"] == "follow_up"
        assert message["message"] == "after this branch, check another source"
    finally:
        db.close()


def test_findings_sources_memory_metrics_commands(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.append_finding_record(job_id, name="Acme Finding", category="example category", score=0.8)
        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
        db.append_experiment_record(job_id, title="Variant A", status="measured", metric_name="score", metric_value=1.5)
        db.append_source_record(job_id, "https://example.com", usefulness_score=0.9, yield_count=1)
        db.append_lesson(job_id, "Source indexes work.", category="strategy")
        db.append_reflection(job_id, "Keep using source indexes.", strategy="Try primary records.")
        db.append_memory_graph_records(
            job_id,
            nodes=[
                {
                    "key": "source-indexes-work",
                    "kind": "strategy",
                    "title": "Source indexes work",
                    "summary": "Use source indexes when they produce durable records.",
                    "salience": 0.8,
                    "confidence": 0.9,
                }
            ],
        )
    finally:
        db.close()

    parser = build_parser()
    for command in (["findings"], ["tasks"], ["experiments"], ["sources"], ["memory"], ["metrics"]):
        args = parser.parse_args(command)
        args.func(args)

    out = capsys.readouterr().out
    assert "Acme Finding" in out
    assert "Explore primary sources" in out
    assert "Variant A" in out
    assert "https://example.com" in out
    assert "Keep using source indexes" in out
    assert "graph_nodes=1" in out
    assert "Source indexes work" in out
    assert "tasks: 1" in out
    assert "experiments: 1" in out
    assert "findings: 1" in out


def test_memory_graph_html_command_writes_clickable_artifact(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="nightly research")
        db.append_memory_graph_records(
            job_id,
            nodes=[
                {
                    "key": "validated-loop",
                    "kind": "skill",
                    "status": "stable",
                    "title": "Validated loop",
                    "summary": "Check progress against measured evidence before expanding scope.",
                    "tags": ["validation", "progress"],
                    "evidence_refs": ["artifact:report"],
                    "salience": 0.9,
                    "confidence": 0.8,
                },
                {
                    "key": "open-risk",
                    "kind": "question",
                    "status": "open",
                    "title": "Open risk",
                    "summary": "Needs another validation pass.",
                },
            ],
            edges=[{"from_key": "validated-loop", "to_key": "open-risk", "relation": "raises"}],
        )
    finally:
        db.close()

    args = build_parser().parse_args(["memory", "--graph"])
    args.func(args)

    out = capsys.readouterr().out
    assert "memory graph written:" in out
    db = AgentDB(tmp_path / "state.db")
    try:
        artifacts = db.list_artifacts(job_id)
        assert artifacts[0]["type"] == "html"
        html = Path(artifacts[0]["path"]).read_text(encoding="utf-8")
        assert "<canvas id=\"graph\"" in html
        assert "click a node" in html
        assert "Validated loop" in html
        assert "open-risk" in html
    finally:
        db.close()


def test_shell_natural_update_phrase_shows_updates(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        db.create_job("Research topic", title="research")
    finally:
        db.close()

    assert _run_shell_line("tell me updates") is True
    assert _run_shell_line("show outcomes") is True

    out = capsys.readouterr().out
    assert "updates" in out
    assert "queued for" not in out


def test_updates_command_summarizes_durable_outcomes(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="research")
        artifact_path = tmp_path / "artifact.md"
        artifact_path.write_text("saved", encoding="utf-8")
        db.append_event(
            job_id,
            event_type="tool_call",
            title="web_search",
            metadata={"input": {"arguments": {"query": "raw search"}}},
        )
        db.append_finding_record(job_id, name="Durable Result", category="evidence", reason="real outcome", score=0.7)
        db.add_artifact(
            job_id=job_id,
            path=artifact_path,
            sha256="abc",
            artifact_type="text",
            title="Saved Report",
            summary="durable output",
        )
    finally:
        db.close()

    args = build_parser().parse_args(["updates", "research", "--limit", "3", "--chars", "120"])
    args.func(args)

    out = capsys.readouterr().out
    assert "outcomes by hour:" in out
    assert "Durable Result" in out
    assert "Saved Report" in out
    assert "latest saved outputs:" in out
    assert "raw tool stream: activity" in out
    assert "recent tool calls:" not in out


def test_updates_all_summarizes_durable_work_across_jobs(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        first_id = db.create_job("Research first topic", title="first")
        second_id = db.create_job("Research second topic", title="second")
        first_path = tmp_path / "first.md"
        first_path.write_text("first", encoding="utf-8")
        second_path = tmp_path / "second.md"
        second_path.write_text("second", encoding="utf-8")
        db.append_finding_record(first_id, name="First durable finding", category="evidence")
        db.add_artifact(
            job_id=first_id,
            path=first_path,
            sha256="abc",
            artifact_type="text",
            title="First saved output",
            summary="first summary",
        )
        db.append_experiment_record(
            second_id,
            title="Second measured result",
            status="measured",
            metric_name="quality",
            metric_value=9,
            metric_unit="points",
        )
        db.add_artifact(
            job_id=second_id,
            path=second_path,
            sha256="def",
            artifact_type="text",
            title="Second saved output",
            summary="second summary",
        )
    finally:
        db.close()

    args = build_parser().parse_args(["outcomes", "--all", "--limit", "5", "--chars", "120"])
    args.func(args)

    out = capsys.readouterr().out
    assert "outcomes all jobs | 2 tracked" in out
    assert "first |" in out
    assert "second |" in out
    assert "First durable finding" in out
    assert "First saved output" in out
    assert "Second measured result" in out
    assert "Second saved output" in out


def test_history_and_events_commands_render_visible_timeline(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="research")
        db.append_operator_message(job_id, "operator timeline note", source="test")
        db.append_agent_update(job_id, "agent timeline note", category="chat")
    finally:
        db.close()

    parser = build_parser()
    # Parse each command once and dispatch it, instead of parsing the same argv twice.
    for command in (["history", "research"], ["events", "research"]):
        args = parser.parse_args(command)
        args.func(args)

    out = capsys.readouterr().out
    assert "history research" in out
    assert "events research" in out
    assert "operator timeline note" in out
    assert "agent timeline note" in out


def test_shell_natural_health_phrase_shows_health(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        db.create_job("Research topic", title="research")
    finally:
        db.close()

    assert _run_shell_line("is it running") is True

    out = capsys.readouterr().out
    assert "Nipux Health" in out
    assert "queued for" not in out


def test_health_prints_recent_daemon_events(monkeypatch, tmp_path, capsys):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))

    config = load_config()
    append_daemon_event(config, "daemon_error", error_type="RuntimeError", error="provider fell over")

    parser = build_parser()
    args = parser.parse_args(["health", "--limit", "3"])
    args.func(args)

    out = capsys.readouterr().out
    assert "Nipux Health" in out
    assert "daemon_error" in out
    assert "RuntimeError" in out


def test_launch_agent_plist_contains_daemon_command(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))

    plist = _launch_agent_plist(poll_seconds=7, quiet=True)

    assert "com.nipux.agent" in plist
    assert "<string>daemon</string>" in plist
    assert "<string>--poll-seconds</string>" in plist
    assert "<string>7</string>" in plist
    assert str(tmp_path) in plist


def test_systemd_service_text_contains_daemon_command(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))

    service = _systemd_service_text(poll_seconds=0, quiet=True)

    assert "[Service]" in service
    assert "ExecStart=" in service
    assert "daemon --poll-seconds 0" in service
    assert f"Environment=NIPUX_HOME={tmp_path}" in service
    assert "Restart=always" in service
tests/nipux_cli/test_cli_model_preflight.py 86 lines
from types import SimpleNamespace

from nipux_cli.cli import _ensure_remote_model_ready_for_worker, build_parser
from nipux_cli.doctor import Check


def _config(base_url: str):
    return SimpleNamespace(
        model=SimpleNamespace(
            model="provider/model",
            base_url=base_url,
            api_key="",
            api_key_env="TEST_API_KEY",
        )
    )


def test_remote_model_preflight_blocks_rejected_auth(monkeypatch, capsys):
    def fake_doctor(*, config, check_model):
        assert check_model is True
        return [Check("model_auth", False, "OpenRouter rejected API key: User not found")]

    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)

    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=False) is False

    out = capsys.readouterr().out
    assert "model is not ready; daemon not started" in out
    assert "model_auth: OpenRouter rejected API key" in out
    assert "doctor --check-model" in out


def test_remote_model_preflight_allows_recovery_monitor_for_quota(monkeypatch, capsys):
    def fake_doctor(*, config, check_model):
        assert check_model is True
        return [Check("model_generation", False, "Key limit exceeded (total limit) (code=403)")]

    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)

    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=False) is True

    out = capsys.readouterr().out
    assert "recovery monitor mode" in out
    assert "Key limit exceeded" in out


def test_remote_model_preflight_skips_fake_runs(monkeypatch):
    def fake_doctor(*, config, check_model):
        raise AssertionError("fake runs should not need remote model auth")

    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)

    assert _ensure_remote_model_ready_for_worker(_config("https://openrouter.ai/api/v1"), fake=True) is True


def test_model_preflight_checks_local_endpoints(monkeypatch):
    called = {}

    def fake_doctor(*, config, check_model):
        called["check_model"] = check_model
        return []

    monkeypatch.setattr("nipux_cli.cli.run_doctor", fake_doctor)

    assert _ensure_remote_model_ready_for_worker(_config("http://localhost:11434/v1"), fake=False) is True
    assert called["check_model"] is True


def test_start_does_not_spawn_daemon_when_model_preflight_fails(monkeypatch, tmp_path):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    checked = {}

    def fake_ready(config, *, fake):
        checked["fake"] = fake
        return False

    def fake_popen(*args, **kwargs):
        raise AssertionError("daemon should not spawn when model preflight fails")

    monkeypatch.setattr("nipux_cli.cli._ensure_remote_model_ready_for_worker", fake_ready)
    monkeypatch.setattr("subprocess.Popen", fake_popen)

    args = build_parser().parse_args(["start", "--quiet"])
    args.func(args)

    assert checked["fake"] is False
tests/nipux_cli/test_compression.py 101 lines
from nipux_cli.compression import refresh_memory_index
from nipux_cli.db import AgentDB


def test_refresh_memory_index_includes_durable_progress_ledgers(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep improving a report",
            title="report",
            metadata={
                "task_queue": [
                    {
                        "title": "Draft evidence-backed section",
                        "status": "active",
                        "priority": 10,
                        "output_contract": "report",
                    }
                ],
                "finding_ledger": [{"name": "Teacher traces improve tool use"}],
                "source_ledger": [{"source": "https://example.test/paper", "usefulness_score": 0.8}],
                "experiment_ledger": [
                    {
                        "title": "Citation density check",
                        "status": "measured",
                        "metric_name": "citations",
                        "metric_value": 12,
                        "metric_unit": "count",
                    }
                ],
                "roadmap": {
                    "title": "Research paper roadmap",
                    "status": "active",
                    "current_milestone": "Improve literature review",
                },
                "memory_graph": {
                    "nodes": [
                        {
                            "title": "Validated evidence loop",
                            "kind": "strategy",
                            "status": "active",
                            "summary": "Evidence-backed checkpoints should drive the next branch.",
                            "salience": 0.9,
                        }
                    ],
                    "edges": [
                        {
                            "from_key": "validated-evidence-loop",
                            "to_key": "research-paper-roadmap",
                            "relation": "supports",
                        }
                    ],
                },
                "pending_measurement_obligation": {
                    "source_step_no": 42,
                    "tool": "shell_exec",
                    "summary": "benchmark output needs accounting",
                    "metric_candidates": ["latency 120ms", "throughput 9 req/s"],
                },
            },
        )
        db.append_event(
            job_id,
            event_type="loop",
            title="message_end",
            metadata={
                "usage": {
                    "prompt_tokens": 1200,
                    "completion_tokens": 300,
                    "total_tokens": 1500,
                    "estimated": True,
                    "context_length": 1600,
                    "context_fraction": 0.75,
                }
            },
        )

        refresh_memory_index(db, job_id)

        memory = db.list_memory(job_id)[0]["summary"]
        assert "Durable progress ledgers:" in memory
        assert "tasks=1" in memory
        assert "findings=1" in memory
        assert "sources=1" in memory
        assert "experiments=1" in memory
        assert "memory_nodes=1" in memory
        assert "Validated evidence loop" in memory
        assert "memory_links=1" in memory
        assert "Draft evidence-backed section" in memory
        assert "Citation density check" in memory
        assert "Teacher traces improve tool use" in memory
        assert "Research paper roadmap" in memory
        assert "pending_measurement step=#42 tool=shell_exec" in memory
        assert "latency 120ms" in memory
        assert "Model usage:" in memory
        assert "total_tokens=1.5K" in memory
        assert "estimated_calls=1" in memory
        assert "context_pressure" in memory
        assert "latest_context=1.2K/1.6K" in memory
    finally:
        db.close()
tests/nipux_cli/test_config.py 143 lines
from pathlib import Path

from nipux_cli.config import DEFAULT_CONTEXT_LENGTH, default_config_yaml, load_config


def _mode(path):
    return path.stat().st_mode & 0o777


def test_load_config_defaults_to_local_endpoint(tmp_path, monkeypatch):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))

    config = load_config()

    assert config.runtime.home == tmp_path
    assert config.model.model == "local-model"
    assert config.model.base_url == "http://localhost:8000/v1"
    assert config.model.api_key_env == "OPENAI_API_KEY"
    assert config.model.context_length == DEFAULT_CONTEXT_LENGTH
    assert config.runtime.state_db_path == tmp_path / "state.db"
    assert config.runtime.daily_digest_enabled is True
    assert config.runtime.daily_digest_time == "08:00"
    assert config.runtime.max_job_cost_usd is None
    assert config.tools.browser is True
    assert config.tools.web is True
    assert config.tools.shell is True
    assert config.tools.files is True


def test_load_config_from_yaml(tmp_path, monkeypatch):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path / "home"))
    cfg = tmp_path / "config.yaml"
    cfg.write_text(
        """
model:
  name: local-test
  base_url: http://127.0.0.1:9999/v1/
  context_length: 12345
  input_cost_per_million: 0.1
  output_cost_per_million: 0.2
runtime:
  home: ./agent-home
  max_step_seconds: 42
  max_job_cost_usd: 12.5
  daily_digest_enabled: false
  daily_digest_time: "07:30"
tools:
  browser: false
  web: true
  shell: false
  files: true
email:
  enabled: true
  to_addr: kai@example.com
""",
        encoding="utf-8",
    )

    config = load_config(cfg)

    assert config.model.model == "local-test"
    assert config.model.base_url == "http://127.0.0.1:9999/v1"
    assert config.model.context_length == 12345
    assert config.model.input_cost_per_million == 0.1
    assert config.model.output_cost_per_million == 0.2
    assert config.runtime.home == Path("./agent-home")
    assert config.runtime.max_step_seconds == 42
    assert config.runtime.max_job_cost_usd == 12.5
    assert config.runtime.daily_digest_enabled is False
    assert config.runtime.daily_digest_time == "07:30"
    assert config.tools.browser is False
    assert config.tools.web is True
    assert config.tools.shell is False
    assert config.tools.files is True
    assert config.email.enabled is True
    assert config.email.to_addr == "kai@example.com"


def test_load_config_reads_local_env_file(tmp_path, monkeypatch):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    (tmp_path / ".env").write_text("OPENROUTER_API_KEY" + "=secret-test-key\n", encoding="utf-8")
    (tmp_path / "config.yaml").write_text(
        """
model:
  name: provider/test-model
  base_url: https://openrouter.ai/api/v1
  api_key_env: OPENROUTER_API_KEY
""",
        encoding="utf-8",
    )

    config = load_config()

    assert config.model.api_key == "secret-test-key"


def test_load_config_tightens_local_env_permissions(tmp_path, monkeypatch):
    monkeypatch.setenv("NIPUX_HOME", str(tmp_path))
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    env_path = tmp_path / ".env"
    env_path.write_text("OPENROUTER_API_KEY" + "=secret-test-key\n", encoding="utf-8")
    env_path.chmod(0o644)

    load_config()

    assert _mode(env_path) == 0o600


def test_default_config_yaml_allows_provider_template_without_secret():
    text = default_config_yaml(
        model="provider/model",
        base_url="https://openrouter.ai/api/v1/",
        api_key_env="OPENROUTER_API_KEY",
        context_length=8192,
    )

    assert "name: provider/model" in text
    assert "base_url: https://openrouter.ai/api/v1" in text
    assert "api_key_env: OPENROUTER_API_KEY" in text
    assert "context_length: 8192" in text
    assert "input_cost_per_million: null" in text
    assert "output_cost_per_million: null" in text
    assert "max_job_cost_usd: null" in text
    assert "tools:" in text
    assert "browser: true" in text
    assert "shell: true" in text
    assert "sk-" not in text


131def test_config_example_matches_default_local_endpoint():
132 root = Path(__file__).resolve().parents[2]
133 text = (root / "config.example.yaml").read_text(encoding="utf-8")
134
135 assert "name: local-model" in text
136 assert "base_url: http://localhost:8000/v1" in text
137 assert "api_key_env: OPENAI_API_KEY" in text
138 assert "input_cost_per_million: null" in text
139 assert "output_cost_per_million: null" in text
140 assert "max_job_cost_usd: null" in text
141 assert "tools:" in text
142 assert "browser: true" in text
143 assert "shell: true" in text
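
`test_load_config_tightens_local_env_permissions` above expects a world-readable `.env` file to end up at mode `0o600` once the config has been loaded. The following is a minimal sketch of that idea, assuming POSIX permission bits. `file_mode` and `tighten_secret_file` are hypothetical names used for illustration; they are not the helpers in `nipux_cli.config`, whose implementation is not shown here.

```python
from __future__ import annotations

import stat
import tempfile
from pathlib import Path


def file_mode(path: Path) -> int:
    """Return only the permission bits of path, for example 0o644."""
    return stat.S_IMODE(path.stat().st_mode)


def tighten_secret_file(path: Path) -> bool:
    """Restrict path to owner read/write when group or others have any access.

    Returns True when the mode was changed. Hypothetical helper for illustration.
    """
    if not path.exists():
        return False
    if file_mode(path) & (stat.S_IRWXG | stat.S_IRWXO):
        path.chmod(stat.S_IRUSR | stat.S_IWUSR)  # 0o600
        return True
    return False


def demo() -> tuple[int, int]:
    """Create a throwaway .env at 0o644 and report its mode before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text("EXAMPLE_KEY=placeholder\n", encoding="utf-8")
        env_path.chmod(0o644)
        before = file_mode(env_path)
        tighten_secret_file(env_path)
        return before, file_mode(env_path)
```

The sketch only rewrites the mode when group or other bits are set, so a file that is already private is left untouched.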
tests/nipux_cli/test_daemon.py 560 lines
1from datetime import datetime, timedelta, timezone
2import json
3import threading
4import time
5
6import pytest
7
8from nipux_cli.config import AppConfig, RuntimeConfig
9from nipux_cli.daemon import (
10 Daemon,
11 DaemonAlreadyRunning,
12 RUNTIME_CODE_FILES,
13 append_daemon_event,
14 current_runtime_fingerprint,
15 daemon_lock_status,
16 read_daemon_events,
17 runtime_stale,
18 single_instance_lock,
19 update_lock_metadata,
20 _exception_backoff,
21 _parse_retry_after,
22 _step_failure_backoff,
23)
24from nipux_cli.daemon_control import stop_daemon_process_impl
25from nipux_cli.db import AgentDB
26from nipux_cli.worker import StepExecution
27from nipux_cli.doctor import Check
28
29
30def test_single_instance_lock_rejects_second_holder(tmp_path):
31 lock_path = tmp_path / "agentd.lock"
32 with single_instance_lock(lock_path):
33 with pytest.raises(DaemonAlreadyRunning):
34 with single_instance_lock(lock_path):
35 pass
36
37
38def test_daemon_lock_status_reports_free_lock(tmp_path):
39 status = daemon_lock_status(tmp_path / "agentd.lock")
40
41 assert status["running"] is False
42 assert status["detail"] == "daemon lock is free"
43
44
45def test_lock_metadata_can_be_updated_while_held(tmp_path):
46 lock_path = tmp_path / "agentd.lock"
47 with single_instance_lock(lock_path) as handle:
48 update_lock_metadata(handle, last_state="step", consecutive_failures=2)
49 status = daemon_lock_status(lock_path)
50
51 assert status["running"] is True
52 assert status["metadata"]["last_state"] == "step"
53 assert status["metadata"]["consecutive_failures"] == 2
54 assert status["metadata"]["pid"]
55 assert status["metadata"]["started_at"]
56
57
58def test_lock_metadata_update_restores_missing_process_fields(tmp_path):
59 lock_path = tmp_path / "agentd.lock"
60 with single_instance_lock(lock_path) as handle:
61 handle.seek(0)
62 handle.truncate()
63 handle.write(json.dumps({"last_state": "idle"}))
64 handle.flush()
65
66 update_lock_metadata(handle, last_state="step")
67 status = daemon_lock_status(lock_path)
68
69 assert status["running"] is True
70 assert status["metadata"]["pid"]
71 assert status["metadata"]["started_at"]
72 assert status["metadata"]["last_state"] == "step"
73
74
75def test_daemon_lock_heartbeat_updates_while_worker_turn_runs(monkeypatch, tmp_path):
76 monkeypatch.setattr("nipux_cli.daemon.WORK_HEARTBEAT_INTERVAL_SECONDS", 0.01)
77 monkeypatch.setattr("nipux_cli.daemon.signal.getsignal", lambda _sig: None)
78 monkeypatch.setattr("nipux_cli.daemon.signal.signal", lambda _sig, _handler: None)
79
80 class SlowDaemon(Daemon):
81 def run_once(self, *, fake: bool = False, verbose: bool = False): # noqa: ARG002
82 time.sleep(0.2)
83 return None
84
85 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
86 db = AgentDB(tmp_path / "state.db")
87 try:
88 daemon = SlowDaemon(config=config, db=db)
89 thread = threading.Thread(
90 target=daemon.run_forever,
91 kwargs={"poll_seconds": 0, "quiet": True, "max_iterations": 1},
92 daemon=True,
93 )
94 thread.start()
95 seen_working: dict | None = None
96 deadline = time.time() + 1.0
97 while time.time() < deadline:
98 status = daemon_lock_status(tmp_path / "agentd.lock")
99 metadata = status.get("metadata") or {}
100 if status.get("running") and metadata.get("last_state") == "working":
101 seen_working = metadata
102 break
103 time.sleep(0.01)
104
105 thread.join(timeout=2.0)
106
107 assert seen_working is not None
108 assert seen_working["last_heartbeat"]
109 assert seen_working["runtime"]["runtime_hash"]
110 assert not thread.is_alive()
111 finally:
112 db.close()
113
114
115def test_stop_daemon_recovers_pidless_lock_from_process_list(tmp_path, monkeypatch):
116 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
117 lock_path = tmp_path / "agentd.lock"
118 killed = []
119
120 class PsResult:
121 returncode = 0
122 stdout = " 99999 /venv/bin/python -m nipux_cli.cli daemon --poll-seconds 0.0 --quiet\n"
123
124 monkeypatch.setattr("nipux_cli.daemon_control.subprocess.run", lambda *_args, **_kwargs: PsResult())
125 monkeypatch.setattr("nipux_cli.daemon_control.os.kill", lambda pid, sig: killed.append((pid, sig)))
126
127 with single_instance_lock(lock_path) as handle:
128 handle.seek(0)
129 handle.truncate()
130 handle.write(json.dumps({"last_state": "idle"}))
131 handle.flush()
132
133 stopped = stop_daemon_process_impl(config, wait=0.1, quiet=True, pid_alive=lambda _pid: False)
134
135 assert stopped is True
136 assert killed and killed[0][0] == 99999
137
138
139def test_daemon_lock_status_detects_stale_runtime(tmp_path):
140 lock_path = tmp_path / "agentd.lock"
141 with single_instance_lock(lock_path) as handle:
142 update_lock_metadata(handle, runtime={"runtime_hash": "old"})
143 status = daemon_lock_status(lock_path)
144
145 assert status["running"] is True
146 assert status["stale"] is True
147 assert status["current_runtime"]["code_hash"]
148 assert status["current_runtime"]["code_mtime"]
149 assert runtime_stale({"runtime": {"runtime_hash": "old"}}) is True
150 assert runtime_stale({"runtime": current_runtime_fingerprint()}) is False
151
152
153def test_runtime_fingerprint_tracks_progress_code():
154 assert "progress.py" in RUNTIME_CODE_FILES
155 assert "parser_builder.py" in RUNTIME_CODE_FILES
156
157
158def test_rate_limit_backoff_uses_retry_after_header():
159 class RateLimit(Exception):
160 status_code = 429
161 response = type("Response", (), {"headers": {"Retry-After": "42"}})()
162
163 assert _exception_backoff(RateLimit("too many requests"), poll_seconds=0, consecutive_failures=1) == 42
164
165
166def test_rate_limit_backoff_has_conservative_fallback():
167 class RateLimit(Exception):
168 status_code = 429
169
170 assert _exception_backoff(RateLimit("rate limit exceeded"), poll_seconds=0, consecutive_failures=1) == 10
171
172
173def test_failed_step_provider_config_error_uses_normal_backoff():
174 result = StepExecution(
175 job_id="job",
176 run_id="run",
177 step_id="step",
178 tool_name=None,
179 status="failed",
180 result={
181 "error_type": "PermissionDeniedError",
182 "error": "Error code: 403 - key limit exceeded",
183 },
184 )
185
186 assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
187
188
189def test_failed_tool_auth_error_uses_normal_backoff():
190 result = StepExecution(
191 job_id="job",
192 run_id="run",
193 step_id="step",
194 tool_name="shell_exec",
195 status="failed",
196 result={
197 "error": "command output indicates authentication or authorization failure: permission denied",
198 },
199 )
200
201 assert _step_failure_backoff(result, poll_seconds=3, consecutive_failures=1) == 3
202
203
204def test_failed_step_rate_limit_uses_normal_backoff():
205 result = StepExecution(
206 job_id="job",
207 run_id="run",
208 step_id="step",
209 tool_name=None,
210 status="failed",
211 result={
212 "error_type": "RateLimitError",
213 "error": "429 too many requests",
214 },
215 )
216
217 assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
218
219
220def test_failed_step_provider_timeout_uses_normal_backoff():
221 result = StepExecution(
222 job_id="job",
223 run_id="run",
224 step_id="step",
225 tool_name=None,
226 status="failed",
227 result={
228 "error_type": "APITimeoutError",
229 "error": "Request timed out.",
230 },
231 )
232
233 assert _step_failure_backoff(result, poll_seconds=0, consecutive_failures=1) == 1
234
235
236def test_retry_after_parses_epoch_milliseconds():
237 future_ms = str(int((time.time() + 5) * 1000))
238
239 parsed = _parse_retry_after(future_ms)
240
241 assert parsed is not None
242 assert 0 <= parsed <= 6
243
244
245def test_daemon_run_once_claims_next_job_with_fake_step(tmp_path):
246 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
247 db = AgentDB(tmp_path / "state.db")
248 try:
249 job_id = db.create_job("Run forever in small steps")
250 daemon = Daemon(config=config, db=db)
251
252 result = daemon.run_once(fake=True)
253
254 assert result is not None
255 assert result.job_id == job_id
256 assert result.status == "completed"
257 assert db.list_artifacts(job_id)[0]["title"] == "daemon-fake-step"
258 finally:
259 db.close()
260
261
262def test_daemon_ignores_ui_focus_for_worker_scheduling(tmp_path):
263 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
264 db = AgentDB(tmp_path / "state.db")
265 try:
266 first = db.create_job("First job", title="first")
267 second = db.create_job("Second job", title="second")
268 (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": second}), encoding="utf-8")
269 daemon = Daemon(config=config, db=db)
270
271 job = daemon.next_runnable_job()
272
273 assert first != second
274 assert job is not None
275 assert job["id"] == first
276 finally:
277 db.close()
278
279
280def test_daemon_skips_deferred_jobs_until_due(tmp_path):
281 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
282 db = AgentDB(tmp_path / "state.db")
283 try:
284 deferred = db.create_job("Deferred job", title="deferred")
285 ready = db.create_job("Ready job", title="ready")
286 db.update_job_status(
287 deferred,
288 "queued",
289 metadata_patch={"defer_until": (datetime.now(timezone.utc) + timedelta(hours=1)).isoformat()},
290 )
291 daemon = Daemon(config=config, db=db)
292
293 job = daemon.next_runnable_job()
294
295 assert job is not None
296 assert job["id"] == ready
297 finally:
298 db.close()
299
300
301def test_daemon_quarantines_provider_blocked_jobs(tmp_path):
302 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
303 db = AgentDB(tmp_path / "state.db")
304 try:
305 blocked = db.create_job("Provider blocked job", title="blocked")
306 ready = db.create_job("Ready job", title="ready")
307 db.update_job_status(
308 blocked,
309 "running",
310 metadata_patch={"provider_blocked_at": datetime.now(timezone.utc).isoformat()},
311 )
312 daemon = Daemon(config=config, db=db)
313
314 job = daemon.next_runnable_job()
315
316 assert job is not None
317 assert job["id"] == ready
318 blocked_job = db.get_job(blocked)
319 assert blocked_job["status"] == "paused"
320 assert "provider" in blocked_job["metadata"]["last_note"].lower()
321 events = db.list_events(job_id=blocked, limit=10)
322 assert any(event["event_type"] == "agent_message" and event["metadata"].get("reason") == "llm_provider_blocked" for event in events)
323 finally:
324 db.close()
325
326
327def test_daemon_leaves_provider_blocked_job_paused_until_model_recovers(monkeypatch, tmp_path):
328 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
329 db = AgentDB(tmp_path / "state.db")
330 try:
331 blocked_at = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
332 job_id = db.create_job("Provider blocked job", title="blocked")
333 db.update_job_status(
334 job_id,
335 "paused",
336 metadata_patch={"provider_blocked_at": blocked_at},
337 )
338
339 def fake_doctor(*, config, check_model):
340 assert check_model is True
341 return [Check("model_generation", False, "key limit exceeded")]
342
343 monkeypatch.setattr("nipux_cli.daemon.run_doctor", fake_doctor)
344 daemon = Daemon(config=config, db=db)
345
346 assert daemon.next_runnable_job() is None
347 job = db.get_job(job_id)
348 assert job["status"] == "paused"
349 assert job["metadata"]["provider_last_probe_detail"].startswith("model_generation")
350 assert read_daemon_events(config, limit=1)[0]["event"] == "provider_recovery_wait"
351 finally:
352 db.close()
353
354
355def test_daemon_resumes_provider_blocked_job_when_model_recovers(monkeypatch, tmp_path):
356 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
357 db = AgentDB(tmp_path / "state.db")
358 try:
359 blocked_at = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
360 job_id = db.create_job("Provider blocked job", title="blocked")
361 db.update_job_status(
362 job_id,
363 "paused",
364 metadata_patch={"provider_blocked_at": blocked_at},
365 )
366
367 def fake_doctor(*, config, check_model):
368 assert check_model is True
369 return []
370
371 monkeypatch.setattr("nipux_cli.daemon.run_doctor", fake_doctor)
372 daemon = Daemon(config=config, db=db)
373
374 job = daemon.next_runnable_job()
375 stored = db.get_job(job_id)
376
377 assert job is not None
378 assert job["id"] == job_id
379 assert stored["status"] == "queued"
380 assert stored["metadata"]["provider_unblocked_at"]
381 events = db.list_events(job_id=job_id, limit=10)
382 assert any(event["event_type"] == "agent_message" and event["metadata"].get("reason") == "llm_provider_recovered" for event in events)
383 finally:
384 db.close()
385
386
387def test_daemon_idle_sleep_wakes_for_deferred_job(tmp_path):
388 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
389 db = AgentDB(tmp_path / "state.db")
390 try:
391 now = datetime.now(timezone.utc)
392 job_id = db.create_job("Deferred job", title="deferred")
393 db.update_job_status(
394 job_id,
395 "queued",
396 metadata_patch={"defer_until": (now + timedelta(seconds=2)).isoformat()},
397 )
398 daemon = Daemon(config=config, db=db)
399
400 sleep_seconds = daemon.idle_sleep_seconds(poll_seconds=30, now=now)
401
402 assert 1.9 <= sleep_seconds <= 2.1
403 finally:
404 db.close()
405
406
407def test_daemon_idle_sleep_uses_poll_when_no_deferred_jobs(tmp_path):
408 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
409 db = AgentDB(tmp_path / "state.db")
410 try:
411 db.create_job("Ready job", title="ready")
412 daemon = Daemon(config=config, db=db)
413
414 assert daemon.idle_sleep_seconds(poll_seconds=30) == 30
415 assert daemon.idle_sleep_seconds(poll_seconds=0) == 5.0
416 finally:
417 db.close()
418
419
420def test_daemon_advances_multiple_runnable_jobs_without_focus_starvation(tmp_path):
421 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
422 db = AgentDB(tmp_path / "state.db")
423 try:
424 first = db.create_job("First job", title="first")
425 second = db.create_job("Second job", title="second")
426 (tmp_path / "shell_state.json").write_text(json.dumps({"focus_job_id": second}), encoding="utf-8")
427 daemon = Daemon(config=config, db=db)
428
429 daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=4, fake=True)
430
431 assert db.list_steps(job_id=first)
432 assert db.list_steps(job_id=second)
433 finally:
434 db.close()
435
436
437def test_daemon_writes_due_daily_digest_once(tmp_path):
438 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_time="00:00"))
439 db = AgentDB(tmp_path / "state.db")
440 try:
441 db.create_job("Keep finding findings", title="findings")
442 daemon = Daemon(config=config, db=db)
443 now = datetime(2026, 4, 23, 8, 30)
444
445 first = daemon.send_due_daily_digest(now=now)
446 second = daemon.send_due_daily_digest(now=now)
447
448 assert first is not None
449 assert first["status"] == "dry_run"
450 assert second is None
451 assert (tmp_path / "digests" / "2026-04-23-daily.md").exists()
452 finally:
453 db.close()
454
455
456def test_daemon_event_log_round_trips_jsonl(tmp_path):
457 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
458
459 path = append_daemon_event(config, "step", job_id="job_1", status="completed")
460 events = read_daemon_events(config, limit=3)
461
462 assert path.name == "daemon-events.jsonl"
463 assert events[-1]["event"] == "step"
464 assert events[-1]["job_id"] == "job_1"
465
466
467def test_daemon_recovers_stale_running_steps_on_start(tmp_path):
468 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
469 db = AgentDB(tmp_path / "state.db")
470 try:
471 job_id = db.create_job("Recover stale work", title="stale")
472 run_id = db.start_run(job_id, model="fake")
473 stale_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_navigate")
474 daemon = Daemon(config=config, db=db)
475
476 daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=1, fake=True)
477
478 steps = db.list_steps(job_id=job_id)
479 stale = next(step for step in steps if step["id"] == stale_step)
480 events = read_daemon_events(config, limit=5)
481 assert stale["status"] == "failed"
482 assert stale["error"] == "daemon recovered abandoned running work from a previous process"
483 assert db.list_runs(job_id, limit=10)[-1]["status"] == "failed"
484 assert any(event.get("event") == "stale_work_recovered" for event in events)
485 finally:
486 db.close()
487
488
489def test_daemon_survives_unexpected_step_exception(tmp_path):
490 class ExplodingDaemon(Daemon):
491 def run_once(self, *, fake: bool = False, verbose: bool = False): # noqa: ARG002
492 raise RuntimeError("provider fell over")
493
494 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
495 db = AgentDB(tmp_path / "state.db")
496 try:
497 daemon = ExplodingDaemon(config=config, db=db)
498
499 daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=1)
500
501 status = daemon_lock_status(tmp_path / "agentd.lock")
502 events = read_daemon_events(config, limit=5)
503 assert status["metadata"]["last_state"] == "error"
504 assert status["metadata"]["consecutive_failures"] == 1
505 assert any(event.get("event") == "daemon_error" for event in events)
506 finally:
507 db.close()
508
509
510def test_daemon_treats_blocked_steps_as_recoverable(tmp_path):
511 class BlockedDaemon(Daemon):
512 def run_once(self, *, fake: bool = False, verbose: bool = False): # noqa: ARG002
513 return StepExecution(
514 job_id="job",
515 run_id="run",
516 step_id="step",
517 tool_name="web_search",
518 status="blocked",
519 result={"error": "search loop blocked", "recoverable": True},
520 )
521
522 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
523 db = AgentDB(tmp_path / "state.db")
524 try:
525 daemon = BlockedDaemon(config=config, db=db)
526
527 daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=3)
528
529 status = daemon_lock_status(tmp_path / "agentd.lock")
530 events = read_daemon_events(config, limit=10)
531 assert status["metadata"]["consecutive_failures"] == 0
532 assert sum(1 for event in events if event.get("event") == "step") == 3
533 assert not any(event.get("event") == "daemon_error" for event in events)
534 finally:
535 db.close()
536
537
538def test_fake_daemon_can_run_100_iterations_without_auto_stop(tmp_path):
539 config = AppConfig(runtime=RuntimeConfig(home=tmp_path, daily_digest_enabled=False))
540 db = AgentDB(tmp_path / "state.db")
541 try:
542 job_id = db.create_job("Run a long fake worker", title="long")
543 daemon = Daemon(config=config, db=db)
544
545 daemon.run_forever(poll_seconds=0, quiet=True, max_iterations=100, fake=True)
546
547 steps = db.list_steps(job_id=job_id)
548 assert len(steps) == 100
549 assert any(step["kind"] == "reflection" for step in steps)
550 assert db.list_artifacts(job_id)
551 memory = db.list_memory(job_id)
552 assert memory
553 assert memory[0]["key"] == "rolling_state"
554 assert "Recent steps:" in memory[0]["summary"]
555 assert db.get_job(job_id)["status"] in {"queued", "running"}
556 step_events = [event for event in read_daemon_events(config, limit=120) if event.get("event") == "step"]
557 assert len(step_events) == 100
558 assert daemon_lock_status(tmp_path / "agentd.lock")["running"] is False
559 finally:
560 db.close()
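
The timing tests in this file pin down two waiting rules: a `Retry-After` value may be a plain delay or an epoch timestamp in milliseconds, and the idle sleep is shortened when a deferred job becomes due before the next poll. The sketch below reproduces only the behaviour those tests observe. The function names, the magnitude thresholds used to tell delays from timestamps, and the `fallback` parameter are assumptions made for illustration, not the code in `nipux_cli.daemon`. The HTTP-date form of `Retry-After` is not handled.

```python
from __future__ import annotations

import time


def parse_retry_after_sketch(value: str, *, now: float | None = None) -> float | None:
    """Interpret a Retry-After style value as seconds to wait.

    Assumed heuristics:
    - values above 1e11 are treated as epoch milliseconds,
    - values above 1e9 are treated as epoch seconds,
    - anything smaller is a plain delay in seconds.
    Returns None when the value is not numeric.
    """
    now = time.time() if now is None else now
    try:
        number = float(value.strip())
    except ValueError:
        return None
    if number > 1e11:
        return max(0.0, number / 1000.0 - now)
    if number > 1e9:
        return max(0.0, number - now)
    return max(0.0, number)


def idle_sleep_sketch(
    poll_seconds: float,
    seconds_until_next_deferred: float | None = None,
    *,
    fallback: float = 5.0,
) -> float:
    """Pick how long an idle loop should sleep.

    A non-positive poll interval falls back to `fallback` seconds so the loop
    does not spin; a deferred job that is due sooner shortens the sleep.
    """
    base = poll_seconds if poll_seconds > 0 else fallback
    if seconds_until_next_deferred is None:
        return base
    return max(0.0, min(base, seconds_until_next_deferred))
```

Passing `now` explicitly keeps the timestamp branch deterministic in tests, which is why the sketch accepts it as a keyword argument instead of always calling `time.time()`.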
tests/nipux_cli/test_dashboard.py 86 lines
1from datetime import datetime, timedelta, timezone
2
3from nipux_cli.artifacts import ArtifactStore
4from nipux_cli.config import AppConfig, RuntimeConfig
5from nipux_cli.dashboard import collect_dashboard_state, render_dashboard, render_overview
6from nipux_cli.db import AgentDB
7
8
9def test_dashboard_collects_jobs_steps_and_artifacts(tmp_path):
10 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
11 db = AgentDB(tmp_path / "state.db")
12 try:
13 job_id = db.create_job("Research topic every morning", title="research", kind="generic")
14 run_id = db.start_run(job_id, model="fake-model")
15 step_id = db.add_step(
16 job_id=job_id,
17 run_id=run_id,
18 kind="tool",
19 tool_name="write_artifact",
20 input_data={"arguments": {"title": "Findings"}},
21 )
22 ArtifactStore(tmp_path, db=db).write_text(
23 job_id=job_id,
24 run_id=run_id,
25 step_id=step_id,
26 title="Findings",
27 summary="first saved finding",
28 content="Acme Corp",
29 )
30 db.finish_step(step_id, status="completed", summary="saved finding", output_data={"success": True})
31 db.finish_run(run_id, "completed")
32 db.append_lesson(job_id, "Low-evidence summaries are not finding batches.", category="source_quality")
33 db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
34
35 state = collect_dashboard_state(db, config, job_id=job_id)
36 rendered = render_dashboard(state, width=100)
37 overview = render_overview(state, width=100)
38
39 assert state["daemon"]["running"] is False
40 assert state["focus"]["counts"]["artifacts"] == 1
41 assert state["focus"]["counts"]["tasks"] == 1
42 assert "Nipux CLI Dashboard" in rendered
43 assert "research" in rendered
44 assert "write_artifact" in rendered
45 assert "Findings" in rendered
46 assert "Low-evidence summaries are not finding batches" in rendered
47 assert "Explore primary sources" in rendered
48 assert "Nipux Status" in overview
49 assert "latest artifact: Findings" in overview
50 assert "latest lesson:" in overview
51 finally:
52 db.close()
53
54
55def test_overview_marks_idle_daemon_as_ready_for_work(tmp_path):
56 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
57 db = AgentDB(tmp_path / "state.db")
58 try:
59 db.create_job("Research topic", title="research")
60 state = collect_dashboard_state(db, config)
61 overview = render_overview(state, width=100)
62
63 assert "ready when work starts" in overview
64 finally:
65 db.close()
66
67
68def test_overview_marks_old_heartbeat_as_busy_for_running_step(tmp_path):
69 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
70 db = AgentDB(tmp_path / "state.db")
71 try:
72 job_id = db.create_job("Measure a process", title="measure")
73 run_id = db.start_run(job_id, model="fake-model")
74 db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", status="running")
75 state = collect_dashboard_state(db, config, job_id=job_id)
76 state["daemon"]["running"] = True
77 state["daemon"]["metadata"] = {
78 "last_heartbeat": (datetime.now(timezone.utc) - timedelta(seconds=180)).isoformat(),
79 }
80
81 overview = render_overview(state, width=100)
82
83 assert "busy #1 shell_exec" in overview
84 assert "heartbeat 180s ago (stale)" not in overview
85 finally:
86 db.close()
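
`test_overview_marks_old_heartbeat_as_busy_for_running_step` above checks that an old heartbeat is not reported as stale while a step is still running. One way to express that precedence is sketched below. `heartbeat_label`, its parameters, and the 120-second threshold are hypothetical and chosen for illustration; the real rendering lives in `nipux_cli.dashboard` and may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

STALE_AFTER_SECONDS = 120  # assumed threshold, not taken from nipux_cli


def heartbeat_label(
    last_heartbeat: str,
    *,
    running_tool: str | None = None,
    step_no: int | None = None,
    now: datetime | None = None,
) -> str:
    """Describe daemon liveness from an ISO-8601 heartbeat timestamp.

    A running step takes precedence over heartbeat age, because a long tool
    call can legitimately delay the next heartbeat.
    """
    now = now or datetime.now(timezone.utc)
    age = int((now - datetime.fromisoformat(last_heartbeat)).total_seconds())
    if running_tool is not None:
        return f"busy #{step_no} {running_tool}"
    if age > STALE_AFTER_SECONDS:
        return f"heartbeat {age}s ago (stale)"
    return f"heartbeat {age}s ago"
```

Checking for a running step before the age comparison is the design choice that matters here: swapping the two branches would label a busy daemon as stale.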
tests/nipux_cli/test_db.py 609 lines
1from nipux_cli.db import AgentDB
2
3
4def test_db_job_run_step_and_artifact_roundtrip(tmp_path):
5 db = AgentDB(tmp_path / "state.db")
6 try:
7 job_id = db.create_job("Research topic every day", title="research", kind="generic")
8 assert job_id == "research"
9 job = db.get_job(job_id)
10 assert job["status"] == "queued"
11 assert job["kind"] == "generic"
12
13 run_id = db.start_run(job_id, model="local-test-model")
14 step_id = db.add_step(
15 job_id=job_id,
16 run_id=run_id,
17 kind="tool",
18 tool_name="write_artifact",
19 input_data={"x": 1},
20 )
21 db.finish_step(step_id, status="completed", summary="wrote artifact", output_data={"ok": True})
22 artifact_id = db.add_artifact(
23 job_id=job_id,
24 run_id=run_id,
25 step_id=step_id,
26 path=tmp_path / "artifact.md",
27 sha256="abc",
28 artifact_type="text",
29 title="A",
30 )
31 db.finish_run(run_id, "completed")
32
33 assert db.get_job(job_id)["status"] == "running"
34 assert db.list_steps(run_id=run_id)[0]["output"]["ok"] is True
35 assert db.list_runs(job_id)[0]["id"] == run_id
36 assert db.get_artifact(artifact_id)["title"] == "A"
37 assert db.list_artifacts(job_id)[0]["id"] == artifact_id
38 finally:
39 db.close()
40
41
42def test_create_job_uses_unique_readable_slug_ids(tmp_path):
43 db = AgentDB(tmp_path / "state.db")
44 try:
45 first = db.create_job("Research topic", title="Nightly Research")
46 second = db.create_job("Research more topics", title="Nightly Research")
47
48 assert first == "nightly-research"
49 assert second == "nightly-research-2"
50 finally:
51 db.close()
52
53
54def test_step_numbers_increment_across_runs_for_a_job(tmp_path):
55 db = AgentDB(tmp_path / "state.db")
56 try:
57 job_id = db.create_job("Long job")
58 run_1 = db.start_run(job_id, model="fake")
59 step_1 = db.add_step(job_id=job_id, run_id=run_1, kind="tool")
60 db.finish_step(step_1, status="completed")
61 db.finish_run(run_1, "completed")
62
63 run_2 = db.start_run(job_id, model="fake")
64 step_2 = db.add_step(job_id=job_id, run_id=run_2, kind="tool")
65
66 steps = db.list_steps(job_id=job_id)
67 assert step_2 != step_1
68 assert [step["step_no"] for step in steps] == [1, 2]
69 finally:
70 db.close()
71
72
73def test_job_token_usage_aggregates_message_usage(tmp_path):
74 db = AgentDB(tmp_path / "state.db")
75 try:
76 job_id = db.create_job("Long job")
77 db.append_event(
78 job_id,
79 event_type="loop",
80 title="message_end",
81 metadata={
82 "usage": {
83 "prompt_tokens": 100,
84 "completion_tokens": 25,
85 "total_tokens": 125,
86 "cost": 0.001,
87 "prompt_tokens_details": {"cached_tokens": 10},
88 "completion_tokens_details": {"reasoning_tokens": 3},
89 }
90 },
91 )
92 db.append_event(
93 job_id,
94 event_type="loop",
95 title="message_end",
96 metadata={
97 "usage": {
98 "prompt_tokens": 150,
99 "completion_tokens": 50,
100 "total_tokens": 200,
101 "estimated": True,
102 "context_length": 1000,
103 "context_fraction": 0.15,
104 }
105 },
106 )
107
108 usage = db.job_token_usage(job_id)
109
110 assert usage["prompt_tokens"] == 250
111 assert usage["completion_tokens"] == 75
112 assert usage["total_tokens"] == 325
113 assert usage["latest_prompt_tokens"] == 150
114 assert usage["cost"] == 0.001
115 assert usage["has_cost"] is True
116 assert usage["estimated_calls"] == 1
117 assert usage["reasoning_tokens"] == 3
118 assert usage["cached_tokens"] == 10
119 assert usage["latest_context_length"] == 1000
120 assert usage["latest_context_fraction"] == 0.15
121 finally:
122 db.close()
123
124
125def test_append_operator_message_roundtrip(tmp_path):
126 db = AgentDB(tmp_path / "state.db")
127 try:
128 job_id = db.create_job("Research topic")
129
130 entry = db.append_operator_message(job_id, "Focus on artifact-backed findings", source="shell")
131 job = db.get_job(job_id)
132
133 assert entry["message"] == "Focus on artifact-backed findings"
134 assert job["metadata"]["operator_messages"][0]["source"] == "shell"
135 assert job["metadata"]["operator_messages"][0]["mode"] == "steer"
136 assert job["metadata"]["operator_messages"][0]["message"] == "Focus on artifact-backed findings"
137 assert job["metadata"]["last_operator_message"]["message"] == "Focus on artifact-backed findings"
138 events = db.list_timeline_events(job_id)
139 assert events[-1]["event_type"] == "operator_message"
140 assert events[-1]["body"] == "Focus on artifact-backed findings"
141 finally:
142 db.close()
143
144
145def test_claim_operator_messages_marks_one_message_at_a_time(tmp_path):
146 db = AgentDB(tmp_path / "state.db")
147 try:
148 job_id = db.create_job("Research topic")
149 first = db.append_operator_message(job_id, "first steer", source="chat")
150 db.append_operator_message(job_id, "second steer", source="chat")
151
152 claimed = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
153 second_claim = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
154
155 job = db.get_job(job_id)
156 messages = job["metadata"]["operator_messages"]
157 events = db.list_timeline_events(job_id, limit=20)
158
159 assert [item["message"] for item in claimed] == ["first steer"]
160 assert claimed[0]["event_id"] == first["event_id"]
161 assert [item["message"] for item in second_claim] == ["second steer"]
162 assert all(message.get("claimed_at") for message in messages)
163 assert any(event["event_type"] == "loop" and event["title"] == "steering claimed" for event in events)
164 finally:
165 db.close()
166
167
168def test_acknowledge_operator_messages_marks_delivered_context(tmp_path):
169 db = AgentDB(tmp_path / "state.db")
170 try:
171 job_id = db.create_job("Research topic")
172 entry = db.append_operator_message(job_id, "correct the target before continuing", source="chat")
173 db.claim_operator_messages(job_id, modes=("steer",), limit=1)
174
175 result = db.acknowledge_operator_messages(
176 job_id,
177 message_ids=[entry["event_id"]],
178 summary="target correction incorporated",
179 )
180
181 job = db.get_job(job_id)
182 message = job["metadata"]["operator_messages"][0]
183 events = db.list_timeline_events(job_id, limit=20)
184
185 assert result["count"] == 1
186 assert message["acknowledged_at"]
187 assert job["metadata"]["last_operator_context_ack"]["summary"] == "target correction incorporated"
188 assert any(event["event_type"] == "operator_context" for event in events)
189 finally:
190 db.close()
191
192
193def test_rename_job_updates_title_without_changing_id(tmp_path):
194 db = AgentDB(tmp_path / "state.db")
195 try:
196 job_id = db.create_job("Research topic", title="old title")
197
198 renamed = db.rename_job(job_id, "new title")
199 job = db.get_job(job_id)
200
201 assert renamed["id"] == job_id
202 assert renamed["title"] == "new title"
203 assert job["title"] == "new title"
204 finally:
205 db.close()
206
207
208def test_delete_job_removes_related_rows(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="delete me")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
        artifact_path = tmp_path / "artifact.md"
        artifact_path.write_text("artifact", encoding="utf-8")
        db.add_artifact(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            path=artifact_path,
            sha256="abc",
            artifact_type="text",
            title="Artifact",
        )
        db.upsert_memory(job_id=job_id, key="rolling_state", summary="summary")

        result = db.delete_job(job_id)

        assert result["job"]["title"] == "delete me"
        assert result["counts"]["runs"] == 1
        assert result["counts"]["steps"] == 1
        assert result["counts"]["artifacts"] == 1
        assert result["counts"]["memory"] == 1
        try:
            db.get_job(job_id)
        except KeyError:
            pass
        else:
            raise AssertionError("job still exists after delete")
        assert db.list_steps(job_id=job_id) == []
        assert db.list_artifacts(job_id) == []
        assert db.list_memory(job_id) == []
    finally:
        db.close()


def test_append_lesson_roundtrip(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        entry = db.append_lesson(
            job_id,
            "Low-evidence pages are not useful evidence sources.",
            category="source_quality",
            confidence=0.9,
        )
        job = db.get_job(job_id)

        assert entry["category"] == "source_quality"
        assert job["metadata"]["lessons"][0]["lesson"] == "Low-evidence pages are not useful evidence sources."
        assert job["metadata"]["last_lesson"]["confidence"] == 0.9
    finally:
        db.close()


def test_append_lesson_dedupes_repeated_memory(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        first = db.append_lesson(job_id, "Use primary source indexes.", category="strategy", metadata={"step": 1})
        second = db.append_lesson(job_id, "Use primary source indexes.", category="strategy", metadata={"step": 2})
        job = db.get_job(job_id)

        assert first["lesson"] == second["lesson"]
        assert len(job["metadata"]["lessons"]) == 1
        assert job["metadata"]["lessons"][0]["seen_count"] == 2
        assert job["metadata"]["lessons"][0]["metadata"]["step"] == 2
        assert first["created"] is True
        assert second["created"] is False
        assert second["substantive_update"] is False
        assert len(db.list_events(job_id=job_id, event_types=["lesson"])) == 1
    finally:
        db.close()


def test_source_and_finding_ledgers_dedupe_and_update(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        source = db.append_source_record(
            job_id,
            "https://example.com/source",
            source_type="web_source",
            usefulness_score=0.7,
            yield_count=3,
            outcome="yielded reusable findings",
        )
        updated_source = db.append_source_record(
            job_id,
            "https://example.com/source",
            usefulness_score=0.9,
            yield_count=2,
            fail_count_delta=1,
        )
        finding = db.append_finding_record(
            job_id,
            name="Acme Finding",
            url="https://acme.example",
            category="example category",
            score=0.8,
        )
        updated_finding = db.append_finding_record(
            job_id,
            name="Acme Finding",
            url="https://acme.example",
            contact="source note",
            score=0.85,
        )
        reflection = db.append_reflection(job_id, "Keep using source indexes", strategy="Prioritize primary records")
        job = db.get_job(job_id)

        assert source["key"] == updated_source["key"]
        assert source["created"] is True
        assert updated_source["created"] is False
        assert updated_source["yield_count"] == 5
        assert updated_source["fail_count"] == 1
        assert finding["created"] is True
        assert updated_finding["created"] is False
        assert updated_finding["contact"] == "source note"
        assert len(job["metadata"]["source_ledger"]) == 1
        assert len(job["metadata"]["finding_ledger"]) == 1
        assert job["metadata"]["last_reflection"]["summary"] == reflection["summary"]
    finally:
        db.close()


def test_repeated_source_and_finding_records_mark_non_substantive_touches(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        db.append_source_record(
            job_id,
            "https://example.com/source",
            source_type="web_source",
            usefulness_score=0.7,
            outcome="yielded reusable findings",
        )
        repeated_source = db.append_source_record(
            job_id,
            "https://example.com/source",
            source_type="web_source",
            usefulness_score=0.7,
            outcome="yielded reusable findings",
        )
        changed_source = db.append_source_record(
            job_id,
            "https://example.com/source",
            source_type="web_source",
            usefulness_score=0.8,
            outcome="yielded reusable findings",
        )

        db.append_finding_record(job_id, name="Acme Finding", url="https://acme.example", score=0.8)
        repeated_finding = db.append_finding_record(job_id, name="Acme Finding", url="https://acme.example", score=0.8)
        changed_finding = db.append_finding_record(
            job_id,
            name="Acme Finding",
            url="https://acme.example",
            score=0.9,
        )

        assert repeated_source["created"] is False
        assert repeated_source["substantive_update"] is False
        assert changed_source["substantive_update"] is True
        assert repeated_finding["created"] is False
        assert repeated_finding["substantive_update"] is False
        assert changed_finding["substantive_update"] is True
    finally:
        db.close()


def test_task_queue_dedupes_and_updates(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        first = db.append_task_record(
            job_id,
            title="Explore primary sources",
            status="open",
            priority=3,
            goal="Find direct evidence",
        )
        second = db.append_task_record(
            job_id,
            title="Explore primary sources",
            status="done",
            priority=5,
            result="Saved source artifact",
        )
        job = db.get_job(job_id)

        assert first["created"] is True
        assert second["created"] is False
        assert len(job["metadata"]["task_queue"]) == 1
        assert job["metadata"]["task_queue"][0]["status"] == "done"
        assert job["metadata"]["task_queue"][0]["priority"] == 5
        assert job["metadata"]["task_queue"][0]["result"] == "Saved source artifact"
    finally:
        db.close()


def test_repeated_task_and_experiment_records_mark_non_substantive_touches(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
        repeated_task = db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
        changed_task = db.append_task_record(job_id, title="Explore primary sources", status="done", priority=3)

        db.append_experiment_record(
            job_id,
            title="Trial",
            status="measured",
            metric_name="score",
            metric_value=0.8,
        )
        repeated_experiment = db.append_experiment_record(
            job_id,
            title="Trial",
            status="measured",
            metric_name="score",
            metric_value=0.8,
        )
        changed_experiment = db.append_experiment_record(
            job_id,
            title="Trial",
            status="measured",
            metric_name="score",
            metric_value=0.9,
        )

        assert repeated_task["created"] is False
        assert repeated_task["substantive_update"] is False
        assert changed_task["substantive_update"] is True
        assert repeated_experiment["created"] is False
        assert repeated_experiment["substantive_update"] is False
        assert changed_experiment["substantive_update"] is True
    finally:
        db.close()


def test_non_substantive_ledger_touches_do_not_emit_visible_events(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")

        db.append_source_record(job_id, "https://example.com", outcome="useful")
        db.append_source_record(job_id, "https://example.com", outcome="useful")
        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=3)
        db.append_roadmap_record(
            job_id,
            title="Roadmap",
            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
        )
        db.append_roadmap_record(
            job_id,
            title="Roadmap",
            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
        )
        db.append_experiment_record(
            job_id,
            title="Trial",
            status="measured",
            metric_name="score",
            metric_value=0.8,
        )
        db.append_experiment_record(
            job_id,
            title="Trial",
            status="measured",
            metric_name="score",
            metric_value=0.8,
        )

        events = db.list_timeline_events(job_id, limit=50)
        counts = {event_type: sum(1 for event in events if event["event_type"] == event_type) for event_type in {
            "source",
            "finding",
            "task",
            "roadmap",
            "experiment",
        }}

        assert counts == {
            "source": 1,
            "finding": 1,
            "task": 1,
            "roadmap": 1,
            "experiment": 1,
        }
    finally:
        db.close()


def test_roadmap_last_records_include_progress_accounting_metadata(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")
        db.append_roadmap_record(
            job_id,
            title="Roadmap",
            milestones=[{"title": "Foundation", "status": "active", "priority": 5}],
        )
        db.append_roadmap_record(
            job_id,
            title="Roadmap",
            milestones=[{"title": "Foundation", "status": "validating", "priority": 6}],
        )
        db.append_milestone_validation_record(
            job_id,
            milestone="Foundation",
            validation_status="passed",
            result="Evidence satisfies acceptance criteria.",
        )
        metadata = db.get_job(job_id)["metadata"]
        roadmap_record = metadata["last_roadmap_record"]
        validation_record = metadata["last_milestone_validation"]

        assert roadmap_record["created"] is False
        assert roadmap_record["updated_at"]
        assert roadmap_record["added_milestones"] == 0
        assert roadmap_record["updated_milestones"] == 1
        assert validation_record["validated_at"]
        assert validation_record["validation_status"] == "passed"
    finally:
        db.close()


def test_repeated_roadmap_records_do_not_create_fake_milestone_updates(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic")
        milestone = {"title": "Foundation", "status": "active", "priority": 5}
        db.append_roadmap_record(job_id, title="Roadmap", milestones=[milestone])
        repeated = db.append_roadmap_record(job_id, title="Roadmap", milestones=[milestone])
        metadata = db.get_job(job_id)["metadata"]
        roadmap_record = metadata["last_roadmap_record"]

        assert repeated["created"] is False
        assert repeated["substantive_update"] is False
        assert roadmap_record["updated_milestones"] == 0
        assert roadmap_record["updated_features"] == 0
        assert roadmap_record["roadmap_updated"] is False
    finally:
        db.close()


def test_timeline_events_cover_visible_activity(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="research")
        db.append_operator_message(job_id, "operator note", source="test")
        db.append_agent_update(job_id, "agent note", category="chat")
        db.append_lesson(job_id, "durable lesson", category="strategy")
        db.append_source_record(job_id, "https://example.com", usefulness_score=0.7, outcome="useful")
        db.append_finding_record(job_id, name="Reusable finding", reason="evidence")
        db.append_task_record(job_id, title="Explore branch", status="open")
        db.append_reflection(job_id, "reflect summary", strategy="next strategy")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search", input_data={"query": "x"})
        db.finish_step(step_id, status="completed", summary="searched", output_data={"ok": True})
        db.add_artifact(
            job_id=job_id,
            run_id=run_id,
            step_id=step_id,
            path=tmp_path / "artifact.md",
            sha256="abc",
            artifact_type="text",
            title="Artifact",
            summary="saved",
        )
        db.upsert_memory(job_id=job_id, key="rolling_state", summary="compact state")

        events = db.list_timeline_events(job_id, limit=50)
        event_types = {event["event_type"] for event in events}

        assert "operator_message" in event_types
        assert "agent_message" in event_types
        assert "lesson" in event_types
        assert "source" in event_types
        assert "finding" in event_types
        assert "task" in event_types
        assert "reflection" in event_types
        assert "tool_call" in event_types
        assert "tool_result" in event_types
        assert "artifact" in event_types
        assert "compaction" in event_types
        assert any(event["body"] == "operator note" for event in events)
    finally:
        db.close()
tests/nipux_cli/test_digest.py 43 lines
from nipux_cli.config import AppConfig, RuntimeConfig
from nipux_cli.db import AgentDB
from nipux_cli.digest import render_daily_digest, render_job_digest, write_daily_digest


def test_daily_digest_includes_ledgers_lessons_sources_and_strategy(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research topic", title="research", kind="generic")
        db.append_finding_record(job_id, name="Acme Finding", category="example category", reason="reusable result", score=0.8)
        db.append_task_record(job_id, title="Explore primary sources", status="open", priority=5)
        db.append_source_record(job_id, "https://example.com", usefulness_score=0.9, yield_count=1, outcome="yielded findings")
        db.append_lesson(job_id, "Low-evidence pages are not finding sources.", category="source_quality")
        db.append_reflection(job_id, "Primary source map is working.", strategy="Try archival sources next.")
        db.start_run(job_id, model="test-model")
        db.append_event(
            job_id,
            event_type="loop",
            title="message_end",
            metadata={"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500, "cost": 0.0025}},
        )

        body = render_daily_digest(db)
        job_body = render_job_digest(db, job_id)
        result = write_daily_digest(config, db, day="2026-04-25")

        assert "Counts: 1 findings, 1 sources, 1 tasks, 0 experiments, 1 lessons" in body
        assert "Model usage:" in body
        assert "1.5K tokens" in body
        assert "cost=$0.0025" in body
        assert "## Model Usage" in job_body
        assert "test-model: 1 calls" in job_body
        assert "Experiments:" in body
        assert "Acme Finding" in body
        assert "Explore primary sources" in body
        assert "Low-evidence pages are not finding sources." in body
        assert "https://example.com" in body
        assert "Try archival sources next." in body
        assert result["status"] == "dry_run"
        assert (tmp_path / "digests" / "2026-04-25-daily.md").exists()
    finally:
        db.close()
tests/nipux_cli/test_doctor.py 157 lines
import io
import json
import urllib.error

from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig
from nipux_cli.doctor import run_doctor


class FakeHTTPResponse:
    def __init__(self, payload: dict):
        self.payload = payload

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def read(self, _limit=-1):
        return json.dumps(self.payload).encode("utf-8")


def test_doctor_checks_local_runtime_without_model_call(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))

    checks = run_doctor(config=config, check_model=False)

    assert {check.name for check in checks} == {"state_dir_writable", "sqlite", "model_config", "tool_surface", "browser_runtime"}
    assert all(check.ok for check in checks)


def test_doctor_warns_when_remote_model_key_is_missing(tmp_path, monkeypatch):
    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(
            model="provider/model",
            base_url="https://openrouter.ai/api/v1",
            api_key_env="OPENROUTER_API_KEY",
        ),
    )

    checks = run_doctor(config=config, check_model=False)
    model_check = next(check for check in checks if check.name == "model_config")

    assert not model_check.ok
    assert "OPENROUTER_API_KEY is not set" in model_check.detail
    assert "sk-" not in model_check.detail


def test_doctor_reports_openrouter_auth_failure(tmp_path, monkeypatch):
    monkeypatch.setenv("TEST_OPENROUTER_KEY", "bad-key")
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(
            model="nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
            base_url="https://openrouter.ai/api/v1",
            api_key_env="TEST_OPENROUTER_KEY",
        ),
    )

    def fake_urlopen(_request, timeout):
        raise urllib.error.HTTPError(
            "https://openrouter.ai/api/v1/key",
            401,
            "Unauthorized",
            hdrs=None,
            fp=None,
        )

    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)

    checks = run_doctor(config=config, check_model=True)
    model_check = checks[-1]

    assert model_check.name == "model_auth"
    assert model_check.ok is False
    assert "OpenRouter rejected API key" in model_check.detail


def test_doctor_reports_generation_limit_after_model_listing(tmp_path, monkeypatch):
    monkeypatch.setenv("TEST_OPENROUTER_KEY", "limited-key")
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(
            model="provider/test-model",
            base_url="https://openrouter.ai/api/v1",
            api_key_env="TEST_OPENROUTER_KEY",
        ),
    )

    def fake_urlopen(request, timeout):
        url = request.full_url
        if url.endswith("/key"):
            return FakeHTTPResponse({})
        if url.endswith("/models"):
            return FakeHTTPResponse({"data": [{"id": "provider/test-model"}]})
        if url.endswith("/chat/completions"):
            body = b'{"error":{"message":"Key limit exceeded (total limit).","code":403}}'
            raise urllib.error.HTTPError(url, 403, "Forbidden", hdrs=None, fp=io.BytesIO(body))
        raise AssertionError(url)

    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)

    checks = run_doctor(config=config, check_model=True)
    model_check = checks[-1]

    assert model_check.name == "model_generation"
    assert model_check.ok is False
    assert "Key limit exceeded" in model_check.detail


def test_doctor_reports_nested_provider_generation_error(tmp_path, monkeypatch):
    monkeypatch.setenv("TEST_OPENROUTER_KEY", "limited-key")
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(
            model="provider/test-model",
            base_url="https://openrouter.ai/api/v1",
            api_key_env="TEST_OPENROUTER_KEY",
        ),
    )

    def fake_urlopen(request, timeout):
        url = request.full_url
        if url.endswith("/key"):
            return FakeHTTPResponse({})
        if url.endswith("/models"):
            return FakeHTTPResponse({"data": [{"id": "provider/test-model"}]})
        if url.endswith("/chat/completions"):
            body = json.dumps(
                {
                    "error": {
                        "message": "Provider returned error",
                        "code": 429,
                        "metadata": {
                            "raw": "provider/test-model is temporarily rate-limited upstream.",
                            "provider_name": "ExampleProvider",
                            "is_byok": False,
                        },
                    }
                }
            ).encode("utf-8")
            raise urllib.error.HTTPError(url, 429, "Too Many Requests", hdrs=None, fp=io.BytesIO(body))
        raise AssertionError(url)

    monkeypatch.setattr("urllib.request.urlopen", fake_urlopen)

    checks = run_doctor(config=config, check_model=True)
    model_check = checks[-1]

    assert model_check.name == "model_generation"
    assert model_check.ok is False
    assert "Provider returned error" in model_check.detail
    assert "temporarily rate-limited upstream" in model_check.detail
    assert "provider=ExampleProvider" in model_check.detail
    assert "byok=False" in model_check.detail
tests/nipux_cli/test_generic_runtime_audit.py 35 lines
from pathlib import Path


FORBIDDEN_RUNTIME_LITERALS = {
    "192.168",
    "9060 xt",
    "canadian",
    "client finder",
    "lead batch",
    "lead ledger",
    "client prospect",
    "edmonton",
    "home ssh",
    "home-ssh.local",
    "home@",
    "huggingface.co/qwen",
    "livebusiness",
    "qwen_qwen",
    "ssh home",
    "treefrog",
    "yelp",
    "llama.cpp",
}


def test_runtime_code_has_no_task_specific_literals():
    root = Path(__file__).resolve().parents[2] / "nipux_cli"
    haystack = "\n".join(
        path.read_text(encoding="utf-8", errors="replace").lower()
        for path in sorted(root.glob("*.py"))
        if path.name != "__init__.py"
    )

    for literal in FORBIDDEN_RUNTIME_LITERALS:
        assert literal not in haystack
tests/nipux_cli/test_live_memory_graph_smoke.py 37 lines
import importlib.util
import sys
from pathlib import Path

from nipux_cli.memory_graph import memory_graph_for_prompt


def _load_live_smoke():
    path = Path(__file__).resolve().parents[2] / "scripts" / "live_memory_graph_smoke.py"
    spec = importlib.util.spec_from_file_location("live_memory_graph_smoke", path)
    assert spec is not None
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module


def test_live_memory_graph_smoke_fails_cleanly_without_key(monkeypatch, capsys):
    smoke = _load_live_smoke()
    monkeypatch.delenv("NIPUX_LIVE_TEST_KEY", raising=False)
    monkeypatch.setattr(sys, "argv", ["live_memory_graph_smoke.py", "--api-key-env", "NIPUX_LIVE_TEST_KEY", "--json"])

    assert smoke.main() == 1

    out = capsys.readouterr().out
    assert '"success": false' in out
    assert "NIPUX_LIVE_TEST_KEY is not set" in out
    assert "secret" not in out.lower()


def test_live_memory_graph_smoke_seed_pushes_generic_consolidation():
    smoke = _load_live_smoke()
    prompt = memory_graph_for_prompt({"metadata": smoke._seed_metadata()})

    assert "No memory graph yet" in prompt
    assert "Durable ledgers already contain" in prompt
tests/nipux_cli/test_llm.py 151 lines
from types import SimpleNamespace

from nipux_cli.config import ModelConfig
from nipux_cli.llm import OpenAIChatLLM, _enrich_openrouter_generation_usage


class _FakeCompletions:
    def __init__(self):
        self.kwargs = None
        self.calls = []

    def create(self, **kwargs):
        self.kwargs = kwargs
        self.calls.append(kwargs)
        usage = SimpleNamespace(prompt_tokens=11, completion_tokens=7, total_tokens=18, cost=0.00042)
        message = SimpleNamespace(content="ok", tool_calls=[])
        choice = SimpleNamespace(message=message)
        return SimpleNamespace(id="gen_test", model="provider/model", choices=[choice], usage=usage)


def test_chat_llm_requires_tool_choice_for_worker_actions(monkeypatch):
    fake_completions = _FakeCompletions()
    monkeypatch.setenv("TEST_API_KEY", "test")

    class FakeOpenAI:
        def __init__(self, **kwargs):
            pass

        chat = SimpleNamespace(completions=fake_completions)

    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)

    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
    response = llm.next_action(messages=[{"role": "user", "content": "hi"}], tools=[{"type": "function", "function": {"name": "noop"}}])

    assert response.content == "ok"
    assert response.usage["prompt_tokens"] == 11
    assert response.usage["completion_tokens"] == 7
    assert response.usage["cost"] == 0.00042
    assert response.model == "provider/model"
    assert response.response_id == "gen_test"
    assert fake_completions.kwargs["tools"]
    assert fake_completions.kwargs["tool_choice"] == "required"


def test_chat_llm_retries_without_tool_choice_when_provider_rejects_it(monkeypatch):
    monkeypatch.setenv("TEST_API_KEY", "test")

    class RejectingCompletions(_FakeCompletions):
        def create(self, **kwargs):
            self.calls.append(kwargs)
            if kwargs.get("tool_choice") == "required":
                raise RuntimeError("unsupported parameter: tool_choice")
            return super().create(**kwargs)

    fake_completions = RejectingCompletions()

    class FakeOpenAI:
        def __init__(self, **kwargs):
            pass

        chat = SimpleNamespace(completions=fake_completions)

    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)

    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
    response = llm.next_action(messages=[{"role": "user", "content": "hi"}], tools=[{"type": "function", "function": {"name": "noop"}}])

    assert response.content == "ok"
    assert fake_completions.calls[0]["tool_choice"] == "required"
    assert "tool_choice" not in fake_completions.calls[-1]


def test_chat_llm_complete_response_returns_usage(monkeypatch):
    fake_completions = _FakeCompletions()
    monkeypatch.setenv("TEST_API_KEY", "test")

    class FakeOpenAI:
        def __init__(self, **kwargs):
            pass

        chat = SimpleNamespace(completions=fake_completions)

    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)

    llm = OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY"))
    response = llm.complete_response(messages=[{"role": "user", "content": "hi"}])

    assert response.content == "ok"
    assert response.usage["prompt_tokens"] == 11
    assert response.usage["completion_tokens"] == 7
    assert response.usage["cost"] == 0.00042
    assert response.model == "provider/model"
    assert response.response_id == "gen_test"
    assert fake_completions.kwargs["model"] == "test/model"


def test_chat_llm_disables_provider_sdk_retries(monkeypatch):
    captured = {}
    monkeypatch.setenv("TEST_API_KEY", "test")

    class FakeOpenAI:
        def __init__(self, **kwargs):
            captured.update(kwargs)

        chat = SimpleNamespace(completions=_FakeCompletions())

    monkeypatch.setattr("nipux_cli.llm.OpenAI", FakeOpenAI)

    OpenAIChatLLM(ModelConfig(model="test/model", base_url="https://example.test/v1", api_key_env="TEST_API_KEY", request_timeout_seconds=37))

    assert captured["timeout"] == 37
    assert captured["max_retries"] == 0


def test_openrouter_generation_usage_enriches_cost_and_tokens(monkeypatch):
    class FakeHTTPResponse:
        def __enter__(self):
            return self

        def __exit__(self, *_args):
            return False

        def read(self):
            return (
                b'{"data":{"total_cost":"0.0042","native_tokens_prompt":123,'
                b'"native_tokens_completion":45,"native_tokens_total":168}}'
            )

    captured = {}

    def fake_urlopen(request, timeout):
        captured["url"] = request.full_url
        captured["timeout"] = timeout
        return FakeHTTPResponse()

    monkeypatch.setattr("nipux_cli.llm.urllib.request.urlopen", fake_urlopen)

    usage = _enrich_openrouter_generation_usage(
        {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15, "estimated": False},
        response_id="gen_123",
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-test",
    )

    assert captured["url"] == "https://openrouter.ai/api/v1/generation?id=gen_123"
    assert captured["timeout"] == 5
    assert usage["cost"] == 0.0042
    assert usage["prompt_tokens"] == 123
    assert usage["completion_tokens"] == 45
    assert usage["total_tokens"] == 168
tests/nipux_cli/test_measurement.py 32 lines
from nipux_cli.measurement import measurement_candidates, measurement_candidates_are_diagnostic_only


def test_measurement_candidates_extract_markdown_table_unit_columns():
    output = {
        "stdout": (
            "| model | size | backend | threads | test | t/s |\n"
            "| ------------------------------ | ---------: | ---------- | ------: | --------------: | -------------------: |\n"
            "| example model | 11.71 GiB | CPU | 24 | pp32 | 5.48 ± 0.11 |\n"
            "| example model | 11.71 GiB | CPU | 24 | tg128 | 3.44 ± 0.05 |\n"
        )
    }

    candidates = measurement_candidates(output, command="run benchmark")

    assert "pp32 5.48 ± 0.11 t/s" in candidates
    assert "tg128 3.44 ± 0.05 t/s" in candidates
    assert not measurement_candidates_are_diagnostic_only(candidates, command="run benchmark")


def test_measurement_candidates_extract_generic_table_metrics():
    output = {
        "stdout": (
            "| benchmark | latency | req/s |\n"
            "| --- | ---: | ---: |\n"
            "| warm path | 18.4 | 42.7 |\n"
        )
    }

    candidates = measurement_candidates(output, command="profile throughput")

    assert "warm path 42.7 req/s" in candidates
tests/nipux_cli/test_metric_format.py 11 lines
from nipux_cli.metric_format import format_metric_value


def test_format_metric_value_spaces_named_units():
    assert format_metric_value("citations", 42, "count") == "citations=42 count"
    assert format_metric_value("speed", 2.7, "tokens/s") == "speed=2.7 tokens/s"


def test_format_metric_value_keeps_attached_symbol_units():
    assert format_metric_value("accuracy", 98.2, "%") == "accuracy=98.2%"
    assert format_metric_value("throughput", 120, "/s") == "throughput=120/s"
tests/nipux_cli/test_operator_context.py 30 lines
from nipux_cli.operator_context import inactive_prompt_operator_ids, operator_entry_is_prompt_relevant


def _entry(message: str, *, event_id: str = "op_1", mode: str = "steer") -> dict:
    return {"event_id": event_id, "mode": mode, "message": message}


def test_conversation_only_operator_messages_do_not_enter_worker_prompt():
    for message in ("hello", "how is it going?", "clear", "stop 1", "jobs"):
        assert not operator_entry_is_prompt_relevant(_entry(message))


def test_actionable_operator_messages_remain_worker_constraints():
    for message in (
        "do not run local testing on my computer",
        "use the corrected target from the chat",
        "focus on measured results instead of saved notes",
        "the address is wrong, use `target-box`",
    ):
        assert operator_entry_is_prompt_relevant(_entry(message))


def test_inactive_prompt_operator_ids_returns_only_conversation_active_messages():
    messages = [
        _entry("hello", event_id="op_chat"),
        _entry("use the corrected target", event_id="op_use"),
        {**_entry("clear", event_id="op_done"), "acknowledged_at": "2026-04-26T00:00:00+00:00"},
    ]

    assert inactive_prompt_operator_ids(messages) == ["op_chat"]
tests/nipux_cli/test_planning.py 90 lines
from nipux_cli.planning import initial_plan_for_objective, initial_roadmap_for_objective, initial_task_contract, objective_profiles


def test_initial_task_contracts_are_generic_and_complete():
    for title in [
        "Clarify the exact success criteria and constraints.",
        "Map the first research or execution branches.",
        "Collect evidence and save outputs as files.",
        "Reflect on what worked, update memory, and continue with the next branch.",
    ]:
        contract = initial_task_contract(title)

        assert contract["output_contract"] in {"research", "artifact", "experiment", "action", "monitor", "decision", "report"}
        assert contract["acceptance_criteria"]
        assert contract["evidence_needed"]
        assert contract["stall_behavior"]


def test_initial_roadmap_uses_valid_generic_contracts():
    roadmap = initial_roadmap_for_objective(title="paper", objective="write a paper")

    for milestone in roadmap["milestones"]:
        for feature in milestone["features"]:
            assert feature["output_contract"] in {
                "research",
                "artifact",
                "experiment",
                "action",
                "monitor",
                "decision",
                "report",
            }


def test_initial_plan_adapts_to_measurable_objectives():
    plan = initial_plan_for_objective("optimize a generic process for lower latency and higher throughput")
    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]

    assert plan["profile"] == "measured"
    assert "experiment" in contracts
    assert any("baseline" in title.lower() for title in plan["tasks"])
    assert any("metric" in question.lower() for question in plan["questions"])


def test_initial_plan_adapts_to_deliverable_objectives():
    plan = initial_plan_for_objective("write a full research paper from evidence")
    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]

    assert plan["profile"] == "deliverable"
    assert "report" in contracts
    assert any("draft" in title.lower() or "report" in title.lower() for title in plan["tasks"])
    assert any("revise" in title.lower() and "evidence" in title.lower() for title in plan["tasks"])


def test_initial_plan_treats_generated_files_as_deliverables():
    plan = initial_plan_for_objective("generate a polished launch checklist for this repository")
    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]

    assert plan["profile"] == "deliverable"
    assert "report" in contracts
    assert any("audience" in question.lower() for question in plan["questions"])


def test_initial_plan_adapts_to_monitoring_objectives():
    plan = initial_plan_for_objective("monitor a recurring process and report important changes")
    contracts = [initial_task_contract(title)["output_contract"] for title in plan["tasks"]]

    assert plan["profile"] == "monitor"
    assert "monitor" in contracts
    assert any("cadence" in question.lower() or "check" in question.lower() for question in plan["questions"])


def test_initial_plan_does_not_add_meta_progress_update_task():
    for objective in [
        "optimize a generic process for lower latency and higher throughput",
        "write a full research paper from evidence",
        "monitor a recurring process and report important changes",
        "investigate build quality and compare output changes",
    ]:
        plan = initial_plan_for_objective(objective)

        assert all("progress update" not in title.lower() for title in plan["tasks"])
        assert all("keep working on the next useful branch" not in title.lower() for title in plan["tasks"])


def test_objective_profiles_stay_generic():
    profiles = objective_profiles("investigate build quality and compare output changes")

    assert profiles
    assert all(profile in {"measured", "deliverable", "monitor", "implementation", "research", "general"} for profile in profiles)
tests/nipux_cli/test_progress.py 214 lines
1from nipux_cli.progress import build_progress_checkpoint, ledger_counts, recent_progress_bits
2
3
4def test_progress_checkpoint_reports_deltas_and_recent_durable_work():
5 metadata = {
6 "finding_ledger": [{"title": "First finding"}, {"title": "Better branch"}],
7 "source_ledger": [{"url": "https://example.test"}],
8 "task_queue": [
9 {"title": "Draft report", "status": "done", "priority": 2},
10 {"title": "Validate report", "status": "open", "priority": 8},
11 ],
12 "experiment_ledger": [{"title": "Quality check", "metric_name": "score", "metric_value": 0.82}],
13 "lessons": [{"lesson": "Prefer measured output"}],
14 "roadmap": {"milestones": [{"title": "Publishable draft", "status": "validating"}]},
15 }
16
17 checkpoint = build_progress_checkpoint(
18 metadata,
19 previous_counts={"findings": 1, "sources": 0, "tasks": 2, "experiments": 0, "lessons": 1, "milestones": 0},
20 step_no=40,
21 tool_name="record_findings",
22 )
23
24 assert checkpoint.counts == {
25 "findings": 2,
26 "sources": 1,
27 "tasks": 2,
28 "experiments": 1,
29 "lessons": 1,
30 "milestones": 1,
31 }
32 assert checkpoint.deltas["findings"] == 1
33 assert checkpoint.deltas["sources"] == 1
34 assert checkpoint.deltas["tasks"] == 0
35 assert checkpoint.category == "progress"
36 assert "+1 finding" in checkpoint.message
37 assert "+1 source" in checkpoint.message
38 assert "+1 experiment" in checkpoint.message
39 assert "finding=Better branch" in checkpoint.message
40 assert "task=Validate report" in checkpoint.message
41 assert "measurement=score=0.82" in checkpoint.message
42 assert "milestone=Publishable draft" in checkpoint.message
43
44
45def test_progress_checkpoint_for_saved_output_is_concise():
46 metadata = {"finding_ledger": [{}], "source_ledger": [{}, {}], "task_queue": [{}], "experiment_ledger": []}
47
48 checkpoint = build_progress_checkpoint(
49 metadata,
50 step_no=12,
51 tool_name="write_artifact",
52 artifact_id="art_123",
53 is_finding_output=True,
54 )
55
56 assert checkpoint.category == "finding"
57 assert checkpoint.message.startswith("Saved output art_123")
58 assert "1 findings, 2 sources, 1 tasks, and 0 experiments" in checkpoint.message
59
60
61def test_progress_checkpoint_without_delta_is_activity_not_progress():
62 metadata = {"finding_ledger": [{}], "source_ledger": [{}], "task_queue": [{}], "experiment_ledger": []}
63
64 checkpoint = build_progress_checkpoint(
65 metadata,
66 previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 0, "lessons": 0, "milestones": 0},
67 step_no=50,
68 tool_name="web_extract",
69 )
70
71 assert checkpoint.category == "activity"
72 assert "no new durable ledger entries" in checkpoint.message
73
74
75def test_progress_checkpoint_counts_existing_record_updates_as_progress():
76 metadata = {
77 "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
78 "finding_ledger": [{}],
79 "source_ledger": [{}],
80 "task_queue": [{"title": "Existing branch", "status": "done"}],
81 "experiment_ledger": [{"title": "Trial", "status": "measured"}],
82 "last_task_record": {
83 "title": "Existing branch",
84 "status": "done",
85 "result": "Validated the branch.",
86 "created": False,
87 "updated_at": "2026-01-01T00:01:00+00:00",
88 },
89 "last_source_record": {
90 "source": "https://example.test",
91 "created": False,
92 "last_seen": "2026-01-01T00:01:30+00:00",
93 },
94 "last_experiment_record": {
95 "title": "Trial",
96 "status": "measured",
97 "metric_name": "score",
98 "metric_value": 0.9,
99 "created": False,
100 "updated_at": "2026-01-01T00:02:00+00:00",
101 },
102 }
103
104 checkpoint = build_progress_checkpoint(
105 metadata,
106 previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 1, "lessons": 0, "milestones": 0},
107 step_no=60,
108 tool_name="record_tasks",
109 )
110
111 assert checkpoint.category == "progress"
112 assert checkpoint.deltas["tasks"] == 0
113 assert checkpoint.updates["tasks"] == 1
114 assert checkpoint.updates["sources"] == 1
115 assert checkpoint.resolutions["tasks"] == 1
116 assert checkpoint.updates["experiments"] == 1
117 assert checkpoint.resolutions["experiments"] == 1
118 assert "~1 task updated" in checkpoint.message
119 assert "~1 source updated" in checkpoint.message
120 assert "1 task resolved" in checkpoint.message
121 assert "~1 experiment updated" in checkpoint.message
122
123
124def test_progress_checkpoint_ignores_non_substantive_record_touches():
125 metadata = {
126 "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
127 "finding_ledger": [{}],
128 "source_ledger": [{}],
129 "task_queue": [{"title": "Existing branch", "status": "open"}],
130 "experiment_ledger": [{"title": "Trial", "status": "planned"}],
131 "last_task_record": {
132 "title": "Existing branch",
133 "status": "open",
134 "created": False,
135 "substantive_update": False,
136 "updated_at": "2026-01-01T00:01:00+00:00",
137 },
138 "last_source_record": {
139 "source": "https://example.test",
140 "created": False,
141 "substantive_update": False,
142 "last_seen": "2026-01-01T00:01:30+00:00",
143 },
144 "last_experiment_record": {
145 "title": "Trial",
146 "status": "planned",
147 "created": False,
148 "substantive_update": False,
149 "updated_at": "2026-01-01T00:02:00+00:00",
150 },
151 }
152
153 checkpoint = build_progress_checkpoint(
154 metadata,
155 previous_counts={"findings": 1, "sources": 1, "tasks": 1, "experiments": 1, "lessons": 0, "milestones": 0},
156 step_no=61,
157 tool_name="record_tasks",
158 )
159
160 assert checkpoint.category == "activity"
161 assert checkpoint.updates["tasks"] == 0
162 assert checkpoint.updates["sources"] == 0
163 assert checkpoint.updates["experiments"] == 0
164 assert "no new durable ledger entries" in checkpoint.message
165
166
167def test_progress_checkpoint_counts_roadmap_updates_and_validations():
168 metadata = {
169 "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
170 "roadmap": {"milestones": [{"title": "Foundation", "status": "validating"}]},
171 "last_roadmap_record": {
172 "title": "Roadmap",
173 "created": False,
174 "updated_at": "2026-01-01T00:01:00+00:00",
175 "added_milestones": 0,
176 "updated_milestones": 1,
177 "added_features": 0,
178 "updated_features": 0,
179 },
180 "last_milestone_validation": {
181 "milestone": "Foundation",
182 "validation_status": "passed",
183 "validated_at": "2026-01-01T00:02:00+00:00",
184 },
185 }
186
187 checkpoint = build_progress_checkpoint(
188 metadata,
189 previous_counts={"findings": 0, "sources": 0, "tasks": 0, "experiments": 0, "lessons": 0, "milestones": 1},
190 step_no=70,
191 tool_name="record_milestone_validation",
192 )
193
194 assert checkpoint.category == "progress"
195 assert checkpoint.deltas["milestones"] == 0
196 assert checkpoint.updates["milestones"] == 2
197 assert checkpoint.resolutions["milestones"] == 1
198 assert "~2 milestones updated" in checkpoint.message
199 assert "1 milestone resolved" in checkpoint.message
200
201
202def test_progress_helpers_ignore_malformed_metadata():
203 metadata = {
204 "finding_ledger": "bad",
205 "source_ledger": [None, {"url": "ok"}],
206 "task_queue": [{"title": "Task", "status": "blocked", "priority": "bad"}],
207 "roadmap": {"milestones": ["bad", {"title": "Milestone", "status": "active"}]},
208 }
209
210 assert ledger_counts(metadata)["sources"] == 1
211 assert ledger_counts(metadata)["milestones"] == 2
212 bits = recent_progress_bits(metadata)
213 assert "task=Task" in bits
214 assert "milestone=Milestone" in bits
tests/nipux_cli/test_project_atlas.py 46 lines
1import importlib.util
2import sys
3from pathlib import Path
4
5
6def _load_generator():
7 path = Path(__file__).resolve().parents[2] / "scripts" / "generate_project_atlas.py"
8 spec = importlib.util.spec_from_file_location("generate_project_atlas", path)
9 assert spec is not None
10 module = importlib.util.module_from_spec(spec)
11 assert spec.loader is not None
12 sys.modules[spec.name] = module
13 spec.loader.exec_module(module)
14 return module
15
16
17def test_project_atlas_generator_maps_prompts_tools_and_source_without_self_embedding():
18 generator = _load_generator()
19
20 files = generator.load_source_files()
21 prompts = generator.extract_prompts(files)
22 tools = generator.extract_tools(files)
23
24 assert "docs/project-atlas.html" not in {source.path for source in files}
25 assert any(prompt.path == "nipux_cli/worker_policy.py" and "SYSTEM_PROMPT" in prompt.name for prompt in prompts)
26 assert any(tool["name"] == "web_search" for tool in tools)
27
28
29def test_project_atlas_redacts_secret_assignments_from_rendered_source():
30 generator = _load_generator()
31 openrouter_key = "OPENROUTER_API_KEY"
32 openai_key = "OPENAI_API_KEY"
33 source = generator.SourceFile(
34 path=".env.example",
35 text=f"{openrouter_key}=\n{openai_key}=secret\nNORMAL=value",
36 lines=[openrouter_key + "=", openai_key + "=secret", "NORMAL=value"],
37 tree=None,
38 )
39
40 rendered = generator.render_source_file(source)
41
42 assert openrouter_key + "=" not in rendered
43 assert openai_key + "=secret" not in rendered
44 assert f"{openrouter_key} = <redacted>" in rendered
45 assert f"{openai_key} = <redacted>" in rendered
46 assert "NORMAL=value" in rendered
tests/nipux_cli/test_provider_errors.py 21 lines
1from nipux_cli.provider_errors import (
2 provider_action_required,
3 provider_action_required_note,
4 provider_rate_limited,
5)
6
7
8class ProviderPayloadError(Exception):
9 payload = {"error": {"message": "Key limit exceeded", "code": 403}}
10
11
12def test_provider_action_required_detects_payload_and_status_text():
13 assert provider_action_required(ProviderPayloadError("provider rejected request"))
14 assert provider_action_required("PermissionDeniedError: Error code: 403")
15 assert "operator action" in provider_action_required_note("invalid api key")
16
17
18def test_provider_rate_limited_detects_transient_rate_text():
19 assert provider_rate_limited("429 too many requests")
20 assert provider_rate_limited("provider temporarily over capacity")
21 assert not provider_rate_limited("invalid api key")
tests/nipux_cli/test_templates.py 14 lines
1from nipux_cli.templates import program_for_job
2
3
4def test_generic_template_pushes_artifacts_and_updates():
5 program = program_for_job(kind="generic", title="research", objective="Find findings")
6
7 assert "Save important observations as artifacts" in program
8 assert "Use report_update" in program
9 assert "Use record_lesson" in program
10 assert "record_source" in program
11 assert "record_findings" in program
12 assert "record_tasks" in program
13 assert "record_roadmap" in program
14 assert "record_milestone_validation" in program
tests/nipux_cli/test_tools.py 2166 lines
1import json
2import os
3import signal
4import subprocess
5import time
6
7from nipux_cli.artifacts import ArtifactStore
8from nipux_cli.config import AppConfig, RuntimeConfig, ToolAccessConfig
9from nipux_cli.db import AgentDB
10from nipux_cli.shell_tools import cleanup_registered_shell_processes
11from nipux_cli.tools import APPROVED_TOOL_NAMES, DEFAULT_REGISTRY, ToolContext
12
13
14def test_static_tool_surface_is_focused():
15 assert tuple(DEFAULT_REGISTRY.names()) == tuple(sorted(APPROVED_TOOL_NAMES))
16 assert "terminal" not in DEFAULT_REGISTRY.names()
17 assert "delegate_task" not in DEFAULT_REGISTRY.names()
18 assert "skill_manage" not in DEFAULT_REGISTRY.names()
19 assert "browser_navigate" in DEFAULT_REGISTRY.names()
20 assert "shell_exec" in DEFAULT_REGISTRY.names()
21 assert "write_file" in DEFAULT_REGISTRY.names()
22 assert "write_artifact" in DEFAULT_REGISTRY.names()
23 assert "defer_job" in DEFAULT_REGISTRY.names()
24 assert "report_update" in DEFAULT_REGISTRY.names()
25 assert "record_lesson" in DEFAULT_REGISTRY.names()
26 assert "record_memory_graph" in DEFAULT_REGISTRY.names()
27 assert "search_memory_graph" in DEFAULT_REGISTRY.names()
28 assert "record_source" in DEFAULT_REGISTRY.names()
29 assert "record_findings" in DEFAULT_REGISTRY.names()
30 assert "record_tasks" in DEFAULT_REGISTRY.names()
31 assert "record_roadmap" in DEFAULT_REGISTRY.names()
32 assert "record_milestone_validation" in DEFAULT_REGISTRY.names()
33 assert "record_experiment" in DEFAULT_REGISTRY.names()
34 assert "acknowledge_operator_context" in DEFAULT_REGISTRY.names()
35
36
37def test_tool_registry_validates_required_arguments(tmp_path):
38 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
39
40 missing = DEFAULT_REGISTRY.validate_arguments("shell_exec", {}, config)
41 assert missing is not None
42 assert missing["missing_arguments"] == ["command"]
43 assert missing["recoverable"] is True
44
45 artifact_ref = DEFAULT_REGISTRY.validate_arguments("read_artifact", {}, config)
46 assert artifact_ref is not None
47 assert artifact_ref["missing_arguments"] == ["artifact reference"]
48
49 graph = DEFAULT_REGISTRY.validate_arguments("record_memory_graph", {}, config)
50 assert graph is not None
51 assert graph["missing_arguments"] == ["nodes or edges"]
52
53 experiment = DEFAULT_REGISTRY.validate_arguments("record_experiment", {"metric_name": "throughput"}, config)
54 assert experiment is None
55
56 nested = DEFAULT_REGISTRY.validate_arguments("record_findings", {"findings": [{}]}, config)
57 assert nested is not None
58 assert nested["missing_arguments"] == ["findings[0].name"]
59
60 nested_task = DEFAULT_REGISTRY.validate_arguments("record_tasks", {"tasks": [{"goal": "do work"}]}, config)
61 assert nested_task is not None
62 assert nested_task["missing_arguments"] == ["tasks[0].title"]
63
64
65def test_tool_registry_blocks_truncated_reference_arguments(tmp_path):
66 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
67
68 experiment = DEFAULT_REGISTRY.validate_arguments(
69 "record_experiment",
70 {
71 "title": "Measure local files",
72 "evidence_artifact": "art_fb73...",
73 "next_action": "validate the exact artifact",
74 },
75 config,
76 )
77
78 assert experiment is not None
79 assert experiment["error"] == "placeholder tool arguments"
80 assert experiment["placeholder_arguments"] == ["evidence_artifact"]
81
82
83def test_tool_access_config_filters_worker_schema_and_blocks_calls(tmp_path):
84 config = AppConfig(runtime=RuntimeConfig(home=tmp_path), tools=ToolAccessConfig(browser=False, web=False, shell=False, files=False))
85 db = AgentDB(tmp_path / "state.db")
86 try:
87 job_id = db.create_job("Restricted tools")
88 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id)
89
90 names = {tool["function"]["name"] for tool in DEFAULT_REGISTRY.openai_tools(config=config)}
91 assert "browser_navigate" not in names
92 assert "web_search" not in names
93 assert "shell_exec" not in names
94 assert "write_file" not in names
95 assert "write_artifact" in names
96
97 result = json.loads(DEFAULT_REGISTRY.handle("shell_exec", {"command": "printf no"}, ctx))
98 assert result["success"] is False
99 assert result["tool_access"] == "shell"
100 finally:
101 db.close()
102
103
104def test_artifact_tools_roundtrip(tmp_path):
105 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
106 db = AgentDB(tmp_path / "state.db")
107 try:
108 job_id = db.create_job("Save evidence")
109 run_id = db.start_run(job_id, model="fake")
110 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
111 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
112
113 raw = DEFAULT_REGISTRY.handle("write_artifact", {"content": "needle text", "title": "Evidence"}, ctx)
114 result = json.loads(raw)
115 assert result["success"] is True
116
117 read_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": result["artifact_id"]}, ctx)
118 assert json.loads(read_raw)["content"] == "needle text"
119
120 path_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": result["path"]}, ctx)
121 assert json.loads(path_raw)["artifact_id"] == result["artifact_id"]
122
123 title_raw = DEFAULT_REGISTRY.handle("read_artifact", {"title": "Evidence"}, ctx)
124 assert json.loads(title_raw)["content"] == "needle text"
125
126 number_raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": "1"}, ctx)
127 assert json.loads(number_raw)["content"] == "needle text"
128
129 search_raw = DEFAULT_REGISTRY.handle("search_artifacts", {"query": "needle"}, ctx)
130 assert json.loads(search_raw)["results"][0]["id"] == result["artifact_id"]
131 finally:
132 db.close()
133
134
135def test_read_artifact_missing_ref_returns_valid_recent_refs(tmp_path):
136 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
137 db = AgentDB(tmp_path / "state.db")
138 try:
139 job_id = db.create_job("Save evidence")
140 run_id = db.start_run(job_id, model="fake")
141 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
142 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
143 stored = ctx.artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Useful Evidence", content="saved")
144
145 raw = DEFAULT_REGISTRY.handle("read_artifact", {"artifact_id": "art_missing"}, ctx)
146 result = json.loads(raw)
147
148 assert result["success"] is False
149 assert result["recoverable"] is True
150 assert result["error"] == "artifact not found: art_missing"
151 assert "search_artifacts" in result["guidance"]
152 assert result["recent_artifacts"][0]["id"] == stored.id
153 assert result["recent_artifacts"][0]["title"] == "Useful Evidence"
154 finally:
155 db.close()
156
157
158def test_defer_job_records_resume_time_without_pausing(tmp_path):
159 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
160 db = AgentDB(tmp_path / "state.db")
161 try:
162 job_id = db.create_job("Monitor a long process")
163 run_id = db.start_run(job_id, model="fake")
164 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="defer_job")
165 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
166
167 raw = DEFAULT_REGISTRY.handle(
168 "defer_job",
169 {"seconds": 60, "reason": "process is still running", "next_action": "check status"},
170 ctx,
171 )
172 result = json.loads(raw)
173
174 assert result["success"] is True
175 assert result["status"] == "running"
176 job = db.get_job(job_id)
177 assert job["status"] == "running"
178 assert job["metadata"]["defer_until"]
179 assert job["metadata"]["defer_reason"] == "process is still running"
180 assert job["metadata"]["defer_next_action"] == "check status"
181 assert any(event["event_type"] == "agent_message" for event in db.list_events(job_id=job_id, limit=10))
182 finally:
183 db.close()
184
185
186def test_shell_exec_tool_runs_bounded_command(tmp_path):
187 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
188 db = AgentDB(tmp_path / "state.db")
189 try:
190 job_id = db.create_job("Run command")
191 run_id = db.start_run(job_id, model="fake")
192 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
193 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
194
195 raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "printf hello", "timeout_seconds": 5}, ctx)
196 result = json.loads(raw)
197
198 assert result["success"] is True
199 assert result["returncode"] == 0
200 assert result["stdout"] == "hello"
201 finally:
202 db.close()
203
204
205def test_shell_exec_flags_masked_auth_failure_output(tmp_path):
206 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
207 db = AgentDB(tmp_path / "state.db")
208 try:
209 job_id = db.create_job("Run command")
210 run_id = db.start_run(job_id, model="fake")
211 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
212 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
213
214 raw = DEFAULT_REGISTRY.handle(
215 "shell_exec",
216 {
217 "command": (
218 "printf 'HTTP request sent, awaiting response... 401 Unauthorized\\n"
219 "Username/Password Authentication Failed.\\nDownloaded: file.bin (29 bytes)\\n'"
220 ),
221 "timeout_seconds": 5,
222 },
223 ctx,
224 )
225 result = json.loads(raw)
226
227 assert result["returncode"] == 0  # printf itself exits 0; the failure is detected from the output text
228 assert result["success"] is False
229 assert "authentication or authorization failure" in result["error"]
230 finally:
231 db.close()
232
233
234def test_write_file_tool_writes_and_appends_workspace_file(tmp_path, monkeypatch):
235 monkeypatch.chdir(tmp_path)
236 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
237 db = AgentDB(tmp_path / "state.db")
238 try:
239 job_id = db.create_job("Write deliverable")
240 run_id = db.start_run(job_id, model="fake")
241 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_file")
242 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
243
244 raw = DEFAULT_REGISTRY.handle("write_file", {"path": "out/report.md", "content": "one\n"}, ctx)
245 result = json.loads(raw)
246 append_raw = DEFAULT_REGISTRY.handle(
247 "write_file",
248 {"path": "out/report.md", "content": "two\n", "mode": "append"},
249 ctx,
250 )
251 append_result = json.loads(append_raw)
252
253 assert result["success"] is True
254 assert append_result["success"] is True
255 assert (tmp_path / "out" / "report.md").read_text() == "one\ntwo\n"
256 finally:
257 db.close()
258
259
260def test_shell_exec_timeout_kills_process_group(tmp_path):
261 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
262 db = AgentDB(tmp_path / "state.db")
263 try:
264 job_id = db.create_job("Run command")
265 run_id = db.start_run(job_id, model="fake")
266 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
267 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
268
269 raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "sleep 5 | cat", "timeout_seconds": 1}, ctx)
270 result = json.loads(raw)
271
272 assert result["success"] is False
273 assert result["timed_out"] is True
274 assert result["duration_seconds"] < 4  # returned before the 5 s sleep could finish
275 finally:
276 db.close()
277
278
279def test_cleanup_registered_shell_processes_kills_orphaned_group(tmp_path):
280 process = subprocess.Popen("sleep 30", shell=True, start_new_session=True)
281 for _ in range(20):  # poll (up to ~0.4 s) until signal 0 confirms the child exists
282 if process.poll() is None:
283 try:
284 os.kill(process.pid, 0)
285 break
286 except ProcessLookupError:
287 pass
288 time.sleep(0.02)
289 registry = tmp_path / "runtime" / "shell_processes.jsonl"
290 registry.parent.mkdir(parents=True)
291 registry.write_text(json.dumps({"pid": process.pid, "command": "sleep 30"}) + "\n", encoding="utf-8")
292 try:
293 cleaned = cleanup_registered_shell_processes(tmp_path)
294
295 assert cleaned and cleaned[0]["pid"] == process.pid
296 process.wait(timeout=3)
297 assert not registry.exists()
298 finally:
299 if process.poll() is None:
300 try:
301 os.killpg(process.pid, signal.SIGKILL)  # start_new_session=True makes the child its own group leader, so pgid == pid
302 except ProcessLookupError:
303 pass
304 process.wait(timeout=3)
305
306
307def test_shell_exec_does_not_attach_local_ssh_config(tmp_path):
308 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
309 db = AgentDB(tmp_path / "state.db")
310 try:
311 job_id = db.create_job("Run command")
312 run_id = db.start_run(job_id, model="fake")
313 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
314 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
315
316 raw = DEFAULT_REGISTRY.handle("shell_exec", {"command": "ssh -V", "timeout_seconds": 5}, ctx)
317 result = json.loads(raw)
318
319 assert "ssh_config" not in result
320 finally:
321 db.close()
322
323
324def test_shell_exec_reports_nonzero_stderr_as_error(tmp_path):
325 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
326 db = AgentDB(tmp_path / "state.db")
327 try:
328 job_id = db.create_job("Run command")
329 run_id = db.start_run(job_id, model="fake")
330 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
331 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
332
333 raw = DEFAULT_REGISTRY.handle(
334 "shell_exec",
335 {"command": "printf 'sudo: a terminal is required to read the password\\n' >&2; exit 1", "timeout_seconds": 5},
336 ctx,
337 )
338 result = json.loads(raw)
339
340 assert result["success"] is False
341 assert "interactive sudo/password" in result["error"]
342 finally:
343 db.close()
344
345
346def test_shell_exec_flags_sudo_password_hidden_by_success_status(tmp_path):
347 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
348 db = AgentDB(tmp_path / "state.db")
349 try:
350 job_id = db.create_job("Run command")
351 run_id = db.start_run(job_id, model="fake")
352 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
353 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
354
355 raw = DEFAULT_REGISTRY.handle(
356 "shell_exec",
357 {
358 "command": (
359 "printf 'sudo: a terminal is required to read the password\\n"
360 "sudo: a password is required\\n'"
361 ),
362 "timeout_seconds": 5,
363 },
364 ctx,
365 )
366 result = json.loads(raw)
367
368 assert result["returncode"] == 0
369 assert result["success"] is False
370 assert "interactive sudo/password" in result["error"]
371 finally:
372 db.close()
373
374
375def test_shell_exec_flags_missing_command_hidden_by_success_status(tmp_path):
376 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
377 db = AgentDB(tmp_path / "state.db")
378 try:
379 job_id = db.create_job("Run command")
380 run_id = db.start_run(job_id, model="fake")
381 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
382 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
383
384 raw = DEFAULT_REGISTRY.handle(
385 "shell_exec",
386 {"command": "printf '/bin/sh: 1: build-tool: not found\\n'", "timeout_seconds": 5},
387 ctx,
388 )
389 result = json.loads(raw)
390
391 assert result["success"] is False
392 assert "missing command" in result["error"]
393 finally:
394 db.close()
395
396
397def test_shell_exec_flags_missing_absolute_executable_hidden_by_success_status(tmp_path):
398 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
399 db = AgentDB(tmp_path / "state.db")
400 try:
401 job_id = db.create_job("Run command")
402 run_id = db.start_run(job_id, model="fake")
403 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
404 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
405
406 raw = DEFAULT_REGISTRY.handle(
407 "shell_exec",
408 {"command": "printf '/bin/sh: 1: /tmp/tools/build-tool: not found\\n'", "timeout_seconds": 5},
409 ctx,
410 )
411 result = json.loads(raw)
412
413 assert result["returncode"] == 0
414 assert result["success"] is False
415 assert "missing command" in result["error"]
416 assert "/tmp/tools/build-tool: not found" in result["error"]
417 finally:
418 db.close()
419
420
421def test_shell_exec_reports_empty_which_probe_as_missing_executable(tmp_path):
422 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
423 db = AgentDB(tmp_path / "state.db")
424 try:
425 job_id = db.create_job("Run command")
426 run_id = db.start_run(job_id, model="fake")
427 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
428 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
429
430 raw = DEFAULT_REGISTRY.handle(
431 "shell_exec",
432 {"command": "which definitely-missing-nipux-test-command", "timeout_seconds": 5},
433 ctx,
434 )
435 result = json.loads(raw)
436
437 assert result["success"] is False
438 assert result["returncode"] == 1
439 assert result["error"] == "command probe found no executable: definitely-missing-nipux-test-command"
440 finally:
441 db.close()
442
443
444def test_shell_exec_flags_empty_successful_probe_as_no_observation(tmp_path):
445 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
446 db = AgentDB(tmp_path / "state.db")
447 try:
448 job_id = db.create_job("Run command")
449 run_id = db.start_run(job_id, model="fake")
450 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
451 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
452
453 raw = DEFAULT_REGISTRY.handle(
454 "shell_exec",
455 {"command": "find /tmp/definitely-missing-nipux-test-path -maxdepth 1 2>/dev/null || true", "timeout_seconds": 5},
456 ctx,
457 )
458 result = json.loads(raw)
459
460 assert result["returncode"] == 0
461 assert result["success"] is False
462 assert "produced no output" in result["error"]
463 assert "filesystem probe" in result["error"]
464 finally:
465 db.close()
466
467
468def test_shell_exec_flags_missing_which_probe_hidden_by_true(tmp_path):
469 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
470 db = AgentDB(tmp_path / "state.db")
471 try:
472 job_id = db.create_job("Run command")
473 run_id = db.start_run(job_id, model="fake")
474 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
475 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
476
477 raw = DEFAULT_REGISTRY.handle(
478 "shell_exec",
479 {"command": "which definitely-missing-nipux-test-command || true", "timeout_seconds": 5},
480 ctx,
481 )
482 result = json.loads(raw)
483
484 assert result["returncode"] == 0
485 assert result["success"] is False
486 assert "probe found no executable" in result["error"]
487 finally:
488 db.close()
489
490
491def test_shell_exec_flags_make_failure_hidden_by_pipe_status(tmp_path):
492 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
493 db = AgentDB(tmp_path / "state.db")
494 try:
495 job_id = db.create_job("Run command")
496 run_id = db.start_run(job_id, model="fake")
497 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
498 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
499
500 raw = DEFAULT_REGISTRY.handle(
501 "shell_exec",
502 {"command": "printf 'Makefile:6: *** Build system changed:\\n. Stop.\\n'", "timeout_seconds": 5},
503 ctx,
504 )
505 result = json.loads(raw)
506
507 assert result["success"] is False
508 assert "build/tool failure" in result["error"]
509 finally:
510 db.close()
511
512
513def test_update_job_state_keeps_terminal_statuses_operator_only(tmp_path):
514 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
515 db = AgentDB(tmp_path / "state.db")
516 try:
517 job_id = db.create_job("Keep running")
518 run_id = db.start_run(job_id, model="fake")
519 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="update_job_state")
520 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
521
522 for requested in ("paused", "cancelled", "completed", "failed"):
523 raw = DEFAULT_REGISTRY.handle("update_job_state", {"status": requested}, ctx)
524 result = json.loads(raw)
525
526 assert result["success"] is True
527 assert result["requested_status"] == requested
528 assert result["kept_running"] is True
529 assert db.get_job(job_id)["status"] == "running"
530 if requested == "completed":
531 assert result["follow_up_task"]["title"] == "Audit latest checkpoint against objective"
532 assert result["follow_up_task"]["status"] == "open"
533 assert result["follow_up_task"]["output_contract"] == "decision"
534 assert "prompt-to-artifact checklist" in result["follow_up_task"]["acceptance_criteria"]
535 assert result["follow_up_task"]["evidence_needed"]
536 assert result["follow_up_task"]["stall_behavior"]
537 assert result["follow_up_task"]["metadata"]["source"] == "update_job_state"
538 assert result["follow_up_task"]["metadata"]["completion_audit_required"] is True
539 else:
540 assert "follow_up_task" not in result
541
542 tasks = db.get_job(job_id)["metadata"]["task_queue"]
543 assert [task["title"] for task in tasks] == ["Audit latest checkpoint against objective"]
544 finally:
545 db.close()
546
547
548def test_report_update_tool_records_operator_visible_note(tmp_path):
549 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
550 db = AgentDB(tmp_path / "state.db")
551 try:
552 job_id = db.create_job("Research topic")
553 run_id = db.start_run(job_id, model="fake")
554 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="report_update")
555 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
556
557 raw = DEFAULT_REGISTRY.handle("report_update", {"message": "Found a usable finding source", "category": "finding"}, ctx)
558 result = json.loads(raw)
559 job = db.get_job(job_id)
560
561 assert result["success"] is True
562 assert job["metadata"]["agent_updates"][-1]["message"] == "Found a usable finding source"
563 assert job["metadata"]["last_agent_update"]["category"] == "finding"
564 finally:
565 db.close()
566
567
568def test_record_lesson_tool_records_durable_learning(tmp_path):
569 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
570 db = AgentDB(tmp_path / "state.db")
571 try:
572 job_id = db.create_job("Research topic")
573 run_id = db.start_run(job_id, model="fake")
574 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
575 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
576
577 raw = DEFAULT_REGISTRY.handle(
578 "record_lesson",
579 {"lesson": "Competitor low-evidence lists are not finding sources.", "category": "source_quality", "confidence": 0.8},
580 ctx,
581 )
582 result = json.loads(raw)
583 job = db.get_job(job_id)
584
585 assert result["success"] is True
586 assert job["metadata"]["lessons"][-1]["lesson"] == "Competitor low-evidence lists are not finding sources."
587 assert job["metadata"]["last_lesson"]["category"] == "source_quality"
588 finally:
589 db.close()
590
591
592def test_record_lesson_cannot_clear_measurement_obligation_with_vague_lesson(tmp_path):
593 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
594 db = AgentDB(tmp_path / "state.db")
595 try:
596 job_id = db.create_job(
597 "Improve a measurable process",
598 metadata={
599 "pending_measurement_obligation": {
600 "source_step_no": 4,
601 "tool": "shell_exec",
602 "metric_candidates": ["42 units/s"],
603 "command": "run trial",
604 }
605 },
606 )
607 run_id = db.start_run(job_id, model="fake")
608 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
609 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
610
611 raw = DEFAULT_REGISTRY.handle(
612 "record_lesson",
613 {"lesson": "continue focused work", "category": "strategy"},
614 ctx,
615 )
616 result = json.loads(raw)
617 job = db.get_job(job_id)
618
619 assert result["success"] is False
620 assert result["error"] == "measurement explanation required"
621 assert job["metadata"]["pending_measurement_obligation"]["source_step_no"] == 4
622 assert "lessons" not in job["metadata"]
623 finally:
624 db.close()
625
626
627def test_record_lesson_can_explain_invalid_measurement_obligation(tmp_path):
628 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
629 db = AgentDB(tmp_path / "state.db")
630 try:
631 job_id = db.create_job(
632 "Improve a measurable process",
633 metadata={
634 "pending_measurement_obligation": {
635 "source_step_no": 4,
636 "tool": "shell_exec",
637 "metric_candidates": ["42 units/s"],
638 "command": "run trial",
639 }
640 },
641 )
642 run_id = db.start_run(job_id, model="fake")
643 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
644 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
645
646 raw = DEFAULT_REGISTRY.handle(
647 "record_lesson",
648 {
649 "lesson": (
650 "The output was diagnostic only and did not contain a valid metric; "
651 "rerun the branch with a measured trial."
652 ),
653 "category": "mistake",
654 },
655 ctx,
656 )
657 result = json.loads(raw)
658 job = db.get_job(job_id)
659
660 assert result["success"] is True
661 assert job["metadata"].get("pending_measurement_obligation") == {}
662 assert job["metadata"]["last_measurement_obligation"]["resolution_status"] == "explained"
663 assert job["metadata"]["last_measurement_obligation"]["resolution_tool"] == "record_lesson"
664 finally:
665 db.close()
666
667
668def test_memory_graph_tools_roundtrip(tmp_path):
669 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
670 db = AgentDB(tmp_path / "state.db")
671 try:
672 job_id = db.create_job("Build durable project understanding")
673 run_id = db.start_run(job_id, model="fake")
674 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_memory_graph")
675 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
676
677 raw = DEFAULT_REGISTRY.handle(
678 "record_memory_graph",
679 {
680 "nodes": [
681 {
682 "title": "Use measured checkpoints before expanding scope",
683 "kind": "strategy",
684 "status": "active",
685 "summary": "Convert branch outcomes into evidence-backed decisions before opening more work.",
686 "salience": 0.9,
687 "tags": ["progress", "validation"],
688 "evidence_refs": ["art_123"],
689 },
690 {
691 "title": "Open question: missing evaluator",
692 "kind": "question",
693 "status": "open",
694 "summary": "The job needs a concrete validation signal for the next branch.",
695 },
696 ],
697 "edges": [
698 {
699 "from_key": "Use measured checkpoints before expanding scope",
700 "to_key": "Open question: missing evaluator",
701 "relation": "raises",
702 }
703 ],
704 },
705 ctx,
706 )
707 result = json.loads(raw)
708
709 assert result["success"] is True
710 assert result["added_nodes"] == 2
711 assert result["added_edges"] == 1
712 job = db.get_job(job_id)
713 graph = job["metadata"]["memory_graph"]
714 assert len(graph["nodes"]) == 2
715 assert graph["nodes"][0]["kind"] == "strategy"
716 assert graph["nodes"][0]["evidence_refs"] == ["art_123"]
717 assert db.list_events(job_id=job_id, event_types=["memory_node"])[0]["title"] == "memory graph"
718
719 search_raw = DEFAULT_REGISTRY.handle("search_memory_graph", {"query": "evaluator"}, ctx)
720 search = json.loads(search_raw)
721 assert search["success"] is True
722 assert search["nodes"][0]["title"] == "Open question: missing evaluator"
723 assert search["edges"][0]["relation"] == "raises"
724 finally:
725 db.close()
726
727
728def test_record_source_and_findings_tools_update_ledgers(tmp_path):
729 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
730 db = AgentDB(tmp_path / "state.db")
731 try:
732 job_id = db.create_job("Research topic")
733 run_id = db.start_run(job_id, model="fake")
734 source_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
735 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=source_step)
736
737 source_raw = DEFAULT_REGISTRY.handle(
738 "record_source",
739 {"source": "https://example.com", "source_type": "web_source", "usefulness_score": 0.8, "yield_count": 2},
740 ctx,
741 )
742 finding_raw = DEFAULT_REGISTRY.handle(
743 "record_findings",
744 {
745 "findings": [
746 {
747 "name": "Acme Finding",
748 "url": "https://acme.example",
749 "source_url": "https://example-source.com/acme",
750 "location": "Toronto",
751 "category": "example category",
752 "reason": "reusable result",
753 "score": 0.75,
754 }
755 ]
756 },
757 ctx,
758 )
759 job = db.get_job(job_id)
760
761 assert json.loads(source_raw)["source"]["yield_count"] == 2
762 finding_result = json.loads(finding_raw)
763 assert finding_result["added"] == 1
764 assert finding_result["sources_updated"] == 1
765 assert job["metadata"]["source_ledger"][0]["source"] == "https://example.com"
766 assert any(source["source"] == "https://example-source.com/acme" for source in job["metadata"]["source_ledger"])
767 assert job["metadata"]["finding_ledger"][0]["name"] == "Acme Finding"
768 assert job["metadata"]["last_agent_update"]["category"] == "finding"
769 finally:
770 db.close()
771
772
773def test_record_source_requires_assessment(tmp_path):
774 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
775 db = AgentDB(tmp_path / "state.db")
776 try:
777 job_id = db.create_job("Research topic")
778 run_id = db.start_run(job_id, model="fake")
779 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
780 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
781
782 raw = DEFAULT_REGISTRY.handle("record_source", {"source": "https://example.com"}, ctx)
783 result = json.loads(raw)
784
785 assert result["success"] is False
786 assert result["error"] == "source assessment is required"
787 assert db.get_job(job_id)["metadata"].get("source_ledger") is None
788 finally:
789 db.close()
790
791
792def test_record_source_does_not_accept_type_without_assessment(tmp_path):
793 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
794 db = AgentDB(tmp_path / "state.db")
795 try:
796 job_id = db.create_job("Research topic")
797 run_id = db.start_run(job_id, model="fake")
798 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
799 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
800
801 raw = DEFAULT_REGISTRY.handle(
802 "record_source",
803 {"source": "https://example.com", "source_type": "web_source"},
804 ctx,
805 )
806 result = json.loads(raw)
807
808 assert result["success"] is False
809 assert result["error"] == "source assessment is required"
810 finally:
811 db.close()
812
813
814def test_record_findings_reports_unchanged_duplicates_without_agent_update_noise(tmp_path):
815 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
816 db = AgentDB(tmp_path / "state.db")
817 try:
818 job_id = db.create_job("Research topic")
819 run_id = db.start_run(job_id, model="fake")
820 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
821 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
822 args = {
823 "findings": [
824 {
825 "name": "Reusable finding",
826 "source_url": "https://example-source.com/finding",
827 "reason": "Evidence-backed result",
828 "score": 0.75,
829 }
830 ]
831 }
832
833 first = json.loads(DEFAULT_REGISTRY.handle("record_findings", args, ctx))
834 agent_events_after_first = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
835 repeated = json.loads(DEFAULT_REGISTRY.handle("record_findings", args, ctx))
836 agent_events_after_repeat = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
837
838 assert first["added"] == 1
839 assert first["updated"] == 0
840 assert first["unchanged"] == 0
841 assert repeated["added"] == 0
842 assert repeated["updated"] == 0
843 assert repeated["unchanged"] == 1
844 assert agent_events_after_repeat == agent_events_after_first
845 finally:
846 db.close()
847
848
849def test_record_findings_requires_evidence_anchor(tmp_path):
850 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
851 db = AgentDB(tmp_path / "state.db")
852 try:
853 job_id = db.create_job("Research topic")
854 run_id = db.start_run(job_id, model="fake")
855 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
856 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
857
858 raw = DEFAULT_REGISTRY.handle(
859 "record_findings",
860 {"findings": [{"name": "Unsupported label", "category": "candidate"}]},
861 ctx,
862 )
863 result = json.loads(raw)
864
865 assert result["success"] is False
866 assert result["error"] == "no valid finding with name/title and evidence was provided"
867 assert result["rejected"] == [{"name": "Unsupported label", "reason": "missing_evidence"}]
868 assert db.get_job(job_id)["metadata"].get("finding_ledger") is None
869 finally:
870 db.close()
871
872
873def test_record_findings_reports_rejected_unevidenced_items_in_mixed_batch(tmp_path):
874 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
875 db = AgentDB(tmp_path / "state.db")
876 try:
877 job_id = db.create_job("Research topic")
878 run_id = db.start_run(job_id, model="fake")
879 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
880 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
881
882 raw = DEFAULT_REGISTRY.handle(
883 "record_findings",
884 {
885 "findings": [
886 {"name": "Unsupported label"},
887 {"name": "Evidence-backed result", "metadata": {"source_url": "file:///tmp/evidence.txt"}},
888 ]
889 },
890 ctx,
891 )
892 result = json.loads(raw)
893 job = db.get_job(job_id)
894
895 assert result["success"] is True
896 assert result["added"] == 1
897 assert result["rejected"] == [{"name": "Unsupported label", "reason": "missing_evidence"}]
898 assert job["metadata"]["finding_ledger"][0]["name"] == "Evidence-backed result"
899 assert job["metadata"]["last_agent_update"]["metadata"]["rejected"] == 1
900 finally:
901 db.close()
902
903
904def test_record_tasks_tool_updates_task_queue(tmp_path):
905 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
906 db = AgentDB(tmp_path / "state.db")
907 try:
908 job_id = db.create_job("Research topic")
909 run_id = db.start_run(job_id, model="fake")
910 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
911 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
912
913 raw = DEFAULT_REGISTRY.handle(
914 "record_tasks",
915 {
916 "tasks": [
917 {
918 "title": "Explore primary sources",
919 "status": "open",
920 "priority": 5,
921 "goal": "Find artifact-backed evidence",
922 "source_hint": "official docs",
923 }
924 ]
925 },
926 ctx,
927 )
928 result = json.loads(raw)
929 job = db.get_job(job_id)
930
931 assert result["success"] is True
932 assert result["added"] == 1
933 task = job["metadata"]["task_queue"][0]
934 assert task["title"] == "Explore primary sources"
935 assert task["priority"] == 5
936 assert task["output_contract"] == "research"
937 assert task["acceptance_criteria"]
938 assert task["evidence_needed"]
939 assert task["stall_behavior"]
940 assert task["metadata"]["contract_inferred_fields"] == [
941 "acceptance_criteria",
942 "evidence_needed",
943 "output_contract",
944 "stall_behavior",
945 ]
946 assert job["metadata"]["last_agent_update"]["category"] == "plan"
947 finally:
948 db.close()
949
950
951def test_record_tasks_dedupes_semantic_task_under_backlog_pressure(tmp_path):
952 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
953 db = AgentDB(tmp_path / "state.db")
954 try:
955 job_id = db.create_job(
956 "Keep a long-running job focused",
957 metadata={
958 "task_queue": [
959 {
960 "title": "Validate model files and run baseline benchmark",
961 "status": "open",
962 "priority": 5,
963 "goal": "Get a measured baseline.",
964 },
965 *[
966 {"title": f"Done branch {index}", "status": "done", "priority": 0}
967 for index in range(81)
968 ],
969 ]
970 },
971 )
972 run_id = db.start_run(job_id, model="fake")
973 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
974 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
975
976 raw = DEFAULT_REGISTRY.handle(
977 "record_tasks",
978 {
979 "tasks": [
980 {
981 "title": "Validate candidate model files and run baseline benchmark",
982 "status": "active",
983 "priority": 10,
984 "goal": "Use the existing validation branch for the first measured run.",
985 }
986 ]
987 },
988 ctx,
989 )
990 result = json.loads(raw)
991 job = db.get_job(job_id)
992 task_queue = job["metadata"]["task_queue"]
993
994 assert result["success"] is True
995 assert result["added"] == 0
996 assert result["updated"] == 1
997 assert len(task_queue) == 82
998 task = task_queue[0]
999 assert task["title"] == "Validate model files and run baseline benchmark"
1000 assert task["status"] == "active"
1001 assert task["metadata"]["original_title"] == "Validate candidate model files and run baseline benchmark"
1002 assert task["metadata"]["matched_existing_task"]["title"] == "Validate model files and run baseline benchmark"
1003 finally:
1004 db.close()
1005
1006
1007def test_record_tasks_reports_unchanged_duplicates_without_agent_update_noise(tmp_path):
1008 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1009 db = AgentDB(tmp_path / "state.db")
1010 try:
1011 job_id = db.create_job("Research topic")
1012 run_id = db.start_run(job_id, model="fake")
1013 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1014 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1015 args = {
1016 "tasks": [
1017 {
1018 "title": "Explore primary sources",
1019 "status": "open",
1020 "priority": 5,
1021 "goal": "Find artifact-backed evidence",
1022 }
1023 ]
1024 }
1025
1026 first = json.loads(DEFAULT_REGISTRY.handle("record_tasks", args, ctx))
1027 agent_events_after_first = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
1028 db.update_job_metadata(
1029 job_id,
1030 {"pending_measurement_obligation": {"source_step_no": 1, "metric_candidates": ["score"]}},
1031 )
1032 repeated = json.loads(DEFAULT_REGISTRY.handle("record_tasks", args, ctx))
1033 agent_events_after_repeat = len(db.list_events(job_id=job_id, event_types=["agent_message"]))
1034
1035 assert first["added"] == 1
1036 assert first["updated"] == 0
1037 assert first["unchanged"] == 0
1038 assert repeated["added"] == 0
1039 assert repeated["updated"] == 0
1040 assert repeated["unchanged"] == 1
1041 assert agent_events_after_repeat == agent_events_after_first
1042 assert db.get_job(job_id)["metadata"]["pending_measurement_obligation"]["source_step_no"] == 1
1043 finally:
1044 db.close()
1045
1046
1047def test_record_tasks_cannot_defer_measurement_with_unrelated_task(tmp_path):
1048 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1049 db = AgentDB(tmp_path / "state.db")
1050 try:
1051 job_id = db.create_job(
1052 "Improve measurable process",
1053 metadata={
1054 "pending_measurement_obligation": {
1055 "source_step_no": 8,
1056 "tool": "shell_exec",
1057 "metric_candidates": ["42 units/s"],
1058 }
1059 },
1060 )
1061 run_id = db.start_run(job_id, model="fake")
1062 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1063 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1064
1065 raw = DEFAULT_REGISTRY.handle(
1066 "record_tasks",
1067 {"tasks": [{"title": "Read more background sources", "status": "open", "output_contract": "research"}]},
1068 ctx,
1069 )
1070 result = json.loads(raw)
1071 job = db.get_job(job_id)
1072
1073 assert result["success"] is False
1074 assert result["error"] == "measurement task required"
1075 assert job["metadata"]["pending_measurement_obligation"]["source_step_no"] == 8
1076 assert "task_queue" not in job["metadata"]
1077 finally:
1078 db.close()
1079
1080
1081def test_record_tasks_can_defer_measurement_with_explicit_measurement_task(tmp_path):
1082 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1083 db = AgentDB(tmp_path / "state.db")
1084 try:
1085 job_id = db.create_job(
1086 "Improve measurable process",
1087 metadata={
1088 "pending_measurement_obligation": {
1089 "source_step_no": 8,
1090 "tool": "shell_exec",
1091 "metric_candidates": ["42 units/s"],
1092 }
1093 },
1094 )
1095 run_id = db.start_run(job_id, model="fake")
1096 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
1097 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1098
1099 raw = DEFAULT_REGISTRY.handle(
1100 "record_tasks",
1101 {
1102 "tasks": [{
1103 "title": "Rerun the branch and record the missing measurement",
1104 "status": "open",
1105 "output_contract": "experiment",
1106 "acceptance_criteria": "valid metric recorded",
1107 "evidence_needed": "measured command output",
1108 "stall_behavior": "record blocker if measurement cannot be obtained",
1109 }]
1110 },
1111 ctx,
1112 )
1113 result = json.loads(raw)
1114 job = db.get_job(job_id)
1115
1116 assert result["success"] is True
1117 assert result["added"] == 1
1118 assert job["metadata"].get("pending_measurement_obligation") == {}
1119 assert job["metadata"]["last_measurement_obligation"]["resolution_status"] == "deferred"
1120 assert job["metadata"]["last_measurement_obligation"]["resolution_tool"] == "record_tasks"
1121 finally:
1122 db.close()
1123
1124
1125def test_record_roadmap_tool_updates_roadmap(tmp_path):
1126 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1127 db = AgentDB(tmp_path / "state.db")
1128 try:
1129 job_id = db.create_job("Build a broad generic outcome")
1130 run_id = db.start_run(job_id, model="fake")
1131 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_roadmap")
1132 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1133
1134 raw = DEFAULT_REGISTRY.handle(
1135 "record_roadmap",
1136 {
1137 "title": "Generic Roadmap",
1138 "status": "active",
1139 "scope": "Coordinate broad work through milestones.",
1140 "current_milestone": "Foundation",
1141 "validation_contract": "Each milestone needs evidence.",
1142 "milestones": [{
1143 "title": "Foundation",
1144 "status": "active",
1145 "priority": 7,
1146 "acceptance_criteria": "first durable output exists",
1147 "evidence_needed": "artifact and ledger update",
1148 "features": [{
1149 "title": "Create first checkpoint",
1150 "status": "active",
1151 "output_contract": "artifact",
1152 }],
1153 }],
1154 },
1155 ctx,
1156 )
1157 result = json.loads(raw)
1158 job = db.get_job(job_id)
1159 roadmap = job["metadata"]["roadmap"]
1160
1161 assert result["success"] is True
1162 assert roadmap["title"] == "Generic Roadmap"
1163 assert roadmap["status"] == "active"
1164 assert roadmap["milestones"][0]["title"] == "Foundation"
1165 assert roadmap["milestones"][0]["features"][0]["title"] == "Create first checkpoint"
1166 assert job["metadata"]["last_agent_update"]["metadata"]["roadmap_status"] == "active"
1167 finally:
1168 db.close()
1169
1170
1171def test_record_roadmap_dedupes_milestone_titles_even_when_keys_change(tmp_path):
1172 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1173 db = AgentDB(tmp_path / "state.db")
1174 try:
1175 job_id = db.create_job("Keep broad work coordinated")
1176 run_id = db.start_run(job_id, model="fake")
1177 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_roadmap")
1178 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1179
1180 DEFAULT_REGISTRY.handle(
1181 "record_roadmap",
1182 {
1183 "title": "Generic Roadmap",
1184 "milestones": [{
1185 "key": "initial-key",
1186 "title": "Foundation",
1187 "status": "planned",
1188 "features": [{"key": "feature-a", "title": "First feature", "status": "planned"}],
1189 }],
1190 },
1191 ctx,
1192 )
1193 DEFAULT_REGISTRY.handle(
1194 "record_roadmap",
1195 {
1196 "title": "Generic Roadmap",
1197 "milestones": [{
1198 "key": "model-invented-key",
1199 "title": "Foundation",
1200 "status": "active",
1201 "features": [{"key": "different-feature-key", "title": "First feature", "status": "done"}],
1202 }],
1203 },
1204 ctx,
1205 )
1206 roadmap = db.get_job(job_id)["metadata"]["roadmap"]
1207
1208 assert len(roadmap["milestones"]) == 1
1209 assert roadmap["milestones"][0]["status"] == "active"
1210 assert len(roadmap["milestones"][0]["features"]) == 1
1211 assert roadmap["milestones"][0]["features"][0]["status"] == "done"
1212 finally:
1213 db.close()
1214
1215
1216def test_record_milestone_validation_creates_follow_up_tasks(tmp_path):
1217 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1218 db = AgentDB(tmp_path / "state.db")
1219 try:
1220 job_id = db.create_job("Validate broad work")
1221 run_id = db.start_run(job_id, model="fake")
1222 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1223 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1224
1225 raw = DEFAULT_REGISTRY.handle(
1226 "record_milestone_validation",
1227 {
1228 "milestone": "Foundation",
1229 "validation_status": "failed",
1230 "result": "Missing durable evidence.",
1231 "issues": ["no artifact"],
1232 "next_action": "Create evidence.",
1233 "follow_up_tasks": [{
1234 "title": "Produce missing evidence",
1235 "output_contract": "artifact",
1236 "acceptance_criteria": "saved output exists",
1237 }],
1238 },
1239 ctx,
1240 )
1241 result = json.loads(raw)
1242 job = db.get_job(job_id)
1243 roadmap = job["metadata"]["roadmap"]
1244
1245 assert result["success"] is True
1246 assert result["validation"]["validation_status"] == "failed"
1247 assert result["follow_up_tasks"][0]["title"] == "Produce missing evidence"
1248 assert roadmap["milestones"][0]["status"] == "blocked"
1249 assert job["metadata"]["task_queue"][0]["parent"] == "Foundation"
1250 assert job["metadata"]["last_agent_update"]["metadata"]["validation_status"] == "failed"
1251 finally:
1252 db.close()
1253
1254
1255def test_record_milestone_validation_requires_evidence_for_passed_status(tmp_path):
1256 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1257 db = AgentDB(tmp_path / "state.db")
1258 try:
1259 job_id = db.create_job("Validate broad work")
1260 run_id = db.start_run(job_id, model="fake")
1261 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1262 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1263
1264 raw = DEFAULT_REGISTRY.handle(
1265 "record_milestone_validation",
1266 {
1267 "milestone": "Foundation",
1268 "validation_status": "passed",
1269 },
1270 ctx,
1271 )
1272 result = json.loads(raw)
1273
1274 assert result["success"] is False
1275 assert result["error"] == "passed milestone validation requires evidence or result"
1276 assert db.get_job(job_id)["metadata"].get("roadmap") is None
1277 finally:
1278 db.close()
1279
1280
1281def test_record_milestone_validation_allows_passed_status_with_metadata_evidence(tmp_path):
1282 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1283 db = AgentDB(tmp_path / "state.db")
1284 try:
1285 job_id = db.create_job("Validate broad work")
1286 run_id = db.start_run(job_id, model="fake")
1287 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
1288 ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)
1289
1290 raw = DEFAULT_REGISTRY.handle(
1291 "record_milestone_validation",
1292 {
1293 "milestone": "Foundation",
1294 "validation_status": "passed",
1295 "metadata": {"artifact_id": "art_123"},
1296 },
1297 ctx,
1298 )
1299 result = json.loads(raw)
1300
1301 assert result["success"] is True
1302 assert result["validation"]["validation_status"] == "passed"
1303 finally:
1304 db.close()
1305
1306
1307def test_record_milestone_validation_requires_gap_for_failed_or_blocked_status(tmp_path):
1308 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1309 db = AgentDB(tmp_path / "state.db")
1310 try:
1311 job_id = db.create_job("Validate broad work")
1312 run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_milestone_validation")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        failed = json.loads(DEFAULT_REGISTRY.handle(
            "record_milestone_validation",
            {
                "milestone": "Foundation",
                "validation_status": "failed",
            },
            ctx,
        ))
        blocked = json.loads(DEFAULT_REGISTRY.handle(
            "record_milestone_validation",
            {
                "milestone": "Foundation",
                "validation_status": "blocked",
            },
            ctx,
        ))

        assert failed["success"] is False
        assert failed["error"] == "failed milestone validation requires a gap, issue, evidence, next_action, or follow-up task"
        assert blocked["success"] is False
        assert blocked["error"] == "blocked milestone validation requires a gap, issue, evidence, next_action, or follow-up task"
        assert db.get_job(job_id)["metadata"].get("roadmap") is None
    finally:
        db.close()


def test_record_experiment_tool_tracks_best_measured_result(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        first = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "baseline attempt",
                "status": "measured",
                "metric_name": "score",
                "metric_value": 2.0,
                "metric_unit": "units",
                "higher_is_better": True,
                "config": {"variant": "a"},
                "result": "baseline measured",
                "next_action": "try variant b",
            },
            ctx,
        )
        second = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "second attempt",
                "status": "measured",
                "metric_name": "score",
                "metric_value": 3.5,
                "metric_unit": "units",
                "higher_is_better": True,
                "config": {"variant": "b"},
                "result": "improved",
                "next_action": "test a different branch",
            },
            ctx,
        )
        job = db.get_job(job_id)
        experiments = job["metadata"]["experiment_ledger"]

        assert json.loads(first)["experiment"]["best_observed"] is True
        assert json.loads(second)["experiment"]["best_observed"] is True
        assert experiments[0]["best_observed"] is False
        assert experiments[1]["best_observed"] is True
        assert experiments[1]["delta_from_previous_best"] == 1.5
        assert job["metadata"]["best_experiment_record"]["title"] == "second attempt"
        assert job["metadata"]["last_agent_update"]["metadata"]["best_observed"] is True
    finally:
        db.close()


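The assertions in the preceding test pin down how the experiment ledger flags its best measurement: each response is a snapshot taken at write time, while the stored ledger keeps exactly one `best_observed` entry and records the improvement over the previous best. A minimal sketch of that bookkeeping follows, assuming a plain list of dicts. `append_measurement` is a hypothetical helper written for illustration, not the `record_experiment` implementation.

```python
def append_measurement(ledger, title, value, higher_is_better=True):
    """Append a measured record and keep a single `best_observed` flag.

    Hypothetical helper: the field names mirror the test above, the logic is assumed.
    """
    record = {"title": title, "metric_value": float(value), "best_observed": False}
    previous_best = next((r for r in ledger if r["best_observed"]), None)
    if previous_best is None:
        # The first measurement is trivially the best one seen so far.
        record["best_observed"] = True
    else:
        delta = record["metric_value"] - previous_best["metric_value"]
        improved = delta > 0 if higher_is_better else delta < 0
        if improved:
            # Move the flag to the new record and remember the improvement.
            previous_best["best_observed"] = False
            record["best_observed"] = True
            record["delta_from_previous_best"] = delta
    ledger.append(record)
    # Return a snapshot so later ledger updates do not rewrite earlier responses.
    return dict(record)
```

Returning a copy is what lets the first response keep `best_observed` set to `True` even after the stored ledger entry has been demoted.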
def test_record_experiment_synthesizes_missing_title(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "status": "planned",
                "metric_name": "download_progress_bytes",
                "result": "download incomplete",
            },
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is True
        assert result["experiment"]["title"] == "download_progress_bytes"
    finally:
        db.close()


def test_record_experiment_requires_next_action_for_closed_trials(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "blocked attempt",
                "status": "blocked",
                "metric_name": "score",
                "result": "no valid measurement",
            },
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is False
        assert result["error"] == "next_action is required for measured, failed, blocked, or skipped experiments"
        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
    finally:
        db.close()


def test_record_experiment_requires_context_for_closed_non_measured_trials(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "blocked attempt",
                "status": "blocked",
                "metric_name": "score",
                "next_action": "try a different branch",
            },
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is False
        assert result["error"] == "blocked experiments require result, evidence, config, or metadata"
        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
    finally:
        db.close()


def test_record_experiment_accepts_blocked_trial_with_context(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "blocked attempt",
                "status": "blocked",
                "metric_name": "score",
                "result": "required input was unavailable",
                "next_action": "try a different branch",
            },
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is True
        assert result["experiment"]["status"] == "blocked"
        assert result["experiment"]["result"] == "required input was unavailable"
    finally:
        db.close()


def test_record_experiment_requires_metric_for_measured_trials(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        missing_value = json.loads(DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "invalid measurement",
                "status": "measured",
                "metric_name": "score",
                "result": "looked better but no numeric metric",
                "next_action": "run a real measurement",
            },
            ctx,
        ))
        missing_name = json.loads(DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "invalid measurement",
                "status": "measured",
                "metric_value": 2.7,
                "result": "numeric result with no metric name",
                "next_action": "label the metric and retry",
            },
            ctx,
        ))

        assert missing_value["success"] is False
        assert missing_value["error"] == "measured experiments require metric_name and numeric metric_value"
        assert missing_name["success"] is False
        assert missing_name["error"] == "measured experiments require metric_name and numeric metric_value"
        assert db.get_job(job_id)["metadata"].get("experiment_ledger") is None
    finally:
        db.close()


def test_record_experiment_accepts_numeric_metric_strings(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_experiment",
            {
                "title": "string metric",
                "status": "measured",
                "metric_name": "score",
                "metric_value": "2.7",
                "metric_unit": "units",
                "result": "measured from output",
                "next_action": "try the next branch",
            },
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is True
        assert result["experiment"]["metric_value"] == 2.7
    finally:
        db.close()


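Taken together, the `record_experiment` tests above describe a small set of validation rules: measured trials need a named numeric metric, closed trials need a `next_action`, and closed trials without a measurement need some supporting context. The sketch below restates those rules as one function. `validate_experiment` and `coerce_metric` are hypothetical names, the order of the checks is an assumption the tests do not settle, and only the `blocked` wording of the context error is confirmed by the tests; the same template is assumed for `failed` and `skipped`.

```python
CLOSED_STATUSES = {"measured", "failed", "blocked", "skipped"}
CONTEXT_FIELDS = ("result", "evidence", "config", "metadata")


def coerce_metric(value):
    """Return a float for numbers and numeric strings, otherwise None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; do not treat True as 1.0
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def validate_experiment(args):
    """Return an error message, or None when the record is acceptable.

    Hypothetical restatement of the rules the tests exercise.
    """
    status = args.get("status", "planned")
    if status == "measured":
        if not args.get("metric_name") or coerce_metric(args.get("metric_value")) is None:
            return "measured experiments require metric_name and numeric metric_value"
    if status in CLOSED_STATUSES and not args.get("next_action"):
        return "next_action is required for measured, failed, blocked, or skipped experiments"
    if status in CLOSED_STATUSES and status != "measured":
        if not any(args.get(field) for field in CONTEXT_FIELDS):
            return f"{status} experiments require result, evidence, config, or metadata"
    return None
```

Rejecting the record before anything is written matches the tests' final assertion that `experiment_ledger` stays absent after a failed call.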
def test_acknowledge_operator_context_tool_marks_context(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Run with operator corrections")
        entry = db.append_operator_message(job_id, "use the corrected target", source="chat")
        db.claim_operator_messages(job_id, modes=("steer",), limit=1)
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="acknowledge_operator_context")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "acknowledge_operator_context",
            {"message_ids": [entry["event_id"]], "summary": "correction incorporated"},
            ctx,
        )
        result = json.loads(raw)
        job = db.get_job(job_id)

        assert result["success"] is True
        assert result["count"] == 1
        assert job["metadata"]["operator_messages"][0]["acknowledged_at"]
        assert job["metadata"]["last_operator_context_ack"]["summary"] == "correction incorporated"
    finally:
        db.close()


def test_acknowledge_operator_context_requires_active_context(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Run without operator corrections")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="acknowledge_operator_context")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "acknowledge_operator_context",
            {"summary": "ordinary progress note"},
            ctx,
        )
        result = json.loads(raw)

        assert result["success"] is False
        assert result["recoverable"] is True
        assert result["error"] == "no active operator context to acknowledge"
        assert "report_update" in result["guidance"]
        assert "last_operator_context_ack" not in db.get_job(job_id)["metadata"]
    finally:
        db.close()


def test_record_tasks_accepts_generic_output_contracts(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Run one comparison",
                    "status": "open",
                    "output_contract": "experiment",
                    "acceptance_criteria": "metric recorded",
                    "evidence_needed": "command output or artifact",
                    "stall_behavior": "record blocker and pivot",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["output_contract"] == "experiment"
        assert task["acceptance_criteria"] == "metric recorded"
        assert task["evidence_needed"] == "command output or artifact"
        assert task["stall_behavior"] == "record blocker and pivot"
    finally:
        db.close()


def test_record_tasks_promotes_output_contract_from_metadata(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Validate concrete candidate",
                    "status": "open",
                    "metadata": {"output_contract": "action", "source": "planner"},
                    "acceptance_criteria": "candidate is tested",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["output_contract"] == "action"
        assert task["metadata"]["source"] == "planner"
    finally:
        db.close()


def test_record_tasks_downgrades_done_artifact_without_delivery_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Update a deliverable")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Update report draft",
                    "status": "done",
                    "output_contract": "artifact",
                    "result": "Updated the report",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
        assert task["metadata"]["claimed_result"] == "Updated the report"
    finally:
        db.close()


def test_record_tasks_downgrades_done_without_result_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Validate generic work")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Check current branch",
                    "status": "done",
                    "output_contract": "decision",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_result_evidence"
    finally:
        db.close()


def test_record_tasks_downgrades_done_research_without_durable_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research a topic")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Synthesize source evidence",
                    "status": "done",
                    "output_contract": "research",
                    "result": "Found useful background.",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_research_evidence"
        assert task["metadata"]["claimed_result"] == "Found useful background."
    finally:
        db.close()


def test_record_tasks_allows_done_research_after_source_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research a topic")
        run_id = db.start_run(job_id, model="fake")
        source_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_source")
        db.finish_step(source_step, status="completed", summary="source recorded", output_data={"success": True})
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Synthesize source evidence",
                    "status": "done",
                    "output_contract": "research",
                    "result": "Source ledger records the useful branch.",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


def test_record_tasks_allows_done_research_with_metadata_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Research a topic")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Synthesize source evidence",
                    "status": "done",
                    "output_contract": "research",
                    "metadata": {"source_url": "https://example.com/source"},
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


def test_record_tasks_downgrades_done_experiment_without_measurement_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Run comparison",
                    "status": "done",
                    "output_contract": "experiment",
                    "result": "The comparison improved.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_experiment_evidence"
    finally:
        db.close()


def test_record_tasks_allows_done_experiment_after_measurement_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a measurable process")
        run_id = db.start_run(job_id, model="fake")
        experiment_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
        db.finish_step(experiment_step, status="completed", summary="experiment measured", output_data={"success": True})
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Run comparison",
                    "status": "done",
                    "output_contract": "experiment",
                    "result": "Experiment ledger records the measured comparison.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


def test_record_tasks_downgrades_done_action_after_read_only_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Change a local workspace")
        run_id = db.start_run(job_id, model="fake")
        shell_step = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "ls -la"}},
        )
        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Apply change",
                    "status": "done",
                    "output_contract": "action",
                    "result": "Inspected the workspace.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_action_evidence"
    finally:
        db.close()


def test_record_tasks_allows_done_action_after_action_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Change a local workspace")
        run_id = db.start_run(job_id, model="fake")
        shell_step = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "python run_branch.py"}},
        )
        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Apply change",
                    "status": "done",
                    "output_contract": "action",
                    "result": "Ran the action branch.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


def test_record_tasks_downgrades_done_monitor_without_defer_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Monitor long-running work")
        run_id = db.start_run(job_id, model="fake")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Wait and check later",
                    "status": "done",
                    "output_contract": "monitor",
                    "result": "Will check later.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_monitor_evidence"
    finally:
        db.close()


def test_record_tasks_allows_done_monitor_after_defer_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Monitor long-running work")
        run_id = db.start_run(job_id, model="fake")
        defer_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="defer_job")
        db.finish_step(defer_step, status="completed", summary="deferred")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Wait and check later",
                    "status": "done",
                    "output_contract": "monitor",
                    "result": "A monitor/defer branch is scheduled.",
                }]
            },
            ctx,
        )
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert json.loads(raw)["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


def test_record_tasks_allows_done_artifact_after_delivery_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Update a deliverable")
        run_id = db.start_run(job_id, model="fake")
        artifact_step = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="write_artifact",
            input_data={"arguments": {"title": "Final report draft", "summary": "Updated report deliverable"}},
        )
        db.finish_step(artifact_step, status="completed", summary="write_artifact saved art_demo")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Update report draft",
                    "status": "done",
                    "output_contract": "artifact",
                    "result": "Saved final report draft",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "done"
        assert "completion_validation" not in task.get("metadata", {})
    finally:
        db.close()


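The `record_tasks` tests above share one pattern: a task reported as `done` is accepted only when a matching evidence step has already completed in the run, and is otherwise reopened as `active` with a `completion_validation` reason. A simplified sketch of that gate follows. `settle_task` and `EVIDENCE_TOOLS` are hypothetical names, and the sketch deliberately omits two cases the tests also cover: metadata-based evidence such as `source_url`, and the `action` contract, which depends on what the shell command did.

```python
# Contract -> (tool names that count as evidence, reason recorded when absent).
# The tool names and reason strings are taken from the tests; the table is assumed.
EVIDENCE_TOOLS = {
    "artifact": ({"write_artifact"}, "missing_recent_deliverable_evidence"),
    "research": ({"record_source"}, "missing_research_evidence"),
    "experiment": ({"record_experiment"}, "missing_experiment_evidence"),
    "monitor": ({"defer_job"}, "missing_monitor_evidence"),
}


def settle_task(task, completed_tools):
    """Return a copy of `task`, reopened when a `done` claim lacks evidence."""
    settled = {**task, "metadata": dict(task.get("metadata", {}))}
    if settled.get("status") != "done":
        return settled  # only completion claims are gated
    tools, reason = EVIDENCE_TOOLS.get(settled.get("output_contract"), (None, None))
    if tools is None:
        # Contracts without a dedicated evidence tool still need a stated result.
        missing = not settled.get("result")
        reason = "missing_result_evidence"
    else:
        missing = not (tools & set(completed_tools))
    if missing:
        settled["status"] = "active"
        settled["metadata"]["completion_validation"] = reason
        if settled.get("result"):
            # Keep the claim so a later step can compare it with real evidence.
            settled["metadata"]["claimed_result"] = settled["result"]
    return settled
```

Note that the call itself still succeeds in the tests; the downgrade is recorded on the task rather than returned as an error, which keeps the worker loop moving.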
def test_record_tasks_does_not_treat_stderr_redirect_as_delivery_write(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Update a deliverable")
        run_id = db.start_run(job_id, model="fake")
        shell_step = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "cat draft.md 2>/dev/null"}},
        )
        db.finish_step(shell_step, status="completed", summary="shell_exec rc=0")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Update report draft",
                    "status": "done",
                    "output_contract": "artifact",
                    "result": "Saved final report draft",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
    finally:
        db.close()


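The preceding test guards against a specific false positive: `cat draft.md 2>/dev/null` contains a `>` but writes nothing, so it must not count as delivery evidence. One way to draw that distinction is sketched below using `shlex.split` from the standard library. `writes_stdout_to_file` is a hypothetical helper, not the check nipux uses, and it is intentionally rough: it does not understand quoted `>` characters, pipelines, or `tee`.

```python
import shlex


def writes_stdout_to_file(command):
    """True when some redirect in `command` sends stdout to a real file.

    Hypothetical heuristic for illustration; quoting is not handled.
    """
    tokens = shlex.split(command)
    for index, token in enumerate(tokens):
        prefix, operator, rest = token.partition(">")
        if not operator:
            continue  # this token holds no redirect
        if prefix.isdigit() and prefix != "1":
            continue  # e.g. 2>/dev/null redirects stderr, not stdout
        rest = rest.lstrip(">")  # treat the append form >> like >
        # The target is either glued to the operator or is the next token.
        target = rest or (tokens[index + 1] if index + 1 < len(tokens) else "")
        if target and not target.startswith("&") and target != "/dev/null":
            return True
    return False
```

Descriptor duplications such as `2>&1` and discards to `/dev/null` are skipped, so only a redirect that names a real file counts as a write.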
def test_record_tasks_rejects_checkpoint_as_delivery_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Update a deliverable")
        run_id = db.start_run(job_id, model="fake")
        artifact_step = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="write_artifact",
            input_data={"arguments": {"title": "Compiled report checkpoint", "summary": "Checkpoint before final rewrite"}},
        )
        db.finish_step(artifact_step, status="completed", summary="write_artifact saved art_demo")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        ctx = ToolContext(config=config, db=db, artifacts=ArtifactStore(tmp_path, db), job_id=job_id, run_id=run_id, step_id=step_id)

        raw = DEFAULT_REGISTRY.handle(
            "record_tasks",
            {
                "tasks": [{
                    "title": "Update report draft",
                    "status": "done",
                    "output_contract": "artifact",
                    "result": "Saved final report draft",
                }]
            },
            ctx,
        )
        result = json.loads(raw)
        task = db.get_job(job_id)["metadata"]["task_queue"][0]

        assert result["success"] is True
        assert task["status"] == "active"
        assert task["metadata"]["completion_validation"] == "missing_recent_deliverable_evidence"
    finally:
        db.close()
tests/nipux_cli/test_uninstall.py 136 lines
import subprocess
from pathlib import Path

from nipux_cli.uninstall import build_uninstall_plan, installed_tool_paths, uninstall_installed_tool, uninstall_runtime


def _completed(*_args, **_kwargs):
    return subprocess.CompletedProcess(args=[], returncode=0)


def test_uninstall_plan_includes_runtime_and_legacy_state(monkeypatch, tmp_path):
    home = tmp_path / "user"
    profile = tmp_path / "profile"
    monkeypatch.setenv("HOME", str(home))
    monkeypatch.setenv("NIPUX_HOME", str(profile))

    plan = build_uninstall_plan()

    assert profile in plan.paths
    assert home / ".nipux" in plan.paths
    assert home / ".kneepucks" in plan.paths
    assert home / "Library" / "LaunchAgents" / "com.nipux.agent.plist" in plan.service_paths
    assert home / ".config" / "systemd" / "user" / "nipux.service" in plan.service_paths


def test_uninstall_plan_includes_configured_runtime_home(monkeypatch, tmp_path):
    home = tmp_path / "user"
    profile = tmp_path / "profile"
    configured = tmp_path / "configured"
    monkeypatch.setenv("HOME", str(home))
    monkeypatch.setenv("NIPUX_HOME", str(profile))

    plan = build_uninstall_plan(runtime_home=configured)

    assert configured in plan.paths
    assert profile in plan.paths


def test_uninstall_runtime_removes_state_and_service_files(monkeypatch, tmp_path):
    home = tmp_path / "user"
    profile = tmp_path / "profile"
    monkeypatch.setenv("HOME", str(home))
    monkeypatch.setenv("NIPUX_HOME", str(profile))

    paths = [
        profile,
        home / ".nipux",
        home / ".kneepucks",
        home / "Library" / "LaunchAgents",
        home / ".config" / "systemd" / "user",
    ]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    (home / "Library" / "LaunchAgents" / "com.nipux.agent.plist").write_text("plist", encoding="utf-8")
    (home / ".config" / "systemd" / "user" / "nipux.service").write_text("unit", encoding="utf-8")
    (profile / "state.db").write_text("state", encoding="utf-8")

    lines = uninstall_runtime(runner=_completed)

    assert any("removed" in line and str(profile) in line for line in lines)
    assert not profile.exists()
    assert not (home / ".nipux").exists()
    assert not (home / ".kneepucks").exists()
    assert not (home / "Library" / "LaunchAgents" / "com.nipux.agent.plist").exists()
    assert not (home / ".config" / "systemd" / "user" / "nipux.service").exists()


def test_uninstall_runtime_dry_run_keeps_files(monkeypatch, tmp_path):
    home = tmp_path / "user"
    profile = tmp_path / "profile"
    monkeypatch.setenv("HOME", str(home))
    monkeypatch.setenv("NIPUX_HOME", str(profile))
    profile.mkdir(parents=True)

    lines = uninstall_runtime(dry_run=True, runner=_completed)

    assert any("would remove" in line and str(profile) in line for line in lines)
    assert profile.exists()


81def test_uninstall_installed_tool_uses_uv_when_available(monkeypatch, tmp_path):
82 home = tmp_path / "user"
83 monkeypatch.setenv("HOME", str(home))
84 monkeypatch.setattr("nipux_cli.uninstall.shutil.which", lambda name: "/usr/bin/uv" if name == "uv" else None)
85 calls = []
86
87 def runner(command):
88 calls.append(tuple(command))
89 return subprocess.CompletedProcess(command, 0, stdout="Uninstalled 1 executable: nipux\n")
90
91 code, lines = uninstall_installed_tool(runner=runner)
92
93 assert code == 0
94 assert calls == [("/usr/bin/uv", "tool", "uninstall", "nipux")]
95 assert "Uninstalled 1 executable: nipux" in "\n".join(lines)
96
97
98def test_uninstall_installed_tool_falls_back_to_safe_uv_paths(monkeypatch, tmp_path):
99 home = tmp_path / "user"
100 shim = home / ".local" / "bin" / "nipux"
101 tool_dir = home / ".local" / "share" / "uv" / "tools" / "nipux"
102 tool_bin = tool_dir / "bin"
103 tool_bin.mkdir(parents=True)
104 shim.parent.mkdir(parents=True)
105 target = tool_bin / "nipux"
106 target.write_text("script", encoding="utf-8")
107 shim.symlink_to(target)
108 monkeypatch.setenv("HOME", str(home))
109
110 def which(name):
111 if name == "nipux":
112 return str(shim)
113 return None
114
115 monkeypatch.setattr("nipux_cli.uninstall.shutil.which", which)
116
117 code, lines = uninstall_installed_tool()
118
119 assert code == 0
120 assert not shim.exists()
121 assert not tool_dir.exists()
122 rendered = "\n".join(lines)
123 assert "uv not found; checking safe local tool paths" in rendered
124 assert f"removed {shim}" in rendered
125 assert f"removed {tool_dir}" in rendered
126
127
128def test_installed_tool_paths_ignore_non_user_tool(monkeypatch, tmp_path):
129 home = tmp_path / "user"
130 monkeypatch.setenv("HOME", str(home))
131 monkeypatch.setattr("nipux_cli.uninstall.shutil.which", lambda name: "/usr/local/bin/nipux" if name == "nipux" else None)
132
133 paths = installed_tool_paths()
134
135 assert Path("/usr/local/bin/nipux") not in paths
136 assert home / ".local" / "bin" / "nipux" in paths
tests/nipux_cli/test_worker.py 11339 lines
1import json
2from pathlib import Path
3
4from nipux_cli.artifacts import ArtifactStore
5from nipux_cli.config import AppConfig, ModelConfig, RuntimeConfig
6from nipux_cli.db import AgentDB
7from nipux_cli.llm import LLMResponse, LLMResponseError, ScriptedLLM, ToolCall
8from nipux_cli.worker import (
9 MAX_WORKER_PROMPT_CHARS,
10 SYSTEM_PROMPT,
11 _concrete_evidence_tokens,
12 _cited_step_numbers,
13 _extract_candidate_file_paths,
14 _file_pattern_tokens_for_grounding,
15 _rank_candidate_file_paths,
16 _render_worker_prompt,
17 build_messages,
18 run_one_step,
19)
20
21
22class SnapshotRegistry:
23 def openai_tools(self):
24 return []
25
26 def handle(self, name, args, ctx):
27 del name, args, ctx
28 return json.dumps({"success": True, "data": {"snapshot": "short snapshot"}})
29
30
31class SuccessRegistry:
32 def openai_tools(self):
33 return []
34
35 def handle(self, name, args, ctx):
36 del ctx
37 return json.dumps({"success": True, "tool": name, "args": args, "results": []})
38
39
40class MeasuredShellRegistry:
41 def openai_tools(self):
42 return []
43
44 def handle(self, name, args, ctx):
45 del args, ctx
46 if name == "shell_exec":
47 return json.dumps({"success": True, "command": "run test", "returncode": 0, "stdout": "score 2.7 units/s", "stderr": ""})
48 return json.dumps({"success": True, "results": []})
49
50
51class DiagnosticShellRegistry:
52 def openai_tools(self):
53 return []
54
55 def handle(self, name, args, ctx):
56 del args, ctx
57 if name == "shell_exec":
58 return json.dumps({
59 "success": True,
60 "command": "df -h && nproc && free -h",
61 "returncode": 0,
62 "stdout": "Filesystem Size Used Avail Use% Mounted on\\n/dev/root 233G 198G 23G 90% /\\nCPU COUNT 24\\nRAM 93Gi",
63 "stderr": "",
64 })
65 return json.dumps({"success": True})
66
67
68class TableBenchmarkShellRegistry:
69 def openai_tools(self):
70 return []
71
72 def handle(self, name, args, ctx):
73 del args, ctx
74 if name == "shell_exec":
75 return json.dumps({
76 "success": True,
77 "command": "run benchmark",
78 "returncode": 0,
79 "stdout": (
80 "| model | test | t/s |\n"
81 "| --- | ---: | ---: |\n"
82 "| example | pp32 | 5.48 ± 0.11 |\n"
83 "| example | tg128 | 3.44 ± 0.05 |\n"
84 ),
85 "stderr": "",
86 })
87 return json.dumps({"success": True})
88
89
90class FailedTableBenchmarkShellRegistry:
91 def openai_tools(self):
92 return []
93
94 def handle(self, name, args, ctx):
95 del args, ctx
96 if name == "shell_exec":
97 return json.dumps({
98 "success": False,
99 "error": "command timed out",
100 "command": "run benchmark",
101 "returncode": 124,
102 "stdout": (
103 "| model | test | t/s |\n"
104 "| --- | ---: | ---: |\n"
105 "| example | pp32 | 5.48 ± 0.11 |\n"
106 ),
107 "stderr": "",
108 })
109 return json.dumps({"success": True})
110
111
112class FailedUrlShellRegistry:
113 def openai_tools(self):
114 return []
115
116 def handle(self, name, args, ctx):
117 del ctx
118 if name == "shell_exec":
119 return json.dumps({
120 "success": False,
121 "command": args.get("command"),
122 "returncode": 0,
123 "stdout": "401 Unauthorized",
124 "stderr": "",
125 "error": (
126 "command output indicates authentication or authorization failure "
127 "despite exit status 0: 401 Unauthorized"
128 ),
129 })
130 return json.dumps({"success": True})
131
132
133class HangingLLM:
134 def next_action(self, *, messages, tools):
135 del messages, tools
136 import time
137
138 time.sleep(5)
139 return LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "late"})])
140
141
142class SlowLLM:
143 def __init__(self, sleep_seconds: float):
144 self.sleep_seconds = sleep_seconds
145
146 def next_action(self, *, messages, tools):
147 del messages, tools
148 import time
149
150 time.sleep(self.sleep_seconds)
151 return LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "slow but recovered"})])
152
153
154class RepairableLLM:
155 tool_repair = True
156
157 def __init__(self, responses):
158 self.responses = list(responses)
159 self.messages = []
160 self.tools = []
161
162 def next_action(self, *, messages, tools):
163 self.messages.append(messages)
164 self.tools.append(tools)
165 if not self.responses:
166 return LLMResponse(content="No response left.")
167 return self.responses.pop(0)
168
169
170class SourceCodeShellRegistry:
171 def openai_tools(self):
172 return []
173
174 def handle(self, name, args, ctx):
175 del args, ctx
176 if name == "shell_exec":
177 return json.dumps({
178 "success": True,
179 "command": "git show HEAD:nipux_cli/cli.py",
180 "returncode": 0,
181 "stdout": 'for index, task in enumerate(plan["tasks"], start=1):\n rate(plan["tasks"], start=1)\n',
182 "stderr": "",
183 })
184 return json.dumps({"success": True})
185
186
187class LargeShellEvidenceRegistry:
188 def openai_tools(self):
189 return []
190
191 def handle(self, name, args, ctx):
192 del args, ctx
193 if name == "shell_exec":
194 return json.dumps({
195 "success": True,
196 "command": "find . -type f",
197 "returncode": 0,
198 "stdout": "\n".join(f"./file_{index}.py" for index in range(200)),
199 "stderr": "",
200 })
201 return json.dumps({"success": True})
202
203
204class ExtractRegistry:
205 def openai_tools(self):
206 return []
207
208 def handle(self, name, args, ctx):
209 del args, ctx
210 if name == "web_extract":
211 return json.dumps({
212 "success": True,
213 "pages": [
214 {"url": "https://source.example/a", "text": "useful source text " * 250},
215 {"url": "https://source.example/b", "error": "timeout"},
216 ],
217 })
218 return json.dumps({"success": True})
219
220
221class SearchRegistry:
222 def openai_tools(self):
223 return []
224
225 def handle(self, name, args, ctx):
226 del args, ctx
227 if name == "web_search":
228 return json.dumps({
229 "success": True,
230 "query": "durable progress research",
231 "results": [
232 {"title": "Primary reference", "url": "https://source.example/primary"},
233 {"title": "Secondary reference", "url": "https://source.example/secondary"},
234 ],
235 })
236 return json.dumps({"success": True})
237
238
239class BrowserAndWebRegistry:
240 def openai_tools(self, config=None):
241 del config
242 return [
243 {"type": "function", "function": {"name": "browser_navigate", "parameters": {"type": "object"}}},
244 {"type": "function", "function": {"name": "web_search", "parameters": {"type": "object"}}},
245 ]
246
247 def handle(self, name, args, ctx):
248 del args, ctx
249 return json.dumps({"success": True, "tool": name})
250
251
252class CapturingLLM:
253 def __init__(self, response):
254 self.response = response
255 self.messages = None
256 self.tools = None
257
258 def next_action(self, *, messages, tools):
259 self.messages = messages
260 self.tools = tools
261 return self.response
262
263
264class ExplodingLLM:
265 def next_action(self, *, messages, tools):
266 del messages, tools
267 raise AssertionError("LLM should not be called")
268
269
270class AntiBotBrowserRegistry:
271 def openai_tools(self):
272 return []
273
274 def handle(self, name, args, ctx):
275 del args, ctx
276 if name == "browser_snapshot":
277 return json.dumps({
278 "success": True,
279 "data": {
280 "origin": "https://source.example/search",
281 "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
282 },
283 })
284 return json.dumps({"success": True})
285
286
287def test_system_prompt_is_contract_first_not_research_first():
288 assert "Use a contract-first durable cycle" in SYSTEM_PROMPT
289 assert "Research is only one possible contract" in SYSTEM_PROMPT
290 assert "Prefer fresh measured or directly observed evidence over stale summaries" in SYSTEM_PROMPT
291 assert "available local candidate fall" in SYSTEM_PROMPT
292 assert "Use this durable cycle: discover one source" not in SYSTEM_PROMPT
293
294
295def test_run_one_step_executes_scripted_tool_call(tmp_path):
296 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
297 db = AgentDB(tmp_path / "state.db")
298 try:
299 job_id = db.create_job("Find 10 durable research findings", title="research", kind="generic")
300 llm = ScriptedLLM([
301 LLMResponse(tool_calls=[
302 ToolCall(
303 name="write_artifact",
304 arguments={
305 "title": "first finding",
306 "summary": "smoke finding",
307 "content": "Acme Design, https://example.com",
308 },
309 )
310 ])
311 ])
312
313 result = run_one_step(job_id, config=config, db=db, llm=llm)
314
315 assert result.status == "completed"
316 assert result.tool_name == "write_artifact"
317 artifacts = db.list_artifacts(job_id)
318 assert artifacts[0]["title"] == "first finding"
319 steps = db.list_steps(job_id=job_id)
320 assert steps[0]["tool_name"] == "write_artifact"
321 assert steps[0]["status"] == "completed"
322 memory = db.list_memory(job_id)
323 assert memory[0]["key"] == "rolling_state"
324 assert artifacts[0]["id"] in memory[0]["artifact_refs"]
325 finally:
326 db.close()
327
328
329def test_run_one_step_records_estimated_usage_for_scripted_model(tmp_path):
330 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
331 db = AgentDB(tmp_path / "state.db")
332 try:
333 job_id = db.create_job("Summarize progress", title="usage", kind="generic")
334
335 run_one_step(
336 job_id,
337 config=config,
338 db=db,
339 llm=ScriptedLLM([LLMResponse(content="No tool this turn.")]),
340 )
341
342 usage = db.job_token_usage(job_id)
343 assert usage["calls"] == 1
344 assert usage["prompt_tokens"] > 0
345 assert usage["completion_tokens"] > 0
346 assert usage["estimated_calls"] == 1
347 event = next(
348 event
349 for event in db.list_events(job_id=job_id, event_types=["loop"])
350 if event.get("title") == "message_end"
351 )
352 event_usage = event["metadata"]["usage"]
353 assert event_usage["prompt_chars"] > 0
354 assert event_usage["context_length"] == config.model.context_length
355 assert event_usage["context_fraction"] > 0
356 finally:
357 db.close()
358
359
360def test_run_one_step_blocks_content_only_worker_turn(tmp_path):
361 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
362 db = AgentDB(tmp_path / "state.db")
363 try:
364 job_id = db.create_job("Keep taking bounded tool actions", title="no tool", kind="generic")
365
366 result = run_one_step(
367 job_id,
368 config=config,
369 db=db,
370 llm=ScriptedLLM([LLMResponse(content="What should I do next?")]),
371 )
372
373 assert result.status == "blocked"
374 assert result.result["error"] == "worker tool call required"
375 assert "What should I do next?" in result.result["content"]
376 step = db.list_steps(job_id=job_id)[0]
377 assert step["kind"] == "assistant"
378 assert step["status"] == "blocked"
379 assert step["error"] == "worker tool call required"
380 prompt = build_messages(
381 db.get_job(job_id),
382 db.list_steps(job_id=job_id),
383 timeline_events=db.list_timeline_events(job_id, limit=30),
384 )[-1]["content"]
385 assert "What should I do next?" not in prompt
386 assert "worker tool call required" in prompt
387 job = db.get_job(job_id)
388 assert job["metadata"]["last_agent_update"]["category"] == "blocked"
389 finally:
390 db.close()
391
392
393def test_run_one_step_repairs_content_only_worker_turn_with_tool_retry(tmp_path):
394 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
395 db = AgentDB(tmp_path / "state.db")
396 try:
397 job_id = db.create_job("Keep taking bounded tool actions", title="tool repair", kind="generic")
398 llm = RepairableLLM([
399 LLMResponse(content="I should inspect the state next."),
400 LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "Continuing with a bounded action."})]),
401 ])
402
403 result = run_one_step(job_id, config=config, db=db, llm=llm)
404
405 assert result.status == "completed"
406 assert result.tool_name == "report_update"
407 assert len(llm.messages) == 2
408 assert "did not call a tool" in llm.messages[1][-1]["content"]
409 steps = db.list_steps(job_id=job_id)
410 assert len(steps) == 1
411 assert steps[0]["tool_name"] == "report_update"
412 usage = db.job_token_usage(job_id)
413 assert usage["calls"] == 2
414 finally:
415 db.close()
416
417
418def test_run_one_step_recovers_repeated_content_only_worker_turns(tmp_path):
419 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
420 db = AgentDB(tmp_path / "state.db")
421 try:
422 job_id = db.create_job("Keep taking bounded tool actions", title="no tool", kind="generic")
423 llm = ScriptedLLM([
424 LLMResponse(content="What should I do next?"),
425 LLMResponse(content="I can continue if you want."),
426 LLMResponse(content="Please confirm the next step."),
427 ])
428
429 run_one_step(job_id, config=config, db=db, llm=llm)
430 run_one_step(job_id, config=config, db=db, llm=llm)
431 run_one_step(job_id, config=config, db=db, llm=llm)
432 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
433
434 assert result.status == "completed"
435 assert result.tool_name == "guard_recovery"
436 assert result.result["guard_recovery"]["error"] == "worker tool call required"
437 job = db.get_job(job_id)
438 assert any(task["title"] == "Resolve guard: worker tool call required" for task in job["metadata"]["task_queue"])
439 finally:
440 db.close()
441
442
443def test_run_one_step_records_context_pressure_without_spam(tmp_path):
444 config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=10_000))
445 db = AgentDB(tmp_path / "state.db")
446 try:
447 job_id = db.create_job("Keep a long-running task stable", title="context pressure", kind="generic")
448 llm = ScriptedLLM([
449 LLMResponse(content="first", usage={"prompt_tokens": 7_000, "completion_tokens": 10, "total_tokens": 7_010}),
450 LLMResponse(content="second", usage={"prompt_tokens": 7_200, "completion_tokens": 10, "total_tokens": 7_210}),
451 LLMResponse(content="third", usage={"prompt_tokens": 8_600, "completion_tokens": 10, "total_tokens": 8_610}),
452 ])
453
454 run_one_step(job_id, config=config, db=db, llm=llm)
455 run_one_step(job_id, config=config, db=db, llm=llm)
456 run_one_step(job_id, config=config, db=db, llm=llm)
457
458 pressure_events = [
459 event
460 for event in db.list_events(job_id=job_id, event_types=["agent_message"])
461 if event["metadata"].get("kind") == "context_pressure"
462 ]
463 assert len(pressure_events) == 2
464 assert "Context pressure watch" in pressure_events[0]["body"]
465 assert "Context pressure high" in pressure_events[1]["body"]
466 job = db.get_job(job_id)
467 pressure = job["metadata"]["context_pressure"]
468 assert pressure["band"] == "high"
469 assert pressure["prompt_tokens"] == 8_600
470 finally:
471 db.close()
472
473
474def test_run_one_step_executes_tool_call_batch_in_order(tmp_path):
475 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
476 db = AgentDB(tmp_path / "state.db")
477 try:
478 job_id = db.create_job("Build a durable report", title="batch", kind="generic")
479 llm = ScriptedLLM([
480 LLMResponse(tool_calls=[
481 ToolCall(
482 name="write_artifact",
483 arguments={
484 "title": "evidence checkpoint",
485 "summary": "first useful output",
486 "content": "The worker saved evidence before updating the task queue.",
487 },
488 ),
489 ToolCall(
490 name="record_tasks",
491 arguments={
492 "tasks": [
493 {
494 "title": "Review saved output",
495 "status": "open",
496 "priority": 5,
497 "output_contract": "report",
498 "acceptance_criteria": "Saved evidence has been inspected and summarized.",
499 "evidence_needed": "Artifact reference and concrete next action.",
500 "stall_behavior": "Record a lesson and pivot if the artifact is not useful.",
501 }
502 ]
503 },
504 ),
505 ])
506 ])
507
508 result = run_one_step(job_id, config=config, db=db, llm=llm)
509
510 assert result.status == "completed"
511 assert result.tool_name == "record_tasks"
512 steps = db.list_steps(job_id=job_id)
513 assert [step["tool_name"] for step in steps] == ["write_artifact", "record_tasks"]
514 assert [step["status"] for step in steps] == ["completed", "completed"]
515 artifacts = db.list_artifacts(job_id)
516 assert artifacts[0]["title"] == "evidence checkpoint"
517 job = db.get_job(job_id)
518 tasks = job["metadata"]["task_queue"]
519 assert any(task["title"] == "Review saved output" and task["output_contract"] == "report" for task in tasks)
520 run = db.list_runs(job_id, limit=1)[0]
521 assert run["status"] == "completed"
522 finally:
523 db.close()
524
525
526def test_write_artifact_reconciles_matching_report_task(tmp_path):
527 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
528 db = AgentDB(tmp_path / "state.db")
529 try:
530 job_id = db.create_job(
531 "Write a durable report",
532 title="report",
533 kind="generic",
534 metadata={
535 "task_queue": [
536 {
537 "title": "Draft paper - Methods section",
538 "status": "open",
539 "priority": 5,
540 "output_contract": "report",
541 "acceptance_criteria": "Methods section is saved as an output.",
542 }
543 ]
544 },
545 )
546 llm = ScriptedLLM([
547 LLMResponse(tool_calls=[
548 ToolCall(
549 name="write_artifact",
550 arguments={
551 "title": "Paper Draft - Section 3: Methods",
552 "summary": "Methods section for the report",
553 "content": "This methods section explains the approach and evidence.",
554 },
555 )
556 ])
557 ])
558
559 result = run_one_step(job_id, config=config, db=db, llm=llm)
560
561 assert result.status == "completed"
562 job = db.get_job(job_id)
563 task = job["metadata"]["task_queue"][0]
564 assert task["status"] == "done"
565 assert task["metadata"]["auto_reconciled_from_artifact"]
566 assert "Saved output" in task["result"]
567 revision_tasks = [
568 item
569 for item in job["metadata"]["task_queue"]
570 if item["status"] == "open" and item.get("metadata", {}).get("source") == "auto_revision_loop"
571 ]
572 assert len(revision_tasks) == 1
573 assert revision_tasks[0]["output_contract"] == "report"
574 assert revision_tasks[0]["metadata"]["revision_source_artifact_id"]
575 finally:
576 db.close()
577
578
579def test_evidence_artifact_does_not_complete_deliverable_task(tmp_path):
580 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
581 db = AgentDB(tmp_path / "state.db")
582 try:
583 job_id = db.create_job(
584 "Improve a durable report",
585 title="report",
586 kind="generic",
587 metadata={
588 "task_queue": [
589 {
590 "title": "Update report with new citations",
591 "status": "open",
592 "priority": 5,
593 "output_contract": "artifact",
594 "acceptance_criteria": "Report text is updated with citations.",
595 "evidence_needed": "Updated report draft, not just source notes.",
596 }
597 ]
598 },
599 )
600 llm = ScriptedLLM([
601 LLMResponse(tool_calls=[
602 ToolCall(
603 name="write_artifact",
604 arguments={
605 "title": "Evidence: citation sources",
606 "summary": "Extracted source notes for citations",
607 "content": "These notes describe sources that could later be used in the report.",
608 },
609 )
610 ])
611 ])
612
613 result = run_one_step(job_id, config=config, db=db, llm=llm)
614
615 assert result.status == "completed"
616 job = db.get_job(job_id)
617 task = job["metadata"]["task_queue"][0]
618 assert task["status"] == "open"
619 assert "auto_reconciled_from_artifact" not in task.get("metadata", {})
620 assert not [
621 item
622 for item in job["metadata"]["task_queue"]
623 if item.get("metadata", {}).get("source") == "auto_revision_loop"
624 ]
625 finally:
626 db.close()
627
628
629def test_new_deliverable_supersedes_old_auto_revision_task(tmp_path):
630 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
631 db = AgentDB(tmp_path / "state.db")
632 try:
633 job_id = db.create_job(
634 "Keep improving a durable report",
635 title="report",
636 kind="generic",
637 metadata={
638 "task_queue": [
639 {
640 "title": "Review and revise saved output art_old",
641 "status": "open",
642 "priority": 4,
643 "output_contract": "report",
644 "metadata": {
645 "source": "auto_revision_loop",
646 "revision_source_artifact_id": "art_old",
647 },
648 }
649 ]
650 },
651 )
652 llm = ScriptedLLM([
653 LLMResponse(tool_calls=[
654 ToolCall(
655 name="write_artifact",
656 arguments={
657 "title": "Report Draft Revision",
658 "summary": "Updated durable report draft",
659 "content": "This revised report draft supersedes the previous saved output.",
660 },
661 )
662 ])
663 ])
664
665 result = run_one_step(job_id, config=config, db=db, llm=llm)
666
667 assert result.status == "completed"
668 tasks = db.get_job(job_id)["metadata"]["task_queue"]
669 old = next(task for task in tasks if task["metadata"].get("revision_source_artifact_id") == "art_old")
670 new = next(task for task in tasks if task["metadata"].get("revision_source_artifact_id") != "art_old")
671 assert old["status"] == "skipped"
672 assert old["metadata"]["superseded_by_artifact_id"]
673 assert new["status"] == "open"
674 assert new["metadata"]["source"] == "auto_revision_loop"
675 finally:
676 db.close()
677
678
679def test_audit_report_draft_counts_as_deliverable_output(tmp_path):
680 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
681 db = AgentDB(tmp_path / "state.db")
682 try:
683 job_id = db.create_job(
684 "Write a durable audit report",
685 title="audit report",
686 kind="generic",
687 metadata={
688 "task_queue": [
689 {
690 "title": "Write audit report draft",
691 "status": "open",
692 "priority": 5,
693 "output_contract": "artifact",
694 "acceptance_criteria": "A report draft is saved.",
695 "evidence_needed": "Saved report draft, not only notes.",
696 }
697 ]
698 },
699 )
700 llm = ScriptedLLM([
701 LLMResponse(tool_calls=[
702 ToolCall(
703 name="write_artifact",
704 arguments={
705 "title": "Market Readiness Audit Report Draft",
706 "summary": "Saved audit report draft with current findings and recommendations",
707 "content": "This is the current audit report draft.",
708 },
709 )
710 ])
711 ])
712
713 result = run_one_step(job_id, config=config, db=db, llm=llm)
714
715 assert result.status == "completed"
716 job = db.get_job(job_id)
717 task = job["metadata"]["task_queue"][0]
718 assert task["status"] == "done"
719 assert task["metadata"]["auto_reconciled_from_artifact"]
720 finally:
721 db.close()
722
723
724def test_checkpoint_artifact_does_not_complete_deliverable_task(tmp_path):
725 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
726 db = AgentDB(tmp_path / "state.db")
727 try:
728 job_id = db.create_job(
729 "Compile a durable report",
730 title="report",
731 kind="generic",
732 metadata={
733 "task_queue": [
734 {
735 "title": "Compile full report",
736 "status": "open",
737 "priority": 5,
738 "output_contract": "artifact",
739 "acceptance_criteria": "Final compiled report is saved.",
740 }
741 ]
742 },
743 )
744 llm = ScriptedLLM([
745 LLMResponse(tool_calls=[
746 ToolCall(
747 name="write_artifact",
748 arguments={
749 "title": "Compiled report checkpoint",
750 "summary": "Current state checkpoint, not a final compiled report",
751 "content": "This checkpoint describes what still needs to be written.",
752 },
753 )
754 ])
755 ])
756
757 result = run_one_step(job_id, config=config, db=db, llm=llm)
758
759 assert result.status == "completed"
760 job = db.get_job(job_id)
761 task = job["metadata"]["task_queue"][0]
762 assert task["status"] == "open"
763 assert "auto_reconciled_from_artifact" not in task.get("metadata", {})
764 finally:
765 db.close()
766
767
768def test_evidence_artifact_can_complete_research_task(tmp_path):
769 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
770 db = AgentDB(tmp_path / "state.db")
771 try:
772 job_id = db.create_job(
773 "Gather source evidence",
774 title="research",
775 kind="generic",
776 metadata={
777 "task_queue": [
778 {
779 "title": "Collect citation source evidence",
780 "status": "open",
781 "priority": 5,
782 "output_contract": "research",
783 "acceptance_criteria": "Evidence sources are saved.",
784 }
785 ]
786 },
787 )
788 llm = ScriptedLLM([
789 LLMResponse(tool_calls=[
790 ToolCall(
791 name="write_artifact",
792 arguments={
793 "title": "Evidence: citation sources",
794 "summary": "Extracted source evidence",
795 "content": "Citation source evidence for later report writing.",
796 },
797 )
798 ])
799 ])
800
801 result = run_one_step(job_id, config=config, db=db, llm=llm)
802
803 assert result.status == "completed"
804 job = db.get_job(job_id)
805 task = job["metadata"]["task_queue"][0]
806 assert task["status"] == "done"
807 assert task["metadata"]["auto_reconciled_from_artifact"]
808 finally:
809 db.close()
810
811
812def test_run_one_step_blocks_artifact_churn_until_progress_accounting(tmp_path):
813 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
814 db = AgentDB(tmp_path / "state.db")
815 try:
816 job_id = db.create_job("Keep a durable progress ledger", title="ledger", kind="generic")
817 for index in range(3):
818 run_id = db.start_run(job_id, model="test")
819 step_id = db.add_step(
820 job_id=job_id,
821 run_id=run_id,
822 kind="tool",
823 tool_name="write_artifact",
824 input_data={"arguments": {"title": f"Output {index}", "content": "notes"}},
825 )
826 db.finish_step(
827 step_id,
828 status="completed",
829 summary=f"write_artifact saved art_{index}",
830 output_data={"success": True, "artifact_id": f"art_{index}"},
831 )
832 db.finish_run(run_id, "completed")
833
834 blocked = run_one_step(
835 job_id,
836 config=config,
837 db=db,
838 llm=ScriptedLLM([
839 LLMResponse(tool_calls=[
840 ToolCall(name="write_artifact", arguments={"title": "Another output", "content": "more notes"})
841 ])
842 ]),
843 )
844
845 assert blocked.status == "blocked"
846 assert blocked.result["error"] == "progress accounting required"
847 allowed = run_one_step(
848 job_id,
849 config=config,
850 db=db,
851 llm=ScriptedLLM([
852 LLMResponse(tool_calls=[
853 ToolCall(
854 name="record_tasks",
855 arguments={"tasks": [{"title": "Review saved outputs", "status": "open", "priority": 2}]},
856 )
857 ])
858 ]),
859 )
860 assert allowed.status == "completed"
861 assert allowed.tool_name == "record_tasks"
862 finally:
863 db.close()
864
865
866def test_activity_checkpoint_streak_blocks_more_churn_until_ledger_update(tmp_path):
867 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
868 db = AgentDB(tmp_path / "state.db")
869 try:
870 job_id = db.create_job("Keep working until durable progress appears", title="stagnation", kind="generic")
871 db.update_job_metadata(
872 job_id,
873 {
874 "activity_checkpoint_streak": 3,
875 "last_checkpoint_counts": {
876 "findings": 0,
877 "sources": 0,
878 "tasks": 1,
879 "experiments": 0,
880 "lessons": 0,
881 "milestones": 0,
882 },
883 },
884 )
885
886 blocked = run_one_step(
887 job_id,
888 config=config,
889 db=db,
890 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
891 )
892
893 assert blocked.status == "blocked"
894 assert blocked.result["error"] == "durable progress required"
895
896 allowed = run_one_step(
897 job_id,
898 config=config,
899 db=db,
900 llm=ScriptedLLM([
901 LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Pivot branch", "status": "open"}]})])
902 ]),
903 )
904
905 assert allowed.status == "completed"
906 assert allowed.tool_name == "record_tasks"
907 finally:
908 db.close()
909
910
911def test_task_only_checkpoint_streak_blocks_new_task_sprawl(tmp_path):
912 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
913 db = AgentDB(tmp_path / "state.db")
914 try:
915 job_id = db.create_job("Keep executing durable work", title="task-sprawl", kind="generic")
916 db.update_job_metadata(
917 job_id,
918 {
919 "task_planning_checkpoint_streak": 2,
920 "task_queue": [
921 {
922 "key": "existing-branch",
923 "title": "Existing branch",
924 "status": "open",
925 }
926 ],
927 },
928 )
929
930 blocked = run_one_step(
931 job_id,
932 config=config,
933 db=db,
934 llm=ScriptedLLM([
935 LLMResponse(tool_calls=[
936 ToolCall(
937 name="record_tasks",
938 arguments={"tasks": [{"title": "Another open branch", "status": "open"}]},
939 )
940 ])
941 ]),
942 )
943
944 assert blocked.status == "blocked"
945 assert blocked.result["error"] == "task execution required"
946
947 allowed = run_one_step(
948 job_id,
949 config=config,
950 db=db,
951 llm=ScriptedLLM([
952 LLMResponse(tool_calls=[
953 ToolCall(
954 name="record_tasks",
955 arguments={
956 "tasks": [
957 {
958 "title": "Existing branch",
959 "status": "done",
960 "result": "Executed and checkpointed.",
961 }
962 ]
963 },
964 )
965 ])
966 ]),
967 )
968
969 assert allowed.status == "completed"
970 assert allowed.tool_name == "record_tasks"
971 finally:
972 db.close()
973
974
975def test_task_only_checkpoint_updates_planning_streak(tmp_path):
976 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
977 db = AgentDB(tmp_path / "state.db")
978 try:
979 job_id = db.create_job("Track planning-only progress", title="task-streak", kind="generic")
980 for index in range(9):
981 run_id = db.start_run(job_id, model="test")
982 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
983 db.finish_step(step_id, status="completed", summary=f"search {index}", output_data={"success": True})
984 db.finish_run(run_id, "completed")
985
986 result = run_one_step(
987 job_id,
988 config=config,
989 db=db,
990 llm=ScriptedLLM([
991 LLMResponse(tool_calls=[
992 ToolCall(name="record_tasks", arguments={"tasks": [{"title": "First branch", "status": "open"}]})
993 ])
994 ]),
995 )
996
997 assert result.status == "completed"
998 job = db.get_job(job_id)
999 assert job["metadata"]["task_planning_checkpoint_streak"] == 1
1000
1001 db.append_finding_record(job_id, name="Durable finding")
1002 for index in range(9):
1003 run_id = db.start_run(job_id, model="test")
1004 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
1005 db.finish_step(step_id, status="completed", summary=f"search reset {index}", output_data={"success": True})
1006 db.finish_run(run_id, "completed")
1007 result = run_one_step(
1008 job_id,
1009 config=config,
1010 db=db,
1011 llm=ScriptedLLM([
1012 LLMResponse(tool_calls=[
1013 ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Second branch", "status": "open"}]})
1014 ])
1015 ]),
1016 )
1017
1018 assert result.status == "completed"
1019 job = db.get_job(job_id)
1020 assert job["metadata"]["task_planning_checkpoint_streak"] == 0
1021 finally:
1022 db.close()
1023
1024
1025def test_task_resolution_checkpoint_resets_planning_streak(tmp_path):
1026 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1027 db = AgentDB(tmp_path / "state.db")
1028 try:
1029 job_id = db.create_job("Resolve existing durable branches", title="task-resolution", kind="generic")
1030 db.append_task_record(job_id, title="Existing branch", status="open", priority=5)
1031 db.update_job_metadata(
1032 job_id,
1033 {
1034 "last_checkpoint_counts": {
1035 "findings": 0,
1036 "sources": 0,
1037 "tasks": 1,
1038 "experiments": 0,
1039 "lessons": 0,
1040 "milestones": 0,
1041 },
1042 "last_checkpoint_at": "2026-01-01T00:00:00+00:00",
1043 "task_planning_checkpoint_streak": 2,
1044 },
1045 )
1046 for index in range(9):
1047 run_id = db.start_run(job_id, model="test")
1048 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
1049 db.finish_step(step_id, status="completed", summary=f"search {index}", output_data={"success": True})
1050 db.finish_run(run_id, "completed")
1051
1052 result = run_one_step(
1053 job_id,
1054 config=config,
1055 db=db,
1056 llm=ScriptedLLM([
1057 LLMResponse(tool_calls=[
1058 ToolCall(
1059 name="record_tasks",
1060 arguments={
1061 "tasks": [
1062 {
1063 "title": "Existing branch",
1064 "status": "done",
1065 "result": "Resolved using the latest evidence.",
1066 "metadata": {"source_url": "file:///tmp/latest-evidence"},
1067 }
1068 ]
1069 },
1070 )
1071 ])
1072 ]),
1073 )
1074
1075 assert result.status == "completed"
1076 job = db.get_job(job_id)
1077 assert job["metadata"]["task_planning_checkpoint_streak"] == 0
1078 assert job["metadata"]["last_agent_update"]["category"] == "progress"
1079 assert job["metadata"]["last_agent_update"]["metadata"]["updates"]["tasks"] == 1
1080 assert job["metadata"]["last_agent_update"]["metadata"]["resolutions"]["tasks"] == 1
1081 finally:
1082 db.close()
1083
1084
1085def test_run_one_step_blocks_similar_artifact_search(tmp_path):
1086 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1087 db = AgentDB(tmp_path / "state.db")
1088 try:
1089 job_id = db.create_job("Review saved outputs", title="artifact-search", kind="generic")
1090 run_id = db.start_run(job_id, model="test")
1091 step_id = db.add_step(
1092 job_id=job_id,
1093 run_id=run_id,
1094 kind="tool",
1095 tool_name="search_artifacts",
1096 input_data={"arguments": {"query": "distillation agentic paper evidence", "limit": 20}},
1097 )
1098 db.finish_step(
1099 step_id,
1100 status="completed",
1101 summary="search_artifacts returned 0 results",
1102 output_data={"success": True, "results": []},
1103 )
1104 db.finish_run(run_id, "completed")
1105
1106 result = run_one_step(
1107 job_id,
1108 config=config,
1109 db=db,
1110 llm=ScriptedLLM([
1111 LLMResponse(tool_calls=[
1112 ToolCall(name="search_artifacts", arguments={"query": "paper evidence for agentic distillation", "limit": 20})
1113 ])
1114 ]),
1115 )
1116
1117 assert result.status == "blocked"
1118 assert result.result["error"] == "similar artifact search blocked"
1119 assert result.result["blocked_tool"] == "search_artifacts"
1120 finally:
1121 db.close()
1122
1123
1124def test_run_one_step_blocks_artifact_review_when_tasks_are_exhausted(tmp_path):
1125 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1126 db = AgentDB(tmp_path / "state.db")
1127 try:
1128 job_id = db.create_job(
1129 "Review saved outputs",
1130 title="review-exhausted",
1131 kind="generic",
1132 metadata={"task_queue": [{"title": "Review first output", "status": "done", "priority": 5}]},
1133 )
1134
1135 result = run_one_step(
1136 job_id,
1137 config=config,
1138 db=db,
1139 llm=ScriptedLLM([
1140 LLMResponse(tool_calls=[ToolCall(name="search_artifacts", arguments={"query": "paper evidence"})])
1141 ]),
1142 )
1143
1144 assert result.status == "blocked"
1145 assert result.result["error"] == "task branch required before more work"
1146 assert result.result["blocked_tool"] == "search_artifacts"
1147 finally:
1148 db.close()
1149
1150
1151def test_run_one_step_recovers_repeated_guard_blocks_without_llm(tmp_path):
1152 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1153 db = AgentDB(tmp_path / "state.db")
1154 try:
1155 job_id = db.create_job("Recover repeated blocked work", title="guard", kind="generic")
1156 for index, tool_name in enumerate(["search_artifacts", "shell_exec", "read_artifact"], start=1):
1157 run_id = db.start_run(job_id, model="test")
1158 step_id = db.add_step(
1159 job_id=job_id,
1160 run_id=run_id,
1161 kind="tool",
1162 tool_name=tool_name,
1163 input_data={"arguments": {"query": f"blocked {index}"}},
1164 )
1165 db.finish_step(
1166 step_id,
1167 status="blocked",
1168 summary=f"blocked {tool_name}; progress ledger update required",
1169 output_data={"success": True, "recoverable": True, "error": "progress ledger update required"},
1170 )
1171 db.finish_run(run_id, "completed")
1172
1173 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1174
1175 assert result.status == "completed"
1176 assert result.tool_name == "guard_recovery"
1177 assert result.result["guard_recovery"]["error"] == "progress ledger update required"
1178 job = db.get_job(job_id)
1179 assert any(task["title"] == "Resolve guard: progress ledger update required" for task in job["metadata"]["task_queue"])
1180 assert any("Repeated guard block" in lesson["lesson"] for lesson in job["metadata"]["lessons"])
1181 finally:
1182 db.close()
1183
1184
1185def test_guard_recovery_does_not_add_task_for_queue_saturation(tmp_path):
1186 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1187 db = AgentDB(tmp_path / "state.db")
1188 try:
1189 job_id = db.create_job(
1190 "Consolidate a saturated backlog",
1191 title="guard-saturated-tasks",
1192 kind="generic",
1193 metadata={
1194 "task_queue": [
1195 {"title": f"Existing branch {index}", "status": "open", "priority": index}
1196 for index in range(40)
1197 ]
1198 },
1199 )
1200 for index in range(3):
1201 run_id = db.start_run(job_id, model="test")
1202 step_id = db.add_step(
1203 job_id=job_id,
1204 run_id=run_id,
1205 kind="tool",
1206 tool_name="record_tasks",
1207 input_data={"arguments": {"tasks": [{"title": f"New branch {index}", "status": "open"}]}},
1208 )
1209 db.finish_step(
1210 step_id,
1211 status="blocked",
1212 summary="blocked record_tasks; total task queue is too large",
1213 output_data={
1214 "success": False,
1215 "recoverable": True,
1216 "error": "task queue saturated",
1217 "task_queue": {
1218 "reason": "total task queue is too large",
1219 "open_count": 40,
1220 "total_count": 40,
1221 },
1222 },
1223 )
1224 db.finish_run(run_id, "completed")
1225
1226 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1227
1228 assert result.status == "completed"
1229 assert result.tool_name == "guard_recovery"
1230 assert result.result["task_opened"] is False
1231 job = db.get_job(job_id)
1232 tasks = job["metadata"]["task_queue"]
1233 assert len(tasks) == 40
1234 assert not any(task["title"].startswith("Resolve guard:") for task in tasks)
1235 assert job["metadata"]["task_backlog_pressure"]["total_count"] == 40
1236 assert any("Do not open guard-recovery tasks for saturation" in lesson["lesson"] for lesson in job["metadata"]["lessons"])
1237 finally:
1238 db.close()
1239
1240
1241def test_run_one_step_recovers_repeated_evidence_grounding_blocks(tmp_path):
1242 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1243 db = AgentDB(tmp_path / "state.db")
1244 try:
1245 job_id = db.create_job("Recover repeated grounding failures", title="grounding-guard", kind="generic")
1246 for index in range(3):
1247 run_id = db.start_run(job_id, model="test")
1248 step_id = db.add_step(
1249 job_id=job_id,
1250 run_id=run_id,
1251 kind="tool",
1252 tool_name="record_experiment",
1253 input_data={"arguments": {"title": f"Unsupported record {index}"}},
1254 )
1255 db.finish_step(
1256 step_id,
1257 status="blocked",
1258 summary="blocked record_experiment; evidence grounding required",
1259 output_data={"success": False, "recoverable": True, "error": "evidence grounding required"},
1260 )
1261 db.finish_run(run_id, "completed")
1262
1263 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1264
1265 assert result.status == "completed"
1266 assert result.tool_name == "guard_recovery"
1267 assert result.result["guard_recovery"]["error"] == "evidence grounding required"
1268 finally:
1269 db.close()
1270
1271
1272def test_run_one_step_recovers_repeated_known_bad_source_blocks(tmp_path):
1273 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1274 db = AgentDB(tmp_path / "state.db")
1275 try:
1276 job_id = db.create_job("Avoid repeatedly blocked sources", title="guard")
1277 for index in range(3):
1278 run_id = db.start_run(job_id, model="test")
1279 step_id = db.add_step(
1280 job_id=job_id,
1281 run_id=run_id,
1282 kind="tool",
1283 tool_name="web_extract",
1284 input_data={"arguments": {"urls": ["https://bad.example/source"]}},
1285 )
1286 db.finish_step(
1287 step_id,
1288 status="blocked",
1289 summary="blocked web_extract; known bad source https://bad.example/source",
1290 output_data={"success": False, "error": "known bad source blocked"},
1291 )
1292 db.finish_run(run_id, "completed")
1293
1294 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1295
1296 assert result.status == "completed"
1297 assert result.tool_name == "guard_recovery"
1298 assert result.result["guard_recovery"]["error"] == "known bad source blocked"
1299 job = db.get_job(job_id)
1300 assert any(task["title"] == "Resolve guard: known bad source blocked" for task in job["metadata"]["task_queue"])
1301 finally:
1302 db.close()
1303
1304
1305def test_guard_recovery_does_not_repeat_after_recovery_step(tmp_path):
1306 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1307 db = AgentDB(tmp_path / "state.db")
1308 try:
1309 job_id = db.create_job("Recover repeated blocked work once", title="guard-once", kind="generic")
1310 for index in range(3):
1311 run_id = db.start_run(job_id, model="test")
1312 step_id = db.add_step(
1313 job_id=job_id,
1314 run_id=run_id,
1315 kind="tool",
1316 tool_name="search_artifacts",
1317 input_data={"arguments": {"query": f"blocked {index}"}},
1318 )
1319 db.finish_step(
1320 step_id,
1321 status="blocked",
1322 summary="blocked search_artifacts; progress ledger update required",
1323 output_data={"success": True, "recoverable": True, "error": "progress ledger update required"},
1324 )
1325 db.finish_run(run_id, "completed")
1326
1327 first = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1328 assert first.tool_name == "guard_recovery"
1329
1330 second = run_one_step(
1331 job_id,
1332 config=config,
1333 db=db,
1334 llm=ScriptedLLM([
1335 LLMResponse(tool_calls=[
1336 ToolCall(name="record_lesson", arguments={"lesson": "Recovered guard and chose a new branch", "category": "strategy"})
1337 ])
1338 ]),
1339 )
1340
1341 assert second.status == "completed"
1342 assert second.tool_name == "record_lesson"
1343 assert [step["tool_name"] for step in db.list_steps(job_id=job_id)[-2:]] == ["guard_recovery", "record_lesson"]
1344 finally:
1345 db.close()
1346
1347
1348def test_guard_recovery_does_not_keep_reopening_same_guard(tmp_path):
1349 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1350 db = AgentDB(tmp_path / "state.db")
1351 try:
1352 job_id = db.create_job("Recover repeated blocked work once", title="guard-repeat", kind="generic")
1353 for batch in range(2):
1354 for index in range(3):
1355 run_id = db.start_run(job_id, model="test")
1356 step_id = db.add_step(
1357 job_id=job_id,
1358 run_id=run_id,
1359 kind="tool",
1360 tool_name="search_artifacts",
1361 input_data={"arguments": {"query": f"blocked {batch}-{index}"}},
1362 )
1363 db.finish_step(
1364 step_id,
1365 status="blocked",
1366 summary="blocked search_artifacts; progress ledger update required",
1367 output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1368 )
1369 db.finish_run(run_id, "completed")
1370 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM() if batch == 0 else ScriptedLLM([
1371 LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "Use a different branch", "category": "strategy"})])
1372 ]))
1373
1374 steps = db.list_steps(job_id=job_id)
1375 assert sum(1 for step in steps if step["tool_name"] == "guard_recovery") == 1
1376 assert result.tool_name == "record_lesson"
1377 finally:
1378 db.close()
1379
1380
1381def test_guard_recovery_reopens_same_guard_after_progress(tmp_path):
1382 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1383 db = AgentDB(tmp_path / "state.db")
1384 try:
1385 job_id = db.create_job("Recover repeated blocked work after progress", title="guard-progress", kind="generic")
1386 for index in range(3):
1387 run_id = db.start_run(job_id, model="test")
1388 step_id = db.add_step(
1389 job_id=job_id,
1390 run_id=run_id,
1391 kind="tool",
1392 tool_name="search_artifacts",
1393 input_data={"arguments": {"query": f"blocked first {index}"}},
1394 )
1395 db.finish_step(
1396 step_id,
1397 status="blocked",
1398 summary="blocked search_artifacts; progress ledger update required",
1399 output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1400 )
1401 db.finish_run(run_id, "completed")
1402
1403 first = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1404 assert first.tool_name == "guard_recovery"
1405
1406 run_id = db.start_run(job_id, model="test")
1407 progress_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
1408 db.finish_step(progress_step, status="completed", output_data={"success": True, "lesson": "Recovered once."})
1409 db.finish_run(run_id, "completed")
1410
1411 for index in range(3):
1412 run_id = db.start_run(job_id, model="test")
1413 step_id = db.add_step(
1414 job_id=job_id,
1415 run_id=run_id,
1416 kind="tool",
1417 tool_name="read_artifact",
1418 input_data={"arguments": {"query": f"blocked second {index}"}},
1419 )
1420 db.finish_step(
1421 step_id,
1422 status="blocked",
1423 summary="blocked read_artifact; progress ledger update required",
1424 output_data={"success": False, "recoverable": True, "error": "progress ledger update required"},
1425 )
1426 db.finish_run(run_id, "completed")
1427
1428 second = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1429 assert second.tool_name == "guard_recovery"
1430 assert sum(1 for step in db.list_steps(job_id=job_id) if step["tool_name"] == "guard_recovery") == 2
1431 finally:
1432 db.close()
1433
1434
1435def test_guard_recovery_accounts_pending_evidence_checkpoint(tmp_path):
1436 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1437 db = AgentDB(tmp_path / "state.db")
1438 try:
1439 job_id = db.create_job("Recover checkpoint accounting deadlock", title="checkpoint-recovery", kind="generic")
1440 db.update_job_metadata(
1441 job_id,
1442 {
1443 "pending_evidence_checkpoint": {
1444 "artifact_id": "art_checkpoint",
1445 "title": "Auto Evidence Checkpoint after step 1",
1446 "read_at": "2026-01-01T00:00:00+00:00",
1447 "evidence_step_no": 1,
1448 "blocked_tool": "shell_exec",
1449 }
1450 },
1451 )
1452 for index in range(3):
1453 run_id = db.start_run(job_id, model="test")
1454 step_id = db.add_step(
1455 job_id=job_id,
1456 run_id=run_id,
1457 kind="tool",
1458 tool_name="read_artifact",
1459 input_data={"arguments": {"artifact_id": "art_checkpoint", "retry": index}},
1460 )
1461 db.finish_step(
1462 step_id,
1463 status="blocked",
1464 summary="blocked read_artifact; evidence checkpoint accounting required",
1465 output_data={"success": False, "recoverable": True, "error": "evidence checkpoint accounting required"},
1466 )
1467 db.finish_run(run_id, "completed")
1468
1469 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1470
1471 assert result.tool_name == "guard_recovery"
1472 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
1473 assert pending["resolved_at"]
1474 assert pending["resolved_by_tool"] == "guard_recovery"
1475 finally:
1476 db.close()
1477
1478
1479def test_guard_recovery_immediately_recovers_already_read_checkpoint_reread(tmp_path):
1480 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1481 db = AgentDB(tmp_path / "state.db")
1482 try:
1483 job_id = db.create_job("Recover checkpoint reread deadlock", title="checkpoint-recovery", kind="generic")
1484 run_id = db.start_run(job_id, model="test")
1485 step_id = db.add_step(
1486 job_id=job_id,
1487 run_id=run_id,
1488 kind="tool",
1489 tool_name="read_artifact",
1490 input_data={"arguments": {"artifact_id": "art_checkpoint"}},
1491 )
1492 db.finish_step(
1493 step_id,
1494 status="blocked",
1495 summary="blocked read_artifact; evidence checkpoint accounting required",
1496 output_data={
1497 "success": False,
1498 "recoverable": True,
1499 "error": "evidence checkpoint accounting required",
1500 "blocked_tool": "read_artifact",
1501 "pending_evidence_checkpoint": {
1502 "artifact_id": "art_checkpoint",
1503 "checkpoint_read": True,
1504 "read_at": "2026-01-01T00:00:00+00:00",
1505 },
1506 },
1507 )
1508 db.finish_run(run_id, "completed")
1509
1510 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1511
1512 assert result.tool_name == "guard_recovery"
1513 assert result.result["guard_recovery"]["count"] == 1
1514 assert result.result["guard_recovery"]["error"] == "evidence checkpoint accounting required"
1515 finally:
1516 db.close()
1517
1518
1519def test_prompt_does_not_tell_worker_to_reread_checkpoint_after_it_was_read(tmp_path):
1520 db = AgentDB(tmp_path / "state.db")
1521 try:
1522 job_id = db.create_job("Account for checkpoint", title="checkpoint-prompt", kind="generic")
1523 db.update_job_metadata(
1524 job_id,
1525 {
1526 "pending_evidence_checkpoint": {
1527 "artifact_id": "art_checkpoint",
1528 "title": "Auto Evidence Checkpoint after step 1",
1529 "read_at": "2026-01-01T00:00:00+00:00",
1530 "evidence_step_no": 1,
1531 "blocked_tool": "shell_exec",
1532 }
1533 },
1534 )
1535
1536 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
1537
1538 assert "Do not read the checkpoint again" in content
1539 assert "Next either read that checkpoint artifact" not in content
1540 finally:
1541 db.close()
1542
1543
1544def test_checkpoint_reread_block_requires_accounting_not_more_reads(tmp_path):
1545 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1546 db = AgentDB(tmp_path / "state.db")
1547 try:
1548 job_id = db.create_job("Account for checkpoint", title="checkpoint-reread", kind="generic")
1549 db.update_job_metadata(
1550 job_id,
1551 {
1552 "pending_evidence_checkpoint": {
1553 "artifact_id": "art_checkpoint",
1554 "title": "Auto Evidence Checkpoint after step 1",
1555 "read_at": "2026-01-01T00:00:00+00:00",
1556 "evidence_step_no": 1,
1557 "blocked_tool": "shell_exec",
1558 }
1559 },
1560 )
1561
1562 blocked = run_one_step(
1563 job_id,
1564 config=config,
1565 db=db,
1566 llm=ScriptedLLM([
1567 LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"})])
1568 ]),
1569 )
1570
1571 assert blocked.status == "blocked"
1572 assert blocked.result["error"] == "evidence checkpoint accounting required"
1573 assert blocked.result["checkpoint_already_read"] is True
1574 assert blocked.result["required_next_action"] == "durable_checkpoint_accounting"
1575 assert "Do not read it again" in blocked.result["guidance"]
1576
1577 recovery = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1578
1579 assert recovery.tool_name == "guard_recovery"
1580 task = recovery.result["task"]
1581 assert task["metadata"]["resolves_evidence_checkpoint"] is True
1582 assert "Do not read the same checkpoint again" in task["acceptance_criteria"]
1583 finally:
1584 db.close()
1585
1586
1587def test_already_read_checkpoint_branch_block_recovers_immediately(tmp_path):
1588 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1589 db = AgentDB(tmp_path / "state.db")
1590 try:
1591 job_id = db.create_job("Recover checkpoint branch deadlock", title="checkpoint-branch", kind="generic")
1592 db.update_job_metadata(
1593 job_id,
1594 {
1595 "pending_evidence_checkpoint": {
1596 "artifact_id": "art_checkpoint",
1597 "title": "Auto Evidence Checkpoint after step 1",
1598 "read_at": "2026-01-01T00:00:00+00:00",
1599 "evidence_step_no": 1,
1600 "blocked_tool": "shell_exec",
1601 }
1602 },
1603 )
1604 run_id = db.start_run(job_id, model="test")
1605 step_id = db.add_step(
1606 job_id=job_id,
1607 run_id=run_id,
1608 kind="tool",
1609 tool_name="shell_exec",
1610 input_data={"arguments": {"command": "echo more branch work"}},
1611 )
1612 db.finish_step(
1613 step_id,
1614 status="blocked",
1615 summary="blocked shell_exec; evidence checkpoint accounting required",
1616 output_data={
1617 "success": False,
1618 "recoverable": True,
1619 "error": "evidence checkpoint accounting required",
1620 "checkpoint_already_read": True,
1621 "pending_evidence_checkpoint": {
1622 "artifact_id": "art_checkpoint",
1623 "checkpoint_read": True,
1624 },
1625 },
1626 )
1627 db.finish_run(run_id, "completed")
1628
1629 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
1630
1631 assert result.tool_name == "guard_recovery"
1632 assert result.result["guard_recovery"]["count"] == 1
1633 assert result.result["task"]["metadata"]["resolves_evidence_checkpoint"] is True
1634 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
1635 assert pending["resolved_by_tool"] == "guard_recovery"
1636 finally:
1637 db.close()
1638
1639
1640def test_evidence_grounding_ignores_format_protocol_tokens():
1641 tokens = _concrete_evidence_tokens(
1642 "Parsed JSON from HTTPS REST API URL and saved HTML/YAML/XML CDN SHA256 GGUF excerpts for Model-7B step_123_shell_output. "
1643 "Download investigation parsed direct API results. Discovery step-2678 located a candidate file after shell_exec_step_1037."
1644 )
1645
1646 assert "JSON" not in tokens
1647 assert "HTTPS" not in tokens
1648 assert "REST" not in tokens
1649 assert "API" not in tokens
1650 assert "CDN" not in tokens
1651 assert "SHA256" not in tokens
1652 assert "GGUF" not in tokens
1653 assert "URL" not in tokens
1654 assert "Download" not in tokens
1655 assert "Discovery" not in tokens
1656 assert "investigation" not in tokens
1657 assert "direct" not in tokens
1658 assert "step_123_shell_output" not in tokens
1659 assert "step-2678" not in tokens
1660 assert "shell_exec_step_1037" not in tokens
1661 assert "Model-7B" in tokens
1662
1663
1664def test_evidence_grounding_ignores_lowercase_command_shorthand_tokens():
1665 tokens = _concrete_evidence_tokens("Build with cmake --build . -j16 on H100 hardware if observed.")
1666
1667 assert "j16" not in tokens
1668 assert "H100" in tokens
1669
1670
1671def test_record_experiment_allows_not_stub_validation_for_observed_token(tmp_path):
1672 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1673 db = AgentDB(tmp_path / "state.db")
1674 try:
1675 job_id = db.create_job("Validate discovered file", title="grounding", kind="generic")
1676 run_id = db.start_run(job_id, model="test")
1677 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1678 db.finish_step(
1679 step_id,
1680 status="completed",
1681 output_data={
1682 "success": True,
1683 "stdout": "-rw-r--r-- 1 user user 12G /srv/models/AlphaModel-99-Q4.foo\n",
1684 "stderr": "",
1685 },
1686 )
1687 db.finish_run(run_id, "completed")
1688
1689 result = run_one_step(
1690 job_id,
1691 config=config,
1692 db=db,
1693 llm=ScriptedLLM([
1694 LLMResponse(tool_calls=[
1695 ToolCall(
1696 name="record_experiment",
1697 arguments={
1698 "title": "Candidate File Validation",
1699 "status": "measured",
1700 "metric_name": "usable_files_found",
1701 "metric_value": 1,
1702 "metric_unit": "files",
1703 "result": (
1704 "Observed /srv/models/AlphaModel-99-Q4.foo at 12G. "
1705 "AlphaModel-99-Q4.foo is not a 29-byte stub."
1706 ),
1707 "next_action": "Run the next bounded benchmark.",
1708 },
1709 )
1710 ])
1711 ]),
1712 )
1713
1714 assert result.status == "completed"
1715 assert result.tool_name == "record_experiment"
1716 finally:
1717 db.close()
1718
1719
1720def test_record_findings_ignores_generated_step_labels_as_claims(tmp_path):
1721 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1722 db = AgentDB(tmp_path / "state.db")
1723 try:
1724 job_id = db.create_job("Record observed file candidates", title="grounding", kind="generic")
1725 run_id = db.start_run(job_id, model="test")
1726 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1727 db.finish_step(
1728 step_id,
1729 status="completed",
1730 output_data={
1731 "success": True,
1732 "stdout": "/srv/models/AlphaModel-99-Q4.foo\n",
1733 "stderr": "",
1734 },
1735 )
1736 db.finish_run(run_id, "completed")
1737
1738 result = run_one_step(
1739 job_id,
1740 config=config,
1741 db=db,
1742 llm=ScriptedLLM([
1743 LLMResponse(tool_calls=[
1744 ToolCall(
1745 name="record_findings",
1746 arguments={
1747 "findings": [
1748 {
1749 "name": "Model file candidate located",
1750 "category": "file_candidate",
1751 "location": "/srv/models/AlphaModel-99-Q4.foo",
1752 "evidence_artifact": "step-2678 shell_exec output",
1753 "reason": "Found via step-2678 shell output.",
1754 "status": "candidate",
1755 }
1756 ]
1757 },
1758 )
1759 ])
1760 ]),
1761 )
1762
1763 assert result.status == "completed"
1764 assert result.tool_name == "record_findings"
1765 finally:
1766 db.close()
1767
1768
1769def test_write_artifact_allows_plain_prose_headings_without_evidence(tmp_path):
1770 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1771 db = AgentDB(tmp_path / "state.db")
1772 try:
1773 job_id = db.create_job("Summarize observed evidence", title="artifact-grounding", kind="generic")
1774 run_id = db.start_run(job_id, model="test")
1775 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1776 db.finish_step(
1777 step_id,
1778 status="completed",
1779 output_data={
1780 "success": True,
1781 "stdout": "Observed status: candidate file exists and benchmark setup is ready for the next measured action.",
1782 "stderr": "",
1783 },
1784 )
1785 db.finish_run(run_id, "completed")
1786
1787 result = run_one_step(
1788 job_id,
1789 config=config,
1790 db=db,
1791 llm=ScriptedLLM([
1792 LLMResponse(tool_calls=[
1793 ToolCall(
1794 name="write_artifact",
1795 arguments={
1796 "title": "Evidence Consolidation Summary",
1797 "content": (
1798 "## Discovered\n"
1799 "The available observations were consolidated into a concise summary.\n\n"
1800 "## Significance\n"
1801 "This output records narrative context only and does not introduce a new model, file, or hardware identifier."
1802 ),
1803 },
1804 )
1805 ])
1806 ]),
1807 )
1808
1809 assert result.status == "completed"
1810 assert result.tool_name == "write_artifact"
1811 finally:
1812 db.close()
1813
1814
1815def test_write_artifact_blocks_unsupported_high_risk_identifier(tmp_path):
1816 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1817 db = AgentDB(tmp_path / "state.db")
1818 try:
1819 job_id = db.create_job("Summarize observed evidence", title="artifact-grounding", kind="generic")
1820 run_id = db.start_run(job_id, model="test")
1821 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
1822 db.finish_step(
1823 step_id,
1824 status="completed",
1825 output_data={
1826 "success": True,
1827 "stdout": "Observed model identifier: AlphaModel-99. No other model identifiers were observed.",
1828 "stderr": "",
1829 },
1830 )
1831 db.finish_run(run_id, "completed")
1832
1833 result = run_one_step(
1834 job_id,
1835 config=config,
1836 db=db,
1837 llm=ScriptedLLM([
1838 LLMResponse(tool_calls=[
1839 ToolCall(
1840 name="write_artifact",
1841 arguments={
1842 "title": "Benchmark Summary",
1843 "content": (
1844 "The observed candidate was AlphaModel-99.\n"
1845 "The final recommendation uses FakeModel-42 for the next benchmark branch."
1846 ),
1847 },
1848 )
1849 ])
1850 ]),
1851 )
1852
1853 assert result.status == "blocked"
1854 assert result.result["error"] == "evidence grounding required"
1855 assert "FakeModel-42" in result.result["evidence_grounding"]["unsupported_tokens"]
1856 finally:
1857 db.close()
1858
1859
1860def test_web_search_auto_records_source_quality(tmp_path):
1861 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1862 db = AgentDB(tmp_path / "state.db")
1863 try:
1864 job_id = db.create_job("Track search sources", title="search-sources", kind="generic")
1865
1866 result = run_one_step(
1867 job_id,
1868 config=config,
1869 db=db,
1870 llm=ScriptedLLM([
1871 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "durable progress research"})])
1872 ]),
1873 registry=SearchRegistry(),
1874 )
1875
1876 assert result.status == "completed"
1877 sources = db.get_job(job_id)["metadata"]["source_ledger"]
1878 assert {source["source"] for source in sources} == {
1879 "https://source.example/primary",
1880 "https://source.example/secondary",
1881 }
1882 assert all(source["source_type"] == "web_search" for source in sources)
1883 finally:
1884 db.close()
1885
1886
1887def test_web_extract_auto_records_source_quality(tmp_path):
1888 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1889 db = AgentDB(tmp_path / "state.db")
1890 try:
1891 job_id = db.create_job("Track source quality", title="sources", kind="generic")
1892
1893 result = run_one_step(
1894 job_id,
1895 config=config,
1896 db=db,
1897 llm=ScriptedLLM([
1898 LLMResponse(tool_calls=[ToolCall(name="web_extract", arguments={"urls": ["https://source.example/a"]})])
1899 ]),
1900 registry=ExtractRegistry(),
1901 )
1902
1903 assert result.status == "completed"
1904 sources = db.get_job(job_id)["metadata"]["source_ledger"]
1905 assert {source["source"] for source in sources} == {"https://source.example/a", "https://source.example/b"}
1906 useful = next(source for source in sources if source["source"] == "https://source.example/a")
1907 failed = next(source for source in sources if source["source"] == "https://source.example/b")
1908 assert useful["usefulness_score"] >= 0.55
1909 assert failed["fail_count"] == 1
1910 finally:
1911 db.close()
1912
1913
1914def test_worker_cannot_mark_job_completed_by_default(tmp_path):
1915 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
1916 db = AgentDB(tmp_path / "state.db")
1917 try:
1918 job_id = db.create_job("Keep improving forever", title="perpetual", kind="generic")
1919 llm = ScriptedLLM([
1920 LLMResponse(tool_calls=[
1921 ToolCall(
1922 name="update_job_state",
1923 arguments={"status": "completed", "note": "best result saved"},
1924 )
1925 ])
1926 ])
1927
1928 result = run_one_step(job_id, config=config, db=db, llm=llm)
1929 job = db.get_job(job_id)
1930
1931 assert result.status == "completed"
1932 assert result.result["kept_running"] is True
1933 assert job["status"] == "running"
1934 assert job["metadata"]["agent_updates"][-1]["metadata"]["requested_status"] == "completed"
1935 finally:
1936 db.close()
1937
1938
def test_report_update_completion_claim_is_rewritten_as_checkpoint(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving forever", title="perpetual", kind="generic")
        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="report_update",
                        arguments={"message": "Job completed. Best result saved.", "category": "progress"},
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        update = db.get_job(job_id)["metadata"]["last_agent_update"]
        assert update["message"] == "Checkpoint reported; continuing work. Best result saved."
        assert update["metadata"]["rewritten_completion_claim"] is True
        assert update["metadata"]["original_message"] == "Job completed. Best result saved."
        assert update["metadata"]["follow_up_task"]
        tasks = db.get_job(job_id)["metadata"]["task_queue"]
        follow_up = next(task for task in tasks if task["key"] == update["metadata"]["follow_up_task"])
        assert follow_up["title"] == "Audit latest checkpoint against objective"
        assert follow_up["status"] == "open"
        assert follow_up["output_contract"] == "decision"
        assert follow_up["metadata"]["completion_audit_required"] is True
    finally:
        db.close()


def test_run_one_step_claims_one_message_but_keeps_all_steering_in_prompt(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
        db.append_operator_message(job_id, "first instruction", source="chat")
        db.append_operator_message(job_id, "second instruction", source="chat")
        llm = CapturingLLM(LLMResponse(content="No tool this turn."))

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "worker tool call required"
        prompt = llm.messages[-1]["content"]
        job = db.get_job(job_id)
        events = db.list_timeline_events(job_id, limit=30)
        assert "first instruction" in prompt
        assert "second instruction" in prompt
        assert job["metadata"]["operator_messages"][0]["claimed_at"]
        assert not job["metadata"]["operator_messages"][1].get("claimed_at")
        assert any(event["event_type"] == "loop" and event["title"] == "agent_start" for event in events)
        assert any(event["event_type"] == "loop" and event["title"] == "turn_end" for event in events)
    finally:
        db.close()


class FailingLLM:
    def next_action(self, *, messages, tools):
        del messages, tools
        raise RuntimeError("provider returned no choices")


class HardProviderFailingLLM:
    def next_action(self, *, messages, tools):
        del messages, tools
        raise LLMResponseError(
            "Key limit exceeded (total limit)",
            payload={"error": {"message": "Key limit exceeded (total limit)", "code": 403}},
        )


def test_run_one_step_records_model_failures_instead_of_raising(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running despite provider failures", title="provider")

        result = run_one_step(job_id, config=config, db=db, llm=FailingLLM())

        assert result.status == "failed"
        assert result.result["error"] == "provider returned no choices"
        assert result.result["duration_seconds"] >= 0
        steps = db.list_steps(job_id=job_id)
        assert steps[0]["kind"] == "llm"
        assert steps[0]["status"] == "failed"
        assert steps[0]["error"] == "provider returned no choices"
        assert steps[0]["input"]["duration_seconds"] >= 0
        assert db.list_runs(job_id)[0]["status"] == "failed"
    finally:
        db.close()


def test_run_one_step_blocks_missing_tool_arguments_as_recoverable(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running despite malformed tool calls", title="tool args")
        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={})])])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["recoverable"] is True
        assert result.result["missing_arguments"] == ["command"]
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "missing required arguments" in step["summary"]
        assert not step["error"]
    finally:
        db.close()


def test_run_one_step_continues_after_malformed_tool_arguments_when_batch_has_more_work(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running through recoverable malformed tool calls", title="tool batch")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(name="shell_exec", arguments={}),
                ToolCall(name="record_lesson", arguments={"lesson": "continue with the remaining valid tool call"}),
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.tool_name == "record_lesson"
        assert result.status == "completed"
        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
        assert [step["tool_name"] for step in tool_steps] == ["shell_exec", "record_lesson"]
        assert tool_steps[0]["status"] == "blocked"
        assert tool_steps[0]["output"]["error"] == "missing required tool arguments"
        assert tool_steps[1]["status"] == "completed"
        assert db.list_runs(job_id)[0]["status"] == "completed"
    finally:
        db.close()


def test_run_one_step_continues_after_missing_artifact_when_batch_has_more_work(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Recover from invented artifact references", title="artifact batch")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(name="read_artifact", arguments={"artifact_id": "art_missing"}),
                ToolCall(name="record_lesson", arguments={"lesson": "search artifacts before reading unknown artifact ids"}),
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.tool_name == "record_lesson"
        assert result.status == "completed"
        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
        assert [step["tool_name"] for step in tool_steps] == ["read_artifact", "record_lesson"]
        assert tool_steps[0]["status"] == "blocked"
        assert tool_steps[0]["output"]["recoverable"] is True
        assert tool_steps[0]["output"]["error"] == "artifact not found: art_missing"
        assert tool_steps[1]["status"] == "completed"
        assert db.list_runs(job_id)[0]["status"] == "completed"
    finally:
        db.close()


def test_run_one_step_continues_after_empty_operator_ack_when_batch_has_more_work(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Recover from harmless no-op acknowledgements", title="ack batch")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(name="acknowledge_operator_context", arguments={"summary": "already handled"}),
                ToolCall(name="record_lesson", arguments={"lesson": "continue with ordinary progress when no operator ack is needed"}),
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.tool_name == "record_lesson"
        assert result.status == "completed"
        tool_steps = [step for step in db.list_steps(job_id=job_id) if step["kind"] == "tool"]
        assert [step["tool_name"] for step in tool_steps] == ["acknowledge_operator_context", "record_lesson"]
        assert tool_steps[0]["status"] == "blocked"
        assert tool_steps[0]["output"]["recoverable"] is True
        assert tool_steps[0]["output"]["error"] == "no active operator context to acknowledge"
        assert tool_steps[1]["status"] == "completed"
        assert db.list_runs(job_id)[0]["status"] == "completed"
    finally:
        db.close()


def test_run_one_step_blocks_placeholder_tool_arguments_as_recoverable(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running despite placeholder tool calls", title="placeholder args")
        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "..."})])])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["recoverable"] is True
        assert result.result["error"] == "missing required tool arguments"
        assert result.result["missing_arguments"] == ["artifact reference"]
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "missing required arguments" in step["summary"]
    finally:
        db.close()


def test_run_one_step_blocks_truncated_optional_reference_arguments(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Resolve concrete optional references before recording", title="truncated optional")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(
                    name="record_experiment",
                    arguments={
                        "title": "Validate artifact",
                        "evidence_artifact": "art_123...",
                        "next_action": "read the concrete artifact id",
                    },
                )
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["recoverable"] is True
        assert result.result["error"] == "placeholder tool arguments"
        assert result.result["placeholder_arguments"] == ["evidence_artifact"]
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "placeholder tool arguments" in step["summary"]
    finally:
        db.close()


def test_run_one_step_blocks_placeholder_shell_command_before_execution(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Resolve concrete shell inputs before execution", title="placeholder shell")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "wget http://output/"})])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "unresolved placeholder in shell command"
        assert result.result["placeholder"]["value"] == "http://output/"
        assert result.result["recoverable"] is True
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "unresolved placeholder" in step["summary"]
    finally:
        db.close()


def test_run_one_step_blocks_tool_markup_shell_command_before_execution(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Reject malformed tool markup before shell execution", title="tool markup shell")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(
                    name="shell_exec",
                    arguments={"command": "echo ok\n</parameter> }, {"},
                )
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "unresolved placeholder in shell command"
        assert result.result["placeholder"]["value"] == "</parameter>"
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
    finally:
        db.close()


def test_run_one_step_blocks_unbalanced_shell_quotes_before_execution(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Reject partial shell before execution", title="bad shell syntax")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(
                    name="shell_exec",
                    arguments={"command": "echo 'start && ls /tmp"},
                )
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "malformed shell command"
        assert result.result["recoverable"] is True
        assert result.result["syntax"]["kind"] == "shell_syntax"
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "malformed command syntax" in step["summary"]
    finally:
        db.close()


def test_run_one_step_blocks_markdown_fenced_shell_command_before_execution(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Reject markdown prose before shell execution", title="markdown shell")
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[
                ToolCall(
                    name="shell_exec",
                    arguments={
                        "command": (
                            "ls -la /srv/models/model.bin\n\n"
                            "--- Chapter 2\n\n"
                            "1. ```shell\n"
                            " chmod +x /tmp/example\n"
                            "```"
                        )
                    },
                )
            ])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "unresolved placeholder in shell command"
        assert result.result["placeholder"]["kind"] == "markdown_code_fence"
        step = db.list_steps(job_id=job_id)[0]
        assert step["status"] == "blocked"
        assert "unresolved placeholder" in step["summary"]
    finally:
        db.close()


def test_run_one_step_times_out_stalled_model_call(tmp_path):
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(request_timeout_seconds=0.05),
    )
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep daemon moving through stalled model calls", title="provider")

        result = run_one_step(job_id, config=config, db=db, llm=HangingLLM())

        assert result.status == "failed"
        assert "model call timed out" in result.result["error"]
        assert result.result["duration_seconds"] >= 0.04
        step = db.list_steps(job_id=job_id)[0]
        assert step["kind"] == "llm"
        assert step["status"] == "failed"
        assert step["input"]["duration_seconds"] >= 0.04
    finally:
        db.close()


def test_repeated_model_failures_do_not_create_automatic_defer(tmp_path):
    config = AppConfig(
        runtime=RuntimeConfig(home=tmp_path),
        model=ModelConfig(request_timeout_seconds=120),
    )
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running through provider instability", title="provider failures")
        for _index in range(2):
            run_id = db.start_run(job_id, model="test")
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="llm", status="failed")
            db.finish_step(
                step_id,
                status="failed",
                summary="model call failed: APITimeoutError",
                output_data={"success": False, "error": "Request timed out.", "error_type": "APITimeoutError"},
                error="Request timed out.",
            )
            db.finish_run(run_id, "failed", error="Request timed out.")

        result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())

        assert result.status == "failed"
        assert result.tool_name is None
        job = db.get_job(job_id)
        assert not job["metadata"].get("defer_until")
        assert all(step.get("tool_name") != "defer_job" for step in db.list_steps(job_id=job_id))
    finally:
        db.close()


def test_legacy_model_cooldown_metadata_is_ignored(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Continue after provider instability", title="provider recovered")
        db.update_job_metadata(job_id, {"transient_model_cooldown_streak": 3})
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "provider recovered"})])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "completed"
        assert result.tool_name == "report_update"
        job = db.get_job(job_id)
        assert job["metadata"]["transient_model_cooldown_streak"] == 3
        assert "transient_model_recovered_at" not in job["metadata"]
        message_end = next(event for event in db.list_events(job_id=job_id, limit=10) if event["event_type"] == "loop" and event["title"] == "message_end")
        assert message_end["metadata"]["duration_seconds"] >= 0
    finally:
        db.close()


def test_run_one_step_pauses_job_on_hard_provider_failure(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep running when provider is configured", title="provider")

        result = run_one_step(job_id, config=config, db=db, llm=HardProviderFailingLLM())

        assert result.status == "failed"
        assert result.result["provider_action_required"] is True
        assert result.result["pause_reason"] == "llm_provider_blocked"
        job = db.get_job(job_id)
        assert job["status"] == "paused"
        assert "operator action" in job["metadata"]["last_note"]
        assert job["metadata"]["provider_blocked_at"]
        events = db.list_events(job_id=job_id, limit=10)
        assert any(event["event_type"] == "agent_message" and event["title"] == "error" for event in events)
    finally:
        db.close()


def test_prompt_includes_recent_tool_arguments_and_observations():
    job = {"title": "research", "kind": "generic", "objective": "find research"}
    steps = [{
        "step_no": 7,
        "kind": "tool",
        "status": "completed",
        "tool_name": "web_search",
        "summary": "web_search query='target model docs' returned 1 results",
        "input": {"arguments": {"query": "target model docs", "limit": 5}},
        "output": {"query": "target model docs", "results": [{"title": "Target Docs", "url": "https://example.com"}]},
    }]

    messages = build_messages(job, steps)

    content = messages[-1]["content"]
    assert "target model docs" in content
    assert "Target Docs <https://example.com>" in content
    assert "do not search the same query again" in content
    assert "shell_exec runs on the machine hosting this Nipux worker" in content
    assert str(Path.cwd()) not in content
    assert "read_artifact is only for those saved outputs" in content


def test_prompt_recovers_from_missing_artifact_reference():
    job = {"title": "artifact recovery", "kind": "generic", "objective": "use saved evidence"}
    steps = [{
        "step_no": 12,
        "kind": "tool",
        "status": "failed",
        "tool_name": "read_artifact",
        "summary": "read_artifact failed: artifact not found: art_missing",
        "input": {"arguments": {"artifact_id": "art_missing"}},
        "output": {
            "success": False,
            "error": "artifact not found: art_missing",
            "guidance": "Use one of the recent_artifacts refs, call search_artifacts, or continue from evidence.",
            "recent_artifacts": [{"number": "1", "id": "art_real", "title": "Real Evidence"}],
        },
    }]

    messages = build_messages(job, steps)
    content = messages[-1]["content"]

    assert "valid_recent_artifacts=art_real=Real Evidence" in content
    assert "Do not invent or retry artifact ids" in content
    assert "search_artifacts" in content


def test_prompt_does_not_inject_local_ssh_alias_context(monkeypatch, tmp_path):
    monkeypatch.setenv("HOME", str(tmp_path))
    ssh_dir = tmp_path / ".ssh"
    ssh_dir.mkdir()
    (ssh_dir / "config").write_text("Host remote-box\n HostName 100.64.0.1\n User operator\n", encoding="utf-8")
    job = {"title": "remote work", "kind": "generic", "objective": "benchmark remote target"}

    messages = build_messages(job, [])

    content = messages[-1]["content"]
    assert "Local CLI context:" not in content
    assert "100.64.0.1" not in content
    assert "remote-box ->" not in content


def test_prompt_includes_operator_steering_messages():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "find research",
        "metadata": {
            "operator_messages": [{
                "at": "2026-04-24T20:40:00+00:00",
                "source": "shell",
                "message": "Focus on actual strong evidence sources, not competing irrelevant sources.",
            }],
        },
    }

    messages = build_messages(job, [])

    assert "Operator context:" in messages[-1]["content"]
    assert "Focus on actual strong evidence sources" in messages[-1]["content"]


def test_prompt_keeps_claimed_operator_context_until_acknowledged(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
        entry = db.append_operator_message(job_id, "use the corrected target from chat", source="chat")
        claimed = db.claim_operator_messages(job_id, modes=("steer",), limit=1)
        assert claimed[0]["event_id"] == entry["event_id"]

        job = db.get_job(job_id)
        messages = build_messages(job, [], include_unclaimed_operator_messages=False)
        content = messages[-1]["content"]

        assert "Operator context:" in content
        assert "use the corrected target from chat" in content
        assert "delivered" in content

        db.acknowledge_operator_messages(job_id, message_ids=[entry["event_id"]], summary="incorporated correction")
        job = db.get_job(job_id)
        messages = build_messages(job, [], include_unclaimed_operator_messages=False)

        assert "use the corrected target from chat" not in messages[-1]["content"]
    finally:
        db.close()


def test_prompt_keeps_unclaimed_steering_but_not_followup_until_claimed(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Find durable research findings", title="research", kind="generic")
        db.append_operator_message(job_id, "use the corrected target from chat", source="chat", mode="steer")
        db.append_operator_message(job_id, "after this branch settles, write a recap", source="chat", mode="follow_up")

        job = db.get_job(job_id)
        content = build_messages(job, [], include_unclaimed_operator_messages=True)[-1]["content"]

        assert "use the corrected target from chat" in content
        assert "after this branch settles" not in content
    finally:
        db.close()


def test_prompt_includes_context_pressure_constraint():
    job = {
        "title": "context pressure",
        "kind": "generic",
        "objective": "keep a long-running job stable",
        "metadata": {
            "context_pressure": {
                "band": "high",
                "prompt_tokens": 8_600,
                "context_length": 10_000,
                "fraction": 0.86,
            }
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Context pressure:" in content
    assert "Context pressure is high" in content
    assert "8.6K/10.0K" in content
    assert "artifact references" in content


def test_prompt_includes_cumulative_usage_pressure():
    job = {
        "title": "usage pressure",
        "kind": "generic",
        "objective": "keep a long-running job useful",
        "metadata": {
            "finding_ledger": [{"name": "durable fact"}],
            "source_ledger": [{"source": "local evidence"}],
            "experiment_ledger": [{"title": "trial", "metric_value": 1}],
            "task_queue": [{"title": "done branch", "status": "done", "result": "validated"}],
        },
    }

    content = build_messages(
        job,
        [],
        token_usage={
            "calls": 2_100,
            "prompt_tokens": 21_000_000,
            "completion_tokens": 1_000_000,
            "total_tokens": 22_000_000,
            "latest_prompt_tokens": 10_000,
            "latest_context_length": 262_144,
            "cost": 10.25,
            "has_cost": True,
        },
    )[-1]["content"]

    assert "Usage pressure:" in content
    assert "Cumulative model usage pressure is critical" in content
    assert "calls=2100" in content
    assert "tokens=22.0M" in content
    assert "cost=$10.2500" in content
    assert "high leverage" in content


def test_prompt_renders_task_contract_from_metadata_for_existing_tasks():
    job = {
        "title": "contract fallback",
        "kind": "generic",
        "objective": "keep existing task contracts visible",
        "metadata": {
            "task_queue": [
                {
                    "title": "Validate concrete candidate",
                    "status": "active",
                    "priority": 9,
                    "metadata": {"output_contract": "action"},
                    "acceptance_criteria": "candidate tested",
                }
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Task queue:" in content
    assert "Validate concrete candidate" in content
    assert "contract=action" in content


def test_prompt_keeps_persistent_task_backlog_pressure_visible():
    job = {
        "title": "persistent backlog pressure",
        "kind": "generic",
        "objective": "keep a long-running job focused",
        "metadata": {
            "task_backlog_pressure": {
                "reason": "total task queue is too large",
                "open_count": 42,
                "total_count": 81,
                "guard_recovery": {
                    "latest_step_no": 123,
                    "task_queue": {"open_titles": ["Existing branch"]},
                },
            },
            "task_queue": [
                {"title": f"Existing branch {index}", "status": "open", "priority": 9}
                for index in range(81)
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Task queue saturation:" in content
    assert "Task backlog pressure remains active from guard recovery #123" in content
    assert "open_tasks=81" in content
    assert "total_tasks=81" in content
    assert "Do not create new task branches" in content


def test_prompt_shows_current_task_backlog_pressure_without_prior_block():
    job = {
        "title": "large backlog",
        "kind": "generic",
        "objective": "execute a broad long-running job",
        "metadata": {
            "task_queue": [
                {"title": f"Existing branch {index}", "status": "done", "priority": index}
                for index in range(81)
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Task queue saturation:" in content
    assert "Task backlog pressure remains active from current queue #current" in content
    assert "total_tasks=81" in content
    assert "Do not create new task branches" in content


def test_prompt_ignores_stale_task_backlog_pressure_after_queue_is_cleaned_up():
    job = {
        "title": "clean backlog",
        "kind": "generic",
        "objective": "execute a focused long-running job",
        "metadata": {
            "task_backlog_pressure": {
                "reason": "total task queue is too large",
                "open_count": 42,
                "total_count": 80,
                "latest_step_no": 123,
                "source": "blocked_record_tasks",
            },
            "task_queue": [
                {"title": "Focused branch", "status": "active", "priority": 9},
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Task backlog pressure remains active" not in content
    assert "Task queue saturation:\nNone." in content


def test_run_one_step_clears_stale_task_backlog_pressure(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Continue after backlog cleanup",
            title="cleaned-backlog",
            kind="generic",
            metadata={
                "task_backlog_pressure": {
                    "reason": "total task queue is too large",
                    "open_count": 42,
                    "total_count": 80,
                    "source": "blocked_record_tasks",
                },
                "task_queue": [
                    {"title": "Focused branch", "status": "active", "priority": 9},
                ],
            },
        )
        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "continue focused work"})]))

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "completed"
        job = db.get_job(job_id)
        assert job["metadata"]["task_backlog_pressure"] == {}
        assert "Task queue saturation:\nNone." in llm.messages[-1]["content"]
        assert any(
            event["event_type"] == "agent_message"
            and event["title"] == "progress"
            and "Task backlog pressure cleared" in event["body"]
            for event in db.list_events(job_id=job_id, limit=20)
        )
    finally:
        db.close()


def test_run_one_step_records_usage_pressure_without_spam(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=10_000_000))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep a long-running task efficient", title="usage pressure", kind="generic")
        llm = ScriptedLLM([
            LLMResponse(
                tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "consolidate before spending more", "category": "strategy"})],
                usage={"prompt_tokens": 1_100_000, "completion_tokens": 100, "total_tokens": 1_100_100, "cost": 1.1},
            ),
            LLMResponse(
                tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "second consolidation", "category": "strategy"})],
                usage={"prompt_tokens": 300_000, "completion_tokens": 100, "total_tokens": 300_100, "cost": 0.3},
            ),
        ])

        run_one_step(job_id, config=config, db=db, llm=llm)
        run_one_step(job_id, config=config, db=db, llm=llm)

        pressure_events = [
            event
            for event in db.list_events(job_id=job_id, event_types=["agent_message"])
            if event["metadata"].get("kind") == "usage_pressure"
        ]
        assert len(pressure_events) == 1
        assert "Usage pressure watch" in pressure_events[0]["body"]
        job = db.get_job(job_id)
        pressure = job["metadata"]["usage_pressure"]
        assert pressure["band"] == "watch"
        assert pressure["calls"] == 2
        assert pressure["total_tokens"] == 1_400_200
    finally:
        db.close()


def test_critical_usage_does_not_create_automatic_defer(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=262_144))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep a long-running task efficient", title="usage pressure", kind="generic")
        db.append_event(
            job_id,
            event_type="loop",
            title="message_end",
            metadata={"usage": {"prompt_tokens": 21_000_000, "completion_tokens": 10_000, "total_tokens": 21_010_000, "cost": 11.0}},
        )
        llm = ScriptedLLM([
            LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "keep useful work moving", "category": "strategy"})])
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "completed"
        assert result.tool_name == "record_lesson"
        job = db.get_job(job_id)
        assert not job["metadata"].get("defer_until")
        assert "usage_pressure_circuit_breaker" not in job["metadata"]
    finally:
        db.close()


2776def test_prompt_ignores_legacy_usage_pressure_recovery_metadata():
2777 job = {
2778 "title": "usage recovery",
2779 "kind": "generic",
2780 "objective": "Keep long-running work efficient.",
2781 "metadata": {
2782 "usage_pressure_circuit_breaker": {
2783 "latest_step_no": 12,
2784 "streak": 2,
2785 "calls": 2200,
2786 "total_tokens": 25_000_000,
2787 "cost": 12.5,
2788 "has_cost": True,
2789 },
2790 "task_queue": [{"title": "Focused task", "status": "active", "priority": 9}],
2791 },
2792 }
2793 steps = [
2794 {"step_no": 13, "kind": "recovery", "status": "completed", "tool_name": "defer_job", "summary": "legacy cooldown"},
2795 {"step_no": 14, "kind": "tool", "status": "blocked", "tool_name": "web_search", "summary": "blocked search"},
2796 ]
2797
2798 content = build_messages(job, steps)[-1]["content"]
2799
2800 assert "Usage pressure:" in content
2801 assert "Usage pressure recovery" not in content
2802 assert "cooldown is still unresolved" not in content
2803
2804
2805def test_run_one_step_pauses_when_configured_cost_limit_is_reached(tmp_path):
2806 config = AppConfig(
2807 runtime=RuntimeConfig(home=tmp_path, max_job_cost_usd=5.0),
2808 model=ModelConfig(context_length=262_144),
2809 )
2810 db = AgentDB(tmp_path / "state.db")
2811 try:
2812 job_id = db.create_job("Keep a long-running task inside budget", title="budget limit", kind="generic")
2813 db.append_event(
2814 job_id,
2815 event_type="loop",
2816 title="message_end",
2817 metadata={"usage": {"prompt_tokens": 1_000_000, "completion_tokens": 10_000, "total_tokens": 1_010_000, "cost": 5.25}},
2818 )
2819
2820 result = run_one_step(job_id, config=config, db=db, llm=ExplodingLLM())
2821
2822 assert result.status == "completed"
2823 assert result.tool_name == "budget_limit"
2824 assert result.result["paused"] is True
2825 assert result.result["cost"] == 5.25
2826 job = db.get_job(job_id)
2827 assert job["status"] == "paused"
2828 assert job["metadata"]["usage_budget_limit"]["limit"] == 5.0
2829 assert "configured model cost limit" in job["metadata"]["last_note"]
2830 finally:
2831 db.close()
2832
2833
2834def test_run_one_step_ignores_cost_limit_without_provider_cost_metadata(tmp_path):
2835 config = AppConfig(
2836 runtime=RuntimeConfig(home=tmp_path, max_job_cost_usd=5.0),
2837 model=ModelConfig(context_length=262_144),
2838 )
2839 db = AgentDB(tmp_path / "state.db")
2840 try:
2841 job_id = db.create_job("Keep a long-running task inside budget", title="budget estimate", kind="generic")
2842 db.append_event(
2843 job_id,
2844 event_type="loop",
2845 title="message_end",
2846 metadata={"usage": {"prompt_tokens": 1_000_000, "completion_tokens": 10_000, "total_tokens": 1_010_000}},
2847 )
2848 llm = ScriptedLLM([
2849 LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "cost not provider reported"})])
2850 ])
2851
2852 result = run_one_step(job_id, config=config, db=db, llm=llm)
2853
2854 assert result.status == "completed"
2855 assert result.tool_name == "report_update"
2856 assert db.get_job(job_id)["status"] == "running"
2857 finally:
2858 db.close()
2859
2860
2861def test_run_one_step_does_not_defer_critical_usage_after_progress(tmp_path):
2862 config = AppConfig(runtime=RuntimeConfig(home=tmp_path), model=ModelConfig(context_length=262_144))
2863 db = AgentDB(tmp_path / "state.db")
2864 try:
2865 job_id = db.create_job("Keep a long-running task efficient", title="usage progress", kind="generic")
2866 db.append_event(
2867 job_id,
2868 event_type="loop",
2869 title="message_end",
2870 metadata={"usage": {"prompt_tokens": 21_000_000, "completion_tokens": 10_000, "total_tokens": 21_010_000, "cost": 11.0}},
2871 )
2872 for error_type in ["APITimeoutError", "APITimeoutError"]:
2873 run_id = db.start_run(job_id, model="test")
2874 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="llm", status="failed")
2875 db.finish_step(
2876 step_id,
2877 status="failed",
2878 summary=f"model call failed: {error_type}",
2879 output_data={"success": False, "error": "timeout", "error_type": error_type},
2880 error="timeout",
2881 )
2882 db.finish_run(run_id, "failed", error="timeout")
2883 run_id = db.start_run(job_id, model="test")
2884 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
2885 db.finish_step(
2886 step_id,
2887 status="completed",
2888 summary="recorded measured result",
2889 output_data={
2890 "success": True,
2891 "experiment": {
2892 "title": "measured result",
2893 "status": "measured",
2894 "metric_name": "score",
2895 "metric_value": 1.0,
2896 },
2897 },
2898 )
2899 db.finish_run(run_id, "completed")
2900 llm = ScriptedLLM([
2901 LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "continuing from measured progress"})])
2902 ])
2903
2904 result = run_one_step(job_id, config=config, db=db, llm=llm)
2905
2906 assert result.status == "completed"
2907 assert result.tool_name == "report_update"
2908 finally:
2909 db.close()
2910
2911
2912def test_run_one_step_drops_conversation_only_chat_from_worker_prompt(tmp_path):
2913 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
2914 db = AgentDB(tmp_path / "state.db")
2915 try:
2916 job_id = db.create_job("Keep improving a generic task", title="context", kind="generic")
2917 chat = db.append_operator_message(job_id, "hello", source="chat")
2918 correction = db.append_operator_message(job_id, "use the corrected target from chat", source="chat")
2919 llm = CapturingLLM(
2920 LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "noted", "category": "progress"})])
2921 )
2922
2923 run_one_step(job_id, config=config, db=db, llm=llm)
2924
2925 content = llm.messages[-1]["content"]
2926 assert "hello" not in content
2927 assert "use the corrected target from chat" in content
2928 job = db.get_job(job_id)
2929 messages = {entry["event_id"]: entry for entry in job["metadata"]["operator_messages"]}
2930 assert messages[chat["event_id"]]["acknowledged_at"]
2931 assert messages[correction["event_id"]]["claimed_at"]
2932 assert not messages[correction["event_id"]].get("acknowledged_at")
2933 finally:
2934 db.close()
2935
2936
2937def test_build_messages_keeps_generic_context_under_budget():
2938 job = {
2939 "title": "large context",
2940 "kind": "generic",
2941 "objective": "Improve a measurable process without looping.",
2942 "metadata": {
2943 "operator_messages": [
2944 {"event_id": "chat", "mode": "steer", "message": "how is it going?"},
2945 {"event_id": "use", "mode": "steer", "message": "use the corrected target from chat"},
2946 ],
2947 "lessons": [{"category": "memory", "lesson": "lesson " + "x" * 700} for _ in range(30)],
2948 "task_queue": [
2949 {
2950 "title": f"Task {index}",
2951 "status": "open" if index % 3 else "done",
2952 "priority": index,
2953 "output_contract": "experiment",
2954 "acceptance_criteria": "accept " + "x" * 500,
2955 "evidence_needed": "evidence " + "x" * 500,
2956 "stall_behavior": "stall " + "x" * 500,
2957 }
2958 for index in range(40)
2959 ],
2960 "finding_ledger": [{"name": f"Finding {index}", "category": "generic", "score": index} for index in range(200)],
2961 "source_ledger": [
2962 {
2963 "source": f"https://source{index}.example",
2964 "source_type": "web",
2965 "usefulness_score": index / 100,
2966 "yield_count": index % 4,
2967 "fail_count": index % 3,
2968 "last_outcome": "outcome " + "x" * 500,
2969 }
2970 for index in range(90)
2971 ],
2972 "experiment_ledger": [
2973 {
2974 "title": f"Experiment {index}",
2975 "status": "measured",
2976 "metric_name": "score",
2977 "metric_value": index,
2978 "metric_unit": "units",
2979 "best_observed": index in {38, 39},
2980 "result": "result " + "x" * 600,
2981 "next_action": "next " + "x" * 600,
2982 }
2983 for index in range(40)
2984 ],
2985 "reflections": [{"summary": "summary " + "x" * 800, "strategy": "strategy " + "x" * 800} for _ in range(20)],
2986 },
2987 }
2988 steps = [
2989 {
2990 "step_no": index,
2991 "kind": "tool",
2992 "status": "completed",
2993 "tool_name": "shell_exec",
2994 "summary": "summary " + "x" * 800,
2995 "input": {"arguments": {"command": "command " + "x" * 800}},
2996 "output": {"success": True, "command": "command", "returncode": 0, "stdout": "stdout " + "x" * 3000},
2997 }
2998 for index in range(30)
2999 ]
3000 memory_entries = [{"key": "rolling_state", "summary": "memory " + "x" * 20000, "artifact_refs": [f"art_{i}" for i in range(40)]}]
3001 timeline = [{"event_type": "tool_result", "title": "event", "body": "body " + "x" * 900} for _ in range(40)]
3002
3003 messages = build_messages(job, steps, memory_entries=memory_entries, timeline_events=timeline)
3004 content = messages[-1]["content"]
3005
3006 assert "use the corrected target from chat" in content
3007 assert "how is it going" not in content
3008 assert len(content) < MAX_WORKER_PROMPT_CHARS
3009 assert "Next-action constraint:" in content
3010
3011
3012def test_prompt_timeline_filters_low_signal_tool_noise():
3013 job = {
3014 "title": "timeline",
3015 "kind": "generic",
3016 "objective": "keep useful context visible",
3017 "metadata": {},
3018 }
3019 timeline = [
3020 {
3021 "event_type": "tool_result",
3022 "title": "web_search",
3023 "body": f"search noise {index}",
3024 "metadata": {"status": "completed"},
3025 "created_at": f"2026-05-01T12:{index:02d}:00+00:00",
3026 }
3027 for index in range(20)
3028 ]
3029 timeline.extend([
3030 {
3031 "event_type": "artifact",
3032 "title": "Saved durable report",
3033 "body": "operator-visible output",
3034 "metadata": {},
3035 "created_at": "2026-05-01T13:00:00+00:00",
3036 },
3037 {
3038 "event_type": "finding",
3039 "title": "Useful durable finding",
3040 "body": "result worth keeping",
3041 "metadata": {},
3042 "created_at": "2026-05-01T13:01:00+00:00",
3043 },
3044 {
3045 "event_type": "tool_result",
3046 "title": "shell_exec",
3047 "body": "command failed with actionable blocker",
3048 "metadata": {"status": "failed"},
3049 "created_at": "2026-05-01T13:02:00+00:00",
3050 },
3051 ])
3052
3053 content = build_messages(job, [], timeline_events=timeline)[-1]["content"]
3054
3055 assert "Recent visible timeline:" in content
3056 assert "High-signal timeline counts:" in content
3057 assert "Saved durable report" in content
3058 assert "Useful durable finding" in content
3059 assert "command failed with actionable blocker" in content
3060 assert "search noise" not in content
3061
3062
3063def test_prompt_includes_durable_outcome_summary():
3064 job = {
3065 "title": "outcomes",
3066 "kind": "generic",
3067 "objective": "keep useful durable progress visible",
3068 "metadata": {},
3069 }
3070 events = [
3071 {
3072 "event_type": "artifact",
3073 "title": "Draft checkpoint",
3074 "body": "",
3075 "metadata": {},
3076 },
3077 {
3078 "event_type": "finding",
3079 "title": "Reusable finding",
3080 "body": "",
3081 "metadata": {},
3082 },
3083 {
3084 "event_type": "experiment",
3085 "title": "Quality check",
3086 "body": "",
3087 "metadata": {"metric_name": "score", "metric_value": 0.82, "metric_unit": ""},
3088 },
3089 {
3090 "event_type": "tool_result",
3091 "title": "web_search",
3092 "body": "web_search query='background' returned 5 results",
3093 "metadata": {"status": "completed"},
3094 },
3095 ]
3096
3097 content = build_messages(job, [], timeline_events=events)[-1]["content"]
3098 outcome_section = content.split("Durable outcomes:", 1)[1].split("Ledgers:", 1)[0]
3099
3100 assert "Outcome counts: 1 outputs 1 findings 1 measurements." in outcome_section
3101 assert "save: Draft checkpoint" in outcome_section
3102 assert "find: Reusable finding" in outcome_section
3103 assert "test: Quality check" in outcome_section
3104 assert "background" not in outcome_section
3105
3106
3107def test_emergency_prompt_clipping_repeats_operator_and_next_action():
3108 job = {"title": "clip", "kind": "generic", "objective": "keep context safe"}
3109 sections = [(f"Noise {index}", "noise " * 2000) for index in range(90)]
3110 sections.insert(45, ("Operator context", "Still-active durable operator context: use the corrected target."))
3111 sections.append(("Next-action constraint", "Next use the validated branch."))
3112
3113 content = _render_worker_prompt(job, sections=sections)
3114
3115 assert len(content) <= MAX_WORKER_PROMPT_CHARS
3116 assert "middle context clipped" in content
3117 suffix = content.split("middle context clipped", 1)[1]
3118 assert "Operator context:" in suffix
3119 assert "use the corrected target" in suffix
3120 assert "Next-action constraint:" in suffix
3121 assert "Next use the validated branch" in suffix
3122
3123
3124def test_build_messages_keeps_rolling_memory_when_not_first():
3125 job = {"title": "memory order", "kind": "generic", "objective": "keep long-running context stable"}
3126 memory_entries = [
3127 {"key": "newer_note", "summary": "newer side note"},
3128 {"key": "other_note", "summary": "less important side note"},
3129 {"key": "rolling_state", "summary": "durable rolling state with usage and task progress"},
3130 ]
3131
3132 content = build_messages(job, [], memory_entries=memory_entries)[-1]["content"]
3133
3134 assert "durable rolling state with usage and task progress" in content
3135 assert "newer side note" in content
3136 assert "less important side note" not in content
3137
3138
3139def test_build_messages_surfaces_recent_measurement_evidence_outside_state_window():
3140 job = {"title": "measure", "kind": "generic", "objective": "improve a measurable process", "metadata": {}}
3141 recent_steps = [
3142 {
3143 "step_no": 1,
3144 "kind": "tool",
3145 "status": "completed",
3146 "tool_name": "shell_exec",
3147 "input": {"arguments": {"command": "run benchmark"}},
3148 "output": {
3149 "success": True,
3150 "stdout": (
3151 "| model | test | t/s |\n"
3152 "| --- | ---: | ---: |\n"
3153 "| example | pp32 | 5.48 ± 0.11 |\n"
3154 "| example | tg128 | 3.44 ± 0.05 |\n"
3155 ),
3156 },
3157 },
3158 *[
3159 {
3160 "step_no": index,
3161 "kind": "tool",
3162 "status": "completed",
3163 "tool_name": "record_lesson",
3164 "summary": f"later step {index}",
3165 "input": {},
3166 "output": {},
3167 }
3168 for index in range(2, 12)
3169 ],
3170 ]
3171
3172 content = build_messages(job, recent_steps)[-1]["content"]
3173
3174 assert "Recent measurement evidence:" in content
3175 assert "step #1 completed" in content
3176 assert "pp32 5.48 ± 0.11 t/s" in content
3177 assert "tg128 3.44 ± 0.05 t/s" in content
3178
3179
3180def test_measurement_obligation_blocks_research_until_recorded(tmp_path):
3181 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3182 db = AgentDB(tmp_path / "state.db")
3183 try:
3184 job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3185
3186 first = run_one_step(
3187 job_id,
3188 config=config,
3189 db=db,
3190 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run test"})])]),
3191 registry=MeasuredShellRegistry(),
3192 )
3193 job = db.get_job(job_id)
3194 assert first.tool_name == "shell_exec"
3195 assert job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3196
3197 second = run_one_step(
3198 job_id,
3199 config=config,
3200 db=db,
3201 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more notes"})])]),
3202 registry=MeasuredShellRegistry(),
3203 )
3204 assert second.status == "blocked"
3205 assert second.result["error"] == "measurement obligation pending"
3206
3207 third = run_one_step(
3208 job_id,
3209 config=config,
3210 db=db,
3211 llm=ScriptedLLM([
3212 LLMResponse(tool_calls=[
3213 ToolCall(
3214 name="record_experiment",
3215 arguments={
3216 "title": "measured trial",
3217 "status": "measured",
3218 "metric_name": "score",
3219 "metric_value": 2.7,
3220 "metric_unit": "units/s",
3221 "next_action": "compare the next concrete variant",
3222 },
3223 )
3224 ])
3225 ]),
3226 )
3227 job = db.get_job(job_id)
3228 assert third.tool_name == "record_experiment"
3229 assert job["metadata"].get("pending_measurement_obligation") == {}
3230 assert job["metadata"]["experiment_ledger"][0]["metric_value"] == 2.7
3231 finally:
3232 db.close()
3233
3234
3235def test_measurement_obligation_preserves_table_metric_candidates(tmp_path):
3236 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3237 db = AgentDB(tmp_path / "state.db")
3238 try:
3239 job_id = db.create_job("Improve a measurable process", title="measure-table", kind="generic")
3240
3241 step = run_one_step(
3242 job_id,
3243 config=config,
3244 db=db,
3245 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run benchmark"})])]),
3246 registry=TableBenchmarkShellRegistry(),
3247 )
3248
3249 job = db.get_job(job_id)
3250 candidates = job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3251 assert step.tool_name == "shell_exec"
3252 assert "pp32 5.48 ± 0.11 t/s" in candidates
3253 assert "tg128 3.44 ± 0.05 t/s" in candidates
3254 finally:
3255 db.close()
3256
3257
3258def test_failed_shell_measurement_output_still_requires_accounting(tmp_path):
3259 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3260 db = AgentDB(tmp_path / "state.db")
3261 try:
3262 job_id = db.create_job("Improve a measurable process", title="measure-failed-table", kind="generic")
3263
3264 step = run_one_step(
3265 job_id,
3266 config=config,
3267 db=db,
3268 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run benchmark"})])]),
3269 registry=FailedTableBenchmarkShellRegistry(),
3270 )
3271
3272 job = db.get_job(job_id)
3273 candidates = job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3274 assert step.status == "failed"
3275 assert "pp32 5.48 ± 0.11 t/s" in candidates
3276 finally:
3277 db.close()
3278
3279
3280def test_measurement_obligation_blocks_operator_acknowledgement_churn(tmp_path):
3281 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3282 db = AgentDB(tmp_path / "state.db")
3283 try:
3284 job_id = db.create_job("Improve a measurable process", title="measure-ack", kind="generic")
3285 db.update_job_metadata(
3286 job_id,
3287 {
3288 "pending_measurement_obligation": {
3289 "source_step_no": 12,
3290 "tool": "shell_exec",
3291 "metric_candidates": ["2.7 tok/s"],
3292 "command": "run benchmark",
3293 }
3294 },
3295 )
3296
3297 result = run_one_step(
3298 job_id,
3299 config=config,
3300 db=db,
3301 llm=ScriptedLLM([
3302 LLMResponse(tool_calls=[
3303 ToolCall(
3304 name="acknowledge_operator_context",
3305 arguments={"message_ids": [], "summary": "acknowledged"},
3306 )
3307 ])
3308 ]),
3309 )
3310
3311 assert result.status == "blocked"
3312 assert result.tool_name == "acknowledge_operator_context"
3313 assert result.result["error"] == "measurement obligation pending"
3314 finally:
3315 db.close()
3316
3317
3318def test_pending_measurement_narrows_available_tools(tmp_path):
3319 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3320 db = AgentDB(tmp_path / "state.db")
3321 try:
3322 job_id = db.create_job("Improve a measurable process", title="measure-tools", kind="generic")
3323 db.update_job_metadata(
3324 job_id,
3325 {
3326 "pending_measurement_obligation": {
3327 "source_step_no": 12,
3328 "tool": "shell_exec",
3329 "metric_candidates": ["2.7 tok/s"],
3330 "command": "run benchmark",
3331 }
3332 },
3333 )
3334 llm = CapturingLLM(
3335 LLMResponse(tool_calls=[
3336 ToolCall(
3337 name="record_experiment",
3338 arguments={
3339 "title": "measured trial",
3340 "status": "measured",
3341 "metric_name": "speed",
3342 "metric_value": 2.7,
3343 "metric_unit": "tok/s",
3344 "next_action": "try the next measured branch",
3345 },
3346 )
3347 ])
3348 )
3349
3350 run_one_step(job_id, config=config, db=db, llm=llm)
3351
3352 tool_names = {tool["function"]["name"] for tool in llm.tools}
3353 assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3354 assert "shell_exec" not in tool_names
3355 assert "web_search" not in tool_names
3356 assert "acknowledge_operator_context" not in tool_names
3357 finally:
3358 db.close()
3359
3360
3361def test_resolution_tools_survive_task_saturation_suppression(tmp_path):
3362 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3363 db = AgentDB(tmp_path / "state.db")
3364 try:
3365 job_id = db.create_job(
3366 "Improve a measurable process",
3367 title="measure-tools-after-saturation",
3368 kind="generic",
3369 metadata={
3370 "pending_measurement_obligation": {
3371 "source_step_no": 12,
3372 "tool": "shell_exec",
3373 "metric_candidates": ["2.7 tok/s"],
3374 "command": "run benchmark",
3375 }
3376 },
3377 )
3378 run_id = db.start_run(job_id, model="fake")
3379 for step_no in range(2):
3380 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
3381 db.finish_step(
3382 step_id,
3383 status="blocked",
3384 summary="blocked record_tasks; task queue saturated",
3385 output_data={
3386 "success": False,
3387 "error": "task queue saturated",
3388 "task_queue": {"reason": "total task queue is too large", "total_count": 80 + step_no},
3389 },
3390 )
3391 llm = CapturingLLM(
3392 LLMResponse(tool_calls=[
3393 ToolCall(
3394 name="record_lesson",
3395 arguments={"lesson": "Measurement is blocked until the current branch is reconciled."},
3396 )
3397 ])
3398 )
3399
3400 run_one_step(job_id, config=config, db=db, llm=llm)
3401
3402 tool_names = {tool["function"]["name"] for tool in llm.tools}
3403 assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
3404 assert "web_search" not in tool_names
3405 finally:
3406 db.close()
3407
3408
3409def test_pending_evidence_checkpoint_narrows_available_tools(tmp_path):
3410 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3411 db = AgentDB(tmp_path / "state.db")
3412 try:
3413 job_id = db.create_job("Account for checkpointed evidence", title="checkpoint-tools", kind="generic")
3414 db.update_job_metadata(
3415 job_id,
3416 {
3417 "pending_evidence_checkpoint": {
3418 "artifact_id": "art_checkpoint",
3419 "title": "Checkpoint",
3420 "evidence_step_no": 12,
3421 "blocked_tool": "shell_exec",
3422 "created_at": "2026-01-01T00:00:00+00:00",
3423 }
3424 },
3425 )
3426 llm = CapturingLLM(
3427 LLMResponse(tool_calls=[
3428 ToolCall(name="record_lesson", arguments={"lesson": "checkpoint accounted for", "category": "memory"})
3429 ])
3430 )
3431
3432 run_one_step(job_id, config=config, db=db, llm=llm)
3433
3434 tool_names = {tool["function"]["name"] for tool in llm.tools}
3435 assert "read_artifact" in tool_names
3436 assert {"record_findings", "record_source", "record_lesson", "record_experiment"}.issubset(tool_names)
3437 assert "record_tasks" not in tool_names
3438 assert "shell_exec" not in tool_names
3439 assert "web_search" not in tool_names
3440 assert "acknowledge_operator_context" not in tool_names
3441 finally:
3442 db.close()
3443
3444
3445def test_acknowledge_operator_context_hidden_without_active_operator_context(tmp_path):
3446 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3447 db = AgentDB(tmp_path / "state.db")
3448 try:
3449 job_id = db.create_job("Run ordinary autonomous work", title="no-operator", kind="generic")
3450 llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "working"})]))
3451
3452 run_one_step(job_id, config=config, db=db, llm=llm)
3453
3454 tool_names = {tool["function"]["name"] for tool in llm.tools}
3455 assert "acknowledge_operator_context" not in tool_names
3456 finally:
3457 db.close()
3458
3459
3460def test_acknowledge_operator_context_visible_with_active_operator_context(tmp_path):
3461 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3462 db = AgentDB(tmp_path / "state.db")
3463 try:
3464 job_id = db.create_job("Run with operator steering", title="operator", kind="generic")
3465 db.append_operator_message(job_id, "use the corrected target", source="chat")
3466 llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="report_update", arguments={"message": "working"})]))
3467
3468 run_one_step(job_id, config=config, db=db, llm=llm)
3469
3470 tool_names = {tool["function"]["name"] for tool in llm.tools}
3471 assert "acknowledge_operator_context" in tool_names
3472 finally:
3473 db.close()
3474
3475
3476def test_diagnostic_shell_output_does_not_create_measurement_obligation(tmp_path):
3477 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3478 db = AgentDB(tmp_path / "state.db")
3479 try:
3480 job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3481
3482 result = run_one_step(
3483 job_id,
3484 config=config,
3485 db=db,
3486 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "df -h && nproc && free -h"})])]),
3487 registry=DiagnosticShellRegistry(),
3488 )
3489
3490 job = db.get_job(job_id)
3491 assert result.tool_name == "shell_exec"
3492 assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3493 finally:
3494 db.close()
3495
3496
3497def test_source_code_shell_output_does_not_create_measurement_obligation(tmp_path):
3498 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3499 db = AgentDB(tmp_path / "state.db")
3500 try:
3501 job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3502
3503 result = run_one_step(
3504 job_id,
3505 config=config,
3506 db=db,
3507 llm=ScriptedLLM([
3508 LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "git show HEAD:nipux_cli/cli.py"})])
3509 ]),
3510 registry=SourceCodeShellRegistry(),
3511 )
3512
3513 job = db.get_job(job_id)
3514 assert result.tool_name == "shell_exec"
3515 assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3516 finally:
3517 db.close()
3518
3519
3520def test_prose_from_timed_command_does_not_create_measurement_obligation(tmp_path):
3521 class ProseShellRegistry:
3522 def openai_tools(self):
3523 return []
3524
3525 def handle(self, name, args, ctx):
3526 del args, ctx
3527 if name == "shell_exec":
3528 return json.dumps({
3529 "success": True,
3530 "command": "time cat draft.txt",
3531 "returncode": 0,
3532 "stdout": 'This draft says "time". 2 examples are listed. It asks readers to rate a story.',
3533 "stderr": "",
3534 })
3535 return json.dumps({"success": True})
3536
3537 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3538 db = AgentDB(tmp_path / "state.db")
3539 try:
3540 job_id = db.create_job("Improve a measurable process", title="measure", kind="generic")
3541
3542 result = run_one_step(
3543 job_id,
3544 config=config,
3545 db=db,
3546 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "time cat draft.txt"})])]),
3547 registry=ProseShellRegistry(),
3548 )
3549
3550 job = db.get_job(job_id)
3551 assert result.tool_name == "shell_exec"
3552 assert job["metadata"].get("pending_measurement_obligation") in (None, {})
3553 finally:
3554 db.close()
3555
3556
3557def test_large_shell_output_must_be_saved_before_more_shell_churn(tmp_path):
3558 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3559 db = AgentDB(tmp_path / "state.db")
3560 try:
3561 job_id = db.create_job("Audit a repository", title="audit", kind="generic")
3562
3563 first = run_one_step(
3564 job_id,
3565 config=config,
3566 db=db,
3567 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -type f"})])]),
3568 registry=LargeShellEvidenceRegistry(),
3569 )
3570 second = run_one_step(
3571 job_id,
3572 config=config,
3573 db=db,
3574 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -name '*.md'"})])]),
3575 registry=LargeShellEvidenceRegistry(),
3576 )
3577
3578 assert first.tool_name == "shell_exec"
3579 assert second.status == "blocked"
3580 assert second.result["error"] == "artifact required before more research"
3581 finally:
3582 db.close()
3583
3584
3585def test_stale_diagnostic_measurement_obligation_is_cleared(tmp_path):
3586 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3587 db = AgentDB(tmp_path / "state.db")
3588 try:
3589 job_id = db.create_job(
3590 "Improve a measurable process",
3591 title="measure",
3592 kind="generic",
3593 metadata={
3594 "pending_measurement_obligation": {
3595 "source_step_no": 1,
3596 "command": "df -h && nproc && free -h",
3597 "metric_candidates": ["CPU COUNT 24", "RAM 93"],
3598 }
3599 },
3600 )
3601
3602 result = run_one_step(
3603 job_id,
3604 config=config,
3605 db=db,
3606 llm=ScriptedLLM([
3607 LLMResponse(tool_calls=[
3608 ToolCall(
3609 name="record_lesson",
3610 arguments={
3611 "lesson": "The stale output is diagnostic context, not a valid measurement; rerun with a metric.",
3612 "category": "memory",
3613 },
3614 )
3615 ])
3616 ]),
3617 )
3618
3619 job = db.get_job(job_id)
3620 assert result.tool_name == "record_lesson"
3621 assert job["metadata"].get("pending_measurement_obligation") == {}
3622 assert "diagnostic context" in job["metadata"]["last_agent_update"]["message"]
3623 finally:
3624 db.close()
3625
3626
3627def test_measurable_objective_blocks_research_after_budget_but_allows_action(tmp_path):
3628 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3629 db = AgentDB(tmp_path / "state.db")
3630 try:
3631 job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3632 for index in range(19):
3633 run_id = db.start_run(job_id)
3634 step_id = db.add_step(
3635 job_id=job_id,
3636 run_id=run_id,
3637 kind="tool",
3638 tool_name="web_search" if index % 2 == 0 else "web_extract",
3639 input_data={"arguments": {"query": f"research branch {index}"}},
3640 )
3641 db.finish_step(step_id, status="completed", output_data={"success": True})
3642 db.finish_run(run_id, "completed")
3643
3644 blocked = run_one_step(
3645 job_id,
3646 config=config,
3647 db=db,
3648 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more research"})])]),
3649 registry=MeasuredShellRegistry(),
3650 )
3651 assert blocked.status == "blocked"
3652 assert blocked.result["error"] == "measured progress required"
3653
3654 action = run_one_step(
3655 job_id,
3656 config=config,
3657 db=db,
3658 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run test"})])]),
3659 registry=MeasuredShellRegistry(),
3660 )
3661 job = db.get_job(job_id)
3662 assert action.status == "completed"
3663 assert action.tool_name == "shell_exec"
3664 assert job["metadata"]["pending_measurement_obligation"]["metric_candidates"]
3665 finally:
3666 db.close()
3667
3668
3669def test_measurable_objective_blocks_shell_churn_without_experiment_accounting(tmp_path):
3670 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
3671 db = AgentDB(tmp_path / "state.db")
3672 try:
3673 job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
3674 for index in range(4):
3675 run_id = db.start_run(job_id)
3676 step_id = db.add_step(
3677 job_id=job_id,
3678 run_id=run_id,
3679 kind="tool",
3680 tool_name="shell_exec",
3681 input_data={"arguments": {"command": f"probe {index}"}},
3682 )
3683 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "no metric"})
3684 db.finish_run(run_id, "completed")
3685
3686 blocked = run_one_step(
3687 job_id,
3688 config=config,
3689 db=db,
3690 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "probe again"})])]),
3691 registry=MeasuredShellRegistry(),
3692 )
3693
3694 assert blocked.status == "blocked"
3695 assert blocked.result["error"] == "measured progress required"
3696 finally:
3697 db.close()
3698
3699
def test_measured_progress_guard_narrows_available_tools_after_shell_budget(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable process", title="measured-tools", kind="generic")
        for index in range(4):
            run_id = db.start_run(job_id)
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="shell_exec",
                input_data={"arguments": {"command": f"probe {index}"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "no metric"})
            db.finish_run(run_id, "completed")
        llm = CapturingLLM(
            LLMResponse(tool_calls=[
                ToolCall(
                    name="record_lesson",
                    arguments={"lesson": "measurement blocked after probes", "category": "blocker"},
                )
            ])
        )

        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
        assert "shell_exec" not in tool_names
        assert "write_artifact" not in tool_names
        assert "web_search" not in tool_names
    finally:
        db.close()


def test_measured_progress_guard_keeps_shell_available_before_shell_budget(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable process", title="measured-tools", kind="generic")
        for index in range(19):
            run_id = db.start_run(job_id)
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="web_search" if index % 2 == 0 else "web_extract",
                input_data={"arguments": {"query": f"research branch {index}"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
            db.finish_run(run_id, "completed")
        llm = CapturingLLM(
            LLMResponse(tool_calls=[
                ToolCall(
                    name="record_lesson",
                    arguments={"lesson": "convert research budget into a measured trial", "category": "strategy"},
                )
            ])
        )

        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        assert "shell_exec" in tool_names
        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
        assert "write_artifact" not in tool_names
        assert "web_search" not in tool_names
    finally:
        db.close()


def test_measured_progress_guard_ignores_non_measurement_task_updates(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
        for index in range(18):
            run_id = db.start_run(job_id)
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="web_search",
                input_data={"arguments": {"query": f"research branch {index}"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
            db.finish_run(run_id, "completed")
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "tasks": [{"title": "Write notes", "status": "open", "output_contract": "report"}],
            },
        )
        db.finish_run(run_id, "completed")

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more notes"})])]),
            registry=MeasuredShellRegistry(),
        )

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "measured progress required"
    finally:
        db.close()


def test_measured_progress_guard_accepts_measurement_task_update(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable process", title="measured", kind="generic")
        for index in range(18):
            run_id = db.start_run(job_id)
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="web_search",
                input_data={"arguments": {"query": f"research branch {index}"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
            db.finish_run(run_id, "completed")
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_tasks")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "tasks": [{"title": "Run measured variant", "status": "open", "output_contract": "experiment"}],
            },
        )
        db.finish_run(run_id, "completed")

        allowed = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "run measured variant"})])]),
            registry=MeasuredShellRegistry(),
        )

        assert allowed.status == "completed"
        assert allowed.tool_name == "shell_exec"
    finally:
        db.close()


def test_measurable_objective_allows_candidate_file_validation_shell_after_budget(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable file-backed process", title="measured-file", kind="generic")
        db.update_job_metadata(
            job_id,
            {
                "task_queue": [
                    {
                        "title": "Validate candidate file and benchmark",
                        "status": "active",
                        "acceptance_criteria": "Exact candidate path is validated before benchmarking.",
                        "evidence_needed": "Shell output showing file size for /srv/models/AlphaModel-99-Q4.foo.",
                        "output_contract": "experiment",
                    }
                ]
            },
        )
        for index in range(4):
            run_id = db.start_run(job_id)
            stdout = "no metric"
            if index == 0:
                stdout = "/srv/models/AlphaModel-99-Q4.foo\n"
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="shell_exec",
                input_data={"arguments": {"command": f"probe {index}"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": stdout})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="shell_exec",
                        arguments={"command": "ls -lh /srv/models/AlphaModel-99-Q4.foo && file /srv/models/AlphaModel-99-Q4.foo"},
                    )
                ])
            ]),
            registry=MeasuredShellRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_prompt_includes_durable_lessons():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "find research",
        "metadata": {
            "lessons": [{
                "category": "source_quality",
                "lesson": "Low-evidence pages are background noise, not durable findings.",
            }],
        },
    }

    messages = build_messages(job, [])

    content = messages[-1]["content"]
    assert "Lessons learned:" in content
    assert "Low-evidence pages are background noise" in content


def test_prompt_suppresses_stale_negative_lessons_when_positive_durable_evidence_exists():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep facts current",
        "metadata": {
            "finding_ledger": [
                {
                    "name": "Observed local model",
                    "reason": "ModelX-99 appears in the local model list with size 17 GB.",
                }
            ],
            "lessons": [
                {
                    "category": "strategy",
                    "lesson": (
                        "No ModelX-99 model has been successfully downloaded, so keep the download "
                        "branch as the primary blocker before benchmark work."
                    ),
                }
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Potentially stale negative lesson suppressed for ModelX-99" in content
    assert "No ModelX-99 model has been successfully downloaded" not in content


def test_prompt_keeps_negative_lessons_when_durable_evidence_is_negative():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep facts current",
        "metadata": {
            "finding_ledger": [
                {
                    "name": "Missing local model",
                    "reason": "ls cannot access ModelX-99: no such file or directory.",
                }
            ],
            "lessons": [
                {
                    "category": "strategy",
                    "lesson": "No ModelX-99 file exists in the checked path; use a different observed source.",
                }
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "No ModelX-99 file exists" in content
    assert "Potentially stale negative lesson suppressed" not in content


def test_prompt_includes_memory_graph_slice():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep improving the output",
        "metadata": {
            "memory_graph": {
                "nodes": [
                    {
                        "key": "validated-checkpoints",
                        "title": "Validated checkpoints compound progress",
                        "kind": "strategy",
                        "status": "active",
                        "summary": "Save evidence, validate it, then branch from the gap.",
                        "salience": 0.95,
                        "tags": ["progress"],
                        "evidence_refs": ["art_1"],
                    },
                    {
                        "key": "weak-source",
                        "title": "Weak source path",
                        "kind": "source",
                        "status": "deprecated",
                        "summary": "This path produced low-yield repeats.",
                        "salience": 0.1,
                    },
                ],
                "edges": [
                    {
                        "from_key": "validated-checkpoints",
                        "to_key": "weak-source",
                        "relation": "replaces",
                    }
                ],
            }
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Memory graph:" in content
    assert "Validated checkpoints compound progress" in content
    assert "strategy" in content
    assert "replaces -> weak-source" in content
    assert "art_1" in content


def test_prompt_suppresses_memory_graph_nodes_matching_stale_claim_tokens(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Prefer current durable evidence", title="stale-graph", kind="generic")
        db.append_lesson(
            job_id,
            "Evidence grounding rejected unsupported concrete tokens for record_experiment: E5-2690, v3. Treat matching prior ledger claims as stale.",
            category="mistake",
        )
        db.append_memory_graph_records(
            job_id,
            nodes=[
                {
                    "key": "old-hardware",
                    "title": "Intel Xeon E5-2690 v3 baseline",
                    "kind": "fact",
                    "status": "stable",
                    "summary": "Old baseline that should not enter the prompt after contradiction.",
                },
                {
                    "key": "current-evidence",
                    "title": "Current observed hardware needs verification",
                    "kind": "fact",
                    "status": "active",
                    "summary": "Continue from fresh shell evidence only.",
                },
            ],
        )

        job = db.get_job(job_id)
        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]

        assert "Suppressed 1 stale memory node" in content
        assert "Current observed hardware needs verification" in content
        assert "Intel Xeon E5-2690 v3 baseline" not in content
    finally:
        db.close()


def test_prompt_suppresses_negative_memory_graph_nodes_matching_stale_file_type(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Prefer current file evidence", title="stale-file-type", kind="generic")
        db.update_job_metadata(
            job_id,
            {
                "stale_negative_records": [
                    {
                        "kind": "memory_node",
                        "record_id": "old-absence",
                        "token": ".foo",
                        "evidence": "/srv/models/AlphaModel.foo",
                    }
                ]
            },
        )
        db.append_memory_graph_records(
            job_id,
            nodes=[
                {
                    "key": "download-blocker",
                    "title": "Model file download critical blocker",
                    "kind": "constraint",
                    "status": "active",
                    "summary": "FOO model download attempts return 0 files. All downstream work is blocked until a model file exists locally.",
                },
                {
                    "key": "format-skill",
                    "title": "FOO format tuning",
                    "kind": "skill",
                    "status": "active",
                    "summary": "Use the FOO runtime flags after a valid file is selected.",
                },
            ],
        )

        job = db.get_job(job_id)
        content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]

        assert "Suppressed 1 stale memory node" in content
        assert "Model file download critical blocker" not in content
        assert "FOO format tuning" in content
    finally:
        db.close()


def test_prompt_pushes_memory_graph_consolidation_when_ledgers_exist_without_nodes():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep improving the output",
        "metadata": {
            "lessons": [{"lesson": "Prefer validated checkpoints.", "category": "strategy"}],
            "experiment_ledger": [{"title": "Trial", "status": "measured"}],
            "task_queue": [{"title": "Next branch", "status": "open"}],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "No memory graph yet" in content
    assert "Durable ledgers already contain 3 reusable item" in content
    assert "record_memory_graph" in content


def test_prompt_adds_memory_consolidation_guard_when_graph_lags_ledgers():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep improving the output",
        "metadata": {
            "lessons": [
                {"lesson": "Use validated checkpoints.", "category": "strategy"},
                {"lesson": "Reject low-yield branches.", "category": "strategy"},
            ],
            "experiment_ledger": [{"title": "Trial", "status": "measured"}],
            "finding_ledger": [{"name": "Finding A"}, {"name": "Finding B"}],
            "source_ledger": [{"source": "source:a"}],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Memory consolidation guard:" in content
    assert "durable_records=6" in content
    assert "record_memory_graph" in content


def test_prompt_adds_research_balance_guard_for_execution_without_sources():
    job = {
        "title": "workflow builder",
        "kind": "generic",
        "objective": "build a durable workflow and keep improving it",
        "metadata": {
            "experiment_ledger": [{"title": "Validation check", "status": "measured"}],
        },
    }
    steps = [
        {
            "step_no": index,
            "kind": "tool",
            "tool_name": "shell_exec",
            "status": "completed",
            "input": {"arguments": {"command": f"echo branch-{index}"}},
        }
        for index in range(1, 7)
    ]

    content = build_messages(job, steps)[-1]["content"]

    assert "Research balance guard:" in content
    assert "execution-heavy" in content
    assert "sources=0" in content
    assert "record_source" in content


def test_prompt_research_balance_guard_clears_when_sources_exist():
    job = {
        "title": "workflow builder",
        "kind": "generic",
        "objective": "build a durable workflow and keep improving it",
        "metadata": {
            "source_ledger": [{"source": "project docs"}],
            "experiment_ledger": [{"title": "Validation check", "status": "measured"}],
        },
    }
    steps = [
        {
            "step_no": index,
            "kind": "tool",
            "tool_name": "shell_exec",
            "status": "completed",
            "input": {"arguments": {"command": f"echo branch-{index}"}},
        }
        for index in range(1, 7)
    ]

    content = build_messages(job, steps)[-1]["content"]

    assert "Recent work is execution-heavy" not in content


def _source_yield_metadata(source_count: int = 16, finding_count: int = 1, *, include_memory_graph: bool = True) -> dict:
    metadata = {
        "source_ledger": [
            {
                "source": f"https://source.example/{index}",
                "source_type": "web_extract",
                "usefulness_score": 0.55,
                "yield_count": 0,
                "last_outcome": "extracted source text for possible use",
            }
            for index in range(source_count)
        ],
        "finding_ledger": [
            {
                "name": f"Finding {index}",
                "source_url": f"https://source.example/{index}",
            }
            for index in range(finding_count)
        ],
    }
    if include_memory_graph:
        metadata["memory_graph"] = {
            "nodes": [
                {"key": f"source-node-{index}", "kind": "source", "title": f"Source set {index}"}
                for index in range(4)
            ],
            "edges": [
                {"from": "source-node-0", "to": "source-node-1", "kind": "supports"},
                {"from": "source-node-1", "to": "source-node-2", "kind": "supports"},
                {"from": "source-node-2", "to": "source-node-3", "kind": "supports"},
            ],
        }
    return metadata


def _source_gathering_steps(count: int = 6) -> list[dict]:
    return [
        {
            "step_no": index,
            "kind": "tool",
            "tool_name": "web_extract" if index % 2 else "web_search",
            "status": "completed",
            "input": {"arguments": {"query": f"source branch {index}"}},
        }
        for index in range(1, count + 1)
    ]


def test_prompt_adds_source_yield_guard_when_sources_are_not_synthesized():
    job = {
        "title": "source-heavy job",
        "kind": "generic",
        "objective": "research and produce durable conclusions",
        "metadata": _source_yield_metadata(),
    }

    content = build_messages(job, _source_gathering_steps())[-1]["content"]

    assert "Source yield guard:" in content
    assert "Many sources have been gathered" in content
    assert "sources=16" in content
    assert "findings=1" in content
    assert "record_findings" in content


def test_prompt_source_yield_guard_clears_when_findings_cover_sources():
    job = {
        "title": "source-heavy job",
        "kind": "generic",
        "objective": "research and produce durable conclusions",
        "metadata": _source_yield_metadata(finding_count=2),
    }

    content = build_messages(job, _source_gathering_steps())[-1]["content"]

    assert "Source yield guard:" in content
    assert "Many sources have been gathered" not in content


def test_run_one_step_blocks_more_source_gathering_when_source_yield_is_missing(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Research and produce durable conclusions",
            title="source-yield",
            kind="generic",
            metadata=_source_yield_metadata(),
        )
        run_id = db.start_run(job_id, model="test")
        for step in _source_gathering_steps():
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name=step["tool_name"],
                input_data=step["input"],
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more sources"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "source yield accounting required"
        assert result.result["source_yield"]["sources"] == 16
    finally:
        db.close()


def test_source_yield_guard_takes_priority_over_memory_consolidation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Research and produce durable conclusions",
            title="source-yield-priority",
            kind="generic",
            metadata=_source_yield_metadata(include_memory_graph=False),
        )
        run_id = db.start_run(job_id, model="test")
        for step in _source_gathering_steps():
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name=step["tool_name"],
                input_data=step["input"],
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_extract", arguments={"urls": ["https://source.example/new"]})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "source yield accounting required"
    finally:
        db.close()


def test_run_one_step_allows_source_yield_accounting(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Research and produce durable conclusions",
            title="source-yield",
            kind="generic",
            metadata=_source_yield_metadata(),
        )
        run_id = db.start_run(job_id, model="test")
        for step in _source_gathering_steps():
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name=step["tool_name"],
                input_data=step["input"],
            )
            db.finish_step(step_id, status="completed", output_data={"success": True})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_source",
                        arguments={
                            "source": "https://source.example/0",
                            "source_type": "web_extract",
                            "yield_count": 1,
                            "outcome": "Source produced a durable conclusion for the active branch.",
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_source"
    finally:
        db.close()


def test_run_one_step_blocks_execution_when_research_balance_is_missing(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Build a durable workflow and keep improving it",
            title="research-balance",
            kind="generic",
            metadata={"experiment_ledger": [{"title": "Validation check", "status": "measured"}]},
        )
        run_id = db.start_run(job_id, model="test")
        for index in range(6):
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="shell_exec",
                input_data={"arguments": {"command": f"python branch_{index}.py"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "ok"})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "python continue_branch.py"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "research balance required"
        assert result.result["blocked_tool"] == "shell_exec"
        assert "record_source" in result.result["guidance"]
    finally:
        db.close()


def test_run_one_step_blocks_lesson_churn_when_research_balance_is_missing(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Build a durable workflow and keep improving it",
            title="research-balance-lessons",
            kind="generic",
            metadata={"experiment_ledger": [{"title": "Validation check", "status": "measured"}]},
        )
        run_id = db.start_run(job_id, model="test")
        for index in range(6):
            step_id = db.add_step(
                job_id=job_id,
                run_id=run_id,
                kind="tool",
                tool_name="shell_exec",
                input_data={"arguments": {"command": f"python branch_{index}.py"}},
            )
            db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "ok"})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_lesson",
                        arguments={
                            "lesson": "The latest execution branch worked; continue similar attempts.",
                            "category": "strategy",
                        },
                    )
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "research balance required"
        assert result.result["blocked_tool"] == "record_lesson"
        assert "raw lesson accumulation" in result.result["guidance"]
    finally:
        db.close()


def test_run_one_step_blocks_durable_records_with_unsupported_concrete_claims(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Optimize a measurable process on observed hardware", title="grounding", kind="generic")
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "stdout": "GPU: AMD Device 7590\nCPU: AMD Ryzen 9 7900X\nMemory: 93Gi\n",
                "stderr": "",
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_roadmap",
                        arguments={
                            "title": "Performance roadmap",
                            "status": "active",
                            "current_milestone": "Environment",
                            "metadata": {
                                "hardware": "NVIDIA GTX 970 with CUDA and i5-8400 CPU",
                                "claim": "Use CUDA-first optimization.",
                            },
                            "milestones": [],
                        },
                    )
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "evidence grounding required"
        assert result.result["blocked_tool"] == "record_roadmap"
        assert "GTX" in result.result["evidence_grounding"]["unsupported_tokens"]
        lessons = db.get_job(job_id)["metadata"]["lessons"]
        assert any("GTX" in lesson["lesson"] and "stale" in lesson["lesson"] for lesson in lessons)
    finally:
        db.close()


def test_record_experiment_blocks_unsupported_proper_noun_hardware_claims(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Record exact observed environment", title="grounding", kind="generic")
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "stdout": (
                    "NO_NVIDIA_GPU\n"
                    "GPU: Advanced Micro Devices Device 7590\n"
                    "Threads: 24\n"
                    "CPU: AMD Ryzen 9 7900X 12-Core Processor\n"
                ),
            },
        )
        db.finish_run(run_id, "completed")

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_experiment",
                        arguments={
                            "title": "Environment Baseline - Hardware Runtime Facts",
                            "status": "measured",
                            "metric_name": "cpu_threads",
                            "metric_value": 16,
                            "metric_unit": "threads",
                            "result": "Environment baseline captured. Hardware: Dual Intel Xeon CPUs, 16 threads total.",
                            "next_action": "Continue from exact observed hardware facts.",
                        },
                    )
                ])
            ]),
        )

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "evidence grounding required"
        assert {"Dual", "Intel", "Xeon"} <= set(blocked.result["evidence_grounding"]["unsupported_tokens"])
    finally:
        db.close()


def test_record_experiment_allows_supported_proper_noun_hardware_claims(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Record exact observed environment", title="grounding", kind="generic")
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "stdout": (
                    "NO_NVIDIA_GPU\n"
                    "GPU: Advanced Micro Devices Device 7590\n"
                    "Threads: 24\n"
                    "CPU: AMD Ryzen 9 7900X 12-Core Processor\n"
                ),
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_experiment",
                        arguments={
                            "title": "Environment Baseline - Hardware Runtime Facts",
                            "status": "measured",
                            "metric_name": "cpu_threads",
                            "metric_value": 24,
                            "metric_unit": "threads",
                            "result": "Environment baseline captured. Hardware: AMD Ryzen 9 7900X, 24 threads total.",
                            "next_action": "Continue from exact observed hardware facts.",
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        experiment = db.get_job(job_id)["metadata"]["experiment_ledger"][0]
        assert "AMD Ryzen 9 7900X" in experiment["result"]
    finally:
        db.close()


def test_record_lesson_blocks_negative_claim_that_conflicts_with_positive_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "stdout": (
                    "NAME ID SIZE MODIFIED\n"
                    "ModelX-99 a50eda8ed977 17 GB 2 weeks ago\n"
                    "OtherModel 69492d6584c5 14 GB 2 months ago\n"
                ),
            },
        )
        db.finish_run(run_id, "completed")

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_lesson",
                        arguments={
                            "category": "strategy",
                            "lesson": (
                                "No ModelX-99 model has been successfully downloaded, so keep the download branch "
                                "as the primary blocker before any benchmark work."
                            ),
                        },
                    )
                ])
            ]),
        )

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "evidence grounding required"
        grounding = blocked.result["evidence_grounding"]
        assert "ModelX-99" in grounding["unsupported_tokens"]
        assert grounding["negative_claim_conflicts"][0]["token"] == "ModelX-99"
    finally:
        db.close()


def test_record_lesson_ignores_plain_titlecase_negative_conflict_tokens(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "stdout": "/srv/vendor/lmstudio-community/Model.foo\n",
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_lesson",
                        arguments={
                            "category": "strategy",
                            "lesson": "No Studio-specific conclusion should be drawn from this branch yet.",
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
    finally:
        db.close()


4746def test_record_lesson_allows_negative_claim_when_evidence_is_also_negative(tmp_path):
4747 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4748 db = AgentDB(tmp_path / "state.db")
4749 try:
4750 job_id = db.create_job("Keep exact observed facts durable", title="lesson-grounding", kind="generic")
4751 run_id = db.start_run(job_id, model="test")
4752 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4753 db.finish_step(
4754 step_id,
4755 status="completed",
4756 output_data={
4757 "success": True,
4758 "stdout": "ls: cannot access '/tmp/ModelX-99.gguf': No such file or directory\n",
4759 },
4760 )
4761 db.finish_run(run_id, "completed")
4762
4763 result = run_one_step(
4764 job_id,
4765 config=config,
4766 db=db,
4767 llm=ScriptedLLM([
4768 LLMResponse(tool_calls=[
4769 ToolCall(
4770 name="record_lesson",
4771 arguments={
4772 "category": "strategy",
4773 "lesson": (
4774 "No ModelX-99 file exists in the checked path, so the next branch must use a "
4775 "different observed source or record the missing file as blocked."
4776 ),
4777 },
4778 )
4779 ])
4780 ]),
4781 )
4782
4783 assert result.status == "completed"
4784 lesson = db.get_job(job_id)["metadata"]["lessons"][0]
4785 assert "ModelX-99" in lesson["lesson"]
4786 finally:
4787 db.close()
4788
4789
4790def test_shell_path_recovery_prompt_shows_missing_executable(tmp_path):
4791 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4792 db = AgentDB(tmp_path / "state.db")
4793 try:
4794 job_id = db.create_job("Run a measured tool after validating paths", title="missing-executable", kind="generic")
4795 run_id = db.start_run(job_id, model="test")
4796 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4797 db.finish_step(
4798 step_id,
4799 status="completed",
4800 output_data={
4801 "success": True,
4802 "command": "/opt/tools/runner --measure",
4803 "stdout": "/bin/sh: 1: /opt/tools/runner: not found\n",
4804 "stderr": "",
4805 },
4806 )
4807 db.finish_run(run_id, "completed")
4808 llm = CapturingLLM(LLMResponse(tool_calls=[
4809 ToolCall(
4810 name="record_lesson",
4811 arguments={
4812 "category": "strategy",
4813 "lesson": "The /opt/tools/runner executable was missing, so validate a real executable path before measuring.",
4814 },
4815 )
4816 ]))
4817
4818 result = run_one_step(job_id, config=config, db=db, llm=llm)
4819
4820 assert result.status == "completed"
4821 prompt = llm.messages[-1]["content"]
4822 assert "Shell path recovery" in prompt
4823 assert "/opt/tools/runner" in prompt
4824 assert "Do not treat this output as a successful measurement" in prompt
4825 assert "locate or verify the real executable/file path" in prompt
4826 finally:
4827 db.close()
4828
4829
4830def test_shell_path_recovery_prompt_prefers_observed_candidate_executable(tmp_path):
4831 db = AgentDB(tmp_path / "state.db")
4832 try:
4833 job_id = db.create_job("Build and benchmark a generic project", title="candidate-executable", kind="generic")
4834 run_id = db.start_run(job_id, model="test")
4835 observed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4836 db.finish_step(
4837 observed_step,
4838 status="completed",
4839 output_data={
4840 "success": True,
4841 "command": "ls /tmp/tools/build-tool",
4842 "stdout": "/tmp/tools/build-tool\n---\nbuild-tool\nhelper\n",
4843 "stderr": "",
4844 },
4845 )
4846 failed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4847 db.finish_step(
4848 failed_step,
4849 status="failed",
4850 output_data={
4851 "success": False,
4852 "command": "cd /tmp/project && build-tool ..",
4853 "stdout": "/bin/sh: 1: build-tool: not found\n",
4854 "stderr": "",
4855 "error": "command output indicates missing command despite exit status 0",
4856 },
4857 )
4858 db.finish_run(run_id, "completed")
4859
4860 messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
4861 prompt = messages[-1]["content"]
4862
4863 assert "Shell path recovery" in prompt
4864 assert "Missing commands: build-tool" in prompt
4865 assert "Observed candidate executable for build-tool: /tmp/tools/build-tool" in prompt
4866 assert "try the exact candidate path or add its directory to PATH" in prompt
4867 finally:
4868 db.close()
4869
4870
4871def test_shell_path_recovery_prompt_preserves_partial_success_paths(tmp_path):
4872 db = AgentDB(tmp_path / "state.db")
4873 try:
4874 job_id = db.create_job("Recover from mixed shell output", title="partial-shell-paths", kind="generic")
4875 run_id = db.start_run(job_id, model="test")
4876 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4877 db.finish_step(
4878 step_id,
4879 status="failed",
4880 output_data={
4881 "success": False,
4882 "command": "ls /tmp/bin/build-tool /tmp/bin/compiler; which compiler",
4883 "returncode": 1,
4884 "stdout": (
4885 "ls: cannot access '/tmp/bin/compiler': No such file or directory\n"
4886 "lrwxrwxrwx 1 user user 30 Jan 1 00:00 /tmp/bin/build-tool -> /tmp/runtime/build-tool\n"
4887 "/usr/bin/compiler\n"
4888 ),
4889 "stderr": "",
4890 "error": "command exited with status 1",
4891 },
4892 )
4893 db.finish_run(run_id, "completed")
4894
4895 messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
4896 prompt = messages[-1]["content"]
4897
4898 assert "Shell path recovery" in prompt
4899 assert "Missing paths: /tmp/bin/compiler" in prompt
4900 assert "Observed executable paths in partial shell output" in prompt
4901 assert "/tmp/bin/build-tool" in prompt
4902 assert "/tmp/runtime/build-tool" in prompt
4903 assert "/usr/bin/compiler" in prompt
4904 assert "Observed executable paths in partial shell output: /tmp/bin/compiler" not in prompt  # the missing path must not lead the observed list
4905 finally:
4906 db.close()
4907
4908
4909def test_shell_exec_blocks_bare_retry_when_candidate_executable_observed(tmp_path):
4910 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4911 db = AgentDB(tmp_path / "state.db")
4912 try:
4913 job_id = db.create_job("Recover with observed executable", title="candidate-retry", kind="generic")
4914 run_id = db.start_run(job_id, model="test")
4915 observed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4916 db.finish_step(
4917 observed_step,
4918 status="completed",
4919 output_data={
4920 "success": True,
4921 "command": "ls /tmp/tools/build-tool",
4922 "stdout": "/tmp/tools/build-tool\n",
4923 "stderr": "",
4924 },
4925 )
4926 failed_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
4927 db.finish_step(
4928 failed_step,
4929 status="failed",
4930 output_data={
4931 "success": False,
4932 "command": "build-tool --version",
4933 "stdout": "/bin/sh: 1: build-tool: not found\n",
4934 "stderr": "",
4935 "error": "command output indicates missing command despite exit status 0",
4936 },
4937 )
4938 db.finish_run(run_id, "completed")
4939
4940 result = run_one_step(
4941 job_id,
4942 config=config,
4943 db=db,
4944 llm=ScriptedLLM([
4945 LLMResponse(tool_calls=[
4946 ToolCall(name="shell_exec", arguments={"command": "build-tool --version"})
4947 ])
4948 ]),
4949 registry=SuccessRegistry(),
4950 )
4951
4952 assert result.status == "blocked"
4953 assert result.result["error"] == "observed executable recovery required"
4954 assert result.result["candidate_recovery"]["missing_command"] == "build-tool"
4955 assert result.result["candidate_recovery"]["candidate_executables"] == ["/tmp/tools/build-tool"]
4956 finally:
4957 db.close()
4958
4959
4960def test_permission_failure_prompt_blocks_package_manager_retry(tmp_path):
4961 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
4962 db = AgentDB(tmp_path / "state.db")
4963 try:
4964 job_id = db.create_job("Recover from generic build prerequisites", title="permission-recovery", kind="generic")
4965 run_id = db.start_run(job_id, model="test")
4966 step_id = db.add_step(
4967 job_id=job_id,
4968 run_id=run_id,
4969 kind="tool",
4970 tool_name="shell_exec",
4971 input_data={"arguments": {"command": "apt-get install -y build-tool"}},
4972 )
4973 db.finish_step(
4974 step_id,
4975 status="failed",
4976 output_data={
4977 "success": False,
4978 "command": "apt-get install -y build-tool",
4979 "returncode": 0,  # exit status 0 even though the output reports a permission failure
4980 "stdout": (
4981 "E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)\n"
4982 "E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?\n"
4983 ),
4984 "stderr": "",
4985 "error": "command output indicates authentication or authorization failure despite exit status 0",
4986 },
4987 )
4988 db.finish_run(run_id, "completed")
4989 llm = ScriptedLLM([LLMResponse(tool_calls=[
4990 ToolCall(name="shell_exec", arguments={"command": "apt-get install -y another-tool"})
4991 ])])
4992
4993 result = run_one_step(job_id, config=config, db=db, llm=llm, registry=SuccessRegistry())
4994
4995 assert result.status == "blocked"
4996 assert result.result["error"] == "privileged command recovery required"
4997 assert result.result["privileged_failure"]["step_no"] == 1
4998 assert "non-privileged recovery" in result.result["guidance"]
4999 finally:
5000 db.close()
5001
5002
5003def test_permission_failure_prompt_mentions_non_privileged_recovery(tmp_path):
5004 db = AgentDB(tmp_path / "state.db")
5005 try:
5006 job_id = db.create_job("Recover from generic build prerequisites", title="permission-prompt", kind="generic")
5007 run_id = db.start_run(job_id, model="test")
5008 step_id = db.add_step(
5009 job_id=job_id,
5010 run_id=run_id,
5011 kind="tool",
5012 tool_name="shell_exec",
5013 input_data={"arguments": {"command": "sudo package-manager install build-tool"}},
5014 )
5015 db.finish_step(
5016 step_id,
5017 status="failed",
5018 output_data={
5019 "success": False,
5020 "command": "sudo package-manager install build-tool",
5021 "stdout": "sudo: a password is required\n",
5022 "stderr": "",
5023 "error": "authentication or authorization failure",
5024 },
5025 )
5026 db.finish_run(run_id, "completed")
5027
5028 messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
5029 prompt = messages[-1]["content"]
5030
5031 assert "Shell permission recovery" in prompt
5032 assert "failed because a privileged/package-manager command lacked permission" in prompt
5033 assert "non-privileged alternatives" in prompt
5034 assert "operator credentials" in prompt
5035 finally:
5036 db.close()
5037
5038
5039def test_record_findings_blocks_negative_file_pattern_that_conflicts_with_positive_evidence(tmp_path):
5040 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5041 db = AgentDB(tmp_path / "state.db")
5042 try:
5043 job_id = db.create_job("Keep file discovery evidence exact", title="file-grounding", kind="generic")
5044 run_id = db.start_run(job_id, model="test")
5045 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5046 db.finish_step(
5047 step_id,
5048 status="completed",
5049 output_data={
5050 "success": True,
5051 "stdout": (
5052 "/srv/data/WidgetModel-99-Q4.foo\n"
5053 "/tmp/results/report.alpha\n"
5054 "/var/cache/other-file.foo\n"
5055 ),
5056 },
5057 )
5058 db.finish_run(run_id, "completed")
5059
5060 blocked = run_one_step(
5061 job_id,
5062 config=config,
5063 db=db,
5064 llm=ScriptedLLM([
5065 LLMResponse(tool_calls=[
5066 ToolCall(
5067 name="record_findings",
5068 arguments={
5069 "findings": [
5070 {
5071 "name": "No .foo files found on filesystem",
5072 "category": "environment_baseline",
5073 "status": "confirmed",
5074 "reason": "Shell search found zero .foo files larger than 1MB anywhere on the system.",
5075 }
5076 ]
5077 },
5078 )
5079 ])
5080 ]),
5081 )
5082
5083 assert blocked.status == "blocked"
5084 assert blocked.result["error"] == "evidence grounding required"
5085 grounding = blocked.result["evidence_grounding"]
5086 assert ".foo" in grounding["unsupported_tokens"]
5087 assert grounding["negative_claim_conflicts"][0]["token"] == ".foo"
5088 finally:
5089 db.close()
5090
5091
5092def test_file_pattern_grounding_ignores_hidden_path_components():
5093 text = (
5094 "No compiled binary exists yet. Valid data is at "
5095 "/srv/cache/.lmstudio/models/ModelX.gguf and /tmp/.cache/item.bin. "
5096 "No *.foo files were found."
5097 )
5098
5099 tokens = _file_pattern_tokens_for_grounding(text)
5100
5101 assert ".lmstudio" not in tokens
5102 assert ".cache" not in tokens
5103 assert ".foo" in tokens
5104
5105
5106def test_record_experiment_allows_classifying_observed_files_as_non_primary(tmp_path):
5107 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5108 db = AgentDB(tmp_path / "state.db")
5109 try:
5110 job_id = db.create_job("Validate observed files before primary artifact work", title="file-classification", kind="generic")
5111 run_id = db.start_run(job_id, model="test")
5112 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5113 db.finish_step(
5114 step_id,
5115 status="completed",
5116 output_data={
5117 "success": True,
5118 "stdout": (
5119 "/srv/data/support-alpha-v2.foo\n"
5120 "/srv/data/support-beta-v2.foo\n"
5121 ),
5122 },
5123 )
5124 db.finish_run(run_id, "completed")
5125
5126 result = run_one_step(
5127 job_id,
5128 config=config,
5129 db=db,
5130 llm=ScriptedLLM([
5131 LLMResponse(tool_calls=[
5132 ToolCall(
5133 name="record_experiment",
5134 arguments={
5135 "title": "primary artifact scan",
5136 "status": "measured",
5137 "metric_name": "primary_artifacts_found",
5138 "metric_value": 0,
5139 "metric_unit": "files",
5140 "config": {
5141 "files_found": [
5142 "/srv/data/support-alpha-v2.foo",
5143 "/srv/data/support-beta-v2.foo",
5144 ],
5145 },
5146 "result": (
5147 "scan found only support files: /srv/data/support-alpha-v2.foo and "
5148 "/srv/data/support-beta-v2.foo. observed files are not the required "
5149 "primary artifact, so the primary artifact remains missing."
5150 ),
5151 "next_action": "select a different observed source for the primary artifact.",
5152 },
5153 )
5154 ])
5155 ]),
5156 )
5157
5158 assert result.status == "completed"
5159 assert result.tool_name == "record_experiment"
5160 finally:
5161 db.close()
5162
5163
5164def test_record_findings_requires_exact_paths_when_file_candidates_exist(tmp_path):
5165 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5166 db = AgentDB(tmp_path / "state.db")
5167 try:
5168 job_id = db.create_job("Keep file candidate evidence exact", title="path-grounding", kind="generic")
5169 run_id = db.start_run(job_id, model="test")
5170 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5171 db.finish_step(
5172 step_id,
5173 status="completed",
5174 output_data={
5175 "success": True,
5176 "stdout": (
5177 "/srv/models/AlphaModel-Q4.foo\n"
5178 "/srv/models/BetaModel-Q8.foo\n"
5179 "/tmp/results/summary.json\n"
5180 ),
5181 },
5182 )
5183 db.finish_run(run_id, "completed")
5184
5185 blocked = run_one_step(
5186 job_id,
5187 config=config,
5188 db=db,
5189 llm=ScriptedLLM([
5190 LLMResponse(tool_calls=[
5191 ToolCall(
5192 name="record_findings",
5193 arguments={
5194 "findings": [
5195 {
5196 "name": "Model files found on disk",
5197 "category": "environment",
5198 "status": "new",
5199 "reason": "Shell search found candidate files, so the next branch can validate them.",
5200 }
5201 ]
5202 },
5203 )
5204 ])
5205 ]),
5206 )
5207
5208 assert blocked.status == "blocked"
5209 assert blocked.result["error"] == "evidence grounding required"
5210 grounding = blocked.result["evidence_grounding"]
5211 assert "/srv/models/AlphaModel-Q4.foo" in grounding["missing_candidate_paths"]
5212 assert "exact observed candidate paths" in grounding["guidance"]
5213 finally:
5214 db.close()
5215
5216
5217def test_missing_candidate_paths_are_ranked_before_grounding_guidance(tmp_path):
5218 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5219 db = AgentDB(tmp_path / "state.db")
5220 try:
5221 job_id = db.create_job("Validate OmegaModel file before benchmarking", title="omega benchmark", kind="generic")
5222 run_id = db.start_run(job_id, model="test")
5223 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5224 db.finish_step(
5225 step_id,
5226 status="completed",
5227 output_data={
5228 "success": True,
5229 "stdout": "\n".join(
5230 [f"/srv/models/ggml-vocab-{index}.foo" for index in range(20)]
5231 + ["/srv/models/OmegaModel-primary.foo"]
5232 ),
5233 },
5234 )
5235 db.finish_run(run_id, "completed")
5236
5237 blocked = run_one_step(
5238 job_id,
5239 config=config,
5240 db=db,
5241 llm=ScriptedLLM([
5242 LLMResponse(tool_calls=[
5243 ToolCall(
5244 name="record_findings",
5245 arguments={
5246 "findings": [
5247 {
5248 "name": "Candidate files found",
5249 "category": "environment",
5250 "status": "new",
5251 "reason": "A file search found candidate files to validate.",
5252 }
5253 ]
5254 },
5255 )
5256 ])
5257 ]),
5258 )
5259
5260 assert blocked.status == "blocked"
5261 grounding = blocked.result["evidence_grounding"]
5262 assert grounding["missing_candidate_paths"][0] == "/srv/models/OmegaModel-primary.foo"
5263 finally:
5264 db.close()
5265
5266
5267def test_record_findings_allows_exact_candidate_path_summary(tmp_path):
5268 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5269 db = AgentDB(tmp_path / "state.db")
5270 try:
5271 job_id = db.create_job("Keep file candidate evidence exact", title="path-grounding", kind="generic")
5272 run_id = db.start_run(job_id, model="test")
5273 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5274 db.finish_step(
5275 step_id,
5276 status="completed",
5277 output_data={
5278 "success": True,
5279 "stdout": "/srv/models/AlphaModel-Q4.foo\n/tmp/results/summary.json\n",
5280 },
5281 )
5282 db.finish_run(run_id, "completed")
5283
5284 result = run_one_step(
5285 job_id,
5286 config=config,
5287 db=db,
5288 llm=ScriptedLLM([
5289 LLMResponse(tool_calls=[
5290 ToolCall(
5291 name="record_findings",
5292 arguments={
5293 "findings": [
5294 {
5295 "name": "Model file candidate",
5296 "category": "environment",
5297 "status": "new",
5298 "reason": "Candidate path /srv/models/AlphaModel-Q4.foo should be validated next.",
5299 }
5300 ]
5301 },
5302 )
5303 ])
5304 ]),
5305 )
5306
5307 assert result.status == "completed"
5308 finally:
5309 db.close()
5310
5311
5312def test_evidence_grounding_blocks_positive_claim_for_missing_path(tmp_path):
5313 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5314 db = AgentDB(tmp_path / "state.db")
5315 try:
5316 job_id = db.create_job("Verify a generic executable path", title="path polarity", kind="generic")
5317 run_id = db.start_run(job_id, model="test")
5318 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5319 db.finish_step(
5320 step_id,
5321 status="completed",
5322 output_data={
5323 "success": True,
5324 "command": "ls /tmp/tools/build-tool /usr/bin/make",
5325 "stdout": (
5326 "ls: cannot access '/tmp/tools/build-tool': No such file or directory\n"
5327 "/usr/bin/make\n"
5328 "This shell probe checked candidate executable paths before the build step.\n"
5329 ),
5330 "stderr": "",
5331 },
5332 )
5333 db.finish_run(run_id, "completed")
5334
5335 blocked = run_one_step(
5336 job_id,
5337 config=config,
5338 db=db,
5339 llm=ScriptedLLM([
5340 LLMResponse(tool_calls=[
5341 ToolCall(
5342 name="record_experiment",
5343 arguments={
5344 "title": "Build tool path verification",
5345 "status": "measured",
5346 "metric_name": "build_prerequisites",
5347 "metric_value": 2,
5348 "metric_unit": "items",
5349 "result": "Found build tool at /tmp/tools/build-tool and make at /usr/bin/make. Build prerequisites are verified.",
5350 },
5351 )
5352 ])
5353 ]),
5354 )
5355
5356 assert blocked.status == "blocked"
5357 assert blocked.result["error"] == "evidence grounding required"
5358 grounding = blocked.result["evidence_grounding"]
5359 assert "/tmp/tools/build-tool" in grounding["unsupported_tokens"]
5360 assert grounding["negative_path_conflicts"][0]["path"] == "/tmp/tools/build-tool"
5361 assert "claims a path or executable is present" in grounding["guidance"]
5362 finally:
5363 db.close()
5364
5365
5366def test_evidence_grounding_checks_later_positive_path_mentions(tmp_path):
5367 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5368 db = AgentDB(tmp_path / "state.db")
5369 try:
5370 job_id = db.create_job("Verify executable path polarity", title="path later mention", kind="generic")
5371 run_id = db.start_run(job_id, model="test")
5372 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5373 db.finish_step(
5374 step_id,
5375 status="completed",
5376 output_data={
5377 "success": True,
5378 "stdout": (
5379 "ls: cannot access '/tmp/tools/build-tool': No such file or directory\n"
5380 "The probe also checked unrelated files and returned partial output for review.\n"
5381 ),
5382 "stderr": "",
5383 },
5384 )
5385 db.finish_run(run_id, "completed")
5386
5387 blocked = run_one_step(
5388 job_id,
5389 config=config,
5390 db=db,
5391 llm=ScriptedLLM([
5392 LLMResponse(tool_calls=[
5393 ToolCall(
5394 name="record_lesson",
5395 arguments={
5396 "category": "constraint",
5397 "lesson": (
5398 "candidate path /tmp/tools/build-tool was examined. "
5399 "The executable is at /tmp/tools/build-tool and should be used for the next build."
5400 ),
5401 },
5402 )
5403 ])
5404 ]),
5405 )
5406
5407 assert blocked.status == "blocked"
5408 assert blocked.result["error"] == "evidence grounding required"
5409 grounding = blocked.result["evidence_grounding"]
5410 assert grounding["negative_path_conflicts"][0]["path"] == "/tmp/tools/build-tool"
5411 finally:
5412 db.close()
5413
5414
5415def test_record_findings_allows_negative_file_pattern_when_evidence_is_negative(tmp_path):
5416 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5417 db = AgentDB(tmp_path / "state.db")
5418 try:
5419 job_id = db.create_job("Keep file discovery evidence exact", title="file-grounding", kind="generic")
5420 run_id = db.start_run(job_id, model="test")
5421 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5422 db.finish_step(
5423 step_id,
5424 status="completed",
5425 output_data={
5426 "success": True,
5427 "stdout": "find: '/tmp/WidgetModel-99.foo': No such file or directory\n",
5428 },
5429 )
5430 db.finish_run(run_id, "completed")
5431
5432 result = run_one_step(
5433 job_id,
5434 config=config,
5435 db=db,
5436 llm=ScriptedLLM([
5437 LLMResponse(tool_calls=[
5438 ToolCall(
5439 name="record_findings",
5440 arguments={
5441 "findings": [
5442 {
5443 "name": "No .foo file exists in the checked path",
5444 "category": "environment_baseline",
5445 "status": "confirmed",
5446 "reason": "The shell output says the checked .foo path does not exist.",
5447 }
5448 ]
5449 },
5450 )
5451 ])
5452 ]),
5453 )
5454
5455 assert result.status == "completed"
5456 findings = db.get_job(job_id)["metadata"]["finding_ledger"]
5457 assert findings[0]["name"] == "No .foo file exists in the checked path"
5458 finally:
5459 db.close()
5460
5461
5462def test_run_one_step_marks_contradicted_negative_finding_stale(tmp_path):
5463 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5464 db = AgentDB(tmp_path / "state.db")
5465 try:
5466 job_id = db.create_job("Keep durable findings aligned with fresh evidence", title="stale-finding", kind="generic")
5467 db.append_finding_record(
5468 job_id,
5469 name="No .foo files found",
5470 category="environment_baseline",
5471 reason="Shell search found zero .foo files anywhere in the checked filesystem.",
5472 status="confirmed",
5473 )
5474 run_id = db.start_run(job_id, model="test")
5475 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5476 db.finish_step(
5477 step_id,
5478 status="completed",
5479 output_data={
5480 "success": True,
5481 "stdout": (
5482 "discovery results from current filesystem scan:\n"
5483 "/srv/data/WidgetModel-99-Q4.foo\n"
5484 "/var/cache/other-file.foo\n"
5485 ),
5486 },
5487 )
5488 db.finish_run(run_id, "completed")
5489
5490 result = run_one_step(
5491 job_id,
5492 config=config,
5493 db=db,
5494 llm=ScriptedLLM([
5495 LLMResponse(tool_calls=[
5496 ToolCall(
5497 name="record_lesson",
5498 arguments={
5499 "category": "strategy",
5500 "lesson": "Fresh file-discovery evidence should override older absence claims.",
5501 },
5502 )
5503 ])
5504 ]),
5505 )
5506
5507 assert result.status == "completed"
5508 job = db.get_job(job_id)
5509 stale_records = job["metadata"].get("stale_negative_records")
5510 assert isinstance(stale_records, list)
5511 assert stale_records[0]["kind"] == "finding"
5512 assert stale_records[0]["token"] == ".foo"
5513
5514 from nipux_cli.worker_prompt_context import _ledgers_for_prompt
5515
5516 ledgers = _ledgers_for_prompt(job)
5517 assert "Contradicted negative findings suppressed" in ledgers
5518 assert "Suppressed 1 stale finding" in ledgers
5519 finally:
5520 db.close()
5521
5522
5523def test_run_one_step_marks_contradicted_negative_memory_node_stale(tmp_path):
5524 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5525 db = AgentDB(tmp_path / "state.db")
5526 try:
5527 job_id = db.create_job("Keep memory aligned with fresh evidence", title="stale-memory", kind="generic")
5528 db.append_memory_graph_records(
5529 job_id,
5530 nodes=[
5531 {
5532 "key": "fact-no-local-foo",
5533 "title": "No local foo files",
5534 "kind": "fact",
5535 "status": "active",
5536 "summary": "Filesystem searches for *.foo files return 0 results.",
5537 },
5538 {
5539 "key": "current-branch",
5540 "title": "Current branch",
5541 "kind": "strategy",
5542 "status": "active",
5543 "summary": "Use fresh shell evidence before recording durable claims.",
5544 },
5545 ],
5546 )
5547 run_id = db.start_run(job_id, model="test")
5548 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5549 db.finish_step(
5550 step_id,
5551 status="completed",
5552 output_data={
5553 "success": True,
5554 "stdout": (
5555 "Fresh filesystem discovery found an exact candidate path with enough surrounding context "
5556 "to count as evidence: /srv/data/WidgetModel-99-Q4.foo\n"
5557 ),
5558 },
5559 )
5560 db.finish_run(run_id, "completed")
5561
5562 result = run_one_step(
5563 job_id,
5564 config=config,
5565 db=db,
5566 llm=ScriptedLLM([
5567 LLMResponse(tool_calls=[
5568 ToolCall(
5569 name="record_lesson",
5570 arguments={
5571 "category": "strategy",
5572 "lesson": "Fresh file evidence overrides stale absence memory.",
5573 },
5574 )
5575 ])
5576 ]),
5577 )
5578
5579 assert result.status == "completed"
5580 job = db.get_job(job_id)
5581 stale_records = job["metadata"].get("stale_negative_records")
5582 assert any(record["kind"] == "memory_node" and record["record_id"] == "fact-no-local-foo" for record in stale_records or [])
5583
5584 content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
5585 assert "Suppressed 1 stale memory node" in content
5586 assert "No local foo files" not in content
5587 assert "Current branch" in content
5588 finally:
5589 db.close()
5590
5591
5592def test_record_lesson_allows_generic_strategy_without_concrete_facts(tmp_path):
5593 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5594 db = AgentDB(tmp_path / "state.db")
5595 try:
5596 job_id = db.create_job("Improve a workflow", title="lesson-grounding", kind="generic")
5597 run_id = db.start_run(job_id, model="test")
5598 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5599 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "branch stalled\n"})
5600 db.finish_run(run_id, "completed")
5601
5602 result = run_one_step(
5603 job_id,
5604 config=config,
5605 db=db,
5606 llm=ScriptedLLM([
5607 LLMResponse(tool_calls=[
5608 ToolCall(
5609 name="record_lesson",
5610 arguments={
5611 "category": "strategy",
5612 "lesson": "When a branch stalls, pivot to the next measurable action instead of adding more notes.",
5613 },
5614 )
5615 ])
5616 ]),
5617 )
5618
5619 assert result.status == "completed"
5620 finally:
5621 db.close()
5622
5623
5624def test_record_lesson_allows_positive_checkpoint_summary_with_new_concrete_terms(tmp_path):
5625 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5626 db = AgentDB(tmp_path / "state.db")
5627 try:
5628 job_id = db.create_job("Summarize a broad checkpoint", title="lesson-grounding", kind="generic")
5629 run_id = db.start_run(job_id, model="test")
5630 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5631 db.finish_step(
5632 step_id,
5633 status="completed",
5634 output_data={"success": True, "stdout": "checkpoint read and accounting required\n"},
5635 )
5636 db.finish_run(run_id, "completed")
5637
5638 result = run_one_step(
5639 job_id,
5640 config=config,
5641 db=db,
5642 llm=ScriptedLLM([
5643 LLMResponse(tool_calls=[
5644 ToolCall(
5645 name="record_lesson",
5646 arguments={
5647 "category": "memory",
5648 "lesson": (
5649 "Recording checkpoint context says PackageManager-42 and RuntimeProbe-7 should stay "
5650 "available for the next branch, but no final benchmark decision has been made."
5651 ),
5652 },
5653 )
5654 ])
5655 ]),
5656 )
5657
5658 assert result.status == "completed"
5659 finally:
5660 db.close()
5661
5662
5663def test_record_findings_blocks_single_unsupported_identifier(tmp_path):
5664 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5665 db = AgentDB(tmp_path / "state.db")
5666 try:
5667 job_id = db.create_job("Record only observed identifiers", title="grounding", kind="generic")
5668 run_id = db.start_run(job_id, model="test")
5669 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5670 db.finish_step(
5671 step_id,
5672 status="completed",
5673 output_data={
5674 "success": True,
5675 "stdout": (
5676 "Observed candidate list from tool output. "
5677 "The source contains AlphaCandidate and BetaCandidate with ordinary text evidence. "
5678 "No generated opaque identifiers are present in this evidence."
5679 )
5680 * 12,
5681 },
5682 )
5683 db.finish_run(run_id, "completed")
5684
5685 result = run_one_step(
5686 job_id,
5687 config=config,
5688 db=db,
5689 llm=ScriptedLLM([
5690 LLMResponse(tool_calls=[ToolCall(name="record_findings", arguments={
5691 "findings": [{
5692 "name": "WWHHH5 generated candidate",
5693 "category": "test",
5694 "reason": "Observed candidate list needs follow-up, but this identifier was not in evidence.",
5695 "status": "new",
5696 }]
5697 })])
5698 ]),
5699 )
5700
5701 assert result.status == "blocked"
5702 assert result.result["error"] == "evidence grounding required"
5703 assert result.result["evidence_grounding"]["unsupported_tokens"] == ["WWHHH5"]
5704 finally:
5705 db.close()
5706
5707
5708def test_evidence_grounding_ignores_job_context_labels(tmp_path):
5709 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5710 db = AgentDB(tmp_path / "state.db")
5711 try:
5712 job_id = db.create_job(
5713 "Benchmark AlphaModel throughput",
5714 title="alphamodel throughput fixed",
5715 kind="generic",
5716 )
5717 run_id = db.start_run(job_id, model="test")
5718 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5719 db.finish_step(
5720 step_id,
5721 status="completed",
5722 output_data={
5723 "success": True,
5724 "stdout": (
5725 "Observed benchmark setup is ready. Runtime exists, candidate file exists, "
5726 "and the next action is a planned baseline measurement. "
5727 )
5728 * 6,
5729 },
5730 )
5731 db.finish_run(run_id, "completed")
5732
5733 result = run_one_step(
5734 job_id,
5735 config=config,
5736 db=db,
5737 llm=ScriptedLLM([
5738 LLMResponse(tool_calls=[ToolCall(name="record_experiment", arguments={
5739 "title": "Baseline Throughput - AlphaModel",
5740 "status": "planned",
5741 "higher_is_better": True,
5742 "metadata": {"project": "alphamodel-throughput"},
5743 "next_action": "Run the baseline measurement and record the observed metric.",
5744 })])
5745 ]),
5746 )
5747
5748 assert result.status == "completed"
5749 finally:
5750 db.close()
5751
5752
5753def test_evidence_grounding_blocks_unsupported_numeric_measurements(tmp_path):
5754 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5755 db = AgentDB(tmp_path / "state.db")
5756 try:
5757 job_id = db.create_job("Validate candidate file size", title="size-grounding", kind="generic")
5758 run_id = db.start_run(job_id, model="test")
5759 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5760 db.finish_step(
5761 step_id,
5762 status="completed",
5763 output_data={
5764 "success": True,
5765 "stdout": "-rw-r--r-- 1 user user 12G May 14 /srv/models/AlphaModel-Q4.foo\n",
5766 },
5767 )
5768 db.finish_run(run_id, "completed")
5769
5770 result = run_one_step(
5771 job_id,
5772 config=config,
5773 db=db,
5774 llm=ScriptedLLM([
5775 LLMResponse(tool_calls=[ToolCall(name="record_findings", arguments={
5776 "findings": [{
5777 "name": "Candidate file",
5778 "category": "environment",
5779 "location": "/srv/models/AlphaModel-Q4.foo",
5780 "metadata": {"file_size": "16G"},
5781 "status": "verified",
5782 }]
5783 })])
5784 ]),
5785 )
5786
5787 assert result.status == "blocked"
5788 assert result.result["error"] == "evidence grounding required"
5789 assert "16G" in result.result["evidence_grounding"]["unsupported_tokens"]
5790 finally:
5791 db.close()
5792
5793
5794def test_evidence_grounding_ignores_record_schema_keys(tmp_path):
5795 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5796 db = AgentDB(tmp_path / "state.db")
5797 try:
5798 job_id = db.create_job("Record observed setup status", title="grounding", kind="generic")
5799 run_id = db.start_run(job_id, model="test")
5800 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5801 db.finish_step(
5802 step_id,
5803 status="completed",
5804 output_data={"success": True, "stdout": "Python 3 is installed. curl is available. No token file was found."},
5805 )
5806 db.finish_run(run_id, "completed")
5807
5808 result = run_one_step(
5809 job_id,
5810 config=config,
5811 db=db,
5812 llm=ScriptedLLM([
5813 LLMResponse(tool_calls=[
5814 ToolCall(
5815 name="record_experiment",
5816 arguments={
5817 "title": "Setup status",
5818 "status": "measured",
5819 "metric_name": "ready_components",
5820 "metric_value": 1,
5821 "config": {"python_3_installed": True, "curl_available": True},
5822 "result": "Python 3 is installed and curl is available.",
5823 "next_action": "record remaining setup gaps or proceed to the next validation",
5824 },
5825 )
5826 ])
5827 ]),
5828 )
5829
5830 assert result.status == "completed"
5831 experiment = db.get_job(job_id)["metadata"]["experiment_ledger"][0]
5832 assert experiment["config"]["python_3_installed"] is True
5833 finally:
5834 db.close()
5835
5836
5837def test_evidence_grounding_uses_durable_finding_location_and_metadata(tmp_path):
5838 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5839 db = AgentDB(tmp_path / "state.db")
5840 try:
5841 job_id = db.create_job("Record known candidate from durable state", title="durable-grounding", kind="generic")
5842 db.append_finding_record(
5843 job_id,
5844 name="Candidate runtime model",
5845 category="environment",
5846 location="/srv/models/AlphaModel-99-Q4.gguf",
5847 reason="Observed candidate model path is ready for later measurement.",
5848 status="available",
5849 metadata={"quantization": "Q4"},
5850 )
5851
5852 result = run_one_step(
5853 job_id,
5854 config=config,
5855 db=db,
5856 llm=ScriptedLLM([
5857 LLMResponse(tool_calls=[
5858 ToolCall(
5859 name="record_experiment",
5860 arguments={
5861 "title": "Candidate runtime model readiness",
5862 "status": "measured",
5863 "metric_name": "candidate_files",
5864 "metric_value": 1,
5865 "config": {"model": "/srv/models/AlphaModel-99-Q4.gguf"},
5866 "result": "Durable finding shows /srv/models/AlphaModel-99-Q4.gguf is available.",
5867 "next_action": "measure throughput with the durable candidate model",
5868 },
5869 )
5870 ])
5871 ]),
5872 )
5873
5874 assert result.status == "completed"
5875 assert result.tool_name == "record_experiment"
5876 finally:
5877 db.close()
5878
5879
5880def test_evidence_grounding_ignores_json_literals_even_when_stale_tokens_exist(tmp_path):
5881 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5882 db = AgentDB(tmp_path / "state.db")
5883 try:
5884 job_id = db.create_job("Record observed benchmark plan", title="literal-grounding", kind="generic")
5885 db.update_job_metadata(job_id, {"unsupported_claim_tokens": ["true"]})
5886 run_id = db.start_run(job_id, model="test")
5887 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5888 db.finish_step(
5889 step_id,
5890 status="completed",
5891 output_data={
5892 "success": True,
5893 "stdout": "Observed benchmark harness is ready and next action is to measure throughput. " * 4,
5894 },
5895 )
5896 db.finish_run(run_id, "completed")
5897
5898 result = run_one_step(
5899 job_id,
5900 config=config,
5901 db=db,
5902 llm=ScriptedLLM([
5903 LLMResponse(tool_calls=[ToolCall(name="record_experiment", arguments={
5904 "title": "Baseline benchmark plan",
5905 "status": "planned",
5906 "higher_is_better": True,
5907 "metric_name": "throughput",
5908 "metric_unit": "tokens/sec",
5909 "next_action": "Run the benchmark and record the observed metric.",
5910 })])
5911 ]),
5912 )
5913
5914 assert result.status == "completed"
5915 finally:
5916 db.close()
5917
5918
5919def test_evidence_grounding_ignores_planning_and_status_labels(tmp_path):
5920 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5921 db = AgentDB(tmp_path / "state.db")
5922 try:
5923 job_id = db.create_job("Record observed build validation", title="status-grounding", kind="generic")
5924 run_id = db.start_run(job_id, model="test")
5925 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5926 db.finish_step(
5927 step_id,
5928 status="completed",
5929 output_data={
5930 "success": True,
5931 "stdout": (
5932 "Observed file /srv/models/AlphaModel-Q4.foo exists. "
5933 "The tool output showed rc=0 and the benchmark branch can continue. "
5934 )
5935 * 4,
5936 },
5937 )
5938 db.finish_run(run_id, "completed")
5939
5940 result = run_one_step(
5941 job_id,
5942 config=config,
5943 db=db,
5944 llm=ScriptedLLM([
5945 LLMResponse(tool_calls=[ToolCall(name="record_roadmap", arguments={
5946 "title": "Build validation roadmap",
5947 "scope": "Checking the observed candidate before ongoing benchmark work.",
5948 "milestones": [
5949 {"title": "P1 validate observed candidate", "status": "active"},
5950 {"title": "P2 proceed to benchmark", "status": "planned"},
5951 ],
5952 })])
5953 ]),
5954 )
5955
5956 assert result.status == "completed"
5957 finally:
5958 db.close()
5959
5960
5961def test_run_one_step_blocks_memory_graph_with_unsupported_claims(tmp_path):
5962 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
5963 db = AgentDB(tmp_path / "state.db")
5964 try:
5965 job_id = db.create_job("Consolidate observed facts", title="memory-grounding", kind="generic")
5966 run_id = db.start_run(job_id, model="test")
5967 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
5968 db.finish_step(
5969 step_id,
5970 status="completed",
5971 output_data={
5972 "success": True,
5973 "stdout": "GPU: AMD Device 7590\nCPU: AMD Ryzen 9 7900X\nMemory: 93Gi\n",
5974 },
5975 )
5976 db.finish_run(run_id, "completed")
5977
5978 result = run_one_step(
5979 job_id,
5980 config=config,
5981 db=db,
5982 llm=ScriptedLLM([
5983 LLMResponse(tool_calls=[
5984 ToolCall(
5985 name="record_memory_graph",
5986 arguments={
5987 "nodes": [
5988 {
5989 "key": "hardware",
5990 "kind": "fact",
5991 "title": "NVIDIA GTX 970 CUDA hardware",
5992 "summary": "The machine has NVIDIA GTX 970 CUDA hardware.",
5993 }
5994 ]
5995 },
5996 )
5997 ])
5998 ]),
5999 )
6000
6001 assert result.status == "blocked"
6002 assert result.result["error"] == "evidence grounding required"
6003 assert result.result["blocked_tool"] == "record_memory_graph"
6004 finally:
6005 db.close()
6006
6007
6008def test_run_one_step_allows_memory_graph_identifier_labels_without_evidence(tmp_path):
6009 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6010 db = AgentDB(tmp_path / "state.db")
6011 try:
6012 job_id = db.create_job("Consolidate abstract graph labels", title="memory-grounding", kind="generic")
6013 run_id = db.start_run(job_id, model="test")
6014 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6015 db.finish_step(
6016 step_id,
6017 status="completed",
6018 output_data={
6019 "success": True,
6020 "stdout": (
6021 "Current observation: AMD Ryzen 9 7900X host with fresh API discovery evidence. "
6022 "The next branch is to convert existing source evidence into a download decision."
6023 ),
6024 },
6025 )
6026 db.finish_run(run_id, "completed")
6027
6028 result = run_one_step(
6029 job_id,
6030 config=config,
6031 db=db,
6032 llm=ScriptedLLM([
6033 LLMResponse(tool_calls=[
6034 ToolCall(
6035 name="record_memory_graph",
6036 arguments={
6037 "edges": [
6038 {
6039 "from_key": "decision-q4-km-primary",
6040 "relation": "informs",
6041 "to_key": "question-download-q4-km-url",
6042 },
6043 {
6044 "from_key": "skill-api-download-pattern",
6045 "relation": "supports",
6046 "to_key": "milestone-direct-url-download",
6047 },
6048 ]
6049 },
6050 )
6051 ])
6052 ]),
6053 )
6054
6055 assert result.status == "completed"
6056 assert result.tool_name == "record_memory_graph"
6057 finally:
6058 db.close()
6059
6060
6061def test_run_one_step_still_blocks_stale_memory_graph_key_claims(tmp_path):
6062 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6063 db = AgentDB(tmp_path / "state.db")
6064 try:
6065 job_id = db.create_job(
6066 "Do not reintroduce stale graph labels",
6067 title="memory-grounding",
6068 kind="generic",
6069 metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6070 )
6071 run_id = db.start_run(job_id, model="test")
6072 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6073 db.finish_step(
6074 step_id,
6075 status="completed",
6076 output_data={
6077 "success": True,
6078 "stdout": (
6079 "Current observation: AMD Ryzen 9 7900X host with no legacy CPU marker in fresh evidence. "
6080 "Durable memory must not reuse unsupported old hardware claims."
6081 ),
6082 },
6083 )
6084 db.finish_run(run_id, "completed")
6085
6086 result = run_one_step(
6087 job_id,
6088 config=config,
6089 db=db,
6090 llm=ScriptedLLM([
6091 LLMResponse(tool_calls=[
6092 ToolCall(
6093 name="record_memory_graph",
6094 arguments={
6095 "edges": [
6096 {
6097 "from_key": "XeonE5-2690",
6098 "relation": "constrains",
6099 "to_key": "current-plan",
6100 }
6101 ]
6102 },
6103 )
6104 ])
6105 ]),
6106 )
6107
6108 assert result.status == "blocked"
6109 assert result.result["error"] == "evidence grounding required"
6110 assert "XeonE5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6111 finally:
6112 db.close()
6113
6114
6115def test_run_one_step_allows_memory_graph_grounded_in_durable_records(tmp_path):
6116 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6117 db = AgentDB(tmp_path / "state.db")
6118 try:
6119 job_id = db.create_job("Consolidate durable facts", title="memory-grounded-ledger", kind="generic")
6120 db.append_finding_record(
6121 job_id,
6122 name="Artifact cache includes Package_A-2.7.1 and backend XYZ123",
6123 category="environment_fact",
6124 reason="A saved checkpoint established Package_A-2.7.1 and backend XYZ123 as available options.",
6125 metadata={"evidence_artifact": "art_env"},
6126 )
6127
6128 result = run_one_step(
6129 job_id,
6130 config=config,
6131 db=db,
6132 llm=ScriptedLLM([
6133 LLMResponse(tool_calls=[
6134 ToolCall(
6135 name="record_memory_graph",
6136 arguments={
6137 "nodes": [
6138 {
6139 "key": "package-a",
6140 "kind": "fact",
6141 "title": "Package_A-2.7.1 via backend XYZ123",
6142 "summary": "Durable finding says Package_A-2.7.1 is available through backend XYZ123.",
6143 }
6144 ]
6145 },
6146 )
6147 ])
6148 ]),
6149 )
6150
6151 assert result.status == "completed"
6152 assert result.tool_name == "record_memory_graph"
6153 finally:
6154 db.close()
6155
6156
6157def test_run_one_step_blocks_memory_graph_grounded_only_in_stale_records(tmp_path):
6158 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6159 db = AgentDB(tmp_path / "state.db")
6160 try:
6161 job_id = db.create_job(
6162 "Consolidate durable facts",
6163 title="memory-stale-ledger",
6164 kind="generic",
6165 metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6166 )
6167 db.append_finding_record(
6168 job_id,
6169 name="Artifact cache includes XeonE5-2690",
6170 category="environment_fact",
6171 reason="Older ledger record mentioned XeonE5-2690.",
6172 metadata={"evidence_artifact": "art_old"},
6173 )
6174
6175 result = run_one_step(
6176 job_id,
6177 config=config,
6178 db=db,
6179 llm=ScriptedLLM([
6180 LLMResponse(tool_calls=[
6181 ToolCall(
6182 name="record_memory_graph",
6183 arguments={
6184 "nodes": [
6185 {
6186 "key": "package-a",
6187 "kind": "fact",
6188 "title": "XeonE5-2690",
6189 "summary": "XeonE5-2690 is still valid.",
6190 }
6191 ]
6192 },
6193 )
6194 ])
6195 ]),
6196 )
6197
6198 assert result.status == "blocked"
6199 assert result.result["error"] == "evidence grounding required"
6200 assert "XeonE5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6201 finally:
6202 db.close()
6203
6204
6205def test_run_one_step_allows_stale_token_when_fresh_evidence_revalidates_it(tmp_path):
6206 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6207 db = AgentDB(tmp_path / "state.db")
6208 try:
6209 job_id = db.create_job(
6210 "Revalidate durable facts",
6211 title="memory-stale-revalidated",
6212 kind="generic",
6213 metadata={"unsupported_claim_tokens": ["XeonE5-2690"]},
6214 )
6215 run_id = db.start_run(job_id, model="test")
6216 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6217 db.finish_step(
6218 step_id,
6219 status="completed",
6220 output_data={"success": True, "stdout": "Fresh probe: CPU marker XeonE5-2690 is visible in this environment."},
6221 )
6222 db.finish_run(run_id, "completed")
6223
6224 result = run_one_step(
6225 job_id,
6226 config=config,
6227 db=db,
6228 llm=ScriptedLLM([
6229 LLMResponse(tool_calls=[
6230 ToolCall(
6231 name="record_memory_graph",
6232 arguments={
6233 "nodes": [
6234 {
6235 "key": "fresh-cpu",
6236 "kind": "fact",
6237 "title": "XeonE5-2690",
6238 "summary": "Fresh shell evidence revalidated XeonE5-2690.",
6239 }
6240 ]
6241 },
6242 )
6243 ])
6244 ]),
6245 )
6246
6247 assert result.status == "completed"
6248 assert result.tool_name == "record_memory_graph"
6249 finally:
6250 db.close()
6251
6252
6253def test_run_one_step_allows_durable_records_grounded_in_read_artifact(tmp_path):
6254 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6255 db = AgentDB(tmp_path / "state.db")
6256 try:
6257 job_id = db.create_job("Record facts from saved evidence", title="grounded-read", kind="generic")
6258 run_id = db.start_run(job_id, model="test")
6259 step_id = db.add_step(
6260 job_id=job_id,
6261 run_id=run_id,
6262 kind="tool",
6263 tool_name="read_artifact",
6264 input_data={"arguments": {"artifact_id": "art_checkpoint"}},
6265 )
6266 db.finish_step(
6267 step_id,
6268 status="completed",
6269 output_data={
6270 "success": True,
6271 "content": (
6272 "Environment evidence: CPU Intel Xeon E5-2690 v3, architecture x86_64, "
6273 "memory 62.8G total, no NVIDIA GPU visible from nvidia-smi. "
6274 "This content is the source for durable records."
6275 ),
6276 },
6277 )
6278 db.finish_run(run_id, "completed")
6279
6280 result = run_one_step(
6281 job_id,
6282 config=config,
6283 db=db,
6284 llm=ScriptedLLM([
6285 LLMResponse(tool_calls=[
6286 ToolCall(
6287 name="record_findings",
6288 arguments={
6289 "findings": [
6290 {
6291 "name": "Intel Xeon E5-2690 v3 x86_64 environment",
6292 "category": "hardware_fact",
6293 "reason": "Saved checkpoint states CPU Intel Xeon E5-2690 v3, x86_64, memory 62.8G total, and no NVIDIA GPU visible.",
6294 "evidence_artifact": "art_checkpoint",
6295 }
6296 ]
6297 },
6298 )
6299 ])
6300 ]),
6301 )
6302
6303 assert result.status == "completed"
6304 findings = db.get_job(job_id)["metadata"]["finding_ledger"]
6305 assert findings[0]["name"] == "Intel Xeon E5-2690 v3 x86_64 environment"
6306 finally:
6307 db.close()
6308
6309
6310def test_run_one_step_scopes_grounding_to_cited_step(tmp_path):
6311 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
6312 db = AgentDB(tmp_path / "state.db")
6313 try:
6314 job_id = db.create_job("Record facts from cited evidence", title="cited-grounding", kind="generic")
6315 run_id = db.start_run(job_id, model="test")
6316 old_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6317 db.finish_step(
6318 old_step,
6319 status="completed",
6320 output_data={
6321 "success": True,
6322 "stdout": (
6323 "Old evidence: Intel Xeon E5-2690 v3 with 62.8G memory. "
6324 "This is intentionally stale evidence from an earlier step and should not validate step #2."
6325 ),
6326 },
6327 )
6328 new_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6329 db.finish_step(
6330 new_step,
6331 status="completed",
6332 output_data={
6333 "success": True,
6334 "stdout": (
6335 "Current evidence: AMD Ryzen 9 7900X with 93Gi memory and AMD GPU. "
6336 "This newer cited step is the only source that should ground claims citing step #2."
6337 ),
6338 },
6339 )
6340 db.finish_run(run_id, "completed")
6341
6342 result = run_one_step(
6343 job_id,
6344 config=config,
6345 db=db,
6346 llm=ScriptedLLM([
6347 LLMResponse(tool_calls=[
6348 ToolCall(
6349 name="write_artifact",
6350 arguments={
6351 "title": "Cited baseline",
6352 "summary": "Baseline from step #2.",
6353 "content": "From step #2: Intel Xeon E5-2690 v3 with 62.8G memory.",
6354 "artifact_type": "text",
6355 },
6356 )
6357 ])
6358 ]),
6359 )
6360
6361 assert result.status == "blocked"
6362 assert result.result["error"] == "evidence grounding required"
6363 assert "E5-2690" in result.result["evidence_grounding"]["unsupported_tokens"]
6364 assert result.result["evidence_grounding"]["evidence_steps"] == [2]
6365 assert "E5-2690" in db.get_job(job_id)["metadata"]["unsupported_claim_tokens"]
6366 finally:
6367 db.close()
6368
6369
6370def test_cited_step_numbers_ignore_ordinal_hash_labels():
6371 text = (
6372 "llama.cpp Build Attempt #3 should not cite old evidence. "
6373 "Use step #42 and shell_exec_step_1037 if explicit evidence is needed. "
6374 "The older step-2678 reference is also explicit."
6375 )
6376
6377 assert _cited_step_numbers(text) == {42, 1037, 2678}
6378
6379
6380def test_prompt_shows_evidence_grounding_tokens_after_block(tmp_path):
6381 db = AgentDB(tmp_path / "state.db")
6382 try:
6383 job_id = db.create_job("Use only observed evidence", title="grounding-prompt", kind="generic")
6384 run_id = db.start_run(job_id, model="test")
6385 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
6386 db.finish_step(
6387 step_id,
6388 status="blocked",
6389 output_data={
6390 "success": True,
6391 "recoverable": True,
6392 "error": "evidence grounding required",
6393 "evidence_grounding": {"unsupported_tokens": ["NVIDIA", "Xeon", "AVX-512"]},
6394 },
6395 )
6396 job = db.get_job(job_id)
6397 content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6398
6399 assert "unsupported=NVIDIA, Xeon, AVX-512" in content
6400 assert "use only tokens present in recent observed evidence" in content
6401 finally:
6402 db.close()
6403
6404
6405def test_prompt_shows_missing_candidate_paths_after_grounding_block(tmp_path):
6406 db = AgentDB(tmp_path / "state.db")
6407 try:
6408 job_id = db.create_job("Optimize benchmark speed with exact file evidence", title="grounding-paths", kind="generic")
6409 run_id = db.start_run(job_id, model="test")
6410 for index in range(18):
6411 shell_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6412 db.finish_step(shell_step, status="completed", output_data={"success": True, "stdout": f"probe {index}"})
6413 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
6414 db.finish_step(
6415 step_id,
6416 status="blocked",
6417 summary="blocked record_experiment; evidence grounding required",
6418 output_data={
6419 "success": True,
6420 "recoverable": True,
6421 "error": "evidence grounding required",
6422 "evidence_grounding": {
6423 "missing_candidate_paths": [
6424 "/srv/models/AlphaModel-Q4.foo",
6425 "/srv/models/BetaModel-Q8.foo",
6426 ],
6427 "unsupported_tokens": [
6428 "/srv/models/AlphaModel-Q4.foo",
6429 "/srv/models/BetaModel-Q8.foo",
6430 ],
6431 },
6432 },
6433 )
6434 job = db.get_job(job_id)
6435 content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6436
6437 assert "Recent evidence grounding blocked a durable record" in content
6438 assert "This job needs measured progress" not in content
6439 assert "/srv/models/AlphaModel-Q4.foo" in content
6440 assert "rewrite the durable record with exact observed paths" in content
6441 finally:
6442 db.close()
6443
6444
6445def test_prompt_adds_ranked_current_candidates_to_stale_grounding_block(tmp_path):
6446 db = AgentDB(tmp_path / "state.db")
6447 try:
6448 job_id = db.create_job("Benchmark OmegaModel throughput", title="grounding-current-candidates", kind="generic")
6449 db.update_job_metadata(
6450 job_id,
6451 {
6452 "task_queue": [
6453 {
6454 "title": "Validate OmegaModel file path",
6455 "status": "open",
6456 "contract": "experiment",
6457 "acceptance_criteria": "Use a validated candidate path.",
6458 "evidence_needed": "Shell output with file size and benchmark result.",
6459 }
6460 ]
6461 },
6462 )
6463 run_id = db.start_run(job_id, model="test")
6464 shell_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6465 db.finish_step(
6466 shell_step,
6467 status="completed",
6468 output_data={
6469 "success": True,
6470 "stdout": (
6471 "/tmp/aux/ggml-vocab-alpha.foo\n"
6472 "/srv/models/OmegaModel-primary.foo\n"
6473 ),
6474 },
6475 )
6476 block_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
6477 db.finish_step(
6478 block_step,
6479 status="blocked",
6480 summary="blocked record_experiment; evidence grounding required",
6481 output_data={
6482 "success": True,
6483 "recoverable": True,
6484 "error": "evidence grounding required",
6485 "evidence_grounding": {
6486 "missing_candidate_paths": ["/tmp/aux/ggml-vocab-alpha.foo"],
6487 "unsupported_tokens": ["/tmp/aux/ggml-vocab-alpha.foo"],
6488 },
6489 },
6490 )
6491
6492 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6493 idx = content.index("Next-action constraint:")
6494 next_constraint = content[idx : idx + 1200]
6495
6496 assert "current ranked candidate paths are available" in next_constraint
6497 ranked_text = next_constraint[next_constraint.index("Candidate paths:"):]
6498 assert ranked_text.index("/srv/models/OmegaModel-primary.foo") < ranked_text.index("/tmp/aux/ggml-vocab-alpha.foo")
6499 finally:
6500 db.close()
6501
6502
6503def test_prompt_does_not_resurface_grounding_block_after_durable_resolution(tmp_path):
6504 db = AgentDB(tmp_path / "state.db")
6505 try:
6506 job_id = db.create_job("Use exact file evidence", title="grounding-resolved", kind="generic")
6507 run_id = db.start_run(job_id, model="test")
6508 block_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
6509 db.finish_step(
6510 block_step,
6511 status="blocked",
6512 summary="blocked record_findings; evidence grounding required",
6513 output_data={
6514 "success": True,
6515 "recoverable": True,
6516 "error": "evidence grounding required",
6517 "evidence_grounding": {
6518 "missing_candidate_paths": ["/srv/models/AlphaModel-Q4.foo"],
6519 "unsupported_tokens": ["/srv/models/AlphaModel-Q4.foo"],
6520 },
6521 },
6522 )
6523 resolved_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
6524 db.finish_step(
6525 resolved_step,
6526 status="completed",
6527 output_data={
6528 "success": True,
6529 "findings": [{"name": "Exact path accounted", "reason": "/srv/models/AlphaModel-Q4.foo was validated."}],
6530 },
6531 )
6532 db.finish_run(run_id, "completed")
6533
6534 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6535 next_action = content.split("Next-action constraint:", 1)[1].split("\n\n", 1)[0]
6536
6537 assert "Recent evidence grounding blocked a durable record" not in content
6538 assert "/srv/models/AlphaModel-Q4.foo" not in next_action
6539 finally:
6540 db.close()
6541
6542
6543def test_prompt_suppresses_findings_matching_stale_claim_tokens(tmp_path):
6544 db = AgentDB(tmp_path / "state.db")
6545 try:
6546 job_id = db.create_job("Prefer current durable evidence", title="stale-ledger", kind="generic")
6547 db.append_finding_record(job_id, name="Intel Xeon E5-2690 v3 baseline", category="hardware")
6548 db.append_finding_record(job_id, name="AMD Ryzen 9 7900X baseline", category="hardware")
6549 db.append_lesson(
6550 job_id,
6551 "Evidence grounding rejected unsupported concrete tokens for record_experiment: E5-2690, v3, RAM. Treat matching prior ledger claims as stale.",
6552 category="mistake",
6553 )
6554
6555 job = db.get_job(job_id)
6556 content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6557
6558 assert "Unsupported/stale claim tokens to avoid until re-verified: [unsupported-stale-claim]" in content
6559 assert "Suppressed 1 stale finding" in content
6560 assert "AMD Ryzen 9 7900X baseline" in content
6561 assert "Intel Xeon E5-2690 v3 baseline" not in content
6562 finally:
6563 db.close()
6564
6565
6566def test_prompt_prioritizes_validation_for_recent_candidate_file_paths(tmp_path):
6567 db = AgentDB(tmp_path / "state.db")
6568 try:
6569 job_id = db.create_job("Validate a discovered runtime file", title="candidate-file", kind="generic")
6570 db.update_job_metadata(
6571 job_id,
6572 {
6573 "task_queue": [
6574 {
6575 "title": "Run baseline benchmark with the discovered file",
6576 "status": "open",
6577 "contract": "experiment",
6578 "acceptance_criteria": "Benchmark command uses a validated file path.",
6579 "evidence_needed": "Shell output showing file size and benchmark result.",
6580 }
6581 ]
6582 },
6583 )
6584 run_id = db.start_run(job_id, model="test")
6585 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6586 db.finish_step(
6587 step_id,
6588 status="completed",
6589 output_data={
6590 "success": True,
6591 "stdout": (
6592 "candidate files:\n"
6593 "/srv/models/ExampleModel-Q4.foo\n"
6594 "/srv/models/sidecar.txt\n"
6595 ),
6596 },
6597 )
6598 db.finish_run(run_id, "completed")
6599
6600 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6601
6602 assert "Candidate file discovery:" in content
6603 assert "/srv/models/ExampleModel-Q4.foo" in content
6604 assert "Validate likely candidates with shell_exec" in content
6605 assert "Do not reject a non-empty candidate binary from `file` output alone" in content
6606 finally:
6607 db.close()
6608
6609
6610def test_prompt_deprioritizes_recent_stub_candidate_file_paths(tmp_path):
6611 db = AgentDB(tmp_path / "state.db")
6612 try:
6613 job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6614 db.update_job_metadata(
6615 job_id,
6616 {
6617 "task_queue": [
6618 {
6619 "title": "Validate candidate model file before benchmark",
6620 "status": "open",
6621 "contract": "experiment",
6622 "acceptance_criteria": "Benchmark uses a validated model file.",
6623 "evidence_needed": "Shell output showing file size and parser/header status.",
6624 }
6625 ]
6626 },
6627 )
6628 run_id = db.start_run(job_id, model="test")
6629 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6630 db.finish_step(
6631 step_id,
6632 status="completed",
6633 output_data={
6634 "success": True,
6635 "stdout": (
6636 "-rw-r--r-- 1 user user 29 May 15 10:00 /tmp/models/AlphaModel-Q4.foo\n"
6637 "/tmp/models/AlphaModel-Q4.foo: ASCII text, with no line terminators\n"
6638 "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n"
6639 ),
6640 },
6641 )
6642 db.finish_run(run_id, "completed")
6643
6644 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6645
6646 idx = content.index("Next-action constraint:")
6647 next_constraint = content[idx : idx + 1400]
6648 assert "/srv/models/AlphaModel-IQ3.foo" in next_constraint
6649 assert "/tmp/models/AlphaModel-Q4.foo" not in next_constraint
6650 assert "Recently invalid or stub-like candidates" in content
6651 assert "/tmp/models/AlphaModel-Q4.foo" in content
6652 finally:
6653 db.close()
6654
6655
6656def test_prompt_isolates_current_execution_focus_for_candidate_validation(tmp_path):
6657 db = AgentDB(tmp_path / "state.db")
6658 try:
6659 job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6660 db.update_job_metadata(
6661 job_id,
6662 {
6663 "task_queue": [
6664 {
6665 "title": f"Old branch {index}",
6666 "status": "open",
6667 "priority": index,
6668 }
6669 for index in range(82)
6670 ] + [
6671 {
6672 "title": "Validate AlphaModel candidate file before benchmark",
6673 "status": "active",
6674 "priority": 100,
6675 "contract": "experiment",
6676 "acceptance_criteria": "Validated candidate file is used in a measurement.",
6677 "evidence_needed": "Shell output with candidate file size and benchmark result.",
6678 }
6679 ]
6680 },
6681 )
6682 run_id = db.start_run(job_id, model="test")
6683 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6684 db.finish_step(
6685 step_id,
6686 status="failed",
6687 output_data={
6688 "success": False,
6689 "stdout": "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n",
6690 "stderr": "ls: cannot access '/tmp/models/AlphaModel-Q4.foo': No such file or directory\n",
6691 },
6692 )
6693 db.finish_run(run_id, "failed")
6694
6695 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6696
6697 focus = content[content.index("Current execution focus:") : content.index("Pending measurement obligation:")]
6698 assert "phase=execute_with_validated_candidate" in focus
6699 assert "Use the recently validated candidate path: /srv/models/AlphaModel-IQ3.foo" in focus
6700 assert "backlog=83 tasks" in focus
6701 assert "Treat it as advisory" in focus
6702 next_constraint = content[content.index("Next-action constraint:"):]
6703 assert "/srv/models/AlphaModel-IQ3.foo" in next_constraint
6704 assert "/tmp/models/AlphaModel-Q4.foo" not in next_constraint
6705 finally:
6706 db.close()
6707
6708
6709def test_prompt_moves_from_candidate_validation_to_candidate_use_after_positive_evidence(tmp_path):
6710 db = AgentDB(tmp_path / "state.db")
6711 try:
6712 job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6713 db.update_job_metadata(
6714 job_id,
6715 {
6716 "task_queue": [
6717 {
6718 "title": "Run benchmark with validated AlphaModel file",
6719 "status": "active",
6720 "priority": 100,
6721 "contract": "experiment",
6722 "acceptance_criteria": "Benchmark command uses a validated file path.",
6723 "evidence_needed": "Shell output showing file size and benchmark result.",
6724 }
6725 ]
6726 },
6727 )
6728 run_id = db.start_run(job_id, model="test")
6729 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6730 db.finish_step(
6731 step_id,
6732 status="completed",
6733 output_data={
6734 "success": True,
6735 "stdout": "-rw-r--r-- 1 user user 12G May 15 10:01 /srv/models/AlphaModel-IQ3.foo\n",
6736 },
6737 )
6738 db.finish_run(run_id, "completed")
6739
6740 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6741
6742 focus = content[content.index("Current execution focus:"): content.index("Pending measurement obligation:")]
6743 assert "phase=execute_with_validated_candidate" in focus
6744 assert "Use the recently validated candidate path: /srv/models/AlphaModel-IQ3.foo" in focus
6745 next_constraint = content[content.index("Next-action constraint:"):]
6746 assert "Use it in the next bounded action or measurement" in next_constraint
6747 assert "repeating existence checks" in next_constraint
6748 finally:
6749 db.close()
6750
6751
6752def test_prompt_ranks_context_matching_candidate_paths_before_auxiliary_files(tmp_path):
6753 db = AgentDB(tmp_path / "state.db")
6754 try:
6755 job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6756 db.update_job_metadata(
6757 job_id,
6758 {
6759 "task_queue": [
6760 {
6761 "title": "Validate candidate file path before benchmark",
6762 "status": "open",
6763 "contract": "experiment",
6764 "acceptance_criteria": "Validated primary file is used in a measurement.",
6765 "evidence_needed": "Shell output with file size and benchmark result.",
6766 }
6767 ]
6768 },
6769 )
6770 run_id = db.start_run(job_id, model="test")
6771 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6772 db.finish_step(
6773 step_id,
6774 status="completed",
6775 output_data={
6776 "success": True,
6777 "stdout": (
6778 "/srv/models/ggml-vocab-alpha.foo\n"
6779 "/srv/models/sidecar-mmproj-alpha.foo\n"
6780 "/srv/models/AlphaModel-Q4.foo\n"
6781 ),
6782 },
6783 )
6784 db.finish_run(run_id, "completed")
6785
6786 job = db.get_job(job_id)
6787 content = build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
6788 ranked = _rank_candidate_file_paths(
6789 job,
6790 "Validate candidate file path before benchmark",
6791 [
6792 "/srv/models/ggml-vocab-alpha.foo",
6793 "/srv/models/sidecar-mmproj-alpha.foo",
6794 "/srv/models/AlphaModel-Q4.foo",
6795 ],
6796 )
6797
6798 section = content[content.index("Candidate file discovery:"): content.index("Measured progress guard:")]
6799 assert "Candidate paths:" in section
6800 assert ranked[0] == "/srv/models/AlphaModel-Q4.foo"
6801 assert "/srv/models/Alp" in section
6802 assert "This supersedes stale no-candidate/no-file memory" in section
6803 assert "header/signature bytes" in section
6804 finally:
6805 db.close()
6806
6807
6808def test_next_action_prioritizes_candidate_file_validation_over_download_retry(tmp_path):
6809 db = AgentDB(tmp_path / "state.db")
6810 try:
6811 job_id = db.create_job("Benchmark AlphaModel throughput", title="alpha benchmark", kind="generic")
6812 db.update_job_metadata(
6813 job_id,
6814 {
6815 "task_queue": [
6816 {
6817 "title": "Run baseline benchmark with the discovered file",
6818 "status": "open",
6819 "contract": "experiment",
6820 "acceptance_criteria": "Benchmark command uses a validated file path.",
6821 "evidence_needed": "Shell output showing file size and benchmark result.",
6822 }
6823 ],
6824 "experiment_ledger": [
6825 {
6826 "title": "Remote download failed",
6827 "status": "failed",
6828 "metric_name": "downloaded_files",
6829 "metric_value": 0,
6830 "next_action": "Record tasks to explore alternative download methods and remote sources.",
6831 }
6832 ],
6833 },
6834 )
6835 run_id = db.start_run(job_id, model="test")
6836 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6837 db.finish_step(
6838 step_id,
6839 status="completed",
6840 output_data={
6841 "success": True,
6842 "stdout": "/srv/models/AlphaModel-Q4.foo\n",
6843 },
6844 )
6845 db.finish_run(run_id, "completed")
6846
6847 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6848
6849 idx = content.index("Next-action constraint:")
6850 next_constraint = content[idx: idx + 900]
6851 assert "Concrete candidate file paths are available" in next_constraint
6852 assert "/srv/models/AlphaModel-Q4.foo" in next_constraint
6853 assert "alternative download methods" not in next_constraint
6854 finally:
6855 db.close()
6856
6857
6858def test_prompt_ranks_late_candidate_paths_from_large_shell_listing(tmp_path):
6859 db = AgentDB(tmp_path / "state.db")
6860 try:
6861 job_id = db.create_job("Benchmark OmegaModel throughput", title="omega benchmark", kind="generic")
6862 db.update_job_metadata(
6863 job_id,
6864 {
6865 "task_queue": [
6866 {
6867 "title": "Validate OmegaModel candidate file before measurement",
6868 "status": "open",
6869 "contract": "experiment",
6870 "acceptance_criteria": "Validated candidate file is used in a measurement.",
6871 "evidence_needed": "Shell output with candidate file size and benchmark result.",
6872 }
6873 ]
6874 },
6875 )
6876 run_id = db.start_run(job_id, model="test")
6877 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6878 db.finish_step(
6879 step_id,
6880 status="completed",
6881 output_data={
6882 "success": True,
6883 "stdout": "\n".join(
6884 [f"/srv/models/ggml-vocab-{index}.foo" for index in range(30)]
6885 + ["/srv/models/OmegaModel-primary.foo"]
6886 ),
6887 },
6888 )
6889 db.finish_run(run_id, "completed")
6890
6891 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6892 section = content[content.index("Candidate file discovery:"): content.index("Measured progress guard:")]
6893
6894 assert "/srv/models/Ome" in section
6895 finally:
6896 db.close()
6897
6898
6899def test_prompt_prioritizes_structured_candidate_file_paths(tmp_path):
6900 db = AgentDB(tmp_path / "state.db")
6901 try:
6902 job_id = db.create_job("Validate a discovered remote file", title="candidate-file", kind="generic")
6903 db.update_job_metadata(
6904 job_id,
6905 {
6906 "task_queue": [
6907 {
6908 "title": "Download and validate a candidate file",
6909 "status": "open",
6910 "contract": "action",
6911 "acceptance_criteria": "A candidate file path is selected and validated before use.",
6912 "evidence_needed": "Shell output with size, hash, or validation metadata.",
6913 }
6914 ]
6915 },
6916 )
6917 run_id = db.start_run(job_id, model="test")
6918 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
6919 db.finish_step(
6920 step_id,
6921 status="completed",
6922 output_data={
6923 "success": True,
6924 "stdout": (
6925 '[{"type":"file","size":123456789,"path":"ExampleModel-Q4.foo"},'
6926 '{"type":"file","size":42,"path":".gitattributes"}]'
6927 ),
6928 },
6929 )
6930 db.finish_run(run_id, "completed")
6931
6932 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6933
6934 assert "Candidate file discovery:" in content
6935 assert "ExampleModel-Q4.foo" in content
6936 assert "Validate likely candidates with shell_exec" in content
6937 finally:
6938 db.close()
6939
6940
6941def test_prompt_filters_truncated_and_url_like_candidate_file_paths(tmp_path):
6942 db = AgentDB(tmp_path / "state.db")
6943 try:
6944 job_id = db.create_job("Validate concrete local candidates", title="candidate-file", kind="generic")
6945 db.update_job_metadata(
6946 job_id,
6947 {
6948 "task_queue": [
6949 {
6950 "title": "Validate remembered file path",
6951 "status": "open",
6952 "contract": "action",
6953 "acceptance_criteria": "A candidate file path is validated before use.",
6954 "evidence_needed": "Shell output with file size or hash.",
6955 }
6956 ],
6957 "experiment_ledger": [
6958 {
6959 "title": "Prior candidate discovery",
6960 "result": "Avoid pseudo-paths like //example.com and truncated paths like /tmp/...",
6961 "next_action": "Validate /opt/models/ConcreteModel-Q4.foo before declaring no usable file.",
6962 }
6963 ],
6964 },
6965 )
6966
6967 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
6968
6969 assert "Candidate file discovery:" in content
6970 assert "/opt/models/ConcreteModel-Q4.foo" in content
6971 assert "//example.com" not in content
6972 assert "/tmp/..." not in content
6973 finally:
6974 db.close()
6975
6976
6977def test_candidate_path_extraction_stops_at_escaped_newline_metadata():
6978 text = (
6979 'output="/srv/models/AlphaModel-Q4.foo\\n-rw-rw-r-- owner size" '
6980 '{"path": "/srv/models/BetaModel-Q8.foo\\n-rw-rw-r--"}'
6981 )
6982
6983 paths = _extract_candidate_file_paths(text)
6984
6985 assert "/srv/models/AlphaModel-Q4.foo" in paths
6986 assert "/srv/models/BetaModel-Q8.foo" in paths
6987 assert all("\\n-rw-rw-r--" not in path for path in paths)
6988
6989
6990def test_candidate_path_extraction_skips_globs_and_truncated_fragments():
6991 text = (
6992 "/srv/models/*.foo\n"
6993 "/srv/models/AlphaModel-Q4.foo\n"
6994 "/srv/models/AlphaModel-Q4\n"
6995 "/srv/models/AlphaModel-v1.2-UnfinishedFrag\n"
6996 "/srv/models/BetaModel-v1.2-Q8.foo\n"
6997 )
6998
6999 paths = _extract_candidate_file_paths(text)
7000
7001 assert "/srv/models/AlphaModel-Q4.foo" in paths
7002 assert "/srv/models/BetaModel-v1.2-Q8.foo" in paths
7003 assert "/srv/models/*.foo" not in paths
7004 assert "/srv/models/AlphaModel-Q4" not in paths
7005 assert "/srv/models/AlphaModel-v1.2-UnfinishedFrag" not in paths
7006
7007
7008def test_prompt_resurfaces_durable_candidate_file_paths(tmp_path):
7009 db = AgentDB(tmp_path / "state.db")
7010 try:
7011 job_id = db.create_job("Validate a remembered file candidate", title="durable-candidate", kind="generic")
7012 db.update_job_metadata(
7013 job_id,
7014 {
7015 "task_queue": [
7016 {
7017 "title": "Validate remembered file path",
7018 "status": "open",
7019 "contract": "action",
7020 "acceptance_criteria": "A candidate file path is validated before use.",
7021 "evidence_needed": "Shell output with file size or hash.",
7022 }
7023 ],
7024 "experiment_ledger": [
7025 {
7026 "title": "Prior candidate discovery",
7027 "result": "A previous branch listed /opt/models/Remembered-Model-Q4.foo as a candidate.",
7028 "next_action": "Validate /opt/models/Remembered-Model-Q4.foo before declaring no usable file.",
7029 }
7030 ],
7031 },
7032 )
7033
7034 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7035
7036 assert "Candidate file discovery:" in content
7037 assert "Durable records mention candidate file paths" in content
7038 assert "/opt/models/Remembered-Model-Q4.foo" in content
7039 assert "Treat durable-record candidates as candidates until revalidated" in content
7040 finally:
7041 db.close()
7042
7043
7044def test_prompt_resurfaces_candidate_paths_from_recent_grounding_block(tmp_path):
7045 db = AgentDB(tmp_path / "state.db")
7046 try:
7047 job_id = db.create_job("Validate a candidate file", title="candidate-file", kind="generic")
7048 db.update_job_metadata(
7049 job_id,
7050 {
7051 "task_queue": [
7052 {
7053 "title": "Validate candidate file path",
7054 "status": "open",
7055 "contract": "action",
7056 "acceptance_criteria": "A candidate file path is validated before use.",
7057 "evidence_needed": "Shell output with file size or hash.",
7058 }
7059 ]
7060 },
7061 )
7062 run_id = db.start_run(job_id, model="test")
7063 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
7064 db.finish_step(
7065 step_id,
7066 status="blocked",
7067 output_data={
7068 "success": True,
7069 "error": "evidence grounding required",
7070 "evidence_grounding": {
7071 "missing_candidate_paths": [
7072 "/srv/models/ExactModel-Q4.foo",
7073 "/srv/models/*.foo",
7074 "/srv/models/Fragment-v1.2-Unfinished",
7075 ]
7076 },
7077 },
7078 summary="blocked record_findings; evidence grounding required",
7079 )
7080 db.finish_run(run_id, "blocked")
7081
7082 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7083
7084 assert "Candidate file discovery:" in content
7085 assert "/srv/models/ExactModel-Q4.foo" in content
7086 assert "/srv/models/*.foo" not in content
7087 assert "/srv/models/Fragment-v1.2-Unfinished" not in content
7088 finally:
7089 db.close()
7090
7091
7092def test_grounding_uses_recent_missing_candidate_paths_after_raw_evidence_ages(tmp_path):
7093 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7094 db = AgentDB(tmp_path / "state.db")
7095 try:
7096 job_id = db.create_job("Validate a candidate file", title="candidate-file", kind="generic")
7097 db.update_job_metadata(
7098 job_id,
7099 {
7100 "task_queue": [
7101 {
7102 "title": "Validate candidate file path",
7103 "status": "open",
7104 "contract": "experiment",
7105 "acceptance_criteria": "A candidate file path is validated before use.",
7106 "evidence_needed": "Shell output with file size or hash.",
7107 }
7108 ]
7109 },
7110 )
7111 run_id = db.start_run(job_id, model="test")
7112 for index in range(10):
7113 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
7114 db.finish_step(
7115 step_id,
7116 status="completed",
7117 output_data={"success": True, "lesson": {"lesson": f"filler {index}"}},
7118 summary=f"filler {index}",
7119 )
7120 blocked_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
7121 db.finish_step(
7122 blocked_id,
7123 status="blocked",
7124 output_data={
7125 "success": True,
7126 "error": "evidence grounding required",
7127 "evidence_grounding": {
7128 "missing_candidate_paths": ["/srv/models/ExactModel-Q4.foo"]
7129 },
7130 },
7131 summary="blocked record_findings; evidence grounding required",
7132 )
7133 db.finish_run(run_id, "blocked")
7134
7135 result = run_one_step(
7136 job_id,
7137 config=config,
7138 db=db,
7139 llm=ScriptedLLM([
7140 LLMResponse(tool_calls=[
7141 ToolCall(
7142 name="record_experiment",
7143 arguments={
7144 "title": "Candidate file validation",
7145 "hypothesis": "A candidate model file may be available.",
7146 "metric_name": "validated_files",
7147 "metric_value": 0,
7148 "metric_unit": "files",
7149 "result": "Candidate files were summarized but not named.",
7150 },
7151 )
7152 ])
7153 ]),
7154 )
7155
7156 assert result.status == "blocked"
7157 grounding = result.result["evidence_grounding"]
7158 assert "/srv/models/ExactModel-Q4.foo" in grounding["missing_candidate_paths"]
7159 finally:
7160 db.close()
7161
7162
7163def test_prompt_filters_stale_generated_and_objective_tokens(tmp_path):
7164 db = AgentDB(tmp_path / "state.db")
7165 try:
7166 job_id = db.create_job("Optimize Qwen3.6-27B GGUF throughput", title="qwen job", kind="generic")
7167 db.append_lesson(
7168 job_id,
7169 (
7170 "Evidence grounding rejected unsupported concrete tokens for record_experiment: "
7171 "Qwen3.6-27B-GGUF, JSON, shell_exec_step_1037, timeout_after_300s, E5-2690. "
7172 "Treat matching prior ledger claims as stale."
7173 ),
7174 category="mistake",
7175 )
7176 db.append_finding_record(job_id, name="Qwen3.6-27B-GGUF source", category="source")
7177 db.append_finding_record(job_id, name="Intel Xeon E5-2690 baseline", category="hardware")
7178
7179 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7180
7181 assert "Qwen3.6-27B-GGUF" in content
7182 assert "JSON" not in content
7183 assert "shell_exec_step_1037" not in content
7184 assert "timeout_after_300s" not in content
7185 assert "Unsupported/stale claim tokens to avoid until re-verified: [unsupported-stale-claim]" in content
7186 assert "Intel Xeon E5-2690 baseline" not in content
7187 finally:
7188 db.close()
7189
7190
7191def test_prompt_redacts_stale_tokens_from_recent_state(tmp_path):
7192 db = AgentDB(tmp_path / "state.db")
7193 try:
7194 job_id = db.create_job(
7195 "Prefer current durable evidence",
7196 title="stale-recent-state",
7197 kind="generic",
7198 metadata={"unsupported_claim_tokens": ["E5-2690", "v3"]},
7199 )
7200 run_id = db.start_run(job_id, model="test")
7201 step_id = db.add_step(
7202 job_id=job_id,
7203 run_id=run_id,
7204 kind="tool",
7205 tool_name="record_findings",
7206 input_data={"arguments": {"finding": "Old CPU claim: Intel Xeon E5-2690 v3"}},
7207 )
7208 db.finish_step(
7209 step_id,
7210 status="blocked",
7211 output_data={"success": False, "error": "evidence grounding required"},
7212 summary="blocked record_findings; Intel Xeon E5-2690 v3 unsupported",
7213 )
7214 db.finish_run(run_id, "blocked")
7215
7216 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7217
7218 assert "E5-2690" not in content
7219 assert "[unsupported-stale-claim]" in content
7220 finally:
7221 db.close()
7222
7223
7224def test_prompt_does_not_redact_stale_tokens_inside_exact_paths(tmp_path):
7225 db = AgentDB(tmp_path / "state.db")
7226 try:
7227 job_id = db.create_job(
7228 "Validate exact candidate paths",
7229 title="path-redaction",
7230 kind="generic",
7231 metadata={"unsupported_claim_tokens": ["AlphaModel-99"]},
7232 )
7233 db.update_job_metadata(
7234 job_id,
7235 {
7236 "task_queue": [
7237 {
7238 "title": "Validate candidate file path",
7239 "status": "open",
7240 "contract": "experiment",
7241 "acceptance_criteria": "Exact path is validated.",
7242 "evidence_needed": "Shell output with file size.",
7243 }
7244 ]
7245 },
7246 )
7247 run_id = db.start_run(job_id, model="test")
7248 step_id = db.add_step(
7249 job_id=job_id,
7250 run_id=run_id,
7251 kind="tool",
7252 tool_name="record_findings",
7253 input_data={"arguments": {"finding": "Old unsupported AlphaModel-99 claim"}},
7254 )
7255 db.finish_step(
7256 step_id,
7257 status="blocked",
7258 output_data={
7259 "success": False,
7260 "error": "evidence grounding required",
7261 "evidence_grounding": {
7262 "missing_candidate_paths": ["/srv/models/AlphaModel-99-Q4.foo"],
7263 "unsupported_tokens": ["/srv/models/AlphaModel-99-Q4.foo"],
7264 },
7265 },
7266 summary="blocked record_findings; AlphaModel-99 unsupported",
7267 )
7268 db.finish_run(run_id, "blocked")
7269
7270 content = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))[-1]["content"]
7271
7272 assert "/srv/models/AlphaModel-99-Q4.foo" in content
7273 assert "unsupported [unsupported-stale-claim] claim" in content
7274 finally:
7275 db.close()
7276
7277
7278def test_prompt_redacts_older_stale_tokens_from_task_queue(tmp_path):
7279 stale_tail = [f"GPU{i}X" for i in range(60)]
7280 job = {
7281 "title": "stale task cleanup",
7282 "kind": "generic",
7283 "objective": "use current evidence",
7284 "metadata": {
7285 "unsupported_claim_tokens": ["E5-2690", *stale_tail],
7286 "task_queue": [
7287 {
7288 "title": "Record old baseline",
7289 "status": "active",
7290 "priority": 10,
7291 "goal": "Record CPU: Dual Intel Xeon E5-2690 v3 from old evidence.",
7292 "output_contract": "experiment",
7293 }
7294 ],
7295 },
7296 }
7297
7298 content = build_messages(job, [])[-1]["content"]
7299
7300 assert "E5-2690" not in content
7301 assert "[unsupported-stale-claim]" in content
7302
7303
7304def test_run_one_step_requires_accounting_after_auto_checkpoint_read(tmp_path):
7305 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7306 db = AgentDB(tmp_path / "state.db")
7307 try:
7308 job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7309 run_id = db.start_run(job_id, model="test")
7310 checkpoint_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7311 db.finish_step(
7312 checkpoint_step,
7313 status="blocked",
7314 output_data={
7315 "success": True,
7316 "error": "artifact required before more research",
7317 "auto_checkpoint": {
7318 "artifact_id": "art_checkpoint",
7319 "path": str(tmp_path / "checkpoint.md"),
7320 "title": "Auto Evidence Checkpoint after step 1",
7321 "evidence_step": "step_evidence",
7322 "blocked_tool": "shell_exec",
7323 },
7324 },
7325 )
7326 read_step = db.add_step(
7327 job_id=job_id,
7328 run_id=run_id,
7329 kind="tool",
7330 tool_name="read_artifact",
7331 input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7332 )
7333 db.finish_step(read_step, status="completed", output_data={"success": True, "content": "evidence"})
7334 db.finish_run(run_id, "completed")
7335
7336 result = run_one_step(
7337 job_id,
7338 config=config,
7339 db=db,
7340 llm=ScriptedLLM([
7341 LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7342 ]),
7343 )
7344
7345 assert result.status == "blocked"
7346 assert result.result["error"] == "evidence checkpoint accounting required"
7347 assert result.result["blocked_tool"] == "shell_exec"
7348 finally:
7349 db.close()
7350
7351
7352def test_run_one_step_reads_checkpoint_before_batched_branch_work(tmp_path):
7353 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7354 db = AgentDB(tmp_path / "state.db")
7355 try:
7356 job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7357 db.update_job_metadata(
7358 job_id,
7359 {
7360 "pending_evidence_checkpoint": {
7361 "artifact_id": "art_checkpoint",
7362 "title": "Auto Evidence Checkpoint after step 1",
7363 "evidence_step_no": 1,
7364 "blocked_tool": "shell_exec",
7365 }
7366 },
7367 )
7368
7369 result = run_one_step(
7370 job_id,
7371 config=config,
7372 db=db,
7373 llm=ScriptedLLM([
7374 LLMResponse(tool_calls=[
7375 ToolCall(name="shell_exec", arguments={"command": "echo more discovery"}),
7376 ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"}),
7377 ])
7378 ]),
7379 registry=SuccessRegistry(),
7380 )
7381
7382 tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
7383 assert [step["tool_name"] for step in tool_steps[-2:]] == ["read_artifact", "shell_exec"]
7384 assert result.status == "blocked"
7385 assert result.result["error"] == "evidence checkpoint accounting required"
7386 assert result.result["blocked_tool"] == "shell_exec"
7387 assert result.result["checkpoint_already_read"] is True
7388 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7389 assert pending["read_at"]
7390 finally:
7391 db.close()
7392
7393
7394def test_run_one_step_allows_checkpoint_read_when_deliverable_guard_is_active(tmp_path):
7395 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7396 db = AgentDB(tmp_path / "state.db")
7397 try:
7398 job_id = db.create_job("Write a report from long research", title="report checkpoint", kind="generic")
7399 run_id = db.start_run(job_id, model="test")
7400 for index in range(18):
7401 step_id = db.add_step(
7402 job_id=job_id,
7403 run_id=run_id,
7404 kind="tool",
7405 tool_name="shell_exec",
7406 input_data={"arguments": {"command": f"ls item-{index}"}},
7407 )
7408 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "evidence"})
7409 db.finish_run(run_id, "completed")
7410 db.update_job_metadata(
7411 job_id,
7412 {
7413 "pending_evidence_checkpoint": {
7414 "artifact_id": "art_checkpoint",
7415 "title": "Auto Evidence Checkpoint after step 18",
7416 "evidence_step_no": 18,
7417 "blocked_tool": "shell_exec",
7418 }
7419 },
7420 )
7421
7422 result = run_one_step(
7423 job_id,
7424 config=config,
7425 db=db,
7426 llm=ScriptedLLM([
7427 LLMResponse(tool_calls=[ToolCall(name="read_artifact", arguments={"artifact_id": "art_checkpoint"})])
7428 ]),
7429 registry=SuccessRegistry(),
7430 )
7431
7432 assert result.status == "completed"
7433 assert result.tool_name == "read_artifact"
7434 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7435 assert pending["read_at"]
7436 assert pending["read_step_no"] == 19
7437 finally:
7438 db.close()
7439
7440
7441def test_run_one_step_accounts_checkpoint_before_batched_branch_work(tmp_path):
7442 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7443 db = AgentDB(tmp_path / "state.db")
7444 try:
7445 job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7446 db.update_job_metadata(
7447 job_id,
7448 {
7449 "pending_evidence_checkpoint": {
7450 "artifact_id": "art_checkpoint",
7451 "title": "Auto Evidence Checkpoint after step 1",
7452 "read_at": "2026-01-01T00:00:00+00:00",
7453 "evidence_step_no": 1,
7454 "blocked_tool": "shell_exec",
7455 }
7456 },
7457 )
7458
7459 result = run_one_step(
7460 job_id,
7461 config=config,
7462 db=db,
7463 llm=ScriptedLLM([
7464 LLMResponse(tool_calls=[
7465 ToolCall(name="shell_exec", arguments={"command": "echo more discovery"}),
7466 ToolCall(name="record_lesson", arguments={"lesson": "checkpoint accounted", "category": "strategy"}),
7467 ])
7468 ]),
7469 registry=SuccessRegistry(),
7470 )
7471
7472 tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
7473 assert [step["tool_name"] for step in tool_steps[-2:]] == ["record_lesson", "shell_exec"]
7474 assert result.status == "completed"
7475 assert result.tool_name == "shell_exec"
7476 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7477 assert pending["resolved_at"]
7478 assert pending["resolved_by_tool"] == "record_lesson"
7479 finally:
7480 db.close()
7481
7482
7483def test_run_one_step_treats_guard_recovery_as_checkpoint_accounting(tmp_path):
7484 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7485 db = AgentDB(tmp_path / "state.db")
7486 try:
7487 job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7488 run_id = db.start_run(job_id, model="test")
7489 checkpoint_step = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7490 db.finish_step(
7491 checkpoint_step,
7492 status="blocked",
7493 output_data={
7494 "success": True,
7495 "error": "artifact required before more research",
7496 "auto_checkpoint": {
7497 "artifact_id": "art_checkpoint",
7498 "path": str(tmp_path / "checkpoint.md"),
7499 "title": "Auto Evidence Checkpoint after step 1",
7500 "evidence_step": "step_evidence",
7501 "blocked_tool": "shell_exec",
7502 },
7503 },
7504 )
7505 read_step = db.add_step(
7506 job_id=job_id,
7507 run_id=run_id,
7508 kind="tool",
7509 tool_name="read_artifact",
7510 input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7511 )
7512 db.finish_step(read_step, status="completed", output_data={"success": True, "content": "evidence"})
7513 guard_step = db.add_step(
7514 job_id=job_id,
7515 run_id=run_id,
7516 kind="tool",
7517 tool_name="read_artifact",
7518 input_data={"arguments": {"artifact_id": "art_checkpoint"}},
7519 )
7520 db.finish_step(
7521 guard_step,
7522 status="blocked",
7523 output_data={
7524 "success": True,
7525 "recoverable": True,
7526 "error": "evidence checkpoint accounting required",
7527 "pending_evidence_checkpoint": {
7528 "artifact_id": "art_checkpoint",
7529 "title": "Auto Evidence Checkpoint after step 1",
7530 "checkpoint_read": True,
7531 },
7532 },
7533 )
7534 recovery_step = db.add_step(job_id=job_id, run_id=run_id, kind="recovery", tool_name="guard_recovery")
7535 db.finish_step(
7536 recovery_step,
7537 status="completed",
7538 output_data={
7539 "success": True,
7540 "lesson": {"lesson": "Open a task after repeated guard blocks."},
7541 "task": {"title": "Resolve guard"},
7542 },
7543 )
7544 db.finish_run(run_id, "completed")
7545
7546 result = run_one_step(
7547 job_id,
7548 config=config,
7549 db=db,
7550 llm=ScriptedLLM([
7551 LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7552 ]),
7553 registry=SuccessRegistry(),
7554 )
7555
7556 assert result.status == "completed"
7557 assert result.tool_name == "shell_exec"
7558 finally:
7559 db.close()
7560
7561
7562def test_checkpoint_resolution_tool_bypasses_measured_progress_guard(tmp_path):
7563 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7564 db = AgentDB(tmp_path / "state.db")
7565 try:
7566 job_id = db.create_job("Optimize benchmark speed", title="checkpoint-measure", kind="generic")
7567 db.update_job_metadata(
7568 job_id,
7569 {
7570 "pending_evidence_checkpoint": {
7571 "artifact_id": "art_checkpoint",
7572 "title": "Auto Evidence Checkpoint after step 1",
7573 "read_at": "2026-01-01T00:00:00+00:00",
7574 "evidence_step_no": 1,
7575 "blocked_tool": "shell_exec",
7576 }
7577 },
7578 )
7579 run_id = db.start_run(job_id, model="fake")
7580 for index in range(18):
7581 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
7582 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": f"probe {index}"})
7583 db.finish_run(run_id, "completed")
7584
7585 result = run_one_step(
7586 job_id,
7587 config=config,
7588 db=db,
7589 llm=ScriptedLLM([
7590 LLMResponse(tool_calls=[
7591 ToolCall(
7592 name="record_source",
7593 arguments={
7594 "source": "file:///tmp/checkpoint",
7595 "source_type": "checkpoint",
7596 "outcome": "checkpoint accounted before more benchmark work",
7597 },
7598 )
7599 ])
7600 ]),
7601 )
7602
7603 assert result.status == "completed"
7604 assert result.tool_name == "record_source"
7605 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7606 assert pending["resolved_by_tool"] == "record_source"
7607 finally:
7608 db.close()
7609
7610
7611def test_run_one_step_persists_checkpoint_obligation_until_accounted(tmp_path):
7612 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
7613 db = AgentDB(tmp_path / "state.db")
7614 try:
7615 job_id = db.create_job("Convert evidence checkpoints into progress", title="checkpoint", kind="generic")
7616 db.update_job_metadata(
7617 job_id,
7618 {
7619 "pending_evidence_checkpoint": {
7620 "artifact_id": "art_checkpoint",
7621 "title": "Auto Evidence Checkpoint after step 1",
7622 "path": str(tmp_path / "checkpoint.md"),
7623 "created_at": "2026-01-01T00:00:00+00:00",
7624 "evidence_step": "step_evidence",
7625 "evidence_step_no": 1,
7626 "blocked_tool": "shell_exec",
7627 }
7628 },
7629 )
7630
7631 blocked = run_one_step(
7632 job_id,
7633 config=config,
7634 db=db,
7635 llm=ScriptedLLM([
7636 LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "echo more discovery"})])
7637 ]),
7638 )
7639 assert blocked.status == "blocked"
7640 assert blocked.result["error"] == "evidence checkpoint accounting required"
7641
7642 accounted = run_one_step(
7643 job_id,
7644 config=config,
7645 db=db,
7646 llm=ScriptedLLM([
7647 LLMResponse(tool_calls=[
7648 ToolCall(
7649 name="record_lesson",
7650 arguments={
7651 "lesson": "The checkpoint contains only diagnostic setup evidence; record it and move to the next concrete branch.",
7652 "category": "strategy",
7653 },
7654 )
7655 ])
7656 ]),
7657 )
7658 assert accounted.status == "completed"
7659 pending = db.get_job(job_id)["metadata"]["pending_evidence_checkpoint"]
7660 assert pending["resolved_at"]
7661 assert pending["resolved_by_tool"] == "record_lesson"
7662 finally:
7663 db.close()
7664
7665
def test_run_one_step_blocks_branch_work_when_memory_graph_needs_consolidation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving durable work", title="memory")
        db.append_lesson(job_id, "Use validated checkpoints.", category="strategy")
        db.append_lesson(job_id, "Reject low-yield branches.", category="strategy")
        db.append_finding_record(job_id, name="Finding A")
        db.append_finding_record(job_id, name="Finding B")
        db.append_source_record(job_id, "source:a")
        db.append_experiment_record(job_id, title="Trial", status="measured")
        llm = ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more research"})])])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "blocked"
        assert result.result["error"] == "memory graph consolidation required"
        assert result.result["blocked_tool"] == "web_search"
    finally:
        db.close()


def test_run_one_step_allows_memory_graph_consolidation_when_guard_is_active(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving durable work", title="memory")
        db.append_lesson(job_id, "Use validated checkpoints.", category="strategy")
        db.append_lesson(job_id, "Reject low-yield branches.", category="strategy")
        db.append_finding_record(job_id, name="Finding A")
        db.append_finding_record(job_id, name="Finding B")
        db.append_source_record(job_id, "source:a")
        db.append_experiment_record(job_id, title="Trial", status="measured")
        llm = ScriptedLLM([
            LLMResponse(
                tool_calls=[
                    ToolCall(
                        name="record_memory_graph",
                        arguments={
                            "nodes": [
                                {
                                    "key": "validated-checkpoints",
                                    "kind": "strategy",
                                    "title": "Validated checkpoints",
                                    "summary": "Use measured checkpoints to decide the next branch.",
                                    "salience": 0.9,
                                }
                            ]
                        },
                    )
                ]
            )
        ])

        result = run_one_step(job_id, config=config, db=db, llm=llm)

        assert result.status == "completed"
        assert result.tool_name == "record_memory_graph"
        graph = db.get_job(job_id)["metadata"]["memory_graph"]
        assert graph["nodes"][0]["key"] == "validated-checkpoints"
    finally:
        db.close()


def test_prompt_adds_lesson_consolidation_guard_when_raw_lessons_sprawl():
    job = {
        "title": "lesson sprawl",
        "kind": "generic",
        "objective": "keep improving a long-running job",
        "metadata": {
            "lessons": [
                {"lesson": f"Reusable lesson {index}", "category": "strategy"}
                for index in range(30)
            ],
        },
    }
    steps = [
        {
            "step_no": index,
            "kind": "tool",
            "tool_name": "record_lesson",
            "status": "completed",
            "summary": f"lesson {index}",
        }
        for index in range(1, 4)
    ]

    content = build_messages(job, steps)[-1]["content"]

    assert "Lesson consolidation guard:" in content
    assert "Raw lessons are accumulating faster than consolidated memory" in content
    assert "lessons=30" in content
    assert "record_memory_graph" in content


def test_run_one_step_blocks_more_lessons_when_lesson_sprawl_needs_graph(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving durable work", title="lesson-sprawl")
        for index in range(30):
            db.append_lesson(job_id, f"Reusable lesson {index}", category="strategy")
        run_id = db.start_run(job_id, model="test")
        for index in range(3):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
            db.finish_step(step_id, status="completed", output_data={"success": True, "lesson": {"lesson": f"recent {index}"}})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "one more lesson"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "lesson consolidation required"
        assert result.result["lesson_consolidation"]["lessons"] == 30
        assert result.result["blocked_tool"] == "record_lesson"
    finally:
        db.close()


def test_run_one_step_allows_memory_graph_when_lesson_sprawl_is_active(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Keep improving durable work", title="lesson-sprawl")
        for index in range(30):
            db.append_lesson(job_id, f"Reusable lesson {index}", category="strategy")
        run_id = db.start_run(job_id, model="test")
        for index in range(3):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
            db.finish_step(step_id, status="completed", output_data={"success": True, "lesson": {"lesson": f"recent {index}"}})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(
                    tool_calls=[
                        ToolCall(
                            name="record_memory_graph",
                            arguments={
                                "nodes": [
                                    {
                                        "key": "lesson-sprawl-strategy",
                                        "kind": "strategy",
                                        "title": "Consolidate repeated lessons",
                                        "summary": "Compress repeated lessons into graph memory before adding more.",
                                    }
                                ]
                            },
                        )
                    ]
                )
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_memory_graph"
    finally:
        db.close()


def test_prompt_includes_activity_stagnation_context():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep making durable progress",
        "metadata": {
            "activity_checkpoint_streak": 3,
            "last_checkpoint_counts": {
                "findings": 1,
                "sources": 2,
                "tasks": 4,
                "experiments": 0,
                "lessons": 1,
                "milestones": 0,
            },
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Activity stagnation" in content
    assert "activity_checkpoint_streak=3" in content
    assert "Recent checkpoints show activity without durable progress" in content


def test_prompt_includes_task_planning_guard_context():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep making durable progress",
        "metadata": {
            "task_planning_checkpoint_streak": 2,
            "task_queue": [
                {"title": "Plan branch", "status": "open"},
                {"title": "Executed branch", "status": "done"},
            ],
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Task planning guard" in content
    assert "task_only_checkpoints=2" in content
    assert "Do not create more new open tasks next" in content


def test_prompt_includes_durable_yield_pressure():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "keep making durable progress",
        "metadata": {},
    }
    steps = [
        {
            "step_no": index,
            "kind": "tool",
            "status": "completed",
            "tool_name": "web_search",
            "summary": f"search {index}",
        }
        for index in range(1, 31)
    ]

    content = build_messages(job, steps)[-1]["content"]

    assert "Durable progress yield" in content
    assert "No durable progress records after 30 completed actions" in content
    assert "record findings/source/experiment/lesson/roadmap progress" in content


def test_prompt_includes_finding_source_ledgers_and_reflections():
    job = {
        "title": "research",
        "kind": "generic",
        "objective": "find research",
        "metadata": {
            "finding_ledger": [{"name": "Acme Finding", "category": "example category", "location": "Toronto", "score": 0.8}],
            "task_queue": [{"title": "Explore primary sources", "status": "open", "priority": 5, "goal": "Find evidence"}],
            "source_ledger": [{"source": "https://example.com", "source_type": "web_source", "usefulness_score": 0.9, "yield_count": 3}],
            "reflections": [{"summary": "Primary source map is working", "strategy": "Try archival sources next"}],
        },
    }

    messages = build_messages(job, [])

    content = messages[-1]["content"]
    assert "Finding ledger: 1 unique candidates." in content
    assert "Acme Finding" in content
    assert "Explore primary sources" in content
    assert "https://example.com" in content
    assert "Primary source map is working" in content


def test_prompt_includes_experiment_ledger_and_best_result():
    job = {
        "title": "improve process",
        "kind": "generic",
        "objective": "make a measurable process better",
        "metadata": {
            "experiment_ledger": [
                {
                    "title": "variant a",
                    "status": "measured",
                    "metric_name": "score",
                    "metric_value": 2.0,
                    "metric_unit": "units",
                    "higher_is_better": True,
                    "result": "baseline",
                    "best_observed": False,
                },
                {
                    "title": "variant b",
                    "status": "measured",
                    "metric_name": "score",
                    "metric_value": 3.5,
                    "metric_unit": "units",
                    "higher_is_better": True,
                    "result": "better",
                    "next_action": "try another independent variant",
                    "best_observed": True,
                },
            ],
        },
    }

    messages = build_messages(job, [])

    content = messages[-1]["content"]
    assert "Experiment ledger:" in content
    assert "Best observed results:" in content
    assert "variant b" in content
    assert "score=3.5 units" in content
    assert "Next-action constraint:" in content
    assert "latest measured experiment selected a concrete next action" in content
    assert "try another independent variant" in content


def _stagnant_experiments():
    return [
        {
            "title": "best variant",
            "status": "measured",
            "metric_name": "score",
            "metric_value": 10.0,
            "metric_unit": "units",
            "higher_is_better": True,
            "best_observed": True,
            "next_action": "try a materially different branch",
        },
        *[
            {
                "title": f"flat variant {index}",
                "status": "measured",
                "metric_name": "score",
                "metric_value": 8.0 + index * 0.1,
                "metric_unit": "units",
                "higher_is_better": True,
                "best_observed": False,
                "next_action": "try another small variant",
            }
            for index in range(1, 6)
        ],
    ]


def _stagnant_experiment_metadata():
    return {
        "experiment_ledger": _stagnant_experiments(),
        "memory_graph": {
            "nodes": [
                {"key": "best-variant", "kind": "decision", "title": "Best measured variant"},
                {"key": "stagnant-branch", "kind": "strategy", "title": "Stagnant branch should pivot"},
            ]
        },
    }


def test_prompt_includes_experiment_stagnation_guard():
    job = {
        "title": "improve measured process",
        "kind": "generic",
        "objective": "optimize throughput and keep improving",
        "metadata": _stagnant_experiment_metadata(),
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Experiment stagnation guard:" in content
    assert "Recent measured trials have not improved" in content
    assert "best=10.0" in content
    assert "non_improving=5" in content


def test_prompt_infers_experiment_stagnation_from_metric_direction():
    job = {
        "title": "improve measured process",
        "kind": "generic",
        "objective": "reduce latency and keep improving",
        "metadata": {
            "experiment_ledger": [
                {
                    "title": "best latency",
                    "status": "measured",
                    "metric_name": "latency",
                    "metric_value": 1.0,
                    "metric_unit": "s",
                    "higher_is_better": False,
                },
                *[
                    {
                        "title": f"slower variant {index}",
                        "status": "measured",
                        "metric_name": "latency",
                        "metric_value": 1.0 + index * 0.1,
                        "metric_unit": "s",
                        "higher_is_better": False,
                    }
                    for index in range(1, 6)
                ],
            ],
            "memory_graph": {
                "nodes": [
                    {"key": "latency-best", "kind": "decision", "title": "Best latency"},
                    {"key": "latency-pivot", "kind": "strategy", "title": "Pivot stagnant latency branch"},
                ]
            },
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Experiment stagnation guard:" in content
    assert "best=1.0" in content
    assert "latest=1.5" in content
    assert "Recent measured trials have not improved" in content


def test_prompt_does_not_treat_unmarked_improvements_as_stagnation():
    job = {
        "title": "improve measured process",
        "kind": "generic",
        "objective": "increase score and keep improving",
        "metadata": {
            "experiment_ledger": [
                {
                    "title": f"better variant {index}",
                    "status": "measured",
                    "metric_name": "score",
                    "metric_value": float(index),
                    "metric_unit": "points",
                    "higher_is_better": True,
                    "best_observed": False,
                }
                for index in range(1, 7)
            ],
            "memory_graph": {
                "nodes": [
                    {"key": "score-progress", "kind": "decision", "title": "Score is improving"},
                    {"key": "score-next", "kind": "strategy", "title": "Continue measured branch"},
                ]
            },
        },
    }

    content = build_messages(job, [])[-1]["content"]

    assert "Experiment stagnation guard:" in content
    assert "Recent measured trials have not improved" not in content


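# Illustrative sketch only, not the nipux_cli implementation: one way the
# "best=..." / "latest=..." / "non_improving=..." values asserted in the
# stagnation tests above could be derived from an experiment ledger. The helper
# name `summarize_stagnation` and its return keys are hypothetical. It assumes
# every measured entry shares one metric and reads the direction from the first
# entry's `higher_is_better` flag (defaulting to True when the flag is absent).
def summarize_stagnation(ledger):
    """Return best and latest metric values plus the trailing non-improving count."""
    measured = [
        entry for entry in ledger
        if entry.get("status") == "measured" and "metric_value" in entry
    ]
    if not measured:
        return None
    higher_is_better = bool(measured[0].get("higher_is_better", True))
    best = None
    non_improving = 0
    for entry in measured:
        value = float(entry["metric_value"])
        if best is None:
            improved = True
        elif higher_is_better:
            improved = value > best
        else:
            improved = value < best
        if improved:
            # A new best resets the streak, so steadily improving ledgers never
            # count as stagnant even when no entry is marked best_observed.
            best = value
            non_improving = 0
        else:
            non_improving += 1
    return {
        "best": best,
        "latest": float(measured[-1]["metric_value"]),
        "non_improving": non_improving,
    }

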
def test_run_one_step_blocks_branch_work_after_experiment_stagnation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Optimize a measurable process and keep improving",
            title="experiment-stagnation",
            kind="generic",
            metadata=_stagnant_experiment_metadata(),
        )
        run_id = db.start_run(job_id, model="test")
        for index in range(6):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "python run_next_trial.py"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "experiment stagnation decision required"
        assert result.result["blocked_tool"] == "shell_exec"
        assert result.result["experiment_stagnation"]["non_improving_count"] == 5
    finally:
        db.close()


def test_run_one_step_allows_branch_decision_after_experiment_stagnation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Optimize a measurable process and keep improving",
            title="experiment-stagnation",
            kind="generic",
            metadata=_stagnant_experiment_metadata(),
        )
        run_id = db.start_run(job_id, model="test")
        for index in range(6):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_tasks",
                        arguments={
                            "tasks": [
                                {
                                    "title": "Pivot away from stagnant measured branch",
                                    "status": "open",
                                    "output_contract": "decision",
                                    "acceptance_criteria": "A materially different branch is selected.",
                                }
                            ]
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
    finally:
        db.close()


def test_run_one_step_allows_blocked_experiment_after_experiment_stagnation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Optimize a measurable process and keep improving",
            title="experiment-stagnation",
            kind="generic",
            metadata=_stagnant_experiment_metadata(),
        )
        run_id = db.start_run(job_id, model="test")
        for index in range(6):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_experiment")
            db.finish_step(step_id, status="completed", output_data={"success": True, "experiment": {"title": f"trial {index}"}})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_experiment",
                        arguments={
                            "title": "Stagnant branch decision",
                            "status": "blocked",
                            "metric_name": "score",
                            "metric_unit": "units",
                            "result": "recent trials did not improve the objective",
                            "next_action": "pivot to a materially different branch",
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_experiment"
    finally:
        db.close()


def test_delivery_experiment_next_action_blocks_unrelated_research(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "deliverable gap",
                "status": "measured",
                "metric_name": "coverage",
                "metric_value": 0.25,
                "metric_unit": "ratio",
                "next_action": "merge the measured output into the deliverable file",
            }],
        })

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "experiment next action pending"
        assert "merge the measured output" in result.result["experiment_next_action"]["next_action"]
    finally:
        db.close()


def test_research_experiment_next_action_allows_research(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "source gap",
                "status": "measured",
                "metric_name": "coverage",
                "metric_value": 0.25,
                "metric_unit": "ratio",
                "next_action": "search for additional independent sources",
            }],
        })

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background"})])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "web_search"
    finally:
        db.close()


def test_delivery_experiment_next_action_blocks_read_only_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "deliverable gap",
                "status": "measured",
                "metric_name": "coverage",
                "metric_value": 0.25,
                "metric_unit": "ratio",
                "next_action": "merge the measured output into the deliverable file",
            }],
        })

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "cat output.txt 2>/dev/null"})])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "experiment next action pending"
    finally:
        db.close()


def test_delivery_experiment_next_action_allows_bounded_verification_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "runtime gap",
                "status": "measured",
                "metric_name": "valid_files",
                "metric_value": 1,
                "metric_unit": "files",
                "next_action": "build runner binary then run benchmark with validated file",
            }],
        })

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[
                ToolCall(name="shell_exec", arguments={"command": "ls build/bin/runner 2>/dev/null || command -v runner"})
            ])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_failed_next_action_requires_accounting_before_more_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "runtime gap",
                "status": "measured",
                "metric_name": "valid_files",
                "metric_value": 1,
                "metric_unit": "files",
                "next_action": "build runner binary then run benchmark with validated file",
            }],
        })
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "cd /tmp/runtime && mkdir -p build && build-tool .."}},
        )
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "returncode": 0,
                "stdout": "/bin/sh: 1: build-tool: not found\n",
                "stderr": "",
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[
                ToolCall(name="shell_exec", arguments={"command": "ls /tmp/runtime/build/bin/runner 2>&1"})
            ])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "action result accounting required"
        assert result.result["action_failure"]["step_no"] == 1
        assert result.result["action_failure"]["missing_commands"] == ["build-tool"]
        assert "build-tool: not found" in result.result["action_failure"]["excerpt"]
    finally:
        db.close()


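# Illustrative sketch only, not the nipux_cli implementation: the test above
# expects `missing_commands == ["build-tool"]` even though the recorded step
# exited with status 0. One plausible way to get there is to scan the captured
# output for shell "not found" diagnostics. The helper name
# `find_missing_commands` and the regular expression are assumptions.
import re

# Covers the dash/sh form ("/bin/sh: 1: NAME: not found") and the bash form
# ("bash: NAME: command not found"); other shells may word this differently.
_NOT_FOUND = re.compile(r"([^\s:]+): (?:command )?not found")


def find_missing_commands(output):
    """Return command names reported as missing, in first-seen order."""
    names = []
    for match in _NOT_FOUND.finditer(output or ""):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names

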
def test_failed_next_action_prompt_prioritizes_accounting(tmp_path):
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "runtime gap",
                "status": "measured",
                "metric_name": "valid_files",
                "metric_value": 1,
                "metric_unit": "files",
                "next_action": "build runner binary then run benchmark with validated file",
            }],
        })
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
        )
        db.finish_step(
            step_id,
            status="failed",
            output_data={
                "success": False,
                "returncode": 0,
                "stdout": "/bin/sh: 1: build-tool: not found\n",
                "error": "command output indicates missing command despite exit status 0",
            },
        )
        db.finish_run(run_id, "completed")

        messages = build_messages(db.get_job(job_id), db.list_steps(job_id=job_id))
        prompt = messages[-1]["content"]

        assert "latest experiment next action was attempted" in prompt
        assert "Missing commands: build-tool" in prompt
        assert "record_experiment" in prompt
        assert "build-tool: not found" in prompt
    finally:
        db.close()


def test_failed_next_action_narrows_available_tools_to_accounting(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "runtime gap",
                "status": "measured",
                "metric_name": "valid_files",
                "metric_value": 1,
                "metric_unit": "files",
                "next_action": "build runner binary then run benchmark with validated file",
            }],
        })
        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
        )
        db.finish_step(
            step_id,
            status="completed",
            output_data={"success": True, "returncode": 0, "stdout": "/bin/sh: 1: build-tool: not found\n"},
        )
        db.finish_run(run_id, "completed")
        llm = CapturingLLM(
            LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "missing build tool"})])
        )

        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        assert {"record_experiment", "record_lesson", "record_tasks"}.issubset(tool_names)
        assert "shell_exec" not in tool_names
        assert "web_search" not in tool_names
        assert "write_artifact" not in tool_names
    finally:
        db.close()


def test_accounted_next_action_failure_does_not_keep_blocking(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic runtime", title="runtime", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "runtime gap",
                "status": "measured",
                "metric_name": "valid_files",
                "metric_value": 1,
                "metric_unit": "files",
                "next_action": "build runner binary then run benchmark with validated file",
            }],
        })
        run_id = db.start_run(job_id, model="test")
        failed_step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="shell_exec",
            input_data={"arguments": {"command": "cd /tmp/runtime && build-tool .."}},
        )
        db.finish_step(
            failed_step_id,
            status="completed",
            output_data={"success": True, "returncode": 0, "stdout": "/bin/sh: 1: build-tool: not found\n"},
        )
        accounted_step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="record_experiment",
            input_data={"arguments": {"title": "build failed"}},
        )
        db.finish_step(
            accounted_step_id,
            status="completed",
            output_data={"success": True, "experiment": {"title": "build failed", "status": "failed"}},
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[
                ToolCall(name="shell_exec", arguments={"command": "printf updated > /tmp/runtime/recovery-plan.txt"})
            ])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_delivery_experiment_next_action_allows_write_shell(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
        db.update_job_metadata(job_id, {
            "experiment_ledger": [{
                "title": "deliverable gap",
                "status": "measured",
                "metric_name": "coverage",
                "metric_value": 0.25,
                "metric_unit": "ratio",
                "next_action": "merge the measured output into the deliverable file",
            }],
        })

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "printf updated > output.txt"})])]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_write_file_can_consume_recent_shell_evidence(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Create a concrete output", title="output", kind="generic")

        first = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "find . -type f"})])]),
            registry=LargeShellEvidenceRegistry(),
        )
        second = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="write_file", arguments={"path": "out.txt", "content": "done"})])]),
            registry=SuccessRegistry(),
        )

        assert first.tool_name == "shell_exec"
        assert second.status == "completed"
        assert second.tool_name == "write_file"
    finally:
        db.close()


def test_write_file_creates_validation_obligation_for_code_outputs(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Create a validated script", title="validate-file", kind="generic")
        path = tmp_path / "generated.py"

        first = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="write_file",
                    arguments={"path": str(path), "content": "print('ok')\n"},
                )])
            ]),
        )
        job = db.get_job(job_id)
        obligation = job["metadata"]["pending_file_validation_obligation"]

        assert first.status == "completed"
        assert obligation["path"] == str(path)
        assert "py_compile" in obligation["suggested_validation"]
    finally:
        db.close()


def test_file_validation_obligation_blocks_research_until_validated(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Create a validated script", title="validate-file", kind="generic")
        path = tmp_path / "generated.py"
        run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="write_file",
                    arguments={"path": str(path), "content": "print('ok')\n"},
                )])
            ]),
        )

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more context"})])]),
            registry=SuccessRegistry(),
        )
        validated = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": f"python3 -m py_compile {path}"},
                )])
            ]),
        )
        job = db.get_job(job_id)

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "file validation pending"
        assert validated.status == "completed"
        assert job["metadata"].get("pending_file_validation_obligation") == {}
        assert job["metadata"]["last_file_validation_obligation"]["resolution_status"] == "validated"
    finally:
        db.close()


8684def test_delivery_experiment_next_action_allows_internal_artifact_review(tmp_path):
8685 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8686 db = AgentDB(tmp_path / "state.db")
8687 try:
8688 job_id = db.create_job("Improve a generic deliverable", title="deliverable", kind="generic")
8689 db.update_job_metadata(job_id, {
8690 "experiment_ledger": [{
8691 "title": "deliverable gap",
8692 "status": "measured",
8693 "metric_name": "coverage",
8694 "metric_value": 0.25,
8695 "metric_unit": "ratio",
8696 "next_action": "merge the measured output into the deliverable file",
8697 }],
8698 })
8699
8700 result = run_one_step(
8701 job_id,
8702 config=config,
8703 db=db,
8704 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="search_artifacts", arguments={"query": "saved evidence"})])]),
8705 registry=SuccessRegistry(),
8706 )
8707
8708 assert result.status == "completed"
8709 assert result.tool_name == "search_artifacts"
8710 finally:
8711 db.close()
8712
8713
8714def test_prompt_marks_recent_anti_bot_browser_source():
8715 job = {"title": "research", "kind": "generic", "objective": "find research"}
8716 steps = [{
8717 "step_no": 8,
8718 "kind": "tool",
8719 "status": "completed",
8720 "tool_name": "browser_navigate",
8721 "summary": "browser_navigate opened Just a moment... <https://clutch.co/example>",
8722 "input": {"arguments": {"url": "https://clutch.co/example"}},
8723 "output": {
8724 "data": {"title": "Just a moment...", "url": "https://clutch.co/example"},
8725 "snapshot": "Performing security verification. Cloudflare security challenge.",
8726 },
8727 }]
8728
8729 messages = build_messages(job, steps)
8730
8731 assert "source_warning=cloudflare anti-bot challenge" in messages[-1]["content"]
8732
8733
8734def test_prompt_marks_recent_captcha_browser_block():
8735 job = {"title": "research", "kind": "generic", "objective": "find research"}
8736 steps = [{
8737 "step_no": 8,
8738 "kind": "tool",
8739 "status": "completed",
8740 "tool_name": "browser_snapshot",
8741 "summary": "browser_snapshot returned 1250 chars",
8742 "input": {"arguments": {"full": True}},
8743 "output": {
8744 "data": {
8745 "origin": "https://source.example/search",
8746 "snapshot": 'Iframe "Security CAPTCHA" You have been blocked. You are browsing and clicking at a speed much faster than expected.',
8747 },
8748 },
8749 }]
8750
8751 messages = build_messages(job, steps)
8752
8753 assert "source_warning=captcha/anti-bot block" in messages[-1]["content"]
8754
8755
8756def test_prompt_includes_browser_candidate_names():
8757 job = {"title": "research", "kind": "generic", "objective": "find research"}
8758 steps = [{
8759 "step_no": 9,
8760 "kind": "tool",
8761 "status": "completed",
8762 "tool_name": "browser_snapshot",
8763 "summary": "browser_snapshot returned 2000 chars",
8764 "input": {"arguments": {"full": False}},
8765 "output": {
8766 "data": {
8767 "snapshot": "source page",
8768 "refs": {
8769 "e1": {"name": "Contact", "role": "link"},
8770 "e2": {"name": "Drytech Interiors", "role": "link"},
8771 "e3": {"name": "Flavour Chaser", "role": "link"},
8772 },
8773 },
8774 },
8775 }]
8776
8777 messages = build_messages(job, steps)
8778
8779 assert "Drytech Interiors (@e2)" in messages[-1]["content"]
8780 assert "Flavour Chaser (@e3)" in messages[-1]["content"]
8781 assert "Contact (@e1)" not in messages[-1]["content"]
8782
8783
8784def test_prompt_includes_candidate_names_from_table_cells():
8785 job = {"title": "research", "kind": "generic", "objective": "find research"}
8786 steps = [{
8787 "step_no": 10,
8788 "kind": "tool",
8789 "status": "completed",
8790 "tool_name": "browser_navigate",
8791 "summary": "browser_navigate opened list",
8792 "input": {"arguments": {"url": "https://example.com/list"}},
8793 "output": {
8794 "data": {"title": "list", "url": "https://example.com/list"},
8795 "snapshot": "table",
8796 "refs": {
8797 "e100": {"name": "Organization Name", "role": "cell"},
8798 "e101": {"name": "Services", "role": "cell"},
8799 "e102": {
8800 "name": "Custom integration, workflow automation, reliability testing, reporting",
8801 "role": "cell",
8802 },
8803 "e103": {"name": "4.8", "role": "cell"},
8804 "e104": {"name": "Major Tom", "role": "cell"},
8805 "e105": {"name": "Kffein", "role": "cell"},
8806 },
8807 },
8808 }]
8809
8810 messages = build_messages(job, steps)
8811
8812 content = messages[-1]["content"]
8813 assert "Major Tom (@e104)" in content
8814 assert "Kffein (@e105)" in content
8815 assert "Organization Name (@e100)" not in content
8816 assert "Custom integration" not in content
8817 assert "4.8 (@e103)" not in content
8818
8819
8820def test_prompt_includes_recovery_candidates_after_stale_ref():
8821 job = {"title": "research", "kind": "generic", "objective": "find research"}
8822 steps = [{
8823 "step_no": 10,
8824 "kind": "tool",
8825 "status": "failed",
8826 "tool_name": "browser_click",
8827 "summary": "browser_click failed: Unknown ref: e102",
8828 "input": {"arguments": {"ref": "@e102"}},
8829 "error": "Unknown ref: e102",
8830 "output": {
8831 "success": False,
8832 "error": "Unknown ref: e102",
8833 "recovery_guidance": "The ref was stale or missing.",
8834 "recovery_snapshot": {
8835 "data": {
8836 "refs": {
8837 "e4": {"name": "Clearset Vac Truck Services", "role": "link"},
8838 },
8839 },
8840 },
8841 },
8842 }]
8843
8844 messages = build_messages(job, steps)
8845
8846 content = messages[-1]["content"]
8847 assert "Unknown ref: e102" in content
8848 assert "Clearset Vac Truck Services (@e4)" in content
8849
8850
8851def test_run_one_step_blocks_exact_duplicate_tool_call(tmp_path):
8852 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8853 db = AgentDB(tmp_path / "state.db")
8854 call = ToolCall(
8855 name="write_artifact",
8856 arguments={"title": "same", "content": "same content"},
8857 )
8858 try:
8859 job_id = db.create_job("Do not repeat exact tools", title="dedupe")
8860 first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8861 second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8862
8863 assert first.status == "completed"
8864 assert second.status == "blocked"
8865 assert second.result["error"] == "duplicate tool call blocked"
8866 assert second.result["recoverable"] is True
8867 assert "previous_step" in second.result
8868 finally:
8869 db.close()
8870
8871
8872def test_duplicate_artifact_read_guidance_pushes_follow_up_work(tmp_path):
8873 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8874 db = AgentDB(tmp_path / "state.db")
8875 try:
8876 job_id = db.create_job("Use artifact once", title="artifact")
8877 run_id = db.start_run(job_id)
8878 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
8879 artifacts = ArtifactStore(tmp_path, db)
8880 stored = artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Evidence", content="saved")
8881 db.finish_step(step_id, status="completed", output_data={"success": True, "artifact_id": stored.id, "path": str(stored.path)})
8882 db.finish_run(run_id, "completed")
8883 call = ToolCall(name="read_artifact", arguments={"artifact_id": stored.id})
8884
8885 first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8886 second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
8887
8888 assert first.status == "completed"
8889 assert second.status == "blocked"
8890 assert "Do not read it again" in second.result["guidance"]
8891 finally:
8892 db.close()
8893
8894
8895def test_fresh_evidence_guard_takes_priority_over_duplicate_read(tmp_path):
8896 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8897 db = AgentDB(tmp_path / "state.db")
8898 try:
8899 job_id = db.create_job("Save fresh evidence before reviewing old artifacts", title="fresh-evidence")
8900 run_id = db.start_run(job_id)
8901 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
8902 artifacts = ArtifactStore(tmp_path, db)
8903 stored = artifacts.write_text(job_id=job_id, run_id=run_id, step_id=step_id, title="Old Evidence", content="saved")
8904 db.finish_step(step_id, status="completed", output_data={"success": True, "artifact_id": stored.id, "path": str(stored.path)})
8905 db.finish_run(run_id, "completed")
8906
8907 read = ToolCall(name="read_artifact", arguments={"artifact_id": stored.id})
8908 first_read = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[read])]))
8909 assert first_read.status == "completed"
8910
8911 shell = ToolCall(name="shell_exec", arguments={"command": "find . -type f"})
8912 evidence = run_one_step(
8913 job_id,
8914 config=config,
8915 db=db,
8916 llm=ScriptedLLM([LLMResponse(tool_calls=[shell])]),
8917 registry=LargeShellEvidenceRegistry(),
8918 )
8919 assert evidence.status == "completed"
8920
8921 blocked = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[read])]))
8922
8923 assert blocked.status == "blocked"
8924 assert blocked.result["error"] == "artifact required before more research"
8925 assert blocked.result["blocked_tool"] == "read_artifact"
8926 finally:
8927 db.close()
8928
8929
8930def test_run_one_step_allows_repeated_browser_snapshot(tmp_path):
8931 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8932 db = AgentDB(tmp_path / "state.db")
8933 try:
8934 job_id = db.create_job("Snapshots are stateful", title="snap")
8935 first = run_one_step(
8936 job_id,
8937 config=config,
8938 db=db,
8939 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])]),
8940 registry=SnapshotRegistry(),
8941 )
8942 second = run_one_step(
8943 job_id,
8944 config=config,
8945 db=db,
8946 llm=ScriptedLLM([LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])]),
8947 registry=SnapshotRegistry(),
8948 )
8949
8950 assert first.status == "completed"
8951 assert second.status == "completed"
8952 finally:
8953 db.close()
8954
8955
8956def test_run_one_step_blocks_browser_tools_after_runtime_missing(tmp_path):
8957 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
8958 db = AgentDB(tmp_path / "state.db")
8959 try:
8960 job_id = db.create_job("Browser runtime can be unavailable", title="browser-runtime")
8961 run_id = db.start_run(job_id)
8962 step_id = db.add_step(
8963 job_id=job_id,
8964 run_id=run_id,
8965 kind="tool",
8966 tool_name="browser_navigate",
8967 input_data={"arguments": {"url": "https://example.test"}},
8968 )
8969 db.finish_step(
8970 step_id,
8971 status="failed",
8972 output_data={
8973 "success": False,
8974 "error": "Chrome not found. Checked: Playwright browser cache and Puppeteer browser cache.",
8975 },
8976 summary="browser_navigate failed: Chrome not found",
8977 )
8978 db.finish_run(run_id, "failed")
8979
8980 result = run_one_step(
8981 job_id,
8982 config=config,
8983 db=db,
8984 llm=ScriptedLLM([
8985 LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": False})])
8986 ]),
8987 registry=SnapshotRegistry(),
8988 )
8989
8990 assert result.status == "blocked"
8991 assert result.result["error"] == "browser runtime unavailable"
8992 assert result.result["browser_runtime"]["tool"] == "browser_navigate"
8993 assert "Use web_search" in result.result["guidance"]
8994 finally:
8995 db.close()
8996
8997
8998def test_run_one_step_allows_non_browser_work_after_runtime_missing(tmp_path):
8999 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9000 db = AgentDB(tmp_path / "state.db")
9001 try:
9002 job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9003 run_id = db.start_run(job_id)
9004 step_id = db.add_step(
9005 job_id=job_id,
9006 run_id=run_id,
9007 kind="tool",
9008 tool_name="browser_navigate",
9009 input_data={"arguments": {"url": "https://example.test"}},
9010 )
9011 db.finish_step(
9012 step_id,
9013 status="failed",
9014 output_data={"success": False, "error": "Browser executable doesn't exist on this host."},
9015 summary="browser_navigate failed: browser executable missing",
9016 )
9017 db.finish_run(run_id, "failed")
9018
9019 result = run_one_step(
9020 job_id,
9021 config=config,
9022 db=db,
9023 llm=ScriptedLLM([
9024 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "public docs", "limit": 5})])
9025 ]),
9026 registry=SuccessRegistry(),
9027 )
9028
9029 assert result.status == "completed"
9030 assert result.tool_name == "web_search"
9031 finally:
9032 db.close()
9033
9034
9035def test_run_one_step_skips_batched_browser_call_when_runtime_missing_and_fallback_present(tmp_path):
9036 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9037 db = AgentDB(tmp_path / "state.db")
9038 try:
9039 job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9040 run_id = db.start_run(job_id)
9041 step_id = db.add_step(
9042 job_id=job_id,
9043 run_id=run_id,
9044 kind="tool",
9045 tool_name="browser_navigate",
9046 input_data={"arguments": {"url": "https://example.test"}},
9047 )
9048 db.finish_step(
9049 step_id,
9050 status="failed",
9051 output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9052 summary="browser_navigate failed: Chrome not found",
9053 )
9054 db.finish_run(run_id, "failed")
9055
9056 result = run_one_step(
9057 job_id,
9058 config=config,
9059 db=db,
9060 llm=ScriptedLLM([
9061 LLMResponse(tool_calls=[
9062 ToolCall(name="browser_navigate", arguments={"url": "https://example.test/next"}),
9063 ToolCall(name="web_search", arguments={"query": "public docs", "limit": 5}),
9064 ])
9065 ]),
9066 registry=SuccessRegistry(),
9067 )
9068
9069 tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
9070 assert result.status == "completed"
9071 assert result.tool_name == "web_search"
9072 assert tool_steps[-1]["tool_name"] == "web_search"
9073 assert all(
9074 step["input"].get("arguments", {}).get("url") != "https://example.test/next"
9075 for step in tool_steps
9076 if step.get("tool_name") == "browser_navigate"
9077 )
9078 finally:
9079 db.close()
9080
9081
9082def test_run_one_step_removes_browser_tools_from_schema_after_runtime_missing(tmp_path):
9083 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9084 db = AgentDB(tmp_path / "state.db")
9085 try:
9086 job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9087 run_id = db.start_run(job_id)
9088 step_id = db.add_step(
9089 job_id=job_id,
9090 run_id=run_id,
9091 kind="tool",
9092 tool_name="browser_navigate",
9093 input_data={"arguments": {"url": "https://example.test"}},
9094 )
9095 db.finish_step(
9096 step_id,
9097 status="failed",
9098 output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9099 summary="browser_navigate failed: Chrome not found",
9100 )
9101 db.finish_run(run_id, "failed")
9102 llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "fallback"})]))
9103
9104 run_one_step(
9105 job_id,
9106 config=config,
9107 db=db,
9108 llm=llm,
9109 registry=BrowserAndWebRegistry(),
9110 )
9111
9112 tool_names = [tool["function"]["name"] for tool in llm.tools]
9113 assert tool_names == ["web_search"]
9114 finally:
9115 db.close()
9116
9117
9118def test_run_one_step_removes_browser_tools_after_older_runtime_missing(tmp_path):
9119 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9120 db = AgentDB(tmp_path / "state.db")
9121 try:
9122 job_id = db.create_job("Use fallback tools when browser is missing", title="browser-runtime")
9123 run_id = db.start_run(job_id)
9124 step_id = db.add_step(
9125 job_id=job_id,
9126 run_id=run_id,
9127 kind="tool",
9128 tool_name="browser_navigate",
9129 input_data={"arguments": {"url": "https://example.test"}},
9130 )
9131 db.finish_step(
9132 step_id,
9133 status="failed",
9134 output_data={"success": False, "error": "Chrome not found. Checked: Playwright browser cache."},
9135 summary="browser_navigate failed: Chrome not found",
9136 )
9137 for index in range(80):  # bury the failed browser step under 80 newer completed steps
9138 filler_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
9139 db.finish_step(
9140 filler_id,
9141 status="completed",
9142 output_data={"success": True, "query": f"query {index}", "results": []},
9143 summary=f"web_search query {index}",
9144 )
9145 db.finish_run(run_id, "completed")
9146 llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "fallback"})]))
9147
9148 run_one_step(
9149 job_id,
9150 config=config,
9151 db=db,
9152 llm=llm,
9153 registry=BrowserAndWebRegistry(),
9154 )
9155
9156 tool_names = [tool["function"]["name"] for tool in llm.tools]
9157 assert tool_names == ["web_search"]
9158 finally:
9159 db.close()
9160
9161
9162def test_run_one_step_allows_repeated_defer_for_monitor_intervals(tmp_path):
9163 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9164 db = AgentDB(tmp_path / "state.db")
9165 call = ToolCall(name="defer_job", arguments={"seconds": 60, "reason": "wait for monitor interval"})
9166 try:
9167 job_id = db.create_job("Check a long-running process later", title="defer")
9168 first = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
9169 second = run_one_step(job_id, config=config, db=db, llm=ScriptedLLM([LLMResponse(tool_calls=[call])]))
9170
9171 assert first.status == "completed"
9172 assert second.status == "completed"
9173 assert first.tool_name == "defer_job"
9174 assert second.tool_name == "defer_job"
9175 finally:
9176 db.close()
9177
9178
9179def test_run_one_step_blocks_self_defer_for_next_worker_turn(tmp_path):
9180 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9181 db = AgentDB(tmp_path / "state.db")
9182 try:
9183 job_id = db.create_job("Keep making progress", title="self-defer")
9184
9185 result = run_one_step(
9186 job_id,
9187 config=config,
9188 db=db,
9189 llm=ScriptedLLM([
9190 LLMResponse(tool_calls=[
9191 ToolCall(
9192 name="defer_job",
9193 arguments={
9194 "seconds": 300,
9195 "reason": "waiting for tasks to be picked up by next worker turn",
9196 "next_action": "continue in the next worker step",
9197 },
9198 )
9199 ])
9200 ]),
9201 )
9202
9203 assert result.status == "blocked"
9204 assert result.tool_name == "defer_job"
9205 assert result.result["error"] == "self-defer blocked"
9206 assert result.result["self_defer"]["matched"] == "next worker turn"
9207 finally:
9208 db.close()
9209
9210
9211def test_run_one_step_blocks_defer_without_wait_reason(tmp_path):
9212 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9213 db = AgentDB(tmp_path / "state.db")
9214 try:
9215 job_id = db.create_job("Keep making progress", title="self-defer")
9216
9217 result = run_one_step(
9218 job_id,
9219 config=config,
9220 db=db,
9221 llm=ScriptedLLM([
9222 LLMResponse(tool_calls=[
9223 ToolCall(
9224 name="defer_job",
9225 arguments={
9226 "seconds": 300,
9227 "next_action": "build the project and run the measurement",
9228 },
9229 )
9230 ])
9231 ]),
9232 )
9233
9234 assert result.status == "blocked"
9235 assert result.tool_name == "defer_job"
9236 assert result.result["error"] == "self-defer blocked"
9237 assert result.result["self_defer"]["matched"] == "missing wait reason"
9238 finally:
9239 db.close()
9240
9241
9242def test_run_one_step_blocks_search_after_unpersisted_extract(tmp_path):
9243 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9244 db = AgentDB(tmp_path / "state.db")
9245 try:
9246 job_id = db.create_job("Save extracted evidence before more search", title="guard")
9247 run_id = db.start_run(job_id)
9248 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_extract")
9249 db.finish_step(
9250 step_id,
9251 status="completed",
9252 output_data={"success": True, "pages": [{"url": "https://example.com", "text": "useful evidence"}]},
9253 )
9254 db.finish_run(run_id, "completed")
9255
9256 result = run_one_step(
9257 job_id,
9258 config=config,
9259 db=db,
9260 llm=ScriptedLLM([
9261 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more findings", "limit": 5})])
9262 ]),
9263 )
9264
9265 assert result.status == "blocked"
9266 assert result.result["error"] == "artifact required before more research"
9267 assert result.result["blocked_tool"] == "web_search"
9268 assert "auto_checkpoint" in result.result
9269 artifacts = db.list_artifacts(job_id)
9270 assert artifacts[0]["title"].startswith("Auto Evidence Checkpoint")
9271
9272 next_result = run_one_step(
9273 job_id,
9274 config=config,
9275 db=db,
9276 llm=ScriptedLLM([
9277 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "different findings", "limit": 5})])
9278 ]),
9279 registry=SuccessRegistry(),
9280 )
9281 assert next_result.status == "blocked"
9282 assert next_result.result["error"] == "evidence checkpoint accounting required"
9283 finally:
9284 db.close()
9285
9286
9287def test_prompt_tells_model_to_save_unpersisted_evidence_before_more_research(tmp_path):
9288 db = AgentDB(tmp_path / "state.db")
9289 try:
9290 job_id = db.create_job("Save evidence before searching", title="guard")
9291 run_id = db.start_run(job_id)
9292 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_extract")
9293 db.finish_step(
9294 step_id,
9295 status="completed",
9296 output_data={"success": True, "pages": [{"url": "https://example.com", "text": "useful evidence"}]},
9297 )
9298 job = db.get_job(job_id)
9299 steps = db.list_steps(job_id=job_id)
9300
9301 messages = build_messages(job, steps)
9302
9303 assert "Next-action constraint:" in messages[-1]["content"]
9304 assert "Your next tool call should usually be write_artifact" in messages[-1]["content"]
9305 finally:
9306 db.close()
9307
9308
9309def test_run_one_step_blocks_research_after_unpersisted_browser_snapshot(tmp_path):
9310 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9311 db = AgentDB(tmp_path / "state.db")
9312 try:
9313 job_id = db.create_job("Save browser evidence before more browsing", title="guard")
9314 run_id = db.start_run(job_id)
9315 step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
9316 db.finish_step(
9317 step_id,
9318 status="completed",
9319 output_data={
9320 "success": True,
9321 "data": {"origin": "https://example.com"},
9322 "snapshot": "Useful finding evidence. " * 40,
9323 },
9324 )
9325 db.finish_run(run_id, "completed")
9326
9327 result = run_one_step(
9328 job_id,
9329 config=config,
9330 db=db,
9331 llm=ScriptedLLM([
9332 LLMResponse(tool_calls=[ToolCall(name="browser_scroll", arguments={"direction": "down"})])
9333 ]),
9334 )
9335
9336 assert result.status == "blocked"
9337 assert result.result["error"] == "artifact required before more research"
9338 assert result.result["blocked_tool"] == "browser_scroll"
9339 assert "auto_checkpoint" in result.result
9340 finally:
9341 db.close()
9342
9343
9344def test_prompt_tells_model_to_open_new_branch_when_tasks_are_exhausted():
9345 job = {
9346 "title": "research",
9347 "kind": "generic",
9348 "objective": "keep improving",
9349 "metadata": {
9350 "task_queue": [
9351 {"title": "Initial branch", "status": "done", "priority": 5, "result": "Checkpoint saved"},
9352 {"title": "Blocked branch", "status": "blocked", "priority": 4, "result": "Source unavailable"},
9353 ],
9354 },
9355 }
9356
9357 messages = build_messages(job, [])
9358
9359 content = messages[-1]["content"]
9360 assert "All durable task branches are done" in content
9361 assert "use record_tasks to open the next concrete branch" in content
9362
9363
9364def test_prompt_pushes_deliverable_checkpoint_after_long_research():
9365 job = {
9366 "title": "paper",
9367 "kind": "generic",
9368 "objective": "write a complete research paper from evidence",
9369 "metadata": {
9370 "task_queue": [
9371 {
9372 "title": "Save the first durable draft",
9373 "status": "open",
9374 "priority": 8,
9375 "output_contract": "report",
9376 }
9377 ],
9378 },
9379 }
9380 steps = [
9381 {
9382 "step_no": index + 1,
9383 "status": "completed",
9384 "kind": "tool",
9385 "tool_name": "shell_exec",
9386 "input": {"arguments": {"command": f"cat source_{index}.txt"}},
9387 }
9388 for index in range(18)
9389 ]
9390
9391 content = build_messages(job, steps)[-1]["content"]
9392
9393 assert "Deliverable progress guard:" in content
9394 assert "durable deliverable checkpoint" in content
9395 assert "write_file or write_artifact" in content
9396
9397
9398def test_low_priority_report_task_does_not_block_execution_task_prompt():
9399 job = {
9400 "title": "execution",
9401 "kind": "generic",
9402 "objective": "keep useful work moving",
9403 "metadata": {
9404 "task_queue": [
9405 {
9406 "title": "Review saved output later",
9407 "status": "open",
9408 "priority": 4,
9409 "output_contract": "report",
9410 },
9411 {
9412 "title": "Run current experiment",
9413 "status": "active",
9414 "priority": 9,
9415 "output_contract": "experiment",
9416 },
9417 ],
9418 },
9419 }
9420 steps = [
9421 {
9422 "step_no": index + 1,
9423 "status": "completed",
9424 "kind": "tool",
9425 "tool_name": "shell_exec",
9426 "input": {"arguments": {"command": f"probe_{index}"}},
9427 }
9428 for index in range(18)
9429 ]
9430
9431 content = build_messages(job, steps)[-1]["content"]
9432
9433 assert "Deliverable progress guard:\nNone." in content
9434 assert "durable deliverable checkpoint" not in content
9435
9436
9437def test_run_one_step_blocks_more_research_when_deliverable_needs_checkpoint(tmp_path):
9438 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9439 db = AgentDB(tmp_path / "state.db")
9440 try:
9441 job_id = db.create_job(
9442 "Write a complete report from collected evidence",
9443 title="deliverable",
9444 metadata={
9445 "task_queue": [
9446 {
9447 "title": "Save the first durable report checkpoint",
9448 "status": "open",
9449 "priority": 8,
9450 "output_contract": "report",
9451 }
9452 ]
9453 },
9454 )
9455 run_id = db.start_run(job_id, model="fake")
9456 for index in range(15):
9457 step_id = db.add_step(
9458 job_id=job_id,
9459 run_id=run_id,
9460 kind="tool",
9461 tool_name="shell_exec",
9462 input_data={"arguments": {"command": f"cat source_{index}.txt"}},
9463 )
9464 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "note"})
9465 ledger_step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_findings")
9466 db.finish_step(ledger_step_id, status="completed", output_data={"success": True})
9467 for index in range(15, 18):
9468 step_id = db.add_step(
9469 job_id=job_id,
9470 run_id=run_id,
9471 kind="tool",
9472 tool_name="shell_exec",
9473 input_data={"arguments": {"command": f"cat source_{index}.txt"}},
9474 )
9475 db.finish_step(step_id, status="completed", output_data={"success": True, "stdout": "note"})
9476 db.finish_run(run_id, "completed")
9477
9478 result = run_one_step(
9479 job_id,
9480 config=config,
9481 db=db,
9482 llm=ScriptedLLM([
9483 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "more background sources"})])
9484 ]),
9485 )
9486
9487 assert result.status == "blocked"
9488 assert result.result["error"] == "deliverable checkpoint required"
9489 assert result.result["blocked_tool"] == "web_search"
9490 assert result.result["recoverable"] is True
9491 finally:
9492 db.close()
9493
9494
9495def test_prompt_includes_roadmap_and_validation_constraints():
9496 job = {
9497 "title": "broad work",
9498 "kind": "generic",
9499 "objective": "build a broad durable outcome",
9500 "metadata": {
9501 "roadmap": {
9502 "title": "Broad Roadmap",
9503 "status": "active",
9504 "current_milestone": "Foundation",
9505 "validation_contract": "check observable evidence",
9506 "milestones": [{
9507 "title": "Foundation",
9508 "status": "validating",
9509 "validation_status": "pending",
9510 "acceptance_criteria": "evidence exists",
9511 "evidence_needed": "saved output",
9512 "features": [{"title": "First feature", "status": "done"}],
9513 }],
9514 },
9515 },
9516 }
9517
9518 messages = build_messages(job, [])
9519 content = messages[-1]["content"]
9520
9521 assert "Roadmap:" in content
9522 assert "Broad Roadmap" in content
9523 assert "validation=pending" in content
9524 assert "Use record_milestone_validation" in content
9525
9526
9527def test_prompt_suggests_roadmap_for_broad_jobs_without_one():
9528 job = {
9529 "title": "broad work",
9530 "kind": "generic",
9531 "objective": "research and implement a broad multi phase system with validation and durable output",
9532 "metadata": {},
9533 }
9534
9535 messages = build_messages(job, [])
9536 content = messages[-1]["content"]
9537
9538 assert "No roadmap yet" in content
9539 assert "use record_roadmap" in content
9540
9541
9542def test_run_one_step_blocks_branch_work_when_milestone_needs_validation(tmp_path):
9543 config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
9544 db = AgentDB(tmp_path / "state.db")
9545 try:
9546 job_id = db.create_job(
9547 "Keep broad work gated by validation",
9548 title="roadmap-gate",
9549 metadata={
9550 "roadmap": {
9551 "title": "Generic Roadmap",
9552 "status": "active",
9553 "milestones": [{
9554 "title": "Foundation",
9555 "status": "validating",
9556 "validation_status": "pending",
9557 "acceptance_criteria": "evidence exists",
9558 "evidence_needed": "saved artifact",
9559 }],
9560 },
9561 },
9562 )
9563
9564 result = run_one_step(
9565 job_id,
9566 config=config,
9567 db=db,
9568 llm=ScriptedLLM([
9569 LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "new branch", "limit": 5})])
9570 ]),
9571 )
9572
9573 assert result.status == "blocked"
9574 assert result.result["error"] == "milestone validation required"
9575 assert result.result["blocked_tool"] == "web_search"
9576 finally:
9577 db.close()
9578
9579
def test_run_one_step_allows_milestone_validation_when_gate_is_active(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate a gated milestone",
            title="roadmap-validate",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "active",
                    "milestones": [{
                        "title": "Foundation",
                        "status": "validating",
                        "validation_status": "pending",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
                    "milestone": "Foundation",
                    "validation_status": "passed",
                    "result": "Acceptance criteria met.",
                    "evidence": "artifact",
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_milestone_validation"
        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
        assert roadmap["milestones"][0]["validation_status"] == "passed"
    finally:
        db.close()


def test_run_one_step_allows_matching_pending_milestone_evidence_action(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate a pending milestone",
            title="roadmap-pending-shell",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "validating",
                    "milestones": [{
                        "title": "Environment baseline",
                        "status": "validating",
                        "validation_status": "pending",
                        "next_action": "Validate candidate files with a shell probe.",
                        "evidence_needed": "Shell output showing candidate file status.",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={
                    "command": "printf 'candidate file ok\\n'",
                    "timeout_seconds": 5,
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_allows_matching_pending_milestone_validation_evidence_action(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate a pending milestone",
            title="roadmap-pending-validation-evidence",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "validating",
                    "milestones": [{
                        "title": "Build tools",
                        "status": "validating",
                        "validation_status": "pending",
                        "validation_evidence": "Need to verify cmake and compiler paths before building.",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={
                    "command": "printf 'cmake compiler ok\\n'",
                    "timeout_seconds": 5,
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_blocks_non_matching_pending_milestone_action(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate a pending milestone",
            title="roadmap-pending-unrelated",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "validating",
                    "milestones": [{
                        "title": "Environment baseline",
                        "status": "validating",
                        "validation_status": "pending",
                        "next_action": "Validate candidate files with a shell probe.",
                        "evidence_needed": "Shell output showing candidate file status.",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={
                    "query": "unrelated topic",
                    "limit": 5,
                })])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "milestone validation required"
    finally:
        db.close()


def test_run_one_step_blocks_wrong_milestone_validation_when_gate_is_active(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate the active milestone only",
            title="roadmap-wrong-milestone",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "validating",
                    "milestones": [{
                        "title": "Current milestone",
                        "status": "validating",
                        "validation_status": "pending",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
                    "milestone": "Different milestone",
                    "validation_status": "passed",
                })])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "current milestone validation required"
        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
        assert [milestone["title"] for milestone in roadmap["milestones"]] == ["Current milestone"]
    finally:
        db.close()


def test_run_one_step_normalizes_matching_validation_to_active_milestone(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Validate the active milestone from matching evidence",
            title="roadmap-normalize-milestone-validation",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "validating",
                    "milestones": [{
                        "title": "Environment baseline evidence: check build tools",
                        "status": "validating",
                        "validation_status": "pending",
                        "validation_evidence": "Need to verify cmake, compiler, and candidate files before building.",
                    }],
                },
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_milestone_validation", arguments={
                    "milestone": "Validate candidate files and build environment",
                    "validation_status": "blocked",
                    "result": "cmake path failed, compiler still needs verification, and candidate file status is unclear.",
                    "evidence": "shell output showed missing cmake path and file checks are still needed.",
                    "issues": ["cmake path missing", "candidate file status unresolved"],
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_milestone_validation"
        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
        assert [milestone["title"] for milestone in roadmap["milestones"]] == [
            "Environment baseline evidence: check build tools"
        ]
        milestone = roadmap["milestones"][0]
        assert milestone["validation_status"] == "blocked"
        assert milestone["metadata"]["normalized_from_milestone"] == "Validate candidate files and build environment"
        assert milestone["metadata"]["normalized_to_active_gate"] is True
    finally:
        db.close()


def test_run_one_step_blocks_task_churn_when_roadmap_stalls(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep roadmap aligned with broad work",
            title="roadmap-stale",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "planned",
                    "milestones": [{
                        "title": "Foundation",
                        "status": "planned",
                        "validation_status": "not_started",
                    }],
                },
                "task_queue": [{"title": f"Task {index}", "status": "done"} for index in range(8)],
            },
        )
        run_id = db.start_run(job_id, model="fake")
        for index in range(2):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
            db.finish_step(step_id, status="completed", summary=f"artifact {index}", output_data={"success": True})

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={
                    "tasks": [{"title": "More task churn", "status": "open"}]
                })])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "roadmap update required"
        assert result.result["blocked_tool"] == "record_tasks"
    finally:
        db.close()


def test_run_one_step_allows_roadmap_update_when_roadmap_stalls(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Update stale roadmap",
            title="roadmap-update",
            metadata={
                "roadmap": {
                    "title": "Generic Roadmap",
                    "status": "planned",
                    "milestones": [{
                        "title": "Foundation",
                        "status": "planned",
                        "validation_status": "not_started",
                    }],
                },
                "task_queue": [{"title": f"Task {index}", "status": "done"} for index in range(8)],
            },
        )
        run_id = db.start_run(job_id, model="fake")
        for index in range(2):
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="write_artifact")
            db.finish_step(step_id, status="completed", summary=f"artifact {index}", output_data={"success": True})

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_roadmap", arguments={
                    "title": "Generic Roadmap",
                    "status": "active",
                    "current_milestone": "Foundation",
                    "milestones": [{
                        "title": "Foundation",
                        "status": "active",
                        "validation_status": "pending",
                        "acceptance_criteria": "evidence reviewed",
                    }],
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_roadmap"
        roadmap = db.get_job(job_id)["metadata"]["roadmap"]
        assert roadmap["status"] == "active"
        assert roadmap["milestones"][0]["validation_status"] == "pending"
    finally:
        db.close()


def test_run_one_step_blocks_branch_work_when_tasks_are_exhausted(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep improving without looping",
            title="exhausted",
            metadata={"task_queue": [{"title": "First branch", "status": "done", "priority": 5}]},
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "same broad topic", "limit": 5})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "task branch required before more work"
        assert result.result["blocked_tool"] == "web_search"
        assert result.result["recoverable"] is True
    finally:
        db.close()


def test_run_one_step_allows_record_tasks_when_tasks_are_exhausted(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep improving by opening branches",
            title="branch",
            metadata={"task_queue": [{"title": "First branch", "status": "done", "priority": 5}]},
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={
                    "tasks": [{"title": "Next branch", "status": "open", "priority": 6}]
                })])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
        job = db.get_job(job_id)
        assert any(task["title"] == "Next branch" and task["status"] == "open" for task in job["metadata"]["task_queue"])
    finally:
        db.close()


def test_run_one_step_blocks_new_tasks_when_queue_is_saturated(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Finish existing work",
            title="saturated",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(40)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Yet another branch", "status": "open"}]})
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "task queue saturated"
        assert result.result["task_queue"]["open_count"] == 40
        job = db.get_job(job_id)
        pressure = job["metadata"]["task_backlog_pressure"]
        assert pressure["source"] == "blocked_record_tasks"
        assert pressure["open_count"] == 40
        assert pressure["reason"] == "too many open tasks"
    finally:
        db.close()


def test_run_one_step_blocks_batch_that_would_saturate_task_queue(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep long-running work focused",
            title="projected-sprawl",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Existing branch {index}", "status": "done", "priority": index}
                    for index in range(74)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_tasks",
                        arguments={
                            "tasks": [
                                {"title": f"New branch {index}", "status": "open"}
                                for index in range(10)
                            ]
                        },
                    )
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "task queue saturated"
        assert result.result["task_queue"]["reason"] == "total task queue is too large"
        assert result.result["task_queue"]["projected_total_count"] == 84
        job = db.get_job(job_id)
        assert len(job["metadata"]["task_queue"]) == 74
    finally:
        db.close()


def test_run_one_step_executes_accounting_before_saturated_record_tasks(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Keep useful recovery state",
            title="saturated-batch-order",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Existing branch {index}", "status": "done", "priority": index}
                    for index in range(84)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_tasks",
                        arguments={"tasks": [{"title": "New blocked branch", "status": "open"}]},
                    ),
                    ToolCall(
                        name="record_lesson",
                        arguments={"lesson": "Use the existing branch before adding more tasks.", "category": "strategy"},
                    ),
                ])
            ]),
        )

        tool_steps = [step for step in db.list_steps(job_id=job_id) if step.get("kind") == "tool"]
        assert [step["tool_name"] for step in tool_steps[-2:]] == ["record_lesson", "record_tasks"]
        assert result.status == "blocked"
        assert result.result["error"] == "task queue saturated"
        lessons = db.get_job(job_id)["metadata"].get("lessons") or []
        assert any("existing branch" in str(lesson.get("lesson") or "") for lesson in lessons)
    finally:
        db.close()


def test_run_one_step_blocks_batch_that_would_saturate_open_tasks(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Execute current branches before planning more",
            title="projected-open-sprawl",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(35)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_tasks",
                        arguments={
                            "tasks": [
                                {"title": f"New open branch {index}", "status": "open"}
                                for index in range(5)
                            ]
                        },
                    )
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "task queue saturated"
        assert result.result["task_queue"]["reason"] == "too many open tasks"
        assert result.result["task_queue"]["projected_open_count"] == 40
        job = db.get_job(job_id)
        assert len(job["metadata"]["task_queue"]) == 35
    finally:
        db.close()


def test_run_one_step_ignores_guard_recovery_tasks_for_queue_saturation(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Continue objective work after guard recovery",
            title="guard-task-sprawl",
            kind="generic",
            metadata={
                "task_queue": [
                    {
                        "title": f"Resolve guard: recoverable blocker {index}",
                        "status": "open",
                        "priority": 9,
                        "metadata": {"guard_recovery": {"error": f"recoverable blocker {index}"}},
                    }
                    for index in range(45)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Run next objective branch", "status": "open"}]})
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
        job = db.get_job(job_id)
        assert any(task["title"] == "Run next objective branch" for task in job["metadata"]["task_queue"])
    finally:
        db.close()


def test_run_one_step_ignores_guard_recovery_tasks_for_total_sprawl(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Continue objective work after many recovered guards",
            title="guard-total-sprawl",
            kind="generic",
            metadata={
                "task_queue": [
                    {
                        "title": f"Resolve guard: recovered blocker {index}",
                        "status": "done",
                        "priority": 9,
                        "metadata": {"guard_recovery": {"error": f"recovered blocker {index}"}},
                    }
                    for index in range(85)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Fresh objective branch", "status": "open"}]})
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
        job = db.get_job(job_id)
        assert any(task["title"] == "Fresh objective branch" for task in job["metadata"]["task_queue"])
    finally:
        db.close()


def test_run_one_step_blocks_read_only_shell_churn(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Choose from discovered candidates", title="read-only-churn", kind="generic")
        for command in [
            "find /tmp/work -type f | head",
            "ls -lah /tmp/work",
            "curl -s https://example.test/api/list | head -100",
        ]:
            run_id = db.start_run(job_id, model="test")
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="shell_exec", arguments={"command": "curl -s https://example.test/api/list?page=2"})
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "action decision required"
        assert result.result["read_only_shell_churn"]["read_only_shell_count"] == 3
    finally:
        db.close()


def test_run_one_step_allows_action_after_read_only_shell_churn(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Act after discovered candidates", title="read-only-to-action", kind="generic")
        for command in [
            "find /tmp/work -type f | head",
            "ls -lah /tmp/work",
            "curl -s https://example.test/api/list | head -100",
        ]:
            run_id = db.start_run(job_id, model="test")
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="shell_exec", arguments={"command": "python run_candidate.py --input candidate-a"})
                ])
            ]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_allows_read_only_shell_after_durable_decision(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Recover from inspection churn", title="read-only-decision", kind="generic")
        for command in [
            "find /tmp/work -type f | head",
            "ls -lah /tmp/work",
            "curl -s https://example.test/api/list | head -100",
        ]:
            run_id = db.start_run(job_id, model="test")
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
            db.finish_run(run_id, "completed")

        run_id = db.start_run(job_id, model="test")
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="record_lesson")
        db.finish_step(
            step_id,
            status="completed",
            output_data={"success": True, "lesson": {"category": "decision", "lesson": "Use candidate-a and inspect its exact metadata next."}},
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="shell_exec", arguments={"command": "ls -lah /tmp/work/candidate-a"})
                ])
            ]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_allows_explicit_download_after_read_only_shell_churn(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Download selected candidate", title="read-only-to-download", kind="generic")
        for command in [
            "find /tmp/work -type f | head",
            "ls -lah /tmp/work",
            "curl -s https://example.test/api/list | head -100",
        ]:
            run_id = db.start_run(job_id, model="test")
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec", input_data={"arguments": {"command": command}})
            db.finish_step(step_id, status="completed", output_data={"success": True, "returncode": 0, "stdout": "candidate-a\ncandidate-b"})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="shell_exec", arguments={"command": "curl -L -o /tmp/candidate.bin https://example.test/candidate.bin"})
                ])
            ]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_blocks_new_tasks_when_queue_sprawls(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Consolidate long-running work",
            title="task-sprawl",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Completed branch {index}", "status": "done", "priority": index}
                    for index in range(80)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "New branch", "status": "open"}]})
                ])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "task queue saturated"
        assert result.result["task_queue"]["reason"] == "total task queue is too large"
        assert result.result["task_queue"]["total_count"] == 80
    finally:
        db.close()


def test_recent_task_saturation_keeps_record_tasks_for_existing_updates(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Execute existing work",
            title="saturated-tools",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(40)
                ]
            },
        )
        first = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "New branch", "status": "open"}]})
                ])
            ]),
        )
        assert first.status == "blocked"
        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing work"})]))

        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        prompt = llm.messages[-1]["content"]
        assert "Task queue saturation" in prompt
        assert "Do not create new task branches" in prompt
        assert "Existing runnable task titles" in prompt
        assert "Open branch 0" in prompt
        assert "record_tasks only to update existing task titles" in prompt
        assert "record_tasks" in tool_names
        assert "record_lesson" in tool_names
        assert "shell_exec" in tool_names
    finally:
        db.close()


def test_repeated_task_saturation_temporarily_suppresses_record_tasks(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Execute existing work",
            title="repeated-saturation",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(40)
                ]
            },
        )
        for title in ("New branch one", "New branch two"):
            blocked = run_one_step(
                job_id,
                config=config,
                db=db,
                llm=ScriptedLLM([
                    LLMResponse(tool_calls=[ToolCall(name="record_tasks", arguments={"tasks": [{"title": title, "status": "open"}]})])
                ]),
            )
            assert blocked.status == "blocked"
            assert blocked.result["error"] == "task queue saturated"

        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing branch"})]))
        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        assert "record_tasks" not in tool_names
        assert "record_lesson" in tool_names
        assert "shell_exec" in tool_names
    finally:
        db.close()


def test_chronic_backlog_suppresses_new_task_planning_tool(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Execute existing work",
            title="chronic-backlog-tools",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(82)
                ]
            },
        )
        llm = CapturingLLM(LLMResponse(tool_calls=[ToolCall(name="record_lesson", arguments={"lesson": "execute existing work"})]))

        run_one_step(job_id, config=config, db=db, llm=llm)

        tool_names = {tool["function"]["name"] for tool in llm.tools}
        prompt = llm.messages[-1]["content"]
        assert "Current execution focus" in prompt
        assert "backlog=82 tasks" in prompt
        assert "record_tasks" not in tool_names
        assert "record_lesson" in tool_names
        assert "shell_exec" in tool_names
    finally:
        db.close()


def test_run_one_step_allows_existing_task_update_when_queue_is_saturated(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Finish existing work",
            title="saturated",
            kind="generic",
            metadata={
                "task_queue": [
                    {"title": f"Open branch {index}", "status": "open", "priority": index}
                    for index in range(40)
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(name="record_tasks", arguments={"tasks": [{"title": "Open branch 0", "status": "active"}]})
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
        job = db.get_job(job_id)
        assert job["metadata"]["task_queue"][0]["status"] == "active"
    finally:
        db.close()


def test_run_one_step_allows_semantic_task_update_when_queue_is_saturated(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Finish existing work",
            title="semantic-saturated",
            kind="generic",
            metadata={
                "task_queue": [
                    {
                        "title": "Validate model files and run baseline benchmark",
                        "status": "open",
                        "priority": 5,
                    },
                    *[
                        {"title": f"Completed branch {index}", "status": "done", "priority": index}
                        for index in range(81)
                    ],
                ]
            },
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[
                    ToolCall(
                        name="record_tasks",
                        arguments={
                            "tasks": [{
                                "title": "Validate candidate model files and run baseline benchmark",
                                "status": "active",
                                "priority": 10,
                            }]
                        },
                    )
                ])
            ]),
        )

        assert result.status == "completed"
        assert result.tool_name == "record_tasks"
        job = db.get_job(job_id)
        task = job["metadata"]["task_queue"][0]
        assert task["title"] == "Validate model files and run baseline benchmark"
        assert task["status"] == "active"
        assert task["metadata"]["original_title"] == "Validate candidate model files and run baseline benchmark"
    finally:
        db.close()


def test_run_one_step_auto_records_anti_bot_browser_source(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid blocked browser pages", title="guard")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="browser_snapshot", arguments={"full": True})])
            ]),
            registry=AntiBotBrowserRegistry(),
        )
        job = db.get_job(job_id)
        source = job["metadata"]["source_ledger"][0]

        assert result.status == "completed"
        assert result.result["source_warning"] == "captcha/anti-bot block"
        assert source["source"] == "https://source.example/search"
        assert source["fail_count"] == 1
        assert source["usefulness_score"] == 0.02
        assert job["metadata"]["last_lesson"]["category"] == "source_quality"
    finally:
        db.close()


def test_run_one_step_blocks_misleading_artifact_after_anti_bot_snapshot(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Do not invent findings from blocked pages", title="guard")
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "data": {
                    "origin": "https://source.example/search",
                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
                },
            },
            summary="browser_snapshot returned 1250 chars",
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="write_artifact",
                    arguments={
                        "title": "Directory finding source",
                        "summary": "Contains result listings for finding extraction",
                        "content": "This source contains reusable findings.",
                    },
                )])
            ]),
        )
        job = db.get_job(job_id)

        assert result.status == "blocked"
        assert result.result["error"] == "misleading blocked-source artifact blocked"
        assert result.result["auto_source_record"]["source"]["source"] == "https://source.example/search"
        assert db.list_artifacts(job_id) == []
        assert job["metadata"]["source_ledger"][0]["warnings"] == ["captcha/anti-bot block"]
    finally:
        db.close()


def test_run_one_step_allows_blocked_source_artifact_when_acknowledged(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Save blocked source notes", title="guard")
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "data": {
                    "origin": "https://source.example/search",
                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
                },
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="write_artifact",
                    arguments={
                        "title": "Blocked source note",
                        "summary": "Blocked by CAPTCHA; not usable as finding evidence",
                        "content": "The page showed a CAPTCHA and no usable evidence was visible.",
                    },
                )])
            ]),
        )

        assert result.status == "completed"
        assert db.list_artifacts(job_id)[0]["title"] == "Blocked source note"
    finally:
        db.close()


def test_run_one_step_blocks_browser_loop_after_anti_bot_snapshot(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Pivot after blocked browser pages", title="guard")
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "data": {
                    "origin": "https://source.example/search",
                    "snapshot": 'Iframe "Security CAPTCHA" You have been blocked.',
                },
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="browser_scroll", arguments={"direction": "down"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "anti-bot source loop blocked"
        assert result.result["auto_source_record"]["source"]["fail_count"] == 1
    finally:
        db.close()


def test_run_one_step_blocks_known_bad_browser_source_from_ledger(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid sources already scored as bad", title="guard")
        db.append_source_record(
            job_id,
            "https://blocked.example/search",
            source_type="blocked_browser_source",
            usefulness_score=0.02,
            fail_count_delta=1,
            warnings=["captcha/anti-bot block"],
            outcome="blocked; pivot",
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="browser_navigate", arguments={"url": "https://www.blocked.example/search?page=2"})])
            ]),
        )
        job = db.get_job(job_id)

        assert result.status == "blocked"
        assert result.result["error"] == "known bad source blocked"
        assert result.result["known_bad_source"]["source"] == "https://blocked.example/search"
        assert job["metadata"]["last_agent_update"]["category"] == "blocked"
    finally:
        db.close()


def test_run_one_step_blocks_known_bad_extract_source_from_ledger(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid extracting bad sources", title="guard")
        db.append_source_record(
            job_id,
            "https://lowyield.example/source",
            source_type="web_source",
            usefulness_score=0.05,
            fail_count_delta=2,
            outcome="no useful candidates",
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="web_extract",
                    arguments={"urls": ["https://lowyield.example/source?retry=1"]},
                )])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "known bad source blocked"
        assert result.result["known_bad_source"]["fail_count"] == 2
    finally:
        db.close()


def test_run_one_step_allows_child_url_when_bad_web_source_is_domain_root(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid over-broad domain source blocks", title="guard")
        db.append_source_record(
            job_id,
            "https://source.example",
            source_type="web_source",
            usefulness_score=0.05,
            fail_count_delta=1,
            outcome="root health check failed",
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="web_extract",
                    arguments={"urls": ["https://source.example/api/public/models"]},
                )])
            ]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "web_extract"
    finally:
        db.close()


def test_run_one_step_records_failed_shell_url_source(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid broken shell URL sources", title="guard")
        url = "https://source.example/api/private/tree/main"

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": f"curl -s {url}"})])
            ]),
            registry=FailedUrlShellRegistry(),
        )
        sources = db.get_job(job_id)["metadata"]["source_ledger"]
        source = sources[0]

        assert result.status == "failed"
        assert source["source"] == url
        assert source["source_type"] == "shell_exec"
        assert source["fail_count"] == 1
        assert source["usefulness_score"] == 0.01
        assert source["metadata"]["failure_kind"] == "auth_or_http"
    finally:
        db.close()


def test_run_one_step_records_pathful_failed_shell_urls_not_root_health_checks(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid poisoning whole hosts from mixed probes", title="guard")
        bad_url = "https://source.example/api/private/tree/main"

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": f"curl -sI https://source.example && curl -s {bad_url}"},
                )])
            ]),
            registry=FailedUrlShellRegistry(),
        )
        sources = db.get_job(job_id)["metadata"]["source_ledger"]

        assert result.status == "failed"
        assert [source["source"] for source in sources] == [bad_url]
    finally:
        db.close()


def test_run_one_step_blocks_known_bad_shell_source_family(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Pivot from failed source family", title="guard")

        first = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://source.example/downloads/private/model-a.bin"},
                )])
            ]),
            registry=FailedUrlShellRegistry(),
        )
        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://source.example/downloads/private/model-b.bin"},
                )])
            ]),
        )
        allowed = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://source.example/downloads/public/model-b.bin"},
                )])
            ]),
            registry=SuccessRegistry(),
        )

        assert first.status == "failed"
        assert blocked.status == "blocked"
        assert blocked.result["error"] == "known bad source blocked"
        assert blocked.result["known_bad_source"]["source"] == "https://source.example/downloads/private"
        assert allowed.status == "completed"
    finally:
        db.close()


def test_run_one_step_derives_bad_shell_source_family_from_exact_failure(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Pivot from exact failed file source", title="guard")
        db.append_source_record(
            job_id,
            "https://source.example/downloads/private/model-a.bin",
            source_type="shell_exec",
            usefulness_score=0.01,
            fail_count_delta=1,
            warnings=["auth failure"],
            outcome="401 Unauthorized",
        )

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://source.example/downloads/private/model-b.bin"},
                )])
            ]),
        )

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "known bad source blocked"
        assert blocked.result["known_bad_source"]["source"] == "https://source.example/downloads/private"
        assert blocked.result["known_bad_source"]["metadata"]["source_family_from"].endswith("/model-a.bin")
    finally:
        db.close()


def test_run_one_step_does_not_block_entire_host_after_auth_source_families(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Pivot from repeated host auth failures", title="guard")
        for source in (
            "https://source.example/private/a/model.bin",
            "https://source.example/private/b/model.bin",
            "https://source.example/private/c/model.bin",
        ):
            db.append_source_record(
                job_id,
                source,
                source_type="shell_exec",
                usefulness_score=0.01,
                fail_count_delta=1,
                warnings=["401 unauthorized"],
                outcome="HTTP 401 Unauthorized",
                metadata={"failure_kind": "auth_or_http"},
            )

        allowed_same_host = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://source.example/private/d/model.bin"},
                )])
            ]),
            registry=SuccessRegistry(),
        )
        allowed = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -L https://other.example/private/d/model.bin"},
                )])
            ]),
            registry=SuccessRegistry(),
        )

        assert allowed_same_host.status == "completed"
        assert allowed.status == "completed"
    finally:
        db.close()


def test_run_one_step_blocks_known_bad_shell_source_path(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Pivot from failed shell URL source", title="guard")
        db.append_source_record(
            job_id,
            "https://source.example/api/private/tree/main",
            source_type="shell_exec",
            usefulness_score=0.01,
            fail_count_delta=1,
            warnings=["auth failure"],
            outcome="401 Unauthorized",
        )

        blocked = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -s 'https://source.example/api/private/tree/main?recursive=true'"},
                )])
            ]),
        )
        allowed = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -s 'https://source.example/api/public/models'"},
                )])
            ]),
            registry=SuccessRegistry(),
        )

        assert blocked.status == "blocked"
        assert blocked.result["error"] == "known bad source blocked"
        assert blocked.result["known_bad_source"]["source"] == "https://source.example/api/private/tree/main"
        assert allowed.status == "completed"
    finally:
        db.close()


def test_run_one_step_allows_mixed_shell_command_with_bad_root_health_check(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid over-broad shell root source blocks", title="guard")
        db.append_source_record(
            job_id,
            "https://source.example",
            source_type="shell_exec",
            usefulness_score=0.01,
            fail_count_delta=1,
            warnings=["root health check failed earlier"],
            outcome="HTTP failure",
        )

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(
                    name="shell_exec",
                    arguments={"command": "curl -sI https://source.example && curl -s https://source.example/api/public/models"},
                )])
            ]),
            registry=SuccessRegistry(),
        )

        assert result.status == "completed"
        assert result.tool_name == "shell_exec"
    finally:
        db.close()


def test_run_one_step_saves_unpersisted_evidence_before_known_bad_source_block(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Evidence checkpoint still wins", title="guard")
        db.append_source_record(
            job_id,
            "https://blocked.example/search",
            source_type="blocked_browser_source",
            usefulness_score=0.02,
            fail_count_delta=1,
            warnings=["captcha/anti-bot block"],
            outcome="blocked; pivot",
        )
        run_id = db.start_run(job_id)
        step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="browser_snapshot")
        db.finish_step(
            step_id,
            status="completed",
            output_data={
                "success": True,
                "data": {"origin": "https://useful.example"},
                "snapshot": "Useful source evidence. " * 80,
            },
        )
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="browser_navigate", arguments={"url": "https://blocked.example/search"})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "artifact required before more research"
        assert "auto_checkpoint" in result.result
        assert result.result["auto_checkpoint"]["artifact_id"]
    finally:
        db.close()


def test_run_one_step_blocks_search_streak(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Do not search forever", title="guard")
        for query in ("alpha findings", "beta findings", "gamma findings"):
            run_id = db.start_run(job_id)
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search", input_data={"arguments": {"query": query}})
            db.finish_step(step_id, status="completed", output_data={"success": True, "query": query, "results": []})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "delta findings", "limit": 5})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "search loop blocked"
        assert result.result["recent_search_streak"] == 3
    finally:
        db.close()


def test_run_one_step_blocks_similar_search_query(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Avoid query rewrites", title="guard")
        run_id = db.start_run(job_id)
        step_id = db.add_step(
            job_id=job_id,
            run_id=run_id,
            kind="tool",
            tool_name="web_search",
            input_data={"arguments": {"query": "target digital marketing research"}},
        )
        db.finish_step(step_id, status="completed", output_data={"success": True, "query": "target digital marketing research", "results": []})
        db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "target marketing digital research", "limit": 5})])
            ]),
        )

        assert result.status == "blocked"
        assert result.result["error"] == "similar search query blocked"
    finally:
        db.close()


def test_run_one_step_reflects_every_fixed_interval(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job("Reflect over work", title="reflect")
        for index in range(12):
            run_id = db.start_run(job_id)
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "should not be used"})])
            ]),
        )
        job = db.get_job(job_id)

        assert result.tool_name == "reflect"
        assert result.status == "completed"
        assert job["metadata"]["reflections"]
        assert job["metadata"]["last_agent_update"]["category"] == "plan"
        assert "Lessons learned:" in build_messages(job, db.list_steps(job_id=job_id))[-1]["content"]
    finally:
        db.close()


def test_reflection_does_not_repeat_existing_strategy_lesson(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    strategy = "Choose the next branch from durable evidence, then record the result as findings, tasks, experiments, sources, or memory."
    try:
        job_id = db.create_job("Reflect over repeated work", title="reflect")
        db.append_lesson(job_id, strategy, category="strategy")
        for index in range(12):
            run_id = db.start_run(job_id)
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="web_search")
            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="web_search", arguments={"query": "should not be used"})])
            ]),
        )
        job = db.get_job(job_id)

        assert result.tool_name == "reflect"
        assert result.result["lesson_recorded"] is False
        assert len(job["metadata"]["lessons"]) == 1
        assert job["metadata"]["lessons"][0].get("seen_count") is None
    finally:
        db.close()


def test_reflection_strategy_uses_current_operator_state(tmp_path):
    config = AppConfig(runtime=RuntimeConfig(home=tmp_path))
    db = AgentDB(tmp_path / "state.db")
    try:
        job_id = db.create_job(
            "Reflect over operator context",
            title="reflect",
            metadata={
                "operator_messages": [
                    {
                        "id": "op_1",
                        "mode": "steer",
                        "message": "Use the corrected target before continuing.",
                        "created_at": "2026-01-01T00:00:00+00:00",
                    }
                ]
            },
        )
        for index in range(12):
            run_id = db.start_run(job_id)
            step_id = db.add_step(job_id=job_id, run_id=run_id, kind="tool", tool_name="shell_exec")
            db.finish_step(step_id, status="completed", summary=f"step {index}", output_data={"success": True})
            db.finish_run(run_id, "completed")

        result = run_one_step(
            job_id,
            config=config,
            db=db,
            llm=ScriptedLLM([
                LLMResponse(tool_calls=[ToolCall(name="shell_exec", arguments={"command": "should not run"})])
            ]),
        )

        assert result.tool_name == "reflect"
        assert "Incorporate or supersede active operator context" in result.result["reflection"]["strategy"]
    finally:
        db.close()