Failure Corpus Runner
Model:     cogito:14b
Trace dir: /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus
Scenarios: 8 (4 success, 4 failure)

========================================================================
SCENARIO : success-days-of-week  [SUCCESS]
TASK     : List the 7 days of the week in order, starting with Monday....
EXPECT   : Deterministic recall, 1 iteration, entropy ~0.150
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
ℹ️ Reactive Intelligence — Anonymous entropy data helps improve the framework. Disable with .withReactiveIntelligence({ telemetry: false }) (https://docs.reactiveagents.dev/telemetry)
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.8s
  📊 [metric:entropy] 0.15 composite
⚠️ [warning] [output-gate] Synthesized output to match requested format: list
✓ [phase:reactive:kernel] 2.3s
✓ [completion] Reactive strategy terminated: end_turn
  📊 [metric:tokens_used] 3448 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 7.6s with 3448 tokens

═══ Logs (3) ═══
  12:39:32.864 INFO  Execution started {"taskId":"01KPZR19QA5S9QEKDQPTHGHPBF","agentId":"agent-1777034372762"}
  12:39:40.488 INFO  Execution completed {"taskId":"01KPZR19QA5S9QEKDQPTHGHPBF","success":true,"tokensUsed":3448,"cost":0,"duration":7635}
  12:39:40.488 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (393 samples) | parallel=sequential-only classifier=high

═══ Spans (9) ═══
  ✓ execution.run (7644.0ms) [ae9aed31…]
    ✓ execution.phase.bootstrap (8.4ms) [ae9aed31…]
      ✓ phase.bootstrap.metrics (0.1ms) [ae9aed31…]
    ✓ execution.phase.strategy-select (1.5ms) [ae9aed31…]
      ✓ phase.strategy-select.metrics (0.0ms) [ae9aed31…]
    ✓ execution.phase.think (2338.3ms) [ae9aed31…]
      ✓ phase.think.metrics (0.0ms) [ae9aed31…]
    ✓ execution.phase.complete (2.2ms) [ae9aed31…]
      ✓ phase.complete.metrics (0.0ms) [ae9aed31…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ────────────────────────╮
│ Status:   Success   Duration: 7.6s   Steps: 1   │
│ Model:    cogito:14b   (ollama)   Tokens: 3,448 │
╰─────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]            8ms
├─ ✅  [strategy-select]      1ms
├─ ✅  [think]               2.3s (1 steps, 100% of time)
└─ ✅  [complete]             1ms

🧠 Reasoning Signal
├─ Grade: A   Signal: flat   Mean: 0.150   Delta: 0.000
├─ Solved in one pass — no trajectory to analyze
└─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →

--- RESULT ---
Success:               true
Iterations:            2 / 4
Max entropy:           0.150
Dispatched:            0  Suppressed: 0
Duration:              7.8s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR19QA5S9QEKDQPTHGHPBF.jsonl

========================================================================
SCENARIO : success-capital-france  [SUCCESS]
TASK     : What is the capital city of France? Give just the city name....
EXPECT   : Single-fact recall, 1 iteration, entropy ~0.150
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.5s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 0
  ✓ [tool:web-search] 0.99s
✓ [phase:think] 1.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.6s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:recall] call 1
  ✓ [tool:recall] 0.00s
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.2s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
✓ [phase:reactive:kernel] 5.3s
✓ [completion] Reactive strategy terminated: final_answer_tool
  📊 [metric:tokens_used] 8357 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 7.0s with 8357 tokens

═══ Logs (3) ═══
  12:39:40.570 INFO  Execution started {"taskId":"01KPZR1H8JNAV9CDMV7XQK7Y2N","agentId":"agent-1777034380508"}
  12:39:47.609 INFO  Execution completed {"taskId":"01KPZR1H8JNAV9CDMV7XQK7Y2N","success":true,"tokensUsed":8357,"cost":0,"duration":7040}
  12:39:47.610 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (394 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (7047.9ms) [7bb6a557…]
    ✓ execution.phase.bootstrap (15.2ms) [7bb6a557…]
      ✓ phase.bootstrap.metrics (0.0ms) [7bb6a557…]
    ✓ execution.phase.strategy-select (1.1ms) [7bb6a557…]
      ✓ phase.strategy-select.metrics (0.0ms) [7bb6a557…]
    ✓ execution.phase.think (5286.8ms) [7bb6a557…]
      ✓ phase.think.metrics (0.1ms) [7bb6a557…]
    ✓ execution.phase.act (1.2ms) [7bb6a557…]
      ✓ phase.act.metrics (0.0ms) [7bb6a557…]
    ✓ execution.phase.observe (1.0ms) [7bb6a557…]
      ✓ phase.observe.metrics (0.0ms) [7bb6a557…]
    ✓ execution.phase.memory-flush (1177.6ms) [7bb6a557…]
      ✓ phase.memory-flush.metrics (0.0ms) [7bb6a557…]
    ✓ execution.phase.complete (1.1ms) [7bb6a557…]
      ✓ phase.complete.metrics (0.0ms) [7bb6a557…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ────────────────────────╮
│ Status:   Success   Duration: 7.0s   Steps: 9   │
│ Model:    cogito:14b   (ollama)   Tokens: 8,357 │
╰─────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]           15ms
├─ ✅  [strategy-select]      1ms
├─ ✅  [think]               5.3s (9 steps, 82% of time)
├─ ✅  [act]                  1ms (2 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]        1.2s
└─ ✅  [complete]             1ms

🔧 Tool Execution (2 calls across 2 tools)
├─ ✅  web-search  1 calls, 995ms avg
└─ ✅  recall      1 calls, 1ms avg

🧠 Reasoning Signal
├─ Grade: B   Signal: flat   Mean: 0.150   Delta: 0.000
├─ Model stalled — entropy didn't decrease across iterations
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
└─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
   ┈┈┈
└─ 💡 Consider enabling strategy switching (.withReasoning({ enableStrategySwitching: true }))

⚠️  Alerts & Insights
├─ ℹ️  9 reasoning steps (complex reasoning)
└─ ⚠️  High step count suggests task complexity or model confusion

--- RESULT ---
Success:               true
Iterations:            3 / 4
Max entropy:           0.150
Dispatched:            0  Suppressed: 2
Duration:              7.1s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR1H8JNAV9CDMV7XQK7Y2N.jsonl

========================================================================
SCENARIO : success-rgb-colors  [SUCCESS]
TASK     : What are the three primary colors of light (RGB)? List them....
EXPECT   : Single-fact recall, 1 iteration, entropy ~0.150
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.5s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 0
  ✓ [tool:web-search] 1.03s
✓ [phase:think] 1.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.0s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:recall] call 1
  → [tool:web-search] call 1
  ✓ [tool:recall] 0.00s
  ✓ [tool:web-search] 1.08s
✓ [phase:think] 1.1s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.5s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
✓ [phase:reactive:kernel] 8.1s
✓ [completion] Reactive strategy terminated: final_answer_tool
  📊 [metric:tokens_used] 8693 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 9.4s with 8693 tokens

═══ Logs (3) ═══
  12:39:47.701 INFO  Execution started {"taskId":"01KPZR1R7D30E6Y5A2CZM1GBVA","agentId":"agent-1777034387639"}
  12:39:57.056 INFO  Execution completed {"taskId":"01KPZR1R7D30E6Y5A2CZM1GBVA","success":true,"tokensUsed":8693,"cost":0,"duration":9356}
  12:39:57.056 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (395 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (9364.1ms) [31459e00…]
    ✓ execution.phase.bootstrap (2.6ms) [31459e00…]
      ✓ phase.bootstrap.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.strategy-select (1.9ms) [31459e00…]
      ✓ phase.strategy-select.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.think (8097.8ms) [31459e00…]
      ✓ phase.think.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.act (1.1ms) [31459e00…]
      ✓ phase.act.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.observe (1.0ms) [31459e00…]
      ✓ phase.observe.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.memory-flush (712.9ms) [31459e00…]
      ✓ phase.memory-flush.metrics (0.0ms) [31459e00…]
    ✓ execution.phase.complete (1.2ms) [31459e00…]
      ✓ phase.complete.metrics (0.0ms) [31459e00…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ────────────────────────╮
│ Status:   Success   Duration: 9.4s   Steps: 11  │
│ Model:    cogito:14b   (ollama)   Tokens: 8,693 │
╰─────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]            2ms
├─ ✅  [strategy-select]      1ms
├─ ✅  [think]               8.1s (11 steps, 92% of time)
├─ ✅  [act]                  1ms (3 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]       712ms
└─ ✅  [complete]             1ms

🔧 Tool Execution (3 calls across 2 tools)
├─ ✅  web-search  2 calls, 1.1s avg
└─ ✅  recall      1 calls, 1ms avg

🧠 Reasoning Signal
├─ Grade: B   Signal: flat   Mean: 0.150   Delta: 0.000
├─ Model stalled — entropy didn't decrease across iterations
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
└─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
   ┈┈┈
└─ 💡 Consider enabling strategy switching (.withReasoning({ enableStrategySwitching: true }))

⚠️  Alerts & Insights
├─ ℹ️  11 reasoning steps (complex reasoning)
└─ ⚠️  High step count suggests task complexity or model confusion

--- RESULT ---
Success:               true
Iterations:            3 / 4
Max entropy:           0.150
Dispatched:            0  Suppressed: 2
Duration:              9.4s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR1R7D30E6Y5A2CZM1GBVA.jsonl

========================================================================
SCENARIO : success-typescript-paradigm  [SUCCESS]
TASK     : What programming paradigm does TypeScript primarily support? List two features t...
EXPECT   : Technical factual recall, 1-2 iterations, low entropy
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.0s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 0
  → [tool:find] call 0
  ✓ [tool:find] 0.00s
  ✓ [tool:web-search] 0.91s
✓ [phase:think] 0.9s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 4.3s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:recall] call 1
  ✓ [tool:recall] 0.00s
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.1s
  📊 [metric:entropy] 0.355 composite
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.9s
  📊 [metric:entropy] 0.3784375 composite
⚠️ [warning] [harness-deliverable] Assembling output from 1 tool artifacts after 4 stalled iterations
⚠️ [warning] [output-gate] Synthesized output to match requested format: prose
✓ [phase:reactive:kernel] 10.7s
✓ [completion] Reactive strategy terminated: final_answer
  📊 [metric:tokens_used] 9865 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 12.9s with 9865 tokens

═══ Logs (3) ═══
  12:39:57.137 INFO  Execution started {"taskId":"01KPZR21EDQHEZVAJ6DKC5PAR4","agentId":"agent-1777034397083"}
  12:40:10.003 INFO  Execution completed {"taskId":"01KPZR21EDQHEZVAJ6DKC5PAR4","success":true,"tokensUsed":9865,"cost":0,"duration":12866}
  12:40:10.003 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (396 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (12875.0ms) [dc0ec02b…]
    ✓ execution.phase.bootstrap (1.5ms) [dc0ec02b…]
      ✓ phase.bootstrap.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.strategy-select (1.3ms) [dc0ec02b…]
      ✓ phase.strategy-select.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.think (10659.8ms) [dc0ec02b…]
      ✓ phase.think.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.act (1.5ms) [dc0ec02b…]
      ✓ phase.act.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.observe (1.2ms) [dc0ec02b…]
      ✓ phase.observe.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.memory-flush (1649.3ms) [dc0ec02b…]
      ✓ phase.memory-flush.metrics (0.0ms) [dc0ec02b…]
    ✓ execution.phase.complete (1.2ms) [dc0ec02b…]
      ✓ phase.complete.metrics (0.0ms) [dc0ec02b…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ────────────────────────╮
│ Status:   Success   Duration: 12.9s   Steps: 10 │
│ Model:    cogito:14b   (ollama)   Tokens: 9,865 │
╰─────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]            1ms
├─ ✅  [strategy-select]      1ms
├─ ⚠️  [think]              10.7s (10 steps, 87% of time)
├─ ✅  [act]                  1ms (3 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]        1.6s
└─ ✅  [complete]             1ms

🔧 Tool Execution (3 calls across 3 tools)
├─ ✅  web-search  1 calls, 912ms avg
├─ ✅  find        1 calls, 1ms avg
└─ ✅  recall      1 calls, 1ms avg

🧠 Reasoning Signal
├─ Grade: D   Signal: diverging   Mean: 0.258   Delta: +0.228
├─ Model became more confused over time
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  ┈┈┈ 1 tool/system step (no thought scored) ┈┈┈
├─  iter  3 ███████░░░░░░░░░░░░░ 0.355 →
└─  iter  4 ████████░░░░░░░░░░░░ 0.378 ↗
   ┈┈┈
└─ 💡 Try a simpler prompt or break the task into sub-tasks

⚠️  Alerts & Insights
├─ ⚠️  think phase blocked ≥10s (LLM latency)
├─ ℹ️  10 reasoning steps (complex reasoning)
├─ ⚠️  High step count suggests task complexity or model confusion
└─ ⚠️  Entropy diverging — model became less certain over iterations

--- RESULT ---
Success:               true
Iterations:            5 / 4
Max entropy:           0.378
Dispatched:            0  Suppressed: 1
Duration:              12.9s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR21EDQHEZVAJ6DKC5PAR4.jsonl

========================================================================
SCENARIO : failure-rate-limit-loop  [FAILURE]
TASK     : Search the web for the current Bitcoin price. Rules: (1) You MUST use the web-se...
EXPECT   : Tool always returns rate-limit error; streak evaluator fires after 3 consecutive failures
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.4s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 0
  ✗ [tool:web-search] 0.00s — [Tool error: Tool crashed: Error: Rate limit exceeded. Quota resets in 60 seconds. Please retry.]
✓ [phase:think] 0.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.0s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 1
  → [tool:web-search] call 1
  → [tool:web-search] call 1
  ✗ [tool:web-search] 0.00s
  ✗ [tool:web-search] 0.00s
  ✗ [tool:web-search] 0.00s
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.5s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
  → [tool:crypto-price] call 2
  ✓ [tool:crypto-price] 0.14s
✓ [phase:think] 0.1s
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.3s
  📊 [metric:entropy] 0.5220833333333333 composite
  [iter:4:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.2s
  📊 [metric:entropy] 0.40742675781249993 composite
  [iter:4:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 4
  → [tool:web-search] call 4
  → [tool:web-search] call 4
  ✗ [tool:web-search] 0.00s
  ✗ [tool:web-search] 0.00s
  ✗ [tool:web-search] 0.00s
✓ [phase:think] 0.0s
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.2s
  📊 [metric:entropy] 0.4564605712890625 composite
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:6:thought]
→ [phase:think] Starting...
✓ [phase:think] 4.9s
  📊 [metric:entropy] 0.5030902099609375 composite
  [iter:6:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:7:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.0s
  📊 [metric:entropy] 0.5407183837890625 composite
  [iter:7:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:8:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.5s
  📊 [metric:entropy] 0.5575 composite
  [iter:8:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:9:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.3s
  📊 [metric:entropy] 0.5575 composite
⚠️ [warning] [harness-deliverable] Assembling output from 1 tool artifacts after 12 stalled iterations
⚠️ [warning] [output-gate] Synthesis imperfect but using over raw artifacts (format=true, content=false)
✓ [phase:reactive:kernel] 22.5s
✓ [completion] Reactive strategy terminated: final_answer
  📊 [metric:tokens_used] 37048 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 24.5s with 37048 tokens

═══ Logs (3) ═══
  12:40:10.087 INFO  Execution started {"taskId":"01KPZR2E32MHQKR6KC3HDZ46K6","agentId":"agent-1777034410030"}
  12:40:34.560 INFO  Execution completed {"taskId":"01KPZR2E32MHQKR6KC3HDZ46K6","success":true,"tokensUsed":37048,"cost":0,"duration":24474}
  12:40:34.560 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (397 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (24481.5ms) [db4ae5be…]
    ✓ execution.phase.bootstrap (1.5ms) [db4ae5be…]
      ✓ phase.bootstrap.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.strategy-select (1.1ms) [db4ae5be…]
      ✓ phase.strategy-select.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.think (22469.2ms) [db4ae5be…]
      ✓ phase.think.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.act (1.3ms) [db4ae5be…]
      ✓ phase.act.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.observe (1.2ms) [db4ae5be…]
      ✓ phase.observe.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.memory-flush (985.0ms) [db4ae5be…]
      ✓ phase.memory-flush.metrics (0.0ms) [db4ae5be…]
    ✓ execution.phase.complete (1.2ms) [db4ae5be…]
      ✓ phase.complete.metrics (0.0ms) [db4ae5be…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ─────────────────────────╮
│ Status:   Success   Duration: 24.5s   Steps: 48  │
│ Model:    cogito:14b   (ollama)   Tokens: 37,048 │
╰──────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]            1ms
├─ ✅  [strategy-select]      1ms
├─ ⚠️  [think]              22.5s (48 steps, 96% of time)
├─ ✅  [act]                  1ms (8 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]       985ms
└─ ✅  [complete]             1ms

🔧 Tool Execution (8 calls across 2 tools)
├─ ⚠️  web-search    7 calls, 0ms avg 7 errors
└─ ✅  crypto-price  1 calls, 135ms avg

🧠 Reasoning Signal
├─ Grade: B   Signal: flat   Mean: 0.417   Delta: +0.428
├─ Model stalled — entropy didn't decrease across iterations
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  ┈┈┈ 1 tool/system step (no thought scored) ┈┈┈
├─  iter  4 ██████████░░░░░░░░░░ 0.522 ↗
├─  iter  5 █████████░░░░░░░░░░░ 0.456 ↘
├─  iter  6 ██████████░░░░░░░░░░ 0.503 →
├─  iter  7 ███████████░░░░░░░░░ 0.541 →
├─  iter  8 ███████████░░░░░░░░░ 0.557 →
├─  ┈┈┈ 1 tool/system step (no thought scored) ┈┈┈
├─  iter 10 ███████████░░░░░░░░░ 0.557 →
├─  ┈┈┈ 4 tool/system steps (no thought scored) ┈┈┈
└─  iter 15 ████████████░░░░░░░░ 0.578 →
   ┈┈┈
└─ 💡 Consider enabling strategy switching (.withReasoning({ enableStrategySwitching: true }))

⚠️  Alerts & Insights
├─ ⚠️  think phase blocked ≥10s (LLM latency)
├─ ⚠️  web-search had 7 error(s) (100% failure rate)
├─ ℹ️  48 reasoning steps (complex reasoning)
└─ ⚠️  High step count suggests task complexity or model confusion

--- RESULT ---
Success:               true
Iterations:            16 / 12
Max entropy:           0.578
Dispatched:            5  Suppressed: 7
First dispatch:        iter=7  iters-after=9  (lower=nudge working)
Duration:              24.5s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR2E32MHQKR6KC3HDZ46K6.jsonl

========================================================================
SCENARIO : failure-save-loop  [FAILURE]
TASK     : Save the audit record 'task=complete,status=verified,ts=now' using the save-data...
EXPECT   : Save tool always fails; forced 6+ retry rule drives behavioral loop score above threshold
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 3.6s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:save-data] call 0
  ✗ [tool:save-data] 0.00s — [Tool error: Tool crashed: Error: {"success": false, "error": "connection pool exhausted", "retryable": true}]
✓ [phase:think] 0.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.5s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:save-data] call 1
  ✗ [tool:save-data] 0.00s — [Tool error: Tool crashed: Error: {"success": false, "error": "connection pool exhausted", "retryable": true}]
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.5s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 2
  ✓ [tool:web-search] 0.57s
✓ [phase:think] 0.6s
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.6s
  📊 [metric:entropy] 0.38625 composite
  [iter:3:thought]
→ [phase:think] Starting...
  → [tool:save-data] call 3
  ✗ [tool:save-data] 0.00s — [Tool error: Tool crashed: Error: {"success": false, "error": "connection pool exhausted", "retryable": true}]
✓ [phase:think] 0.0s
  [iter:4:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.7s
  📊 [metric:entropy] 0.4328656005859375 composite
  [iter:4:thought]
→ [phase:think] Starting...
  → [tool:save-data] call 4
  ✗ [tool:save-data] 0.00s — [Tool error: Tool crashed: Error: {"success": false, "error": "connection pool exhausted", "retryable": true}]
✓ [phase:think] 0.0s
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 2.0s
  📊 [metric:entropy] 0.545704345703125 composite
  [iter:5:thought]
→ [phase:think] Starting...
  → [tool:save-data] call 5
  ✗ [tool:save-data] 0.00s — [Tool error: Tool crashed: Error: {"success": false, "error": "connection pool exhausted", "retryable": true}]
✓ [phase:think] 0.0s
⚠️ [warning] [harness-deliverable] Assembling output from 1 tool artifacts after 6 stalled iterations
⚠️ [warning] [output-gate] Synthesized output to match requested format: prose
✓ [phase:reactive:kernel] 16.5s
✓ [completion] Reactive strategy terminated: final_answer
  📊 [metric:tokens_used] 23902 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 19.1s with 23902 tokens

═══ Logs (3) ═══
  12:40:34.635 INFO  Execution started {"taskId":"01KPZR3627BCTBGZ64KTG4PSBF","agentId":"agent-1777034434580"}
  12:40:53.733 INFO  Execution completed {"taskId":"01KPZR3627BCTBGZ64KTG4PSBF","success":true,"tokensUsed":23902,"cost":0,"duration":19098}
  12:40:53.733 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (398 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (19105.4ms) [980421ed…]
    ✓ execution.phase.bootstrap (12.1ms) [980421ed…]
      ✓ phase.bootstrap.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.strategy-select (1.2ms) [980421ed…]
      ✓ phase.strategy-select.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.think (16525.1ms) [980421ed…]
      ✓ phase.think.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.act (1.3ms) [980421ed…]
      ✓ phase.act.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.observe (0.9ms) [980421ed…]
      ✓ phase.observe.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.memory-flush (1610.3ms) [980421ed…]
      ✓ phase.memory-flush.metrics (0.0ms) [980421ed…]
    ✓ execution.phase.complete (1.1ms) [980421ed…]
      ✓ phase.complete.metrics (0.0ms) [980421ed…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ─────────────────────────╮
│ Status:   Success   Duration: 19.1s   Steps: 20  │
│ Model:    cogito:14b   (ollama)   Tokens: 23,902 │
╰──────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]           12ms
├─ ✅  [strategy-select]      1ms
├─ ⚠️  [think]              16.5s (20 steps, 91% of time)
├─ ✅  [act]                  1ms (6 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]        1.6s
└─ ✅  [complete]             1ms

🔧 Tool Execution (6 calls across 2 tools)
├─ ⚠️  save-data   5 calls, 1ms avg 5 errors
└─ ✅  web-search  1 calls, 571ms avg

🧠 Reasoning Signal
├─ Grade: D   Signal: diverging   Mean: 0.302   Delta: +0.396
├─ Model became more confused over time
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  3 ████████░░░░░░░░░░░░ 0.386 →
├─  iter  4 █████████░░░░░░░░░░░ 0.433 ↗
└─  iter  5 ███████████░░░░░░░░░ 0.546 ↗
   ┈┈┈
└─ 💡 Try a simpler prompt or break the task into sub-tasks

⚠️  Alerts & Insights
├─ ⚠️  think phase blocked ≥10s (LLM latency)
├─ ⚠️  save-data had 5 error(s) (100% failure rate)
├─ ℹ️  20 reasoning steps (complex reasoning)
├─ ⚠️  High step count suggests task complexity or model confusion
└─ ⚠️  Entropy diverging — model became less certain over iterations

--- RESULT ---
Success:               true
Iterations:            6 / 12
Max entropy:           0.546
Dispatched:            0  Suppressed: 2
Duration:              19.2s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR3627BCTBGZ64KTG4PSBF.jsonl

========================================================================
SCENARIO : failure-verify-loop  [FAILURE]
TASK     : Run the test suite using run-tests and keep fixing until all tests pass. Rules: ...
EXPECT   : Tests always fail; 6-call rule drives behavioral loop accumulation, strategy-switch fires
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.4s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 0
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.3s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 1
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.6s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
  → [tool:web-search] call 2
  ✓ [tool:web-search] 0.70s
✓ [phase:think] 0.7s
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.4s
  📊 [metric:entropy] 0.4029166666666667 composite
  [iter:3:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 3
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
  [iter:4:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.6s
  📊 [metric:entropy] 0.43971069335937496 composite
  [iter:4:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 4
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.6s
  📊 [metric:entropy] 0.56037841796875 composite
  [iter:5:thought]
→ [phase:think] Starting...
  → [tool:crypto-price] call 5
  ✓ [tool:crypto-price] 0.00s
✓ [phase:think] 0.0s
  [iter:6:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.7s
  📊 [metric:entropy] 0.47919555664062496 composite
  [iter:6:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 6
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
  [iter:7:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.7s
  📊 [metric:entropy] 0.50420166015625 composite
  [iter:7:thought]
→ [phase:think] Starting...
  → [tool:run-tests] call 7
  ✗ [tool:run-tests] 0.00s — [Tool error: Tool crashed: Error: {"passed": 2, "failed": 3, "errors": ["assertion failed line 23", "null ref line 41", "type mismatch line 88"]}]
✓ [phase:think] 0.0s
⚠️ [warning] [harness-deliverable] Assembling output from 2 tool artifacts after 4 stalled iterations
⚠️ [warning] [output-gate] Synthesized output to match requested format: prose
✓ [phase:reactive:kernel] 17.9s
✓ [completion] Reactive strategy terminated: final_answer
  📊 [metric:tokens_used] 31818 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 20.2s with 31818 tokens

═══ Logs (3) ═══
  12:40:53.803 INFO  Execution started {"taskId":"01KPZR3RS6VTJ832CPXXE06E0M","agentId":"agent-1777034453747"}
  12:41:14.026 INFO  Execution completed {"taskId":"01KPZR3RS6VTJ832CPXXE06E0M","success":true,"tokensUsed":31818,"cost":0,"duration":20224}
  12:41:14.026 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (399 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (20231.9ms) [9cbeca37…]
    ✓ execution.phase.bootstrap (13.8ms) [9cbeca37…]
      ✓ phase.bootstrap.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.strategy-select (0.9ms) [9cbeca37…]
      ✓ phase.strategy-select.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.think (17903.2ms) [9cbeca37…]
      ✓ phase.think.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.act (1.4ms) [9cbeca37…]
      ✓ phase.act.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.observe (1.3ms) [9cbeca37…]
      ✓ phase.observe.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.memory-flush (1387.5ms) [9cbeca37…]
      ✓ phase.memory-flush.metrics (0.0ms) [9cbeca37…]
    ✓ execution.phase.complete (1.0ms) [9cbeca37…]
      ✓ phase.complete.metrics (0.0ms) [9cbeca37…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ─────────────────────────╮
│ Status:   Success   Duration: 20.2s   Steps: 26  │
│ Model:    cogito:14b   (ollama)   Tokens: 31,818 │
╰──────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]           14ms
├─ ✅  [strategy-select]      1ms
├─ ⚠️  [think]              17.9s (26 steps, 93% of time)
├─ ✅  [act]                  1ms (8 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]        1.4s
└─ ✅  [complete]             1ms

🔧 Tool Execution (8 calls across 3 tools)
├─ ⚠️  run-tests     6 calls, 0ms avg, 6 errors
├─ ✅  web-search    1 call, 700ms avg
└─ ✅  crypto-price  1 call, 1ms avg

🧠 Reasoning Signal
├─ Grade: B   Signal: flat   Mean: 0.355   Delta: +0.354
├─ Model stalled — entropy didn't decrease across iterations
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  3 ████████░░░░░░░░░░░░ 0.403 →
├─  iter  4 █████████░░░░░░░░░░░ 0.440 ↗
├─  iter  5 ███████████░░░░░░░░░ 0.560 ↗
├─  iter  6 ██████████░░░░░░░░░░ 0.479 ↗
└─  iter  7 ██████████░░░░░░░░░░ 0.504 →
   ┈┈┈
└─ 💡 Consider enabling strategy switching (.withReasoning({ enableStrategySwitching: true }))

⚠️  Alerts & Insights
├─ ⚠️  think phase blocked ≥10s (LLM latency)
├─ ⚠️  run-tests had 6 error(s) (100% failure rate)
├─ ℹ️  26 reasoning steps (complex reasoning)
└─ ⚠️  High step count suggests task complexity or model confusion

--- RESULT ---
Success:               true
Iterations:            8 / 12
Max entropy:           0.560
Dispatched:            2  Suppressed: 4
First dispatch:        iter=6  iters-after=2  (lower=nudge working)
Duration:              20.3s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR3RS6VTJ832CPXXE06E0M.jsonl

========================================================================
SCENARIO : failure-contradictory-data  [FAILURE]
TASK     : Use source-alpha and source-beta to determine the accurate current gold price. R...
EXPECT   : Sources always disagree by $494; the forced multi-call rule ensures the behavioral loop score rises
========================================================================
✓ Provider: ollama | Model: cogito:14b | API key: (not required)
→ [phase:execution] Starting...
→ [phase:reactive:kernel] Starting...
  [iter:0:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.9s
  📊 [metric:entropy] 0.15 composite
  [iter:0:thought]
→ [phase:think] Starting...
  → [tool:source-alpha] call 0
  ✓ [tool:source-alpha] 0.00s
✓ [phase:think] 0.0s
  [iter:1:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.4s
  📊 [metric:entropy] 0.15 composite
  [iter:1:thought]
→ [phase:think] Starting...
  → [tool:source-beta] call 1
  ✓ [tool:source-beta] 0.00s
✓ [phase:think] 0.0s
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.3s
  📊 [metric:entropy] 0.15 composite
  [iter:2:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.2s
  📊 [metric:entropy] 0.41374999999999995 composite
  [iter:3:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:4:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.9s
  📊 [metric:entropy] 0.5043749999999999 composite
  [iter:4:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.3s
  📊 [metric:entropy] 0.43072998046874994 composite
  [iter:5:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:6:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.0s
  📊 [metric:entropy] 0.44566528320312493 composite
  [iter:6:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
  [iter:7:thought]
→ [phase:think] Starting...
✓ [phase:think] 1.0s
  📊 [metric:entropy] 0.46111328124999995 composite
  [iter:7:thought]
→ [phase:think] Starting...
✓ [phase:think] 0.0s
⚠️ [warning] [harness-deliverable] Assembling output from 2 tool artifacts after 12 stalled iterations
⚠️ [warning] [output-gate] Synthesized output to match requested format: prose
✓ [phase:reactive:kernel] 11.5s
✓ [completion] Reactive strategy terminated: final_answer
  📊 [metric:tokens_used] 21584 tokens
  📊 [metric:cost_usd] 0 usd
✓ [completion] Task completed in 15.3s with 21584 tokens

═══ Logs (3) ═══
  12:41:14.116 INFO  Execution started {"taskId":"01KPZR4CKW6PB8BERQEFK9H2EJ","agentId":"agent-1777034474054"}
  12:41:29.375 INFO  Execution completed {"taskId":"01KPZR4CKW6PB8BERQEFK9H2EJ","success":true,"tokensUsed":21584,"cost":0,"duration":15260}
  12:41:29.375 INFO  ◉ [calibration] calibration: cogito:14b | source: prior+local (400 samples) | parallel=sequential-only classifier=high

═══ Spans (15) ═══
  ✓ execution.run (15268.4ms) [0db23723…]
    ✓ execution.phase.bootstrap (2.3ms) [0db23723…]
      ✓ phase.bootstrap.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.strategy-select (1.7ms) [0db23723…]
      ✓ phase.strategy-select.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.think (11546.2ms) [0db23723…]
      ✓ phase.think.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.act (1.3ms) [0db23723…]
      ✓ phase.act.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.observe (1.1ms) [0db23723…]
      ✓ phase.observe.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.memory-flush (2418.4ms) [0db23723…]
      ✓ phase.memory-flush.metrics (0.0ms) [0db23723…]
    ✓ execution.phase.complete (1.2ms) [0db23723…]
      ✓ phase.complete.metrics (0.0ms) [0db23723…]

═══ Metrics Summary ═══
╭ Agent Execution Summary ─────────────────────────╮
│ Status:   Success   Duration: 15.3s   Steps: 28  │
│ Model:    cogito:14b   (ollama)   Tokens: 21,584 │
╰──────────────────────────────────────────────────╯

📊 Execution Timeline
├─ ✅  [bootstrap]            2ms
├─ ✅  [strategy-select]      1ms
├─ ⚠️  [think]              11.5s (28 steps, 83% of time)
├─ ✅  [act]                  1ms (2 calls)
├─ ✅  [observe]              1ms
├─ ✅  [memory-flush]        2.4s
└─ ✅  [complete]             1ms

🔧 Tool Execution (2 calls across 2 tools)
├─ ✅  source-alpha  1 call, 1ms avg
└─ ✅  source-beta   1 call, 1ms avg

🧠 Reasoning Signal
├─ Grade: B   Signal: flat   Mean: 0.338   Delta: +0.311
├─ Model stalled — entropy didn't decrease across iterations
├─  iter  0 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  1 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  2 ███░░░░░░░░░░░░░░░░░ 0.150 →
├─  iter  3 ████████░░░░░░░░░░░░ 0.414 →
├─  iter  4 ██████████░░░░░░░░░░ 0.504 ↗
├─  iter  5 █████████░░░░░░░░░░░ 0.431 ↗
├─  iter  6 █████████░░░░░░░░░░░ 0.446 →
└─  iter  7 █████████░░░░░░░░░░░ 0.461 →
   ┈┈┈
└─ 💡 Consider enabling strategy switching (.withReasoning({ enableStrategySwitching: true }))

⚠️  Alerts & Insights
├─ ⚠️  think phase blocked ≥10s (LLM latency)
├─ ℹ️  28 reasoning steps (complex reasoning)
└─ ⚠️  High step count suggests task complexity or model confusion

--- RESULT ---
Success:               true
Iterations:            8 / 12
Max entropy:           0.504
Dispatched:            0  Suppressed: 2
Duration:              15.3s
Trace:                 /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus/01KPZR4CKW6PB8BERQEFK9H2EJ.jsonl

===============================================================================================
FAILURE CORPUS SUMMARY
===============================================================================================
scenarioId                     | label     | success   | maxEntropy   | iters   | dispatch   | suppressed
-----------------------------------------------------------------------------------------------
success-days-of-week           | success   | true      | 0.150        | 2       | 0          | 0
success-capital-france         | success   | true      | 0.150        | 3       | 0          | 2
success-rgb-colors             | success   | true      | 0.150        | 3       | 0          | 2
success-typescript-paradigm    | success   | true      | 0.378        | 5       | 0          | 1
failure-rate-limit-loop        | failure   | true      | 0.578        | 16      | 5          | 7
failure-save-loop              | failure   | true      | 0.546        | 6       | 0          | 2
failure-verify-loop            | failure   | true      | 0.560        | 8       | 2          | 4
failure-contradictory-data     | failure   | true      | 0.504        | 8       | 0          | 2
===============================================================================================

Runs: 8 total (4 success, 4 failure)
Avg entropy   success=0.207  failure=0.547  gap=0.340
Avg dispatch  success=0.0  failure=1.8
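
Note: the two averages above appear consistent with taking the mean of the maxEntropy and dispatch columns per label in the summary table. The sketch below reproduces that arithmetic under that assumption. The `Row` type, `mean`, and `summarize` are hypothetical names for illustration, not the runner's actual code; the row values are copied from the table.

```typescript
// Rows copied from the FAILURE CORPUS SUMMARY table.
// Row, mean, and summarize are illustrative names, not part of the runner.
type Row = {
  scenarioId: string;
  label: "success" | "failure";
  maxEntropy: number;
  dispatch: number;
};

const rows: Row[] = [
  { scenarioId: "success-days-of-week", label: "success", maxEntropy: 0.15, dispatch: 0 },
  { scenarioId: "success-capital-france", label: "success", maxEntropy: 0.15, dispatch: 0 },
  { scenarioId: "success-rgb-colors", label: "success", maxEntropy: 0.15, dispatch: 0 },
  { scenarioId: "success-typescript-paradigm", label: "success", maxEntropy: 0.378, dispatch: 0 },
  { scenarioId: "failure-rate-limit-loop", label: "failure", maxEntropy: 0.578, dispatch: 5 },
  { scenarioId: "failure-save-loop", label: "failure", maxEntropy: 0.546, dispatch: 0 },
  { scenarioId: "failure-verify-loop", label: "failure", maxEntropy: 0.56, dispatch: 2 },
  { scenarioId: "failure-contradictory-data", label: "failure", maxEntropy: 0.504, dispatch: 0 },
];

// Arithmetic mean of a non-empty list.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Per-label means of maxEntropy and dispatch, plus the entropy gap.
function summarize(input: Row[]) {
  const avg = (label: Row["label"], key: "maxEntropy" | "dispatch"): number =>
    mean(input.filter((r) => r.label === label).map((r) => r[key]));

  const entropySuccess = avg("success", "maxEntropy");
  const entropyFailure = avg("failure", "maxEntropy");
  return {
    entropySuccess,
    entropyFailure,
    gap: entropyFailure - entropySuccess,
    dispatchSuccess: avg("success", "dispatch"),
    dispatchFailure: avg("failure", "dispatch"),
  };
}

const s = summarize(rows);
console.log(
  `Avg entropy   success=${s.entropySuccess.toFixed(3)}  failure=${s.entropyFailure.toFixed(3)}  gap=${s.gap.toFixed(3)}`,
);
console.log(
  `Avg dispatch  success=${s.dispatchSuccess.toFixed(1)}  failure=${s.dispatchFailure.toFixed(1)}`,
);
```

Because the gap is computed from each run's maximum entropy rather than its final-iteration entropy, one high-entropy iteration is enough to raise a scenario's contribution to its label average.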

Next step: bun run .agents/skills/harness-improvement-loop/scripts/validate-entropy.ts /home/tylerbuell/Documents/AIProjects/reactive-agents-ts/.reactive-agents/traces/failure-corpus
