Learned Skills Trigger Flow

Brainstorming diagram for Right Agent stage 1: foreground learning ships first, while nudge signals, counters, provenance, and review contracts are laid down now so the background loop can be enabled later without redesign.

Hermes Model: Three Loops

Immediate self-improvement

Foreground agent works: complex task, recovered failure, user correction, or non-trivial workflow.
Prompt guidance says to save procedures as skills.
skill_manage during conversation: the agent creates or patches a skill while still in the active turn.
Useful, but depends on model compliance.
Future reuse: the skill appears in the skill index and loads by progressive disclosure.

Background and maintenance

Post-turn review fork: after enough tool iterations, Hermes can fork a review agent with memory and skills tools.
This is the real safety net for missed foreground learning.
Review creates or patches skills: it surveys the just-finished session and captures reusable procedures.
Later, a separate idle loop handles library hygiene.
Curator: idle maintenance roughly once a week (telemetry, stale/archive, patch/consolidate).

Why Nudges Exist

Foreground learning creates immediate value, but by itself it is only prompt compliance. Nudges make learning an observable system loop: after enough useful work, Right can ask a dedicated reviewer whether anything should have been persisted or repaired.

Reliability

Secondary goal

The foreground agent is focused on solving the user's task. Skill creation is easy to skip when the turn is long, urgent, or tool-heavy.

Trajectory

Episode view

Some reusable lessons are only obvious after failed attempts, user correction, and a final verified path are all visible together.

Repair

Skill drift

A loaded skill can be stale or incomplete. Foreground should patch it, but a nudge review catches missed repairs from the transcript.

Economics

Worth a review

Tool/correction counters do not prove there is a skill. They mean the episode was expensive enough that a bounded review may be worth it.

Right Agent Stage 1: Foreground Learning + Nudge Foundation

Foreground agent works: uses loaded skills, discovers a workflow, hits a surprise, or finds a skill incomplete.
->
Load right-learn-skill: a built-in authoring skill teaches when to learn, how to name packages, and how to avoid noisy one-off skills.
skill_learning_start before writing: before creating or updating package files, the agent calls mcp__right__skill_learning_start.
->
Write learned skill package: the active agent writes SKILL.md, scripts, references, and assets directly under .claude/skills/<skill_name>/.
Sandbox-local publish: new skills use the rl-* prefix. Updates may patch any non-core custom, manual, hub, or rl-* skill.
->
skill_learning_finish receipt: a successful finish sends localized text, either Learned skill: <name> or Updated skill: <name>.
Future use receipt: when a learned skill materially guides work, the final response says Used learned skill: <name>.
->
Deferred-learning signal: emit only when a create/update trigger exists, no successful finish call happened, and the agent can cite event refs plus a reason for deferring.
Progress is not approval: the user sees that value is being created, but there is no interactive approval gate.
->
Record nudge telemetry: store start/finish calls, provenance, event refs, skill use, skipped signals, and cheap counters for future background review.
Nudge worker can be off: stage 1 can persist signals and counters even if no background fork consumes them yet.
->
Write boundary: stage 1 may create rl-* skills and patch non-core skills; it never mutates core, platform, bundled, or codegen-owned skills.
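The write boundary can be expressed as a small host-side check. The sketch below is illustrative only: `SkillSource`, `Action`, and `check_write_boundary` are hypothetical names, the source categories are taken from this document, and rejecting a create whose name already exists is an assumption rather than a stated rule.

```rust
/// Hypothetical classification of where an installed skill came from.
/// Category names mirror the document; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkillSource {
    Core,
    Platform,
    Bundled,
    Codegen,
    Custom,
    Manual,
    Hub,
    Learned, // rl-* packages
}

impl SkillSource {
    /// Core-owned sources are never mutated in stage 1.
    fn is_core_owned(self) -> bool {
        matches!(
            self,
            SkillSource::Core | SkillSource::Platform | SkillSource::Bundled | SkillSource::Codegen
        )
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Create,
    Update,
}

/// Returns Ok(()) when a stage 1 write stays inside the boundary.
/// `existing` is the source of an installed skill with this name, if any.
fn check_write_boundary(
    action: Action,
    skill_name: &str,
    existing: Option<SkillSource>,
) -> Result<(), String> {
    match action {
        Action::Create => {
            // New skills must use the rl-* prefix.
            if !skill_name.starts_with("rl-") {
                return Err(format!("new skills must use the rl- prefix: {skill_name}"));
            }
            // Assumption: create never overwrites an installed skill.
            if existing.is_some() {
                return Err(format!("skill already exists: {skill_name}"));
            }
            Ok(())
        }
        Action::Update => match existing {
            None => Err(format!("no installed skill to update: {skill_name}")),
            Some(src) if src.is_core_owned() => {
                Err(format!("core-owned skill is read-only: {skill_name}"))
            }
            // custom, manual, hub, and rl-* skills may be patched.
            Some(_) => Ok(()),
        },
    }
}
```

For example, creating `rl-notion-database-filters` passes, while updating a bundled skill is rejected regardless of its name.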

Nudge Loop Foundation

Nudges are the reliability layer. Stage 1 does not need to run the background fork, but it should already record the signals and counters the fork will consume later. This turns learning from prompt compliance into an observable system loop.

Explicit

Foreground hints

  • learning_signal for new-skill candidates
  • skill_issue_signal for update candidates
  • only valid when no successful learning finish exists
Periodic

Nudge fallback

  • complex session with many tool calls
  • loaded learned skill followed by correction
  • recovered failure without foreground learning
Never

Excluded work

  • reviewing its own review output
  • editing core/platform/bundled/codegen-owned skills
  • sending user-facing progress chatter from background review
Decision

Fork decides

  • create rl-* skill
  • patch non-core skill
  • do nothing silently
Foundation rule: implement storage and schema now; running the fork is optional. The nudge loop must be able to catch both missed creation and missed skill repair once enabled.

Nudge Cost Controls

Cost controls are part of the foundation even if the worker is disabled. They define when a future background review is worth paying for and prevent a noisy learning loop.

Counters

Cheap gates

  • tool_iters_since_review
  • turns_since_review
  • skill_issue_hints_since_review
  • last_review_at
Gate

Run only when

  • foreground learning was skipped
  • skill issue hint exists
  • nudge threshold is crossed
  • and no review is already running
Budget caps

Hard limits

  • minimum interval between reviews
  • per-review model budget
  • per-day review count
  • per-day learning budget
Coalescing

One review window

  • merge multiple busy turns into one review
  • review events since last successful review
  • bounded by max events/chars
  • selected by salience, not tail length
Reviewer limits: exact thresholds can be tuned later, but the contract is fixed: no user-visible message unless the fork creates or updates a skill.
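One possible shape for the counters and the gate is sketched below. All type and function names are hypothetical, timestamps are assumed to be unix seconds, and the gate encodes one reading of the list above: foreground learning must have been skipped, a skill-issue hint or a crossed threshold must be present, and no review may be running. The per-review model budget and per-day learning budget are not modeled.

```rust
/// Cheap counters named after the document's list; they can be stored
/// even while the background worker is disabled.
#[derive(Debug, Default)]
struct NudgeCounters {
    tool_iters_since_review: u32,
    turns_since_review: u32,
    skill_issue_hints_since_review: u32,
    last_review_at: Option<u64>, // unix seconds (assumption)
}

impl NudgeCounters {
    fn record_tool_iteration(&mut self) {
        self.tool_iters_since_review += 1;
    }
    fn record_turn(&mut self) {
        self.turns_since_review += 1;
    }
    fn record_skill_issue_hint(&mut self) {
        self.skill_issue_hints_since_review += 1;
    }
    /// Start a new review window: zero the counters and remember when.
    fn reset_after_review(&mut self, now: u64) {
        *self = NudgeCounters { last_review_at: Some(now), ..Default::default() };
    }
}

/// Hypothetical tuning knobs; the document leaves exact values open.
struct NudgeLimits {
    tool_iter_threshold: u32,
    turn_threshold: u32,
    min_interval_secs: u64,
    max_reviews_per_day: u32,
}

/// Facts the host already knows when deciding.
#[derive(Clone, Copy)]
struct GateInput {
    foreground_learning_skipped: bool, // no successful learning finish in the window
    review_running: bool,
    reviews_today: u32,
    now: u64,
}

fn should_run_review(c: &NudgeCounters, l: &NudgeLimits, g: &GateInput) -> bool {
    // Hard limits: one review at a time, per-day count, minimum spacing.
    if g.review_running || g.reviews_today >= l.max_reviews_per_day {
        return false;
    }
    if let Some(last) = c.last_review_at {
        if g.now.saturating_sub(last) < l.min_interval_secs {
            return false;
        }
    }
    // Assumed reading of the gate list.
    let threshold_crossed = c.tool_iters_since_review >= l.tool_iter_threshold
        || c.turns_since_review >= l.turn_threshold;
    g.foreground_learning_skipped && (c.skill_issue_hints_since_review > 0 || threshold_crossed)
}
```

Checking the hard limits first keeps the decision cheap: most turns exit before any threshold is inspected.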

When Foreground Should Learn

Failure types are guidance for right-learn-skill, not a Rust classification system. The foreground agent decides whether the experience is reusable while the context is fresh.

Create

New workflow

  • complex task required several non-obvious steps
  • user explicitly asked to remember the workflow
  • tool/API usage pattern is likely to recur
Create

Recovered surprise

  • command/tool failed, then reusable fix was found
  • tool succeeded but returned an unexpected shape
  • user correction exposed a durable gotcha
Update

Loaded skill drift

  • loaded non-core skill missed a required step
  • skill had stale command/API behavior
  • skill caused confusion or rework
Skip

No skill

  • one-off task details or temporary progress
  • generic facts better stored as memory
  • unverified workaround or failed attempt
  • core/platform/bundled/codegen-owned skill changes
Write protocol: before any create/update file write, call mcp__right__skill_learning_start. After success or failure, call mcp__right__skill_learning_finish. If unsure, leave an optional nudge signal only when the signal contract below is satisfied.

Learning MCP Protocol + Structured Output

Foreground success is recorded through dedicated MCP calls, not through a final skill_receipt field. Structured output keeps only future-facing nudge signals and used-skill receipts.

Start call

{
  "action": "create",
  "skill_name": "rl-notion-database-filters",
  "reason": "recovered_surprise",
  "event_refs": ["e17", "e19", "e21"],
  "message": "Learning a reusable skill for Notion database filters."
}

Finish call

{
  "action": "create",
  "skill_name": "rl-notion-database-filters",
  "status": "created",
  "message": "<localized learned-skill receipt>",
  "summary": "Captured reusable Notion filter schema rule."
}

Rust handling

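// start: record first, then show progress only for foreground calls;
// background review never sends progress chatter.
record_learning_start(&call);
if call.is_foreground() {
    send_progress_to_user(&call.message);
}

// finish: record first; only a successful finish produces a receipt,
// and it supersedes any nudge signal from the same invocation.
record_learning_finish(&call);
if call.status.is_success() {
    send_receipt_to_user(&call.message);
    ignore_background_signals_for_invocation();
}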
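A self-contained model of the handling above, assuming a per-invocation state object. `Invocation`, `StartCall`, `FinishCall`, and `Origin` are hypothetical names; the log and outbox stand in for real metadata storage and user messaging.

```rust
/// Who issued the learning call; only foreground calls may show progress.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Origin {
    Foreground,
    Background,
}

/// Finish statuses used in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinishStatus {
    Created,
    Updated,
    Failed,
    Skipped,
}

impl FinishStatus {
    fn is_success(self) -> bool {
        matches!(self, FinishStatus::Created | FinishStatus::Updated)
    }
}

struct StartCall {
    origin: Origin,
    skill_name: String,
    message: String,
}

struct FinishCall {
    skill_name: String,
    status: FinishStatus,
    message: String,
}

/// Hypothetical per-invocation host state.
#[derive(Default)]
struct Invocation {
    log: Vec<String>,     // stands in for the metadata store
    outbox: Vec<String>,  // messages that would reach the user
    ignore_signals: bool, // set once a finish succeeds
}

impl Invocation {
    fn on_start(&mut self, call: &StartCall) {
        self.log.push(format!("start {}", call.skill_name));
        if call.origin == Origin::Foreground {
            self.outbox.push(call.message.clone());
        }
    }

    fn on_finish(&mut self, call: &FinishCall) {
        self.log.push(format!("finish {} {:?}", call.skill_name, call.status));
        if call.status.is_success() {
            self.outbox.push(call.message.clone());
            self.ignore_signals = true;
        }
    }
}
```

Keeping the receipt decision on the finish call means a failed or skipped finish, foreground or background, never produces a success receipt.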

Structured output remains

{
  "used_skill_receipts": [
    {
      "skill_name": "rl-notion-database-filters",
      "message": "<localized used-skill receipt>"
    }
  ],
  "learning_signal": null,
  "skill_issue_signal": null
}
Default

No skill

No successful finish call, no used receipt, and no nudge signal. Most turns should not learn anything.

Foreground

Foreground skill

The model calls start, writes or patches the package, then calls finish. MCP sends progress and receipt.

Nudge

Create signal

learning_signal queues a background fork only for a concrete new-skill candidate that foreground did not publish.

Nudge

Update signal

skill_issue_signal queues a background fork only for a concrete defect in an existing non-core skill.

Exact Signal Conditions

New-skill candidate

{
  "learning_signal": {
    "kind": "create_candidate",
    "package_name_hint": "rl-notion-database-filters",
    "trigger": "recovered_surprise",
    "reason_not_written": "needs_full_context_review",
    "event_refs": ["e17", "e19", "e21"],
    "summary": "Reusable Notion filter schema gotcha."
  }
}

Existing-skill update

{
  "skill_issue_signal": {
    "kind": "update_candidate",
    "skill_name": "notion-database-filters",
    "issue": "wrong_api_assumption",
    "reason_not_patched": "needs_existing_skill_diff",
    "observed_effect": "retry_after_user_correction",
    "event_refs": ["e12", "e14", "e18"],
    "patch_hint": "Use rich_text filter for this property."
  }
}
All required

Signal allowed

  • one create/update trigger from When Foreground Should Learn
  • no successful learning finish call was recorded
  • exactly one of learning_signal or skill_issue_signal
  • allowed defer reason is present
  • at least two useful event_refs, or one explicit user request
  • candidate is reusable across future sessions
Allowed

Defer reason

  • conversation_still_evolving
  • needs_full_context_review
  • write_or_publish_failed
  • needs_existing_skill_diff
Never signal

Reject

  • one-off task details
  • generic memory facts
  • unverified workaround
  • skill already created or updated successfully
Routing

Background action

  • learning_signal can create only
  • skill_issue_signal can patch only
  • if both appear, drop both and log schema violation
  • either signal is ignored after a successful learning finish
  • fork may still return learned=false
Create codes

Allowed triggers

  • explicit_user_request
  • multi_step_workflow
  • recovered_surprise
  • user_correction
  • repeated_tool_pattern
Update codes

Allowed issues

  • missing_step
  • stale_command
  • wrong_api_assumption
  • overbroad_activation
  • broken_script
  • unsafe_instruction
Evidence

Observed effects

  • retry_after_tool_error
  • retry_after_user_correction
  • manual_override
  • verified_alternative
Schema

Required naming

  • create uses package_name_hint with rl-*
  • update uses existing non-core skill_name
  • create uses reason_not_written
  • update uses reason_not_patched
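A sketch of how the host might apply these rules when an invocation ends. The code lists come from this section; the struct and function names are hypothetical, the check order is a choice, and "at least two useful event_refs, or one explicit user request" is read as two refs, or one ref when the trigger is explicit_user_request. Whether an update target is an existing non-core skill needs the installed-skill registry and is left to the write boundary check.

```rust
// Code lists copied from this section.
const TRIGGERS: &[&str] = &[
    "explicit_user_request", "multi_step_workflow", "recovered_surprise",
    "user_correction", "repeated_tool_pattern",
];
const ISSUES: &[&str] = &[
    "missing_step", "stale_command", "wrong_api_assumption",
    "overbroad_activation", "broken_script", "unsafe_instruction",
];
const EFFECTS: &[&str] = &[
    "retry_after_tool_error", "retry_after_user_correction",
    "manual_override", "verified_alternative",
];
const DEFER_REASONS: &[&str] = &[
    "conversation_still_evolving", "needs_full_context_review",
    "write_or_publish_failed", "needs_existing_skill_diff",
];

struct LearningSignal {
    package_name_hint: String,
    trigger: String,
    reason_not_written: String,
    event_refs: Vec<String>,
}

struct SkillIssueSignal {
    skill_name: String,
    issue: String,
    reason_not_patched: String,
    observed_effect: String,
    event_refs: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Route {
    QueueCreate,        // background fork may create an rl-* package
    QueuePatch(String), // background fork may patch this skill
    Drop(&'static str), // reason is logged; nothing is queued
}

fn allowed(list: &[&str], value: &str) -> bool {
    list.contains(&value)
}

fn route_signals(
    learning: Option<&LearningSignal>,
    issue: Option<&SkillIssueSignal>,
    successful_finish: bool,
) -> Route {
    match (learning, issue) {
        (Some(_), Some(_)) => Route::Drop("schema violation: both signals present"),
        _ if successful_finish => Route::Drop("successful learning finish already recorded"),
        (None, None) => Route::Drop("no signal"),
        (Some(s), None) => {
            if !allowed(TRIGGERS, &s.trigger) {
                return Route::Drop("unknown trigger");
            }
            if !allowed(DEFER_REASONS, &s.reason_not_written) {
                return Route::Drop("defer reason not allowed");
            }
            if !s.package_name_hint.starts_with("rl-") {
                return Route::Drop("create hint must use rl- prefix");
            }
            let explicit = s.trigger == "explicit_user_request";
            if s.event_refs.len() < 2 && !(explicit && !s.event_refs.is_empty()) {
                return Route::Drop("not enough evidence");
            }
            Route::QueueCreate
        }
        (None, Some(s)) => {
            if !allowed(ISSUES, &s.issue) || !allowed(EFFECTS, &s.observed_effect) {
                return Route::Drop("unknown issue or observed effect");
            }
            if !allowed(DEFER_REASONS, &s.reason_not_patched) {
                return Route::Drop("defer reason not allowed");
            }
            if s.event_refs.len() < 2 {
                return Route::Drop("not enough evidence");
            }
            Route::QueuePatch(s.skill_name.clone())
        }
    }
}
```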

Background Review Contract

Inputs

  • full completed conversation transcript when within budget
  • fallback context.json bundle when transcript is too large
  • invocation/run id and started/finished timestamps
  • timeline index with stable event ids
  • salient events selected by reason, not line count
  • learning_signal or skill_issue_signal and event refs when present
  • existing skill names and source/core classification when available
  • right-learn-skill authoring guidance

Allowed actions

  • decide whether a reusable workflow exists
  • create rl-* packages from learning_signal
  • patch non-core packages from skill_issue_signal
  • write under .claude/skills/<skill_name>/ inside the agent sandbox
  • include SKILL.md, scripts/, references/, assets/
  • call learning start/finish for metadata and receipts

Forbidden actions

  • ask the user questions
  • send user-facing progress messages
  • edit project files outside the learned skill package
  • edit core/platform/bundled/codegen-owned skills
  • trigger another learning review
  • learn from its own review output

Finish result

created or updated: a successful finish stores the skill name and sends the receipt Learned skill: <name> or Updated skill: <name>.
or
failed or skipped: store a short reason; do not send a success receipt.
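The rule that the reviewer writes only under .claude/skills/<skill_name>/ can be enforced on the host with a lexical path check like the one below. `is_inside_skill_package` is a hypothetical helper, not an existing Right API. It accepts only plain path segments (so `.`, `..`, and absolute paths are rejected) and it does not resolve symlinks, so a real sandbox would still need to guard against symlinked directories.

```rust
use std::path::{Component, Path};

/// Returns true only if `rel`, a path relative to the sandbox root, names a
/// file strictly inside `.claude/skills/<skill_name>/`.
fn is_inside_skill_package(rel: &Path, skill_name: &str) -> bool {
    // The skill name must be exactly one ordinary path segment.
    let mut parts = Path::new(skill_name).components();
    if !matches!((parts.next(), parts.next()), (Some(Component::Normal(_)), None)) {
        return false;
    }
    // Reject roots, prefixes, "." and "..": only plain segments are allowed.
    if !rel.components().all(|c| matches!(c, Component::Normal(_))) {
        return false;
    }
    let package_root = Path::new(".claude").join("skills").join(skill_name);
    // Must be below the package root, not the root directory itself.
    rel.starts_with(&package_root) && rel != package_root.as_path()
}
```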

Background Context Strategy

Learning should use the full completed conversation when it fits. That preserves multi-pass feedback loops where the skill only becomes obvious after several user corrections and model retries. If full context is too large, Rust builds a bounded review bundle from the completed invocation and starts an isolated background learner with an expansion index.

Primary fork scope

  • foreground: completed conversation transcript
  • background learner: separate task, not resumed main session
  • write target: .claude/skills/<skill_name>/ for create rl-* or patch non-core only
  • notification: sent only through successful learning finish

Fallback scope boundary

  • selected transcript windows from the completed conversation
  • event ids stable across transcript and bundle
  • timestamps used only as metadata, not as the main filter
  • expansion requests allowed for referenced events

Salience selection

  • always include user input and final answer excerpts
  • include all errors, retries, verification, and file writes
  • include all events referenced by signal event_refs
  • summarize low-signal spans by counts and short descriptions
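A minimal sketch of salience selection, assuming each event already carries a stable id and a coarse kind (`Kind`, `Event`, `Item`, and `select_events` are hypothetical names). It keeps salient events in their original order and collapses each run of low-signal events into one counted span; the max events/chars cap and the short span descriptions are not modeled.

```rust
/// Coarse event kinds; ToolUse and Read stand in for low-signal steps.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    UserInput,
    FinalAnswer,
    Error,
    Retry,
    Verification,
    FileWrite,
    ToolUse,
    Read,
}

struct Event {
    id: String,
    kind: Kind,
}

#[derive(Debug, PartialEq, Eq)]
enum Item {
    Included { id: String, reason: &'static str },
    Omitted { first: String, last: String, count: usize },
}

/// Why an event must be kept, or None if it can be summarized.
fn salience(e: &Event, signal_refs: &[&str]) -> Option<&'static str> {
    if signal_refs.contains(&e.id.as_str()) {
        return Some("signal_ref");
    }
    match e.kind {
        Kind::UserInput => Some("user_input"),
        Kind::FinalAnswer => Some("final_answer"),
        Kind::Error => Some("error"),
        Kind::Retry => Some("retry"),
        Kind::Verification => Some("verification"),
        Kind::FileWrite => Some("file_write"),
        Kind::ToolUse | Kind::Read => None,
    }
}

fn omitted(events: &[Event], s: usize, t: usize) -> Item {
    Item::Omitted { first: events[s].id.clone(), last: events[t].id.clone(), count: t - s + 1 }
}

/// Keeps salient events in order; each run of low-signal events
/// becomes one omitted span with a count.
fn select_events(events: &[Event], signal_refs: &[&str]) -> Vec<Item> {
    let mut out = Vec::new();
    let mut run: Option<(usize, usize)> = None; // first and last index of the current run
    for (i, e) in events.iter().enumerate() {
        match salience(e, signal_refs) {
            Some(reason) => {
                if let Some((s, t)) = run.take() {
                    out.push(omitted(events, s, t));
                }
                out.push(Item::Included { id: e.id.clone(), reason });
            }
            None => run = Some(run.map_or((i, i), |(s, _)| (s, i))),
        }
    }
    if let Some((s, t)) = run {
        out.push(omitted(events, s, t));
    }
    out
}
```

Because selection is by kind and signal reference rather than position, an early surprise survives even when the session tail is long.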

Review bundle shape

{
  "scope": {
    "kind": "foreground",
    "session_id": "...",
    "invocation_id": "...",
    "started_at": "...",
    "finished_at": "..."
  },
  "skill_issue_signal": {
    "kind": "update_candidate",
    "skill_name": "notion-database-filters",
    "issue": "wrong_api_assumption",
    "reason_not_patched": "needs_existing_skill_diff",
    "observed_effect": "retry_after_user_correction",
    "event_refs": ["e17", "e19", "e21"],
    "patch_hint": "Use rich_text filter for this property."
  },
  "timeline_index": [
    { "id": "e17", "kind": "tool_use", "tool": "mcp__notion__query_database" }
  ],
  "included_events": [
    { "id": "e17", "reason": "signal_ref", "detail": "..." },
    { "id": "e19", "reason": "retry", "detail": "..." },
    { "id": "e21", "reason": "success_after_retry", "detail": "..." }
  ],
  "omitted_spans": [
    { "range": ["e22", "e80"], "summary": "read-only repository inspection" }
  ]
}

Why not always raw JSONL?

  • full sessions may hit context limits or trigger compaction
  • tail-only truncation can drop the important early surprise
  • date-only filtering still includes long low-value stretches
  • event ids let the foreground model point to exactly what mattered
  • fresh review session has predictable context and cost

Where Right Differs From Hermes

Boundary

Sandbox-local skill content

Hermes writes agent-created skills into the user's Hermes skills tree. Right keeps learned package files inside the agent sandbox in stage 1; host stores metadata only.

Trigger

Foreground plus nudges

Hermes supports foreground and background learning. Right ships foreground learning first, but records nudge-ready signals and counters from day one.

UX

Receipt, not approval

No approval queue. The value is visible through learned/used receipts, usually in the same response that created or updated the skill.

API

MCP start/finish only

Learning MCP tools validate skill names, send progress/receipts, and record metadata. The agent writes files directly; MCP never imports arbitrary host paths.

Scope

No curator in stage 1

Curator-style stale/archive/consolidation is deferred until learned skills exist and usage telemetry is meaningful.

Format

Full Agent Skills package

scripts/, references/, and assets/ are supported from the start; this is not a markdown-only memory feature.

Development start: implement right-learn-skill, sandbox-local learned skill publishing, rl-* create prefix, mcp__right__skill_learning_start, mcp__right__skill_learning_finish, used-skill receipts, signal schema, provenance, and nudge counters. The background worker can remain disabled until the foreground loop has real data to review.