Brainstorming diagram for Right Agent stage 1: foreground learning ships first, while nudge signals, counters, provenance, and review contracts are laid down now so the background loop can be enabled later without redesign.
Foreground learning creates immediate value, but by itself it is only prompt compliance. Nudges make learning an observable system loop: after enough useful work, Right can ask a dedicated reviewer whether anything should have been persisted or repaired.
The foreground agent is focused on solving the user's task. Skill creation is easy to skip when the turn is long, urgent, or tool-heavy.
Some reusable lessons are only obvious after failed attempts, user correction, and a final verified path are all visible together.
A loaded skill can be stale or incomplete. Foreground should patch it, but a nudge review catches missed repairs from the transcript.
Tool/correction counters do not prove there is a skill. They mean the episode was expensive enough that a bounded review may be worth it.
mcp__right__skill_learning_start.
SKILL.md, scripts, refs, and assets directly under .claude/skills/<skill_name>/.
rl-*. Updates may patch any non-core custom/manual/hub/rl-* skill.
Learned skill: <name> or Updated skill: <name>.
Used learned skill: <name>.
rl-* skills and patch non-core skills; no core/platform/bundled/codegen-owned mutation.
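The create/patch boundary above can be sketched as a small policy check. This is a minimal illustration, not Right's actual implementation: the `SkillOrigin` and `LearningAction` enums and the `mutation_allowed` function are hypothetical names, and `Core` stands in for the core/platform/bundled/codegen-owned categories.

```rust
/// Where a skill comes from. Variant names are assumptions for this sketch;
/// `Core` stands in for core/platform/bundled/codegen-owned skills.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkillOrigin {
    Core,
    Custom,
    Manual,
    Hub,
    Learned, // rl-* packages
}

/// The two mutations the learning loop may attempt (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LearningAction {
    Create,
    Patch,
}

/// Returns true when the learning loop may perform `action` on `skill_name`.
pub fn mutation_allowed(action: LearningAction, skill_name: &str, origin: SkillOrigin) -> bool {
    match action {
        // New skills must carry the rl- prefix followed by a non-empty name.
        LearningAction::Create => skill_name
            .strip_prefix("rl-")
            .map_or(false, |rest| !rest.is_empty()),
        // Patches may touch any skill that is not core-owned.
        LearningAction::Patch => origin != SkillOrigin::Core,
    }
}
```

Keeping the rule in one function means the foreground path and a later background worker would share the same boundary.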
Nudges are the reliability layer. Stage 1 does not need to run the background fork, but it should already record the signals and counters the fork will consume later. This turns learning from prompt compliance into an observable system loop.
learning_signal for new-skill candidates
skill_issue_signal for update candidates
rl-* skill
Cost controls are part of the foundation even if the worker is disabled. They define when a future background review is worth paying for and prevent a noisy learning loop.
tool_iters_since_review
turns_since_review
skill_issue_hints_since_review
last_review_at
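One way these counters could gate a future review is sketched below. The field names follow the diagram; the integer types, the Unix-seconds timestamp, the `NudgePolicy` struct, and every threshold are assumptions, since the diagram does not give values.

```rust
/// Counters accumulated since the last review. Field names follow the diagram;
/// the types are assumptions for this sketch.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct NudgeCounters {
    pub tool_iters_since_review: u32,
    pub turns_since_review: u32,
    pub skill_issue_hints_since_review: u32,
    /// Unix seconds of the last review; `None` if no review has run yet.
    pub last_review_at: Option<u64>,
}

/// Hypothetical thresholds; the diagram does not specify any values.
pub struct NudgePolicy {
    pub min_tool_iters: u32,
    pub min_turns: u32,
    pub min_issue_hints: u32,
    pub cooldown_secs: u64,
}

impl NudgeCounters {
    /// A review is due when the cooldown has elapsed and at least one counter
    /// says the episode was expensive enough to justify a bounded review.
    pub fn review_is_due(&self, policy: &NudgePolicy, now: u64) -> bool {
        let cooled_down = match self.last_review_at {
            Some(at) => now.saturating_sub(at) >= policy.cooldown_secs,
            None => true,
        };
        let expensive_enough = self.tool_iters_since_review >= policy.min_tool_iters
            || self.turns_since_review >= policy.min_turns
            || self.skill_issue_hints_since_review >= policy.min_issue_hints;
        cooled_down && expensive_enough
    }

    /// Resets the counters and stamps the review time.
    pub fn mark_reviewed(&mut self, now: u64) {
        *self = NudgeCounters {
            last_review_at: Some(now),
            ..NudgeCounters::default()
        };
    }
}
```

Because the counters only say that a review may be worth paying for, the check returns a boolean and leaves the decision about whether a skill exists to the reviewer.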
Failure types are guidance for right-learn-skill, not a Rust classification system. The foreground
agent decides whether the experience is reusable while the context is fresh.
mcp__right__skill_learning_start. After success or failure, call
mcp__right__skill_learning_finish. If unsure, leave an optional nudge signal only when the signal
contract below is satisfied.
Foreground success is recorded through dedicated MCP calls, not through a final skill_receipt field.
Structured output keeps only future-facing nudge signals and used-skill receipts.
{
  "action": "create",
  "skill_name": "rl-notion-database-filters",
  "reason": "recovered_surprise",
  "event_refs": ["e17", "e19", "e21"],
  "message": "Learning a reusable skill for Notion database filters."
}
{
  "action": "create",
  "skill_name": "rl-notion-database-filters",
  "status": "created",
  "message": "<localized learned-skill receipt>",
  "summary": "Captured reusable Notion filter schema rule."
}
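// On mcp__right__skill_learning_start: record the call and show progress.
record_learning_start(call);
if call.is_foreground() {
    send_progress_to_user(call.message);
}

// On mcp__right__skill_learning_finish: record the result and, on success,
// send the receipt and drop background signals for this invocation.
record_learning_finish(call);
if call.status.is_success() {
    send_receipt_to_user(call.message);
    ignore_background_signals_for_invocation();
}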
{
  "used_skill_receipts": [
    {
      "skill_name": "rl-notion-database-filters",
      "message": "<localized used-skill receipt>"
    }
  ],
  "learning_signal": null,
  "skill_issue_signal": null
}
No successful finish call, no used receipt, and no nudge signal. Most turns should not learn anything.
The model calls start, writes or patches the package, then calls finish. MCP sends progress and receipt.
learning_signal queues a background fork only for a concrete new-skill candidate that foreground did not publish.
skill_issue_signal queues a background fork only for a concrete defect in an existing non-core skill.
{
  "learning_signal": {
    "kind": "create_candidate",
    "package_name_hint": "rl-notion-database-filters",
    "trigger": "recovered_surprise",
    "reason_not_written": "needs_full_context_review",
    "event_refs": ["e17", "e19", "e21"],
    "summary": "Reusable Notion filter schema gotcha."
  }
}
{
  "skill_issue_signal": {
    "kind": "update_candidate",
    "skill_name": "notion-database-filters",
    "issue": "wrong_api_assumption",
    "reason_not_patched": "needs_existing_skill_diff",
    "observed_effect": "retry_after_user_correction",
    "event_refs": ["e12", "e14", "e18"],
    "patch_hint": "Use rich_text filter for this property."
  }
}
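The four turn outcomes described above (nothing learned, foreground learned, new-skill candidate, update candidate) can be sketched as a small classifier. The `TurnSummary` and `TurnOutcome` types are hypothetical, and the precedence of `skill_issue_signal` over `learning_signal` when both are present is an assumption; the diagram only states that a successful foreground finish causes background signals for that invocation to be ignored.

```rust
/// What the host does after a completed turn (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum TurnOutcome {
    NothingLearned,
    ForegroundLearned,
    QueueCreateReview,
    QueueUpdateReview,
}

/// Minimal view of a finished turn (hypothetical struct).
pub struct TurnSummary {
    pub finish_succeeded: bool,
    pub has_learning_signal: bool,
    pub has_skill_issue_signal: bool,
}

pub fn classify(turn: &TurnSummary) -> TurnOutcome {
    if turn.finish_succeeded {
        // A successful foreground finish wins; nudge signals from the same
        // invocation are ignored.
        TurnOutcome::ForegroundLearned
    } else if turn.has_skill_issue_signal {
        // Assumption: repairing an existing skill is checked before creating a new one.
        TurnOutcome::QueueUpdateReview
    } else if turn.has_learning_signal {
        TurnOutcome::QueueCreateReview
    } else {
        // The common case: most turns should not learn anything.
        TurnOutcome::NothingLearned
    }
}
```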
When Foreground Should Learn
learning_signal or skill_issue_signal
event_refs, or one explicit user request
conversation_still_evolving
needs_full_context_review
write_or_publish_failed
needs_existing_skill_diff
learning_signal can create only
skill_issue_signal can patch only
learned=false
explicit_user_request
multi_step_workflow
recovered_surprise
user_correction
repeated_tool_pattern
missing_step
stale_command
wrong_api_assumption
overbroad_activation
broken_script
unsafe_instruction
retry_after_tool_error
retry_after_user_correction
manual_override
verified_alternative
package_name_hint with rl-*
skill_name
reason_not_written
reason_not_patched
context.json bundle when transcript is too large
learning_signal or skill_issue_signal and event refs when present
right-learn-skill authoring guidance
rl-* packages from learning_signal
skill_issue_signal
.claude/skills/<skill_name>/ inside the agent sandbox
SKILL.md, scripts/, references/, assets/
Learned skill: <name>.
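A host-side check of a learning_signal against the contract fragments above might look like the following sketch. The struct mirrors the JSON example earlier in the document; the `SignalError` type and the function name are hypothetical. The evidence rule is an assumption: the diagram says "event_refs, or one explicit user request" without a visible count, so the sketch treats at least one event ref as sufficient.

```rust
/// Mirrors the learning_signal JSON example; string-typed fields are a simplification.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LearningSignal {
    pub package_name_hint: String,
    pub trigger: String,
    pub reason_not_written: String,
    pub event_refs: Vec<String>,
    pub summary: String,
}

/// Hypothetical rejection reasons for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SignalError {
    BadPackageName,
    MissingEvidence,
}

pub fn validate_learning_signal(signal: &LearningSignal) -> Result<(), SignalError> {
    // learning_signal can create only, so the hint must name an rl-* package.
    if !signal.package_name_hint.starts_with("rl-") {
        return Err(SignalError::BadPackageName);
    }
    // Assumption: at least one event ref, unless the user asked explicitly.
    let explicit = signal.trigger == "explicit_user_request";
    if signal.event_refs.is_empty() && !explicit {
        return Err(SignalError::MissingEvidence);
    }
    Ok(())
}
```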
Learning should use the full completed conversation when it fits. That preserves multi-pass feedback loops where the skill only becomes obvious after several user corrections and model retries. If full context is too large, Rust builds a bounded review bundle from the completed invocation and starts an isolated background learner with an expansion index.
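The choice between full context and a bounded bundle reduces to a size check. The sketch below assumes the host can estimate transcript size in tokens and has a fixed budget for the learner; both the `ReviewContext` enum and the token-based measure are assumptions, not something the diagram specifies.

```rust
/// Which context the learner receives (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum ReviewContext {
    /// The full completed conversation fits and is passed as-is.
    FullConversation,
    /// A bounded review bundle plus an expansion index is built instead.
    BoundedBundle,
}

/// Prefers the full conversation whenever it fits within the budget.
pub fn choose_review_context(transcript_tokens: usize, budget_tokens: usize) -> ReviewContext {
    if transcript_tokens <= budget_tokens {
        ReviewContext::FullConversation
    } else {
        ReviewContext::BoundedBundle
    }
}
```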
.claude/skills/<skill_name>/ for create rl-* or patch non-core only
event_refs
{
  "scope": {
    "kind": "foreground",
    "session_id": "...",
    "invocation_id": "...",
    "started_at": "...",
    "finished_at": "..."
  },
  "skill_issue_signal": {
    "kind": "update_candidate",
    "skill_name": "notion-database-filters",
    "issue": "wrong_api_assumption",
    "reason_not_patched": "needs_existing_skill_diff",
    "observed_effect": "retry_after_user_correction",
    "event_refs": ["e17", "e19", "e21"],
    "patch_hint": "Use rich_text filter for this property."
  },
  "timeline_index": [
    { "id": "e17", "kind": "tool_use", "tool": "mcp__notion__query_database" }
  ],
  "included_events": [
    { "id": "e17", "reason": "signal_ref", "detail": "..." },
    { "id": "e19", "reason": "retry", "detail": "..." },
    { "id": "e21", "reason": "success_after_retry", "detail": "..." }
  ],
  "omitted_spans": [
    { "range": ["e22", "e80"], "summary": "read-only repository inspection" }
  ]
}
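How included_events and omitted_spans could be derived from a timeline is sketched below. It is deliberately simplified: it includes only events named in the signal's event_refs, whereas the example bundle also tags events with reasons such as retry and success_after_retry, and it does not generate span summaries. The `Bundle` and `OmittedSpan` types and the `build_bundle` function are hypothetical.

```rust
/// An inclusive run of event ids left out of the bundle (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OmittedSpan {
    pub first: String,
    pub last: String,
}

/// Simplified bundle: event ids only, no details or summaries.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Bundle {
    pub included: Vec<String>,
    pub omitted_spans: Vec<OmittedSpan>,
}

/// Keeps events named in `event_refs` and collapses each run of
/// unreferenced events into one omitted span.
pub fn build_bundle(timeline: &[&str], event_refs: &[&str]) -> Bundle {
    let mut bundle = Bundle::default();
    // (start, end) indices of the omitted run currently being extended.
    let mut run: Option<(usize, usize)> = None;
    for (i, id) in timeline.iter().enumerate() {
        if event_refs.iter().any(|r| r == id) {
            // A referenced event closes any open omitted run.
            if let Some((s, e)) = run.take() {
                bundle.omitted_spans.push(OmittedSpan {
                    first: timeline[s].to_string(),
                    last: timeline[e].to_string(),
                });
            }
            bundle.included.push(id.to_string());
        } else {
            run = Some(match run {
                Some((s, _)) => (s, i),
                None => (i, i),
            });
        }
    }
    // Flush a trailing omitted run, if any.
    if let Some((s, e)) = run {
        bundle.omitted_spans.push(OmittedSpan {
            first: timeline[s].to_string(),
            last: timeline[e].to_string(),
        });
    }
    bundle
}
```

Recording omitted spans by their first and last event ids is what would let a learner ask for an expansion of a specific range later.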
Hermes writes agent-created skills into the user's Hermes skills tree. Right keeps learned package files inside the agent sandbox in stage 1; host stores metadata only.
Hermes supports foreground and background learning. Right ships foreground learning first, but records nudge-ready signals and counters from day one.
No approval queue. The value is visible through learned/used receipts, usually in the same response that created or updated the skill.
Learning MCP tools validate skill names, send progress/receipts, and record metadata. The agent writes files directly; MCP never imports arbitrary host paths.
Curator-style stale/archive/consolidation is deferred until learned skills exist and usage telemetry is meaningful.
scripts/, references/, and assets/ are supported from the start; this is not a markdown-only memory feature.
right-learn-skill, sandbox-local learned skill publishing,
rl-* create prefix, mcp__right__skill_learning_start,
mcp__right__skill_learning_finish, used-skill receipts, signal schema, provenance, and nudge counters.
The background worker can remain disabled until the foreground loop has real data to review.