Grasp accuracy regressed against the last passing baseline.
One command runs a MuJoCo grasp trial, raises metric alarms, and then lines up the exact visual evidence across semantic phases, so a coding agent can decide what to edit next without a human replay ritual.
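A minimal sketch of the alarm step, assuming each trial and the baseline reduce to a flat dict of scalar metrics. The metric names (`grasp_accuracy`, `centerline_offset_mm`), the tolerances, and the `AlarmRule`/`raise_alarms` names are hypothetical, not the harness's actual schema, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlarmRule:
    """One hypothetical alarm rule: which metric, and how much regression is tolerated."""
    metric: str
    tolerance: float        # allowed regression, in the metric's own units
    higher_is_better: bool


# Illustrative rules only; the harness's real metrics and thresholds are not shown.
RULES = [
    AlarmRule("grasp_accuracy", tolerance=0.02, higher_is_better=True),
    AlarmRule("centerline_offset_mm", tolerance=3.0, higher_is_better=False),
]


def raise_alarms(trial: Dict[str, float], baseline: Dict[str, float],
                 rules: List[AlarmRule] = RULES) -> List[str]:
    """Return one message per metric that regressed past its tolerance."""
    alarms = []
    for rule in rules:
        delta = trial[rule.metric] - baseline[rule.metric]
        # Flip the sign so a positive `regression` always means "worse than baseline".
        regression = -delta if rule.higher_is_better else delta
        if regression > rule.tolerance:
            alarms.append(f"{rule.metric}: {delta:+.3g} vs baseline")
    return alarms


baseline = {"grasp_accuracy": 0.90, "centerline_offset_mm": 0.0}  # made-up numbers
trial = {"grasp_accuracy": 0.81, "centerline_offset_mm": 6.0}
print(raise_alarms(trial, baseline))
# ['grasp_accuracy: -0.09 vs baseline', 'centerline_offset_mm: +6 vs baseline']
```

Normalizing every rule to "positive means worse" keeps the comparison loop identical for metrics where higher or lower is better.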
Failure appears before contact. No need to inspect lift or placement first.
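Ordering alarms by phase is what lets the agent skip lift and placement. A sketch, assuming the phases run approach, contact, lift, placement (these phase names appear in this report; the strict ordering and the `first_failing_phase` helper are assumptions):

```python
from typing import Dict, List, Optional

# Assumed execution order of the semantic phases named in the report.
PHASE_ORDER = ["approach", "contact", "lift", "placement"]


def first_failing_phase(alarms_by_phase: Dict[str, List[str]]) -> Optional[str]:
    """Return the earliest phase, in execution order, that raised any alarm."""
    for phase in PHASE_ORDER:
        if alarms_by_phase.get(phase):
            return phase
    return None


# Hypothetical alarms: the approach-phase drift is what the agent should inspect first.
print(first_failing_phase({"lift": ["path jerk"], "approach": ["yaw drift"]}))  # approach
```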
Still inside the target autonomous retry budget for CC/Codex.
Side view suggests elbow-body interference mid-trajectory despite final success.
Policy loaded, object pose seeded, baseline selected.
Robot safe, bottle localized, path ready.
First visual drift
Contact happens but is asymmetric.
Object leaves table, but the path quality is already degraded.
Looks acceptable at a glance. This is the trap.
Delta vs baseline: gripper centerline shifted 6 mm right.
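A figure like the 6 mm shift can be read as the centerline difference projected onto the gripper's lateral axis. A sketch, assuming positions in metres (the usual MuJoCo convention) and a known unit "right" vector; `lateral_shift_mm` and the sample coordinates are illustrative:

```python
from typing import Sequence


def lateral_shift_mm(trial_xyz: Sequence[float], baseline_xyz: Sequence[float],
                     right_axis: Sequence[float]) -> float:
    """Signed shift of the trial gripper centerline along `right_axis`, in millimetres.

    Assumes positions in metres and `right_axis` a unit vector; positive means "right".
    """
    diff = [t - b for t, b in zip(trial_xyz, baseline_xyz)]
    # Dot product with the lateral axis, then metres -> millimetres.
    return 1000.0 * sum(d * a for d, a in zip(diff, right_axis))


# Illustrative positions: a 6 mm shift along +x, taken here to be the gripper's right.
print(round(lateral_shift_mm((0.506, 0.0, 0.2), (0.500, 0.0, 0.2), (1.0, 0.0, 0.0)), 3))  # 6.0
```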
Failure is obvious here: elbow path cuts through the body envelope.
This frame should trigger the collision suspicion banner.
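One way a collision suspicion banner could be triggered is to test each elbow sample against a simple body envelope. This sketch models the envelope as a vertical cylinder, which is an assumption; a real check might use MuJoCo's contact list or capsule geometry instead. `envelope_violations` and the sample path are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def envelope_violations(elbow_path: Sequence[Tuple[float, float, float]],
                        axis_xy: Tuple[float, float], radius: float,
                        z_min: float, z_max: float) -> List[int]:
    """Indices of elbow samples inside a vertical cylinder standing in for the body."""
    hits = []
    for i, (x, y, z) in enumerate(elbow_path):
        # Horizontal distance from the cylinder's vertical axis.
        radial = math.hypot(x - axis_xy[0], y - axis_xy[1])
        if radial < radius and z_min <= z <= z_max:
            hits.append(i)
    return hits


path = [(0.5, 0.0, 1.0), (0.1, 0.0, 1.0), (0.5, 0.0, 1.0)]  # made-up elbow samples
hits = envelope_violations(path, axis_xy=(0.0, 0.0), radius=0.2, z_min=0.5, z_max=1.5)
if hits:
    print(f"collision suspicion: elbow inside body envelope at samples {hits}")
```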
Yaw alignment started clean, then drifted late in the trajectory.
Useful for telling planning error from contact error.
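Telling planning error from contact error can come down to timing: if yaw error exceeds tolerance before the first contact, the plan was already off; if only afterwards, contact disturbed it. A sketch of that heuristic, with a hypothetical 5 degree tolerance and `classify_yaw_drift` helper (yaw in radians):

```python
import math
from typing import Sequence


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def classify_yaw_drift(yaw: Sequence[float], target_yaw: float, contact_step: int,
                       tol_rad: float = math.radians(5.0)) -> str:
    """'planning' if yaw error first exceeds tol_rad before contact, 'contact' if after, else 'none'."""
    for step, y in enumerate(yaw):
        if abs(wrap_angle(y - target_yaw)) > tol_rad:
            return "planning" if step < contact_step else "contact"
    return "none"


print(classify_yaw_drift([0.0, 0.01, 0.2, 0.3], target_yaw=0.0, contact_step=3))  # planning
```

Wrapping the error matters because a yaw near +pi and a target near -pi are close in orientation but far apart numerically.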
Finger symmetry breaks right before squeeze closes.
Helps the agent decide whether to tune approach or grip.
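A hypothetical decision rule for approach versus grip: if finger forces are lopsided and the centerline is also off, the object was met off-center, so tune the approach; if the centerline is fine, tune the grip. The force inputs, thresholds, and `suggest_tuning_target` name are assumptions:

```python
def symmetry_ratio(left: float, right: float) -> float:
    """Smaller over larger finger reading; 1.0 is perfectly symmetric."""
    lo, hi = sorted((abs(left), abs(right)))
    return 1.0 if hi == 0.0 else lo / hi


def suggest_tuning_target(left_force: float, right_force: float, centerline_shift_mm: float,
                          min_ratio: float = 0.8, max_shift_mm: float = 3.0) -> str:
    """Hypothetical rule: lopsided and off-center -> 'approach'; lopsided but centered -> 'grip'."""
    if symmetry_ratio(left_force, right_force) >= min_ratio:
        return "ok"
    return "approach" if abs(centerline_shift_mm) > max_shift_mm else "grip"


print(suggest_tuning_target(10.0, 4.0, centerline_shift_mm=6.0))  # approach
```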
Task “passes” in scalar terms, but path quality is still ugly.
Important because success-rate-only tooling would stop here.
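The difference from success-rate-only tooling fits in a three-way gate. A sketch, where `gate` and the `investigate` label are hypothetical:

```python
from typing import List


def gate(success: bool, path_alarms: List[str]) -> str:
    """Three-way verdict; success-rate-only tooling would return 'pass' whenever success is True."""
    if not success:
        return "fail"
    # A scalar pass with path-quality alarms still needs a look.
    return "investigate" if path_alarms else "pass"


print(gate(True, ["elbow inside body envelope"]))  # investigate
```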
Reference card reminds the agent what “good” looked like.
This is the bridge from one-off report to repeatable harness.
python examples/demos/mujoco/grasp.py --report --compare baseline/grasp_good
outputs/
trial_041/
alarms.json
phase_manifest.json
report.html
approach/front_rgb.png
approach/side_rgb.png
contact/wrist_rgb.png
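The layout above keys evidence by phase folder and view file name. A sketch that rebuilds that index, assuming every image sits exactly one folder deep under the trial directory (as in `approach/front_rgb.png`); `evidence_by_phase` and `scan_trial` are illustrative helpers, not part of grasp.py:

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, List


def evidence_by_phase(relative_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'phase/view.png' paths into {phase: [view, ...]}, ignoring top-level files."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rel in relative_paths:
        p = PurePosixPath(rel)
        if p.suffix == ".png" and len(p.parts) == 2:
            grouped[p.parts[0]].append(p.stem)
    return {phase: sorted(views) for phase, views in grouped.items()}


def scan_trial(trial_dir: Path) -> Dict[str, List[str]]:
    """Apply evidence_by_phase to the PNGs actually on disk under trial_dir."""
    return evidence_by_phase(p.relative_to(trial_dir).as_posix() for p in trial_dir.rglob("*.png"))


files = ["alarms.json", "phase_manifest.json", "report.html",
         "approach/front_rgb.png", "approach/side_rgb.png", "contact/wrist_rgb.png"]
print(evidence_by_phase(files))
# {'approach': ['front_rgb', 'side_rgb'], 'contact': ['wrist_rgb']}
```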