Combined preference record for training.
Readonly
The routing decision
The task outcome
Optional
Explicit preference (if provided)
Computed reward signal
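As a rough illustration, the four documented fields could be laid out as below. This is a sketch only: the interface name `PreferenceRecord`, the field names `decision`, `outcome`, `preference`, and `reward`, the helper types, and the reward rule in `makeRecord` are all hypothetical, since the source gives descriptions but no identifiers or types. Attaching `readonly` to the routing decision and `?` to the explicit preference follows the order of the modifiers above and is likewise an assumption.

```typescript
// Hypothetical helper types; the source does not define them.
interface RoutingDecision { target: string }
interface TaskOutcome { success: boolean }
type ExplicitPreference = "accept" | "reject";

// Hypothetical name and field names for the combined preference record.
interface PreferenceRecord {
  /** The routing decision */
  readonly decision: RoutingDecision;
  /** The task outcome */
  outcome: TaskOutcome;
  /** Explicit preference (if provided) */
  preference?: ExplicitPreference;
  /** Computed reward signal */
  reward: number;
}

// Assumed reward rule, for illustration only: an explicit preference
// overrides the task outcome; otherwise success maps to 1 and failure to 0.
function makeRecord(
  decision: RoutingDecision,
  outcome: TaskOutcome,
  preference?: ExplicitPreference,
): PreferenceRecord {
  let reward: number;
  if (preference !== undefined) {
    reward = preference === "accept" ? 1 : -1;
  } else {
    reward = outcome.success ? 1 : 0;
  }
  return { decision, outcome, preference, reward };
}
```

For example, `makeRecord({ target: "model-a" }, { success: false }, "accept")` yields a record whose `reward` is `1` under the assumed rule, because the explicit preference takes priority over the failed outcome.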