Implementation:Alibaba ROLL DPO ActorWorker Compute Log Probs
| Knowledge Sources | |
|---|---|
| Domains | Alignment, LLM_Inference |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
A concrete method for computing reference log probabilities, implemented by the DPO ActorWorker in the Alibaba ROLL library.
Description
The ActorWorker.compute_log_probs method runs a forward pass through the reference model to extract per-token log probabilities. It uses the DP_MP_DISPATCH_FIRST dispatch pattern for distributed computation.
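The per-token extraction described above can be illustrated with a dependency-free sketch. This is not the ROLL implementation: the helper names `log_softmax` and `per_token_log_probs` are hypothetical, and plain Python lists stand in for tensors. The idea is that the logits at position t score the token at position t+1, which is why a sequence of length `seq_len` yields `seq_len-1` log probabilities. In PyTorch the same idea is typically written as a log-softmax over the vocabulary followed by a gather on the shifted token ids.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def per_token_log_probs(logits, input_ids):
    """Hypothetical helper for a single sequence.

    logits:    seq_len rows, each a list of vocab-sized logits
    input_ids: seq_len token ids
    Position t predicts token t+1, so the result has seq_len - 1 entries.
    """
    out = []
    for t in range(len(input_ids) - 1):
        out.append(log_softmax(logits[t])[input_ids[t + 1]])
    return out

# Toy input: 3 tokens, vocabulary of size 2, uniform logits everywhere.
logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
input_ids = [1, 0, 1]
log_probs = per_token_log_probs(logits, input_ids)  # two entries, each log(0.5)
```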
Usage
Called by the DPO pipeline before each training batch to precompute reference log probabilities.
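To see why the reference values can be precomputed, consider the standard DPO objective for one preference pair. The sketch below assumes the textbook formulation with summed sequence log probabilities; the function name `dpo_loss` and the default `beta=0.1` are illustrative and are not taken from ROLL. The reference terms do not depend on the policy parameters, so they only need to be computed once per batch.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given summed sequence log-probs.

    Illustrative only: the name and default beta are assumptions.
    """
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), evaluated in a numerically stable way.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_init = dpo_loss(-1.0, -1.0, -1.0, -1.0)
```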
Code Reference
Source Location
- Repository: Alibaba ROLL
- File: roll/pipeline/dpo/actor_worker.py
- Lines: L106-149
Signature
class ActorWorker(BaseActorWorker):
    @register(dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def compute_log_probs(self, data: DataProto) -> DataProto:
        """
        Compute per-token log probabilities.

        Args:
            data: DataProto with input_ids, attention_mask, prompt_id_lens, position_ids

        Returns:
            DataProto with log_probs tensor (2*B, seq_len-1)
        """
Import
from roll.pipeline.dpo.actor_worker import ActorWorker
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | DataProto | Yes | Batch with interleaved chosen/rejected sequences |
Outputs
| Name | Type | Description |
|---|---|---|
| log_probs | torch.Tensor | Per-token log probabilities with shape (2*B, seq_len-1), where B is the number of preference pairs |
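A consumer of this output typically needs one summed log probability per chosen and per rejected response. The following sketch shows one way to do that under several assumptions that the source does not confirm: chosen sequences occupy the even rows and rejected sequences the odd rows, `prompt_id_lens` holds one prompt length per row, and there is no padding (a real batch would also apply `attention_mask`). The helper name `response_log_prob_sums` is hypothetical.

```python
def response_log_prob_sums(log_probs, prompt_id_lens):
    """Sum response-token log-probs per row, then split the interleaved rows.

    log_probs:      2*B rows, each with seq_len-1 floats
    prompt_id_lens: 2*B prompt lengths (assumed one per row, each >= 1)

    Column j holds the log-prob of token j+1, so the first response token
    (position prompt_len) sits in column prompt_len - 1.
    """
    sums = []
    for row, p_len in zip(log_probs, prompt_id_lens):
        sums.append(sum(row[p_len - 1:]))
    chosen = sums[0::2]    # assumption: even rows are the chosen sequences
    rejected = sums[1::2]  # assumption: odd rows are the rejected sequences
    return chosen, rejected

# One preference pair (B = 1), seq_len = 4, prompt length 2 for both rows.
toy_log_probs = [[-1.0, -2.0, -3.0], [-1.0, -0.5, -0.5]]
chosen, rejected = response_log_prob_sums(toy_log_probs, [2, 2])
```

The first column of each row is skipped here because it scores a prompt token, which does not contribute to the preference comparison.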
Usage Examples
# Called via cluster dispatch:
ref_data = reference_cluster.execute_all_sync("compute_log_probs", batch)
ref_log_probs = ref_data.batch["log_probs"]