Implementation:Alibaba ROLL DPO ActorWorker Compute Log Probs

Knowledge Sources	Alibaba ROLL
Domains	Alignment, LLM_Inference
Last Updated	2026-02-07 20:00 GMT

Overview

The concrete method in the Alibaba ROLL library's DPO ActorWorker that computes reference-model log probabilities.

Description

The ActorWorker.compute_log_probs method runs a forward pass through the reference model and extracts per-token log probabilities. It is registered with dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST (and clear_cache=False), which determines how the cluster distributes the batch across workers for this call.
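
A common way to turn causal-LM logits into per-token log probabilities with the (2*B, seq_len-1) shape given in this page's signature is to shift by one position and gather the log-softmax at the next token. The sketch below assumes a logits tensor of shape (N, L, V); the helper name per_token_log_probs is hypothetical and this is not ROLL's code (it ignores model parallelism and worker dispatch entirely).

```python
import torch
import torch.nn.functional as F


def per_token_log_probs(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: log p(input_ids[:, t+1] | input_ids[:, :t+1]), shape (N, L-1)."""
    # The logit at position t scores the token at position t+1, so the last
    # position has no target and the first token has no prediction.
    log_probs = F.log_softmax(logits[:, :-1, :].float(), dim=-1)  # (N, L-1, V)
    targets = input_ids[:, 1:].unsqueeze(-1)                       # (N, L-1, 1)
    return torch.gather(log_probs, dim=-1, index=targets).squeeze(-1)


# Toy example: B = 2 preference pairs -> N = 2*B = 4 rows, L = 6, vocab V = 10.
torch.manual_seed(0)
logits = torch.randn(4, 6, 10)             # stand-in for the reference model's output
input_ids = torch.randint(0, 10, (4, 6))
ref_log_probs = per_token_log_probs(logits, input_ids)
print(ref_log_probs.shape)  # torch.Size([4, 5])
```

Casting to float32 before log_softmax is a typical precaution when the model runs in bf16 or fp16.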

Usage

Called by the DPO pipeline before each training batch to precompute reference log probabilities.
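
Reference log probabilities can be precomputed because the reference model is frozen; the standard DPO objective then compares the policy-to-reference log ratio of the chosen response against that of the rejected one. The sketch below is a minimal version of that objective, not ROLL's loss code. It assumes chosen sequences sit at even rows and rejected ones at odd rows, sums per-token log probs under a mask, and uses beta = 0.1; the function name dpo_loss is hypothetical.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_lp, ref_lp, mask, beta: float = 0.1) -> torch.Tensor:
    """Hypothetical helper: standard DPO loss from per-token log probs of shape (2*B, L-1)."""
    # Sequence-level log probs: sum over the tokens selected by the mask.
    pol = (policy_lp * mask).sum(dim=-1)
    ref = (ref_lp * mask).sum(dim=-1)
    # ASSUMPTION: rows are interleaved as [chosen_0, rejected_0, chosen_1, ...].
    chosen_ratio = pol[0::2] - ref[0::2]
    rejected_ratio = pol[1::2] - ref[1::2]
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()


torch.manual_seed(0)
ref_lp = -torch.rand(4, 5)   # stand-in for precomputed reference log probs (B = 2)
mask = torch.ones(4, 5)
loss = dpo_loss(ref_lp.clone(), ref_lp, mask)
print(round(loss.item(), 4))  # 0.6931 (= log 2, since policy and reference are identical)
```

When the policy still equals the reference, both log ratios are zero and the loss is exactly log 2; it falls as the policy raises the chosen response relative to the rejected one.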

Code Reference

Source Location

Repository: Alibaba ROLL
File: roll/pipeline/dpo/actor_worker.py
Lines: L106-149

Signature

class ActorWorker(BaseActorWorker):
    @register(dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def compute_log_probs(self, data: DataProto) -> DataProto:
        """
        Compute per-token log probabilities.

        Args:
            data: DataProto with input_ids, attention_mask, prompt_id_lens, position_ids

        Returns:
            DataProto with log_probs tensor (2*B, seq_len-1)
        """

Import

from roll.pipeline.dpo.actor_worker import ActorWorker

I/O Contract

Inputs

Name	Type	Required	Description
data	DataProto	Yes	Batch of 2*B interleaved chosen/rejected sequences with input_ids, attention_mask, prompt_id_lens, and position_ids
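
The 2*B leading dimension follows from packing B preference pairs into one batch. Assuming "interleaved" means alternating rows (chosen at even indices, rejected at odd), such a batch could be built and split back as follows. interleave_pairs is a hypothetical helper, not a ROLL function, and it assumes both sides are already padded to the same length.

```python
import torch


def interleave_pairs(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (B, L) + (B, L) -> (2*B, L) rows [c0, r0, c1, r1, ...]."""
    if chosen.shape != rejected.shape:
        raise ValueError("pad chosen and rejected to the same shape first")
    # stack gives (B, 2, L); reshaping row-major yields alternating rows.
    return torch.stack([chosen, rejected], dim=1).reshape(-1, chosen.size(-1))


chosen = torch.tensor([[1, 1], [3, 3]])
rejected = torch.tensor([[2, 2], [4, 4]])
batch_ids = interleave_pairs(chosen, rejected)
# Even rows recover the chosen sequences, odd rows the rejected ones.
chosen_back, rejected_back = batch_ids[0::2], batch_ids[1::2]
```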

Outputs

Name	Type	Description
log_probs	torch.Tensor	Per-token log probabilities of shape (2*B, seq_len-1), returned in the output DataProto's batch["log_probs"]

Usage Examples

# Called via cluster dispatch:
ref_data = reference_cluster.execute_all_sync("compute_log_probs", batch)
ref_log_probs = ref_data.batch["log_probs"]

Related Pages

Implements Principle

Principle:Alibaba_ROLL_Reference_Log_Probability

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Environment:Alibaba_ROLL_CUDA_GPU_Environment

Heuristics Applied

This implementation uses the following heuristics:

Heuristic:Alibaba_ROLL_Numerical_Stability_Epsilon
