Implementation:Alibaba ROLL DPO ActorWorker Compute Log Probs

From Leeroopedia


Knowledge Sources
Domains: Alignment, LLM_Inference
Last Updated: 2026-02-07 20:00 GMT

Overview

A concrete method for computing reference log probabilities, implemented by the DPO ActorWorker in the Alibaba ROLL library.

Description

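The ActorWorker.compute_log_probs method runs a forward pass through the reference model and extracts per-token log probabilities. Each position's logits score the token that follows it, so a sequence of length seq_len yields seq_len-1 log probabilities. The method is registered with the DP_MP_DISPATCH_FIRST dispatch mode for distributed execution.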
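The per-token extraction described above can be sketched as follows. This is a minimal illustration, not ROLL's actual code: it uses plain Python lists so it runs without dependencies, and the helper names `log_softmax` and `per_token_log_probs` are hypothetical. With PyTorch tensors the same idea is usually written as a log-softmax over the vocabulary followed by a gather on the shifted token ids.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary-sized row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def per_token_log_probs(logits, input_ids):
    """Hypothetical helper: logits is [seq_len][vocab], input_ids is [seq_len].

    Logits at position t score the token at position t + 1, so the result
    has seq_len - 1 entries.
    """
    out = []
    for t in range(len(input_ids) - 1):
        lp = log_softmax(logits[t])
        out.append(lp[input_ids[t + 1]])
    return out


# Toy example: vocabulary of 3 tokens, sequence of 4 tokens.
logits = [
    [2.0, 0.5, 0.1],
    [0.0, 0.0, 0.0],
    [0.3, 1.2, -0.7],
    [1.0, 1.0, 1.0],  # last position has no next token to score
]
input_ids = [0, 2, 1, 1]
token_log_probs = per_token_log_probs(logits, input_ids)
```

The final row of logits is never used, which is why the output is one element shorter than the input sequence.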

Usage

The DPO pipeline calls this method before each training batch to precompute reference log probabilities.

Code Reference

Source Location

  • Repository: Alibaba ROLL
  • File: roll/pipeline/dpo/actor_worker.py
  • Lines: L106-149

Signature

class ActorWorker(BaseActorWorker):
    @register(dispatch_mode=Dispatch.DP_MP_DISPATCH_FIRST, clear_cache=False)
    def compute_log_probs(self, data: DataProto) -> DataProto:
        """
        Compute per-token log probabilities.

        Args:
            data: DataProto with input_ids, attention_mask, prompt_id_lens, position_ids

        Returns:
            DataProto with log_probs tensor (2*B, seq_len-1)
        """

Import

from roll.pipeline.dpo.actor_worker import ActorWorker

I/O Contract

Inputs

Name | Type | Required | Description
data | DataProto | Yes | Batch with interleaved chosen/rejected sequences
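To make the interleaved layout concrete, here is a hedged sketch of how per-token log probabilities could be reduced to one value per sequence and then split into chosen and rejected halves. Several things here are assumptions rather than documented ROLL behavior: that even rows hold chosen sequences and odd rows hold rejected ones, that `prompt_id_lens` counts prompt tokens so that only later positions are summed, and the helper names `sequence_log_probs` and `split_interleaved`.

```python
def sequence_log_probs(token_log_probs, attention_mask, prompt_id_lens):
    """Sum log probs over response tokens only (assumed reduction).

    token_log_probs: [rows][seq_len - 1]; entry t scores the token at t + 1.
    attention_mask:  [rows][seq_len]; 1 for real tokens, 0 for padding.
    prompt_id_lens:  [rows]; number of prompt tokens per row (assumed meaning).
    """
    totals = []
    for row, mask, prompt_len in zip(token_log_probs, attention_mask, prompt_id_lens):
        total = 0.0
        for t, lp in enumerate(row):
            target_pos = t + 1
            # Keep only non-padding tokens that belong to the response.
            if target_pos >= prompt_len and mask[target_pos] == 1:
                total += lp
        totals.append(total)
    return totals


def split_interleaved(values):
    """Assumed layout: even rows are chosen, odd rows are rejected."""
    return values[0::2], values[1::2]


# Toy batch: B = 2 preference pairs, so 2 * B = 4 rows, seq_len = 5.
token_log_probs = [
    [-0.1, -0.2, -0.3, -0.4],  # chosen 0
    [-0.1, -0.5, -0.6, -0.7],  # rejected 0
    [-0.2, -0.2, -0.2, -0.2],  # chosen 1
    [-0.9, -0.9, -0.9, -0.9],  # rejected 1
]
attention_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
]
prompt_id_lens = [2, 2, 3, 3]

seq_lp = sequence_log_probs(token_log_probs, attention_mask, prompt_id_lens)
chosen_lp, rejected_lp = split_interleaved(seq_lp)
```

If the real batch orders pairs differently, only `split_interleaved` would need to change.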

Outputs

Name | Type | Description
log_probs | torch.Tensor | Per-token log probabilities, shape (2*B, seq_len-1)

Usage Examples

# Called via cluster dispatch:
ref_data = reference_cluster.execute_all_sync("compute_log_probs", batch)
ref_log_probs = ref_data.batch["log_probs"]
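For context on why the reference values are precomputed, the sketch below shows the standard DPO objective for a single preference pair, in which the reference log probabilities act as a fixed baseline for the policy's log probabilities. This is the textbook formula rather than ROLL's loss code, which this page does not show; the function name `dpo_loss`, the default `beta=0.1`, and the numeric inputs are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, from sequence-level log probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Illustrative sequence-level log probabilities (made-up numbers).
loss = dpo_loss(
    policy_chosen=-0.7,
    policy_rejected=-1.5,
    ref_chosen=-0.9,
    ref_rejected=-1.1,
)

# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
```

Because the reference model is frozen, its log probabilities do not change during the update, which is what makes computing them once per batch, ahead of the training step, sufficient.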

Related Pages

Implements Principle

Requires Environment

Environment Dependencies

This implementation requires the following environment constraints:

Heuristics Applied

This implementation uses the following heuristics:
