

Implementation:Intel Ipex llm Init Pipeline Parallel

From Leeroopedia
Revision as of 15:12, 16 February 2026 by Admin (auto-imported from implementations/Intel_Ipex_llm_Init_Pipeline_Parallel.md)


Knowledge Sources
Domains: Distributed_Computing, Inference
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool for initializing IPEX-LLM's pipeline parallelism communication layer across Intel GPUs.

Description

The init_pipeline_parallel() function initializes the communication backend used by IPEX-LLM pipeline parallelism. It sets up torch.distributed from the environment variables that the launcher (mpirun) provides to each process, so that the GPUs holding different pipeline stages can exchange data. It is called once, at the start of a pipeline-parallel script.
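The environment-variable handling described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not IPEX-LLM's code: `resolve_dist_env` and `DistEnv` are invented names, the MPI fallback variables (`PMI_RANK`/`PMI_SIZE`/`MPI_LOCALRANKID` for Intel MPI, `OMPI_COMM_WORLD_*` for Open MPI) and the `127.0.0.1:29500` rendezvous defaults are assumptions, and the mapping of local rank to an `xpu:<n>` device is assumed from the one-process-per-GPU launch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class DistEnv:
    """Settings one process needs before joining a torch.distributed group."""
    rank: int
    world_size: int
    local_rank: int
    master_addr: str
    master_port: str

    @property
    def device(self) -> str:
        # Assumption: one Intel GPU (XPU) per local rank.
        return f"xpu:{self.local_rank}"


def _first_set(env: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the value of the first variable in `names` present in `env`."""
    for name in names:
        if name in env:
            return env[name]
    return None


def resolve_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Hypothetical helper: prefer torch.distributed-style variables, then fall
    back to variables exported by MPI launchers (assumed names)."""
    env = os.environ if env is None else env
    rank = _first_set(env, ("RANK", "PMI_RANK", "OMPI_COMM_WORLD_RANK"))
    world = _first_set(env, ("WORLD_SIZE", "PMI_SIZE", "OMPI_COMM_WORLD_SIZE"))
    local = _first_set(
        env, ("LOCAL_RANK", "MPI_LOCALRANKID", "OMPI_COMM_WORLD_LOCAL_RANK")
    )
    if rank is None or world is None:
        raise RuntimeError("rank/world size not found; was the script started with mpirun?")
    return DistEnv(
        rank=int(rank),
        world_size=int(world),
        # Single-node assumption: fall back to the global rank.
        local_rank=int(local) if local is not None else int(rank),
        # Assumed single-node rendezvous defaults.
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=env.get("MASTER_PORT", "29500"),
    )


if __name__ == "__main__":
    cfg = resolve_dist_env({"PMI_RANK": "1", "PMI_SIZE": "2"})
    print(cfg.rank, cfg.world_size, cfg.device)  # 1 2 xpu:1
```

In a real script these values would be handed to torch.distributed (for example through `torch.distributed.init_process_group`) rather than returned; keeping the lookup in a pure function makes the fallback order easy to test without launching mpirun.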

Usage

Call it once at script startup, before loading any model. The script must be launched with mpirun, which starts one process per GPU.
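Because the function is meant to run exactly once, before any model is loaded, a caller can guard it. With real torch the natural check is `torch.distributed.is_initialized()`, since initializing the default process group twice raises an error. The stdlib-only sketch below mimics that guard and the ordering rule; `InitOnce` and `load_model` are hypothetical stand-ins, not IPEX-LLM APIs.

```python
from typing import Callable


class InitOnce:
    """Hypothetical guard: run a setup callable at most once per process."""

    def __init__(self, setup: Callable[[], None]) -> None:
        self._setup = setup
        self.done = False

    def __call__(self) -> bool:
        """Run setup on the first call only; return True if it ran on this call."""
        if self.done:
            return False
        self._setup()
        self.done = True
        return True


def load_model(init: InitOnce, stages: int) -> str:
    """Hypothetical stand-in for from_pretrained(..., pipeline_parallel_stages=...)."""
    if not init.done:
        # Mirrors the rule above: communication must exist before stages are built.
        raise RuntimeError("initialize pipeline parallel before loading the model")
    return f"model split into {stages} stages"


if __name__ == "__main__":
    events = []
    init = InitOnce(lambda: events.append("process group up"))
    init()         # first call runs setup
    init()         # second call is a no-op instead of a double initialization
    print(events)  # ['process group up']
    print(load_model(init, stages=2))
```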

Code Reference

Source Location

  • Repository: IPEX-LLM
  • File: python/llm/example/GPU/Pipeline-Parallel-Inference/generate.py
  • Lines: 22-25

Signature

def init_pipeline_parallel() -> None:
    """Initialize pipeline parallel communication using torch.distributed."""

Import

from ipex_llm.transformers import init_pipeline_parallel

I/O Contract

Inputs

Name | Type | Required | Description
(none) | n/a | n/a | Uses torch.distributed environment variables (LOCAL_RANK, WORLD_SIZE, etc.)

Outputs

Name | Type | Description
(none) | None | Pipeline parallel communication initialized (side effect)

Usage Examples

import torch
from ipex_llm.transformers import AutoModelForCausalLM, init_pipeline_parallel

# 1. Initialize pipeline parallel communication (must come before model loading)
init_pipeline_parallel()

# 2. Load the model, split into pipeline stages
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    load_in_low_bit="sym_int4",
    optimize_model=True,
    trust_remote_code=True,
    use_cache=True,
    torch_dtype=torch.float16,
    pipeline_parallel_stages=2,  # Distribute across 2 GPUs (one stage per process)
)

Launch the script with mpirun, one process per GPU:

mpirun -n 2 --ppn 2 python generate.py --gpu-num 2
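With pipeline_parallel_stages=2, each of the two mpirun processes holds only a slice of the model's decoder layers. The sketch below illustrates one way such a split can be computed; `stage_layer_ranges` and `layers_for_rank` are hypothetical names, and both the balanced contiguous split and the check that the process count equals the stage count are assumptions, not IPEX-LLM's documented behavior.

```python
from typing import List, Tuple


def stage_layer_ranges(num_layers: int, num_stages: int) -> List[Tuple[int, int]]:
    """Split decoder layers into contiguous [start, end) ranges, one per stage.

    Assumed scheme: as even as possible, with the first `num_layers % num_stages`
    stages taking one extra layer. IPEX-LLM's actual split may differ.
    """
    if not 1 <= num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    ranges, start = [], 0
    for stage in range(num_stages):
        end = start + base + (1 if stage < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def layers_for_rank(rank: int, world_size: int, num_layers: int,
                    pipeline_parallel_stages: int) -> Tuple[int, int]:
    """Hypothetical per-process view: one pipeline stage per mpirun process."""
    if world_size != pipeline_parallel_stages:
        raise ValueError(
            f"mpirun started {world_size} processes but the model has "
            f"{pipeline_parallel_stages} stages"
        )
    return stage_layer_ranges(num_layers, pipeline_parallel_stages)[rank]


if __name__ == "__main__":
    # Llama-2-13B has 40 decoder layers; `mpirun -n 2` gives ranks 0 and 1.
    for rank in range(2):
        print(rank, layers_for_rank(rank, 2, 40, 2))
    # 0 (0, 20)
    # 1 (20, 40)
```

Contiguous ranges keep each stage's layers on one GPU, so activations cross between devices only at stage boundaries.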

Related Pages

Implements Principle

Requires Environment
