Implementation:Intel Ipex llm Init Pipeline Parallel
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Inference |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for initializing IPEX-LLM's pipeline parallelism communication layer across Intel GPUs.
Description
The init_pipeline_parallel() function initializes the pipeline parallel communication backend. It uses torch.distributed environment variables (set by mpirun) to establish communication between GPUs. This is a one-time call at the start of a pipeline parallel script.
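As a rough illustration of the environment-variable handshake described above, the sketch below resolves a process's rank and world size the way a launcher-driven script might. It is not IPEX-LLM's implementation: `resolve_rank_and_world_size` is a hypothetical helper, and the fallback to `PMI_RANK` / `PMI_SIZE` (names that MPI launchers commonly export) is an assumption about the launcher, not something this page states.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple


def _first_int(env: Mapping[str, str], names: Sequence[str], default: int) -> int:
    """Return the first variable in `names` that is set in `env`, parsed as an int."""
    for name in names:
        value = env.get(name)
        if value is not None and value.strip() != "":
            return int(value)
    return default


def resolve_rank_and_world_size(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: work out (rank, world_size) from launcher variables.

    Assumption: torch.distributed-style names are preferred, and the PMI_*
    names exported by MPI launchers are used only as a fallback.
    """
    env = os.environ if env is None else env
    rank = _first_int(env, ["RANK", "LOCAL_RANK", "PMI_RANK"], default=0)
    world_size = _first_int(env, ["WORLD_SIZE", "PMI_SIZE"], default=1)
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    return rank, world_size
```

With no launcher variables set, the helper falls back to a single-process layout (rank 0 of 1), which is a convenient way to notice that a script was started with plain `python` rather than through the launcher.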
Usage
Call once at script startup, before loading any models. The script must be launched with mpirun so that each GPU process receives the environment variables it needs.
Code Reference
Source Location
- Repository: IPEX-LLM
- File: python/llm/example/GPU/Pipeline-Parallel-Inference/generate.py
- Lines: 22-25
Signature
def init_pipeline_parallel() -> None:
    """Initialize pipeline parallel communication using torch.distributed."""
Import
from ipex_llm.transformers import init_pipeline_parallel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | Uses torch.distributed environment variables (LOCAL_RANK, WORLD_SIZE, etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | — | Pipeline parallel communication initialized (side effect) |
Usage Examples
import torch
from ipex_llm.transformers import AutoModelForCausalLM, init_pipeline_parallel

# 1. Initialize pipeline parallel (must be first)
init_pipeline_parallel()

# 2. Load model with pipeline parallel stages
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    load_in_low_bit="sym_int4",
    optimize_model=True,
    trust_remote_code=True,
    use_cache=True,
    torch_dtype=torch.float16,
    pipeline_parallel_stages=2,  # Distribute across 2 GPUs
)

# Launch with mpirun (shell command, run from a terminal, not inside Python)
mpirun -n 2 --ppn 2 python generate.py --gpu-num 2
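To make `pipeline_parallel_stages=2` more concrete, the sketch below shows one plausible way to divide a model's decoder layers into contiguous per-rank slices. The even-split rule and the helper name `stage_layer_range` are illustrative assumptions; this page does not describe how IPEX-LLM actually assigns layers to stages. The 40-layer figure used in the demo is the decoder depth of Llama-2-13B.

```python
from typing import Tuple


def stage_layer_range(num_layers: int, num_stages: int, rank: int) -> Tuple[int, int]:
    """Hypothetical helper: half-open [start, end) layer range owned by `rank`.

    Layers are split as evenly as possible; when the division is not exact,
    the earliest stages each take one extra layer.
    """
    if num_stages <= 0 or not 0 <= rank < num_stages:
        raise ValueError("rank must satisfy 0 <= rank < num_stages")
    base, extra = divmod(num_layers, num_stages)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Two stages over a 40-layer model: each rank holds 20 consecutive layers.
for rank in range(2):
    print(rank, stage_layer_range(40, 2, rank))
```

Contiguous slices keep the number of inter-GPU hand-offs at one per stage boundary, which is the usual motivation for this kind of split.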
Related Pages
Implements Principle
Requires Environment