Implementation:Intel Ipex llm Pipeline Parallel Multimodal

Knowledge Sources	Intel IPEX-LLM
Domains	Pipeline_Parallelism, Multimodal, Distributed_Inference
Last Updated	2026-02-09 04:00 GMT

Overview

A concrete example script for pipeline-parallel distributed inference with multimodal vision-language models on Intel XPU devices.

Description

This script demonstrates pipeline-parallel inference for the GLM-4V vision-language model across multiple Intel XPU devices. It uses IPEX-LLM's init_pipeline_parallel to set up distributed communication, loads the model with pipeline_parallel_stages for automatic layer distribution, and processes both text prompts and images (from URL or local path) for multimodal generation.
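To make "automatic layer distribution" more concrete, here is a minimal sketch of one way contiguous decoder layers could be divided among pipeline stages. It is an illustration only: the even-split rule, the helper name partition_layers, and the layer count of 40 are assumptions made for this example, not IPEX-LLM's actual partitioning logic.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous per-stage ranges.

    Hypothetical helper: an even split in which earlier stages absorb
    any remainder. Not taken from IPEX-LLM.
    """
    if num_stages < 1 or num_layers < num_stages:
        raise ValueError("need at least one layer per stage")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for stage in range(num_stages):
        # The first `extra` stages each take one additional layer.
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


if __name__ == "__main__":
    # 40 layers and 2 stages are illustrative values only.
    for rank, layers in enumerate(partition_layers(40, 2)):
        print(f"stage {rank}: layers {layers.start}..{layers.stop - 1}")
```

In a pipeline-parallel run, each process would hold only its own range of layers and pass activations to the next stage.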

Usage

Use this when running multimodal vision-language models that require distribution across multiple GPUs for inference. It extends the standard pipeline-parallel pattern to handle image inputs alongside text.
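Since the image argument may be either a URL or a local path, the input has to be classified before it is opened. The following is a minimal sketch of that step, assuming Pillow and requests are installed; the helper names is_remote_image and load_image are hypothetical and may not match what the original script does.

```python
import io
from urllib.parse import urlparse


def is_remote_image(source: str) -> bool:
    """Return True when `source` looks like an http(s) URL rather than a file path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_image(source: str):
    """Open an image from a URL or a local path and convert it to RGB.

    Hypothetical helper. Third-party packages are imported lazily so that
    the URL check above works without them.
    """
    from PIL import Image

    if is_remote_image(source):
        import requests

        response = requests.get(source, timeout=30)
        response.raise_for_status()
        return Image.open(io.BytesIO(response.content)).convert("RGB")
    return Image.open(source).convert("RGB")
```

Checking the URL scheme instead of testing whether the file exists keeps a mistyped local path from being sent out as a network request.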

Code Reference

Source Location

Repository: Intel IPEX-LLM
File: python/llm/example/GPU/Pipeline-Parallel-Inference/glm_4v_generate.py
Lines: 1-87

Signature

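# Script-based execution with argparse; there is no importable function signature.
# Key API calls:
init_pipeline_parallel()  # set up distributed communication between pipeline stages
model = AutoModelForCausalLM.from_pretrained(
    model_path,                             # value of --repo-id-or-model-path
    load_in_low_bit=args.low_bit,           # quantization type (--low-bit)
    pipeline_parallel_stages=args.gpu_num,  # number of pipeline stages (--gpu-num)
    trust_remote_code=True,
)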

Import

from ipex_llm.transformers import AutoModelForCausalLM, init_pipeline_parallel
from transformers import AutoTokenizer
from PIL import Image

I/O Contract

Inputs

Name	Type	Required	Description
repo-id-or-model-path	str	No	Hugging Face repo id or local model path (default: THUDM/glm-4v-9b)
image-url-or-path	str	No	Image URL or local file path
prompt	str	No	Text prompt for the model
gpu-num	int	No	Number of GPUs for pipeline (default: 2)
low-bit	str	No	Quantization type (default: sym_int4)
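The input table above could be expressed as an argparse parser along the following lines. This is a sketch, not the script's actual parser: the flag names and the three stated defaults come from the table, while the None defaults for the image and prompt and all help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented inputs (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Pipeline-parallel GLM-4V inference (sketch)"
    )
    parser.add_argument("--repo-id-or-model-path", type=str,
                        default="THUDM/glm-4v-9b",
                        help="Hugging Face repo id or local model path")
    # The defaults for the next two flags are not documented; None is an assumption.
    parser.add_argument("--image-url-or-path", type=str, default=None,
                        help="Image URL or local file path")
    parser.add_argument("--prompt", type=str, default=None,
                        help="Text prompt for the model")
    parser.add_argument("--gpu-num", type=int, default=2,
                        help="Number of GPUs (pipeline stages)")
    parser.add_argument("--low-bit", type=str, default="sym_int4",
                        help="Quantization type")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse maps dashes to underscores, e.g. args.gpu_num and args.low_bit.
    print(args)
```

When launching, --gpu-num would normally match the number of processes started by the launcher, as both are set to 2 in the usage example.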

Outputs

Name	Type	Description
Generated text	Console (rank 0)	Multimodal model response
Timing metrics	Console	Inference latency
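The output behavior can be sketched as follows: every process takes part in generation, but only rank 0 prints the result. The example assumes the launcher exports the RANK environment variable, as torch.distributed.run does; fake_generate is a hypothetical stand-in for the real model call, and printing the latency on rank 0 only is an assumption.

```python
import os
import time


def current_rank(env=None) -> int:
    """Read the process rank from RANK (set by torch.distributed.run); 0 if unset."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0"))


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for model.generate plus tokenizer decoding.
    return f"response to: {prompt}"


if __name__ == "__main__":
    text, seconds = timed(fake_generate, "What is in this image?")
    # Only the first rank reports, so output is not duplicated per process.
    if current_rank() == 0:
        print(f"Inference time: {seconds:.4f} s")
        print(text)
```

On a real accelerator the device should be synchronized before each clock reading, since kernels run asynchronously and the measured latency would otherwise be too short.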

Usage Examples

Pipeline Parallel Multimodal Inference

python -m torch.distributed.run --nproc_per_node 2 \
    glm_4v_generate.py \
    --repo-id-or-model-path "THUDM/glm-4v-9b" \
    --image-url-or-path "https://example.com/image.jpg" \
    --prompt "What is in this image?" \
    --gpu-num 2

Related Pages

Environment:Intel_Ipex_llm_Pipeline_Parallel_Environment
