Implementation:Intel Ipex llm Pipeline Parallel Multimodal
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Parallelism, Multimodal, Distributed_Inference |
| Last Updated | 2026-02-09 04:00 GMT |
Overview
Concrete tool for pipeline-parallel distributed inference with multimodal vision-language models on Intel XPU.
Description
This script demonstrates pipeline-parallel inference for the GLM-4V vision-language model across multiple Intel XPU devices. It uses IPEX-LLM's init_pipeline_parallel to set up distributed communication, loads the model with pipeline_parallel_stages for automatic layer distribution, and processes both text prompts and images (from URL or local path) for multimodal generation.
Usage
Use this when running multimodal vision-language models that require distribution across multiple GPUs for inference. It extends the standard pipeline-parallel pattern to handle image inputs alongside text.
Code Reference
Source Location
- Repository: Intel IPEX-LLM
- File: python/llm/example/GPU/Pipeline-Parallel-Inference/glm_4v_generate.py
- Lines: 1-87
Signature
# Script-based execution with argparse
# Key API calls:
init_pipeline_parallel()
model = AutoModelForCausalLM.from_pretrained(
model_path,
load_in_low_bit=args.low_bit,
pipeline_parallel_stages=args.gpu_num,
trust_remote_code=True,
)
Import
from ipex_llm.transformers import AutoModelForCausalLM, init_pipeline_parallel
from transformers import AutoTokenizer
from PIL import Image
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repo-id-or-model-path | str | Yes | Model path (default: THUDM/glm-4v-9b) |
| image-url-or-path | str | No | Image URL or local file path |
| prompt | str | No | Text prompt for the model |
| gpu-num | int | No | Number of GPUs for pipeline (default: 2) |
| low-bit | str | No | Quantization type (default: sym_int4) |
Outputs
| Name | Type | Description |
|---|---|---|
| Generated text | Console (rank 0) | Multimodal model response |
| Timing metrics | Console | Inference latency |
Usage Examples
Pipeline Parallel Multimodal Inference
python -m torch.distributed.run --nproc_per_node 2 \
glm_4v_generate.py \
--repo-id-or-model-path "THUDM/glm-4v-9b" \
--image-url-or-path "https://example.com/image.jpg" \
--prompt "What is in this image?" \
--gpu-num 2