Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Intel Ipex llm Pipeline Parallel Multimodal

From Leeroopedia


Knowledge Sources
Domains Pipeline_Parallelism, Multimodal, Distributed_Inference
Last Updated 2026-02-09 04:00 GMT

Overview

Concrete tool for pipeline-parallel distributed inference with multimodal vision-language models on Intel XPU.

Description

This script demonstrates pipeline-parallel inference for the GLM-4V vision-language model across multiple Intel XPU devices. It uses IPEX-LLM's init_pipeline_parallel to set up distributed communication, loads the model with pipeline_parallel_stages for automatic layer distribution, and processes both text prompts and images (from URL or local path) for multimodal generation.

Usage

Use this when running multimodal vision-language models that require distribution across multiple GPUs for inference. It extends the standard pipeline-parallel pattern to handle image inputs alongside text.

Code Reference

Source Location

Signature

# Script-based execution with argparse
# Key API calls:
init_pipeline_parallel()
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit=args.low_bit,
    pipeline_parallel_stages=args.gpu_num,
    trust_remote_code=True,
)

Import

from ipex_llm.transformers import AutoModelForCausalLM, init_pipeline_parallel
from transformers import AutoTokenizer
from PIL import Image

I/O Contract

Inputs

Name Type Required Description
repo-id-or-model-path str Yes Model path (default: THUDM/glm-4v-9b)
image-url-or-path str No Image URL or local file path
prompt str No Text prompt for the model
gpu-num int No Number of GPUs for pipeline (default: 2)
low-bit str No Quantization type (default: sym_int4)

Outputs

Name Type Description
Generated text Console (rank 0) Multimodal model response
Timing metrics Console Inference latency

Usage Examples

Pipeline Parallel Multimodal Inference

python -m torch.distributed.run --nproc_per_node 2 \
    glm_4v_generate.py \
    --repo-id-or-model-path "THUDM/glm-4v-9b" \
    --image-url-or-path "https://example.com/image.jpg" \
    --prompt "What is in this image?" \
    --gpu-num 2

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment