Implementation:InternLM Lmdeploy Pipeline Chat VLM

Knowledge Sources	LMDeploy VLM Pipeline
Domains	Vision_Language_Models, Multimodal_AI
Last Updated	2026-02-07 15:00 GMT

Overview

Concrete tool for running vision-language model (VLM) inference on image-text inputs through the Pipeline chat interface of the LMDeploy library.

Description

The Pipeline.chat() method and Pipeline.__call__() with tuple inputs provide VLM inference capabilities. The chat method supports multi-turn conversations with session state, while __call__ handles batch VLM inference. Both accept prompts as tuples of (text, image) or (text, [image1, image2]).
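
The prompt shapes listed above can be illustrated with a small normalizer. This is a minimal sketch and not LMDeploy's own code: `normalize_vlm_prompt` and the `VLMPrompt` alias are hypothetical names, and images are treated as opaque objects so the snippet runs without LMDeploy or PIL installed.

```python
from typing import Any, List, Tuple, Union

# Hypothetical alias covering the prompt shapes described above.
VLMPrompt = Union[str, Tuple[str, Any], Tuple[str, List[Any]]]


def normalize_vlm_prompt(prompt: VLMPrompt) -> Tuple[str, List[Any]]:
    """Return (text, images) for a str, (text, image) or (text, [images]) prompt."""
    if isinstance(prompt, str):
        # Text-only prompt: no images attached.
        return prompt, []
    if not (isinstance(prompt, tuple) and len(prompt) == 2):
        raise TypeError("prompt must be a str or a (text, image(s)) tuple")
    text, images = prompt
    if not isinstance(text, str):
        raise TypeError("the first tuple element must be the text prompt")
    if isinstance(images, list):
        # Multi-image prompt: copy the list so the caller's list is not shared.
        return text, list(images)
    # Single-image prompt: wrap the image in a one-element list.
    return text, [images]
```

Treating every prompt as text plus a list of zero or more images keeps downstream handling uniform, whichever of the three shapes the caller used.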

Usage

Use Pipeline.__call__() for batch VLM inference and Pipeline.chat() for interactive multi-turn VLM conversations. Pass images as PIL Image objects loaded via load_image().
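
Batch usage can be sketched as a thin wrapper that builds one (text, image) tuple per image and sends the whole list in a single call. The helper name `describe_images` is hypothetical. `pipe` is assumed to be the object returned by `pipeline(...)`, and the snippet relies only on `pipe(prompts, gen_config=...)` returning a list of responses that expose `.text`, as described on this page.

```python
from typing import Any, List, Sequence


def describe_images(pipe: Any, images: Sequence[Any], question: str,
                    gen_config: Any = None) -> List[str]:
    """Ask the same question about every image in one batched call."""
    # One (text, image) tuple per image; the list is passed to pipe in one go.
    prompts = [(question, image) for image in images]
    responses = pipe(prompts, gen_config=gen_config)
    # Assumes a list input yields a list of responses, one per prompt.
    return [response.text for response in responses]
```

With a real pipeline, `images` would typically be a list of PIL Image objects obtained from load_image().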

Code Reference

Source Location

Repository: lmdeploy
File: lmdeploy/pipeline.py
Lines: L169-227 (chat), L83-122 (infer), L305-309 (__call__)

Signature

class Pipeline:
    def chat(self,
             prompt: str | Tuple[str, Union[Image, List[Image]]],
             session=None,
             gen_config: GenerationConfig = None,
             stream_response: bool = False,
             do_preprocess: bool = None,
             adapter_name: str = None,
             **kwargs) -> Session | Iterator:
        """Multi-turn chat with optional image input."""
        ...

    def __call__(self,
                 prompts: List[Tuple] | Tuple | List[str] | str,
                 gen_config=None, **kwargs):
        """Batch inference with optional image inputs."""
        ...

Import

from lmdeploy import pipeline
from lmdeploy.vl import load_image

I/O Contract

Inputs

Name	Type	Required	Description
prompt	str or Tuple[str, Image | List[Image]]	Yes	Text prompt, or a (text, image) or (text, [image1, image2]) tuple
session	Session	No	Session object for multi-turn conversations
gen_config	GenerationConfig	No	Generation parameters
stream_response	bool	No	Enable token streaming (default: False)

Outputs

Name	Type	Description
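session	Session	Updated session holding the latest response (chat mode); per the signature, an Iterator may be returned instead
response	Response or List[Response]	Generated text describing or analyzing the images (batch mode); a list is returned for a list of prompts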

Usage Examples

Single Image Inference

from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B',
                backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://example.com/photo.jpg')
response = pipe(('Describe this image in detail', image))
print(response.text)

Multi-turn Chat

from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B')
image = load_image('photo.jpg')

# First turn with image
session = pipe.chat(('What is in this image?', image))
print(session.response.text)

# Follow-up question (same session)
session = pipe.chat('How many objects are there?', session=session)
print(session.response.text)
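
The session-threading pattern above generalizes to any number of follow-up turns. The following is a minimal sketch under stated assumptions: `run_dialogue` is a hypothetical helper, `pipe` is assumed to be the object returned by `pipeline(...)`, and stream_response is left at its default of False so that each `chat()` call returns a session exposing `session.response.text`.

```python
from typing import Any, Iterable, List


def run_dialogue(pipe: Any, first_prompt: Any, follow_ups: Iterable[str],
                 gen_config: Any = None) -> List[str]:
    """Run a multi-turn chat and return the reply text of every turn."""
    # The first turn creates the session; the prompt may be a (text, image) tuple.
    session = pipe.chat(first_prompt, gen_config=gen_config)
    replies = [session.response.text]
    for question in follow_ups:
        # Passing the session back continues the same conversation.
        session = pipe.chat(question, session=session, gen_config=gen_config)
        replies.append(session.response.text)
    return replies
```

For example, `run_dialogue(pipe, ('What is in this image?', image), ['How many objects are there?'])` would reproduce the two turns shown above and return both replies as a list.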

Related Pages

Implements Principle

Principle:InternLM_Lmdeploy_Multimodal_Inference
