Implementation:InternLM Lmdeploy Pipeline Chat VLM
| Knowledge Sources | |
|---|---|
| Domains | Vision_Language_Models, Multimodal_AI |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Concrete tool for executing vision-language model inference with image-text inputs through the Pipeline chat interface provided by the LMDeploy library.
Description
The Pipeline.chat() method and Pipeline.__call__() with tuple inputs provide VLM inference capabilities. The chat method supports multi-turn conversations with session state, while __call__ handles batch VLM inference. Both accept prompts as tuples of (text, image) or (text, [image1, image2]).
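The three prompt shapes described above can be illustrated with a small sketch. `normalize_prompt` is a hypothetical helper written for this page, not part of LMDeploy; it only shows how a caller might tell the shapes apart. Plain strings stand in for PIL images so the snippet runs without the library.

```python
from typing import Any, List, Tuple, Union

# Hypothetical alias covering the documented shapes:
# "text", ("text", image), or ("text", [image1, image2]).
Prompt = Union[str, Tuple[str, Any]]


def normalize_prompt(prompt: Prompt) -> Tuple[str, List[Any]]:
    """Return (text, images) for any of the three documented prompt shapes."""
    if isinstance(prompt, str):
        # Text-only prompt: no images attached.
        return prompt, []
    text, images = prompt
    if isinstance(images, (list, tuple)):
        # (text, [image1, image2]): multi-image prompt.
        return text, list(images)
    # (text, image): single-image prompt.
    return text, [images]
```

A helper like this is only useful on the caller's side, for example to count images before submitting a request; the pipeline itself is passed the original tuple unchanged.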
Usage
Use Pipeline.__call__() for batch VLM inference and Pipeline.chat() for interactive multi-turn VLM conversations. Pass images as PIL Image objects loaded via load_image().
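As a minimal sketch of the batch path, the helpers below pair one question with several images and collect the answers. `build_batch` and `describe_all` are hypothetical names introduced here, and the sketch assumes that a batched call returns one response per prompt, in input order, each exposing `.text`. The pipeline is passed in as a plain callable so the code can be exercised with a stub.

```python
from typing import Any, Callable, List, Sequence, Tuple


def build_batch(question: str, images: Sequence[Any]) -> List[Tuple[str, Any]]:
    """Pair one question with each image: one (text, image) tuple per image."""
    return [(question, image) for image in images]


def describe_all(pipe: Callable[[List[Tuple[str, Any]]], Sequence[Any]],
                 question: str,
                 images: Sequence[Any]) -> List[str]:
    """Issue a single batched call and collect the text of each response.

    Assumption: `pipe` returns responses in the same order as the prompts.
    """
    responses = pipe(build_batch(question, images))
    return [response.text for response in responses]
```

With LMDeploy installed, `pipe` would be the object returned by `pipeline(...)` and `images` a list produced by `load_image()`, as in the usage examples on this page.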
Code Reference
Source Location
- Repository: lmdeploy
- File: lmdeploy/pipeline.py
- Lines: L169-227 (chat), L83-122 (infer), L305-309 (__call__)
Signature
class Pipeline:
    def chat(self,
             prompt: str | Tuple[str, Union[Image, List[Image]]],
             session=None,
             gen_config: GenerationConfig = None,
             stream_response: bool = False,
             do_preprocess: bool = None,
             adapter_name: str = None,
             **kwargs) -> Session | Iterator:
        """Multi-turn chat with optional image input."""
        ...

    def __call__(self,
                 prompts: List[Tuple] | Tuple | List[str] | str,
                 gen_config=None, **kwargs):
        """Batch inference with optional image inputs."""
        ...
Import
from lmdeploy import pipeline
from lmdeploy.vl import load_image
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str or Tuple[str, Image or List[Image]] | Yes | Text prompt, (text, image) tuple, or (text, [image1, image2]) tuple |
| session | Session | No | Session object for multi-turn conversations |
| gen_config | GenerationConfig | No | Generation parameters |
| stream_response | bool | No | Enable token streaming (default: False) |
| do_preprocess | bool | No | Whether to preprocess the prompt with the chat template |
| adapter_name | str | No | Name of the adapter to use for this request |
Outputs
| Name | Type | Description |
|---|---|---|
| session | Session or Iterator | Updated session with response (chat mode); the signature also allows an Iterator return |
| response | Response or List[Response] | Generated text describing/analyzing images (batch mode) |
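Because batch mode may return either one response or a list of them, calling code sometimes needs to handle both. The sketch below shows one way to do that; `response_texts` is a hypothetical helper, and it assumes only that each response object exposes a `.text` attribute, as in the usage examples on this page.

```python
from typing import Any, List


def response_texts(result: Any) -> List[str]:
    """Return generated texts whether `result` is one response or a list."""
    if isinstance(result, (list, tuple)):
        # Batch input: one response per prompt.
        return [item.text for item in result]
    # Single prompt: wrap the lone response so callers always get a list.
    return [result.text]
```

Normalizing to a list keeps downstream code (logging, evaluation loops) independent of whether one prompt or many were submitted.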
Usage Examples
Single Image Inference
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image
pipe = pipeline('OpenGVLab/InternVL2-8B',
                backend_config=TurbomindEngineConfig(session_len=8192))
image = load_image('https://example.com/photo.jpg')
response = pipe(('Describe this image in detail', image))
print(response.text)
Multi-turn Chat
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('OpenGVLab/InternVL2-8B')
image = load_image('photo.jpg')
# First turn with image
session = pipe.chat(('What is in this image?', image))
print(session.response.text)
# Follow-up question (same session)
session = pipe.chat('How many objects are there?', session=session)
print(session.response.text)
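The two-turn pattern above generalizes to any number of follow-up questions. The following is a minimal sketch under the same assumptions as the example: `pipe.chat()` returns a session whose `response.text` holds the latest answer, and passing that session back continues the conversation. `run_dialogue` is a hypothetical helper, not an LMDeploy function.

```python
from typing import Any, Iterable, List


def run_dialogue(pipe: Any, first_prompt: Any, follow_ups: Iterable[str]) -> List[str]:
    """Ask an opening question (optionally with an image), then text-only follow-ups."""
    # First turn: may be a plain string or a (text, image) tuple.
    session = pipe.chat(first_prompt)
    answers = [session.response.text]
    for question in follow_ups:
        # Reusing the session keeps the earlier turns in context.
        session = pipe.chat(question, session=session)
        answers.append(session.response.text)
    return answers
```

For instance, `run_dialogue(pipe, ('What is in this image?', image), ['How many objects are there?'])` would reproduce the two turns shown above and return both answers as a list.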