Implementation:OpenGVLab InternVL LLaVA CLI

Knowledge Sources	OpenGVLab_InternVL
Domains	Inference, CLI, LLaVA
Last Updated	2026-02-07 14:00 GMT

Overview

Interactive command-line chat interface for LLaVA multimodal models, enabling conversational image understanding from the terminal.

Description

This script provides a terminal-based interactive demo for LLaVA models. It loads a pretrained model via load_pretrained_model(), optionally with 8-bit or 4-bit quantization, then enters an interactive chat loop. The load_image() helper accepts both local file paths and HTTP/HTTPS URLs.

The conversation mode is auto-detected from the model name, selecting one of the llava_llama_2, llava_v1, mpt, or llava_v0 templates; --conv-mode overrides the detected value. On the first user message, the image is prepended to the prompt using DEFAULT_IMAGE_TOKEN, wrapped in start/end tokens when the model config enables them. Subsequent messages are text-only and share the same conversation context.

Generation runs under torch.inference_mode() and samples with the configured temperature and max_new_tokens. HuggingFace's TextStreamer prints tokens as they are produced, and a KeywordsStoppingCriteria halts generation at the conversation separator. The conversation history is maintained across turns.
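The template selection and first-turn prompt construction described above can be sketched as follows. This is a minimal illustration, not the script's code: the function names infer_conv_mode and build_user_message are hypothetical, the substrings matched against the model name are assumptions, and the token strings are restated locally on the assumption that they match llava/constants.py.

```python
# Token strings assumed to mirror llava/constants.py; restated here so the
# sketch runs on its own.
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def infer_conv_mode(model_name, override=None):
    """Pick a conversation template from the model name (hypothetical helper).

    The substrings checked below are assumptions about how detection works.
    """
    if override is not None:
        # Corresponds to passing --conv-mode on the command line.
        return override
    name = model_name.lower()
    if "llama-2" in name:
        return "llava_llama_2"
    if "v1" in name:
        return "llava_v1"
    if "mpt" in name:
        return "mpt"
    return "llava_v0"


def build_user_message(text, is_first_turn, use_im_start_end=False):
    """Prepend the image placeholder on the first turn only (hypothetical helper)."""
    if not is_first_turn:
        return text
    if use_im_start_end:
        image_part = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        image_part = DEFAULT_IMAGE_TOKEN
    return image_part + "\n" + text
```

Under these assumptions, a model named llava-v1.5-7b would select the llava_v1 template, and only the first user message would carry the image placeholder.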

Usage

Use this script for quick terminal-based testing of LLaVA models with a single image and multi-turn text conversation, without needing to set up a web server.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: internvl_chat_llava/llava/serve/cli.py
Lines: 1-125

Signature

def load_image(image_file): ...
def main(args): ...
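
A helper with the load_image signature might look like the sketch below. It assumes that URLs are recognized by their http:// or https:// prefix and that images are converted to RGB; neither detail is confirmed by this page. It uses urllib from the standard library, whereas the actual helper may use a different HTTP client. The is_remote function is a hypothetical addition for illustration.

```python
from io import BytesIO
from urllib.request import urlopen


def is_remote(image_file):
    """Return True when the argument looks like an HTTP/HTTPS URL (hypothetical helper)."""
    return image_file.startswith(("http://", "https://"))


def load_image(image_file):
    """Load an image from a URL or a local path and return it in RGB mode."""
    # Pillow is a third-party dependency; it is imported lazily so that
    # is_remote() stays usable without it.
    from PIL import Image

    if is_remote(image_file):
        # Download the bytes first, then decode them from memory.
        with urlopen(image_file, timeout=30) as response:
            data = response.read()
        return Image.open(BytesIO(data)).convert("RGB")
    return Image.open(image_file).convert("RGB")
```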

Import

# Standalone script, run directly:
# python -m llava.serve.cli --model-path <path> --image-file <path>

I/O Contract

Inputs

Name	Type	Required	Description
--model-path	str	Yes	Path to the pretrained LLaVA model
--image-file	str	Yes	Path or URL to the input image
--model-base	str	No	Base model path for LoRA models
--device	str	No	Device to run on (default: "cuda")
--conv-mode	str	No	Conversation template override
--temperature	float	No	Sampling temperature (default: 0.2)
--max-new-tokens	int	No	Maximum generated tokens (default: 512)
--load-8bit	flag	No	Enable 8-bit quantization
--load-4bit	flag	No	Enable 4-bit quantization
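
The argument table above maps naturally onto an argparse parser. The sketch below is reconstructed from the table alone, so the required flags, defaults, and types follow the table rather than the script itself; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the Inputs table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="LLaVA command-line chat")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--image-file", type=str, required=True)
    parser.add_argument("--model-base", type=str, default=None)
    parser.add_argument("--device", type=str, default="cuda")
    parser.add_argument("--conv-mode", type=str, default=None)
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--max-new-tokens", type=int, default=512)
    # Quantization switches are plain flags: absent means False.
    parser.add_argument("--load-8bit", action="store_true")
    parser.add_argument("--load-4bit", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

argparse converts dashes in flag names to underscores, so --max-new-tokens is read back as args.max_new_tokens.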

Outputs

Name	Type	Description
stdout	text	Streamed model responses printed to terminal

Usage Examples

Basic Usage

# From command line:
# python -m llava.serve.cli \
#     --model-path liuhaotian/llava-v1.5-7b \
#     --image-file https://example.com/image.jpg \
#     --temperature 0.2

# Interactive session:
# user: What do you see in this image?
# assistant: I see a cat sitting on a windowsill...
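
The multi-turn behavior of the interactive session can be illustrated with a model-free loop. This is a rough sketch under stated assumptions: chat_loop, read_line, and generate are hypothetical names, the generate callable stands in for the real model call, and ending the session on empty input or end-of-file is an assumed convention.

```python
def chat_loop(read_line, generate):
    """Run a multi-turn session and return the (role, text) history.

    read_line: callable returning the next user input (e.g. wrapping input()).
    generate:  callable taking the history so far and returning a reply;
               a stand-in for the real model call.
    """
    history = []
    while True:
        try:
            user_text = read_line()
        except EOFError:
            user_text = ""
        if not user_text:
            # Assumed convention: empty input or EOF ends the session.
            break
        history.append(("user", user_text))
        # The full history is passed on every turn, so earlier exchanges
        # remain part of the context.
        reply = generate(history)
        history.append(("assistant", reply))
    return history
```

In a terminal, this could be driven with chat_loop(lambda: input("user: "), my_generate), where my_generate is whatever function wraps the model.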

Related Pages

Principle:OpenGVLab_InternVL_Model_Inference_Loading
