Implementation:OpenGVLab InternVL Run LLaVA CLI

Knowledge Sources	OpenGVLab_InternVL
Domains	Inference, CLI, Multimodal
Last Updated	2026-02-07 14:00 GMT

Overview

This script provides a simple command-line interface for running single-image inference with a LLaVA model, supporting both local files and URLs as image sources.

Description

The run_llava.py script is a lightweight CLI tool for quick LLaVA inference on individual image-question pairs. It implements:

load_image function: Handles image loading from two sources:

HTTP/HTTPS URLs: Downloads the image via requests.get and opens it from bytes
Local file paths: Opens directly from disk

Both paths convert the image to RGB format.
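The two loading paths can be sketched as follows. This is a minimal illustration rather than the script's exact code: the is_url helper is a hypothetical name introduced here, and the timeout and raise_for_status() call are defensive additions that the description above does not mention. Pillow and requests are imported lazily so that the URL check can be exercised without either package installed.

```python
from io import BytesIO


def is_url(image_file: str) -> bool:
    """Return True when the source looks like an HTTP/HTTPS URL (hypothetical helper)."""
    return image_file.startswith(("http://", "https://"))


def load_image(image_file: str):
    """Load an image from a URL or a local path and convert it to RGB."""
    from PIL import Image  # Pillow

    if is_url(image_file):
        import requests

        # Assumption: timeout and status check are additions for robustness.
        response = requests.get(image_file, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    else:
        image = Image.open(image_file)
    # Both branches normalize to three-channel RGB.
    return image.convert("RGB")
```

Converting to RGB in one place keeps the downstream image preprocessing independent of the source format (for example, palette or RGBA images).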

eval_model function: Runs the full inference pipeline:

Loads the model via load_pretrained_model
Prepends the appropriate image token to the query based on mm_use_im_start_end config
Auto-detects conversation mode from the model name: "llava_llama_2" for LLaMA-2 models, "llava_v1" for v1 models, "mpt" for MPT models, and "llava_v0" as default
Constructs the conversation prompt, tokenizes with image token insertion, and preprocesses the image
Generates the response with sampling enabled at temperature=0.2 and a limit of 1024 new tokens
Prints the generated response to stdout
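The image-token step in the list above can be illustrated with a short sketch. The token strings below are assumed values chosen for illustration; the script takes its constants from the LLaVA package, and the add_image_token function name is hypothetical.

```python
# Assumed token strings; the real values come from the LLaVA constants module.
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def add_image_token(query: str, mm_use_im_start_end: bool) -> str:
    """Prepend the image placeholder to the query (hypothetical helper)."""
    if mm_use_im_start_end:
        # Wrap the placeholder in start/end markers when the model config asks for them.
        image_token = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        image_token = DEFAULT_IMAGE_TOKEN
    return image_token + "\n" + query


print(add_image_token("Describe this image in detail.", mm_use_im_start_end=False))
```

The placeholder is later replaced by the image features during tokenization, so the query text itself stays unchanged.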

The script warns if the auto-inferred conversation mode differs from an explicitly provided --conv-mode argument.
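A sketch of the conversation-mode selection follows. The substrings checked ("llama-2", "v1", "mpt"), the case-insensitive match, and the function names are assumptions made for illustration. The sketch also assumes that an explicit --conv-mode value takes precedence after the warning, which the description above does not state.

```python
from typing import Optional


def infer_conv_mode(model_name: str) -> str:
    """Guess the conversation template from the model name (assumed substrings)."""
    name = model_name.lower()
    if "llama-2" in name:
        return "llava_llama_2"
    if "v1" in name:
        return "llava_v1"
    if "mpt" in name:
        return "mpt"
    return "llava_v0"


def resolve_conv_mode(model_name: str, requested: Optional[str] = None) -> str:
    """Combine the inferred mode with an optional explicit override."""
    inferred = infer_conv_mode(model_name)
    if requested is not None and requested != inferred:
        # Assumption: the explicit value wins; the mismatch is only reported.
        print(f"[WARNING] inferred conversation mode is {inferred}, "
              f"but --conv-mode is {requested}; using {requested}")
        return requested
    return inferred
```

Because the checks run in order, a name that matches several substrings resolves to the first matching template.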

Usage

Use this script for quick testing and demonstration of LLaVA model capabilities on individual images. It is not designed for batch evaluation: each invocation loads the model and answers a single image-question pair.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: internvl_chat_llava/llava/eval/run_llava.py
Lines: 1-97

Signature

def load_image(image_file: str) -> Image.Image: ...

def eval_model(args: argparse.Namespace) -> None: ...

Import

from llava.eval.run_llava import load_image, eval_model
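Since eval_model takes an argparse.Namespace, it can in principle be called from Python without going through the command line. The sketch below builds such a namespace by hand; the attribute names follow argparse's usual conversion of flags such as --model-path to model_path, and the paths are placeholders. The call itself is left commented out because it requires the llava package and model weights.

```python
import argparse

# Placeholder values; replace with a real checkpoint and image.
args = argparse.Namespace(
    model_path="/path/to/llava-v1.5-7b",
    model_base=None,
    image_file="/path/to/image.jpg",
    query="Describe this image in detail.",
    conv_mode=None,  # None lets the script auto-detect the template
)

# Requires the llava package and model weights:
# from llava.eval.run_llava import eval_model
# eval_model(args)
print(args.query)
```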

I/O Contract

Inputs

Name	Type	Required	Description
--model-path	str	Yes	Path to the pretrained LLaVA model
--model-base	str	No	Base model path for LoRA or projector-only models
--image-file	str	Yes	Path or URL to the input image
--query	str	Yes	The question to ask about the image
--conv-mode	str	No	Conversation template override (default: auto-detected from model name)
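An argument parser matching the table above might look like the sketch below. It mirrors the table rather than the script's source, so the defaults, the help strings, and the build_parser name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the inputs table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Single-image LLaVA inference")
    parser.add_argument("--model-path", type=str, required=True,
                        help="Path to the pretrained LLaVA model")
    parser.add_argument("--model-base", type=str, default=None,
                        help="Base model path for LoRA or projector-only models")
    parser.add_argument("--image-file", type=str, required=True,
                        help="Path or URL to the input image")
    parser.add_argument("--query", type=str, required=True,
                        help="The question to ask about the image")
    parser.add_argument("--conv-mode", type=str, default=None,
                        help="Conversation template override")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--model-path", "/path/to/llava-v1.5-7b",
    "--image-file", "https://example.com/image.jpg",
    "--query", "What objects are in this image?",
])
print(args.image_file)
```

Leaving --conv-mode at None is what allows the auto-detection described earlier to take effect.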

Outputs

Name	Type	Description
stdout	text	The model's generated text response printed to console

Usage Examples

Basic Usage

# Run inference on a local image
python internvl_chat_llava/llava/eval/run_llava.py \
    --model-path /path/to/llava-v1.5-7b \
    --image-file /path/to/image.jpg \
    --query "Describe this image in detail."

# Run inference on a URL image
python internvl_chat_llava/llava/eval/run_llava.py \
    --model-path /path/to/llava-v1.5-7b \
    --image-file "https://example.com/image.jpg" \
    --query "What objects are in this image?"

Related Pages

Principle:OpenGVLab_InternVL_Model_Inference_Loading
