Implementation:OpenGVLab InternVL Run LLaVA CLI
| Knowledge Sources | |
|---|---|
| Domains | Inference, CLI, Multimodal |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This script provides a simple command-line interface for running single-image inference with a LLaVA model, supporting both local files and URLs as image sources.
Description
The run_llava.py script is a lightweight CLI tool for quick LLaVA inference on individual image-question pairs. It implements:
load_image function: Handles image loading from two sources:
- HTTP/HTTPS URLs: Downloads the image via requests.get and opens it from bytes
- Local file paths: Opens directly from disk
Both paths convert the image to RGB format.
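The two loading paths can be sketched as follows. This is a minimal illustration based on the description above, not a copy of the script: the URL prefix check, the request timeout, and the `raise_for_status()` call are assumptions added for this sketch.

```python
from io import BytesIO

import requests
from PIL import Image


def load_image(image_file: str) -> Image.Image:
    """Load an image from an HTTP(S) URL or a local path and return it as RGB."""
    if image_file.startswith(("http://", "https://")):
        # Remote source: download the bytes, then decode them in memory.
        # The timeout and status check are additions for this sketch.
        response = requests.get(image_file, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
    else:
        # Local source: open the file directly from disk.
        image = Image.open(image_file)
    # Both branches normalize the result to RGB.
    return image.convert("RGB")
```

Converting to RGB in one place means downstream preprocessing sees three channels regardless of whether the input was grayscale, palette-based, or carried an alpha channel.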
eval_model function: Runs the full inference pipeline:
- Loads the model via load_pretrained_model
- Prepends the appropriate image token to the query based on mm_use_im_start_end config
- Auto-detects conversation mode from the model name: "llava_llama_2" for LLaMA-2 models, "llava_v1" for v1 models, "mpt" for MPT models, and "llava_v0" as default
- Constructs the conversation prompt, tokenizes with image token insertion, and preprocesses the image
- Runs inference with temperature=0.2 sampling and 1024 max new tokens
- Prints the generated response to stdout
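The image-token step in the list above might look like the sketch below. The helper name `prepend_image_token` is hypothetical, and the three token strings are assumed placeholder values; the script presumably takes them from the package's constants module, so treat the literals here as illustrative only.

```python
# Assumed placeholder token strings for this sketch.
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def prepend_image_token(query: str, mm_use_im_start_end: bool) -> str:
    """Put the image placeholder on its own line ahead of the user's query."""
    if mm_use_im_start_end:
        # Models configured with start/end markers wrap the image token.
        image_part = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        # Otherwise the bare image token is used.
        image_part = DEFAULT_IMAGE_TOKEN
    return image_part + "\n" + query
```

The placeholder is later replaced by image features during tokenization, which is why it has to be present in the prompt text before the conversation template is applied.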
The script warns if the auto-inferred conversation mode differs from an explicitly provided --conv-mode argument.
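A sketch of the mode detection and the mismatch warning is shown here. The function names are hypothetical, and the substrings matched against the model name ("llama-2", "v1", "mpt") are assumptions inferred from the mode names. Returning the explicit value on a mismatch is also an assumption, chosen to be consistent with --conv-mode being described as an override.

```python
from typing import Optional


def infer_conv_mode(model_name: str) -> str:
    """Guess a conversation template from the model name (assumed substrings)."""
    name = model_name.lower()
    if "llama-2" in name:
        return "llava_llama_2"
    if "v1" in name:
        return "llava_v1"
    if "mpt" in name:
        return "mpt"
    return "llava_v0"  # default when nothing matches


def resolve_conv_mode(model_name: str, requested: Optional[str] = None) -> str:
    """Prefer an explicit mode; warn when it disagrees with the inferred one."""
    inferred = infer_conv_mode(model_name)
    if requested is not None and requested != inferred:
        print(
            f"[WARNING] auto-inferred conversation mode is {inferred}, "
            f"while --conv-mode is {requested}; using {requested}"
        )
        return requested
    return inferred
```

Because the checks run in order, a name that contains more than one marker resolves to the first match, so the order of the branches matters.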
Usage
Use this script for quick testing and demonstration of LLaVA model capabilities on individual images. It is not designed for batch evaluation.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/eval/run_llava.py
- Lines: 1-97
Signature
def load_image(image_file: str) -> Image.Image: ...
def eval_model(args: argparse.Namespace) -> None: ...
Import
from llava.eval.run_llava import load_image, eval_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model-path | str | Yes | Path to the pretrained LLaVA model |
| --model-base | str | No | Base model path for LoRA or projector-only models |
| --image-file | str | Yes | Path or URL to the input image |
| --query | str | Yes | The question to ask about the image |
| --conv-mode | str | No | Conversation template override (default: auto-detected from model name) |
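For reference, an argument parser matching the table above could be written as below. `build_parser` is a hypothetical helper, and the required/optional settings simply mirror the table; the script's own parser may use different defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags mirror the Inputs table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Single-image LLaVA inference")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, default=None)
    parser.add_argument("--image-file", type=str, required=True)
    parser.add_argument("--query", type=str, required=True)
    # None means "auto-detect the conversation mode from the model name".
    parser.add_argument("--conv-mode", type=str, default=None)
    return parser
```

argparse converts dashes in flag names to underscores, so a namespace built this way exposes model_path, model_base, image_file, query, and conv_mode, which is the shape eval_model would receive.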
Outputs
| Name | Type | Description |
|---|---|---|
| stdout | text | The model's generated text response printed to console |
Usage Examples
Basic Usage
# Run inference on a local image
# python internvl_chat_llava/llava/eval/run_llava.py \
# --model-path /path/to/llava-v1.5-7b \
# --image-file /path/to/image.jpg \
# --query "Describe this image in detail."
# Run inference on a URL image
# python internvl_chat_llava/llava/eval/run_llava.py \
# --model-path /path/to/llava-v1.5-7b \
# --image-file "https://example.com/image.jpg" \
# --query "What objects are in this image?"