Implementation:OpenGVLab InternVL Run LLaVA CLI

From Leeroopedia


Knowledge Sources
Domains: Inference, CLI, Multimodal
Last Updated: 2026-02-07 14:00 GMT

Overview

This script provides a simple command-line interface for running single-image inference with a LLaVA model, supporting both local files and URLs as image sources.

Description

The run_llava.py script is a lightweight CLI tool for quick LLaVA inference on individual image-question pairs. It implements:

load_image function: Handles image loading from two sources:

  • HTTP/HTTPS URLs: Downloads the image via requests.get and opens it from bytes
  • Local file paths: Opens directly from disk

Both paths convert the image to RGB format.
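The two loading paths can be sketched as follows. This is a minimal illustration, not the script's verbatim source: the helper name is_remote, the request timeout, and the raise_for_status call are assumptions added here, and the third-party imports are kept inside the function so the URL check works without Pillow or requests installed.

```python
from io import BytesIO


def is_remote(image_file: str) -> bool:
    """Return True when the source looks like an HTTP/HTTPS URL."""
    # Hypothetical helper; the original may check the prefix inline.
    return image_file.startswith(("http://", "https://"))


def load_image(image_file: str):
    """Load an image from a URL or a local path and convert it to RGB."""
    # Imported lazily so is_remote() stays usable without these packages.
    from PIL import Image

    if is_remote(image_file):
        import requests

        # timeout and raise_for_status are additions for this sketch.
        response = requests.get(image_file, timeout=30)
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert("RGB")

    # Local path: open straight from disk.
    return Image.open(image_file).convert("RGB")
```

Either branch returns a PIL image in RGB mode, so the later preprocessing step does not need to know where the image came from.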

eval_model function: Runs the full inference pipeline:

  1. Loads the model via load_pretrained_model
  2. Prepends the appropriate image token to the query based on mm_use_im_start_end config
  3. Auto-detects conversation mode from the model name: "llava_llama_2" for LLaMA-2 models, "llava_v1" for v1 models, "mpt" for MPT models, and "llava_v0" as default
  4. Constructs the conversation prompt, tokenizes with image token insertion, and preprocesses the image
  5. Runs generation with sampling enabled at temperature=0.2 and a limit of 1024 new tokens
  6. Prints the generated response to stdout
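Steps 2 and 3 are plain string logic and can be sketched without loading a model. The token strings and the substrings matched against the model name below are assumptions based on common LLaVA conventions; the real values come from the llava package, and the function names prepend_image_token and infer_conv_mode are hypothetical.

```python
# Assumed token strings; the real values live in the llava package's constants.
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def prepend_image_token(query: str, mm_use_im_start_end: bool) -> str:
    """Put the image placeholder on its own line before the user's query."""
    if mm_use_im_start_end:
        # Wrap the placeholder in explicit start/end markers.
        image_part = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        image_part = DEFAULT_IMAGE_TOKEN
    return image_part + "\n" + query


def infer_conv_mode(model_name: str) -> str:
    """Pick a conversation template from the model name (assumed substrings)."""
    name = model_name.lower()
    if "llama-2" in name:
        return "llava_llama_2"
    if "v1" in name:
        return "llava_v1"
    if "mpt" in name:
        return "mpt"
    return "llava_v0"  # default when nothing matches
```

The order of the checks matters in this sketch: a name that mentions both LLaMA-2 and v1 resolves to "llava_llama_2" because that test runs first.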

The script prints a warning if the auto-inferred conversation mode differs from an explicitly provided --conv-mode argument. Because --conv-mode is an override, the explicit value is the one used in that case.
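The mismatch check can be sketched as a small resolver. It assumes the explicit value wins, which is what the description of --conv-mode as an override suggests. The function name resolve_conv_mode, the warning wording, and the choice to write the warning to stderr are all assumptions of this sketch.

```python
import sys
from typing import Optional


def resolve_conv_mode(inferred: str, requested: Optional[str]) -> str:
    """Return the conversation mode to use, warning when the two disagree."""
    if requested is not None and requested != inferred:
        # Warn but honour the explicit request (assumed behaviour).
        print(
            f"[WARNING] auto-inferred conversation mode is {inferred}, "
            f"while --conv-mode is {requested}; using {requested}",
            file=sys.stderr,
        )
        return requested
    # No override given, or the override matches the inferred mode.
    return inferred
```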

Usage

Use this script for quick testing and demonstration of LLaVA model capabilities on individual images. It is not designed for batch evaluation.

Code Reference

Source Location

Signature

def load_image(image_file: str) -> Image.Image: ...

def eval_model(args: argparse.Namespace) -> None: ...

Import

from llava.eval.run_llava import load_image, eval_model

I/O Contract

Inputs

Name | Type | Required | Description
--model-path | str | Yes | Path to the pretrained LLaVA model
--model-base | str | No | Base model path for LoRA or projector-only models
--image-file | str | Yes | Path or URL to the input image
--query | str | Yes | The question to ask about the image
--conv-mode | str | No | Conversation template override (default: auto-detected from model name)
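A parser matching the inputs listed above might look like the sketch below. The flag names come from the table; the description string, the defaults of None, and the use of required=True are assumptions chosen to mirror the Required column, not a copy of the script's parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (a sketch, not the original)."""
    parser = argparse.ArgumentParser(description="Single-image LLaVA inference")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, default=None)
    parser.add_argument("--image-file", type=str, required=True)
    parser.add_argument("--query", type=str, required=True)
    parser.add_argument("--conv-mode", type=str, default=None)
    return parser


# argparse turns dashes into underscores, so eval_model would read
# args.model_path, args.image_file, args.query, and so on.
args = build_parser().parse_args([
    "--model-path", "/path/to/llava-v1.5-7b",
    "--image-file", "/path/to/image.jpg",
    "--query", "Describe this image in detail.",
])
```

Leaving --conv-mode at None is what lets the script tell "no override given" apart from an explicit choice.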

Outputs

Name | Type | Description
stdout | text | The model's generated text response printed to console
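Since the response is only written to stdout, a caller that needs it as a string has to capture the process output. The following is one possible wrapper, assuming the script is launched from the repository root at the path used in the usage examples; build_command and run_and_capture are hypothetical helpers, and any log lines the script also prints to stdout would be captured along with the answer.

```python
import subprocess
import sys
from typing import List, Optional

# Script path as used in this page's usage examples, relative to the repo root.
SCRIPT = "internvl_chat_llava/llava/eval/run_llava.py"


def build_command(model_path: str, image_file: str, query: str,
                  conv_mode: Optional[str] = None) -> List[str]:
    """Assemble the argv list for one inference run."""
    cmd = [
        sys.executable, SCRIPT,
        "--model-path", model_path,
        "--image-file", image_file,
        "--query", query,
    ]
    if conv_mode is not None:
        cmd += ["--conv-mode", conv_mode]
    return cmd


def run_and_capture(cmd: List[str]) -> str:
    """Run a command and return everything it wrote to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or quotation marks.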

Usage Examples

Basic Usage

# Run inference on a local image
python internvl_chat_llava/llava/eval/run_llava.py \
    --model-path /path/to/llava-v1.5-7b \
    --image-file /path/to/image.jpg \
    --query "Describe this image in detail."

# Run inference on a URL image
python internvl_chat_llava/llava/eval/run_llava.py \
    --model-path /path/to/llava-v1.5-7b \
    --image-file "https://example.com/image.jpg" \
    --query "What objects are in this image?"
