
Implementation:Haotian Liu LLaVA CLI Main

From Leeroopedia

Overview

A concrete tool for interactive, multi-turn visual chat from the command line. The main() function loads a LLaVA model, preprocesses an image, and enters an interactive conversation loop.

Source

  • File: llava/serve/cli.py
  • Lines: L27-111

Signature

def main(args) -> None:
    """
    Run interactive CLI chat with a LLaVA model.

    Args (via argparse namespace):
        args.model_path: str          # HuggingFace ID or local path to model
        args.model_base: str          # Base model path (for LoRA adapters)
        args.image_file: str          # URL or local path to image
        args.device: str = 'cuda'     # Device to load model on
        args.conv_mode: str           # Conversation template (auto-detected from model name)
        args.temperature: float = 0.2 # Sampling temperature
        args.max_new_tokens: int = 512 # Maximum tokens to generate
        args.load_8bit: bool          # Enable 8-bit quantization
        args.load_4bit: bool          # Enable 4-bit quantization
    """

CLI Usage

python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-13b \
    --image-file image.jpg

With 4-bit quantization:

python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-13b \
    --image-file image.jpg \
    --load-4bit

With LoRA adapter:

python -m llava.serve.cli \
    --model-path /path/to/lora-adapter \
    --model-base liuhaotian/llava-v1.5-13b \
    --image-file image.jpg

Import

from llava.serve.cli import main
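Because main() takes an argparse namespace rather than keyword arguments, calling it programmatically means building that namespace by hand. The sketch below is illustrative, not the script's actual setup: field names follow the signature documented above, and the real script may define additional flags.

```python
from argparse import Namespace

# Mirror the argparse namespace that llava.serve.cli.main() expects.
# Field names follow the documented signature; values for optional
# fields are the documented defaults.
args = Namespace(
    model_path="liuhaotian/llava-v1.5-13b",
    model_base=None,          # set when model_path is a LoRA adapter
    image_file="image.jpg",
    device="cuda",
    conv_mode=None,           # auto-detected from the model name
    temperature=0.2,
    max_new_tokens=512,
    load_8bit=False,
    load_4bit=False,
)

# Actually running the chat loop requires the model weights and a GPU:
# from llava.serve.cli import main
# main(args)
```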

Inputs

Parameter       Type   Required  Description
model_path      str    Yes       HuggingFace model ID or local checkpoint path
model_base      str    For LoRA  Base model path when using LoRA adapters
image_file      str    Yes       Path to local image file or HTTP URL
device          str    No        Device for model loading (default: cuda)
conv_mode       str    No        Conversation template (auto-detected from model name)
temperature     float  No        Sampling temperature (default: 0.2)
max_new_tokens  int    No        Max tokens to generate (default: 512)
load_8bit       bool   No        Enable 8-bit quantization
load_4bit       bool   No        Enable 4-bit NF4 quantization
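The table above maps directly onto an argparse parser. The sketch below mirrors the documented inputs and defaults using only the standard library; flag names use the dashed form from the CLI usage examples, and this is an assumption about the shape of the parser, not the script's actual definition.

```python
import argparse

# Illustrative parser mirroring the documented inputs and defaults.
# argparse converts dashes to underscores, so --model-path becomes
# args.model_path, matching the signature above.
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--model-base", type=str, default=None)
parser.add_argument("--image-file", type=str, required=True)
parser.add_argument("--device", type=str, default="cuda")
parser.add_argument("--conv-mode", type=str, default=None)
parser.add_argument("--temperature", type=float, default=0.2)
parser.add_argument("--max-new-tokens", type=int, default=512)
parser.add_argument("--load-8bit", action="store_true")
parser.add_argument("--load-4bit", action="store_true")

args = parser.parse_args([
    "--model-path", "liuhaotian/llava-v1.5-13b",
    "--image-file", "image.jpg",
])
```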

Outputs

Interactive streaming text responses printed to the terminal. The user interacts via stdin, and the model's responses are streamed token-by-token to stdout.

Description

The main() function executes the following sequence:

  1. Load model -- Calls load_pretrained_model() to load the tokenizer, model, image processor, and determine context length.
  2. Load and process image -- Loads the image from a file path or URL, converts to RGB, and preprocesses it using process_images().
  3. Auto-detect conversation mode -- Determines the appropriate conversation template based on the model name (e.g., llava_v1, llava_llama_2, mistral_instruct).
  4. Enter interactive loop:
    • Read user input from stdin
    • If first turn, prepend <image>\n to the user message
    • Append user message to the Conversation object
    • Generate the full prompt via conv.get_prompt()
    • Tokenize with tokenizer_image_token()
    • Call model.generate() with TextStreamer for streaming output
    • Append assistant response to the conversation
    • Repeat until the user exits
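The loop in step 4 can be sketched as plain Python. This is a minimal, dependency-free sketch: build_prompt and the generate_fn callback are hypothetical stand-ins for LLaVA's real utilities (conv.get_prompt(), tokenizer_image_token(), model.generate()), kept only to show the first-turn <image> prepending and the role-based prompt assembly.

```python
def build_prompt(history, system="A chat between a user and an assistant."):
    """Assemble the full prompt from (role, message) turns.
    Stand-in for conv.get_prompt(); the real template varies by conv_mode."""
    parts = [system]
    for role, msg in history:
        parts.append(f"{role}: {msg}")
    parts.append("ASSISTANT:")
    return " ".join(parts)

def chat_loop(generate_fn, user_inputs):
    """Drive the multi-turn loop; generate_fn stands in for tokenization
    plus model.generate() with a streamer."""
    history = []
    for turn, text in enumerate(user_inputs):
        if turn == 0:
            # The image token is prepended only on the first turn.
            text = "<image>\n" + text
        history.append(("USER", text))
        prompt = build_prompt(history)
        reply = generate_fn(prompt)
        history.append(("ASSISTANT", reply))
    return history

# Exercise the loop with a stubbed model.
history = chat_loop(lambda prompt: "stub reply",
                    ["What is in this image?", "Describe more."])
```

In the real script, user_inputs comes from stdin one line at a time and the reply is streamed token-by-token rather than returned whole, but the conversation-state bookkeeping follows this shape.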

Metadata

Field Value
Knowledge Sources Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains User_Interface, Interactive_Inference
Last Updated 2026-02-13 14:00 GMT
