

Implementation:OpenGVLab InternVL LLaVA CLI

From Leeroopedia
Revision as of 16:14, 16 February 2026 by Admin (Auto-imported from implementations/OpenGVLab_InternVL_LLaVA_CLI.md)


Knowledge Sources
Domains: Inference, CLI, LLaVA
Last Updated: 2026-02-07 14:00 GMT

Overview

Interactive command-line chat interface for LLaVA multimodal models, enabling conversational image understanding from the terminal.

Description

This script provides a terminal-based interactive demo for LLaVA models. It loads a pretrained model with load_pretrained_model(), optionally applying 8-bit or 4-bit quantization, and then enters an interactive loop. The load_image() helper accepts either a local file path or an HTTP/HTTPS URL.

The conversation template is inferred from the model name (llava_llama_2, llava_v1, mpt, or llava_v0) and can be overridden with --conv-mode. On the first user message, the image placeholder DEFAULT_IMAGE_TOKEN is prepended to the prompt, wrapped in optional start/end tokens when the model config enables them. Later messages are text-only, but they are appended to the same conversation, so the first turn (and its image token) stays in the prompt.

Generation runs under torch.inference_mode() and samples with the configured temperature and max_new_tokens, while HuggingFace's TextStreamer prints tokens to the terminal as they are produced. The conversation history is carried across turns, and a KeywordsStoppingCriteria halts generation when the conversation separator is emitted.
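The template selection and first-turn image prefixing described above can be sketched in plain Python. This is a minimal sketch, not the script's code: the helper names infer_conv_mode() and first_turn_text() are hypothetical, the substring checks and their order are assumptions consistent with the four templates listed, and the strings `<image>`, `<im_start>`, and `<im_end>` are assumed values for DEFAULT_IMAGE_TOKEN and its optional start/end tokens.

```python
# Sketch only: mirrors the auto-detection and image-prefixing behaviour described
# above. Helper names, substring checks, and token strings are assumptions.
from typing import Optional

DEFAULT_IMAGE_TOKEN = "<image>"        # assumed image placeholder token
DEFAULT_IM_START_TOKEN = "<im_start>"  # assumed optional start token
DEFAULT_IM_END_TOKEN = "<im_end>"      # assumed optional end token


def infer_conv_mode(model_name: str, override: Optional[str] = None) -> str:
    """Pick a conversation template from the model name; an override wins."""
    name = model_name.lower()
    if "llama-2" in name:
        inferred = "llava_llama_2"
    elif "v1" in name:
        inferred = "llava_v1"
    elif "mpt" in name:
        inferred = "mpt"
    else:
        inferred = "llava_v0"
    return override if override is not None else inferred


def first_turn_text(user_input: str, mm_use_im_start_end: bool) -> str:
    """Prefix the first user message with the image placeholder token(s)."""
    if mm_use_im_start_end:
        image_part = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        image_part = DEFAULT_IMAGE_TOKEN
    return image_part + "\n" + user_input


print(infer_conv_mode("llava-v1.5-7b"))                   # llava_v1
print(infer_conv_mode("llava-llama-2-13b-chat"))          # llava_llama_2
print(infer_conv_mode("llava-v1.5-7b", override="mpt"))  # mpt
print(repr(first_turn_text("What is this?", False)))      # '<image>\nWhat is this?'
```

Because the checks run in order, a name matching several patterns resolves to the first match in this sketch; passing an explicit override sidesteps the heuristic, which is the role --conv-mode plays in the script.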

Usage

Use this script for quick terminal-based testing of LLaVA models with a single image and a multi-turn text conversation, without setting up a web server.

Code Reference

Source Location

Signature

def load_image(image_file): ...
def main(args): ...
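The path-versus-URL dispatch inside load_image() can be illustrated with the standard library alone. In this sketch the function name read_image_bytes() is hypothetical, urllib.request stands in for whatever HTTP client the script actually uses, and the function returns raw bytes rather than a decoded image; a comment marks where a Pillow decode to RGB would go.

```python
# Sketch of load_image()'s path-vs-URL dispatch using only the standard library.
# Hypothetical helper: returns raw bytes instead of a decoded image so that the
# example runs without third-party packages.
import urllib.request
from pathlib import Path


def read_image_bytes(image_file: str, timeout: float = 10.0) -> bytes:
    """Return the raw bytes of an image given a local path or an http(s) URL."""
    if image_file.startswith(("http://", "https://")):
        # Remote image: download the whole response body.
        with urllib.request.urlopen(image_file, timeout=timeout) as resp:
            data = resp.read()
    else:
        # Local image: read the file from disk.
        data = Path(image_file).read_bytes()
    # With Pillow installed, an RGB image could be obtained via:
    #   Image.open(io.BytesIO(data)).convert("RGB")
    return data
```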

Import

# Standalone script, run directly:
# python -m llava.serve.cli --model-path <path> --image-file <path>

I/O Contract

Inputs

Name Type Required Description
--model-path str Yes Path to the pretrained LLaVA model
--image-file str Yes Path or URL to the input image
--model-base str No Base model path for LoRA models
--device str No Device to run on (default: "cuda")
--conv-mode str No Conversation template override
--temperature float No Sampling temperature (default: 0.2)
--max-new-tokens int No Maximum generated tokens (default: 512)
--load-8bit flag No Enable 8-bit quantization
--load-4bit flag No Enable 4-bit quantization
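The inputs above map naturally onto argparse. The parser below is a hypothetical reconstruction built only from the table (flag names, types, defaults, and required status); the script's own parser may define additional options or differ in detail.

```python
# Sketch: an argparse parser mirroring the Inputs table. Not the script's parser;
# flags, types, defaults, and required status are taken from the table only.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="LLaVA terminal chat (sketch)")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--image-file", type=str, required=True)  # path or URL
    parser.add_argument("--model-base", type=str, default=None)   # LoRA base model
    parser.add_argument("--device", type=str, default="cuda")
    parser.add_argument("--conv-mode", type=str, default=None)    # None = auto-detect
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--max-new-tokens", type=int, default=512)
    parser.add_argument("--load-8bit", action="store_true")
    parser.add_argument("--load-4bit", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--model-path", "liuhaotian/llava-v1.5-7b", "--image-file", "cat.jpg", "--load-4bit"]
)
# argparse turns dashes into underscores: --max-new-tokens -> args.max_new_tokens
print(args.max_new_tokens, args.temperature, args.load_4bit, args.load_8bit)
# 512 0.2 True False
```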

Outputs

Name Type Description
stdout text Streamed model responses printed to terminal

Usage Examples

Basic Usage

# From the command line:
# python -m llava.serve.cli \
#     --model-path liuhaotian/llava-v1.5-7b \
#     --image-file https://example.com/image.jpg \
#     --temperature 0.2

# Interactive session:
# user: What do you see in this image?
# assistant: I see a cat sitting on a windowsill...
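The interactive session above follows a simple loop. The sketch below replaces the model with a stub so it runs anywhere; the USER/ASSISTANT prompt format, the `</s>` stop string, and the names build_prompt(), truncate_at_stop(), chat(), and fake_generate() are all assumptions. truncate_at_stop() is a plain-string stand-in for KeywordsStoppingCriteria, the sketch returns each reply whole instead of streaming it through TextStreamer, and treating an empty line as the end of the session is likewise an assumption.

```python
# Sketch of the multi-turn chat loop with a stubbed model (not the script's code).
# Assumptions: USER/ASSISTANT prompt format, "</s>" stop string, "<image>" token,
# and that an empty input line ends the session.
from typing import Callable, Iterable, List, Optional, Tuple

STOP_STR = "</s>"        # assumed conversation separator used as the stop keyword
IMAGE_TOKEN = "<image>"  # assumed image placeholder token


def build_prompt(history: List[Tuple[str, Optional[str]]]) -> str:
    """Render (role, message) pairs; a None message leaves an open slot to fill."""
    parts = []
    for role, message in history:
        if message is None:
            parts.append(f"{role}:")
        elif role == "ASSISTANT":
            parts.append(f"{role}: {message}{STOP_STR}")
        else:
            parts.append(f"{role}: {message}")
    return " ".join(parts)


def truncate_at_stop(text: str, stop: str = STOP_STR) -> str:
    """String stand-in for keyword stopping: keep text before the first stop."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]


def chat(inputs: Iterable[str], generate: Callable[[str], str]) -> List[str]:
    history: List[Tuple[str, Optional[str]]] = []
    replies: List[str] = []
    image_pending = True
    for inp in inputs:
        if not inp:  # assumed: an empty line ends the session
            break
        if image_pending:  # attach the image token to the first turn only
            inp = IMAGE_TOKEN + "\n" + inp
            image_pending = False
        history.append(("USER", inp))
        history.append(("ASSISTANT", None))  # open slot for the reply
        reply = truncate_at_stop(generate(build_prompt(history))).strip()
        history[-1] = ("ASSISTANT", reply)   # keep the reply for later turns
        replies.append(reply)
    return replies


def fake_generate(prompt: str) -> str:
    """Stub model: reports what it saw, then emits the stop string and junk."""
    turns = prompt.count("USER:")
    images = prompt.count(IMAGE_TOKEN)
    return f"turn {turns}, images {images}{STOP_STR} ignored text"


replies = chat(["What do you see?", "What color is it?", ""], fake_generate)
print(replies)
# ['turn 1, images 1', 'turn 2, images 1']
```

The stub reports how many user turns and image tokens it saw: the second reply shows that history accumulates while the image token appears only once, in the first turn, and that text after the stop string is discarded.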
