
Principle:Haotian Liu LLaVA CLI Interactive Chat

From Leeroopedia

Overview

Interactive command-line interface pattern for multi-turn visual conversation with streaming text output.

Description

The CLI chat provides a terminal-based REPL (Read-Eval-Print Loop) for multi-turn conversation with a LLaVA model about a single image. Unlike the distributed controller-worker-Gradio stack, the CLI loads the model directly into the process.
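The REPL structure described above can be sketched as a small loop that reads a line, calls the model, prints the reply, and repeats. This is a minimal stand-in, not the repository's actual code: `generate_reply` is a hypothetical placeholder for the direct in-process model call, and `read_input` is parameterized only so the loop can be exercised without a terminal.

```python
def chat_repl(generate_reply, read_input=input):
    """Minimal REPL skeleton for a CLI chat.

    `generate_reply` is a stand-in for the in-process model call;
    the real CLI loads a LLaVA model directly instead of calling
    out to a controller or worker.
    """
    history = []
    while True:
        try:
            user_input = read_input("USER: ")
        except EOFError:
            break
        if not user_input.strip():
            break  # empty line ends the session
        history.append(("USER", user_input))
        reply = generate_reply(history)
        history.append(("ASSISTANT", reply))
        print(f"ASSISTANT: {reply}")
    return history
```

An empty input line (or EOF) exits the loop, mirroring how a terminal session is typically terminated.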

Key characteristics:

  • Direct model loading -- No controller or worker architecture; the model is loaded directly into the CLI process.
  • Single-image context -- The image is processed once at startup and reused across all conversation turns.
  • Multi-turn conversation -- The user can ask multiple sequential questions, with full conversation history maintained across turns.
  • Streaming output -- Responses stream token-by-token using TextStreamer for real-time terminal output.
  • Conversation history -- Messages are appended to a Conversation object, and the full prompt is regenerated each turn.

Usage

Use for quick testing and debugging of LLaVA models without the overhead of deploying the controller-worker-Gradio stack.

Supported configurations:

  • LoRA models -- Use --model-base to specify the base model when loading a LoRA adapter.
  • Quantized inference -- Use --load-4bit or --load-8bit for inference on smaller GPUs.
  • Custom conversation modes -- Use --conv-mode to override the auto-detected conversation template.
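The flags above can be sketched with `argparse`. The flag names come from this page; the presence of `--model-path` and `--image-file`, and all defaults shown here, are illustrative assumptions rather than the repository's exact argument definitions.

```python
import argparse

def build_parser():
    # Flag names mirror the options described above; defaults are
    # illustrative assumptions, not the repository's exact values.
    p = argparse.ArgumentParser(description="LLaVA CLI chat (sketch)")
    p.add_argument("--model-path", required=True,
                   help="model checkpoint or LoRA adapter to load")
    p.add_argument("--model-base", default=None,
                   help="base model to combine with a LoRA adapter")
    p.add_argument("--image-file", required=True,
                   help="image processed once at startup")
    p.add_argument("--load-4bit", action="store_true",
                   help="4-bit quantized inference for smaller GPUs")
    p.add_argument("--load-8bit", action="store_true",
                   help="8-bit quantized inference")
    p.add_argument("--conv-mode", default=None,
                   help="override the auto-detected conversation template")
    return p
```

A quantized LoRA run would then combine flags, e.g. `--model-base` together with `--load-4bit`.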

Theoretical Basis

Multi-turn conversation is implemented by appending messages to a Conversation object and regenerating the full prompt each turn. This means the full conversation history is re-tokenized on every turn, which is acceptable for interactive use but not optimal for high-throughput serving.
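The append-and-regenerate pattern can be illustrated with a toy stand-in for the Conversation object. The class below is a simplified sketch, not LLaVA's actual implementation: only the append/serialize behavior relevant to this section is modeled.

```python
class Conversation:
    """Toy stand-in for a conversation object: messages are appended
    each turn and the full prompt is rebuilt from scratch."""

    def __init__(self, roles=("USER", "ASSISTANT"), sep="\n"):
        self.roles = roles
        self.sep = sep
        self.messages = []

    def append_message(self, role, text):
        self.messages.append((role, text))

    def get_prompt(self):
        # The whole history is serialized (and later re-tokenized)
        # on every call -- fine for interactive use, costly for
        # high-throughput serving.
        return self.sep.join(
            f"{role}: {text}" for role, text in self.messages
            if text is not None)
```

Each new turn appends two messages (user and assistant) and calls `get_prompt()` again over the full history, which is why per-turn cost grows with conversation length.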

Token streaming uses TextStreamer, which hooks into model.generate() to print tokens to stdout as they are produced. This provides immediate visual feedback in the terminal.
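The streaming behavior can be imitated without the `transformers` dependency: print each token the moment it is produced, flushing stdout so the terminal updates immediately. This sketch mimics what TextStreamer does when hooked into `model.generate()`; the function name and signature are illustrative, not part of any library.

```python
import sys

def stream_tokens(token_iter, out=sys.stdout):
    """Print tokens as they arrive, then return the full text.

    Mimics streaming terminal output: each token is written and
    flushed immediately rather than buffered until generation ends.
    """
    pieces = []
    for tok in token_iter:
        out.write(tok)
        out.flush()  # immediate visual feedback in the terminal
        pieces.append(tok)
    out.write("\n")
    return "".join(pieces)
```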

Image token placement: The <image> token is prepended to the first user message only. On subsequent turns, the image context is carried implicitly through the conversation history and the cached image tensor.
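The first-turn-only placement rule can be sketched as a guard on the message list. The helper name and exact separator below are assumptions for illustration; only the rule itself (prepend `<image>` once, on the first user message) comes from this page.

```python
IMAGE_TOKEN = "<image>"

def add_user_message(conv_messages, text):
    """Prepend the image token only to the first message of the
    conversation; later turns rely on the history plus the cached
    image tensor for visual context."""
    if not conv_messages:  # first turn only
        text = IMAGE_TOKEN + "\n" + text
    conv_messages.append(("USER", text))
    return conv_messages
```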

Metadata

Knowledge Sources: Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains: User_Interface, Interactive_Inference
Last Updated: 2026-02-13 14:00 GMT
