Principle:Haotian liu LLaVA Custom Dataset Formatting

Overview

Data format specification for structuring custom visual question-answering data into LLaVA's expected conversation format.

Description

LLaVA requires training data in a specific JSON conversation format. Each sample is a dict with three required keys:

"id" -- A unique string identifier for the sample.
"image" -- The relative filename of the image (relative to the --image_folder argument).
"conversations" -- A list of turn dicts, each containing "from" and "value" keys.

Human turns use "from": "human", and the first human turn must include the <image> token placeholder in its "value" field. GPT/assistant turns use "from": "gpt". This format is consumed directly by LazySupervisedDataset (defined in llava/train/train.py:L658), which loads the JSON file up front and processes each sample lazily during training.

The conversation list supports both single-turn and multi-turn formats. In multi-turn conversations, the <image> token should appear only in the first human turn. Each subsequent human-GPT pair adds an additional training target.
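As an illustration of the schema described above, the following sketch builds one single-turn and one multi-turn sample and serializes them as a JSON list. The helper name make_sample, the ids, the image filenames, and the question/answer text are all hypothetical and are not part of the LLaVA codebase.

```python
import json


def make_sample(sample_id, image_name, turns):
    """Build one LLaVA-style sample from (question, answer) pairs.

    Hypothetical helper for illustration only; not part of LLaVA.
    """
    conversations = []
    for i, (question, answer) in enumerate(turns):
        # Only the first human turn carries the <image> placeholder.
        prefix = "<image>\n" if i == 0 else ""
        conversations.append({"from": "human", "value": prefix + question})
        conversations.append({"from": "gpt", "value": answer})
    return {"id": sample_id, "image": image_name, "conversations": conversations}


samples = [
    # Single-turn sample: one human/gpt pair.
    make_sample("0001", "cat.jpg", [("What animal is shown?", "A cat.")]),
    # Multi-turn sample: <image> appears only in the first human turn.
    make_sample("0002", "street.jpg", [
        ("How many cars are visible?", "Three cars."),
        ("What color is the nearest one?", "It is red."),
    ]),
]

# The training file is a JSON list of such samples.
dataset_json = json.dumps(samples, indent=2)
```

Writing dataset_json to a file produces something that can be passed via --data_path, with the image names resolved against --image_folder.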

Usage

Use this pattern when preparing custom data for LoRA or full finetuning of LLaVA. All training data must conform to this schema. The JSON file path is passed via --data_path and the image directory via --image_folder in the training command.
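Before launching training, it can help to check a dataset against the schema. The sketch below is a minimal validator, not a tool shipped with LLaVA: the function name validate_sample and its error messages are hypothetical, and the strict human/gpt alternation it enforces is an assumption of this sketch rather than a documented requirement.

```python
import os

REQUIRED_KEYS = ("id", "image", "conversations")


def validate_sample(sample, image_folder=None):
    """Return a list of problems found in one sample (empty if none).

    Hypothetical helper; assumes every turn is a dict.
    """
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in sample]
    if problems:
        return problems

    turns = sample["conversations"]
    if not turns or len(turns) % 2 != 0:
        problems.append("conversations must be non-empty human/gpt pairs")

    for i, turn in enumerate(turns):
        # Assumption of this sketch: turns alternate human, gpt, human, ...
        expected = "human" if i % 2 == 0 else "gpt"
        if turn.get("from") != expected:
            problems.append(f"turn {i}: expected from={expected!r}")
        value = turn.get("value")
        if not isinstance(value, str):
            problems.append(f"turn {i}: value must be a string")
            continue
        has_image = "<image>" in value
        if i == 0 and not has_image:
            problems.append("first human turn lacks <image>")
        if i > 0 and has_image:
            problems.append(f"turn {i}: <image> outside first human turn")

    # Optionally confirm the image exists relative to the image folder.
    if image_folder is not None:
        path = os.path.join(image_folder, sample["image"])
        if not os.path.isfile(path):
            problems.append(f"image not found: {path}")
    return problems
```

Running this over every entry of the loaded JSON list and printing any non-empty result gives a quick pre-flight check. Passing image_folder mirrors how the image field is resolved relative to --image_folder.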

Theoretical Basis

The conversation format maps directly to tokenization: human turns become input context (masked from loss computation), and GPT turns become training targets. During tokenization, each <image> placeholder is replaced with the sentinel id IMAGE_TOKEN_INDEX (-200). At model input time, that position is expanded into the visual embeddings produced by the CLIP vision tower and passed through the multimodal projector.

This masking strategy ensures the model only learns to predict assistant responses, not to reproduce user queries, which aligns with the standard instruction-tuning objective for language models.
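The masking idea can be sketched with a toy example. This is not LLaVA's preprocessing code: the whitespace tokenizer and the function names toy_tokenize and build_inputs_and_labels are hypothetical stand-ins, and the system prompt and role separators that a real conversation template adds are omitted. The value -100 is the default ignore index of PyTorch's cross-entropy loss, and -200 is the IMAGE_TOKEN_INDEX value stated above.

```python
IGNORE_INDEX = -100       # label value skipped by the loss
IMAGE_TOKEN_INDEX = -200  # sentinel id standing in for <image>


def toy_tokenize(text, vocab):
    """Whitespace tokenizer (illustrative stand-in for a real tokenizer)."""
    ids = []
    for word in text.split():
        if word == "<image>":
            ids.append(IMAGE_TOKEN_INDEX)
        else:
            # Assign the next free id to unseen words.
            ids.append(vocab.setdefault(word, len(vocab)))
    return ids


def build_inputs_and_labels(conversations):
    """Concatenate turns; mask human tokens so only gpt tokens are targets."""
    vocab = {}
    input_ids, labels = [], []
    for turn in conversations:
        ids = toy_tokenize(turn["value"], vocab)
        input_ids.extend(ids)
        if turn["from"] == "gpt":
            labels.extend(ids)                        # trained on
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out
    return input_ids, labels


conversations = [
    {"from": "human", "value": "<image>\nWhat animal is shown?"},
    {"from": "gpt", "value": "A cat."},
]
input_ids, labels = build_inputs_and_labels(conversations)
# input_ids -> [-200, 0, 1, 2, 3, 4, 5]
# labels    -> [-100, -100, -100, -100, -100, 4, 5]
```

Every position belonging to the human turn, including the image sentinel, carries IGNORE_INDEX in labels, so only the two assistant tokens contribute to the loss.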

Knowledge Sources

Doc -- Finetune Custom Data -- https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md

Domains

Data_Engineering
Fine_Tuning

Metadata

Field	Value
last_updated	2026-02-13 14:00 GMT
source_repo	Haotian_liu_LLaVA
commit	799f5f207c89
type	Principle

Related Pages

Implementation:Haotian_liu_LLaVA_LLaVA_Conversation_Format
