Principle:Haotian liu LLaVA Custom Dataset Formatting
Overview
Data format specification for structuring custom visual question-answering data into LLaVA's expected conversation format.
Description
LLaVA requires training data as a JSON file holding a list of samples in a specific conversation format. Each sample is a dict with three required keys:
- "id" -- A unique string identifier for the sample.
- "image" -- The relative filename of the image (relative to the --image_folder argument).
- "conversations" -- A list of turn dicts, each containing "from" and "value" keys.
Human turns use "from": "human", and the first human turn must include the <image> placeholder token in its "value" field. GPT/assistant turns use "from": "gpt". This format is consumed directly by LazySupervisedDataset (defined in llava/train/train.py:L658), which loads the JSON file and lazily processes each sample during training.
The conversation list supports both single-turn and multi-turn formats. In multi-turn conversations, the <image> token should appear only in the first human turn. Each subsequent human-GPT pair adds an additional training target.
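The schema described above can be sketched in Python as follows. The ids, image filenames, question text, and the write_dataset helper are all hypothetical and serve only as illustration. The "<image>\n" prefix follows the example in the Finetune Custom Data doc.

```python
import json

# Hypothetical samples: ids, filenames, and text are made up for illustration.
single_turn = {
    "id": "sample-0001",
    "image": "cats/0001.jpg",  # resolved relative to --image_folder
    "conversations": [
        {"from": "human", "value": "<image>\nWhat animal is in the picture?"},
        {"from": "gpt", "value": "A tabby cat sitting on a windowsill."},
    ],
}

multi_turn = {
    "id": "sample-0002",
    "image": "dogs/0002.jpg",
    "conversations": [
        # <image> appears only in the first human turn.
        {"from": "human", "value": "<image>\nWhat breed is this dog?"},
        {"from": "gpt", "value": "It looks like a golden retriever."},
        {"from": "human", "value": "What is it doing?"},
        {"from": "gpt", "value": "It is fetching a ball in a park."},
    ],
}

# The data file is one JSON list containing every sample.
dataset = [single_turn, multi_turn]


def write_dataset(samples, path):
    """Write the samples to `path` as a single JSON list (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, indent=2, ensure_ascii=False)
```

Calling write_dataset(dataset, "custom_data.json") would produce a file suitable for passing via --data_path, assuming the referenced images exist under the folder given to --image_folder.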
Usage
Use this pattern when preparing custom data for LoRA or full finetuning of LLaVA. All training data must conform to this schema. The JSON file path is passed via --data_path and the image directory via --image_folder in the training command.
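Before launching training, it can help to check each sample against the schema. The following is a minimal sketch of such a check, not part of LLaVA itself: validate_sample is a hypothetical helper, and the rule that turns strictly alternate human, gpt, human, gpt starting with a human turn is an assumption made for this sketch.

```python
IMAGE_TOKEN = "<image>"


def validate_sample(sample):
    """Return a list of problems found in one sample; an empty list means it passed.

    Hypothetical helper. Assumes turns alternate human/gpt, starting with human.
    """
    problems = []
    for key in ("id", "image", "conversations"):
        if key not in sample:
            problems.append(f"missing key: {key}")

    turns = sample.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("conversations must be a non-empty list")
        return problems

    for i, turn in enumerate(turns):
        if not isinstance(turn, dict) or "from" not in turn or "value" not in turn:
            problems.append(f"turn {i}: needs 'from' and 'value' keys")
            continue
        expected = "human" if i % 2 == 0 else "gpt"
        if turn["from"] != expected:
            problems.append(f"turn {i}: expected from={expected!r}, got {turn['from']!r}")
        has_image = isinstance(turn["value"], str) and IMAGE_TOKEN in turn["value"]
        if i == 0 and not has_image:
            problems.append("turn 0: first human turn lacks <image>")
        if i > 0 and has_image:
            problems.append(f"turn {i}: <image> should appear only in the first human turn")
    return problems
```

Running this over every entry of the loaded JSON list and printing any non-empty results gives a quick sanity pass before a long training job.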
Theoretical Basis
The conversation format maps directly to tokenization: human turns become input context (masked from loss computation), and GPT turns become training targets. During tokenization, each <image> placeholder is replaced with the sentinel token id IMAGE_TOKEN_INDEX (-200), which is later expanded inside the model into visual embeddings produced by the CLIP vision tower and passed through the multimodal projector.
This masking strategy ensures the model learns only to predict assistant responses, not to reproduce user queries, in line with the standard instruction-tuning objective for language models.
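The masking idea can be illustrated with a toy example. This is a simplified sketch, not LLaVA's actual preprocessing: it uses a hypothetical whitespace tokenizer (toy_tokenize) and omits the system prompt and role separators that a real conversation template adds. The ignore value -100 is assumed here because it is the default ignore_index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100        # assumed label value that the loss skips
IMAGE_TOKEN_INDEX = -200   # sentinel id standing in for the <image> placeholder


def toy_tokenize(text, vocab):
    """Whitespace tokenizer for illustration; <image> maps to IMAGE_TOKEN_INDEX."""
    ids = []
    for word in text.split():
        if word == "<image>":
            ids.append(IMAGE_TOKEN_INDEX)
        else:
            # Assign the next free id to unseen words.
            ids.append(vocab.setdefault(word, len(vocab)))
    return ids


def build_example(conversations):
    """Return (input_ids, labels) with human turns masked out of the labels."""
    vocab = {}
    input_ids, labels = [], []
    for turn in conversations:
        ids = toy_tokenize(turn["value"], vocab)
        input_ids.extend(ids)
        if turn["from"] == "gpt":
            labels.extend(ids)                        # assistant tokens are targets
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # human tokens are context only
    return input_ids, labels


conversations = [
    {"from": "human", "value": "<image>\nWhat is this?"},
    {"from": "gpt", "value": "A cat."},
]
input_ids, labels = build_example(conversations)
print(input_ids)  # [-200, 0, 1, 2, 3, 4]
print(labels)     # [-100, -100, -100, -100, 3, 4]
```

Only the positions holding the assistant's tokens carry real label ids, so only they contribute to the loss. The image sentinel sits in the masked human span.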
Knowledge Sources
- Doc -- Finetune Custom Data -- https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md
Domains
- Data_Engineering
- Fine_Tuning
Metadata
| Field | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| source_repo | Haotian_liu_LLaVA |
| commit | 799f5f207c89 |
| type | Principle |