Implementation:Sgl project Sglang LLaVA Video Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Multimodal, Video Understanding |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Demonstrates video question-answering with the LLaVA-Video model on SGLang, supporting both single-video inference and batch processing in which the workload is divided into chunks for distributed execution.
Description
srt_example_llava_v.py uses the sgl.video() multimodal primitive to embed video frames in prompts and sgl.gen for generating descriptions. The core video_qa function, decorated with @sgl.function, takes a video path and question, constructs a user message with video frames, and generates an assistant response.
For batch processing, the list of video files is split into a configurable number of chunks (split_into_chunks) so that each node in a multi-node setup processes one chunk. Results are saved incrementally to per-batch CSV files via save_batch_results and merged into a final per-chunk file via compile_and_cleanup_final_results. The script also discovers video files in a directory, supporting the .mp4, .avi, and .mov formats.
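The discovery and chunking steps can be sketched as follows. This is a minimal illustration, not the script's actual code: the function bodies are assumptions written to match the signatures find_video_files(video_dir) and split_into_chunks(lst, num_chunks), and the remainder policy (earlier chunks receive one extra item) is only one reasonable choice.

```python
import os

VIDEO_EXTENSIONS = (".mp4", ".avi", ".mov")  # formats named in the description


def find_video_files(video_dir):
    """Return sorted video paths under video_dir.

    Assumption: if video_dir is itself a file, it is returned as-is.
    """
    if os.path.isfile(video_dir):
        return [video_dir]
    found = []
    for root, _dirs, files in os.walk(video_dir):
        for name in files:
            # Case-insensitive extension match.
            if name.lower().endswith(VIDEO_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


def split_into_chunks(lst, num_chunks):
    """Split lst into exactly num_chunks contiguous chunks (some may be empty)."""
    base, extra = divmod(len(lst), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(lst[start:start + size])
        start += size
    return chunks
```

Under this sketch, a node started with --chunk-idx k would process split_into_chunks(files, num_chunks)[k].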
The main block configures the LLaVA-Video model with custom json_model_override_args including architecture type (LlavaVidForCausalLM), spatial pooling stride (mm_spatial_pool_stride), and optional RoPE scaling for 32-frame processing. It downloads a sample video from GitHub for testing and supports both 7B and 34B model variants with appropriate tokenizer paths.
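A sketch of how such an override dictionary might be assembled and serialized. The keys architectures, mm_spatial_pool_stride, and rope_scaling come from the description above; the helper name build_override_args and the specific RoPE values (linear scaling with factor 2.0) are assumptions for illustration and should be checked against the script.

```python
import json


def build_override_args(num_frames, mm_spatial_pool_stride=2):
    """Build a JSON string of model-config overrides (hypothetical helper)."""
    overrides = {
        "architectures": ["LlavaVidForCausalLM"],
        "mm_spatial_pool_stride": mm_spatial_pool_stride,
    }
    if num_frames == 32:
        # Assumed values: more frames produce more visual tokens, so the
        # positional range is stretched to fit the longer sequence.
        overrides["rope_scaling"] = {"factor": 2.0, "rope_type": "linear"}
    return json.dumps(overrides)
```

The resulting string would then be supplied as the json_model_override_args value when the model is configured.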
Usage
Use this example for video understanding tasks including video description, question-answering, and batch video analysis. It demonstrates multi-node distributed video processing with SGLang's multimodal capabilities.
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: examples/frontend_language/usage/llava_video/srt_example_llava_v.py
- Lines: 1-260
Signature
@sgl.function
def video_qa(s, num_frames, video_path, question): ...
def single(path, num_frames=16): ...
def split_into_chunks(lst, num_chunks): ...
def save_batch_results(batch_video_files, states, cur_chunk, batch_idx, save_dir): ...
def compile_and_cleanup_final_results(cur_chunk, num_batches, save_dir): ...
def find_video_files(video_dir): ...
def batch(video_dir, save_dir, cur_chunk, num_chunks, num_frames=16, batch_size=64): ...
Import
import argparse
import csv
import json
import os
import time
import requests
import sglang as sgl
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --port | int | No (default: 30000) | The master port for distributed serving |
| --chunk-idx | int | No (default: 0) | The index of the chunk to process |
| --num-chunks | int | No (default: 8) | The number of chunks for distributed processing |
| --save-dir | str | No (default: "./work_dirs/llava_video") | Directory to save processed results |
| --video-dir | str | No (default: "~/.cache/jobs.mp4") | Directory or path to video files |
| --model-path | str | No (default: "lmms-lab/LLaVA-NeXT-Video-7B") | Model path for video processing |
| --num-frames | int | No (default: 16) | Number of frames to extract from each video |
| --mm_spatial_pool_stride | int | No (default: 2) | Spatial pooling stride for the vision module |
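The inputs table corresponds to an argument parser along these lines. This is a sketch reconstructed from the table alone: the flag names and defaults are taken from it, while the function name build_parser and the use of os.path.expanduser for the --video-dir default are assumptions.

```python
import argparse
import os


def build_parser():
    """Create a parser mirroring the inputs table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="LLaVA-Video example (sketch)")
    parser.add_argument("--port", type=int, default=30000)
    parser.add_argument("--chunk-idx", type=int, default=0)
    parser.add_argument("--num-chunks", type=int, default=8)
    parser.add_argument("--save-dir", type=str, default="./work_dirs/llava_video")
    # Assumption: "~" is expanded to the user's home directory.
    parser.add_argument(
        "--video-dir", type=str, default=os.path.expanduser("~/.cache/jobs.mp4")
    )
    parser.add_argument(
        "--model-path", type=str, default="lmms-lab/LLaVA-NeXT-Video-7B"
    )
    parser.add_argument("--num-frames", type=int, default=16)
    parser.add_argument("--mm_spatial_pool_stride", type=int, default=2)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--num-frames", "32"])
```

Note that argparse exposes hyphenated flags with underscores, so --chunk-idx is read as args.chunk_idx.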
Outputs
| Name | Type | Description |
|---|---|---|
| Console output | str | Video descriptions printed to standard output |
| CSV files | file | Batch results saved as CSV with video_name and answer columns |
| Final CSV | file | Compiled final results per chunk (final_results_chunk_{idx}.csv) |
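The incremental-save and compile steps that produce these files can be sketched as below. The helpers write_batch_csv and compile_chunk_csv are hypothetical stand-ins for save_batch_results and compile_and_cleanup_final_results: they take plain answer strings rather than SGLang states, and the per-batch file name pattern is an assumption. Only the column names and the final file name come from the table above.

```python
import csv
import os

HEADER = ["video_name", "answer"]


def write_batch_csv(video_files, answers, cur_chunk, batch_idx, save_dir):
    """Write one batch of results to its own CSV file and return the path."""
    os.makedirs(save_dir, exist_ok=True)
    # Assumed naming scheme for intermediate files.
    path = os.path.join(save_dir, f"chunk_{cur_chunk}_batch_{batch_idx}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for video_path, answer in zip(video_files, answers):
            writer.writerow([os.path.basename(video_path), answer])
    return path


def compile_chunk_csv(cur_chunk, num_batches, save_dir):
    """Merge per-batch files into one final file, deleting the intermediates."""
    final_path = os.path.join(save_dir, f"final_results_chunk_{cur_chunk}.csv")
    with open(final_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(HEADER)
        for batch_idx in range(num_batches):
            part = os.path.join(save_dir, f"chunk_{cur_chunk}_batch_{batch_idx}.csv")
            if not os.path.exists(part):
                continue  # tolerate batches that were never written
            with open(part, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the per-batch header row
                writer.writerows(reader)
            os.remove(part)
    return final_path
```

Writing each batch to its own file means a crash loses at most the batch in progress, and the final merge is a cheap pass over small files.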
Usage Examples
# Install dependency
# pip install opencv-python-headless
# Run single video processing
# python3 srt_example_llava_v.py
# Run with custom model and frame count
# python3 srt_example_llava_v.py --model-path lmms-lab/LLaVA-NeXT-Video-7B --num-frames 32
# Programmatic usage of the video_qa function
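import sglang as sgl

@sgl.function
def video_qa(s, num_frames, video_path, question):
    # Embed the sampled video frames and the question in the user turn.
    s += sgl.user(sgl.video(video_path, num_frames) + question)
    # Generate the reply and store it under the key "answer".
    s += sgl.assistant(sgl.gen("answer"))

# Requires a default backend to be set first, e.g. via sgl.set_default_backend(...).
state = video_qa.run(
    num_frames=16,
    video_path="path/to/video.mp4",
    question="Describe the video.",
    temperature=0.0,
    max_new_tokens=1024,
)
print(state["answer"])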