Implementation:Sgl project Sglang LLaVA Video Pipeline

From Leeroopedia


Knowledge Sources
Domains: Multimodal, Video Understanding
Last Updated: 2026-02-10 00:00 GMT

Overview

Demonstrates video question-answering with the LLaVA-Video model in SGLang, handling either a single video or a batch of videos split into chunks for distributed processing.

Description

srt_example_llava_v.py uses the sgl.video() multimodal primitive to embed video frames in a prompt and sgl.gen() to generate the answer. The core video_qa function, decorated with @sgl.function, takes a frame count, a video path, and a question. It builds a user message containing the video frames followed by the question, then generates the assistant response.

For batch processing, the list of video files is split into a configurable number of chunks (split_into_chunks) so that each node in a multi-node run can process its own chunk. Results are saved incrementally to CSV files via save_batch_results and then merged into one final file per chunk via compile_and_cleanup_final_results. The script also discovers video files in a directory (find_video_files), accepting the .mp4, .avi, and .mov formats.
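The discovery and chunking steps described above can be sketched as follows. This is a minimal illustration, not the script's own code: the names find_videos and split_evenly are hypothetical, the case-insensitive extension match is an assumption, and the even-split policy (chunk sizes differing by at most one) may differ from what split_into_chunks actually does.

```python
import os

# Extensions named in the description; matching case-insensitively is an assumption.
VIDEO_EXTENSIONS = (".mp4", ".avi", ".mov")


def find_videos(video_dir):
    """Return sorted video paths under video_dir, or [video_dir] if it is a file."""
    if os.path.isfile(video_dir):
        return [video_dir]
    found = []
    for root, _dirs, files in os.walk(video_dir):
        for name in files:
            if name.lower().endswith(VIDEO_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


def split_evenly(items, num_chunks):
    """Split items into exactly num_chunks contiguous chunks.

    Chunk sizes differ by at most one; trailing chunks are empty when
    there are fewer items than chunks.
    """
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

A node would then pick its share with something like split_evenly(find_videos(video_dir), num_chunks)[chunk_idx], so no video is processed twice across nodes.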

The main block configures the LLaVA-Video model with custom json_model_override_args including architecture type (LlavaVidForCausalLM), spatial pooling stride (mm_spatial_pool_stride), and optional RoPE scaling for 32-frame processing. It downloads a sample video from GitHub for testing and supports both 7B and 34B model variants with appropriate tokenizer paths.
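As a rough sketch of how such override arguments might be assembled, the snippet below builds a dictionary and serializes it to JSON. The helper name build_override_args is hypothetical, and the contents of the rope_scaling entry are an assumption made for illustration; the actual keys and values should be checked against the script.

```python
import json


def build_override_args(mm_spatial_pool_stride=2, num_frames=16):
    """Assemble model override arguments and return them as a JSON string."""
    overrides = {
        # Architecture type and pooling stride are the keys named in the text.
        "architectures": ["LlavaVidForCausalLM"],
        "mm_spatial_pool_stride": mm_spatial_pool_stride,
    }
    if num_frames == 32:
        # Assumed shape and values for the RoPE scaling entry; verify before use.
        overrides["rope_scaling"] = {"factor": 2.0, "rope_type": "linear"}
    return json.dumps(overrides)


if __name__ == "__main__":
    print(build_override_args(num_frames=32))
```

Keeping the overrides in a plain dictionary until the last step makes it easy to add model-specific keys conditionally before serializing.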

Usage

Use this example for video understanding tasks including video description, question-answering, and batch video analysis. It demonstrates multi-node distributed video processing with SGLang's multimodal capabilities.

Code Reference

Source Location

Signature

@sgl.function
def video_qa(s, num_frames, video_path, question): ...

def single(path, num_frames=16): ...
def split_into_chunks(lst, num_chunks): ...
def save_batch_results(batch_video_files, states, cur_chunk, batch_idx, save_dir): ...
def compile_and_cleanup_final_results(cur_chunk, num_batches, save_dir): ...
def find_video_files(video_dir): ...
def batch(video_dir, save_dir, cur_chunk, num_chunks, num_frames=16, batch_size=64): ...

Import

import argparse
import csv
import json
import os
import time

import requests

import sglang as sgl

I/O Contract

Inputs

Name | Type | Required | Description
--port | int | No (default: 30000) | The master port for distributed serving
--chunk-idx | int | No (default: 0) | The index of the chunk to process
--num-chunks | int | No (default: 8) | The number of chunks for distributed processing
--save-dir | str | No (default: "./work_dirs/llava_video") | Directory to save processed results
--video-dir | str | No (default: "~/.cache/jobs.mp4") | Directory or path to video files
--model-path | str | No (default: "lmms-lab/LLaVA-NeXT-Video-7B") | Model path for video processing
--num-frames | int | No (default: 16) | Number of frames to extract from each video
--mm_spatial_pool_stride | int | No (default: 2) | Spatial pooling stride for the vision module
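The flags in the table above map directly onto an argparse parser. The sketch below reproduces them with the listed types and defaults. The build_parser wrapper is hypothetical, and expanding "~" in the --video-dir default with os.path.expanduser is an assumption.

```python
import argparse
import os


def build_parser():
    """Build a parser exposing the flags and defaults listed in the Inputs table."""
    p = argparse.ArgumentParser(description="LLaVA-Video question answering")
    p.add_argument("--port", type=int, default=30000,
                   help="The master port for distributed serving")
    p.add_argument("--chunk-idx", type=int, default=0,
                   help="The index of the chunk to process")
    p.add_argument("--num-chunks", type=int, default=8,
                   help="The number of chunks for distributed processing")
    p.add_argument("--save-dir", type=str, default="./work_dirs/llava_video",
                   help="Directory to save processed results")
    # Expanding "~" here is an assumption about how the default is resolved.
    p.add_argument("--video-dir", type=str,
                   default=os.path.expanduser("~/.cache/jobs.mp4"),
                   help="Directory or path to video files")
    p.add_argument("--model-path", type=str, default="lmms-lab/LLaVA-NeXT-Video-7B",
                   help="Model path for video processing")
    p.add_argument("--num-frames", type=int, default=16,
                   help="Number of frames to extract from each video")
    p.add_argument("--mm_spatial_pool_stride", type=int, default=2,
                   help="Spatial pooling stride for the vision module")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Note that argparse converts the hyphens in --chunk-idx and similar flags to underscores, so the parsed values are read as args.chunk_idx, args.num_chunks, and so on.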

Outputs

Name | Type | Description
Console output | str | Video descriptions printed to standard output
CSV files | file | Batch results saved as CSV with video_name and answer columns
Final CSV | file | Compiled final results per chunk (final_results_chunk_{idx}.csv)
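The CSV outputs listed above could be produced along the following lines. This is an illustrative sketch only: the helper names and the per-batch file name pattern (chunk_{idx}_batch_{n}.csv) are hypothetical, while the column names and the final file name come from the table.

```python
import csv
import os

HEADER = ["video_name", "answer"]  # columns named in the Outputs table


def batch_path(save_dir, chunk_idx, batch_idx):
    """Hypothetical naming scheme for per-batch files."""
    return os.path.join(save_dir, f"chunk_{chunk_idx}_batch_{batch_idx}.csv")


def write_batch_csv(save_dir, chunk_idx, batch_idx, rows):
    """Write (video_name, answer) rows for one batch and return the file path."""
    os.makedirs(save_dir, exist_ok=True)
    path = batch_path(save_dir, chunk_idx, batch_idx)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return path


def compile_chunk_csv(save_dir, chunk_idx, num_batches):
    """Merge per-batch files into final_results_chunk_{idx}.csv and delete them."""
    final_path = os.path.join(save_dir, f"final_results_chunk_{chunk_idx}.csv")
    with open(final_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(HEADER)
        for batch_idx in range(num_batches):
            part = batch_path(save_dir, chunk_idx, batch_idx)
            with open(part, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader)  # skip the per-batch header row
                writer.writerows(reader)
            os.remove(part)  # cleanup once the rows are merged
    return final_path
```

Writing one file per batch means a crash partway through a chunk loses at most the batch in flight, which is the usual motivation for saving incrementally.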

Usage Examples

# Install dependency
# pip install opencv-python-headless

# Run single video processing
# python3 srt_example_llava_v.py

# Run with custom model and frame count
# python3 srt_example_llava_v.py --model-path lmms-lab/LLaVA-NeXT-Video-7B --num-frames 32

# Programmatic usage of the video_qa function
# (a default backend must already be set, e.g. with sgl.set_default_backend)
import sglang as sgl

@sgl.function
def video_qa(s, num_frames, video_path, question):
    s += sgl.user(sgl.video(video_path, num_frames) + question)
    s += sgl.assistant(sgl.gen("answer"))

state = video_qa.run(
    num_frames=16,
    video_path="path/to/video.mp4",
    question="Describe the video.",
    temperature=0.0,
    max_new_tokens=1024,
)
print(state["answer"])
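For batch mode, the arguments for many videos have to be grouped into batches before generation. The sketch below shows only that grouping step in plain Python. The make_batches helper is hypothetical, and passing each batch to video_qa.run_batch is an assumption about how the script drives SGLang, so that call is left as a comment.

```python
def make_batches(video_files, batch_size=64, num_frames=16,
                 question="Describe the video."):
    """Yield lists of keyword-argument dicts, at most batch_size dicts per list.

    Each dict matches the video_qa signature (num_frames, video_path, question).
    """
    for start in range(0, len(video_files), batch_size):
        yield [
            {"num_frames": num_frames, "video_path": path, "question": question}
            for path in video_files[start:start + batch_size]
        ]


if __name__ == "__main__":
    files = ["a.mp4", "b.mp4", "c.mp4"]
    for batch_idx, batch in enumerate(make_batches(files, batch_size=2)):
        # Assumed next step, not executed here:
        # states = video_qa.run_batch(batch, temperature=0, max_new_tokens=1024)
        print(batch_idx, [item["video_path"] for item in batch])
```

Bounding the batch size keeps the number of in-flight requests predictable and gives a natural point at which to save intermediate results.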
