Environment:Sgl project Sglang Multimodal
Sgl_project_Sglang_Multimodal is the dependency environment for multimodal models in SGLang. It provides the libraries needed to serve vision-language models (VLMs) that process text together with image or video inputs.
Requirements
- Python 3.10+
- PyTorch 2.9.1+ with CUDA support
- `transformers` >= 4.57.1 (with vision model support)
- `pillow` for image processing
- `torchvision` for image transforms
- `torchaudio` and `torchcodec` for video/audio processing
- `einops` for tensor reshaping operations
- GPU with sufficient VRAM (16GB+ recommended for multimodal models)
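
The software requirements above can be checked before launching a server. The following is a minimal sketch, not part of SGLang: `REQUIREMENTS`, `parse_version`, and `check_environment` are hypothetical helpers, and it assumes each library is installed under the PyPI distribution name used in the list. It reads installed versions through the standard-library `importlib.metadata` module, so it does not import the heavy packages themselves.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list; None means "any version".
# Assumption: these keys are the PyPI distribution names.
REQUIREMENTS = {
    "torch": "2.9.1",
    "transformers": "4.57.1",
    "pillow": None,
    "torchvision": None,
    "torchaudio": None,
    "torchcodec": None,
    "einops": None,
}


def parse_version(text):
    """Turn '2.9.1+cu128' into (2, 9, 1), ignoring local and pre-release suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric segments such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(requirements=REQUIREMENTS):
    """Return {name: status}; status is 'ok', 'missing', or 'too old (...)'."""
    report = {}
    current = ".".join(str(n) for n in sys.version_info[:3])
    if sys.version_info >= (3, 10):
        report["python"] = "ok"
    else:
        report["python"] = f"too old ({current} < 3.10)"

    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < parse_version(minimum):
            report[name] = f"too old ({installed} < {minimum})"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:14s} {status}")
```

This sketch covers only the Python and package versions. CUDA availability and GPU memory are not checked; once PyTorch is installed, `torch.cuda.is_available()` can be called separately to confirm that a CUDA device is visible.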
Required By
See Also