Principle:EvolvingLMMs Lab Lmms eval Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Infrastructure |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Environment setup is the process of installing all required dependencies and configuring entry points so that an evaluation framework can be invoked from the command line or imported as a library.
Description
Before any multi-modal model evaluation can take place, the runtime environment must contain the correct versions of dozens of interdependent packages spanning deep learning inference, dataset loading, metrics computation, and result formatting. The lmms-eval framework codifies these requirements in a single pyproject.toml manifest that enumerates core dependencies, optional feature groups (audio, video, metrics, server, TUI), and console script entry points.
The core dependency set includes:
- accelerate (>=0.29.1) -- Hugging Face device-placement and distributed launch utilities.
- datasets (>=2.19.0) -- Hugging Face Datasets for streaming and caching evaluation data.
- torch (>=2.1.0) -- Required for SDPA attention and running large models on single GPUs.
- transformers (>=4.39.2) -- Model tokenizers, configs, and stopping-criteria utilities.
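The lower bounds above can be checked against an existing environment before launching an evaluation. The following is a minimal sketch, not part of lmms-eval: the helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the version comparison is deliberately simplified (it compares only the leading numeric release segments and ignores pre-release ordering).

```python
from importlib import metadata

# Minimum versions as listed above; the checking logic itself is illustrative.
MINIMUMS = {
    "accelerate": "0.29.1",
    "datasets": "2.19.0",
    "torch": "2.1.0",
    "transformers": "4.39.2",
}


def release_tuple(version):
    """Turn '2.1.0+cu121' or '4.40.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare release tuples, padding with zeros so '2.1' equals '2.1.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(minimums=MINIMUMS, lookup=installed_version):
    """Return {package: problem description} for every unmet requirement."""
    problems = {}
    for name, minimum in minimums.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif not meets_minimum(found, minimum):
            problems[name] = f"{found} is older than {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_environment().items():
        print(f"{name}: {problem}")
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed. For strict PEP 440 comparison, a dedicated version-parsing library would be the safer choice.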
Beyond core dependencies, optional groups such as [audio], [video], [metrics], and [server] allow users to install only what is needed for their specific evaluation tasks. The [all] extra pulls in every optional group.
The framework registers two console entry points:
- lmms-eval -- maps to lmms_eval.__main__:cli_evaluate, the primary evaluation CLI.
- lmms-eval-ui -- maps to lmms_eval.tui.cli:main, a terminal UI for browsing results.
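An entry point string of the form module:callable tells the installer which Python object the generated command should call. The sketch below shows roughly how such a string is resolved. The function names `parse_entry_point` and `load_entry_point` are hypothetical helpers written for illustration, not the installer's actual code.

```python
import importlib


def parse_entry_point(spec):
    """Split 'package.module:attr' into its module path and attribute path."""
    module_path, _, attr_path = spec.partition(":")
    module_path, attr_path = module_path.strip(), attr_path.strip()
    if not module_path or not attr_path:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    return module_path, attr_path


def load_entry_point(spec):
    """Import the module and walk the attribute path, as a shim would."""
    module_path, attr_path = parse_entry_point(spec)
    target = importlib.import_module(module_path)
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


if __name__ == "__main__":
    # Parsing needs no installed package.
    print(parse_entry_point("lmms_eval.__main__:cli_evaluate"))
    # Loading is shown with a standard-library stand-in, since lmms_eval
    # may not be importable where this sketch is run.
    dumps = load_entry_point("json:dumps")
    print(dumps({"resolved": True}))
```

With the package installed, `load_entry_point("lmms_eval.__main__:cli_evaluate")` would be expected to return the callable that the lmms-eval command runs.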
Usage
Use environment setup whenever:
- You are setting up a new machine or CI runner for evaluation.
- You need to add a new optional dependency group (e.g., adding video decoding support).
- You want to ensure reproducible environments across team members via lockfile-based installation (uv sync).
- You are troubleshooting version conflicts between torch, transformers, and accelerate.
Theoretical Basis
The dependency resolution process follows PEP 517/518 build-system conventions:
- The build backend (setuptools) reads pyproject.toml.
- Core and optional dependencies are resolved against PyPI (or a configured index).
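- The resolver selects one version of each package such that all declared version constraints are satisfied; with a lockfile-based tool, this result is recorded as a locked dependency graph.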
- Console entry points are installed as executable shims pointing to the specified Python callables.
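The inputs to the steps above can be inspected by parsing the manifest directly. The sketch below reads a small illustrative excerpt rather than the real file: the project name, the bare setuptools build requirement, and the contents of the video extra are assumptions made for the example, while the version bounds and entry points are the ones listed on this page. `summarize_manifest` is a hypothetical helper.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

# Illustrative excerpt only; not copied from the real pyproject.toml.
EXAMPLE_MANIFEST = """
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "lmms_eval"
dependencies = [
    "accelerate>=0.29.1",
    "datasets>=2.19.0",
    "torch>=2.1.0",
    "transformers>=4.39.2",
]

[project.optional-dependencies]
video = ["hypothetical-video-decoder"]

[project.scripts]
lmms-eval = "lmms_eval.__main__:cli_evaluate"
lmms-eval-ui = "lmms_eval.tui.cli:main"
"""


def summarize_manifest(text):
    """Extract the tables an installer consults: backend, deps, extras, scripts."""
    data = tomllib.loads(text)
    project = data.get("project", {})
    return {
        "backend": data.get("build-system", {}).get("build-backend"),
        "core": list(project.get("dependencies", [])),
        "extras": sorted(project.get("optional-dependencies", {})),
        "scripts": dict(project.get("scripts", {})),
    }


if __name__ == "__main__":
    for key, value in summarize_manifest(EXAMPLE_MANIFEST).items():
        print(f"{key}: {value}")
```

To inspect a real checkout, the same function can be given the text of the repository's own pyproject.toml, for example via `pathlib.Path("pyproject.toml").read_text()`.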
The key constraint in multi-modal evaluation frameworks is the torch-transformers-accelerate triangle: these three libraries must be version-compatible to enable features like SDPA attention, FSDP sharding, and automatic device mapping. The minimum versions declared in pyproject.toml encode the known-good lower bounds for this compatibility.
For reproducibility, the project supports lockfile-based workflows where uv sync reads uv.lock and installs the exact package versions pinned there, so that every machine resolves to the same dependency set.