Environment: MLC-LLM Python Serving Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Web_Serving |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Python 3.9+ environment with FastAPI, uvicorn, and OpenAI client library for running MLC-LLM as an OpenAI-compatible REST API server.
Description
This environment provides the web serving layer for MLC-LLM. It uses FastAPI for the REST API framework and uvicorn as the ASGI server. The server exposes OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`) and supports streaming responses via Server-Sent Events. The OpenAI Python client is included as a dependency for testing and client-side usage. Additional dependencies include `shortuuid` for request ID generation, `prompt_toolkit` for the interactive chat CLI, and `pandas`/`datasets` for benchmarking.
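Streaming responses arrive as Server-Sent Events: each event carries a JSON chunk after a `data: ` prefix, and the stream terminates with `data: [DONE]`. A minimal client-side parsing sketch, assuming the standard OpenAI chat-completion chunk shape (verify the field names against your server's actual output):

```python
import json


def parse_sse_stream(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Assumes the standard chunk shape:
    {"choices": [{"delta": {"content": "..."}}]}.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


# Reassemble a streamed reply from raw SSE lines
raw = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_stream(raw))
```

In practice the `openai` client handles this parsing for you when `stream=True`; the sketch is useful when consuming the endpoint with `requests` or another raw HTTP client.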
Usage
Use this environment for the REST API Serving workflow and for running the interactive chat CLI. It is also required for the benchmarking module (`mlc_llm bench`).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Any platform with Python support |
| Python | >= 3.9 | Specified in pyproject.toml |
| Network | Open port (default 8000) | For REST API server |
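The two local requirements above (Python >= 3.9 and a free port) can be checked before launching the server. A preflight sketch; the `preflight` helper and its return convention are illustrative, not part of MLC-LLM:

```python
import socket
import sys


def preflight(port=8000, host="127.0.0.1"):
    """Check the local serving requirements: Python >= 3.9 and a free port."""
    if sys.version_info < (3, 9):
        return False, "Python 3.9+ required"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # succeeds only if no server holds the port
        except OSError:
            return False, f"port {port} already in use"
    return True, "ok"
```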
Dependencies
Python Packages
- `fastapi` (REST API framework)
- `uvicorn` (ASGI server)
- `openai` (OpenAI client library for testing)
- `requests` (HTTP client)
- `shortuuid` (unique request ID generation)
- `prompt_toolkit` (interactive chat CLI)
- `pandas` (benchmarking data processing)
- `datasets` (benchmarking dataset loading)
- `tqdm` (progress bars)
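A quick way to confirm these packages are present is to probe for them with `importlib` before starting the server. The helper below is an illustrative sketch, not part of MLC-LLM:

```python
from importlib.util import find_spec

# The serving-layer packages listed above
SERVING_DEPS = [
    "fastapi", "uvicorn", "openai", "requests", "shortuuid",
    "prompt_toolkit", "pandas", "datasets", "tqdm",
]


def missing_packages(names):
    """Return the subset of names that cannot be imported here."""
    return [n for n in names if find_spec(n) is None]


# e.g. print(missing_packages(SERVING_DEPS)) before launching the server
```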
Credentials
No credentials required for basic serving. For model downloading:
- `HF_TOKEN`: HuggingFace API token for accessing gated models.
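When a token is needed, the usual convention is to read `HF_TOKEN` from the environment and send it as a bearer token. The `hf_headers` helper below is an illustrative sketch of that convention, not an MLC-LLM API:

```python
import os


def hf_headers():
    """Build HTTP headers for HuggingFace downloads.

    Adds an Authorization header only when HF_TOKEN is set; anonymous
    access works for non-gated models.
    """
    token = os.environ.get("HF_TOKEN")
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```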
Quick Install
# Install all serving dependencies
pip install mlc-llm
# Or install individually
pip install fastapi uvicorn openai requests shortuuid prompt_toolkit pandas datasets tqdm
Code Evidence
Python package dependencies from `pyproject.toml:37-55`:
dependencies = [
"apache-tvm-ffi",
"datasets",
"fastapi",
"flashinfer-python; sys_platform == 'linux'",
"ml_dtypes>=0.5.1",
"openai",
"pandas",
"prompt_toolkit",
"requests",
"safetensors",
"sentencepiece",
"shortuuid",
"tiktoken",
"torch",
"tqdm",
"transformers",
"uvicorn",
]
Python version requirement from `pyproject.toml:35`:
requires-python = ">=3.9"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Address already in use` | Port 8000 already occupied | Use `--port` flag to specify a different port |
| `ModuleNotFoundError: No module named 'fastapi'` | FastAPI not installed | `pip install fastapi uvicorn` |
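For the "Address already in use" case, a launcher script can probe for a free port and pass it via `--port`. The `pick_port` helper below is an illustrative sketch:

```python
import socket


def pick_port(preferred=8000, host="127.0.0.1"):
    """Return preferred if bindable, else an OS-assigned free port."""
    for candidate in (preferred, 0):  # port 0 asks the OS for any free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, candidate))
                return s.getsockname()[1]
            except OSError:
                continue
    raise RuntimeError("no free port found")
```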
Compatibility Notes
- FlashInfer: The `flashinfer-python` dependency is Linux-only (`sys_platform == 'linux'`). On macOS/Windows, it is automatically excluded.
- All Platforms: The serving layer itself is platform-agnostic; only the GPU backend varies.
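The `sys_platform == 'linux'` environment marker is evaluated by the installer at install time; the same condition can be mirrored at runtime to decide whether FlashInfer should be expected. A trivial sketch (`flashinfer_expected` is illustrative; real marker evaluation is done by pip's `packaging` machinery):

```python
import sys


def flashinfer_expected():
    """Mirror the pyproject marker "sys_platform == 'linux'" for the
    flashinfer-python dependency (simplified to this one marker)."""
    return sys.platform == "linux"
```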