Environment: MLC-LLM Python Serving Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Web_Serving |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Python 3.9+ environment with FastAPI, uvicorn, and OpenAI client library for running MLC-LLM as an OpenAI-compatible REST API server.
Description
This environment provides the web serving layer for MLC-LLM. It uses FastAPI for the REST API framework and uvicorn as the ASGI server. The server exposes OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`) and supports streaming responses via Server-Sent Events. The OpenAI Python client is included as a dependency for testing and client-side usage. Additional dependencies include `shortuuid` for request ID generation, `prompt_toolkit` for the interactive chat CLI, and `pandas`/`datasets` for benchmarking.
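Streaming responses arrive as Server-Sent Events: each event carries a JSON chunk after a `data: ` prefix, and the stream terminates with `data: [DONE]`. A minimal client-side parsing sketch, assuming the standard OpenAI chat-completion chunk shape (verify the field names against your server's actual output):

```python
import json


def parse_sse_stream(lines):
    """Yield content deltas from OpenAI-style SSE lines.

    Assumes the standard chunk shape:
    {"choices": [{"delta": {"content": "..."}}]}.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


# Reassemble a streamed reply from raw SSE lines
raw = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_stream(raw))
```

In practice the `openai` client handles this parsing for you when `stream=True`; the sketch is useful when consuming the endpoint with `requests` or another raw HTTP client.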
Usage
Use this environment for the REST API Serving workflow and for running the interactive chat CLI. It is also required for the benchmarking module (`mlc_llm bench`).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Any platform with Python support |
| Python | >= 3.9 | Specified in pyproject.toml |
| Network | Open port (default 8000) | For REST API server |
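The two local requirements above (Python >= 3.9 and a free port) can be checked before launching the server. A preflight sketch; the `preflight` helper and its return convention are illustrative, not part of MLC-LLM:

```python
import socket
import sys


def preflight(port=8000, host="127.0.0.1"):
    """Check the local serving requirements: Python >= 3.9 and a free port."""
    if sys.version_info < (3, 9):
        return False, "Python 3.9+ required"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # succeeds only if no server holds the port
        except OSError:
            return False, f"port {port} already in use"
    return True, "ok"
```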
Dependencies
Python Packages
- `fastapi` (REST API framework)
- `uvicorn` (ASGI server)
- `openai` (OpenAI client library for testing)
- `requests` (HTTP client)
- `shortuuid` (unique request ID generation)
- `prompt_toolkit` (interactive chat CLI)
- `pandas` (benchmarking data processing)
- `datasets` (benchmarking dataset loading)
- `tqdm` (progress bars)
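A quick way to confirm these packages are present is to probe for them with `importlib` before starting the server. The helper below is an illustrative sketch, not part of MLC-LLM:

```python
from importlib.util import find_spec

# The serving-layer packages listed above
SERVING_DEPS = [
    "fastapi", "uvicorn", "openai", "requests", "shortuuid",
    "prompt_toolkit", "pandas", "datasets", "tqdm",
]


def missing_packages(names):
    """Return the subset of names that cannot be imported here."""
    return [n for n in names if find_spec(n) is None]


# e.g. print(missing_packages(SERVING_DEPS)) before launching the server
```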
Credentials
No credentials required for basic serving. For model downloading:
- `HF_TOKEN`: HuggingFace API token for accessing gated models.
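When a token is needed, the usual convention is to read `HF_TOKEN` from the environment and send it as a bearer token. The `hf_headers` helper below is an illustrative sketch of that convention, not an MLC-LLM API:

```python
import os


def hf_headers():
    """Build HTTP headers for HuggingFace downloads.

    Adds an Authorization header only when HF_TOKEN is set; anonymous
    access works for non-gated models.
    """
    token = os.environ.get("HF_TOKEN")
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```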
Quick Install
# Install all serving dependencies
pip install mlc-llm
# Or install individually
pip install fastapi uvicorn openai requests shortuuid prompt_toolkit pandas datasets tqdm
Code Evidence
Python package dependencies from `pyproject.toml:37-55`:
dependencies = [
"apache-tvm-ffi",
"datasets",
"fastapi",
"flashinfer-python; sys_platform == 'linux'",
"ml_dtypes>=0.5.1",
"openai",
"pandas",
"prompt_toolkit",
"requests",
"safetensors",
"sentencepiece",
"shortuuid",
"tiktoken",
"torch",
"tqdm",
"transformers",
"uvicorn",
]
Python version requirement from `pyproject.toml:35`:
requires-python = ">=3.9"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Address already in use` | Port 8000 already occupied | Use `--port` flag to specify a different port |
| `ModuleNotFoundError: No module named 'fastapi'` | FastAPI not installed | `pip install fastapi uvicorn` |
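For the "Address already in use" case, a launcher script can probe for a free port and pass it via `--port`. The `pick_port` helper below is an illustrative sketch:

```python
import socket


def pick_port(preferred=8000, host="127.0.0.1"):
    """Return preferred if bindable, else an OS-assigned free port."""
    for candidate in (preferred, 0):  # port 0 asks the OS for any free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, candidate))
                return s.getsockname()[1]
            except OSError:
                continue
    raise RuntimeError("no free port found")
```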
Compatibility Notes
- FlashInfer: The `flashinfer-python` dependency is Linux-only (`sys_platform == 'linux'`). On macOS/Windows, it is automatically excluded.
- All Platforms: The serving layer itself is platform-agnostic; only the GPU backend varies.
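The `sys_platform == 'linux'` environment marker is evaluated by the installer at install time; the same condition can be mirrored at runtime to decide whether FlashInfer should be expected. A trivial sketch (`flashinfer_expected` is illustrative; real marker evaluation is done by pip's `packaging` machinery):

```python
import sys


def flashinfer_expected():
    """Mirror the pyproject marker "sys_platform == 'linux'" for the
    flashinfer-python dependency (simplified to this one marker)."""
    return sys.platform == "linux"
```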