
Environment: MLC-AI / MLC-LLM Python Serving Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Web_Serving
Last Updated: 2026-02-09 19:00 GMT

Overview

Python 3.9+ environment with FastAPI, uvicorn, and OpenAI client library for running MLC-LLM as an OpenAI-compatible REST API server.

Description

This environment provides the web serving layer for MLC-LLM. It uses FastAPI as the REST API framework and uvicorn as the ASGI server. The server exposes OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`) and supports streaming responses via Server-Sent Events. The OpenAI Python client is included as a dependency for testing and client-side usage. Additional dependencies include `shortuuid` for request ID generation, `prompt_toolkit` for the interactive chat CLI, and `pandas`/`datasets` for benchmarking.
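Streaming responses arrive as Server-Sent Events: each event line carries a JSON chunk, and the stream is terminated by a `data: [DONE]` sentinel. Below is a minimal stdlib-only sketch of consuming such a stream; the chunk shape follows the public OpenAI `chat.completion.chunk` schema, which the server mirrors, and is illustrative rather than quoted from MLC-LLM source.

```python
import json

def parse_sse_chunks(lines):
    """Extract content deltas from OpenAI-style SSE lines.

    Each event line looks like 'data: {json}'; the sentinel
    'data: [DONE]' terminates the stream.
    """
    deltas = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta is not None:
            deltas.append(delta)
    return "".join(deltas)

# Simulated stream: two content chunks followed by the terminator.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(stream))  # -> Hello
```

In practice the OpenAI Python client handles this parsing for you; the sketch only shows what is on the wire.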

Usage

Use this environment for the REST API Serving workflow and for running the interactive chat CLI. It is also required for the benchmarking module (`mlc_llm bench`).

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux, macOS, Windows | Any platform with Python support |
| Python | >= 3.9 | Specified in `pyproject.toml` |
| Network | Open port (default 8000) | For the REST API server |

Dependencies

Python Packages

  • `fastapi` (REST API framework)
  • `uvicorn` (ASGI server)
  • `openai` (OpenAI client library for testing)
  • `requests` (HTTP client)
  • `shortuuid` (unique request ID generation)
  • `prompt_toolkit` (interactive chat CLI)
  • `pandas` (benchmarking data processing)
  • `datasets` (benchmarking dataset loading)
  • `tqdm` (progress bars)
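The `shortuuid` package encodes a UUID's 128 bits into a compact base57 string (alphanumerics minus the easily confused `l`, `1`, `I`, `O`, `0`). A stdlib-only sketch of that encoding idea follows; the `chatcmpl-` prefix mirrors the OpenAI request-ID convention and is illustrative, not taken from MLC-LLM source.

```python
import uuid

# shortuuid's default base57 alphabet: alphanumerics without l, 1, I, O, 0.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def short_id(u=None):
    """Encode a UUID's 128-bit integer in base57, as shortuuid does."""
    n = (u or uuid.uuid4()).int
    digits = []
    while n > 0:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

# An OpenAI-style request ID built from the compact encoding.
request_id = "chatcmpl-" + short_id()
```

In the real environment you would simply call `shortuuid.uuid()`; the sketch only shows why the resulting IDs are short and URL-safe.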

Credentials

No credentials are required for basic serving. For model downloading:

  • `HF_TOKEN`: HuggingFace API token for accessing gated models.
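Downloads of gated models fail late and unhelpfully when the token is missing, so it can be worth checking the environment up front. A minimal sketch (the helper name is illustrative):

```python
import os

def hf_token_or_none():
    """Return the HuggingFace token from the environment, if set and non-blank."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None

# Fail fast before attempting to download a gated model.
if hf_token_or_none() is None:
    print("HF_TOKEN is not set; gated model downloads will fail.")
```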

Quick Install

# Install all serving dependencies
pip install mlc-llm

# Or install individually
pip install fastapi uvicorn openai requests shortuuid prompt_toolkit pandas datasets tqdm
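After installing, you can confirm the serving dependencies are importable without starting a server. A small stdlib check, with the package list taken from the dependency section above:

```python
import importlib.util

SERVING_DEPS = [
    "fastapi", "uvicorn", "openai", "requests", "shortuuid",
    "prompt_toolkit", "pandas", "datasets", "tqdm",
]

def missing_packages(names):
    """Return the subset of names whose top-level module cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages(SERVING_DEPS)
if missing:
    print("Missing serving dependencies:", ", ".join(missing))
else:
    print("All serving dependencies are importable.")
```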

Code Evidence

Python package dependencies from `pyproject.toml:37-55`:

dependencies = [
    "apache-tvm-ffi",
    "datasets",
    "fastapi",
    "flashinfer-python; sys_platform == 'linux'",
    "ml_dtypes>=0.5.1",
    "openai",
    "pandas",
    "prompt_toolkit",
    "requests",
    "safetensors",
    "sentencepiece",
    "shortuuid",
    "tiktoken",
    "torch",
    "tqdm",
    "transformers",
    "uvicorn",
]

Python version requirement from `pyproject.toml:35`:

requires-python = ">=3.9"

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `Address already in use` | Port 8000 already occupied | Use the `--port` flag to specify a different port |
| `ModuleNotFoundError: No module named 'fastapi'` | FastAPI not installed | `pip install fastapi uvicorn` |
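For the `Address already in use` case, you can check whether the default port is free before launching, or ask the OS for an unused ephemeral port. A stdlib sketch:

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """True if a TCP socket can bind the given port on the host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused ephemeral port by binding port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]

port = 8000 if port_is_free(8000) else pick_free_port()
```

Pass the resulting value to the server via the `--port` flag mentioned above.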

Compatibility Notes

  • FlashInfer: The `flashinfer-python` dependency is Linux-only (`sys_platform == 'linux'`). On macOS/Windows, it is automatically excluded.
  • All Platforms: The serving layer itself is platform-agnostic; only the GPU backend varies.
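The `sys_platform == 'linux'` condition is a PEP 508 environment marker, which pip evaluates against the installing interpreter's `sys.platform`. The same condition can be mirrored at runtime; a sketch of the marker's logic, not code from MLC-LLM:

```python
import sys

# pip's sys_platform marker compares against the value of sys.platform,
# so flashinfer-python is only selected when that value is "linux".
flashinfer_expected = sys.platform == "linux"
print("flashinfer-python expected on this platform:", flashinfer_expected)
```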
