Implementation: MLflow Models Serve CLI
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Serving |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
A concrete tool, provided by the MLflow library, for launching a local HTTP server that serves an MLflow model for inference.
Description
The mlflow models serve CLI command starts a web server that loads an MLflow model from a specified URI and exposes it through standard REST endpoints. The server supports the python_function and crate (R Function) flavors. It handles environment setup through configurable environment managers (virtualenv, conda, or local), model deserialization, and HTTP request routing. The default backend uses uvicorn/gunicorn, but users can optionally enable the MLServer backend for KServe/Seldon compatibility.
The command delegates to the appropriate flavor backend via get_flavor_backend(), which resolves the model flavor and constructs the serving infrastructure. The server exposes two primary endpoints: /invocations for receiving prediction requests and /ping for health checks. Input data can be provided in JSON (with dataframe_split, dataframe_records, instances, or inputs keys), CSV, or Parquet format.
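For illustration, a minimal Python client sketch against these two endpoints is shown below. It assumes a server already running on 127.0.0.1:5000 and a model whose input schema uses the placeholder columns feature_a and feature_b.
# Minimal client sketch (assumes a server already started with `mlflow models serve`;
# feature_a/feature_b are placeholder column names for your model's schema)
import requests

BASE_URL = "http://127.0.0.1:5000"

# Health check: /ping returns HTTP 200 once the model is loaded
assert requests.get(f"{BASE_URL}/ping", timeout=5).status_code == 200

# Prediction request using the dataframe_split orientation
payload = {
    "dataframe_split": {
        "columns": ["feature_a", "feature_b"],
        "data": [[1, 2], [3, 4]],
    }
}
resp = requests.post(f"{BASE_URL}/invocations", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())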
Usage
Use this command when you need to serve a model locally for development testing, integration validation, or quick demonstrations. It is typically invoked after logging a model with a log_model() API (for example, mlflow.sklearn.log_model()) and provides a fast way to verify that the model works correctly behind an HTTP interface before containerizing or deploying to production, as sketched below.
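A rough sketch of that workflow, assuming scikit-learn is installed; the model, artifact path, and port below are illustrative, not part of the command itself:
# Illustrative only: log a scikit-learn model, then serve it by run URI
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "my-model")

# The printed command can then be run in a shell to start the scoring server
print(f"mlflow models serve -m runs:/{run.info.run_id}/my-model --port 5000")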
Code Reference
Source Location
- Repository: mlflow
- File: mlflow/models/cli.py
- Lines: 24-106
Signature
@commands.command("serve")
def serve(
    model_uri,
    port,
    host,
    timeout,
    workers,
    env_manager=None,
    no_conda=False,
    install_mlflow=False,
    enable_mlserver=False,
):
Import
# CLI invocation (not imported directly)
# Usage: mlflow models serve -m <model_uri> [OPTIONS]
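Where programmatic control is needed (for example, in an integration test), one option is to launch the CLI via subprocess and poll /ping until the server is ready. The serve_model_locally helper below is a hypothetical sketch, not part of MLflow:
# Hypothetical helper (not part of MLflow): start the server via subprocess
# and wait for /ping to report ready.
import subprocess
import time
import requests

def serve_model_locally(model_uri, host="127.0.0.1", port=5000, ready_timeout_s=60):
    proc = subprocess.Popen(
        ["mlflow", "models", "serve", "-m", model_uri,
         "--host", host, "--port", str(port)]
    )
    deadline = time.time() + ready_timeout_s
    ping_url = f"http://{host}:{port}/ping"
    while time.time() < deadline:
        try:
            if requests.get(ping_url, timeout=2).status_code == 200:
                return proc  # caller is responsible for proc.terminate()
        except requests.ConnectionError:
            pass
        time.sleep(1)
    proc.terminate()
    raise RuntimeError("mlflow scoring server did not become ready in time")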
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_uri | str | Yes | URI to the model (e.g., runs:/<run_id>/model, models:/<name>/<version>, or a local path) |
| port | int | No | Port to listen on (default: 5000) |
| host | str | No | Host interface to bind to (default: 127.0.0.1) |
| timeout | int | No | Request timeout in seconds for the scoring server |
| workers | int | No | Number of server worker processes |
| env_manager | str | No | Environment manager to use: virtualenv, conda, or local |
| no_conda | bool | No | If True, use local environment (deprecated in favor of env_manager) |
| install_mlflow | bool | No | If True, install MLflow into the model environment |
| enable_mlserver | bool | No | If True, use Seldon MLServer as the serving backend |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP Server | Running process | A web server bound to http://host:port with /invocations and /ping endpoints |
Usage Examples
Basic Usage
# Serve a model from a run artifact
mlflow models serve -m runs:/abc123/my-model --port 5000
# Send a prediction request using records-oriented JSON
curl http://127.0.0.1:5000/invocations \
-H 'Content-Type: application/json' \
-d '{"dataframe_records": [{"feature_a": 1, "feature_b": 2}]}'
# Serve using a specific environment manager
mlflow models serve -m models:/my-model/1 --env-manager virtualenv
# Serve with MLServer backend enabled
mlflow models serve -m runs:/abc123/my-model --enable-mlserver
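The examples above send JSON; since the scoring server also accepts CSV, a small Python sketch of a text/csv request follows. Column names and the port are placeholders for your own model.
# CSV-format request sketch; adjust column names and port to your model
import pandas as pd
import requests

df = pd.DataFrame({"feature_a": [1, 3], "feature_b": [2, 4]})
csv_body = df.to_csv(index=False)

resp = requests.post(
    "http://127.0.0.1:5000/invocations",
    data=csv_body,
    headers={"Content-Type": "text/csv"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())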