Implementation: MLflow Models Serve CLI
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Serving |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
A concrete tool, provided by the MLflow library, for launching a local HTTP server that serves an MLflow model for inference.
Description
The mlflow models serve CLI command starts a web server that loads an MLflow model from a specified URI and exposes it through standard REST endpoints. The server supports the python_function and crate (R Function) flavors. It handles environment setup through configurable environment managers (virtualenv, conda, or local), model deserialization, and HTTP request routing. The default backend uses uvicorn/gunicorn, but users can optionally enable the MLServer backend for KServe/Seldon compatibility.
The command delegates to the appropriate flavor backend via get_flavor_backend(), which resolves the model flavor and constructs the serving infrastructure. The server exposes two primary endpoints: /invocations for receiving prediction requests and /ping for health checks. Input data can be provided in JSON (with dataframe_split, dataframe_records, instances, or inputs keys), CSV, or Parquet format.
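For illustration, a minimal Python client sketch against these two endpoints is shown below. It assumes a server already running on 127.0.0.1:5000 and a model whose input schema uses the placeholder columns feature_a and feature_b.
# Minimal client sketch (assumes a server already started with `mlflow models serve`;
# feature_a/feature_b are placeholder column names for your model's schema)
import requests

BASE_URL = "http://127.0.0.1:5000"

# Health check: /ping returns HTTP 200 once the model is loaded
assert requests.get(f"{BASE_URL}/ping", timeout=5).status_code == 200

# Prediction request using the dataframe_split orientation
payload = {
    "dataframe_split": {
        "columns": ["feature_a", "feature_b"],
        "data": [[1, 2], [3, 4]],
    }
}
resp = requests.post(f"{BASE_URL}/invocations", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())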
Usage
Use this command when you need to serve a model locally for development testing, integration validation, or quick demonstrations. It is typically invoked after logging a model with a log_model() API (for example, mlflow.sklearn.log_model()) and provides a fast way to verify that the model works correctly behind an HTTP interface before containerizing or deploying to production, as sketched below.
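A rough sketch of that workflow, assuming scikit-learn is installed; the model, artifact path, and port below are illustrative, not part of the command itself:
# Illustrative only: log a scikit-learn model, then serve it by run URI
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "my-model")

# The printed command can then be run in a shell to start the scoring server
print(f"mlflow models serve -m runs:/{run.info.run_id}/my-model --port 5000")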
Code Reference
Source Location
- Repository: mlflow
- File: mlflow/models/cli.py
- Lines: 24-106
Signature
@commands.command("serve")
def serve(
    model_uri,
    port,
    host,
    timeout,
    workers,
    env_manager=None,
    no_conda=False,
    install_mlflow=False,
    enable_mlserver=False,
):
Import
# CLI invocation (not imported directly)
# Usage: mlflow models serve -m <model_uri> [OPTIONS]
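Where programmatic control is needed (for example, in an integration test), one option is to launch the CLI via subprocess and poll /ping until the server is ready. The serve_model_locally helper below is a hypothetical sketch, not part of MLflow:
# Hypothetical helper (not part of MLflow): start the server via subprocess
# and wait for /ping to report ready.
import subprocess
import time
import requests

def serve_model_locally(model_uri, host="127.0.0.1", port=5000, ready_timeout_s=60):
    proc = subprocess.Popen(
        ["mlflow", "models", "serve", "-m", model_uri,
         "--host", host, "--port", str(port)]
    )
    deadline = time.time() + ready_timeout_s
    ping_url = f"http://{host}:{port}/ping"
    while time.time() < deadline:
        try:
            if requests.get(ping_url, timeout=2).status_code == 200:
                return proc  # caller is responsible for proc.terminate()
        except requests.ConnectionError:
            pass
        time.sleep(1)
    proc.terminate()
    raise RuntimeError("mlflow scoring server did not become ready in time")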
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_uri | str | Yes | URI to the model (e.g., runs:/<run_id>/model, models:/<name>/<version>, or a local path) |
| port | int | No | Port to listen on (default: 5000) |
| host | str | No | Host interface to bind to (default: 127.0.0.1) |
| timeout | int | No | Request timeout in seconds for the scoring server |
| workers | int | No | Number of server worker processes |
| env_manager | str | No | Environment manager to use: virtualenv, conda, or local |
| no_conda | bool | No | If True, use local environment (deprecated in favor of env_manager) |
| install_mlflow | bool | No | If True, install MLflow into the model environment |
| enable_mlserver | bool | No | If True, use Seldon MLServer as the serving backend |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP Server | Running process | A web server bound to http://host:port with /invocations and /ping endpoints |
Usage Examples
Basic Usage
# Serve a model from a run artifact
mlflow models serve -m runs:/abc123/my-model --port 5000
# Send a prediction request using records-oriented JSON
curl http://127.0.0.1:5000/invocations \
-H 'Content-Type: application/json' \
-d '{"dataframe_records": [{"feature_a": 1, "feature_b": 2}]}'
# Serve using a specific environment manager
mlflow models serve -m models:/my-model/1 --env-manager virtualenv
# Serve with MLServer backend enabled
mlflow models serve -m runs:/abc123/my-model --enable-mlserver
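The examples above send JSON; since the scoring server also accepts CSV, a small Python sketch of a text/csv request follows. Column names and the port are placeholders for your own model.
# CSV-format request sketch; adjust column names and port to your model
import pandas as pd
import requests

df = pd.DataFrame({"feature_a": [1, 3], "feature_b": [2, 4]})
csv_body = df.to_csv(index=False)

resp = requests.post(
    "http://127.0.0.1:5000/invocations",
    data=csv_body,
    headers={"Content-Type": "text/csv"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())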