
Implementation:Mlflow Mlflow Models Serve CLI

From Leeroopedia
Knowledge Sources
Domains ML_Ops, Model_Serving
Last Updated 2026-02-13 20:00 GMT

Overview

A concrete tool, provided by the MLflow library, for launching a local HTTP server that serves an MLflow model for inference.

Description

The mlflow models serve CLI command starts a web server that loads an MLflow model from a specified URI and exposes it through standard REST endpoints. The server supports the python_function and crate (R Function) flavors. It handles environment setup through configurable environment managers (virtualenv, conda, or local), model deserialization, and HTTP request routing. The default backend uses uvicorn/gunicorn, but users can optionally enable the MLServer backend for KServe/Seldon compatibility.

The command delegates to the appropriate flavor backend via get_flavor_backend(), which resolves the model flavor and constructs the serving infrastructure. The server exposes two primary endpoints: /invocations for receiving prediction requests and /ping for health checks. Input data can be provided in JSON (with dataframe_split, dataframe_records, instances, or inputs keys), CSV, or Parquet format.
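The two DataFrame-oriented JSON shapes mentioned above can be illustrated with a small sketch. This is not MLflow code; it only constructs example request bodies with the standard library, and the feature names are hypothetical placeholders:

```python
# Sketch: the two DataFrame-oriented payload shapes accepted by /invocations.
# Column names (feature_a, feature_b) are illustrative placeholders.
import json

rows = [{"feature_a": 1, "feature_b": 2}, {"feature_a": 3, "feature_b": 4}]

# records orientation: one dict per row
records_payload = {"dataframe_records": rows}

# split orientation: column names listed once, then rows as lists of values
columns = list(rows[0])
split_payload = {
    "dataframe_split": {
        "columns": columns,
        "data": [[row[c] for c in columns] for row in rows],
    }
}

print(json.dumps(split_payload))
```

The split orientation repeats the column names only once, which keeps request bodies smaller for wide DataFrames with many rows.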

Usage

Use this command when you need to serve a model locally for development testing, integration validation, or quick demonstrations. It is typically invoked after logging a model with a flavor-specific log_model() call (for example, mlflow.sklearn.log_model() or mlflow.pyfunc.log_model()) and provides a fast way to verify that the model works correctly behind an HTTP interface before containerizing or deploying to production.
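For scripted workflows (e.g., integration tests that spin the server up and tear it down), the CLI can be launched from Python via subprocess. The sketch below only assembles the argv list for such a launch; the model URI is a placeholder, and `build_serve_command` is a hypothetical helper, not part of MLflow:

```python
# Sketch: assembling an `mlflow models serve` command line for use with
# subprocess.Popen. The model URI below is a placeholder.
import shlex

def build_serve_command(model_uri, port=5000, host="127.0.0.1", env_manager=None):
    """Return the argv list for `mlflow models serve` with common options."""
    cmd = ["mlflow", "models", "serve", "-m", model_uri,
           "--port", str(port), "--host", host]
    if env_manager:
        cmd += ["--env-manager", env_manager]
    return cmd

cmd = build_serve_command("runs:/abc123/my-model", env_manager="local")
print(shlex.join(cmd))
# Pass `cmd` to subprocess.Popen(cmd) to start the server as a child process.
```

Building the command as a list (rather than a single shell string) avoids quoting issues when model URIs contain special characters.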

Code Reference

Source Location

  • Repository: mlflow
  • File: mlflow/models/cli.py
  • Lines: L24-106

Signature

@commands.command("serve")
def serve(
    model_uri,
    port,
    host,
    timeout,
    workers,
    env_manager=None,
    no_conda=False,
    install_mlflow=False,
    enable_mlserver=False,
):

Import

# CLI invocation (not imported directly)
# Usage: mlflow models serve -m <model_uri> [OPTIONS]

I/O Contract

Inputs

Name | Type | Required | Description
model_uri | str | Yes | URI to the model (e.g., runs:/<run_id>/model, models:/<name>/<version>, or a local path)
port | int | No | Port to listen on (default: 5000)
host | str | No | Host interface to bind to (default: 127.0.0.1)
timeout | int | No | Request timeout in seconds for the scoring server
workers | int | No | Number of server worker processes
env_manager | str | No | Environment manager to use: virtualenv, conda, or local
no_conda | bool | No | If True, use local environment (deprecated in favor of env_manager)
install_mlflow | bool | No | If True, install MLflow into the model environment
enable_mlserver | bool | No | If True, use Seldon MLServer as the serving backend

Outputs

Name | Type | Description
HTTP Server | Running process | A web server bound to http://host:port with /invocations and /ping endpoints

Usage Examples

Basic Usage

# Serve a model from a run artifact
mlflow models serve -m runs:/abc123/my-model --port 5000

# Send a prediction request using records-oriented JSON
curl http://127.0.0.1:5000/invocations \
  -H 'Content-Type: application/json' \
  -d '{"dataframe_records": [{"feature_a": 1, "feature_b": 2}]}'

# Serve using a specific environment manager
mlflow models serve -m models:/my-model/1 --env-manager virtualenv

# Serve with MLServer backend enabled
mlflow models serve -m runs:/abc123/my-model --enable-mlserver
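The curl call above can also be issued from Python. This sketch builds the same request with the standard library; it assumes a server is already running at 127.0.0.1:5000, so the actual send is left commented out:

```python
# Sketch: a Python equivalent of the curl example, using only the
# standard library. Assumes `mlflow models serve` is running locally.
import json
import urllib.request

payload = json.dumps({"dataframe_records": [{"feature_a": 1, "feature_b": 2}]})
req = urllib.request.Request(
    "http://127.0.0.1:5000/invocations",
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# With the server running, send the request and read the predictions:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```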

Related Pages

Implements Principle

Requires Environment
