Implementation: MLflow Scoring Server Invocations
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Serving |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Concrete tool, provided by the MLflow library, for handling prediction requests through the MLflow scoring server's HTTP endpoints, covering input parsing, model invocation, and response formatting.
Description
The scoring server module implements the core inference pipeline for MLflow model serving. It defines the invocations() function that processes raw HTTP request data through content-type negotiation, input deserialization, model prediction, and response serialization. The init() function constructs a FastAPI application with routes for /invocations (POST), /ping (GET), /health (GET), and /version (GET).
The invocations() function supports three content types: application/json (with dataframe_split, dataframe_records, instances, or inputs keys), text/csv, and application/vnd.apache.parquet. For JSON payloads, it also handles the unified LLM input format for large language model serving. The function returns an InvocationsResponse named tuple containing the serialized predictions, HTTP status code, and MIME type.
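As an illustration of the JSON dispatch keys mentioned above (a stdlib-only sketch with placeholder column and field names, not MLflow's own parsing code), the four accepted JSON payload shapes look like this:

```python
import json

# The four JSON input formats accepted at /invocations. Each request body
# carries exactly one of these top-level dispatch keys; the column names
# "a" and "b" are illustrative placeholders.
payloads = {
    "dataframe_split": {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2]]}},
    "dataframe_records": {"dataframe_records": [{"a": 1, "b": 2}]},
    "instances": {"instances": [[1, 2], [3, 4]]},
    "inputs": {"inputs": {"a": [1, 3], "b": [2, 4]}},
}

for fmt, body in payloads.items():
    # Round-trip through JSON to confirm each body is valid and keyed by fmt.
    assert fmt in json.loads(json.dumps(body))
```

The dataframe_split and dataframe_records formats map to pandas DataFrame construction styles, while instances and inputs follow the TensorFlow Serving convention.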
The init() function sets up a FastAPI application with a configurable request timeout middleware and registers all route handlers. The /invocations endpoint delegates to the invocations() function in a thread pool to avoid blocking the async event loop during synchronous model prediction.
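The pattern of offloading a blocking predict() call from an async handler can be sketched with the standard library alone. This is a simplified stand-in for what the real route handler does via FastAPI's worker threads; slow_predict and handle_invocations are hypothetical names, not MLflow APIs:

```python
import asyncio
import time

def slow_predict(data):
    # Stand-in for a synchronous model.predict() that would block the loop.
    time.sleep(0.05)
    return [x * 2 for x in data]

async def handle_invocations(data):
    # asyncio.to_thread runs the blocking call in a worker thread, so the
    # event loop stays free to serve /ping and other requests meanwhile.
    return await asyncio.to_thread(slow_predict, data)

result = asyncio.run(handle_invocations([1, 2, 3]))
print(result)  # [2, 4, 6]
```

Without this delegation, a long-running synchronous prediction would stall every concurrent request handled by the same event loop.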
Usage
This implementation is used internally by MLflow whenever a model is served via mlflow models serve, within Docker containers, or through any MLflow-compatible serving infrastructure. Understanding this module is important when debugging prediction failures, customizing input processing, or extending the scoring server with additional endpoints.
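A client of a served model simply POSTs one of the supported payloads to /invocations. The sketch below builds such a request with the standard library; the run ID abc123 and port 5000 are placeholders, and the request is constructed but not sent, since no server is running here:

```python
import json
import urllib.request

# Hypothetical endpoint for a model started with:
#   mlflow models serve -m "runs:/abc123/my-model" -p 5000
url = "http://127.0.0.1:5000/invocations"
body = {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending would use urllib.request.urlopen(req); here we only inspect it.
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```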
Code Reference
Source Location
- Repository: mlflow
- File:
mlflow/pyfunc/scoring_server/__init__.py - Lines: L329-428 (invocations), L487-547 (init/app)
Signature
class InvocationsResponse(NamedTuple):
    response: str
    status: int
    mimetype: str

def invocations(data, content_type, model, input_schema):
    ...

def init(model: PyFuncModel) -> FastAPI:
    ...
Import
from mlflow.pyfunc.scoring_server import invocations, init
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | bytes or str | Yes | Raw request body containing model input data |
| content_type | str | Yes | HTTP Content-Type header; one of application/json, text/csv, or application/vnd.apache.parquet |
| model | PyFuncModel | Yes | Loaded MLflow PyFunc model instance with a predict() method |
| input_schema | Schema | No | Model input schema for validation (may be None) |
Outputs
| Name | Type | Description |
|---|---|---|
| response | str | JSON-serialized prediction results |
| status | int | HTTP status code (200 on success, 415 for unsupported media type, 400/500 on error) |
| mimetype | str | Response MIME type (typically application/json) |
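The output contract above can be mirrored with a plain NamedTuple (a stdlib sketch of the described shape, not MLflow's own class; the prediction values are made up):

```python
import json
from typing import NamedTuple

class InvocationsResponse(NamedTuple):
    response: str  # JSON-serialized predictions or error body
    status: int    # HTTP status code
    mimetype: str  # response MIME type

# A successful prediction response, serialized the way the server would.
ok = InvocationsResponse(
    response=json.dumps({"predictions": [0.1, 0.9]}),
    status=200,
    mimetype="application/json",
)
assert json.loads(ok.response)["predictions"] == [0.1, 0.9]

# An unsupported media type surfaces as a 415 with an error body.
err = InvocationsResponse(
    response=json.dumps({"error": "unsupported content type"}),
    status=415,
    mimetype="application/json",
)
print(ok.status, err.status)  # 200 415
```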
Usage Examples
Basic Usage
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import invocations
import json

# Load a model and its input schema
model = load_model("runs:/abc123/my-model")
input_schema = model.metadata.get_input_schema()

# Simulate a JSON prediction request
payload = json.dumps({"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}})
result = invocations(
    data=payload,
    content_type="application/json",
    model=model,
    input_schema=input_schema,
)
print(result.response)  # JSON string of predictions
print(result.status)    # 200
FastAPI App Initialization
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import init
# Load model and create the FastAPI application
model = load_model("runs:/abc123/my-model")
app = init(model)
# The app exposes:
# POST /invocations - prediction endpoint
# GET /ping - health check
# GET /health - health check (alias)
# GET /version - MLflow version