Implementation: MLflow Scoring Server Invocations
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Serving |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Concrete tool, provided by the MLflow library, for handling prediction requests through the MLflow scoring server's HTTP endpoints, covering input parsing, model invocation, and response formatting.
Description
The scoring server module implements the core inference pipeline for MLflow model serving. It defines the invocations() function that processes raw HTTP request data through content-type negotiation, input deserialization, model prediction, and response serialization. The init() function constructs a FastAPI application with routes for /invocations (POST), /ping (GET), /health (GET), and /version (GET).
The invocations() function supports three content types: application/json (with dataframe_split, dataframe_records, instances, or inputs keys), text/csv, and application/vnd.apache.parquet. For JSON payloads, it also handles the unified LLM input format for large language model serving. The function returns an InvocationsResponse named tuple containing the serialized predictions, HTTP status code, and MIME type.
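As an illustration of the JSON dispatch keys mentioned above (a stdlib-only sketch with placeholder column and field names, not MLflow's own parsing code), the four accepted JSON payload shapes look like this:

```python
import json

# The four JSON input formats accepted at /invocations. Each request body
# carries exactly one of these top-level dispatch keys; the column names
# "a" and "b" are illustrative placeholders.
payloads = {
    "dataframe_split": {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2]]}},
    "dataframe_records": {"dataframe_records": [{"a": 1, "b": 2}]},
    "instances": {"instances": [[1, 2], [3, 4]]},
    "inputs": {"inputs": {"a": [1, 3], "b": [2, 4]}},
}

for fmt, body in payloads.items():
    # Round-trip through JSON to confirm each body is valid and keyed by fmt.
    assert fmt in json.loads(json.dumps(body))
```

The dataframe_split and dataframe_records formats map to pandas DataFrame construction styles, while instances and inputs follow the TensorFlow Serving convention.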
The init() function sets up a FastAPI application with a configurable request timeout middleware and registers all route handlers. The /invocations endpoint delegates to the invocations() function in a thread pool to avoid blocking the async event loop during synchronous model prediction.
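The pattern of offloading a blocking predict() call from an async handler can be sketched with the standard library alone. This is a simplified stand-in for what the real route handler does via FastAPI's worker threads; slow_predict and handle_invocations are hypothetical names, not MLflow APIs:

```python
import asyncio
import time

def slow_predict(data):
    # Stand-in for a synchronous model.predict() that would block the loop.
    time.sleep(0.05)
    return [x * 2 for x in data]

async def handle_invocations(data):
    # asyncio.to_thread runs the blocking call in a worker thread, so the
    # event loop stays free to serve /ping and other requests meanwhile.
    return await asyncio.to_thread(slow_predict, data)

result = asyncio.run(handle_invocations([1, 2, 3]))
print(result)  # [2, 4, 6]
```

Without this delegation, a long-running synchronous prediction would stall every concurrent request handled by the same event loop.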
Usage
This implementation is used internally by MLflow whenever a model is served via mlflow models serve, within Docker containers, or through any MLflow-compatible serving infrastructure. Understanding this module is important when debugging prediction failures, customizing input processing, or extending the scoring server with additional endpoints.
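A client of a served model simply POSTs one of the supported payloads to /invocations. The sketch below builds such a request with the standard library; the run ID abc123 and port 5000 are placeholders, and the request is constructed but not sent, since no server is running here:

```python
import json
import urllib.request

# Hypothetical endpoint for a model started with:
#   mlflow models serve -m "runs:/abc123/my-model" -p 5000
url = "http://127.0.0.1:5000/invocations"
body = {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}}

req = urllib.request.Request(
    url,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending would use urllib.request.urlopen(req); here we only inspect it.
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```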
Code Reference
Source Location
- Repository: mlflow
- File:
mlflow/pyfunc/scoring_server/__init__.py - Lines: L329-428 (invocations), L487-547 (init/app)
Signature
class InvocationsResponse(NamedTuple):
    response: str
    status: int
    mimetype: str

def invocations(data, content_type, model, input_schema):
    ...

def init(model: PyFuncModel) -> FastAPI:
    ...
Import
from mlflow.pyfunc.scoring_server import invocations, init
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | bytes or str | Yes | Raw request body containing model input data |
| content_type | str | Yes | HTTP Content-Type header; one of application/json, text/csv, or application/vnd.apache.parquet |
| model | PyFuncModel | Yes | Loaded MLflow PyFunc model instance with a predict() method |
| input_schema | Schema | No | Model input schema for validation (may be None) |
Outputs
| Name | Type | Description |
|---|---|---|
| response | str | JSON-serialized prediction results |
| status | int | HTTP status code (200 on success, 415 for unsupported media type, 400/500 on error) |
| mimetype | str | Response MIME type (typically application/json) |
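The output contract above can be mirrored with a plain NamedTuple (a stdlib sketch of the described shape, not MLflow's own class; the prediction values are made up):

```python
import json
from typing import NamedTuple

class InvocationsResponse(NamedTuple):
    response: str  # JSON-serialized predictions or error body
    status: int    # HTTP status code
    mimetype: str  # response MIME type

# A successful prediction response, serialized the way the server would.
ok = InvocationsResponse(
    response=json.dumps({"predictions": [0.1, 0.9]}),
    status=200,
    mimetype="application/json",
)
assert json.loads(ok.response)["predictions"] == [0.1, 0.9]

# An unsupported media type surfaces as a 415 with an error body.
err = InvocationsResponse(
    response=json.dumps({"error": "unsupported content type"}),
    status=415,
    mimetype="application/json",
)
print(ok.status, err.status)  # 200 415
```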
Usage Examples
Basic Usage
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import invocations
import json

# Load a model and its input schema
model = load_model("runs:/abc123/my-model")
input_schema = model.metadata.get_input_schema()

# Simulate a JSON prediction request
payload = json.dumps({"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}})
result = invocations(
    data=payload,
    content_type="application/json",
    model=model,
    input_schema=input_schema,
)
print(result.response)  # JSON string of predictions
print(result.status)    # 200
FastAPI App Initialization
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import init
# Load model and create the FastAPI application
model = load_model("runs:/abc123/my-model")
app = init(model)
# The app exposes:
# POST /invocations - prediction endpoint
# GET /ping - health check
# GET /health - health check (alias)
# GET /version - MLflow version