
Implementation:Mlflow Mlflow Scoring Server Invocations

From Leeroopedia
Knowledge Sources
Domains ML_Ops, Model_Serving
Last Updated 2026-02-13 20:00 GMT

Overview

Concrete tooling from the MLflow library for handling prediction requests through the scoring server's HTTP endpoints, covering input parsing, model invocation, and response formatting.

Description

The scoring server module implements the core inference pipeline for MLflow model serving. It defines the invocations() function that processes raw HTTP request data through content-type negotiation, input deserialization, model prediction, and response serialization. The init() function constructs a FastAPI application with routes for /invocations (POST), /ping (GET), /health (GET), and /version (GET).

The invocations() function supports three content types: application/json (with dataframe_split, dataframe_records, instances, or inputs keys), text/csv, and application/vnd.apache.parquet. For JSON payloads, it also handles the unified LLM input format for large language model serving. The function returns an InvocationsResponse named tuple containing the serialized predictions, HTTP status code, and MIME type.
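As an illustration of the accepted JSON payload shapes, the sketch below builds each variant and passes it to invocations(). The column names and values are hypothetical, and model/input_schema are assumed to be loaded as in the Basic Usage example further down; this is a sketch of the request formats, not MLflow's internal test code.

import json

from mlflow.pyfunc.scoring_server import invocations

# Equivalent ways to encode a two-column tabular input as JSON.
payloads = [
    # Pandas "split" orientation: explicit columns plus row data.
    {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}},
    # Pandas "records" orientation: one dict per row.
    {"dataframe_records": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]},
    # TF Serving-style "instances": one object per example.
    {"instances": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]},
    # "inputs": column-oriented mapping (or a bare tensor/list).
    {"inputs": {"a": [1, 3], "b": [2, 4]}},
]

for payload in payloads:
    result = invocations(
        data=json.dumps(payload),
        content_type="application/json",
        model=model,              # previously loaded PyFuncModel (assumed)
        input_schema=input_schema,
    )
    print(result.status, result.response)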

The init() function sets up a FastAPI application with a configurable request timeout middleware and registers all route handlers. The /invocations endpoint delegates to the invocations() function in a thread pool to avoid blocking the async event loop during synchronous model prediction.
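The following is a minimal sketch of that delegation pattern, not MLflow's actual route handler: a synchronous predict path is pushed onto Starlette's thread pool so the async event loop stays responsive. The make_app() name and route bodies here are illustrative.

from fastapi import FastAPI, Request, Response
from starlette.concurrency import run_in_threadpool

from mlflow.pyfunc.scoring_server import invocations

def make_app(model, input_schema):
    app = FastAPI()

    @app.post("/invocations")
    async def _invocations(request: Request) -> Response:
        data = await request.body()
        content_type = request.headers.get("content-type", "")
        # invocations() is synchronous (it calls model.predict()),
        # so run it in a worker thread instead of on the event loop.
        result = await run_in_threadpool(
            invocations, data, content_type, model, input_schema
        )
        return Response(
            content=result.response,
            status_code=result.status,
            media_type=result.mimetype,
        )

    @app.get("/ping")
    async def _ping() -> Response:
        return Response(status_code=200)

    return app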

Usage

This implementation is used internally by MLflow whenever a model is served via mlflow models serve, within Docker containers, or through any MLflow-compatible serving infrastructure. Understanding this module is important when debugging prediction failures, customizing input processing, or extending the scoring server with additional endpoints.
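For a model served with mlflow models serve, clients reach the same invocations() logic over HTTP. The snippet below is a client-side sketch using the requests library; it assumes the server is listening on the default local address and port (127.0.0.1:5000) and that the model accepts a two-column dataframe.

import requests

# Hypothetical two-row input in the "dataframe_split" format.
payload = {"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}}

resp = requests.post(
    "http://127.0.0.1:5000/invocations",  # default mlflow models serve address (assumed)
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [...]}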

Code Reference

Source Location

  • Repository: mlflow
  • File: mlflow/pyfunc/scoring_server/__init__.py
  • Lines: L329-428 (invocations), L487-547 (init/app)

Signature

class InvocationsResponse(NamedTuple):
    response: str
    status: int
    mimetype: str


def invocations(data, content_type, model, input_schema):
    ...


def init(model: PyFuncModel) -> FastAPI:
    ...

Import

from mlflow.pyfunc.scoring_server import invocations, init

I/O Contract

Inputs

Name Type Required Description
data bytes or str Yes Raw request body containing model input data
content_type str Yes HTTP Content-Type header; one of application/json, text/csv, or application/vnd.apache.parquet
model PyFuncModel Yes Loaded MLflow PyFunc model instance with a predict() method
input_schema Schema No Model input schema for validation (may be None)

Outputs

Name Type Description
response str JSON-serialized prediction results
status int HTTP status code (200 on success, 415 for unsupported media type, 400/500 on error)
mimetype str Response MIME type (typically application/json)

Usage Examples

Basic Usage

from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import invocations

# Load a model
model = load_model("runs:/abc123/my-model")
input_schema = model.metadata.get_input_schema()

# Simulate a JSON prediction request
import json
payload = json.dumps({"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}})

result = invocations(
    data=payload,
    content_type="application/json",
    model=model,
    input_schema=input_schema,
)
print(result.response)  # JSON string of predictions
print(result.status)     # 200
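The same function accepts CSV input, and it reports unsupported content types through the status field rather than raising. The sketch below continues from the variables above; the CSV columns are hypothetical and the exact response body may vary by MLflow version.

# CSV request body: header row plus data rows.
csv_payload = "a,b\n1,2\n3,4\n"

csv_result = invocations(
    data=csv_payload,
    content_type="text/csv",
    model=model,
    input_schema=input_schema,
)
print(csv_result.status)  # 200 on success

# An unsupported content type is reported via the status code.
bad_result = invocations(
    data=csv_payload,
    content_type="text/plain",
    model=model,
    input_schema=input_schema,
)
print(bad_result.status)  # 415 (unsupported media type)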

FastAPI App Initialization

from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import init

# Load model and create the FastAPI application
model = load_model("runs:/abc123/my-model")
app = init(model)

# The app exposes:
#   POST /invocations  - prediction endpoint
#   GET  /ping         - health check
#   GET  /health       - health check (alias)
#   GET  /version      - MLflow version
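To exercise these routes without starting a server process, the app can be driven in-process with FastAPI's TestClient (a wrapper around Starlette's test client, which requires httpx). This is a sketch; the response body shape depends on the loaded model.

from fastapi.testclient import TestClient

client = TestClient(app)

# Liveness probe.
assert client.get("/ping").status_code == 200

# Prediction request against the in-process app.
resp = client.post(
    "/invocations",
    json={"dataframe_split": {"columns": ["a", "b"], "data": [[1, 2]]}},
)
print(resp.status_code, resp.text)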

Related Pages

Implements Principle

Requires Environment
