Heuristic:Mlflow Mlflow Model Signature Inference Tips
| Knowledge Sources | |
|---|---|
| Domains | Model_Management, Best_Practices |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Best practices for model signature inference to ensure correct schema enforcement during prediction, including supported input types, type hint integration, and common pitfalls.
Description
Model signatures define the expected input and output schema for MLflow models. When logging a model, MLflow can automatically infer the signature from an input example or from Python type hints on `PythonModel` subclasses. However, a failed inference does not raise an error: MLflow logs a warning and falls back to `AnyType`, which disables schema validation. Understanding the supported types and common failure modes is essential for building production-ready models.
Usage
Use this heuristic when logging models with `mlflow.pyfunc.log_model()` or any flavor-specific log function. Correct signatures prevent runtime errors during serving, ensure API documentation is accurate, and enable input validation at the scoring server.
The Insight (Rule of Thumb)
- Action: Always provide an `input_example` when logging models. For custom PythonModel classes, use Python type hints on the `predict()` method for automatic schema enforcement.
- Value: Supported input types: pandas DataFrame/Series, numpy ndarray, dict of numpy arrays, PySpark DataFrame, scipy sparse matrices, JSON-convertible dicts/lists.
- Trade-off: If inference fails, no exception is raised. MLflow logs a warning and the signature falls back to `AnyType`, so no input validation happens at serving time. Set the `mlflow` logger to DEBUG to see the full traceback of the inference failure.
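The DEBUG setting mentioned in the trade-off can be applied with the standard `logging` module before the model is logged. This is a minimal sketch: the logger name `mlflow` is the one named in MLflow's own warning message, while the `basicConfig` call is an assumption about the surrounding application, which may already configure its own handlers.

```python
import logging

# Assumption: the application has not configured logging yet.
# basicConfig attaches a stderr handler to the root logger.
logging.basicConfig(level=logging.INFO)

# Raise only the "mlflow" logger to DEBUG so other libraries stay quiet.
mlflow_logger = logging.getLogger("mlflow")
mlflow_logger.setLevel(logging.DEBUG)
```

Restricting the level change to the `mlflow` logger keeps the rest of the application at its usual verbosity.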
Supported Pattern:
import mlflow
import pandas as pd
from mlflow.models import infer_signature

# Assumes a fitted scikit-learn `model` and training data `X_train` already exist.

# Best: explicit signature inferred from actual inputs and outputs
signature = infer_signature(X_train, model.predict(X_train))
mlflow.sklearn.log_model(model, "model", signature=signature)

# Good: input example (signature inferred automatically)
mlflow.sklearn.log_model(model, "model", input_example=X_train[:5])

# For PythonModel: use type hints for automatic validation
class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: pd.DataFrame) -> pd.DataFrame:
        return model_input * 2
Reasoning
Silent fallback to AnyType from `mlflow/models/signature.py:279`:
# If schema inference fails, defaults to:
Schema([ColSpec(type=AnyType())])
# This disables all input validation during serving
Warning template for inference failures from `mlflow/models/signature.py:57-61`:
"Failed to infer the model signature from the input example. Reason: %s. "
"To see the full traceback, set the logging level to DEBUG via "
"`logging.getLogger('mlflow').setLevel(logging.DEBUG)`."
Extra inputs warning from `mlflow/models/utils.py:1260-1263`:
"Found extra inputs in the model input that are not defined in the model "
"signature: `{extra_cols}`. These inputs will be ignored."
PySpark date handling from `mlflow/models/signature.py:209-211`:
# Both DateType and TimestampType inferred as datetime
# This may cause precision loss for timestamps
Key pitfalls:
- Extra columns in the input are ignored during schema enforcement; MLflow only emits the warning quoted above and does not raise an error
- PySpark `DateType` and `TimestampType` are both inferred as `datetime`
- Multi-dimensional numpy arrays may fail when used as DataFrame columns
- Feature Store models have special exemptions for array/map/struct columns
- Complex nested types (arrays of structs) are not supported in ColSpec signatures
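Because extra inputs are ignored rather than rejected, it can help to compare incoming column names against the signature before calling the model. The sketch below is plain Python and does not reproduce MLflow's own enforcement code. `find_column_mismatches` and the example column names are hypothetical, and the expected names are assumed to have been read from the logged signature.

```python
def find_column_mismatches(expected, received):
    """Return (extra, missing) column names, preserving input order."""
    expected_names = list(expected)
    received_names = list(received)
    expected_set = set(expected_names)
    received_set = set(received_names)
    extra = [name for name in received_names if name not in expected_set]
    missing = [name for name in expected_names if name not in received_set]
    return extra, missing


# Hypothetical column names, for illustration only.
expected_columns = ["age", "income"]
payload = {"age": [31, 45], "income": [52000, 61000], "debug_flag": [0, 1]}

extra, missing = find_column_mismatches(expected_columns, payload.keys())
if extra:
    print(f"Columns not in the signature (would be ignored): {extra}")
if missing:
    print(f"Columns required by the signature but absent: {missing}")
```

Running a check like this on the client side turns a warning that is easy to miss into an explicit decision about whether to drop or reject the unexpected columns.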