Principle: Haifengl Smile Inference Request Processing
Overview
Inference Request Processing encompasses the end-to-end flow of receiving input features from a client, routing the request to the correct model, executing the prediction (classification or regression), and returning structured results. This principle covers schema validation, type dispatching, and the mathematical semantics of the prediction output.
Theoretical Basis
Model Inference as Function Evaluation
At its core, model inference is the evaluation of a learned function:
f(x) -> y_hat
where x is a feature vector (the input tuple) and y_hat is the predicted output. The function f was learned during training and is now frozen inside the serialized model. The inference server's job is to:
- Accept `x` from the client (as JSON or CSV).
- Parse `x` into the model's expected internal representation (a `Tuple`).
- Evaluate `f(x)` using the loaded model.
- Format and return `y_hat` to the client.
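The four steps above can be sketched in miniature. This is an illustration only: the `Model` interface and `parseCsv` helper here are hypothetical stand-ins, not Smile's actual API (which is covered in the sections below).

```java
// Minimal sketch of inference as function evaluation: accept -> parse ->
// evaluate -> format. The Model interface is an illustrative stand-in.
import java.util.Arrays;

public class InferenceSketch {
    // f(x) -> y_hat: the learned function, frozen after training.
    interface Model {
        double predict(double[] x);
    }

    // Parse a CSV line into the feature vector x (fields by position).
    static double[] parseCsv(String line) {
        return Arrays.stream(line.split(","))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    public static void main(String[] args) {
        Model f = x -> 2.0 * x[0] + x[1];          // stand-in for a loaded model
        double[] x = parseCsv("1.5,3.0");           // accept + parse
        double yHat = f.predict(x);                 // evaluate f(x)
        System.out.println("prediction: " + yHat);  // format and return y_hat
    }
}
```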
Classification vs. Regression
The nature of y_hat depends on the model type:
| Model Type | Output | Semantics |
|---|---|---|
| Classification | Integer label | The predicted class index. For example, 0, 1, or 2 for a 3-class problem. |
| Soft Classification | Integer label + probability array | The posterior probability P(y = k \| x) for each class k. For example, prediction: 1, probabilities: [0.1, 0.9]. |
| Regression | Real-valued number | The predicted continuous value. For example, a house price prediction of 245000.0. |
For soft classifiers (those that implement probability estimation, such as random forests, logistic regression, and gradient boosted trees), the inference also computes the full posterior probability distribution across all classes. This is valuable for:
- Confidence assessment -- the maximum probability indicates prediction confidence.
- Threshold tuning -- a client can apply a custom decision threshold rather than using `argmax`.
- Multi-label scenarios -- examining probabilities for all classes, not just the top prediction.
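The difference between the default `argmax` rule and a client-side threshold can be shown with a small sketch (the helper names `decideArgmax` and `decideWithThreshold` are hypothetical, not part of Smile):

```java
// Illustrative use of posterior probabilities: argmax vs. a custom threshold.
public class Thresholds {
    // Default decision rule: pick the class with the highest posterior.
    static int decideArgmax(double[] posterior) {
        int best = 0;
        for (int k = 1; k < posterior.length; k++) {
            if (posterior[k] > posterior[best]) best = k;
        }
        return best;
    }

    // Client-side rule for a binary problem: predict the positive class (1)
    // only when its posterior clears a custom threshold.
    static int decideWithThreshold(double[] posterior, double threshold) {
        return posterior[1] >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] p = {0.40, 0.60};
        System.out.println(decideArgmax(p));              // argmax picks class 1
        System.out.println(decideWithThreshold(p, 0.75)); // stricter rule keeps class 0
    }
}
```

A client that only wants high-confidence positives can thus reuse the same probability array the server already returns, without any server-side change.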
Schema Validation
Before prediction can proceed, the input must be validated against the model's expected schema. The schema is a StructType captured at training time that describes:
- The exact set of input feature names (excluding the response variable).
- The data type of each feature (`double`, `int`, `String`, etc.).
- Nullability constraints.
If the client's input JSON or CSV does not contain enough fields to match the schema, the server rejects the request with a 400 Bad Request error. This validation is a critical safety check that prevents:
- Index-out-of-bounds errors from missing features.
- Type casting exceptions from incompatible data types.
- Silent prediction errors from features in the wrong order (for CSV) or with wrong names (for JSON).
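The shape of this check can be sketched with a stand-in schema (a plain list of field names rather than Smile's `StructType`; the field names below are hypothetical):

```java
// Sketch of the pre-prediction schema check. A real server would validate
// types and nullability as well; this shows only the missing-field case.
import java.util.List;
import java.util.Map;

public class SchemaCheck {
    // Returns null if the input is valid, otherwise a 400-style error message.
    static String validate(List<String> schemaFields, Map<String, Object> input) {
        for (String field : schemaFields) {
            if (!input.containsKey(field)) {
                return "400 Bad Request: missing feature '" + field + "'";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("sepal_length", "sepal_width");
        // Missing sepal_width: rejected before the model is ever invoked.
        System.out.println(validate(schema, Map.of("sepal_length", 5.1)));
        // Complete input: null means validation passed.
        System.out.println(validate(schema, Map.of("sepal_length", 5.1, "sepal_width", 3.5)));
    }
}
```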
Input Format Conversion
The inference server accepts input in multiple formats and converts them to Smile's internal Tuple representation:
| Input Format | Conversion Method | Field Mapping |
|---|---|---|
| JSON object | `InferenceModel.json(JsonObject)` | Fields matched by name from the schema |
| CSV string | `InferenceModel.csv(String)` | Fields matched by position (comma-separated, same order as schema) |
The Tuple is Smile's row abstraction -- an array of typed values conforming to a StructType. It bridges the gap between external data formats (JSON, CSV) and the model's internal representation.
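The by-name versus by-position distinction can be illustrated with a simplified row type (an `Object[]` standing in for Smile's `Tuple`; the helper names are hypothetical):

```java
// Sketch of the two conversion paths into a typed row: CSV maps fields by
// position, a JSON-like map by name. Both converge on the same row.
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class RowConversion {
    // CSV: fields matched by position, in schema order.
    static Object[] fromCsv(String line, int fieldCount) {
        String[] parts = line.split(",");
        Object[] row = new Object[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }

    // JSON-like map: fields matched by schema name, regardless of key order.
    static Object[] fromMap(Map<String, Object> json, List<String> schemaNames) {
        Object[] row = new Object[schemaNames.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = json.get(schemaNames.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("age", "income");
        Object[] a = fromCsv("42, 55000.0", 2);
        Object[] b = fromMap(Map.of("income", 55000.0, "age", 42.0), schema);
        System.out.println(Arrays.equals(a, b)); // same row either way
    }
}
```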
Type Dispatching
Smile models are polymorphic. The `Model` interface has two primary implementations:
- `ClassificationModel` -- wraps a `DataFrameClassifier`.
- `RegressionModel` -- wraps a `DataFrameRegression`.
The `predict` method uses Java's pattern matching for `switch` to dispatch to the correct implementation at runtime:
```java
Number y = switch (model) {
    case ClassificationModel m -> m.predict(x, probabilities);
    case RegressionModel m -> m.predict(x);
    default -> 0;
};
```
This pattern enables a single REST endpoint to serve both classification and regression models without the client needing to specify the model type.
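A self-contained version of this dispatch can be written with stand-in model classes (requires Java 21 for pattern matching in `switch`; the record bodies below are simplified placeholders, not Smile's real wrappers):

```java
// Runnable stand-in for the dispatch snippet above. Sealing the interface
// makes the switch exhaustive, so no default branch is needed.
public class Dispatch {
    sealed interface Model permits ClassificationModel, RegressionModel {}

    record ClassificationModel() implements Model {
        int predict(double[] x, double[] probabilities) {
            probabilities[0] = 0.1;
            probabilities[1] = 0.9;
            return 1; // predicted class index
        }
    }

    record RegressionModel() implements Model {
        double predict(double[] x) { return 245000.0; } // predicted value
    }

    static Number predict(Model model, double[] x, double[] probabilities) {
        return switch (model) {
            case ClassificationModel m -> m.predict(x, probabilities);
            case RegressionModel m -> m.predict(x);
        };
    }

    public static void main(String[] args) {
        double[] probs = new double[2];
        System.out.println(predict(new ClassificationModel(), new double[]{1.0}, probs));
        System.out.println(predict(new RegressionModel(), new double[]{1.0}, probs));
    }
}
```

Note the design trade-off: with a sealed hierarchy the compiler verifies exhaustiveness, whereas the `default -> 0` branch in the original snippet silently returns zero for an unknown model type.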
Request Processing Pipeline
The complete request processing pipeline consists of these stages:
- HTTP Receipt -- JAX-RS receives the POST request with JSON body.
- Model Routing -- The `{id}` path parameter identifies which model to invoke.
- Input Parsing -- The JSON body is parsed into a `JsonObject` by Vert.x.
- Schema Validation -- Feature count is validated against the model's `StructType`.
- Tuple Construction -- JSON fields are extracted by name and assembled into a `Tuple`.
- Type Dispatch -- The model type (classification vs. regression) determines the prediction path.
- Prediction -- The model's `predict()` method evaluates the learned function.
- Response Construction -- The prediction and optional probabilities are wrapped in an `InferenceResponse`.
- JSON Serialization -- Jackson serializes the response to JSON (with custom probability formatting).
- HTTP Response -- The JSON is returned to the client.
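The model-routing stage is the one step not illustrated elsewhere: the `{id}` path parameter selects a model from an in-memory registry. A sketch under assumed names (a `Map`-backed registry of stand-in models, not the actual server code):

```java
// Sketch of model routing: the {id} path parameter keys into a registry of
// loaded, read-only models. Unknown ids map to a 404-style error.
import java.util.Map;
import java.util.function.Function;

public class ModelRouter {
    // Registry of loaded models keyed by id (stand-in functions here).
    static final Map<String, Function<double[], Number>> MODELS = Map.of(
            "iris", x -> 1,           // stand-in classifier
            "housing", x -> 245000.0  // stand-in regressor
    );

    static Number route(String id, double[] x) {
        Function<double[], Number> model = MODELS.get(id);
        if (model == null) {
            throw new IllegalArgumentException("404 Not Found: unknown model " + id);
        }
        return model.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(route("housing", new double[]{3.0}));
        System.out.println(route("iris", new double[]{3.0}));
    }
}
```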
Design Considerations
Stateless Request Processing
Each prediction request is processed independently. The model is read-only, so no state is modified between requests. This enables safe concurrent processing of multiple requests.
Probability Formatting
Posterior probabilities are formatted to 3 decimal places using a custom Jackson serializer (ProbabilitySerializer). This keeps response payloads compact while providing sufficient precision for most applications.
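The rounding rule itself can be shown without the Jackson machinery (the real server performs this inside its custom `JsonSerializer`; this sketch only demonstrates the 3-decimal formatting):

```java
// Illustrates rounding posterior probabilities to 3 decimal places before
// serialization, as the custom serializer does.
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.stream.Collectors;

public class ProbabilityFormat {
    static String format(double[] probabilities) {
        return Arrays.stream(probabilities)
                .mapToObj(p -> BigDecimal.valueOf(p)
                        .setScale(3, RoundingMode.HALF_UP)
                        .toPlainString())
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(format(new double[]{0.123456, 0.876544}));
        // -> [0.123, 0.877]
    }
}
```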
Knowledge Sources
Domains
MLOps, Model_Deployment, Machine_Learning