Principle: Haifengl Smile Inference Request Processing
Overview
Inference Request Processing encompasses the end-to-end flow of receiving input features from a client, routing the request to the correct model, executing the prediction (classification or regression), and returning structured results. This principle covers schema validation, type dispatching, and the mathematical semantics of the prediction output.
Theoretical Basis
Model Inference as Function Evaluation
At its core, model inference is the evaluation of a learned function:
f(x) -> y_hat
where x is a feature vector (the input tuple) and y_hat is the predicted output. The function f was learned during training and is now frozen inside the serialized model. The inference server's job is to:
- Accept `x` from the client (as JSON or CSV).
- Parse `x` into the model's expected internal representation (a `Tuple`).
- Evaluate `f(x)` using the loaded model.
- Format and return `y_hat` to the client.
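The four steps above can be sketched in miniature. This is an illustration only: the `Model` interface and `parseCsv` helper here are hypothetical stand-ins, not Smile's actual API (which is covered in the sections below).

```java
// Minimal sketch of inference as function evaluation: accept -> parse ->
// evaluate -> format. The Model interface is an illustrative stand-in.
import java.util.Arrays;

public class InferenceSketch {
    // f(x) -> y_hat: the learned function, frozen after training.
    interface Model {
        double predict(double[] x);
    }

    // Parse a CSV line into the feature vector x (fields by position).
    static double[] parseCsv(String line) {
        return Arrays.stream(line.split(","))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    public static void main(String[] args) {
        Model f = x -> 2.0 * x[0] + x[1];          // stand-in for a loaded model
        double[] x = parseCsv("1.5,3.0");           // accept + parse
        double yHat = f.predict(x);                 // evaluate f(x)
        System.out.println("prediction: " + yHat);  // format and return y_hat
    }
}
```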
Classification vs. Regression
The nature of y_hat depends on the model type:
| Model Type | Output | Semantics |
|---|---|---|
| Classification | Integer label | The predicted class index. For example, 0, 1, or 2 for a 3-class problem. |
| Soft Classification | Integer label + probability array | The posterior probability P(y = k \| x) for each class k. For example, prediction: 1, probabilities: [0.1, 0.9]. |
| Regression | Real-valued number | The predicted continuous value. For example, a house price prediction of 245000.0. |
For soft classifiers (those that implement probability estimation, such as random forests, logistic regression, and gradient boosted trees), the inference also computes the full posterior probability distribution across all classes. This is valuable for:
- Confidence assessment -- the maximum probability indicates prediction confidence.
- Threshold tuning -- a client can apply a custom decision threshold rather than using `argmax`.
- Multi-label scenarios -- examining probabilities for all classes, not just the top prediction.
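The difference between the default `argmax` rule and a client-side threshold can be shown with a small sketch (the helper names `decideArgmax` and `decideWithThreshold` are hypothetical, not part of Smile):

```java
// Illustrative use of posterior probabilities: argmax vs. a custom threshold.
public class Thresholds {
    // Default decision rule: pick the class with the highest posterior.
    static int decideArgmax(double[] posterior) {
        int best = 0;
        for (int k = 1; k < posterior.length; k++) {
            if (posterior[k] > posterior[best]) best = k;
        }
        return best;
    }

    // Client-side rule for a binary problem: predict the positive class (1)
    // only when its posterior clears a custom threshold.
    static int decideWithThreshold(double[] posterior, double threshold) {
        return posterior[1] >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] p = {0.40, 0.60};
        System.out.println(decideArgmax(p));              // argmax picks class 1
        System.out.println(decideWithThreshold(p, 0.75)); // stricter rule keeps class 0
    }
}
```

A client that only wants high-confidence positives can thus reuse the same probability array the server already returns, without any server-side change.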
Schema Validation
Before prediction can proceed, the input must be validated against the model's expected schema. The schema is a StructType captured at training time that describes:
- The exact set of input feature names (excluding the response variable).
- The data type of each feature (`double`, `int`, `String`, etc.).
- Nullability constraints.
If the client's input JSON or CSV does not contain enough fields to match the schema, the server rejects the request with a 400 Bad Request error. This validation is a critical safety check that prevents:
- Index-out-of-bounds errors from missing features.
- Type casting exceptions from incompatible data types.
- Silent prediction errors from features in the wrong order (for CSV) or with wrong names (for JSON).
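The shape of this check can be sketched with a stand-in schema (a plain list of field names rather than Smile's `StructType`; the field names below are hypothetical):

```java
// Sketch of the pre-prediction schema check. A real server would validate
// types and nullability as well; this shows only the missing-field case.
import java.util.List;
import java.util.Map;

public class SchemaCheck {
    // Returns null if the input is valid, otherwise a 400-style error message.
    static String validate(List<String> schemaFields, Map<String, Object> input) {
        for (String field : schemaFields) {
            if (!input.containsKey(field)) {
                return "400 Bad Request: missing feature '" + field + "'";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("sepal_length", "sepal_width");
        // Missing sepal_width: rejected before the model is ever invoked.
        System.out.println(validate(schema, Map.of("sepal_length", 5.1)));
        // Complete input: null means validation passed.
        System.out.println(validate(schema, Map.of("sepal_length", 5.1, "sepal_width", 3.5)));
    }
}
```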
Input Format Conversion
The inference server accepts input in multiple formats and converts them to Smile's internal Tuple representation:
| Input Format | Conversion Method | Field Mapping |
|---|---|---|
| JSON object | `InferenceModel.json(JsonObject)` | Fields matched by name from the schema |
| CSV string | `InferenceModel.csv(String)` | Fields matched by position (comma-separated, same order as schema) |
The Tuple is Smile's row abstraction -- an array of typed values conforming to a StructType. It bridges the gap between external data formats (JSON, CSV) and the model's internal representation.
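The by-name versus by-position distinction can be illustrated with a simplified row type (an `Object[]` standing in for Smile's `Tuple`; the helper names are hypothetical):

```java
// Sketch of the two conversion paths into a typed row: CSV maps fields by
// position, a JSON-like map by name. Both converge on the same row.
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class RowConversion {
    // CSV: fields matched by position, in schema order.
    static Object[] fromCsv(String line, int fieldCount) {
        String[] parts = line.split(",");
        Object[] row = new Object[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }

    // JSON-like map: fields matched by schema name, regardless of key order.
    static Object[] fromMap(Map<String, Object> json, List<String> schemaNames) {
        Object[] row = new Object[schemaNames.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = json.get(schemaNames.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("age", "income");
        Object[] a = fromCsv("42, 55000.0", 2);
        Object[] b = fromMap(Map.of("income", 55000.0, "age", 42.0), schema);
        System.out.println(Arrays.equals(a, b)); // same row either way
    }
}
```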
Type Dispatching
Smile models are polymorphic. The `Model` interface has two primary implementations:
- `ClassificationModel` -- wraps a `DataFrameClassifier`.
- `RegressionModel` -- wraps a `DataFrameRegression`.
The `predict` method uses Java's pattern matching for `switch` to dispatch to the correct implementation at runtime:
```java
Number y = switch (model) {
    case ClassificationModel m -> m.predict(x, probabilities);
    case RegressionModel m -> m.predict(x);
    default -> 0;
};
```
This pattern enables a single REST endpoint to serve both classification and regression models without the client needing to specify the model type.
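A self-contained version of this dispatch can be written with stand-in model classes (requires Java 21 for pattern matching in `switch`; the record bodies below are simplified placeholders, not Smile's real wrappers):

```java
// Runnable stand-in for the dispatch snippet above. Sealing the interface
// makes the switch exhaustive, so no default branch is needed.
public class Dispatch {
    sealed interface Model permits ClassificationModel, RegressionModel {}

    record ClassificationModel() implements Model {
        int predict(double[] x, double[] probabilities) {
            probabilities[0] = 0.1;
            probabilities[1] = 0.9;
            return 1; // predicted class index
        }
    }

    record RegressionModel() implements Model {
        double predict(double[] x) { return 245000.0; } // predicted value
    }

    static Number predict(Model model, double[] x, double[] probabilities) {
        return switch (model) {
            case ClassificationModel m -> m.predict(x, probabilities);
            case RegressionModel m -> m.predict(x);
        };
    }

    public static void main(String[] args) {
        double[] probs = new double[2];
        System.out.println(predict(new ClassificationModel(), new double[]{1.0}, probs));
        System.out.println(predict(new RegressionModel(), new double[]{1.0}, probs));
    }
}
```

Note the design trade-off: with a sealed hierarchy the compiler verifies exhaustiveness, whereas the `default -> 0` branch in the original snippet silently returns zero for an unknown model type.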
Request Processing Pipeline
The complete request processing pipeline consists of these stages:
- HTTP Receipt -- JAX-RS receives the POST request with JSON body.
- Model Routing -- The `{id}` path parameter identifies which model to invoke.
- Input Parsing -- The JSON body is parsed into a `JsonObject` by Vert.x.
- Schema Validation -- Feature count is validated against the model's `StructType`.
- Tuple Construction -- JSON fields are extracted by name and assembled into a `Tuple`.
- Type Dispatch -- The model type (classification vs. regression) determines the prediction path.
- Prediction -- The model's `predict()` method evaluates the learned function.
- Response Construction -- The prediction and optional probabilities are wrapped in an `InferenceResponse`.
- JSON Serialization -- Jackson serializes the response to JSON (with custom probability formatting).
- HTTP Response -- The JSON is returned to the client.
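The model-routing stage is the one step not illustrated elsewhere: the `{id}` path parameter selects a model from an in-memory registry. A sketch under assumed names (a `Map`-backed registry of stand-in models, not the actual server code):

```java
// Sketch of model routing: the {id} path parameter keys into a registry of
// loaded, read-only models. Unknown ids map to a 404-style error.
import java.util.Map;
import java.util.function.Function;

public class ModelRouter {
    // Registry of loaded models keyed by id (stand-in functions here).
    static final Map<String, Function<double[], Number>> MODELS = Map.of(
            "iris", x -> 1,           // stand-in classifier
            "housing", x -> 245000.0  // stand-in regressor
    );

    static Number route(String id, double[] x) {
        Function<double[], Number> model = MODELS.get(id);
        if (model == null) {
            throw new IllegalArgumentException("404 Not Found: unknown model " + id);
        }
        return model.apply(x);
    }

    public static void main(String[] args) {
        System.out.println(route("housing", new double[]{3.0}));
        System.out.println(route("iris", new double[]{3.0}));
    }
}
```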
Design Considerations
Stateless Request Processing
Each prediction request is processed independently. The model is read-only, so no state is modified between requests. This enables safe concurrent processing of multiple requests.
Probability Formatting
Posterior probabilities are formatted to 3 decimal places using a custom Jackson serializer (ProbabilitySerializer). This keeps response payloads compact while providing sufficient precision for most applications.
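The rounding rule itself can be shown without the Jackson machinery (the real server performs this inside its custom `JsonSerializer`; this sketch only demonstrates the 3-decimal formatting):

```java
// Illustrates rounding posterior probabilities to 3 decimal places before
// serialization, as the custom serializer does.
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.stream.Collectors;

public class ProbabilityFormat {
    static String format(double[] probabilities) {
        return Arrays.stream(probabilities)
                .mapToObj(p -> BigDecimal.valueOf(p)
                        .setScale(3, RoundingMode.HALF_UP)
                        .toPlainString())
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(format(new double[]{0.123456, 0.876544}));
        // -> [0.123, 0.877]
    }
}
```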
Knowledge Sources
Domains
MLOps, Model_Deployment, Machine_Learning