
Principle:Haifengl Smile Inference Request Processing

From Leeroopedia


Overview

Inference Request Processing encompasses the end-to-end flow of receiving input features from a client, routing the request to the correct model, executing the prediction (classification or regression), and returning structured results. This principle covers schema validation, type dispatching, and the mathematical semantics of the prediction output.

Theoretical Basis

Model Inference as Function Evaluation

At its core, model inference is the evaluation of a learned function:

f(x) -> y_hat

where x is a feature vector (the input tuple) and y_hat is the predicted output. The function f was learned during training and is now frozen inside the serialized model. The inference server's job is to:

  1. Accept x from the client (as JSON or CSV).
  2. Parse x into the model's expected internal representation (a Tuple).
  3. Evaluate f(x) using the loaded model.
  4. Format and return y_hat to the client.

Classification vs. Regression

The nature of y_hat depends on the model type:

  • Classification: integer label. The predicted class index; for example, 0, 1, or 2 for a 3-class problem.
  • Soft Classification: integer label plus probability array. The posterior probability P(y = k | x) for each class k; for example, prediction: 1, probabilities: [0.1, 0.9].
  • Regression: real-valued number. The predicted continuous value; for example, a house price prediction of 245000.0.

For soft classifiers (those that implement probability estimation, such as random forests, logistic regression, and gradient boosted trees), the inference also computes the full posterior probability distribution across all classes. This is valuable for:

  • Confidence assessment -- the maximum probability indicates prediction confidence.
  • Threshold tuning -- a client can apply a custom decision threshold rather than using argmax.
  • Multi-label scenarios -- examining probabilities for all classes, not just the top prediction.
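The argmax default and a client-side custom threshold can be sketched in plain Java. The class and method names below are illustrative only, not part of Smile's API:

```java
// Hypothetical helpers showing how a client might use the posterior
// probability array: the default argmax decision vs. a custom threshold.
public class Thresholding {

    /** Index of the highest probability (the default prediction). */
    public static int argmax(double[] p) {
        int best = 0;
        for (int k = 1; k < p.length; k++) {
            if (p[k] > p[best]) best = k;
        }
        return best;
    }

    /** Binary decision with a client-chosen threshold on the positive class. */
    public static int withThreshold(double[] p, double threshold) {
        return p[1] >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] probs = {0.35, 0.65};
        System.out.println(argmax(probs));              // class with the highest posterior
        System.out.println(withThreshold(probs, 0.7));  // stricter custom cutoff
    }
}
```

With probs = [0.35, 0.65], argmax predicts class 1, while a 0.7 threshold on the positive class rejects it and returns 0, which is exactly the flexibility the raw probabilities give the client.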

Schema Validation

Before prediction can proceed, the input must be validated against the model's expected schema. The schema is a StructType captured at training time that describes:

  • The exact set of input feature names (excluding the response variable).
  • The data type of each feature (double, int, String, etc.).
  • Nullability constraints.

If the client's input JSON or CSV does not contain enough fields to match the schema, the server rejects the request with a 400 Bad Request error. This validation is a critical safety check that prevents:

  • Index-out-of-bounds errors from missing features.
  • Type casting exceptions from incompatible data types.
  • Silent prediction errors from features in the wrong order (for CSV) or with wrong names (for JSON).
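A simplified sketch of this validation step, standing in for Smile's StructType check (the SchemaCheck class and its signature are hypothetical, not Smile's API):

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of schema validation: check field count and field names
// before prediction; failures map to 400 Bad Request in the server.
public class SchemaCheck {

    public static void validate(List<String> schemaFields, Map<String, ?> input) {
        if (input.size() < schemaFields.size()) {
            throw new IllegalArgumentException(
                "Expected " + schemaFields.size() + " fields, got " + input.size());
        }
        for (String field : schemaFields) {
            if (!input.containsKey(field)) {
                throw new IllegalArgumentException("Missing feature: " + field);
            }
        }
    }
}
```

Rejecting the request up front, rather than letting a missing key surface as a NullPointerException deep inside prediction, is what turns a malformed payload into a clean 400 response.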

Input Format Conversion

The inference server accepts input in multiple formats and converts them to Smile's internal Tuple representation:

  • JSON object: converted via InferenceModel.json(JsonObject); fields are matched by name against the schema.
  • CSV string: converted via InferenceModel.csv(String); fields are matched by position (comma-separated, in the same order as the schema).

The Tuple is Smile's row abstraction -- an array of typed values conforming to a StructType. It bridges the gap between external data formats (JSON, CSV) and the model's internal representation.
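The positional CSV path can be approximated as follows. This is a hypothetical sketch of what InferenceModel.csv(String) does against the training-time schema, simplified to all-numeric features:

```java
// Illustrative positional CSV parsing: values are taken in schema order,
// so field count and ordering must match the training-time schema exactly.
public class CsvRow {

    /** Parse comma-separated values in schema order; all features assumed numeric. */
    public static double[] parse(String csv, int expectedFields) {
        String[] parts = csv.split(",");
        if (parts.length < expectedFields) {
            throw new IllegalArgumentException(
                "Expected " + expectedFields + " fields, got " + parts.length);
        }
        double[] row = new double[expectedFields];
        for (int i = 0; i < expectedFields; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }
}
```

Because CSV carries no field names, reordering two columns silently swaps two features, which is precisely the "silent prediction error" the schema validation above guards against.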

Type Dispatching

Smile models are polymorphic. The Model interface has two primary implementations:

  • ClassificationModel -- wraps a DataFrameClassifier.
  • RegressionModel -- wraps a DataFrameRegression.

The predict method uses Java's pattern matching switch to dispatch to the correct implementation at runtime:

Number y = switch (model) {
    case ClassificationModel m -> m.predict(x, probabilities);  // fills the posterior array for soft classifiers
    case RegressionModel m -> m.predict(x);
    default -> 0;  // unreachable for the two supported model types
};

This pattern enables a single REST endpoint to serve both classification and regression models without the client needing to specify the model type.
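A self-contained analog of this dispatch pattern, using a sealed interface so the switch is provably exhaustive (the Classifier and Regressor types here are stand-ins, not Smile's actual classes; requires Java 21 for pattern matching in switch):

```java
// Illustrative version of polymorphic dispatch over a sealed model hierarchy.
public class Dispatch {

    public sealed interface Model permits Classifier, Regressor {}

    public record Classifier() implements Model {
        int predict(double[] x, double[] probabilities) {
            probabilities[0] = 0.1;  // stubbed posterior
            probabilities[1] = 0.9;
            return 1;                // predicted class index
        }
    }

    public record Regressor() implements Model {
        double predict(double[] x) {
            return 245000.0;         // stubbed continuous prediction
        }
    }

    /** One entry point serves both model kinds; the switch picks the path. */
    public static Number predict(Model model, double[] x, double[] probabilities) {
        return switch (model) {
            case Classifier m -> m.predict(x, probabilities);
            case Regressor m -> m.predict(x);
        };
    }
}
```

Sealing the interface lets the compiler verify the switch covers every model type, which is why no default branch is needed in this sketch.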

Request Processing Pipeline

The complete request processing pipeline consists of these stages:

  1. HTTP Receipt -- JAX-RS receives the POST request with JSON body.
  2. Model Routing -- The {id} path parameter identifies which model to invoke.
  3. Input Parsing -- The JSON body is parsed into a JsonObject by Vert.x.
  4. Schema Validation -- Feature count is validated against the model's StructType.
  5. Tuple Construction -- JSON fields are extracted by name and assembled into a Tuple.
  6. Type Dispatch -- The model type (classification vs. regression) determines the prediction path.
  7. Prediction -- The model's predict() method evaluates the learned function.
  8. Response Construction -- The prediction and optional probabilities are wrapped in an InferenceResponse.
  9. JSON Serialization -- Jackson serializes the response to JSON (with custom probability formatting).
  10. HTTP Response -- The JSON is returned to the client.
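Steps 4 through 8 can be condensed into one sketch (the Pipeline and Response types are hypothetical; HTTP receipt, routing, and JSON serialization belong to the framework, and the model call is stubbed out):

```java
import java.util.Map;

// Condensed sketch of the middle of the pipeline: validate, build the
// feature row by name in schema order, predict, and wrap the response.
public class Pipeline {

    /** Minimal response carrier (Jackson handles steps 9-10 in the real server). */
    public record Response(Number prediction, double[] probabilities) {}

    public static Response handle(Map<String, Double> json, String[] schemaFields) {
        // Step 4: schema validation -> 400 Bad Request on failure
        if (json.size() < schemaFields.length) {
            throw new IllegalArgumentException("400 Bad Request: expected "
                + schemaFields.length + " fields, got " + json.size());
        }
        // Step 5: tuple construction, extracting fields by name in schema order
        double[] x = new double[schemaFields.length];
        for (int i = 0; i < schemaFields.length; i++) {
            Double v = json.get(schemaFields[i]);
            if (v == null) {
                throw new IllegalArgumentException(
                    "400 Bad Request: missing " + schemaFields[i]);
            }
            x[i] = v;
        }
        // Steps 6-7: type dispatch and prediction (stubbed; the real server
        // calls the loaded model's predict() on x)
        double[] probabilities = {0.1, 0.9};
        Number prediction = 1;
        // Step 8: response construction
        return new Response(prediction, probabilities);
    }
}
```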

Design Considerations

Stateless Request Processing

Each prediction request is processed independently. The model is read-only, so no state is modified between requests. This enables safe concurrent processing of multiple requests.

Probability Formatting

Posterior probabilities are formatted to 3 decimal places using a custom Jackson serializer (ProbabilitySerializer). This keeps response payloads compact while providing sufficient precision for most applications.
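The formatting itself amounts to rounding each value to three decimals during serialization. A plain-Java approximation of the output ProbabilitySerializer would produce, without the Jackson dependency:

```java
import java.util.Locale;
import java.util.StringJoiner;

// Sketch of 3-decimal probability formatting, approximating the payload
// a custom Jackson serializer would emit for the probabilities array.
public class ProbabilityFormat {

    public static String format(double[] probabilities) {
        StringJoiner out = new StringJoiner(", ", "[", "]");
        for (double p : probabilities) {
            // Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale
            out.add(String.format(Locale.ROOT, "%.3f", p));
        }
        return out.toString();
    }
}
```

Three decimals bound each probability's contribution to the payload at five characters while still resolving differences of 0.001, which is ample for confidence checks and thresholding.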

Knowledge Sources

Smile

Domains

MLOps, Model_Deployment, Machine_Learning

Related

Implementation:Haifengl_Smile_InferenceModel_Predict
