Workflow:Haifengl Smile Model Serving Pipeline

From Leeroopedia


Knowledge Sources
Domains MLOps, Model_Serving
Last Updated 2026-02-08 21:00 GMT

Overview

End-to-end process for serializing trained Smile machine learning models, deploying them via the Smile Serve inference server, and consuming predictions through REST API endpoints.

Description

This workflow covers the model serving lifecycle in Smile. It begins with serializing a trained model (classification or regression) to a .sml file using Java serialization. The Smile Serve module, built on Quarkus, automatically loads serialized models at startup and exposes them through OpenAI-compatible REST API endpoints. Clients can query available models, retrieve metadata, and send inference requests in JSON or CSV format. The server also supports stream processing for batch predictions.

Usage

Execute this workflow when you have trained a Smile classification or regression model and need to deploy it as a REST API for production inference. This is the standard path for operationalizing Smile models in web services, microservices, or batch prediction pipelines.

Execution Steps

Step 1: Train and Serialize the Model

Train a classification or regression model using the Smile API, then serialize it to a .sml file. The model object must implement the Model interface, which wraps the trained classifier or regressor along with its schema metadata and version tags.

Key considerations:

  • Models implement Java Serializable interface for persistence
  • The .sml file format is standard Java object serialization
  • Model tags include ID and VERSION for identification in the serving layer
  • The model schema (StructType) is preserved for input validation at inference time
  • Both ClassificationModel and RegressionModel are supported
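Since the .sml format is plain Java object serialization, the round trip can be sketched with only the standard library. The `ModelStub` record below is a hypothetical stand-in for a trained model; a real deployment would serialize an object implementing Smile's Model interface, which additionally carries the schema metadata and ID/VERSION tags described above.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class SerializeModel {
    // Hypothetical stand-in for a trained Smile model object.
    record ModelStub(String id, String version, double[] weights)
            implements Serializable {}

    public static void main(String[] args) throws Exception {
        ModelStub model = new ModelStub("iris", "1.0", new double[] {0.3, 0.7});

        // The .sml file is standard Java object serialization.
        Path path = Path.of("iris-1.0.sml");
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(model);
        }

        // Round-trip to verify the artifact loads back intact.
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(path))) {
            ModelStub loaded = (ModelStub) in.readObject();
            System.out.println(loaded.id() + "-" + loaded.version());
        }
    }
}
```

The `{name}-{version}` naming of the file matches the identifier format the serving layer derives from the model's ID and VERSION tags.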

Step 2: Configure and Start the Inference Server

Deploy the Smile Serve application, which is a Quarkus-based REST server. Configure the model directory path (defaults to ../model) and server port. On startup, InferenceService scans the model directory and loads all .sml files.

Key considerations:

  • Default model directory is ../model (configurable via smile.serve.model system property)
  • Default server port is 8080 (configurable via quarkus.http.port)
  • The server can load individual .sml files or all files in a directory
  • Models are loaded once at startup and cached in memory
  • The server requires JVM flags: --add-opens java.base/java.lang=ALL-UNNAMED
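A minimal configuration sketch, using the property names listed above (the directory value and jar name are illustrative):

```properties
# application.properties — Quarkus configuration for Smile Serve

# Directory scanned for .sml files at startup (default ../model)
smile.serve.model=/opt/models

# HTTP port (default 8080)
quarkus.http.port=9090
```

The required JVM flag is passed on the launch command line, e.g. `java --add-opens java.base/java.lang=ALL-UNNAMED -jar <smile-serve-runner>.jar`.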

Step 3: Discover Available Models

Query the /v1/models endpoint to list all loaded models and their identifiers. Each model is identified by a combination of its ID tag and version number. Retrieve detailed metadata for a specific model using /v1/models/{modelId}.

Key considerations:

  • GET /v1/models returns a list of model identifiers
  • GET /v1/models/{modelId} returns metadata including schema, type, and algorithm
  • Model IDs follow the format: {name}-{version}
  • The metadata includes the input schema for constructing valid requests
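The discovery calls can be sketched with the JDK's built-in HTTP client. The base URL assumes a local instance on the default port, and `iris-1.0` is a hypothetical `{name}-{version}` identifier:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DiscoverModels {
    public static void main(String[] args) {
        String base = "http://localhost:8080/v1/models";

        // GET /v1/models — list all loaded model identifiers
        HttpRequest list = HttpRequest.newBuilder(URI.create(base))
            .GET().build();

        // GET /v1/models/{modelId} — metadata (schema, type, algorithm)
        // for one model; "iris-1.0" is a hypothetical identifier
        HttpRequest meta = HttpRequest.newBuilder(URI.create(base + "/iris-1.0"))
            .GET().build();

        // With the server running, send the requests like this:
        // var client = java.net.http.HttpClient.newHttpClient();
        // String body = client.send(list,
        //     java.net.http.HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(list.uri());
        System.out.println(meta.uri());
    }
}
```

Fetching the metadata first is useful because its input schema tells you which field names a valid inference request must contain.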

Step 4: Send Inference Requests

Submit prediction requests to the model endpoint. Requests can be sent as JSON objects (with field names matching the model schema) or as CSV-formatted text. For classification models with soft predictions, the response includes class probabilities.

Key considerations:

  • POST /v1/models/{modelId} accepts JSON body with feature values
  • The JSON keys must match the model schema field names
  • ClassificationModel returns predicted class label and optional probabilities
  • RegressionModel returns the predicted numeric value
  • Input validation checks that all required fields are present
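A single JSON inference request can be sketched as follows. The field names are illustrative iris-style features; in practice they must match the schema returned by the model's metadata endpoint:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PredictRequest {
    public static void main(String[] args) {
        // JSON keys must match the model schema field names
        // (these iris-style names are hypothetical).
        String json = """
            {"sepalLength": 5.1, "sepalWidth": 3.5,
             "petalLength": 1.4, "petalWidth": 0.2}""";

        // POST /v1/models/{modelId} with a JSON body
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/v1/models/iris-1.0"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        // With the server running, HttpClient.send(...) would return the
        // predicted class label (plus probabilities for soft classifiers)
        // for a ClassificationModel, or a numeric value for a RegressionModel.
        System.out.println(request.method() + " " + request.uri());
    }
}
```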

Step 5: Process Batch Predictions via Streaming

For batch inference, use the stream endpoint to process multiple records efficiently. Send a stream of CSV lines or JSON objects, and receive predictions line by line using reactive streaming.

Key considerations:

  • POST /v1/models/{modelId}/stream accepts text/plain (CSV) or application/json content
  • Each line is processed independently and results are streamed back
  • Uses Mutiny reactive streams for non-blocking I/O
  • Suitable for processing large files or continuous data feeds
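A batch request to the stream endpoint can be sketched the same way, here sending CSV as text/plain (the model identifier and feature values are again hypothetical):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class BatchStream {
    public static void main(String[] args) {
        // One CSV record per line, columns in the model schema's order.
        List<String> rows = List.of(
            "5.1,3.5,1.4,0.2",
            "6.7,3.0,5.2,2.3");
        String csv = String.join("\n", rows);

        // POST /v1/models/{modelId}/stream — each line is processed
        // independently and predictions are streamed back line by line.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/v1/models/iris-1.0/stream"))
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(csv))
            .build();

        System.out.println(request.uri().getPath());
    }
}
```

For very large files, a streaming body publisher (e.g. `BodyPublishers.ofFile`) avoids holding the whole input in memory on the client side.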

Execution Diagram

GitHub URL

Workflow Repository