Workflow:Haifengl Smile Model Serving Pipeline

From Leeroopedia


Knowledge Sources
Domains MLOps, Model_Serving
Last Updated 2026-02-08 21:00 GMT

Overview

End-to-end process for serializing trained Smile machine learning models, deploying them via the Smile Serve inference server, and consuming predictions through REST API endpoints.

Description

This workflow covers the model serving lifecycle in Smile. It begins with serializing a trained model (classification or regression) to a .sml file using Java serialization. The Smile Serve module, built on Quarkus, automatically loads serialized models at startup and exposes them through OpenAI-compatible REST API endpoints. Clients can query available models, retrieve metadata, and send inference requests in JSON or CSV format. The server also supports stream processing for batch predictions.

Usage

Execute this workflow when you have trained a Smile classification or regression model and need to deploy it as a REST API for production inference. This is the standard path for operationalizing Smile models in web services, microservices, or batch prediction pipelines.

Execution Steps

Step 1: Train and Serialize the Model

Train a classification or regression model using the Smile API, then serialize it to a .sml file. The model object must implement the Model interface, which wraps the trained classifier or regressor along with its schema metadata and version tags.

Key considerations:

  • Models implement Java Serializable interface for persistence
  • The .sml file format is standard Java object serialization
  • Model tags include ID and VERSION for identification in the serving layer
  • The model schema (StructType) is preserved for input validation at inference time
  • Both ClassificationModel and RegressionModel are supported
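Since the .sml format is plain Java object serialization, the round trip can be sketched with only the standard library. The `ModelStub` record below is a hypothetical stand-in for a trained model; a real deployment would serialize an object implementing Smile's Model interface, which additionally carries the schema metadata and ID/VERSION tags described above.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class SerializeModel {
    // Hypothetical stand-in for a trained Smile model object.
    record ModelStub(String id, String version, double[] weights)
            implements Serializable {}

    public static void main(String[] args) throws Exception {
        ModelStub model = new ModelStub("iris", "1.0", new double[] {0.3, 0.7});

        // The .sml file is standard Java object serialization.
        Path path = Path.of("iris-1.0.sml");
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(model);
        }

        // Round-trip to verify the artifact loads back intact.
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(path))) {
            ModelStub loaded = (ModelStub) in.readObject();
            System.out.println(loaded.id() + "-" + loaded.version());
        }
    }
}
```

The `{name}-{version}` naming of the file matches the identifier format the serving layer derives from the model's ID and VERSION tags.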

Step 2: Configure and Start the Inference Server

Deploy the Smile Serve application, which is a Quarkus-based REST server. Configure the model directory path (defaults to ../model) and server port. On startup, InferenceService scans the model directory and loads all .sml files.

Key considerations:

  • Default model directory is ../model (configurable via smile.serve.model system property)
  • Default server port is 8080 (configurable via quarkus.http.port)
  • The server can load individual .sml files or all files in a directory
  • Models are loaded once at startup and cached in memory
  • The server requires JVM flags: --add-opens java.base/java.lang=ALL-UNNAMED
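A minimal configuration sketch, using the property names listed above (the directory value and jar name are illustrative):

```properties
# application.properties — Quarkus configuration for Smile Serve

# Directory scanned for .sml files at startup (default ../model)
smile.serve.model=/opt/models

# HTTP port (default 8080)
quarkus.http.port=9090
```

The required JVM flag is passed on the launch command line, e.g. `java --add-opens java.base/java.lang=ALL-UNNAMED -jar <smile-serve-runner>.jar`.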

Step 3: Discover Available Models

Query the /v1/models endpoint to list all loaded models and their identifiers. Each model is identified by a combination of its ID tag and version number. Retrieve detailed metadata for a specific model using /v1/models/{modelId}.

Key considerations:

  • GET /v1/models returns a list of model identifiers
  • GET /v1/models/{modelId} returns metadata including schema, type, and algorithm
  • Model IDs follow the format: {name}-{version}
  • The metadata includes the input schema for constructing valid requests
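The discovery calls can be sketched with the JDK's built-in HTTP client. The base URL assumes a local instance on the default port, and `iris-1.0` is a hypothetical `{name}-{version}` identifier:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DiscoverModels {
    public static void main(String[] args) {
        String base = "http://localhost:8080/v1/models";

        // GET /v1/models — list all loaded model identifiers
        HttpRequest list = HttpRequest.newBuilder(URI.create(base))
            .GET().build();

        // GET /v1/models/{modelId} — metadata (schema, type, algorithm)
        // for one model; "iris-1.0" is a hypothetical identifier
        HttpRequest meta = HttpRequest.newBuilder(URI.create(base + "/iris-1.0"))
            .GET().build();

        // With the server running, send the requests like this:
        // var client = java.net.http.HttpClient.newHttpClient();
        // String body = client.send(list,
        //     java.net.http.HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(list.uri());
        System.out.println(meta.uri());
    }
}
```

Fetching the metadata first is useful because its input schema tells you which field names a valid inference request must contain.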

Step 4: Send Inference Requests

Submit prediction requests to the model endpoint. Requests can be sent as JSON objects (with field names matching the model schema) or as CSV-formatted text. For classification models with soft predictions, the response includes class probabilities.

Key considerations:

  • POST /v1/models/{modelId} accepts JSON body with feature values
  • The JSON keys must match the model schema field names
  • ClassificationModel returns predicted class label and optional probabilities
  • RegressionModel returns the predicted numeric value
  • Input validation checks that all required fields are present
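A single JSON inference request can be sketched as follows. The field names are illustrative iris-style features; in practice they must match the schema returned by the model's metadata endpoint:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PredictRequest {
    public static void main(String[] args) {
        // JSON keys must match the model schema field names
        // (these iris-style names are hypothetical).
        String json = """
            {"sepalLength": 5.1, "sepalWidth": 3.5,
             "petalLength": 1.4, "petalWidth": 0.2}""";

        // POST /v1/models/{modelId} with a JSON body
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/v1/models/iris-1.0"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        // With the server running, HttpClient.send(...) would return the
        // predicted class label (plus probabilities for soft classifiers)
        // for a ClassificationModel, or a numeric value for a RegressionModel.
        System.out.println(request.method() + " " + request.uri());
    }
}
```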

Step 5: Process Batch Predictions via Streaming

For batch inference, use the stream endpoint to process multiple records efficiently. Send a stream of CSV lines or JSON objects, and receive predictions line by line using reactive streaming.

Key considerations:

  • POST /v1/models/{modelId}/stream accepts text/plain (CSV) or application/json content
  • Each line is processed independently and results are streamed back
  • Uses Mutiny reactive streams for non-blocking I/O
  • Suitable for processing large files or continuous data feeds
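A batch request to the stream endpoint can be sketched the same way, here sending CSV as text/plain (the model identifier and feature values are again hypothetical):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class BatchStream {
    public static void main(String[] args) {
        // One CSV record per line, columns in the model schema's order.
        List<String> rows = List.of(
            "5.1,3.5,1.4,0.2",
            "6.7,3.0,5.2,2.3");
        String csv = String.join("\n", rows);

        // POST /v1/models/{modelId}/stream — each line is processed
        // independently and predictions are streamed back line by line.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/v1/models/iris-1.0/stream"))
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(csv))
            .build();

        System.out.println(request.uri().getPath());
    }
}
```

For very large files, a streaming body publisher (e.g. `BodyPublishers.ofFile`) avoids holding the whole input in memory on the client side.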

Execution Diagram

GitHub URL

Workflow Repository