Principle:Haifengl Smile Model Discovery
Overview
Model Discovery is the principle of enabling clients to programmatically enumerate deployed models and inspect their metadata (algorithm type, input schema, version tags) through well-defined API endpoints. Before sending prediction requests, a client can query the server to determine which models are available, what input features each model expects, and what data types those features require.
Theoretical Basis
Service Discovery in Microservices
Service discovery is a foundational pattern in microservice architectures. It allows consumers to locate and interact with services without hardcoding addresses or interfaces. In the context of ML model serving, model discovery extends this pattern to the model level:
- Service discovery answers: "Where is the inference server?"
- Model discovery answers: "What models does this server host, and how do I call them?"
This two-level discovery enables dynamic client configurations. A frontend application, an orchestration engine, or a monitoring dashboard can introspect the server's capabilities at runtime rather than relying on out-of-band documentation.
Model Metadata
Each deployed model exposes metadata that describes its contract with clients:
| Metadata Field | Purpose | Example |
|---|---|---|
| ID | Unique identifier for routing requests | "iris-classifier-2" |
| Algorithm | The learning algorithm used | "random-forest" |
| Schema | Input feature names, types, and nullability | {"sepal_length": {"type": "double", "nullable": false}} |
| Tags | Arbitrary key-value metadata (version, author, etc.) | {"version": "2", "author": "haifeng"} |
This metadata serves multiple purposes:
- Client validation -- clients can validate their request payloads against the schema before sending.
- Documentation -- the metadata endpoint makes the API self-describing, so each model's input contract does not have to be maintained in separate documentation.
- Monitoring -- operations teams can query which models and versions are currently deployed.
- Automation -- CI/CD pipelines can verify that a newly deployed model is correctly registered.
HATEOAS and RESTful Resource Listing
The model discovery API follows REST conventions:
- Collection endpoint (GET /v1/models) -- returns a list of all available model identifiers. This is analogous to a "list resources" operation in any RESTful API.
- Instance endpoint (GET /v1/models/{id}) -- returns detailed metadata for a specific model. This follows the REST pattern of addressing individual resources by ID within a collection.
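These endpoints support navigation in the spirit of HATEOAS (Hypermedia as the Engine of Application State): a client can start with the collection endpoint, discover available model IDs, and then navigate to individual model metadata endpoints to learn the input contract before sending prediction requests. Strictly, HATEOAS expects responses to carry the links themselves; because the collection endpoint returns identifiers, the client builds each instance URL from the /v1/models/{id} pattern.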
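The two-step walk described above (collection first, then each instance) can be sketched as a small client. This is an illustration under stated assumptions, not Smile's own client code: the host and port, the JSON key names ("id", "algorithm", "schema", "tags"), and the assumption that the collection endpoint returns a plain JSON array of ID strings are all hypothetical. The fetch function is injectable so the sketch runs against canned responses without a live server.

```python
import json
from typing import Any, Callable, Dict
from urllib.parse import quote
from urllib.request import urlopen


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def discover_models(base_url: str,
                    get_json: Callable[[str], Any] = http_get_json) -> Dict[str, Any]:
    """Query the collection endpoint, then each instance endpoint.

    Returns a mapping from model ID to that model's metadata.
    """
    base = base_url.rstrip("/")
    # Assumption: the collection endpoint returns a JSON array of ID strings.
    model_ids = get_json(f"{base}/v1/models")
    return {
        model_id: get_json(f"{base}/v1/models/{quote(model_id, safe='')}")
        for model_id in model_ids
    }


# Canned responses standing in for a live server (hypothetical host and payloads).
canned = {
    "http://localhost:8080/v1/models": ["iris-classifier-2"],
    "http://localhost:8080/v1/models/iris-classifier-2": {
        "id": "iris-classifier-2",
        "algorithm": "random-forest",
        "schema": {"sepal_length": {"type": "double", "nullable": False}},
        "tags": {"version": "2", "author": "haifeng"},
    },
}

catalog = discover_models("http://localhost:8080/", canned.__getitem__)
for model_id, metadata in catalog.items():
    print(model_id, metadata["algorithm"], sorted(metadata["schema"]))
```

Against a real deployment, calling discover_models with only the base URL would use the default HTTP fetcher instead of the canned dictionary.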
Schema Introspection
The schema portion of model metadata is particularly important for ML serving. Unlike traditional web services where the API contract is defined at development time, an ML model's input schema is determined at training time based on the training data's feature columns. The schema encodes:
- Feature names -- the exact field names the model expects (e.g., "sepal_length", "petal_width").
- Data types -- whether each feature is double, int, String, etc.
- Nullability -- whether the model can handle missing values for a given feature.
By exposing this schema via the discovery API, the server communicates the training-time contract to inference-time clients, bridging the gap between the data scientist who trained the model and the application developer who consumes it.
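As a rough illustration of how a client might use the schema before sending a request, the sketch below checks a payload for missing non-nullable features, type mismatches, and unknown fields. It assumes the schema has the shape shown in the metadata example ({"feature": {"type": ..., "nullable": ...}}); the function name and the mapping from schema type names to Python types are hypothetical and cover only the three types mentioned above.

```python
# Assumed mapping from schema type names to acceptable Python types (not exhaustive).
PYTHON_TYPES = {
    "double": (int, float),
    "int": (int,),
    "String": (str,),
}


def validate_payload(payload: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the payload fits."""
    errors = []
    for name, spec in schema.items():
        value = payload.get(name)
        if value is None:
            # Absent or null is acceptable only for nullable features.
            if not spec.get("nullable", False):
                errors.append(f"{name}: missing value for non-nullable feature")
            continue
        expected = PYTHON_TYPES.get(spec["type"])
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if expected is not None and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    # Fields the model was never trained on are reported as well.
    for name in sorted(payload.keys() - schema.keys()):
        errors.append(f"{name}: unknown feature")
    return errors


schema = {
    "sepal_length": {"type": "double", "nullable": False},
    "petal_width": {"type": "double", "nullable": True},
}
print(validate_payload({"sepal_length": 5.1}, schema))
print(validate_payload({"sepal_length": "5.1", "color": "red"}, schema))
```

Whether unknown fields should be an error or silently ignored is a design choice; the sketch reports them so that misspelled feature names surface early.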
Design Considerations
Static vs. Dynamic Model Registration
Smile's serve module uses static registration -- models are loaded at startup from a configured path and the registry does not change at runtime. This is simpler and avoids consistency issues, but requires a server restart to deploy new models. An alternative approach would be dynamic registration (hot-loading models), which trades simplicity for operational flexibility.
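The idea of static registration can be illustrated with a read-only mapping that is built once at startup. This is a conceptual sketch in Python, not Smile's implementation; build_registry and the placeholder model values are hypothetical.

```python
from types import MappingProxyType
from typing import Any, Mapping


def build_registry(loaded_models: Mapping[str, Any]) -> Mapping[str, Any]:
    """Copy the models loaded at startup into a read-only mapping."""
    return MappingProxyType(dict(loaded_models))


# Hypothetical startup step: in a real server the values would be model objects
# deserialized from the configured path.
registry = build_registry({"iris-classifier-2": "<loaded model>"})

print(sorted(registry))  # IDs available for the lifetime of the process

try:
    registry["iris-classifier-3"] = "<new model>"
except TypeError:
    print("registry is read-only; restart the server to deploy a new model")
```

Because the mapping never changes after startup, request handlers can read it without locking, which is the consistency benefit static registration buys.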
Model ID Format
In Smile, the model ID is composed of the model's id tag and version tag joined by a hyphen (e.g., "iris-classifier-2"). This format embeds version information directly in the ID, enabling version-aware routing without a separate versioning mechanism.
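The ID format can be sketched in a few lines. The function names are hypothetical, and the tag keys "id" and "version" follow the description above. Splitting an ID back into its parts assumes the version tag itself contains no hyphen, since the split happens at the last one.

```python
def compose_model_id(tags: dict) -> str:
    """Join the id tag and version tag with a hyphen."""
    return f"{tags['id']}-{tags['version']}"


def split_model_id(model_id: str) -> tuple:
    """Recover (name, version) by splitting at the last hyphen.

    Assumes the version tag contains no hyphen.
    """
    name, _, version = model_id.rpartition("-")
    return name, version


model_id = compose_model_id({"id": "iris-classifier", "version": "2"})
print(model_id)
print(split_model_id(model_id))
```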
Knowledge Sources
Domains
MLOps, Model_Deployment, Microservices