Principle:SeldonIO Seldon core V2 Inference Protocol
| Property | Value |
|---|---|
| Principle Name | V2_Inference_Protocol |
| Overview | Standardized request-response protocol for ML model inference based on the Open Inference Protocol (V2). |
| Workflow | Model_Deployment |
| Domains | MLOps, Inference |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Model_Infer |
| Last Updated | 2026-02-13 00:00 GMT |
Description
The V2 Inference Protocol (also called the Open Inference Protocol) provides a vendor-neutral REST and gRPC API for running inference against deployed ML models. Requests contain typed tensors with name, shape, datatype, and data fields. Responses contain model outputs in the same tensor format. This standardization means that clients can interact with any V2-compliant inference server without modification, regardless of whether the backend is MLServer, Triton Inference Server, TorchServe, or another compatible runtime.
The protocol defines several key endpoints:
- Model Inference: `POST /v2/models/{model_name}/infer` for running predictions
- Model Metadata: `GET /v2/models/{model_name}` for retrieving model information
- Model Ready: `GET /v2/models/{model_name}/ready` for checking model readiness
- Server Ready: `GET /v2/health/ready` for checking server readiness
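The endpoint layout above can be sketched as a small helper. This is an illustrative snippet, not part of any Seldon client library; the base URL and model name are assumptions for the example.

```python
# Sketch: build the core V2 protocol endpoint URLs for one model.
# v2_endpoints is a hypothetical helper for illustration only.

def v2_endpoints(base_url: str, model_name: str) -> dict:
    """Return the core V2 endpoints for a model served at base_url."""
    model = f"{base_url}/v2/models/{model_name}"
    return {
        "infer": f"{model}/infer",                      # POST: run a prediction
        "metadata": model,                              # GET: model metadata
        "model_ready": f"{model}/ready",                # GET: model readiness
        "server_ready": f"{base_url}/v2/health/ready",  # GET: server readiness
    }

endpoints = v2_endpoints("http://localhost:9000", "iris")
print(endpoints["infer"])  # http://localhost:9000/v2/models/iris/infer
```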
The inference payload structure is the core of the protocol:
```json
{
  "inputs": [
    {
      "name": "predict",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [[5.1, 3.5, 1.4, 0.2]]
    }
  ]
}
```
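A payload like the one above can be assembled programmatically. The sketch below, using only the standard library, wraps a batch of rows as a single V2 input tensor; `build_infer_request` is an illustrative helper, not a Seldon API, and the tensor name and iris feature values are taken from the example.

```python
import json

def build_infer_request(name, data, datatype):
    """Wrap a 2-D batch of rows as a single V2 input tensor."""
    shape = [len(data), len(data[0])]  # [batch_size, num_features]
    return {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }

payload = build_infer_request("predict", [[5.1, 3.5, 1.4, 0.2]], "FP32")
print(json.dumps(payload))
```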
Theoretical Basis
The V2 protocol standardizes ML serving interfaces by defining a common payload format: inputs as named tensors with explicit shapes and datatypes. The supported datatypes include:
- Numeric types: `FP32`, `FP64`, `INT8`, `INT16`, `INT32`, `INT64`, `UINT8`, `UINT16`, `UINT32`, `UINT64`, `FP16`, `BF16`
- Boolean: `BOOL`
- String/Binary: `BYTES`
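Inference servers typically translate these declared datatypes into framework-native dtypes when decoding a request. The mapping below is one plausible translation to NumPy dtype names, shown as an assumption for illustration (`BF16` has no standard NumPy dtype and `BYTES` needs special string/binary handling, so both are omitted):

```python
# Sketch: a plausible V2-datatype-to-NumPy-dtype translation table,
# as an inference server might apply when decoding request tensors.
V2_TO_NUMPY = {
    "BOOL": "bool",
    "INT8": "int8", "INT16": "int16", "INT32": "int32", "INT64": "int64",
    "UINT8": "uint8", "UINT16": "uint16",
    "UINT32": "uint32", "UINT64": "uint64",
    "FP16": "float16", "FP32": "float32", "FP64": "float64",
}

print(V2_TO_NUMPY["FP32"])  # float32
```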
This enables interoperability between inference servers (MLServer, Triton, TorchServe) because:
- Payload format is framework-agnostic: The same JSON/protobuf structure works for sklearn, TensorFlow, PyTorch, and custom models
- Type safety is explicit: Datatypes are declared rather than inferred, preventing silent type coercion errors
- Shape validation is built-in: The server can validate that input shapes match model expectations before running inference
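The shape-validation point above can be made concrete: because the shape is declared alongside the data, a server can check that the two agree before touching the model. The snippet below is a minimal sketch of that check; `flatten` and `validate_tensor` are illustrative names, not Seldon APIs.

```python
from math import prod

def flatten(data):
    """Flatten arbitrarily nested lists into a single flat list."""
    if isinstance(data, list):
        return [x for item in data for x in flatten(item)]
    return [data]

def validate_tensor(tensor):
    """Check that the declared shape matches the number of data elements."""
    expected = prod(tensor["shape"])
    actual = len(flatten(tensor["data"]))
    if expected != actual:
        raise ValueError(
            f"shape {tensor['shape']} implies {expected} elements, got {actual}"
        )

# Matches the payload example: shape [1, 4] covers 4 elements.
validate_tensor({"name": "predict", "shape": [1, 4], "datatype": "FP32",
                 "data": [[5.1, 3.5, 1.4, 0.2]]})
```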
The gRPC variant of the protocol defines an `inference.GRPCInferenceService` service in protobuf, offering lower latency and more efficient serialization for high-throughput workloads.
In Seldon Core 2, the V2 protocol is the primary interface for all model interactions. The Seldon envoy proxy routes requests to the appropriate model based on the model name, and the inference server translates the V2 payload into framework-specific tensor formats internally.
Usage
This principle applies when sending prediction requests to any Seldon Core 2 model or pipeline. The protocol is used for:
- Single model inference: Direct prediction requests to individual models
- Pipeline inference: Requests routed through multi-step inference pipelines
- A/B testing: Identical payloads sent to different model versions for comparison
REST Example
```shell
curl -X POST http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}]}'
```
gRPC Example
```shell
# -plaintext is needed when the endpoint does not terminate TLS
grpcurl -plaintext \
  -d '{"model_name": "iris", "inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "contents": {"fp32_contents": [5.1, 3.5, 1.4, 0.2]}}]}' \
  localhost:9000 inference.GRPCInferenceService/ModelInfer
```
Response Format
```json
{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "outputs": [
    {
      "name": "predict",
      "shape": [1],
      "datatype": "INT64",
      "data": [0]
    }
  ]
}
```
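Because responses use the same tensor structure as requests, extracting a result is a matter of locating the output tensor by name. The sketch below mirrors the response example; `extract_output` is an illustrative helper, not part of any Seldon client library.

```python
# Example response, matching the format shown above.
response = {
    "model_name": "iris",
    "model_version": "v0.1.0",
    "outputs": [
        {"name": "predict", "shape": [1], "datatype": "INT64", "data": [0]}
    ],
}

def extract_output(response, name):
    """Return the data field of the output tensor with the given name."""
    for tensor in response["outputs"]:
        if tensor["name"] == name:
            return tensor["data"]
    raise KeyError(f"no output named {name!r}")

print(extract_output(response, "predict"))  # [0]
```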
Related Pages
- SeldonIO_Seldon_core_Seldon_Model_Infer implements SeldonIO_Seldon_core_V2_Inference_Protocol
- SeldonIO_Seldon_core_Model_Readiness_Verification precedes SeldonIO_Seldon_core_V2_Inference_Protocol
- SeldonIO_Seldon_core_Model_Deployment_Execution enables SeldonIO_Seldon_core_V2_Inference_Protocol
- Heuristic:SeldonIO_Seldon_core_Tracing_Latency_Tip
- Implementation:SeldonIO_Seldon_core_Seldon_Model_Infer
- Implementation:SeldonIO_Seldon_core_Open_Inference_Protocol_V2_OpenAPI