Principle:SeldonIO Seldon core V2 Inference Protocol
| Property | Value |
|---|---|
| Principle Name | V2_Inference_Protocol |
| Overview | Standardized request-response protocol for ML model inference based on the Open Inference Protocol (V2). |
| Workflow | Model_Deployment |
| Domains | MLOps, Inference |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Model_Infer |
| Last Updated | 2026-02-13 00:00 GMT |
Description
The V2 Inference Protocol (also called the Open Inference Protocol) provides a vendor-neutral REST and gRPC API for running inference against deployed ML models. Requests contain typed tensors with name, shape, datatype, and data fields. Responses contain model outputs in the same tensor format. This standardization means that clients can interact with any V2-compliant inference server without modification, regardless of whether the backend is MLServer, Triton Inference Server, TorchServe, or another compatible runtime.
The protocol defines several key endpoints:
- Model Inference: `POST /v2/models/{model_name}/infer` for running predictions
- Model Metadata: `GET /v2/models/{model_name}` for retrieving model information
- Model Ready: `GET /v2/models/{model_name}/ready` for checking model readiness
- Server Ready: `GET /v2/health/ready` for checking server readiness
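The endpoint layout above can be sketched as a small helper. This is an illustrative snippet, not part of any Seldon client library; the base URL and model name are assumptions for the example.

```python
# Sketch: build the core V2 protocol endpoint URLs for one model.
# v2_endpoints is a hypothetical helper for illustration only.

def v2_endpoints(base_url: str, model_name: str) -> dict:
    """Return the core V2 endpoints for a model served at base_url."""
    model = f"{base_url}/v2/models/{model_name}"
    return {
        "infer": f"{model}/infer",                      # POST: run a prediction
        "metadata": model,                              # GET: model metadata
        "model_ready": f"{model}/ready",                # GET: model readiness
        "server_ready": f"{base_url}/v2/health/ready",  # GET: server readiness
    }

endpoints = v2_endpoints("http://localhost:9000", "iris")
print(endpoints["infer"])  # http://localhost:9000/v2/models/iris/infer
```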
The inference payload structure is the core of the protocol:
```json
{
  "inputs": [
    {
      "name": "predict",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [[5.1, 3.5, 1.4, 0.2]]
    }
  ]
}
```
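A payload like the one above can be assembled programmatically. The sketch below, using only the standard library, wraps a batch of rows as a single V2 input tensor; `build_infer_request` is an illustrative helper, not a Seldon API, and the tensor name and iris feature values are taken from the example.

```python
import json

def build_infer_request(name, data, datatype):
    """Wrap a 2-D batch of rows as a single V2 input tensor."""
    shape = [len(data), len(data[0])]  # [batch_size, num_features]
    return {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }

payload = build_infer_request("predict", [[5.1, 3.5, 1.4, 0.2]], "FP32")
print(json.dumps(payload))
```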
Theoretical Basis
The V2 protocol standardizes ML serving interfaces by defining a common payload format: inputs as named tensors with explicit shapes and datatypes. The supported datatypes include:
- Numeric types: `FP32`, `FP64`, `INT8`, `INT16`, `INT32`, `INT64`, `UINT8`, `UINT16`, `UINT32`, `UINT64`, `FP16`, `BF16`
- Boolean: `BOOL`
- String/Binary: `BYTES`
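Inference servers typically translate these declared datatypes into framework-native dtypes when decoding a request. The mapping below is one plausible translation to NumPy dtype names, shown as an assumption for illustration (`BF16` has no standard NumPy dtype and `BYTES` needs special string/binary handling, so both are omitted):

```python
# Sketch: a plausible V2-datatype-to-NumPy-dtype translation table,
# as an inference server might apply when decoding request tensors.
V2_TO_NUMPY = {
    "BOOL": "bool",
    "INT8": "int8", "INT16": "int16", "INT32": "int32", "INT64": "int64",
    "UINT8": "uint8", "UINT16": "uint16",
    "UINT32": "uint32", "UINT64": "uint64",
    "FP16": "float16", "FP32": "float32", "FP64": "float64",
}

print(V2_TO_NUMPY["FP32"])  # float32
```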
This enables interoperability between inference servers (MLServer, Triton, TorchServe) because:
- Payload format is framework-agnostic: The same JSON/protobuf structure works for sklearn, TensorFlow, PyTorch, and custom models
- Type safety is explicit: Datatypes are declared rather than inferred, preventing silent type coercion errors
- Shape validation is built-in: The server can validate that input shapes match model expectations before running inference
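The shape-validation point above can be made concrete: because the shape is declared alongside the data, a server can check that the two agree before touching the model. The snippet below is a minimal sketch of that check; `flatten` and `validate_tensor` are illustrative names, not Seldon APIs.

```python
from math import prod

def flatten(data):
    """Flatten arbitrarily nested lists into a single flat list."""
    if isinstance(data, list):
        return [x for item in data for x in flatten(item)]
    return [data]

def validate_tensor(tensor):
    """Check that the declared shape matches the number of data elements."""
    expected = prod(tensor["shape"])
    actual = len(flatten(tensor["data"]))
    if expected != actual:
        raise ValueError(
            f"shape {tensor['shape']} implies {expected} elements, got {actual}"
        )

# Matches the payload example: shape [1, 4] covers 4 elements.
validate_tensor({"name": "predict", "shape": [1, 4], "datatype": "FP32",
                 "data": [[5.1, 3.5, 1.4, 0.2]]})
```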
The gRPC variant of the protocol defines an `inference.GRPCInferenceService` service in protobuf, offering lower latency and more efficient serialization for high-throughput workloads.
In Seldon Core 2, the V2 protocol is the primary interface for all model interactions. The Seldon envoy proxy routes requests to the appropriate model based on the model name, and the inference server translates the V2 payload into framework-specific tensor formats internally.
Usage
This principle applies when sending prediction requests to any Seldon Core 2 model or pipeline. The protocol is used for:
- Single model inference: Direct prediction requests to individual models
- Pipeline inference: Requests routed through multi-step inference pipelines
- A/B testing: Identical payloads sent to different model versions for comparison
REST Example
```shell
curl -X POST http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}]}'
```
gRPC Example
```shell
# -plaintext is needed when the endpoint does not terminate TLS
grpcurl -plaintext \
  -d '{"model_name": "iris", "inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "contents": {"fp32_contents": [5.1, 3.5, 1.4, 0.2]}}]}' \
  localhost:9000 inference.GRPCInferenceService/ModelInfer
```
Response Format
```json
{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "outputs": [
    {
      "name": "predict",
      "shape": [1],
      "datatype": "INT64",
      "data": [0]
    }
  ]
}
```
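Because responses use the same tensor structure as requests, extracting a result is a matter of locating the output tensor by name. The sketch below mirrors the response example; `extract_output` is an illustrative helper, not part of any Seldon client library.

```python
# Example response, matching the format shown above.
response = {
    "model_name": "iris",
    "model_version": "v0.1.0",
    "outputs": [
        {"name": "predict", "shape": [1], "datatype": "INT64", "data": [0]}
    ],
}

def extract_output(response, name):
    """Return the data field of the output tensor with the given name."""
    for tensor in response["outputs"]:
        if tensor["name"] == name:
            return tensor["data"]
    raise KeyError(f"no output named {name!r}")

print(extract_output(response, "predict"))  # [0]
```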
Related Pages
- SeldonIO_Seldon_core_Seldon_Model_Infer implements SeldonIO_Seldon_core_V2_Inference_Protocol
- SeldonIO_Seldon_core_Model_Readiness_Verification precedes SeldonIO_Seldon_core_V2_Inference_Protocol
- SeldonIO_Seldon_core_Model_Deployment_Execution enables SeldonIO_Seldon_core_V2_Inference_Protocol
- Heuristic:SeldonIO_Seldon_core_Tracing_Latency_Tip
- Implementation:SeldonIO_Seldon_core_Seldon_Model_Infer
- Implementation:SeldonIO_Seldon_core_Open_Inference_Protocol_V2_OpenAPI