
Principle:SeldonIO Seldon core V2 Inference Protocol

From Leeroopedia
Property Value
Principle Name V2_Inference_Protocol
Overview Standardized request-response protocol for ML model inference based on the Open Inference Protocol (V2).
Workflow Model_Deployment
Domains MLOps, Inference
Related Implementation SeldonIO_Seldon_core_Seldon_Model_Infer
Last Updated 2026-02-13 00:00 GMT

Description

The V2 Inference Protocol (also called the Open Inference Protocol) provides a vendor-neutral REST and gRPC API for running inference against deployed ML models. Requests contain typed tensors with name, shape, datatype, and data fields. Responses contain model outputs in the same tensor format. This standardization means that clients can interact with any V2-compliant inference server without modification, regardless of whether the backend is MLServer, Triton Inference Server, TorchServe, or another compatible runtime.

The protocol defines several key endpoints:

  • Model Inference: POST /v2/models/{model_name}/infer for running predictions
  • Model Metadata: GET /v2/models/{model_name} for retrieving model information
  • Model Ready: GET /v2/models/{model_name}/ready for checking model readiness
  • Server Ready: GET /v2/health/ready for checking server readiness

The inference payload structure is the core of the protocol:

{
  "inputs": [
    {
      "name": "predict",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [[5.1, 3.5, 1.4, 0.2]]
    }
  ]
}
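A payload like the one above can be assembled programmatically. This is a minimal sketch, not a Seldon client API: `build_v2_request` is a hypothetical helper that takes a 2-D nested list, derives the shape, and attaches the caller-declared datatype (V2 requires the datatype to be stated explicitly rather than inferred).

```python
import json

# Hypothetical helper: wrap a 2-D nested list in a V2 inference payload.
def build_v2_request(name: str, data: list, datatype: str) -> dict:
    rows, cols = len(data), len(data[0])  # shape derived from the nesting
    return {
        "inputs": [
            {"name": name, "shape": [rows, cols],
             "datatype": datatype, "data": data}
        ]
    }

payload = build_v2_request("predict", [[5.1, 3.5, 1.4, 0.2]], "FP32")
print(json.dumps(payload))
```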

Theoretical Basis

The V2 protocol standardizes ML serving interfaces by defining a common payload format: inputs as named tensors with explicit shapes and datatypes. The supported datatypes include:

  • Numeric types: FP32, FP64, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64, FP16, BF16
  • Boolean: BOOL
  • String/Binary: BYTES

This enables interoperability between inference servers (MLServer, Triton, TorchServe) because:

  1. Payload format is framework-agnostic: The same JSON/protobuf structure works for sklearn, TensorFlow, PyTorch, and custom models
  2. Type safety is explicit: Datatypes are declared rather than inferred, preventing silent type coercion errors
  3. Shape validation is built-in: The server can validate that input shapes match model expectations before running inference
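The shape check in point 3 amounts to a simple invariant: the number of elements in `data` must equal the product of the declared `shape`. A sketch of that check (not the actual server implementation):

```python
from math import prod

def validate_input(tensor: dict) -> None:
    """Raise ValueError if the element count does not match the shape."""
    flat = []
    stack = list(tensor["data"])
    while stack:                      # flatten arbitrarily nested lists
        item = stack.pop()
        if isinstance(item, list):
            stack.extend(item)
        else:
            flat.append(item)
    expected = prod(tensor["shape"])
    if len(flat) != expected:
        raise ValueError(
            f"shape {tensor['shape']} implies {expected} elements, "
            f"got {len(flat)}"
        )

# Passes silently: 1 x 4 shape, 4 elements.
validate_input({"name": "predict", "shape": [1, 4],
                "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]})
```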

The gRPC variant of the protocol is defined in protobuf as the inference.GRPCInferenceService service, offering lower latency and more compact serialization for high-throughput workloads.

In Seldon Core 2, the V2 protocol is the primary interface for all model interactions. The Seldon envoy proxy routes requests to the appropriate model based on the model name, and the inference server translates the V2 payload into framework-specific tensor formats internally.

Usage

This principle applies when sending prediction requests to any Seldon Core 2 model or pipeline. The protocol is used for:

  • Single model inference: Direct prediction requests to individual models
  • Pipeline inference: Requests routed through multi-step inference pipelines
  • A/B testing: Identical payloads sent to different model versions for comparison

REST Example

curl -X POST http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}]}'
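The same call can be made from Python. The sketch below only prepares the URL, headers, and JSON body without sending them, since it assumes (like the curl example) a live V2 server at localhost:9000; the final line shows how the prepared pieces would be posted with the `requests` library.

```python
import json

# Prepare the pieces of the curl call above without sending anything.
def prepare_infer_request(base_url: str, model_name: str, inputs: list):
    url = f"{base_url}/v2/models/{model_name}/infer"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"inputs": inputs})
    return url, headers, body

url, headers, body = prepare_infer_request(
    "http://localhost:9000", "iris",
    [{"name": "predict", "shape": [1, 4], "datatype": "FP32",
      "data": [[5.1, 3.5, 1.4, 0.2]]}],
)
# To actually send: requests.post(url, headers=headers, data=body)
```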

gRPC Example

grpcurl -plaintext \
  -d '{"model_name": "iris", "inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "contents": {"fp32_contents": [5.1, 3.5, 1.4, 0.2]}}]}' \
  localhost:9000 inference.GRPCInferenceService/ModelInfer

Response Format

{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "outputs": [
    {
      "name": "predict",
      "shape": [1],
      "datatype": "INT64",
      "data": [0]
    }
  ]
}
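Because outputs use the same tensor format as inputs, unpacking a response is symmetric with building a request. A minimal sketch, using the response shown above (the helper name `first_output` is illustrative):

```python
# Unpack the first output tensor of a V2 response.
def first_output(response: dict):
    out = response["outputs"][0]
    return out["name"], out["shape"], out["data"]

resp = {
    "model_name": "iris",
    "model_version": "v0.1.0",
    "outputs": [{"name": "predict", "shape": [1],
                 "datatype": "INT64", "data": [0]}],
}
name, shape, data = first_output(resp)
print(name, shape, data)  # predict [1] [0]
```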

Related Pages

Implementation:SeldonIO_Seldon_core_Seldon_Model_Infer
Implementation:SeldonIO_Seldon_core_Open_Inference_Protocol_V2_OpenAPI
