Implementation:SeldonIO Seldon core Seldon Model Infer With Headers

From Leeroopedia
Field | Value
Type | External Tool Doc
Overview | Concrete CLI tool for monitoring experiment traffic distribution using inference with response headers in Seldon Core 2.
Domains | MLOps, Experimentation
Related Principle | SeldonIO_Seldon_core_Experiment_Traffic_Analysis
Source | docs-gb/cli/seldon_model_infer.md:L1-35, samples/local-experiments.md:L130-230
Knowledge Sources | Repo, Doc
Last Updated | 2026-02-13 00:00 GMT

Description

This implementation provides the concrete CLI commands for monitoring experiment traffic distribution in Seldon Core 2. The seldon model infer command sends V2 inference requests to a model endpoint. It can display response headers (including x-seldon-route), repeat the request over multiple iterations to measure how traffic is distributed across candidates, and use sticky sessions to keep requests on the same route.

Code Reference

CLI Signature

seldon model infer <modelName> '<data>' [--show-headers] [-i iterations] [-t seconds] [-s sticky-session] [--header key=value]

Single Inference with Headers

seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --show-headers

Example response:

# Headers:
#   x-seldon-route: iris2_1
# Response:
{
  "model_name": "iris2_1",
  "outputs": [{"name": "predict", "shape": [1, 1], "datatype": "FP64", "data": [2]}]
}

Multi-Iteration Distribution Analysis

seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -i 100

Example output:

Success: map[:iris_1::50 :iris2_1::50]
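
The summary line above is compact but awkward to read for uneven splits. The following is a minimal sketch, not part of the seldon CLI, that converts it into per-route counts and percentages. The function name route_percentages is hypothetical, and the parsing assumes the summary always has the form "Success: map[:<route>::<count> ...]" shown in the example.

```shell
# Hypothetical helper (not part of the seldon CLI): print each route's
# count and percentage from a summary line of the assumed form
# "Success: map[:<route>::<count> ...]".
route_percentages() (
  body=${1#*map\[}   # drop everything up to and including "map["
  body=${body%\]*}   # drop the closing "]"
  printf '%s\n' "$body" | tr ' ' '\n' | awk -F'::' '
    NF == 2 {
      name = $1; sub(/^:/, "", name)   # ":iris_1" -> "iris_1"
      order[++n] = name; count[name] = $2; total += $2
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %d %.1f%%\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }'
)

# Example with the summary line shown above; prints one line per route.
route_percentages 'Success: map[:iris_1::50 :iris2_1::50]'
```

In practice the argument would be the captured output of a multi-iteration run rather than a literal string.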

I/O Contract

Direction | Description
Inputs | V2 inference payload (JSON). The target endpoint is given either as the experiment's default model name or as <experiment-name>.experiment.
Outputs | V2 inference responses whose model_name field shows which candidate served the request. The response header x-seldon-route identifies the routed candidate. Multi-iteration mode prints traffic statistics (e.g., "Success: map[:iris_1::50 :iris2_1::50]").

Key Parameters

Parameter | Description | Default | Required
modelName (positional) | Target model name (default model or <experiment>.experiment) | - | Yes
data (positional) | V2 inference payload as JSON string | - | Yes
--show-headers | Display response headers including x-seldon-route | false | No
-i / --iterations | Number of inference requests to send (for distribution analysis) | 1 | No
-t / --seconds | Run inferences for a specified duration (seconds) | - | No
-s / --sticky-session | Enable sticky session; reuse the route from the first response | false | No
--header | Pass custom headers (e.g., x-seldon-route=iris2_1 for route pinning) | - | No
--inference-host | Inference server address | 0.0.0.0:9000 | No

Usage Examples

Verify Traffic Split After Starting Experiment

# Run 100 iterations to verify 50/50 split
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -i 100

# Expected output (approximately):
# Success: map[:iris_1::50 :iris2_1::50]
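
Because the split is only approximate, an exact 50/50 count should not be expected on every run. A scripted check can instead accept any result within a tolerance. The sketch below is one way to do that; the function name split_within_tolerance and its argument order are hypothetical, and it assumes the "Success: map[...]" summary format shown above.

```shell
# Hypothetical check (not part of the seldon CLI): succeed when a route's
# observed share is within a tolerance of the expected share.
# Usage: split_within_tolerance "<summary>" <route> <expected_pct> <tolerance_pct>
split_within_tolerance() (
  summary=$1 route=$2 expected=$3 tolerance=$4
  body=${summary#*map\[}   # drop everything up to and including "map["
  body=${body%\]*}         # drop the closing "]"
  printf '%s\n' "$body" | tr ' ' '\n' | awk -F'::' \
    -v route=":$route" -v expected="$expected" -v tol="$tolerance" '
    NF == 2 { total += $2; if ($1 == route) hits = $2 }
    END {
      if (total == 0) exit 2                 # nothing parsed: report an error
      diff = 100 * hits / total - expected   # deviation in percentage points
      if (diff < 0) diff = -diff
      exit (diff <= tol ? 0 : 1)
    }'
)

# Example: 48 of 100 requests on iris_1 is within 5 points of 50%.
if split_within_tolerance 'Success: map[:iris_1::48 :iris2_1::52]' iris_1 50 5; then
  echo "split OK"
fi
```

The tolerance is a judgment call: with only 100 iterations, deviations of several percentage points are normal, so a tight tolerance calls for a larger -i value.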

Sticky Session: Pin to a Specific Candidate

# First request: discover which candidate was assigned
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --show-headers

# Subsequent requests: pin to iris2_1 using the route header
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --header x-seldon-route=iris2_1 \
  -s
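
One way to confirm that pinning took effect is to repeat the pinned request with -i and check that the summary lists a single route. The sketch below counts the routes in a summary line. The function name count_routes is hypothetical, and the single-route summary in the example is an assumed output written in the same format as the multi-iteration example, not a captured result.

```shell
# Hypothetical helper (not part of the seldon CLI): count the distinct
# routes listed in a "Success: map[...]" summary line.
count_routes() (
  body=${1#*map\[}   # drop everything up to and including "map["
  body=${body%\]*}   # drop the closing "]"
  set -f             # disable globbing before word-splitting the entries
  set -- $body       # one positional parameter per ":<route>::<count>" entry
  echo "$#"
)

# Assumed summary for a pinned run: every request served by one candidate.
if [ "$(count_routes 'Success: map[:iris2_1::50]')" = "1" ]; then
  echo "all requests pinned to one route"
fi
```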

Timed Distribution Analysis

# Run inferences for 30 seconds and report distribution
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -t 30

Using curl for Direct HTTP Inference

# Direct HTTP inference with header inspection
curl -v http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'

# The response headers will include:
# x-seldon-route: iris_1  (or iris2_1)
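
Reading the route out of verbose curl output by eye does not scale to many requests. The sketch below extracts the x-seldon-route value from a raw header dump so that repeated curl calls can be tallied. The function name extract_route and the payload variable are hypothetical. The commented loop assumes the same local endpoint as the curl example above and relies on curl's -D - option to write response headers to standard output.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of x-seldon-route (header names are case-insensitive).
extract_route() {
  tr -d '\r' | awk '{
    i = index($0, ":")                       # first colon ends the header name
    if (i > 0 && tolower(substr($0, 1, i - 1)) == "x-seldon-route") {
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)              # trim leading whitespace
      print value
      exit
    }
  }'
}

# Usage sketch (needs a running endpoint, so it is left commented out).
# $payload is assumed to hold the V2 JSON request body.
# for n in $(seq 1 20); do
#   curl -s -o /dev/null -D - http://localhost:9000/v2/models/iris/infer \
#     -H "Content-Type: application/json" \
#     -d "$payload" | extract_route
# done | sort | uniq -c
```

Splitting on the first colon only, rather than on every colon, keeps the helper correct even if the route value itself contains colons.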

External Dependencies

  • seldon CLI — Command-line tool for inference and experiment interaction
  • curl — Alternative HTTP client for direct V2 inference protocol requests
  • V2 inference protocol — Open Inference Protocol for request/response format
