Implementation:SeldonIO Seldon core Seldon Model Infer With Headers

From Leeroopedia
Revision as of 13:50, 16 February 2026 by Admin (Auto-imported from implementations/SeldonIO_Seldon_core_Seldon_Model_Infer_With_Headers.md)
Field | Value
Type | External Tool Doc
Overview | Concrete CLI tool for monitoring experiment traffic distribution using inference with response headers in Seldon Core 2.
Domains | MLOps, Experimentation
Related Principle | SeldonIO_Seldon_core_Experiment_Traffic_Analysis
Source | docs-gb/cli/seldon_model_infer.md:L1-35, samples/local-experiments.md:L130-230
Knowledge Sources | Repo, Doc
Last Updated | 2026-02-13 00:00 GMT

Description

This implementation provides the concrete CLI commands for monitoring experiment traffic distribution in Seldon Core 2. The seldon model infer command sends V2 inference requests to a model endpoint and can display response headers (including x-seldon-route), run multiple iterations for distribution analysis, and use sticky sessions for route pinning.

Code Reference

CLI Signature

seldon model infer <modelName> '<data>' [--show-headers] [-i <iterations>] [-t <seconds>] [-s] [--header <key>=<value>] [--inference-host <host:port>]

Single Inference with Headers

seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --show-headers

Example response:

# Headers:
#   x-seldon-route: iris2_1
# Response:
{
  "model_name": "iris2_1",
  "outputs": [{"name": "predict", "shape": [1, 1], "datatype": "FP64", "data": [2]}]
}
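Because each response body names the candidate that served it, a set of saved responses can be tallied by model_name. The following is a minimal Python sketch of that idea; the function name tally_candidates and the sample bodies are hypothetical and only mirror the shape of the example response above.

```python
import json
from collections import Counter


def tally_candidates(response_bodies):
    """Count how many responses each candidate served, keyed by model_name."""
    counts = Counter()
    for body in response_bodies:
        response = json.loads(body)
        # model_name identifies the candidate that actually served the request.
        counts[response["model_name"]] += 1
    return dict(counts)


# Illustrative bodies only; they are not captured from a real deployment.
sample_bodies = [
    '{"model_name": "iris2_1", "outputs": [{"name": "predict", "shape": [1, 1], "datatype": "FP64", "data": [2]}]}',
    '{"model_name": "iris_1", "outputs": [{"name": "predict", "shape": [1, 1], "datatype": "FP64", "data": [2]}]}',
    '{"model_name": "iris2_1", "outputs": [{"name": "predict", "shape": [1, 1], "datatype": "FP64", "data": [2]}]}',
]
print(tally_candidates(sample_bodies))
```
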

Multi-Iteration Distribution Analysis

seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -i 100

Example output:

Success: map[:iris_1::50 :iris2_1::50]
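The summary line can be turned into per-candidate counts for further checks. This Python sketch assumes, based only on the example line above, that the summary is a space-separated list of <route>:<count> entries inside map[...] and that each route is wrapped in colons; parse_success_map and traffic_shares are hypothetical helper names.

```python
import re


def parse_success_map(line):
    """Parse 'Success: map[:iris_1::50 :iris2_1::50]' into {'iris_1': 50, 'iris2_1': 50}."""
    match = re.search(r"map\[(.*?)\]", line)
    if match is None:
        raise ValueError(f"no map[...] summary found in: {line!r}")
    counts = {}
    for token in match.group(1).split():
        # Each token looks like ':iris_1::50'. The route itself contains
        # colons, so split on the last colon only.
        route, _, count = token.rpartition(":")
        counts[route.strip(":")] = int(count)
    return counts


def traffic_shares(counts):
    """Convert raw counts into fractions of the total request count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


counts = parse_success_map("Success: map[:iris_1::50 :iris2_1::50]")
print(counts, traffic_shares(counts))
```
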

I/O Contract

Direction | Description
Inputs | V2 inference payload (JSON). The target endpoint is either the experiment's default model name or <experiment-name>.experiment.
Outputs | V2 inference responses whose model_name field shows which candidate served the request; the response header x-seldon-route identifies the routed candidate. Multi-iteration mode prints traffic statistics (e.g., "Success: map[:iris_1::50 :iris2_1::50]").

Key Parameters

Parameter | Description | Default | Required
modelName (positional) | Target model name (default model or <experiment>.experiment) | - | Yes
data (positional) | V2 inference payload as JSON string | - | Yes
--show-headers | Display response headers including x-seldon-route | false | No
-i / --iterations | Number of inference requests to send (for distribution analysis) | 1 | No
-t / --seconds | Run inferences for a specified duration (seconds) | - | No
-s / --sticky-session | Enable sticky session; reuse the route from the first response | false | No
--header | Pass custom headers (e.g., x-seldon-route=iris2_1 for route pinning) | - | No
--inference-host | Inference server address | 0.0.0.0:9000 | No
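When the CLI is driven from a script, the parameters above map naturally onto an argument list. The sketch below is one possible way to assemble it in Python; build_infer_command is a hypothetical helper, it uses only the flags listed in the table, and it does not run the command itself.

```python
import json
import shlex


def build_infer_command(model_name, payload, show_headers=False, iterations=None,
                        seconds=None, sticky_session=False, headers=None,
                        inference_host=None):
    """Return an argv list for `seldon model infer` using only the flags documented above."""
    argv = ["seldon", "model", "infer", model_name, json.dumps(payload)]
    if show_headers:
        argv.append("--show-headers")
    if iterations is not None:
        argv += ["-i", str(iterations)]
    if seconds is not None:
        argv += ["-t", str(seconds)]
    if sticky_session:
        argv.append("-s")
    # Each custom header becomes its own --header key=value pair.
    for key, value in (headers or {}).items():
        argv += ["--header", f"{key}={value}"]
    if inference_host is not None:
        argv += ["--inference-host", inference_host]
    return argv


payload = {"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}
command = build_infer_command("iris", payload, show_headers=True, iterations=100)
# shlex.join quotes the JSON payload so the line can be pasted into a shell.
print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, capture_output=True, text=True) on a machine where the seldon CLI is installed and pointed at a running scheduler.
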

Usage Examples

Verify Traffic Split After Starting Experiment

# Run 100 iterations to verify 50/50 split
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -i 100

# Expected output (approximately):
# Success: map[:iris_1::50 :iris2_1::50]
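How close "approximately" should be depends on the number of requests. As a rough guide, and assuming each request is routed independently with the configured weight (an assumption, not something the source states), the count for one candidate follows a binomial distribution. For 100 requests at a 0.5 share the standard deviation is sqrt(100 * 0.5 * 0.5) = 5, so counts between about 40 and 60 are unremarkable. The hypothetical helper below applies that normal-approximation check.

```python
import math


def split_is_plausible(counts, candidate, expected_share, z=1.96):
    """Check whether a candidate's count lies within z standard deviations
    of its expected count under a binomial model of independent routing."""
    total = sum(counts.values())
    expected = total * expected_share
    std_dev = math.sqrt(total * expected_share * (1 - expected_share))
    return abs(counts.get(candidate, 0) - expected) <= z * std_dev


# 57/43 is within the tolerance for a 50/50 split over 100 requests; 70/30 is not.
print(split_is_plausible({"iris_1": 57, "iris2_1": 43}, "iris_1", 0.5))
print(split_is_plausible({"iris_1": 70, "iris2_1": 30}, "iris_1", 0.5))
```

The default z=1.96 corresponds to roughly a 95% band; a larger iteration count narrows the band relative to the total and gives a sharper check.
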

Sticky Session: Pin to a Specific Candidate

# First request: discover which candidate was assigned
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --show-headers

# Subsequent requests: pin to the candidate reported above (iris2_1 in this example) by sending its route header with sticky sessions enabled
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  --header x-seldon-route=iris2_1 \
  -s

Timed Distribution Analysis

# Run inferences for 30 seconds and report distribution
seldon model infer iris \
  '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}' \
  -t 30

Using curl for Direct HTTP Inference

# Direct HTTP inference with header inspection
curl -v http://localhost:9000/v2/models/iris/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}'

# The response headers will include:
# x-seldon-route: iris_1  (or iris2_1)
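The same request can be issued from Python with only the standard library. This is a sketch under the same assumptions as the curl example (host localhost:9000, model iris); build_infer_request and infer are hypothetical helper names, and the optional route argument mirrors the x-seldon-route header that the --header flag is described as sending.

```python
import json
import urllib.request


def build_infer_request(model_name, payload, host="localhost:9000", route=None):
    """Build a POST request for the V2 infer endpoint without sending it."""
    url = f"http://{host}/v2/models/{model_name}/infer"
    headers = {"Content-Type": "application/json"}
    if route is not None:
        # Assumption: the route header is passed back to pin a candidate.
        headers["x-seldon-route"] = route
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def infer(request, timeout=10.0):
    """Send the request; return (x-seldon-route header or None, parsed JSON body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        route = response.headers.get("x-seldon-route")
        body = json.loads(response.read().decode("utf-8"))
    return route, body


payload = {"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}]}
request = build_infer_request("iris", payload)
```

Calling infer(request) requires a reachable inference endpoint; the returned route can then be compared with the model_name field of the body.
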

External Dependencies

  • seldon CLI — Command-line tool for inference and experiment interaction
  • curl — Alternative HTTP client for direct V2 inference protocol requests
  • V2 inference protocol — Open Inference Protocol for request/response format
