Principle: SeldonIO Seldon Core Pipeline Inference Execution
| Field | Value |
|---|---|
| Principle Name | Pipeline Inference Execution |
| Overview | Sending inference requests through a multi-step pipeline and receiving aggregated outputs. |
| Domains | MLOps, Inference |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Pipeline_Infer |
| Last Updated | 2026-02-13 00:00 GMT |
Description
Pipeline inference sends a V2-protocol request to the pipeline's first step. Data flows through all steps via Kafka, with each step's output becoming the next step's input (potentially remapped via tensorMap). The final output aggregates results from the designated output steps.
The inference flow operates as follows:
- Ingress: The client sends a V2 Inference Protocol request (JSON or gRPC) to the pipeline endpoint. The request contains named input tensors with their data, shape, and datatype.
- Step-by-step Processing: The first step (or steps without explicit inputs) receives the pipeline input. Each subsequent step receives data from its declared input sources via Kafka topics. Tensor names are remapped according to `tensorMap` configurations.
- Egress: The output from the designated `spec.output.steps` is collected and returned to the client as a V2 Inference Protocol response.
- Inspection: The `seldon pipeline inspect` command can trace data through all steps for debugging, showing the intermediate tensor values at each stage.
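As a concrete illustration, a minimal V2 Inference Protocol request body can be built as plain JSON. This is a sketch only; the tensor name `INPUT0`, the shape, and the values are illustrative assumptions, not taken from any real pipeline schema:

```python
import json

# Hypothetical single-tensor V2 request; the tensor name "INPUT0",
# shape, and data values are illustrative assumptions.
v2_request = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],
        }
    ]
}

body = json.dumps(v2_request)
print(body)
```

The same structure is returned on egress, with `outputs` in place of `inputs`.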
Theoretical Basis
Pipeline inference follows the dataflow programming model: input data enters the graph, is transformed by each node, and exits at designated output points. Kafka provides durable, ordered message delivery between nodes.
Key theoretical properties:
- V2 Inference Protocol: Seldon Core 2 uses the standard V2 (Open Inference Protocol) for all inference communication. This protocol defines a standard JSON schema with `inputs` (a list of named tensors) and `outputs` (a list of named result tensors). This standardization allows any V2-compatible client to interact with any pipeline.
- Asynchronous Message Passing: Kafka topics between steps provide asynchronous, durable data transfer. This decouples step execution timing and provides natural buffering for steps with different processing speeds.
- Tensor Flow Semantics: Data flows as named tensors through the graph. Each tensor has a name, shape, datatype, and data payload. The type system ensures compatibility between connected steps (or surfaces mismatches as errors).
- End-to-end Traceability: The pipeline inspect facility allows operators to observe the exact data flowing through each step, which is essential for debugging data transformation issues, verifying tensor remapping, and validating pipeline correctness.
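The dataflow and tensor-remapping semantics above can be sketched as an in-process simulation (no Kafka involved). The step functions, tensor names, and tensorMap entries here are illustrative assumptions, not part of the Seldon API:

```python
# In-process sketch of pipeline dataflow with tensorMap-style renaming.
# Each "step" maps named tensors to named tensors; between steps, a
# rename dict plays the role of a tensorMap. All names are illustrative.

def scale_step(tensors):
    # Emits a tensor named "scaled".
    return {"scaled": [x * 2.0 for x in tensors["INPUT0"]]}

def sum_step(tensors):
    # Expects a tensor named "values"; emits "total".
    return {"total": [sum(tensors["values"])]}

def apply_tensor_map(tensors, tensor_map):
    # Rename tensors per the map; unmapped names pass through unchanged.
    return {tensor_map.get(name, name): data for name, data in tensors.items()}

pipeline_input = {"INPUT0": [1.0, 2.0, 3.0]}
intermediate = scale_step(pipeline_input)
# The "tensorMap": scale_step's "scaled" output feeds sum_step's "values" input.
remapped = apply_tensor_map(intermediate, {"scaled": "values"})
output = sum_step(remapped)
print(output)  # {'total': [12.0]}
```

In a real pipeline the hand-off between `scale_step` and `sum_step` would happen over Kafka topics, and a name mismatch without a corresponding `tensorMap` entry would surface as an error rather than silently passing through.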
When to Use
Use this principle when running predictions through a multi-model pipeline:
- When sending inference requests to a deployed and ready pipeline.
- When testing pipeline correctness with known input data.
- When debugging unexpected pipeline outputs using the inspect facility.
- When benchmarking pipeline throughput with repeated inference iterations.
- When integrating pipeline inference into application code.
Structure
The inference execution flow:
- Construct V2 request: Build a JSON payload with named input tensors matching the first step's expected input schema.
- Send to pipeline endpoint: Use `seldon pipeline infer` (CLI), `curl` (REST), or a gRPC client to send the request.
- Data flows through the DAG: Each step processes its inputs and publishes outputs to Kafka topics. Downstream steps consume from these topics.
- Receive response: The output from the designated output steps is collected and returned as a V2 response.
- Optionally inspect: Use `seldon pipeline inspect` to trace intermediate data for debugging.
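The REST path of this flow can be sketched with the Python standard library. The gateway address and pipeline name are assumptions, and the `<pipeline>.pipeline` path suffix follows the convention used in Seldon Core 2 REST examples; verify the exact endpoint against your installation:

```python
import json
import urllib.request

# Hypothetical gateway address and pipeline name (assumptions).
GATEWAY = "http://0.0.0.0:9000"
PIPELINE = "mypipeline"

payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 2], "datatype": "FP32", "data": [0.5, 1.5]}
    ]
}

# Seldon Core 2 exposes pipelines on the V2 infer path with a ".pipeline"
# suffix; confirm the path for your deployment before relying on it.
req = urllib.request.Request(
    url=f"{GATEWAY}/v2/models/{PIPELINE}.pipeline/infer",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url, req.method)

# Actually sending it requires a running pipeline gateway:
# with urllib.request.urlopen(req) as resp:
#     v2_response = json.load(resp)
```

The equivalent CLI invocation would be `seldon pipeline infer` with the same JSON payload.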
Related Pages
- SeldonIO_Seldon_core_Seldon_Pipeline_Infer - implements - Concrete CLI tool for sending V2-protocol inference requests.
- SeldonIO_Seldon_core_Pipeline_Readiness_Verification - prerequisite - Pipeline must be verified ready before inference.
- SeldonIO_Seldon_core_Pipeline_Topology_Definition - determines flow - The DAG topology determines how data flows through the pipeline.
- SeldonIO_Seldon_core_Pipeline_Conditional_Routing - routing logic - Conditional routing affects which steps execute during inference.
- Heuristic:SeldonIO_Seldon_core_Kafka_Partition_Throughput_Tip
- Heuristic:SeldonIO_Seldon_core_Tracing_Latency_Tip