Principle: SeldonIO Seldon Core Pipeline Inference Execution
| Field | Value |
|---|---|
| Principle Name | Pipeline Inference Execution |
| Overview | Sending inference requests through a multi-step pipeline and receiving aggregated outputs. |
| Domains | MLOps, Inference |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Pipeline_Infer |
| Last Updated | 2026-02-13 00:00 GMT |
Description
Pipeline inference sends a V2-protocol request to the pipeline's first step. Data flows through all steps via Kafka, with each step's output becoming the next step's input (potentially remapped via tensorMap). The final output aggregates results from the designated output steps.
The inference flow operates as follows:
- Ingress: The client sends a V2 Inference Protocol request (JSON or gRPC) to the pipeline endpoint. The request contains named input tensors with their data, shape, and datatype.
- Step-by-step Processing: The first step (or steps without explicit inputs) receives the pipeline input. Each subsequent step receives data from its declared input sources via Kafka topics. Tensor names are remapped according to `tensorMap` configurations.
- Egress: The output from the designated `spec.output.steps` is collected and returned to the client as a V2 Inference Protocol response.
- Inspection: The `seldon pipeline inspect` command can trace data through all steps for debugging, showing the intermediate tensor values at each stage.
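As a concrete illustration, a minimal V2 Inference Protocol request body can be built as plain JSON. This is a sketch only; the tensor name `INPUT0`, the shape, and the values are illustrative assumptions, not taken from any real pipeline schema:

```python
import json

# Hypothetical single-tensor V2 request; the tensor name "INPUT0",
# shape, and data values are illustrative assumptions.
v2_request = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],
        }
    ]
}

body = json.dumps(v2_request)
print(body)
```

The same structure is returned on egress, with `outputs` in place of `inputs`.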
Theoretical Basis
Pipeline inference follows the dataflow programming model: input data enters the graph, is transformed by each node, and exits at designated output points. Kafka provides durable, ordered message delivery between nodes.
Key theoretical properties:
- V2 Inference Protocol: Seldon Core 2 uses the standard V2 (Open Inference Protocol) for all inference communication. This protocol defines a standard JSON schema with `inputs` (a list of named tensors) and `outputs` (a list of named result tensors). This standardization allows any V2-compatible client to interact with any pipeline.
- Asynchronous Message Passing: Kafka topics between steps provide asynchronous, durable data transfer. This decouples step execution timing and provides natural buffering for steps with different processing speeds.
- Tensor Flow Semantics: Data flows as named tensors through the graph. Each tensor has a name, shape, datatype, and data payload. The type system ensures compatibility between connected steps (or surfaces mismatches as errors).
- End-to-end Traceability: The pipeline inspect facility allows operators to observe the exact data flowing through each step, which is essential for debugging data transformation issues, verifying tensor remapping, and validating pipeline correctness.
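The dataflow and tensor-remapping semantics above can be sketched as an in-process simulation (no Kafka involved). The step functions, tensor names, and tensorMap entries here are illustrative assumptions, not part of the Seldon API:

```python
# In-process sketch of pipeline dataflow with tensorMap-style renaming.
# Each "step" maps named tensors to named tensors; between steps, a
# rename dict plays the role of a tensorMap. All names are illustrative.

def scale_step(tensors):
    # Emits a tensor named "scaled".
    return {"scaled": [x * 2.0 for x in tensors["INPUT0"]]}

def sum_step(tensors):
    # Expects a tensor named "values"; emits "total".
    return {"total": [sum(tensors["values"])]}

def apply_tensor_map(tensors, tensor_map):
    # Rename tensors per the map; unmapped names pass through unchanged.
    return {tensor_map.get(name, name): data for name, data in tensors.items()}

pipeline_input = {"INPUT0": [1.0, 2.0, 3.0]}
intermediate = scale_step(pipeline_input)
# The "tensorMap": scale_step's "scaled" output feeds sum_step's "values" input.
remapped = apply_tensor_map(intermediate, {"scaled": "values"})
output = sum_step(remapped)
print(output)  # {'total': [12.0]}
```

In a real pipeline the hand-off between `scale_step` and `sum_step` would happen over Kafka topics, and a name mismatch without a corresponding `tensorMap` entry would surface as an error rather than silently passing through.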
When to Use
Use this principle when running predictions through a multi-model pipeline:
- When sending inference requests to a deployed and ready pipeline.
- When testing pipeline correctness with known input data.
- When debugging unexpected pipeline outputs using the inspect facility.
- When benchmarking pipeline throughput with repeated inference iterations.
- When integrating pipeline inference into application code.
Structure
The inference execution flow:
- Construct V2 request: Build a JSON payload with named input tensors matching the first step's expected input schema.
- Send to pipeline endpoint: Use `seldon pipeline infer` (CLI), `curl` (REST), or a gRPC client to send the request.
- Data flows through the DAG: Each step processes its inputs and publishes outputs to Kafka topics. Downstream steps consume from these topics.
- Receive response: The output from the designated output steps is collected and returned as a V2 response.
- Optionally inspect: Use `seldon pipeline inspect` to trace intermediate data for debugging.
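The REST path of this flow can be sketched with the Python standard library. The gateway address and pipeline name are assumptions, and the `<pipeline>.pipeline` path suffix follows the convention used in Seldon Core 2 REST examples; verify the exact endpoint against your installation:

```python
import json
import urllib.request

# Hypothetical gateway address and pipeline name (assumptions).
GATEWAY = "http://0.0.0.0:9000"
PIPELINE = "mypipeline"

payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 2], "datatype": "FP32", "data": [0.5, 1.5]}
    ]
}

# Seldon Core 2 exposes pipelines on the V2 infer path with a ".pipeline"
# suffix; confirm the path for your deployment before relying on it.
req = urllib.request.Request(
    url=f"{GATEWAY}/v2/models/{PIPELINE}.pipeline/infer",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url, req.method)

# Actually sending it requires a running pipeline gateway:
# with urllib.request.urlopen(req) as resp:
#     v2_response = json.load(resp)
```

The equivalent CLI invocation would be `seldon pipeline infer` with the same JSON payload.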
Related Pages
- SeldonIO_Seldon_core_Seldon_Pipeline_Infer - implements - Concrete CLI tool for sending V2-protocol inference requests.
- SeldonIO_Seldon_core_Pipeline_Readiness_Verification - prerequisite - Pipeline must be verified ready before inference.
- SeldonIO_Seldon_core_Pipeline_Topology_Definition - determines flow - The DAG topology determines how data flows through the pipeline.
- SeldonIO_Seldon_core_Pipeline_Conditional_Routing - routing logic - Conditional routing affects which steps execute during inference.
- Heuristic:SeldonIO_Seldon_core_Kafka_Partition_Throughput_Tip
- Heuristic:SeldonIO_Seldon_core_Tracing_Latency_Tip