Workflow:SeldonIO Seldon core Inference Pipeline

From Leeroopedia
Knowledge Sources
Domains: MLOps, Data_Engineering, Kubernetes
Last Updated: 2026-02-13 14:00 GMT

Overview

End-to-end process for composing multiple models into a directed acyclic graph (DAG) pipeline for multi-step inference in Seldon Core 2.

Description

This workflow covers creating inference pipelines that chain multiple models together using Seldon Core 2's Pipeline custom resource. Pipelines connect models via Kafka-based data flow, enabling patterns such as linear chains (model A feeds model B), parallel fan-out with joins, conditional routing based on model outputs, and pipeline-to-pipeline composition. Each step in the pipeline can optionally define tensor mappings to transform output names into the format expected by downstream steps. Pipelines support batch processing, trigger-based execution, and selective output exposure.

Usage

Execute this workflow when you need to compose multiple inference components into a single application endpoint. Common scenarios include preprocessing followed by prediction, ensemble models that aggregate multiple classifiers, conditional routing based on data content, and multi-modal pipelines that combine different types of models (e.g., speech-to-text followed by sentiment analysis).

Execution Steps

Step 1: Deploy Component Models

Deploy all individual models that will participate in the pipeline. Each model must be independently loaded and reach the Available state before the pipeline can become ready. Models can include classifiers, preprocessors, transformers, detectors, or any other inference component.

Key considerations:

  • All models referenced by the pipeline must be deployed before or concurrently with the pipeline
  • Models can be deployed on different inference servers (MLServer and Triton can coexist in the same pipeline)
  • Each model must declare the correct requirements (the capability tags used to match it to a compatible inference server) and a memory allocation
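
As a minimal sketch of this step, the snippet below builds two Model resources as plain Python dictionaries and serializes them to JSON, which kubectl apply -f accepts alongside YAML. The model names, storage URIs, requirement tags, and memory values are hypothetical placeholders, and the field layout is an assumption based on the mlops.seldon.io/v1alpha1 Model resource; verify it against the CRD version installed in your cluster.

```python
import json


def model_manifest(name, storage_uri, requirements, memory):
    """Return a Seldon Core 2 Model resource as a plain dict (assumed field layout)."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            "storageUri": storage_uri,
            "requirements": list(requirements),
            "memory": memory,
        },
    }


# Hypothetical component models; the URIs are placeholders, not real artifacts.
models = [
    model_manifest("preprocessor", "gs://example-bucket/preprocessor", ["sklearn"], "100Ki"),
    model_manifest("classifier", "gs://example-bucket/classifier", ["tensorflow"], "1Gi"),
]

# A Kubernetes List lets several resources travel in one file.
bundle = {"apiVersion": "v1", "kind": "List", "items": models}
manifest_json = json.dumps(bundle, indent=2)
print(manifest_json)
```

Saving the printed JSON to a file and applying it would submit both models at once; each still has to reach the Available state on its own before the pipeline can become ready.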

Step 2: Define Pipeline Topology

Design the DAG structure of the pipeline by defining the step dependencies and data flow. Each step references a model name and declares its inputs (which can come from previous steps or from the pipeline-level input). The output section defines which step results are exposed in the pipeline response.

Key considerations:

  • A step names its upstream steps in its inputs list, either as a whole step (step_name) or as a single tensor (step_name.outputs.TENSOR_NAME)
  • Use tensorMap to rename output tensors when downstream models expect different input names
  • A step that declares no inputs receives the pipeline-level input by default, so several entry steps can consume the same request
  • Output steps can select specific tensors (e.g., step_name.outputs.TENSOR_NAME)
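
The topology described above can be sketched as a Pipeline resource built in Python, together with a small check that the steps really form a DAG. The pipeline and step names and the tensor names (OUTPUT0, INPUT0) are hypothetical, and execution_order is an illustrative helper written for this page, not part of Seldon; the scheduler performs its own validation when the resource is applied.

```python
def pipeline_manifest(name, steps, output_steps):
    """Return a Pipeline resource as a plain dict (assumed field layout)."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {"steps": steps, "output": {"steps": output_steps}},
    }


def upstream_steps(step):
    """Names a step depends on; 'a.outputs.X' and 'a' both depend on 'a'."""
    refs = step.get("inputs", []) + step.get("triggers", [])
    return {ref.split(".")[0] for ref in refs}


def execution_order(steps, pipeline_name):
    """Topologically sort steps; raise ValueError on a cycle or unknown reference."""
    # References to the pipeline itself (its own inputs) are not step dependencies.
    deps = {s["name"]: upstream_steps(s) - {pipeline_name} for s in steps}
    order = []
    while deps:
        ready = sorted(n for n, d in deps.items() if d <= set(order))
        if not ready:
            raise ValueError(f"cycle or unknown step among: {sorted(deps)}")
        order.extend(ready)
        for n in ready:
            del deps[n]
    return order


# Hypothetical linear chain: preprocessor feeds classifier.
steps = [
    {"name": "preprocessor"},  # no inputs: receives the pipeline-level input
    {
        "name": "classifier",
        "inputs": ["preprocessor"],
        # Rename the upstream tensor to the name the classifier expects.
        "tensorMap": {"preprocessor.outputs.OUTPUT0": "INPUT0"},
    },
]
pipeline = pipeline_manifest("text-pipeline", steps, ["classifier"])
order = execution_order(steps, "text-pipeline")
print(order)
```

Running the check locally before applying the resource catches typos in step references early, which is cheaper than waiting for the scheduler to reject the pipeline.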

Step 3: Configure Advanced Routing

Optionally add conditional logic, triggers, and joins to the pipeline. Triggers gate step execution based on upstream results, joins aggregate outputs from multiple parallel steps, and conditional models route data to different branches based on content.

Key considerations:

  • Triggers use a step's output as a gate condition (step only runs if trigger fires)
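  • Joins are configured per step with inputsJoinType (for data inputs) and triggersJoinType (for triggers); the join type is inner (wait for all), outer, or any (proceed on the first arrival)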
  • Conditional routing requires a model that produces multiple named outputs for different branches
  • Pipeline-to-pipeline composition allows one pipeline to reference another pipeline as a step
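
A hedged sketch of two of these patterns follows: a conditional pipeline whose router model emits one of two named outputs, and a step gated by triggers. All names (router, branch-a, branch-b, gated-step, the tensor names, and the pipeline names) are hypothetical. The stepsJoin field on the output section and the pipeline_name.inputs.TENSOR reference form are assumptions based on Seldon's published examples and should be checked against your version.

```python
import json


def step(name, inputs=None, triggers=None, triggers_join=None, tensor_map=None):
    """Build one pipeline step, omitting fields that were not supplied."""
    spec = {"name": name}
    if inputs:
        spec["inputs"] = list(inputs)
    if triggers:
        spec["triggers"] = list(triggers)
    if triggers_join:
        spec["triggersJoinType"] = triggers_join
    if tensor_map:
        spec["tensorMap"] = dict(tensor_map)
    return spec


# Conditional routing: each branch consumes a different named output of the router.
steps = [
    step("router"),
    step("branch-a", inputs=["router.outputs.OUTPUT0"],
         tensor_map={"router.outputs.OUTPUT0": "INPUT"}),
    step("branch-b", inputs=["router.outputs.OUTPUT1"],
         tensor_map={"router.outputs.OUTPUT1": "INPUT"}),
]
conditional = {
    "apiVersion": "mlops.seldon.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {"name": "conditional-demo"},
    "spec": {
        "steps": steps,
        # Only one branch fires per request, so the output joins on "any" (assumed field).
        "output": {"steps": ["branch-a", "branch-b"], "stepsJoin": "any"},
    },
}

# Trigger gating: the step runs on INPUT only when either trigger tensor arrives.
gated = step(
    "gated-step",
    inputs=["trigger-demo.inputs.INPUT"],
    triggers=["trigger-demo.inputs.ok1", "trigger-demo.inputs.ok2"],
    triggers_join="any",
)

print(json.dumps(conditional, indent=2))
print(json.dumps(gated, indent=2))
```

Using any for both the output join and the trigger join reflects that only one branch or one trigger is expected per request; an inner join would wait for inputs that never arrive.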

Step 4: Deploy Pipeline

Apply the Pipeline custom resource to the cluster. The Seldon scheduler validates the pipeline topology, creates the necessary Kafka topics for inter-step communication, and configures the data flow engine (chainer) to route data between steps.

Key considerations:

  • Pipeline names must not collide with any model name in the same namespace
  • Kafka topics are automatically created for each pipeline step
  • The pipeline status moves from PipelineCreating to PipelineReady once deployment succeeds
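
Before applying the resource, the considerations above can be checked locally. The sketch below is an illustrative pre-flight helper (preflight_problems is a hypothetical name, not a Seldon function) that flags a pipeline name colliding with a model name and steps that reference models not yet deployed. The set of deployed model names is assumed to come from wherever you track them, for example the output of kubectl get models.

```python
def preflight_problems(pipeline, deployed_models):
    """Return a list of human-readable problems found before applying a pipeline."""
    problems = []
    name = pipeline["metadata"]["name"]
    if name in deployed_models:
        problems.append(f"pipeline name '{name}' collides with a model name")
    for s in pipeline["spec"]["steps"]:
        if s["name"] not in deployed_models:
            # Not fatal: models may be deployed concurrently with the pipeline.
            problems.append(f"step '{s['name']}' has no deployed model yet")
    return problems


# Hypothetical pipeline with two steps.
pipeline = {
    "apiVersion": "mlops.seldon.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {"name": "text-pipeline"},
    "spec": {
        "steps": [{"name": "preprocessor"}, {"name": "classifier", "inputs": ["preprocessor"]}],
        "output": {"steps": ["classifier"]},
    },
}

for problem in preflight_problems(pipeline, {"preprocessor", "classifier"}):
    print(problem)
```

With both models present and no name collision the helper prints nothing; the scheduler remains the authority on whether the topology is valid.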

Step 5: Verify Pipeline Readiness

Wait for the pipeline to reach the PipelineReady condition. Check that all component models are loaded and the Kafka topic infrastructure is provisioned. Query pipeline metadata to verify the expected input/output schema.

Key considerations:

  • Pipeline readiness depends on all referenced models being available
  • Use the pipeline metadata endpoint to verify the expected tensor names and shapes
  • The pipeline inspect command (seldon pipeline inspect in the Seldon CLI) shows the data flowing through individual steps, which helps with debugging
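
The readiness check can be sketched as a small function over the resource's status, assuming the Pipeline resource follows the usual Kubernetes convention of a status.conditions list with type and status fields (as returned by kubectl get pipeline NAME -o json). The two sample dictionaries are hand-written for illustration and are not captured cluster output.

```python
def condition_is_true(resource, condition_type):
    """Return True only if the named condition exists and its status is "True"."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == condition_type:
            return cond.get("status") == "True"
    return False


# Hand-written samples shaped like a Kubernetes resource with status conditions.
pending = {"status": {"conditions": [{"type": "PipelineReady", "status": "False"}]}}
ready = {"status": {"conditions": [{"type": "PipelineReady", "status": "True"}]}}

print(condition_is_true(pending, "PipelineReady"))
print(condition_is_true(ready, "PipelineReady"))
```

In practice the same wait can be delegated to kubectl wait --for condition=PipelineReady, with a function like this reserved for scripts that already hold the resource JSON.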

Step 6: Run Pipeline Inference

Send inference requests to the pipeline endpoint. The request is routed through the pipeline steps according to the defined topology, with intermediate results flowing through Kafka topics. The final response contains the outputs defined in the pipeline's output section.

Key considerations:

  • Pipeline inference endpoint: /v2/pipelines/{pipeline_name}/infer
  • The request's input tensors must match the schema expected by the step or steps that consume the pipeline input
  • Pipeline response latency includes all step execution times plus Kafka overhead
  • Both REST and gRPC are supported for pipeline inference
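
A minimal REST sketch of this step follows, using only the Python standard library. It builds a request body with an inputs list of named tensors, which is the V2 inference protocol shape assumed here, and targets the endpoint path given above. The base URL, pipeline name, tensor name, datatype, and shape are hypothetical; the request is constructed but not sent, so the snippet runs without a cluster.

```python
import json
from urllib import request


def build_infer_request(base_url, pipeline_name, inputs):
    """Build (but do not send) a POST request to the pipeline inference endpoint."""
    url = f"{base_url.rstrip('/')}/v2/pipelines/{pipeline_name}/infer"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical input tensor; names, datatype, and shape depend on the entry steps.
inputs = [
    {"name": "INPUT0", "datatype": "INT32", "shape": [1, 4], "data": [1, 2, 3, 4]},
]
req = build_infer_request("http://localhost:9000", "text-pipeline", inputs)
print(req.get_method(), req.full_url)

# To send against a real deployment (assumed reachable at base_url):
# with request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read()))
```

Deployments that route traffic through a shared gateway may also require a routing header on the request; consult the inference documentation for your Seldon Core 2 version before relying on the path alone.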
