
Heuristic:SeldonIO Seldon core Kafka Partition Throughput Tip

From Leeroopedia
Knowledge Sources
Domains: Optimization, Pipelines, Kafka
Last Updated: 2026-02-13 14:00 GMT

Overview

A pipeline throughput optimization: match the number of pipelinegateway replicas to the Kafka partition count, and increase gateway workers before adding replicas.

Description

Seldon Core 2 pipelines use Kafka topics for inter-step communication, and each pipeline gateway instance consumes from Kafka partitions. The partition count therefore caps the parallelism of pipeline processing. Running more gateway replicas than partitions wastes resources, because the extra replicas have nothing to consume; running fewer replicas than partitions leaves available throughput unused. In addition, each gateway runs a configurable number of lightweight worker goroutines per consumer, which gives a second scaling lever inside each pod.

Usage

Use this heuristic when pipeline throughput is insufficient or when scaling pipeline infrastructure. It is especially important for high-throughput production deployments where pipeline latency and throughput are critical metrics.

The Insight (Rule of Thumb)

  • Action 1: Set Kafka `numPartitions` to match desired parallelism level.
  • Action 2: Set pipelinegateway replicas equal to the Kafka partition count for balanced distribution; replicas beyond the partition count sit idle.
  • Action 3: Increase `MODELGATEWAY_NUM_WORKERS` (goroutines per consumer) before adding more replicas.
  • Action 4: Minimize network latency between Seldon components and the Kafka cluster.
  • Value: Default is 1 partition and 1 replica. For production, use 3-10 partitions with matching replicas.
  • Trade-off: More partitions = higher throughput but more Kafka broker resources. More workers = higher CPU usage per pod.
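
The ordering in Actions 2 and 3 can be sketched as a small planning routine. This is an illustrative sketch only and not part of Seldon Core 2: `plan_scaling`, `ScalingPlan`, `target_concurrency`, and `max_workers_per_pod` are hypothetical names, and the sketch assumes the desired load can be expressed as a total number of concurrent workers.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPlan:
    replicas: int
    workers_per_replica: int


def plan_scaling(target_concurrency: int, num_partitions: int,
                 max_workers_per_pod: int) -> ScalingPlan:
    """Grow workers first; add replicas only when a pod is saturated.

    Replicas never exceed the partition count, because an extra
    consumer in the group would have no partition to read from.
    """
    if min(target_concurrency, num_partitions, max_workers_per_pod) < 1:
        raise ValueError("all arguments must be >= 1")

    replicas = 1
    workers = math.ceil(target_concurrency / replicas)
    # Add a replica only while one pod cannot host the workers it needs
    # and there is still an unowned partition for the new replica.
    while workers > max_workers_per_pod and replicas < num_partitions:
        replicas += 1
        workers = math.ceil(target_concurrency / replicas)

    # If the partition cap was hit, clamp to what a pod can support.
    return ScalingPlan(replicas, min(workers, max_workers_per_pod))
```

For example, with 4 partitions and room for 16 workers per pod, a target of 8 stays on one replica, while a target of 40 spreads over 3 replicas. A target that cannot be met within the partition cap signals that `numPartitions` itself needs to grow.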

Reasoning

Kafka's parallelism model is partition-based: within a consumer group, each partition is consumed by at most one consumer. If you have 4 partitions but only 2 gateway replicas, each replica handles 2 partitions, so two consumers do work that four could share. If you have 2 partitions but 4 replicas, 2 replicas receive no partition and sit idle (wasted resources).
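
Both cases can be reproduced with a toy model of partition assignment. It assumes a simple round-robin assignor; Kafka's real assignors differ in detail, but the rule of one owner per partition within a consumer group is the same. `assign_partitions` and `idle_replicas` are hypothetical helper names.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_replicas: int) -> Dict[int, List[int]]:
    """Give each partition exactly one owning replica, round-robin."""
    if num_partitions < 0 or num_replicas < 1:
        raise ValueError("need num_partitions >= 0 and num_replicas >= 1")
    assignment: Dict[int, List[int]] = {r: [] for r in range(num_replicas)}
    for partition in range(num_partitions):
        assignment[partition % num_replicas].append(partition)
    return assignment


def idle_replicas(assignment: Dict[int, List[int]]) -> List[int]:
    """Replicas that received no partition and therefore do no work."""
    return [replica for replica, parts in assignment.items() if not parts]


# 4 partitions, 2 replicas: every replica owns two partitions.
fewer_replicas = assign_partitions(4, 2)   # {0: [0, 2], 1: [1, 3]}
# 2 partitions, 4 replicas: replicas 2 and 3 own nothing.
extra_replicas = assign_partitions(2, 4)   # {0: [0], 1: [1], 2: [], 3: []}
```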

From `docs-gb/performance-tuning/pipelines/core-2-configuration.md`:

  • Number of Kafka partitions significantly influences throughput
  • Each dataflow-engine processes one partition across all pipelines
  • Increasing workers first, then replicas, improves throughput
  • Adding workers only helps if the pod has the resources to support them

Gateway scalability parameters:

  • `MODELGATEWAY_NUM_WORKERS`: number of lightweight inference workers (goroutines)
  • `MODELGATEWAY_MAX_NUM_CONSUMERS`: size of the hash table for model-to-consumer assignment (default: 100)
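
One way to picture a fixed-size table for model-to-consumer assignment is a stable hash of the model name taken modulo the table size. The source does not state which hash function or scheme Seldon Core 2 uses, so the CRC32 choice and the `consumer_bucket` name below are assumptions made purely for illustration.

```python
import zlib


def consumer_bucket(model_name: str, max_num_consumers: int = 100) -> int:
    """Map a model name to a bucket index in [0, max_num_consumers).

    CRC32 is a stand-in hash (an assumption, not Seldon's documented
    scheme); it is used here because it is stable across processes,
    unlike Python's built-in hash() for strings.
    """
    if max_num_consumers < 1:
        raise ValueError("max_num_consumers must be >= 1")
    return zlib.crc32(model_name.encode("utf-8")) % max_num_consumers


# The same model always lands in the same bucket, so its messages are
# always handled by the same consumer.
bucket = consumer_bucket("iris-classifier")
```

Under this reading, a larger table spreads models over more consumers, while a smaller one makes more models share a consumer.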

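Network latency matters: lower network latency between Core 2 components and Kafka directly improves pipeline performance. Because every pipeline step communicates through Kafka, a Kafka cluster hosted separately from the Seldon components adds latency to each step and therefore lowers throughput.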
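
A back-of-the-envelope bound shows why latency matters. Assuming, for illustration only, that each worker handles one message at a time and blocks for one round trip to Kafka per message, throughput cannot exceed the number of workers divided by the round-trip time. `max_throughput_per_second` is a hypothetical helper and the latencies are made-up numbers, not measurements.

```python
def max_throughput_per_second(total_workers: int, round_trip_ms: float) -> float:
    """Upper bound on messages/second when each worker blocks per round trip."""
    if total_workers < 1 or round_trip_ms <= 0:
        raise ValueError("need total_workers >= 1 and round_trip_ms > 0")
    return total_workers * 1000 / round_trip_ms


# Same 8 workers, two hypothetical network placements.
nearby_kafka = max_throughput_per_second(8, 2)    # 4000.0 messages/s
remote_kafka = max_throughput_per_second(8, 20)   # 400.0 messages/s
```

Under this simplified model, a tenfold increase in round-trip time costs a tenfold drop in the throughput ceiling unless workers or replicas are increased to compensate.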
