Implementation:Recommenders team Recommenders K8s Utils


Knowledge Sources
Domains: Deployment, Kubernetes, Capacity Planning
Last Updated: 2026-02-10 00:00 GMT

Overview

Capacity planning utility functions for estimating Kubernetes replica counts and throughput when deploying recommendation model services on Azure Kubernetes Service (AKS).

Description

This module provides three estimation functions for Kubernetes deployment planning:

  • qps_to_replicas estimates the number of replicas needed to support a target number of queries per second (QPS). It computes the number of concurrent queries as target_qps * processing_time / target_utilization, divides that by max_qp_replica (the maximum concurrent queries per replica), and rounds up with math.ceil so the estimate stays conservative.
  • replicas_to_qps performs the inverse calculation: it estimates the throughput (QPS) that a given number of replicas can support, rounding down with math.floor so the estimate stays conservative.
  • nodes_to_replicas estimates the total number of replicas an AKS cluster configuration can support. It accounts for system overhead by computing available cores as (n_cores_per_node - 0.5) * n_nodes - 4.45, where 0.5 cores per node and 4.45 cores cluster-wide are subtracted as Kubernetes system resource reservations. The available cores are then divided by cpu_cores_per_replica and rounded down.
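
The three formulas above can be sketched in plain Python without importing the library. This is a re-derivation from the description, not the module's source: the function names (estimate_replicas, estimate_qps, estimate_cluster_replicas) are hypothetical, and the exact form of the inverse formula in estimate_qps is an assumption based on "inverse calculation".

```python
import math


def estimate_replicas(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7):
    """Replicas needed for a target QPS, rounded up (mirrors the qps_to_replicas description)."""
    concurrent_queries = target_qps * processing_time / target_utilization
    return math.ceil(concurrent_queries / max_qp_replica)


def estimate_qps(num_replicas, processing_time, max_qp_replica=1, target_utilization=0.7):
    """QPS supported by a replica count, rounded down (assumed inverse of the formula above)."""
    return math.floor(num_replicas * max_qp_replica * target_utilization / processing_time)


def estimate_cluster_replicas(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1):
    """Replicas a cluster can host after subtracting the described system reservations."""
    n_cores_avail = (n_cores_per_node - 0.5) * n_nodes - 4.45
    return math.floor(n_cores_avail / cpu_cores_per_replica)


print(estimate_replicas(100, 0.2))         # 29
print(estimate_qps(30, 0.2))               # 105
print(estimate_cluster_replicas(4, 3, 0.1))  # 60
```

Rounding up for replicas and down for throughput and capacity means each estimate errs toward over-provisioning rather than under-provisioning.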

All functions log their estimates via Python's logging module.
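
Python's root logger defaults to the WARNING level, so lower-severity messages are hidden unless logging is configured. The sketch below assumes the estimates are emitted at INFO level (the level is not stated here) and uses a hypothetical stand-in function, logger name, and message text rather than the library's own.

```python
import logging
import math

# Hypothetical logger name; a real module would typically use logging.getLogger(__name__).
logger = logging.getLogger("k8s_utils_sketch")


def qps_to_replicas_sketch(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7):
    """Stand-in that computes the estimate and logs it (message text is invented)."""
    concurrent_queries = target_qps * processing_time / target_utilization
    replicas = math.ceil(concurrent_queries / max_qp_replica)
    logger.info("Estimated %d replicas to support %d QPS", replicas, target_qps)
    return replicas


# Raise the root level to INFO so the estimate message is displayed.
logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s %(message)s")

replicas = qps_to_replicas_sketch(target_qps=100, processing_time=0.2)
```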

Usage

Use these functions during deployment planning to right-size AKS clusters for recommendation model serving. They provide rough estimates to help practitioners determine how many replicas or nodes are needed to meet target throughput requirements.
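
One way to combine the estimates during right-sizing is to compute the replicas a target QPS requires and then search for the smallest node count whose capacity covers them. The sketch below is illustrative only: min_nodes_for_qps and its max_nodes cap are hypothetical, and the two helper functions restate the formulas from the Description so the example runs without the library installed.

```python
import math


def replicas_for_qps(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7):
    # Same formula as described for qps_to_replicas.
    concurrent_queries = target_qps * processing_time / target_utilization
    return math.ceil(concurrent_queries / max_qp_replica)


def cluster_capacity(n_cores_per_node, n_nodes, cpu_cores_per_replica=0.1):
    # Same formula as described for nodes_to_replicas.
    n_cores_avail = (n_cores_per_node - 0.5) * n_nodes - 4.45
    return math.floor(n_cores_avail / cpu_cores_per_replica)


def min_nodes_for_qps(target_qps, processing_time, n_cores_per_node,
                      cpu_cores_per_replica=0.1, max_nodes=100):
    """Hypothetical helper: smallest node count whose capacity covers the target QPS."""
    needed = replicas_for_qps(target_qps, processing_time)
    for n_nodes in range(1, max_nodes + 1):
        if cluster_capacity(n_cores_per_node, n_nodes, cpu_cores_per_replica) >= needed:
            return n_nodes, needed
    raise ValueError(f"more than {max_nodes} nodes would be needed")


n_nodes, needed = min_nodes_for_qps(target_qps=100, processing_time=0.2, n_cores_per_node=4)
print(f"{needed} replicas fit on {n_nodes} nodes")
```

Because the 4.45-core reservation is subtracted once for the whole cluster, very small clusters can yield zero or negative capacity, which the search simply skips.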

Code Reference

Source Location

Signature

def qps_to_replicas(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7)

def replicas_to_qps(num_replicas, processing_time, max_qp_replica=1, target_utilization=0.7)

def nodes_to_replicas(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1)

Import

from recommenders.utils.k8s_utils import qps_to_replicas, replicas_to_qps, nodes_to_replicas

I/O Contract

Inputs

qps_to_replicas

| Name | Type | Required | Description |
|---|---|---|---|
| target_qps | int | Yes | Target queries per second to support |
| processing_time | float | Yes | Estimated time (in seconds) for a single service call |
| max_qp_replica | int | No | Maximum concurrent queries per replica (default: 1) |
| target_utilization | float | No | Target CPU utilization proportion (default: 0.7) |

replicas_to_qps

| Name | Type | Required | Description |
|---|---|---|---|
| num_replicas | int | Yes | Number of replicas available |
| processing_time | float | Yes | Estimated time (in seconds) for a single service call |
| max_qp_replica | int | No | Maximum concurrent queries per replica (default: 1) |
| target_utilization | float | No | Target CPU utilization proportion (default: 0.7) |

nodes_to_replicas

| Name | Type | Required | Description |
|---|---|---|---|
| n_cores_per_node | int | Yes | Total CPU cores per node in the AKS cluster |
| n_nodes | int | No | Number of nodes (VMs) in the cluster (default: 3) |
| cpu_cores_per_replica | float | No | CPU cores assigned to each replica (default: 0.1) |

Outputs

| Name | Type | Description |
|---|---|---|
| return (qps_to_replicas) | int | Estimated number of replicas required (ceiling) |
| return (replicas_to_qps) | int | Estimated queries per second supported (floor) |
| return (nodes_to_replicas) | int | Estimated total replicas supported by the cluster (floor) |

Usage Examples

Basic Usage

from recommenders.utils.k8s_utils import qps_to_replicas, replicas_to_qps, nodes_to_replicas

# How many replicas are needed for 100 QPS with a 0.2 s processing time?
replicas = qps_to_replicas(target_qps=100, processing_time=0.2)
# Result: 29 replicas

# How many QPS can 30 replicas handle?
qps = replicas_to_qps(num_replicas=30, processing_time=0.2)
# Result: 105 QPS

# How many replicas can a 3-node cluster with 4 cores per node support?
max_replicas = nodes_to_replicas(n_cores_per_node=4, n_nodes=3, cpu_cores_per_replica=0.1)
# Result: 60 replicas
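
The three results above can be checked by hand. The arithmetic below follows the formulas given in the Description; the floor expression for replicas_to_qps is an assumed inverse of the qps_to_replicas formula, since its exact form is not spelled out here.

```latex
\begin{align*}
\text{replicas} &= \left\lceil \frac{100 \times 0.2}{0.7 \times 1} \right\rceil
                 = \lceil 28.57\ldots \rceil = 29 \\
\text{QPS} &= \left\lfloor \frac{30 \times 1 \times 0.7}{0.2} \right\rfloor
            = \lfloor 105 \rfloor = 105 \\
\text{cluster replicas} &= \left\lfloor \frac{(4 - 0.5) \times 3 - 4.45}{0.1} \right\rfloor
                         = \left\lfloor \frac{6.05}{0.1} \right\rfloor
                         = \lfloor 60.5 \rfloor = 60
\end{align*}
```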
