Implementation:Recommenders team Recommenders K8s Utils
| Knowledge Sources | |
|---|---|
| Domains | Deployment, Kubernetes, Capacity Planning |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Capacity-planning utility functions that estimate replica counts and throughput for recommendation model services deployed on Azure Kubernetes Service (AKS).
Description
This module provides three estimation functions for Kubernetes deployment planning:
- qps_to_replicas estimates the number of replicas needed to support a target rate of queries per second (QPS). It computes the number of concurrent queries as target_qps * processing_time / target_utilization, divides by max_qp_replica (the maximum concurrent queries per replica), and rounds up with math.ceil so the estimate errs toward more replicas.
- replicas_to_qps performs the inverse calculation: it estimates the QPS a given number of replicas can sustain as num_replicas * max_qp_replica * target_utilization / processing_time, rounding down with math.floor so the estimate errs toward lower throughput.
- nodes_to_replicas estimates the total number of replicas an AKS cluster configuration can host. It accounts for system overhead by computing available cores as (n_cores_per_node - 0.5) * n_nodes - 4.45, where the 0.5 per node and the 4.45 total represent Kubernetes system resource reservations, then divides by cpu_cores_per_replica and rounds down with math.floor.
All functions log their estimates via Python's logging module.
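The arithmetic described above can be sketched in a few lines. This is a minimal reimplementation based only on the formulas in this section, not the library source: the sketch_ prefix is a hypothetical naming choice to keep these apart from the real functions, and logging is omitted.

```python
from math import ceil, floor


def sketch_qps_to_replicas(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7):
    """Replicas needed for a target QPS, rounded up."""
    concurrent_queries = target_qps * processing_time / target_utilization
    return ceil(concurrent_queries / max_qp_replica)


def sketch_replicas_to_qps(num_replicas, processing_time, max_qp_replica=1, target_utilization=0.7):
    """QPS sustained by a replica count, rounded down (inverse of the formula above)."""
    return floor(num_replicas * max_qp_replica * target_utilization / processing_time)


def sketch_nodes_to_replicas(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1):
    """Replicas a cluster can host after subtracting the system reservations, rounded down."""
    n_cores_avail = (n_cores_per_node - 0.5) * n_nodes - 4.45
    return floor(n_cores_avail / cpu_cores_per_replica)


print(sketch_qps_to_replicas(100, 0.2))   # 29
print(sketch_replicas_to_qps(30, 0.2))    # 105
print(sketch_nodes_to_replicas(4, 3))     # 60
```

Because the ceiling and floor are applied to floating-point quotients, inputs that land exactly on a whole number may round one step further than exact arithmetic would suggest, which is in keeping with treating the results as rough estimates.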
Usage
Use these functions during deployment planning to right-size AKS clusters for recommendation model serving. They provide rough estimates to help practitioners determine how many replicas or nodes are needed to meet target throughput requirements.
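As an illustration of that planning workflow, the sketch below chains the two estimates: it first derives the replica count for a target QPS, then searches for the smallest node count whose estimated capacity covers it. The helper estimate_nodes and its max_nodes cap are hypothetical and not part of the library; the formulas are restated inline from the Description so the example runs on its own.

```python
from math import ceil, floor


def estimate_nodes(target_qps, processing_time, n_cores_per_node,
                   cpu_cores_per_replica=0.1, max_qp_replica=1,
                   target_utilization=0.7, max_nodes=1000):
    """Return (replicas_needed, smallest_node_count) for a target QPS.

    Hypothetical helper: combines the qps_to_replicas and nodes_to_replicas
    formulas as described in this page.
    """
    # Replicas required to serve the target throughput (rounded up).
    replicas = ceil(target_qps * processing_time / target_utilization / max_qp_replica)

    # Grow the cluster one node at a time until its estimated capacity suffices.
    for n_nodes in range(1, max_nodes + 1):
        cores_avail = (n_cores_per_node - 0.5) * n_nodes - 4.45
        if floor(cores_avail / cpu_cores_per_replica) >= replicas:
            return replicas, n_nodes

    raise ValueError("no cluster size up to max_nodes can host the required replicas")


print(estimate_nodes(target_qps=100, processing_time=0.2, n_cores_per_node=4))   # (29, 3)
print(estimate_nodes(target_qps=1000, processing_time=0.2, n_cores_per_node=8))  # (286, 5)
```

A linear search is used instead of a closed-form inversion so that the result always agrees with the rounded-down capacity estimate.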
Code Reference
Source Location
- Repository: Recommenders
- File: recommenders/utils/k8s_utils.py
- Lines: 1-81
Signature
def qps_to_replicas(target_qps, processing_time, max_qp_replica=1, target_utilization=0.7)
def replicas_to_qps(num_replicas, processing_time, max_qp_replica=1, target_utilization=0.7)
def nodes_to_replicas(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1)
Import
from recommenders.utils.k8s_utils import qps_to_replicas, replicas_to_qps, nodes_to_replicas
I/O Contract
Inputs
qps_to_replicas
| Name | Type | Required | Description |
|---|---|---|---|
| target_qps | int | Yes | Target queries per second to support |
| processing_time | float | Yes | Estimated time (in seconds) for a single service call |
| max_qp_replica | int | No | Maximum concurrent queries per replica (default: 1) |
| target_utilization | float | No | Target CPU utilization as a fraction between 0 and 1 (default: 0.7) |
replicas_to_qps
| Name | Type | Required | Description |
|---|---|---|---|
| num_replicas | int | Yes | Number of replicas available |
| processing_time | float | Yes | Estimated time (in seconds) for a single service call |
| max_qp_replica | int | No | Maximum concurrent queries per replica (default: 1) |
| target_utilization | float | No | Target CPU utilization as a fraction between 0 and 1 (default: 0.7) |
nodes_to_replicas
| Name | Type | Required | Description |
|---|---|---|---|
| n_cores_per_node | int | Yes | Total CPU cores per node in the AKS cluster |
| n_nodes | int | No | Number of nodes (VMs) in the cluster (default: 3) |
| cpu_cores_per_replica | float | No | CPU cores assigned to each replica (default: 0.1) |
Outputs
| Name | Type | Description |
|---|---|---|
| return (qps_to_replicas) | int | Estimated number of replicas required (ceiling) |
| return (replicas_to_qps) | int | Estimated queries per second supported (floor) |
| return (nodes_to_replicas) | int | Estimated total replicas supported by the cluster (floor) |
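The nodes_to_replicas formula in the Description subtracts a fixed 4.45 cores, so for very small clusters the floored estimate comes out negative if the formula is taken at face value. Whether the library guards against this is not stated here. A hypothetical wrapper that clamps the estimate at zero might look like the following; usable_replicas and raw_replica_estimate are illustrative names, and the formula is restated inline instead of imported.

```python
from math import floor


def raw_replica_estimate(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1):
    # Formula as stated in the Description, restated here for a self-contained sketch.
    n_cores_avail = (n_cores_per_node - 0.5) * n_nodes - 4.45
    return floor(n_cores_avail / cpu_cores_per_replica)


def usable_replicas(n_cores_per_node, n_nodes=3, cpu_cores_per_replica=0.1):
    # Hypothetical guard: reject a non-positive per-replica CPU request
    # and never report a negative replica count.
    if cpu_cores_per_replica <= 0:
        raise ValueError("cpu_cores_per_replica must be positive")
    return max(0, raw_replica_estimate(n_cores_per_node, n_nodes, cpu_cores_per_replica))


print(raw_replica_estimate(2, n_nodes=2))  # -15
print(usable_replicas(2, n_nodes=2))       # 0
print(usable_replicas(4, n_nodes=3))       # 60
```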
Usage Examples
Basic Usage
from recommenders.utils.k8s_utils import qps_to_replicas, replicas_to_qps, nodes_to_replicas
# How many replicas are needed for 100 QPS with a 0.2 s processing time?
replicas = qps_to_replicas(target_qps=100, processing_time=0.2)
# Result: 29 replicas, i.e. ceil(100 * 0.2 / 0.7) = ceil(28.57...)
# How many QPS can 30 replicas handle?
qps = replicas_to_qps(num_replicas=30, processing_time=0.2)
# Result: 105 QPS, i.e. floor(30 * 0.7 / 0.2)
# How many replicas can a 3-node cluster with 4 cores per node support?
max_replicas = nodes_to_replicas(n_cores_per_node=4, n_nodes=3, cpu_cores_per_replica=0.1)
# Result: 60 replicas, i.e. floor(((4 - 0.5) * 3 - 4.45) / 0.1) = floor(60.5)