

Environment: KServe GPU Accelerator

From Leeroopedia
Knowledge Sources
Domains Infrastructure, GPU_Computing
Last Updated 2026-02-13 14:00 GMT

Overview

GPU accelerator environment supporting NVIDIA, AMD, Intel, and Habana Gaudi devices for hardware-accelerated model inference.

Description

KServe supports multiple GPU vendors through Kubernetes device plugins. GPU resources are requested via standard Kubernetes resource limits (e.g., `nvidia.com/gpu`). The system auto-detects GPU type from container resource specifications and defaults to NVIDIA if unspecified. Custom GPU resource types can be added via the inferenceservice-config ConfigMap.
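As a concrete illustration, a minimal InferenceService manifest requesting one NVIDIA GPU through standard resource limits might look like the following sketch. The name, model format, and storage URI are placeholders, not values from this page:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-example            # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch          # placeholder model format
      storageUri: gs://my-bucket/model   # placeholder URI
      resources:
        limits:
          nvidia.com/gpu: "1"  # GPU requested via a standard K8s resource limit
```

Because `nvidia.com/gpu` appears in the limits, KServe's auto-detection would treat this as an NVIDIA workload; swapping in `amd.com/gpu`, `intel.com/gpu`, or `habana.ai/gaudi` targets the other supported vendors.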

Usage

Use this environment for any GPU-accelerated inference workload, including LLM serving with vLLM, TensorFlow GPU models, PyTorch GPU models, and multi-node distributed inference with data/expert parallelism.

System Requirements

  • Hardware: NVIDIA, AMD, or Intel GPU, or Habana Gaudi. The matching device plugin must be installed.
  • NVIDIA drivers: must be compatible with the CUDA runtime. GKE provides an auto-install DaemonSet.
  • Device plugin: vendor-specific Kubernetes device plugin. Exposes GPU resources to the scheduler.
  • VRAM: model-dependent. 7B models need 16 GB+; DeepSeek-R1 needs 8x80 GB per node.

Dependencies

Kubernetes Resources

  • NVIDIA GPU device plugin (for `nvidia.com/gpu`)
  • AMD GPU device plugin (for `amd.com/gpu`)
  • Intel GPU device plugin (for `intel.com/gpu`)
  • Habana Gaudi device plugin (for `habana.ai/gaudi`)

Credentials

No additional credentials beyond cluster access.

Quick Install

# For GKE - install the NVIDIA driver auto-install DaemonSet
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml

# Verify GPU availability
kubectl get nodes -o json | jq '.items[].status.allocatable | select(.["nvidia.com/gpu"])'

Code Evidence

Default GPU resource types from `pkg/constants/constants.go:289-296`:

// GPU Constants
const (
    NvidiaGPUResourceType          = "nvidia.com/gpu"
    NvidiaMigGPUResourceTypePrefix = "nvidia.com/mig"
    AmdGPUResourceType             = "amd.com/gpu"
    IntelGPUResourceType           = "intel.com/gpu"
    GaudiGPUResourceType           = "habana.ai/gaudi"
)

GPU assignment logic from `pkg/controller/v1beta1/inferenceservice/reconcilers/deployment/deployment_reconciler.go:404-409`:

// If no GPU resource is explicitly set, it defaults to "nvidia.com/gpu".
// Ensures that the container's Limits and Requests maps are initialized.
func addGPUResourceToDeployment(deployment *appsv1.Deployment, ...) error {
    // Default GPU type is "nvidia.com/gpu"
    gpuResourceType := corev1.ResourceName(constants.NvidiaGPUResourceType)
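
The detection behavior described above can be sketched as a standalone Go function. This is a simplified illustration, not KServe's actual implementation: it scans a container's resource limits for a known GPU resource name (or a MIG-prefixed name) and falls back to `nvidia.com/gpu` when none is found, matching the documented default.

```go
package main

import (
	"fmt"
	"strings"
)

// Known GPU resource names, mirroring the constants shown above.
var gpuResourceTypes = []string{
	"nvidia.com/gpu",
	"amd.com/gpu",
	"intel.com/gpu",
	"habana.ai/gaudi",
}

// detectGPUResourceType scans a container's resource limits and returns the
// first recognized GPU resource name. MIG slices are matched by the
// "nvidia.com/mig" prefix. If no GPU resource is present, it falls back to
// "nvidia.com/gpu", matching KServe's documented default.
func detectGPUResourceType(limits map[string]string) string {
	for name := range limits {
		if strings.HasPrefix(name, "nvidia.com/mig") {
			return name
		}
		for _, gpu := range gpuResourceTypes {
			if name == gpu {
				return name
			}
		}
	}
	return "nvidia.com/gpu"
}

func main() {
	fmt.Println(detectGPUResourceType(map[string]string{"amd.com/gpu": "1"}))
	fmt.Println(detectGPUResourceType(map[string]string{"cpu": "2"}))
}
```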

Common Errors

  • `0/N nodes are available: insufficient nvidia.com/gpu`: no GPU nodes exist, or all GPUs are already allocated. Add GPU nodes or reduce the GPU request.
  • Pods unschedulable due to the `nvidia.com/gpu:NoSchedule` taint: GKE automatically taints GPU nodes. Add a toleration for `nvidia.com/gpu` in the InferenceService.
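
The GKE taint can be tolerated directly in the InferenceService predictor spec, since predictor specs accept standard pod-level scheduling fields. A minimal sketch:

```yaml
spec:
  predictor:
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```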

Compatibility Notes

  • NVIDIA MIG: Supported via `nvidia.com/mig` prefix resources (e.g., `nvidia.com/mig-1g.5gb`)
  • Custom GPUs: Add custom resource types via `multiNode.customGPUResourceTypeList` in ConfigMap
  • Default: If no GPU resource type found in container spec, defaults to `nvidia.com/gpu`
  • Multi-node: DeepSeek-R1 requires 8 GPUs per pod with RDMA interconnect
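
Requesting a MIG slice follows the same resource-limits pattern as a whole GPU; a sketch using the slice name from the example above:

```yaml
resources:
  limits:
    nvidia.com/mig-1g.5gb: "1"   # one 1g.5gb MIG slice instead of a full GPU
```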
