Environment: KServe GPU Accelerator
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
GPU accelerator environment supporting NVIDIA, AMD, Intel, and Habana Gaudi devices for hardware-accelerated model inference.
Description
KServe supports multiple GPU vendors through Kubernetes device plugins. GPU resources are requested via standard Kubernetes resource limits (e.g., `nvidia.com/gpu`). The system auto-detects GPU type from container resource specifications and defaults to NVIDIA if unspecified. Custom GPU resource types can be added via the inferenceservice-config ConfigMap.
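As a concrete illustration of requesting a GPU through standard resource limits, the sketch below shows a minimal InferenceService manifest. The name, model format, and storage URI are placeholders, not values from this document:

```yaml
# Illustrative sketch: an InferenceService whose predictor requests one
# NVIDIA GPU via the standard Kubernetes resource limit. All names and
# the storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pytorch-gpu-example        # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch              # placeholder model format
      storageUri: gs://example-bucket/model   # placeholder URI
      resources:
        limits:
          nvidia.com/gpu: "1"      # exposed by the NVIDIA device plugin
```

Because `nvidia.com/gpu` appears in the limits, no further configuration is needed for the scheduler to place the pod on a GPU node; swapping in `amd.com/gpu`, `intel.com/gpu`, or `habana.ai/gaudi` targets the other supported vendors.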
Usage
Use this environment for any GPU-accelerated inference workload, including LLM serving with vLLM, TensorFlow GPU models, PyTorch GPU models, and multi-node distributed inference with data/expert parallelism.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA, AMD, Intel GPU, or Habana Gaudi | Device plugin must be installed |
| NVIDIA Drivers | Compatible with CUDA runtime | GKE provides auto-install DaemonSet |
| Device Plugin | Vendor-specific K8s device plugin | Exposes GPU resources to scheduler |
| VRAM | Model-dependent | 7B models: 16GB+; DeepSeek-R1: 8x80GB per node |
Dependencies
Kubernetes Resources
- NVIDIA GPU device plugin (for `nvidia.com/gpu`)
- AMD GPU device plugin (for `amd.com/gpu`)
- Intel GPU device plugin (for `intel.com/gpu`)
- Habana Gaudi device plugin (for `habana.ai/gaudi`)
Credentials
No additional credentials beyond cluster access.
Quick Install
```bash
# For GKE: install the NVIDIA driver-installer DaemonSet
# (GKE's built-in device plugin then exposes nvidia.com/gpu to the scheduler)
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml

# Verify GPU availability on nodes
kubectl get nodes -o json | jq '.items[].status.allocatable | select(.["nvidia.com/gpu"])'
```
Code Evidence
Default GPU resource types from `pkg/constants/constants.go:289-296`:
```go
// GPU Constants
const (
	NvidiaGPUResourceType          = "nvidia.com/gpu"
	NvidiaMigGPUResourceTypePrefix = "nvidia.com/mig"
	AmdGPUResourceType             = "amd.com/gpu"
	IntelGPUResourceType           = "intel.com/gpu"
	GaudiGPUResourceType           = "habana.ai/gaudi"
)
```
GPU assignment logic from `pkg/controller/v1beta1/inferenceservice/reconcilers/deployment/deployment_reconciler.go:404-409`:
```go
// If no GPU resource is explicitly set, it defaults to "nvidia.com/gpu".
// Ensures that the container's Limits and Requests maps are initialized.
func addGPUResourceToDeployment(deployment *appsv1.Deployment, ...) error {
	// Default GPU type is "nvidia.com/gpu"
	gpuResourceType := corev1.ResourceName(constants.NvidiaGPUResourceType)
	// ...
```
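The detection-and-default behavior described above can be sketched as a small standalone Go program. This is a simplified illustration, not the exact KServe implementation: the constant values mirror `pkg/constants/constants.go`, but `detectGPUResourceType` and its map-of-strings signature are assumptions made for clarity:

```go
package main

import (
	"fmt"
	"strings"
)

// Constant values mirror pkg/constants/constants.go.
const (
	NvidiaGPUResourceType          = "nvidia.com/gpu"
	NvidiaMigGPUResourceTypePrefix = "nvidia.com/mig"
	AmdGPUResourceType             = "amd.com/gpu"
	IntelGPUResourceType           = "intel.com/gpu"
	GaudiGPUResourceType           = "habana.ai/gaudi"
)

var knownGPUTypes = []string{
	NvidiaGPUResourceType,
	AmdGPUResourceType,
	IntelGPUResourceType,
	GaudiGPUResourceType,
}

// detectGPUResourceType scans a container's resource limits for a known
// GPU resource type (or an NVIDIA MIG-prefixed one) and falls back to
// "nvidia.com/gpu" when none is explicitly set. Illustrative only.
func detectGPUResourceType(limits map[string]string) string {
	for name := range limits {
		if strings.HasPrefix(name, NvidiaMigGPUResourceTypePrefix) {
			return name
		}
		for _, known := range knownGPUTypes {
			if name == known {
				return name
			}
		}
	}
	return NvidiaGPUResourceType // default when no GPU resource is set
}

func main() {
	fmt.Println(detectGPUResourceType(map[string]string{"cpu": "4", "amd.com/gpu": "1"}))
	fmt.Println(detectGPUResourceType(map[string]string{"cpu": "4"}))
}
```

Running this prints `amd.com/gpu` for the first container spec and `nvidia.com/gpu` for the second, matching the documented default.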
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `0/N nodes are available: N Insufficient nvidia.com/gpu` | No GPU nodes or all GPUs allocated | Add GPU nodes or reduce GPU requests |
| `nvidia.com/gpu:NoSchedule` taint | GKE auto-taints GPU nodes | Add toleration for `nvidia.com/gpu` in InferenceService |
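For the taint error above, a toleration can be added to the predictor spec. The following fragment is a sketch assuming the standard Kubernetes toleration fields; the resource name and model details are placeholders:

```yaml
# Sketch: tolerating GKE's automatic nvidia.com/gpu:NoSchedule taint so
# the predictor pod can be scheduled onto GPU nodes. Placeholder names.
spec:
  predictor:
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```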
Compatibility Notes
- NVIDIA MIG: Supported via `nvidia.com/mig` prefix resources (e.g., `nvidia.com/mig-1g.5gb`)
- Custom GPUs: Add custom resource types via `multiNode.customGPUResourceTypeList` in ConfigMap
- Default: If no GPU resource type found in container spec, defaults to `nvidia.com/gpu`
- Multi-node: DeepSeek-R1 requires 8 GPUs per pod with RDMA interconnect
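A custom GPU resource type could be registered roughly as follows. This is a hedged sketch: only the key name `multiNode.customGPUResourceTypeList` comes from this document, while the surrounding JSON layout, the `custom.com/gpu` resource name, and the namespace are assumptions:

```yaml
# Sketch: adding a custom GPU resource type to the inferenceservice-config
# ConfigMap. The JSON structure of the multiNode entry is assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve          # assumed install namespace
data:
  multiNode: |-
    {
      "customGPUResourceTypeList": ["custom.com/gpu"]
    }
```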