Environment: KServe GPU Accelerator
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
GPU accelerator environment supporting NVIDIA, AMD, Intel, and Habana Gaudi devices for hardware-accelerated model inference.
Description
KServe supports multiple GPU vendors through Kubernetes device plugins. GPU resources are requested via standard Kubernetes resource limits (e.g., `nvidia.com/gpu`). The system auto-detects GPU type from container resource specifications and defaults to NVIDIA if unspecified. Custom GPU resource types can be added via the inferenceservice-config ConfigMap.
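As a concrete illustration of requesting a GPU through standard resource limits, the sketch below shows a minimal InferenceService manifest. The name, model format, and storage URI are placeholders, not values from this document:

```yaml
# Illustrative sketch: an InferenceService whose predictor requests one
# NVIDIA GPU via the standard Kubernetes resource limit. All names and
# the storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pytorch-gpu-example        # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch              # placeholder model format
      storageUri: gs://example-bucket/model   # placeholder URI
      resources:
        limits:
          nvidia.com/gpu: "1"      # exposed by the NVIDIA device plugin
```

Because `nvidia.com/gpu` appears in the limits, no further configuration is needed for the scheduler to place the pod on a GPU node; swapping in `amd.com/gpu`, `intel.com/gpu`, or `habana.ai/gaudi` targets the other supported vendors.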
Usage
Use this environment for any GPU-accelerated inference workload, including LLM serving with vLLM, TensorFlow GPU models, PyTorch GPU models, and multi-node distributed inference with data/expert parallelism.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA, AMD, Intel GPU, or Habana Gaudi | Device plugin must be installed |
| NVIDIA Drivers | Compatible with CUDA runtime | GKE provides auto-install DaemonSet |
| Device Plugin | Vendor-specific K8s device plugin | Exposes GPU resources to scheduler |
| VRAM | Model-dependent | 7B models: 16GB+; DeepSeek-R1: 8x80GB per node |
Dependencies
Kubernetes Resources
- NVIDIA GPU device plugin (for `nvidia.com/gpu`)
- AMD GPU device plugin (for `amd.com/gpu`)
- Intel GPU device plugin (for `intel.com/gpu`)
- Habana Gaudi device plugin (for `habana.ai/gaudi`)
Credentials
No additional credentials beyond cluster access.
Quick Install
```bash
# For GKE: install the NVIDIA driver-installer DaemonSet
# (GKE's built-in device plugin then exposes nvidia.com/gpu to the scheduler)
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml

# Verify GPU availability on nodes
kubectl get nodes -o json | jq '.items[].status.allocatable | select(.["nvidia.com/gpu"])'
```
Code Evidence
Default GPU resource types from `pkg/constants/constants.go:289-296`:
```go
// GPU Constants
const (
	NvidiaGPUResourceType          = "nvidia.com/gpu"
	NvidiaMigGPUResourceTypePrefix = "nvidia.com/mig"
	AmdGPUResourceType             = "amd.com/gpu"
	IntelGPUResourceType           = "intel.com/gpu"
	GaudiGPUResourceType           = "habana.ai/gaudi"
)
```
GPU assignment logic from `pkg/controller/v1beta1/inferenceservice/reconcilers/deployment/deployment_reconciler.go:404-409`:
```go
// If no GPU resource is explicitly set, it defaults to "nvidia.com/gpu".
// Ensures that the container's Limits and Requests maps are initialized.
func addGPUResourceToDeployment(deployment *appsv1.Deployment, ...) error {
	// Default GPU type is "nvidia.com/gpu"
	gpuResourceType := corev1.ResourceName(constants.NvidiaGPUResourceType)
	// ...
```
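The detection-and-default behavior described above can be sketched as a small standalone Go program. This is a simplified illustration, not the exact KServe implementation: the constant values mirror `pkg/constants/constants.go`, but `detectGPUResourceType` and its map-of-strings signature are assumptions made for clarity:

```go
package main

import (
	"fmt"
	"strings"
)

// Constant values mirror pkg/constants/constants.go.
const (
	NvidiaGPUResourceType          = "nvidia.com/gpu"
	NvidiaMigGPUResourceTypePrefix = "nvidia.com/mig"
	AmdGPUResourceType             = "amd.com/gpu"
	IntelGPUResourceType           = "intel.com/gpu"
	GaudiGPUResourceType           = "habana.ai/gaudi"
)

var knownGPUTypes = []string{
	NvidiaGPUResourceType,
	AmdGPUResourceType,
	IntelGPUResourceType,
	GaudiGPUResourceType,
}

// detectGPUResourceType scans a container's resource limits for a known
// GPU resource type (or an NVIDIA MIG-prefixed one) and falls back to
// "nvidia.com/gpu" when none is explicitly set. Illustrative only.
func detectGPUResourceType(limits map[string]string) string {
	for name := range limits {
		if strings.HasPrefix(name, NvidiaMigGPUResourceTypePrefix) {
			return name
		}
		for _, known := range knownGPUTypes {
			if name == known {
				return name
			}
		}
	}
	return NvidiaGPUResourceType // default when no GPU resource is set
}

func main() {
	fmt.Println(detectGPUResourceType(map[string]string{"cpu": "4", "amd.com/gpu": "1"}))
	fmt.Println(detectGPUResourceType(map[string]string{"cpu": "4"}))
}
```

Running this prints `amd.com/gpu` for the first container spec and `nvidia.com/gpu` for the second, matching the documented default.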
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `0/N nodes are available: N Insufficient nvidia.com/gpu` | No GPU nodes or all GPUs allocated | Add GPU nodes or reduce GPU requests |
| `nvidia.com/gpu:NoSchedule` taint | GKE auto-taints GPU nodes | Add toleration for `nvidia.com/gpu` in InferenceService |
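For the taint error above, a toleration can be added to the predictor spec. The following fragment is a sketch assuming the standard Kubernetes toleration fields; the resource name and model details are placeholders:

```yaml
# Sketch: tolerating GKE's automatic nvidia.com/gpu:NoSchedule taint so
# the predictor pod can be scheduled onto GPU nodes. Placeholder names.
spec:
  predictor:
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
```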
Compatibility Notes
- NVIDIA MIG: Supported via `nvidia.com/mig` prefix resources (e.g., `nvidia.com/mig-1g.5gb`)
- Custom GPUs: Add custom resource types via `multiNode.customGPUResourceTypeList` in ConfigMap
- Default: If no GPU resource type found in container spec, defaults to `nvidia.com/gpu`
- Multi-node: DeepSeek-R1 requires 8 GPUs per pod with RDMA interconnect
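A custom GPU resource type could be registered roughly as follows. This is a hedged sketch: only the key name `multiNode.customGPUResourceTypeList` comes from this document, while the surrounding JSON layout, the `custom.com/gpu` resource name, and the namespace are assumptions:

```yaml
# Sketch: adding a custom GPU resource type to the inferenceservice-config
# ConfigMap. The JSON structure of the multiNode entry is assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve          # assumed install namespace
data:
  multiNode: |-
    {
      "customGPUResourceTypeList": ["custom.com/gpu"]
    }
```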