Environment: TensorFlow Serving Kubernetes Deployment Environment
| Metadata | Value |
|---|---|
| Domains | Infrastructure, Kubernetes, Cloud |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Kubernetes (GKE) deployment environment with Docker, Google Cloud SDK, and `kubectl` for orchestrating TensorFlow Serving at scale.
Description
This environment provides the tools and infrastructure for deploying TensorFlow Serving on Kubernetes, specifically Google Kubernetes Engine (GKE). It requires Docker for building serving images, the Google Cloud SDK (`gcloud`) for cluster management and container registry access, and `kubectl` for deploying Kubernetes resources. The deployment pattern uses a Deployment with 3 replicas behind a LoadBalancer Service, as demonstrated in the ResNet example.
Usage
Use this environment when deploying TensorFlow Serving to production Kubernetes clusters for scalable, load-balanced model serving. This is the prerequisite for the entire Kubernetes Deployment workflow, including image building, registry push, cluster creation, and resource deployment.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Container Runtime | Docker Engine | For building serving images locally |
| Cloud SDK | Google Cloud SDK (`gcloud`) | For GKE cluster management and Container Registry |
| Kubernetes CLI | `kubectl` | For deploying Kubernetes manifests |
| Container Registry | Google Container Registry (GCR) or equivalent | For storing serving images |
| Cluster | Kubernetes cluster (GKE recommended) | Example uses `--num-nodes=5` |
Dependencies
CLI Tools
- `docker` (for image building)
- `gcloud` (Google Cloud SDK)
- `kubectl` (Kubernetes CLI, bundled with gcloud)
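A quick way to confirm all three CLIs are installed and on `PATH` (version flags only; no cluster or cloud access needed):

```shell
# Verify each required CLI is present
docker --version
gcloud --version
kubectl version --client
```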
Kubernetes Resources
- Deployment: TensorFlow Serving pods (default: 3 replicas)
- Service: LoadBalancer type for external access
- Container port: 8500 (gRPC)
Credentials
The following credentials are required:
- `GOOGLE_CLOUD_PROJECT`: GCP project ID for GKE cluster and Container Registry
- GCP authentication via `gcloud auth login`
- Docker authentication via `gcloud auth configure-docker`
Quick Install
```shell
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Allow Docker to push to Google Container Registry
gcloud auth configure-docker

# Install kubectl
gcloud components install kubectl

# Create GKE cluster
gcloud container clusters create serving-cluster --num-nodes 5

# Get cluster credentials
gcloud container clusters get-credentials serving-cluster
```
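Once the cluster is up, a quick sanity check that `kubectl` is pointed at it (assumes the `serving-cluster` name used above):

```shell
# kubectl should now target the new GKE cluster
kubectl config current-context

# All five nodes should eventually report Ready
kubectl get nodes
```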
Code Evidence
GKE cluster creation from `serving_kubernetes.md:168-218`:
```shell
gcloud container clusters create serving-cluster --num-nodes 5
gcloud container clusters get-credentials serving-cluster
```
Docker image push to GCR from `serving_kubernetes.md:220-243`:
```shell
docker tag $USER/resnet_serving gcr.io/YOUR_PROJECT/resnet_serving
docker push gcr.io/YOUR_PROJECT/resnet_serving
```
Kubernetes manifest from `resnet_k8s.yaml:1-49`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: resnet-server
  template:
    metadata:
      labels:
        app: resnet-server
    spec:
      containers:
      - name: resnet-container
        image: gcr.io/YOUR_PROJECT/resnet_serving
        ports:
        - containerPort: 8500
---
apiVersion: v1
kind: Service
metadata:
  name: resnet-service
spec:
  type: LoadBalancer
  selector:
    app: resnet-server
  ports:
  - port: 8500
    targetPort: 8500
```
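With the image pushed and the manifest saved as `resnet_k8s.yaml`, deployment is two commands; the LoadBalancer's external IP can take a minute or two to provision:

```shell
# Create the Deployment and Service
kubectl apply -f resnet_k8s.yaml

# Watch until EXTERNAL-IP switches from <pending> to a real address
kubectl get service resnet-service --watch
```

gRPC clients can then target `<EXTERNAL-IP>:8500`.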
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ERROR: (gcloud.container.clusters.create) ... quota exceeded` | GCP quota insufficient | Request quota increase in GCP Console |
| `ImagePullBackOff` | Container image not found in registry | Verify `docker push` succeeded and image name matches manifest |
| `CrashLoopBackOff` | Model not found in container | Ensure model was copied into Docker image during build step |
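For diagnosing the pod-level errors above, the usual `kubectl` triage sequence (the `app=resnet-server` label comes from the Deployment manifest; replace `<pod-name>` with a real pod name):

```shell
# List serving pods and their current states
kubectl get pods -l app=resnet-server

# Events at the bottom of describe usually explain ImagePullBackOff
kubectl describe pod <pod-name>

# Serving logs reveal model-load failures behind CrashLoopBackOff
kubectl logs <pod-name>
```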
Compatibility Notes
- GKE specific: The tutorial targets GKE but the Kubernetes manifests are portable to other providers (EKS, AKS) with appropriate registry and cluster setup changes.
- Scaling: TensorFlow Serving performance is better on fewer, larger machines due to resource sharing efficiency and lower fixed costs (see performance.md).
- GPU on K8s: For GPU serving on Kubernetes, install the NVIDIA device plugin and use GPU-enabled serving images.
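As a sketch of the GPU case, the container spec in the Deployment would additionally request a GPU resource; `nvidia.com/gpu` is the resource name registered by the NVIDIA device plugin, and the image would be a GPU build such as `tensorflow/serving:latest-gpu`:

```yaml
# Added under the resnet-container entry in the Deployment spec
resources:
  limits:
    nvidia.com/gpu: 1
```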