Environment:TensorFlow Serving Kubernetes Deployment Environment

From Leeroopedia
Domains: Infrastructure, Kubernetes, Cloud
Last Updated: 2026-02-13 17:00 GMT

Overview

Kubernetes (GKE) deployment environment with Docker, Google Cloud SDK, and `kubectl` for orchestrating TensorFlow Serving at scale.

Description

This environment provides the tools and infrastructure for deploying TensorFlow Serving on Kubernetes, specifically Google Kubernetes Engine (GKE). It requires Docker for building serving images, the Google Cloud SDK (`gcloud`) for cluster management and container registry access, and `kubectl` for deploying Kubernetes resources. The deployment pattern uses a Deployment with 3 replicas behind a LoadBalancer Service, as demonstrated in the ResNet example.

Usage

Use this environment when deploying TensorFlow Serving to production Kubernetes clusters for scalable, load-balanced model serving. This is the prerequisite for the entire Kubernetes Deployment workflow, including image building, registry push, cluster creation, and resource deployment.

System Requirements

Category | Requirement | Notes
Container Runtime | Docker Engine | For building serving images locally
Cloud SDK | Google Cloud SDK (`gcloud`) | For GKE cluster management and Container Registry
Kubernetes CLI | `kubectl` | For deploying Kubernetes manifests
Container Registry | Google Container Registry (GCR) or equivalent | For storing serving images
Cluster | Kubernetes cluster (GKE recommended) | Example uses `--num-nodes=5`
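Once the tools above are installed, a quick sanity check confirms each CLI is on the PATH (a sketch; the flags shown are the standard version subcommands of each tool):

```shell
# Verify each CLI is installed and report its version
docker --version            # Docker Engine version string
gcloud --version            # Google Cloud SDK component versions
kubectl version --client    # client-side version only; no cluster connection needed
```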

Dependencies

CLI Tools

  • `docker` (for image building)
  • `gcloud` (Google Cloud SDK)
  • `kubectl` (Kubernetes CLI, installable via `gcloud components install kubectl`)

Kubernetes Resources

  • Deployment: TensorFlow Serving pods (default: 3 replicas)
  • Service: LoadBalancer type for external access
  • Container port: 8500 (gRPC)
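Deploying these resources typically follows the pattern below (a sketch, assuming the manifest file is named `resnet_k8s.yaml` as in the Code Evidence section):

```shell
# Apply the Deployment and Service manifests
kubectl apply -f resnet_k8s.yaml

# Watch the 3 replicas come up
kubectl get pods -l app=resnet-server

# Find the external IP assigned by the LoadBalancer (may take a minute or two)
kubectl get service resnet-service
```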

Credentials

The following credentials are required:

  • `GOOGLE_CLOUD_PROJECT`: GCP project ID for GKE cluster and Container Registry
  • GCP authentication via `gcloud auth login`
  • Docker authentication via `gcloud auth configure-docker`
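The authentication steps can be verified after running them; the commands below are standard `gcloud` subcommands:

```shell
# Confirm the active account and configured project
gcloud auth list
gcloud config get-value project

# Register gcloud as a Docker credential helper so `docker push` to gcr.io works
gcloud auth configure-docker
```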

Quick Install

# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Install kubectl
gcloud components install kubectl

# Create GKE cluster
gcloud container clusters create serving-cluster --num-nodes 5

# Get cluster credentials
gcloud container clusters get-credentials serving-cluster
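After `get-credentials`, cluster access can be confirmed before deploying anything:

```shell
# Confirm kubectl is pointed at the new cluster and the 5 nodes are Ready
kubectl config current-context
kubectl get nodes
```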

Code Evidence

GKE cluster creation from `serving_kubernetes.md:168-218`:

gcloud container clusters create serving-cluster --num-nodes 5
gcloud container clusters get-credentials serving-cluster

Docker image push to GCR from `serving_kubernetes.md:220-243`:

docker tag $USER/resnet_serving gcr.io/YOUR_PROJECT/resnet_serving
docker push gcr.io/YOUR_PROJECT/resnet_serving

Kubernetes manifest from `resnet_k8s.yaml:1-49`:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: resnet-server
  template:
    metadata:
      labels:
        app: resnet-server   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: resnet-container
        image: gcr.io/YOUR_PROJECT/resnet_serving
        ports:
        - containerPort: 8500
---
apiVersion: v1
kind: Service
metadata:
  name: resnet-service
spec:
  type: LoadBalancer
  selector:
    app: resnet-server       # routes traffic to the Deployment's pods
  ports:
  - port: 8500
    targetPort: 8500
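Once the Service has been assigned an external IP, a TensorFlow Serving gRPC client can target port 8500. A sketch of locating the endpoint (the jsonpath expression assumes a standard LoadBalancer status; on some providers the address appears under `hostname` instead of `ip`):

```shell
# Extract the external IP of the LoadBalancer service
EXTERNAL_IP=$(kubectl get service resnet-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Point any TensorFlow Serving gRPC client at this endpoint
echo "gRPC endpoint: ${EXTERNAL_IP}:8500"
```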

Common Errors

Error Message | Cause | Solution
`ERROR: (gcloud.container.clusters.create) ... quota exceeded` | GCP quota insufficient | Request quota increase in GCP Console
`ImagePullBackOff` | Container image not found in registry | Verify `docker push` succeeded and image name matches manifest
`CrashLoopBackOff` | Model not found in container | Ensure model was copied into Docker image during build step
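The pod-level errors above (`ImagePullBackOff`, `CrashLoopBackOff`) are usually diagnosed with the standard kubectl inspection commands (`<pod-name>` is a placeholder for an actual pod name from `kubectl get pods`):

```shell
# Show events for a failing pod, including image pull errors and crash reasons
kubectl describe pod <pod-name>

# Read the serving container's logs; --previous shows the last crashed instance
kubectl logs <pod-name> --previous
```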

Compatibility Notes

  • GKE specific: The tutorial targets GKE but the Kubernetes manifests are portable to other providers (EKS, AKS) with appropriate registry and cluster setup changes.
  • Scaling: TensorFlow Serving performance is better on fewer, larger machines due to resource sharing efficiency and lower fixed costs (see performance.md).
  • GPU on K8s: For GPU serving on Kubernetes, install the NVIDIA device plugin and use GPU-enabled serving images.
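A minimal sketch of the GPU setup mentioned above (the device plugin version is illustrative; check the NVIDIA/k8s-device-plugin releases page for the current one):

```shell
# Install the NVIDIA device plugin as a DaemonSet (version is an assumption)
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml

# Then request a GPU in the serving container spec:
#   resources:
#     limits:
#       nvidia.com/gpu: 1
```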
