Principle:Tensorflow Tfjs Model Hosting

From Leeroopedia


Knowledge Sources
Domains: Deployment, Infrastructure
Implementation: Implementation:Tensorflow_Tfjs_Model_Hosting_Pattern
Type: Pattern Doc
Last Updated: 2026-02-10 00:00 GMT

Overview

Model hosting means making converted model files accessible over HTTP(S) so that a browser can load them. Deploying ML models for web inference requires serving model artifacts via standard web protocols, with proper configuration for cross-origin access, caching, and efficient delivery.

Theory

Browser-Based Model Loading Requirements

Browser-based ML inference with TensorFlow.js requires that model files are fetchable via HTTP(S). Unlike server-side frameworks where models can be loaded from the local filesystem, browser applications are constrained by the web security model and must retrieve model artifacts through network requests.

The fundamental requirements for serving TF.js models are:

  1. HTTP(S) accessibility: The model.json manifest and all weight shard files must be accessible at known URLs
  2. Co-location of artifacts: Weight shard files must be discoverable relative to the model.json URL (same base path)
  3. CORS compliance: Cross-Origin Resource Sharing headers must be configured when the model is served from a different origin than the web application
  4. Correct MIME types: The server should serve .json files as application/json and .bin files as application/octet-stream
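
As a rough illustration of requirement 4, the sketch below maps an artifact file name to the Content-Type recommended above and compares it with a header value a server actually returned. The function names expectedContentType and hasExpectedContentType are hypothetical helpers for this page, not part of TensorFlow.js.

```javascript
// Hypothetical helper: the Content-Type recommended above for each artifact kind.
function expectedContentType(fileName) {
  if (fileName.endsWith('.json')) return 'application/json';
  if (fileName.endsWith('.bin')) return 'application/octet-stream';
  return null; // not a model artifact as far as this sketch knows
}

// Hypothetical helper: compare a served Content-Type header with the expected one.
// Parameters such as "; charset=utf-8" are ignored for the comparison.
function hasExpectedContentType(fileName, contentTypeHeader) {
  const expected = expectedContentType(fileName);
  if (expected === null || !contentTypeHeader) return false;
  const served = contentTypeHeader.split(';')[0].trim().toLowerCase();
  return served === expected;
}

console.log(hasExpectedContentType('model.json', 'application/json; charset=utf-8'));
console.log(hasExpectedContentType('group1-shard1of3.bin', 'text/plain'));
```

A check like this could be run against the headers of a deployed model URL before shipping the web application.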

The model.json Manifest and Weight Shards

When TensorFlow.js loads a model from a URL, the loading process is:

  1. Fetch model.json: The loader makes an HTTP GET request to the provided URL
  2. Parse weight manifest: The model.json file contains a weightsManifest array that lists weight shard filenames
  3. Resolve shard URLs: Shard filenames are resolved relative to the model.json URL's base path
  4. Fetch weight shards: All .bin shard files are fetched (potentially in parallel)
  5. Deserialize weights: Binary data is deserialized into typed arrays and assigned to model variables
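
Steps 2 and 3 can be approximated with the standard URL class. This is a minimal sketch of the idea, not the TF.js loader's actual code. The function name resolveShardUrls and the example.com URL are assumptions made for illustration.

```javascript
// Hypothetical helper: list every shard URL implied by a parsed model.json.
// Each weightsManifest entry carries a "paths" array of shard file names.
function resolveShardUrls(modelJsonUrl, modelJson) {
  const urls = [];
  for (const group of modelJson.weightsManifest) {
    for (const path of group.paths) {
      // Relative paths resolve against the directory that holds model.json.
      urls.push(new URL(path, modelJsonUrl).href);
    }
  }
  return urls;
}

// Illustrative manifest; the per-weight entries are left empty here.
const exampleManifest = {
  weightsManifest: [
    {
      paths: ['group1-shard1of3.bin', 'group1-shard2of3.bin', 'group1-shard3of3.bin'],
      weights: [],
    },
  ],
};

const shardUrls = resolveShardUrls(
  'https://example.com/models/my-model/model.json',
  exampleManifest
);
console.log(shardUrls);
```

Because every shard URL is derived from the model.json URL, moving model.json without its shards breaks loading.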

This means the directory structure on the server must preserve the file layout produced by the converter:

/models/my-model/
  model.json              # Manifest file
  group1-shard1of3.bin    # Weight shard 1
  group1-shard2of3.bin    # Weight shard 2
  group1-shard3of3.bin    # Weight shard 3

CORS Configuration

Cross-Origin Resource Sharing (CORS) is critical when the model files are served from a different domain, subdomain, or port than the web application. Without proper CORS headers, the browser will block the model loading requests.

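Relevant CORS headers (only Access-Control-Allow-Origin is strictly needed for a plain GET; the method and header entries matter when the browser sends a preflight request, for example because custom request headers are attached):

| Header | Value | Purpose |
| --- | --- | --- |
| Access-Control-Allow-Origin | * or specific origin | Permits cross-origin requests from the web application |
| Access-Control-Allow-Methods | GET, HEAD, OPTIONS | Permits the HTTP methods used for model loading |
| Access-Control-Allow-Headers | Content-Type | Permits the request headers sent by the TF.js loader |
| Access-Control-Expose-Headers | Content-Length | Allows the client to read the response size for progress tracking |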
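
When a specific origin is preferred over the wildcard, one possible approach is to echo the request's Origin only if it appears on an allow-list. The sketch below assumes a hypothetical corsHeadersFor helper and a hypothetical allow-list. It returns plain header objects, so it can be wired into whichever server is in use.

```javascript
// Hypothetical helper: build CORS response headers for one request.
// requestOrigin is the value of the incoming Origin header (may be undefined).
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (!requestOrigin || !allowedOrigins.includes(requestOrigin)) {
    // No CORS headers: the browser will not expose the response to the page.
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Expose-Headers': 'Content-Length',
    // The response differs per origin, so shared caches must key on Origin.
    'Vary': 'Origin',
  };
}

// Illustrative allow-list; the origin below is an assumption.
const allowed = ['https://app.example.com'];
console.log(corsHeadersFor('https://app.example.com', allowed));
console.log(corsHeadersFor('https://other.example.net', allowed));
```

For publicly readable models the wildcard value is simpler and caches better. The allow-list form is mainly useful when access is meant to be limited to known applications.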

Hosting Options

| Hosting Option | CORS Support | CDN | Cost | Best For |
| --- | --- | --- | --- | --- |
| Google Cloud Storage (GCS) | Built-in configurable | Google CDN | Pay per use | Production, large models |
| Amazon S3 + CloudFront | Bucket CORS policy | CloudFront CDN | Pay per use | AWS-based deployments |
| Azure Blob Storage | Built-in configurable | Azure CDN | Pay per use | Azure-based deployments |
| GitHub Pages | Permissive by default | GitHub CDN | Free (public repos) | Open-source demos |
| Firebase Hosting | Configurable via firebase.json | Firebase CDN | Free tier available | Firebase-integrated apps |
| Bundled with application | Same-origin (no CORS needed) | Application CDN | Included | Small models, offline apps |
| Custom web server | Must configure manually | Optional | Self-hosted | Full control requirements |

Caching Strategy

Model files are typically large and change infrequently, making them ideal candidates for aggressive caching:

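  • Weight shard files (.bin): The contents of a shard do not change once generated, but the converter reuses file names such as group1-shard1of3.bin across conversions. Use long cache TTLs (e.g., Cache-Control: public, max-age=31536000, immutable) when each model version is published under its own URL path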
  • model.json: This changes when the model is updated. Use shorter cache TTLs or cache-busting techniques (e.g., versioned URLs like /models/v2/model.json)
  • ETags: Enable ETag-based conditional requests for efficient cache revalidation
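
The per-file policy above could be expressed as a small lookup. In this sketch the function name cacheControlFor is hypothetical, and the no-cache value for model.json is one assumed way of getting a "shorter TTL": it makes the browser revalidate on each load, which pairs with ETags.

```javascript
// Hypothetical helper: choose a Cache-Control value from the requested path.
function cacheControlFor(urlPath) {
  if (urlPath.endsWith('.bin')) {
    // Shards under a versioned path never change, so cache them for a year.
    return 'public, max-age=31536000, immutable';
  }
  if (urlPath.endsWith('model.json')) {
    // Assumed policy: always revalidate the manifest (cheap when ETags match).
    return 'no-cache';
  }
  return null; // leave other files to the server's default policy
}

console.log(cacheControlFor('/models/v2/group1-shard1of3.bin'));
console.log(cacheControlFor('/models/v2/model.json'));
```

With versioned URLs such as /models/v2/model.json, the manifest itself could also be cached for longer, since a new model version gets a new path.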

Compression

Weight shard files (.bin) contain binary floating-point data that does not compress well with standard HTTP compression (gzip/brotli). However, model.json files can benefit significantly from compression due to their text-based JSON content.
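
The difference can be observed with Node's built-in zlib module. The sketch below uses synthetic stand-ins, not a real model: a repetitive JSON manifest and a buffer of pseudo-random float32 values. The helper names and sizes are assumptions, and real weights may compress somewhat differently.

```javascript
const zlib = require('node:zlib');

// Hypothetical stand-in for model.json: repetitive JSON text.
function fakeManifest(numWeights) {
  const weights = [];
  for (let i = 0; i < numWeights; i++) {
    weights.push({ name: `dense_${i}/kernel`, shape: [128, 128], dtype: 'float32' });
  }
  return JSON.stringify({ weightsManifest: [{ paths: ['group1-shard1of1.bin'], weights }] });
}

// Hypothetical stand-in for a weight shard: pseudo-random float32 values.
function fakeShard(numFloats) {
  const values = new Float32Array(numFloats);
  let state = 12345; // fixed seed so runs are repeatable
  for (let i = 0; i < numFloats; i++) {
    state = (state * 1664525 + 1013904223) >>> 0; // linear congruential generator
    values[i] = (state / 4294967296 - 0.5) * 0.2; // small values around zero
  }
  return Buffer.from(values.buffer);
}

// Ratio of gzip-compressed size to original size (lower means better compression).
function gzipRatio(buffer) {
  return zlib.gzipSync(buffer).length / buffer.length;
}

const jsonRatio = gzipRatio(Buffer.from(fakeManifest(200)));
const shardRatio = gzipRatio(fakeShard(50000));
console.log({ jsonRatio, shardRatio });
```

In practice this suggests enabling gzip or brotli for .json responses while skipping .bin files, which saves CPU time on the server for little loss in transfer size.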

Security Considerations

  • Model intellectual property: Serving models publicly exposes the model architecture and weights. Consider authentication if model IP protection is required
  • Model integrity: Use HTTPS to prevent man-in-the-middle attacks that could tamper with model weights
  • Access control: Use the requestInit option in TF.js model loading to pass authentication headers for private model endpoints
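
A minimal sketch of the requestInit approach follows. The buildLoadOptions helper, the Bearer token scheme, and the example URL are assumptions for illustration. The actual load call is shown only as a comment because it requires the @tensorflow/tfjs package.

```javascript
// Hypothetical helper: build load options that attach an Authorization header
// to every request the loader makes (model.json and weight shards).
function buildLoadOptions(token) {
  return {
    requestInit: {
      headers: { Authorization: `Bearer ${token}` }, // assumed auth scheme
    },
  };
}

const loadOptions = buildLoadOptions('example-token');
console.log(loadOptions);

// Usage with TensorFlow.js (not executed here):
// const model = await tf.loadGraphModel(
//   'https://example.com/models/my-model/model.json',
//   loadOptions
// );
```

Note that a cross-origin request carrying an Authorization header triggers a CORS preflight, so the server would also need to list Authorization in Access-Control-Allow-Headers.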

Inputs and Outputs

Inputs

  • Converted model artifacts from the tensorflowjs_converter: model.json file and one or more .bin weight shard files
  • A web server or cloud storage service configured to serve static files

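Outputs

  • An HTTP(S) URL for model.json, with the weight shard files served from the same base path, from which a browser application can load the model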

Example Configurations

Google Cloud Storage

# Upload model files to GCS bucket
gsutil cp -r /path/to/converted_model/* gs://my-bucket/models/

# Set CORS configuration
gsutil cors set cors-config.json gs://my-bucket

# Make files publicly readable
gsutil iam ch allUsers:objectViewer gs://my-bucket

Nginx Configuration

server {
    location /models/ {
        root /var/www/;
        add_header Access-Control-Allow-Origin *;
        add_header Access-Control-Allow-Methods "GET, HEAD, OPTIONS";
        add_header Cache-Control "public, max-age=86400";

        # Correct MIME types
        types {
            application/json json;
            application/octet-stream bin;
        }
    }
}

Express.js Static Serving

const express = require('express');
const cors = require('cors');
const app = express();

// Enable CORS for all routes
app.use(cors());

// Serve model files as static assets
app.use('/models', express.static('path/to/converted_model'));

app.listen(3000);
