Principle:Tensorflow Tfjs Model Hosting
| Knowledge Sources | |
|---|---|
| Domains | Deployment, Infrastructure |
| Implementation | Implementation:Tensorflow_Tfjs_Model_Hosting_Pattern |
| Type | Pattern Doc |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
This principle covers making converted model files accessible over HTTP(S) so that browsers can load them. Deploying ML models for web inference requires serving model artifacts via standard web protocols, with proper configuration for cross-origin access, caching, and efficient delivery.
Theory
Browser-Based Model Loading Requirements
Browser-based ML inference with TensorFlow.js requires that model files are fetchable via HTTP(S). Unlike server-side frameworks where models can be loaded from the local filesystem, browser applications are constrained by the web security model and must retrieve model artifacts through network requests.
The fundamental requirements for serving TF.js models are:
- HTTP(S) accessibility: The model.json manifest and all weight shard files must be accessible at known URLs
- Co-location of artifacts: Weight shard files must be discoverable relative to the model.json URL (same base path)
- CORS compliance: Cross-Origin Resource Sharing headers must be configured when the model is served from a different origin than the web application
- Correct MIME types: The server should serve .json files as application/json and .bin files as application/octet-stream
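The MIME-type and CORS requirements above can be spot-checked from the response headers of each served file. The following is a minimal sketch, not part of TensorFlow.js: `checkArtifactHeaders`, `EXPECTED_TYPES`, and the example origin are hypothetical names, and the sketch assumes the model is served cross-origin and that headers have been collected into a plain object with lowercase keys.

```javascript
// Hypothetical helper (not part of TensorFlow.js): reports header problems
// for one served artifact. `headers` is a plain object with lowercase keys.
const EXPECTED_TYPES = {
  '.json': 'application/json',
  '.bin': 'application/octet-stream',
};

function checkArtifactHeaders(fileName, headers, appOrigin) {
  const problems = [];
  const ext = fileName.slice(fileName.lastIndexOf('.'));
  const expected = EXPECTED_TYPES[ext];
  // Ignore parameters such as "; charset=utf-8" when comparing.
  const actual = (headers['content-type'] || '').split(';')[0].trim();
  if (expected && actual !== expected) {
    problems.push(`${fileName}: expected ${expected}, got ${actual || 'none'}`);
  }
  // Assumes a cross-origin deployment; same-origin hosting needs no CORS header.
  const allowed = headers['access-control-allow-origin'];
  if (allowed !== '*' && allowed !== appOrigin) {
    problems.push(`${fileName}: origin ${appOrigin} is not allowed by CORS`);
  }
  return problems;
}
```

An empty array means the file passed both checks; each string in a non-empty result names the file and the header that looks wrong.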
The model.json Manifest and Weight Shards
When TensorFlow.js loads a model from a URL, the loading process is:
1. Fetch model.json: The loader makes an HTTP GET request to the provided URL
2. Parse weight manifest: The model.json file contains a weightsManifest array that lists weight shard filenames
3. Resolve shard URLs: Shard filenames are resolved relative to the model.json URL's base path
4. Fetch weight shards: All .bin shard files are fetched (potentially in parallel)
5. Deserialize weights: Binary data is deserialized into typed arrays and assigned to model variables
This means the directory structure on the server must preserve the file layout produced by the converter:
/models/my-model/
    model.json              # Manifest file
    group1-shard1of3.bin    # Weight shard 1
    group1-shard2of3.bin    # Weight shard 2
    group1-shard3of3.bin    # Weight shard 3
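To illustrate why the layout matters, the shard-resolution step can be approximated with standard URL resolution. This is a sketch under assumptions, not the library's actual loader code: `resolveShardUrls`, `exampleManifest`, and the example.com URL are hypothetical, and the manifest is reduced to the `paths` entries of each `weightsManifest` group.

```javascript
// Resolve every weight shard path in a parsed model.json against the
// manifest's own URL, the way a relative link resolves in a browser.
function resolveShardUrls(modelJsonUrl, modelJson) {
  const urls = [];
  for (const group of modelJson.weightsManifest) {
    for (const path of group.paths) {
      urls.push(new URL(path, modelJsonUrl).href);
    }
  }
  return urls;
}

// Hypothetical, trimmed-down manifest matching the layout above.
const exampleManifest = {
  weightsManifest: [
    {
      paths: [
        'group1-shard1of3.bin',
        'group1-shard2of3.bin',
        'group1-shard3of3.bin',
      ],
      weights: [],
    },
  ],
};

const shardUrls = resolveShardUrls(
  'https://example.com/models/my-model/model.json',
  exampleManifest,
);
console.log(shardUrls);
```

Each shard resolves to a URL in the same directory as model.json, so moving the manifest without its shards leaves the loader requesting files that do not exist.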
CORS Configuration
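Cross-Origin Resource Sharing (CORS) is critical when the model files are served from a different origin than the web application, that is, a different scheme, domain, subdomain, or port. Without proper CORS headers, the browser will block the model loading requests.
Relevant CORS headers (for plain GET requests only Access-Control-Allow-Origin is strictly required; Access-Control-Allow-Methods and Access-Control-Allow-Headers matter when the browser sends a preflight request, for example because a custom request header such as Authorization is added):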
| Header | Value | Purpose |
|---|---|---|
| Access-Control-Allow-Origin | * or specific origin | Permits cross-origin requests from the web application |
| Access-Control-Allow-Methods | GET, HEAD, OPTIONS | Permits the HTTP methods used for model loading |
| Access-Control-Allow-Headers | Content-Type | Permits the request headers sent by the TF.js loader |
| Access-Control-Expose-Headers | Content-Length | Allows the client to read the response size for progress tracking |
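As an illustration of how a server might choose these headers per request, the sketch below builds them from an allowlist of origins. `corsHeadersFor` and `allowedOrigins` are hypothetical names introduced here, and adding `Vary: Origin` is a common practice assumed for this example rather than something the table above prescribes.

```javascript
// Hypothetical helper: builds CORS response headers for a model file request.
// `allowedOrigins` is an assumed allowlist; use ['*'] to allow any origin.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (allowedOrigins.includes('*')) {
    return { 'Access-Control-Allow-Origin': '*' };
  }
  if (!requestOrigin || !allowedOrigins.includes(requestOrigin)) {
    // No CORS headers: the browser will block the cross-origin read.
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response depends on the request's Origin header.
    'Vary': 'Origin',
    'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
  };
}
```

Echoing a specific origin instead of `*` is the usual choice when the model should only be loadable from known applications.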
Hosting Options
| Hosting Option | CORS Support | CDN | Cost | Best For |
|---|---|---|---|---|
| Google Cloud Storage (GCS) | Built-in configurable | Cloud CDN | Pay per use | Production, large models |
| Amazon S3 + CloudFront | Bucket CORS policy | CloudFront CDN | Pay per use | AWS-based deployments |
| Azure Blob Storage | Built-in configurable | Azure CDN | Pay per use | Azure-based deployments |
| GitHub Pages | Permissive by default | GitHub CDN | Free (public repos) | Open-source demos |
| Firebase Hosting | Configurable via firebase.json | Firebase CDN | Free tier available | Firebase-integrated apps |
| Bundled with application | Same-origin (no CORS needed) | Application CDN | Included | Small models, offline apps |
| Custom web server | Must configure manually | Optional | Self-hosted | Full control requirements |
Caching Strategy
Model files are typically large and change infrequently, making them ideal candidates for aggressive caching:
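- Weight shard files (.bin): These are immutable once generated, but the converter reuses the same shard filenames across conversions, so a long TTL is only safe when each model version lives under its own path. With versioned paths, use long cache TTLs (e.g., Cache-Control: public, max-age=31536000, immutable)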
- model.json: This changes when the model is updated. Use shorter cache TTLs or cache-busting techniques (e.g., versioned URLs like /models/v2/model.json)
- ETags: Enable ETag-based conditional requests for efficient cache revalidation
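The split between long-lived shards and a short-lived manifest could be expressed as a small mapping from file path to header value. This is a sketch only: `cacheControlFor` is a hypothetical helper, the five-minute manifest TTL and the `no-cache` fallback are assumptions chosen for illustration, and the long shard TTL presumes versioned model paths.

```javascript
// Hypothetical helper: picks a Cache-Control value for a served model file.
function cacheControlFor(filePath) {
  if (filePath.endsWith('.bin')) {
    // Weight shards: safe to cache for a year when the path is versioned.
    return 'public, max-age=31536000, immutable';
  }
  if (filePath.endsWith('model.json')) {
    // Manifest: short TTL (assumed 5 minutes) so updates are picked up quickly.
    return 'public, max-age=300';
  }
  // Anything else: cache, but revalidate with the server before reuse.
  return 'no-cache';
}
```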
Compression
Weight shard files (.bin) contain binary floating-point data, which typically compresses poorly with standard HTTP compression (gzip/brotli). model.json files, by contrast, are text-based JSON and usually benefit significantly from compression.
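The difference can be observed with a rough experiment in Node.js. The sketch below uses stand-in data, not a real model: the pseudo-random float32 buffer approximates a weight shard (real trained weights may compress somewhat differently), and the generated JSON approximates a manifest. `gzipRatio` and all sample names are hypothetical.

```javascript
const zlib = require('node:zlib');

// Ratio of gzipped size to original size (lower means better compression).
function gzipRatio(buffer) {
  return zlib.gzipSync(buffer).length / buffer.length;
}

// Stand-in for a weight shard: 100,000 pseudo-random float32 values.
const weights = new Float32Array(100000);
for (let i = 0; i < weights.length; i++) {
  weights[i] = Math.random() * 2 - 1;
}
const shardBytes = Buffer.from(weights.buffer);

// Stand-in for a manifest: repetitive JSON text describing many weights.
const entries = [];
for (let i = 0; i < 2000; i++) {
  entries.push({ name: `dense_${i}/kernel`, shape: [128, 128], dtype: 'float32' });
}
const manifestBytes = Buffer.from(JSON.stringify({
  weightsManifest: [{ paths: ['group1-shard1of1.bin'], weights: entries }],
}));

console.log('shard ratio:', gzipRatio(shardBytes).toFixed(2));
console.log('manifest ratio:', gzipRatio(manifestBytes).toFixed(2));
```

Measuring real artifacts the same way is a reasonable basis for deciding whether to enable compression for .bin files at all.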
Security Considerations
- Model intellectual property: Serving models publicly exposes the model architecture and weights. Consider authentication if model IP protection is required
- Model integrity: Use HTTPS to prevent man-in-the-middle attacks that could tamper with model weights
- Access control: Use the requestInit option in TF.js model loading to pass authentication headers for private model endpoints
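A minimal sketch of the requestInit approach follows. `buildLoadOptions`, the bearer-token scheme, and the models.example.com URL are assumptions made for illustration; only the `requestInit` option itself comes from the text above. Note that sending an Authorization header cross-origin causes the browser to issue a preflight request, so the server would also need to answer OPTIONS and list Authorization in Access-Control-Allow-Headers.

```javascript
// Hypothetical helper: builds the options object passed to the TF.js loader.
// `token` is an assumed bearer token obtained elsewhere by the application.
function buildLoadOptions(token) {
  return {
    requestInit: {
      headers: { Authorization: `Bearer ${token}` },
    },
  };
}

// In an application where `tf` (@tensorflow/tfjs) is available:
//   const model = await tf.loadGraphModel(
//     'https://models.example.com/private/model.json', // hypothetical URL
//     buildLoadOptions(token));
```

Authentication of this kind limits who can download the files, but anyone who is authorized still receives the full architecture and weights.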
Inputs and Outputs
Inputs
- Converted model artifacts from the tensorflowjs_converter: model.json file and one or more .bin weight shard files
- A web server or cloud storage service configured to serve static files
Outputs
- An HTTP(S) URL pointing to the model.json file, either publicly accessible or protected by authentication (e.g., https://storage.googleapis.com/my-bucket/models/model.json)
- All weight shard files accessible at the same base URL path
- CORS headers configured to allow access from the web application's origin
Example Configurations
Google Cloud Storage
# Upload model files to GCS bucket
gsutil cp -r /path/to/converted_model/* gs://my-bucket/models/
# Set CORS configuration
gsutil cors set cors-config.json gs://my-bucket
# Make files publicly readable (this grants read access to every object in the bucket)
gsutil iam ch allUsers:objectViewer gs://my-bucket
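The commands above reference a cors-config.json file whose contents are not shown. One possible way to generate it is sketched below; `buildGcsCorsConfig`, the example origin, the listed response headers, and the one-hour `maxAgeSeconds` are all assumptions to adapt, not values taken from this document.

```javascript
// Builds the contents of a cors-config.json file for `gsutil cors set`.
// The origin passed in below is a placeholder; replace it with the app's origin.
function buildGcsCorsConfig(origins) {
  return JSON.stringify(
    [
      {
        origin: origins,
        method: ['GET', 'HEAD'],
        responseHeader: ['Content-Type', 'Content-Length'],
        maxAgeSeconds: 3600, // assumed preflight cache lifetime
      },
    ],
    null,
    2,
  );
}

console.log(buildGcsCorsConfig(['https://app.example.com']));
```

Redirecting the printed output into cors-config.json produces a file in the shape the `gsutil cors set` command expects.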
Nginx Configuration
server {
    location /models/ {
        root /var/www/;
        add_header Access-Control-Allow-Origin *;
        add_header Access-Control-Allow-Methods "GET, HEAD, OPTIONS";
        add_header Cache-Control "public, max-age=86400";
        # Correct MIME types
        types {
            application/json json;
            application/octet-stream bin;
        }
    }
}
Express.js Static Serving
const express = require('express');
const cors = require('cors');
const app = express();
// Enable CORS for all routes
app.use(cors());
// Serve model files as static assets
app.use('/models', express.static('path/to/converted_model'));
app.listen(3000);
See Also
- Implementation:Tensorflow_Tfjs_Model_Hosting_Pattern — Concrete implementation of this principle
- Principle:Tensorflow_Tfjs_Model_Format_Conversion — Previous step: converting the model
- Principle:Tensorflow_Tfjs_Pretrained_Model_Loading — Next step: loading the hosted model in JavaScript