Implementation:Tensorflow Tfjs Tf Tensor Creation
| Knowledge Sources | |
|---|---|
| Domains | Data_Preprocessing, Deep_Learning |
| Last Updated | 2026-02-10 00:00 GMT |
Environment:Tensorflow_Tfjs_Browser_Runtime
Overview
Concrete APIs for converting raw JavaScript data into TensorFlow.js tensor format, using tf.tensor2d() for in-memory data and tf.data.generator() for streaming datasets.
Description
TensorFlow.js provides two primary approaches for preparing training data:
In-memory tensor creation via tf.tensor2d() converts JavaScript arrays or TypedArrays directly into 2D tensors stored on the active backend (GPU-backed when the WebGL or WebGPU backend is in use). This is the simplest and fastest approach when the entire dataset fits in memory. The function accepts nested arrays (e.g., [[1, 2], [3, 4]]) or flat TypedArrays with an explicit shape parameter. The resulting Tensor2D has shape [numSamples, numFeatures] and is ready to be passed directly to model.fit().
Streaming dataset creation via tf.data.generator() wraps a JavaScript generator function into a Dataset object. The generator yields individual data points on demand, so arbitrarily large datasets can be processed without holding all of the data in memory. For training, each yielded element should be an object of the form {xs, ys} (features and labels), yielded directly rather than wrapped in a {value: ...} object; the {value, done} wrapper is added by the JavaScript iterator protocol. The resulting Dataset supports chained operations like .batch(), .shuffle(), and .prefetch(), and is consumed via model.fitDataset() rather than model.fit().
Key difference: tf.tensor2d() creates the entire tensor immediately in memory, while tf.data.generator() creates a lazy pipeline that produces data on demand. Choose based on dataset size relative to available memory.
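The lazy behavior described above comes from JavaScript generators themselves. The following is a minimal, dependency-free sketch (the names sampleGenerator and productionLog are hypothetical and not part of TensorFlow.js) showing that no sample is produced until a consumer asks for one, and that the runtime, not user code, wraps each yielded object in {value, done}.

```javascript
// Plain JavaScript sketch; TensorFlow.js is not needed to run it.
const productionLog = [];

// Hypothetical sample generator that records when each sample is produced.
function* sampleGenerator() {
  for (let i = 0; i < 3; i++) {
    productionLog.push(i);
    yield {xs: [i], ys: [i * 2]};
  }
}

const iterator = sampleGenerator();
// Calling the generator function does not run its body yet.
const producedBeforeFirstNext = productionLog.length;

// next() runs the body up to the first yield and wraps the yielded object.
const first = iterator.next();
const producedAfterFirstNext = productionLog.length;

// Draining the iterator produces the remaining samples on demand.
const remainingCount = [...iterator].length;
```

Here first has the form {value: {xs, ys}, done: false}, which is why a generator passed to tf.data.generator() only needs to yield the {xs, ys} object itself.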
Code Reference
Source
Repository: https://github.com/tensorflow/tfjs
| File/Package | Key Locations |
|---|---|
| @tensorflow/tfjs-core | tf.tensor2d(): tensor creation from arrays/TypedArrays |
| @tensorflow/tfjs-data | tf.data.generator(): dataset creation from generator functions |
| @tensorflow/tfjs-data | Dataset class with .batch(), .shuffle(), .prefetch() methods |
Note: This is a Wrapper Doc documenting the public API surface. The underlying implementations are spread across the @tensorflow/tfjs-core and @tensorflow/tfjs-data packages.
Signature
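// In-memory 2D tensor creation
tf.tensor2d(
  values: TypedArray | number[][],
  shape?: [number, number],
  dtype?: DataType
): Tensor2D

// Streaming dataset from generator
// The iterator's elements are the samples themselves; the {value, done}
// wrapper is added by the JavaScript iterator protocol, not by user code.
tf.data.generator(
  generator: () => Iterator<{xs: any, ys: any}>
): Dataset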
Import
import * as tf from '@tensorflow/tfjs';
// Then use:
// tf.tensor2d(values, shape, dtype)
// tf.data.generator(generatorFunction)
External Dependencies
- @tensorflow/tfjs-core: Provides tf.tensor2d() and all tensor operations. Handles memory allocation on the active backend (CPU, WebGL, WebGPU, WASM).
- @tensorflow/tfjs-data: Provides tf.data.generator() and the Dataset class with batching, shuffling, and prefetching capabilities.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| values (tensor2d) | number[][] or TypedArray | Yes | Raw data as nested arrays or a flat typed array |
| shape (tensor2d) | [number, number] | Conditional | Required when values is a flat TypedArray; inferred from nested array structure otherwise |
| dtype (tensor2d) | DataType | No | Data type of elements; defaults to 'float32' |
| generator | () => Iterator | Yes (for generator) | Factory function that returns an iterator whose elements are {xs, ys} objects |
Outputs
| Name | Type | Description |
|---|---|---|
| tensor2d return | Tensor2D | 2D tensor on the active backend with shape [rows, cols], ready for model.fit() |
| generator return | Dataset | Lazy dataset object supporting .batch(), .shuffle(), .prefetch(), consumed via model.fitDataset() |
Usage Examples
Basic In-Memory Tensor Creation
import * as tf from '@tensorflow/tfjs';
// Create features and labels from nested arrays
const xs = tf.tensor2d([[0, 0], [0, 1], [1, 0], [1, 1]]); // shape: [4, 2]
const ys = tf.tensor2d([[0], [1], [1], [0]]); // shape: [4, 1]
// xs.shape => [4, 2]
// ys.shape => [4, 1]
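To make the idea of a shape "inferred from nested array structure" concrete, here is a small dependency-free sketch. The helper inferShape2d is hypothetical and is not how TensorFlow.js implements shape inference; it only illustrates the rule that the outer length gives the number of rows and the inner length gives the number of columns, with ragged input rejected.

```javascript
// Hypothetical helper: infers [rows, cols] from a nested array and
// rejects input whose rows differ in length.
function inferShape2d(values) {
  if (!Array.isArray(values) || values.length === 0 || !Array.isArray(values[0])) {
    throw new Error('Expected a non-empty array of arrays');
  }
  const rows = values.length;
  const cols = values[0].length;
  for (const row of values) {
    if (!Array.isArray(row) || row.length !== cols) {
      throw new Error('All rows must have the same length');
    }
  }
  return [rows, cols];
}

const xsShape = inferShape2d([[0, 0], [0, 1], [1, 0], [1, 1]]); // [4, 2]
const ysShape = inferShape2d([[0], [1], [1], [0]]);             // [4, 1]
```

Running a check like this on raw data before creating tensors can surface ragged rows early, with an error message that points at the data rather than at the tensor constructor.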
Tensor Creation from Flat TypedArray
import * as tf from '@tensorflow/tfjs';
// Flat Float32Array with explicit shape
const data = new Float32Array([1, 2, 3, 4, 5, 6]);
const tensor = tf.tensor2d(data, [3, 2]); // 3 samples, 2 features each
// tensor.shape => [3, 2]
// tensor.print() =>
// [[1, 2],
// [3, 4],
// [5, 6]]
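The printed result above follows from row-major layout: with shape [rows, cols], element (r, c) is read from flat index r * cols + c. The sketch below spells this out in plain JavaScript. The helper toRows is hypothetical and exists only for illustration; it is not a TensorFlow.js function.

```javascript
// Hypothetical helper: views a flat array as the rows of a [rows, cols]
// matrix in row-major order (element (r, c) sits at index r * cols + c).
function toRows(flat, [rows, cols]) {
  if (flat.length !== rows * cols) {
    throw new Error(
      `Shape [${rows}, ${cols}] needs ${rows * cols} values, got ${flat.length}`);
  }
  const out = [];
  for (let r = 0; r < rows; r++) {
    // slice() works on both plain arrays and TypedArrays.
    out.push(Array.from(flat.slice(r * cols, (r + 1) * cols)));
  }
  return out;
}

const flatData = new Float32Array([1, 2, 3, 4, 5, 6]);
const asRows = toRows(flatData, [3, 2]); // [[1, 2], [3, 4], [5, 6]]
```

The length check mirrors the practical constraint on flat input: the product of the shape dimensions has to match the number of values supplied.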
Min-Max Normalization with Tensor Ops
import * as tf from '@tensorflow/tfjs';
// Raw data with different scales
const rawData = [[25, 50000], [30, 60000], [35, 70000], [40, 80000]];
// Create tensor and normalize
const tensor = tf.tensor2d(rawData);
const min = tensor.min(0); // per-feature minimum
const max = tensor.max(0); // per-feature maximum
const normalized = tensor.sub(min).div(max.sub(min)); // min-max scaling to [0, 1]
// normalized.print() =>
// [[0, 0],
// [0.333, 0.333],
// [0.667, 0.667],
// [1, 1]]
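The same min-max scaling can also be applied to the raw nested array before any tensor is created. The sketch below is one way to do that in plain JavaScript. The helper minMaxScale is hypothetical, and mapping a constant column to 0 is an assumption made here to avoid the 0 / 0 division that occurs when a feature's maximum equals its minimum.

```javascript
// Hypothetical helper: min-max scales each column of a nested array to [0, 1].
// Constant columns (max === min) are mapped to 0 instead of dividing by zero.
function minMaxScale(rows) {
  const numCols = rows[0].length;
  const mins = new Array(numCols).fill(Infinity);
  const maxs = new Array(numCols).fill(-Infinity);
  // First pass: per-column minimum and maximum.
  for (const row of rows) {
    row.forEach((v, c) => {
      mins[c] = Math.min(mins[c], v);
      maxs[c] = Math.max(maxs[c], v);
    });
  }
  // Second pass: rescale every value using its column's range.
  return rows.map(row =>
    row.map((v, c) =>
      maxs[c] === mins[c] ? 0 : (v - mins[c]) / (maxs[c] - mins[c])));
}

const scaled = minMaxScale([[25, 50000], [30, 60000], [35, 70000], [40, 80000]]);
// scaled[0] is [0, 0] and scaled[3] is [1, 1]
```

The scaled array can then be passed to tf.tensor2d() as in the earlier examples. When normalizing training data this way, the per-column minimum and maximum usually need to be kept so the same scaling can be applied to validation and inference inputs.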
One-Hot Encoding Labels
import * as tf from '@tensorflow/tfjs';
// Integer class labels
const labels = [0, 2, 1, 0, 2];
// Convert to one-hot encoding
const oneHot = tf.oneHot(tf.tensor1d(labels, 'int32'), 3);
// oneHot.shape => [5, 3]
// oneHot.print() =>
// [[1, 0, 0],
// [0, 0, 1],
// [0, 1, 0],
// [1, 0, 0],
// [0, 0, 1]]
// Use as 2D label tensor for categoricalCrossentropy loss
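For readers who want to see what the encoding does element by element, here is a dependency-free sketch. The helper oneHotRows is hypothetical and is not the TensorFlow.js implementation; in particular, throwing on out-of-range labels is a choice made for this sketch, not a statement about how tf.oneHot() treats them.

```javascript
// Hypothetical helper: builds one row per label with a single 1 at the
// label's index and 0 everywhere else.
function oneHotRows(labels, numClasses) {
  return labels.map(label => {
    if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
      throw new RangeError(`Label ${label} is outside [0, ${numClasses})`);
    }
    const row = new Array(numClasses).fill(0);
    row[label] = 1;
    return row;
  });
}

const encoded = oneHotRows([0, 2, 1, 0, 2], 3);
// encoded has 5 rows of 3 columns, matching the [5, 3] shape above
```

Each row sums to 1, which is the property a categorical cross-entropy loss relies on when comparing labels with predicted class probabilities.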
Streaming Dataset from Generator
import * as tf from '@tensorflow/tfjs';
// Create dataset from generator function.
// Yield each sample directly; the iterator protocol adds the {value, done} wrapper.
const dataset = tf.data.generator(function* () {
  for (let i = 0; i < 100; i++) {
    yield {xs: [i / 100], ys: [Math.sin(i / 100)]};
  }
});
// Chain dataset operations
const batchedDataset = dataset
  .shuffle(100) // shuffle buffer of 100 elements
  .batch(16); // batch size of 16
// Use with fitDataset instead of fit.
// Assumes `model` is an already compiled model that accepts inputs of shape [1].
await model.fitDataset(batchedDataset, {
  epochs: 50
});
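Batching a lazy stream can be pictured as grouping consecutive elements as they arrive. The sketch below does this with plain generators. The batch function here is a hypothetical stand-in written for illustration, not the Dataset.batch() implementation, which also stacks the grouped samples into tensors. Keeping the final partial batch is an assumption of this sketch.

```javascript
// Hypothetical stand-in for batching: groups consecutive items from any
// iterable into arrays of batchSize, keeping a smaller final batch.
function* batch(source, batchSize) {
  let current = [];
  for (const item of source) {
    current.push(item);
    if (current.length === batchSize) {
      yield current;
      current = [];
    }
  }
  if (current.length > 0) {
    yield current; // final partial batch
  }
}

// Same 100 samples as the example above.
function* sineSamples() {
  for (let i = 0; i < 100; i++) {
    yield {xs: [i / 100], ys: [Math.sin(i / 100)]};
  }
}

const batchSizes = [...batch(sineSamples(), 16)].map(b => b.length);
// 100 = 6 * 16 + 4, so six full batches and one batch of 4
```

With 100 samples and a batch size of 16, one pass over the data therefore consists of 7 batches under this grouping.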
Large Dataset with Prefetching
import * as tf from '@tensorflow/tfjs';
// Simulate a large dataset loaded in chunks.
// Each sample is yielded directly as {xs, ys}.
function* largeDataGenerator() {
  for (let i = 0; i < 10000; i++) {
    const features = Array.from({length: 20}, () => Math.random());
    const label = features[0] > 0.5 ? [1] : [0];
    yield {xs: features, ys: label};
  }
}
const dataset = tf.data.generator(largeDataGenerator)
  .shuffle(1000) // shuffle buffer
  .batch(32) // batch size
  .prefetch(2); // prefetch 2 batches ahead for performance
// Assumes `model` is an already compiled model that accepts inputs of shape [20],
// and that `validationDataset` is a batched Dataset defined elsewhere.
await model.fitDataset(dataset, {
  epochs: 10,
  validationData: validationDataset // another Dataset for validation
});
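A shuffle buffer smaller than the dataset, such as 1000 slots for 10000 samples, only mixes elements that are close together in the stream. The sketch below shows a common buffer-based streaming shuffle in plain JavaScript. The function shuffleWithBuffer is hypothetical and should be read as an illustration of the general technique under that assumption, not as the Dataset.shuffle() implementation.

```javascript
// Hypothetical buffer-based streaming shuffle: fill a fixed-size buffer,
// then for each new item emit a random buffered item and store the new
// item in its slot. Memory use is bounded by bufferSize.
function* shuffleWithBuffer(source, bufferSize, random = Math.random) {
  if (!Number.isInteger(bufferSize) || bufferSize < 1) {
    throw new RangeError('bufferSize must be a positive integer');
  }
  const buffer = [];
  for (const item of source) {
    if (buffer.length < bufferSize) {
      buffer.push(item);
      continue;
    }
    const idx = Math.floor(random() * bufferSize);
    yield buffer[idx];
    buffer[idx] = item;
  }
  // Source exhausted: drain what is left in random order.
  while (buffer.length > 0) {
    const idx = Math.floor(random() * buffer.length);
    yield buffer.splice(idx, 1)[0];
  }
}

const input = Array.from({length: 20}, (_, i) => i);
const shuffled = [...shuffleWithBuffer(input, 5)];   // locally shuffled
const unshuffled = [...shuffleWithBuffer(input, 1)]; // buffer of 1 keeps order
```

In this scheme the k-th emitted element can only come from the first k + bufferSize inputs, so data that arrives sorted (for example, grouped by class) stays partly sorted unless the buffer is large relative to the dataset.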
Related Pages
Implements Principle
Environments
- Environment:Tensorflow_Tfjs_Browser_Runtime -- Browser runtime (WebGL / WebGPU / WASM / CPU backends)