Implementation:Tensorflow_Tfjs_LayersModel_Evaluate
Overview
Tensorflow_Tfjs_LayersModel_Evaluate documents the TensorFlow.js API for evaluating a trained model's performance on test data. It provides two methods: evaluate() for in-memory tensor data and evaluateDataset() for streaming dataset evaluation.
Principle:Tensorflow_Tfjs_Model_Evaluation
Deep_Learning Model_Assessment
Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime
Type: API Doc
External Dependencies: @tensorflow/tfjs-core
API Signature
evaluate()
evaluate(
  x: Tensor | Tensor[],
  y: Tensor | Tensor[],
  args?: ModelEvaluateArgs
): Scalar | Scalar[]
evaluateDataset()
async evaluateDataset<T>(
  dataset: Dataset<T> | LazyIterator<T>,
  args: ModelEvaluateDatasetArgs
): Promise<Scalar | Scalar[]>
ModelEvaluateArgs
| Parameter | Type | Default | Description |
|---|---|---|---|
| batchSize | number | 32 | Number of samples per evaluation batch |
| verbose | ModelLoggingVerbosity | — | Verbosity level for logging during evaluation |
| steps | number | — | Total number of steps (batches) before declaring evaluation complete. If not specified, evaluates over the entire dataset. |
ModelEvaluateDatasetArgs
| Parameter | Type | Default | Description |
|---|---|---|---|
| batches | number | — | Number of batches to draw from the dataset. If not specified, iterates until the dataset is exhausted. |
| verbose | ModelLoggingVerbosity | — | Verbosity level for logging during evaluation |
Code Reference
Source files:
- tfjs-layers/src/engine/training.ts — Lines 840-863 (evaluate method)
- tfjs-layers/src/engine/training_dataset.ts — Lines 533-614 (evaluateDataset method)
The evaluate method internally calls the model's forward pass in inference mode (no dropout, uses running batch normalization statistics) and computes the compiled loss and metrics over the provided tensors. It processes data in batches of the specified size.
The evaluateDataset method works similarly but pulls batches from a tf.data.Dataset pipeline, enabling evaluation of datasets that exceed available memory.
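Conceptually, this streaming evaluation accumulates a sample-weighted average of per-batch values, so a smaller final batch does not skew the result. The plain-JavaScript sketch below (mock batch values, not the tfjs implementation) illustrates that aggregation:

```javascript
// Sketch of the weighted aggregation a streaming evaluation performs
// conceptually: each batch contributes its loss weighted by its sample
// count. The batch losses and sizes below are mock data.
function aggregateBatches(batches) {
  let weightedSum = 0;
  let totalSamples = 0;
  for (const {loss, size} of batches) {
    weightedSum += loss * size;   // weight each batch by its sample count
    totalSamples += size;
  }
  return weightedSum / totalSamples;
}

const mockBatches = [
  {loss: 0.5, size: 32},
  {loss: 0.7, size: 32},
  {loss: 0.6, size: 4},  // final partial batch
];
console.log(aggregateBatches(mockBatches)); // ≈ 0.6
```

Because only one batch of tensors is materialized at a time, the test set never has to fit in memory at once.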
Import
import * as tf from '@tensorflow/tfjs';
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| Model | LayersModel | A trained and compiled model with loss and metrics defined |
| x (evaluate) | Tensor \| Tensor[] | Test input data. A single tensor for single-input models, or an array of tensors for multi-input models. |
| y (evaluate) | Tensor \| Tensor[] | Test target (label) data. Must match the shape expected by the model's loss function. |
| dataset (evaluateDataset) | Dataset<T> \| LazyIterator<T> | A dataset yielding batches of {xs, ys} pairs for streaming evaluation |
| args | ModelEvaluateArgs \| ModelEvaluateDatasetArgs | Optional configuration: batchSize, verbose, and steps for evaluate(); batches and verbose for evaluateDataset() |
Outputs
| Output | Type | Description |
|---|---|---|
| Single metric | Scalar | If the model was compiled with only a loss (no additional metrics), a single Scalar representing the test loss |
| Multiple metrics | Scalar[] | If the model has additional metrics, an array whose first element is the loss, followed by the metric values in the order they were specified during compilation |
Call .dataSync() on any returned Scalar to read its value synchronously as a Float32Array, then index [0] for a plain number. (The asynchronous .data() does the same without blocking the main thread.)
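Since the return type is Scalar or Scalar[], a small helper can normalize either shape into plain numbers and dispose the tensors in one pass. This is a sketch: the `toNumbers` name is ours, and the mock objects below stand in for real tf.Scalar instances so the behavior is visible without a model:

```javascript
// Hypothetical helper: extract plain numbers from an evaluate() result
// (Scalar or Scalar[]) and dispose each underlying tensor.
function toNumbers(result) {
  const scalars = Array.isArray(result) ? result : [result];
  const values = scalars.map(s => s.dataSync()[0]);
  scalars.forEach(s => s.dispose());
  return values;
}

// Mock Scalars mimicking the tf.Scalar interface (dataSync/dispose):
const mockScalar = v => ({dataSync: () => new Float32Array([v]), dispose: () => {}});
console.log(toNumbers([mockScalar(0.25), mockScalar(0.75)])); // [0.25, 0.75]
console.log(toNumbers(mockScalar(0.5)));                      // [0.5]
```

With a real model, `toNumbers(model.evaluate(xs, ys))` would return `[loss]` or `[loss, ...metrics]` while freeing the result tensors.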
Usage Examples
Basic Evaluation with Tensors
// Assume model is already trained and compiled with:
// loss: 'categoricalCrossentropy', metrics: ['accuracy']
const testXs = tf.randomNormal([100, 784]); // 100 test samples, 784 features
const testYs = tf.oneHot(tf.randomUniform([100], 0, 10, 'int32'), 10);
const [loss, accuracy] = model.evaluate(testXs, testYs, {batchSize: 32});
console.log('Test loss:', loss.dataSync()[0]);
console.log('Test accuracy:', accuracy.dataSync()[0]);
// Dispose tensors to free memory
loss.dispose();
accuracy.dispose();
testXs.dispose();
testYs.dispose();
Evaluation with Custom Batch Size
// Use a smaller batch size for memory-constrained environments
const result = model.evaluate(testXs, testYs, {
batchSize: 8,
verbose: 1
});
if (Array.isArray(result)) {
console.log('Loss:', result[0].dataSync()[0]);
console.log('Accuracy:', result[1].dataSync()[0]);
result.forEach(scalar => scalar.dispose());
} else {
console.log('Loss:', result.dataSync()[0]);
result.dispose();
}
Evaluation with Dataset (Streaming)
// Create a dataset for large test sets
const testDataset = tf.data.generator(function* () {
for (let i = 0; i < 1000; i++) {
yield {
xs: tf.randomNormal([784]),
ys: tf.oneHot(Math.floor(Math.random() * 10), 10)
};
}
}).batch(32);
const results = await model.evaluateDataset(testDataset, {
batches: 10 // Evaluate on first 10 batches only
});
if (Array.isArray(results)) {
console.log('Test loss:', results[0].dataSync()[0]);
console.log('Test accuracy:', results[1].dataSync()[0]);
results.forEach(scalar => scalar.dispose());
} else {
console.log('Test loss:', results.dataSync()[0]);
results.dispose();
}
Comparing Training vs. Evaluation Metrics
// After training, compare train and test performance
const trainResult = model.evaluate(trainXs, trainYs, {batchSize: 32});
const testResult = model.evaluate(testXs, testYs, {batchSize: 32});
const trainLoss = trainResult[0].dataSync()[0];
const trainAcc = trainResult[1].dataSync()[0];
const testLoss = testResult[0].dataSync()[0];
const testAcc = testResult[1].dataSync()[0];
console.log(`Train Loss: ${trainLoss.toFixed(4)}, Train Acc: ${trainAcc.toFixed(4)}`);
console.log(`Test Loss: ${testLoss.toFixed(4)}, Test Acc: ${testAcc.toFixed(4)}`);
if (trainAcc - testAcc > 0.1) {
console.log('Warning: Possible overfitting detected.');
}
// Dispose all results
trainResult.forEach(s => s.dispose());
testResult.forEach(s => s.dispose());
Important Notes
- The model must be compiled before calling evaluate() or evaluateDataset(). Calling these methods on an uncompiled model will throw an error.
- The returned Scalar objects are TensorFlow.js tensors and must be disposed after extracting their values to avoid memory leaks.
- The evaluate() method runs synchronously (returns Scalar or Scalar[] directly), while evaluateDataset() is asynchronous (returns a Promise).
- The order of returned scalars is always [loss, ...metrics], with metrics in the order they were specified during model.compile().
- During evaluation, layers like Dropout are automatically set to inference mode (no randomness), and BatchNormalization uses its learned running averages rather than batch statistics.
Related Pages
- Principle:Tensorflow_Tfjs_Model_Evaluation -- The principle this implementation realizes
- Implementation:Tensorflow_Tfjs_LayersModel_Predict -- For generating predictions without loss/metric computation
- Implementation:Tensorflow_Tfjs_LayersModel_Save -- For saving a model after evaluation confirms acceptable performance
Environments
- Environment:Tensorflow_Tfjs_Browser_Runtime -- Browser runtime (WebGL / WebGPU / WASM / CPU backends)
- Environment:Tensorflow_Tfjs_Node_Native_Runtime -- Node.js native runtime (TensorFlow C binding)