
Implementation:Tensorflow Tfjs LayersModel Evaluate

From Leeroopedia


Overview

Tensorflow_Tfjs_LayersModel_Evaluate documents the TensorFlow.js API for evaluating a trained model's performance on test data. It provides two methods: evaluate() for in-memory tensor data and evaluateDataset() for streaming dataset evaluation.

Principle:Tensorflow_Tfjs_Model_Evaluation

TensorFlow.js

Deep_Learning Model_Assessment

Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime

Type: API Doc

External Dependencies: @tensorflow/tfjs-core

API Signature

evaluate()

evaluate(
  x: Tensor | Tensor[],
  y: Tensor | Tensor[],
  args?: ModelEvaluateArgs
): Scalar | Scalar[]

evaluateDataset()

async evaluateDataset<T>(
  dataset: Dataset<T> | LazyIterator<T>,
  args: ModelEvaluateDatasetArgs
): Promise<Scalar | Scalar[]>

ModelEvaluateArgs

  • batchSize (number, default: 32) — Number of samples per evaluation batch.
  • verbose (ModelLoggingVerbosity) — Verbosity level for logging during evaluation.
  • steps (number) — Total number of steps (batches) before declaring evaluation complete. If not specified, evaluation runs over the entire dataset.
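The batchSize value determines how many forward passes evaluation performs: for N samples, the model runs ceil(N / batchSize) batches, and the final batch may be smaller than batchSize. A quick sketch of that arithmetic in plain JavaScript (illustration only, independent of tfjs):

```javascript
// Number of evaluation batches for a given sample count and batch size.
// The last batch may be partial: e.g. 100 samples at batchSize 32 yields
// batches of 32, 32, 32, and 4.
function numBatches(numSamples, batchSize) {
  return Math.ceil(numSamples / batchSize);
}

console.log(numBatches(100, 32)); // 4
console.log(numBatches(64, 32));  // 2
```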

ModelEvaluateDatasetArgs

  • batches (number) — Number of batches to draw from the dataset. If not specified, iteration continues until the dataset is exhausted.
  • verbose (ModelLoggingVerbosity) — Verbosity level for logging during evaluation.

Code Reference

Source files:

  • tfjs-layers/src/engine/training.ts — Lines 840-863 (evaluate method)
  • tfjs-layers/src/engine/training_dataset.ts — Lines 533-614 (evaluateDataset method)

The evaluate method internally calls the model's forward pass in inference mode (no dropout, uses running batch normalization statistics) and computes the compiled loss and metrics over the provided tensors. It processes data in batches of the specified size.

The evaluateDataset method works similarly but pulls batches from a tf.data.Dataset pipeline, enabling evaluation of datasets that exceed available memory.
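The memory advantage comes from pulling one batch at a time rather than materializing the full test set. A minimal plain-JavaScript sketch of that pull-based pattern (a hypothetical helper for illustration, not the tfjs implementation):

```javascript
// Pull-based batching: the consumer draws fixed-size batches from a
// (possibly unbounded) generator, so only one batch is in memory at a time.
function* batches(source, batchSize) {
  let batch = [];
  for (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // final partial batch
}

// Stand-in for a data source too large to hold in memory at once.
function* samples(n) {
  for (let i = 0; i < n; i++) yield i;
}

// Accumulate a per-sample quantity batch by batch, the way
// evaluateDataset accumulates its loss and metrics.
let total = 0, count = 0;
for (const b of batches(samples(10), 4)) {
  for (const x of b) { total += x; count++; }
}
console.log(total / count); // 4.5
```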

Import

import * as tf from '@tensorflow/tfjs';

I/O Contract

Inputs

  • Model (LayersModel) — A trained and compiled model with loss and metrics defined.
  • x, for evaluate (Tensor | Tensor[]) — Test input data: a single tensor for single-input models, or an array of tensors for multi-input models.
  • y, for evaluate (Tensor | Tensor[]) — Test target (label) data. Must match the shape expected by the model's loss function.
  • dataset, for evaluateDataset (Dataset<T> | LazyIterator<T>) — A dataset yielding batches of {xs, ys} pairs for streaming evaluation.
  • args (ModelEvaluateArgs / ModelEvaluateDatasetArgs) — Configuration for batch size, verbosity, and step count.

Outputs

  • Single metric (Scalar) — If the model was compiled with only a loss (no additional metrics), returns a single Scalar representing the test loss.
  • Multiple metrics (Scalar[]) — If the model has additional metrics, returns an array whose first element is the loss, followed by the metric values in the order they were specified during compilation.

Call .dataSync() on any returned Scalar to extract its numeric value as a one-element Float32Array; read element [0] to get a plain number.
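Because the return type is Scalar or Scalar[] depending on the compiled metrics, calling code often normalizes the result before extracting numbers. A hedged sketch of such a helper (hypothetical; shown here with a mock Scalar object so it runs without tfjs — real code would pass the result of model.evaluate() directly):

```javascript
// Normalize Scalar | Scalar[] into an array of plain numbers,
// disposing each tensor after its value is read.
function extractAndDispose(result) {
  const scalars = Array.isArray(result) ? result : [result];
  return scalars.map(s => {
    const value = s.dataSync()[0];
    s.dispose();
    return value;
  });
}

// Mock Scalar standing in for a tfjs tensor (assumption: it only needs
// dataSync() and dispose() for this sketch).
const mockScalar = v => ({
  dataSync: () => new Float32Array([v]),
  dispose: () => {},
});

console.log(extractAndDispose([mockScalar(0.25), mockScalar(0.5)])); // [ 0.25, 0.5 ]
```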

Usage Examples

Basic Evaluation with Tensors

// Assume model is already trained and compiled with:
//   loss: 'categoricalCrossentropy', metrics: ['accuracy']

const testXs = tf.randomNormal([100, 784]);  // 100 test samples, 784 features
const testYs = tf.oneHot(tf.randomUniform([100], 0, 10, 'int32'), 10);

const [loss, accuracy] = model.evaluate(testXs, testYs, {batchSize: 32});
console.log('Test loss:', loss.dataSync()[0]);
console.log('Test accuracy:', accuracy.dataSync()[0]);

// Dispose tensors to free memory
loss.dispose();
accuracy.dispose();
testXs.dispose();
testYs.dispose();

Evaluation with Custom Batch Size

// Use a smaller batch size for memory-constrained environments
const result = model.evaluate(testXs, testYs, {
  batchSize: 8,
  verbose: 1
});

if (Array.isArray(result)) {
  console.log('Loss:', result[0].dataSync()[0]);
  console.log('Accuracy:', result[1].dataSync()[0]);
  result.forEach(scalar => scalar.dispose());
} else {
  console.log('Loss:', result.dataSync()[0]);
  result.dispose();
}

Evaluation with Dataset (Streaming)

// Create a dataset for large test sets
const testDataset = tf.data.generator(function* () {
  for (let i = 0; i < 1000; i++) {
    yield {
      xs: tf.randomNormal([784]),
      ys: tf.oneHot(Math.floor(Math.random() * 10), 10)
    };
  }
}).batch(32);

const results = await model.evaluateDataset(testDataset, {
  batches: 10  // Evaluate on first 10 batches only
});

if (Array.isArray(results)) {
  console.log('Test loss:', results[0].dataSync()[0]);
  console.log('Test accuracy:', results[1].dataSync()[0]);
  results.forEach(scalar => scalar.dispose());
} else {
  console.log('Test loss:', results.dataSync()[0]);
  results.dispose();
}

Comparing Training vs. Evaluation Metrics

// After training, compare train and test performance
const trainResult = model.evaluate(trainXs, trainYs, {batchSize: 32});
const testResult = model.evaluate(testXs, testYs, {batchSize: 32});

const trainLoss = trainResult[0].dataSync()[0];
const trainAcc = trainResult[1].dataSync()[0];
const testLoss = testResult[0].dataSync()[0];
const testAcc = testResult[1].dataSync()[0];

console.log(`Train Loss: ${trainLoss.toFixed(4)}, Train Acc: ${trainAcc.toFixed(4)}`);
console.log(`Test  Loss: ${testLoss.toFixed(4)}, Test  Acc: ${testAcc.toFixed(4)}`);

if (trainAcc - testAcc > 0.1) {
  console.log('Warning: Possible overfitting detected.');
}

// Dispose all results
trainResult.forEach(s => s.dispose());
testResult.forEach(s => s.dispose());

Important Notes

  • The model must be compiled before calling evaluate() or evaluateDataset(). Calling these methods on an uncompiled model will throw an error.
  • The returned Scalar objects are TensorFlow.js tensors and must be disposed after extracting their values to avoid memory leaks.
  • The evaluate() method runs synchronously (returns Scalar or Scalar[] directly), while evaluateDataset() is asynchronous (returns a Promise).
  • The order of returned scalars is always: [loss, ...metrics] in the order metrics were specified during model.compile().
  • During evaluation, layers like Dropout are automatically set to inference mode (no randomness), and BatchNormalization uses its learned running averages rather than batch statistics.
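Since results always arrive as [loss, ...metrics] in compile order, pairing them with the names used at compile time yields a readable report. A plain-JavaScript sketch of that pairing (the metricNames array is an assumption mirroring whatever was passed to model.compile()):

```javascript
// Pair ordered evaluation results with their metric names.
// Position 0 is always the loss; the rest follow compile order.
function labelResults(values, metricNames) {
  const labels = ['loss', ...metricNames];
  return Object.fromEntries(labels.map((name, i) => [name, values[i]]));
}

console.log(labelResults([0.42, 0.91], ['accuracy']));
// { loss: 0.42, accuracy: 0.91 }
```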


2026-02-10 00:00 GMT
