
Implementation:Tensorflow Tfjs LayersModel Evaluate

From Leeroopedia


Overview

Tensorflow_Tfjs_LayersModel_Evaluate documents the TensorFlow.js API for evaluating a trained model's performance on test data. It provides two methods: evaluate() for in-memory tensor data and evaluateDataset() for streaming dataset evaluation.

Principle:Tensorflow_Tfjs_Model_Evaluation

TensorFlow.js

Deep_Learning Model_Assessment

Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime

Type: API Doc

External Dependencies: @tensorflow/tfjs-core

API Signature

evaluate()

evaluate(
  x: Tensor | Tensor[],
  y: Tensor | Tensor[],
  args?: ModelEvaluateArgs
): Scalar | Scalar[]

evaluateDataset()

async evaluateDataset<T>(
  dataset: Dataset<T> | LazyIterator<T>,
  args: ModelEvaluateDatasetArgs
): Promise<Scalar | Scalar[]>

ModelEvaluateArgs

  • batchSize (number, default: 32) — Number of samples per evaluation batch.
  • verbose (ModelLoggingVerbosity) — Verbosity level for logging during evaluation.
  • steps (number) — Total number of steps (batches) before declaring evaluation complete. If not specified, evaluation runs over the entire dataset.
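The batchSize value determines how many forward passes evaluation performs: for N samples, the model runs ceil(N / batchSize) batches, and the final batch may be smaller than batchSize. A quick sketch of that arithmetic in plain JavaScript (illustration only, independent of tfjs):

```javascript
// Number of evaluation batches for a given sample count and batch size.
// The last batch may be partial: e.g. 100 samples at batchSize 32 yields
// batches of 32, 32, 32, and 4.
function numBatches(numSamples, batchSize) {
  return Math.ceil(numSamples / batchSize);
}

console.log(numBatches(100, 32)); // 4
console.log(numBatches(64, 32));  // 2
```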

ModelEvaluateDatasetArgs

  • batches (number) — Number of batches to draw from the dataset. If not specified, iteration continues until the dataset is exhausted.
  • verbose (ModelLoggingVerbosity) — Verbosity level for logging during evaluation.

Code Reference

Source files:

  • tfjs-layers/src/engine/training.ts — Lines 840-863 (evaluate method)
  • tfjs-layers/src/engine/training_dataset.ts — Lines 533-614 (evaluateDataset method)

The evaluate method internally calls the model's forward pass in inference mode (no dropout, uses running batch normalization statistics) and computes the compiled loss and metrics over the provided tensors. It processes data in batches of the specified size.

The evaluateDataset method works similarly but pulls batches from a tf.data.Dataset pipeline, enabling evaluation of datasets that exceed available memory.
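The memory advantage comes from pulling one batch at a time rather than materializing the full test set. A minimal plain-JavaScript sketch of that pull-based pattern (a hypothetical helper for illustration, not the tfjs implementation):

```javascript
// Pull-based batching: the consumer draws fixed-size batches from a
// (possibly unbounded) generator, so only one batch is in memory at a time.
function* batches(source, batchSize) {
  let batch = [];
  for (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // final partial batch
}

// Stand-in for a data source too large to hold in memory at once.
function* samples(n) {
  for (let i = 0; i < n; i++) yield i;
}

// Accumulate a per-sample quantity batch by batch, the way
// evaluateDataset accumulates its loss and metrics.
let total = 0, count = 0;
for (const b of batches(samples(10), 4)) {
  for (const x of b) { total += x; count++; }
}
console.log(total / count); // 4.5
```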

Import

import * as tf from '@tensorflow/tfjs';

I/O Contract

Inputs

  • Model (LayersModel) — A trained and compiled model with loss and metrics defined.
  • x, for evaluate (Tensor | Tensor[]) — Test input data: a single tensor for single-input models, or an array of tensors for multi-input models.
  • y, for evaluate (Tensor | Tensor[]) — Test target (label) data. Must match the shape expected by the model's loss function.
  • dataset, for evaluateDataset (Dataset<T> | LazyIterator<T>) — A dataset yielding batches of {xs, ys} pairs for streaming evaluation.
  • args (ModelEvaluateArgs / ModelEvaluateDatasetArgs) — Configuration for batch size, verbosity, and step count.

Outputs

  • Single metric (Scalar) — If the model was compiled with only a loss (no additional metrics), returns a single Scalar representing the test loss.
  • Multiple metrics (Scalar[]) — If the model has additional metrics, returns an array whose first element is the loss, followed by the metric values in the order they were specified during compilation.

Call .dataSync() on any returned Scalar to extract its numeric value as a one-element Float32Array; read element [0] to get a plain number.
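Because the return type is Scalar or Scalar[] depending on the compiled metrics, calling code often normalizes the result before extracting numbers. A hedged sketch of such a helper (hypothetical; shown here with a mock Scalar object so it runs without tfjs — real code would pass the result of model.evaluate() directly):

```javascript
// Normalize Scalar | Scalar[] into an array of plain numbers,
// disposing each tensor after its value is read.
function extractAndDispose(result) {
  const scalars = Array.isArray(result) ? result : [result];
  return scalars.map(s => {
    const value = s.dataSync()[0];
    s.dispose();
    return value;
  });
}

// Mock Scalar standing in for a tfjs tensor (assumption: it only needs
// dataSync() and dispose() for this sketch).
const mockScalar = v => ({
  dataSync: () => new Float32Array([v]),
  dispose: () => {},
});

console.log(extractAndDispose([mockScalar(0.25), mockScalar(0.5)])); // [ 0.25, 0.5 ]
```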

Usage Examples

Basic Evaluation with Tensors

// Assume model is already trained and compiled with:
//   loss: 'categoricalCrossentropy', metrics: ['accuracy']

const testXs = tf.randomNormal([100, 784]);  // 100 test samples, 784 features
const testYs = tf.oneHot(tf.randomUniform([100], 0, 10, 'int32'), 10);

const [loss, accuracy] = model.evaluate(testXs, testYs, {batchSize: 32});
console.log('Test loss:', loss.dataSync()[0]);
console.log('Test accuracy:', accuracy.dataSync()[0]);

// Dispose tensors to free memory
loss.dispose();
accuracy.dispose();
testXs.dispose();
testYs.dispose();

Evaluation with Custom Batch Size

// Use a smaller batch size for memory-constrained environments
const result = model.evaluate(testXs, testYs, {
  batchSize: 8,
  verbose: 1
});

if (Array.isArray(result)) {
  console.log('Loss:', result[0].dataSync()[0]);
  console.log('Accuracy:', result[1].dataSync()[0]);
  result.forEach(scalar => scalar.dispose());
} else {
  console.log('Loss:', result.dataSync()[0]);
  result.dispose();
}

Evaluation with Dataset (Streaming)

// Create a dataset for large test sets
const testDataset = tf.data.generator(function* () {
  for (let i = 0; i < 1000; i++) {
    yield {
      xs: tf.randomNormal([784]),
      ys: tf.oneHot(Math.floor(Math.random() * 10), 10)
    };
  }
}).batch(32);

const results = await model.evaluateDataset(testDataset, {
  batches: 10  // Evaluate on first 10 batches only
});

if (Array.isArray(results)) {
  console.log('Test loss:', results[0].dataSync()[0]);
  console.log('Test accuracy:', results[1].dataSync()[0]);
  results.forEach(scalar => scalar.dispose());
} else {
  console.log('Test loss:', results.dataSync()[0]);
  results.dispose();
}

Comparing Training vs. Evaluation Metrics

// After training, compare train and test performance
const trainResult = model.evaluate(trainXs, trainYs, {batchSize: 32});
const testResult = model.evaluate(testXs, testYs, {batchSize: 32});

const trainLoss = trainResult[0].dataSync()[0];
const trainAcc = trainResult[1].dataSync()[0];
const testLoss = testResult[0].dataSync()[0];
const testAcc = testResult[1].dataSync()[0];

console.log(`Train Loss: ${trainLoss.toFixed(4)}, Train Acc: ${trainAcc.toFixed(4)}`);
console.log(`Test  Loss: ${testLoss.toFixed(4)}, Test  Acc: ${testAcc.toFixed(4)}`);

if (trainAcc - testAcc > 0.1) {
  console.log('Warning: Possible overfitting detected.');
}

// Dispose all results
trainResult.forEach(s => s.dispose());
testResult.forEach(s => s.dispose());

Important Notes

  • The model must be compiled before calling evaluate() or evaluateDataset(). Calling these methods on an uncompiled model will throw an error.
  • The returned Scalar objects are TensorFlow.js tensors and must be disposed after extracting their values to avoid memory leaks.
  • The evaluate() method runs synchronously (returns Scalar or Scalar[] directly), while evaluateDataset() is asynchronous (returns a Promise).
  • The order of returned scalars is always: [loss, ...metrics] in the order metrics were specified during model.compile().
  • During evaluation, layers like Dropout are automatically set to inference mode (no randomness), and BatchNormalization uses its learned running averages rather than batch statistics.
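Since results always arrive as [loss, ...metrics] in compile order, pairing them with the names used at compile time yields a readable report. A plain-JavaScript sketch of that pairing (the metricNames array is an assumption mirroring whatever was passed to model.compile()):

```javascript
// Pair ordered evaluation results with their metric names.
// Position 0 is always the loss; the rest follow compile order.
function labelResults(values, metricNames) {
  const labels = ['loss', ...metricNames];
  return Object.fromEntries(labels.map((name, i) => [name, values[i]]));
}

console.log(labelResults([0.42, 0.91], ['accuracy']));
// { loss: 0.42, accuracy: 0.91 }
```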


2026-02-10 00:00 GMT
