
Implementation:Tensorflow Tfjs GPT2Preprocessor Constructor

From Leeroopedia


Summary

GPT2Preprocessor prepares raw text for GPT-2 model input by tokenizing, adding start/end tokens, padding, and generating attention masks. It combines GPT2Tokenizer for tokenization with StartEndPacker for sequence packing into fixed-length tensors.

API

new GPT2Preprocessor(args: GPT2PreprocessorArgs), which internally composes the supplied GPT2Tokenizer with a StartEndPacker

Source

  • tfjs-layers/src/layers/nlp/models/gpt2/gpt2_preprocessor.ts:L126-218 (GPT2Preprocessor)
  • tfjs-layers/src/layers/nlp/preprocessing/start_end_packer.ts:L86-198 (StartEndPacker)

Type

API Doc

Signatures

GPT2Preprocessor

interface GPT2PreprocessorArgs extends LayerArgs {
  tokenizer: GPT2Tokenizer;
  sequenceLength?: number;  // default 1024
  addStartToken?: boolean;  // default true
  addEndToken?: boolean;    // default true
}

class GPT2Preprocessor extends Preprocessor {
  constructor(args: GPT2PreprocessorArgs)
  call(inputs: Tensor|Tensor[], kwargs: GPT2PreprocessorOptions): Tensor|Tensor[]
  callAndPackArgs(inputs, kwargs): NamedTensorMap | [NamedTensorMap, Tensor] | [NamedTensorMap, Tensor, Tensor]
}

StartEndPacker

interface StartEndPackerArgs extends LayerArgs {
  sequenceLength: number;
  startValue?: number|string;
  endValue?: number|string;
  padValue?: number|string;
}

class StartEndPacker extends Layer {
  call(inputs: Tensor|Tensor[], kwargs?: StartEndPackerOptions): Tensor|Tensor2D
  callAndReturnPaddingMask(inputs, kwargs?): [Tensor1D|Tensor2D, Tensor1D|Tensor2D]
}

Constructor Parameters

GPT2PreprocessorArgs

Parameter       Type           Default     Description
tokenizer       GPT2Tokenizer  (required)  The GPT-2 tokenizer instance used to encode text
sequenceLength  number         1024        Target length for all output sequences
addStartToken   boolean        true        Whether to prepend the start-of-sequence token
addEndToken     boolean        true        Whether to append the end-of-sequence token

StartEndPackerArgs

Parameter       Type           Default     Description
sequenceLength  number         (required)  Fixed output length for packed sequences
startValue      number|string  undefined   Value to prepend as the start token
endValue        number|string  undefined   Value to append as the end token
padValue        number|string  undefined   Value used to pad positions beyond the input
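The packing behavior these parameters control can be illustrated with a minimal plain-TypeScript sketch. This is a hypothetical helper, not the tfjs implementation, and it assumes that sequences exceeding sequenceLength are truncated after the start/end values are attached:

```typescript
// Hypothetical sketch of StartEndPacker's packing step (not the tfjs source).
// Prepends startValue, appends endValue, then truncates or right-pads the
// result to a fixed sequenceLength, mirroring StartEndPackerArgs above.
function pack(
  tokens: number[],
  sequenceLength: number,
  startValue?: number,
  endValue?: number,
  padValue: number = 0,
): number[] {
  let seq = [...tokens];
  if (startValue !== undefined) seq = [startValue, ...seq];
  if (endValue !== undefined) seq = [...seq, endValue];
  // Truncate to the fixed length (assumed behavior), then right-pad.
  seq = seq.slice(0, sequenceLength);
  while (seq.length < sequenceLength) seq.push(padValue);
  return seq;
}

// Example with illustrative values start=1, end=2, pad=0:
// pack([7454, 2402], 6, 1, 2, 0) → [1, 7454, 2402, 2, 0, 0]
```

The start, end, and pad values here are placeholders; the actual GPT-2 preprocessor supplies the tokenizer's own special-token ids.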

Methods

Method: call
Signature: call(inputs: Tensor|Tensor[], kwargs: GPT2PreprocessorOptions): Tensor|Tensor[]
Description: Preprocesses input text tensors.

Method: callAndPackArgs
Signature: callAndPackArgs(inputs, kwargs): NamedTensorMap | [NamedTensorMap, Tensor] | [NamedTensorMap, Tensor, Tensor]
Description: Preprocesses inputs and packs them into a named tensor map containing token IDs and a padding mask.

I/O

  • Inputs: raw text strings as a Tensor of dtype 'string' (the GPT2Tokenizer is supplied at construction time, not per call)
  • Outputs: NamedTensorMap containing:
    • tokenIds: Tensor2D of padded token IDs
    • paddingMask: Tensor2D where 1 = real token, 0 = padding
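The padding mask pairs one flag with each position of tokenIds. A hypothetical plain-TypeScript sketch of how such a mask is determined, given the number of real tokens (including start/end) in a row padded to sequenceLength:

```typescript
// Hypothetical helper (not the tfjs source): build a padding mask for a
// packed row holding `realLength` actual tokens, right-padded (or truncated)
// to `sequenceLength`. 1 marks a real token, 0 marks padding.
function paddingMask(realLength: number, sequenceLength: number): number[] {
  return Array.from(
    {length: sequenceLength},
    (_, i) => (i < realLength ? 1 : 0),
  );
}

// paddingMask(6, 8) → [1, 1, 1, 1, 1, 1, 0, 0]
```

Masking by position rather than by pad value matters because the pad id (commonly 0) can also be a valid vocabulary token.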

Example

const preprocessor = new GPT2Preprocessor({
  tokenizer: tokenizer,
  sequenceLength: 128,
  addStartToken: true,
  addEndToken: true,
});

const text = tf.tensor1d(['Once upon a time'], 'string');
const processed = preprocessor.callAndPackArgs(text, {}) as NamedTensorMap;
// GPT-2 uses <|endoftext|> (id 50256) as both start and end token:
// processed.tokenIds (shape [1, 128]):    [[50256, 7454, 2402, 257, 640, 50256, 0, 0, ...]]
// processed.paddingMask (shape [1, 128]): [[1,     1,    1,    1,   1,   1,     0, 0, ...]]

Implements

Principle:Tensorflow_Tfjs_Sequence_Preprocessing

Environment:Tensorflow_Tfjs_Browser_Runtime

Domains

NLP Data_Preprocessing

Sources

TensorFlow.js

Metadata

2026-02-10 00:00 GMT
