Implementation:Tensorflow Tfjs GPT2Backbone Constructor
Summary
GPT2Backbone constructs the core GPT-2 transformer architecture in TensorFlow.js: token embeddings, learned positional embeddings, a stack of numLayers transformer decoder blocks, and a final layer normalization. The decoder blocks are TransformerDecoder layers and the positional encoding is a PositionEmbedding layer.
API
new GPT2Backbone(args: GPT2BackboneArgs) + TransformerDecoder + PositionEmbedding
Source
- tfjs-layers/src/layers/nlp/models/gpt2/gpt2_backbone.ts:L125-221 (GPT2Backbone)
- tfjs-layers/src/layers/nlp/modeling/transformer_decoder.ts:L206-494 (TransformerDecoder)
- tfjs-layers/src/layers/nlp/modeling/position_embedding.ts:L86-147 (PositionEmbedding)
Type
API Doc
Signatures
GPT2Backbone
```ts
interface GPT2BackboneArgs {
  vocabularySize: number;
  numLayers: number;
  numHeads: number;
  hiddenDim: number;
  intermediateDim: number;
  dropout?: number;            // default 0.1
  maxSequenceLength?: number;  // default 1024
}

class GPT2Backbone extends Backbone {
  constructor(args: GPT2BackboneArgs)
  get tokenEmbedding(): Embedding
}
```
TransformerDecoder
```ts
interface TransformerDecoderArgs extends LayerArgs {
  intermediateDim: number;
  numHeads: number;
  dropout?: number;
  activation?: Activation|ActivationIdentifier;
  layerNormEpsilon?: number;
  normalizeFirst?: boolean;
}

class TransformerDecoder extends Layer {
  call(decoderSequence: Tensor, kwargs: TransformerDecoderOptions): Tensor
  callAndReturnCaches(decoderSequence: Tensor, kwargs: TransformerDecoderOptions):
      [Tensor, Tensor, Tensor]
}
```
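A minimal usage sketch for a single decoder block at GPT-2 "base" width. The import path points into the tfjs-layers source tree (see Source above) and is an assumption; these NLP layers are not necessarily exported from the public tfjs API. Calling `apply` builds the layer's weights on first use before delegating to `call`, and omitting the mask options assumes the layer defaults to plain causal self-attention:

```ts
import * as tf from '@tensorflow/tfjs';
import {TransformerDecoder} from 'tfjs-layers/src/layers/nlp/modeling/transformer_decoder';

// One decoder block at GPT-2 "base" width.
const decoder = new TransformerDecoder({intermediateDim: 3072, numHeads: 12});
const x = tf.randomNormal([2, 16, 768]);  // [batch, seqLen, hiddenDim]
const y = decoder.apply(x) as tf.Tensor;  // causal self-attention + FFN
console.log(y.shape);                     // [2, 16, 768]
```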
PositionEmbedding
```ts
interface PositionEmbeddingArgs extends LayerArgs {
  sequenceLength: number;
  initializer?: Initializer|InitializerIdentifier;
}

class PositionEmbedding extends Layer {
  call(inputs: Tensor|Tensor[], kwargs?: PositionEmbeddingOptions): Tensor
}
```
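A sketch of adding learned positions to token embeddings. It assumes, as in the Keras NLP layer this port mirrors, that PositionEmbedding returns position embeddings matching the input's sequence length and leaves the addition to the caller; the import path is the same source-tree assumption as above:

```ts
import * as tf from '@tensorflow/tfjs';
import {PositionEmbedding} from 'tfjs-layers/src/layers/nlp/modeling/position_embedding';

// Learned position embeddings sized for GPT-2's 1024-token context.
const positionEmbedding = new PositionEmbedding({sequenceLength: 1024});
const tokenEmbeddings = tf.randomNormal([2, 16, 768]);  // [batch, seqLen, hiddenDim]
const positions = positionEmbedding.apply(tokenEmbeddings) as tf.Tensor;
const combined = tf.add(tokenEmbeddings, positions);    // [2, 16, 768]
```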
Constructor Parameters
GPT2BackboneArgs
| Parameter | Type | Default | Description |
|---|---|---|---|
| vocabularySize | number | (required) | Number of tokens in the vocabulary (e.g., 50257 for GPT-2) |
| numLayers | number | (required) | Number of transformer decoder blocks |
| numHeads | number | (required) | Number of attention heads per decoder block |
| hiddenDim | number | (required) | Dimensionality of the hidden representations |
| intermediateDim | number | (required) | Inner dimension of the feed-forward network in each block |
| dropout | number | 0.1 | Dropout rate for regularization |
| maxSequenceLength | number | 1024 | Maximum sequence length for positional embeddings |
TransformerDecoderArgs
| Parameter | Type | Default | Description |
|---|---|---|---|
| intermediateDim | number | (required) | Inner dimension of the feed-forward network |
| numHeads | number | (required) | Number of attention heads |
| dropout | number | undefined | Dropout rate |
| activation | Activation\|ActivationIdentifier | undefined | Activation function for the FFN |
| layerNormEpsilon | number | undefined | Epsilon value for layer normalization |
| normalizeFirst | boolean | undefined | Whether to apply layer norm before (pre-norm) or after (post-norm) each sub-layer |
PositionEmbeddingArgs
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequenceLength | number | (required) | Maximum sequence length for positional encoding |
| initializer | Initializer\|InitializerIdentifier | undefined | Weight initializer for position embeddings |
Properties
| Property | Return Type | Description |
|---|---|---|
| tokenEmbedding | Embedding | The token embedding layer (used for weight tying in the LM head) |
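The weight tying mentioned above can be sketched as follows. `tiedLogits` is a hypothetical helper, not part of the API; it assumes the embedding weight matrix has shape `[vocabularySize, hiddenDim]` and that `GPT2Backbone` is imported as in the Example below:

```ts
import * as tf from '@tensorflow/tfjs';

// Hypothetical helper: reuse the backbone's token embedding matrix as
// the output projection (logits = hidden x embedding^T).
function tiedLogits(backbone: GPT2Backbone, hiddenStates: tf.Tensor3D): tf.Tensor3D {
  const matrix = backbone.tokenEmbedding.getWeights()[0] as tf.Tensor2D;  // [vocab, hidden]
  const [batch, seqLen, hiddenDim] = hiddenStates.shape;
  const flat = hiddenStates.reshape([batch * seqLen, hiddenDim]) as tf.Tensor2D;
  return tf.matMul(flat, matrix, false, /* transposeB= */ true)
      .reshape([batch, seqLen, -1]) as tf.Tensor3D;  // [batch, seqLen, vocab]
}
```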
Methods
| Class | Method | Description |
|---|---|---|
| TransformerDecoder | call(decoderSequence, kwargs) | Forward pass through a single decoder block |
| TransformerDecoder | callAndReturnCaches(decoderSequence, kwargs) | Forward pass that also returns KV caches for autoregressive generation |
| PositionEmbedding | call(inputs, kwargs) | Adds positional embeddings to input token embeddings |
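A hedged sketch of cached single-step decoding with callAndReturnCaches, reusing the `decoder` from the Signatures sketch above. The option names (selfAttentionCache, selfAttentionCacheUpdateIndex), the cache layout, and the third returned tensor (a cross-attention cache, unused for GPT-2) are assumptions based on the Keras NLP layer this code ports; verify against TransformerDecoderOptions in the source:

```ts
import * as tf from '@tensorflow/tfjs';

const numHeads = 12, headDim = 768 / numHeads;
// Assumed cache layout: [batch, 2 (key/value), maxSeqLen, numHeads, headDim].
let cache = tf.zeros([2, 2, 1024, numHeads, headDim]);
const step = tf.randomNormal([2, 1, 768]);  // embedding of the newest token
const [out, nextCache] = decoder.callAndReturnCaches(step, {
  selfAttentionCache: cache,
  selfAttentionCacheUpdateIndex: 0,
});
cache = nextCache;  // feed back on the next step; `out` is [2, 1, 768]
```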
I/O
- Inputs: Architecture hyperparameters (vocabulary size, number of layers, heads, dimensions)
- Outputs: A `GPT2Backbone` model that accepts `{token_ids, padding_mask}` inputs and produces sequence hidden states of shape `[batch, seq_len, hidden_dim]`
Example
```ts
// GPT2Backbone is defined in the tfjs-layers source tree (see Source
// above); it may not be exported from the public @tensorflow/tfjs API.
import {GPT2Backbone} from 'tfjs-layers/src/layers/nlp/models/gpt2/gpt2_backbone';

// GPT-2 "base" configuration (~124M parameters).
const backbone = new GPT2Backbone({
  vocabularySize: 50257,    // GPT-2 BPE vocabulary size
  numLayers: 12,            // transformer decoder blocks
  numHeads: 12,             // attention heads per block
  hiddenDim: 768,           // hidden representation width
  intermediateDim: 3072,    // FFN inner dimension (4x hiddenDim)
  dropout: 0.1,
  maxSequenceLength: 1024,
});
```
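A hedged follow-up showing a forward pass with the `backbone` constructed above. It assumes the backbone behaves as a LayersModel whose predict accepts a map keyed by the input names from the I/O section:

```ts
import * as tf from '@tensorflow/tfjs';

const tokenIds = tf.ones([2, 16], 'int32');     // [batch, seqLen]
const paddingMask = tf.ones([2, 16], 'int32');  // 1 = real token, 0 = padding
const hidden = backbone.predict({
  token_ids: tokenIds,
  padding_mask: paddingMask,
}) as tf.Tensor;
console.log(hidden.shape);                      // [2, 16, 768]
```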
Implements
Principle:Tensorflow_Tfjs_Transformer_Backbone_Construction
Environment:Tensorflow_Tfjs_Browser_Runtime
Environments
- Environment:Tensorflow_Tfjs_Browser_Runtime -- Browser runtime (WebGL / WebGPU / WASM / CPU backends)