Implementation: XTransformer `__init__` (lucidrains/x-transformers)
Metadata
| Field | Value |
|---|---|
| Repository | x-transformers |
| Domains | NLP, Model_Architecture |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
A concrete interface from the x-transformers library for configuring encoder-decoder sequence-to-sequence transformer models.
Description
XTransformer combines an encoder `TransformerWrapper` and a decoder `TransformerWrapper` (wrapped with `AutoregressiveWrapper`) into a single module. It accepts all configuration via prefixed keyword arguments: `enc_*` for encoder settings and `dec_*` for decoder settings.
Key behaviors of `__init__`:
- The shared `dim` parameter sets the model dimension for both encoder and decoder.
- All keyword arguments prefixed with `enc_` are extracted and forwarded to the encoder `TransformerWrapper` and its inner `Encoder` (`AttentionLayers` with `causal=False`).
- All keyword arguments prefixed with `dec_` are extracted and forwarded to the decoder `TransformerWrapper` and its inner `Decoder` (`AttentionLayers` with `causal=True`, `cross_attend=True`).
- The encoder is internally configured with `return_only_embed=True`, so it outputs hidden states rather than logits.
- The decoder is wrapped in `AutoregressiveWrapper` for automatic input/target splitting and loss computation.
- `tie_token_emb` -- when `True`, the encoder and decoder share the same token embedding matrix. Useful when source and target vocabularies are identical (e.g., copy tasks, monolingual summarization).
- `cross_attn_tokens_dropout` -- applies dropout to cross-attention tokens during training as a regularization strategy: a fraction of encoder hidden states is randomly dropped before being passed to the decoder cross-attention layers.
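The prefix-based routing of keyword arguments can be sketched as follows. `group_by_prefix` is a hypothetical helper written for illustration; it mirrors the behavior described above but is not the library's exact internal API:

```python
def group_by_prefix(prefix, kwargs):
    # Hypothetical helper: split kwargs into (prefix-stripped matches, rest).
    # Illustrates how XTransformer could route enc_*/dec_* settings;
    # not the library's exact internal function.
    matched = {k[len(prefix):]: v for k, v in kwargs.items() if k.startswith(prefix)}
    rest = {k: v for k, v in kwargs.items() if not k.startswith(prefix)}
    return matched, rest

kwargs = dict(enc_depth = 3, enc_heads = 8, dec_depth = 3, dec_max_seq_len = 65)
enc_kwargs, kwargs = group_by_prefix('enc_', kwargs)   # -> {'depth': 3, 'heads': 8}
dec_kwargs, kwargs = group_by_prefix('dec_', kwargs)   # -> {'depth': 3, 'max_seq_len': 65}
```

The stripped names (`depth`, `heads`, ...) are then forwarded to the respective `TransformerWrapper`, which is why the same setting can be given independently for each side.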
Usage
Import XTransformer when building sequence-to-sequence models. Configure encoder and decoder separately via prefixed parameters. Use for machine translation, summarization, copy tasks, or any input-to-output sequence transduction.
Code Reference
| Field | Value |
|---|---|
| Repository | x-transformers |
| File | x_transformers/x_transformers.py |
| Lines | L3830-3873 |
Signature:
```python
class XTransformer(Module):
    def __init__(
        self,
        *,
        dim,
        tie_token_emb = False,
        ignore_index = -100,
        pad_value = 0,
        cross_attn_tokens_dropout = 0.,
        **kwargs  # enc_* and dec_* prefixed params
    ):
```
Import:
```python
from x_transformers import XTransformer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `dim` | int | Yes | Shared model dimension for encoder and decoder |
| `tie_token_emb` | bool | No | Tie encoder/decoder token embeddings (default `False`) |
| `enc_num_tokens` | int | Yes | Encoder vocabulary size |
| `enc_depth` | int | Yes | Number of encoder layers |
| `enc_heads` | int | Yes | Number of encoder attention heads |
| `enc_max_seq_len` | int | Yes | Maximum encoder sequence length |
| `dec_num_tokens` | int | Yes | Decoder vocabulary size |
| `dec_depth` | int | Yes | Number of decoder layers |
| `dec_heads` | int | Yes | Number of decoder attention heads |
| `dec_max_seq_len` | int | Yes | Maximum decoder sequence length |
| `cross_attn_tokens_dropout` | float | No | Dropout rate for cross-attention tokens (default `0.`) |
Outputs
| Name | Type | Description |
|---|---|---|
| model | `XTransformer` | Module exposing `.encoder` (`TransformerWrapper`) and `.decoder` (`AutoregressiveWrapper`) |
Usage Examples
Copy Task (from train_copy.py)
```python
from x_transformers import XTransformer

model = XTransformer(
    dim = 128,
    tie_token_emb = True,
    return_tgt_loss = True,
    enc_num_tokens = 18,
    enc_depth = 3,
    enc_heads = 8,
    enc_max_seq_len = 32,
    dec_num_tokens = 18,
    dec_depth = 3,
    dec_heads = 8,
    dec_max_seq_len = 65
).cuda()
Larger Translation Model
```python
model = XTransformer(
    dim = 512,
    enc_num_tokens = 30000,
    enc_depth = 6,
    enc_heads = 8,
    enc_max_seq_len = 512,
    dec_num_tokens = 30000,
    dec_depth = 6,
    dec_heads = 8,
    dec_max_seq_len = 512,
    tie_token_emb = True,
    cross_attn_tokens_dropout = 0.1
)
```