
Implementation:Lucidrains X transformers XTransformer Init

From Leeroopedia


Metadata

Field Value
Repository x-transformers
Domains NLP, Model_Architecture
Last Updated 2026-02-08 18:00 GMT

Overview

A concrete tool, provided by the x-transformers library, for configuring encoder-decoder sequence-to-sequence transformer models.

Description

XTransformer combines an encoder TransformerWrapper and a decoder TransformerWrapper (wrapped with AutoregressiveWrapper) into a single module. It accepts all configuration via prefixed keyword arguments: enc_* for encoder settings and dec_* for decoder settings.

Key behaviors of __init__:

  • The shared dim parameter sets the model dimension for both encoder and decoder.
  • All keyword arguments prefixed with enc_ are extracted and forwarded to the encoder TransformerWrapper and its inner Encoder (AttentionLayers with causal=False).
  • All keyword arguments prefixed with dec_ are extracted and forwarded to the decoder TransformerWrapper and its inner Decoder (AttentionLayers with causal=True, cross_attend=True).
  • The encoder is internally configured with return_only_embed=True, so it outputs hidden states rather than logits.
  • The decoder is wrapped in AutoregressiveWrapper for automatic input/target splitting and loss computation.
  • tie_token_emb -- When True, the encoder and decoder share the same token embedding matrix. Useful when source and target vocabularies are identical (e.g., copy tasks, monolingual summarization).
  • cross_attn_tokens_dropout -- Applies dropout to cross-attention tokens during training as a regularization strategy. A fraction of encoder hidden states are randomly dropped before being passed to the decoder cross-attention layers.
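The prefix-splitting behavior described above can be sketched in plain Python. The helper name below is illustrative, not the library's exact API; x-transformers uses an internal utility with the same effect:

```python
def group_by_prefix_and_trim(prefix, kwargs):
    """Split kwargs into (those starting with prefix, prefix stripped) and the rest."""
    matched = {k[len(prefix):]: v for k, v in kwargs.items() if k.startswith(prefix)}
    rest = {k: v for k, v in kwargs.items() if not k.startswith(prefix)}
    return matched, rest

kwargs = dict(enc_depth = 3, enc_heads = 8, dec_depth = 3, dec_heads = 8)

# enc_* kwargs go to the encoder TransformerWrapper, dec_* to the decoder
enc_kwargs, kwargs = group_by_prefix_and_trim('enc_', kwargs)
dec_kwargs, kwargs = group_by_prefix_and_trim('dec_', kwargs)

print(enc_kwargs)  # {'depth': 3, 'heads': 8}
print(dec_kwargs)  # {'depth': 3, 'heads': 8}
```

The shared dim is passed once and applied to both halves, so only the settings that genuinely differ between encoder and decoder need prefixes.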

Usage

Import XTransformer when building sequence-to-sequence models. Configure encoder and decoder separately via prefixed parameters. Use for machine translation, summarization, copy tasks, or any input-to-output sequence transduction.

Code Reference

Field Value
Repository x-transformers
File x_transformers/x_transformers.py
Lines 3830-3873

Signature:

class XTransformer(Module):
    def __init__(
        self,
        *,
        dim,
        tie_token_emb = False,
        ignore_index = -100,
        pad_value = 0,
        cross_attn_tokens_dropout = 0.,
        **kwargs  # enc_* and dec_* prefixed params
    ):

Import:

from x_transformers import XTransformer

I/O Contract

Inputs

Name Type Required Description
dim int Yes Shared model dimension for encoder and decoder
tie_token_emb bool No Tie encoder/decoder token embeddings (default False)
enc_num_tokens int Yes Encoder vocabulary size
enc_depth int Yes Number of encoder layers
enc_heads int Yes Number of encoder attention heads
enc_max_seq_len int Yes Maximum encoder sequence length
dec_num_tokens int Yes Decoder vocabulary size
dec_depth int Yes Number of decoder layers
dec_heads int Yes Number of decoder attention heads
dec_max_seq_len int Yes Maximum decoder sequence length
cross_attn_tokens_dropout float No Dropout rate for cross-attention tokens (default 0.)

Outputs

Name Type Description
model XTransformer Module with .encoder (TransformerWrapper) and .decoder (AutoregressiveWrapper)

Usage Examples

Copy Task (from train_copy.py)

from x_transformers import XTransformer

model = XTransformer(
    dim = 128,
    tie_token_emb = True,
    return_tgt_loss = True,
    enc_num_tokens = 18,
    enc_depth = 3,
    enc_heads = 8,
    enc_max_seq_len = 32,
    dec_num_tokens = 18,
    dec_depth = 3,
    dec_heads = 8,
    dec_max_seq_len = 65
).cuda()

Larger Translation Model

model = XTransformer(
    dim = 512,
    enc_num_tokens = 30000,
    enc_depth = 6,
    enc_heads = 8,
    enc_max_seq_len = 512,
    dec_num_tokens = 30000,
    dec_depth = 6,
    dec_heads = 8,
    dec_max_seq_len = 512,
    tie_token_emb = True,
    cross_attn_tokens_dropout = 0.1
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
