
Principle:Gretelai Gretel synthetics LSTM Model Building

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, Deep_Learning, Recurrent_Neural_Networks
Last Updated 2026-02-14 19:00 GMT

Overview

LSTM model building is the process of constructing a multi-layer Long Short-Term Memory neural network architecture suitable for character-level or subword-level text generation.

Description

For synthetic text generation, the neural network must learn to predict the next token in a sequence given all preceding tokens. LSTM model building assembles the specific layer stack that accomplishes this. The architecture follows the pattern:

  1. Embedding layer: Converts discrete token IDs (integers) into dense vectors of a fixed dimension, enabling the network to learn distributed representations of tokens.
  2. Dropout layers: Applied after the embedding and between recurrent layers to prevent overfitting by randomly zeroing a fraction of activations during training.
  3. Stacked LSTM layers: Two LSTM layers with return_sequences=True and stateful=True form the recurrent backbone. Stacking two layers allows the model to learn hierarchical temporal patterns. The stateful flag means the hidden state from the last sample in one batch carries over to the first sample in the next batch, enabling the model to learn dependencies across batch boundaries.
  4. Dense output layer: A fully connected layer that maps the LSTM hidden state to logits over the entire vocabulary, producing unnormalized log-probabilities for each possible next token.

The model building step also handles the critical distinction between standard training and differentially private (DP) training. In standard mode, a conventional RMSprop optimizer is used. In DP mode, a privacy-aware optimizer wraps RMSprop with per-example gradient clipping and Gaussian noise injection, providing formal privacy guarantees at the cost of some model accuracy.

Usage

Use the model building step whenever:

  • Initializing a new LSTM model for training from scratch.
  • Rebuilding a model architecture to load pre-trained weights for inference.
  • Switching between standard and differentially private training modes.

Theoretical Basis

An LSTM cell maintains a cell state c_t and a hidden state h_t. At each time step t, given input x_t and previous states, the LSTM computes:

f_t = sigmoid(W_f * [h_{t-1}, x_t] + b_f)        # forget gate
i_t = sigmoid(W_i * [h_{t-1}, x_t] + b_i)        # input gate
c_tilde = tanh(W_c * [h_{t-1}, x_t] + b_c)       # candidate cell state
c_t = f_t * c_{t-1} + i_t * c_tilde               # new cell state
o_t = sigmoid(W_o * [h_{t-1}, x_t] + b_o)        # output gate
h_t = o_t * tanh(c_t)                             # new hidden state
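The gate equations above can be sketched directly in NumPy. This is a minimal single-timestep illustration; the packed weight layout (all four gates stacked in one matrix) and shapes are assumptions for the sketch, not gretel-synthetics code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the concatenation [h_{t-1}, x_t] to all
    four gates at once: W has shape (4*hidden, hidden+input_dim)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * hidden:1 * hidden])       # forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])       # input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell state
    o_t = sigmoid(z[3 * hidden:4 * hidden])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde            # new cell state
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, input_dim = 4, 3
W = rng.normal(size=(4 * hidden, hidden + input_dim))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=input_dim), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because h_t is a tanh output scaled by a sigmoid gate, every entry of the hidden state stays strictly inside (-1, 1).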

The full model architecture used in gretel-synthetics can be described as:

Input token IDs: shape [batch_size, seq_length]
        |
Embedding(vocab_size, embedding_dim)  -> shape [batch_size, seq_length, embedding_dim]
        |
Dropout(dropout_rate)
        |
LSTM(rnn_units, return_sequences=True, stateful=True)  -> shape [batch_size, seq_length, rnn_units]
        |
Dropout(dropout_rate)
        |
LSTM(rnn_units, return_sequences=True, stateful=True)  -> shape [batch_size, seq_length, rnn_units]
        |
Dropout(dropout_rate)
        |
Dense(vocab_size)  -> shape [batch_size, seq_length, vocab_size]  (logits)
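The shape flow in the diagram above can be checked with a toy NumPy trace. This illustrates tensor shapes only: the two stacked LSTM layers are replaced by a shape-preserving stand-in (a per-position linear map plus tanh), which is an assumption of the sketch, not the real recurrent computation:

```python
import numpy as np

batch_size, seq_length = 2, 5
vocab_size, embedding_dim, rnn_units = 10, 8, 16
rng = np.random.default_rng(0)

token_ids = rng.integers(0, vocab_size, size=(batch_size, seq_length))

# Embedding: a lookup table of dense vectors, one row per token ID.
embedding = rng.normal(size=(vocab_size, embedding_dim))
x = embedding[token_ids]              # [batch_size, seq_length, embedding_dim]

# Stand-in for the stacked LSTM layers: any map that preserves the
# [batch, seq, ...] layout and emits rnn_units features per position.
W_rnn = rng.normal(size=(embedding_dim, rnn_units))
h = np.tanh(x @ W_rnn)                # [batch_size, seq_length, rnn_units]

# Dense output layer: logits over the vocabulary at every position.
W_out = rng.normal(size=(rnn_units, vocab_size))
logits = h @ W_out                    # [batch_size, seq_length, vocab_size]

print(x.shape, h.shape, logits.shape)
```

Note that because return_sequences=True, the model emits a full vocabulary distribution at every sequence position, not just the last one.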

The model is compiled with sparse categorical cross-entropy loss (applied directly to logits) and accuracy as the metric. The loss for a single sequence position is:

L = -log(softmax(z)_{y_true})

where z is the logit vector and y_true is the index of the correct next token.
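The loss formula can be computed numerically as follows (a minimal sketch using the numerically stable log-softmax; the example logit values are arbitrary):

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(z, y_true):
    """Loss for one sequence position: -log(softmax(z)[y_true])."""
    z = z - z.max()                            # stabilize the softmax
    log_softmax = z - np.log(np.exp(z).sum())  # log of softmax, elementwise
    return -log_softmax[y_true]

z = np.array([2.0, 0.5, -1.0])                 # toy logit vector
loss = sparse_categorical_crossentropy_from_logits(z, y_true=0)
print(round(float(loss), 4))
```

Applying the loss directly to logits (rather than to pre-softmaxed probabilities) is what the `from_logits=True` convention refers to in Keras-style APIs, and it avoids a redundant and numerically lossy softmax/log round trip.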

For differentially private training, the optimizer is wrapped using TensorFlow Privacy's make_keras_optimizer_class(RMSprop) with parameters l2_norm_clip, noise_multiplier, and num_microbatches that control the privacy budget.
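The per-example clipping and noise injection performed by the wrapped optimizer can be sketched in NumPy. This is a simplified illustration of the DP-SGD mechanism only (the real implementation lives in TensorFlow Privacy); the function name and the one-gradient-per-example batch are assumptions of the sketch:

```python
import numpy as np

def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each example's gradient to L2 norm l2_norm_clip, sum the clipped
    gradients, add Gaussian noise with std noise_multiplier * l2_norm_clip,
    then average over the batch."""
    clipped = []
    for g in per_example_grads:
        norm = max(np.linalg.norm(g), 1e-12)   # guard against zero gradients
        clipped.append(g * min(1.0, l2_norm_clip / norm))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=total.shape)
    return (total + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(8)]  # one gradient per example
g = dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
print(g.shape)  # (5,)
```

Clipping bounds each individual example's influence on the update, and the calibrated Gaussian noise masks what remains; together these are what make the resulting gradient step differentially private, at the cost of a noisier (and hence slower or less accurate) optimization.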
