Implementation:LaurentMazare Tch rs Seq2Seq Translation

From Leeroopedia


Knowledge Sources
Domains: Natural Language Processing, Sequence to Sequence, Neural Machine Translation
Last Updated: 2026-02-08 00:00 GMT

Overview

Implements a GRU-based sequence-to-sequence model with an attention mechanism for French-to-English translation, following the approach of the PyTorch tutorial.

Description

This module builds a complete encoder-decoder translation system with Bahdanau-style attention. The Encoder embeds input tokens into a hidden space of size 256 and processes them sequentially through a GRU (Gated Recurrent Unit), producing encoder outputs at each time step.
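To illustrate the recurrence the encoder relies on, the sketch below implements one GRU step and a loop over a sequence in plain Rust, with Vec<f32> standing in for tch tensors. It follows the standard GRU update as documented for PyTorch: reset gate r, update gate z, candidate n, and new state (1 - z) * n + z * h. The names Gate, GruCell, gru_step, and encode are hypothetical and do not come from the original source, which relies on nn::GRU instead.

```rust
/// Parameters of one GRU gate. Hypothetical helper type, not part of tch.
struct Gate {
    w_x: Vec<Vec<f32>>, // input weights, one row per hidden unit
    w_h: Vec<Vec<f32>>, // hidden-state weights, one row per hidden unit
    b_x: Vec<f32>,
    b_h: Vec<f32>,
}

impl Gate {
    /// All-zero parameters, handy for checking the arithmetic by hand.
    fn zeros(hidden: usize, input: usize) -> Self {
        Gate {
            w_x: vec![vec![0.0; input]; hidden],
            w_h: vec![vec![0.0; hidden]; hidden],
            b_x: vec![0.0; hidden],
            b_h: vec![0.0; hidden],
        }
    }

    /// Returns the input and hidden pre-activations separately,
    /// because the candidate gate scales only the hidden part by r.
    fn parts(&self, x: &[f32], h: &[f32]) -> (Vec<f32>, Vec<f32>) {
        (affine(&self.w_x, x, &self.b_x), affine(&self.w_h, h, &self.b_h))
    }
}

/// The three gates of a GRU cell (hypothetical container).
struct GruCell {
    reset: Gate,
    update: Gate,
    cand: Gate,
}

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Computes `m * v + b` for a row-major matrix `m`.
fn affine(m: &[Vec<f32>], v: &[f32], b: &[f32]) -> Vec<f32> {
    m.iter()
        .zip(b)
        .map(|(row, bias)| row.iter().zip(v).map(|(a, x)| a * x).sum::<f32>() + bias)
        .collect()
}

/// One GRU step: maps input `x` and previous state `h` to the new state.
fn gru_step(cell: &GruCell, x: &[f32], h: &[f32]) -> Vec<f32> {
    let (rx, rh) = cell.reset.parts(x, h);
    let (zx, zh) = cell.update.parts(x, h);
    let (nx, nh) = cell.cand.parts(x, h);
    (0..h.len())
        .map(|i| {
            let r = sigmoid(rx[i] + rh[i]);
            let z = sigmoid(zx[i] + zh[i]);
            let n = (nx[i] + r * nh[i]).tanh();
            (1.0 - z) * n + z * h[i]
        })
        .collect()
}

/// Feeds the inputs one at a time, collecting one output per time step.
/// Returns all outputs plus the final hidden state.
fn encode(cell: &GruCell, inputs: &[Vec<f32>], h0: Vec<f32>) -> (Vec<Vec<f32>>, Vec<f32>) {
    let mut h = h0;
    let mut outputs = Vec::with_capacity(inputs.len());
    for x in inputs {
        h = gru_step(cell, x, &h);
        outputs.push(h.clone());
    }
    (outputs, h)
}
```

With all-zero parameters both gates evaluate to 0.5 and the candidate to 0, so each step simply halves the hidden state, which makes the update easy to verify by hand.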

The Decoder applies attention at every decoding step. It concatenates the embedded input token with the current decoder hidden state and passes the result through a linear layer (attn) to produce attention weights over the encoder outputs, which are padded to MAX_LENGTH = 10 positions. The resulting context vector is combined with the embedded token by a second linear layer (attn_combine), followed by a ReLU activation. This combined representation is fed through a single GRU step, and the GRU output is projected to the target vocabulary size and passed through log-softmax.
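The padding and weighting parts of this step can be sketched in plain Rust. The sketch assumes that padding means appending zero rows, and it takes the attention weights as given rather than computing them with a linear layer. The names pad_encoder_outputs and attend are hypothetical, and Vec<f32> stands in for tch tensors.

```rust
const MAX_LENGTH: usize = 10;

/// Pads the per-step encoder outputs with zero rows up to MAX_LENGTH rows.
/// Assumes the input has at most MAX_LENGTH rows, each of width `hidden`.
fn pad_encoder_outputs(enc_outputs: &[Vec<f32>], hidden: usize) -> Vec<Vec<f32>> {
    assert!(enc_outputs.len() <= MAX_LENGTH, "sequence longer than MAX_LENGTH");
    let mut padded = enc_outputs.to_vec();
    padded.resize(MAX_LENGTH, vec![0.0; hidden]);
    padded
}

/// Weighted sum of encoder outputs: context[j] = sum_i weights[i] * enc[i][j].
fn attend(weights: &[f32], enc: &[Vec<f32>]) -> Vec<f32> {
    let hidden = enc.first().map_or(0, |row| row.len());
    let mut context = vec![0.0; hidden];
    for (w, row) in weights.iter().zip(enc) {
        for (c, v) in context.iter_mut().zip(row) {
            *c += w * v;
        }
    }
    context
}
```

Because the padded rows are zero, any weight assigned to a position past the end of the source sentence contributes nothing to the context vector.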

The Model struct ties the encoder and decoder together. It provides train_loss, which computes the NLL loss with teacher forcing switched on or off at random for each training sample, and predict, which performs greedy decoding at inference time. Training uses the Adam optimizer with a learning rate of 0.001 for 100,000 iterations, each on one randomly sampled sentence pair. The dataset is loaded as English-French pairs and then reversed with reverse(), so the model learns French-to-English translation.
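The effect of teacher forcing on the loss loop can be sketched as follows. This is a simplified illustration, not the module's code: the decoder is abstracted as a closure that maps the previous token to a vector of log-probabilities, the teacher_forcing flag is passed in (the caller would draw it at random once per sample), and sequence_nll and argmax are hypothetical names.

```rust
/// Index of the largest value (the first one wins on ties).
fn argmax(values: &[f32]) -> usize {
    let mut best = 0;
    for (i, v) in values.iter().enumerate() {
        if *v > values[best] {
            best = i;
        }
    }
    best
}

/// Accumulates the negative log-likelihood of `target`.
/// `step` stands in for one decoder step and returns log-probabilities
/// over the target vocabulary given the previous token.
fn sequence_nll<F>(
    target: &[usize],
    start: usize,
    eos: usize,
    teacher_forcing: bool,
    mut step: F,
) -> f32
where
    F: FnMut(usize) -> Vec<f32>,
{
    let mut loss = 0.0;
    let mut prev = start;
    for &t in target {
        let log_probs = step(prev);
        // NLL of the correct token is minus its log-probability.
        loss -= log_probs[t];
        if t == eos {
            break;
        }
        // Teacher forcing feeds the ground-truth token to the next step;
        // otherwise the decoder's own best guess is fed back.
        prev = if teacher_forcing { t } else { argmax(&log_probs) };
    }
    loss
}
```

The loss value is the same in both modes for a decoder that ignores its input; the difference lies in which token the next step is conditioned on.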

Every 1,000 training steps, the training loop prints the average loss together with sample predictions, showing the input, target, and output sentence for each. During training, dropout with probability 0.1 is applied to the decoder's embedded input.
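The periodic loss reporting can be sketched with a small running-average tracker. This is one possible arrangement under stated assumptions: the average is taken over the steps since the previous report, and LossStats, avg_and_reset, and report_every are hypothetical names chosen for the illustration.

```rust
/// Running sum of losses since the last report (hypothetical helper).
struct LossStats {
    total: f64,
    samples: usize,
}

impl LossStats {
    fn new() -> Self {
        LossStats { total: 0.0, samples: 0 }
    }

    fn update(&mut self, loss: f64) {
        self.total += loss;
        self.samples += 1;
    }

    /// Returns the average since the last call and clears the counters.
    fn avg_and_reset(&mut self) -> f64 {
        let avg = if self.samples == 0 { 0.0 } else { self.total / self.samples as f64 };
        self.total = 0.0;
        self.samples = 0;
        avg
    }
}

/// Returns (step, average loss) for every `every`-th step, counting from 1.
/// `every` must be greater than zero.
fn report_every(losses: &[f64], every: usize) -> Vec<(usize, f64)> {
    let mut stats = LossStats::new();
    let mut reports = Vec::new();
    for (i, &loss) in losses.iter().enumerate() {
        let step = i + 1;
        stats.update(loss);
        if step % every == 0 {
            reports.push((step, stats.avg_and_reset()));
        }
    }
    reports
}
```

In a training loop, report_every would be replaced by a check such as step % 1000 == 0 followed by a print of the average and a few sample translations.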

Usage

Use this implementation to train a neural machine translation model on parallel text data. It requires the eng-fra.txt dataset from the PyTorch tutorial data archive placed in the data/ directory. The model trains on CUDA if available, otherwise falls back to CPU.

Code Reference

Source Location

Signature

struct Encoder {
    embedding: nn::Embedding,
    gru: nn::GRU,
}

impl Encoder {
    fn new(vs: nn::Path, in_dim: usize, hidden_dim: usize) -> Self
    fn forward(&self, xs: &Tensor, state: &GRUState) -> (Tensor, GRUState)
}

struct Decoder {
    device: Device,
    embedding: nn::Embedding,
    gru: nn::GRU,
    attn: nn::Linear,
    attn_combine: nn::Linear,
    linear: nn::Linear,
}

impl Decoder {
    fn new(vs: nn::Path, hidden_dim: usize, out_dim: usize) -> Self
    fn forward(
        &self, xs: &Tensor, state: &GRUState, enc_outputs: &Tensor, is_training: bool,
    ) -> (Tensor, GRUState)
}

struct Model {
    encoder: Encoder,
    decoder: Decoder,
    decoder_start: Tensor,
    decoder_eos: usize,
    device: Device,
}

impl Model {
    fn new(vs: nn::Path, ilang: &Lang, olang: &Lang, hidden_dim: usize) -> Self
    fn train_loss(&self, input_: &[usize], target: &[usize], rng: &mut ThreadRng) -> Tensor
    fn predict(&self, input_: &[usize]) -> Vec<usize>
}

pub fn main() -> Result<()>

Import

use anyhow::Result;
use rand::prelude::*;
use tch::nn::{GRUState, Module, OptimizerConfig, RNN};
use tch::{nn, Device, Kind, Tensor};

I/O Contract

Constant        Value      Description
MAX_LENGTH      10         Maximum sentence length in words
LEARNING_RATE   0.001      Adam optimizer learning rate
SAMPLES         100,000    Number of training iterations
HIDDEN_SIZE     256        GRU hidden dimension and embedding size

Input        Type         Description
input_       &[usize]     Source sentence as word indices
target       &[usize]     Target sentence as word indices (for training)
Data file    Text file    data/eng-fra.txt, tab-separated sentence pairs

Output        Type          Description
train_loss    Tensor        NLL loss accumulated over target tokens
predict       Vec<usize>    Predicted target word indices (greedy decoding)

Usage Examples

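// Run the translation training (entry point).
// Requires data/eng-fra.txt from the PyTorch tutorial data.
translation::main()?;

// Internal model usage during training.
// Assumes ilang and olang (&Lang vocabularies for the source and target
// languages) and pairs (the indexed sentence pairs) have already been
// obtained from the dataset; the original snippet does not show how.
let dataset = Dataset::new("eng", "fra", MAX_LENGTH)?.reverse();
let mut rng = thread_rng();
let vs = nn::VarStore::new(Device::cuda_if_available());
let model = Model::new(vs.root(), ilang, olang, HIDDEN_SIZE);
let mut opt = nn::Adam::default().build(&vs, LEARNING_RATE)?;

// Train on one randomly chosen pair: compute the loss, then
// backpropagate and apply one optimizer update.
let (input_, target) = pairs.choose(&mut rng).unwrap();
let loss = model.train_loss(input_, target, &mut rng);
opt.backward_step(&loss);

// Predict a translation and convert the indices back to text.
let predicted_indices = model.predict(input_);
let translation = olang.seq_to_string(&predicted_indices);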
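The greedy decoding behind predict can be sketched in plain Rust as follows. This is an illustration under assumptions rather than the module's code: the decoder is abstracted as a closure returning log-probabilities for the next token, the end-of-sentence token is kept in the output, and greedy_decode and argmax are hypothetical names.

```rust
/// Index of the largest value (the first one wins on ties).
fn argmax(values: &[f32]) -> usize {
    let mut best = 0;
    for (i, v) in values.iter().enumerate() {
        if *v > values[best] {
            best = i;
        }
    }
    best
}

/// Greedy decoding: at each step pick the most likely token and feed it back.
/// Stops after emitting `eos` or after `max_len` tokens, whichever comes first.
fn greedy_decode<F>(start: usize, eos: usize, max_len: usize, mut step: F) -> Vec<usize>
where
    F: FnMut(usize) -> Vec<f32>,
{
    let mut output = Vec::new();
    let mut prev = start;
    for _ in 0..max_len {
        let log_probs = step(prev);
        let next = argmax(&log_probs);
        output.push(next);
        if next == eos {
            break;
        }
        prev = next;
    }
    output
}
```

Greedy decoding commits to the single best token at every step, which keeps inference simple at the cost of never revisiting an earlier choice.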
