Implementation:LaurentMazare Tch rs Seq2Seq Translation

Knowledge Sources	LaurentMazare_Tch_rs
Domains	Natural Language Processing, Sequence to Sequence, Neural Machine Translation
Last Updated	2026-02-08 00:00 GMT

Overview

Implements a GRU-based sequence-to-sequence translation model with an attention mechanism for French-to-English translation, following the approach of the PyTorch seq2seq translation tutorial.

Description

This module builds a complete encoder-decoder translation system with Bahdanau-style attention. The Encoder embeds input tokens into a hidden space of size 256 and processes them sequentially through a GRU (Gated Recurrent Unit), producing encoder outputs at each time step.

The Decoder uses an attention mechanism that concatenates the current decoder hidden state with the embedded input token, passes the result through a linear layer to produce attention weights over the encoder outputs (padded to MAX_LENGTH=10). The attended context vector is combined with the embedded token via an attention combine linear layer, followed by ReLU activation. The combined representation is fed through a GRU cell, and the output is projected to the target vocabulary size with log-softmax activation.
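The padding and weighted-sum step described above can be sketched in plain Rust. This is a minimal illustration using only the standard library, not the tch-based code from the example: the function name attend and the Vec-based representation are assumptions made for this sketch, and the attention weights are taken as given (how they are produced and normalized is outside its scope).

```rust
/// Maximum number of encoder positions the attention layer scores
/// (value taken from the page's constants).
const MAX_LENGTH: usize = 10;

/// Hypothetical helper: zero-pads `enc_outputs` (one row per source token)
/// up to MAX_LENGTH rows, then returns the attention-weighted sum of the rows.
///
/// `attn_weights` must hold MAX_LENGTH scores; `enc_outputs` must be non-empty,
/// hold at most MAX_LENGTH rows, and all rows must share one length.
fn attend(attn_weights: &[f32], enc_outputs: &[Vec<f32>]) -> Vec<f32> {
    assert_eq!(attn_weights.len(), MAX_LENGTH);
    assert!(!enc_outputs.is_empty() && enc_outputs.len() <= MAX_LENGTH);
    let hidden = enc_outputs[0].len();

    // Zero-pad so every sentence exposes exactly MAX_LENGTH positions.
    let mut padded = enc_outputs.to_vec();
    padded.resize(MAX_LENGTH, vec![0.0; hidden]);

    // context[j] = sum over i of attn_weights[i] * padded[i][j]
    let mut context = vec![0.0; hidden];
    for (w, row) in attn_weights.iter().zip(&padded) {
        assert_eq!(row.len(), hidden);
        for (c, x) in context.iter_mut().zip(row) {
            *c += w * x;
        }
    }
    context
}
```

Because padded rows are all zeros, any weight assigned to a position past the end of the source sentence contributes nothing to the context vector.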

The Model struct ties the encoder and decoder together, providing train_loss for computing the NLL loss with optional teacher forcing (randomly toggled per training sample), and predict for greedy decoding during inference. Training uses the Adam optimizer with a learning rate of 0.001 over 100,000 randomly sampled sentence pairs. The dataset is loaded with the reverse flag to train French-to-English translation.
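To make the teacher-forcing choice concrete, the following standard-library sketch accumulates the NLL loss over a target sequence. It is an illustration under stated assumptions rather than the example's actual train_loss: sequence_nll and its step closure are hypothetical stand-ins for the decoder, the teacher-forcing flag is passed in instead of being drawn from a random number generator, and any early stop on the end-of-sentence token is omitted.

```rust
/// Hypothetical helper: sums the negative log-likelihood of `target`
/// under a step-wise decoder.
///
/// `step` stands in for the decoder: given the previous token index it
/// returns log-probabilities over the target vocabulary (must be non-empty
/// and cover every index that appears in `target`).
fn sequence_nll<F>(
    mut step: F,
    start_token: usize,
    target: &[usize],
    use_teacher_forcing: bool,
) -> f32
where
    F: FnMut(usize) -> Vec<f32>,
{
    let mut loss = 0.0;
    let mut prev = start_token;
    for &gold in target {
        let log_probs = step(prev);
        // NLL of the reference token at this position.
        loss -= log_probs[gold];

        // The decoder's own best guess (argmax of the log-probabilities).
        let mut guess = 0;
        for (i, &lp) in log_probs.iter().enumerate() {
            if lp > log_probs[guess] {
                guess = i;
            }
        }

        // Teacher forcing feeds the reference token back in;
        // otherwise the decoder consumes its own prediction.
        prev = if use_teacher_forcing { gold } else { guess };
    }
    loss
}
```

With teacher forcing the decoder always sees the correct history, which tends to speed up early training; without it the decoder is exposed to its own mistakes, which is closer to what happens at inference time.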

Every 1000 steps, the training loop prints the average loss and a few sample predictions, each showing the input, target, and output sentences. During training, dropout with probability 0.1 is applied to the decoder's embedded input.
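The periodic reporting can be sketched with a small running-average accumulator. The names LossTracker, REPORT_EVERY, and should_report are hypothetical and chosen for this illustration; the example's own bookkeeping type may differ.

```rust
/// Reporting interval taken from the description above.
const REPORT_EVERY: usize = 1000;

/// Hypothetical accumulator for per-sample losses.
struct LossTracker {
    total: f64,
    samples: usize,
}

impl LossTracker {
    fn new() -> Self {
        LossTracker { total: 0.0, samples: 0 }
    }

    /// Records the loss of one training sample.
    fn update(&mut self, loss: f64) {
        self.total += loss;
        self.samples += 1;
    }

    /// Returns the mean loss since the last call (0.0 if nothing was
    /// recorded) and clears the accumulator.
    fn avg_and_reset(&mut self) -> f64 {
        let avg = if self.samples == 0 {
            0.0
        } else {
            self.total / self.samples as f64
        };
        self.total = 0.0;
        self.samples = 0;
        avg
    }
}

/// True on steps 1000, 2000, ... when counting steps from 1.
fn should_report(step: usize) -> bool {
    step > 0 && step % REPORT_EVERY == 0
}
```

Resetting after each report means every printed value reflects only the most recent window of samples, so a falling number indicates recent progress rather than a long-run average.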

Usage

Use this implementation to train a neural machine translation model on parallel text data. It requires the eng-fra.txt dataset from the PyTorch tutorial data archive placed in the data/ directory. The model trains on CUDA if available, otherwise falls back to CPU.

Code Reference

Source Location

Repository: LaurentMazare_Tch_rs
File: examples/translation/main.rs
Lines: 1-224

Signature

struct Encoder {
    embedding: nn::Embedding,
    gru: nn::GRU,
}

impl Encoder {
    fn new(vs: nn::Path, in_dim: usize, hidden_dim: usize) -> Self
    fn forward(&self, xs: &Tensor, state: &GRUState) -> (Tensor, GRUState)
}

struct Decoder {
    device: Device,
    embedding: nn::Embedding,
    gru: nn::GRU,
    attn: nn::Linear,
    attn_combine: nn::Linear,
    linear: nn::Linear,
}

impl Decoder {
    fn new(vs: nn::Path, hidden_dim: usize, out_dim: usize) -> Self
    fn forward(
        &self, xs: &Tensor, state: &GRUState, enc_outputs: &Tensor, is_training: bool,
    ) -> (Tensor, GRUState)
}

struct Model {
    encoder: Encoder,
    decoder: Decoder,
    decoder_start: Tensor,
    decoder_eos: usize,
    device: Device,
}

impl Model {
    fn new(vs: nn::Path, ilang: &Lang, olang: &Lang, hidden_dim: usize) -> Self
    fn train_loss(&self, input_: &[usize], target: &[usize], rng: &mut ThreadRng) -> Tensor
    fn predict(&self, input_: &[usize]) -> Vec<usize>
}

pub fn main() -> Result<()>

Import

use anyhow::Result;
use rand::prelude::*;
use tch::nn::{GRUState, Module, OptimizerConfig, RNN};
use tch::{nn, Device, Kind, Tensor};

I/O Contract

Constant	Value	Description
MAX_LENGTH	10	Maximum sentence length in words
LEARNING_RATE	0.001	Adam optimizer learning rate
SAMPLES	100,000	Number of training iterations
HIDDEN_SIZE	256	GRU hidden dimension and embedding size

Input	Type	Description
input_	&[usize]	Source sentence as word indices
target	&[usize]	Target sentence as word indices (for training)
Data file	Text file	data/eng-fra.txt tab-separated sentence pairs

Output	Type	Description
train_loss	Tensor	NLL loss accumulated over target tokens
predict	Vec<usize>	Predicted target word indices (greedy decoding)
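The greedy decoding behind predict can be illustrated with a standard-library sketch. It is not the tch-based implementation: greedy_decode and its step closure are hypothetical stand-ins for the decoder, and including the end-of-sentence token in the returned sequence is an assumption of this sketch.

```rust
/// Upper bound on the number of generated tokens (from the page's constants).
const MAX_LENGTH: usize = 10;

/// Hypothetical helper: repeatedly picks the most likely next token until
/// `eos_token` is produced or MAX_LENGTH tokens have been generated.
///
/// `step` stands in for the decoder: given the previous token index it
/// returns non-empty log-probabilities over the target vocabulary.
fn greedy_decode<F>(mut step: F, start_token: usize, eos_token: usize) -> Vec<usize>
where
    F: FnMut(usize) -> Vec<f32>,
{
    let mut output = Vec::new();
    let mut prev = start_token;
    for _ in 0..MAX_LENGTH {
        let log_probs = step(prev);

        // Greedy choice: index of the largest log-probability.
        let mut best = 0;
        for (i, &lp) in log_probs.iter().enumerate() {
            if lp > log_probs[best] {
                best = i;
            }
        }

        output.push(best);
        if best == eos_token {
            break;
        }
        prev = best;
    }
    output
}
```

Greedy decoding commits to the single best token at every step, so it is cheap but can miss translations that a beam search would find.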

Usage Examples

// Run the example from the repository root; main() is the entry point.
// Requires data/eng-fra.txt from the PyTorch tutorial data.
//   cargo run --example translation

// Setup performed inside main().
// The dataset accessor names below are assumed from the example's dataset module.
let dataset = Dataset::new("eng", "fra", MAX_LENGTH)?.reverse();
let ilang = dataset.input_lang();
let olang = dataset.output_lang();
let pairs = dataset.pairs();
let mut rng = thread_rng();
let vs = nn::VarStore::new(Device::cuda_if_available());
let model = Model::new(vs.root(), ilang, olang, HIDDEN_SIZE);
let mut opt = nn::Adam::default().build(&vs, LEARNING_RATE)?;

// Train on one randomly chosen sentence pair
let (input_, target) = pairs.choose(&mut rng).unwrap();
let loss = model.train_loss(input_, target, &mut rng);
opt.backward_step(&loss);

// Predict a translation with greedy decoding
let predicted_indices = model.predict(input_);
let translation = olang.seq_to_string(&predicted_indices);

Related Pages

Principle:LaurentMazare_Tch_rs_Seq2Seq_Attention_Translation
