Implementation:LaurentMazare Tch rs Seq2Seq Translation
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Sequence to Sequence, Neural Machine Translation |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Implements a GRU-based sequence-to-sequence translation model with an attention mechanism for French-to-English translation, following the approach of the PyTorch tutorial.
Description
This module builds a complete encoder-decoder translation system with Bahdanau-style attention. The Encoder embeds input tokens into a hidden space of size 256 and processes them sequentially through a GRU (Gated Recurrent Unit), producing encoder outputs at each time step.
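The encoder recurrence described above can be illustrated without a tensor library. The following is a minimal sketch of a GRU step in plain Rust, assuming the gate convention used by PyTorch (which tch wraps) and omitting biases. The `GruCell` type, its field names, and the `encode` helper are hypothetical and are not taken from the example source.

```rust
/// Hypothetical GRU cell for illustration only (not the tch `nn::GRU` type).
/// Each matrix is row-major: [hidden][input] or [hidden][hidden].
struct GruCell {
    w_ir: Vec<Vec<f64>>, w_hr: Vec<Vec<f64>>, // reset gate
    w_iz: Vec<Vec<f64>>, w_hz: Vec<Vec<f64>>, // update gate
    w_in: Vec<Vec<f64>>, w_hn: Vec<Vec<f64>>, // candidate state
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn matvec(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    m.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

impl GruCell {
    /// One time step: returns the new hidden state (biases omitted for brevity).
    fn step(&self, x: &[f64], h: &[f64]) -> Vec<f64> {
        let (ir, hr) = (matvec(&self.w_ir, x), matvec(&self.w_hr, h));
        let (iz, hz) = (matvec(&self.w_iz, x), matvec(&self.w_hz, h));
        let (i_n, hn) = (matvec(&self.w_in, x), matvec(&self.w_hn, h));
        (0..h.len())
            .map(|j| {
                let r = sigmoid(ir[j] + hr[j]); // reset gate
                let z = sigmoid(iz[j] + hz[j]); // update gate
                let n = (i_n[j] + r * hn[j]).tanh(); // candidate state
                (1.0 - z) * n + z * h[j]
            })
            .collect()
    }
}

/// Runs the cell over a sequence of embedded tokens and returns the hidden
/// state at every step (the "encoder outputs") plus the final state.
fn encode(cell: &GruCell, embedded: &[Vec<f64>], h0: Vec<f64>) -> (Vec<Vec<f64>>, Vec<f64>) {
    let mut h = h0;
    let mut outputs = Vec::with_capacity(embedded.len());
    for x in embedded {
        h = cell.step(x, &h);
        outputs.push(h.clone());
    }
    (outputs, h)
}
```

In the real module the embedding lookup and the GRU are tch layers with a hidden size of 256; the sketch only shows how one hidden state per input token is produced.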
The Decoder computes attention weights by concatenating the embedded input token with the current decoder hidden state and passing the result through a linear layer, which yields one weight per encoder position (the encoder outputs are padded to MAX_LENGTH = 10). The attended context vector is combined with the embedded token by the attention-combine linear layer, followed by a ReLU activation. The combined representation is fed through the GRU, and the output is projected to the target vocabulary size with a log-softmax activation.
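The padding and weighted-sum part of this attention step can be sketched in plain Rust. The example below takes the raw attention scores as given (standing in for the output of the attention linear layer) and normalizes them with a softmax, which follows the PyTorch tutorial and is an assumption here rather than something confirmed from the tch-rs source. The function names are hypothetical.

```rust
const MAX_LENGTH: usize = 10;

/// Numerically stable softmax. Normalizing the scores this way is an
/// assumption borrowed from the PyTorch tutorial.
fn softmax(scores: &[f64]) -> Vec<f64> {
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

/// Pads the encoder outputs with zero rows up to MAX_LENGTH
/// (inputs longer than MAX_LENGTH would be truncated by `resize`).
fn pad_encoder_outputs(enc: &[Vec<f64>], hidden: usize) -> Vec<Vec<f64>> {
    let mut padded = enc.to_vec();
    padded.resize(MAX_LENGTH, vec![0.0; hidden]);
    padded
}

/// Context vector: sum over positions of weight[i] * padded_output[i].
fn attend(scores: &[f64; MAX_LENGTH], enc: &[Vec<f64>], hidden: usize) -> Vec<f64> {
    let weights = softmax(scores);
    let padded = pad_encoder_outputs(enc, hidden);
    let mut context = vec![0.0; hidden];
    for (w, row) in weights.iter().zip(&padded) {
        for (c, v) in context.iter_mut().zip(row) {
            *c += w * v;
        }
    }
    context
}
```

Because the scores in this scheme do not depend on the encoder outputs themselves, padded positions can still receive weight; they simply contribute zero vectors to the context.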
The Model struct ties the encoder and decoder together. It provides train_loss, which computes the NLL loss with optional teacher forcing (toggled at random for each training sample), and predict, which performs greedy decoding at inference time. Training uses the Adam optimizer with a learning rate of 0.001 for 100,000 iterations, each on a randomly sampled sentence pair. The dataset is constructed as English-French pairs and then reversed so that the model learns French-to-English translation.
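The effect of teacher forcing on the accumulated loss can be shown with a small sketch. Here `decode_step` is a hypothetical stand-in for the decoder: it maps the previous token to log-probabilities over the vocabulary. The teacher-forcing decision is passed in as a plain `bool` instead of being drawn from a random number generator, and `sequence_nll` is an illustrative name, not the module's `train_loss`.

```rust
/// Index of the largest log-probability (the greedy choice).
fn argmax(log_probs: &[f64]) -> usize {
    let mut best = 0;
    for (i, &v) in log_probs.iter().enumerate() {
        if v > log_probs[best] {
            best = i;
        }
    }
    best
}

/// Accumulates the negative log-likelihood over the target tokens.
/// With teacher forcing the next decoder input is the reference token;
/// without it, the decoder is fed its own greedy prediction.
fn sequence_nll<F>(target: &[usize], start: usize, teacher_forcing: bool, mut decode_step: F) -> f64
where
    F: FnMut(usize) -> Vec<f64>,
{
    let mut prev = start;
    let mut loss = 0.0;
    for &tok in target {
        let log_probs = decode_step(prev);
        loss -= log_probs[tok]; // NLL of the reference token at this step
        prev = if teacher_forcing { tok } else { argmax(&log_probs) };
    }
    loss
}
```

In the sketch the two modes differ only in which token is fed back, so the loss diverges as soon as the model's prediction differs from the reference.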
Every 1000 steps, the model prints the average loss and sample predictions showing input, target, and output sentences. Dropout of 0.1 is applied to the decoder embedding during training.
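The periodic reporting can be sketched as a running average that is reset after each report. The `LossStats` type and `report_points` helper below are hypothetical names used for illustration; the sketch assumes the reported value is the mean loss since the previous report.

```rust
/// Hypothetical running-average tracker that is reset after each report.
struct LossStats {
    total: f64,
    count: usize,
}

impl LossStats {
    fn new() -> Self {
        LossStats { total: 0.0, count: 0 }
    }

    fn update(&mut self, loss: f64) {
        self.total += loss;
        self.count += 1;
    }

    /// Returns the mean since the last call (0.0 if empty) and resets the tracker.
    fn avg_and_reset(&mut self) -> f64 {
        let avg = if self.count == 0 { 0.0 } else { self.total / self.count as f64 };
        self.total = 0.0;
        self.count = 0;
        avg
    }
}

/// Returns (step, average loss) for every `every`-th step, with steps counted from 1.
fn report_points(losses: &[f64], every: usize) -> Vec<(usize, f64)> {
    let mut stats = LossStats::new();
    let mut reports = Vec::new();
    for (i, &loss) in losses.iter().enumerate() {
        stats.update(loss);
        let step = i + 1;
        if step % every == 0 {
            reports.push((step, stats.avg_and_reset()));
        }
    }
    reports
}
```

With `every` set to 1000 this matches the reporting interval described above; the sample predictions printed alongside the loss are not modeled here.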
Usage
Use this implementation to train a neural machine translation model on parallel text data. It requires the eng-fra.txt dataset from the PyTorch tutorial data archive placed in the data/ directory. The model trains on CUDA if available, otherwise falls back to CPU.
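To make the expected data layout concrete, the sketch below parses tab-separated sentence pairs and optionally swaps each pair. It is an illustration only: the `parse_pair` and `load_pairs` helpers are hypothetical, the length filter (at most MAX_LENGTH words on each side) is an assumption, and any text normalization performed by the real `Dataset` type is not reproduced.

```rust
const MAX_LENGTH: usize = 10;

/// Parses one "english<TAB>french" line; returns None if the line is malformed.
fn parse_pair(line: &str) -> Option<(String, String)> {
    let mut parts = line.split('\t');
    let eng = parts.next()?.trim();
    let fra = parts.next()?.trim();
    if eng.is_empty() || fra.is_empty() {
        return None;
    }
    Some((eng.to_string(), fra.to_string()))
}

/// Keeps pairs whose sides both have at most MAX_LENGTH words (assumed filter)
/// and swaps each pair when `reverse` is true, giving (french, english).
fn load_pairs(text: &str, reverse: bool) -> Vec<(String, String)> {
    text.lines()
        .filter_map(parse_pair)
        .filter(|(a, b)| {
            a.split_whitespace().count() <= MAX_LENGTH
                && b.split_whitespace().count() <= MAX_LENGTH
        })
        .map(|(eng, fra)| if reverse { (fra, eng) } else { (eng, fra) })
        .collect()
}
```

The file contents could be obtained with `std::fs::read_to_string("data/eng-fra.txt")` and passed to `load_pairs` with `reverse` set to true for French-to-English training.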
Code Reference
Source Location
- Repository: LaurentMazare_Tch_rs
- File: examples/translation/main.rs
- Lines: 1-224
Signature
struct Encoder {
    embedding: nn::Embedding,
    gru: nn::GRU,
}
impl Encoder {
    fn new(vs: nn::Path, in_dim: usize, hidden_dim: usize) -> Self
    fn forward(&self, xs: &Tensor, state: &GRUState) -> (Tensor, GRUState)
}
struct Decoder {
    device: Device,
    embedding: nn::Embedding,
    gru: nn::GRU,
    attn: nn::Linear,
    attn_combine: nn::Linear,
    linear: nn::Linear,
}
impl Decoder {
    fn new(vs: nn::Path, hidden_dim: usize, out_dim: usize) -> Self
    fn forward(
        &self, xs: &Tensor, state: &GRUState, enc_outputs: &Tensor, is_training: bool,
    ) -> (Tensor, GRUState)
}
struct Model {
    encoder: Encoder,
    decoder: Decoder,
    decoder_start: Tensor,
    decoder_eos: usize,
    device: Device,
}
impl Model {
    fn new(vs: nn::Path, ilang: &Lang, olang: &Lang, hidden_dim: usize) -> Self
    fn train_loss(&self, input_: &[usize], target: &[usize], rng: &mut ThreadRng) -> Tensor
    fn predict(&self, input_: &[usize]) -> Vec<usize>
}
pub fn main() -> Result<()>
Import
use anyhow::Result;
use rand::prelude::*;
use tch::nn::{GRUState, Module, OptimizerConfig, RNN};
use tch::{nn, Device, Kind, Tensor};
I/O Contract
| Constant | Value | Description |
|---|---|---|
| MAX_LENGTH | 10 | Maximum sentence length in words |
| LEARNING_RATE | 0.001 | Adam optimizer learning rate |
| SAMPLES | 100,000 | Number of training iterations |
| HIDDEN_SIZE | 256 | GRU hidden dimension and embedding size |

| Input | Type | Description |
|---|---|---|
| input_ | &[usize] | Source sentence as word indices |
| target | &[usize] | Target sentence as word indices (for training) |
| Data file | Text file | data/eng-fra.txt, tab-separated sentence pairs |

| Output | Type | Description |
|---|---|---|
| train_loss | Tensor | NLL loss accumulated over target tokens |
| predict | Vec<usize> | Predicted target word indices (greedy decoding) |
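Greedy decoding, as used by predict, can be sketched as follows. `decode_step` is again a hypothetical stand-in for the decoder that returns log-probabilities for the next token given the previous one. The sketch assumes decoding stops after MAX_LENGTH steps or once the end-of-sentence token is produced, and it includes that token in the output; whether the real predict does the same is not confirmed here.

```rust
const MAX_LENGTH: usize = 10;

/// Picks the most likely token at each step and feeds it back as the next input.
/// Stops after MAX_LENGTH tokens or once `eos` has been emitted (EOS is kept).
fn greedy_decode<F>(start: usize, eos: usize, mut decode_step: F) -> Vec<usize>
where
    F: FnMut(usize) -> Vec<f64>,
{
    let mut prev = start;
    let mut output = Vec::new();
    for _ in 0..MAX_LENGTH {
        let log_probs = decode_step(prev);
        // Greedy choice: index of the largest log-probability.
        let mut best = 0;
        for (i, &v) in log_probs.iter().enumerate() {
            if v > log_probs[best] {
                best = i;
            }
        }
        output.push(best);
        if best == eos {
            break;
        }
        prev = best;
    }
    output
}
```

Greedy decoding keeps only the single best token per step, which is simple and fast but can miss higher-scoring sequences that a wider search would find.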
Usage Examples
// Run the translation example (entry point).
// Requires data/eng-fra.txt from the PyTorch tutorial data archive:
//   cargo run --example translation

// Internal model usage during training (inside main):
let dataset = Dataset::new("eng", "fra", MAX_LENGTH)?.reverse();
// `ilang`, `olang`, and `pairs` are obtained from `dataset`;
// the accessor calls are omitted here.
let mut rng = thread_rng();
let vs = nn::VarStore::new(Device::cuda_if_available());
let model = Model::new(vs.root(), ilang, olang, HIDDEN_SIZE);
let mut opt = nn::Adam::default().build(&vs, LEARNING_RATE)?;

// Train on a random pair
let (input_, target) = pairs.choose(&mut rng).unwrap();
let loss = model.train_loss(input_, target, &mut rng);
opt.backward_step(&loss);

// Predict a translation for the same input
let predicted_indices = model.predict(input_);
let translation = olang.seq_to_string(&predicted_indices);