Implementation:Pytorch Serve TextSentiment Model

From Leeroopedia
Revision as of 13:46, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Pytorch_Serve_TextSentiment_Model.md)

Overview

TextSentiment is a PyTorch nn.Module for text classification using a bag-of-embeddings architecture. It feeds the output of an nn.EmbeddingBag layer into a fully connected nn.Linear layer to classify text into one of 4 sentiment categories by default (configurable via num_class). The model is memory-efficient because EmbeddingBag computes the mean embedding of each sample directly from concatenated token indices, so it needs neither padded input sequences nor an intermediate tensor of per-token embeddings.

| Field | Value |
|---|---|
| Implementation Name | TextSentiment_Model |
| Type | Model Definition |
| Workflow | Text_Classification_Serving |
| Domains | NLP, Text_Classification |
| Knowledge Sources | Pytorch_Serve |
| Last Updated | 2026-02-13 18:52 GMT |

Description

The TextSentiment class implements a lightweight text classification model with two layers: an nn.EmbeddingBag (sparse) for efficient bag-of-words embedding aggregation and a single nn.Linear projection to class logits. The default configuration uses a vocabulary of 1,308,843 tokens, 32-dimensional embeddings, and 4 output classes.

Key Responsibilities

  • Embedding Aggregation: Uses nn.EmbeddingBag with sparse=True to compute mean embeddings per sample without padding, using offsets to delimit variable-length sequences
  • Classification: Projects aggregated embedding through a single nn.Linear layer to produce class logits
  • Weight Initialization: Uniform initialization in the range [-0.5, 0.5] for embedding and linear weights, with bias zeroed
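One practical consequence of sparse=True is that the embedding gradient arrives as a sparse tensor, which not every optimizer accepts. The following is a minimal sketch of a single training step under assumed stand-in sizes (a 100-token vocabulary, 8-dimensional embeddings) and made-up labels; it is not taken from the TorchServe example.

```python
import torch
import torch.nn as nn

# Small stand-in sizes (assumptions for illustration, not the page's defaults).
embedding = nn.EmbeddingBag(100, 8, sparse=True)  # mode defaults to "mean"
fc = nn.Linear(8, 4)

text = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 3])   # two bags: [1, 2, 3] and [4, 5, 6]
labels = torch.tensor([0, 2])    # made-up class labels

logits = fc(embedding(text, offsets))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

# With sparse=True the embedding gradient is a sparse tensor that only
# covers the rows looked up in this batch.
grad_is_sparse = embedding.weight.grad.is_sparse
touched_rows = sorted(embedding.weight.grad.coalesce().indices()[0].tolist())
print(grad_is_sparse, touched_rows)

# torch.optim.SGD accepts sparse gradients; several other optimizers do not.
params = list(embedding.parameters()) + list(fc.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
optimizer.step()
```

Only the six embedding rows used in the batch receive gradient entries, which is what keeps updates cheap for a vocabulary of over a million tokens.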

Architecture

| Layer | Type | Input Dim | Output Dim | Notes |
|---|---|---|---|---|
| self.embedding | nn.EmbeddingBag | vocab_size (1,308,843) | embed_dim (32) | sparse=True, mode="mean" (default) |
| self.fc | nn.Linear | embed_dim (32) | num_class (4) | Final classification layer |
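The dimensions in the table determine the parameter count. As a quick back-of-the-envelope check (plain arithmetic, assuming float32 storage at 4 bytes per parameter), almost all of the model's size sits in the embedding table:

```python
vocab_size, embed_dim, num_class = 1_308_843, 32, 4

embedding_params = vocab_size * embed_dim        # EmbeddingBag weight matrix
fc_params = embed_dim * num_class + num_class    # Linear weight + bias
total_params = embedding_params + fc_params

# float32 stores each parameter in 4 bytes
size_mib = total_params * 4 / 2**20

print(embedding_params)  # 41882976
print(fc_params)         # 132
print(total_params)      # 41883108
print(f"{size_mib:.1f} MiB")
```

With the defaults this comes to roughly 160 MiB of weights, so experiments that do not need the full vocabulary can pass a smaller vocab_size.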

Usage

import torch
from model import TextSentiment

# Create model with default parameters
# (allocates the full 1,308,843 x 32 embedding table)
model = TextSentiment()

# Or with custom parameters
custom_model = TextSentiment(vocab_size=50000, embed_dim=64, num_class=5)

# Forward pass
# text: 1-D tensor of token indices (concatenated, no padding)
# offsets: tensor of starting indices for each sample in the batch
text = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 3])  # Two samples: [1,2,3] and [4,5,6]
logits = model(text, offsets)
# logits shape: (2, 4); custom_model(text, offsets) would give (2, 5)

Code Reference

Source Location

| File | Lines | Description |
|---|---|---|
| examples/text_classification/model.py | L1-40 | Full module (40 lines) |
| examples/text_classification/model.py | L19-40 | TextSentiment class definition |
| examples/text_classification/model.py | L20-24 | __init__(vocab_size, embed_dim, num_class) -- layer construction |
| examples/text_classification/model.py | L26-30 | init_weights() -- uniform initialization |
| examples/text_classification/model.py | L32-39 | forward(text, offsets) -- embedding + linear projection |

Signature

class TextSentiment(nn.Module):

    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        """
        Construct EmbeddingBag + Linear text classifier.

        Args:
            vocab_size (int): Size of the vocabulary. Default: 1,308,843.
            embed_dim (int): Dimensionality of embeddings. Default: 32.
            num_class (int): Number of output classes. Default: 4.
        """
        ...

    def init_weights(self):
        """
        Initialize weights uniformly in [-0.5, 0.5].

        Sets embedding weights, linear weights to uniform(-0.5, 0.5)
        and linear bias to zero.
        """
        ...

    def forward(self, text, offsets):
        """
        Forward pass: EmbeddingBag aggregation followed by linear projection.

        Args:
            text (Tensor): 1-D tensor of token indices (concatenated bag of
                          text tensors, no padding needed).
            offsets (Tensor): 1-D tensor of offsets delimiting individual
                             sequences within the text tensor.

        Returns:
            Tensor: Class logits of shape (batch_size, num_class).
        """
        ...

Import

import torch.nn as nn

I/O Contract

| Method | Input | Output | Notes |
|---|---|---|---|
| __init__(vocab_size, embed_dim, num_class) | Default: 1308843, 32, 4 | None | Creates nn.EmbeddingBag(sparse=True) and nn.Linear; calls init_weights() |
| init_weights() | None | None | uniform_(-0.5, 0.5) for embedding and FC weights; zero_() for FC bias |
| forward(text, offsets) | text: 1-D Tensor of token IDs; offsets: 1-D Tensor of sample boundaries | Tensor of shape (batch_size, num_class) | No padding required; offsets delimit variable-length sequences |
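The forward() contract expects the batch already flattened into a single text tensor plus offsets. One way to build that pair from per-sample token-ID lists is sketched below; make_batch is a hypothetical helper written for this page, not part of the TorchServe example.

```python
import torch

def make_batch(sequences):
    """Hypothetical helper: flatten token-ID lists into (text, offsets)."""
    text = torch.tensor([tok for seq in sequences for tok in seq], dtype=torch.long)
    lengths = torch.tensor([len(seq) for seq in sequences], dtype=torch.long)
    # Each offset is the start index of a sequence, i.e. the cumulative
    # length of all sequences before it.
    offsets = torch.cumsum(lengths, dim=0) - lengths
    return text, offsets

text, offsets = make_batch([[10, 20], [30, 40, 50], [60]])
print(text.tolist())     # [10, 20, 30, 40, 50, 60]
print(offsets.tolist())  # [0, 2, 5]
```

The offsets list has one entry per sample, so its length is the batch size that appears in the (batch_size, num_class) output shape.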

Usage Examples

Example 1: Model Construction and Weight Init

# From model.py L19-30: TextSentiment with EmbeddingBag
class TextSentiment(nn.Module):
    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        super(TextSentiment, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()
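The initialization can be checked directly on a constructed model. The sketch below repeats the class from Example 1 so that it runs standalone, and uses a small vocab_size (an assumption made only to keep construction cheap):

```python
import torch
import torch.nn as nn

class TextSentiment(nn.Module):
    """Copy of the class from Example 1, repeated so this check runs standalone."""

    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

# Small vocabulary chosen for this check only.
model = TextSentiment(vocab_size=1000, embed_dim=32, num_class=4)

emb_w = model.embedding.weight
fc_w = model.fc.weight
in_range = bool((emb_w.abs() <= 0.5).all() and (fc_w.abs() <= 0.5).all())
bias_zero = bool((model.fc.bias == 0).all())
print(in_range, bias_zero)
```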

Example 2: Forward Pass with Offsets

# From model.py L32-39: forward() method of TextSentiment (class body omitted), EmbeddingBag + FC
def forward(self, text, offsets):
    """
    Args:
        text: 1-D tensor representing a bag of text tensors
        offsets: a list of offsets to delimit the 1-D text tensor
            into the individual sequences.
    """
    return self.fc(self.embedding(text, offsets))

# Example usage with variable-length inputs:
import torch

model = TextSentiment()
# Three samples with lengths 2, 3, and 1
text = torch.tensor([10, 20, 30, 40, 50, 60])
offsets = torch.tensor([0, 2, 5])  # Start index of each sample in text
logits = model(text, offsets)
# logits.shape == (3, 4)
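The logits returned by forward() are unnormalized scores. A common way to turn them into class probabilities and predicted labels is softmax followed by argmax, sketched here with stand-in layers of assumed small size. The weights are untrained, so the predicted classes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in layers with an assumed 100-token vocabulary (not the page's default).
embedding = nn.EmbeddingBag(100, 32, sparse=True)
fc = nn.Linear(32, 4)

text = torch.tensor([10, 20, 30, 40, 50, 60])
offsets = torch.tensor([0, 2, 5])  # three samples of lengths 2, 3, and 1

with torch.no_grad():              # inference only, no gradients needed
    logits = fc(embedding(text, offsets))

probs = torch.softmax(logits, dim=1)  # each row sums to 1
pred = probs.argmax(dim=1)            # one class index per sample
print(probs.shape, pred.shape)
```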

Example 3: Why EmbeddingBag Over Embedding

# nn.EmbeddingBag computes the mean of 'bags' of embeddings.
# Unlike nn.Embedding + mean(), it:
# 1. Requires no padding (uses offsets instead)
# 2. Accumulates the average on-the-fly
# 3. Is faster and more memory-efficient for variable-length sequences
#
# With nn.Embedding, you would need:
#   padded_input = pad_sequence(sequences, batch_first=True)
#   embeddings = embedding(padded_input)  # (batch, max_len, embed_dim)
#   mean_embeddings = embeddings.mean(dim=1)  # also averages padding positions;
#                                             # an exact mean needs a length mask
#
# With nn.EmbeddingBag:
#   mean_embeddings = embedding_bag(concatenated_tokens, offsets)
