Implementation:Pytorch Serve TextSentiment Model

Overview

TextSentiment is a PyTorch nn.Module for text classification using a bag-of-embeddings architecture. It composes an nn.EmbeddingBag layer with a fully connected nn.Linear layer to classify text into one of 4 classes (the default num_class). The model is memory-efficient because nn.EmbeddingBag computes each sample's mean embedding directly from concatenated token indices and offsets, so inputs need no padding and no intermediate per-token embedding tensor is materialized.

Field	Value
Implementation Name	TextSentiment_Model
Type	Model Definition
Workflow	Text_Classification_Serving
Domains	NLP, Text_Classification
Knowledge Sources	Pytorch_Serve
Last Updated	2026-02-13 18:52 GMT

Description

The TextSentiment class implements a lightweight text classification model with two layers: an nn.EmbeddingBag with sparse gradients (sparse=True) for efficient bag-of-words embedding aggregation, and a single nn.Linear projection to class logits. The default configuration uses a vocabulary of 1,308,843 tokens, 32-dimensional embeddings, and 4 output classes.
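The default configuration is dominated by the embedding table: it holds vocab_size × embed_dim weights, while the linear layer holds only embed_dim × num_class weights plus num_class biases. The sketch below computes this in closed form and cross-checks the formula against real PyTorch layers on a small configuration, so the full 1,308,843-row table never has to be allocated. Both helpers (`count_params`, `count_module_params`) are hypothetical and not part of model.py.

```python
import torch.nn as nn


def count_params(vocab_size=1308843, embed_dim=32, num_class=4):
    """Closed-form parameter count for EmbeddingBag + Linear (hypothetical helper)."""
    embedding = vocab_size * embed_dim          # one embed_dim row per token
    linear = embed_dim * num_class + num_class  # weight matrix + bias vector
    return embedding + linear


def count_module_params(vocab_size, embed_dim, num_class):
    """Count parameters of real PyTorch layers with the same shapes."""
    layers = [nn.EmbeddingBag(vocab_size, embed_dim, sparse=True),
              nn.Linear(embed_dim, num_class)]
    return sum(p.numel() for layer in layers for p in layer.parameters())


total = count_params()
print(total)                        # 41883108
print(f"{total * 4 / 1e6:.1f} MB")  # float32 storage: 167.5 MB

# Cross-check the formula on a configuration small enough to allocate cheaply.
assert count_params(100, 8, 3) == count_module_params(100, 8, 3)
```

Of the roughly 41.9M parameters, only 132 belong to the linear layer, which is why sparse embedding gradients matter for this model.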

Key Responsibilities

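Embedding Aggregation: Uses nn.EmbeddingBag (default mode="mean") to compute the mean embedding per sample without padding, using offsets to delimit variable-length sequences; sparse=True additionally makes the embedding-weight gradient a sparse tensor that covers only the tokens present in the batch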
Classification: Projects aggregated embedding through a single nn.Linear layer to produce class logits
Weight Initialization: Uniform initialization in the range [-0.5, 0.5] for embedding and linear weights, with bias zeroed
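The offsets that delimit each sample are the exclusive prefix sum of the sample lengths. The sketch below shows one way to build `text` and `offsets` from a list of variable-length token-id lists; `collate_batch` is a hypothetical helper for illustration, not a function from model.py or TorchServe.

```python
import torch


def collate_batch(token_lists):
    """Concatenate variable-length token-id lists into (text, offsets).

    Hypothetical helper: offsets[i] is the start index of sample i in `text`.
    """
    lengths = [len(tokens) for tokens in token_lists]
    text = torch.tensor([tok for tokens in token_lists for tok in tokens], dtype=torch.long)
    # Exclusive prefix sum of lengths: [0, len0, len0 + len1, ...]
    offsets = torch.tensor([0] + lengths[:-1], dtype=torch.long).cumsum(dim=0)
    return text, offsets


text, offsets = collate_batch([[10, 20], [30, 40, 50], [60]])
print(text.tolist())     # [10, 20, 30, 40, 50, 60]
print(offsets.tolist())  # [0, 2, 5]
```

The result reproduces the `text`/`offsets` pair used in the forward-pass examples on this page.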

Architecture

Layer	Type	Input Dim	Output Dim	Notes
`self.embedding`	`nn.EmbeddingBag`	vocab_size (1,308,843)	embed_dim (32)	`sparse=True`, mode="mean" (default)
`self.fc`	`nn.Linear`	embed_dim (32)	num_class (4)	Final classification layer
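The `sparse=True` note in the table concerns gradients, not the forward computation. The sketch below uses small stand-in layers of the same types (vocabulary shrunk to 1,000; the labels are made up) to show that after `backward()` the embedding gradient is a sparse tensor whose rows are only the token ids in the batch, while the linear layer's gradient stays dense. `torch.optim.SGD` accepts this mix; dense optimizers such as `torch.optim.Adam` reject sparse gradients, and `torch.optim.SparseAdam` exists for that case.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Small stand-ins with the same layer types as TextSentiment.
embedding = nn.EmbeddingBag(1000, 8, sparse=True)  # mode="mean" by default
fc = nn.Linear(8, 4)

text = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 3])
labels = torch.tensor([0, 2])  # made-up targets for the demo

loss = F.cross_entropy(fc(embedding(text, offsets)), labels)
loss.backward()

grad = embedding.weight.grad
print(grad.is_sparse)                                 # True
# Only rows for tokens that appear in the batch carry gradient entries.
print(sorted(grad.coalesce().indices()[0].tolist()))  # [1, 2, 3, 4, 5, 6]
print(fc.weight.grad.is_sparse)                       # False: Linear stays dense

# SGD handles both the sparse (embedding) and dense (fc) gradients.
optimizer = torch.optim.SGD([embedding.weight, fc.weight, fc.bias], lr=0.1)
optimizer.step()
```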

Usage

import torch
from model import TextSentiment

# Create model with default parameters
model = TextSentiment()

# Or with custom parameters
custom_model = TextSentiment(vocab_size=50000, embed_dim=64, num_class=5)

# Forward pass with the default model
# text: 1-D tensor of token indices (concatenated, no padding)
# offsets: 1-D tensor of starting indices for each sample in the batch
text = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 3])  # Two samples: [1,2,3] and [4,5,6]
logits = model(text, offsets)
# logits shape: (2, 4); custom_model(text, offsets) would give (2, 5)
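The model returns raw logits. A common way to turn them into per-sample probabilities and a predicted class index is softmax followed by argmax, sketched below. So that the snippet runs standalone, it inlines a copy of the class from Example 1 and uses a 1,000-token vocabulary to avoid allocating the full default table. Mapping the class index to a human-readable label happens outside model.py and is not shown here.

```python
import torch
import torch.nn as nn


class TextSentiment(nn.Module):
    """Copy of the class shown in Example 1, inlined so this snippet runs standalone."""

    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        return self.fc(self.embedding(text, offsets))


torch.manual_seed(0)
model = TextSentiment(vocab_size=1000)  # small vocab keeps the demo light
model.eval()

text = torch.tensor([1, 2, 3, 4, 5, 6])
offsets = torch.tensor([0, 3])
with torch.no_grad():
    logits = model(text, offsets)         # shape (2, 4)
    probs = torch.softmax(logits, dim=1)  # each row sums to 1
    predicted = probs.argmax(dim=1)       # one class index per sample
print(logits.shape, predicted.tolist())
```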

Code Reference

Source Location

File	Lines	Description
`examples/text_classification/model.py`	L1-40	Full module (40 lines)
`examples/text_classification/model.py`	L19-40	`TextSentiment` class definition
`examples/text_classification/model.py`	L20-24	`__init__(vocab_size, embed_dim, num_class)` -- layer construction
`examples/text_classification/model.py`	L26-30	`init_weights()` -- uniform initialization
`examples/text_classification/model.py`	L32-39	`forward(text, offsets)` -- embedding + linear projection

Signature

class TextSentiment(nn.Module):

    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        """
        Construct EmbeddingBag + Linear text classifier.

        Args:
            vocab_size (int): Size of the vocabulary. Default: 1,308,843.
            embed_dim (int): Dimensionality of embeddings. Default: 32.
            num_class (int): Number of output classes. Default: 4.
        """
        ...

    def init_weights(self):
        """
        Initialize weights uniformly in [-0.5, 0.5].

        Sets the embedding and linear weights to uniform(-0.5, 0.5)
        and the linear bias to zero.
        """
        ...

    def forward(self, text, offsets):
        """
        Forward pass: EmbeddingBag aggregation followed by linear projection.

        Args:
            text (Tensor): 1-D tensor of token indices (concatenated bag of
                          text tensors, no padding needed).
            offsets (Tensor): 1-D tensor of offsets delimiting individual
                             sequences within the text tensor.

        Returns:
            Tensor: Class logits of shape (batch_size, num_class).
        """
        ...

Import

import torch.nn as nn

I/O Contract

Method	Input	Output	Notes
`__init__(vocab_size, embed_dim, num_class)`	Default: `1308843`, `32`, `4`	None	Creates `nn.EmbeddingBag(sparse=True)` and `nn.Linear`; calls `init_weights()`
`init_weights()`	None	None	`uniform_(-0.5, 0.5)` for embedding and FC weights; `zero_()` for FC bias
`forward(text, offsets)`	`text`: 1-D `Tensor` of token IDs; `offsets`: 1-D `Tensor` of sample start indices	`Tensor` of shape `(batch_size, num_class)`, where `batch_size == len(offsets)`	No padding required; offsets delimit variable-length sequences
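The contract above can be checked before calling `forward()`. The sketch below is a hypothetical pre-flight helper (`validate_bag_inputs` is not part of model.py or TorchServe): it requires both tensors to be 1-D, the first offset to be 0, offsets to be non-decreasing, and the last offset not to exceed `len(text)`. Repeated offsets are allowed because they describe an empty bag, for which nn.EmbeddingBag returns a zero vector.

```python
import torch


def validate_bag_inputs(text, offsets):
    """Hypothetical check of the forward(text, offsets) contract.

    Returns the batch size (one bag per offset) or raises ValueError.
    """
    if text.dim() != 1 or offsets.dim() != 1:
        raise ValueError("text and offsets must both be 1-D tensors")
    if offsets.numel() == 0 or offsets[0].item() != 0:
        raise ValueError("offsets must start at 0")
    if (offsets[1:] < offsets[:-1]).any():
        raise ValueError("offsets must be non-decreasing")
    if offsets[-1].item() > text.numel():
        raise ValueError("last offset exceeds len(text)")
    return offsets.numel()


batch_size = validate_bag_inputs(torch.tensor([10, 20, 30, 40, 50, 60]), torch.tensor([0, 2, 5]))
print(batch_size)  # 3
```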

Usage Examples

Example 1: Model Construction and Weight Init

# From model.py L19-30: TextSentiment with EmbeddingBag
import torch.nn as nn

class TextSentiment(nn.Module):
    def __init__(self, vocab_size=1308843, embed_dim=32, num_class=4):
        super(TextSentiment, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

Example 2: Forward Pass with Offsets

# From model.py L32-39: forward() uses EmbeddingBag + FC
def forward(self, text, offsets):
    """
    Args:
        text: 1-D tensor representing a bag of text tensors
        offsets: a 1-D tensor of offsets (start index of each sequence)
            delimiting the 1-D text tensor into individual sequences.
    """
    return self.fc(self.embedding(text, offsets))

# Example usage with variable-length inputs:
import torch
from model import TextSentiment

model = TextSentiment()
# Three samples with lengths 2, 3, and 1
text = torch.tensor([10, 20, 30, 40, 50, 60])
offsets = torch.tensor([0, 2, 5])  # Sample boundaries
logits = model(text, offsets)
# logits.shape == (3, 4)
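Each row of the EmbeddingBag output is simply the mean of the embedding rows for that sample's tokens, and a repeated offset produces an empty bag. The sketch below uses a standalone nn.EmbeddingBag (small vocabulary, made-up token ids) to show both behaviors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding = nn.EmbeddingBag(100, 4, sparse=True)  # mode="mean" by default

text = torch.tensor([10, 20, 30])
offsets = torch.tensor([0, 2, 2])  # bag 0 = [10, 20], bag 1 = [] (empty), bag 2 = [30]

out = embedding(text, offsets)
print(out.shape)                             # torch.Size([3, 4])
print(torch.equal(out[1], torch.zeros(4)))   # True: an empty bag yields zeros
# Non-empty bags are plain means of their rows in the embedding table.
print(torch.allclose(out[0], embedding.weight[[10, 20]].mean(dim=0)))  # True
```

For TextSentiment this means an empty sample reaches `self.fc` as a zero vector, so its logits equal the bias, which `init_weights()` sets to zero.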

Example 3: Why EmbeddingBag Over Embedding

# nn.EmbeddingBag computes the mean of 'bags' of embeddings.
# Unlike nn.Embedding + mean(), it:
# 1. Requires no padding (uses offsets instead)
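# 2. Computes each bag's mean directly, without materializing a padded
#    (batch, max_len, embed_dim) tensor of intermediate embeddings
# 3. Is faster and more memory-efficient for variable-length sequences
#
# With nn.Embedding, you would need to pad and then mask out the padding
# (lengths is a 1-D tensor of per-sequence lengths):
#   padded_input = pad_sequence(sequences, batch_first=True)
#   embeddings = embedding(padded_input)  # (batch, max_len, embed_dim)
#   mask = (torch.arange(padded_input.size(1)) < lengths[:, None]).unsqueeze(-1)
#   mean_embeddings = (embeddings * mask).sum(dim=1) / lengths[:, None]
#   (a plain embeddings.mean(dim=1) would also average in the padding positions)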
#
# With nn.EmbeddingBag:
#   mean_embeddings = embedding_bag(concatenated_tokens, offsets)
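The comparison above can be checked numerically. The sketch below gives an nn.Embedding and an nn.EmbeddingBag the same weight table, then computes per-sample means both ways for three sequences of lengths 2, 3, and 1. The padding mask is built from sequence lengths rather than from the pad id, so a real token with id 0 could not be mistaken for padding. Vocabulary size, dimensions, and token ids are arbitrary demo values.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

torch.manual_seed(0)
vocab_size, embed_dim = 100, 8

embedding_bag = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
embedding = nn.Embedding(vocab_size, embed_dim)
with torch.no_grad():
    embedding.weight.copy_(embedding_bag.weight)  # both routes share one table

sequences = [torch.tensor([10, 20]), torch.tensor([30, 40, 50]), torch.tensor([60])]
lengths = torch.tensor([len(s) for s in sequences])

# Padded route: (batch, max_len) -> (batch, max_len, embed_dim), then masked mean.
padded = pad_sequence(sequences, batch_first=True)  # pads with token id 0
mask = (torch.arange(padded.size(1)) < lengths.unsqueeze(1)).unsqueeze(-1)
embedded = embedding(padded)
padded_mean = (embedded * mask).sum(dim=1) / lengths.unsqueeze(1)
naive_mean = embedded.mean(dim=1)  # also averages the padding positions

# EmbeddingBag route: concatenated tokens plus start offsets, no padding.
text = torch.cat(sequences)
offsets = torch.tensor([0, 2, 5])
bag_mean = embedding_bag(text, offsets)

print(torch.allclose(padded_mean, bag_mean, atol=1e-6))      # True
print(torch.allclose(naive_mean[0], bag_mean[0], atol=1e-6))  # False: row 0 was padded
```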

Related Pages

Principle:Pytorch_Serve_Text_Classification -- principle for serving text classification models with TorchServe
Environment:Pytorch_Serve_Python_PyTorch_Runtime -- Core Python and PyTorch runtime
