Implementation:InternLM Lmdeploy StopCriteria

Knowledge Sources	InternLM_Lmdeploy
Domains	Text Generation, Stop Conditions
Last Updated	2026-02-07 15:00 GMT

Overview

Implements stop criteria checking for text generation, detecting stop words in generated sequences and enforcing maximum sequence length limits.

Description

The StopCriteria class determines when individual requests in a batch should stop generating tokens. It inherits from BaseGenerationParam and supports two operations:

Setup(): Prepares per-batch stop criteria data. For each request in the batch, it extracts the maximum sequence length from RequestCache::max_seq_len and copies it to device memory. It also initializes stop word tensors from each request's GenerationConfig::stop_ids using the shared init_stop_bad_words utility function (from generation/utils.h), which packs variable-length stop word sequences into a fixed-size tensor format suitable for GPU processing.

Forward(): Runs up to two GPU kernels on the batch (the stop words kernel is skipped when no stop words are configured):

Stop words criterion: If stop words are configured, calls invokeStopWordsCriterion_v2 to scan each request's generated token sequence for matches against its stop word patterns. Matching requests are marked as finished.
Length criterion: Calls invokeLengthCriterion_v2 to compare each request's current sequence length against its maximum allowed length. Requests exceeding the limit are marked as finished.
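The two checks in Forward() can be pictured with a small CPU reference. This is only a sketch under stated assumptions, not the kernels themselves: EndsWithStopWord and CheckStop are hypothetical names, the stop word check is assumed to compare only the tail of each sequence (reasonable if it runs after every sampling step), and the length check is assumed to fire once the length reaches the limit (>=). The actual comparison inside invokeLengthCriterion_v2 may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference; the real checks run as GPU kernels.
// Returns true if the tail of `tokens` equals any non-empty stop sequence.
bool EndsWithStopWord(const std::vector<int>&              tokens,
                      const std::vector<std::vector<int>>& stop_words)
{
    for (const auto& w : stop_words) {
        if (w.empty() || w.size() > tokens.size()) {
            continue;
        }
        const auto tail = tokens.end() - static_cast<std::ptrdiff_t>(w.size());
        if (std::equal(w.begin(), w.end(), tail)) {
            return true;
        }
    }
    return false;
}

// Updates `finished` in place, one flag per request.
// Assumption: a request is finished when its length reaches max_seq_len (>=).
void CheckStop(const std::vector<std::vector<int>>&              tokens,
               const std::vector<std::vector<std::vector<int>>>& stop_words,
               const std::vector<int>&                           max_seq_len,
               std::vector<bool>&                                finished)
{
    assert(tokens.size() == stop_words.size());
    assert(tokens.size() == max_seq_len.size());
    assert(tokens.size() == finished.size());
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        // Stop words criterion (skipped for requests without stop words).
        if (!stop_words[i].empty() && EndsWithStopWord(tokens[i], stop_words[i])) {
            finished[i] = true;
        }
        // Length criterion.
        if (static_cast<int>(tokens[i].size()) >= max_seq_len[i]) {
            finished[i] = true;
        }
    }
}
```

Flags are only ever set to true here, so a request marked finished by one criterion stays finished after the other runs.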

The class maintains per-phase StopCriteriaData structs containing device buffers for stop words and maximum sequence lengths, plus pinned host staging buffers.
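To make the "fixed-size tensor format" used for stop words more concrete, here is a minimal host-side sketch of one possible packing. The layout is an assumption, not taken from init_stop_bad_words: each request gets two rows of max_len ints, the first holding all stop sequences concatenated and the second holding the end offset of each sequence, with unused slots set to -1. PackStopWords is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper; the layout below is assumed for illustration.
//   row 0 (indices [0, max_len)):         concatenated stop word token IDs
//   row 1 (indices [max_len, 2*max_len)): exclusive end offset of each stop word in row 0
// Unused slots in both rows are filled with -1.
std::vector<int> PackStopWords(const std::vector<std::vector<int>>& words,
                               std::size_t                          max_len)
{
    std::vector<int> packed(2 * max_len, -1);
    std::size_t      n_tokens = 0;  // tokens written to row 0 so far
    std::size_t      n_words  = 0;  // offsets written to row 1 so far
    for (const auto& w : words) {
        if (w.empty()) {
            continue;  // nothing to match against
        }
        assert(n_tokens + w.size() <= max_len);  // caller must size max_len appropriately
        for (int id : w) {
            packed[n_tokens++] = id;
        }
        packed[max_len + n_words++] = static_cast<int>(n_tokens);
    }
    return packed;
}
```

For example, packing the stop sequences {5, 6} and {9} with max_len = 4 gives row 0 = [5, 6, 9, -1] and row 1 = [2, 3, -1, -1]. A buffer like this would be filled in host staging memory and then copied to the device.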

Usage

StopCriteria is used within the Generation module. Setup() is called at the kSetup stage to configure stop conditions for each batch, and Forward() is called at the kForward stage (after sampling) to check whether each request should stop generating.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/generation/stop_criteria.h
File: src/turbomind/generation/stop_criteria.cc
Lines: stop_criteria.h 1-42, stop_criteria.cc 1-89

Signature

class StopCriteria: public BaseGenerationParam {
public:
    explicit StopCriteria(const BaseGenerationParam& base, int phases);

    void Setup(int phase, TensorMap& env);

    void Forward(int phase, TensorMap& env);

private:
    std::vector<std::shared_ptr<StopCriteriaData>> data_;

    Buffer_<int> stop_words_buf_;
    Buffer_<int> max_seq_len_buf_;
};

Import

#include "src/turbomind/generation/stop_criteria.h"

I/O Contract

Inputs

Name	Type	Required	Description
base	const BaseGenerationParam&	Yes	Base parameters (max_batch_size, vocab_size, vocab_size_padded)
phases	int	Yes	Number of pipeline phases
env["batch"]	BatchData*	Yes (Setup)	Batch data with request caches containing stop words and max lengths
env["copy"]	BatchCopy*	Yes (Setup)	Host-to-device copy utility
env["token_ids_ptrs"]	Buffer_<int*>	Yes (Forward)	Pointers to generated token ID arrays
env["sequence_length"]	Buffer_<int>	Yes (Forward)	Current sequence lengths
env["finished"]	Buffer_<bool>	Yes (Forward)	Per-request finished flags (modified in-place)

Outputs

Name	Type	Description
env["finished"] (modified)	Buffer_<bool>	Updated finished flags: true for requests that hit stop words or length limits

Usage Examples

// Construction (inside Generation module)
StopCriteria stop(base_param, phases);

// Setup: prepare stop words and max sequence lengths for the batch
stop.Setup(phase, env);

// Forward: check stop words and length criteria
stop.Forward(phase, env);
// After this, env["finished"] indicates which requests should stop

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
