Implementation:InternLM Lmdeploy StopCriteria
| Knowledge Sources | |
|---|---|
| Domains | Text Generation, Stop Conditions |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Implements stop criteria checking for text generation, detecting stop words in generated sequences and enforcing maximum sequence length limits.
Description
The StopCriteria class determines when individual requests in a batch should stop generating tokens. It inherits from BaseGenerationParam and supports two operations:
Setup(): Prepares per-batch stop criteria data. For each request in the batch, it extracts the maximum sequence length from RequestCache::max_seq_len and copies it to device memory. It also initializes stop word tensors from each request's GenerationConfig::stop_ids using the shared init_stop_bad_words utility function (from generation/utils.h), which packs variable-length stop word sequences into a fixed-size tensor format suitable for GPU processing.
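The packing step can be illustrated with a minimal host-side sketch. The helper name `pack_stop_words` and the `[2, max_len]` layout (token IDs in the first row, end offsets in the second, `-1` padding) are assumptions made for illustration. They follow a convention common in GPU inference code and are not taken from `init_stop_bad_words`, whose actual layout may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the lmdeploy implementation): packs variable-length
// stop sequences into a fixed-size, row-major [2, max_len] buffer.
//   Row 0: token IDs of all sequences, concatenated.
//   Row 1: exclusive end offset of each sequence within row 0.
// Unused slots in both rows are padded with -1.
std::vector<int> pack_stop_words(const std::vector<std::vector<int>>& words, std::size_t max_len)
{
    std::vector<int> packed(2 * max_len, -1);
    std::size_t pos = 0;  // write cursor in the token row
    std::size_t idx = 0;  // write cursor in the offset row
    for (const auto& w : words) {
        if (w.empty() || pos + w.size() > max_len) {
            continue;  // skip sequences that are empty or do not fit
        }
        std::copy(w.begin(), w.end(), packed.begin() + pos);
        pos += w.size();
        packed[max_len + idx++] = static_cast<int>(pos);
    }
    return packed;
}
```

Under these assumptions, packing the sequences `{5, 6}` and `{7}` with `max_len = 4` yields the token row `5 6 7 -1` and the offset row `2 3 -1 -1`. A fixed-size layout like this lets every request in the batch occupy the same amount of device memory regardless of how many stop words it has.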
Forward(): Runs two GPU kernels on the batch:
- Stop words criterion: If stop words are configured, calls invokeStopWordsCriterion_v2 to scan each request's generated token sequence for matches against its stop word patterns. Matching requests are marked as finished.
- Length criterion: Calls invokeLengthCriterion_v2 to compare each request's current sequence length against its maximum allowed length. Requests exceeding the limit are marked as finished.
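The two checks can be sketched as a CPU reference. This is not the code of the GPU kernels: the `Request` struct and both function names are hypothetical, the stop-word check is assumed to compare only the tail of the generated sequence, and the length check is assumed to trigger once the sequence length reaches the maximum. The comparison used by the real kernel may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-request view, used only for this sketch.
struct Request {
    std::vector<int>              tokens;       // generated token IDs
    int                           seq_len;      // current sequence length
    int                           max_seq_len;  // maximum allowed length
    std::vector<std::vector<int>> stop_words;   // stop sequences
};

// Returns true if the last stop.size() tokens of tokens[0..len) equal `stop`.
bool ends_with_stop_word(const std::vector<int>& tokens, int len, const std::vector<int>& stop)
{
    const int n = static_cast<int>(stop.size());
    if (n == 0 || len < n) {
        return false;
    }
    for (int i = 0; i < n; ++i) {
        if (tokens[len - n + i] != stop[i]) {
            return false;
        }
    }
    return true;
}

// Marks requests as finished in place. Flags are only ever set, never cleared.
void check_stop_criteria(const std::vector<Request>& batch, std::vector<bool>& finished)
{
    for (std::size_t i = 0; i < batch.size(); ++i) {
        const Request& r = batch[i];
        // Stop words criterion: any stop sequence matching the tail finishes the request.
        for (const auto& w : r.stop_words) {
            if (ends_with_stop_word(r.tokens, r.seq_len, w)) {
                finished[i] = true;
            }
        }
        // Length criterion (assumption: triggers when the limit is reached).
        if (r.seq_len >= r.max_seq_len) {
            finished[i] = true;
        }
    }
}
```

On the GPU, each request (and each stop sequence) would typically be handled by its own thread or block rather than by nested loops, but the per-request logic has the same shape.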
The class maintains per-phase StopCriteriaData structs containing device buffers for stop words and maximum sequence lengths, plus pinned host staging buffers.
Usage
Used within the Generation module. Setup() is called at kSetup to configure stop conditions for each batch, and Forward() is called at kForward (after sampling) to check whether each request should stop generating.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/generation/stop_criteria.h
- File: src/turbomind/generation/stop_criteria.cc
- Lines: stop_criteria.h 1-42, stop_criteria.cc 1-89
Signature
class StopCriteria: public BaseGenerationParam {
public:
    explicit StopCriteria(const BaseGenerationParam& base, int phases);
    void Setup(int phase, TensorMap& env);
    void Forward(int phase, TensorMap& env);
private:
    std::vector<std::shared_ptr<StopCriteriaData>> data_;
    Buffer_<int> stop_words_buf_;
    Buffer_<int> max_seq_len_buf_;
};
Import
#include "src/turbomind/generation/stop_criteria.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| base | BaseGenerationParam | Yes | Base parameters (max_batch_size, vocab_size, vocab_size_padded) |
| phases | int | Yes | Number of pipeline phases |
| env["batch"] | BatchData* | Yes (Setup) | Batch data with request caches containing stop words and max lengths |
| env["copy"] | BatchCopy* | Yes (Setup) | Host-to-device copy utility |
| env["token_ids_ptrs"] | Buffer_<int*> | Yes (Forward) | Pointers to generated token ID arrays |
| env["sequence_length"] | Buffer_<int> | Yes (Forward) | Current sequence lengths |
| env["finished"] | Buffer_<bool> | Yes (Forward) | Per-request finished flags (modified in-place) |
Outputs
| Name | Type | Description |
|---|---|---|
| env["finished"] (modified) | Buffer_<bool> | Updated finished flags: true for requests that hit stop words or length limits |
Usage Examples
// Construction (inside Generation module)
StopCriteria stop(base_param, phases);
// Setup: prepare stop words and max sequence lengths for the batch
stop.Setup(phase, env);
// Forward: check stop words and length criteria
stop.Forward(phase, env);
// After this, env["finished"] indicates which requests should stop