
Implementation:InternLM Lmdeploy StopCriteria

From Leeroopedia


Knowledge Sources
Domains Text Generation, Stop Conditions
Last Updated 2026-02-07 15:00 GMT

Overview

Implements stop criteria checking for text generation, detecting stop words in generated sequences and enforcing maximum sequence length limits.

Description

The StopCriteria class determines when individual requests in a batch should stop generating tokens. It inherits from BaseGenerationParam and supports two operations:

Setup(): Prepares per-batch stop criteria data. For each request in the batch, it extracts the maximum sequence length from RequestCache::max_seq_len and copies it to device memory. It also initializes stop word tensors from each request's GenerationConfig::stop_ids using the shared init_stop_bad_words utility function (from generation/utils.h), which packs variable-length stop word sequences into a fixed-size tensor format suitable for GPU processing.
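As a rough host-side illustration of the packing step described above, the sketch below flattens variable-length stop sequences into a fixed-size, row-major [2, max_len] buffer. Row 0 holds the token IDs of all sequences back to back, and row 1 holds the exclusive end offset of each sequence, padded with -1. This layout is an assumption made for the example and is not confirmed by this page; the helper name pack_stop_words is hypothetical and is not the signature of init_stop_bad_words.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT the real init_stop_bad_words signature).
// Packs variable-length stop sequences into a row-major [2, max_len] buffer:
//   row 0: token IDs of all sequences, concatenated (padded with 0)
//   row 1: exclusive end offset of each sequence    (padded with -1)
// Returns an empty vector if the tokens do not fit into max_len.
std::vector<int> pack_stop_words(const std::vector<std::vector<int>>& stop_words,
                                 std::size_t                          max_len)
{
    std::size_t total = 0;
    for (const auto& word : stop_words) {
        total += word.size();
    }
    if (total > max_len) {
        return {};
    }

    std::vector<int> packed(2 * max_len, 0);
    for (std::size_t i = 0; i < max_len; ++i) {
        packed[max_len + i] = -1;  // mark unused offset slots
    }

    std::size_t pos   = 0;  // write position in row 0
    std::size_t count = 0;  // number of offsets written to row 1
    for (const auto& word : stop_words) {
        if (word.empty()) {
            continue;  // an empty sequence can never match; skip it
        }
        for (int token : word) {
            packed[pos++] = token;
        }
        packed[max_len + count++] = static_cast<int>(pos);
    }
    return packed;
}
```

With this layout, a consumer can recover sequence i as the tokens in row 0 between the previous offset (or 0) and offset i, and can stop scanning at the first -1. A fixed max_len per request keeps the per-batch tensor rectangular, which is what makes a single host-to-device copy possible.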

Forward(): Runs two GPU kernels on the batch:

  1. Stop words criterion: If stop words are configured, calls invokeStopWordsCriterion_v2 to scan each request's generated token sequence for matches against its stop word patterns. Matching requests are marked as finished.
  2. Length criterion: Calls invokeLengthCriterion_v2 to compare each request's current sequence length against its maximum allowed length. Requests exceeding the limit are marked as finished.
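The two checks above can be sketched as a CPU reference for readers who want to reason about the behaviour without reading the GPU kernels. This is a minimal sketch under stated assumptions, not the code behind invokeStopWordsCriterion_v2 or invokeLengthCriterion_v2: the names RequestState, ends_with_stop_word, reached_max_length and check_stop_criteria are hypothetical, the stop-word check looks only at the tail of the sequence (on the assumption that the check runs after every sampled token), and the length check fires once the length reaches the limit, since this page does not state whether the real comparison is strict.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-request state, kept on the host for illustration.
struct RequestState {
    std::vector<int>              tokens;       // token IDs generated so far
    std::vector<std::vector<int>> stop_words;   // stop sequences (unpacked)
    int                           max_seq_len;  // maximum allowed length
};

// True if the tail of `tokens` equals any non-empty stop sequence.
// Assumption: checking only the tail is enough because the check is
// repeated after each newly sampled token.
bool ends_with_stop_word(const std::vector<int>&              tokens,
                         const std::vector<std::vector<int>>& stop_words)
{
    for (const auto& word : stop_words) {
        if (word.empty() || word.size() > tokens.size()) {
            continue;
        }
        const std::size_t start = tokens.size() - word.size();
        bool              match = true;
        for (std::size_t i = 0; i < word.size(); ++i) {
            if (tokens[start + i] != word[i]) {
                match = false;
                break;
            }
        }
        if (match) {
            return true;
        }
    }
    return false;
}

// Assumption: the request stops once its length reaches the limit (>=).
bool reached_max_length(int sequence_length, int max_seq_len)
{
    return sequence_length >= max_seq_len;
}

// Applies both criteria to every unfinished request in the batch.
void check_stop_criteria(const std::vector<RequestState>& batch,
                         std::vector<bool>&               finished)
{
    for (std::size_t i = 0; i < batch.size(); ++i) {
        if (finished[i]) {
            continue;  // flags are only ever set, never cleared
        }
        const RequestState& r = batch[i];
        finished[i] = ends_with_stop_word(r.tokens, r.stop_words)
                      || reached_max_length(static_cast<int>(r.tokens.size()), r.max_seq_len);
    }
}
```

In the GPU version, each request would typically be handled by its own thread or block, and the finished flags would live in device memory. The control flow per request stays the same as in this loop.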

The class maintains per-phase StopCriteriaData structs containing device buffers for stop words and maximum sequence lengths, plus pinned host staging buffers.

Usage

StopCriteria is used within the Generation module. It is called at kSetup to configure the stop conditions for each batch, and again at kForward (after sampling) to check whether each request should stop generating.

Code Reference

Source Location

Signature

class StopCriteria: public BaseGenerationParam {
public:
    explicit StopCriteria(const BaseGenerationParam& base, int phases);

    void Setup(int phase, TensorMap& env);

    void Forward(int phase, TensorMap& env);

private:
    std::vector<std::shared_ptr<StopCriteriaData>> data_;

    Buffer_<int> stop_words_buf_;
    Buffer_<int> max_seq_len_buf_;
};

Import

#include "src/turbomind/generation/stop_criteria.h"

I/O Contract

Inputs

Name | Type | Required | Description
base | BaseGenerationParam | Yes | Base parameters (max_batch_size, vocab_size, vocab_size_padded)
phases | int | Yes | Number of pipeline phases
env["batch"] | BatchData* | Yes (Setup) | Batch data with request caches containing stop words and max lengths
env["copy"] | BatchCopy* | Yes (Setup) | Host-to-device copy utility
env["token_ids_ptrs"] | Buffer_<int*> | Yes (Forward) | Pointers to generated token ID arrays
env["sequence_length"] | Buffer_<int> | Yes (Forward) | Current sequence lengths
env["finished"] | Buffer_<bool> | Yes (Forward) | Per-request finished flags (modified in-place)

Outputs

Name | Type | Description
env["finished"] (modified) | Buffer_<bool> | Updated finished flags: true for requests that hit stop words or length limits

Usage Examples

// Construction (inside Generation module)
StopCriteria stop(base_param, phases);

// Setup: prepare stop words and max sequence lengths for the batch
stop.Setup(phase, env);

// Forward: check stop words and length criteria
stop.Forward(phase, env);
// After this, env["finished"] indicates which requests should stop
