Implementation:Alibaba MNN RapidJSON Regex

From Leeroopedia


Knowledge Sources
Domains JSON_Parsing, Schema_Validation
Last Updated 2026-02-10 12:00 GMT

Overview

3rd_party/rapidjson/internal/regex.h (740 lines) implements a lightweight regular expression engine within the vendored RapidJSON library in Alibaba MNN. This engine is used exclusively by the JSON Schema validator (schema.h) to evaluate the "pattern" keyword in JSON Schema Draft v4. It avoids pulling in the standard library <regex> header, keeping the dependency footprint minimal.

Usage note: Vendored dependency used internally by MNN for JSON configuration parsing (model configs, LLM configs). Not directly imported by end users.

Key Classes

GenericRegex

The regex compiler that parses a pattern string into an NFA (Nondeterministic Finite Automaton):

template <typename Encoding, typename Allocator = CrtAllocator>
class GenericRegex {
public:
    typedef Encoding EncodingType;
    typedef typename Encoding::Ch Ch;

    GenericRegex(const Ch* source, Allocator* allocator = 0);
    ~GenericRegex();
    bool IsValid() const;

private:
    // NFA state and range management
    SizeType root_;
    SizeType stateCount_;
    SizeType rangeCount_;
    bool anchorBegin_;
    bool anchorEnd_;
};

GenericRegexSearch

The regex execution engine that matches an input stream against a compiled regex:

template <typename RegexType, typename Allocator = CrtAllocator>
class GenericRegexSearch {
public:
    typedef typename RegexType::EncodingType Encoding;
    typedef typename Encoding::Ch Ch;

    GenericRegexSearch(const RegexType& regex, Allocator* allocator = 0);

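    // Succeeds only if the entire input matches the pattern.
    template <typename InputStream>
    bool Match(InputStream& is);

    // Succeeds if any part of the input matches, honoring ^ and $ anchors.
    template <typename InputStream>
    bool Search(InputStream& is);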
};

DecodedStream

A helper that decodes encoded characters from a source stream one codepoint at a time:

template <typename SourceStream, typename Encoding>
class DecodedStream {
public:
    DecodedStream(SourceStream& ss);
    unsigned Peek();
    unsigned Take();
};
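
The Peek()/Take() pattern can be illustrated with a small self-contained sketch. This is not the RapidJSON source: the class name Utf8Decoded is hypothetical, it reads from a std::string rather than a generic SourceStream, and it assumes the input is well-formed UTF-8. It only shows the idea of keeping one decoded codepoint buffered so that Peek() is cheap and Take() advances.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a decoded stream; assumes well-formed UTF-8 input.
class Utf8Decoded {
public:
    explicit Utf8Decoded(const std::string& bytes)
        : bytes_(bytes), pos_(0), codepoint_(0) {
        Decode();  // buffer the first codepoint up front
    }

    // Returns the buffered codepoint without consuming it (0 at end of input).
    unsigned Peek() const { return codepoint_; }

    // Returns the buffered codepoint and decodes the next one.
    unsigned Take() {
        unsigned c = codepoint_;
        if (c) Decode();  // nothing more to decode once 0 has been reached
        return c;
    }

private:
    void Decode() {
        if (pos_ >= bytes_.size()) { codepoint_ = 0; return; }
        unsigned char lead = static_cast<unsigned char>(bytes_[pos_++]);
        // Number of continuation bytes implied by the lead byte.
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte.
        unsigned cp = extra == 0 ? lead : lead & (0x3Fu >> extra);
        for (int i = 0; i < extra && pos_ < bytes_.size(); ++i) {
            unsigned char cont = static_cast<unsigned char>(bytes_[pos_++]);
            cp = (cp << 6) | (cont & 0x3Fu);  // append 6 payload bits
        }
        codepoint_ = cp;
    }

    std::string bytes_;   // owned copy of the encoded input
    std::size_t pos_;     // index of the next undecoded byte
    unsigned codepoint_;  // currently buffered codepoint
};
```

Working on codepoints rather than raw bytes lets constructs such as . and character classes treat a multi-byte character as a single unit.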

Supported Syntax

The engine supports a subset of ECMAScript regular expression grammar:

Pattern Description
ab Concatenation
a|b Alternation
a? Zero or one
a* Zero or more
a+ One or more
a{3} Exactly 3 times
a{3,} At least 3 times
a{3,5} 3 to 5 times
(ab) Grouping
^a / a$ Anchors (begin/end)
. Any character
[abc], [a-c], [^abc] Character classes
\f \n \r \t \v Escape sequences
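
The counted quantifiers in the table can be read as shorthand for the simpler operators. The sketch below is an illustration of that equivalence only, not a description of how regex.h handles them: ExpandQuantifier is a hypothetical helper, and it assumes the atom is a single character or a parenthesized group.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrites atom{n,m} using only concatenation, ? and *.
// Pass unbounded = true for the open-ended form atom{n,}; m is then ignored.
std::string ExpandQuantifier(const std::string& atom, unsigned n, unsigned m,
                             bool unbounded = false) {
    assert(unbounded || m >= n);
    std::string out;
    for (unsigned i = 0; i < n; ++i)
        out += atom;                 // n mandatory copies
    if (unbounded) {
        out += atom + "*";           // followed by zero or more
    } else {
        for (unsigned i = n; i < m; ++i)
            out += atom + "?";       // m - n optional copies
    }
    return out;
}
```

Under these assumptions, a{3} corresponds to aaa, a{3,} to aaaa*, and a{3,5} to aaaa?a?.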

Implementation Details

The engine is a Thompson NFA implementation based on Russ Cox's article "Regular Expression Matching Can Be Simple And Fast." It compiles the pattern into a graph of states with epsilon transitions, then simulates the NFA by tracking the set of all active states in lockstep. Because each input character is processed once against at most every NFA state, matching time grows linearly with input length for a given pattern (roughly proportional to input length times state count), with no exponential backtracking.
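
The state-set idea can be sketched in a few dozen lines. The code below is a simplified illustration and not the RapidJSON implementation: the State layout, the function names, and the hand-built NFA for a(b|c)*d are all assumptions made for this example, and it performs a whole-input match over plain char values only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified NFA state; not RapidJSON's actual State struct.
struct State {
    enum Kind { kChar, kSplit, kAccept } kind;
    char ch;   // character consumed by a kChar state
    int out;   // next state (kChar and kSplit)
    int out1;  // second branch (kSplit only)
};

// Adds state s, following epsilon (kSplit) edges, to the active set.
static void AddState(const std::vector<State>& nfa, int s,
                     std::vector<int>& set, std::vector<bool>& inSet) {
    if (inSet[s]) return;  // each state enters the set at most once per step
    inSet[s] = true;
    if (nfa[s].kind == State::kSplit) {
        AddState(nfa, nfa[s].out, set, inSet);
        AddState(nfa, nfa[s].out1, set, inSet);
    } else {
        set.push_back(s);
    }
}

// Whole-input match: advances every active state in lockstep, one pass over input.
bool Simulate(const std::vector<State>& nfa, int start, const std::string& input) {
    assert(start >= 0 && static_cast<std::size_t>(start) < nfa.size());
    std::vector<int> current, next;
    std::vector<bool> inSet(nfa.size(), false);
    AddState(nfa, start, current, inSet);
    for (char c : input) {
        next.clear();
        inSet.assign(nfa.size(), false);
        for (int s : current)
            if (nfa[s].kind == State::kChar && nfa[s].ch == c)
                AddState(nfa, nfa[s].out, next, inSet);
        current.swap(next);
        if (current.empty()) return false;  // no active states left
    }
    for (int s : current)
        if (nfa[s].kind == State::kAccept) return true;
    return false;
}

// Hand-built NFA for the pattern a(b|c)*d.
std::vector<State> BuildExampleNfa() {
    return {
        {State::kChar,   'a', 1, -1},  // 0: 'a' then enter the loop
        {State::kSplit,  0,   2, 5},   // 1: take (b|c) again, or continue to 'd'
        {State::kSplit,  0,   3, 4},   // 2: choose 'b' or 'c'
        {State::kChar,   'b', 1, -1},  // 3: 'b' then back to the loop
        {State::kChar,   'c', 1, -1},  // 4: 'c' then back to the loop
        {State::kChar,   'd', 6, -1},  // 5: 'd' then accept
        {State::kAccept, 0,  -1, -1},  // 6: accepting state
    };
}
```

Each input character costs at most one visit per NFA state, which is where the linear bound comes from: the active set can never hold more entries than there are states.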

Verbose debugging output can be enabled by setting the RAPIDJSON_REGEX_VERBOSE macro at compile time.

Namespace

All classes reside in RAPIDJSON_NAMESPACE::internal (typically rapidjson::internal).

License

MIT License. Copyright (C) 2015 THL A29 Limited (Tencent) and Milo Yip.
