
Implementation:Teamcapybara Capybara RegexpDisassembler

From Leeroopedia
Revision as of 11:53, 16 February 2026 by Admin (auto-imported from implementations/Teamcapybara_Capybara_RegexpDisassembler.md)
Knowledge Sources
Domains: Testing, Selector_System
Last Updated: 2026-02-12 00:00 GMT

Overview

Capybara::Selector::RegexpDisassembler is an internal utility class that extracts fixed literal substrings from Ruby regular expressions. Selectors use these substrings for efficient CSS/XPath pre-filtering.

Description

RegexpDisassembler parses a Ruby Regexp with the regexp_parser gem and extracts substrings that must appear in every string the regexp matches. These substrings are used to build pre-filter conditions: XPath contains() predicates, or CSS attribute substring selectors such as [id*='...'] (CSS has no contains() function). The DOM query discards elements that cannot match before the full regexp is evaluated in Ruby, which reduces the number of elements that need post-query regexp matching.
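The following rough illustration of the pre-filtering idea turns OR-of-AND substring groups (the shape returned by alternated_substrings) into an XPath predicate and a CSS selector list for one attribute. The helper names xpath_condition, css_selector and check_quotes! are hypothetical. This is not Capybara's builder code, and it assumes the substrings contain no single quotes.

```ruby
# Hypothetical helpers (not Capybara's builders). Assumption: substrings
# contain no single quotes, so no XPath/CSS string escaping is attempted.
def check_quotes!(groups)
  groups.flatten.each do |str|
    raise ArgumentError, "quote in #{str.inspect}" if str.include?("'")
  end
end

# Each group becomes an AND of contains() calls; the groups are OR-ed.
# Returns nil when there are no groups, meaning "no pre-filter possible".
def xpath_condition(attribute, groups)
  return nil if groups.empty?

  check_quotes!(groups)
  groups.map do |group|
    "(" + group.map { |str| "contains(@#{attribute}, '#{str}')" }.join(" and ") + ")"
  end.join(" or ")
end

# CSS cannot express OR inside one compound selector, so each group becomes
# its own selector ([attr*='x'] is the attribute-substring selector) and the
# groups are joined into a comma-separated selector list.
def css_selector(tag, attribute, groups)
  return tag if groups.empty?

  check_quotes!(groups)
  groups.map do |group|
    tag + group.map { |str| "[#{attribute}*='#{str}']" }.join
  end.join(", ")
end
```

With groups = [["foo", "bar"], ["baz"]] and the attribute id, xpath_condition returns (contains(@id, 'foo') and contains(@id, 'bar')) or (contains(@id, 'baz')). An empty group list yields no condition, because nothing can be pre-filtered.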

initialize(regexp) stores the regexp. Two public methods provide different extraction strategies:

  • substrings returns a flat Array of strings that must ALL appear in any match (AND semantics). It calls process(alternation: false), takes the first result set, and removes covered duplicates via remove_and_covered (e.g., if both "ab" and "abcd" are required, only "abcd" is kept).
  • alternated_substrings returns an Array of Arrays representing OR-of-AND groups. It calls process(alternation: true) and removes covered alternation sets via remove_or_covered. Returns empty if any alternation branch yields no substrings.
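The two covering rules can be sketched in plain Ruby. prune_and and prune_or are hypothetical names. This re-implements the idea described above and does not reproduce Capybara's remove_and_covered / remove_or_covered. It also sorts each group so that the same group in a different order is detected as a duplicate.

```ruby
# AND semantics: drop duplicates and any string implied by a longer required
# string (if "abcd" must appear, "ab" necessarily appears too).
def prune_and(strings)
  unique = strings.uniq
  unique.reject do |s|
    unique.any? { |other| other != s && other.include?(s) }
  end
end

# OR-of-AND semantics: a group is redundant when every string of some other
# group is contained in one of its strings, because any text satisfying it
# also satisfies the weaker group. An empty group means a branch yielded no
# substrings, so no pre-filter is possible and the result is [].
def prune_or(groups)
  # sorted so reordered copies of the same group collapse under uniq
  groups = groups.map { |group| prune_and(group).sort }.uniq
  return [] if groups.any?(&:empty?)

  groups.reject do |group|
    groups.any? do |other|
      other != group && other.all? { |s| group.any? { |g| g.include?(s) } }
    end
  end
end
```

For example, prune_or([["a", "b"], ["ade", "bce"]]) keeps only ["a", "b"]. Any text containing "ade" and "bce" also contains "a" and "b", so the weaker group already admits every candidate the stronger one would.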

Both methods upcase every extracted string when the regexp is case-insensitive, that is, when Regexp#casefold? returns true (as for /.../i).
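Here is a minimal sketch of the case-folding step, assuming the groups have already been extracted. normalize_case and contains_upcased? are hypothetical helpers. This page does not describe how Capybara's XPath/CSS consumers compensate for the upcasing. Any comparison against the upcased strings must itself be case-insensitive, for example by upcasing the candidate text.

```ruby
# Hypothetical helper mirroring the documented final step: when the source
# Regexp is case-insensitive (Regexp#casefold?), upcase every extracted string.
def normalize_case(regexp, groups)
  return groups unless regexp.casefold?

  groups.map { |group| group.map(&:upcase) }
end

# Hypothetical consumer-side check: compare upcased text with an upcased
# substring so the pre-filter stays case-insensitive like the regexp.
def contains_upcased?(text, upcased_substring)
  text.upcase.include?(upcased_substring)
end
```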

The private processing pipeline has three stages. extract_strings walks the parsed regexp AST through the nested Expression class. combine expands alternation branches (held in Set objects) into separate string sequences via a Cartesian product. collapse splits each sequence at nil breaks, joins adjacent literal fragments, and removes empty and duplicate segments.
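The combine and collapse stages can be sketched on a token list. In this list, a String is a literal fragment, nil is a break where unknown text may appear, and a Set holds one token list per alternation branch. This token representation is an assumption made for illustration. The code re-implements the described behavior and is not Capybara's source.

```ruby
require 'set'

# combine: expand every Set (one entry per alternation branch) into separate
# token sequences using a Cartesian product; each result is one possible path.
def combine(tokens)
  paths = [[]]
  tokens.each do |token|
    if token.is_a?(Set)
      branches = token.flat_map { |branch| combine(branch) }
      paths = paths.product(branches).map { |prefix, branch| prefix + branch }
    else
      paths = paths.map { |path| path + [token] }
    end
  end
  paths
end

# collapse: split each path at nil breaks, join the literal fragments inside
# each piece, then drop empty pieces and duplicates.
def collapse(paths)
  paths.map do |path|
    path.slice_before(&:nil?).map { |chunk| chunk.compact.join }.reject(&:empty?).uniq
  end
end
```

If the input contains no Set, combine returns exactly one sequence. In that case, taking only the first result set (as substrings does) loses nothing.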

The nested Expression class wraps Regexp::Parser expression nodes and classifies them: terminal? nodes yield literal text or escape characters, optional? nodes (zero-minimum repeats) produce option sets, alternation? nodes generate Set objects for branching, and indeterminate? nodes (meta/set types) produce nil breaks. Negative lookahead and lookbehind assertions are explicitly ignored.
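To make the node classification concrete, the following sketch walks a hypothetical, simplified AST and emits literal strings, nil breaks, and Sets of branches. The AST is a Struct standing in for Regexp::Parser nodes; it is not the gem's API. The sketch simplifies two rules. First, optional nodes become a nil break here instead of the option sets the real class builds. Second, a repeated literal contributes its minimum repetition followed by a break when more repeats are allowed; this repetition handling is an assumption.

```ruby
require 'set'

# Hypothetical stand-in for a Regexp::Parser node (NOT the gem's API).
# kind: :literal, :alternation, :negative_lookaround, or anything else
#       (treated as indeterminate, like \d, . or [a-z]).
# min/max: repetition bounds; max nil means unbounded.
# branches: for :alternation, an Array of node lists (one per branch).
Node = Struct.new(:kind, :text, :min, :max, :branches)

def node(kind, text: nil, min: 1, max: 1, branches: [])
  Node.new(kind, text, min, max, branches)
end

# Walk a node list, emitting literal strings, nil breaks, and Sets of branches.
def extract_strings(nodes, alternation:)
  nodes.flat_map do |n|
    if n.kind == :negative_lookaround
      []                                   # ignored: adds no requirement
    elsif n.min.zero?
      [nil]                                # optional: may be absent (simplified)
    elsif n.kind == :alternation && alternation
      # one token list per branch, expanded later by a combine step
      [Set.new(n.branches.map { |branch| extract_strings(branch, alternation: true) })]
    elsif n.kind == :literal
      required = [n.text * n.min]          # the minimum repetition must appear
      n.min == n.max ? required : required + [nil] # more repeats: unknown text
    else
      [nil]                                # \d, sets, or unprocessed alternation
    end
  end
end
```

For /foo\d+bar/ modeled as [literal "foo", \d+, literal "bar"], the walk yields ["foo", nil, "bar"]. A collapse step then reduces that to ["foo", "bar"].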

Usage

Used internally by the selector system when a Regexp is supplied where a selector supports it (for example, a Regexp value for an attribute filter such as id:). This enables the XPath/CSS pre-filtering optimization.

Code Reference

Source Location

  • Repository: capybara
  • File: lib/capybara/selector/regexp_disassembler.rb (211 lines)

Signature

module Capybara
  class Selector
    class RegexpDisassembler
      def initialize(regexp)
        # @param regexp [Regexp] The regular expression to disassemble
      end

      def alternated_substrings
        # @return [Array<Array<String>>] OR-of-AND substring groups
        # Each inner array is a set of strings that must ALL appear (AND)
        # Any one inner array matching is sufficient (OR)
        # Returns [] if any alternation branch yields no substrings
      end

      def substrings
        # @return [Array<String>] Strings that must ALL appear in any match
        # Covered substrings are removed (e.g., "ab" removed if "abcd" present)
      end
    end
  end
end

Import

require 'regexp_parser'
require 'capybara/selector/regexp_disassembler'

I/O Contract

Inputs

  • regexp (Regexp, required): Ruby regular expression to extract literal substrings from

Outputs

  • substrings (Array<String>): fixed literal substrings that must ALL appear in any match (AND semantics)
  • alternated_substrings (Array<Array<String>>): OR-of-AND substring groups for alternation-aware pre-filtering
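The output contract can be read as a Ruby-side predicate on candidate text. passes_and? and passes_or_of_and? are hypothetical helpers. The sketch assumes that an empty alternated_substrings result means "keep every candidate". This assumption follows from the rule that [] is returned when some alternation branch yields no substrings.

```ruby
# AND semantics (substrings): every extracted string must occur in the text.
# An empty list imposes no constraint, since [].all? is true.
def passes_and?(text, substrings)
  substrings.all? { |s| text.include?(s) }
end

# OR-of-AND semantics (alternated_substrings): at least one group must pass.
# Assumption: an empty result means no pre-filter could be derived, so the
# candidate is kept rather than rejected.
def passes_or_of_and?(text, groups)
  return true if groups.empty?

  groups.any? { |group| passes_and?(text, group) }
end
```

Every string that matches the regexp contains the extracted substrings, so these checks can only discard non-matches. The full regexp still decides the final result.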

Internal Details

Private Processing Pipeline

def process(alternation:)
  # 1. Parse regexp via Regexp::Parser.parse(@regexp)
  # 2. Extract strings via Expression#extract_strings
  # 3. Combine alternation branches via combine (uses Set-based products)
  # 4. Collapse adjacent literals via collapse (join, reject empty, uniq)
  # 5. Upcase all strings if @regexp.casefold?
end

Nested Expression Class

The private Expression class wraps Regexp::Parser AST nodes and provides:

  • extract_strings(process_alternatives): walks child expressions, collecting literal strings and nil breaks
  • alternation?: true for alternation nodes (branches separated by |), which produce Set objects for branching
  • optional?: true when the minimum repeat count is zero
  • terminal?: true for leaf nodes (literals, escapes)
  • strings(process_alternatives): dispatches to terminal_strings, optional_strings, or repeated_strings
  • alternative_strings: returns a Set of per-branch extractions
  • ignore?: true for negative lookahead/lookbehind assertions

Usage Examples

Extract AND Substrings

disassembler = Capybara::Selector::RegexpDisassembler.new(/foo\d+bar/)
disassembler.substrings
# => ["foo", "bar"]
# Both "foo" and "bar" must appear in any matching string

Extract OR-of-AND Substrings

disassembler = Capybara::Selector::RegexpDisassembler.new(/hello|world/)
disassembler.alternated_substrings
# => [["hello"], ["world"]]
# Either "hello" OR "world" must appear

Case-Insensitive Regexp

disassembler = Capybara::Selector::RegexpDisassembler.new(/FooBar/i)
disassembler.substrings
# => ["FOOBAR"]
# Upcased because casefold? is true
