Implementation:Eventual Inc Daft Regexp Extract

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Text_Processing
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for extracting regex matches from string expressions.

Description

The regexp_extract function finds the first regex match in each string of a String expression and returns the requested capture group from that match. If index is 0 (the default), the entire match is returned. If the pattern does not match, or the requested capture group does not exist, the result for that row is null. The pattern can be a static string, or an Expression when the pattern needs to vary from row to row.
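
The semantics described above can be modelled in plain Python. This is a minimal sketch, not Daft's implementation: it uses the standard-library re module, whose regex syntax may differ from Daft's engine in places, and the helper name extract_first is hypothetical.

```python
import re
from typing import Optional


def extract_first(text: str, pattern: str, index: int = 0) -> Optional[str]:
    """Hypothetical model: return group `index` of the first match, else None."""
    compiled = re.compile(pattern)
    # A capture group that the pattern does not define maps to null (None).
    if index < 0 or index > compiled.groups:
        return None
    match = compiled.search(text)  # leftmost (first) match only
    if match is None:
        return None  # no match maps to null
    return match.group(index)  # index 0 is the entire match


print(extract_first("123-456", r"(\d)(\d*)"))     # 123
print(extract_first("123-456", r"(\d)(\d*)", 1))  # 1
print(extract_first("123-456", r"(\d)(\d*)", 3))  # None
print(extract_first("abc", r"(\d)(\d*)"))         # None
```

Only the first match is considered, so the "456" after the hyphen is never returned, whichever index is requested.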

Usage

Call regexp_extract as a standalone function, or use the Expression method .str.extract(), when you need to parse substrings from text columns using regular expressions.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/functions/str.py
  • Lines: L1072-1129

Signature

def regexp_extract(
    expr: Expression,
    pattern: str | Expression,
    index: int = 0,
) -> Expression

Import

from daft.functions import regexp_extract

# or use as an Expression method
import daft
daft.col("text").str.extract(pattern, index)

I/O Contract

Inputs

Name Type Required Description
expr Expression (String) Yes A String expression to extract matches from.
pattern str | Expression Yes The regular expression pattern to match. Can be a static string or a dynamic Expression.
index int No The index of the capture group to extract. 0 returns the entire match; 1 returns the first capture group, etc. Defaults to 0.

Outputs

Name Type Description
return Expression (String) A String expression containing the extracted match for each row, or null if no match or the group does not exist.
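
To illustrate the contract for a static versus a dynamic pattern, the sketch below models a column as a Python list. It is an assumption-laden illustration rather than Daft code: extract_column is a hypothetical helper, a list of patterns stands in for a pattern Expression, and Python's re module stands in for Daft's regex engine.

```python
import re
from typing import List, Optional, Union


def extract_column(
    texts: List[str],
    patterns: Union[str, List[str]],
    index: int = 0,
) -> List[Optional[str]]:
    """Hypothetical model: one output value per input row, None standing in for null."""
    # A static pattern is applied unchanged to every row.
    if isinstance(patterns, str):
        patterns = [patterns] * len(texts)
    results: List[Optional[str]] = []
    for text, pattern in zip(texts, patterns):
        compiled = re.compile(pattern)
        match = compiled.search(text)
        if match is None or index > compiled.groups:
            results.append(None)  # no match, or group does not exist
        else:
            results.append(match.group(index))
    return results


texts = ["id=42", "v1.7", "none"]

# Static pattern: the same regex for every row.
print(extract_column(texts, r"\d+"))  # ['42', '1', None]

# Row-level patterns: each row is matched against its own regex.
row_patterns = [r"id=(\d+)", r"v(\d+)\.(\d+)", r"(\d+)"]
print(extract_column(texts, row_patterns, index=2))  # [None, '7', None]
```

With index=2, only the second row's pattern defines a second capture group, so the other rows yield None.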

Usage Examples

Basic Usage

import daft
from daft.functions import regexp_extract

regex = r"(\d)(\d*)"
df = daft.from_pydict({"x": ["123-456", "789-012", "345-678"]})
df = df.with_column("match", regexp_extract(df["x"], regex))
df.collect()
# "match" column: "123", "789", "345" (entire first match)

Extract Specific Capture Group

import daft
from daft.functions import regexp_extract

regex = r"(\d)(\d*)"
df = daft.from_pydict({"x": ["123-456", "789-012", "345-678"]})

# Extract first capture group (single digit)
df = df.with_column("first_digit", regexp_extract(df["x"], regex, 1))
df.collect()
# "first_digit" column: "1", "7", "3"

Using Expression Method

import daft

df = daft.from_pydict({"text": ["email: user@example.com", "contact: admin@test.org"]})
df = df.with_column("email", daft.col("text").str.extract(r"[\w.]+@[\w.]+"))
df.collect()
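
The email pattern in the example above can be sanity-checked without Daft. The sketch below applies the same regex with Python's re module. It assumes that, for a pattern this simple, Python's engine and Daft's agree. The helper first_email is hypothetical.

```python
import re
from typing import Optional

EMAIL_PATTERN = r"[\w.]+@[\w.]+"


def first_email(text: str) -> Optional[str]:
    """Hypothetical helper: entire first match of EMAIL_PATTERN, or None."""
    match = re.search(EMAIL_PATTERN, text)
    return match.group(0) if match else None


rows = ["email: user@example.com", "contact: admin@test.org"]
print([first_email(row) for row in rows])
# ['user@example.com', 'admin@test.org']

# The character class [\w.] also accepts a trailing dot, so sentence
# punctuation directly after an address is captured with it.
print(first_email("write to admin@test.org."))  # admin@test.org.
```

The pattern is deliberately permissive. It is a convenient extractor for clean text, not an email validator.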
