Implementation:Vibrantlabsai Ragas Core Utility Functions

Knowledge Sources
Domains: Utilities, Logging, Data_Conversion
Last Updated: 2026-02-12 00:00 GMT

Overview

A collection of shared utility functions and classes used throughout the Ragas library for caching, logging, data format conversion, token counting, deprecation management, and name generation.

Description

The utils.py module provides cross-cutting functionality used by many parts of the Ragas codebase. Its key components include:

Caching and Environment:

  • get_cache_dir() -- Returns the Ragas cache directory path, respecting XDG_CACHE_HOME and RAGAS_CACHE_HOME environment variables. Uses lru_cache for performance.
  • get_debug_mode() -- Checks the RAGAS_DEBUG environment variable to determine if debug mode is active.
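
The environment-variable lookups described above can be sketched as follows. This is a minimal illustration, not the Ragas source: the function names resolve_cache_dir and debug_enabled are hypothetical, and the precedence of RAGAS_CACHE_HOME over XDG_CACHE_HOME, the ~/.cache default, and the "true" value for RAGAS_DEBUG are all assumptions.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def resolve_cache_dir() -> str:
    """Hypothetical stand-in for get_cache_dir(); not the Ragas source."""
    # Assumption: the XDG base directory falls back to ~/.cache when unset.
    xdg_cache = os.getenv("XDG_CACHE_HOME", "~/.cache")
    default_dir = os.path.join(xdg_cache, "ragas")
    # Assumption: RAGAS_CACHE_HOME, when set, overrides the XDG-derived path.
    return os.path.expanduser(os.getenv("RAGAS_CACHE_HOME", default_dir))


def debug_enabled() -> bool:
    """Hypothetical stand-in for get_debug_mode()."""
    # Assumption: the string "true" (in any letter case) enables debug mode.
    return os.getenv("RAGAS_DEBUG", "").lower() == "true"
```

Because the result is memoized with lru_cache, a change to either environment variable after the first call is not picked up until resolve_cache_dir.cache_clear() is called.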

Numeric Utilities:

  • safe_nanmean() -- Computes the mean of a list of floats, safely handling empty lists and all-NaN arrays by returning np.nan.
  • check_if_sum_is_close() -- Checks whether a list of floats sums to a target value within a specified number of decimal places, using integer arithmetic to avoid floating-point errors.
  • is_nan() -- Safely checks if a value is NaN, returning False for non-numeric types.
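
To make the three numeric behaviors concrete, here is a standard-library sketch. The names nan_safe_mean, sum_is_close, and is_nan_safe are hypothetical, and the approach (filtering with math.isnan, scaling each value by 10 ** num_places before rounding) is an assumption about how such helpers could work, not a copy of the Ragas implementation, which is described as NumPy-based.

```python
import math
from typing import List


def nan_safe_mean(values: List[float]) -> float:
    """Mean of the non-NaN entries, or NaN when none remain."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def sum_is_close(values: List[float], close_to: float, num_places: int) -> bool:
    """Compare a sum to a target after scaling both sides to integers."""
    scale = 10 ** num_places
    # Assumption: each value is rounded to an integer before summing,
    # so the comparison itself involves no floating-point addition.
    total = sum(round(v * scale) for v in values)
    return total == round(close_to * scale)


def is_nan_safe(x: object) -> bool:
    """True only for numeric NaN; non-numeric inputs yield False."""
    try:
        return math.isnan(x)  # raises TypeError for str, None, list, ...
    except TypeError:
        return False
```

For example, sum_is_close([0.1, 0.2, 0.7], 1.0, 3) compares the integers 100 + 200 + 700 against 1000 rather than relying on the float sum.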

Token Counting:

  • num_tokens_from_string() -- Counts tokens in a string using either a provided BaseTokenizer instance or tiktoken with a specified encoding (default cl100k_base).

Data Format Conversion:

  • convert_row_v1_to_v2() / convert_v1_to_v2_dataset() / convert_v2_to_v1_dataset() -- Functions for converting between Ragas v1 column names (question, contexts, answer, ground_truth) and v2 column names (user_input, retrieved_contexts, response, reference).
  • get_required_columns_v1() -- Maps v2 metric required columns to their v1 equivalents.
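
The column correspondence listed above can be expressed as a pair of lookup tables. The sketch below works on plain dictionaries rather than a Dataset object; rename_row is a hypothetical helper, and passing unknown keys through unchanged is an assumption.

```python
from typing import Any, Dict

# Column pairs taken from the description above (v1 name -> v2 name).
V1_TO_V2 = {
    "question": "user_input",
    "contexts": "retrieved_contexts",
    "answer": "response",
    "ground_truth": "reference",
}
# The reverse direction is derived by inverting the same table.
V2_TO_V1 = {v2: v1 for v1, v2 in V1_TO_V2.items()}


def rename_row(row: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename known keys; assumption: unknown keys pass through unchanged."""
    return {mapping.get(key, key): value for key, value in row.items()}


v1_row = {"question": "What is AI?", "contexts": ["AI is..."], "extra": 1}
v2_row = rename_row(v1_row, V1_TO_V2)
```

Deriving V2_TO_V1 from V1_TO_V2 keeps the two directions consistent if a column pair is ever added.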

Deprecation Support:

  • DeprecationHelper -- A class that wraps a target class and emits deprecation warnings on instantiation or attribute access.
  • deprecated() -- A decorator for marking functions as deprecated with configurable version info and alternative suggestions.
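
A decorator of this kind is usually built on the standard warnings module. The following is a simplified sketch under that assumption; deprecated_sketch is a hypothetical name, the message wording is invented for illustration, and the addendum and pending parameters of the real decorator are left out.

```python
import functools
import warnings
from typing import Callable, Optional


def deprecated_sketch(
    since: str, *, removal: Optional[str] = None, alternative: Optional[str] = None
) -> Callable:
    """Hypothetical, reduced version of a deprecation decorator."""

    def decorator(func: Callable) -> Callable:
        # Build the message once, at decoration time.
        message = f"{func.__name__} was deprecated in {since}"
        if removal:
            message += f" and will be removed in {removal}"
        if alternative:
            message += f"; use {alternative} instead"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_sketch("0.1", removal="0.2", alternative="new_function")
def old_function() -> str:
    return "result"


# Record warnings explicitly so the emitted message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_function()
```

The wrapped function still returns its normal result; only the side effect of emitting a DeprecationWarning is added.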

Logging:

  • set_logging_level() -- Configures a logger with a custom _ContextualFormatter that adds UTC time, local time, Ragas user ID, and app version to each log record.
  • patch_logger() -- A simpler logger configuration utility for setting log levels on specific modules.
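
One way to attach extra context to every log record is to subclass logging.Formatter. The sketch below adds only a UTC timestamp; the class name UtcContextFormatter, the helper configure_logger, its stream parameter, and the format string are all hypothetical, and the user ID and app version fields mentioned above are omitted.

```python
import io
import logging
from datetime import datetime, timezone


class UtcContextFormatter(logging.Formatter):
    """Hypothetical formatter that attaches a UTC timestamp to each record."""

    def format(self, record: logging.LogRecord) -> str:
        # record.created holds seconds since the epoch; render it in UTC.
        record.utc_time = datetime.fromtimestamp(
            record.created, tz=timezone.utc
        ).isoformat()
        return super().format(record)


def configure_logger(name: str, level: int = logging.DEBUG, stream=None) -> logging.Logger:
    """Set the level and install a handler that uses the formatter above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(
        UtcContextFormatter("%(utc_time)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


buffer = io.StringIO()
log = configure_logger("utils_sketch", logging.INFO, buffer)
log.debug("hidden")   # below INFO, so it is filtered out
log.info("visible")
```

Note that calling configure_logger twice for the same name would add a second handler in this sketch; a production helper would typically guard against that.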

Name Generation:

  • MemorableNames -- A class that generates Docker-style memorable names (adjective + scientist name) for experiments and datasets, with uniqueness tracking.
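
The adjective-plus-scientist scheme with uniqueness tracking might look like the sketch below. NameGenerator is a hypothetical stand-in with deliberately tiny word lists; the bounded retry count and the numeric-suffix fallback are assumptions, not confirmed behavior of MemorableNames.

```python
import random
from typing import List, Optional, Set


class NameGenerator:
    """Hypothetical stand-in for MemorableNames with tiny word lists."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.adjectives = ["bold", "calm", "eager"]
        self.scientists = ["turing", "hopper", "knuth"]
        self.used: Set[str] = set()
        self._rng = random.Random(seed)

    def generate_name(self) -> str:
        adjective = self._rng.choice(self.adjectives)
        scientist = self._rng.choice(self.scientists)
        return f"{adjective}_{scientist}"

    def generate_unique_name(self) -> str:
        # Assumption: retry a bounded number of times before falling back.
        for _ in range(100):
            name = self.generate_name()
            if name not in self.used:
                self.used.add(name)
                return name
        # Fallback: append a numeric suffix until the name is unused.
        suffix = len(self.used)
        while True:
            name = f"{self.generate_name()}_{suffix}"
            if name not in self.used:
                self.used.add(name)
                return name
            suffix += 1

    def generate_unique_names(self, count: int) -> List[str]:
        return [self.generate_unique_name() for _ in range(count)]
```

With three adjectives and three names there are only nine plain combinations, so requesting more than nine names exercises the suffix fallback.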

Other Utilities:

  • camel_to_snake() -- Converts CamelCase strings to snake_case.
  • batched() -- Splits an iterable into fixed-size tuples (similar to itertools.batched in Python 3.12).
  • ProgressBarManager -- Manages single and nested tqdm progress bars for batch execution.
  • find_git_root() -- Traverses up from a given path to find the nearest git repository root.
  • create_nano_id() -- Generates short unique identifiers from UUID4 values using base62 encoding.
  • async_to_sync() -- Decorator that converts async functions to sync, handling the case where an event loop is already running by using a ThreadPoolExecutor.
  • get_or_init() / get_from_dict() -- Dictionary access helpers supporting dot-notation keys and lazy default initialization.
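
Three of the helpers above are small enough to sketch with the standard library alone. The names carry a _sketch suffix to mark them as hypothetical; the regular expression, the ValueError for n < 1, and the one-worker thread pool are assumptions about a reasonable implementation rather than the Ragas code.

```python
import asyncio
import functools
import re
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple


def batched_sketch(iterable: Iterable, n: int) -> Iterator[Tuple]:
    """Yield tuples of up to n items; the last tuple may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch


def camel_to_snake_sketch(name: str) -> str:
    """Insert "_" before every non-leading uppercase letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def async_to_sync_sketch(async_func: Callable) -> Callable:
    """Run a coroutine function to completion from synchronous code."""

    @functools.wraps(async_func)
    def wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread: run the coroutine directly.
            return asyncio.run(async_func(*args, **kwargs))
        # A loop is already running here, so use a fresh loop in a worker thread.
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, async_func(*args, **kwargs)).result()

    return wrapper


@async_to_sync_sketch
async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b
```

The thread-pool branch blocks the calling thread until the worker finishes, which is acceptable for short calls but would stall a busy event loop.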

Usage

Import specific functions or classes from this module when you need shared utilities such as token counting, dataset format conversion, NaN-safe math, progress bar management, or deprecation helpers in any part of the Ragas codebase.

Code Reference

Source Location

Signature

def get_cache_dir() -> str: ...
def get_debug_mode() -> bool: ...
def safe_nanmean(arr: t.List[float]) -> float: ...
def check_if_sum_is_close(values: t.List[float], close_to: float, num_places: int) -> bool: ...
def is_nan(x) -> bool: ...
def get_metric_language(metric: "Metric") -> str: ...
def num_tokens_from_string(
    string: str,
    encoding_name: str = "cl100k_base",
    tokenizer: t.Optional["BaseTokenizer"] = None,
) -> int: ...
def batched(iterable: t.Iterable, n: int) -> t.Iterator[t.Tuple]: ...
def camel_to_snake(name: str) -> str: ...
def find_git_root(start_path: t.Union[str, Path, None] = None) -> Path: ...
def create_nano_id(size: int = 12) -> str: ...
def async_to_sync(async_func) -> Callable: ...

class DeprecationHelper:
    def __init__(self, new_target: t.Type, deprecation_message: str): ...

def deprecated(since: str, *, removal=None, alternative=None, addendum=None, pending=False): ...

class ProgressBarManager:
    def __init__(self, desc: str, show_progress: bool): ...

class MemorableNames:
    def generate_name(self) -> str: ...
    def generate_unique_name(self) -> str: ...
    def generate_unique_names(self, count: int) -> t.List[str]: ...

def set_logging_level(logger_name: str = __name__, level: int = logging.DEBUG) -> logging.Logger: ...

def convert_v1_to_v2_dataset(dataset: Dataset) -> Dataset: ...
def convert_v2_to_v1_dataset(dataset: Dataset) -> Dataset: ...

Import

from ragas.utils import safe_nanmean, num_tokens_from_string, batched
from ragas.utils import get_cache_dir, get_debug_mode
from ragas.utils import camel_to_snake, create_nano_id, find_git_root
from ragas.utils import deprecated, DeprecationHelper
from ragas.utils import MemorableNames, memorable_names
from ragas.utils import convert_v1_to_v2_dataset, convert_v2_to_v1_dataset
from ragas.utils import ProgressBarManager, async_to_sync
from ragas.utils import set_logging_level, patch_logger

I/O Contract

num_tokens_from_string Inputs

Name | Type | Required | Description
string | str | Yes | The text to count tokens for
encoding_name | str | No (default "cl100k_base") | Tiktoken encoding name (ignored if tokenizer is provided)
tokenizer | BaseTokenizer or None | No | A tokenizer instance; if provided, encoding_name is ignored

num_tokens_from_string Outputs

Name | Type | Description
return | int | Number of tokens in the string

safe_nanmean Inputs

Name | Type | Required | Description
arr | List[float] | Yes | List of float values, may contain NaN

safe_nanmean Outputs

Name | Type | Description
return | float | The mean of non-NaN values, or np.nan if the list is empty or all-NaN

deprecated Inputs

Name | Type | Required | Description
since | str | Yes | The release version at which the API became deprecated
removal | str or None | No | The expected removal version (required if pending=False)
alternative | str or None | No | The alternative API or function to use instead
addendum | str or None | No | Additional text appended to the deprecation message
pending | bool | No (default False) | Whether the deprecation is pending without a scheduled removal

deprecated Outputs

Name | Type | Description
return | Callable | A decorator that wraps the target function to emit DeprecationWarning on invocation

Usage Examples

Token Counting

from ragas.utils import num_tokens_from_string

count = num_tokens_from_string("Hello, how many tokens is this?")
print(f"Token count: {count}")

Safe NaN Mean

from ragas.utils import safe_nanmean

result = safe_nanmean([1.0, 2.0, float('nan'), 4.0])
# NaN is ignored, so result == (1.0 + 2.0 + 4.0) / 3 == 2.3333...

Deprecation Decorator

from ragas.utils import deprecated

@deprecated("0.1", removal="0.2", alternative="new_function")
def old_function():
    return "result"

old_function()  # Emits DeprecationWarning

Memorable Name Generation

from ragas.utils import memorable_names

name = memorable_names.generate_unique_name()
# e.g., "bold_turing"

names = memorable_names.generate_unique_names(5)
# e.g., ["eager_hopper", "calm_knuth", "brave_lovelace", "zen_dijkstra", "cool_shannon"]

Dataset Version Conversion

from ragas.utils import convert_v1_to_v2_dataset
from datasets import Dataset

v1_dataset = Dataset.from_dict({
    "question": ["What is AI?"],
    "contexts": [["AI is..."]],
    "answer": ["Artificial Intelligence"],
    "ground_truth": ["AI stands for Artificial Intelligence"],
})

v2_dataset = convert_v1_to_v2_dataset(v1_dataset)
# Columns renamed to: user_input, retrieved_contexts, response, reference
