Implementation:Vibrantlabsai Ragas Core Utility Functions
| Knowledge Sources | |
|---|---|
| Domains | Utilities, Logging, Data_Conversion |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A collection of shared utility functions and classes used throughout the Ragas library for caching, logging, data format conversion, token counting, deprecation management, and name generation.
Description
The utils.py module provides cross-cutting functionality used throughout the Ragas codebase. Its key components include:
Caching and Environment:
- get_cache_dir() -- Returns the Ragas cache directory path, respecting the XDG_CACHE_HOME and RAGAS_CACHE_HOME environment variables. The result is memoized with lru_cache, so repeated calls do not re-read the environment.
- get_debug_mode() -- Checks the RAGAS_DEBUG environment variable to determine if debug mode is active.
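The environment lookups described above can be sketched with the standard library alone. This is an illustration, not the module's code: the function names are hypothetical, and the precedence of RAGAS_CACHE_HOME over XDG_CACHE_HOME, the `~/.cache` fallback, and the comparison against the string "true" are all assumptions.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=1)
def sketch_cache_dir() -> str:
    """Hypothetical stand-in for get_cache_dir()."""
    # Assumption: XDG_CACHE_HOME supplies the base directory, defaulting to ~/.cache.
    xdg_cache = os.getenv("XDG_CACHE_HOME", "~/.cache")
    default_dir = os.path.join(xdg_cache, "ragas")
    # Assumption: RAGAS_CACHE_HOME, when set, overrides the XDG-derived path.
    return os.path.expanduser(os.getenv("RAGAS_CACHE_HOME", default_dir))


def sketch_debug_mode() -> bool:
    """Hypothetical stand-in for get_debug_mode()."""
    # Assumption: debug mode is on only when RAGAS_DEBUG equals "true" (any case).
    return os.getenv("RAGAS_DEBUG", "false").lower() == "true"
```

One consequence of memoizing a zero-argument function: changing the environment variables after the first call has no effect until the cache is cleared, for example with `sketch_cache_dir.cache_clear()`.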
Numeric Utilities:
- safe_nanmean() -- Computes the mean of a list of floats, safely handling empty lists and all-NaN arrays by returning np.nan.
- check_if_sum_is_close() -- Checks whether a list of floats sums to a target value within a specified number of decimal places, using integer arithmetic to avoid floating-point errors.
- is_nan() -- Safely checks if a value is NaN, returning False for non-numeric types.
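A minimal sketch of the three numeric helpers, written against the standard library so it runs without NumPy. The `sketch_` names are hypothetical, `math.nan` stands in for `np.nan`, and the scale-and-round approach to the sum check is an assumption about how the integer comparison might be done.

```python
import math
from typing import List


def sketch_is_nan(x) -> bool:
    """Return True only for a numeric NaN; non-numeric input yields False."""
    try:
        return math.isnan(x)
    except TypeError:
        # Strings, None, lists, and similar values are simply "not NaN".
        return False


def sketch_safe_nanmean(arr: List[float]) -> float:
    """Mean of the non-NaN entries, or NaN when none remain."""
    valid = [v for v in arr if not sketch_is_nan(v)]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)


def sketch_sum_is_close(values: List[float], close_to: float, num_places: int) -> bool:
    """Compare sums after scaling every value to an integer."""
    multiplier = 10 ** num_places
    total = sum(int(round(v * multiplier)) for v in values)
    return total == int(round(close_to * multiplier))
```

The integer comparison matters because `0.1 + 0.2 == 0.3` is False in floating point, whereas `sketch_sum_is_close([0.1, 0.2], 0.3, 3)` compares 100 + 200 against 300.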
Token Counting:
- num_tokens_from_string() -- Counts tokens in a string using either a provided BaseTokenizer instance or tiktoken with a specified encoding (default cl100k_base).
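The dispatch between a supplied tokenizer and tiktoken might look like the sketch below. `WhitespaceTokenizer` and `sketch_num_tokens` are hypothetical names, and the sketch assumes the tokenizer object exposes an `encode()` method; the BaseTokenizer interface itself is not shown in this document.

```python
from typing import Any, List, Optional


class WhitespaceTokenizer:
    """Hypothetical tokenizer; assumes the tokenizer interface exposes encode()."""

    def encode(self, text: str) -> List[str]:
        return text.split()


def sketch_num_tokens(
    string: str,
    encoding_name: str = "cl100k_base",
    tokenizer: Optional[Any] = None,
) -> int:
    # A supplied tokenizer takes priority, so encoding_name is ignored here.
    if tokenizer is not None:
        return len(tokenizer.encode(string))
    # Fallback: tiktoken is imported lazily and only needed on this path.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))


# Six whitespace-separated pieces, so the custom tokenizer reports 6.
count = sketch_num_tokens(
    "Hello, how many tokens is this?", tokenizer=WhitespaceTokenizer()
)
```

Counts from different tokenizers are not comparable: the same string will generally yield a different number under a tiktoken encoding.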
Data Format Conversion:
- convert_row_v1_to_v2() / convert_v1_to_v2_dataset() / convert_v2_to_v1_dataset() -- Functions for converting between Ragas v1 column names (question, contexts, answer, ground_truth) and v2 column names (user_input, retrieved_contexts, response, reference).
- get_required_columns_v1() -- Maps v2 metric required columns to their v1 equivalents.
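The column renaming listed above reduces to a dictionary lookup per key. The following sketch works on plain dictionaries rather than Dataset objects; the names `V1_TO_V2`, `V2_TO_V1`, and `sketch_convert_row` are hypothetical, and passing unmapped columns through unchanged is an assumption.

```python
from typing import Any, Dict

# v1 column name -> v2 column name, as listed in the description above.
V1_TO_V2 = {
    "question": "user_input",
    "contexts": "retrieved_contexts",
    "answer": "response",
    "ground_truth": "reference",
}
# The reverse mapping is derived rather than written out by hand.
V2_TO_V1 = {v2: v1 for v1, v2 in V1_TO_V2.items()}


def sketch_convert_row(row: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped keys; assumption: every other key passes through unchanged."""
    return {mapping.get(key, key): value for key, value in row.items()}


v1_row = {"question": "What is AI?", "contexts": ["AI is..."], "answer": "Artificial Intelligence"}
v2_row = sketch_convert_row(v1_row, V1_TO_V2)
```

Deriving `V2_TO_V1` from `V1_TO_V2` keeps the two directions in sync, so a round trip through both mappings returns the original row.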
Deprecation Support:
- DeprecationHelper -- A class that wraps a target class and emits deprecation warnings on instantiation or attribute access.
- deprecated() -- A decorator for marking functions as deprecated with configurable version info and alternative suggestions.
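Both deprecation mechanisms can be approximated with the standard `warnings` module. The sketch below is a simplified illustration under assumed message wording; `sketch_deprecated` and `SketchDeprecationHelper` are hypothetical names, and the `addendum` and `pending` options are omitted for brevity.

```python
import functools
import warnings
from typing import Callable, Optional


def sketch_deprecated(since: str, *, removal: Optional[str] = None,
                      alternative: Optional[str] = None) -> Callable:
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumed wording; the library's actual message may differ.
            message = f"The function {func.__name__} was deprecated in {since}"
            if removal:
                message += f" and will be removed in the {removal} release"
            message += "."
            if alternative:
                message += f" Use {alternative} instead."
            # stacklevel=2 attributes the warning to the caller, not this wrapper.
            warnings.warn(message, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class SketchDeprecationHelper:
    """Forwards to new_target, warning on instantiation and attribute access."""

    def __init__(self, new_target: type, deprecation_message: str):
        self.new_target = new_target
        self.deprecation_message = deprecation_message

    def _warn(self) -> None:
        warnings.warn(self.deprecation_message, DeprecationWarning, stacklevel=3)

    def __call__(self, *args, **kwargs):
        self._warn()
        return self.new_target(*args, **kwargs)

    def __getattr__(self, attr):
        # Only reached for names not found on the helper itself.
        self._warn()
        return getattr(self.new_target, attr)
```

A typical use of the helper is to keep an old class name importable, for example `OldName = SketchDeprecationHelper(NewName, "OldName is deprecated; use NewName.")`, where both names are placeholders.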
Logging:
- set_logging_level() -- Configures a logger with a custom _ContextualFormatter that adds UTC time, local time, Ragas user ID, and app version to each log record.
- patch_logger() -- A simpler logger configuration utility for setting log levels on specific modules.
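A formatter that injects extra fields into each record can be built by subclassing `logging.Formatter`. The sketch below is an assumption about the general shape only: the class name, format string, and the way the user ID and app version are supplied are all hypothetical.

```python
import logging
from datetime import datetime, timezone


class SketchContextualFormatter(logging.Formatter):
    """Hypothetical formatter that adds time and identity fields to each record."""

    def __init__(self, user_id: str, app_version: str):
        super().__init__(
            "%(utc_time)s %(local_time)s user=%(user_id)s "
            "version=%(app_version)s %(levelname)s %(name)s: %(message)s"
        )
        self.user_id = user_id
        self.app_version = app_version

    def format(self, record: logging.LogRecord) -> str:
        # Derive both timestamps from the record's creation time.
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        record.utc_time = created.isoformat()
        record.local_time = created.astimezone().isoformat()
        record.user_id = self.user_id
        record.app_version = self.app_version
        return super().format(record)


def sketch_set_logging_level(logger_name: str, level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    handler = logging.StreamHandler()
    # Placeholder identity values; a real setup would look these up.
    handler.setFormatter(SketchContextualFormatter("anonymous", "0.0.0"))
    # Kept simple: calling this twice for one name attaches two handlers.
    logger.addHandler(handler)
    return logger
```

Setting the fields inside `format()` rather than through a filter keeps the sketch in one class; either approach attaches the same attributes to the record.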
Name Generation:
- MemorableNames -- A class that generates Docker-style memorable names (adjective + scientist name) for experiments and datasets, with uniqueness tracking.
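Uniqueness tracking for generated names can be sketched with a set of names already handed out. The word lists here are short illustrative samples, the class name is hypothetical, and the numeric-suffix fallback once combinations run low is an assumption rather than documented behavior.

```python
import random
from typing import List, Set


class SketchMemorableNames:
    def __init__(self) -> None:
        # Short illustrative word lists; the real lists are not shown in this document.
        self.adjectives = ["bold", "eager", "calm", "brave"]
        self.scientists = ["turing", "hopper", "knuth", "lovelace"]
        self.used_names: Set[str] = set()

    def generate_name(self) -> str:
        return f"{random.choice(self.adjectives)}_{random.choice(self.scientists)}"

    def generate_unique_name(self) -> str:
        # Retry a bounded number of times before falling back to a numeric suffix.
        for _ in range(100):
            name = self.generate_name()
            if name not in self.used_names:
                self.used_names.add(name)
                return name
        # Assumed fallback: append the first unused counter to a fresh base name.
        base = self.generate_name()
        suffix = 1
        while f"{base}_{suffix}" in self.used_names:
            suffix += 1
        name = f"{base}_{suffix}"
        self.used_names.add(name)
        return name

    def generate_unique_names(self, count: int) -> List[str]:
        return [self.generate_unique_name() for _ in range(count)]
```

With four adjectives and four surnames there are only 16 plain combinations, so requesting more than that exercises the suffix fallback.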
Other Utilities:
- camel_to_snake() -- Converts CamelCase strings to snake_case.
- batched() -- Splits an iterable into fixed-size tuples (similar to itertools.batched in Python 3.12).
- ProgressBarManager -- Manages single and nested tqdm progress bars for batch execution.
- find_git_root() -- Traverses up from a given path to find the nearest git repository root.
- create_nano_id() -- Generates short unique identifiers from UUID4 values using base62 encoding.
- async_to_sync() -- Decorator that converts async functions to sync, handling the case where an event loop is already running by using a ThreadPoolExecutor.
- get_or_init() / get_from_dict() -- Dictionary access helpers supporting dot-notation keys and lazy default initialization.
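Several of the helpers above are small enough to sketch in a few lines each. These are illustrative approximations with hypothetical `sketch_` names: the regular expression, the base62 alphabet ordering, and the truncation to `size` characters are assumptions, not the module's code.

```python
import asyncio
import functools
import re
import uuid
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, Tuple

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def sketch_camel_to_snake(name: str) -> str:
    # Insert an underscore before every uppercase letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def sketch_batched(iterable: Iterable, n: int) -> Iterator[Tuple]:
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # The final tuple may be shorter than n.
    while batch := tuple(islice(iterator, n)):
        yield batch


def sketch_nano_id(size: int = 12) -> str:
    # Encode the UUID4 integer in base62, then keep the first `size` characters.
    number = uuid.uuid4().int
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))[:size]


def sketch_async_to_sync(async_func):
    @functools.wraps(async_func)
    def wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running in this thread, so asyncio.run is safe.
            return asyncio.run(async_func(*args, **kwargs))
        # A loop is already running: run the coroutine on a fresh loop in a worker thread.
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, async_func(*args, **kwargs)).result()
    return wrapper
```

The worker-thread branch exists because `asyncio.run` raises a RuntimeError when called from a thread whose event loop is already running, as happens in notebooks. Note that the calling thread blocks until the coroutine finishes.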
Usage
Import specific functions or classes from this module when you need shared utilities such as token counting, dataset format conversion, NaN-safe math, progress bar management, or deprecation helpers in any part of the Ragas codebase.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/utils.py
Signature
def get_cache_dir() -> str: ...
def get_debug_mode() -> bool: ...
def safe_nanmean(arr: t.List[float]) -> float: ...
def check_if_sum_is_close(values: t.List[float], close_to: float, num_places: int) -> bool: ...
def is_nan(x) -> bool: ...
def get_metric_language(metric: "Metric") -> str: ...
def num_tokens_from_string(
    string: str,
    encoding_name: str = "cl100k_base",
    tokenizer: t.Optional["BaseTokenizer"] = None,
) -> int: ...
def batched(iterable: t.Iterable, n: int) -> t.Iterator[t.Tuple]: ...
def camel_to_snake(name: str) -> str: ...
def find_git_root(start_path: t.Union[str, Path, None] = None) -> Path: ...
def create_nano_id(size: int = 12) -> str: ...
def async_to_sync(async_func) -> Callable: ...
class DeprecationHelper:
    def __init__(self, new_target: t.Type, deprecation_message: str): ...
def deprecated(since: str, *, removal=None, alternative=None, addendum=None, pending=False): ...
class ProgressBarManager:
    def __init__(self, desc: str, show_progress: bool): ...
class MemorableNames:
    def generate_name(self) -> str: ...
    def generate_unique_name(self) -> str: ...
    def generate_unique_names(self, count: int) -> t.List[str]: ...
def set_logging_level(logger_name: str = __name__, level: int = logging.DEBUG) -> logging.Logger: ...
def convert_v1_to_v2_dataset(dataset: Dataset) -> Dataset: ...
def convert_v2_to_v1_dataset(dataset: Dataset) -> Dataset: ...
Import
from ragas.utils import safe_nanmean, num_tokens_from_string, batched
from ragas.utils import get_cache_dir, get_debug_mode
from ragas.utils import camel_to_snake, create_nano_id, find_git_root
from ragas.utils import deprecated, DeprecationHelper
from ragas.utils import MemorableNames, memorable_names
from ragas.utils import convert_v1_to_v2_dataset, convert_v2_to_v1_dataset
from ragas.utils import ProgressBarManager, async_to_sync
from ragas.utils import set_logging_level, patch_logger
I/O Contract
num_tokens_from_string Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| string | str | Yes | The text to count tokens for |
| encoding_name | str | No (default "cl100k_base") | Tiktoken encoding name (ignored if tokenizer is provided) |
| tokenizer | BaseTokenizer or None | No | A tokenizer instance; if provided, encoding_name is ignored |
num_tokens_from_string Outputs
| Name | Type | Description |
|---|---|---|
| return | int | Number of tokens in the string |
safe_nanmean Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| arr | List[float] | Yes | List of float values, may contain NaN |
safe_nanmean Outputs
| Name | Type | Description |
|---|---|---|
| return | float | The mean of non-NaN values, or np.nan if the list is empty or all-NaN |
deprecated Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| since | str | Yes | The release version at which the API became deprecated |
| removal | str or None | Conditional | The expected removal version; required when pending=False, which is the default |
| alternative | str or None | No | The alternative API or function to use instead |
| addendum | str or None | No | Additional text appended to the deprecation message |
| pending | bool | No (default False) | Whether the deprecation is pending without a scheduled removal |
deprecated Outputs
| Name | Type | Description |
|---|---|---|
| return | Callable | A decorator that wraps the target function to emit DeprecationWarning on invocation |
Usage Examples
Token Counting
from ragas.utils import num_tokens_from_string
count = num_tokens_from_string("Hello, how many tokens is this?")
print(f"Token count: {count}")
Safe NaN Mean
from ragas.utils import safe_nanmean
result = safe_nanmean([1.0, 2.0, float('nan'), 4.0])
# result == 2.3333...
Deprecation Decorator
from ragas.utils import deprecated
@deprecated("0.1", removal="0.2", alternative="new_function")
def old_function():
    return "result"
old_function() # Emits DeprecationWarning
Memorable Name Generation
from ragas.utils import memorable_names
name = memorable_names.generate_unique_name()
# e.g., "bold_turing"
names = memorable_names.generate_unique_names(5)
# e.g., ["eager_hopper", "calm_knuth", "brave_lovelace", "zen_dijkstra", "cool_shannon"]
Dataset Version Conversion
from ragas.utils import convert_v1_to_v2_dataset
from datasets import Dataset
v1_dataset = Dataset.from_dict({
    "question": ["What is AI?"],
    "contexts": [["AI is..."]],
    "answer": ["Artificial Intelligence"],
    "ground_truth": ["AI stands for Artificial Intelligence"],
})
v2_dataset = convert_v1_to_v2_dataset(v1_dataset)
# Columns renamed to: user_input, retrieved_contexts, response, reference