Implementation:EvolvingLMMs Lab Lmms eval Logger Utils

Overview

This implementation provides utility functions for logging and result storage in the lmms_eval framework. It includes functions for string pattern cleaning, serialization checking, environment information gathering, and tokenizer metadata extraction. These utilities support robust result logging by handling non-serializable objects and enriching results with environment context.

File Location

lmms_eval/loggers/utils.py (136 lines)

Related Principle

Results Output

Dependencies

os: Operating system operations
pickle: Serialization testing
re: Regular expression pattern matching
subprocess: Git command execution
pathlib: Path manipulation
typing: Type hints
numpy: NumPy type handling
loguru: Logging (logger instance)
torch.utils.collect_env: Environment information gathering
transformers: Version information

Core Functions

remove_none_pattern

def remove_none_pattern(input_string: str) -> Tuple[str, bool]

Remove the ',none' substring from the input string if it exists at the end.

Parameters:

input_string (str): The input string from which to remove the ',none' substring

Returns:

Tuple[str, bool]:
- str: Modified string with ',none' removed (or original if not found)
- bool: True if modification was made, False if no change

Logic:

Defines regex pattern: r",none$" (matches ",none" at end of string)
Uses re.sub() to replace pattern with empty string
Compares result to original to determine if change occurred
Returns (modified_string, was_modified)
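
The steps above can be sketched in a few lines. This is a minimal re-implementation for illustration only; the name remove_none_pattern_sketch is hypothetical and the real function in lmms_eval may differ in detail.

```python
import re
from typing import Tuple


def remove_none_pattern_sketch(input_string: str) -> Tuple[str, bool]:
    """Strip a trailing ',none' and report whether anything changed."""
    # "$" anchors the match to the end of the string, so a ",none"
    # in the middle of the input is left alone.
    result = re.sub(r",none$", "", input_string)
    return result, result != input_string


print(remove_none_pattern_sketch("acc,none"))        # ('acc', True)
print(remove_none_pattern_sketch("acc,none,extra"))  # ('acc,none,extra', False)
```

Returning the flag alongside the string lets the caller decide whether to log the cleanup without comparing strings a second time.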

Use Case: Cleans up string configurations or arguments that may have trailing ",none" artifacts from parsing or concatenation.

Example:

result, removed = remove_none_pattern("model_args=temp=0.5,none")
# result = "model_args=temp=0.5"
# removed = True

result, removed = remove_none_pattern("normal_string")
# result = "normal_string"
# removed = False

is_serializable

def is_serializable(o: Any) -> bool

Test whether an object can be serialized with pickle.

Parameters:

o (Any): The object to test for serializability

Returns:

bool: True if object can be pickled, False otherwise

Logic:

Attempts to serialize object with pickle.dumps()
Returns True if successful
Catches PickleError, TypeError, AttributeError and returns False
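
A minimal sketch of this check, assuming only the standard library, is shown here. The name is_serializable_sketch is hypothetical. Note that sets are picklable, so a check of this kind reports True for them.

```python
import pickle
import threading
from typing import Any


def is_serializable_sketch(o: Any) -> bool:
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(o)
        return True
    except (pickle.PickleError, TypeError, AttributeError):
        # PicklingError is a subclass of PickleError; TypeError covers
        # objects such as locks that refuse to be pickled.
        return False


print(is_serializable_sketch({"key": "value"}))  # True
print(is_serializable_sketch({1, 2, 3}))         # True
print(is_serializable_sketch(threading.Lock()))  # False
```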

Use Case: Validates objects before attempting to save results. Identifies which results need special handling before serialization.

Example:

is_serializable({"key": "value"})  # → True
is_serializable([1, 2, 3])          # → True
is_serializable(lambda x: x)        # → False (lambdas not serializable)
is_serializable(open("file.txt"))   # → False (file handles not serializable)

_handle_non_serializable

def _handle_non_serializable(o: Any) -> Union[int, str, list]

Handle non-serializable objects by converting them to serializable types.

Parameters:

o (Any): The object to be converted

Returns:

Union[int, str, list]: Serializable representation of the object

Logic:

If object is np.int64 or np.int32:
Return as Python int
Elif object is set:
Convert to list
Else:
Convert to string representation

Conversion Rules:

NumPy integers → Python int (preserves numeric value)
Sets → list (makes serializable, loses set semantics)
Everything else → str (fallback, preserves information)
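
The conversion rules can be sketched as a small function. One place such a converter fits naturally is the default hook of json.dumps, which Python calls for any value it cannot encode itself; whether lmms_eval wires it up this way is an assumption here, not something this page confirms. The name handle_non_serializable_sketch is hypothetical, and the guarded NumPy import only exists so the sketch runs without NumPy installed.

```python
import json
from typing import Any, Union

try:
    import numpy as np
    NUMPY_INTS = (np.int64, np.int32)
except ImportError:  # keeps the sketch runnable without NumPy installed
    NUMPY_INTS = ()


def handle_non_serializable_sketch(o: Any) -> Union[int, str, list]:
    """Convert a value to int, list, or str following the rules above."""
    if isinstance(o, NUMPY_INTS):
        return int(o)
    if isinstance(o, set):
        return list(o)
    return str(o)


# json.dumps calls the hook only for values it cannot encode natively.
payload = {"labels": {7}, "value": 1 + 2j}
print(json.dumps(payload, default=handle_non_serializable_sketch))
# {"labels": [7], "value": "(1+2j)"}
```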

Use Case: Converts problematic types encountered during result serialization. Typically used as a fallback handler when a value cannot be serialized directly.

Example:

import numpy as np

_handle_non_serializable(np.int64(42))     # → 42 (Python int)
_handle_non_serializable(np.int32(7))      # → 7 (Python int)
_handle_non_serializable({1, 2, 3})        # → [1, 2, 3] (list; element order is not guaranteed)
_handle_non_serializable(lambda x: x)      # → "<function <lambda>...>" (str)

get_commit_from_path

def get_commit_from_path(repo_path: Union[Path, str]) -> Optional[str]

Retrieve the git commit hash from a repository path by reading .git metadata.

Parameters:

repo_path (Union[Path, str]): Path to the git repository

Returns:

Optional[str]: Git commit hash if found, None on failure

Logic:

Constructs path to .git folder
If .git is a file (submodule case):
Reads file content to get actual .git directory path
Parses path from "gitdir: /path/to/.git" format
If .git/HEAD exists:
Reads HEAD to get reference (e.g., "ref: refs/heads/main")
Reads the reference file to get commit hash
Removes newlines and returns hash
Else:
Returns None
On any exception:
Logs debug message with error
Returns None
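
The lookup described above can be sketched with pathlib alone. This is an illustrative re-implementation under the stated logic, not the project's code; get_commit_from_path_sketch is a hypothetical name, and the temporary directory stands in for a real repository. Under this sketch, a HEAD file that holds a raw hash instead of a ref has no matching reference file, so the function returns None.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def get_commit_from_path_sketch(repo_path: Union[Path, str]) -> Optional[str]:
    """Read the commit hash by following .git/HEAD to its reference file."""
    try:
        git_folder = Path(repo_path, ".git")
        if git_folder.is_file():
            # Submodule case: the file reads "gitdir: <path>"; keep the path.
            pointer = git_folder.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
            git_folder = Path(git_folder.parent, pointer)
        head_file = Path(git_folder, "HEAD")
        if not head_file.exists():
            return None
        # "ref: refs/heads/main" -> "refs/heads/main"
        head_name = head_file.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
        return Path(git_folder, head_name).read_text(encoding="utf-8").replace("\n", "")
    except Exception:
        # The documented function logs a debug message at this point.
        return None


# Build a fake repository layout to exercise the lookup.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp, ".git")
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n", encoding="utf-8")
    (git_dir / "refs" / "heads" / "main").write_text("abc123\n", encoding="utf-8")
    branch_result = get_commit_from_path_sketch(tmp)

    (git_dir / "HEAD").write_text("abc123\n", encoding="utf-8")  # raw hash in HEAD
    raw_hash_result = get_commit_from_path_sketch(tmp)

print(branch_result)    # abc123
print(raw_hash_result)  # None
```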

Use Cases:

Tracking which code version produced results
Repository state recording
Reproducibility metadata

Edge Cases Handled:

Git submodules (.git as file pointing to parent repo)
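Detached HEAD states (HEAD holds a raw hash instead of a ref, so the reference lookup described above fails and None is returned)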
Missing .git directory
Corrupted git metadata

Example:

commit = get_commit_from_path("/path/to/repo")
# Returns: "a1b2c3d4e5f6..." or None

# Works with submodules
commit = get_commit_from_path("/path/to/submodule")
# Returns commit hash even when .git is a file

get_git_commit_hash

def get_git_commit_hash() -> Optional[str]

Get the git commit hash of the current repository.

Returns:

Optional[str]: Git commit hash if found, None otherwise

Logic:

Tries to execute: git describe --always
Runs as subprocess
Strips whitespace from output
Decodes bytes to string
Returns git hash/tag
On CalledProcessError or FileNotFoundError:
Falls back to get_commit_from_path(os.getcwd())
Returns result (hash or None)
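
A minimal sketch of this CLI-then-fallback flow follows. The function name and the injectable fallback parameter are hypothetical, added so the sketch is self-contained; in the documented code the fallback is get_commit_from_path. Silencing stderr is also an addition of this sketch.

```python
import os
import subprocess
from typing import Callable, Optional


def get_git_commit_hash_sketch(
    fallback: Callable[[str], Optional[str]] = lambda _path: None,
) -> Optional[str]:
    """Ask the git CLI first, then defer to a fallback lookup."""
    try:
        output = subprocess.check_output(
            ["git", "describe", "--always"],
            stderr=subprocess.DEVNULL,  # sketch-only: hide "not a git repository"
        )
        return output.strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: git ran but failed (for example, not a repository).
        # FileNotFoundError: the git executable is not installed.
        return fallback(os.getcwd())


commit_hash = get_git_commit_hash_sketch()
print(commit_hash)
```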

Method Hierarchy:

Primary: git CLI command (most reliable, works with detached HEAD)
Fallback: Manual .git parsing (works when git not installed)

Source Attribution: Adapted from EleutherAI's gpt-neox project: https://github.com/EleutherAI/gpt-neox/blob/b608043be541602170bfcfb8ec9bf85e8a0799e0/megatron/neox_arguments/neox_args.py#L42

Use Case: Automatically captures code version for reproducibility without requiring explicit version tracking.

Example:

commit_hash = get_git_commit_hash()  # avoid shadowing the built-in hash()
# In git repo: "a1b2c3d" or "v1.0.0-5-ga1b2c3d"
# Outside git repo: None

add_env_info

def add_env_info(storage: Dict[str, Any]) -> None

Add environment information to a storage dictionary (modifies in-place).

Parameters:

storage (Dict[str, Any]): Dictionary to augment with environment info

Returns:

None (modifies storage in-place)

Logic:

Tries to collect pretty environment info using PyTorch utility:
Calls torch.utils.collect_env.get_pretty_env_info()
On exception:
Sets pretty_env_info to error string
Gets transformers version from imported module
Gets git hash of parent directory (in case current is submodule):
Calls get_commit_from_path(Path(os.getcwd(), ".."))
Creates dictionary with collected info:
- pretty_env_info: Formatted environment details
- transformers_version: Transformers library version
- upper_git_hash: Git hash of parent directory
Updates storage dictionary with new info

Added Fields:

pretty_env_info: System, PyTorch, CUDA info (formatted string)
transformers_version: transformers library version
upper_git_hash: Git hash of parent directory (for submodule tracking)

Use Case: Enriches result files with complete environment context for debugging and reproducibility.

Example:

results = {"model": "gpt-4", "score": 0.95}
add_env_info(results)

# results now contains:
# {
#     "model": "gpt-4",
#     "score": 0.95,
#     "pretty_env_info": "PyTorch version: 2.0.1\nCUDA version: 11.8...",
#     "transformers_version": "4.30.2",
#     "upper_git_hash": "a1b2c3d4..."
# }

add_tokenizer_info

def add_tokenizer_info(storage: Dict[str, Any], lm) -> None

Add tokenizer metadata to a storage dictionary (modifies in-place).

Parameters:

storage (Dict[str, Any]): Dictionary to augment with tokenizer info
lm: Language model object (must have tokenizer attribute)

Returns:

None (modifies storage in-place)

Logic:

Checks if lm has tokenizer attribute:
Uses getattr(lm, "tokenizer", False)
If tokenizer exists:
Tries to collect tokenizer info:
Creates dictionary with:
- tokenizer_pad_token: [token, token_id]
- tokenizer_eos_token: [token, token_id]
- tokenizer_bos_token: [token, token_id]
- eot_token_id: from lm attribute (if exists)
- max_length: from lm attribute (if exists)
Updates storage with tokenizer info
On exception:
Logs debug message and continues
If no tokenizer:
Logs debug message explaining why info not logged
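
The flow above can be sketched with a stand-in model object. Everything here is illustrative: add_tokenizer_info_sketch and the SimpleNamespace objects are hypothetical, and the token IDs are converted to strings only to match the example output on this page, which is an assumption about the real code.

```python
from types import SimpleNamespace
from typing import Any, Dict


def add_tokenizer_info_sketch(storage: Dict[str, Any], lm: Any) -> None:
    """Copy special-token metadata from lm.tokenizer into storage, in place."""
    tokenizer = getattr(lm, "tokenizer", False)
    if not tokenizer:
        return  # the documented function logs a debug message here
    try:
        storage.update(
            {
                "tokenizer_pad_token": [tokenizer.pad_token, str(tokenizer.pad_token_id)],
                "tokenizer_eos_token": [tokenizer.eos_token, str(tokenizer.eos_token_id)],
                "tokenizer_bos_token": [tokenizer.bos_token, str(tokenizer.bos_token_id)],
                "eot_token_id": getattr(lm, "eot_token_id", None),
                "max_length": getattr(lm, "max_length", None),
            }
        )
    except Exception:
        pass  # the documented function logs a debug message and continues


# Stand-in objects; a real run would pass a loaded model wrapper.
fake_tokenizer = SimpleNamespace(
    pad_token="<pad>", pad_token_id=0,
    eos_token="</s>", eos_token_id=2,
    bos_token="<s>", bos_token_id=1,
)
with_tokenizer = {}
add_tokenizer_info_sketch(with_tokenizer, SimpleNamespace(tokenizer=fake_tokenizer, max_length=1024))

without_tokenizer = {}
add_tokenizer_info_sketch(without_tokenizer, SimpleNamespace())

print(with_tokenizer["tokenizer_eos_token"])  # ['</s>', '2']
print(without_tokenizer)                      # {}
```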

Added Fields:

tokenizer_pad_token: [pad_token_string, pad_token_id]
tokenizer_eos_token: [eos_token_string, eos_token_id]
tokenizer_bos_token: [bos_token_string, bos_token_id]
eot_token_id: End-of-text token ID (if applicable)
max_length: Maximum sequence length

Use Cases:

Debugging tokenization issues
Tracking special token usage
Reproducibility (different tokenizers may affect results)
Documentation of model-specific token configurations

Example:

results = {"model": "gpt2", "score": 0.85}
add_tokenizer_info(results, model)

# results now contains:
# {
#     "model": "gpt2",
#     "score": 0.85,
#     "tokenizer_pad_token": ["<|endoftext|>", "50256"],
#     "tokenizer_eos_token": ["<|endoftext|>", "50256"],
#     "tokenizer_bos_token": ["<|endoftext|>", "50256"],
#     "eot_token_id": None,
#     "max_length": 1024
# }

Design Patterns

Defensive Programming

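Functions that touch external state (git metadata, environment collection, tokenizer attributes) include error handling:

Try-except blocks catch exceptions instead of aborting the run
Fallback values (None, or an error string for pretty_env_info) prevent crashes
Debug logging provides visibility without failing operations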

In-Place Modification

add_env_info and add_tokenizer_info modify dictionaries in-place:

Avoids creating copies of potentially large result dictionaries
Clear side-effect through naming convention (add_* prefix)
Allows successive augmentation calls on the same dictionary (the functions return None, so calls are sequential rather than chained)

Type Conversion Pipeline

Serialization handling uses a two-step process:

Check serializability with is_serializable()
Convert if needed with _handle_non_serializable()

This separates detection from conversion for clarity and testability.

Fallback Hierarchy

get_git_commit_hash uses layered fallbacks:

Try git CLI (most reliable)
Fall back to manual .git parsing (works without git installed)
Return None if all methods fail

Usage Patterns

Result Preparation

import json

from lmms_eval.loggers.utils import add_env_info, add_tokenizer_info

results = {
    "model": "qwen2.5-vl",
    "task": "mmmu",
    "accuracy": 0.87,
    # ... other metrics
}

# Enrich with environment context
add_env_info(results)
add_tokenizer_info(results, model)  # model: an already-loaded model object

# Save results
with open("results.json", "w") as f:
    json.dump(results, f)

Serialization Handling

import pickle

import numpy as np

from lmms_eval.loggers.utils import is_serializable, _handle_non_serializable

data = {
    "scores": np.array([0.8, 0.9, 0.85]),  # picklable, kept as-is
    "labels": {1, 2, 3},                   # picklable, kept as-is
    "callback": lambda x: x,               # not picklable, converted to its string form
}

# Convert only the values that pickle rejects
cleaned_data = {}
for key, value in data.items():
    if is_serializable(value):
        cleaned_data[key] = value
    else:
        cleaned_data[key] = _handle_non_serializable(value)

# Now safe to serialize
with open("results.pkl", "wb") as f:
    pickle.dump(cleaned_data, f)

Version Tracking

from datetime import datetime

from lmms_eval.loggers.utils import get_git_commit_hash

config_dict = {"task": "mmmu"}  # placeholder for the run configuration

# Record code version with results
experiment_metadata = {
    "timestamp": datetime.now().isoformat(),
    "code_version": get_git_commit_hash(),
    "config": config_dict
}

String Cleanup

from loguru import logger

from lmms_eval.loggers.utils import remove_none_pattern

# Clean up parsed arguments
args_string = "model=gpt4,temp=0.7,none"
cleaned, was_modified = remove_none_pattern(args_string)
if was_modified:
    logger.info(f"Cleaned args: {cleaned}")

Integration with Framework

These utilities are used throughout the logging pipeline:

Result Saving:

# In evaluation pipeline (schematic: evaluator and results_logger are placeholders)
results = evaluator.run()
add_env_info(results)
add_tokenizer_info(results, model)
results_logger.log_results(results)

Reproducibility Tracking:

# Automatic version capture
metadata = {
    "git_hash": get_git_commit_hash(),
    "upper_git_hash": get_commit_from_path(".."),
    # ... other metadata
}

Error Handling Examples

Git Not Available

commit_hash = get_git_commit_hash()  # avoid shadowing the built-in hash()
if commit_hash is None:
    logger.warning("Could not determine git version: not in a repository, or git is not installed")
else:
    logger.info(f"Running code version: {commit_hash}")

Tokenizer Not Present

results = {}
add_tokenizer_info(results, model)
# If model has no tokenizer, function logs debug message and continues
# results dictionary unchanged, no crash

Serialization Failure

obj = custom_complex_object

if not is_serializable(obj):
    logger.warning(f"Object {type(obj)} is not serializable, converting")
    obj = _handle_non_serializable(obj)

# obj now serializable (int, str, or list)

Best Practices

Environment Information

Call add_env_info() once per evaluation run
Include in all saved result files for reproducibility
Useful for debugging environment-specific issues

Tokenizer Information

Call add_tokenizer_info() for language model evaluations
Helps diagnose tokenization-related issues
Documents model-specific token configurations

Git Version Tracking

Use get_git_commit_hash() at evaluation start
Log warnings if git hash unavailable
Store in result metadata for reproducibility

Serialization

Test serializability before attempting to save
Use _handle_non_serializable() as fallback converter
Document which types needed conversion for future reference

String Cleaning

Apply remove_none_pattern() to user-provided arguments
Check return boolean to log when modifications occur
Use for defensive cleanup of configuration strings

Related Implementations

Results Output: Uses these utilities for result logging
Model Configuration: May use string cleanup utilities
Task Testing: Uses environment info in test results

Testing Considerations

Serialization Tests

Test with NumPy types (int64, int32, arrays)
Test with Python sets, frozensets
Test with lambda functions, file handles
Verify conversion preserves essential information

Git Functions Tests

Mock subprocess for git CLI testing
Test with and without .git directory
Test submodule case (.git as file)
Test detached HEAD state
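
As a sketch of the first item, unittest.mock can replace subprocess.check_output so that both the CLI path and the fallback path run without a real repository. The function under test is a simplified stand-in for get_git_commit_hash, and fallback_lookup is a hypothetical replacement for get_commit_from_path; neither is the project's code.

```python
import os
import subprocess
from typing import Optional
from unittest import mock


def fallback_lookup(path: str) -> Optional[str]:
    """Hypothetical stand-in for the manual .git parser."""
    return "fallback-hash"


def get_hash_under_test() -> Optional[str]:
    """Simplified stand-in: CLI first, manual lookup on failure."""
    try:
        return subprocess.check_output(["git", "describe", "--always"]).strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return fallback_lookup(os.getcwd())


# Case 1: the CLI succeeds and returns bytes with a trailing newline.
with mock.patch("subprocess.check_output", return_value=b"a1b2c3d\n"):
    cli_result = get_hash_under_test()

# Case 2: git is not installed.
with mock.patch("subprocess.check_output", side_effect=FileNotFoundError):
    missing_git_result = get_hash_under_test()

# Case 3: git runs but exits non-zero (for example, outside a repository).
error = subprocess.CalledProcessError(128, "git")
with mock.patch("subprocess.check_output", side_effect=error):
    not_a_repo_result = get_hash_under_test()

print(cli_result, missing_git_result, not_a_repo_result)
# a1b2c3d fallback-hash fallback-hash
```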

Environment Info Tests

Mock torch environment info collection
Test graceful handling of missing dependencies
Verify all expected fields added to storage

Tokenizer Info Tests

Test with models with/without tokenizer attribute
Test with various tokenizer types (GPT, BERT, T5)
Verify graceful handling of incomplete tokenizers

Performance Considerations

is_serializable() performs a full pickle.dumps() of the object, which is relatively expensive for large objects
Consider caching serializability results for repeated checks
Git operations involve file I/O and subprocess calls
Environment info collection is one-time overhead per run
All functions designed for infrequent calls (per-run, not per-sample)
