Implementation:EvolvingLMMs Lab Lmms eval Logger Utils

Overview

This implementation provides utility functions for logging and result storage in the lmms_eval framework. It includes functions for string pattern cleaning, serialization checking, environment information gathering, and tokenizer metadata extraction. These utilities support robust result logging by handling non-serializable objects and enriching results with environment context.

File Location

lmms_eval/loggers/utils.py (136 lines)

Related Principle

Results Output

Dependencies

  • os: Operating system operations
  • pickle: Serialization testing
  • re: Regular expression pattern matching
  • subprocess: Git command execution
  • pathlib: Path manipulation
  • typing: Type hints
  • numpy: NumPy type handling
  • loguru: Logging (logger instance)
  • torch.utils.collect_env: Environment information gathering
  • transformers: Version information

Core Functions

remove_none_pattern

def remove_none_pattern(input_string: str) -> Tuple[str, bool]

Remove the ',none' substring from the input string if it exists at the end.

Parameters:

  • input_string (str): The input string from which to remove the ',none' substring

Returns:

  • Tuple[str, bool]:
    • str: Modified string with ',none' removed (or original if not found)
    • bool: True if modification was made, False if no change

Logic:

  1. Defines regex pattern: r",none$" (matches ",none" at end of string)
  2. Uses re.sub() to replace pattern with empty string
  3. Compares result to original to determine if change occurred
  4. Returns (modified_string, was_modified)

Use Case: Cleans up string configurations or arguments that may have trailing ",none" artifacts from parsing or concatenation.

Example:

result, removed = remove_none_pattern("model_args=temp=0.5,none")
# result = "model_args=temp=0.5"
# removed = True

result, removed = remove_none_pattern("normal_string")
# result = "normal_string"
# removed = False
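
The four logic steps above can be sketched in a few lines. This is a minimal re-implementation written from the description, not a copy of the lmms_eval source; the name remove_none_pattern_sketch is hypothetical.

```python
import re
from typing import Tuple


def remove_none_pattern_sketch(input_string: str) -> Tuple[str, bool]:
    """Strip a trailing ',none' and report whether anything was removed."""
    # The '$' anchors the match at the end of the string.
    result = re.sub(r",none$", "", input_string)
    # Comparing against the input tells the caller whether a change happened.
    return result, result != input_string


# Only a trailing ',none' is removed; one in the middle is left alone.
trailing = remove_none_pattern_sketch("model_args=temp=0.5,none")
interior = remove_none_pattern_sketch("a,none,b")
```

Returning the flag alongside the string lets a caller log or branch on the cleanup without comparing strings a second time.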

is_serializable

def is_serializable(o: Any) -> bool

Test whether an object can be serialized with pickle.

Parameters:

  • o (Any): The object to test for serializability

Returns:

  • bool: True if object can be pickled, False otherwise

Logic:

  1. Attempts to serialize object with pickle.dumps()
  2. Returns True if successful
  3. Catches PickleError, TypeError, AttributeError and returns False

Use Case: Validates objects before attempting to save results. Identifies which results need special handling before serialization.

Example:

is_serializable({"key": "value"})  # → True
is_serializable([1, 2, 3])          # → True
is_serializable(lambda x: x)        # → False (lambdas not serializable)
is_serializable(open("file.txt"))   # → False (file handles not serializable; assumes file.txt exists)
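
A check of this kind can be sketched as a try/except around pickle.dumps(). The version below is written from the logic described above and is only an approximation of the real function; is_serializable_sketch is a hypothetical name, and threading.Lock is used here simply as a convenient object that pickle rejects.

```python
import pickle
import threading
from typing import Any


def is_serializable_sketch(o: Any) -> bool:
    """Return True if pickle.dumps(o) succeeds, False otherwise."""
    try:
        pickle.dumps(o)
        return True
    except (pickle.PickleError, TypeError, AttributeError):
        # pickle signals failure with different exception types
        # depending on the kind of object it was given.
        return False


plain_ok = is_serializable_sketch({"key": "value"})
lock_ok = is_serializable_sketch(threading.Lock())  # locks cannot be pickled
lambda_ok = is_serializable_sketch(lambda x: x)     # lambdas cannot be pickled
```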

_handle_non_serializable

def _handle_non_serializable(o: Any) -> Union[int, str, list]

Handle non-serializable objects by converting them to serializable types.

Parameters:

  • o (Any): The object to be converted

Returns:

  • Union[int, str, list]: Serializable representation of the object

Logic:

  1. If the object is np.int64 or np.int32: return it as a Python int
  2. Else if the object is a set: convert it to a list
  3. Otherwise: convert it to its string representation

Conversion Rules:

  • NumPy integers → Python int (preserves numeric value)
  • Sets → list (makes serializable, loses set semantics)
  • Everything else → str (fallback, preserves information)

Use Case: Converts problematic types encountered during result serialization. Typically used as a fallback handler when pickle fails.

Example:

import numpy as np

_handle_non_serializable(np.int64(42))     # → 42 (Python int)
_handle_non_serializable(np.int32(7))      # → 7 (Python int)
_handle_non_serializable({1, 2, 3})        # → [1, 2, 3] (list)
_handle_non_serializable(lambda x: x)      # → "<function <lambda>...>" (str)
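
A converter with this shape (one argument in, a basic type out) fits naturally as the default hook of json.dumps(). Whether lmms_eval wires it up that way is an assumption here, not something this page states. The sketch below re-implements the conversion rules under the hypothetical name handle_non_serializable_sketch, and makes NumPy optional purely so the snippet runs in a bare environment.

```python
import json
from typing import Any, Union

try:
    import numpy as np
    NUMPY_INT_TYPES: tuple = (np.int64, np.int32)
except ImportError:
    # Assumption for this sketch only: degrade gracefully without NumPy.
    NUMPY_INT_TYPES = ()


def handle_non_serializable_sketch(o: Any) -> Union[int, str, list]:
    """Convert NumPy ints to int, sets to list, anything else to str."""
    if isinstance(o, NUMPY_INT_TYPES):
        return int(o)
    if isinstance(o, set):
        return list(o)
    return str(o)


class Marker:
    """Hypothetical custom object that json cannot encode natively."""

    def __str__(self) -> str:
        return "marker"


# json.dumps calls the hook only for values it cannot encode itself.
payload = {"labels": {7}, "obj": Marker(), "score": 0.5}
encoded = json.dumps(payload, default=handle_non_serializable_sketch)
decoded = json.loads(encoded)
```

Note that a set converted this way comes back as a list whose element order is not guaranteed.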

get_commit_from_path

def get_commit_from_path(repo_path: Union[Path, str]) -> Optional[str]

Retrieve the git commit hash from a repository path by reading .git metadata.

Parameters:

  • repo_path (Union[Path, str]): Path to the git repository

Returns:

  • Optional[str]: Git commit hash if found, None on failure

Logic:

  1. Constructs the path to the .git entry under repo_path
  2. If .git is a file (submodule case):
    • Reads the file content to get the actual .git directory path
    • Parses the path from the "gitdir: /path/to/.git" format
  3. If .git/HEAD exists:
    • Reads HEAD to get the reference (e.g., "ref: refs/heads/main")
    • Reads the reference file to get the commit hash
    • Removes newlines and returns the hash
  4. Otherwise, returns None
  5. On any exception:
    • Logs a debug message with the error
    • Returns None

Use Cases:

  • Tracking which code version produced results
  • Repository state recording
  • Reproducibility metadata

Edge Cases Handled:

  • Git submodules (.git as file pointing to parent repo)
  • Detached HEAD states
  • Missing .git directory
  • Corrupted git metadata

Example:

commit = get_commit_from_path("/path/to/repo")
# Returns: "a1b2c3d4e5f6..." or None

# Works with submodules
commit = get_commit_from_path("/path/to/submodule")
# Returns commit hash even when .git is a file
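
The lookup described above can be sketched with pathlib alone. This is an illustrative reconstruction from the listed steps, not the project's code: commit_from_path_sketch is a hypothetical name, debug logging is omitted, and the demo builds a throwaway directory that mimics the minimal .git layout instead of touching a real repository.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def commit_from_path_sketch(repo_path: Union[Path, str]) -> Optional[str]:
    """Resolve HEAD to a commit hash by reading files under .git."""
    try:
        git_dir = Path(repo_path, ".git")
        if git_dir.is_file():
            # Submodule case: the file holds a line like "gitdir: <path>".
            target = git_dir.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
            git_dir = Path(git_dir.parent, target)
        head = Path(git_dir, "HEAD")
        if not head.exists():
            return None
        # "ref: refs/heads/main" -> "refs/heads/main"
        ref_name = head.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
        return Path(git_dir, ref_name).read_text(encoding="utf-8").replace("\n", "")
    except Exception:
        # Any unreadable or unexpected metadata yields None instead of an error.
        return None


# Build a fake repository layout: .git/HEAD plus one loose ref file.
with tempfile.TemporaryDirectory() as tmp:
    refs = Path(tmp, ".git", "refs", "heads")
    refs.mkdir(parents=True)
    Path(tmp, ".git", "HEAD").write_text("ref: refs/heads/main\n", encoding="utf-8")
    Path(refs, "main").write_text("a" * 40 + "\n", encoding="utf-8")
    found = commit_from_path_sketch(tmp)
    missing = commit_from_path_sketch(Path(tmp, "no_such_repo"))
```

A sketch like this returns None whenever no loose ref file exists at the path named in HEAD, for example when HEAD holds a raw hash or the ref is stored only in packed-refs.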

get_git_commit_hash

def get_git_commit_hash() -> Optional[str]

Get the git commit hash of the current repository.

Returns:

  • Optional[str]: Git commit hash if found, None otherwise

Logic:

  1. Tries to execute git describe --always as a subprocess:
    • Strips whitespace from the output
    • Decodes the bytes to a string
    • Returns the resulting hash or tag description
  2. On CalledProcessError or FileNotFoundError:
    • Falls back to get_commit_from_path(os.getcwd())
    • Returns the result (hash or None)

Method Hierarchy:

  1. Primary: git CLI command (most reliable, works with detached HEAD)
  2. Fallback: Manual .git parsing (works when git not installed)

Source Attribution: Adapted from EleutherAI's gpt-neox project: https://github.com/EleutherAI/gpt-neox/blob/b608043be541602170bfcfb8ec9bf85e8a0799e0/megatron/neox_arguments/neox_args.py#L42

Use Case: Automatically captures code version for reproducibility without requiring explicit version tracking.

Example:

commit_hash = get_git_commit_hash()  # avoid shadowing the built-in hash()
# In git repo: "a1b2c3d" or "v1.0.0-5-ga1b2c3d"
# Outside git repo: None
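
The CLI-first, file-parsing-second order can be sketched as follows. The function name git_hash_with_fallback and its two parameters are hypothetical: the fallback is passed in as a callable and the command is overridable only so the fallback path can be exercised without depending on whether git is installed.

```python
import subprocess
from typing import Callable, List, Optional


def git_hash_with_fallback(
    fallback: Callable[[], Optional[str]],
    command: Optional[List[str]] = None,
) -> Optional[str]:
    """Try the git CLI first; call `fallback` if the CLI fails or is absent."""
    cmd = command or ["git", "describe", "--always"]
    try:
        # check_output returns bytes; strip the trailing newline, then decode.
        output = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
        return output.strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Non-zero exit status, or the executable does not exist at all.
        return fallback()


# A nonexistent executable raises FileNotFoundError, which triggers the fallback.
via_fallback = git_hash_with_fallback(
    lambda: "from-fallback",
    command=["no-such-executable-for-this-sketch"],
)
```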

add_env_info

def add_env_info(storage: Dict[str, Any]) -> None

Add environment information to a storage dictionary (modifies in-place).

Parameters:

  • storage (Dict[str, Any]): Dictionary to augment with environment info

Returns:

  • None (modifies storage in-place)

Logic:

  1. Tries to collect formatted environment info using the PyTorch utility:
    • Calls torch.utils.collect_env.get_pretty_env_info()
    • On exception, sets pretty_env_info to the error string
  2. Gets the transformers version from the imported module
  3. Gets the git hash of the parent directory (in case the current directory is a submodule):
    • Calls get_commit_from_path(Path(os.getcwd(), ".."))
  4. Creates a dictionary with the collected info:
    • pretty_env_info: Formatted environment details
    • transformers_version: Transformers library version
    • upper_git_hash: Git hash of parent directory
  5. Updates the storage dictionary with the new info

Added Fields:

  • pretty_env_info: System, PyTorch, CUDA info (formatted string)
  • transformers_version: transformers library version
  • upper_git_hash: Git hash of parent directory (for submodule tracking)

Use Case: Enriches result files with complete environment context for debugging and reproducibility.

Example:

results = {"model": "gpt-4", "score": 0.95}
add_env_info(results)

# results now contains:
# {
#     "model": "gpt-4",
#     "score": 0.95,
#     "pretty_env_info": "PyTorch version: 2.0.1\nCUDA version: 11.8...",
#     "transformers_version": "4.30.2",
#     "upper_git_hash": "a1b2c3d4..."
# }
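
The failure-tolerant part of this flow can be illustrated without PyTorch installed. In the sketch below the collector is injected as a callable standing in for torch.utils.collect_env.get_pretty_env_info(); add_env_info_sketch, its extra parameters, and broken_collector are all hypothetical names introduced for the illustration.

```python
from typing import Any, Callable, Dict, Optional


def add_env_info_sketch(
    storage: Dict[str, Any],
    collect_env: Callable[[], str],
    library_version: str,
    upper_git_hash: Optional[str] = None,
) -> None:
    """Merge environment fields into `storage` even if collection fails."""
    try:
        pretty_env_info = collect_env()
    except Exception as err:
        # Keep the run alive and record why collection failed.
        pretty_env_info = str(err)
    storage.update(
        {
            "pretty_env_info": pretty_env_info,
            "transformers_version": library_version,
            "upper_git_hash": upper_git_hash,
        }
    )


def broken_collector() -> str:
    """Simulate an environment in which collection raises."""
    raise RuntimeError("collector unavailable")


results = {"score": 0.95}
add_env_info_sketch(results, broken_collector, library_version="0.0.0")
```

Storing the error text in the same field keeps the shape of the result file stable, so downstream readers do not need to special-case a missing key.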

add_tokenizer_info

def add_tokenizer_info(storage: Dict[str, Any], lm) -> None

Add tokenizer metadata to a storage dictionary (modifies in-place).

Parameters:

  • storage (Dict[str, Any]): Dictionary to augment with tokenizer info
  • lm: Language model object (must have tokenizer attribute)

Returns:

  • None (modifies storage in-place)

Logic:

  1. Checks whether lm has a tokenizer attribute:
    • Uses getattr(lm, "tokenizer", False)
  2. If a tokenizer exists, tries to collect tokenizer info:
    • Creates a dictionary with:
      • tokenizer_pad_token: [token, token_id]
      • tokenizer_eos_token: [token, token_id]
      • tokenizer_bos_token: [token, token_id]
      • eot_token_id: from lm attribute (if exists)
      • max_length: from lm attribute (if exists)
    • Updates storage with the tokenizer info
    • On exception, logs a debug message and continues
  3. If there is no tokenizer:
    • Logs a debug message explaining why the info was not logged

Added Fields:

  • tokenizer_pad_token: [pad_token_string, pad_token_id]
  • tokenizer_eos_token: [eos_token_string, eos_token_id]
  • tokenizer_bos_token: [bos_token_string, bos_token_id]
  • eot_token_id: End-of-text token ID (if the model object defines one)
  • max_length: Maximum sequence length (if the model object defines one)

Use Cases:

  • Debugging tokenization issues
  • Tracking special token usage
  • Reproducibility (different tokenizers may affect results)
  • Documentation of model-specific token configurations

Example:

results = {"model": "gpt2", "score": 0.85}
add_tokenizer_info(results, model)

# results now contains:
# {
#     "model": "gpt2",
#     "score": 0.85,
#     "tokenizer_pad_token": ["<|endoftext|>", "50256"],
#     "tokenizer_eos_token": ["<|endoftext|>", "50256"],
#     "tokenizer_bos_token": ["<|endoftext|>", "50256"],
#     "eot_token_id": None,
#     "max_length": 1024
# }
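
The attribute probing described above can be tried out with stand-in objects. This is a sketch under assumptions: add_tokenizer_info_sketch is a hypothetical name, the token IDs are stringified because the example output above shows them as strings, debug logging is left out, and SimpleNamespace objects play the role of a model and its tokenizer.

```python
from types import SimpleNamespace
from typing import Any, Dict


def add_tokenizer_info_sketch(storage: Dict[str, Any], lm: Any) -> None:
    """Copy special-token metadata from lm.tokenizer into `storage`."""
    tokenizer = getattr(lm, "tokenizer", False)
    if not tokenizer:
        return  # nothing to record; the caller's dict is left untouched
    try:
        storage.update(
            {
                "tokenizer_pad_token": [tokenizer.pad_token, str(tokenizer.pad_token_id)],
                "tokenizer_eos_token": [tokenizer.eos_token, str(tokenizer.eos_token_id)],
                "tokenizer_bos_token": [tokenizer.bos_token, str(tokenizer.bos_token_id)],
                "eot_token_id": getattr(lm, "eot_token_id", None),
                "max_length": getattr(lm, "max_length", None),
            }
        )
    except Exception:
        pass  # an incomplete tokenizer should not abort result logging


fake_tokenizer = SimpleNamespace(
    pad_token="<pad>", pad_token_id=0,
    eos_token="</s>", eos_token_id=2,
    bos_token="<s>", bos_token_id=1,
)

with_tok: Dict[str, Any] = {}
add_tokenizer_info_sketch(with_tok, SimpleNamespace(tokenizer=fake_tokenizer, max_length=1024))

without_tok: Dict[str, Any] = {}
add_tokenizer_info_sketch(without_tok, SimpleNamespace())  # no tokenizer attribute
```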

Design Patterns

Defensive Programming

The functions that touch the filesystem, subprocesses, or third-party libraries guard against failure:

  • Try-except blocks catch exceptions instead of propagating them
  • Fallback values (None, or an error string for pretty_env_info) prevent crashes
  • Debug logging provides visibility without failing operations

In-Place Modification

add_env_info and add_tokenizer_info modify dictionaries in-place:

  • Avoids creating copies of potentially large result dictionaries
  • Clear side-effect through naming convention (add_* prefix)
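  • Allows several augmentation calls to be applied in sequence to the same dictionary (the functions return None, so calls cannot be chained on the return value)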

Type Conversion Pipeline

Serialization handling uses a two-step process:

  1. Check serializability with is_serializable()
  2. Convert if needed with _handle_non_serializable()

This separates detection from conversion for clarity and testability.

Fallback Hierarchy

get_git_commit_hash uses layered fallbacks:

  1. Try git CLI (most reliable)
  2. Fall back to manual .git parsing (works without git installed)
  3. Return None if all methods fail

Usage Patterns

Result Preparation

import json

from lmms_eval.loggers.utils import add_env_info, add_tokenizer_info

results = {
    "model": "qwen2.5-vl",
    "task": "mmmu",
    "accuracy": 0.87,
    # ... other metrics
}

# Enrich with environment context
# (`model` is the already-loaded model object being evaluated)
add_env_info(results)
add_tokenizer_info(results, model)

# Save results
with open("results.json", "w") as f:
    json.dump(results, f)

Serialization Handling

import pickle

import numpy as np

from lmms_eval.loggers.utils import is_serializable, _handle_non_serializable

data = {
    "scores": np.array([0.8, 0.9, 0.85]),  # picklable, kept as-is
    "labels": {1, 2, 3},                   # picklable, kept as-is
    "model": model_instance,               # converted only if pickling fails
}

# Convert only the values that pickle cannot handle
cleaned_data = {}
for key, value in data.items():
    if is_serializable(value):
        cleaned_data[key] = value
    else:
        cleaned_data[key] = _handle_non_serializable(value)

# Now safe to serialize
with open("results.pkl", "wb") as f:
    pickle.dump(cleaned_data, f)

Version Tracking

from datetime import datetime

from lmms_eval.loggers.utils import get_git_commit_hash

# Record code version with results
# (`config_dict` is the run configuration assembled by the caller)
experiment_metadata = {
    "timestamp": datetime.now().isoformat(),
    "code_version": get_git_commit_hash(),
    "config": config_dict,
}

String Cleanup

from loguru import logger

from lmms_eval.loggers.utils import remove_none_pattern

# Clean up parsed arguments
args_string = "model=gpt4,temp=0.7,none"
cleaned, was_modified = remove_none_pattern(args_string)
if was_modified:
    logger.info(f"Cleaned args: {cleaned}")

Integration with Framework

These utilities are used throughout the logging pipeline:

Result Saving:

# In evaluation pipeline
# (evaluator, model, and logger stand for the pipeline's own objects)
results = evaluator.run()
add_env_info(results)
add_tokenizer_info(results, model)
logger.log_results(results)

Reproducibility Tracking:

# Automatic version capture
metadata = {
    "git_hash": get_git_commit_hash(),
    "upper_git_hash": get_commit_from_path(".."),
    # ... other metadata
}

Error Handling Examples

Git Not Available

commit_hash = get_git_commit_hash()
if commit_hash is None:
    logger.warning("Could not determine git version: not in a repository, or git is not installed")
else:
    logger.info(f"Running code version: {commit_hash}")

Tokenizer Not Present

results = {}
add_tokenizer_info(results, model)
# If model has no tokenizer, function logs debug message and continues
# results dictionary unchanged, no crash

Serialization Failure

obj = custom_complex_object

if not is_serializable(obj):
    logger.warning(f"Object {type(obj)} is not serializable, converting")
    obj = _handle_non_serializable(obj)

# obj now serializable (int, str, or list)

Best Practices

Environment Information

  • Call add_env_info() once per evaluation run
  • Include in all saved result files for reproducibility
  • Useful for debugging environment-specific issues

Tokenizer Information

  • Call add_tokenizer_info() for language model evaluations
  • Helps diagnose tokenization-related issues
  • Documents model-specific token configurations

Git Version Tracking

  • Use get_git_commit_hash() at evaluation start
  • Log warnings if git hash unavailable
  • Store in result metadata for reproducibility

Serialization

  • Test serializability before attempting to save
  • Use _handle_non_serializable() as fallback converter
  • Document which types needed conversion for future reference

String Cleaning

  • Apply remove_none_pattern() to user-provided arguments
  • Check return boolean to log when modifications occur
  • Use for defensive cleanup of configuration strings

Testing Considerations

Serialization Tests

  • Test with NumPy types (int64, int32, arrays)
  • Test with Python sets, frozensets
  • Test with lambda functions, file handles
  • Verify conversion preserves essential information

Git Functions Tests

  • Mock subprocess for git CLI testing
  • Test with and without .git directory
  • Test submodule case (.git as file)
  • Test detached HEAD state
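
As an illustration of the first bullet, the git CLI can be replaced with unittest.mock so a test never launches a real process. The function describe_head below is a hypothetical stand-in for the code under test, written only to keep the snippet self-contained.

```python
import subprocess
from typing import Optional
from unittest import mock


def describe_head() -> Optional[str]:
    """Hypothetical stand-in for the function under test."""
    try:
        return subprocess.check_output(["git", "describe", "--always"]).strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None


# Case 1: the CLI succeeds and returns bytes with a trailing newline.
with mock.patch("subprocess.check_output", return_value=b"a1b2c3d\n"):
    cli_result = describe_head()

# Case 2: git is not installed, so launching it raises FileNotFoundError.
with mock.patch("subprocess.check_output", side_effect=FileNotFoundError):
    missing_git_result = describe_head()

# Case 3: git runs but exits non-zero (for example, outside a repository).
error = subprocess.CalledProcessError(returncode=128, cmd="git")
with mock.patch("subprocess.check_output", side_effect=error):
    failed_result = describe_head()
```

Patching subprocess.check_output on the module works here because the stand-in looks the function up through the subprocess module at call time.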

Environment Info Tests

  • Mock torch environment info collection
  • Test graceful handling of missing dependencies
  • Verify all expected fields added to storage

Tokenizer Info Tests

  • Test with models with/without tokenizer attribute
  • Test with various tokenizer types (GPT, BERT, T5)
  • Verify graceful handling of incomplete tokenizers

Performance Considerations

  • is_serializable() runs a full pickle.dumps() on the object, so its cost grows with object size
  • Consider caching serializability results for repeated checks
  • Git operations involve file I/O and subprocess calls
  • Environment info collection is one-time overhead per run
  • All functions designed for infrequent calls (per-run, not per-sample)
