Implementation:EvolvingLMMs Lab Lmms eval Logger Utils
Overview
This implementation provides utility functions for logging and result storage in the lmms_eval framework. It includes functions for string pattern cleaning, serialization checking, environment information gathering, and tokenizer metadata extraction. These utilities support robust result logging by handling non-serializable objects and enriching results with environment context.
File Location
/tmp/kapso_repo_sslb_59s/lmms_eval/loggers/utils.py (136 lines)
Related Principle
Dependencies
- os: Operating system operations
- pickle: Serialization testing
- re: Regular expression pattern matching
- subprocess: Git command execution
- pathlib: Path manipulation
- typing: Type hints
- numpy: NumPy type handling
- loguru: Logging (logger instance)
- torch.utils.collect_env: Environment information gathering
- transformers: Version information
Core Functions
remove_none_pattern
def remove_none_pattern(input_string: str) -> Tuple[str, bool]
Remove the ',none' substring from the input string if it exists at the end.
Parameters:
input_string(str): The input string from which to remove the ',none' substring
Returns:
- Tuple[str, bool]:
- str: Modified string with ',none' removed (or original if not found)
- bool: True if modification was made, False if no change
Logic:
- Defines regex pattern: r",none$" (matches ",none" at end of string)
- Uses re.sub() to replace pattern with empty string
- Compares result to original to determine if change occurred
- Returns (modified_string, was_modified)
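The steps above can be sketched as follows. This is a minimal illustration written from the description in this section, not the file's actual source; the name `remove_none_pattern_sketch` is hypothetical.

```python
import re
from typing import Tuple

# Matches ",none" only when it is the last thing in the string.
_NONE_SUFFIX = re.compile(r",none$")


def remove_none_pattern_sketch(input_string: str) -> Tuple[str, bool]:
    """Strip a trailing ',none' and report whether anything changed."""
    result = _NONE_SUFFIX.sub("", input_string)
    return result, result != input_string


print(remove_none_pattern_sketch("acc,none"))   # ('acc', True)
print(remove_none_pattern_sketch("none,acc"))   # ('none,acc', False)
```

Because the pattern is anchored with `$`, only one trailing occurrence is removed per call, and a ",none" in the middle of the string is left alone.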
Use Case: Cleans up string configurations or arguments that may have trailing ",none" artifacts from parsing or concatenation.
Example:
result, removed = remove_none_pattern("model_args=temp=0.5,none")
# result = "model_args=temp=0.5"
# removed = True
result, removed = remove_none_pattern("normal_string")
# result = "normal_string"
# removed = False
is_serializable
def is_serializable(o: Any) -> bool
Test whether an object can be serialized with pickle.
Parameters:
o(Any): The object to test for serializability
Returns:
- bool: True if object can be pickled, False otherwise
Logic:
- Attempts to serialize object with pickle.dumps()
- Returns True if successful
- Catches PickleError, TypeError, AttributeError and returns False
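A minimal sketch of the logic listed above, assuming the check is a plain `pickle.dumps()` attempt as described. The function name `is_serializable_sketch` is hypothetical and the code is an illustration rather than the file's source.

```python
import pickle
from typing import Any


def is_serializable_sketch(o: Any) -> bool:
    """Return True if pickle can serialize the object, False otherwise."""
    try:
        pickle.dumps(o)
        return True
    except (pickle.PickleError, TypeError, AttributeError):
        # PicklingError is a subclass of PickleError, so it is covered here.
        return False


print(is_serializable_sketch({"key": "value"}))  # True
print(is_serializable_sketch(lambda x: x))       # False
print(is_serializable_sketch({1, 2, 3}))         # True
```

Note that pickle accepts sets, so a pickle-based check is more permissive than a JSON-based one would be.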
Use Case: Validates objects before attempting to save results. Identifies which results need special handling before serialization.
Example:
is_serializable({"key": "value"}) # → True
is_serializable([1, 2, 3]) # → True
is_serializable(lambda x: x) # → False (lambdas not serializable)
is_serializable(open("file.txt")) # → False (file handles not serializable)
_handle_non_serializable
def _handle_non_serializable(o: Any) -> Union[int, str, list]
Handle non-serializable objects by converting them to serializable types.
Parameters:
o(Any): The object to be converted
Returns:
- Union[int, str, list]: Serializable representation of the object
Logic:
- If object is np.int64 or np.int32:
- Return as Python int
- Elif object is set:
- Convert to list
- Else:
- Convert to string representation
Conversion Rules:
- NumPy integers → Python int (preserves numeric value)
- Sets → list (makes serializable, loses set semantics)
- Everything else → str (fallback, preserves information)
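The conversion rules above can be sketched as a small function. Passing it as the `default` hook of `json.dumps` is an assumption made here for illustration (this section does not say how the real helper is wired in); the name `handle_non_serializable_sketch` is hypothetical.

```python
import json
from typing import Any, Union

import numpy as np


def handle_non_serializable_sketch(o: Any) -> Union[int, str, list]:
    """Convert an object to an int, list, or str following the rules above."""
    if isinstance(o, (np.int64, np.int32)):
        return int(o)   # NumPy integer -> Python int
    if isinstance(o, set):
        return list(o)  # set -> list (element order is not guaranteed)
    return str(o)       # fallback: string representation


# Assumed usage: json.dumps calls the hook for every value it cannot encode.
payload = {"count": np.int64(42), "tags": {"a"}, "other": complex(1, 2)}
encoded = json.dumps(payload, default=handle_non_serializable_sketch)
print(encoded)  # {"count": 42, "tags": ["a"], "other": "(1+2j)"}
```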
Use Case: Converts problematic types encountered during result serialization. Typically used as a fallback handler when pickle fails.
Example:
import numpy as np
_handle_non_serializable(np.int64(42)) # → 42 (Python int)
_handle_non_serializable(np.int32(7)) # → 7 (Python int)
_handle_non_serializable({1, 2, 3}) # → [1, 2, 3] (list; element order is not guaranteed)
_handle_non_serializable(lambda x: x) # → "<function <lambda>...>" (str)
get_commit_from_path
def get_commit_from_path(repo_path: Union[Path, str]) -> Optional[str]
Retrieve the git commit hash from a repository path by reading .git metadata.
Parameters:
repo_path(Union[Path, str]): Path to the git repository
Returns:
- Optional[str]: Git commit hash if found, None on failure
Logic:
- Constructs path to .git folder
- If .git is a file (submodule case):
- Reads file content to get actual .git directory path
- Parses path from "gitdir: /path/to/.git" format
- If .git/HEAD exists:
- Reads HEAD to get reference (e.g., "ref: refs/heads/main")
- Reads the reference file to get commit hash
- Removes newlines and returns hash
- Else:
- Returns None
- On any exception:
- Logs debug message with error
- Returns None
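The logic above can be sketched with `pathlib` alone. This is an illustration based on the steps listed in this section; the name `get_commit_from_path_sketch` is hypothetical, and the standard `logging` module stands in for the loguru logger. The demo builds a fake repository layout in a temporary directory so the block runs without git.

```python
import logging
import tempfile
from pathlib import Path
from typing import Optional, Union

log = logging.getLogger(__name__)


def get_commit_from_path_sketch(repo_path: Union[Path, str]) -> Optional[str]:
    """Read the commit hash that HEAD points to, or return None on failure."""
    try:
        git_folder = Path(repo_path, ".git")
        if git_folder.is_file():
            # Submodule case: the file contains "gitdir: <path to real .git>".
            target = git_folder.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
            git_folder = Path(git_folder.parent, target)
        head = Path(git_folder, "HEAD")
        if not head.exists():
            return None
        # HEAD normally looks like "ref: refs/heads/main".
        ref_name = head.read_text(encoding="utf-8").split("\n")[0].split(" ")[-1]
        return Path(git_folder, ref_name).read_text(encoding="utf-8").replace("\n", "")
    except Exception as err:
        log.debug("Failed to read commit from %s: %s", repo_path, err)
        return None


# Demo against a hand-built .git layout (no git installation needed).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp, ".git")
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n", encoding="utf-8")
    (git_dir / "refs" / "heads" / "main").write_text("a1b2c3d4\n", encoding="utf-8")
    demo_hash = get_commit_from_path_sketch(tmp)                 # "a1b2c3d4"
    missing = get_commit_from_path_sketch(Path(tmp, "nowhere"))  # None
```

In this sketch, a HEAD file that holds a raw hash instead of a `ref:` line makes the second read fail, so the function takes the exception branch and returns None.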
Use Cases:
- Tracking which code version produced results
- Repository state recording
- Reproducibility metadata
Edge Cases Handled:
- Git submodules (.git as file pointing to parent repo)
- Detached HEAD states
- Missing .git directory
- Corrupted git metadata
Example:
commit = get_commit_from_path("/path/to/repo")
# Returns: "a1b2c3d4e5f6..." or None
# Works with submodules
commit = get_commit_from_path("/path/to/submodule")
# Returns commit hash even when .git is a file
get_git_commit_hash
def get_git_commit_hash() -> Optional[str]
Get the git commit hash of the current repository.
Returns:
- Optional[str]: Git commit hash if found, None otherwise
Logic:
- Tries to execute: git describe --always
- Runs as subprocess
- Strips whitespace from output
- Decodes bytes to string
- Returns git hash/tag
- On CalledProcessError or FileNotFoundError:
- Falls back to get_commit_from_path(os.getcwd())
- Returns result (hash or None)
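A minimal sketch of this two-level lookup, with the `.git` parser injected as a hypothetical `fallback` parameter so the block stays self-contained (the real function calls get_commit_from_path directly, per the logic above). The demo patches `subprocess.check_output` with `unittest.mock`, so no git binary is required to run it.

```python
import os
import subprocess
from typing import Callable, Optional
from unittest import mock


def get_git_commit_hash_sketch(
    fallback: Callable[[str], Optional[str]] = lambda path: None,
) -> Optional[str]:
    """Ask the git CLI first; on failure, defer to a .git-parsing fallback."""
    try:
        out = subprocess.check_output(["git", "describe", "--always"])
        return out.strip().decode()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: git ran but failed (e.g. not a repository).
        # FileNotFoundError: the git executable is not installed.
        return fallback(os.getcwd())


# Simulate a working git CLI.
with mock.patch("subprocess.check_output", return_value=b"a1b2c3d\n"):
    from_cli = get_git_commit_hash_sketch()

# Simulate a machine without git; the fallback value is made up for the demo.
with mock.patch("subprocess.check_output", side_effect=FileNotFoundError):
    from_fallback = get_git_commit_hash_sketch(fallback=lambda path: "from-dot-git")

print(from_cli, from_fallback)  # a1b2c3d from-dot-git
```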
Method Hierarchy:
- Primary: git CLI command (most reliable, works with detached HEAD)
- Fallback: Manual .git parsing (works when git not installed)
Source Attribution: Adapted from EleutherAI's gpt-neox project: https://github.com/EleutherAI/gpt-neox/blob/b608043be541602170bfcfb8ec9bf85e8a0799e0/megatron/neox_arguments/neox_args.py#L42
Use Case: Automatically captures code version for reproducibility without requiring explicit version tracking.
Example:
commit_hash = get_git_commit_hash()  # avoid shadowing the built-in hash()
# In git repo: "a1b2c3d" or "v1.0.0-5-ga1b2c3d"
# Outside git repo: None
add_env_info
def add_env_info(storage: Dict[str, Any]) -> None
Add environment information to a storage dictionary (modifies in-place).
Parameters:
storage(Dict[str, Any]): Dictionary to augment with environment info
Returns:
- None (modifies storage in-place)
Logic:
- Tries to collect pretty environment info using PyTorch utility:
- Calls torch.utils.collect_env.get_pretty_env_info()
- On exception:
- Sets pretty_env_info to error string
- Gets transformers version from imported module
- Gets git hash of parent directory (in case current is submodule):
- Calls get_commit_from_path(Path(os.getcwd(), ".."))
- Creates dictionary with collected info:
- - pretty_env_info: Formatted environment details
- - transformers_version: Transformers library version
- - upper_git_hash: Git hash of parent directory
- Updates storage dictionary with new info
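The pattern above (collect, fall back to an error string, then update the dictionary) can be sketched without PyTorch installed. In this illustration the values the real function gathers itself are passed in as hypothetical parameters (`collect_env`, `transformers_version`, `upper_git_hash`), and the name `add_env_info_sketch` is also hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def add_env_info_sketch(
    storage: Dict[str, Any],
    collect_env: Callable[[], str],
    transformers_version: str,
    upper_git_hash: Optional[str],
) -> None:
    """Add environment fields to storage in place; never raise."""
    try:
        pretty_env_info = collect_env()
    except Exception as err:
        # Keep the error text so the result file records why info is missing.
        pretty_env_info = str(err)
    storage.update(
        {
            "pretty_env_info": pretty_env_info,
            "transformers_version": transformers_version,
            "upper_git_hash": upper_git_hash,
        }
    )


def broken_collector() -> str:
    """Stand-in collector that fails, to exercise the fallback branch."""
    raise RuntimeError("collect_env unavailable")


results = {"score": 0.95}
add_env_info_sketch(results, broken_collector, "4.30.2", None)
print(results["pretty_env_info"])  # collect_env unavailable
```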
Added Fields:
- pretty_env_info: System, PyTorch, CUDA info (formatted string)
- transformers_version: transformers library version
- upper_git_hash: Git hash of parent directory (for submodule tracking)
Use Case: Enriches result files with complete environment context for debugging and reproducibility.
Example:
results = {"model": "gpt-4", "score": 0.95}
add_env_info(results)
# results now contains:
# {
# "model": "gpt-4",
# "score": 0.95,
# "pretty_env_info": "PyTorch version: 2.0.1\nCUDA version: 11.8...",
# "transformers_version": "4.30.2",
# "upper_git_hash": "a1b2c3d4..."
# }
add_tokenizer_info
def add_tokenizer_info(storage: Dict[str, Any], lm) -> None
Add tokenizer metadata to a storage dictionary (modifies in-place).
Parameters:
storage(Dict[str, Any]): Dictionary to augment with tokenizer info
lm: Language model object (expected to have a tokenizer attribute)
Returns:
- None (modifies storage in-place)
Logic:
- Checks if lm has tokenizer attribute:
- Uses getattr(lm, "tokenizer", False)
- If tokenizer exists:
- Tries to collect tokenizer info:
- Creates dictionary with:
- - tokenizer_pad_token: [token, token_id]
- - tokenizer_eos_token: [token, token_id]
- - tokenizer_bos_token: [token, token_id]
- - eot_token_id: from lm attribute (if exists)
- - max_length: from lm attribute (if exists)
- Updates storage with tokenizer info
- On exception:
- Logs debug message and continues
- If no tokenizer:
- Logs debug message explaining why info not logged
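The branching above can be sketched with stand-in objects. This is an illustration, not the file's source: `add_tokenizer_info_sketch` is a hypothetical name, `SimpleNamespace` objects play the role of the model and tokenizer, the standard `logging` module replaces loguru, and token IDs are stringified only to mirror the example output shown in this section.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict

log = logging.getLogger(__name__)


def add_tokenizer_info_sketch(storage: Dict[str, Any], lm: Any) -> None:
    """Add tokenizer fields to storage in place if lm exposes a tokenizer."""
    tokenizer = getattr(lm, "tokenizer", False)
    if not tokenizer:
        log.debug("Model has no tokenizer attribute; tokenizer info not logged.")
        return
    try:
        info = {
            "tokenizer_pad_token": [tokenizer.pad_token, str(tokenizer.pad_token_id)],
            "tokenizer_eos_token": [tokenizer.eos_token, str(tokenizer.eos_token_id)],
            "tokenizer_bos_token": [tokenizer.bos_token, str(tokenizer.bos_token_id)],
            "eot_token_id": getattr(lm, "eot_token_id", None),
            "max_length": getattr(lm, "max_length", None),
        }
        # Update only after every field was read, so a failure leaves storage untouched.
        storage.update(info)
    except Exception as err:
        log.debug("Could not collect tokenizer info: %s", err)


fake_tokenizer = SimpleNamespace(
    pad_token="<pad>", pad_token_id=0,
    eos_token="</s>", eos_token_id=2,
    bos_token="<s>", bos_token_id=1,
)
with_tok: Dict[str, Any] = {}
add_tokenizer_info_sketch(with_tok, SimpleNamespace(tokenizer=fake_tokenizer, max_length=1024))

without_tok: Dict[str, Any] = {}
add_tokenizer_info_sketch(without_tok, SimpleNamespace())  # no tokenizer attribute
print(with_tok["tokenizer_eos_token"], without_tok)  # ['</s>', '2'] {}
```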
Added Fields:
- tokenizer_pad_token: [pad_token_string, pad_token_id]
- tokenizer_eos_token: [eos_token_string, eos_token_id]
- tokenizer_bos_token: [bos_token_string, bos_token_id]
- eot_token_id: End-of-text token ID (if the model defines one)
- max_length: Maximum sequence length (if the model defines one)
Use Cases:
- Debugging tokenization issues
- Tracking special token usage
- Reproducibility (different tokenizers may affect results)
- Documentation of model-specific token configurations
Example:
results = {"model": "gpt2", "score": 0.85}
add_tokenizer_info(results, model)
# results now contains:
# {
# "model": "gpt2",
# "score": 0.85,
# "tokenizer_pad_token": ["<|endoftext|>", "50256"],
# "tokenizer_eos_token": ["<|endoftext|>", "50256"],
# "tokenizer_bos_token": ["<|endoftext|>", "50256"],
# "eot_token_id": None,
# "max_length": 1024
# }
Design Patterns
Defensive Programming
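Functions that touch external state (pickle, git, PyTorch, the tokenizer) are wrapped in error handling:
- Try-except blocks catch the expected exceptions instead of propagating them
- Fallback values (None for a missing commit hash, False for a failed pickle attempt, an error string for pretty_env_info) prevent crashes
- Debug logging provides visibility without failing operations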
In-Place Modification
add_env_info and add_tokenizer_info modify dictionaries in-place:
- Avoids creating copies of potentially large result dictionaries
- Clear side-effect through naming convention (add_* prefix)
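- Lets several augmentation calls be applied in sequence to the same dictionary (both functions return None, so the calls cannot be chained as expressions)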
Type Conversion Pipeline
Serialization handling uses a two-step process:
- Check serializability with is_serializable()
- Convert if needed with _handle_non_serializable()
This separates detection from conversion for clarity and testability.
Fallback Hierarchy
get_git_commit_hash uses layered fallbacks:
- Try git CLI (most reliable)
- Fall back to manual .git parsing (works without git installed)
- Return None if all methods fail
Usage Patterns
Result Preparation
import json

from lmms_eval.loggers.utils import add_env_info, add_tokenizer_info

results = {
    "model": "qwen2.5-vl",
    "task": "mmmu",
    "accuracy": 0.87,
    # ... other metrics
}
# Enrich with environment context (`model` is the already-loaded evaluation model)
add_env_info(results)
add_tokenizer_info(results, model)
# Save results
with open("results.json", "w") as f:
    json.dump(results, f)
Serialization Handling
import pickle

import numpy as np

from lmms_eval.loggers.utils import is_serializable, _handle_non_serializable

data = {
    "scores": np.array([0.8, 0.9, 0.85]),
    "labels": {1, 2, 3},
    "model": model_instance,  # placeholder for a loaded model object
}
# Convert only the values that fail the serializability check
cleaned_data = {}
for key, value in data.items():
    if is_serializable(value):
        cleaned_data[key] = value
    else:
        cleaned_data[key] = _handle_non_serializable(value)
# Now safe to serialize (the output filename is illustrative)
with open("results.pkl", "wb") as f:
    pickle.dump(cleaned_data, f)
Version Tracking
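from datetime import datetime

from lmms_eval.loggers.utils import get_git_commit_hash

# Record code version with results
experiment_metadata = {
    "timestamp": datetime.now().isoformat(),
    "code_version": get_git_commit_hash(),  # may be None outside a git repository
    "config": config_dict,  # placeholder for the run configuration
}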
String Cleanup
from loguru import logger

from lmms_eval.loggers.utils import remove_none_pattern

# Clean up parsed arguments
args_string = "model=gpt4,temp=0.7,none"
cleaned, was_modified = remove_none_pattern(args_string)
if was_modified:
    logger.info(f"Cleaned args: {cleaned}")
Integration with Framework
These utilities are used throughout the logging pipeline:
Result Saving:
# In evaluation pipeline
results = evaluator.run()
add_env_info(results)
add_tokenizer_info(results, model)
logger.log_results(results)
Reproducibility Tracking:
# Automatic version capture
metadata = {
"git_hash": get_git_commit_hash(),
"upper_git_hash": get_commit_from_path(".."),
# ... other metadata
}
Error Handling Examples
Git Not Available
commit_hash = get_git_commit_hash()
if commit_hash is None:
    logger.warning("Could not determine git version: not in a repository, or git is not installed")
else:
    logger.info(f"Running code version: {commit_hash}")
Tokenizer Not Present
results = {}
add_tokenizer_info(results, model)
# If model has no tokenizer, function logs debug message and continues
# results dictionary unchanged, no crash
Serialization Failure
obj = custom_complex_object
if not is_serializable(obj):
    logger.warning(f"Object {type(obj)} is not serializable, converting")
    obj = _handle_non_serializable(obj)
# obj now serializable (int, str, or list)
Best Practices
Environment Information
- Call add_env_info() once per evaluation run
- Include in all saved result files for reproducibility
- Useful for debugging environment-specific issues
Tokenizer Information
- Call add_tokenizer_info() for language model evaluations
- Helps diagnose tokenization-related issues
- Documents model-specific token configurations
Git Version Tracking
- Use get_git_commit_hash() at evaluation start
- Log warnings if git hash unavailable
- Store in result metadata for reproducibility
Serialization
- Test serializability before attempting to save
- Use _handle_non_serializable() as fallback converter
- Document which types needed conversion for future reference
String Cleaning
- Apply remove_none_pattern() to user-provided arguments
- Check return boolean to log when modifications occur
- Use for defensive cleanup of configuration strings
Related Implementations
- Results Output: Uses these utilities for result logging
- Model Configuration: May use string cleanup utilities
- Task Testing: Uses environment info in test results
Testing Considerations
Serialization Tests
- Test with NumPy types (int64, int32, arrays)
- Test with Python sets, frozensets
- Test with lambda functions, file handles
- Verify conversion preserves essential information
Git Functions Tests
- Mock subprocess for git CLI testing
- Test with and without .git directory
- Test submodule case (.git as file)
- Test detached HEAD state
Environment Info Tests
- Mock torch environment info collection
- Test graceful handling of missing dependencies
- Verify all expected fields added to storage
Tokenizer Info Tests
- Test with models with/without tokenizer attribute
- Test with various tokenizer types (GPT, BERT, T5)
- Verify graceful handling of incomplete tokenizers
Performance Considerations
- is_serializable() performs a full pickle attempt (relatively expensive)
- Consider caching serializability results for repeated checks
- Git operations involve file I/O and subprocess calls
- Environment info collection is one-time overhead per run
- All functions designed for infrequent calls (per-run, not per-sample)