Implementation:Microsoft LoRA Check Copies
Overview
The check_copies.py utility validates that code blocks annotated with # Copied from comments in the Transformers source remain consistent with their original source, and ensures the model list in index.rst matches the README.
Description
This CI script enforces copy consistency across the Transformers codebase. It performs two main checks:
- Copy Consistency Check: Scans all Python files under src/transformers/ for lines matching the pattern # Copied from transformers.<module>.<object>. For each match, it locates the original source code, applies any replacement patterns specified (e.g., with ClassName->OtherClassName), and compares the copied code against the result. When --fix_and_overwrite is provided, it auto-corrects divergent copies.
- Model List Synchronization: Extracts the model list from README.md, converts it from Markdown to RST format, and verifies that it matches the corresponding list in docs/source/index.rst. Supports auto-fixing in overwrite mode.
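The copy check boils down to parsing the comment, rewriting the original with the declared replacements, and comparing. The sketch below illustrates that idea under stated assumptions: the names parse_copy_comment, expected_copy, and is_consistent are hypothetical, it uses plain string replacement, and it skips the Black formatting (blackify) the script applies, so treat it as an illustration rather than the script's actual logic.

```python
import re

# Hypothetical names; a simplified sketch of the "# Copied from" check.
# Captures the dotted path and an optional "with Old->New[, Old2->New2]" clause.
COPY_RE = re.compile(r"^\s*#\s*Copied from\s+transformers\.(\S+)(?:\s+with\s+(.*))?$")
PAIR_RE = re.compile(r"^\s*(\S+)->(\S+)\s*$")


def parse_copy_comment(line):
    """Return (dotted_path, [(old, new), ...]), or None if `line` is not a copy comment."""
    match = COPY_RE.match(line.rstrip())
    if match is None:
        return None
    path, clause = match.group(1), match.group(2)
    pairs = []
    for chunk in (clause or "").split(","):
        pair = PAIR_RE.match(chunk)
        if pair is not None:
            pairs.append((pair.group(1), pair.group(2)))
    return path, pairs


def expected_copy(original_code, pairs):
    """Apply each Old->New replacement to the original source, in order."""
    for old, new in pairs:
        original_code = original_code.replace(old, new)
    return original_code


def is_consistent(original_code, copied_code, pairs):
    """A copy is consistent when it equals the original after the replacements."""
    return expected_copy(original_code, pairs) == copied_code
```

For example, a comment ending in "with Bert->Roberta" yields the pair ("Bert", "Roberta"), and a RobertaX class copied from BertX passes only if every other line is identical.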
Key internal functions include:
- find_code_in_transformers(object_name): Locates a class or function in the Transformers source by dotted path and returns its source code.
- blackify(code): Formats code with Black (line length 119, target Python 3.5).
- is_copy_consistent(filename, overwrite): Checks a single file for copy consistency and returns the list of mismatches found.
- convert_to_rst(model_list, max_per_line): Converts Markdown model list entries to RST format, with link conversion and line wrapping.
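To make the Markdown-to-RST step concrete, here is a simplified sketch of the kind of conversion convert_to_rst performs. It assumes README entries look like "1. **[Name](url)** ...", the helper names are hypothetical, and it omits the max_per_line wrapping and any special link handling the real function may do.

```python
import re

# Hypothetical helpers; a simplified take on what convert_to_rst does.
BOLD_LINK_RE = re.compile(r"\*\*(\[[^\]]*\]\([^)]*\))\*\*")  # **[title](url)**
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")          # [title](url)
ITEM_RE = re.compile(r"^(\s*)1\.")                           # "1." list marker


def markdown_links_to_rst(text):
    """Rewrite [title](url), bold or not, as the RST anonymous link `title <url>`__."""
    text = BOLD_LINK_RE.sub(r"\1", text)
    return MD_LINK_RE.sub(r"`\1 <\2>`__", text)


def renumber_items(text):
    """Markdown renders a run of "1." items as 1, 2, 3; write the numbers out explicitly."""
    out = []
    counter = 0
    for line in text.split("\n"):
        if ITEM_RE.match(line):
            counter += 1
            line = ITEM_RE.sub(lambda m: f"{m.group(1)}{counter}.", line, count=1)
        out.append(line)
    return "\n".join(out)


def to_rst(model_list):
    """Convert links first, then fix the numbering."""
    return renumber_items(markdown_links_to_rst(model_list))
```

Converting links before renumbering keeps the two regex passes independent: the list marker regex only ever sees the start of each line.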
Usage
Use this utility when:
- Running CI checks to ensure annotated code copies remain in sync with their originals.
- Verifying that the model list in documentation matches the README after adding or updating models.
- Auto-fixing copy drift using the --fix_and_overwrite flag or make fix-copies.
Code Reference
Source Location
examples/NLU/utils/check_copies.py (324 lines)
Signature
def find_code_in_transformers(object_name: str) -> str: ...
def blackify(code: str) -> str: ...
def get_indent(code: str) -> str: ...
def is_copy_consistent(filename: str, overwrite: bool = False) -> list: ...
def check_copies(overwrite: bool = False) -> None: ...
def get_model_list() -> str: ...
def split_long_line_with_indent(line: str, max_per_line: int, indent: int) -> str: ...
def convert_to_rst(model_list: str, max_per_line: int = None) -> str: ...
def check_model_list_copy(overwrite: bool = False, max_per_line: int = 119) -> None: ...
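The pairing of get_indent with find_code_in_transformers suggests that object boundaries are found by indentation. Below is a minimal sketch of that approach under one assumption: it searches a source string for a single name instead of resolving a dotted import path. extract_definition is a hypothetical helper, and it ignores decorators.

```python
import re


def get_indent(code):
    """Return the leading whitespace of the first non-empty line of `code`."""
    for line in code.split("\n"):
        if line.strip():
            return re.match(r"^(\s*)", line).group(1)
    return ""


def extract_definition(source, name):
    """Hypothetical helper: return the source of `class name` / `def name` in `source`.

    The block ends at the first non-blank line whose indentation is not deeper
    than the definition line; trailing blank lines are dropped.
    """
    lines = source.split("\n")
    pattern = re.compile(rf"^(\s*)(?:class|def)\s+{re.escape(name)}\b")
    for start, line in enumerate(lines):
        match = pattern.match(line)
        if match is None:
            continue
        indent = match.group(1)
        end = start + 1
        while end < len(lines):
            current = lines[end]
            if current.strip() and len(get_indent(current)) <= len(indent):
                break
            end += 1
        # Trim blank lines that separate this block from the next one.
        while end > start + 1 and not lines[end - 1].strip():
            end -= 1
        return "\n".join(lines[start:end]) + "\n"
    raise ValueError(f"{name} not found")
```

Comparing indentation lengths rather than exact prefixes keeps the sketch simple. It assumes the file indents consistently with spaces, as Black-formatted code does.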
Import / CLI Usage
# Run from the repository root (the directory containing src/transformers)
python utils/check_copies.py
# Auto-fix inconsistencies
python utils/check_copies.py --fix_and_overwrite
# Or via the Makefile
make fix-copies
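The CLI exposes a single boolean switch. Here is a hedged reconstruction of how such a flag is typically declared with argparse. build_parser is a hypothetical name, and the assumption is that the parsed value is forwarded as the overwrite argument of check_copies from the signatures listed earlier.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the script's CLI: one boolean flag."""
    parser = argparse.ArgumentParser(description="Check '# Copied from' consistency.")
    # store_true: the flag defaults to False and becomes True when present.
    parser.add_argument(
        "--fix_and_overwrite",
        action="store_true",
        help="Overwrite inconsistent copies instead of raising an error.",
    )
    return parser
```

With no arguments, args.fix_and_overwrite is False (check mode). Passing --fix_and_overwrite makes it True (fix mode).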
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| `--fix_and_overwrite` | CLI flag | When set, overwrites inconsistent copies instead of raising an error |
| `src/transformers/**/*.py` | Files | All Python files scanned for # Copied from annotations |
| `README.md` | File | Source of the canonical model list in Markdown format |
| `docs/source/index.rst` | File | RST documentation file that should mirror the README model list |
Outputs
| Output | Type | Description |
|---|---|---|
| Exception (check mode) | Exception | Raised with details of all copy inconsistencies found |
| Overwritten files (fix mode) | Files | Python files rewritten to match their originals, and index.rst rewritten to match the README model list |
| Console output | stdout | Messages indicating which files were rewritten |
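The check-mode and fix-mode outputs above can be pictured with a small driver sketch. It assumes a per-file checker that returns [object_name, line] pairs, matching how the usage example below reads is_copy_consistent's result. The names python_files and collect_copy_diffs, and the exact message wording, are hypothetical.

```python
import glob
import os


def python_files(root):
    """All .py files under `root`, recursively (e.g. root="src/transformers")."""
    return sorted(glob.glob(os.path.join(root, "**", "*.py"), recursive=True))


def collect_copy_diffs(filenames, checker, overwrite=False):
    """Hypothetical driver: run `checker(filename, overwrite)` on every file.

    In check mode (overwrite=False), any mismatch raises a single Exception
    listing all of them. In fix mode, the messages are returned instead.
    """
    messages = []
    for filename in filenames:
        for object_name, line in checker(filename, overwrite):
            messages.append(f"- {filename}: copy does not match {object_name} at line {line}")
    if messages and not overwrite:
        raise Exception("Found the following copy inconsistencies:\n" + "\n".join(messages))
    return messages
```

Collecting every mismatch before raising means one CI run reports all drift at once, rather than stopping at the first file.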
Usage Examples
# Check for copy consistency (CI mode, raises on failure)
python utils/check_copies.py
# Auto-fix all copy inconsistencies
python utils/check_copies.py --fix_and_overwrite
# Programmatic usage (run from the repository root so the relative paths resolve)
import sys
sys.path.insert(0, "utils")
from check_copies import is_copy_consistent

diffs = is_copy_consistent("src/transformers/models/bert/modeling_bert.py")
for object_name, line in diffs:
    print(f"Mismatch: {object_name} at line {line}")