Implementation:Microsoft LoRA Style Doc
Overview
The style_doc.py utility enforces consistent line-length formatting and styling rules for RST documentation files and Python docstrings in the Transformers project.
Description
This script provides automated formatting for two types of documentation:
- RST Files: The CodeStyler class processes RST files, wrapping text paragraphs to a configurable maximum line length (default 119 characters). It respects special RST constructs:
  - Code blocks (::) are left untouched.
  - Directive blocks (.. something::) are not restyled internally.
  - Textual blocks (.. note::, .. warning::) have their content restyled with proper indentation.
  - Lists (bullet, numbered) are re-wrapped while preserving list structure.
  - Tables are detected and left as-is.
  - Title/section underlines are extended to max_len.
- Python Docstrings: The DocstringStyler subclass extends CodeStyler with additional awareness of:
  - Argument definition blocks (Args:, Parameters:, Attributes:) where parameter description lines are preserved while sub-descriptions are wrapped.
  - Return/Raises sections treated as comment-style blocks.
  - Example blocks (::) marked as no-style zones.
  - Special docstring words (Args, Returns, Examples, etc.) get blank lines inserted before them.
The SpecialBlock enum tracks three states: NOT_SPECIAL, NO_STYLE, and ARG_LIST, enabling the styler to switch formatting modes as it traverses document structure.
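The mode switching described above can be illustrated with a much-reduced sketch. It is not the script's code: the function name wrap_outside_code_blocks is hypothetical, wrapping is delegated to the standard textwrap module, each line is wrapped on its own rather than as part of a paragraph, and only the NOT_SPECIAL and NO_STYLE states are exercised.

```python
import textwrap
from enum import Enum


class SpecialBlock(Enum):
    NOT_SPECIAL = 0
    NO_STYLE = 1
    ARG_LIST = 2  # unused in this reduced sketch


def wrap_outside_code_blocks(text: str, max_len: int = 119) -> str:
    """Hypothetical helper: wrap ordinary lines, copy `::` blocks verbatim."""
    state = SpecialBlock.NOT_SPECIAL
    out = []
    for line in text.split("\n"):
        indented = line.startswith((" ", "\t"))
        # A non-empty, unindented line ends the literal block.
        if state is SpecialBlock.NO_STYLE and line.strip() and not indented:
            state = SpecialBlock.NOT_SPECIAL
        if state is SpecialBlock.NO_STYLE or not line.strip():
            out.append(line)  # copied untouched
        else:
            out.extend(textwrap.wrap(line, width=max_len))
        # A line ending in `::` opens a literal block on the following lines.
        if line.rstrip().endswith("::"):
            state = SpecialBlock.NO_STYLE
    return "\n".join(out)
```

Keeping the current mode in a single enum value means each line needs only one check to decide between wrapping and copying.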
Usage
Use this utility when:
- Enforcing documentation style in CI (check-only mode).
- Auto-formatting RST docs and Python docstrings to meet the 119-character line length convention.
- Preparing documentation for Sphinx builds by ensuring consistent formatting.
Code Reference
Source Location
examples/NLU/utils/style_doc.py (523 lines)
Signature
# Core classes
class SpecialBlock(Enum):
    NOT_SPECIAL = 0
    NO_STYLE = 1
    ARG_LIST = 2

class CodeStyler:
    def style(self, text: str, max_len: int = 119, min_indent: str = None) -> str: ...
    def style_paragraph(self, paragraph: list, max_len: int, no_style: bool = False, min_indent: str = None) -> str: ...

class DocstringStyler(CodeStyler): ...

# Public API
def style_rst_file(doc_file: str, max_len: int = 119, check_only: bool = False) -> bool: ...
def style_docstring(docstring: str, max_len: int = 119) -> str: ...
def style_file_docstrings(code_file: str, max_len: int = 119, check_only: bool = False) -> bool: ...
def style_doc_files(*files, max_len: int = 119, check_only: bool = False) -> list: ...
def main(*files, max_len: int = 119, check_only: bool = False) -> None: ...

# Helpers
def split_text_in_lines(text: str, max_len: int, prefix: str = "", min_indent: str = None) -> str: ...
def get_indent(line: str) -> str: ...
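The two helper signatures suggest a greedy word wrapper that keeps continuation lines aligned under a prefix such as a list bullet. The following is a minimal sketch under that assumption, not the script's actual bodies: the regular expression, the greedy strategy, and the omission of min_indent handling are all illustrative choices.

```python
import re

# Assumption: indentation is the run of whitespace before the first
# non-whitespace character on the line.
_re_indent = re.compile(r"^(\s*)\S")


def get_indent(line: str) -> str:
    """Return the leading whitespace of `line` ("" for blank lines)."""
    match = _re_indent.search(line)
    return match.group(1) if match is not None else ""


def split_text_in_lines(text: str, max_len: int, prefix: str = "") -> str:
    """Greedily wrap `text` to `max_len`; `min_indent` is left out of this sketch."""
    text = re.sub(r"\s+", " ", text.strip())
    indent = " " * len(prefix)  # continuation lines align under the prefix
    words = text.split(" ")
    lines = []
    current = f"{prefix}{words[0]}"
    for word in words[1:]:
        candidate = f"{current} {word}"
        if len(candidate) > max_len:
            lines.append(current)
            current = f"{indent}{word}"
        else:
            current = candidate
    lines.append(current)
    return "\n".join(lines)
```

A word longer than max_len is placed on its own line rather than broken, which is the usual behavior for documentation wrappers because URLs and identifiers must stay intact.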
Import / CLI Usage
# Style specific files
python utils/style_doc.py docs/source/model_doc/bert.rst

# Style a directory recursively
python utils/style_doc.py docs/source/

# Check-only mode (for CI)
python utils/style_doc.py --check_only docs/source/

# Custom max line length
python utils/style_doc.py --max_len 100 docs/source/model_doc/bert.rst
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| files | positional args | One or more file paths or directory paths to process |
| --max_len | int (optional) | Maximum line length; defaults to 119 |
| --check_only | flag | If set, raise an error on needed changes instead of applying them |
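The inputs above map naturally onto an argparse parser. The sketch below shows one way such a command line could be declared; the function name build_parser and the help strings are assumptions, and only the argument names, types, and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented inputs."""
    parser = argparse.ArgumentParser(description="Style RST files and Python docstrings.")
    # One or more files or directories, as positional arguments.
    parser.add_argument("files", nargs="+", help="Files or directories to process.")
    parser.add_argument("--max_len", type=int, default=119, help="Maximum line length.")
    parser.add_argument(
        "--check_only",
        action="store_true",
        help="Report files that need restyling instead of rewriting them.",
    )
    return parser


# Parsing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--check_only", "docs/source/"])
```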
Outputs
| Output | Type | Description |
|---|---|---|
| Restyled files | Files | RST and Python files overwritten with formatted content (when not in check-only mode) |
| ValueError | Exception | Raised in check-only mode if files need restyling |
| Console output | stdout | Reports which files were cleaned or how many need restyling |
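The difference between the two modes can be sketched as follows. This is an illustration of the contract in the table, not the script's implementation: restyle_files and its styler callback are hypothetical names, and the error message wording is invented.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def restyle_files(
    paths: Iterable[str],
    styler: Callable[[str], str],
    check_only: bool = False,
) -> List[str]:
    """Hypothetical driver: rewrite changed files, or only report them."""
    changed = []
    for path in paths:
        file_path = Path(path)
        original = file_path.read_text(encoding="utf-8")
        styled = styler(original)
        if styled != original:
            changed.append(str(file_path))
            if not check_only:
                # Normal mode: overwrite the file with the styled text.
                file_path.write_text(styled, encoding="utf-8")
    if check_only and changed:
        # Check-only mode: leave files untouched and fail loudly for CI.
        raise ValueError(f"{len(changed)} file(s) should be restyled.")
    return changed
```

Raising an exception in check-only mode gives the process a non-zero exit status, which is what lets a CI job fail when files are not formatted.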
Usage Examples
# Auto-format all documentation
python utils/style_doc.py docs/source/ src/transformers/

# Check documentation style in CI
python utils/style_doc.py --check_only docs/source/ src/transformers/
# Raises ValueError if files need formatting

# Programmatic usage for a single docstring
# (the directory containing style_doc.py must be on the import path)
from style_doc import style_docstring

raw = """
    Args:
        input_ids (torch.LongTensor): Indices of input sequence tokens in the vocabulary. These are very long descriptions that should be wrapped.
    Returns:
        torch.FloatTensor: The model output.
"""
styled = style_docstring(raw, max_len=119)