Implementation:Mlflow Mlflow Clint Symbol Index
| Knowledge Sources | |
|---|---|
| Domains | Static Analysis, Code Linting |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Symbol indexing module for the Clint custom linter that builds and maintains an index of all MLflow Python functions and classes, enabling cross-module symbol resolution through import chain traversal.
Description
This module provides efficient indexing and lookup of Python symbols (functions, classes) across the entire MLflow codebase using AST parsing and parallel processing. It is a core component of the Clint custom linter, enabling lint rules that require cross-module knowledge of function signatures.
Key Components:
FunctionInfo is a lightweight dataclass that stores function signature information:
- has_vararg - Whether the function accepts *args
- has_kwarg - Whether the function accepts **kwargs
- args - List of regular argument names
- kwonlyargs - List of keyword-only argument names
- posonlyargs - List of positional-only argument names
- from_func_def() - Class method that constructs a FunctionInfo from an AST FunctionDef or AsyncFunctionDef node, with an option to skip the self parameter for methods
- all_args - Property that returns all argument names combined
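The fields above map closely onto the attributes of an `ast.arguments` node. The following is a minimal sketch of how such a record could be built, not the module's actual code: `SignatureSketch` is a hypothetical stand-in for FunctionInfo, and both the `skip_self` handling and the ordering used by `all_args` are assumptions.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class SignatureSketch:
    """Hypothetical stand-in for FunctionInfo, for illustration only."""

    has_vararg: bool
    has_kwarg: bool
    args: list[str] = field(default_factory=list)
    kwonlyargs: list[str] = field(default_factory=list)
    posonlyargs: list[str] = field(default_factory=list)

    @classmethod
    def from_func_def(cls, node, skip_self=False):
        args = [a.arg for a in node.args.args]
        posonly = [a.arg for a in node.args.posonlyargs]
        if skip_self:
            # Assumption: drop the first positional parameter (self or cls).
            if posonly:
                posonly = posonly[1:]
            elif args:
                args = args[1:]
        return cls(
            has_vararg=node.args.vararg is not None,
            has_kwarg=node.args.kwarg is not None,
            args=args,
            kwonlyargs=[a.arg for a in node.args.kwonlyargs],
            posonlyargs=posonly,
        )

    @property
    def all_args(self):
        # Assumed ordering: positional-only, regular, then keyword-only.
        return self.posonlyargs + self.args + self.kwonlyargs


source = "def log(key, value, /, step=None, *, synchronous=False, **extra): ..."
func_node = ast.parse(source).body[0]
sig = SignatureSketch.from_func_def(func_node)
```

Here `sig.posonlyargs` holds `key` and `value`, `sig.args` holds `step`, and `sig.has_kwarg` is true because of `**extra`. The bare `*` separator does not count as a vararg.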
ModuleSymbolExtractor is an AST NodeVisitor that extracts two kinds of information from a Python module:
- import_mapping - Maps re-exported names to their original fully-qualified names (e.g., mlflow.log_metric -> mlflow.tracking.fluent.log_metric)
- func_mapping - Maps fully-qualified function/class names to their FunctionInfo signatures
- Class handling: for each class, the extractor records the __init__ signature together with any @classmethod or @staticmethod methods. A class without __init__ is recorded as accepting *args and **kwargs.
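The two mappings can be illustrated with a small `ast.NodeVisitor`. This is a simplified sketch under stated assumptions, not the real ModuleSymbolExtractor: `ExtractorSketch` is a hypothetical name, it stores plain parameter-name lists instead of FunctionInfo objects, it handles only absolute `from ... import ...` statements, and it ignores @classmethod and @staticmethod methods.

```python
import ast


class ExtractorSketch(ast.NodeVisitor):
    """Hypothetical, simplified extractor; records parameter names only."""

    def __init__(self, mod: str) -> None:
        self.mod = mod
        self.import_mapping: dict[str, str] = {}
        self.func_mapping: dict[str, list[str]] = {}

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        # Only absolute imports are handled in this sketch.
        if node.module is None or node.level != 0:
            return
        for alias in node.names:
            local = alias.asname or alias.name
            self.import_mapping[f"{self.mod}.{local}"] = f"{node.module}.{alias.name}"

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # No generic_visit() call, so nested functions are not indexed.
        self.func_mapping[f"{self.mod}.{node.name}"] = [a.arg for a in node.args.args]

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        name = f"{self.mod}.{node.name}"
        for stmt in node.body:
            if isinstance(stmt, ast.FunctionDef) and stmt.name == "__init__":
                # Drop `self` so the entry describes how the class is called.
                self.func_mapping[name] = [a.arg for a in stmt.args.args][1:]
                return
        # No __init__ found: use a permissive placeholder signature.
        self.func_mapping[name] = ["*args", "**kwargs"]


source = '''
from mlflow.tracking.fluent import log_metric

def helper(a, b):
    pass

class Client:
    def __init__(self, uri):
        self.uri = uri

class Empty:
    pass
'''
extractor = ExtractorSketch("mlflow")
extractor.visit(ast.parse(source))
```

After the visit, `extractor.import_mapping` links `mlflow.log_metric` to `mlflow.tracking.fluent.log_metric`, and `extractor.func_mapping` has entries for `mlflow.helper`, `mlflow.Client`, and `mlflow.Empty`.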
extract_symbols_from_file() is a standalone function that parses a single file and returns its import and function mappings. It converts file paths to module names (e.g., mlflow/tracking/fluent.py -> mlflow.tracking.fluent).
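The path-to-module conversion can be sketched as follows. `module_name_sketch` is a hypothetical helper, and collapsing `__init__.py` to its package name is an assumption rather than something the description above confirms.

```python
from pathlib import PurePosixPath


def module_name_sketch(rel_path: str) -> str:
    """Turn a repository-relative path into a dotted module name."""
    parts = list(PurePosixPath(rel_path).with_suffix("").parts)
    # Assumption: a package's __init__.py maps to the package itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


fluent_module = module_name_sketch("mlflow/tracking/fluent.py")
package_module = module_name_sketch("mlflow/__init__.py")
```

With these inputs, `fluent_module` is `mlflow.tracking.fluent` and `package_module` is `mlflow`.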
SymbolIndex is the main index class. It provides:
- build() - Constructs the index by listing the files matching mlflow/*.py via git ls-files, then parsing them in parallel with ProcessPoolExecutor
- resolve() - Resolves a fully-qualified symbol name to its FunctionInfo by first checking direct function mappings, then following import chains with circular import detection
- save() / load() - Pickle serialization for efficient sharing between worker processes
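The lookup order described for resolve() (direct function mapping first, then the import chain, with a guard against circular imports) can be sketched as below. `resolve_sketch` is a hypothetical free function that only follows exact-name matches; the real method may handle more cases.

```python
def resolve_sketch(target, import_mapping, func_mapping):
    """Follow re-export links until a known function is found.

    Returns None when the name is unknown or the chain loops back on itself.
    """
    seen = set()
    current = target
    while current not in seen:
        seen.add(current)
        if current in func_mapping:
            return func_mapping[current]
        if current not in import_mapping:
            return None
        current = import_mapping[current]
    # `current` was already visited, so this is a circular import chain.
    return None


# Illustrative data only; the values stand in for FunctionInfo objects.
imports = {
    "mlflow.log_metric": "mlflow.tracking.fluent.log_metric",
    "pkg.a.f": "pkg.b.f",
    "pkg.b.f": "pkg.a.f",
}
functions = {"mlflow.tracking.fluent.log_metric": "signature-of-log_metric"}

found = resolve_sketch("mlflow.log_metric", imports, functions)
cyclic = resolve_sketch("pkg.a.f", imports, functions)
missing = resolve_sketch("mlflow.does_not_exist", imports, functions)
```

Tracking visited names in a set keeps the loop finite even when two modules re-export the same symbol from each other, as in the `pkg.a.f` and `pkg.b.f` entries.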
Usage
The SymbolIndex is used by Clint lint rules such as unknown-mlflow-function and unknown-mlflow-arguments that need to verify whether a function exists in the MLflow API and whether the correct arguments are being passed. It is built once and shared across all rule checks.
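As a rough illustration of the argument check, a rule could compare the keyword names in a call against a resolved signature. The rule implementations are not shown here, so this is a sketch under assumptions: `unknown_keywords` is a hypothetical helper, and the parameter list passed to it is illustrative rather than the actual signature of `mlflow.log_metric`.

```python
import ast


def unknown_keywords(call: ast.Call, allowed: list[str], has_kwarg: bool) -> list[str]:
    """Return keyword names used in `call` that the signature does not accept."""
    if has_kwarg:
        # A **kwargs parameter accepts any keyword name.
        return []
    return [
        kw.arg
        for kw in call.keywords
        # kw.arg is None for `**mapping` unpacking, which cannot be checked.
        if kw.arg is not None and kw.arg not in allowed
    ]


call = ast.parse("mlflow.log_metric(key='loss', value=0.1, stp=3)", mode="eval").body
unknown = unknown_keywords(call, ["key", "value", "step"], has_kwarg=False)
```

In this example the misspelled `stp` keyword is the only name reported.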
Code Reference
Source Location
- Repository: Mlflow_Mlflow
- File: dev/clint/src/clint/index.py
- Lines: 1-221
Signature
@dataclass
class FunctionInfo:
    has_vararg: bool
    has_kwarg: bool
    args: list[str] = field(default_factory=list)
    kwonlyargs: list[str] = field(default_factory=list)
    posonlyargs: list[str] = field(default_factory=list)

    @classmethod
    def from_func_def(
        cls, node: ast.FunctionDef | ast.AsyncFunctionDef, skip_self: bool = False
    ) -> Self: ...

    @property
    def all_args(self) -> list[str]: ...


class ModuleSymbolExtractor(ast.NodeVisitor):
    def __init__(self, mod: str) -> None: ...
    def visit_Import(self, node: ast.Import) -> None: ...
    def visit_ImportFrom(self, node: ast.ImportFrom) -> None: ...
    def visit_FunctionDef(self, node: ast.FunctionDef) -> None: ...
    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None: ...
    def visit_ClassDef(self, node: ast.ClassDef) -> None: ...


def extract_symbols_from_file(
    rel_path: str, content: str
) -> tuple[dict[str, str], dict[str, FunctionInfo]] | None: ...


class SymbolIndex:
    def __init__(
        self, import_mapping: dict[str, str], func_mapping: dict[str, FunctionInfo]
    ) -> None: ...

    def save(self, path: Path) -> None: ...

    @classmethod
    def load(cls, path: Path) -> Self: ...

    @classmethod
    def build(cls) -> Self: ...

    def resolve(self, target: str) -> FunctionInfo | None: ...
Import
from clint.index import SymbolIndex, FunctionInfo
from clint.index import ModuleSymbolExtractor, extract_symbols_from_file
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| rel_path | str | Yes | Relative file path from repository root (e.g., "mlflow/tracking/fluent.py") |
| content | str | Yes | Python source code content of the file to parse |
| target | str | Yes | Fully-qualified symbol name to resolve (e.g., "mlflow.log_metric") |
| path | Path | Yes | File path for saving/loading the pickled index |
Outputs
| Name | Type | Description |
|---|---|---|
| FunctionInfo | dataclass | Lightweight function signature with argument lists and vararg/kwarg flags |
| SymbolIndex | class | Complete index of all MLflow symbols with resolution capabilities |
| import_mapping | dict[str, str] | Mapping from re-exported names to their original fully-qualified module paths |
| func_mapping | dict[str, FunctionInfo] | Mapping from fully-qualified function names to their signature information |
Usage Examples
Building and Using the Symbol Index
from clint.index import SymbolIndex
# Build an index of all MLflow symbols
index = SymbolIndex.build()
# Resolve a function's signature
func_info = index.resolve("mlflow.log_metric")
if func_info:
    print(f"Arguments: {func_info.args}")
    print(f"Has **kwargs: {func_info.has_kwarg}")
Saving and Loading the Index
from pathlib import Path
from clint.index import SymbolIndex
# Build and save for later use
index = SymbolIndex.build()
index.save(Path("/tmp/mlflow_symbol_index.pkl"))
# Load from cache
cached_index = SymbolIndex.load(Path("/tmp/mlflow_symbol_index.pkl"))
Extracting Symbols from a Single File
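from pathlib import Path

from clint.index import extract_symbols_from_file

# Run from the repository root so the relative path resolves
rel_path = "mlflow/tracking/fluent.py"
source_code = Path(rel_path).read_text()
result = extract_symbols_from_file(rel_path, source_code)
# The return type allows None, so check before unpacking
if result:
    imports, functions = result
    for name, info in functions.items():
        print(f"{name}: args={info.args}")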