Heuristic:Mbzuai oryx Awesome LLM Post training Excel Sheet Name Truncation
| Knowledge Sources | |
|---|---|
| Domains | Data_Export, Debugging |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Defensive truncation of Excel sheet names to 31 characters to comply with the Excel format specification and prevent export failures.
Description
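Excel imposes a hard limit of 31 characters on worksheet names in .xlsx files. When research trend data is exported to Excel with one sheet per keyword, any keyword longer than 31 characters yields an invalid sheet name. Depending on the writer engine and its version, the export then either fails outright (XlsxWriter, for example, raises `InvalidWorksheetName`; openpyxl rejects or warns about over-long titles depending on version) or produces a file that some applications cannot open. The code applies `keyword[:31]` truncation to avoid this preemptively.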
Usage
Apply this heuristic whenever creating Excel files with dynamically generated sheet names. Research keywords, category names, or other user-generated strings may exceed the 31-character Excel limit.
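A minimal sketch of the rule as a reusable helper. The names `to_sheet_name` and `EXCEL_SHEET_NAME_MAX` are illustrative assumptions and do not appear in the original script, which slices inline.

```python
EXCEL_SHEET_NAME_MAX = 31  # Excel's limit on worksheet name length


def to_sheet_name(label: str, limit: int = EXCEL_SHEET_NAME_MAX) -> str:
    """Return `label` cut to at most `limit` characters (hypothetical helper)."""
    # Slicing never raises: labels shorter than the limit come back unchanged.
    return label[:limit]


# Hypothetical labels, for illustration only.
short_name = to_sheet_name("RLHF")
long_name = to_sheet_name("Parameter-Efficient Fine-Tuning of Large Language Models")
print(short_name, len(short_name))
print(long_name, len(long_name))
```

Keeping the limit in a named constant makes the intent visible at every call site instead of relying on a bare `31`.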
The Insight (Rule of Thumb)
- Action: Truncate Excel sheet names to 31 characters using `string[:31]` before passing them as `sheet_name` to `df.to_excel()`.
- Value: Maximum 31 characters per sheet name (Excel specification hard limit).
- Trade-off: Long keywords are silently truncated, which can cause name collisions if two keywords share the same first 31 characters. No deduplication logic is present in the current implementation.
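Because the truncation is silent, it can help to check for collisions before exporting. The sketch below is one possible check, not part of the current implementation; the function name `find_truncation_collisions` and the sample keywords are assumptions made for illustration.

```python
from collections import defaultdict


def find_truncation_collisions(keywords, limit=31):
    """Map each truncated name to the keywords that collapse onto it.

    Only names produced by two or more input keywords are returned.
    """
    groups = defaultdict(list)
    for keyword in keywords:
        groups[keyword[:limit]].append(keyword)
    return {name: members for name, members in groups.items() if len(members) > 1}


# Hypothetical keywords, for illustration only.
keywords = [
    "Mixture of Experts Routing Strategies for Training",
    "Mixture of Experts Routing Strategies for Inference",
    "Reward Modeling",
]
collisions = find_truncation_collisions(keywords)
for name, members in collisions.items():
    print(f"{name!r} is shared by {len(members)} keywords")
```

Running a check like this before the export starts surfaces the problem early, rather than after the data has been collected.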
Reasoning
The Excel file format (OOXML/.xlsx) has a firm 31-character limit on worksheet names, inherited from the original Excel binary format. Violating this limit can cause the writer library to raise an error, crashing the export after potentially hours of data collection. Truncation is a simple defensive measure that prevents this failure mode.
Potential issue: Two keywords like "Reinforcement Learning from Human Feedback in Large Language Models" and "Reinforcement Learning from Human Feedback for Code Generation" would both truncate to "Reinforcement Learning from Hum", causing a sheet name collision. The current code does not handle this edge case.
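One way to handle this edge case is to append a short numeric suffix when a truncated name is already taken. The following is a sketch under that assumption; `unique_sheet_names` and the `_2`, `_3`, ... suffix scheme are hypothetical and are not what the current script does.

```python
def unique_sheet_names(keywords, limit=31):
    """Assign each keyword a distinct sheet name of at most `limit` characters.

    The first keyword to claim a name keeps the plain truncation; later
    keywords get a "_2", "_3", ... suffix, with the base shortened to fit.
    """
    used = set()
    result = {}
    for keyword in keywords:
        if keyword in result:
            continue  # identical keyword already has a name
        candidate = keyword[:limit]
        counter = 2
        # Excel compares sheet names case-insensitively, so track lowercase.
        while candidate.lower() in used:
            suffix = f"_{counter}"
            candidate = keyword[: limit - len(suffix)] + suffix
            counter += 1
        used.add(candidate.lower())
        result[keyword] = candidate
    return result


k1 = "Reinforcement Learning from Human Feedback in Large Language Models"
k2 = "Reinforcement Learning from Human Feedback for Code Generation"
names = unique_sheet_names([k1, k2])
for keyword, sheet_name in names.items():
    print(f"{sheet_name!r} <- {keyword!r}")
```

The suffix costs a few characters of the keyword, so the resulting names stay within the limit while remaining distinguishable.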
Code evidence from `scripts/future_research_data.py:97-98`:
```python
# Excel sheet names have a maximum of 31 characters
sheet_name = keyword[:31]
```