Implementation:EvolvingLMMs Lab Lmms eval IFEval Instructions
| Knowledge Sources | |
|---|---|
| Domains | Natural_Language_Processing, Model_Evaluation, Instruction_Following |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A library of instruction checker classes for evaluating instruction-following capabilities in language models.
Description
This module provides about 30 instruction checker classes that validate whether a model response satisfies specific formatting and content constraints. Each checker inherits from the base Instruction class and implements methods to build an instruction description, expose the instruction's arguments, and verify that a response complies. The checkers cover a wide range of constraints: response language, sentence and word counts, formatting rules (bullets, sections, paragraphs), keyword presence or absence, content structure, and stylistic requirements.
The implementation is based on Google Research's IFEval (Instruction Following Evaluation) framework and supports verifiable instruction-following testing through programmatic validation.
Usage
Use this module when evaluating a model's ability to follow explicit instructions about response format, structure, content, or style. The checkers enable automated testing of instruction-following capabilities through rule-based, programmatic validation.
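IFEval-style evaluations typically reduce per-instruction booleans to two scores. Prompt-level accuracy counts a response only if it follows every instruction attached to its prompt. Instruction-level accuracy is the fraction of all individual instructions followed. The following is a minimal sketch, assuming you already hold one list of check_following() results per prompt. `ifeval_accuracies` is a hypothetical helper, not an lmms_eval function.

```python
from typing import List, Tuple


def ifeval_accuracies(results: List[List[bool]]) -> Tuple[float, float]:
    """Hypothetical helper: aggregate check_following() results.

    `results` holds one inner list per prompt, with one boolean per
    instruction attached to that prompt.
    """
    if not results:
        raise ValueError("need results for at least one prompt")
    # Prompt level: a prompt passes only if all of its instructions pass.
    prompt_level = sum(all(r) for r in results) / len(results)
    # Instruction level: pool every instruction across all prompts.
    flat = [ok for r in results for ok in r]
    inst_level = sum(flat) / len(flat) if flat else 0.0
    return prompt_level, inst_level


# Three prompts carrying 2, 1 and 3 instructions respectively.
prompt_acc, inst_acc = ifeval_accuracies([[True, True], [False], [True, False, True]])
print(f"prompt-level={prompt_acc:.3f} instruction-level={inst_acc:.3f}")
```

In this example, only the first prompt passes all of its instructions, so the prompt-level score is 1/3. Four of the six individual instructions pass, so the instruction-level score is 4/6. Because the two scores can diverge this much, they are usually reported side by side.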
Code Reference
Source Location
- Repository: EvolvingLMMs_Lab_Lmms_eval
- File: lmms_eval/tasks/ifeval/instructions.py
Signature
```python
class Instruction:
    def __init__(self, instruction_id):
        self.id = instruction_id

    def build_description(self, **kwargs):
        raise NotImplementedError("`build_description` not implemented.")

    def get_instruction_args(self):
        raise NotImplementedError("`get_instruction_args` not implemented.")

    def get_instruction_args_keys(self):
        raise NotImplementedError("`get_instruction_args_keys` not implemented.")

    def check_following(self, value):
        raise NotImplementedError("`check_following` not implemented.")


# Example checker classes (method bodies elided):
class ResponseLanguageChecker(Instruction):
    def build_description(self, *, language=None): ...
    def check_following(self, value): ...


class NumberOfSentences(Instruction):
    def build_description(self, *, num_sentences=None, relation=None): ...
    def check_following(self, value): ...


class KeywordChecker(Instruction):
    def build_description(self, *, keywords=None): ...
    def check_following(self, value): ...
```
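New constraints plug in by subclassing Instruction and implementing all four methods. The sketch below restates the base class from the signature above so that it runs on its own. It then adds a hypothetical `MaxCharactersChecker`. Neither that class nor its `max_chars` argument exists in lmms_eval, and the default of 280 characters is an arbitrary assumption.

```python
from typing import Any, Dict, List, Optional


class Instruction:
    """Stand-in copy of the base class shown in the signature above."""

    def __init__(self, instruction_id: str):
        self.id = instruction_id

    def build_description(self, **kwargs):
        raise NotImplementedError("`build_description` not implemented.")

    def get_instruction_args(self):
        raise NotImplementedError("`get_instruction_args` not implemented.")

    def get_instruction_args_keys(self):
        raise NotImplementedError("`get_instruction_args_keys` not implemented.")

    def check_following(self, value):
        raise NotImplementedError("`check_following` not implemented.")


class MaxCharactersChecker(Instruction):
    """Hypothetical checker: the response must be at most `max_chars` characters."""

    def build_description(self, *, max_chars: Optional[int] = None) -> str:
        # Fall back to an arbitrary default (assumption) when no argument is given.
        self._max_chars = 280 if max_chars is None else max_chars
        return f"Answer in at most {self._max_chars} characters."

    def get_instruction_args(self) -> Dict[str, Any]:
        # Keys match build_description()'s keyword arguments.
        return {"max_chars": self._max_chars}

    def get_instruction_args_keys(self) -> List[str]:
        return ["max_chars"]

    def check_following(self, value: str) -> bool:
        return len(value.strip()) <= self._max_chars


checker = MaxCharactersChecker("length:max_characters")
print(checker.build_description(max_chars=10))
print(checker.check_following("Short."), checker.check_following("Far too long here."))
```

In this sketch, get_instruction_args() returns keys that match build_description()'s keyword arguments. As a result, a stored argument dictionary can be fed back into build_description() to rebuild an equivalent checker.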
Import
```python
from lmms_eval.tasks.ifeval.instructions import (
    Instruction,
    ResponseLanguageChecker,
    NumberOfSentences,
    PlaceholderChecker,
    BulletListChecker,
    KeywordChecker,
    NumberOfWords,
    JsonFormat,
    # ... and 20+ other checker classes
)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| instruction_id | str | Yes | Identifier for the instruction instance, passed to the constructor |
| **kwargs | dict | Varies | Instruction-specific keyword arguments passed to build_description() (e.g., num_sentences, keywords, language) |
| value | str | Yes | Response text to validate, passed to check_following() |
Outputs
| Name | Type | Description |
|---|---|---|
| description | str | Human-readable instruction text returned by build_description() |
| instruction_args | dict | Dictionary of instruction parameters returned by get_instruction_args() |
| following | bool | True if response follows the instruction, False otherwise (from check_following()) |
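The inputs in the tables above are consumed in a fixed order. First, a checker is constructed with an instruction_id. Next, it is configured with build_description(**kwargs). Finally, the response is passed as value to check_following(). The minimal sketch below shows that flow. `StubKeywordChecker` and the `REGISTRY` dict are hypothetical stand-ins for the real checker classes and their ID lookup. Treating a blank response as a failure is a choice made for this sketch.

```python
import re
from typing import Dict, List


class StubKeywordChecker:
    """Hypothetical stand-in following the checker protocol described above."""

    def __init__(self, instruction_id: str):
        self.id = instruction_id

    def build_description(self, *, keywords=None) -> str:
        self._keywords = sorted(keywords or [])
        return f"Include keywords {self._keywords} in the response."

    def get_instruction_args(self) -> Dict[str, List[str]]:
        return {"keywords": self._keywords}

    def check_following(self, value: str) -> bool:
        return all(
            re.search(re.escape(k), value, flags=re.IGNORECASE) for k in self._keywords
        )


# Hypothetical registry mapping example instruction IDs to checker classes.
REGISTRY = {"keywords:existence": StubKeywordChecker}


def evaluate(response: str, instruction_ids: List[str], kwargs_list: List[dict]) -> List[bool]:
    """Return one check_following() result per instruction."""
    results = []
    for instruction_id, kwargs in zip(instruction_ids, kwargs_list):
        checker = REGISTRY[instruction_id](instruction_id)  # 1. construct with an ID
        checker.build_description(**kwargs)  # 2. configure via kwargs
        # 3. verify; this sketch treats a blank response as a failure
        results.append(bool(response.strip()) and checker.check_following(response))
    return results


ids = ["keywords:existence"]
kwargs = [{"keywords": ["python", "programming"]}]
print(evaluate("I love Python programming.", ids, kwargs))
print(evaluate("   ", ids, kwargs))
```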
Available Checker Classes
Structural Checkers
- NumberOfSentences - Validates sentence count against a threshold using a relation ("less than" or "at least")
- NumberOfWords - Validates word count against a threshold using a relation ("less than" or "at least")
- PlaceholderChecker - Validates a minimum number of square-bracket placeholders, such as [address]
- BulletListChecker - Validates the exact number of markdown bullet list items
- SectionChecker - Validates the number of sections, each marked with a splitter keyword (e.g., SECTION 1)
- ParagraphChecker - Validates paragraph count with *** dividers
- ParagraphFirstWordCheck - Validates paragraph count and first word of specific paragraph
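Most structural checkers reduce to counting a feature of the response and comparing the count with a threshold. The reference IFEval code expresses that comparison with the relations "less than" and "at least". The sketch below shows the pattern with simplified counters. The function names, the regex word tokenizer, and the choice to ignore empty paragraphs are all assumptions. The lmms_eval implementation may tokenize words and handle edge cases differently.

```python
import re


def compare(count: int, threshold: int, relation: str) -> bool:
    """Apply an IFEval-style relation to a measured count."""
    if relation == "less than":
        return count < threshold
    if relation == "at least":
        return count >= threshold
    raise ValueError(f"unsupported relation: {relation!r}")


def count_words(text: str) -> int:
    # Simplified tokenizer (assumption): runs of word characters.
    return len(re.findall(r"\w+", text))


def count_bullets(text: str) -> int:
    # Lines starting with "*" (but not "**") or with "-" count as bullet items.
    star = re.findall(r"^\s*\*[^\*].*$", text, flags=re.MULTILINE)
    dash = re.findall(r"^\s*-.*$", text, flags=re.MULTILINE)
    return len(star) + len(dash)


def count_paragraphs(text: str) -> int:
    # Paragraphs are separated by a "***" divider; empty chunks are ignored here.
    return len([p for p in re.split(r"\s?\*\*\*\s?", text) if p.strip()])


print(compare(count_words("one two three"), 5, "less than"))
print(count_bullets("* First\n* Second\n- Third\n**Not a bullet**"))
print(count_paragraphs("First.\n***\nSecond.\n***\nThird."))
```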
Content Checkers
- KeywordChecker - Validates presence of required keywords
- KeywordFrequencyChecker - Validates keyword occurrence frequency
- ForbiddenWords - Validates absence of forbidden words
- KeySentenceChecker - Validates presence of specific sentences
- RephraseChecker - Validates rephrasing with specific change patterns
- RephraseParagraph - Validates rephrasing with word overlap constraints
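The keyword-oriented content checks come down to regular-expression searches. The sketch below makes three assumptions: matching is case-insensitive throughout, keyword presence and frequency use substring matching, and forbidden words use whole-word matching. These matching rules and the helper names are assumptions for illustration and are not copied from the lmms_eval code.

```python
import re
from typing import Iterable


def has_keywords(text: str, keywords: Iterable[str]) -> bool:
    # Every keyword must appear somewhere, ignoring case.
    return all(re.search(re.escape(k), text, flags=re.IGNORECASE) for k in keywords)


def keyword_frequency(text: str, keyword: str) -> int:
    # Case-insensitive occurrences, including matches inside longer words.
    return len(re.findall(re.escape(keyword), text, flags=re.IGNORECASE))


def avoids_words(text: str, forbidden: Iterable[str]) -> bool:
    # Whole-word matching: forbidding "cat" does not reject "category".
    return not any(
        re.search(r"\b" + re.escape(w) + r"\b", text, flags=re.IGNORECASE)
        for w in forbidden
    )


print(has_keywords("I love Python programming.", ["python", "programming"]))
print(keyword_frequency("Data beats opinions; data wins.", "data"))
print(avoids_words("A category of its own.", ["cat"]), avoids_words("The cat sat.", ["cat"]))
```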
Format Checkers
- JsonFormat - Validates JSON format with optional markdown ticks
- TitleChecker - Validates title wrapped in <<double angular brackets>>
- QuotationChecker - Validates entire response wrapped in double quotes
- HighlightSectionChecker - Validates markdown highlights with *asterisks*
- PostscriptChecker - Validates postscript section with P.S. or P.P.S marker
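The format checks are mostly string and parsing tests. The minimal sketch below assumes simplified rules. It strips an optional markdown code fence before calling json.loads, looks for a non-empty <<title>> on one line, and requires a double quote at both ends of the stripped response. The function names and the `FENCE` constant are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence marker


def is_json(text: str) -> bool:
    # Remove an optional opening fence (with or without a "json" tag) and closing fence.
    body = text.strip()
    body = re.sub("^" + FENCE + r"(?:json)?\s*", "", body, flags=re.IGNORECASE)
    body = re.sub(r"\s*" + FENCE + "$", "", body)
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return True


def has_title(text: str) -> bool:
    # A title is non-empty text inside <<...>> on a single line.
    return any(t.strip() for t in re.findall(r"<<([^\n]+?)>>", text))


def is_quoted(text: str) -> bool:
    # The whole (stripped) response must start and end with a double quote.
    body = text.strip()
    return len(body) > 1 and body[0] == '"' and body[-1] == '"'


print(is_json('{"name": "John", "age": 30}'))
print(is_json(FENCE + 'json\n{"a": 1}\n' + FENCE))
print(has_title("<<Poem of Joy>>\nThe rest of the poem."))
print(is_quoted('"Everything inside quotes."'))
```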
Language and Style Checkers
- ResponseLanguageChecker - Validates response language using ISO 639-1 codes
- CapitalLettersEnglishChecker - Validates ALL CAPS English text
- LowercaseLettersEnglishChecker - Validates all lowercase English text
- LetterFrequencyChecker - Validates specific letter occurrence frequency
- CapitalWordFrequencyChecker - Validates frequency of ALL CAPS words
- CommaChecker - Validates absence of commas
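The case, letter, and comma checks are simple character tests. The sketch below makes two simplifications. First, it leaves out language detection; the ...EnglishChecker classes are, per their names, also meant to confirm that the text is English. Second, it counts ALL CAPS words by splitting on whitespace rather than using a real tokenizer. Both simplifications and all helper names are assumptions.

```python
def is_all_caps(text: str) -> bool:
    # str.isupper() needs at least one cased letter and no lowercase letters.
    return text.isupper()


def is_all_lowercase(text: str) -> bool:
    return text.islower()


def letter_count(text: str, letter: str) -> int:
    # Case-insensitive count of a single letter.
    return text.lower().count(letter.lower())


def capital_word_count(text: str) -> int:
    # Whitespace tokens such as "FAQ" or "ASK." whose letters are all uppercase.
    return sum(1 for token in text.split() if token.isupper())


def has_no_commas(text: str) -> bool:
    return "," not in text


print(is_all_caps("THIS IS LOUD."), is_all_lowercase("quiet please."))
print(letter_count("Banana", "a"), capital_word_count("Read the FAQ before you ASK."))
print(has_no_commas("No commas here"), has_no_commas("Yes, one"))
```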
Structural Flow Checkers
- ConstrainedResponseChecker - Validates constrained response options
- ConstrainedStartChecker - Validates response starts with specific phrase
- EndChecker - Validates response ends with specific phrase
- TwoResponsesChecker - Validates two distinct responses separated by ******
- RepeatPromptThenAnswer - Validates prompt repetition before answer
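The structural flow checkers either compare the start or end of the response with a phrase or split the response on a separator. The simplified sketch below makes these assumptions: comparisons ignore case and surrounding whitespace, and the two-response check keeps only the non-empty parts around the ****** separator. The helper names and the example answer options are illustrative.

```python
from typing import Sequence

SEPARATOR = "******"  # six asterisks between the two responses


def has_two_responses(text: str) -> bool:
    # Keep non-empty parts; require exactly two that differ from each other.
    parts = [p.strip() for p in text.split(SEPARATOR) if p.strip()]
    return len(parts) == 2 and parts[0] != parts[1]


def ends_with_phrase(text: str, phrase: str) -> bool:
    return text.strip().lower().endswith(phrase.strip().lower())


def repeats_prompt_first(text: str, prompt: str) -> bool:
    return text.strip().lower().startswith(prompt.strip().lower())


def picks_allowed_option(text: str, options: Sequence[str]) -> bool:
    # True if any allowed option appears verbatim in the response.
    return any(option in text for option in options)


OPTIONS = ("My answer is yes.", "My answer is no.", "My answer is maybe.")
print(has_two_responses("Answer A\n******\nAnswer B"))
print(ends_with_phrase("That is all. Any other questions?", "any other questions?"))
print(repeats_prompt_first("Write a haiku. Leaves drift down...", "Write a haiku."))
print(picks_allowed_option("My answer is no.", OPTIONS))
```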
Usage Examples
```python
from lmms_eval.tasks.ifeval.instructions import (
    BulletListChecker,
    JsonFormat,
    KeywordChecker,
    NumberOfSentences,
    ResponseLanguageChecker,
)

# Example 1: Check response language
checker = ResponseLanguageChecker(instruction_id="lang_001")
description = checker.build_description(language="fr")
# Returns: "Your ENTIRE response should be in French language, no other language is allowed."
is_following = checker.check_following("Bonjour, comment allez-vous?")
# Returns: True (language detection is statistical, so very short texts may be misclassified)

# Example 2: Check sentence count
checker = NumberOfSentences(instruction_id="sent_001")
description = checker.build_description(num_sentences=5, relation="at least")
# Returns: "Your response should contain at least 5 sentences."
is_following = checker.check_following(
    "Sentence one. Sentence two. Sentence three. Sentence four. Sentence five."
)
# Returns: True

# Example 3: Check keywords (matching is case-insensitive)
checker = KeywordChecker(instruction_id="key_001")
description = checker.build_description(keywords=["python", "programming"])
# Returns: "Include keywords ['programming', 'python'] in the response."
is_following = checker.check_following("I love Python programming and coding.")
# Returns: True

# Example 4: Check bullet list
checker = BulletListChecker(instruction_id="bullet_001")
description = checker.build_description(num_bullets=3)
# Returns: "Your answer must contain exactly 3 bullet points..."
response = "* First point\n* Second point\n* Third point"
is_following = checker.check_following(response)
# Returns: True

# Example 5: Check JSON format
checker = JsonFormat(instruction_id="json_001")
description = checker.build_description()
is_following = checker.check_following('{"name": "John", "age": 30}')
# Returns: True
```