Implementation:EvolvingLMMs Lab Lmms eval IFEval Instructions

Knowledge Sources	EvolvingLMMs_Lab_Lmms_eval
Domains	Natural_Language_Processing, Model_Evaluation, Instruction_Following
Last Updated	2026-02-14 00:00 GMT

Overview

A library of instruction checker classes for evaluating instruction-following capabilities in language models.

Description

This module provides about 30 instruction checker classes that validate whether a model response follows a specific formatting or content constraint. Each checker inherits from the base Instruction class and implements methods to build the instruction description, expose the instruction arguments, and verify that a response complies. The checkers cover language requirements, sentence and word counts, formatting rules (bullets, sections, paragraphs), keyword presence or absence, content structure, and stylistic requirements.

The implementation is based on Google Research's IFEval (Instruction Following Evaluation) framework and supports verifiable instruction-following testing through programmatic validation.

Usage

Use this module when evaluating a model's ability to follow explicit instructions about response format, structure, content constraints, or stylistic requirements. The checkers enable automated testing of instruction-following capabilities through deterministic validation rules.
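
As a rough illustration of automated scoring, the sketch below aggregates boolean check results into two accuracies. It is a minimal sketch, not the harness's own aggregation: `score_responses`, the sample layout, and the metric names are assumptions, and plain callables stand in for the `check_following` methods of real checkers.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a checker: maps a response string to True/False.
Check = Callable[[str], bool]


def score_responses(samples: List[Dict]) -> Dict[str, float]:
    """Aggregate check results over samples shaped like
    {"response": str, "checks": [callable, ...]} (assumed layout)."""
    prompt_hits = 0
    instruction_hits = 0
    instruction_total = 0
    for sample in samples:
        results = [check(sample["response"]) for check in sample["checks"]]
        instruction_hits += sum(results)
        instruction_total += len(results)
        # A prompt counts only if every attached instruction is followed.
        prompt_hits += all(results)
    return {
        "prompt_level_acc": prompt_hits / len(samples),
        "instruction_level_acc": instruction_hits / instruction_total,
    }


def no_commas(response: str) -> bool:
    return "," not in response


samples = [
    {"response": "hello world", "checks": [str.islower, no_commas]},
    {"response": "Hello, world", "checks": [str.islower, no_commas]},
]
scores = score_responses(samples)
print(scores)
```

Because every check is a deterministic function of the response text, the same responses always produce the same scores.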

Code Reference

Source Location

Repository: EvolvingLMMs_Lab_Lmms_eval
File: lmms_eval/tasks/ifeval/instructions.py

Signature

class Instruction:
    def __init__(self, instruction_id):
        self.id = instruction_id

    def build_description(self, **kwargs):
        raise NotImplementedError("`build_description` not implemented.")

    def get_instruction_args(self):
        raise NotImplementedError("`get_instruction_args` not implemented.")

    def get_instruction_args_keys(self):
        raise NotImplementedError("`get_instruction_args_keys` not implemented.")

    def check_following(self, value):
        raise NotImplementedError("`check_following` not implemented.")

# Example checker classes:
class ResponseLanguageChecker(Instruction):
    def build_description(self, *, language=None): ...
    def check_following(self, value): ...

class NumberOfSentences(Instruction):
    def build_description(self, *, num_sentences=None, relation=None): ...
    def check_following(self, value): ...

class KeywordChecker(Instruction):
    def build_description(self, *, keywords=None): ...
    def check_following(self, value): ...
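
The four-method contract above can be illustrated with a small custom checker. This is a sketch under stated assumptions: `MinWordsChecker` and its `min_words` parameter are hypothetical and not part of the module, and the base class is re-declared locally only so the snippet runs standalone.

```python
class Instruction:
    """Local copy of the base interface, so the sketch runs without lmms_eval."""

    def __init__(self, instruction_id):
        self.id = instruction_id


class MinWordsChecker(Instruction):  # hypothetical checker, for illustration only
    def build_description(self, *, min_words=None):
        # Store the argument so the other methods can use it later.
        self._min_words = 10 if min_words is None else min_words
        return f"Answer with at least {self._min_words} words."

    def get_instruction_args(self):
        return {"min_words": self._min_words}

    def get_instruction_args_keys(self):
        return ["min_words"]

    def check_following(self, value):
        # Whitespace-separated tokens are treated as words in this sketch.
        return len(value.split()) >= self._min_words


checker = MinWordsChecker("custom:min_words")
description = checker.build_description(min_words=3)
print(description)
print(checker.check_following("one two three"))
```

Note that `build_description` must run before `check_following`, since it is the step that stores the instruction arguments on the instance.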

Import

from lmms_eval.tasks.ifeval.instructions import (
    Instruction,
    ResponseLanguageChecker,
    NumberOfSentences,
    PlaceholderChecker,
    BulletListChecker,
    KeywordChecker,
    NumberOfWords,
    JsonFormat,
    # ... and 20+ other checker classes
)

I/O Contract

Inputs

Name	Type	Required	Description
instruction_id	str	Yes	Unique identifier for the instruction instance (constructor argument)
**kwargs	dict	Varies	Instruction-specific keyword arguments to build_description() (e.g., num_sentences, keywords, language)
value	str	Yes	The response text to validate (argument to check_following())

Outputs

Name	Type	Description
description	str	Human-readable instruction text returned by build_description()
instruction_args	dict	Dictionary of instruction parameters returned by get_instruction_args()
following	bool	True if response follows the instruction, False otherwise (from check_following())

Available Checker Classes

Structural Checkers

NumberOfSentences - Validates sentence count with relational operators
NumberOfWords - Validates word count with relational operators
PlaceholderChecker - Validates a minimum number of placeholders in [square bracket] format
BulletListChecker - Validates exact number of bullet list items
SectionChecker - Validates numbered sections with specific splitter keywords
ParagraphChecker - Validates paragraph count with *** dividers
ParagraphFirstWordCheck - Validates paragraph count and first word of specific paragraph
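
The structural checks above mostly reduce to counting something and comparing the count against a threshold. The following is a simplified sketch of that idea, not the module's code: the helper names are hypothetical, the regular expressions are assumptions, and the relation strings follow the "at least" wording used in the usage examples.

```python
import re


def satisfies_relation(actual: int, relation: str, threshold: int) -> bool:
    """Compare a measured count against a threshold using a relation keyword."""
    if relation == "less than":
        return actual < threshold
    if relation == "at least":
        return actual >= threshold
    raise ValueError(f"Unsupported relation: {relation!r}")


def count_words(response: str) -> int:
    # Runs of word characters are treated as words in this sketch.
    return len(re.findall(r"\w+", response))


def count_bullets(response: str) -> int:
    # A bullet is a line that starts with "*" or "-" followed by whitespace.
    return len(re.findall(r"^\s*[*-]\s+\S", response, flags=re.MULTILINE))


def count_placeholders(response: str) -> int:
    # Placeholders look like [name] or [address].
    return len(re.findall(r"\[.*?\]", response))


bullets = "* First point\n* Second point\n* Third point"
print(count_bullets(bullets))
print(satisfies_relation(count_words("one two three"), "at least", 3))
```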

Content Checkers

KeywordChecker - Validates presence of required keywords
KeywordFrequencyChecker - Validates keyword occurrence frequency
ForbiddenWords - Validates absence of forbidden words
KeySentenceChecker - Validates presence of specific sentences
RephraseChecker - Validates rephrasing with specific change patterns
RephraseParagraph - Validates rephrasing with word overlap constraints
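
Keyword-style checks can be approximated with case-insensitive regular expressions. The helpers below are hypothetical and only sketch the idea; whether the module matches whole words or substrings is an assumption here, so the forbidden-word helper uses word boundaries as one possible choice.

```python
import re
from typing import Iterable


def contains_all_keywords(response: str, keywords: Iterable[str]) -> bool:
    """True if every keyword occurs somewhere in the response (case-insensitive)."""
    return all(
        re.search(re.escape(keyword), response, flags=re.IGNORECASE)
        for keyword in keywords
    )


def keyword_frequency(response: str, keyword: str) -> int:
    """Number of case-insensitive occurrences of the keyword."""
    return len(re.findall(re.escape(keyword), response, flags=re.IGNORECASE))


def avoids_forbidden_words(response: str, forbidden: Iterable[str]) -> bool:
    """True if none of the forbidden words appears as a whole word."""
    return not any(
        re.search(r"\b" + re.escape(word) + r"\b", response, flags=re.IGNORECASE)
        for word in forbidden
    )


print(contains_all_keywords("I love Python programming.", ["python", "programming"]))
print(keyword_frequency("spam, Spam and spam", "spam"))
```

Escaping the keyword with `re.escape` keeps characters such as `+` or `.` from being interpreted as regex syntax.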

Format Checkers

JsonFormat - Validates JSON format with optional markdown ticks
TitleChecker - Validates title wrapped in <<double angular brackets>>
QuotationChecker - Validates entire response wrapped in double quotes
HighlightSectionChecker - Validates markdown highlights with *asterisks*
PostscriptChecker - Validates postscript section with P.S. or P.P.S marker
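
The format checks above can be sketched with the standard library alone. This is an illustrative approximation with hypothetical helper names, not the module's implementation; in particular, the way an optional Markdown fence is stripped before JSON parsing is an assumption.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built here to keep the snippet readable


def is_json(response: str) -> bool:
    """True if the response parses as JSON, allowing an optional Markdown fence."""
    text = response.strip()
    if len(text) >= 6 and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[3:-3].strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()  # drop the language tag after the fence
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def has_title(response: str) -> bool:
    """True if a non-empty title is wrapped in <<double angular brackets>>."""
    return any(m.strip() for m in re.findall(r"<<([^\n]+?)>>", response))


def wrapped_in_quotes(response: str) -> bool:
    """True if the whole response is wrapped in double quotation marks."""
    text = response.strip()
    return len(text) > 1 and text[0] == '"' and text[-1] == '"'


print(is_json('{"name": "John", "age": 30}'))
print(has_title("<<Ode to Code>>\nBody text"))
```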

Language and Style Checkers

ResponseLanguageChecker - Validates response language using ISO 639-1 codes
CapitalLettersEnglishChecker - Validates ALL CAPS English text
LowercaseLettersEnglishChecker - Validates all lowercase English text
LetterFrequencyChecker - Validates specific letter occurrence frequency
CapitalWordFrequencyChecker - Validates frequency of ALL CAPS words
CommaChecker - Validates absence of commas
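
Several of the style checks above come down to simple string predicates. The sketch below uses hypothetical helper names and deliberately omits any language detection, so it only approximates the English-specific checkers.

```python
import re


def is_all_caps(response: str) -> bool:
    # str.isupper ignores digits and punctuation but needs one cased character.
    return response.isupper()


def is_all_lowercase(response: str) -> bool:
    return response.islower()


def has_no_commas(response: str) -> bool:
    return "," not in response


def letter_frequency(response: str, letter: str) -> int:
    """Case-insensitive count of a single letter."""
    return response.lower().count(letter.lower())


def capital_word_count(response: str) -> int:
    """Number of words written entirely in capital letters."""
    return sum(1 for word in re.findall(r"\b\w+\b", response) if word.isupper())


print(letter_frequency("Banana", "a"))
print(capital_word_count("THIS is VERY loud"))
```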

Structural Flow Checkers

ConstrainedResponseChecker - Validates that the response contains one of a fixed set of answer options
ConstrainedStartChecker - Validates response starts with specific phrase
EndChecker - Validates response ends with specific phrase
TwoResponsesChecker - Validates two distinct responses separated by ******
RepeatPromptThenAnswer - Validates prompt repetition before answer
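
The flow checks above test where text appears in the response rather than how much of it there is. A minimal sketch with hypothetical helper names follows; the exact normalization the module applies (case folding, whitespace and quote stripping) is an assumption.

```python
def starts_with_phrase(response: str, phrase: str) -> bool:
    """True if the response begins with the phrase (case-insensitive)."""
    return response.strip().lower().startswith(phrase.strip().lower())


def ends_with_phrase(response: str, phrase: str) -> bool:
    """True if the response ends with the phrase, ignoring outer quotes."""
    text = response.strip().strip('"').lower()
    return text.endswith(phrase.strip().lower())


def has_two_responses(response: str) -> bool:
    """True if ****** separates exactly two non-empty, distinct responses."""
    parts = [part.strip() for part in response.split("******")]
    parts = [part for part in parts if part]
    return len(parts) == 2 and parts[0] != parts[1]


def repeats_prompt_first(response: str, prompt: str) -> bool:
    """True if the response starts by repeating the prompt."""
    return starts_with_phrase(response, prompt)


print(has_two_responses("Yes.\n******\nNo."))
print(repeats_prompt_first("What is 2+2? It is 4.", "What is 2+2?"))
```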

Usage Examples

# Example 1: Check response language
checker = ResponseLanguageChecker(instruction_id="lang_001")
description = checker.build_description(language="fr")
# Returns: "Your ENTIRE response should be in French language, no other language is allowed."
is_following = checker.check_following("Bonjour, comment allez-vous?")
# Expected: True, assuming the language detector identifies this short text as French

# Example 2: Check sentence count
checker = NumberOfSentences(instruction_id="sent_001")
description = checker.build_description(num_sentences=5, relation="at least")
# Returns: "Your response should contain at least 5 sentences."
is_following = checker.check_following("Sentence one. Sentence two. Sentence three. Sentence four. Sentence five.")
# Returns: True

# Example 3: Check keywords
checker = KeywordChecker(instruction_id="key_001")
description = checker.build_description(keywords=["python", "programming"])
# Returns: "Include keywords ['programming', 'python'] in the response."
is_following = checker.check_following("I love Python programming and coding.")
# Returns: True

# Example 4: Check bullet list
checker = BulletListChecker(instruction_id="bullet_001")
description = checker.build_description(num_bullets=3)
# Returns: "Your answer must contain exactly 3 bullet points..."
response = "* First point\n* Second point\n* Third point"
is_following = checker.check_following(response)
# Returns: True

# Example 5: Check JSON format
checker = JsonFormat(instruction_id="json_001")
description = checker.build_description()
is_following = checker.check_following('{"name": "John", "age": 30}')
# Returns: True
