Implementation:EvolvingLMMs Lab Lmms eval IFEval Instructions

From Leeroopedia
Revision as of 12:31, 16 February 2026 by Admin (auto-imported from implementations/EvolvingLMMs_Lab_Lmms_eval_IFEval_Instructions.md)
Knowledge Sources
Domains: Natural_Language_Processing, Model_Evaluation, Instruction_Following
Last Updated: 2026-02-14 00:00 GMT

Overview

A library of instruction checker classes for evaluating instruction-following capabilities in language models.

Description

This module provides about 30 instruction checker classes that validate whether a model response satisfies a specific formatting or content constraint. Each checker inherits from the base Instruction class and implements four methods: build_description() builds the human-readable instruction text, get_instruction_args() and get_instruction_args_keys() expose the instruction's parameters, and check_following() verifies a response. The checkers cover language requirements, sentence and word counts, formatting rules (bullets, sections, paragraphs), keyword presence or absence, content structure, and stylistic requirements.

The implementation is based on Google Research's IFEval (Instruction-Following Evaluation) framework, whose instructions are verifiable: compliance with each one is decided programmatically rather than by human or model judgment.

Usage

Use this module when evaluating a model's ability to follow explicit instructions about response format, structure, content constraints, or stylistic requirements. The checkers enable automated testing of instruction-following capabilities through deterministic validation rules.
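As a rough illustration of automated testing with deterministic rules, the sketch below aggregates several pass/fail checks over one response. It is not the module's evaluation code: `evaluate`, `no_commas`, and `all_lowercase` are hypothetical names, and plain functions stand in for the `check_following` methods of real checker instances.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for checker.check_following (real checkers are classes).
def no_commas(response: str) -> bool:
    return "," not in response

def all_lowercase(response: str) -> bool:
    return response == response.lower()

def evaluate(response: str, checks: Dict[str, Callable[[str], bool]]) -> Dict[str, object]:
    """Run every check on the response and summarize the results."""
    per_instruction = {name: bool(check(response)) for name, check in checks.items()}
    followed = sum(per_instruction.values())
    total = len(per_instruction)
    return {
        "per_instruction": per_instruction,
        # True only if every instruction was followed.
        "all_followed": followed == total,
        # Share of individual instructions that were followed.
        "fraction_followed": followed / total if total else 0.0,
    }

checks = {"no_commas": no_commas, "all_lowercase": all_lowercase}
result = evaluate("hello world, again", checks)
print(result["per_instruction"])    # {'no_commas': False, 'all_lowercase': True}
print(result["fraction_followed"])  # 0.5
```

Because each check is a pure function of the response text, repeated runs on the same response give the same result.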

Code Reference

Source Location

Signature

class Instruction:
    def __init__(self, instruction_id):
        self.id = instruction_id

    def build_description(self, **kwargs):
        raise NotImplementedError("`build_description` not implemented.")

    def get_instruction_args(self):
        raise NotImplementedError("`get_instruction_args` not implemented.")

    def get_instruction_args_keys(self):
        raise NotImplementedError("`get_instruction_args_keys` not implemented.")

    def check_following(self, value):
        raise NotImplementedError("`check_following` not implemented.")

# Example checker classes:
class ResponseLanguageChecker(Instruction):
    def build_description(self, *, language=None): ...
    def check_following(self, value): ...

class NumberOfSentences(Instruction):
    def build_description(self, *, num_sentences=None, relation=None): ...
    def check_following(self, value): ...

class KeywordChecker(Instruction):
    def build_description(self, *, keywords=None): ...
    def check_following(self, value): ...
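
The base-class signature above leaves all four methods unimplemented, so every checker overrides them. The following is a minimal sketch of that pattern, not code from the module: `ExclamationMarkChecker` and its `min_count` parameter are hypothetical, and `Instruction` is re-declared locally (mirroring the signature) only so the example runs without lmms_eval installed.

```python
class Instruction:
    """Local stand-in that mirrors the base-class signature shown on this page."""

    def __init__(self, instruction_id):
        self.id = instruction_id

    def build_description(self, **kwargs):
        raise NotImplementedError("`build_description` not implemented.")

    def get_instruction_args(self):
        raise NotImplementedError("`get_instruction_args` not implemented.")

    def get_instruction_args_keys(self):
        raise NotImplementedError("`get_instruction_args_keys` not implemented.")

    def check_following(self, value):
        raise NotImplementedError("`check_following` not implemented.")


class ExclamationMarkChecker(Instruction):
    """Hypothetical checker: requires at least `min_count` exclamation marks."""

    def build_description(self, *, min_count=1):
        # Store the argument so check_following can use it later.
        self._min_count = min_count
        return f"Your response should contain at least {min_count} exclamation marks."

    def get_instruction_args(self):
        return {"min_count": self._min_count}

    def get_instruction_args_keys(self):
        return ["min_count"]

    def check_following(self, value):
        return value.count("!") >= self._min_count


checker = ExclamationMarkChecker("custom:exclamation")
print(checker.build_description(min_count=2))
print(checker.get_instruction_args())             # {'min_count': 2}
print(checker.check_following("Great! Thanks!"))  # True
```

Note the call order in this sketch: `build_description` must run before `check_following`, because it stores the arguments that the check relies on.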

Import

from lmms_eval.tasks.ifeval.instructions import (
    Instruction,
    ResponseLanguageChecker,
    NumberOfSentences,
    PlaceholderChecker,
    BulletListChecker,
    KeywordChecker,
    NumberOfWords,
    JsonFormat,
    # ... and 20+ other checker classes
)

I/O Contract

Inputs

Name             Type   Required   Description
instruction_id   str    Yes        Unique identifier for the instruction instance
**kwargs         dict   Varies     Instruction-specific parameters (e.g., num_sentences, keywords, language)
value            str    Yes        The response text to validate (for check_following method)

Outputs

Name               Type   Description
description        str    Human-readable instruction text returned by build_description()
instruction_args   dict   Dictionary of instruction parameters returned by get_instruction_args()
following          bool   True if response follows the instruction, False otherwise (from check_following())

Available Checker Classes

Structural Checkers

  • NumberOfSentences - Validates sentence count against a threshold using a relation such as "at least"
  • NumberOfWords - Validates word count against a threshold using a relation such as "at least"
  • PlaceholderChecker - Checks for required placeholders in [bracket] format
  • BulletListChecker - Validates exact number of bullet list items
  • SectionChecker - Validates numbered sections with specific splitter keywords
  • ParagraphChecker - Validates paragraph count with *** dividers
  • ParagraphFirstWordCheck - Validates paragraph count and first word of specific paragraph
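
The structural checks listed above mostly reduce to counting units and comparing the count to a threshold. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the "less than" relation is assumed alongside the "at least" relation used elsewhere on this page, and the regular expressions are simplifications that may not match the module's own tokenization.

```python
import re

def satisfies_relation(count: int, threshold: int, relation: str) -> bool:
    """Compare a count to a threshold ("less than" is an assumed relation)."""
    if relation == "at least":
        return count >= threshold
    if relation == "less than":
        return count < threshold
    raise ValueError(f"Unsupported relation: {relation!r}")

def count_words(text: str) -> int:
    # Simplification: a word is any run of word characters.
    return len(re.findall(r"\w+", text))

def count_bullets(text: str) -> int:
    # Lines that start with "* " or "- " after optional indentation.
    return len(re.findall(r"^\s*[*-]\s+.*$", text, flags=re.MULTILINE))

def count_paragraphs(text: str) -> int:
    # Paragraphs are separated by the *** divider; empty pieces are ignored.
    return len([p for p in text.split("***") if p.strip()])

print(satisfies_relation(count_words("one two three"), 3, "at least"))      # True
print(count_bullets("* First point\n* Second point\n* Third point") == 3)  # True
print(count_paragraphs("First.\n***\nSecond."))                             # 2
```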

Content Checkers

  • KeywordChecker - Validates presence of required keywords
  • KeywordFrequencyChecker - Validates keyword occurrence frequency
  • ForbiddenWords - Validates absence of forbidden words
  • KeySentenceChecker - Validates presence of specific sentences
  • RephraseChecker - Validates rephrasing with specific change patterns
  • RephraseParagraph - Validates rephrasing with word overlap constraints

Format Checkers

  • JsonFormat - Validates that the response is valid JSON, optionally wrapped in a markdown code fence
  • TitleChecker - Validates title wrapped in <<double angular brackets>>
  • QuotationChecker - Validates entire response wrapped in double quotes
  • HighlightSectionChecker - Validates markdown highlights with *asterisks*
  • PostscriptChecker - Validates postscript section with P.S. or P.P.S marker
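
The format checks above can be pictured as small parsers and pattern matches. The sketch below is a simplified illustration, not the module's implementation: all function names are hypothetical, and the handling of the optional code fence is an assumption about what "optional" means here.

```python
import json
import re

# Markdown code fence marker, built programmatically so that this example
# contains no literal fence of its own.
FENCE = "`" * 3

def is_json(response: str) -> bool:
    """True if the response parses as JSON, with or without a surrounding fence."""
    text = response.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)].strip()
        # Drop an optional "json" language tag after the opening fence.
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

def has_title(response: str) -> bool:
    """True if the response contains a non-empty <<title>> on one line."""
    return re.search(r"<<[^\n]+>>", response) is not None

def is_wrapped_in_quotes(response: str) -> bool:
    """True if the whole response starts and ends with a double quote."""
    text = response.strip()
    return len(text) > 1 and text.startswith('"') and text.endswith('"')

print(is_json('{"name": "John", "age": 30}'))           # True
print(is_json(FENCE + 'json\n{"a": 1}\n' + FENCE))      # True
print(has_title("<<My Poem>>\nRoses are red."))         # True
print(is_wrapped_in_quotes('"Hello there"'))            # True
```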

Language and Style Checkers

  • ResponseLanguageChecker - Validates response language using ISO 639-1 codes
  • CapitalLettersEnglishChecker - Validates ALL CAPS English text
  • LowercaseLettersEnglishChecker - Validates all lowercase English text
  • LetterFrequencyChecker - Validates specific letter occurrence frequency
  • CapitalWordFrequencyChecker - Validates frequency of ALL CAPS words
  • CommaChecker - Validates absence of commas
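
Several of the style checks above map closely onto built-in string operations. The sketch below shows the casing, comma, and frequency checks with hypothetical helper names; it deliberately omits any check that the text is English, so it covers only part of what the corresponding checkers are described as validating.

```python
import re

def is_all_caps(response: str) -> bool:
    # str.isupper() ignores digits and punctuation but needs one cased letter.
    return response.isupper()

def is_all_lowercase(response: str) -> bool:
    return response.islower()

def has_no_commas(response: str) -> bool:
    return "," not in response

def letter_count(response: str, letter: str) -> int:
    """Case-insensitive count of a single letter."""
    return response.lower().count(letter.lower())

def count_all_caps_words(response: str) -> int:
    """Number of alphabetic words written entirely in capitals."""
    return sum(1 for word in re.findall(r"[A-Za-z]+", response) if word.isupper())

print(is_all_caps("HELLO WORLD!"))                          # True
print(is_all_lowercase("hello world"))                      # True
print(has_no_commas("No commas here."))                     # True
print(letter_count("Banana", "a"))                          # 3
print(count_all_caps_words("This is VERY IMPORTANT news"))  # 2
```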

Structural Flow Checkers

  • ConstrainedResponseChecker - Validates constrained response options
  • ConstrainedStartChecker - Validates response starts with specific phrase
  • EndChecker - Validates response ends with specific phrase
  • TwoResponsesChecker - Validates two distinct responses separated by ******
  • RepeatPromptThenAnswer - Validates prompt repetition before answer
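
The flow checks above look at how a response begins, ends, or is divided. As a hedged sketch with hypothetical function names, the splitting and prefix or suffix tests could look like the following; the whitespace trimming and case-insensitive comparison are assumptions, not confirmed behavior of the module.

```python
from typing import List

SEPARATOR = "******"  # six asterisks, as named in the list above

def split_responses(response: str) -> List[str]:
    """Split on the separator and drop empty pieces."""
    parts = [part.strip() for part in response.split(SEPARATOR)]
    return [part for part in parts if part]

def has_two_distinct_responses(response: str) -> bool:
    parts = split_responses(response)
    return len(parts) == 2 and parts[0] != parts[1]

def ends_with_phrase(response: str, phrase: str) -> bool:
    return response.strip().lower().endswith(phrase.strip().lower())

def repeats_prompt_first(response: str, prompt: str) -> bool:
    return response.strip().lower().startswith(prompt.strip().lower())

print(has_two_distinct_responses("Yes." + SEPARATOR + "No."))   # True
print(has_two_distinct_responses("Yes." + SEPARATOR + "Yes."))  # False
print(ends_with_phrase("All done. Any other questions?",
                       "Any other questions?"))                  # True
print(repeats_prompt_first("Name a color. Blue.", "Name a color."))  # True
```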

Usage Examples

# The examples assume the checker classes were imported as shown in the Import section.

# Example 1: Check response language
checker = ResponseLanguageChecker(instruction_id="lang_001")
description = checker.build_description(language="fr")
# Returns: "Your ENTIRE response should be in French language, no other language is allowed."
is_following = checker.check_following("Bonjour, comment allez-vous?")
# Expected: True (automatic language detection can be unreliable on very short text)

# Example 2: Check sentence count
checker = NumberOfSentences(instruction_id="sent_001")
description = checker.build_description(num_sentences=5, relation="at least")
# Returns: "Your response should contain at least 5 sentences."
is_following = checker.check_following("Sentence one. Sentence two. Sentence three. Sentence four. Sentence five.")
# Returns: True

# Example 3: Check keywords
checker = KeywordChecker(instruction_id="key_001")
description = checker.build_description(keywords=["python", "programming"])
# Returns: "Include keywords ['programming', 'python'] in the response."
is_following = checker.check_following("I love Python programming and coding.")
# Returns: True

# Example 4: Check bullet list
checker = BulletListChecker(instruction_id="bullet_001")
description = checker.build_description(num_bullets=3)
# Returns: "Your answer must contain exactly 3 bullet points..."
response = "* First point\n* Second point\n* Third point"
is_following = checker.check_following(response)
# Returns: True

# Example 5: Check JSON format
checker = JsonFormat(instruction_id="json_001")
description = checker.build_description()
is_following = checker.check_following('{"name": "John", "age": 30}')
# Returns: True
