Implementation:EvolvingLMMs Lab Lmms eval LLM Judge Factory

From Leeroopedia

Overview

This implementation provides a factory class for creating judge instances based on configuration. It implements the Factory design pattern to abstract provider instantiation, allowing easy switching between different API backends (OpenAI, Azure OpenAI, async variants, dummy implementations) through configuration rather than code changes.

File Location

lmms_eval/llm_judge/factory.py (54 lines)

Related Principle

LLM as Judge

Dependencies

  • os: Environment variable access
  • typing: Type hints
  • LLM Judge Base: ServerInterface base class
  • LLM Judge Protocol: ServerConfig
  • Provider implementations:
    • OpenAIProvider: OpenAI API implementation
    • AzureOpenAIProvider: Azure OpenAI API implementation
    • AsyncOpenAIProvider: Async OpenAI implementation
    • AsyncAzureOpenAIProvider: Async Azure OpenAI implementation
    • DummyProvider: Testing/mock implementation

Core Components

ProviderFactory

Factory class for creating judge instances based on API type.

Class Attributes

_provider_classes

_provider_classes = {
    "openai": OpenAIProvider,
    "azure": AzureOpenAIProvider,
    "async_openai": AsyncOpenAIProvider,
    "async_azure": AsyncAzureOpenAIProvider,
    "dummy": DummyProvider
}

Dictionary mapping API type strings to provider class implementations. This registry pattern enables:

  • Easy lookup of provider classes
  • Dynamic provider registration
  • Clear enumeration of supported providers
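As a minimal, self-contained sketch of the registry idea, a dictionary keyed by API type supports all three operations. The classes below are empty stand-ins invented for illustration, not the real lmms_eval providers.

```python
# Stand-in classes for illustration only; the real providers live in lmms_eval.
class FakeOpenAIProvider:
    pass


class FakeDummyProvider:
    pass


# Registry: API type string -> provider class (the class itself, not an instance).
registry = {
    "openai": FakeOpenAIProvider,
    "dummy": FakeDummyProvider,
}

# Lookup: fetch the class by key, then instantiate it.
provider = registry["dummy"]()

# Enumeration: the keys are the supported API types.
supported = sorted(registry)


# Dynamic registration: adding a key makes a new type available at runtime.
class FakeCustomProvider:
    pass


registry["custom"] = FakeCustomProvider
```

Storing classes rather than instances keeps the registry cheap to build and lets each caller pass its own configuration at instantiation time.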

Methods

register_additional_providers

@classmethod
def register_additional_providers(cls):
    """Register additional providers if available"""
    pass

Placeholder for registering additional providers dynamically. It is currently a no-op: the body is a bare pass.

Design Note: A TODO comment in the source indicates that this is meant to become a decorator-based registration system.

Possible future pattern (not implemented):

@ProviderFactory.register("custom_provider")
class CustomProvider(ServerInterface):
    ...
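One way such a decorator could be written is sketched below. This is an assumption about a possible design, not code from the repository: ProviderFactorySketch, ServerInterfaceSketch, and the register() method are hypothetical names used only so the example runs on its own.

```python
from typing import Callable, Dict


class ServerInterfaceSketch:
    """Hypothetical stand-in for ServerInterface."""


class ProviderFactorySketch:
    """Hypothetical stand-in for ProviderFactory with a decorator-style register()."""

    _provider_classes: Dict[str, type] = {}

    @classmethod
    def register(cls, api_type: str) -> Callable[[type], type]:
        def decorator(provider_class: type) -> type:
            # Same inheritance check that register_provider performs.
            if not issubclass(provider_class, ServerInterfaceSketch):
                raise ValueError(
                    f"{provider_class} must be a subclass of ServerInterfaceSketch"
                )
            cls._provider_classes[api_type.lower()] = provider_class
            # Return the class unchanged so the decorated name still refers to it.
            return provider_class

        return decorator


@ProviderFactorySketch.register("custom_provider")
class CustomProvider(ServerInterfaceSketch):
    pass
```

The decorator registers the class as a side effect of defining it, so a provider becomes available as soon as its module is imported.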

create_provider

@classmethod
def create_provider(
    cls,
    api_type: Optional[str] = None,
    config: Optional[ServerConfig] = None,
) -> ServerInterface:
    ...

Create a judge instance based on API type.

Parameters:

  • api_type (Optional[str]): Type of API to use. Supported values:
    • "openai": OpenAI API (synchronous)
    • "azure": Azure OpenAI API (synchronous)
    • "async_openai": OpenAI API (asynchronous)
    • "async_azure": Azure OpenAI API (asynchronous)
    • "dummy": Mock implementation for testing
    • If None, reads from API_TYPE environment variable (defaults to "openai")
  • config (Optional[ServerConfig]): Configuration for the judge instance (model name, temperature, retry settings, etc.)

Returns:

  • ServerInterface: An instance of the appropriate judge implementation

Raises:

  • ValueError: If api_type is not in the supported providers list

Process:

  1. If api_type is None, reads from environment variable API_TYPE (default: "openai")
  2. Converts api_type to lowercase for case-insensitive matching
  3. Validates api_type against _provider_classes registry
  4. Retrieves appropriate provider class from registry
  5. Instantiates and returns provider with given config
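The five steps above can be sketched as a standalone function. This is a reconstruction from the description on this page, not the library source: BaseJudge, OpenAIJudge, DummyJudge, and the module-level _PROVIDERS dictionary are hypothetical stand-ins, and the error message wording follows the example in the Error Handling section.

```python
import os
from typing import Dict, Optional


class BaseJudge:
    """Hypothetical stand-in for ServerInterface."""

    def __init__(self, config: Optional[dict] = None):
        self.config = config


class OpenAIJudge(BaseJudge):
    pass


class DummyJudge(BaseJudge):
    pass


_PROVIDERS: Dict[str, type] = {"openai": OpenAIJudge, "dummy": DummyJudge}


def create_provider(
    api_type: Optional[str] = None, config: Optional[dict] = None
) -> BaseJudge:
    # Step 1: fall back to the API_TYPE environment variable, then to "openai".
    if api_type is None:
        api_type = os.environ.get("API_TYPE", "openai")
    # Step 2: normalize case so "OpenAI" and "openai" are treated alike.
    api_type = api_type.lower()
    # Step 3: validate against the registry.
    if api_type not in _PROVIDERS:
        raise ValueError(
            f"Unknown API type: {api_type}. Supported types: {list(_PROVIDERS)}"
        )
    # Steps 4 and 5: look up the class and instantiate it with the config.
    return _PROVIDERS[api_type](config)
```

Because the environment is consulted only when api_type is None, an explicit argument always takes precedence over API_TYPE.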

Example Usage:

# Explicit provider type
config = ServerConfig(model_name="gpt-4", temperature=0.0)
judge = ProviderFactory.create_provider("openai", config)

# From environment variable
import os
os.environ["API_TYPE"] = "azure"
judge = ProviderFactory.create_provider(config=config)

# Async variant
async_judge = ProviderFactory.create_provider("async_openai", config)

register_provider

@classmethod
def register_provider(cls, api_type: str, judge_class: type):
    """Register a new judge implementation"""
    if not issubclass(judge_class, ServerInterface):
        raise ValueError(f"{judge_class} must be a subclass of ServerInterface")
    cls._provider_classes[api_type] = judge_class

Dynamically register a new judge implementation at runtime.

Parameters:

  • api_type (str): Identifier for the new provider (e.g., "anthropic", "cohere")
  • judge_class (type): Class implementing ServerInterface

Raises:

  • ValueError: If judge_class is not a subclass of ServerInterface

Validation:

  • Ensures type safety by validating inheritance from ServerInterface
  • Prevents registration of incompatible classes

Example:

class AnthropicProvider(ServerInterface):
    def evaluate(self, request):
        # Implementation
        ...

    def is_available(self):
        return True

# Register the provider
ProviderFactory.register_provider("anthropic", AnthropicProvider)

# Now it can be used
judge = ProviderFactory.create_provider("anthropic", config)
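One consequence worth checking follows from the code and process shown on this page (it has not been verified against the library itself): create_provider lowercases api_type before lookup, while register_provider stores the key exactly as given. A key registered with uppercase letters would therefore never match. The MiniFactory class below is a hypothetical mock that reproduces only those two behaviors.

```python
class MiniFactory:
    """Hypothetical mock reproducing two behaviors described on this page."""

    _provider_classes = {}

    @classmethod
    def register_provider(cls, api_type, judge_class):
        # The key is stored exactly as given, with no case normalization.
        cls._provider_classes[api_type] = judge_class

    @classmethod
    def create_provider(cls, api_type):
        # The lookup key is lowercased before the registry is consulted.
        api_type = api_type.lower()
        if api_type not in cls._provider_classes:
            raise ValueError(f"Unknown API type: {api_type}")
        return cls._provider_classes[api_type]()


class Judge:
    pass


# A mixed-case key is stored as "Anthropic" but looked up as "anthropic".
MiniFactory.register_provider("Anthropic", Judge)
try:
    MiniFactory.create_provider("Anthropic")
    found_mixed_case = True
except ValueError:
    found_mixed_case = False

# A lowercase key matches regardless of how the caller spells the type.
MiniFactory.register_provider("anthropic", Judge)
found_lowercase = isinstance(MiniFactory.create_provider("Anthropic"), Judge)
```

Under that reading, registering providers with lowercase keys is the safe choice.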

Design Patterns

Factory Pattern

Encapsulates object creation logic, allowing clients to request objects by type without knowing concrete class details.

Benefits:

  • Decouples client code from concrete implementations
  • Centralizes provider selection logic
  • Simplifies testing by allowing mock provider injection
  • Enables configuration-driven provider selection

Registry Pattern

The _provider_classes dictionary acts as a registry of available providers, supporting dynamic lookup and registration.

Benefits:

  • Easy to extend with new providers
  • Clear enumeration of supported types
  • Runtime provider discovery

Environment-based Configuration

Falls back to the API_TYPE environment variable when api_type is not specified, following 12-factor app principles.

Benefits:

  • Environment-specific configuration without code changes
  • Easier deployment across different environments
  • Configuration through environment variables or explicit parameters
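When testing code that depends on this fallback, it helps to set API_TYPE only for the duration of a test. The sketch below uses unittest.mock.patch.dict from the standard library, which restores os.environ on exit. The resolve_api_type helper is hypothetical and only mirrors the fallback order described on this page.

```python
import os
from unittest import mock


def resolve_api_type(api_type=None):
    """Hypothetical helper: explicit argument, then API_TYPE, then "openai"."""
    if api_type is None:
        api_type = os.environ.get("API_TYPE", "openai")
    return api_type.lower()


# patch.dict restores os.environ when the with-block exits.
with mock.patch.dict(os.environ, {"API_TYPE": "Azure"}):
    from_env = resolve_api_type()         # taken from the environment
    explicit = resolve_api_type("dummy")  # explicit argument takes precedence

# clear=True empties the environment inside the block, exercising the default.
with mock.patch.dict(os.environ, {}, clear=True):
    default = resolve_api_type()
```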

Provider Types

Synchronous Providers

  • openai: Standard OpenAI API client (blocking I/O)
  • azure: Azure OpenAI API client (blocking I/O)
  • dummy: Mock implementation for testing

Use Cases:

  • Simple scripts with sequential evaluation
  • Low-volume evaluation tasks
  • Synchronous application contexts

Asynchronous Providers

  • async_openai: Async OpenAI API client (non-blocking I/O)
  • async_azure: Async Azure OpenAI API client (non-blocking I/O)

Use Cases:

  • High-throughput batch evaluation
  • Concurrent request processing
  • Integration with async frameworks (aiohttp, FastAPI)
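To illustrate why async providers suit high-throughput evaluation, the sketch below bounds concurrency with asyncio.Semaphore, which is the usual way a max_concurrent limit is enforced. It is not taken from the lmms_eval providers: fake_judge_call and evaluate_all are hypothetical names, and the network call is simulated with a short sleep.

```python
import asyncio


async def fake_judge_call(item: str) -> str:
    """Stand-in for a network call to a judge API."""
    await asyncio.sleep(0.01)
    return f"judged:{item}"


async def evaluate_all(items, max_concurrent: int = 10):
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0  # requests currently running
    peak = 0       # highest concurrency observed

    async def bounded(item):
        nonlocal in_flight, peak
        # At most max_concurrent coroutines can be inside this block at once.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_judge_call(item)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(bounded(i) for i in items))
    return results, peak


results, peak = asyncio.run(
    evaluate_all([f"q{i}" for i in range(8)], max_concurrent=3)
)
```

With a synchronous provider the eight calls would run one after another; here up to three overlap while the limit keeps the request rate bounded.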

Testing Provider

  • dummy: Returns mock responses without API calls

Use Cases:

  • Unit testing evaluation pipelines
  • Development without API access
  • CI/CD environments

Usage Patterns

Basic Usage

from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import ServerConfig

config = ServerConfig(
    model_name="gpt-4",
    temperature=0.0,
    max_tokens=512,
    timeout=30
)

judge = ProviderFactory.create_provider("openai", config)
result = judge.evaluate_binary(
    question="What is 2+2?",
    answer="4",
    prediction="Four"
)

Environment-Based Configuration

import os

# Set via environment
os.environ["API_TYPE"] = "azure"

# Factory reads from environment
judge = ProviderFactory.create_provider(config=config)

Async Batch Processing

import asyncio

config = ServerConfig(
    model_name="gpt-4-turbo",
    max_concurrent=10  # Limit concurrent requests
)

async_judge = ProviderFactory.create_provider("async_openai", config)

async def batch_evaluate():
    results = await async_judge.evaluate_binary_batch_async(
        questions=["Q1", "Q2", "Q3"],
        answers=["A1", "A2", "A3"],
        predictions=["P1", "P2", "P3"]
    )
    return results

results = asyncio.run(batch_evaluate())

Custom Provider Registration

from lmms_eval.llm_judge.base import ServerInterface
from lmms_eval.llm_judge.protocol import Request, Response

class CustomLLMProvider(ServerInterface):
    def evaluate(self, request: Request) -> Response:
        # Custom implementation
        return Response(
            content="Custom evaluation result",
            model_used=self.config.model_name,
            success=True
        )

    def is_available(self) -> bool:
        return True

# Register
ProviderFactory.register_provider("custom_llm", CustomLLMProvider)

# Use
judge = ProviderFactory.create_provider("custom_llm", config)

Testing with Dummy Provider

# Use dummy provider for testing
test_judge = ProviderFactory.create_provider("dummy")

# Returns mock responses without API calls
result = test_judge.evaluate_binary(
    question="Test question",
    answer="Test answer",
    prediction="Test prediction"
)

Configuration Options

The factory accepts ServerConfig with the following key parameters:

  • model_name: Model identifier (e.g., "gpt-4", "gpt-3.5-turbo")
  • temperature: Sampling temperature (default: 0.0 for deterministic evaluation)
  • max_tokens: Maximum response length (default: 1024)
  • timeout: API request timeout in seconds (default: 60)
  • num_retries: Number of retry attempts on failure (default: 5)
  • retry_delay: Delay between retries in seconds (default: 10)
  • max_concurrent: Maximum concurrent requests for async providers (default: 10)
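The defaults listed above can be summarized as a dataclass. JudgeConfigSketch is a hypothetical stand-in written for this page; the real ServerConfig may define additional fields or different types, and making the sketch frozen is a design choice of the example, not something the source states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JudgeConfigSketch:
    """Hypothetical stand-in for ServerConfig, using the defaults listed above."""

    model_name: str            # no default is documented, so it is required here
    temperature: float = 0.0
    max_tokens: int = 1024
    timeout: int = 60          # seconds
    num_retries: int = 5
    retry_delay: int = 10      # seconds
    max_concurrent: int = 10   # only meaningful for async providers


base = JudgeConfigSketch(model_name="gpt-4")

# replace() copies the config and overrides selected fields.
fast = replace(base, timeout=30, max_tokens=512)
```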

Error Handling

Unknown API Type:

try:
    judge = ProviderFactory.create_provider("unknown_type")
except ValueError as e:
    print(f"Error: {e}")
    # Error: Unknown API type: unknown_type.
    # Supported types: ['openai', 'azure', 'async_openai', 'async_azure', 'dummy']

Invalid Provider Class:

class NotAJudge:
    pass

try:
    ProviderFactory.register_provider("bad", NotAJudge)
except ValueError as e:
    print(f"Error: {e}")
    # Error: <class '__main__.NotAJudge'> must be a subclass of ServerInterface
    # (the module prefix depends on where the class is defined)

Extension Guidelines

Adding a New Provider

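  1. Implement ServerInterface or AsyncServerInterface. Note that register_provider() only accepts subclasses of ServerInterface, so an async base class must derive from it to pass registration.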
  2. Register via register_provider()
  3. Document provider-specific configuration requirements
  4. Add to _provider_classes if part of core distribution

Example:

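from lmms_eval.llm_judge.base import AsyncServerInterface
from lmms_eval.llm_judge.factory import ProviderFactory
from lmms_eval.llm_judge.protocol import Request, Response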

class HuggingFaceProvider(AsyncServerInterface):
    async def evaluate_async(self, request: Request) -> Response:
        # HuggingFace API implementation
        ...

    def is_available(self) -> bool:
        # Check HF API availability
        ...

ProviderFactory.register_provider("huggingface", HuggingFaceProvider)

Related Implementations

  • LLM Judge Base: Base classes that providers must implement
  • LLM Judge Protocol: ServerConfig and other protocol types
  • Provider implementations in lmms_eval/llm_judge/providers/

Best Practices

  1. Use environment variables for production deployments to avoid hardcoding API types
  2. Choose async providers for batch evaluation (>10 requests)
  3. Use dummy provider for unit tests to avoid API costs
  4. Set appropriate max_concurrent limits to respect API rate limits
  5. Register custom providers rather than modifying factory code directly
  6. Validate configurations before passing to factory
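For the last point, a pre-flight check might look like the sketch below. The validate_config function and its rules are assumptions chosen for illustration: the source does not specify valid ranges, and the example works on a plain dictionary rather than the real ServerConfig.

```python
from typing import Any, Dict, List


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems: List[str] = []
    model_name = config.get("model_name")
    if not isinstance(model_name, str) or not model_name.strip():
        problems.append("model_name must be a non-empty string")
    if config.get("temperature", 0.0) < 0:
        problems.append("temperature must be >= 0")
    # These values only make sense when strictly positive.
    for key in ("max_tokens", "timeout", "max_concurrent"):
        if key in config and config[key] <= 0:
            problems.append(f"{key} must be positive")
    # Zero is allowed here: it means no retries or no delay.
    for key in ("num_retries", "retry_delay"):
        if key in config and config[key] < 0:
            problems.append(f"{key} must be non-negative")
    return problems


ok = validate_config({"model_name": "gpt-4", "temperature": 0.0, "max_tokens": 512})
bad = validate_config({"model_name": "", "max_concurrent": 0, "num_retries": -1})
```

Collecting every problem in a list, rather than raising on the first one, lets a caller report all configuration mistakes at once.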
